1. Introduction
Statistical learning has been widely used in actuarial science since the 1980s. In the pricing domain, actuaries rapidly embraced linear models and generalized linear models (GLMs), which became the standard approach for pricing models. Meanwhile, advancements in statistical learning and computer science have led to the development of more sophisticated machine learning techniques.
A seminal work encompassing the use of several machine learning algorithms for insurance ratemaking was put forward by Dugas et al. (2003), where the authors compare linear regression, generalized linear models, tree-based models, neural networks, and support vector machines. As noted in Blier-Wong et al. (2021), Dugas et al. (2003) concluded with the hope that their work would encourage actuaries to adopt neural networks as a modeling tool for ratemaking. Just over a decade later, machine learning algorithms—especially neural networks—have become increasingly common in actuarial research and practice. For instance, Guelman (2012) uses gradient boosting for auto insurance cost modeling, while Spedicato et al. (2018) explores insurance pricing optimization with multiple machine learning models. Ghahari et al. (2019) applies deep learning to agricultural risk management, and Lee and Lin (2018) introduces a boosting machine for general insurance. Similarly, Henckaerts et al. (2021) leverages tree-based techniques like GBM for car insurance tariffs, and Schelldorfer and Wüthrich (2019) employs neural networks to enhance GLM performance in non-life insurance. Additionally, Wüthrich (2020) suggests methods to address bias in neural network models for insurance portfolios. Machine learning has also been extensively used in insurance claim reserving. Gabrielli et al. (2020) improves the performance of the over-dispersed Poisson model for general insurance claim reserving using neural network embeddings, while Wüthrich (2018) explores various machine learning techniques to assess individual claim reserving.
Neural networks have also been applied in actuarial research for various tasks. Chapados et al. (2002) compared different statistical learning models for estimating the pure premium, while Rajitha and Sakthivel (2017) used neural networks to estimate a posteriori claim frequency. The renewed interest in neural networks for actuarial pricing is largely driven by M. Wüthrich and the Swiss actuarial community. Wüthrich and Merz (2019) first introduced a Poisson neural network to estimate claim frequencies in a car insurance dataset. Later, Schelldorfer and Wüthrich (2019) proposed the Combined Actuarial Neural Network (CANN), an innovative approach that integrates a GLM with a neural network to capture nonlinear relationships. Wüthrich (2019) further explored the CANN methodology, while Wüthrich (2020) addressed bias regularization in neural network models. Additionally, Lorentzen and Mayer (2020) introduced a set of model-agnostic tools to extract interpretable insights from neural networks. For a comprehensive review of machine learning applications in actuarial sciences, see Blier-Wong et al. (2021) and Richman (2021a, 2021b).
While neural networks have been explored in insurance pricing, most research has focused on property and casualty insurance, with limited attention to health insurance applications. Some works in the literature address health insurance pricing using tree-based techniques, such as XGBoost, random forests, and decision trees, or other machine learning methods (k-nearest neighbours, support vector machines).
Duncan et al. (2016) test various regression models, including random forests, decision trees, and boosted trees, directly modeling total allowed health care costs. ul Hassan et al. (2021) use machine learning techniques to predict medical insurance costs, without accounting for the claim frequency. Orji and Ukwandu (2024) leverage ensemble machine learning methods—Extreme Gradient Boosting, Gradient Boosting Machine, and Random Forest—to predict medical insurance costs. Kaushik et al. (2022) use neural networks to predict health insurance premiums and costs, but do not consider the claim frequency. Our work aims to bridge the gap in applying neural networks to health insurance by demonstrating how they can be used for pricing coverage within the framework of a classical frequency-severity model.
Neural networks offer significant advantages over traditional machine learning models in health insurance pricing. Health insurance data are characterized by complex interdependencies among factors such as medical history, demographics, and claims information. Neural networks excel at capturing these nonlinear relationships, ensuring a more precise representation of risk and cost patterns than traditional machine learning algorithms (Talaei Khoei et al. 2023). They eliminate the need for extensive manual feature engineering, as they automatically extract relevant features from raw data. In contrast, traditional machine learning algorithms often require significant manual feature selection. While such algorithms can perform well with smaller datasets and are generally more interpretable, they may not capture intricate, nonlinear relationships within large datasets.
Accuracy in predictive modeling is crucial for health insurers, particularly in pricing, underwriting, and assessing healthcare costs (Drewe-Boss et al. 2022; Kaushik et al. 2022). Neural networks have demonstrated superior predictive performance in insurance pricing, leveraging large-scale datasets to refine risk assessments (Holvoet et al. 2025). Their adaptability further allows them to process heterogeneous data sources effectively.
While neural networks have often been regarded as “black-box” models, explainable AI (XAI) addresses this limitation by identifying the key factors driving predictions, improving transparency and building trust, which are essential for ensuring regulatory compliance and facilitating business adoption in the insurance industry.
Health insurance contracts often cover a wide range of correlated medical events, such as medical visits and diagnostic tests. For example, a diagnostic test often requires a referral from a prior medical visit. To account for these dependencies, we adopt a multivariate approach to model medical claims. We introduce a negative multinomial neural network to model the frequency of correlated medical claims jointly. This approach is novel in insurance pricing, as most existing studies rely on a univariate Poisson distribution, which is more suited to car insurance claims. While Jose et al. (2022) previously proposed a negative binomial model, their approach remains limited to the univariate case. We compare the performance of the proposed model against the estimates produced by a negative multinomial regression (see Zhang et al. 2017). We then use a Gamma neural network to estimate the expected claim severity, i.e., the average cost of a given claim, already introduced in a different insurance domain by Delong et al. (2021). Using a neural network approach appears to be particularly appealing in health insurance, since the number of claims (and thus the size of the data) is usually larger than in other insurance branches. Moreover, we deepen the understanding of our neural network models by applying a set of model-agnostic tools (XAI) proposed in Lorentzen and Mayer (2020), which allow us to shed light on the data representation learned by the models. The premiums estimated by neural networks are then compared to those provided by the simpler regression models through the methods proposed in Denuit et al. (2019). Our analysis is carried out on a health claim dataset provided by a primary Italian health insurance company.
In summary, our study provides the following contributions:
Neural network implementation within a classical frequency-severity framework: A key contribution of this study is the integration of neural networks into the traditional frequency-severity structure, maintaining a model structure that is familiar and interpretable from an actuarial perspective.
Use of negative multinomial neural networks for correlated claims modeling: The paper introduces a negative multinomial neural network to model the frequency of correlated medical claims (e.g., visits and diagnostics) jointly, offering a more realistic representation of health insurance claims processes compared to the widely used univariate Poisson models.
Accounting for modeling claim dependencies: Using a multivariate approach, the model explicitly captures dependencies among different types of medical claims, which, to our knowledge, has not been previously explored in health insurance.
Gamma neural network for claim severity: We propose Gamma neural networks to model claim severity, complementing the frequency model and enabling a full end-to-end neural network-based pricing model.
Empirical assessment on real-world health insurance data: The proposed models are validated on a real-world dataset, showing superior performance over traditional regression-based methods.
Model interpretability through XAI tools: We also deepen the understanding of the models’ internal representations by applying a set of XAI tools, allowing us to investigate the data representation learned by the neural networks and identify key drivers of model predictions.
The remainder of the paper is structured as follows: In Section 2, we briefly review the frequency-severity approach and detail the proposed models based on neural networks; Section 3 is devoted to data description; Section 4 reports the results obtained using the proposed models; Section 5 is devoted to ratemaking; Section 6 concludes the paper.
2. Frequency-Severity Approach and Proposed Models
Setting the pure premium of an insurance policy mainly consists of evaluating the cost associated with the risk coverage provided by the insurance contract. Therefore, the insurer has to predict the expected total claim amount $E[S_i]$ for each policyholder through a predictive model $\mu(\cdot)$ mapping the policyholder risk factors $\boldsymbol{x}_i$ to the predicted loss cost: $\mu(\boldsymbol{x}_i) = E[S_i \mid \boldsymbol{x}_i]$. A popular method for health insurance cost modeling is to consider frequency-severity modeling (see Frees et al. 2011), which is the primary statistical approach for modeling non-life insurance claims. This approach splits the total claim amount for a given policyholder $i$ into a compound sum that accounts for the number of filed claims and determines the individual medical claim sizes. Thus, the total claim amount $S_i$ is represented by a compound random variable, with $N_i$ describing the number of claims that occur over one year to the generic policyholder $i$ and $Y_{i,1}, \ldots, Y_{i,N_i}$ describing the i.i.d. individual claim sizes, defined as claim severities. More formally:
$$S_i = \sum_{k=1}^{N_i} Y_{i,k}. \tag{1}$$
Assuming the independence between $N_i$ and the $Y_{i,k}$, we have that:
$$E[S_i] = E[N_i]\, E[Y_i], \tag{2}$$
where $Y_i$ is the cost for a generic claim filed by policyholder $i$ (see Klugman et al. 2012 for further details). Health insurance pricing, defined in terms of the pure premium, involves estimating the expected total claim amount $E[S_i]$ for an insurance policy over the course of one year, based on a set of risk factors $\boldsymbol{x}_i$. In the following, we illustrate how to model the claim frequency $N_i$ and the claim severity $Y_i$ for a set of different claim types using neural networks, comparing the results with those of other regression models.
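As a quick numerical illustration of the frequency-severity decomposition, the following Python sketch (ours, not the authors' R implementation; the distributions and parameter values are arbitrary) simulates compound totals and checks that their sample mean is close to E[N]·E[Y]:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_totals(n_policies, mean_freq, mean_sev):
    """Simulate S_i = sum_{k=1}^{N_i} Y_ik with Poisson counts and Gamma severities.

    N_i ~ Poisson(mean_freq); Y_ik ~ Gamma(shape=2, scale=mean_sev/2), so that
    E[Y] = mean_sev. Both choices are illustrative, not fitted to any data.
    """
    counts = rng.poisson(mean_freq, size=n_policies)
    totals = np.array([rng.gamma(2.0, mean_sev / 2.0, size=n).sum() for n in counts])
    return totals

totals = simulate_totals(n_policies=200_000, mean_freq=1.8, mean_sev=120.0)
# Under independence of N and Y: E[S] = E[N] * E[Y] = 1.8 * 120 = 216.
print(round(float(totals.mean()), 1))
```

With 200,000 simulated policies the Monte Carlo average lands within a fraction of a percent of the theoretical pure premium.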
2.1. Claim Frequency Model
Here, we describe the neural network we propose for modeling the claim frequency of medical visits, dental care treatments, and diagnostic testing. A typical approach involves fitting a univariate negative binomial regression to estimate the claim frequency of each claim type. The negative binomial is a common distributional assumption for health insurance claim counts since it has the advantage of capturing the overdispersion characterizing such claims (see, for instance, Ismail and Zamani (2013) and Frees et al. (2011)). However, using a univariate technique would neglect possible correlations between the occurrence of the different claim types, which are often observed in health insurance (Erhardt and Czado 2012).
For this reason, we adopt a multivariate approach by proposing negative multinomial neural networks to estimate the claim frequency for different claim types. The advantage of this technique is twofold. First, it models the claim frequency of various claim types via a neural network, accounting for nonlinearities, covariate interactions, and overdispersion. Second, the multivariate approach provided by the negative multinomial distribution allows us to capture possible correlations between different perils.
2.1.1. Negative Multinomial Distribution and Negative Multinomial Regression
A negative multinomial distribution, extensively discussed in Sibuya et al. (1964), provides a model for positively correlated multivariate count data characterized by overdispersion, i.e., where the variance is greater than the mean. Regression models relying on this distributional assumption have already been implemented in different fields, such as genomics (Kim et al. 2018) and medical statistics (Waller and Zelterman 1997).
More formally, let us consider an $r$-dimensional vector of counts $\boldsymbol{N} = (N_1, \ldots, N_r)$. The probability mass for $\boldsymbol{n} = (n_1, \ldots, n_r)$ under a negative multinomial distribution with parameters $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_r)$ and shape parameter $\phi > 0$, where $0 < \pi_j < 1$ and $\sum_{j=1}^{r} \pi_j < 1$, is:
$$\Pr(\boldsymbol{N} = \boldsymbol{n}) = \frac{\Gamma\big(\phi + \sum_{j=1}^{r} n_j\big)}{\Gamma(\phi)\, \prod_{j=1}^{r} n_j!}\, \Big(1 - \sum_{j=1}^{r} \pi_j\Big)^{\phi}\, \prod_{j=1}^{r} \pi_j^{n_j}, \tag{3}$$
where the vector of parameters $\boldsymbol{\pi}$ is defined as follows:
$$\pi_j = \frac{\mu_j}{\phi + \sum_{k=1}^{r} \mu_k}, \qquad j = 1, \ldots, r, \tag{4}$$
with $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_r)$ as the mean parameter vector.
For ease of discussion, we set $\mu_\bullet = \sum_{j=1}^{r} \mu_j$ and $n_\bullet = \sum_{j=1}^{r} n_j$, and rearrange the probability mass function in Equation (3) as follows:
$$\Pr(\boldsymbol{N} = \boldsymbol{n}) = \frac{\Gamma(\phi + n_\bullet)}{\Gamma(\phi)\, \prod_{j=1}^{r} n_j!}\, \Big(\frac{\phi}{\phi + \mu_\bullet}\Big)^{\phi}\, \prod_{j=1}^{r} \Big(\frac{\mu_j}{\phi + \mu_\bullet}\Big)^{n_j}. \tag{5}$$
As shown in Waller and Zelterman (1997), the expectation of the count random variable $\boldsymbol{N}$ characterized by a negative multinomial distribution is defined as
$$E[\boldsymbol{N}] = \boldsymbol{\mu}, \tag{6}$$
and its covariance matrix is
$$\operatorname{Cov}(\boldsymbol{N}) = \operatorname{diag}(\boldsymbol{\mu}) + \frac{1}{\phi}\, \boldsymbol{\mu}\boldsymbol{\mu}^\top. \tag{7}$$
Fitting the distribution involves estimating the parameters $\boldsymbol{\mu}$ and $\phi$ presented in Equation (5), which are usually obtained via MLE.
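The expectation and covariance structure described above can be illustrated by sampling. A standard construction of the negative multinomial draws a common Gamma frailty and then conditionally independent Poisson counts; the Python sketch below (illustrative, with arbitrary parameters) reproduces the overdispersed variances and positive covariances:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def sample_neg_multinomial(mu, phi, size):
    """Sample negative multinomial counts via a shared Gamma-Poisson mixture.

    Draw a frailty G ~ Gamma(shape=phi, scale=1/phi), so E[G] = 1, then
    N_j | G ~ Poisson(mu_j * G) independently across j. The shared frailty
    induces overdispersion and positive correlation between the components.
    """
    mu = np.asarray(mu, dtype=float)
    g = rng.gamma(shape=phi, scale=1.0 / phi, size=size)  # shared frailty per row
    return rng.poisson(np.outer(g, mu))                   # (size, r) count matrix

counts = sample_neg_multinomial(mu=[2.0, 0.5], phi=1.5, size=500_000)
# Theory: E[N_j] = mu_j; Var(N_j) = mu_j + mu_j^2/phi; Cov(N_j, N_k) = mu_j*mu_k/phi.
print(counts.mean(axis=0))     # approx [2.0, 0.5]
print(counts[:, 0].var())      # approx 2.0 + 4.0/1.5 = 4.67
print(np.cov(counts.T)[0, 1])  # approx 1.0/1.5 = 0.67
```

The sample moments match the stated mean and covariance formulas, confirming that a shared frailty is what generates the positive cross-correlation between claim types.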
The general framework of a regression approach with a negative multinomial distribution was first introduced by Kim et al. (2018). This approach models the relationship between the multivariate response count variable and a set of covariates, capturing the inner correlation structure between counts. Namely, we have:
$$\log \boldsymbol{\mu}_i = B^\top \boldsymbol{x}_i, \qquad i = 1, \ldots, I, \tag{8}$$
where $B$ is the $J \times r$ regression parameters matrix with $J$ regressors and $I$ is the number of policyholders. Then, from Equations (4) and (6), we have:
$$E[\boldsymbol{N}_i] = \boldsymbol{\mu}_i = \exp\!\big(B^\top \boldsymbol{x}_i\big), \tag{9}$$
and the vector $\boldsymbol{\pi}_i$ in Equation (3) is given by:
$$\pi_{ij} = \frac{\exp\!\big(\boldsymbol{x}_i^\top \boldsymbol{\beta}_j\big)}{\phi + \sum_{k=1}^{r} \exp\!\big(\boldsymbol{x}_i^\top \boldsymbol{\beta}_k\big)}, \qquad j = 1, \ldots, r, \tag{10}$$
where $\boldsymbol{\beta}_j$ denotes the $j$-th column of $B$, and $\{B, \phi\}$ is the set of parameters to be estimated via maximum likelihood. Then, the covariance matrix is obtained by feeding back Equation (10) in Equation (7). The set of parameters $B$ and $\phi$ is obtained maximizing the following log-likelihood:
$$\ell(B, \phi) = \sum_{i=1}^{I} \left[ \log \Gamma(\phi + n_{i\bullet}) - \log \Gamma(\phi) - \sum_{j=1}^{r} \log n_{ij}! + \phi \log \frac{\phi}{\phi + \mu_{i\bullet}} + \sum_{j=1}^{r} n_{ij} \log \frac{\mu_{ij}}{\phi + \mu_{i\bullet}} \right], \tag{11}$$
where $n_{i\bullet} = \sum_{j=1}^{r} n_{ij}$ and $\mu_{i\bullet} = \sum_{j=1}^{r} \mu_{ij}$.
Note that maximizing $\ell$ in Equation (11) is equivalent to minimizing the deviance $D$:
$$D = 2 \sum_{i=1}^{I} \left[ (\phi + n_{i\bullet}) \log \frac{\phi + \mu_{i\bullet}}{\phi + n_{i\bullet}} + \sum_{j=1}^{r} n_{ij} \log \frac{n_{ij}}{\mu_{ij}} \right], \tag{12}$$
with the convention $0 \log 0 = 0$. The same deviance is used to train the neural network presented in the next subsection.
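To make the frequency loss concrete, here is an explicit numpy version of one form of the negative multinomial deviance (an illustrative sketch, not the authors' code): it drops the terms that cancel between the saturated and fitted log-likelihoods, equals zero at a perfect fit, and is positive otherwise.

```python
import numpy as np

def nm_deviance(n, mu, phi):
    """Negative multinomial deviance 2 * [ll(saturated) - ll(mu)].

    n, mu: (n_obs, r) arrays of observed counts and fitted means (mu > 0);
    phi: scalar shape parameter. Uses the convention 0 * log(0) = 0.
    """
    n = np.asarray(n, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n_tot = n.sum(axis=1)
    mu_tot = mu.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        term = np.where(n > 0, n * np.log(n / mu), 0.0).sum(axis=1)
    per_obs = (phi + n_tot) * np.log((phi + mu_tot) / (phi + n_tot)) + term
    return 2.0 * per_obs.sum()

n = np.array([[2.0, 0.0], [1.0, 3.0]])
dev_perfect = nm_deviance(n, np.maximum(n, 1e-12), phi=1.5)  # ~ 0 at mu = n
dev_off = nm_deviance(n, n + 0.5, phi=1.5)                   # > 0 for any other fit
print(dev_perfect, dev_off)
```

Because the saturated fit (mu = n) maximizes the likelihood, any other vector of fitted means yields a strictly larger deviance, which is the property exploited when the same quantity is used as a training loss.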
2.1.2. Negative Multinomial Neural Networks (NM-NNs)
The neural network approach we propose consists of a feed-forward neural network with three output layers to model the multivariate claim count response with respect to the feature set $\boldsymbol{x}_i$. The higher flexibility of neural networks, provided by their intricate inner structure, should account for the dependence that characterizes the different types of claim counts.
Given our set of covariates $\boldsymbol{x}_i$ and considering a network of depth $K$, the output layer for the neural network is defined as follows:
$$\boldsymbol{\mu}(\boldsymbol{x}_i) = g\!\left( \boldsymbol{b} + \sum_{l=1}^{q_K} \boldsymbol{w}_l\, z_l^{(K)}(\boldsymbol{x}_i) \right), \tag{13}$$
where $\boldsymbol{\mu}(\boldsymbol{x}_i)$ is the output produced by the network for the $i$-th string of observations and $z_l^{(K)}(\boldsymbol{x}_i)$ is the activation of the $l$-th of the $q_K$ neurons in the last hidden layer. Note that in this specific formulation $\boldsymbol{b}$ is a tridimensional vector of output biases and $\boldsymbol{w}_l$ is the tridimensional vector of weights connecting the $l$-th neuron in the last hidden layer to the output layer. The function $g(\cdot)$ is the activation function applied to the input of the output layer. For a visual representation of the network, see Figure 1.
The network has to be trained to minimize a given loss function. Since our goal is to model multivariate correlated count data, a logical choice for the loss function is the negative multinomial deviance. Rearranging Equation (12), we have the following deviance defined with respect to the network parameters $\boldsymbol{\theta}$:
$$D(\boldsymbol{\theta}) = 2 \sum_{i=1}^{I} \left[ (\phi + n_{i\bullet}) \log \frac{\phi + \mu_\bullet(\boldsymbol{x}_i; \boldsymbol{\theta})}{\phi + n_{i\bullet}} + \sum_{j=1}^{r} n_{ij} \log \frac{n_{ij}}{\mu_j(\boldsymbol{x}_i; \boldsymbol{\theta})} \right], \tag{14}$$
where $\mu_\bullet(\boldsymbol{x}_i; \boldsymbol{\theta}) = \sum_{j=1}^{r} \mu_j(\boldsymbol{x}_i; \boldsymbol{\theta})$. More specifically, the optimal set of parameters $\hat{\boldsymbol{\theta}}$ is obtained by minimizing the negative multinomial deviance via stochastic gradient descent. Note that in the estimation process, the scale parameter $\phi$ is considered as given and is obtained by preemptively performing a negative multinomial regression on the same set of data. Estimating the $\phi$ parameter using an ad hoc network structure might represent possible future research. Given the optimal set of parameters $\hat{\boldsymbol{\theta}}$, we can compute the expected number of claims as follows:
$$\hat{E}^{\,NN}[N_{ij}] = \mu_j(\boldsymbol{x}_i; \hat{\boldsymbol{\theta}}), \tag{15}$$
for $j = 1, \ldots, r$, where the superscript ‘$NN$’ denotes that the expectation is obtained via a neural network.
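A minimal TensorFlow sketch of such a network is given below (an illustration in Python; the paper's implementation is in R with TensorFlow, and the feature count, layer sizes, and shape-parameter value here are placeholders). The loss keeps only the terms of the negative multinomial log-likelihood that depend on the fitted means, so minimizing it is equivalent to minimizing the deviance:

```python
import numpy as np
import tensorflow as tf

N_FEATURES = 5   # hypothetical number of preprocessed covariates
PHI = 1.5        # shape parameter, fixed from a preliminary NM regression

def nm_loss(y_true, y_pred):
    """mu-dependent part of the negative multinomial negative log-likelihood.

    Minimizing it over the network weights is equivalent to minimizing the
    negative multinomial deviance, since the dropped terms do not depend
    on the fitted means y_pred.
    """
    n_tot = tf.reduce_sum(y_true, axis=1)
    mu_tot = tf.reduce_sum(y_pred, axis=1)
    nll = (PHI + n_tot) * tf.math.log(PHI + mu_tot) - tf.reduce_sum(
        y_true * tf.math.log(y_pred), axis=1
    )
    return tf.reduce_mean(nll)

# A small feed-forward network with a three-dimensional positive output
# (one fitted mean per claim type); the layer sizes are placeholders.
inputs = tf.keras.Input(shape=(N_FEATURES,))
x = inputs
for units in (64, 32, 16):
    x = tf.keras.layers.Dense(units, activation="relu")(x)
outputs = tf.keras.layers.Dense(3, activation="exponential")(x)  # mu > 0
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss=nm_loss)

mu = model.predict(np.random.rand(4, N_FEATURES).astype("float32"), verbose=0)
print(mu.shape)  # one expected claim count per claim type
```

The exponential output activation guarantees strictly positive fitted means, which the logarithms in the loss require.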
2.1.3. Gamma Neural Network (Gamma-NN)
The Gamma-NN is a feed-forward neural network with a one-dimensional output layer. Considering the typical set of covariates
and a network of depth
K, keeping in mind the notation of
Section 2.1.2, we can define the output layer of the network as follows:
The network architecture is shown in
Figure 2.
To estimate the set of parameters
, we have to define an appropriate loss function to be minimized by the network model. We train the model on Gamma deviance:
where
is the number of claims submitted by a given insured
i. In particular, we employ stochastic gradient descent to obtain the estimate for the set of parameters
. We can compute the expected claim severity for the
i-th policyholder as follows:
where the superscript ‘
’ shows that the expectation is obtained via a neural network.
Note that in the following sections, we will fit a separate Gamma neural network for each type of peril considered in our application.
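The severity loss can likewise be written in a few lines of numpy (an illustrative sketch with made-up numbers, using the claim counts as case weights):

```python
import numpy as np

def gamma_deviance(y, mu, weights):
    """Weighted Gamma deviance: 2 * sum_i w_i * ((y_i - mu_i)/mu_i - log(y_i/mu_i)).

    y: observed average claim severities; mu: fitted severities;
    weights: the claim counts n_i, so insureds with more claims weigh more.
    """
    y, mu, w = (np.asarray(a, dtype=float) for a in (y, mu, weights))
    return 2.0 * np.sum(w * ((y - mu) / mu - np.log(y / mu)))

y = np.array([100.0, 250.0, 80.0])    # made-up average severities
w = np.array([2.0, 1.0, 3.0])         # made-up claim counts
print(gamma_deviance(y, y, w))        # 0.0 at a perfect fit
print(gamma_deviance(y, 1.2 * y, w))  # positive for any other fit
```

Weighting by the claim count gives more credibility to averages computed over many claims, which is the standard practice when the response is an average severity.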
3. The Data
In this paper, the dataset considered stems from an Italian insurer and contains the claims collected under a general health insurance plan during 2018. The insurance plan is designed for managers and retired managers affiliated with a specific industry in Italy, making it an employer-based health insurance plan. The dataset comprises 132,499 policyholders, including both current and former employees. Additionally, policyholders have the option to enroll relatives, such as parents, spouses, and children below 25 years of age, in the insurance coverage. In total, 273,950 insured individuals are included in the dataset.
The dataset covers three classes of claims:
Medical visits with a wide range of specialist doctors, such as cardiologists, pediatricians, neurologists, and many more.
Dental care treatments, including, among others, dental braces, implants, and oral surgery.
Diagnostic tests, e.g., magnetic resonance imaging, blood tests, and electrocardiogram.
For each insured, the dataset reports the following information: number of claims filed during the year (split between Visits, Dentalcare, and Diagnostic), severity for such claims, age, gender, regional area, firm dimension, family member type, years of permanence in the coverage, and ID code. Table 1 offers an overview of the data contained in the dataset. The response variables for the frequency models discussed in Section 2.1.2 are the claim counts for the three claim types, while the responses for the models proposed in Section 2.1.3 are the corresponding claim severities for Visits, Dentalcare, and Diagnostic.
The dataset consists of 273,950 observations and six covariates. There are 205,625 claimants, with the total monetary value of submitted claims amounting to approximately 233 million euros. Below, we explore the dataset via summary and descriptive statistics.
3.1. Covariates
Figure 3 presents the histograms and frequency tables for the covariates included in the dataset. The age variable (AG) is heavily skewed toward older individuals, reflecting a predominantly senior population. Notably, there is a marked ‘dip’ in the distribution between ages 25 and 40. This gap stems from the specific subscription policies of the Italian insurer: since policyholders are typically firm managers, it is uncommon for them to be under 40 years old.1 Additionally, managers are not permitted to extend insurance coverage to their children over the age of 25. These factors together explain the limited number of insured individuals in the 25–40 age range. The gender variable (GE) shows a relatively balanced distribution between males and females. However, although the overall population shows a balanced gender distribution, this balance is not maintained within the subset of policyholders: approximately 80% of the policyholders are male. This skew is due to the specific demographic characteristics of the firm managers who are eligible for the insurance policy, since managerial positions are predominantly held by men. The permanence (PE) is an integer variable reporting the years of permanence in the insurance plan; its minimum is 1 (for newcomers), and its maximum is 41 (for early adopters). The histogram of this variable shows a decreasing trend. However, there is a strong peak at 38 years, connected to the subscription of the health coverage by a large number of firms whose employees (or pensioners) are still enrolled in the insurance plan. As for the region (RE), we observe a strong concentration of insured individuals in two of the 21 Italian regions, ‘Lombardia’ and ‘Lazio’, which account for the majority of policyholders. The dimension variable (DM) is a proxy for the dimension of the firm to which the policyholder belongs. More specifically, the variable reports the number of managers working in the firm, and its value is the same across all the insureds belonging to the same family. It ranges from 1 to 1500, with a strong concentration below 100, representing the small and medium-sized firms that are characteristic of the Italian economy. The family member type (FA) is a categorical variable indicating which kind of family member the insured is. From Figure 3, we observe that the most relevant classes are Policyholder, Spouse, and Children. In contrast, Parents and Ex-Spouse are almost negligible since they cover, on aggregate, fewer than 200 insureds in the dataset.
Table 2 presents the measures of association between the various covariates. The results indicate a strong positive relationship between family member type (FA) and age (AG), with a value of 0.739, which is intuitively reasonable. Similarly, PE and AG exhibit a moderate correlation of 0.676, reflecting the fact that individuals with longer durations of coverage are more likely to be older. Associations between categorical and numerical variables are generally weak, with most values close to zero (e.g., between GE and PE). On the other hand, the associations between categorical variables appear to be relatively strong, as highlighted by the statistic reported in the table.
3.2. Response Variable
In Table 3, we give a general overview of the claim counts of the three claim types in the dataset by reporting their summary statistics.
From Table 3, we notice that a considerable portion of insureds submit at least one claim (last row in the table), which is peculiar to health insurance, where events have a higher frequency than in other non-life insurance lines, e.g., property insurance. More specifically, 50% of the insureds request at least a medical visit or a diagnostic test during the year, while 25% undergo some sort of dental treatment. Table 3 also reports the average claim frequency per insured for each claim type; the frequency for diagnostic tests appears to be exceptionally high, which is also due to the approach used by the insurer to record claims when an insured undergoes a diagnostic test.2
These claims exhibit overdispersion, as shown in Table 3, where the variance significantly exceeds the mean. To choose a suitable discrete distribution for the marginals of the claim counts, we compare the Poisson distribution with a negative binomial distribution by considering two goodness-of-fit (GoF) measures: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The Poisson is a typical distributional assumption when it comes to claim frequency modeling. However, a different distribution may be more appropriate when the data are characterized by overdispersion. The results in Table 4 indicate that the negative binomial distribution better fits the claim counts, as reflected by its lower AIC and BIC values. Although the SSE and MAE are identical for both distributions, the lower AIC and BIC suggest the negative binomial provides a more accurate fit.
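This kind of AIC/BIC comparison is easy to reproduce on synthetic data. In the Python sketch below (the counts are simulated, since the real data are proprietary), both distributions are fitted by maximum likelihood and the information criteria computed; on overdispersed counts the negative binomial wins:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(seed=0)
# Synthetic overdispersed counts: mean 1.8, variance 4.5 (> mean).
counts = rng.negative_binomial(n=1.2, p=0.4, size=20_000)

# Poisson: the MLE of the rate is the sample mean (one free parameter).
ll_pois = stats.poisson.logpmf(counts, counts.mean()).sum()

# Negative binomial: MLE of (r, p) via numerical optimization (two parameters).
def nb_negll(theta):
    r = np.exp(theta[0])                 # keep r > 0
    p = 1.0 / (1.0 + np.exp(-theta[1]))  # keep 0 < p < 1
    return -stats.nbinom.logpmf(counts, r, p).sum()

ll_nb = -minimize(nb_negll, x0=[0.0, 0.0], method="Nelder-Mead").fun

def aic(ll, k): return 2 * k - 2 * ll
def bic(ll, k): return k * np.log(counts.size) - 2 * ll

print(aic(ll_pois, 1), aic(ll_nb, 2))  # lower is better: NB beats Poisson here
print(bic(ll_pois, 1), bic(ll_nb, 2))
```

Even though the negative binomial spends one extra parameter, the likelihood gain on overdispersed counts dwarfs the AIC/BIC penalty, mirroring the pattern reported in Table 4.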
Given the nature of the claim types, it is interesting to evaluate the correlation between their occurrence. In Figure 4, we report the Spearman correlogram between the claim counts. The results show that diagnostic exams are strongly correlated with visits; this is hardly surprising, since the referral given by a medical visit is frequently an essential requirement to undergo an in-depth diagnostic test. Therefore, we choose to jointly model the different claim counts to capture such correlation using a multivariate approach via the negative multinomial distribution. Given the low correlation between dental care claims and the other types of claims, an argument could be made for modeling the dental claims alone while using a multivariate approach for visits and diagnostic tests. However, for simplicity, we still decide to model the three perils jointly.
These claim frequencies are modeled with the negative multinomial regression framework introduced in Section 2.1.
3.3. Claim Severity
Here, we provide some descriptive statistics to characterize the claim costs. In Table 5, we display some summary statistics. The distributions of the different claim severities are right-skewed, a feature commonly well described through a Gamma distribution. In particular, we notice that Diagnostic and Dentalcare claims seem far more skewed than medical visits.
Note3 that we do not distinguish between regular and large claims in this study, and we consider all of them together. However, a case could be made for using two different approaches when modeling small and large claims, as in Denuit and Lang (2004) and Albrecher et al. (2017).
4. Results
To evaluate the general performance of the NM-NN with respect to the benchmark NM-Regression (NMR), we test the model over the dataset presented above. The results are obtained through five-fold cross-validation, where the dataset is divided into five folds, and at each iteration, three of the five folds are used as a training set for the models (NM-NN and NMR), one as a validation set, and one as a testing set.
The network is trained for up to 2000 epochs with early stopping based on the validation loss, using a patience of 200 epochs to ensure a balance between convergence and overfitting prevention. The best-performing weights on the validation set are retained. The model adopts a five-hidden-layer structure with ReLU as the activation function.4 As for the variables presented in Table 1: AG, PE,5 and DM are min-max scaled, RE and FA are treated using an embedding layer, and GE is dummy encoded. In the multinomial regression model, AG, PE, and DM are treated as continuous variables, while RE, FA, and GE are dummy encoded.
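The preprocessing steps can be sketched as follows (illustrative Python with toy values; variable names follow Table 1, and the embedding itself would live inside the network):

```python
import numpy as np

# Toy covariate values mimicking Table 1 (the actual data are proprietary):
age = np.array([55.0, 62.0, 34.0, 71.0])                      # AG, continuous
gender = np.array(["M", "F", "M", "M"])                       # GE, binary
region = np.array(["Lombardia", "Lazio", "Lazio", "Puglia"])  # RE, many levels

# Min-max scaling maps a continuous covariate into [0, 1]:
age_scaled = (age - age.min()) / (age.max() - age.min())

# Dummy encoding for a binary covariate:
gender_dummy = (gender == "M").astype(float)

# High-cardinality categoricals (RE, FA) are mapped to integer codes and fed
# to an embedding layer inside the network instead of being one-hot encoded:
levels, region_codes = np.unique(region, return_inverse=True)
print(age_scaled, gender_dummy, region_codes)
```

Embeddings are preferred over dummy encoding for high-cardinality variables because they keep the input dimension small and let the network learn a dense representation of the levels.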
We fit a different Gamma GAMLSS (Generalized Additive Model for Location, Scale, and Shape) and a Gamma-NN for each claim type (Visits, Dentalcare, and Diagnostic). In the GAMLSS, continuous variables are treated via cubic splines, while categorical variables are dummy encoded. For the Gamma-NN, the models are trained over 1000 epochs using early stopping on the validation set to prevent overfitting, with a patience of 200 epochs. Each network adopts a standard three-hidden-layer structure (Schelldorfer and Wüthrich 2019) with a ReLU activation function. Data are preprocessed as follows: AG, PE, and DM are min-max scaled, RE and FA are treated using an embedding layer, and GE is dummy encoded. The results discussed below stem from a five-fold cross-validation. We implemented the models in R (version 4.2.1) using TensorFlow (version 2.11.0).
4.1. Models Performance
The performance of the frequency models is measured in terms of negative multinomial deviance (see Equations (12) and (14)), where a lower deviance signals a better model. For both models, we estimate the claim frequencies for medical visits, dental treatments, and diagnostic tests. Figure 5 compares the in-sample (left panel) and the out-of-sample (right panel) performance for the NMR and the NM-NN. In particular, we report the negative multinomial deviance over the five data folds. The results are very stable over the different folds, both in-sample and out-of-sample. The neural network model consistently achieves a better performance than the NMR, since it returns a lower deviance. In Appendix A, we report a comparison between the models discussed above and more traditional univariate approaches, such as GLMs with Poisson and negative binomial distributions. The comparison is based on both in-sample and out-of-sample performance, evaluated using SSE and MAE metrics. Overall, the NM-NN model consistently outperforms the benchmark models across all measures.
As for the Gamma-NN, we evaluate the performance of the different models using the Gamma deviance, where the lower the deviance, the better the model. Given the set of features $\boldsymbol{x}_i$, each model returns an estimate for the expected claim severity of a medical visit, dental treatment, or diagnostic test. Figure 6 compares the in-sample (left panels) and the out-of-sample (right panels) performance for the Gamma-GAMLSS and the Gamma-NN. In each plot, we report the Gamma deviance over the five data folds to evaluate the stability of our results. The outcomes appear relatively robust for each claim severity model across the five folds, both in-sample and out-of-sample. Neural network models consistently outperform classical regression models, always returning a lower deviance. In Appendix B, we also compare the Gamma-NN with a traditional Gamma GLM using non-penalized performance metrics such as SSE and MAE. The results confirm that the neural network model achieves superior performance in both cases.
Neural networks are often celebrated for their outstanding predictive performance, since they easily learn a good representation of the training data that generalizes well to new data. Despite their effectiveness, these models frequently encounter a meaningful drawback: their lack of explainability. Neural networks, in particular, consist of many parameters and a deeply layered architecture, making it challenging for modelers to interpret the outcomes. To address these limitations, recent years have seen a surge in research dedicated to model-agnostic techniques (see Friedman and Popescu 2008a, as well as actuarial case studies in Lorentzen and Mayer 2020 and Henckaerts et al. 2021), all designed to enhance the interpretability of machine learning models. Therefore, in the remainder of this section, we present the information retrieved using such model-agnostic tools. We investigate the variables’ importance, their main effects, and the possible presence of interactions.
4.2. Variable Importance
In
Figure 7, we report the Variable Importance metric (see
Friedman and Popescu 2008a) to find the most relevant variables in our claim frequency models. The variables are ranked from left to right, from the most important to the less relevant. For both models, the two most relevant variables are age (
AG) and region (
RE). However, their ranking differs: the age variable is by far the most important for the NM-NN, while it only ranks second in the NMR. In particular, the increase in deviance is much higher in the network model than in the NMR, signaling that the multinomial regression is probably missing some information when modeling the relationship between the age variable and the claim frequencies. As for the regional variable, even though it ranks first in the NMR and second in the NM-NN, it has almost the same importance in the two models. In both models, the remaining variables are far less relevant; however, some of them appear to hold slightly greater significance in the neural network model than in the NMR (
GE and
PE), except for the family member variable (
FA), which is more relevant in the regression than in the neural network.
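The deviance-based importance used above can be sketched in a model-agnostic way: permute one covariate at a time and record the resulting increase in deviance. The following Python sketch is illustrative only — the function names and the Poisson deviance stand-in are our assumptions, not the paper's exact implementation:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Mean Poisson deviance, a stand-in for the frequency deviance used in the paper."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    safe_y = np.where(y > 0, y, 1.0)  # avoid log(0); the y factor zeroes those terms
    return 2.0 * np.mean(y * np.log(safe_y / mu) - (y - mu))

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Importance of each column of X: average increase in deviance
    when that column is randomly shuffled before predicting."""
    rng = np.random.default_rng(seed)
    base = poisson_deviance(y, predict(X))
    importance = {}
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            deltas.append(poisson_deviance(y, predict(Xp)) - base)
        importance[j] = float(np.mean(deltas))
    return importance
```

A covariate that the model ignores leaves the deviance unchanged when shuffled, so its score is zero — mirroring the near-irrelevant variables in the plots above.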
To learn which variables are more relevant in claim severity prediction, in
Figure 8, we compare the variable importance for the different regression techniques. The covariates are ranked from left to right, starting with the most important one. The top plots report the variable importance for GAMLSSs, while the bottom plots display the variable importance for neural networks. Claim severity models for
Visits (
Figure 8a) show some radical differences when it comes to variable importance. The only variable that seems to be relevant for the GAMLSS is the region (
RE), while the NN has several important covariates—age (
AG) and region (
RE) above all. In particular, the region's importance is comparable between the two models, whereas age is where the real difference arises: the variable is the most important in the NN, while it is almost irrelevant for the GAMLSS. Further differences between the GAMLSS and the NN involve the
FA and
PE variables, which are relevant only in the latter model. In contrast,
Dentalcare models (
Figure 8b) show a similar variable importance plot. Both models strongly rely on the age variable for their predictions, with minor but still relevant importance for the region. Even though the plots are roughly the same, we notice a slightly higher importance for variables in the NN model. In a similar fashion to
Visits, the claim severity models for
Diagnostic display two different variable importance plots. Indeed, except for region (
RE), the variables entering the GAMLSS are deemed irrelevant. The neural network extracts important information also from the age (
AG), the permanence (
PE) and the family member type (
FA).
In the following, we further investigate our models, looking at main effects.
4.3. Main Effects
Partial Dependence (PD) profiles, as introduced by
Friedman and Popescu (
2008a), are known to be a powerful tool for analyzing the marginal effect of a covariate on the model’s response. They provide insight into the main effect of covariate
j by averaging its influence across all observations. In essence, PD profiles capture the overall impact of variable
j on the model’s predictions, revealing whether its relationship with the response is linear, monotonic, or exhibits more complex patterns.
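Computationally, a PD profile is just an average of predictions with one column clamped to a grid of values. A minimal illustrative sketch (function and variable names are hypothetical):

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """PD profile of feature j: the model prediction averaged over all
    observations, with column j clamped to each value of `grid` in turn."""
    profile = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v          # fix covariate j for every observation
        profile.append(predict(Xv).mean())
    return np.array(profile)
```

Plotting `profile` against `grid` reveals whether the fitted main effect of covariate j is linear, monotonic, or more complex.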
In the following lines, we analyze the marginal effect for the different covariates considered in the NMR and the NM-NN. In particular, we will discuss the marginal effect of some selected variables on the claim frequencies of medical visits, dental treatments, and diagnostic tests.
In particular, we observe:
Visits: Figure 9 displays the partial dependence plots produced by the neural network and the NMR for the
Visits claim frequency, for (
AG) and (
RE). We notice a first significant difference when comparing the PD plot for the age variable (
AG). The marginal effect captured by the NMR (in red) is exponential: it starts at a claim frequency of 0 and caps at 5. In contrast, the behavior captured by the NM-NN (in blue) is more complex. Its PD plot starts at 2, decreases to a minimum frequency of about 1 around 15 years of age, then increases slowly, followed by a substantial rise after the age of 50, reaching a maximum of 3.3 at around 80 years. The marginal effect produced by the network thus captures specific features, such as the higher frequency of medical visits at younger ages, connected to pediatric visits, and the steady claim frequency at older ages. The region effect also differs between the two models: the NMR captures a strong claim frequency for ’Lazio’, which the neural network does not.
Dentalcare: Figure 10 reports the PD plots for
Dentalcare claim frequency. This claim type also shows differing behaviors for the age PD plot. The NMR PD plot shows an exponential trend, while the marginal effect produced by the neural network is almost parabolic, peaking at 75 years of age, with a slight bump around 15 years of age, connected to dental braces and teenage oral surgery. Looking at the PD plots for the
FA, we notice that the effect of this variable is almost flat for the NM-NN, while the NMR effects change across the different types of family members. As for the
Visits claim frequency, the NMR seems to use this variable to capture effects not registered by its
AG PD plot.
Diagnostic: Figure 11 reports the PD plots for the Diagnostic claim frequency. The age (
AG) has a quasi-linear trend in the neural network with two small bumps around the thirties and eighties, while the trend is exponential in the NMR. The neural network PD plot for permanence (
PE) shows a substantial effect on the claim frequency for high values of permanence, which the NMR fails to capture.
As shown via the different PD plots, the major difference between the two models is related to the age
AG main effect. This difference is primarily due to the NMR structural form, which lacks the flexibility to capture the shape of this main effect. In contrast, the NM-NN seems to capture all the information provided by the age variable. Therefore, the performance gap in
Figure 5 is probably due, in the first place, to the poor modeling of the
AG main effect by the NMR. This issue could be addressed using a polynomial term or a spline. However, this is only part of the story, since the different performances may also be associated with possible interactions between variables, which the PD plot cannot detect.
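As a hedged illustration of the spline fix suggested above, a truncated-power cubic basis for the age variable could be built by hand and supplied as additional design-matrix columns to a GLM-type regression; the knot locations below are purely illustrative:

```python
import numpy as np

def age_spline_basis(age, knots=(30.0, 60.0)):
    """Truncated-power cubic spline basis for the age variable.
    Replacing the single linear AG term with these columns lets the
    fitted main effect bend at the (illustrative) knot locations."""
    age = np.asarray(age, float)
    cols = [age, age ** 2, age ** 3]
    cols += [np.clip(age - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)
```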
To complete the analysis, we study the main effects for the claim severity models (Gamma GAMLSS and Gamma NN) and the different perils. For the sake of brevity, in
Figure 12, we report only the Partial Dependence plot for age obtained for the Dentalcare peril, which appears particularly interesting. This PD plot shows a relevant difference at younger ages: the GAMLSS shows a unitary claim cost decreasing between 0 and 35 years of age, while the NN main effect starts at a low value and then increases, reaching a peak at 15 years, capturing the high cost associated with the dental braces that characterize teenagers. In this case, the neural network seems to find a better fit for this specific effect.
4.4. Interaction Effects
After examining the main effects, we now conduct a detailed analysis of potential interaction effects among covariates as captured by the neural network model. A model-agnostic way to measure the interaction between two variables is based on partial dependence profiles introduced by
Friedman and Popescu (
2008a). To evaluate the existence of interaction effects, we employ the H-statistic proposed by
Friedman and Popescu (
2008a), which quantifies the interaction strength between two covariates by determining the proportion of prediction variance attributable to their interaction.
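A minimal sketch of the (squared) H-statistic computation, assuming a generic `predict` function and a numeric feature matrix; the helper names below are our own, not from the original reference:

```python
import numpy as np

def _pd_at_points(predict, X, cols, points):
    """Partial dependence of `predict` on the features in `cols`, evaluated
    at the observed `points` and averaged over the remaining columns."""
    out = np.empty(len(points))
    for i, p in enumerate(points):
        Xv = X.copy()
        Xv[:, cols] = p
        out[i] = predict(Xv).mean()
    return out - out.mean()  # centered, as in Friedman & Popescu

def h_statistic(predict, X, j, k):
    """Squared H-statistic for the pairwise interaction of features j and k:
    share of the joint PD variance not explained by the two main effects."""
    pd_jk = _pd_at_points(predict, X, [j, k], X[:, [j, k]])
    pd_j = _pd_at_points(predict, X, [j], X[:, [j]])
    pd_k = _pd_at_points(predict, X, [k], X[:, [k]])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)
```

For a purely additive model the statistic is zero, while a strong multiplicative interaction pushes it toward one.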
In
Figure 13, we plot the values of the H-statistic for the NM-NN and each possible pairwise interaction between covariates. We do not report the plot for the NMR since the model is not designed to capture pairwise interactions between variables. Each pane in
Figure 13 reports the interactions detected for the different claim types: for
Visits the age
AG has a weak interaction with both permanence
PE and gender
GE;
Dentalcare claim frequency shows two strong interactions for the permanence
PE variable with dimension
DM and gender
GE;
Diagnostic claims have two relevant pairwise interactions for the gender
GE variable with permanence
PE and dimension
DM; moreover, we also notice a small interaction between age
AG and region
RE.
In
Figure 14, we report the H-statistic for each possible pairwise interaction between variables entering the Gamma Neural Networks.
Namely, interactions captured by the Gamma-NN for medical visits are presented in the left pane of
Figure 14. The most relevant interaction is the one between permanence
PE and age
AG. This interaction is quite peculiar, since its H-statistic exceeds 1; this happens when the variance of the joint interaction between the variables is greater than that of their two-dimensional PD plot. Other relevant interactions are observed between gender and family member type, firm dimension and gender, and permanence and dimension.
From the central pane of
Figure 14, we notice that all the relevant interactions for the dental treatment Gamma-NN model lean on the company dimension variable (
DM). In particular, this variable interacts with permanence (
PE), gender (
GE), and family member type (
FA). It is curious to observe such relevance for the dimensional variable since this covariate has a low importance (
Figure 8b) and an almost flat PD plot (
Figure 12). The grouped PD plots discussed in the second part of this subsection will help us understand whether such interactions are relevant.
The right plot of
Figure 14 reports variable interactions for the
Diagnostic Gamma-NN model. The plot reports numerous relevant interactions; among the strongest are those between permanence and gender, dimension and gender, permanence and age, and gender and family member type.
To gain insight into the behavior of such interaction effects, in the following lines we use grouped PD plots to visualize the effect of these interactions on claim frequencies. A grouped PD plot represents the main effect of a variable conditioned on the different values of another variable. Therefore, the plot reports v PD curves, where v is the number of possible values of the conditioning covariate. The interaction is considered meaningful if the curves exhibit distinct patterns across the values of the conditioning variable; in particular, we expect the different PD curves to be non-parallel.
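The grouped PD plot described above can be sketched by computing an ordinary PD profile separately within each level of the conditioning covariate (function and variable names are illustrative):

```python
import numpy as np

def grouped_partial_dependence(predict, X, j, grid, g):
    """PD profile of feature j computed separately within each level of the
    conditioning feature g; non-parallel curves suggest an interaction."""
    curves = {}
    for level in np.unique(X[:, g]):
        Xg = X[X[:, g] == level]      # keep only observations at this level
        curve = []
        for v in grid:
            Xv = Xg.copy()
            Xv[:, j] = v
            curve.append(predict(Xv).mean())
        curves[level] = np.array(curve)
    return curves
```

Plotting one curve per level and checking whether they stay parallel reproduces the visual test used in this subsection.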
As an example, with reference to the interaction spotted by the NM-NN for Visits, in
Figure 15, we study the interaction between gender
GE and age
AG. Such a figure displays a different behavior for the age PD plot conditioned on females. In particular, we notice a higher claim frequency for female insureds between 20 and 60 years of age. This increased frequency is probably associated with gynecologic visits and pregnancy-related visits.
From the results discussed in this section, the proposed neural network models emerge as a clear winner over the benchmark regressions, performing better both in-sample and out-of-sample. These results are achieved thanks to the neural network's flexibility, which is not restricted by a multiplicative structural form. The network architecture offers sufficient complexity to reflect nonlinearities in the explanatory variables and interactions between them. It would also be possible to further improve the regressions by manually adding the pairwise interactions spotted by the NM-NN or the Gamma-NN. In this sense, neural networks can also serve as a complementary tool for the regressions: in a first step, the modeler uses a neural network to spot the weaknesses of the simpler regression model, such as missing interactions or main effects; in a second step, the modeler improves the simpler regression model by explicitly enriching its functional form.
However, this is not enough to determine whether it is worth choosing neural networks over NMR and GAMLSS. When determining the price (or the potential cost) of a risk coverage, it is also vital to consider business-related metrics. Therefore, in the next section, we combine the frequency models discussed and the claim severity model in a pricing model and compare the different tariff structures using practical economic metrics relevant for an insurance company.
5. Ratemaking
Now that we have extensively discussed claim frequency models (
Section 2.1.2) and claim severity models (
Section 2.1.3), we can combine them to complete the frequency-severity approach displayed in
Section 2 devoted to pure premium evaluation. For instance, it is possible to compute the pure premium for a set of insureds according to their characteristics by combining the NM-NN and the Gamma-NN. In our specific case, to obtain the pure premium for the health insurance plan presented in this work, we must tweak Equation (
2) to account for the different claim types. In particular, the pure premium obtained via the neural network models is defined as follows:

$$P_i^{\mathrm{NN}} = \sum_{j=1}^{3} \hat{\lambda}_{i,j}^{\mathrm{NN}}\,\hat{\mu}_{i,j}^{\mathrm{NN}}, \tag{19}$$

for $i = 1, \dots, n$, where $\hat{\lambda}_{i,j}^{\mathrm{NN}}$ is the expected claim frequency of insured $i$ for claim type $j$, defined as in Equation (15), and $\hat{\mu}_{i,j}^{\mathrm{NN}}$ is the corresponding expected claim severity, obtained as in Equation (18). While using the NMR and the GAMLSS, we have:

$$P_i^{\mathrm{REG}} = \sum_{j=1}^{3} \hat{\lambda}_{i,j}^{\mathrm{REG}}\,\hat{\mu}_{i,j}^{\mathrm{REG}}, \tag{20}$$

for $i = 1, \dots, n$.
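In code, the frequency-severity combination above reduces to an element-wise product of expected frequencies and severities, summed over the claim types (a minimal sketch with hypothetical prediction arrays):

```python
import numpy as np

def pure_premium(freq, sev):
    """Pure premium per insured from a frequency-severity model.

    freq, sev : arrays of shape (n_insureds, n_claim_types) holding the
    expected claim counts (e.g. from the NM-NN) and the expected claim
    severities (e.g. from the Gamma-NN) for each claim type."""
    freq, sev = np.asarray(freq, float), np.asarray(sev, float)
    return (freq * sev).sum(axis=1)  # sum expected costs over claim types
```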
As is common in actuarial pricing, the claim frequency and claim severity models discussed in this work are estimated by optimizing a goodness-of-fit measure. Thus, until now, when comparing neural networks to simpler regressions, we have mainly focused on the statistical performance of the models. However, even if neural networks have outperformed regressions from a statistical standpoint, it is crucial to understand whether the premiums produced by such models add value to the business in which they are to be implemented. It is therefore important to also consider an economic criterion, going beyond the classical deviance metric, when deciding whether a given model is worth implementing in insurance applications. That is where model lift metrics are useful. In summary, a model's lift reflects its capacity to mitigate adverse selection in pricing: it measures how well the model assigns actuarially fair rates to policyholders, reducing the risk of losing policyholders to competitors offering more refined pricing structures.
In this section, we employ two model lift methods proposed by
Denuit et al. (
2019) to evaluate the performance of a set of candidate premiums. The metrics proposed by the authors aim at assessing the following two aspects of a given premium: the variability of the resulting premium amounts, as larger premium differentiation induces greater lift, and the ability of the premium amount to match the actual total claim amount
S for increasing risk profiles. The first objective is tackled using Lorenz curves (LC), while the second point is assessed considering concentration curves (CC). For an extensive discussion on this kind of curve, see
Denuit et al. (
2019). Given an insurance portfolio, if we consider the subset of insureds comprising a certain percentage of the policies with the smallest premiums (i.e., those insureds that are likely to be lost to a potential competitor because of their low-risk profile), the concentration curve compares the premium amounts of this group with its aggregate losses. The relative positioning of the Lorenz and concentration curves enables the actuary to accurately evaluate the effectiveness of the premium under analysis.
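A minimal sketch of the two curves, with policies sorted by increasing premium (the ABC and ICC summaries of Denuit et al. (2019) are areas derived from these curves; we refer to the original paper for their exact definitions):

```python
import numpy as np

def lorenz_and_concentration(premium, losses):
    """Sort policies by increasing premium, then accumulate:
    the Lorenz curve tracks the cumulative share of total premium,
    the concentration curve the cumulative share of the actual
    losses S over the same ordering."""
    premium, losses = np.asarray(premium, float), np.asarray(losses, float)
    order = np.argsort(premium)
    lorenz = np.cumsum(premium[order]) / premium.sum()
    concentration = np.cumsum(losses[order]) / losses.sum()
    return lorenz, concentration
```

When the premium perfectly ranks the risks, the two curves coincide; the gap between them widens as the tariff misprices the low-premium segment.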
Thus, we employ the two lift metrics discussed by
Denuit et al. (
2019) (ABC and ICC) to compare the premiums obtained combining the NMR and the GAMLSS
(Equation (
20)) and neural networks
(Equation (
19)). We use such metrics also to compare the results obtained by our novel approach with a set of premiums
produced using neural networks without taking into account the dependence between the different health perils. More specifically, we have combined the claim frequencies obtained via three different Negative Binomial Neural Networks, one for each claim type, with the claim severities stemming from the Gamma-NN discussed in
Section 2.1.3. This comparison highlights the added value of the multinomial approach, which accounts for the dependence between different claim types, compared to a ratemaking approach that does not consider such dependencies.
In
Table 6, we report the ABC and ICC metrics for the three sets of premiums. All premiums are computed on the out-of-sample data, more specifically on the first fold of the 5-fold cross-validation. We notice that the premiums issued by the neural network models proposed in this work return both a lower ABC and a lower ICC, signaling that these models produce a better lift than the regression benchmarks. In other words, the lower ABC registered by the NN signals that the premium produced by this model is closer to the actual risk present in the insurance portfolio, while the lower ICC means that such premiums cover the expected share of true premiums in the portfolio. The same can be argued by comparing the premiums
obtained using the NM-NN and the Gamma-NN with the benchmark premium
stemming from the so-called independent approach. We observe that the model introduced in this work produces lower values for both ABC and ICC.
Thus, even from a business-metric standpoint, the neural network models have proven to add value compared to the NMR and the GAMLSS, since their greater precision translates into better premiums. In
Appendix C, we further explore the comparison between the set of premiums generated by neural networks and those obtained through more traditional approaches, providing additional insights using both graphical tools and standard evaluation metrics.
To further improve the discussed premiums, it would be necessary to complement the informative set presented in
Section 3, including, for instance, additional covariates such as policyholders’ yearly income and level of education, which are generally good drivers of health expenditure.
6. Conclusions
Neural networks are a powerful tool for performing multi-dimensional, nonlinear regressions in insurance ratemaking, enabling more precise risk assessment and pricing by effectively capturing complex relationships within large-scale insurance datasets.
In this paper, we propose an innovative application of neural network models within the context of health insurance pricing, with a specific focus on their actuarial relevance and interpretability. We first introduce a neural network with a multivariate output structure designed to model potentially correlated health claim counts, namely, medical visits, dental care treatments, and diagnostic exams. This model minimizes a Negative Multinomial deviance and is coupled with Gamma neural networks to estimate claim severities, ultimately providing an estimate of the pure premium for each insured individual.
The performance of the proposed models has been benchmarked against more traditional regression-based methods, such as Negative Multinomial Regression (NMR) and GAMLSS. Our results indicate that neural networks offer clear advantages in terms of predictive accuracy, as confirmed by lower deviance, SSE, and MAE values, as well as in terms of risk segmentation, as demonstrated by model lift metrics discussed in
Section 5. These findings suggest that neural network models not only provide a statistically superior fit, but also lead to better risk diversification and more efficient premium structures. Moreover, we enhance the understanding of the models’ internal representations through model-agnostic XAI tools, improving the transparency and trustworthiness of machine learning applications in actuarial practice, which is essential for regulatory and business adoption.