Article

Implementing Custom Loss Functions in Advanced Machine Learning Structures for Targeted Outcomes

Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(7), 348; https://doi.org/10.3390/jrfm18070348
Submission received: 23 May 2025 / Revised: 18 June 2025 / Accepted: 19 June 2025 / Published: 24 June 2025
(This article belongs to the Section Financial Technology and Innovation)

Abstract

In the era of rapid technological advancement and ever-increasing data availability, the field of risk modeling faces both unprecedented challenges and opportunities. Traditional risk modeling approaches, while robust, often struggle to capture the complexity and dynamic nature of modern risk factors. This paper provides a method for addressing two problems that arise in insurance pricing when writing a book of risks: pricing predictability and MLOT (Money Left On Table). It also gives an example of how to improve risk selection through suitable choices of machine learning algorithm and accompanying loss function. We apply this methodology to the data provided and discuss the impact on risk selection and on the predictive power of the models.

1. Introduction

Prediction accuracy and reliability have long been important topics in both industry and academia. An early modeling technique used in insurance modeling is the generalized linear model, as in Murphy et al. (2000), with modeling centered on the frequency and severity of risks as well as conversion and renewal probabilities. Early views of this topic were presented in 1994 by Renshaw (1994), and it appears as the topic of a paper published as early as 1971 in Grimes (1971). One of the shortcomings associated with generalized linear models is the assumption that the relationship between the predictors and the target variable is linear, with Cunha and Bravo (2022) documenting the non-linear relationship.
Prediction accuracy is crucial in regression because it directly measures the model’s ability to reliably estimate the dependent variable from the independent variables. High prediction accuracy ensures that the regression model captures the underlying patterns in the data effectively, leading to trustworthy insights and actionable decisions. Refenes et al. (1994) provided a convincing background on the utility of neural networks within a financial framework and compared them to existing regression models, showing the advantages of more complex machine learning models. Bhardwaj and Anand (2020) outline a comparison between multiple linear regression, decision trees, and gradient boosting for health insurance price prediction.
As existing technology and statistical techniques developed and advanced, the actuarial science and insurance industry moved towards more algorithmic machine learning structures in its risk and pricing strategies. The new norm for insurance companies is to deploy Gradient Boosting Machines (GBMs) in their modeling for pricing and risk, as in Clemente et al. (2023) and Guelman (2012). Gradient boosting applications have been a heavily researched topic in the last decade within the insurance industry: Bhardwaj and Anand (2020) apply it to health insurance, Su and Bai (2020) apply it to third-party auto insurance for frequency and severity models, and Yang et al. (2018) apply it to premium prediction.
Neural networks are not yet widely used in the market for production prediction models, as adoption is often delayed by the uncertainties around ’black box’ models and their implications. As discussed in Harris et al. (2024), there are combined versions of neural networks used in a more explainable manner; for example, the combination of a generalized linear model with a neural network, so that at least some proportion of the model architecture is explainable, as shown by Wüthrich and Merz (2019) with the Combined Actuarial Neural Network (CANN). Other attempts at interpretability include LocalGLMnet, as shown in Richman and Wüthrich (2023).
Yunos et al. (2016) presented a very early and low-resolution version, using only four, five, and six inputs and a relatively small amount of data for training and testing in their iteration of the neural network. Wilson et al. (2024) outlined a very strong case for the use of gradient boosting machines, neural networks, or hybrid models as a means of predicting loss costs.
Table 1 shows reporting data from the UK Parliament Commons Library (2023) (based on data from Ernst & Young, a leading consultancy). Table 1 shows that only three of the last six years in the UK car insurance market were profitable, and the largest profit margin on the net combined ratio was 9.7%. To this end, it is important for car insurance companies to have reliable and accurate predictions and a good selection of risks available in order to stand a chance of making money in this market.
The lack of profitability in the car insurance sector can be significantly attributed to the prevailing techniques and strategies employed within the insurance industry. Research highlights that profitability is closely tied to the management of various internal and external factors, including underwriting practices, handling of premium income, and restructuring of combined ratios which take into account claims and sinistrality ratios Abdelli (2023); Azmi et al. (2020); Solomon and Adewale (2022). The industry faces heightened pressures from emerging risks, shifts in customer behavior, and market saturation, necessitating enhanced risk assessment methodologies Drakulevski and Kaftandzieva (2021). Specifically, inadequate risk management and ineffective premium pricing techniques have been shown to exacerbate financial strain on insurers, thereby impacting overall profitability Msomi (2023). Furthermore, intensified competition prompts insurers to lower premiums, which, while attracting more clients, often leads to insufficient revenue to cover claims, illustrating a direct correlation between current pricing strategies and profitability outcomes Abdelli (2023); Adeel et al. (2022); Kulustayeva et al. (2020). This situation has highlighted the importance of refining risk management practices and investment strategies to foster sustainable profitability within the car insurance market as it grapples with evolving challenges Adeel et al. (2022); Drakulevski and Kaftandzieva (2021); Msomi (2023).
In this paper, we will contend with the question of how best to improve risk selection and reduce MLOT (Money Left On Table). MLOT is the amount by which the first-placed (cheapest) company was cheaper than the second-placed company. In the eyes of the cheapest company, it is most profitable for them to be cheapest, but by a very small amount. The second issue to contend with is the choice of risk selection; for a growing business, it is important to write as many policies as possible while still maintaining good profitability (lower loss ratio) through optimized pricing strategies. One way to approach both of these problems is through the use of suitable loss functions embedded in machine learning algorithms. We explain how this method would work in practice and the benefits of using this method as opposed to more frequently used methods in model building.
In this paper, we explore one way of attaining specific market outcomes with regard to a market price predicting model. We will explore using a neural network and a custom loss function to achieve desirable outcomes for the market predicting model. We build on what was achieved in Wilson et al. (2024) by taking into account complicated machine learning techniques such as Artificial Neural Networks (ANNs) but balancing the outcomes of reliability with business-oriented targets such as writing the correct volume as a business.
We need the following terminology:
  • Risk—A risk in insurance is the profile associated with one individual prospective policy: a person with all of their associated details and the associated car and its details.
  • Footprint—The footprint of a company is the set of risks that a given insurance provider is willing to underwrite (i.e., one may choose to write only ages 30–60). The footprint is chosen according to the specific risk appetite of the insurance provider and its subsequent re-insurers.
  • Win—In market modeling, a win is a case where the price predicted is lower than that of the competitors.
  • Base rate—The base rate in this context is the adjustable rate by which all prices can be multiplied by some factor (say 1.1 for a 10% upward adjustment). It is a good way to control competitiveness uniformly and is often used as an offsetting tool for targeted price changes.
  • Exposure—Exposure refers to the measure of risk exposure, such as the number of insured units (e.g., vehicles, properties, or policy years).
  • EVY—Earned vehicle year is the number of vehicles with policies on record multiplied by the number of years enacted on the policies (e.g., if you have two car policies and each has been active for six months, you have one earned vehicle year).
  • Earned premium—Earned premium in insurance refers to the portion of the premium that an insurer has “earned” over a given period:
    Earned Premium = Total Premium × (Elapsed Time / Total Policy Term).
  • Loss ratio—The loss ratio in insurance is a key financial metric that represents the proportion of claims paid to policyholders relative to the premiums earned by the insurer (a short worked example of both quantities follows this list):
    Loss Ratio = (Incurred Losses / Earned Premium) × 100.
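The following minimal sketch (with illustrative figures only) applies the two formulas above to a GBP 600 annual policy that has been in force for four months.

```python
# Worked example of earned premium and loss ratio (illustrative figures only).
total_premium = 600.0          # GBP, 12-month policy
elapsed_months = 4.0
policy_term_months = 12.0

earned_premium = total_premium * elapsed_months / policy_term_months   # GBP 200.0

incurred_losses = 150.0        # GBP of claims over the same period
loss_ratio = incurred_losses / earned_premium * 100                    # 75.0%

print(earned_premium, loss_ratio)
```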

2. Data Infrastructure

The data we are working with in this paper is provided by PeppercornAI Insurance, an MGA (Managing General Agent) in the personal lines motor insurance industry operating within the UK and Northern Ireland. The data were collected over the period from March 2024 to October 2024. As is the case for most insurers, the data were purchased from industry data suppliers (examples of which are Price Comparison Websites (PCWs), Experian credit scores, and other niche data stores) that provide expertise on critical factors in insurance pricing. The data provided contain 71 variables to model and 1 target variable. The target variable is the average of the top 5 prices in the UK for a given risk when quoted. The X variables contain a large mix of data for a given risk, including information on the proposed driver, the proposed car to insure, geographical information about the car, etc. These variables are widely used by the UK insurance market, guided by regulations set by the Financial Conduct Authority to ensure no unjust discrimination.
Given the data provided, we are trying to build the best model, appropriate for the footprint of the company. To do this, we will use data that is outside the footprint but that shares enough common variables for us to derive information about certain factors. With this in mind, we have the following:
  • X train is 4,002,871 rows across 71 variables.
  • X test is 181,080 rows across 71 variables.
  • Y train is 4,002,871 rows with the 1 target variable.
  • Y test is 181,080 rows with the 1 target variable.

3. Modeling the Market

Here, we discuss the challenges and opportunities within the market, present the model we build, and compare and contrast this with the current market-standard equivalent.

3.1. Why Model the Market?

In this section, we discuss the necessity to model the market and what it means to build a market model.
Traditionally, when trying to optimize a price for a given risk, a set of risk models would be built based on risk factors like the age of the driver, how long that person has held a license, and car details like the value of the car, etc. When this is the case, we use “Exposure” as the target variable. Exposure is a critical factor because it helps insurers estimate the likelihood and potential cost of claims associated with insuring a particular driver or vehicle. Exposure represents the degree to which the insured entity (e.g., the car or the driver) is susceptible to losses due to various risk factors.
For small and emerging businesses with no history of trading, their exposure starts at zero, and as they start to write business, their exposure increases. Naturally, the limitation to this is having no target variable to aim at. Most large insurers will use 1–2 years of previous history mixed with forecasts of future data to make accurate and reliable models. This presents the challenge of modeling what the market would like to price a certain risk.
With this in mind, small and emerging businesses will build a ‘proxy’ risk model that feeds in all the same information as a risk model except for the prior knowledge, and then sets the target variable to be the price that the market employed for a given risk. For this paper, we use the average of the top five prices for a given risk. There are pros and cons to using this method. The pros are that, by using an average of five companies as the target, we are protected in part from one company drastically under pricing and damaging the prediction. Secondly, if the type of risk is quite obscure and only a few (fewer than five) companies are willing to underwrite it, the target variable is not populated and thus the model fails/declines to price what is deemed to be a volatile/high-risk profile. The cons of pricing to the market are, firstly, that, as indicated in the introduction, the insurance industry lost money in three of the last six years, meaning that on average the prices were too low to be profitable; and, secondly, that if the market is wrong then our prediction will be wrong, irrespective of model quality.

3.2. The Problem of Deselection

Deselection or `adverse selection’ is a significant issue in insurance markets, well described in the modern actuarial literature in Dionne et al. (2001) and Cohen and Siegelman (2010). It occurs when there is an asymmetry of information between the insurer and the insured, leading to a situation where those most likely to make a claim are also the most likely to purchase or renew insurance, while those less likely to claim may opt out of purchasing insurance altogether. This happens primarily for four reasons:
  • Information Asymmetry: Policyholders often have more detailed knowledge about their own risk levels (health, lifestyle, driving habits, etc.) than the insurance company.
  • Risk Misalignment: People posing higher risk are more motivated to obtain insurance, knowing that they are more likely to benefit from it. Conversely, those with lower risks might choose not to buy insurance or might seek minimal coverage.
  • Premium Challenges: Insurers typically set premiums based on average risk. However, if higher-risk individuals disproportionately purchase insurance, the overall risk pool worsens. The premiums set may not cover the actual claims costs, leading to potential losses for the insurer.
  • Market Impact: To counteract this, insurers may raise premiums, but this can exacerbate the problem. Higher premiums could drive low-risk individuals out of the market, leaving an even higher concentration of high-risk policyholders, leading to further premium increases and potentially causing a “death spiral” where the insurance pool collapses.
With reference to the deselection problem, an appropriate way of lowering costs is to build a model that accepts a high proportion of the market with the lowest price and then proceeds to eliminate bad risks with judgment-based overlays post modeling. Therefore, the goal of this modeling process is to find some optimized trade off between `Proportion of the market won’ and `Average cost per win’.

3.3. The Role of the Loss Function

Typically, in the insurance setting, there are a number of loss functions used for specific purposes in modeling. There is a wealth of literature in the actuarial and data science fields concerning which loss functions are most appropriate in different situations. Denuit and Trufin (2017) discuss one example, the Tweedie loss, and its utility for modeling total losses; Tweedie loss functions are particularly useful for handling skewed data such as total losses. Wüthrich and Merz (2023) provide a deep dive into the role of loss functions in different model types. Often, when tailoring to extreme values or specific target quantiles of the data, the quantile loss is used, as in Conradt et al. (2015).
In line with the deselection issue, we need to construct an asymmetric loss function that penalizes under predictions (where the prediction falls below the actual value) at a higher rate than over predictions. This, on average, skews the predicted vs actual graph towards more over predictions and fewer under predictions, but it in turn lowers the ‘Average cost per win’. Below are some examples of custom asymmetric loss functions compared against a standard asymmetric loss function (the quantile loss), which was deemed unfit for the type of model we are aiming to produce due to slow convergence and extreme over predictions. The quantile loss function is defined in Equation (1) as follows:
L_α(y, f(x)) = (α − 1)(y − f(x)), if y < f(x), and α(y − f(x)), if y ≥ f(x),   (1)
where α ∈ (0, 1) is the given quantile. Figure 1 shows the loss functions in Equations (2)–(4) plotted against the loss function in Equation (1); in order, the panels of Figure 1 correspond to Equations (1) and (2), Equations (1) and (3), and Equations (1) and (4).
L_1(y, f(x)) = |y − f(x)|, if y < f(x), and |y − f(x)|^2, if y ≥ f(x),   (2)
L_2(y, f(x)) = |y − f(x)|, if y < f(x), and |y − f(x)|^1.5, if y ≥ f(x),   (3)
L_3(y, f(x)) = |y − f(x)|^1.4, if y < f(x), and |y − f(x)|^2, if y ≥ f(x).   (4)
Equation (2) shows a loss function that is linear on over predictions and quadratic on under predictions. Equation (3) is linear on over predictions and raised to the power 1.5 on under predictions. Equation (4) is raised to the power 1.4 on over predictions and to the power 2 on under predictions. All of these loss functions were tested on a smaller 10% subset of the data through a period of trial and error on model outcomes to rule candidates out and narrow down to a more suitable loss function. Equations (1) and (2) exhibited the same problem as the one we are aiming to solve: they allowed too small a proportion of cases where the prediction was lower than the actual value. This meant we could rule out penalizing over predictions linearly in favor of a power slightly greater than one. Equation (3) also failed to win an adequate proportion of cases because the gap between the two powers is too large, with one side penalized much more heavily.
In the end, the loss function settled on was one that is slightly super-linear (power 1.15) on over predictions, to control a large level of over prediction, and more strongly super-linear (power 1.55) on under predictions, to keep the bound on under prediction as small as possible. This is shown in Equation (5), and Figure 2 illustrates what the loss function looks like.
L(y, f(x)) = |y − f(x)|^1.15, if y < f(x), and |y − f(x)|^1.55, if y ≥ f(x).   (5)
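For concreteness, the sketch below implements the quantile loss of Equation (1) and the custom loss of Equation (5) as differentiable functions. PyTorch is used purely for illustration; the paper does not state which framework was used, so this should be read as a sketch under that assumption rather than as the authors’ implementation.

```python
import torch

def quantile_loss(y_true, y_pred, alpha=0.5):
    """Pinball/quantile loss of Equation (1); alpha lies in (0, 1)."""
    err = y_true - y_pred
    return torch.mean(torch.where(err < 0, (alpha - 1) * err, alpha * err))

def asymmetric_loss(y_true, y_pred, p_over=1.15, p_under=1.55):
    """Custom loss of Equation (5): over predictions (y_true < y_pred) are
    penalised as |error|**1.15, under predictions as |error|**1.55."""
    err = torch.abs(y_true - y_pred)
    over_predicted = y_true < y_pred
    return torch.mean(torch.where(over_predicted, err ** p_over, err ** p_under))
```

Either function can be dropped directly into a standard gradient-based training loop in place of a built-in regression loss.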

3.4. The Model

Here, we introduce the model type and the technical details about the model. For this application, a neural network is employed due to its ability to model complex, non-linear relationships between input features and the target variable. Traditional methods, such as linear regression, fail to capture these intricate patterns, making neural networks a more suitable choice for the problem.
The neural network architecture used in this model comprises 7 fully connected layers: an input layer with 71 neurons, a series of hidden layers with sequentially decreasing numbers of neurons, and an output layer with 1 neuron corresponding to the regression output. ReLU (Rectified Linear Unit) activation functions ( f(x) = max(0, x) ) are employed in the five hidden layers to introduce non-linearity, while the ELU (Exponential Linear Unit) function is used in the output layer. The network is trained using the custom loss function defined earlier, optimized with the Adam optimizer, with a learning rate of 0.0002 and a batch size of 4096 over 500 epochs. Table 2 shows a record of the tested hyperparameters.
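A minimal sketch of such an architecture is given below. The 71 inputs, five ReLU hidden layers of decreasing width, ELU output, Adam optimizer, learning rate of 0.0002, and batch size of 4096 come from the description above; the specific hidden widths (256 down to 16) are an illustrative assumption, and PyTorch is assumed rather than confirmed as the framework.

```python
import torch
import torch.nn as nn

class MarketPriceNet(nn.Module):
    """71 input features -> five shrinking ReLU hidden layers -> 1 ELU output."""

    def __init__(self, n_features=71, hidden=(256, 128, 64, 32, 16)):
        super().__init__()
        layers, width_in = [], n_features
        for width_out in hidden:                      # five ReLU hidden layers
            layers += [nn.Linear(width_in, width_out), nn.ReLU()]
            width_in = width_out
        layers += [nn.Linear(width_in, 1), nn.ELU()]  # ELU output, as described in the text
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = MarketPriceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# Training would minimise the custom loss of Equation (5) over mini-batches of
# 4096 rows for 500 epochs, as described above.
```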

3.5. Model Outputs

Here, we discuss the outputs from the model on the test dataset and whether they match the goals of the company. Table 3 gives the mean, minimum, and maximum of the differences between the predicted and actual values. At this point, it is worth categorizing errors below zero as risks for which we are very competitive: our price prediction is lower than that of the market, and it is assumed that in most scenarios such a piece of business would be written. Opposing this are errors above zero, which indicate that the predicted price is above the market and is therefore uncompetitive and very unlikely to become written business. Traditionally, in machine learning settings, the desired outcome from a modeling perspective is a low mean error with a good degree of symmetry. The important distinction between the usual case and ours is the asymmetry in outcomes for pricing. If our price is GBP 500 more expensive than the second-placed price, the policy would not be bought. If our price is GBP 500 cheaper than the second-placed price, it would almost certainly be bought, and at a substantial MLOT for the insurer. This asymmetry means that the error distribution must itself be asymmetric above and below zero.
The magnitudes of the errors were then examined by premium band, see Table 4, which shows, for each band, the proportion of cases whose prediction error falls within 1%, 2%, 5%, 10%, 20%, and 50% of the actual value. These premium ranges were defined quite loosely to reflect specific risk patterns and the potential losses incurred for a given group of risks (e.g., a risk that should be priced at 400 is materially different from one that should be 1700; therefore, we may want more of one than the other given a specific footprint).
Here, we see that generally, there is a decent level of accuracy for all levels of the premium. The band in which the accuracy is best is the 300–600 range; this is most likely due to the higher density of training data available for that range. But, it should be noted that the training is performed with errors in mind on the higher end due to the nature of the loss function, so having a slightly lower proportion in the 10% range is not completely unreasonable.
Figure 3 shows the predicted vs. actual values; as we see, there is a bias towards over prediction (more above the line y = x than below), which protects us from drastic under pricing.
The specific subset of errors we are interested in here is the set below zero, see Table 5. Therefore, we take the subset of all cases below zero and report the minimum, maximum, and mean of this subset; this is the MLOT problem described above. This subset includes all the cases we would expect to write as insurance risks. Typically, in the modeling space, such metrics are computed on the whole dataset rather than on specific subsets.
Below are the calculations and proportions of all the errors above zero (risks we will not write) and all the errors below zero (risks we would expect to write), see Table 6. Here, we see a clear intended bias towards overestimating rather than underestimating, but with a substantial proportion (approximately 35%) still available to write. These results highlight the asymmetry we described earlier when constructing the loss function: we over price 65% of cases and under price 35% of cases, in line with our aims. The model that this research intends to replace, which was built using the quantile loss function applied to a GBM, had a split of 98.5% over predictions to 1.5% under predictions. The clear problem with the old modeling method is that 1.5% is a very limited pool from which to select risks (essentially, a growing company must write everything in the available pool).
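The quantities reported in Tables 5 and 6 can be reproduced from the vector of prediction errors in a few lines; the sketch below uses synthetic numbers purely to illustrate the bookkeeping, not the actual data.

```python
import numpy as np

rng = np.random.default_rng(1)
predicted = rng.gamma(2.0, 300.0, size=10_000)             # stand-in model prices (GBP)
actual = predicted - rng.normal(60.0, 120.0, size=10_000)  # stand-in market prices (GBP)

errors = predicted - actual      # < 0: we are cheaper than the market (a potential win)
under = errors[errors < 0]

summary = {
    "largest_under_prediction": float(-under.min()),   # worst MLOT-style case (cf. Table 5)
    "smallest_under_prediction": float(-under.max()),
    "average_under_prediction": float(-under.mean()),
    "upper_proportion": float((errors >= 0).mean()),    # cf. Table 6
    "lower_proportion": float((errors < 0).mean()),
}
print(summary)
```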

3.6. Understanding the Black Box Model

Since a neural network is a ‘black box model’, it does not come with the added benefit of clearly understandable mechanisms, as a generalized linear model does, so we need other tools to understand things like factor-level importance. Some of the methods for understanding and visualizing the errors of black box regression models are discussed in Štrumbelj and Kononenko (2011). For this, we use partial dependence plots (as recommended in Harris et al. (2024)) to check whether certain factors behave as we expect. In all partial dependence plots shown, the gray bars show the exposure (number of cases) at a particular level of a factor, and the blue line shows the price relativity as a multiplicative factor. For example, we can see in Figure 4 that the price relativities of people intending to pay monthly (represented by 1) and those intending to pay annually (represented by 0) differ by about 5%.
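The paper does not state which implementation of partial dependence was used; the sketch below shows the standard construction, expressed as a multiplicative relativity against the overall mean prediction so that the output is on the same scale as the plots. The function name and the normalisation are our own choices.

```python
import numpy as np

def partial_dependence_relativity(predict_fn, X, feature_idx, grid):
    """Mean prediction with one feature forced to each grid value, divided by
    the overall mean prediction (i.e., a multiplicative price relativity)."""
    base = predict_fn(X).mean()
    relativities = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value      # force every row to this factor level
        relativities.append(predict_fn(X_mod).mean() / base)
    return np.asarray(relativities)
```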
It is important that we review some of the factors in the model to verify that the expected behavior occurs. One example of this is IQLT, or Initial Quote Lead Time; this is the time between the quote and the required start date of the insurance product. Naturally, if a client prepares well in advance, the price is lower than for people who are unprepared and need the insurance product to start immediately, and some gradient will exist between the two. We can see that we have a good level of exposure at each level of the factor, from 0 days (the insurance product is required the same day) to 29 days before the insurance product is required, see Figure 5. Here, we see that arranging insurance 20+ days before it is required results in a roughly 19% discount. This behavior is normal for the industry; thus, we gain confidence in the ability of the model to not produce bizarre results for this factor.
We now detail some important, higher-impact cases and whether there are any outlying cases that might not make sense. Figure 6 is a clear example of a factor of high logical and numerical importance. Minimum license length is logically a very important factor: the longer someone has held their license, the more driving experience they have and therefore the less likely they are to incur a loss. Ultimately, drivers with zero experience are penalized at nearly double the price of those with 20+ years of experience. The high differential in the level pricing means this factor will be an important one in the model, given that there is a good amount of exposure at each level of the factor.
We can also use the partial dependence plots to remove relatively ineffective factors and iteratively improve model factor selection for future builds. Figure 7 is an example of a factor that is quite ineffective: despite a good amount of exposure at each level of the factor, the highest price differential is only 0.3%. Therefore, for future iterations of the model, we may seek to refine this factor or remove it entirely.
Figure 8 shows a selection of other interesting factors: one with a non-linear trend, one with a (loosely) linear trend, and the last with a convex-shaped trend. It is clear in the first case that a larger mileage represents a larger risk exposure in general (i.e., there exist more opportunities for a crash to happen). The interesting feature here is the small upward tail in the far left-hand area of the graph. This may suggest that an ultra-low mileage is representative of driving inactivity/lack of experience, hence the slight upward inflection.
The second is the number of claims that the profile has lodged in recent history. It is important to note that each additional claim represents a 10–12% price increase.
There is a slight convex shape to the final trend; this can be explained quite simply by noting that cars with more powerful engines are more expensive and thus the insurance cost should be higher. The lower end shows a slight upward trend for low horsepower, which is explained by lower-powered cars usually being older/obsolete, making replacement parts harder to find. Examples of cars that would fall into this category are British Leyland classics like the old Mini or Triumph.
Figure 9 shows three plots reflecting the vehicle dimensions (length, height, and width). We show these partial dependence plots to illustrate the importance of knowledge of a factor. The zeros in all three graphs reflect cases where the data provided was unknown. In the length case, the absence of information can cause the price to more than double; when the height is unknown, the price can go up by almost a factor of four; and an unknown width can make the pricing five times more expensive. For length, height, and width, where the bulk of the data exist, we see relatively explainable patterns of behavior. For example, shorter cars are often sporty, low-to-the-ground two-seaters, which would imply a higher price to fix/replace than a standard family saloon. All of this comes together to suggest that the three noted factors are important for the model.

4. Results and Discussion

In this section, we compare the results of the current (champion) model with those of the proposed (challenger) model. The champion model is a gradient boosting algorithm using the HistGradientBoostingRegressor class from the scikit-learn library. For consistency, any reference to “V5” means the gradient boosting algorithm and any reference to “V6” means the ANN.
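A minimal sketch of a champion-style baseline is given below. HistGradientBoostingRegressor does support a quantile loss, but the quantile level, number of iterations, and other hyperparameters used in the production V5 model are not reported in the paper, so the values and the synthetic data shown are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5_000, 71))        # stand-in for the 71 pricing variables
y_train = rng.gamma(2.0, 300.0, size=5_000)   # stand-in for market prices (GBP)

# V5-style champion: histogram gradient boosting trained with a quantile loss.
gbm = HistGradientBoostingRegressor(loss="quantile", quantile=0.9, max_iter=300)
gbm.fit(X_train, y_train)
```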
Table 7 considers only the predictions on the test dataset where the prediction was lower than the actual value. In practice, this means the risk is one we are likely to write, as ours would be the cheapest price available to the customer for that particular quote. To this end, we are interested in all cases where we are cheaper (underpredicted) than the next best price, as this is the book of risks we would be likely to write. Predicting a price of GBP 2000 when the best price is GBP 1000 means we would not write the risk, and we therefore do not care how far above we are. Table 7 shows that on the given dataset, the worst underprediction for the V5 model was GBP 1151, whereas for V6 it was GBP 1902. The average error for V5 was GBP 60.5, while for V6 it was GBP 70.25. These slightly worse metrics can be traded off against the significantly higher proportion of winning cases available to underwrite. Table 8 shows that the former has only a limited selection of 2.44% of winning cases, meaning the post-model risk selection criteria are very limited (e.g., if we know that ‘Jaguar’ has a particularly high loss ratio, we may choose to apply further price increases to that specific subset, reducing the 2.44% further). It is clear that with 34.7% of the market available to write, it is possible to make the necessary post-model adjustments and still have enough choice of risk left over to build a healthy book of risks.
It should further be noted that the target variable is the average of the top five prices, meaning that the actual lowest price to beat would be somewhat lower than that, and where there is significant disagreement over the risk profile (high deviation among the top five prices), the risk is less likely to be written. This means that 2.44% is an absolute upper limit on the number of risks written, and in actuality it would probably be much lower after taking into account the post-model overlays and the ‘wins’ that do not drop below the actual lowest price. Table 9 gives an example of such a case: the average of these values is GBP 540, which would be the target variable to ‘beat’. Any prediction from GBP 501 to GBP 539 would be considered a win in the eyes of the model but would not be the cheapest price on the market, meaning it is unlikely to translate into a sale.
In Figure 10, we see two shaded regions for the proportion of the distribution of cases won in each model. On the left is the champion model; on the right is the challenger model. Clearly, the larger proportion of wins sits with the new model. The benefit of this is that it grants a wealth of selection of risk to the user. There is a greater level of control of the selection of risks being written in the post model analysis than the champion model.
Figure 11 shows the predicted vs actual values for both the V5 and V6 models. The live deployed model (V5) throws away a large portion of quotes and wildly overpredicts in many cases. This is used as a defense mechanism to protect against wild under predictions, but it offers very little control over volume and risk selection. On the other hand, V6 offers a vast improvement in risk selection and the availability of risks to write, as well as better control over volume.

5. Conclusions

The purpose of this paper was to construct an excellent technical model in the field of insurance pricing while attempting to optimize over external factors like profitability, risk exposure, and volume of sales. This was based on the interaction between an ANN and a custom loss function to solve the deselection issue. The comparison model was improved upon by allowing for access to a larger group of potential written risks, allowing for more freedom to choose the risks in the post model overlays that dictate the final prices. The models were trained on data that includes both risks inside and outside the footprint of the underwriting rules applied at PeppercornAI, then tested on a secondary dataset containing only risks inside the footprint of the company to evaluate the actual performance that would be observed by both models should they both be promoted into a live environment.
The results showed that when taking into account factors outside of the exact prediction accuracy, the ANN significantly outperformed the gradient boosting machine in the field of risk selection and risk availability while not being overly compromised on prediction accuracy. These results have implications for the insurance sector, indicating a need for more significant model development and research in the area of proxy risk models (models without knowledge of exposure).
The study of insurance pricing through the use of ANNs presents substantial research implications, particularly in enhancing predictive accuracy and profit generation within the insurance sector. By successfully integrating a customized loss function to address deselection issues, this paper offers a strong case for the applicability of ANNs in insurance underwriting, suggesting that their optimization extends beyond mere prediction accuracy to include factors like risk exposure and profitability Bishop (2024). Furthermore, analyses indicate that ANNs can outperform traditional approaches, such as gradient boosting, in terms of risk selection, exemplifying a transformative potential for the insurance industry. This underscores the urgent need for comprehensive model development, particularly regarding proxy risk models, which operate without full exposure knowledge Bishop (2024); Hamida et al. (2024).
However, certain limitations hinder the generalization of the findings of this research. A primary limitation is the model’s dependency on historical data, which may not accurately reflect future risks due to market volatility and evolving risk profiles within the insurance landscape Kang and Xin (2024). Moreover, the performance evaluation primarily accounted for risks within specific underwriting guidelines, which may restrict sample diversity and potentially overlook various market segments Sarker (2021). These constraints could bias the results, ultimately calling for further studies that encompass a broader array of risk factors and datasets, especially those outside conventional underwriting rules. Expanding the scope of datasets could allow future research to examine how these models perform across different insurance lines and regulatory environments Sarker (2021).
Future research paths should emphasize the integration of user-friendly tools that promote model transparency and interpretability concerning the implications of utilizing complex machine learning methods in pricing strategies Kang and Xin (2024). There is also a pressing need for more diverse datasets that enable the assessment of risks not typically covered under existing underwriting models. Researchers should explore hybrid models that combine ANNs with other advanced analytical methods, which may enhance robustness and adaptability to changing market conditions Bishop (2024). Given the growing complexity of insurance products and regulatory challenges, the development of adaptable frameworks to accommodate diverse risk factors will be essential Kang and Xin (2024). The call for continuous innovation in this area further highlights the importance of interdisciplinary collaboration to address the challenges of trust and accountability in AI implementations within the insurance sector Bishop (2024); Hamida et al. (2024).

Author Contributions

Conceptualization, T.H. and S.N.; methodology: T.H. and S.N.; investigation: T.H. and S.N.; writing: T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

Both authors would like to thank the three referees and the editor for their careful reading and comments, which greatly improved this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abdelli, A. A. (2023). The guarantee pricing and the analysis of the automotive branch insurance profitability in Tunisia. American Journal of Economics and Business Innovation, 2(3), 145–157. [Google Scholar] [CrossRef]
  2. Adeel, R., Habib, S., & Mehboob, S. (2022). Impact of premium on profit of insurance companies in Pakistan. Jinnah Business Review, 10(2), 35–42. [Google Scholar] [CrossRef]
  3. Azmi, F., Irawan, T., & Sasongko, H. (2020). Determinants of profitability of general insurance companies in Indonesia. Jurnal Ilmiah Manajemen Fakultas Ekonomi, 6(2), 135–144. [Google Scholar] [CrossRef]
  4. Bhardwaj, N., & Anand, R. (2020). Health insurance amount prediction. International Journal of Engineering Research & Technology (IJERT), 9, 1008–1011. [Google Scholar]
  5. Bishop, N. (2024). Application of machine learning techniques in insurance underwriting. Journal of Actuarial Research, 2(1), 1–13. [Google Scholar] [CrossRef]
  6. Clemente, C., Guerreiro, G. R., & Bravo, J. M. (2023). Modelling motor insurance claim frequency and severity using gradient boosting. Risks, 11, 163. [Google Scholar] [CrossRef]
  7. Cohen, A., & Siegelman, P. (2010). Testing for adverse selection in insurance markets. Journal of Risk and Insurance, 77, 39–84. [Google Scholar] [CrossRef]
  8. Conradt, S., Finger, R., & Bokusheva, R. (2015). Tailored to the extremes: Quantile regression for index-based insurance contract design. Agricultural Economics, 46, 537–547. [Google Scholar] [CrossRef]
  9. Cunha, L., & Bravo, J. M. (2022, June 22–25). Automobile usage-based-insurance: Improving risk management using telematics data. 2022 17th Iberian Conference on Information Systems and Technologies (pp. 1–6), Madrid, Spain. [Google Scholar]
  10. Denuit, M., & Trufin, J. (2017). Beyond the Tweedie reserving model: The collective approach to loss development. North American Actuarial Journal, 21, 611–619. [Google Scholar] [CrossRef]
  11. Dionne, G., Gouriéroux, C., & Vanasse, C. (2001). Testing for evidence of adverse selection in the automobile insurance market: A comment. Journal of Political Economy, 109, 444–453. [Google Scholar] [CrossRef]
  12. Drakulevski, L., & Kaftandzieva, T. (2021). Risk assessment providing solid grounds for strategic management in the insurance industry. European Scientific Journal, 17(15). [Google Scholar] [CrossRef]
  13. Grimes, T. (1971). Claim frequency analysis in motor insurance. Journal of the Staple Inn Actuarial Society, 19, 147–154. [Google Scholar] [CrossRef]
  14. Guelman, L. (2012). Gradient boosting trees for auto insurance loss cost modeling and prediction. Expert Systems with Applications, 39, 3659–3667. [Google Scholar] [CrossRef]
  15. Hamida, A. B., Kacem, M., de Peretti, C., & Belkacem, L. (2024). Machine learning based methods for ratemaking health care insurance. International Journal of Market Research, 66(6), 810–831. [Google Scholar] [CrossRef]
  16. Harris, R., Richman, R., & Wüthrich, M. V. (2024). Reflections on deep learning and the actuarial profession(al). Preprint. [Google Scholar] [CrossRef]
  17. Kang, H., & Xin, R. (2024). Health insurance factor analysis. Advances in Economics Management and Political Sciences, 106(1), 200–211. [Google Scholar] [CrossRef]
  18. Kulustayeva, A., Jondelbayeva, A., Nurmagambetova, A., Dossayeva, A. Z., & Bikteubayeva, A. S. (2020). Financial data reporting analysis of the factors influencing on profitability for insurance companies. Journal of Entrepreneurship and Sustainability Issues, 7(3), 2394. [Google Scholar] [CrossRef]
  19. Msomi, T. S. (2023). Do underwriting profit factors affect general insurance firms’ profitability in South Africa? Insurance Markets and Companies, 15(1), 1–11. [Google Scholar] [CrossRef]
  20. Murphy, K. P., Brockman, M. J., & Lee, P. K. W. (2000). Using generalized linear models to build dynamic pricing systems. Casualty Actuarial Society Forum, 2000, 107–139. [Google Scholar]
  21. Refenes, A. N., Zapranis, A., & Francis, G. (1994). Stock performance modeling using neural networks: A comparative study with regression models. Neural Networks, 7, 375–388. [Google Scholar] [CrossRef]
  22. Renshaw, A. E. (1994). Modelling the claims process in the presence of covariates. ASTIN Bulletin: The Journal of the IAA, 24, 265–285. [Google Scholar] [CrossRef]
  23. Richman, R., & Wüthrich, M. V. (2023). LocalGLMnet: Interpretable deep learning for tabular data. Scandinavian Actuarial Journal, 2023, 71–95. [Google Scholar] [CrossRef]
  24. Sarker, I. H. (2021). Machine learning: Algorithms, real-world applications and research directions. SN Computer Science, 2(3), 160. [Google Scholar] [CrossRef] [PubMed]
  25. Solomon, F. O., & Adewale, A. J. (2022). Predictors of profitability in the Nigerian insurance industry. Journal of Economics Finance and Management Studies, 5(11), 3367–3377. [Google Scholar] [CrossRef]
  26. Su, X., & Bai, M. (2020). Stochastic gradient boosting frequency-severity model of insurance claims. PLoS ONE, 15, e0238000. [Google Scholar] [CrossRef]
  27. Štrumbelj, E., & Kononenko, I. (2011, April 14–16). A general method for visualizing and explaining black-box regression models. 10th International Conference on Adaptive and Natural Computing Algorithms (pp. 21–30), Ljubljana, Slovenia. [Google Scholar]
  28. Wilson, A. A., Nehme, A., Dhyani, A., & Mahbub, K. (2024). A comparison of generalised linear modelling with machine learning approaches for predicting loss cost in motor insurance. Risks, 12, 62. [Google Scholar] [CrossRef]
  29. Wüthrich, M. V., & Merz, M. (2019). Yes, we CANN! ASTIN Bulletin: The Journal of the IAA, 49, 1–3. [Google Scholar] [CrossRef]
  30. Wüthrich, M. V., & Merz, M. (2023). Statistical foundations of actuarial learning and its applications. Springer Nature. [Google Scholar]
  31. UK Parliament Commons Library. (2023). The rising cost of UK car insurance. Available online: https://commonslibrary.parliament.uk/the-rising-cost-of-uk-car-insurance/ (accessed on 25 November 2024).
  32. Yang, Y., Qian, W., & Zou, H. (2018). Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models. Journal of Business and Economic Statistics, 36, 456–470. [Google Scholar] [CrossRef]
  33. Yunos, Z. M., Ali, A., Shamsuddin, S. M., Noriszura, I., & Sallehuddin, R. (2016). Predictive modelling for motor insurance claims using artificial neural networks. International Journal of Advances in Soft Computing and its Applications, 8, 160–172. [Google Scholar]
Figure 1. Examples of custom losses.
Figure 2. Final iteration of the loss function.
Figure 3. Predicted vs actual.
Figure 4. Partial dependence plot of payment intention.
Figure 5. Partial dependence plot of IQLT.
Figure 6. Partial dependence plot of license length.
Figure 7. Partial dependence plot of linked address numbers.
Figure 8. Partial dependence plots of some other notable factors.
Figure 9. Partial dependence plots of vehicle dimensions (length, height, and width).
Figure 10. Illustration of potential wins as a comparison.
Figure 11. Predicted vs actual values for the V5 and V6 models.
Table 1. History of net combined ratio, as reported in UK Parliament Commons Library (2023).

Year                 2018     2019      2020     2021     2022      2023      2024 (Forecast)
Net combined ratio   94.7%    100.8%    90.3%    96.6%    111.1%    112.8%    96.0%
Profit (loss)        5.3%     (0.8)%    9.7%     3.4%     (11.1)%   (12.8)%   4.0%
Table 2. Record of tested hyperparameters.

Parameter       Tested Values
Learning rate   0.001, 0.0001, 0.00001
Epoch number    100, 250, 500
Batch size      512, 1024, 2048, 4096
Table 3. Error values.

Data         Value
Mean error   57.759125
Min error    −1902.7294
Max error    3849.5874
Table 4. Percentile errors.

Premium band (GBP)   1%         2%         5%         10%        20%        50%
All                  0.041960   0.085124   0.212523   0.414109   0.731609   0.993793
300–600              0.045139   0.090509   0.226319   0.442269   0.764597   0.994981
600–1000             0.041625   0.083398   0.208231   0.404137   0.717365   0.992235
1000–2000            0.038628   0.076735   0.199166   0.387154   0.698673   0.989459
>2000                0.049209   0.086115   0.195079   0.405975   0.710017   0.991212
Table 5. Subset of under prediction errors (absolute values, GBP).

Data                        Value
Largest under prediction    1902.7294
Smallest under prediction   0.0010
Average under prediction    70.2549
Table 6. Proportion of over predictions and under predictions.

Data            Value
V6_Upper_Prop   0.6524
V6_Lower_Prop   0.3475
Table 7. Comparison of under prediction errors (absolute values, GBP).

Data                        V5          V6
Largest under prediction    1151.0295   1902.7294
Smallest under prediction   0.0233      0.0010
Average under prediction    60.5283     70.2549
Table 8. Comparison of over/under prediction proportions.

Data               V5       V6
Upper proportion   0.9755   0.6524
Lower proportion   0.0244   0.3475
Table 9. Example of a false win.

Position   Price
1          GBP 500
2          GBP 520
3          GBP 540
4          GBP 560
5          GBP 580