A Combined Neural Network Approach for the Prediction of Admission Rates Related to Respiratory Diseases
Abstract
1. Introduction
- First, we investigated the efficiency of the Poisson and negative binomial CANN models for predicting admission rates related to respiratory diseases in a United States (US) working population. In particular, we began by considering Poisson NN models, including a CANN model, and developed modifications based on early stopping and dropout techniques, which improve their performance. Subsequently, motivated by the suitability of the negative binomial distribution when data are over-dispersed, we also developed negative binomial NNs and compared their predictive performance to that of the Poisson models. NN-based models were trained by minimising the corresponding deviance loss functions and compared using the testing data loss. Models under the negative binomial distributional assumption led to superior forecasting performance. The same result was obtained when we eventually compared the Poisson CANN to the negative binomial CANN. Furthermore, it is worth noting that while machine learning approaches and CANN models, under both Poisson and negative binomial distributional assumptions, have been explored for data-driven applications in the field of non-life insurance, to the best of our knowledge, this is the first paper to employ such methods for morbidity modelling in an insurance context.
- Second, we considered the bias-regularised version of the negative binomial NN and CANN models by modifying the intercept of their output layers, following the approach of Wüthrich (2020). We also addressed bias issues at the population level by extracting the last hidden layer of the NN and CANN models, fitting the corresponding negative binomial regression models and, therefore, controlling the portfolio bias by adjusting the intercept (a minimal sketch of this adjustment is given after this list).
- Third, following the setup of Richman and Wüthrich (2020), we determined a nagging predictor for the negative binomial NN and CANN models to take advantage of the randomness of neural network calibrations and provide more stable predictions than those obtained from a single neural network run.
- Finally, to provide reliable comparisons between the performance of the regression, NN, and CANN models under the negative binomial assumption, k-fold validation was carried out, allowing us to evaluate the models' predictive ability when different data configurations are used for training and prediction.
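To make the intercept adjustment mentioned above concrete, the following is a minimal R sketch (our own illustration, not the code used in the paper) for a log-link model, where `y` and `mu_hat` are hypothetical vectors of observed counts and fitted means:

```r
# Shifting the output intercept by log(sum(y) / sum(mu_hat)) rescales all
# fitted means by a common factor, so that the fitted portfolio total matches
# the observed total (the balance property; cf. Wüthrich 2020, 2021).
adjust_intercept <- function(y, mu_hat) {
  log(sum(y) / sum(mu_hat))
}

# Hypothetical usage:
# shift  <- adjust_intercept(y, mu_hat)
# mu_adj <- mu_hat * exp(shift)   # now sum(mu_adj) == sum(y)
```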
2. Data
2.1. Data Description
2.2. Data Pre-Processing
3. Models
3.1. Regression Models
3.2. Neural Network Model
3.3. CANN
4. Model Fitting
4.1. Hyperparameters
- Number of hidden layers: the number of hidden layers was kept at three.
- Activation function: for hidden layers, the hyperbolic tangent function, $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, was used; any alternative non-linear activation function would also work. The motivation for using a non-linear activation function is that it allows for a non-linear model space, reducing the number of nodes needed and allowing the network to automatically capture interaction effects between different features. For the output layer, an exponential function was used, which is the inverse of the log-link function, $g(\cdot) = \log(\cdot)$, and is therefore in line with the underlying distributional assumption.
- Gradient descent method: the neural network training utilises a gradient descent optimisation algorithm for estimating the model weights. Ferrario et al. (2020) compared different GDMs in terms of performance and identified the Nesterov-accelerated adaptive moment estimation (Nadam) method as performing better than other similar methods. Hence, we also adopted Nadam as the choice of GDM. An overview of the different GDMs is given in Ruder (2016), and additional details regarding the Nadam method can be found in Dozat (2016).
- Validation set: the training of neural network models requires further splitting the learning set into a training set and a validation set. The validation set is used as the evaluation set during the iterative estimation of the model weights; in other words, the validation loss tracks possible overfitting of the model to the training set. For the network-based models discussed here, an 80:20 split was used for the training and validation data sets. Once the training is complete, the final performance of the fitted model is assessed using the testing set.
- Loss function: the loss function is the objective function that the GDM algorithm minimises in order to estimate the model weights (Goodfellow et al. 2016). Numerous options exist for the choice of loss function; for instance, mean squared error (MSE), mean absolute error (MAE), and deviance loss are popular choices in regression problems. For our context, we adopted the deviance loss. The motivation behind this choice is that minimising the deviance loss is equivalent to maximising the corresponding log-likelihood function, which gives the MLE. The deviance loss is defined as the difference between the log-likelihood of the saturated (full) model and that of the fitted model; for a data set with observed counts $y_i$ and fitted means $\hat{\mu}_i$, $i = 1, \ldots, n$, the Poisson deviance loss is given by
$$
D = \frac{2}{n} \sum_{i=1}^{n} \left[ y_i \log\!\left( \frac{y_i}{\hat{\mu}_i} \right) - \left( y_i - \hat{\mu}_i \right) \right],
$$
with the convention that $y_i \log(y_i/\hat{\mu}_i) = 0$ when $y_i = 0$. (A minimal R sketch combining the hyperparameter choices in this list is given below.)
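Putting the above choices together, the following is a minimal R keras sketch of such a network (our own illustration, not the paper's code: `x_learn`, `y_learn`, and `n_features` are hypothetical placeholders for the pre-processed learning data and its number of feature columns, and exposure offsets are omitted for brevity):

```r
library(keras)

# Three tanh hidden layers and an exponential output unit (inverse log link);
# the (20,15,10) architecture is one of those considered in Section 4.2.
model <- keras_model_sequential() %>%
  layer_dense(units = 20, activation = "tanh",
              input_shape = c(n_features)) %>%
  layer_dense(units = 15, activation = "tanh") %>%
  layer_dense(units = 10, activation = "tanh") %>%
  layer_dense(units = 1, activation = "exponential")

# Keras's built-in "poisson" loss, mean(mu_hat - y * log(mu_hat)), differs
# from the Poisson deviance loss only by terms that do not depend on the
# weights, so minimising it yields the same weight estimates.
model %>% compile(optimizer = optimizer_nadam(), loss = "poisson")

# 80:20 training/validation split of the learning set; batch size and epochs
# as selected under the second approach in Section 4.2.
model %>% fit(x_learn, y_learn,
              validation_split = 0.2,
              batch_size = 30000, epochs = 250)
```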
4.2. Batch Size and Epochs
- Step 1: Initially, for different model architectures of varying complexities, different batch sizes were considered, keeping the number of epochs fixed at 1000. All considered models involve three hidden layers, with a different number of nodes in each layer. The different model architectures were fitted using batch sizes of 10,000, 30,000, 50,000, 75,000, 100,000, 175,000, 250,000, 500,000, and 750,000. The performances of the models were compared using the testing (out-of-sample) loss, i.e., the deviance loss (Equation (17)) evaluated on the testing data set. The results are shown in Table A4 and Table A5 and illustrated in Figure 6. The tables also show the learning (in-sample) deviance loss and the portfolio average, i.e., the average fitted mean for the full data set under the considered models. All models, irrespective of their level of complexity, performed well with a batch size of 175,000. As anticipated, complex models had a higher testing loss with smaller batch sizes due to over-fitting. In general, the testing loss presented a decreasing trend for all considered models as the batch size increased from 10,000 to 175,000. For batch sizes greater than 175,000, both the testing loss and the learning loss for simpler models started to rise. This indicates under-fitting and shows that for batch sizes greater than 175,000, the complexity of simpler models with fewer neurons in the hidden layers is insufficient to fit the data effectively. Hence, this analysis suggested choosing a batch size of 175,000. Nevertheless, as all models had a comparable testing loss for a batch size of 175,000, three of the simpler models (NN (25,20,15), NN (20,15,10), and NN (15,10,5)) were considered further for identifying the optimal number of epochs.
- Step 2: In order to find the optimal number of epochs, the NN (25,20,15), NN (20,15,10), and NN (15,10,5) models were fitted using different choices of epochs (100, 250, 500, 1000, 1500, and 2000), keeping the batch size fixed at 175,000. The results of these models are given in Table 4.
- Step 1: Under the second approach, we first alter the number of epochs, keeping the batch size fixed at 30,000. The results of this step are given in Table A6. For all considered model architectures except NN (100,75,50), the testing loss was lower than that of the Poisson regression model when the number of epochs was 250. Moreover, for all considered model architectures, the testing loss was lowest when the number of epochs was 250 (see Figure 7). Hence, the number of epochs was chosen as 250.
- Step 2: For all model architectures other than NN (100,75,50), different batch sizes were considered with the number of epochs fixed at 250. The results are given in Table A7 and Table A8 and illustrated in Figure 8. With a batch size of 30,000, all models except NN (15,10,5) had testing losses lower than that of the Poisson regression model. When the batch size increased to 50,000, NN (50,35,25) also had a similar testing loss. However, a batch size of 30,000 was chosen, as all models performed well under this choice. The combination of a batch size of 30,000 and 250 epochs was deemed optimal under the second approach (a sketch of this grid search is given after this list).
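As an indication of how such a grid search can be organised, the following R sketch (our own illustration, continuing the sketch in Section 4.1; `build_model()` and `poisson_deviance()` are hypothetical helpers that return a freshly compiled network and compute the deviance loss of Equation (17), respectively) refits one architecture for each candidate batch size at 250 epochs and records the out-of-sample deviance:

```r
batch_sizes <- c(10000, 30000, 50000, 75000, 100000,
                 175000, 250000, 500000, 750000)

results <- lapply(batch_sizes, function(b) {
  model <- build_model(c(20, 15, 10))          # fresh network per fit
  model %>% fit(x_learn, y_learn, validation_split = 0.2,
                batch_size = b, epochs = 250, verbose = 0)
  mu_test <- model %>% predict(x_test)         # out-of-sample fitted means
  data.frame(batch_size = b,
             testing_loss = poisson_deviance(y_test, mu_test))
})
do.call(rbind, results)                        # one row per batch size
```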
4.2.1. Comparison of Approaches
5. Model Improvements
5.1. Approaches for Preventing Over-Fitting
5.1.1. Regularisation
5.1.2. Early Stopping
5.1.3. Dropout
5.1.4. Comparison of Model Improvement Approaches for Avoiding Over-Fitting
6. Negative Binomial Neural Network Models
6.1. Bias Regularisation
6.2. Nagging Predictor
- The split of the learning data into training and validation sets;
- The split of the training data into mini-batches;
- Model initialisation.
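A minimal R sketch of the resulting nagging predictor follows (our own illustration; `fit_once()` is a hypothetical helper that redoes the training/validation split, mini-batching, and weight initialisation, trains one network, and returns its predicted means on `x_test`):

```r
# Nagging predictor (Richman and Wüthrich 2020): average the fitted means
# of M independently calibrated networks, so that the three sources of
# randomness listed above are resampled on every run.
nagging_predictor <- function(M, x_test) {
  preds <- sapply(seq_len(M), function(m) {
    set.seed(m)        # fresh seed: new split, new mini-batches, new init
    fit_once(x_test)   # returns a vector of predicted means
  })
  rowMeans(preds)      # average over the M calibrations
}

# e.g., mu_nag <- nagging_predictor(M = 50, x_test)
```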
7. k-Fold Validation
8. Concluding Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
CANN | Combined Actuarial Neural Network |
CRD | Chronic respiratory diseases |
COPD | Chronic obstructive pulmonary disease |
GLM | Generalized linear model |
BUGS | Bayesian inference Using Gibbs Sampling |
MTPL | French motor third-party liability insurance |
NN | Neural network |
FFNN | Feed-forward neural network |
ICD | International Statistical Classification of Diseases |
GDM | Gradient descent method |
MSA | Metropolitan Statistical Area |
XAI | Explainable Artificial Intelligence |
Appendix A
Appendix A.1. ICD Chapters and Variable Lookup Tables
Chapter | Codes | Title |
---|---|---|
1 | A00–B99 | Certain infectious and parasitic diseases |
2 | C00–D49 | Neoplasms |
3 | D50–D89 | Diseases of the blood, blood-forming organs, and certain disorders involving the immune mechanism |
4 | E00–E89 | Endocrine, nutritional, and metabolic diseases |
5 | F01–F99 | Mental and behavioral disorders |
6 | G00–G99 | Diseases of the nervous system |
7 | H00–H59 | Diseases of the eye and adnexa |
8 | H60–H95 | Diseases of the ear and mastoid process |
9 | I00–I99 | Diseases of the circulatory system |
10 | J00–J99 | Diseases of the respiratory system |
11 | K00–K95 | Diseases of the digestive system |
12 | L00–L99 | Diseases of the skin and subcutaneous tissue |
13 | M00–M99 | Diseases of the musculoskeletal system and connective tissue |
14 | N00–N99 | Diseases of the genitourinary system |
15 | O00–O9A | Pregnancy, childbirth, and the puerperium |
16 | P00–P96 | Certain conditions originating in the perinatal period |
17 | Q00–Q99 | Congenital malformations, deformations, and chromosomal abnormalities |
18 | R00–R99 | Symptoms, signs, and abnormal clinical and laboratory findings not classified elsewhere |
19 | S00–T88 | Injury, poisoning, and certain other consequences of external causes |
20 | V00–Y99 | External causes of morbidity and mortality |
21 | Z00–Z99 | Factors influencing health status and contact with health services |
Value | Description |
---|---|
1 | Nation, unknown region |
4 | Connecticut |
5 | Maine |
6 | Massachusetts |
7 | New Hampshire |
8 | Rhode Island |
9 | Vermont |
11 | New Jersey |
12 | New York |
13 | Pennsylvania |
16 | Illinois |
17 | Indiana |
18 | Michigan |
19 | Ohio |
20 | Wisconsin |
22 | Iowa |
23 | Kansas |
24 | Minnesota |
25 | Missouri |
26 | Nebraska |
27 | North Dakota |
28 | South Dakota |
31 | Washington, DC |
32 | Delaware |
33 | Florida |
34 | Georgia |
35 | Maryland |
36 | North Carolina |
37 | South Carolina |
38 | Virginia |
39 | West Virginia |
41 | Alabama |
42 | Kentucky |
43 | Mississippi |
44 | Tennessee |
46 | Arkansas |
47 | Louisiana |
48 | Oklahoma |
49 | Texas |
52 | Arizona |
53 | Colorado |
54 | Idaho |
55 | Montana |
56 | Nevada |
57 | New Mexico |
58 | Utah |
59 | Wyoming |
61 | Alaska |
62 | California |
63 | Hawaii |
64 | Oregon |
65 | Washington |
97 | Puerto Rico |
PLANTYP | Incentive to Use Certain Provider | Primary Care Physician (PCP) Assigned? | Referrals from PCP to Specialists Required? | Out-of-Network Services Covered? | Partially or Fully Capitated? |
---|---|---|---|---|---|
2. Comprehensive plan | No | No | n/a | n/a | No |
3. Exclusive provider organization plan | Yes | Yes | Yes | No | No |
4. Health maintenance organization plan | Yes | Yes | Yes | No | Yes |
5. Non-capitated (non-cap) point-of-service plan | Yes | Yes | Yes | Yes | No |
6. Preferred provider organization plan | Yes | No | n/a | Yes | No |
7. Capitated (Cap) or partially capitated (part cap) point-of-service plan | Yes | Yes | Yes | Yes | Yes |
8. Consumer-driven health plan | Varies | No | n/a | Varies | No |
9. High-Deductible health plan | Varies | No | n/a | Varies | No |
Appendix A.2. Selection of Batch Size and Epochs
Model | Batch Size | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
Data | | | | 0.0027
Pois. GLM | | 2.6811 | 2.5516 | 0.0027
NN (100,75,50) | 10,000 | 2.0815 | 4.3194 | 0.0027 |
NN (75,50,25) | 10,000 | 2.2348 | 3.3182 | 0.003 |
NN (50,35,25) | 10,000 | 2.3546 | 3.0803 | 0.0029 |
NN (35,25,20) | 10,000 | 2.4291 | 2.9139 | 0.0028 |
NN (25,20,15) | 10,000 | 2.5097 | 2.8273 | 0.0029 |
NN (20,15,10) | 10,000 | 2.5533 | 2.6676 | 0.0028 |
NN (15,10,5) | 10,000 | 2.5902 | 2.6593 | 0.0028 |
NN (100,75,50) | 30,000 | 2.1473 | 3.4493 | 0.0028 |
NN (75,50,25) | 30,000 | 2.3359 | 2.9862 | 0.0027 |
NN (50,35,25) | 30,000 | 2.4454 | 2.8612 | 0.0028 |
NN (35,25,20) | 30,000 | 2.5116 | 2.6836 | 0.0028 |
NN (25,20,15) | 30,000 | 2.5618 | 2.6794 | 0.0027 |
NN (20,15,10) | 30,000 | 2.5917 | 2.6164 | 0.0026 |
NN (15,10,5) | 30,000 | 2.619 | 2.5962 | 0.0026 |
Model | Batch Size | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (100,75,50) | 50,000 | 2.2604 | 3.2272 | 0.0029 |
NN (75,50,25) | 50,000 | 2.4708 | 2.7257 | 0.0028 |
NN (50,35,25) | 50,000 | 2.5367 | 2.6943 | 0.0028 |
NN (35,25,20) | 50,000 | 2.5768 | 2.634 | 0.0029 |
NN (25,20,15) | 50,000 | 2.6049 | 2.6244 | 0.0029 |
NN (20,15,10) | 50,000 | 2.6146 | 2.6135 | 0.0029 |
NN (15,10,5) | 50,000 | 2.6316 | 2.5788 | 0.0028 |
NN (100,75,50) | 75,000 | 2.4316 | 2.8451 | 0.0031 |
NN (75,50,25) | 75,000 | 2.5442 | 2.6493 | 0.0028 |
NN (50,35,25) | 75,000 | 2.5761 | 2.635 | 0.0029 |
NN (35,25,20) | 75,000 | 2.6098 | 2.601 | 0.0029 |
NN (25,20,15) | 75,000 | 2.6376 | 2.5731 | 0.0027 |
NN (20,15,10) | 75,000 | 2.638 | 2.5725 | 0.0027 |
NN (15,10,5) | 75,000 | 2.6443 | 2.5701 | 0.0027 |
NN (100,75,50) | 100,000 | 2.5945 | 2.6077 | 0.0029 |
NN (75,50,25) | 100,000 | 2.6057 | 2.5973 | 0.0029 |
NN (50,35,25) | 100,000 | 2.6376 | 2.564 | 0.0027 |
NN (35,25,20) | 100,000 | 2.6492 | 2.545 | 0.0027 |
NN (25,20,15) | 100,000 | 2.6514 | 2.5559 | 0.0028 |
NN (20,15,10) | 100,000 | 2.6485 | 2.5516 | 0.0027 |
NN (15,10,5) | 100,000 | 2.6551 | 2.548 | 0.0026 |
NN (100,75,50) | 175,000 | 2.6591 | 2.5389 | 0.0028 |
NN (75,50,25) | 175,000 | 2.6635 | 2.5484 | 0.0028 |
NN (50,35,25) | 175,000 | 2.6666 | 2.5642 | 0.003 |
NN (35,25,20) | 175,000 | 2.6725 | 2.5597 | 0.0029 |
NN (25,20,15) | 175,000 | 2.6613 | 2.5418 | 0.0028 |
NN (20,15,10) | 175,000 | 2.6662 | 2.5525 | 0.0028 |
NN (15,10,5) | 175,000 | 2.665 | 2.5492 | 0.0028 |
NN (100,75,50) | 250,000 | 2.6743 | 2.5565 | 0.0027 |
NN (75,50,25) | 250,000 | 2.6757 | 2.5581 | 0.0028 |
NN (50,35,25) | 250,000 | 2.6754 | 2.5542 | 0.0028 |
NN (35,25,20) | 250,000 | 2.6777 | 2.5579 | 0.0028 |
NN (25,20,15) | 250,000 | 2.6767 | 2.5592 | 0.0028 |
NN (20,15,10) | 250,000 | 2.6836 | 2.5612 | 0.0028 |
NN (15,10,5) | 250,000 | 2.7349 | 2.6092 | 0.0032 |
NN (100,75,50) | 500,000 | 2.6823 | 2.5558 | 0.0028 |
NN (75,50,25) | 500,000 | 2.6871 | 2.5631 | 0.0028 |
NN (50,35,25) | 500,000 | 2.6877 | 2.5603 | 0.0028 |
NN (35,25,20) | 500,000 | 2.69 | 2.564 | 0.0028 |
NN (25,20,15) | 500,000 | 2.6934 | 2.5642 | 0.0028 |
NN (20,15,10) | 500,000 | 2.7624 | 2.6288 | 0.0029 |
NN (15,10,5) | 500,000 | 2.8025 | 2.6763 | 0.0029 |
NN (100,75,50) | 750,000 | 2.6825 | 2.5552 | 0.0028 |
NN (75,50,25) | 750,000 | 2.6892 | 2.5605 | 0.0028 |
NN (50,35,25) | 750,000 | 2.6889 | 2.56 | 0.0028 |
NN (35,25,20) | 750,000 | 2.7007 | 2.5684 | 0.0028 |
NN (25,20,15) | 750,000 | 2.7646 | 2.6296 | 0.003 |
NN (20,15,10) | 750,000 | 2.7947 | 2.6707 | 0.0033 |
NN (15,10,5) | 750,000 | 2.9142 | 2.8013 | 0.0051 |
Model | Epochs | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
Data | | | | 0.0027
Pois. GLM | | 2.6811 | 2.5516 | 0.0027
NN (100,75,50) | 100 | 2.6866 | 2.563 | 0.0031 |
NN (75,50,25) | 100 | 2.6836 | 2.5606 | 0.003 |
NN (50,35,25) | 100 | 2.6869 | 2.5649 | 0.0032 |
NN (35,25,20) | 100 | 2.6851 | 2.5611 | 0.0028 |
NN (25,20,15) | 100 | 2.6917 | 2.568 | 0.0028 |
NN (20,15,10) | 100 | 2.696 | 2.5687 | 0.0028 |
NN (15,10,5) | 100 | 2.8045 | 2.6778 | 0.0029 |
NN (100,75,50) | 250 | 2.6518 | 2.5527 | 0.0026 |
NN (75,50,25) | 250 | 2.6589 | 2.5448 | 0.0028 |
NN (50,35,25) | 250 | 2.6599 | 2.5454 | 0.0026 |
NN (35,25,20) | 250 | 2.6615 | 2.5379 | 0.0028 |
NN (25,20,15) | 250 | 2.6605 | 2.5404 | 0.0026 |
NN (20,15,10) | 250 | 2.6627 | 2.541 | 0.0024 |
NN (15,10,5) | 250 | 2.6618 | 2.5396 | 0.0025 |
NN (100,75,50) | 500 | 2.4575 | 2.7481 | 0.0029 |
NN (75,50,25) | 500 | 2.5440 | 2.6737 | 0.0028 |
NN (50,35,25) | 500 | 2.6074 | 2.5824 | 0.0027 |
NN (35,25,20) | 500 | 2.6256 | 2.5703 | 0.0028 |
NN (25,20,15) | 500 | 2.6288 | 2.5703 | 0.0027 |
NN (20,15,10) | 500 | 2.6452 | 2.5683 | 0.0028 |
NN (15,10,5) | 500 | 2.6459 | 2.5569 | 0.0028 |
NN (100,75,50) | 1000 | 2.1653 | 3.4808 | 0.0028 |
NN (75,50,25) | 1000 | 2.3453 | 3.0112 | 0.0029 |
NN (50,35,25) | 1000 | 2.4357 | 2.8469 | 0.0027 |
NN (35,25,20) | 1000 | 2.5345 | 2.7145 | 0.0027 |
NN (25,20,15) | 1000 | 2.5661 | 2.6647 | 0.0027 |
NN (20,15,10) | 1000 | 2.5983 | 2.5962 | 0.0027 |
NN (15,10,5) | 1000 | 2.6194 | 2.5996 | 0.0028 |
NN (100,75,50) | 1500 | 2.1123 | 4.2328 | 0.0029 |
NN (75,50,25) | 1500 | 2.2884 | 3.4217 | 0.003 |
NN (50,35,25) | 1500 | 2.3753 | 2.9564 | 0.0029 |
NN (35,25,20) | 1500 | 2.4651 | 2.8431 | 0.0029 |
NN (25,20,15) | 1500 | 2.5323 | 2.7397 | 0.0028 |
NN (20,15,10) | 1500 | 2.5565 | 2.6925 | 0.0028 |
NN (15,10,5) | 1500 | 2.6032 | 2.6234 | 0.0027 |
NN (100,75,50) | 2000 | 2.105 | 4.8078 | 0.0029 |
NN (75,50,25) | 2000 | 2.2302 | 3.631 | 0.0029 |
NN (50,35,25) | 2000 | 2.3314 | 3.1311 | 0.0029 |
NN (35,25,20) | 2000 | 2.4312 | 2.923 | 0.0028 |
NN (25,20,15) | 2000 | 2.4837 | 2.764 | 0.0027 |
NN (20,15,10) | 2000 | 2.5611 | 2.6882 | 0.0026 |
NN (15,10,5) | 2000 | 2.5997 | 2.6522 | 0.0027 |
Model | Batch Size | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
Data | | | | 0.0027
Pois. GLM | | 2.6811 | 2.5516 | 0.0027
NN (75,50,25) | 10,000 | 2.5453 | 2.6831 | 0.0031 |
NN (50,35,25) | 10,000 | 2.5917 | 2.6302 | 0.003 |
NN (35,25,20) | 10,000 | 2.6184 | 2.6045 | 0.003 |
NN (25,20,15) | 10,000 | 2.6305 | 2.5834 | 0.0029 |
NN (20,15,10) | 10,000 | 2.631 | 2.574 | 0.0029 |
NN (15,10,5) | 10,000 | 2.6469 | 2.561 | 0.0028 |
NN (75,50,25) | 30,000 | 2.6565 | 2.5493 | 0.0028 |
NN (50,35,25) | 30,000 | 2.6546 | 2.5515 | 0.0027 |
NN (35,25,20) | 30,000 | 2.6594 | 2.5442 | 0.0027 |
NN (25,20,15) | 30,000 | 2.6594 | 2.5463 | 0.0027 |
NN (20,15,10) | 30,000 | 2.6602 | 2.5402 | 0.0025 |
NN (15,10,5) | 30,000 | 2.6657 | 2.568 | 0.0025 |
NN (75,50,25) | 50,000 | 2.6751 | 2.5518 | 0.0027 |
NN (50,35,25) | 50,000 | 2.674 | 2.5484 | 0.0027 |
NN (35,25,20) | 50,000 | 2.6748 | 2.5572 | 0.0026 |
NN (25,20,15) | 50,000 | 2.6758 | 2.5533 | 0.0026 |
NN (20,15,10) | 50,000 | 2.6786 | 2.556 | 0.0026 |
NN (15,10,5) | 50,000 | 2.6824 | 2.5618 | 0.0027 |
NN (75,50,25) | 75,000 | 2.6824 | 2.5575 | 0.0027 |
NN (50,35,25) | 75,000 | 2.6773 | 2.5551 | 0.0027 |
NN (35,25,20) | 75,000 | 2.6851 | 2.5633 | 0.0027 |
NN (25,20,15) | 75,000 | 2.6857 | 2.5626 | 0.0028 |
NN (20,15,10) | 75,000 | 2.6844 | 2.5642 | 0.0027 |
NN (15,10,5) | 75,000 | 2.7231 | 2.5933 | 0.003 |
NN (75,50,25) | 100,000 | 2.6839 | 2.5603 | 0.0027 |
NN (50,35,25) | 100,000 | 2.6831 | 2.5586 | 0.0027 |
NN (35,25,20) | 100,000 | 2.69 | 2.5652 | 0.0028 |
NN (25,20,15) | 100,000 | 2.6877 | 2.5642 | 0.0028 |
NN (20,15,10) | 100,000 | 2.7066 | 2.5724 | 0.0029 |
NN (15,10,5) | 100,000 | 2.7639 | 2.6299 | 0.003 |
NN (75,50,25) | 175,000 | 2.686 | 2.5582 | 0.0028 |
NN (50,35,25) | 175,000 | 2.6912 | 2.5579 | 0.0028 |
NN (35,25,20) | 175,000 | 2.6896 | 2.5591 | 0.0028 |
NN (25,20,15) | 175,000 | 2.7061 | 2.5734 | 0.0028 |
NN (20,15,10) | 175,000 | 2.7658 | 2.6331 | 0.0029 |
NN (15,10,5) | 175,000 | 2.9795 | 2.8704 | 0.0059 |
Model | Batch Size | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (75,50,25) | 250,000 | 2.6985 | 2.566 | 0.0028 |
NN (50,35,25) | 250,000 | 2.702 | 2.5674 | 0.0028 |
NN (35,25,20) | 250,000 | 2.7312 | 2.5956 | 0.003 |
NN (25,20,15) | 250,000 | 2.7579 | 2.6222 | 0.003 |
NN (20,15,10) | 250,000 | 2.8054 | 2.6816 | 0.0033 |
NN (15,10,5) | 250,000 | 2.8716 | 2.7557 | 0.0046 |
NN (75,50,25) | 500,000 | 2.721 | 2.5869 | 0.0029 |
NN (50,35,25) | 500,000 | 2.7048 | 2.5666 | 0.0028 |
NN (35,25,20) | 500,000 | 2.7868 | 2.6548 | 0.003 |
NN (25,20,15) | 500,000 | 2.7821 | 2.6511 | 0.0031 |
NN (20,15,10) | 500,000 | 2.8502 | 2.7323 | 0.0042 |
NN (15,10,5) | 500,000 | 3.7937 | 3.7045 | 0.0131 |
NN (75,50,25) | 750,000 | 2.7237 | 2.5909 | 0.003 |
NN (50,35,25) | 750,000 | 2.7492 | 2.6152 | 0.0029 |
NN (35,25,20) | 750,000 | 2.754 | 2.6209 | 0.0032 |
NN (25,20,15) | 750,000 | 2.8334 | 2.7084 | 0.0039 |
NN (20,15,10) | 750,000 | 2.8275 | 2.7067 | 0.0041 |
NN (15,10,5) | 750,000 | 6.4324 | 6.3651 | 0.0317 |
Appendix A.3. Code
Listing A1: Code for implementing .
Listing A2: Code for implementing ridge regularisation with .
Listing A3: Code for implementing early stopping using a callback.
Listing A4: Code for implementing dropout with dropout rate .
Listing A5: Code for implementing the GLM bias regularisation approach for the Poisson neural network model.
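The listings themselves are not reproduced in this extract. As a hedged indication of the mechanisms they implement, ridge (L2) regularisation, dropout, and early stopping can be specified in R keras along the following lines (our own sketch; the layer sizes, penalty, and rates below are placeholders rather than the settings used in the paper, and `x_learn`, `y_learn`, and `n_features` are hypothetical):

```r
library(keras)

# Ridge (L2) weight penalty and dropout inside the network definition
# (cf. Listings A2 and A4).
model <- keras_model_sequential() %>%
  layer_dense(units = 20, activation = "tanh",
              input_shape = c(n_features),
              kernel_regularizer = regularizer_l2(l = 1e-4)) %>%
  layer_dropout(rate = 0.05) %>%
  layer_dense(units = 1, activation = "exponential")

model %>% compile(optimizer = optimizer_nadam(), loss = "poisson")

# Early stopping via a callback (cf. Listing A3): halt once the validation
# loss has not improved for `patience` epochs and restore the best weights.
model %>% fit(x_learn, y_learn,
              validation_split = 0.2,
              batch_size = 30000, epochs = 1000,
              callbacks = list(callback_early_stopping(
                monitor = "val_loss",
                patience = 20,
                restore_best_weights = TRUE)))
```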
References
- Allaire, Joseph J., and François Chollet. 2021. Keras: R Interface to ‘Keras’, R Package Version 2.7.0. Available online: https://CRAN.R-project.org/package=keras (accessed on 9 January 2022).
- Allaire, Joseph J., and Yuan Tang. 2021. Tensorflow: R Interface to ‘TensorFlow’, R Package Version 2.6.0. Available online: https://CRAN.R-project.org/package=tensorflow (accessed on 9 January 2022).
- Arık, Ayşe, Erengul Dodd, Andrew Cairns, and George Streftaris. 2021. Socioeconomic disparities in cancer incidence and mortality in England and the impact of age-at-diagnosis on cancer mortality. PLoS ONE 16: e0253854.
- Arlot, Sylvain, and Alain Celisse. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys 4: 40–79.
- Aveyard, Paul, Min Gao, Nicola Lindson, Jamie Hartmann-Boyce, Peter Watkinson, Duncan Young, Carol A. C. Coupland, Pui San Tan, Ashley K. Clift, David Harrison, and et al. 2021. Association between pre-existing respiratory disease and its treatment, and severe COVID-19: A population cohort study. The Lancet Respiratory Medicine 9: 909–23.
- Bengio, Yoshua, Réjean Ducharme, and Pascal Vincent. 2000. A neural probabilistic language model. In Advances in Neural Information Processing Systems 13. Cambridge: MIT Press.
- Blanc, Paul D., Isabella Annesi-Maesano, John R. Balmes, Kristin J. Cummings, David Fishwick, David Miedinger, Nicola Murgia, Rajen N. Naidoo, Carl J. Reynolds, Torben Sigsgaard, and et al. 2019. The occupational burden of nonmalignant respiratory diseases. An official American Thoracic Society and European Respiratory Society statement. American Journal of Respiratory and Critical Care Medicine 199: 1312–34.
- Blier-Wong, Christopher, Hélène Cossette, Luc Lamontagne, and Etienne Marceau. 2020. Machine learning in P&C insurance: A review for pricing and reserving. Risks 9: 4.
- Bousquet, Jean, Nikolai Khaltaev, and Alvaro A. Cruz. 2007. Global Surveillance, Prevention and Control of Chronic Respiratory Diseases. Geneva: World Health Organization.
- CDC. 2012. Chronic obstructive pulmonary disease among adults-United States, 2011. Morbidity and Mortality Weekly Report 61: 938–43.
- CDC. 2016. ICD-10-CM International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM). Available online: https://www.cdc.gov/nchs/icd/icd-10-cm.htm (accessed on 12 December 2021).
- De Jong, Piet, and Gillian Z. Heller. 2008. Generalized Linear Models for Insurance Data. Cambridge: Cambridge University Press.
- Doney, Brent, Eva Hnizdo, Girija Syamlal, Greg Kullman, Cecil Burchfiel, Christopher J. Martin, and Priscah Mujuru. 2014. Prevalence of chronic obstructive pulmonary disease among US working adults aged 40 to 70 years: National Health Interview Survey data 2004 to 2011. Journal of Occupational and Environmental Medicine 56: 1088.
- Dozat, Timothy. 2016. Incorporating Nesterov momentum into Adam. Paper presented at the 4th International Conference on Learning Representations, San Juan, Puerto Rico, May 2–4.
- Ferrario, Andrea, Alexander Noll, and Mario V. Wüthrich. 2020. Insights from inside neural networks. SSRN. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3226852 (accessed on 20 November 2021).
- Frees, Edward W. 2009. Regression Modeling with Actuarial and Financial Applications. Cambridge: Cambridge University Press.
- Geisser, Seymour. 1975. The predictive sample reuse method with applications. Journal of the American Statistical Association 70: 320–28.
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Cambridge: MIT Press.
- Haberman, Steven, and Arthur E. Renshaw. 1996. Generalized linear models and actuarial science. Journal of the Royal Statistical Society: Series D (The Statistician) 45: 407–36.
- Hardin, James W., and Joseph M. Hilbe. 2007. Generalized Linear Models and Extensions. College Station: Stata Press.
- Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, vol. 2.
- Hilbe, Joseph M. 2011. Negative Binomial Regression. Cambridge: Cambridge University Press.
- Jung, Yoonsuh. 2018. Multiple predicting k-fold cross-validation for model selection. Journal of Nonparametric Statistics 30: 197–215.
- LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521: 436–44.
- Nelder, John Ashworth, and Robert W. M. Wedderburn. 1972. Generalized linear models. Journal of the Royal Statistical Society: Series A (General) 135: 370–84.
- Ohlsson, Esbjörn, and Björn Johansson. 2010. Non-Life Insurance Pricing with Generalized Linear Models. Berlin: Springer, vol. 2.
- Ozkok, Erengul, George Streftaris, Howard R. Waters, and A. David Wilkie. 2014. Modelling critical illness claim diagnosis rates I: Methodology. Scandinavian Actuarial Journal 2014: 439–57.
- Prechelt, Lutz. 1998. Early stopping-but when? In Neural Networks: Tricks of the Trade. Berlin: Springer, pp. 55–69.
- R Core Team. 2021. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
- Richman, Ronald, and Mario V. Wüthrich. 2020. Nagging predictors. Risks 8: 83.
- Richman, Ronald, and Mario V. Wüthrich. 2021. A neural network extension of the Lee–Carter model to multiple populations. Annals of Actuarial Science 15: 346–66.
- Richman, Ronald, and Mario V. Wüthrich. 2022. LocalGLMnet: Interpretable deep learning for tabular data. Scandinavian Actuarial Journal, 1–25.
- Rigby, Robert A., and D. Mikis Stasinopoulos. 2005. Generalized additive models for location, scale and shape. Applied Statistics 54: 507–54.
- RStudio Team. 2021. RStudio: Integrated Development Environment for R. Boston: RStudio, PBC.
- Ruder, Sebastian. 2016. An overview of gradient descent optimization algorithms. arXiv:1609.04747.
- Russell, Stuart J. 2010. Artificial Intelligence: A Modern Approach. Upper Saddle River: Pearson Education, Inc.
- Schelldorfer, Jürg, and Mario V. Wüthrich. 2019. Nesting classical actuarial models into neural networks. SSRN. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3320525 (accessed on 15 December 2021).
- Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15: 1929–58.
- Tzougas, George, and Ziyi Li. 2021. Neural Network Embedding of the Mixed Poisson Regression Model for Claim Counts. Available online: https://insurancedatascience.org/project/2021_london/ (accessed on 9 January 2022).
- WHO. 2022a. Asthma and COVID-19: Scientific Brief. Available online: https://www.who.int/publications-detail-redirect/who-2019-ncov-sci-brief-asthma-2021.1 (accessed on 19 April 2022).
- WHO. 2022b. Health Topics-Chronic Respiratory Diseases. Available online: https://www.who.int/health-topics/chronic-respiratory-diseases#tab=tab_1 (accessed on 6 June 2022).
- Wüthrich, Mario V. 2020. Bias regularization in neural network models for general insurance pricing. European Actuarial Journal 10: 179–202.
- Wüthrich, Mario V. 2021. The balance property in neural network modelling. Statistical Theory and Related Fields 6: 1–9.
Variable | Description | Comment | Categories |
---|---|---|---|
ENROLID | Unique ID for individual | ID variable | - |
AGE | Age at last birthday of the individual | ∈ | - |
SEX | Gender of the individual | Factor w/2 categories | 1: Male, 2: Female |
EMPREL | Relation to the primary beneficiary | Factor w/3 categories | 1: Employee, 2: Spouse, 3: Child/Other |
PLANTYP | Type of health plan individual is part of | Factor w/8 categories | 2: Comprehensive Plan, 3: Exclusive Provider Organization Plan, 4: Health Maintenance Organization Plan, 5: Non-Capitated (Non-Cap) Point-of-Service, 6: Preferred Provider Organization Plan, 7: Capitated (Cap) or Partially Capitated (PartCap) Point-of-Service Plan, 8: Consumer-Driven Health Plan, 9: High-Deductible Health Plan |
REGION | Geographical region of residence | Factor w/5 categories | 1: Northeast, 2: North Central, 3: South, 4: West, 5: Unknown |
EGEOLOC | Geographic location based on postal code of individual’s residence | Factor w/53 categories | See Table A2 |
UR | Urban/rural indicator based on individual’s residence | Factor w/2 categories | 1: Rural, 2: Urban |
EECLASS | Employee classification | Factor w/9 categories | 1: Salary Non-union, 2: Salary Union, 3: Salary Other, 4: Hourly Non-union, 5: Hourly Union, 6: Hourly Other, 7: Non-union, 8: Union, 9: Unknown |
EESTATU | Status of employment | Factor w/9 categories | 1: Active Full Time, 2: Active Part Time or Seasonal, 3: Early Retiree, 4: Medicare Eligible Retiree, 5: Retiree (status unknown), 6: Comprehensive Omnibus Budget Reconciliation Act (COBRA) Continuee, 7: Long-Term Disability, 8: Surviving Spouse/Depend, 9: Unknown |
INDSTRY | Industry in which the primary beneficiary is employed | Factor w/10 categories | 1: Oil & Gas Extraction, Mining, 2: Manufacturing, Durable Goods, 3: Manufacturing, Nondurable Goods, 4: Transportation, Communications, Utilities, 5: Retail Trade, 6: Finance, Insurance, Real Estate, 7: Services, A: Agriculture, Forestry, Fishing, C: Construction, W: Wholesale |
HLTHPLAN | Whether the data are provided by the employer or a health plan | Factor w/2 categories | 0: Employer, 1: Health plan |
DATATYP | Whether the plan is on reimbursement or capitation basis | Factor w/2 categories | 1: Fee for service, 2: Encounter |
EXPOSURE | Period of enrollment (yearly exposure) | ∈ | - |
Number of Admissions | Frequency |
---|---|
0 | 2,046,167 |
1 | 3527 |
2 | 292 |
3 | 77 |
4 | 14 |
5 | 14 |
6 | 7 |
7 | 1 |
8 | 1 |
Exposure Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
relative no. of records | 5.07% | 3.39% | 3.24% | 2.87% | 5.03% | 4.40% | 2.36% | 2.58% | 2.40% | 68.66% |
empirical frequency | 0.85% | 0.71% | 0.60% | 0.48% | 0.53% | 0.35% | 0.42% | 0.36% | 0.39% | 0.24% |
Model | Epochs | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (25,20,15) | 100 | 2.8021 | 2.6751 | 0.0033 |
NN (25,20,15) | 250 | 2.7207 | 2.5843 | 0.0029 |
NN (25,20,15) | 500 | 2.6910 | 2.5668 | 0.0028 |
NN (25,20,15) | 1000 | 2.6713 | 2.5621 | 0.0028 |
NN (25,20,15) | 1500 | 2.6509 | 2.5480 | 0.0029 |
NN (25,20,15) | 2000 | 2.6212 | 2.5770 | 0.0028 |
NN (20,15,10) | 100 | 2.8200 | 2.6968 | 0.0038 |
NN (20,15,10) | 250 | 2.7433 | 2.6092 | 0.0031 |
NN (20,15,10) | 500 | 2.6983 | 2.5707 | 0.0029 |
NN (20,15,10) | 1000 | 2.6719 | 2.5634 | 0.0027 |
NN (20,15,10) | 1500 | 2.6505 | 2.5582 | 0.0028 |
NN (20,15,10) | 2000 | 2.6420 | 2.5872 | 0.0028 |
NN (15,10,5) | 100 | 3.3765 | 3.2795 | 0.0097 |
NN (15,10,5) | 250 | 2.8166 | 2.6936 | 0.0035 |
NN (15,10,5) | 500 | 2.7434 | 2.6171 | 0.0033 |
NN (15,10,5) | 1000 | 2.6704 | 2.5641 | 0.0028 |
NN (15,10,5) | 1500 | 2.6517 | 2.5407 | 0.0027 |
NN (15,10,5) | 2000 | 2.6457 | 2.5723 | 0.0028 |
Model | Value | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (25,20,15) | | 2.6614 | 2.5354 | 0.0027
NN (25,20,15) | | 2.8052 | 2.6787 | 0.0029
NN (25,20,15) | | 2.8045 | 2.6767 | 0.0027
NN (25,20,15) | | 2.6827 | 2.5536 | 0.0028
NN (25,20,15) | | 2.6614 | 2.5354 | 0.0027
NN (20,15,10) | | 2.6584 | 2.5380 | 0.0027
NN (20,15,10) | | 2.8049 | 2.6781 | 0.0028
NN (20,15,10) | | 2.8045 | 2.6767 | 0.0027
NN (20,15,10) | | 2.6817 | 2.5550 | 0.0028
NN (20,15,10) | | 2.6585 | 2.5380 | 0.0027
Model | Epochs | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (25,20,15) | early stopped | 2.6654 | 2.5456 | 0.0025 |
NN (25,20,15) | 1000 epochs | 2.5635 | 2.6704 | 0.0028 |
NN (20,15,10) | early stopped | 2.6623 | 2.5403 | 0.0025 |
NN (20,15,10) | 1000 epochs | 2.5865 | 2.6216 | 0.0027 |
Model | Dropout Rate | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
Data | | | | 0.0027
NN (25,20,15) | no dropout | 2.6581 | 2.5416 | 0.0027 |
NN (25,20,15) | p = 1% | 2.6563 | 2.5437 | 0.0025 |
NN (25,20,15) | p = 2% | 2.6571 | 2.5418 | 0.0025 |
NN (25,20,15) | p = 5% | 2.6622 | 2.5484 | 0.0023 |
NN (25,20,15) | p = 10% | 2.6708 | 2.5441 | 0.0022 |
NN (20,15,10) | no dropout | 2.6625 | 2.5458 | 0.0026 |
NN (20,15,10) | p = 1% | 2.6594 | 2.5478 | 0.0025 |
NN (20,15,10) | p = 2% | 2.6606 | 2.5411 | 0.0024 |
NN (20,15,10) | p = 5% | 2.6726 | 2.5554 | 0.0022 |
NN (20,15,10) | p = 10% | 2.6756 | 2.5566 | 0.0021 |
Model | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|
Data | | | 0.0027
Pois. reg | 2.6835 | 2.5388 | 0.0027 |
NN (20,15,10) | 2.6726 | 2.5160 | 0.0028 |
CANN (20,15,10) | 2.6708 | 2.5118 | 0.0027 |
NN (25,20,15) | 2.6686 | 2.5171 | 0.0026 |
CANN (25,20,15) | 2.6682 | 2.5181 | 0.0028 |
NB.reg | 1.0599 | 1.0599 | 0.0028 |
NN (20,15,10) | 1.0381 | 1.0251 | 0.0028 |
CANN (20,15,10) | 1.0435 | 1.0168 | 0.0028 |
NN (25,20,15) | 1.0424 | 1.0298 | 0.0029 |
CANN (25,20,15) | 1.0454 | 1.0150 | 0.0029 |
Model | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|
Data | | | 0.0027
NB.reg | 1.0599 | 1.0599 | 0.0028 |
NB.reg w/bias regu | 1.0600 | 1.0595 | 0.0027 |
NN (20,15,10) | 1.0381 | 1.0251 | 0.0028 |
NN (20,15,10) w/bias regu | 1.0379 | 1.0244 | 0.0027 |
CANN (20,15,10) | 1.0435 | 1.0168 | 0.0028 |
CANN (20,15,10) w/bias regu | 1.0431 | 1.0176 | 0.0027 |
NN (25,20,15) | 1.0424 | 1.0298 | 0.0029 |
NN (25,20,15) w/bias regu | 1.0418 | 1.0271 | 0.0027 |
CANN (25,20,15) | 1.0454 | 1.0150 | 0.0029 |
CANN (25,20,15) w/bias regu | 1.0442 | 1.0115 | 0.0027 |
Model | Index M | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NN (20,15,10) | M = 50 | 1.0570 | 1.0647 | 0.0027 |
CANN (20,15,10) | M = 50 | 1.0605 | 1.0503 | 0.0027 |
Model | Index M | Learning Loss | Testing Loss | Portfolio Average |
---|---|---|---|---|
NB.reg | | 1.0664 | 1.0807 | 0.0027
NN (20,15,10) | M = 25 | 1.0460 | 1.0645 | 0.0027 |
CANN (20,15,10) | M = 25 | 1.0563 | 1.0722 | 0.0027 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).