Article

A New Class of Counting Distributions Embedded in the Lee–Carter Model for Mortality Projections: A Bayesian Approach

1
Sakhnin Academic College, Galilee St. 100, Sakhnin 30810, Israel
2
Faculty of Industrial Engineering and Technology Management, HIT—Holon Institute of Technology, 52 Golomb Street, Holon 5810201, Israel
3
Actuarial Research Center, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa 3498838, Israel
*
Author to whom correspondence should be addressed.
Risks 2022, 10(6), 111; https://doi.org/10.3390/risks10060111
Submission received: 24 March 2022 / Revised: 13 May 2022 / Accepted: 18 May 2022 / Published: 27 May 2022
(This article belongs to the Special Issue Actuarial Mathematics and Risk Management)

Abstract

The Lee–Carter model, the dominant mortality projection model in the literature, has been criticized for its homoscedastic error assumption. Extensions of the model corrected this by assuming that the number of deaths follows a Poisson or negative binomial distribution. We propose a new class of families of counting distributions, the ABM class, which belongs to a wider class of natural exponential families. This class is characterized by its variance functions and contains the Poisson and negative binomial distributions as special cases, offering an infinite class of additional counting distributions to consider. We are guided by the principle that the distribution should be chosen from as large a pool of candidates as possible. To this end, and following a data mining approach, a training set of historical mortality data for the population is modeled using the rich choice of distributions in the ABM class, and the chosen distribution is the one that offers superior projection results on a test set of mortality data. As an alternative to parameter estimation via the singular value decomposition used in the classical Lee–Carter model, we adopt Bayesian estimation, harnessing the Markov Chain Monte Carlo methodology. A numerical study demonstrates that, while traditional distributions may provide desirable projections for some populations, for others alternative distributions within the ABM class can potentially produce superior results for the entire population or for particular age groups, such as the oldest-old.

1. Introduction

The seminal paper by Lee and Carter (1992) (LC) introduced one of the best-known and most widely applied models for forecasting mortality rates. Within this model, the time series of the log mortality rates, $\ln m_{xt}$, of each age is described by an age-specific intercept $\alpha_x$ plus a common trend $k_t$ for all age groups multiplied by an age-specific coefficient $\beta_x$,
$$\ln m_{xt} = \alpha_x + \beta_x k_t + \varepsilon_{xt}.$$
The error term $\varepsilon_{xt}$ is assumed to have mean 0 and variance $\sigma_\varepsilon^2$, reflecting influences missed by the model. The age- and time-specific mortality rate $m_{xt}$ is calculated as $D_{xt}/E_{xt}$, where $D_{xt}$ denotes the number of deaths in a population at age $x$, $x = 1, 2, \ldots, P$, and at time $t$, $t = 1, 2, \ldots, T$, and $E_{xt}$ is the exposure to the risk of death. To ensure the identifiability of the model parameters, constraints are imposed such that the sum of $\beta_x$ over age is 1 and the sum of $k_t$ over time is 0. To forecast mortality rates into the future, a simple random walk with drift is proposed for $k_t$:
$$k_t = k_{t-1} + \theta + w_t.$$
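The classical fit and forecast described here can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names `fit_lee_carter_svd` and `forecast_k` are hypothetical, the rank-one SVD fit is the usual textbook estimator for the classical model, and estimating the drift by the mean increment of $k_t$ is an assumption of this sketch.

```python
import numpy as np

def fit_lee_carter_svd(log_m):
    """Fit log_m[x, t] = alpha_x + beta_x * k_t + error by a rank-one SVD.

    log_m is an (ages x years) array of log mortality rates.
    """
    alpha = log_m.mean(axis=1)                  # age-specific intercepts
    centred = log_m - alpha[:, None]            # each row now sums to zero over time
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    beta = u[:, 0]
    k = s[0] * vt[0]
    scale = beta.sum()                          # impose sum(beta) = 1
    beta = beta / scale
    k = k * scale                               # keeps beta_x * k_t unchanged
    return alpha, beta, k                       # sum(k) = 0 follows from the row centring

def forecast_k(k, horizon):
    """Point forecast of the random walk with drift k_t = k_{t-1} + theta + w_t."""
    theta_hat = (k[-1] - k[0]) / (len(k) - 1)   # mean of the observed increments
    return k[-1] + theta_hat * np.arange(1, horizon + 1)
```

Forecast log rates then follow by plugging the projected $k_{t+s}$ into $\alpha_x + \beta_x k_{t+s}$.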
The homoscedastic error assumption of the Lee–Carter model was criticized for its limiting impact on predictions (Brouhns et al. 2002; Danesi et al. 2015; Idrizi 2018). This led to the introduction of the Poisson log-bilinear LC-type model (Brouhns et al. 2002), which, in contrast, is intrinsically heteroscedastic, namely:
$$D_{xt} \sim \mathrm{Poisson}(\mu_{xt}), \quad \mu_{xt} = E_{xt} m_{xt}.$$
Here, the number of deaths is modelled directly by a Poisson distribution, whose parameter is estimated by maximum likelihood estimation (MLE). This alternative approach gained momentum, and other discrete distributions were proposed. In particular, a binomial distribution was proposed by Wang and Lu (2005) and a negative binomial distribution was suggested by Delwarde et al. (2007) and Renshaw and Haberman (2008). See Azman and Pathmanathan (2022) for further discussion of these distributions within the GLM framework.
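To make the Poisson log-bilinear formulation concrete, the following sketch evaluates its log-likelihood for given parameter values. The function name `poisson_lc_loglik` and the array layout (ages in rows, years in columns) are assumptions made for illustration, and the term $-\ln(D_{xt}!)$ is dropped because it does not depend on the parameters.

```python
import numpy as np

def poisson_lc_loglik(alpha, beta, k, deaths, exposure):
    """Poisson log-bilinear log-likelihood, up to an additive constant.

    deaths and exposure are (ages x years) arrays; alpha and beta have one
    entry per age and k has one entry per year.
    """
    # mu_xt = E_xt * m_xt with m_xt = exp(alpha_x + beta_x * k_t)
    mu = exposure * np.exp(alpha[:, None] + np.outer(beta, k))
    # sum over x and t of D_xt * ln(mu_xt) - mu_xt; the -ln(D_xt!) term is omitted
    return float(np.sum(deaths * np.log(mu) - mu))
```

A maximum likelihood fit would maximize this function subject to the identifiability constraints on $\beta_x$ and $k_t$.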
The Lee–Carter model has been widely used for many purposes (Shair et al. 2018), such as forecasting mortality reduction factors, assessing the adequacy of retirement income, population projections and the projection of mortality trends for the oldest-old (older than 80, or 85 in some sources). This age group is of considerable interest to policymakers, as it is destined to grow as a proportion of the entire population and can outstrip the capacity of existing infrastructures (Buettner 2002). This is a fairly recent phenomenon. In Canada, for instance (Legare et al. 2015), the 21st century brought about the most significant gain in life expectancy at age 85 (7.79% for women and 9.93% for men). Clearly, policies need to be devised that can meet people's special needs in what is called the fourth age (Baltes and Smith 2003), and accurate mortality projection for this age group is a must. We shall, therefore, focus on the adequacy of potential underlying discrete distributions to produce accurate mortality projections for this age group using the Lee–Carter model. This will be discussed in Section 4.
In essence, while the above-cited papers relied on popular and commonly used discrete distributions, no particular distribution can be said to be universally superior. Indeed, it is entirely plausible that there is a winning distribution for any given population, or even for a specific population within a certain age range. Ideally, one should consider a rich class of families of counting distributions, much richer than the two already suggested, and use the data to pick the most suitable distribution for the population under study. This paper proposes a countably infinite set of families of counting distributions, of which the Poisson, negative binomial and Abel families are special cases. Our aim is to study this class, incorporate it into the framework of the LC model and use real data to seek the most suitable distribution for mortality projection. While there is little doubt that the distributions discussed above could prove adequate for specific populations or age groups, other distributions within the suggested class could have the upper hand.
The paper is organized as follows. Section 2 presents the new class of counting distributions. Section 3 is devoted to the new class and its Bayesian framework. Section 4 (divided into Section 4.1: Methods and Section 4.2: Results) reports a numerical study in which superior members of this class are chosen for mortality projections of the oldest-old in three populations. Finally, Section 5 offers a discussion.

2. A New Class of Counting Distributions on the Set of Nonnegative Integers $\mathbb{N}_0$

The new class of families of counting distributions on the non-negative integers belongs to a wider class of natural exponential families (NEFs), characterized by their variance functions (VFs). To present this class, we divide this section into subsections. We first present some preliminaries on NEFs and their associated VFs. We then introduce a class of NEFs having polynomial structure and, finally, the new class of families of counting distributions, named ABM. This class was first introduced by Awad et al. (2016), who defined it and gave a preliminary sketch of its usefulness for mortality projections. The class has also been investigated by Bar-Lev and Ridder (2021a, 2021b) from a classical frequentist approach and has shown superiority with respect to various metrics or goodness-of-fit tests for different count datasets (for further details, see the last item of the list that closes Section 2.4).

2.1. NEFs—Some Preliminaries

The following preliminaries are mainly taken from Letac and Mora (1990) and are briefly presented here for completeness.
Let $\nu$ be a non-Dirac positive Radon measure on $\mathbb{R}$, and $L(\theta) = \int e^{\theta x}\,\nu(dx)$ its Laplace transform. Assuming that $\Theta = \operatorname{int}\{\theta \in \mathbb{R} : L(\theta) < \infty\} \neq \emptyset$, the NEF generated by $\nu$ is defined by the probability distributions
$$F = \left\{ F_\theta : F_\theta(\nu(dx)) = e^{\theta x - \kappa(\theta)}\,\nu(dx),\ \theta \in \Theta \right\}, \tag{1}$$
where $\kappa(\theta) = \log L(\theta)$, the cumulant transform of $\nu$, is strictly convex and real analytic on $\Theta$. If $X_\theta$ represents a random variable having distribution $F_\theta$ of the form given in (1), then the expectation and variance of $X_\theta$ are given, respectively, by $E(X_\theta) = m = \kappa'(\theta)$ and $V(X_\theta) = \kappa''(\theta)$, where $m = \kappa'(\theta)$ is strictly monotone, so that its inverse, say $\theta = \psi(m)$, $m \in M = \kappa'(\Theta)$, is well defined. The set $M$ of all means of (1) is called the mean parameter space of $F$. The variance of $F_\theta$ can be expressed in terms of $m$ by $V(m) = \kappa''(\theta) = \kappa''(\psi(m))$. The pair $(V, M)$ is called the VF of $F$, and it uniquely determines $F$ within the class of NEFs. For example, $(m, \mathbb{R}^+)$ and $(m^2, \mathbb{R}^+)$ are, respectively, the VFs of the Poisson and exponential NEFs, and each uniquely determines its NEF.
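As a worked illustration of these definitions, the Poisson case can be traced step by step. The generating measure used here, $\nu(\{x\}) = 1/x!$ on the non-negative integers, is the standard choice and is stated as an assumption of the example.

```latex
\begin{align*}
\nu(\{x\}) &= \frac{1}{x!}, \qquad x \in \mathbb{N}_0, \\
L(\theta) &= \sum_{x=0}^{\infty} \frac{e^{\theta x}}{x!} = \exp\!\left(e^{\theta}\right), \qquad \Theta = \mathbb{R}, \\
\kappa(\theta) &= \log L(\theta) = e^{\theta}, \\
m &= \kappa'(\theta) = e^{\theta}, \qquad \psi(m) = \ln m, \qquad M = \mathbb{R}^{+}, \\
V(m) &= \kappa''(\psi(m)) = e^{\ln m} = m .
\end{align*}
```

The resulting pair $(m, \mathbb{R}^{+})$ is the Poisson VF quoted in the text.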

2.2. The Mean Value Parametrization of NEFs

As indicated above, the VF of an NEF $F$ uniquely determines $F$ within the class of NEFs. Let $(V, M)$ be a given VF of an NEF $F$ generated by $\nu$. Then, simple calculations show that both $\theta = \psi(m)$ and the cumulant transform $\kappa(\theta) = \kappa(\psi(m))$ of $\nu$ can be expressed in terms of $m$ as:
$$\theta = \psi(m) = \int \frac{dm}{V(m)} + c_1, \qquad \kappa(\theta) = \kappa(\psi(m)) = \int \frac{m}{V(m)}\,dm + c_2, \tag{2}$$
where one needs to determine the constants $c_1$ and $c_2$ so that $F_\theta$, $\theta \in \Theta$, constitutes a probability distribution (not an easy task). Accordingly, a mean value parametrization of an NEF $F$ generated by a measure $\nu$ is given by:
$$F = \left\{ \exp\left\{ \psi(m)\,x - \kappa(\psi(m)) \right\} \nu(dx),\ m \in M \right\}. \tag{3}$$
Such a representation of F is more natural as it is expressed in terms of the mean m rather than a somewhat artificial parameter θ . A comprehensive description of NEFs in terms of their mean value representation is reviewed in Bar-Lev and Kokonendji (2017).
Remark 1. 
The task of computing the constants $c_1$ and $c_2$ is not simple and might be rather cumbersome. However, from a Bayesian perspective, when (3) is used as a prior distribution on $m$, then in the calculation of the respective posterior distribution, such constants are cancelled out (as the likelihood function is the only relevant component). As this paper is concerned with a Bayesian framework, one can assume without any loss of generality that $c_1 = c_2 = 0$. Henceforth, we indeed assume so.
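The role of the two integrals and of the constants can be seen in a case where everything is explicit. The following worked example applies the integrals for $\psi(m)$ and $\kappa(\psi(m))$ to the negative binomial VF; the symbol $p$ for its shape parameter is introduced here only for illustration.

```latex
\begin{align*}
V(m) &= m\left(1 + \frac{m}{p}\right), \qquad p > 0,\quad M = \mathbb{R}^{+}, \\
\psi(m) &= \int \frac{p}{m\,(p+m)}\,dm + c_1 = \ln\frac{m}{p+m} + c_1, \\
\kappa(\psi(m)) &= \int \frac{p}{p+m}\,dm + c_2 = p\,\ln(p+m) + c_2 .
\end{align*}
% With c_1 = 0 we get e^{\theta} = m/(p+m), hence p + m = p/(1 - e^{\theta}) and
\begin{align*}
\kappa(\theta) &= -p\,\ln\!\left(1 - e^{\theta}\right) + p\ln p + c_2 ,
\end{align*}
% which is the familiar negative binomial cumulant transform when c_2 = -p \ln p.
```

In this case the normalizing choice is $c_1 = 0$ and $c_2 = -p\ln p$; for other VFs the constants are generally harder to obtain.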

2.3. Polynomial VFs of Counting NEFs Supported on the Set of Nonnegative Integers $\mathbb{N}_0$

The groundbreaking Proposition 4.4 of Letac and Mora (1990, p. 13) provided conditions under which a given VF $(V, M)$ is associated with a counting NEF $F$ supported on the set of non-negative integers $\mathbb{N}_0$, i.e., where all members of $F$ are counting distributions on $\mathbb{N}_0$. They provided general examples of two classes of VFs which fulfill the premises of their Proposition 4.4, so that the distributions of the associated NEFs are supported on the non-negative integers. One of these two classes has the form:
$$V(m) = m \prod_{i=1}^{k} \left(1 + \frac{m}{p_i}\right), \quad p_i > 0,\ i = 1, \ldots, k,\ k \in \mathbb{N}_0,\ M = \mathbb{R}^+, \quad \text{where } \prod_{i=1}^{0} \equiv 1. \tag{4}$$
They proved that such VFs correspond to counting NEFs supported on $\mathbb{N}_0$, namely, counting distributions with non-negative integer support. Moreover, their Proposition 4.4 makes it possible to compute (at least theoretically and numerically) the corresponding measure $\nu$ (we skip the details, as they are irrelevant for our Bayesian analysis). Note that the two special cases of (4) with $k = 0$ and $k = 1$ correspond, respectively, to the Poisson and negative binomial NEFs. However, the general setting (4) for $k \geq 3$ does not allow an explicit calculation of $\theta = \psi(m)$ and $\kappa(\theta) = \kappa(\psi(m))$ in (2), implying that the mean value parametrization of the corresponding NEFs in the form (3) is not explicitly expressible in terms of $m$ and is thus of no practical use.

2.4. A New Class of Polynomial VFs—The ABM NEFs

As already noted, the fact that a given pair $(V, M)$ is known to be a VF of some NEF does not necessarily enable the construction of the corresponding mean value parameterization (3), as in most cases the integrals for $\psi(m)$ and $\kappa(\psi(m))$ in (2) cannot be expressed analytically in closed form; this is indeed the situation for the class (4) in its general form. Consequently, one needs to search for subclasses of (4) for which the integrals in (2) can be computed explicitly. One such subclass is obtained by taking in (4) the special case where
$$p_1 = p_2 = \cdots = p_k,$$
and then denoting the number of factors by
$$p_2 := k \in \mathbb{N}_0,$$
so that, from here on, $p_1$ is the common value of the $p_i$ and $p_2$ is the exponent. We obtain a subclass of (4) with VFs of the form:
$$(V, M) = \left( m \left(1 + \frac{m}{p_1}\right)^{p_2},\ \mathbb{R}^+ \right), \quad p_1 > 0,\ p_2 \in \mathbb{N}_0. \tag{5}$$
As (5) is a subclass of (4), and (4) satisfies the premises of Proposition 4.4 of Letac and Mora (1990), it follows that the members of the subclass (5) are VFs associated with counting NEFs supported on the non-negative integers.
The subclass of VFs in (5) (hereafter called the ABM class) was first introduced by Awad et al. (2016), who showed that the corresponding $\psi(m)$ and $\kappa(\psi(m))$ (calculated from (2)) have, as opposed to the general form in (4), the following closed forms (the exact proof details appear in Bar-Lev and Kokonendji 2017):
$$\theta = \psi(m) = \ln\frac{m}{p_1 + m} + \sum_{i=1}^{p_2 - 1} \frac{1}{i}\,\frac{p_1^{i}}{(p_1 + m)^{i}} + c_1, \quad \text{where } \sum_{i=1}^{0} = 0,$$
and
$$\kappa(\psi(m)) = -\frac{p_1^{p_2}}{(p_2 - 1)(m + p_1)^{p_2 - 1}} + c_2.$$
Thus, its mean value parametrization is given by the probability distribution:
$$F(m, \nu(dx)) = \exp\left\{ x \left[ \ln\frac{m}{p_1 + m} + \sum_{i=1}^{p_2 - 1} \frac{1}{i}\,\frac{p_1^{i}}{(p_1 + m)^{i}} + c_1 \right] + \frac{p_1^{p_2}}{(p_2 - 1)(m + p_1)^{p_2 - 1}} - c_2 \right\} \nu(dx), \quad m \in \mathbb{R}^+,\ p_1 > 0,\ p_2 \in \mathbb{N}, \tag{6}$$
where hereafter we denote this probability distribution by $ABM(p_1, p_2)$, where $p_1$ is a positive real number and $p_2$ is a non-negative integer. (For a classical frequentist approach, the constants $c_1$ and $c_2$ have been computed by Bar-Lev and Ridder 2021b. However, as noted above, in a Bayesian framework they cancel out when computing the posterior distribution and thus can be taken to be $c_1 = c_2 = 0$ without any loss of generality.)
Note that the ABM class of VFs $\left\{ m \left(1 + m/p_1\right)^{p_2},\ p_1 > 0 \right\}_{p_2 \in \mathbb{N}_0}$, or alternatively the corresponding class $\left\{ ABM(p_1, p_2) \right\}_{p_2 \in \mathbb{N}_0}$ of NEFs, is composed of a countably infinite set of families of counting NEFs supported on the non-negative integers. As special cases, this class contains the Poisson NEF ($p_2 = 0$), the negative binomial NEF ($p_2 = 1$) and the Abel NEF ($p_2 = 2$) (cf. Letac and Mora 1990, p. 31; Bar-Lev and Ridder 2019, for applications to car accident claims of a Swedish insurance company dataset).
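The closed forms above can be checked numerically against the defining relations $d\psi/dm = 1/V(m)$ and $d\kappa/dm = m/V(m)$. The sketch below does this with central differences; the function names are hypothetical, the constants are set to $c_1 = c_2 = 0$, and $\kappa$ is coded with a leading minus sign, which is what integrating $m/V(m)$ gives for $p_2 \geq 2$.

```python
import math

def abm_variance(m, p1, p2):
    """ABM variance function V(m) = m * (1 + m/p1)**p2."""
    return m * (1.0 + m / p1) ** p2

def abm_psi(m, p1, p2):
    """psi(m) for p2 >= 1 with c1 = 0 (an empty sum is zero when p2 = 1)."""
    r = p1 / (p1 + m)
    return math.log(m / (p1 + m)) + sum(r ** i / i for i in range(1, p2))

def abm_kappa(m, p1, p2):
    """kappa(psi(m)) for p2 >= 2 with c2 = 0."""
    return -p1 ** p2 / ((p2 - 1) * (p1 + m) ** (p2 - 1))

def central_diff(f, m, h=1e-5):
    """Symmetric finite-difference approximation of f'(m)."""
    return (f(m + h) - f(m - h)) / (2.0 * h)

if __name__ == "__main__":
    m, p1, p2 = 2.0, 3.0, 4
    v = abm_variance(m, p1, p2)
    # Both pairs of numbers should agree to several decimal places.
    print(central_diff(lambda z: abm_psi(z, p1, p2), m), 1.0 / v)
    print(central_diff(lambda z: abm_kappa(z, p1, p2), m), m / v)
```

Setting `p2 = 0` in `abm_variance` returns $V(m) = m$, the Poisson case.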
Summarizing, this ABM NEF has the following features:
  • It is a class of counting distributions supported on the non-negative integers;
  • It is overdispersed as $V(m)/m > 1$;
  • It allows a mean value parameterization in a closed form;
  • It is infinitely divisible, which allows the construction of an exponential dispersion model (EDM) with dispersion parameter space equal to $\mathbb{R}^+$. EDMs are used to describe the error distribution in generalized linear models (see Jorgensen 1987, 1997);
  • $p_1$ is an unknown parameter to be estimated (see next section), whereas $p_2 \in \mathbb{N}_0$ governs the particular model within the ABM class and is treated as a decision variable (different values of $p_2$ determine different ABM NEFs). Accordingly, for given national datasets (here, those of the USA, Ireland and Ukraine), the goal is to locate the value of $p_2$ that minimizes the respective RMSE (see Section 4). However, because of the cumbersome and intractable structure of the ABM probabilities (or likelihood) in (6), and because the number of terms in the sum appearing in (6) grows with $p_2$, no analytic solution for an optimal $p_2$ is feasible. Consequently, only numerical search algorithms are plausible. The search starts with $p_2 = 0$ (the Poisson NEF), $p_2 = 1$ (the negative binomial NEF), $p_2 = 2$ (the Abel NEF) and so on;
  • As already noted, the ABM class $\left\{ ABM(p_1, p_2) \right\}_{p_2 \in \mathbb{N}_0}$ is composed of a countably infinite set of families of counting NEFs supported on the non-negative integers and thus can also be used to model real datasets under the classical frequentist approach (and not only the Bayesian one). Indeed, the ABM class has been compared in Bar-Lev and Ridder (2021a, 2021b) with other common counting probability models (such as the Poisson-inverse Gaussian distribution, the new logarithmic distribution and an exponentiated discrete Lindley distribution) for various real count datasets stemming from automobile insurance claims, marketing, biometry, health, and social sciences (none of which is related to mortality projections). Members of the ABM counting class have shown superiority with respect to various goodness-of-fit metrics and tests (chi-squared test, Akaike information criterion (AIC), root-mean-square error (RMSE) and Kullback–Leibler divergence (KL)), and provided a much better fit for each of the datasets considered (more details can be found in Bar-Lev and Ridder 2021b).

3. ABM Based LC Model and its Bayesian Framework

As an alternative to parameter estimation via the singular value decomposition used in the classical LC model or the MLE in the cases discussed above, we adopt the Bayesian approach which offers advantages succinctly expressed in Antonio et al. (2015): a. The calibration and forecast steps are combined, which leads to more consistent estimates of the period effects; b. The Bayesian approach provides a natural framework for incorporating parameter uncertainty in mortality forecasts, which is relevant—for example—in the new insurance regulatory framework of Solvency II. The Bayesian approach allows adequate handling of small populations and missing data. Like Czado et al. (2005) and Pedroza (2006), we harness the power of the Markov Chain Monte Carlo (MCMC) methodology to estimate the model parameters and execute mortality projection. We note that the interest in Bayesian solutions in the context of mortality projections has recently gained momentum (Ellison et al. 2020; Graziani 2020; Hilton et al. 2019; Hunt and Blake 2020; Kogure et al. 2019; Liu et al. 2020; Njenga and Sherris 2020; Wong et al. 2018).
Suppose the number of deaths $D_{xt}$ in a population at age $x$ and time $t$ is distributed as follows:
$$D_{xt} \sim ABM(p_1, p_2)(\mu_{xt}), \quad \mu_{xt} = E_{xt} m_{xt}, \quad m_{xt} = \exp(\alpha_x + \beta_x k_t),$$
where
$$\alpha = (\alpha_{x_{\min}}, \ldots, \alpha_{x_{\max}}), \quad \beta = (\beta_{x_{\min}}, \ldots, \beta_{x_{\max}}), \quad k = (k_{t_{\min}}, \ldots, k_{t_{\max}}).$$
Bayesian estimation of the unknown parameters $\alpha$, $\beta$, $k$ and $p_1$ is based on the joint posterior distribution of $\alpha$, $\beta$, $k$ and $p_1$ given $(E_{xt}, D_{xt})$, where $x = x_{\min}, x_{\min} + 1, \ldots, x_{\max}$ and $t = t_{\min}, t_{\min} + 1, \ldots, t_{\max}$. The first step in the Bayesian estimation is to determine the prior distributions of these parameters.
The prior distribution for $k_t$ and $\theta$
Let $k_t = k_{t-1} + \theta + w_t$ with $w_t \sim N(0, \sigma_w^2)$, and hence $k_t \sim N(\theta, t\sigma_w^2)$. We assume $\sigma_w^2 \sim \mathrm{gamma}(a_k, b_k)$ and $\theta \sim N(\theta_0, \sigma_\theta^2)$. The hyper-parameters $\theta_0$, $a_k$, $b_k$ and $\sigma_\theta^2$ are arbitrary initial values.
The prior distribution for $\beta_x$
We assume $\beta_x \sim N(0, \sigma_\beta^2)$ for all $x$, where $\sigma_\beta^2 \sim \mathrm{gamma}(a_\beta, b_\beta)$. The hyper-parameters $a_\beta$ and $b_\beta$ are arbitrary initial values.
The prior distribution for $\alpha_x$
We suppose that $\alpha_x \sim N(\alpha_{0x}, \sigma_\alpha^2)$ for all $x$, where $\sigma_\alpha^2 \sim \mathrm{gamma}(a_\alpha, b_\alpha)$. The hyper-parameters $\alpha_{0x}$, $a_\alpha$ and $b_\alpha$ are arbitrary initial values.
The prior distribution for $p_1$
We let $p_1 \sim \mathrm{gamma}(a_{p_1}, b_{p_1})$. The hyper-parameters $a_{p_1}$ and $b_{p_1}$ are arbitrary initial values.

MH (Metropolis–Hastings) Algorithm for Estimating the Parameters $\alpha$, $\beta$, $k$ and $p_1$

Suppose the $D_{xt}$ are independent random variables distributed as (6), and $g(\Xi)$ is the joint prior distribution of the unknown parameters $\Xi = (\alpha, \beta, k, p_1)$. Then, the posterior distribution of $\Xi$, given all available data $D = \{d_{xt}\}$ and $p_2$, can be represented as follows:
$$f(\Xi \mid D, p_2) \propto \prod_{x} \prod_{t} \exp\left\{ d_{xt} \left[ \ln\frac{E_{xt} m_{xt}}{p_1 + E_{xt} m_{xt}} + \sum_{i=1}^{p_2 - 1} \frac{1}{i} \left( \frac{p_1}{p_1 + E_{xt} m_{xt}} \right)^{i} \right] + \frac{p_1^{p_2}}{(p_2 - 1)\left(p_1 + E_{xt} m_{xt}\right)^{p_2 - 1}} \right\} \times g(\Xi).$$
See Appendix A for the marginal posterior distributions of $\alpha$, $\beta$, $k$ and $p_1$. We now describe the estimation of $\alpha$, $\beta$, $k$ and $p_1$ using the MH algorithm, conditioned on the data and all other parameters at their respective iterations. The superscript denotes the iteration number of the parameter of interest.
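For computation it is convenient to work with the logarithm of the data-dependent part of this posterior. The sketch below evaluates it exactly as the expression is written in the text, that is, without any base-measure term and with $c_1 = c_2 = 0$. The function name `abm_log_kernel`, the array layout (ages in rows, years in columns) and the restriction to $p_2 \geq 2$ (to avoid the zero denominator at $p_2 = 1$) are assumptions of this illustration; the log prior $\ln g(\Xi)$ would be added separately.

```python
import numpy as np

def abm_log_kernel(alpha, beta, k, p1, p2, deaths, exposure):
    """Log of the likelihood factor of the posterior kernel, for p2 >= 2.

    deaths and exposure are (ages x years) arrays; alpha and beta have one
    entry per age and k has one entry per year.
    """
    mu = exposure * np.exp(alpha[:, None] + np.outer(beta, k))  # mu_xt = E_xt * m_xt
    ratio = p1 / (p1 + mu)
    psi = np.log(mu / (p1 + mu))
    for i in range(1, p2):                   # sum_{i=1}^{p2-1} (1/i) * ratio**i
        psi = psi + ratio ** i / i
    minus_kappa = p1 ** p2 / ((p2 - 1) * (p1 + mu) ** (p2 - 1))
    return float(np.sum(deaths * psi + minus_kappa))
```

Working on the log scale turns the acceptance ratios of the MH steps into differences of such values, which is numerically safer than multiplying many exponentials.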
Estimation of k t using the MH algorithm
Let the marginal posterior distribution of $k_t$ be $f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)$. The estimation of $k_t$ is achieved by the following steps:
  1. Draw $k_t^*$ from the proposal density $N(k_t^{(i)}, \sigma_t^2)$, where $\sigma_t^2$ is assumed known;
  2. Calculate the following probability:
    $$\Psi\left(k_t^{(i)}, k_t^*\right) = \min\left\{ 1,\ \frac{f(k_t^* \mid D, \alpha, \beta, k_{-t}^{(i)}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)}{f(k_t^{(i)} \mid D, \alpha, \beta, k_{-t}^{(i)}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)} \right\},$$
    where $k_{-t} = (k_{t_{\min}}, \ldots, k_{t-1}, k_{t+1}, \ldots, k_{t_{\max}})$;
  3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide as follows:
    if $u \leq \Psi\left(k_t^{(i)}, k_t^*\right)$ then $k_t^{(i+1)} = k_t^*$; if $u > \Psi\left(k_t^{(i)}, k_t^*\right)$ then $k_t^{(i+1)} = k_t^{(i)}$;
  4. Going over all values of $t$, we have:
    $$k^{(i+1)} = \left( k_{t_{\min}}^{(i+1)}, \ldots, k_t^{(i+1)}, k_{t+1}^{(i)}, \ldots, k_{t_{\max}}^{(i)} \right);$$
  5. Transform $k^{(i+1)}$ and $\alpha^{(i)}$ to ensure identifiability:
    $$k^{(i+1)} - \bar{k} \to k^{(i+1)}, \qquad \alpha^{(i)} + \beta^{(i)} \bar{k} \to \alpha^{(i)},$$
    where
    $$\bar{k} = \frac{1}{T} \left( \sum_{j \leq t} k_j^{(i+1)} + \sum_{j > t} k_j^{(i)} \right);$$
  6. Repeat steps 1 to 5.
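The steps above can be sketched as a single sweep over $t$. This is an illustration under stated assumptions, not the authors' implementation: `mh_sweep_k` and `log_post` are hypothetical names, `log_post(alpha, beta, k)` is any user-supplied function returning the log of the unnormalised posterior, the comparison is done on the log scale, and the re-centring is applied after every single-site update, which is one reading of the formula for $\bar{k}$.

```python
import numpy as np

def mh_sweep_k(alpha, beta, k, log_post, step_sd, rng):
    """One Metropolis-Hastings sweep over the period effects k_t."""
    alpha, k = alpha.copy(), k.copy()
    for t in range(len(k)):
        proposal = k.copy()
        proposal[t] = rng.normal(k[t], step_sd)        # step 1: normal proposal around k_t
        log_ratio = log_post(alpha, beta, proposal) - log_post(alpha, beta, k)
        # steps 2-3: accept when u <= Psi, i.e. log(u) <= min(0, log_ratio)
        if np.log(rng.uniform()) <= min(0.0, log_ratio):
            k = proposal
        k_bar = k.mean()                               # step 5: re-centre so that sum(k) = 0
        k = k - k_bar
        alpha = alpha + beta * k_bar                   # leaves alpha_x + beta_x * k_t unchanged
    return alpha, k
```

Because the shift of $k$ is compensated in $\alpha$, the fitted log rates are the same before and after the re-centring, so the transformation does not alter the likelihood.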
Estimation of β x using MH algorithm
Let the marginal posterior distribution of $\beta_x$ be $f(\beta_x \mid D, \alpha, \beta_{-x}, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)$. The estimation of $\beta_x$ is achieved by the following steps:
  1. Draw $\beta_x^*$ from the proposal density $N(\beta_x^{(i)}, \sigma_\beta^2)$, where $\sigma_\beta^2$ is assumed to be known;
  2. Calculate the following probability:
    $$\Psi\left(\beta_x^{(i)}, \beta_x^*\right) = \min\left\{ 1,\ \frac{f(\beta_x^* \mid D, \alpha, \beta_{-x}^{(i)}, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)}{f(\beta_x^{(i)} \mid D, \alpha, \beta_{-x}^{(i)}, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)} \right\},$$
    where
    $$\beta_{-x} = (\beta_{x_{\min}}, \ldots, \beta_{x-1}, \beta_{x+1}, \ldots, \beta_{x_{\max}});$$
  3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide as follows:
    if $u \leq \Psi\left(\beta_x^{(i)}, \beta_x^*\right)$ then $\beta_x^{(i+1)} = \beta_x^*$; if $u > \Psi\left(\beta_x^{(i)}, \beta_x^*\right)$ then $\beta_x^{(i+1)} = \beta_x^{(i)}$;
  4. Going over all values of $x$, we have:
    $$\beta^{(i+1)} = \left( \beta_{x_{\min}}^{(i+1)}, \ldots, \beta_x^{(i+1)}, \beta_{x+1}^{(i)}, \ldots, \beta_{x_{\max}}^{(i)} \right);$$
  5. Transform $k^{(i+1)}$ and $\beta^{(i+1)}$ to ensure identifiability:
    $$\frac{\beta^{(i+1)}}{\beta_{sum}} \to \beta^{(i+1)}, \qquad k^{(i+1)} \times \beta_{sum} \to k^{(i+1)},$$
    where
    $$\beta_{sum} = \sum_{j \leq x} \beta_j^{(i+1)} + \sum_{j > x} \beta_j^{(i)};$$
  6. Repeat steps 1 to 5.
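The identifiability transformation in the fifth step is a simple rescaling, sketched below. The function name `rescale_beta` is hypothetical; the point of the example is that dividing $\beta$ by its sum while multiplying $k$ by the same factor leaves every product $\beta_x k_t$ unchanged.

```python
import numpy as np

def rescale_beta(beta, k):
    """Rescale so that sum(beta) = 1 while keeping beta_x * k_t fixed."""
    beta_sum = beta.sum()          # sum over the current values of beta
    return beta / beta_sum, k * beta_sum

if __name__ == "__main__":
    beta_new, k_new = rescale_beta(np.array([1.0, 3.0]), np.array([2.0, -2.0]))
    print(beta_new, k_new)         # beta now sums to 1; the outer product is unchanged
```

If $k$ already sums to zero, it still does after the rescaling, so both constraints hold simultaneously.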
Estimation of α x using MH algorithm
Let the marginal posterior distribution of $\alpha_x$ be $f(\alpha_x \mid D, \alpha_{-x}, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)$. The estimation of $\alpha_x$ is achieved by the following steps:
  1. Draw $\alpha_x^*$ from the proposal density $N(\alpha_x^{(i)}, \sigma_\alpha^2)$, where $\sigma_\alpha^2$ is assumed known;
  2. Calculate the following probability:
    $$\Psi\left(\alpha_x^{(i)}, \alpha_x^*\right) = \min\left\{ 1,\ \frac{f(\alpha_x^* \mid D, \alpha_{-x}^{(i)}, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)}{f(\alpha_x^{(i)} \mid D, \alpha_{-x}^{(i)}, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1)} \right\},$$
    where
    $$\alpha_{-x} = (\alpha_{x_{\min}}, \ldots, \alpha_{x-1}, \alpha_{x+1}, \ldots, \alpha_{x_{\max}});$$
  3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide as follows:
    if $u \leq \Psi\left(\alpha_x^{(i)}, \alpha_x^*\right)$ then $\alpha_x^{(i+1)} = \alpha_x^*$; if $u > \Psi\left(\alpha_x^{(i)}, \alpha_x^*\right)$ then $\alpha_x^{(i+1)} = \alpha_x^{(i)}$;
  4. Going over all values of $x$, obtain $\alpha^{(i+1)}$ in the $(i+1)$th iteration as follows:
    $$\alpha^{(i+1)} = \left( \alpha_{x_{\min}}^{(i+1)}, \ldots, \alpha_x^{(i+1)}, \alpha_{x+1}^{(i)}, \ldots, \alpha_{x_{\max}}^{(i)} \right);$$
  5. Repeat steps 1 to 4.
Estimation of p 1 using MH algorithm
Let the marginal posterior distribution of $p_1$ be $f(p_1 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2)$, proportional to the product of the likelihood (6) and the gamma prior distribution of $p_1$. The estimation of $p_1$ is achieved by the following steps:
  1. Draw $p_1^*$ from the distribution $\mathrm{gamma}(a_{p_1}, b_{p_1})$, where $a_{p_1}$ and $b_{p_1}$ are hyper-parameters and are assumed known;
  2. Calculate the following probability:
    $$\Psi\left(p_1^{(i)}, p_1^*\right) = \min\left\{ 1,\ \frac{f(p_1^* \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2)}{f(p_1^{(i)} \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2)} \right\};$$
  3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide as follows:
    if $u \leq \Psi\left(p_1^{(i)}, p_1^*\right)$ then $p_1^{(i+1)} = p_1^*$; if $u > \Psi\left(p_1^{(i)}, p_1^*\right)$ then $p_1^{(i+1)} = p_1^{(i)}$;
  4. This yields $p_1^{(i+1)}$ in the $(i+1)$th iteration;
  5. Repeat steps 1 to 4.
Estimation of $\theta$, $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma_w^2$ using the Gibbs sampler
The Gibbs sampler can be used for estimating $\theta$, $\sigma_\alpha^2$, $\sigma_\beta^2$ and $\sigma_w^2$, since the marginal posterior distributions of these parameters can be written explicitly (see Czado et al. 2005). The following are the marginal posterior sampling distributions of each of these parameters, conditioned on the data and all other parameters at their respective iterations.
1. Sampling $\theta$:
The posterior distribution of the parameter $\theta$ satisfies:
$$f(\theta \mid D, \alpha, \beta, k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) = f(\theta \mid k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(k \mid \theta, \sigma_w^2)\, f(\theta).$$
The prior distribution of $\theta$ is $N(\theta_0, \sigma_\theta^2)$, with the hyper-parameters $\theta_0$ and $\sigma_\theta^2$ set by the user; hence the posterior distribution of the parameter is:
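$$(\theta \mid k, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2) \sim N\left( \frac{\theta_0 \sigma_w^2}{T\sigma_\theta^2 + \sigma_w^2},\ \left( T\sigma_\theta^2 + \sigma_w^2 \right)^{-1} \right).$$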
2. Sampling $\sigma_\alpha^2$:
The posterior distribution of the parameter $\sigma_\alpha^2$ satisfies:
$$f(\sigma_\alpha^2 \mid D, \alpha, \beta, k, \theta, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(\alpha \mid \sigma_\alpha^2)\, f(\sigma_\alpha^2).$$
The prior distribution of $\sigma_\alpha^2$ is $\mathrm{gamma}(a_\alpha, b_\alpha)$, with the hyper-parameters $a_\alpha$ and $b_\alpha$ set by the user, so the posterior distribution of the parameter is:
$$(\sigma_\alpha^2 \mid D, \alpha, \beta, k, \theta, \sigma_\beta^2, \sigma_w^2, p_1) \sim \mathrm{gamma}\left( a_\alpha + \frac{x_{\max}}{2},\ b_\alpha + \frac{1}{2} \sum_{x = x_{\min}}^{x_{\max}} \left( \alpha_x - \bar{\alpha} \right)^2 \right),$$
where
$$\bar{\alpha} = \frac{1}{x_{\max}} \sum_{x = x_{\min}}^{x_{\max}} \alpha_x.$$
3. Sampling $\sigma_\beta^2$:
The posterior distribution of the parameter $\sigma_\beta^2$ satisfies:
$$f(\sigma_\beta^2 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_w^2, p_1) \propto f(\beta \mid \sigma_\beta^2)\, f(\sigma_\beta^2).$$
The prior distribution of $\sigma_\beta^2$ is $\mathrm{gamma}(a_\beta, b_\beta)$, with the hyper-parameters $a_\beta$ and $b_\beta$ set by the user, so the posterior distribution of the parameter is:
$$(\sigma_\beta^2 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_w^2, p_1) \sim \mathrm{gamma}\left( a_\beta + \frac{x_{\max}}{2},\ b_\beta + \frac{1}{2} \sum_{x = x_{\min}}^{x_{\max}} \beta_x^2 \right).$$
4. Sampling $\sigma_w^2$:
The posterior distribution of the parameter $\sigma_w^2$ satisfies:
$$f(\sigma_w^2 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, p_1) \propto f(k \mid \sigma_w^2)\, f(\sigma_w^2).$$
The prior distribution of $\sigma_w^2$ is $\mathrm{gamma}(a_k, b_k)$, with the hyper-parameters $a_k$ and $b_k$ set by the user, so the posterior distribution of the parameter is:
$$(\sigma_w^2 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, p_1) \sim \mathrm{gamma}\left( a_k + \frac{T}{2},\ b_k + \frac{1}{2} \sum_{t = t_{\min}}^{t_{\max}} \left( k_t - k_{t-1} - \theta \right)^2 \right).$$

4. Numerical Experiment

4.1. Methods

To test the adequacy of the ABM class, we analyzed mortality data of men in Ireland, Ukraine and the USA, downloaded from the Human Mortality Database (https://www.mortality.org accessed on 8 March 2013). The data contain the number of deaths and the size of the population exposed to risk by age and year; the data for Ireland and the USA cover 1950–2007, and the data for Ukraine cover 1959–2009. This analysis examines sixteen models within the ABM class, $p_2 = 0, \ldots, 15$, with a particular emphasis on forecasting the mortality of the oldest-old (the restriction of $p_2$ to $\{0, \ldots, 15\}$ is justified below). These models include the Poisson and negative binomial models, which feature widely in the literature, for which $p_2 = 0$ and $p_2 = 1$, respectively. Adopting a data mining approach, the models were fitted using training sets and examined using test sets. The training sets contained data up to 2000, and the test sets, aimed at monitoring the quality of predictions, contained data from 2001 to 2007 for Ireland and the USA, and from 2001 to 2009 for Ukraine. Predictions were carried out with the estimated parameters, $\ln[m_{x,t+s}] = \alpha_x + \beta_x k_{t+s}$, $s = 1, 2, \ldots$, and model performance on the test sets was checked using the root mean squared error (RMSE), which was calculated in two ways:
  • Predicting mortality rates ($\mu$) by age. In other words, after model parameters were estimated, mortality rates were predicted for a given age across years. For instance, mortality rates for those aged 70 were predicted over the years beyond 2000;
  • Predicting mortality rates ($\mu$) by cohort. In other words, after model parameters were estimated, mortality rates were predicted for a cohort that was at a particular age at the beginning of the test period. For example, mortality rates were predicted over 2001–2007 for a cohort aged 70 in 2001.
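The two RMSE calculations differ only in which cells of the age-by-year test matrix are compared. The sketch below shows one way to compute both; the function names and the layout (ages in rows in increasing order, test years in columns) are assumptions of this illustration, and a cohort is followed along the diagonal until it leaves the age range covered by the data.

```python
import numpy as np

def rmse_by_age(observed, predicted):
    """RMSE for each age, taken across all test years (one value per row)."""
    return np.sqrt(np.mean((observed - predicted) ** 2, axis=1))

def rmse_by_cohort(observed, predicted):
    """RMSE for each cohort, indexed by its age in the first test year."""
    n_ages, n_years = observed.shape
    out = np.full(n_ages, np.nan)
    for a in range(n_ages):
        steps = min(n_years, n_ages - a)       # the cohort ages by one row per year
        ages = a + np.arange(steps)
        years = np.arange(steps)
        err = observed[ages, years] - predicted[ages, years]
        out[a] = np.sqrt(np.mean(err ** 2))
    return out
```

For the oldest starting ages the cohort version averages over fewer cells, which is worth keeping in mind when comparing the two curves.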
For every member of the ABM class (controlled by $p_2$), the Markov chains used to obtain posterior distributions and parameter estimates comprised 4000 iterations, with the first 1000 considered a burn-in period. Convergence was assessed graphically, and a sensitivity analysis ascertained that the choice of arbitrary initial hyper-parameters did not affect the final outcomes. We report the outcomes for the Poisson distribution ($p_2 = 0$) and the negative binomial distribution ($p_2 = 1$). In addition, we examined the ABM members for which $p_2 \in \{2, \ldots, 15\}$. We limited our reporting to $p_2 \in \{2, \ldots, 15\}$ since, for the data under study, increasing $p_2$ beyond 15 (our study explored all models up to $p_2 = 50$) did not alter our findings of the optimal $p_2$ for varying ages and resulted in much larger RMSEs than those found up to $p_2 = 15$. In practice, we analyzed all 16 models within the $p_2$ range and, for each, we estimated $p_1$ as well as all other unknown parameters. Finally, we report graphically the RMSE for the Poisson, the negative binomial and the ABM member which produced the minimal RMSE for various ages.
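Once an RMSE curve is available for each candidate $p_2$, the selection itself is a simple minimisation over models at each age. The sketch below assumes a hypothetical dictionary `rmse_table` that maps each value of $p_2$ to an array of RMSEs by age, for example as produced after fitting each model.

```python
import numpy as np

def best_p2_by_age(rmse_table):
    """Return, for each age, the p2 whose model has the smallest RMSE.

    rmse_table maps p2 (int) to a sequence of RMSEs, one per age.
    """
    p2_values = sorted(rmse_table)
    stacked = np.vstack([rmse_table[p2] for p2 in p2_values])  # (models x ages)
    return np.array(p2_values)[np.argmin(stacked, axis=0)]

if __name__ == "__main__":
    table = {0: [0.1, 0.5], 1: [0.2, 0.4], 3: [0.3, 0.1]}
    print(best_p2_by_age(table))   # the first age favours p2 = 0, the second p2 = 3
```

The same function applies unchanged to RMSEs computed by cohort.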

4.2. Results

Figure 1a,b shows Ireland's mortality projections by age and cohort, respectively. A superior model for an age range is the one which produced the smallest RMSE. It is evident that the Poisson performs best for most ages above 70, with the negative binomial lagging behind. However, at a very old age, the Poisson diverges and the best ABM member to be chosen instead is $ABM(\cdot, p_2 = 3)$. A similar result is shown for the USA (Figure 2a,b), except that the best ABM member for the very old is $ABM(\cdot, p_2 = 4)$. A different picture emerges when we focus on Ukraine's mortality projections by age and cohort (Figure 3a,b, respectively). While the Poisson and negative binomial perform well (with the Poisson being better), for ages above 96 (by cohort) or 104 (by age), the negative binomial drifts away, leaving $ABM(\cdot, p_2 = 10)$ as the winner of the ABM class. Clearly, the recommended projection policy for Ukraine is to use the Poisson for most ages but to rely on $ABM(\cdot, p_2 = 10)$ for very old ages. Naturally, other countries with their specific national datasets may yield different ABM models (i.e., different values of $p_2$) for mortality projections.

5. Discussion

Several extensions to the LC model assume that the number of deaths follows a Poisson or negative binomial distribution. These distributions have offered adequate mortality projections in several populations reported in the literature. It is plausible that cases in which these two distributions failed went unreported. Rather than deciding a priori to choose a particular distribution, we aimed to enrich the LC model by allowing a richer pool of candidate distributions. The chosen distribution would be the one providing the best projection for the population and age range under study. To achieve this goal, we proposed a new class of counting distributions on the non-negative integers, the ABM class, which belongs to a wider class of natural exponential families characterized by their variance functions. This class includes the Poisson and negative binomial distributions alongside a countably infinite set of additional members. A data mining approach was adopted whereby the model is fitted using a training set and tested using a test set, with the RMSE used to pick the winning model. As an alternative to parameter estimation via the singular value decomposition (SVD) used in the classical LC model, we adopted Bayesian estimation, harnessing the Markov Chain Monte Carlo (MCMC) methodology. While we do not suggest that MCMC is superior to SVD (for a comparison of the two, see Ichikawa et al. 2021), we still promote the former since the Bayesian framework frees us from the burden of calculating the normalizing constant of the ABM. This is a considerable advantage, even though running the MCMC requires more computing time for the mid-size databases of the kind used for national mortality projections. We note, however, that MCMC is rather costly and might fail if the dataset is huge, in which case other Bayesian techniques, such as variational Bayes, may be more helpful. Pursuing this suggestion is beyond the scope of this paper.
We examined ABM models for three countries and established that, for the countries examined, the commonly adopted Poisson distribution is justified except at very old ages, for which an alternative member of the ABM class offers better projections. We do not claim to propose a single superior model. When deciding on an underlying model, one can adopt, for example, the Poisson model or the negative binomial model. Rather than adopting a single model, we suggest adopting a class of models (the ABM) comprising the Poisson, the negative binomial, and numerous other counting distributions. The superiority of this approach lies in the ability to choose a model from among candidate models. Since no single model necessarily fits every population and every age group well, the ABM class could allow picking, for example, the Poisson for members of the population aged under 50, the negative binomial for those aged 50 to 80, and another member of the class for the oldest-old. The suggested criterion for preferring one member of the class over another is the root mean squared projection error (RMSE). This advantage is gained at the cost of additional complexity, which is justified given the financial benefits associated with more accurate modeling. We conclude that it is no longer appropriate to assume a single distribution for the whole process of mortality projection. Instead, for every country and every relevant range of ages, a desirable approach is to pick the member of the ABM class that provides the best mortality projection. In the numerical study reported here, neither the Poisson nor the negative binomial distribution adequately serves the very oldest-old, and superior alternatives are within reach in the suggested novel ABM class of distributions.

Author Contributions

Conceptualization, Y.A., S.K.B.-L. and U.M.; methodology, Y.A., S.K.B.-L. and U.M.; software, Y.A.; validation, Y.A., S.K.B.-L. and U.M.; formal analysis, Y.A., S.K.B.-L. and U.M.; investigation, Y.A., S.K.B.-L. and U.M.; resources, Y.A., S.K.B.-L. and U.M.; data curation, Y.A. and U.M.; writing – original draft preparation, Y.A., S.K.B.-L. and U.M.; writing – review and editing, Y.A., S.K.B.-L. and U.M.; visualization, Y.A., S.K.B.-L. and U.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found in the Human Mortality Database (https://www.mortality.org, accessed on 8 March 2013).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1

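The marginal posterior probability function of the time index $k_t$ is
$$f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_{t_{\min}} \mid k_{t_{\min}}, \alpha, \beta, p_1) \times f(k_{t_{\min}} \mid \theta, \sigma_w^2) \times \prod_{j = t_{\min}+1}^{t_{\max}} f(D_j \mid k_j, \alpha, \beta, p_1) \times f(k_j \mid k_{j-1}, \theta, \sigma_w^2),$$
where
$$f(D_t \mid \alpha, \beta, k_t, p_1) = \prod_{x} \exp\left\{ d_{xt}\left[ \ln\frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\left(\frac{p_1}{p_1 + E_{xt}\mu_{xt}}\right)^{i} \right] + \frac{p_1^{p_2}}{(p_2-1)\,(p_1 + E_{xt}\mu_{xt})^{p_2-1}} \right\}.$$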
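As a plausibility check, the exponent in this likelihood can be related to the usual mean-value parametrization of a natural exponential family. The sketch below assumes that the ABM variance function has the form $V(m) = m\,(1 + m/p_1)^{p_2}$ with $m = E_{xt}\mu_{xt}$ and $p_2 \ge 2$; this form is restated here as an assumption and is not quoted from this section.

$$
\theta(m) = \int \frac{dm}{V(m)} = \ln\frac{m}{p_1 + m} + \sum_{i=1}^{p_2-1} \frac{1}{i}\left(\frac{p_1}{p_1 + m}\right)^{i} + \text{const},
\qquad
k(\theta(m)) = \int \frac{m}{V(m)}\,dm = -\frac{p_1^{p_2}}{(p_2-1)\,(p_1 + m)^{p_2-1}}.
$$

Under this assumption, $d_{xt}\,\theta(m) - k(\theta(m))$ matches the exponent term by term, up to an additive constant in $\theta$ that does not depend on $m$. Differentiating with respect to $m$ gives $(d_{xt} - m)/V(m)$, so each factor is largest at $m = d_{xt}$, as expected for a family parametrized by its mean.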
For the remaining expressions we distinguish between three cases:
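  • For $t = t_{\min}$, the marginal posterior probability function of the time index $k_t$ is:
    $$f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_t \mid \alpha, \beta, k_t, p_1) \times f(k_t \mid \theta, \sigma_w^2) \times f(k_{t+1} \mid k_t, \theta, \sigma_w^2),$$
    where
    $$f(k_t \mid \theta, \sigma_w^2) = \exp\left\{-\frac{1}{2\sigma_w^2}\,(k_t - \theta)^2\right\}$$ and
    $$f(k_{t+1} \mid k_t, \theta, \sigma_w^2) = \exp\left\{-\frac{1}{2\sigma_w^2}\,(k_{t+1} - k_t - \theta)^2\right\}.$$
  • For $t_{\min} < t < t_{\max}$, the marginal posterior probability function of the time index $k_t$ is:
    $$f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_t \mid \alpha, \beta, k_t, p_1) \times f(k_t \mid k_{t-1}, \theta, \sigma_w^2) \times f(k_{t+1} \mid k_t, \theta, \sigma_w^2),$$
    where
    $$f(k_t \mid k_{t-1}, \theta, \sigma_w^2) = \exp\left\{-\frac{1}{2\sigma_w^2}\,(k_t - k_{t-1} - \theta)^2\right\}$$ and
    $$f(k_{t+1} \mid k_t, \theta, \sigma_w^2) = \exp\left\{-\frac{1}{2\sigma_w^2}\,(k_{t+1} - k_t - \theta)^2\right\}.$$
  • For $t = t_{\max}$, the marginal posterior probability function of the time index $k_t$ is:
    $$f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_t \mid \alpha, \beta, k_t, p_1) \times f(k_t \mid k_{t-1}, \theta, \sigma_w^2),$$
    where
    $$f(k_t \mid k_{t-1}, \theta, \sigma_w^2) = \exp\left\{-\frac{1}{2\sigma_w^2}\,(k_t - k_{t-1} - \theta)^2\right\}.$$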
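The three cases can be folded into a single log-density function and used inside a sampler. The Python sketch below is illustrative only: it assumes the Lee–Carter log-bilinear form $\mu_{xt} = \exp(\alpha_x + \beta_x k_t)$, a member with $p_2 \ge 2$, and a random-walk Metropolis update, which is one common choice and not necessarily the sampler used by the authors. All function names and the toy data are hypothetical.

```python
import math
import random


def abm_log_kernel(d, m, p1, p2):
    """Log of exp{d * theta(m) - k(m)} for an ABM member with p2 >= 2."""
    if p2 < 2:
        raise ValueError("this sketch covers p2 >= 2 only")
    q = p1 / (p1 + m)
    theta = math.log(m / (p1 + m)) + sum(q ** i / i for i in range(1, p2))
    return d * theta + p1 * q ** (p2 - 1) / (p2 - 1)


def log_conditional_k(t, k_t, k, deaths, exposure, alpha, beta,
                      theta, sigma_w2, p1, p2):
    """Log of the conditional density of k_t, up to an additive constant."""
    total = 0.0
    # Likelihood: product over ages x of the ABM kernel for year t.
    for x in range(len(alpha)):
        m = exposure[x][t] * math.exp(alpha[x] + beta[x] * k_t)
        total += abm_log_kernel(deaths[x][t], m, p1, p2)
    # Prior term linking k_t to its predecessor (or to theta at t_min).
    if t == 0:
        total -= (k_t - theta) ** 2 / (2.0 * sigma_w2)
    else:
        total -= (k_t - k[t - 1] - theta) ** 2 / (2.0 * sigma_w2)
    # Term linking k_t to its successor (absent at t_max).
    if t < len(k) - 1:
        total -= (k[t + 1] - k_t - theta) ** 2 / (2.0 * sigma_w2)
    return total


def metropolis_step_k(t, k, rng, step_sd, **model):
    """One random-walk Metropolis update of k[t]; returns the new value."""
    current = k[t]
    proposal = current + rng.gauss(0.0, step_sd)
    log_ratio = (log_conditional_k(t, proposal, k, **model)
                 - log_conditional_k(t, current, k, **model))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return current


# Toy data (hypothetical): two ages, three calendar years.
model = dict(
    deaths=[[5, 6, 4], [20, 22, 19]],
    exposure=[[1000.0, 1000.0, 1000.0], [500.0, 500.0, 500.0]],
    alpha=[-5.3, -3.2], beta=[0.5, 0.5],
    theta=-0.1, sigma_w2=0.25, p1=10.0, p2=3,
)
k = [0.0, -0.1, -0.2]
rng = random.Random(1)
for _ in range(200):
    for t in range(len(k)):
        k[t] = metropolis_step_k(t, k, rng, 0.2, **model)
```

Working on the log scale avoids numerical underflow when the product over ages contains many factors, and the proportionality constant cancels in the acceptance ratio, so it never needs to be computed.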

Appendix A.2

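The marginal posterior probability function of $\beta_x$ is
$$f(\beta_x \mid D, \alpha, \beta_{-x}, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto \prod_{j = x_{\min}}^{x_{\max}} f(D_j \mid \alpha, \beta_j, k, p_1) \times f(\beta_j) \propto f(D_x \mid \alpha, \beta_x, k, p_1) \times f(\beta_x),$$
where
$$f(D_x \mid \alpha, \beta_x, k, p_1) = \prod_{t} \exp\left\{ d_{xt}\left[ \ln\frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\left(\frac{p_1}{p_1 + E_{xt}\mu_{xt}}\right)^{i} \right] + \frac{p_1^{p_2}}{(p_2-1)\,(p_1 + E_{xt}\mu_{xt})^{p_2-1}} \right\},$$
and
$$f(\beta_x) = \exp\left\{-\frac{1}{2\sigma_\beta^2}\,\beta_x^2\right\}.$$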

Appendix A.3

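The marginal posterior probability function of $\alpha_x$ is
$$f(\alpha_x \mid D, \alpha_{-x}, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto \prod_{j = x_{\min}}^{x_{\max}} f(D_j \mid \alpha_j, \beta, k, p_1) \times f(\alpha_j) \propto f(D_x \mid \alpha_x, \beta, k, p_1) \times f(\alpha_x),$$
where
$$f(D_x \mid \alpha_x, \beta, k, p_1) = \prod_{t} \exp\left\{ d_{xt}\left[ \ln\frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\left(\frac{p_1}{p_1 + E_{xt}\mu_{xt}}\right)^{i} \right] + \frac{p_1^{p_2}}{(p_2-1)\,(p_1 + E_{xt}\mu_{xt})^{p_2-1}} \right\},$$
and
$$f(\alpha_x) = \exp\left\{-\frac{1}{2\sigma_\alpha^2}\,(\alpha_x - \alpha_{0x})^2\right\}.$$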
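The age-specific conditionals for $\beta_x$ and $\alpha_x$ share one structure: a product over calendar years of the ABM kernel, multiplied by a normal prior. A minimal Python sketch of that structure follows, assuming $\mu_{xt} = \exp(\alpha_x + \beta_x k_t)$ and $p_2 \ge 2$. The function names, the grid search, and the toy data are hypothetical, and the kernel is repeated only so that the block runs on its own.

```python
import math


def abm_log_kernel(d, m, p1, p2):
    """Log of exp{d * theta(m) - k(m)} for an ABM member with p2 >= 2."""
    q = p1 / (p1 + m)
    theta = math.log(m / (p1 + m)) + sum(q ** i / i for i in range(1, p2))
    return d * theta + p1 * q ** (p2 - 1) / (p2 - 1)


def log_prior_beta(beta_x, sigma_beta2):
    """Log of the zero-mean normal prior kernel for beta_x."""
    return -beta_x ** 2 / (2.0 * sigma_beta2)


def log_prior_alpha(alpha_x, alpha0_x, sigma_alpha2):
    """Log of the normal prior kernel for alpha_x, centred at alpha0_x."""
    return -(alpha_x - alpha0_x) ** 2 / (2.0 * sigma_alpha2)


def log_conditional_age(x, alpha_x, beta_x, k, deaths, exposure, p1, p2, log_prior):
    """Log conditional of an age-specific parameter, up to a constant.

    Pass log_prior_alpha(...) when updating alpha_x and
    log_prior_beta(...) when updating beta_x.
    """
    total = log_prior
    for t, k_t in enumerate(k):
        m = exposure[x][t] * math.exp(alpha_x + beta_x * k_t)
        total += abm_log_kernel(deaths[x][t], m, p1, p2)
    return total


# Toy data (hypothetical): one age, three calendar years.
deaths = [[5, 6, 4]]
exposure = [[1000.0, 1000.0, 1000.0]]
k = [0.0, -0.1, -0.2]


def alpha_objective(a):
    """Conditional of alpha_0 with beta_0 fixed at 0 and a vague prior."""
    prior = log_prior_alpha(a, -5.0, 100.0)
    return log_conditional_age(0, a, 0.0, k, deaths, exposure, 10.0, 3, prior)


# Crude grid search for the conditional mode of alpha_0.
grid = [-6.0 + 0.01 * i for i in range(151)]
alpha_mode = max(grid, key=alpha_objective)
```

With $\beta_x = 0$ and equal exposures, the likelihood part peaks where the expected number of deaths equals the average observed count, so the mode should sit near $\ln(5/1000)$. This gives a quick sanity check before the function is embedded in a sampler.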

References

  1. Antonio, Katrien, Anastasios Bardoutsos, and Wilbert Ouburg. 2015. Bayesian Poisson log-bilinear models for mortality projections with multiple populations. European Actuarial Journal 5: 245–81.
  2. Awad, Yaser, Shaul K. Bar-Lev, and Udi Makov. 2016. A New Class Counting Distributions Embedded in the Lee–Carter Model for Mortality Projections: A Bayesian Approach. Technical Report No. 146. Israel: Actuarial Research Center, University of Haifa.
  3. Azman, Shafiqah, and Dharini Pathmanathan. 2020. The GLM framework of the Lee–Carter model: A multi-country study. Journal of Applied Statistics 49: 752–63.
  4. Baltes, Paul B., and Jacqui Smith. 2003. New frontiers in the future of aging: From successful aging of the young old to the dilemmas of the fourth age. Gerontology 49: 123–35.
  5. Bar-Lev, Shaul K., and Ad Ridder. 2019. Monte Carlo methods for insurance risk computation. International Journal of Statistics and Probability 8: 54–74.
  6. Bar-Lev, Shaul K., and Ad Ridder. 2021a. Exponential dispersion models for overdispersed zero-inflated count data. Communications in Statistics-Simulation and Computation. Available online: https://doi.org/10.1080/03610918.2021.1934020 (accessed on 10 August 2021).
  7. Bar-Lev, Shaul K., and Ad Ridder. 2021b. New exponential dispersion models for count data – The ABM and LM classes. ESAIM: Probability and Statistics 25: 31–52.
  8. Bar-Lev, Shaul K., and Célestin C. Kokonendji. 2017. On the mean value parametrization of natural exponential families – A revisited review. Mathematical Methods of Statistics 26: 159–75.
  9. Brouhns, Natacha, Michel Denuit, and Jeroen K. Vermunt. 2002. A Poisson log-bilinear regression approach to the construction of project life-table. Insurance: Mathematics and Economics 31: 373–93.
  10. Buettner, Thomas. 2002. Approaches and experiences in projecting mortality patterns for the oldest-old. North American Actuarial Journal 6: 14–29.
  11. Czado, Claudia, Antoine Delwarde, and Michel Denuit. 2005. Bayesian Poisson log-bilinear mortality projections. Insurance: Mathematics and Economics 36: 260–84.
  12. Danesi, L. Ivan, Steven Haberman, and Pietro Millossovich. 2015. Forecasting mortality in subpopulations using Lee–Carter type models: A comparison. Insurance: Mathematics and Economics 62: 151–61.
  13. Delwarde, Antoine, Michel Denuit, and Christian Partrat. 2007. Negative binomial version of the Lee–Carter model for mortality forecasting. Applied Stochastic Models in Business and Industry 23: 385–401.
  14. Ellison, Joanne, Erengul Dodd, and Jonathan J. Forster. 2020. Forecasting of cohort fertility under a hierarchical Bayesian approach. Journal of the Royal Statistical Society: Series A (Statistics in Society) 183: 829–56.
  15. Graziani, Rebecca. 2020. Stochastic Population Forecasting: A Bayesian Approach Based on Evaluation by Experts. Developments in Demographic Forecasting 49: 21–42.
  16. Hilton, Jason, Erengul Dodd, Jonathan J. Forster, and Peter W. F. Smith. 2019. Projecting UK mortality by using Bayesian generalized additive models. Journal of the Royal Statistical Society: Series C (Applied Statistics) 68: 29–49.
  17. Hunt, Andrew, and David Blake. 2020. A Bayesian Approach to Modeling and Projecting Cohort Effects. North American Actuarial Journal 25: S235–S254.
  18. Ichikawa, Shota, Hiroyuki Yamamoto, and Takumi Morita. 2021. Comparison of a Bayesian estimation algorithm and singular value decomposition algorithms for 80-detector row CT perfusion in patients with acute ischemic stroke. La Radiologia Medica 126: 795–803.
  19. Idrizi, Olgerta. 2018. The Heteroscedasticity Impact on Actuarial Science: Lee Carter Error Simulation. European Journal of Engineering and Formal Sciences 1: 1–12.
  20. Jorgensen, Bent. 1987. Exponential dispersion models (with discussion). Journal of the Royal Statistical Society, Series B 49: 127–62.
  21. Jorgensen, Bent. 1997. The Theory of Exponential Dispersion Models. Monographs on Statistics and Probability 76. London: Chapman and Hall.
  22. Kogure, Atsuyuki, Takahiro Fushimi, and Shinichi Kamiya. 2019. Mortality Forecasts for Long-Term Care Subpopulations with Longevity Risk: A Bayesian Approach. North American Actuarial Journal 25: S534–44.
  23. Lee, Ronald D., and Lawrence R. Carter. 1992. Modelling and forecasting the time series of US mortality. Journal of the American Statistical Association 87: 659–71.
  24. Légaré, Jacques, Yann Décarie, Kim Deslandes, and Yves Carrière. 2015. Canada’s oldest old: A population group which is fast growing, poorly apprehended and at risk from lack of appropriate services. Population Change and Lifecourse Strategic Knowledge Cluster Discussion Paper Series/Un Réseau Stratégique de Connaissances Changements de Population et Parcours de vie Document de Travail 3: 1. Available online: https://ir.lib.uwo.ca/pclc/vol3/iss2/1 (accessed on 1 April 2021).
  25. Letac, Gerard, and Marianne Mora. 1990. Natural real exponential families with cubic variance functions. Annals of Statistics 18: 1–37.
  26. Liu, Zhen, Xiaoqian Sun, Leping Liu, and Yu-Bo Wang. 2020. Bayesian Poisson Log-normal Model with Regularized Time Structure for Mortality Projection of Multi-population. arXiv:2010.04775.
  27. Njenga, N. Carolyn, and Michael Sherris. 2020. Modeling mortality with a Bayesian vector autoregression. Insurance: Mathematics and Economics 94: 40–57.
  28. Pedroza, Claudia. 2006. A Bayesian forecasting model: Predicting U.S male mortality. Biostatistics 7: 530–50.
  29. Renshaw, Arthur, and Steven Haberman. 2008. On simulation-based approaches to risk measurement in mortality with specific reference to Poisson Lee–Carter modelling. Insurance: Mathematics and Economics 42: 797–816.
  30. Shair, N. Syazreen, Muhammad A. S. Rosmizan, Mohammad J. M. S. Ting, and Mohd A. A. Zaini. 2018. Projected Malaysian Lifetable: Evaluations of The Lee–Carter and Poisson Log-Bilinear Models. International Journal of Modern Trends in Social Sciences 1: 60–72.
  31. Wang, Duolao, and Pengjun Lu. 2005. Modelling and forecasting mortality distributions in England and Wales using the Lee–Carter model. Journal of Applied Statistics 32: 873–85.
  32. Wong, S. T. Jackie, Jonathan J. Forster, and Peter W. F. Smith. 2018. Bayesian mortality forecasting with overdispersion. Insurance: Mathematics and Economics 83: 206–21.
Figure 1. RMSE for projecting mortality rate, Ireland, 2001–2007. (a) by age; (b) by cohort.
Figure 2. RMSE for projecting mortality rate, USA, 2001–2007. (a) by age; (b) by cohort.
Figure 3. RMSE for projecting mortality rate, Ukraine, 2001–2009. (a) by age; (b) by cohort.
