*Article* **A New Class of Counting Distributions Embedded in the Lee–Carter Model for Mortality Projections: A Bayesian Approach**

**Yaser Awad <sup>1</sup>, Shaul K. Bar-Lev <sup>2</sup> and Udi Makov <sup>3,</sup>\***


**Abstract:** The Lee–Carter model, the dominant mortality projection model in the literature, has been criticized for its homoscedastic error assumption. This was corrected in extensions to the model based on the assumption that the number of deaths follows a Poisson or negative binomial distribution. We propose a new class of families of counting distributions, namely, the ABM class, which belongs to a wider class of natural exponential families. This class is characterized by its variance functions and contains the Poisson and the negative binomial distributions as special cases, offering an infinite set of additional counting distributions to be considered. We are guided by the principle that the choice of distribution should be made from as large a pool of distributions as possible. To this end, and following a data mining approach, a training set of historical mortality data of the population can be modeled using the ABM's rich choice of distributions, and the chosen distribution should be the one that proves to offer superior projection results on a test set of mortality data. As an alternative to parameter estimation via the singular value decomposition used in the classical Lee–Carter model, we adopted Bayesian estimation, harnessing the Markov Chain Monte Carlo methodology. A numerical study demonstrates that, when fitting mortality data using this new class of distributions, traditional distributions may provide desirable projections for some populations, while for others alternative distributions within the ABM class can produce superior results for the entire population or for particular age groups, such as the oldest-old.

**Keywords:** Lee–Carter; counting distributions; mortality projections; natural exponential family

#### **1. Introduction**

The seminal paper by Lee and Carter (1992) (LC) introduced a model which is one of the most well-known and widely applied models for forecasting mortality rates. Within this model, the time series of the log mortality rates, $\ln m\_{xt}$, of each age is described by an age-specific intercept $\alpha\_x$ plus a common trend $k\_t$ for all age groups, multiplied by an age-specific coefficient $\beta\_x$:

$$
\ln m\_{\rm xt} = \alpha\_{\rm x} + \beta\_{\rm x} k\_{\rm t} + \varepsilon\_{\rm xt}.
$$

The error term $\varepsilon\_{xt}$ is assumed to have mean 0 and variance $\sigma^2\_{\varepsilon}$, reflecting influences missed by the model. The age- and time-specific mortality rate $m\_{xt}$ is calculated as $D\_{xt}/E\_{xt}$, where $D\_{xt}$ denotes the number of deaths in a population at age $x$, $x = 1, 2, \dots, P$, and at time $t$, $t = 1, 2, \dots, T$, and $E\_{xt}$ is the exposure to the risk of death. To ensure the identifiability of the model parameters, constraints are imposed such that the sum of $\beta\_x$ over age is 1 and the sum of $k\_t$ over time is 0. To forecast mortality rates into the future, a simple random walk with drift is proposed for $k\_t$:

$$k\_t = k\_{t-1} + \theta + w\_t.$$
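To illustrate the forecasting step, the following minimal Python sketch (all numerical values are illustrative assumptions, not estimates from any dataset in this paper) simulates future paths of $k\_t$ under the random walk with drift:

```python
import numpy as np

rng = np.random.default_rng(0)

def forecast_kt(k_last, drift, sigma_w, horizon, n_paths=1000):
    """Simulate future paths of k_t under a random walk with drift.

    k_{t+1} = k_t + theta + w_t,  w_t ~ N(0, sigma_w^2).
    Returns an array of shape (n_paths, horizon).
    """
    shocks = rng.normal(0.0, sigma_w, size=(n_paths, horizon))
    # cumulative sum of drift + noise, started from the last fitted k_t
    return k_last + np.cumsum(drift + shocks, axis=1)

# illustrative values: k_last, drift (theta) and sigma_w are made up
paths = forecast_kt(k_last=-20.0, drift=-0.5, sigma_w=0.8, horizon=10)
central = paths.mean(axis=0)   # point forecast close to k_last + s * drift
```

The point forecast declines linearly in the drift $\theta$, while the simulated paths fan out at rate $\sigma\_w\sqrt{s}$, which is how stochastic uncertainty in $k\_t$ propagates into the mortality projections.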

**Citation:** Awad, Yaser, Shaul K. Bar-Lev, and Udi Makov. 2022. A New Class of Counting Distributions Embedded in the Lee–Carter Model for Mortality Projections: A Bayesian Approach. *Risks* 10: 111. https://doi.org/10.3390/risks10060111

Academic Editors: Ermanno Pitacco and Annamaria Olivieri

Received: 24 March 2022 Accepted: 18 May 2022 Published: 27 May 2022



The homoscedastic error assumption of the Lee–Carter model was criticized for its limiting impact on predictions (Brouhns et al. 2002; Danesi et al. 2015; Idrizi 2018). This led to the introduction of the Poisson log-bilinear LC-type model (Brouhns et al. 2002), which, in contrast, is intrinsically heteroscedastic, namely:

$$D\_{xt} \sim Poisson(\mu\_{xt}), \quad \mu\_{xt} = E\_{xt} m\_{xt}.$$

Here, the number of deaths is directly modelled by a Poisson distribution, whose parameter is estimated by maximum likelihood estimation (MLE). This alternative approach gained momentum, and further discrete distributions were proposed. In particular, a binomial distribution was proposed by Wang and Lu (2005), and a negative binomial distribution was suggested by Delwarde et al. (2007) and Renshaw and Haberman (2008). See Azman and Pathmanathan (2022) for further discussion of these distributions within the GLM framework.

The Lee–Carter model has been widely used for many purposes (Shair et al. 2018), such as forecasting mortality reduction factors, assessing the adequacy of retirement income, population projections and the projection of mortality trends for the oldest-old (older than 80, and in some sources 85). This age group is of considerable interest for policymakers, as it is destined to grow as a proportion of the entire population and can outstrip existing infrastructures' capacity (Buettner 2002). This is a fairly recent phenomenon. In Canada, for instance (Legare et al. 2015), the 21st century brought about the most significant gain in life expectancy at age 85 (7.79% for women and 9.93% for men). Clearly, policies need to be devised that can meet people's special needs in what is called the fourth age (Baltes and Smith 2003), and accurate mortality projection for this age group is a must. We shall, therefore, focus on the adequacy of potential underlying discrete distribution functions to produce accurate mortality projections using the Lee–Carter model for this age group. This will be discussed in Section 4.

In essence, while the above-cited papers relied on popular and commonly used discrete distributions, one cannot say that any particular distribution is universally superior. Indeed, it is entirely plausible that there is a winning distribution for any given population, or even for a specific population at a certain age range. Ideally, one should consider a rich class of families of counting distributions, much richer than the two already suggested, and use the data to pick the most suitable distribution for the population under study. This paper proposes a countably infinite set of families of counting distributions, of which the Poisson, negative binomial and Abel families of distributions are special cases. Our aim is to study this class, incorporate it into the framework of the LC model and use real data to seek the most suitable distribution for mortality projection. While there is little doubt that the distributions discussed above could prove adequate for specific populations or age groups, other distributions within the suggested class could have the upper hand.

The paper is organized as follows. Section 2 presents the new class of counting distributions. Section 3 is devoted to the new class and its Bayesian framework. Section 4 (divided into Section 4.1: Methods and Section 4.2: Results) reports a numerical study in which superior members of this class are chosen for mortality projections of the oldest-old in three populations. Finally, Section 5 offers a discussion.

#### **2. A New Class of Counting Distributions on the Set of Nonnegative Integers** $\mathbb{N}\_0$

The new class of families of counting distributions on the non-negative integers belongs to a wider class of natural exponential families (NEFs), which are characterized by their variance functions (VFs). To present this class, we decompose this section into subsections. We first give some preliminaries on NEFs and their associated VFs. We then introduce a class of NEFs having a polynomial structure, and finally present the new class of families of counting distributions, named ABM, first introduced by Awad et al. (2016), where the class was defined and its usefulness for mortality projections was preliminarily sketched. Furthermore, this class has been investigated by Bar-Lev and Ridder (2021a, 2021b) from a classical frequency approach and has shown superiority with respect to various metrics and goodness-of-fit tests for different count datasets (for further details see item 6 in Section 2.3).

#### *2.1. NEFs—Some Preliminaries*

The following preliminaries are mainly taken from Letac and Mora (1990) and are briefly presented here for completeness.

Let $\nu$ be a non-Dirac positive Radon measure on $\mathbb{R}$, and let $L(\theta) = \int e^{\theta x}\, \nu(dx)$ be its Laplace transform. Assuming that $\Theta = \operatorname{int}\{\theta \in \mathbb{R} : L(\theta) < \infty\} \neq \emptyset$, the NEF generated by $\nu$ is defined by the probability distributions

$$\mathcal{F} = \left\{ F\_{\theta} : F\_{\theta}(\nu(dx)) = e^{\theta x - \kappa(\theta)} \,\nu(dx), \ \theta \in \Theta \right\},\tag{1}$$

where $\kappa(\theta) = \log L(\theta)$, the cumulant transform of $\nu$, is strictly convex and real analytic on $\Theta$. If $X\_{\theta}$ represents a r.v. having distribution $F\_{\theta}$ of the form given in (1), then the expectation and variance of $X\_{\theta}$ are given, respectively, by $E(X\_{\theta}) \doteq m = \kappa'(\theta)$ and $V(X\_{\theta}) = \kappa''(\theta)$, where $m = \kappa'(\theta)$ is strictly monotone and thus its inverse, say $\theta = \psi(m)$, $m \in M = \kappa'(\Theta)$, is well defined. The set $M$ of all means of (1) is called the mean parameter space of $\mathcal{F}$. The variance of $F\_{\theta}$ can be expressed in terms of $m$ by $V(m) = \kappa''(\theta) = \kappa''(\psi(m))$. The pair $(V, M)$ is called the VF of $\mathcal{F}$, and it uniquely determines $\mathcal{F}$ within the class of NEFs. For example, $(m, \mathbb{R}^+)$ and $(m^2, \mathbb{R}^+)$ are, respectively, the VFs of the Poisson and exponential NEFs and uniquely determine them.
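These mean and variance relations can be checked numerically. The sketch below does so for the Poisson NEF, whose cumulant transform is $\kappa(\theta) = e^{\theta}$; finite differences stand in for $\kappa'$ and $\kappa''$, recovering the Poisson VF $V(m) = m$:

```python
import numpy as np

# For the Poisson NEF the cumulant transform is kappa(theta) = exp(theta):
# the mean is m = kappa'(theta), the variance is kappa''(theta), and
# eliminating theta gives the Poisson variance function V(m) = m.
def kappa(theta):
    return np.exp(theta)

theta = 0.7
h = 1e-5
# central finite differences for kappa'(theta) and kappa''(theta)
m = (kappa(theta + h) - kappa(theta - h)) / (2 * h)
v = (kappa(theta + h) - 2 * kappa(theta) + kappa(theta - h)) / h**2
```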

#### *2.2. The Mean Value Parametrization of NEFs*

As indicated above, the VF of an NEF $\mathcal{F}$ uniquely determines $\mathcal{F}$ within the class of NEFs. Let $(V, M)$ be a given VF of an NEF $\mathcal{F}$ generated by $\nu$. Then, simple calculations show that both $\theta = \psi(m)$ and the cumulant transform $\kappa(\theta) = \kappa(\psi(m))$ of $\nu$ can be expressed in terms of $m$ as:

$$\theta = \psi(m) = \int \frac{dm}{V(m)} + c\_1, \quad \kappa(\theta) = \kappa(\psi(m)) = \int \frac{m}{V(m)}\, dm + c\_2, \tag{2}$$

where one needs to determine the constants $c\_1$ and $c\_2$ so that $F\_{\theta}$, $\theta \in \Theta$, constitutes a probability distribution (not an easy task). Accordingly, a mean value parametrization of an NEF $\mathcal{F}$ generated by a measure $\nu$ is given by:

$$\mathcal{F} = \{ \exp\{\psi(m)x - \kappa(\psi(m))\} \,\nu(dx), \ m \in M \}. \tag{3}$$

Such a representation of $\mathcal{F}$ is more natural, as it is expressed in terms of the mean $m$ rather than the somewhat artificial parameter $\theta$. A comprehensive description of NEFs in terms of their mean value representation is reviewed in Bar-Lev and Kokonendji (2017).

**Remark 1.** *The task of computing the constants $c\_1$ and $c\_2$ is not simple and might be rather cumbersome. However, from a Bayesian perspective, when (3) is used as a prior distribution on $m$, such constants cancel out in the calculation of the respective posterior distribution (as the likelihood function is the only relevant component). As this paper is concerned with a Bayesian framework, one can assume without any loss of generality that $c\_1 = c\_2 = 0$. Henceforth, we indeed assume so.*

#### *2.3. Polynomial VFs of Counting NEFs Supported on the Set of Nonnegative Integers* $\mathbb{N}\_0$

The innovative Proposition 4.4 of Letac and Mora (1990, p. 13) provided conditions under which a given VF $(V, M)$ is associated with a counting NEF $F$ supported on the set of non-negative integers $\mathbb{N}\_0$, i.e., one in which all members of $F$ are counting distributions on $\mathbb{N}\_0$. They provided general examples of two classes of VFs which fulfill the premises of their Proposition 4.4, so that the associated NEFs' distributions are supported on the non-negative integers. One of these two classes has the form:

$$V(m) = m \prod\_{i=1}^{k} \left( 1 + \frac{m}{p\_i} \right), p\_i > 0, i = 1, \dots, k, k \in \mathbb{N}\_0, M = \mathbb{R}^+, \text{where } \prod\_{i=1}^{0} \doteq 1. \tag{4}$$

They proved that such VFs constitute counting NEFs supported on $\mathbb{N}\_0$, namely, counting distributions with non-negative integer support. Moreover, their Proposition 4.4 makes it possible to compute (at least theoretically and numerically) the corresponding measure $\nu$ (we skip the details as they are irrelevant for our Bayesian framework analysis). Note that the two special cases of (4) with $k = 0$ and $k = 1$ correspond, respectively, to the Poisson and negative binomial NEFs. However, the general setting (4) for $k \geq 3$ does not allow an explicit calculation of $\theta = \psi(m)$ and $\kappa(\theta) = \kappa(\psi(m))$ in (2), implying that the mean value parametrization of the corresponding NEFs in the form (3) is not explicitly expressible in terms of $m$ and thus becomes useless for any practical consideration.

#### *2.4. A New Class of Polynomial VFs—The ABM NEFs*

As we already noted, the fact that a given pair $(V, M)$ is known to be a VF of some NEF does not necessarily enable the construction of the corresponding mean value parameterization (3), as in most cases the integrals for $\psi(m)$ and $\kappa(\psi(m))$ in (2) are not expressible analytically in closed form; indeed, this is the situation for the class (4) in its general form. Consequently, one needs to search for subclasses of (4) for which the integrals in (2) can be computed explicitly. One such special subclass takes the above point into consideration. Indeed, by taking in (4) the special case where

$$p\_1 = p\_2 = \dots = p\_k,$$

denoting this common value by $p\_1$ and setting

$$k \doteq p\_2 \in \mathbb{N}\_0,$$

we obtain a subclass of (4) with VFs of the form:

$$(V, M) = \left(m\left(1 + \frac{m}{p\_1}\right)^{p\_2}, \mathbb{R}^+\right), \ p\_1 > 0, \ p\_2 \in \mathbb{N}\_0. \tag{5}$$

As (5) is a subclass of (4), and (4) satisfies the premises of Proposition 4.4 of Letac and Mora (1990), it follows that the VFs in (5) are associated with counting NEFs supported on the non-negative integers.

The subclass of VFs in (5) (hereafter called the ABM class) was first introduced by Awad et al. (2016) who showed that the corresponding *ψ*(*m*) and *k*(*ψ*(*m*)) (calculated from (2)) have, as opposed to the general form in (4), the following closed forms (the exact proof details appear in Bar-Lev and Kokonendji 2017):

$$\theta = \psi(m) = \ln \frac{m}{p\_1 + m} + \sum\_{i=1}^{p\_2 - 1} \frac{1}{i} \frac{p\_1^i}{(p\_1 + m)^i} + c\_1, \ \text{ where } \sum\_{i=1}^{0} \doteq 0,$$

and

$$\kappa(\psi(m)) = -\frac{p\_1^{p\_2}}{(p\_2 - 1)(m + p\_1)^{p\_2 - 1}} + c\_2.$$

Thus, its mean value parametrization is given by the probability distribution:

$$\begin{split} F(m, \nu(dx)) &= \exp\left\{ x \left[ \ln \frac{m}{p\_1 + m} + \sum\_{i=1}^{p\_2 - 1} \frac{1}{i} \frac{p\_1^i}{(p\_1 + m)^i} + c\_1 \right] \right. \\ &\left. \quad + \frac{p\_1^{p\_2}}{(p\_2 - 1)(m + p\_1)^{p\_2 - 1}} - c\_2 \right\} \nu(dx), \ m \in \mathbb{R}^+, \ p\_1 > 0, \ p\_2 \in \mathbb{N}\_0, \end{split} \tag{6}$$

where hereafter we denote this probability distribution by $ABM(p\_1, p\_2)$, in which $p\_1$ is a positive real number and $p\_2$ is a non-negative integer. (For a classical frequency approach, the constants $c\_1$ and $c\_2$ have been computed by Bar-Lev and Ridder 2021b. However, as noted above, in a Bayesian framework they cancel out when computing the posterior distribution and thus can be taken to be $c\_1 = c\_2 = 0$ without any loss of generality.)

Note that the ABM class of VFs $\left\{ m\left(1 + \frac{m}{p\_1}\right)^{p\_2}, \ p\_1 > 0 \right\}\_{p\_2 \in \mathbb{N}\_0}$, or alternatively, the corresponding class $\{ABM(p\_1, p\_2)\}\_{p\_2 \in \mathbb{N}\_0}$ of NEFs, is composed of a countably infinite set of families of counting NEFs supported on the non-negative integers. As special cases, this class contains the Poisson NEF ($p\_2 = 0$), the negative binomial NEF ($p\_2 = 1$) and the Abel NEF ($p\_2 = 2$) (c.f. Letac and Mora 1990, p. 31; Bar-Lev and Ridder 2019, for applications to car accident claims of a Swedish insurance company dataset).
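As a sanity check on these closed forms, one can verify numerically that $\psi(m)$ is indeed an antiderivative of $1/V(m)$ for the ABM variance function. A small Python sketch, with arbitrary illustrative values $p\_1 = 2$, $p\_2 = 3$ and $c\_1 = 0$:

```python
import math

def V(m, p1, p2):
    """ABM variance function V(m) = m * (1 + m/p1)**p2."""
    return m * (1.0 + m / p1) ** p2

def psi(m, p1, p2):
    """Closed-form psi(m) for the ABM class (with c1 = 0)."""
    tail = sum((1.0 / i) * (p1 / (p1 + m)) ** i for i in range(1, p2))
    return math.log(m / (p1 + m)) + tail

def integral_inv_V(a, b, p1, p2, n=100000):
    """Midpoint-rule approximation of the integral of dm / V(m) over [a, b]."""
    h = (b - a) / n
    return sum(h / V(a + (j + 0.5) * h, p1, p2) for j in range(n))

p1, p2 = 2.0, 3          # illustrative parameter values
a, b = 1.0, 4.0
closed_form = psi(b, p1, p2) - psi(a, p1, p2)   # psi(b) - psi(a)
numeric = integral_inv_V(a, b, p1, p2)          # should agree closely
```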

Summarizing, this ABM NEF has the following features:


#### **3. ABM Based LC Model and its Bayesian Framework**

As an alternative to parameter estimation via the singular value decomposition used in the classical LC model, or the MLE in the cases discussed above, we adopt the Bayesian approach, which offers advantages succinctly expressed in Antonio et al. (2015): (a) the calibration and forecast steps are combined, which leads to more consistent estimates of the period effects; (b) the Bayesian approach provides a natural framework for incorporating parameter uncertainty in mortality forecasts, which is relevant, for example, in the new insurance regulatory framework of Solvency II; and (c) the Bayesian approach allows adequate handling of small populations and missing data. Like Czado et al. (2005) and Pedroza (2006), we harness the power of the Markov Chain Monte Carlo (MCMC) methodology to estimate the model parameters and execute mortality projection. We note that interest in Bayesian solutions in the context of mortality projections has recently gained momentum (Ellison et al. 2020; Graziani 2020; Hilton et al. 2019; Hunt and Blake 2020; Kogure et al. 2019; Liu et al. 2020; Njenga and Sherris 2020; Wong et al. 2018).

Suppose the number of deaths $D\_{xt}$ in a population at age $x$ and time $t$ is distributed as follows:

$$D\_{xt} \sim ABM(p\_1, p\_2)(\mu\_{xt}), \quad \mu\_{xt} = E\_{xt} m\_{xt}, \quad m\_{xt} = \exp(\alpha\_x + \beta\_x k\_t),$$

where

$$\alpha = (\alpha\_{x\_{\min}}, \dots, \alpha\_{x\_{\max}})', \quad \beta = (\beta\_{x\_{\min}}, \dots, \beta\_{x\_{\max}})', \quad k = (k\_{t\_{\min}}, \dots, k\_{t\_{\max}})'.$$

Bayesian estimation of the unknown parameters $\alpha$, $\beta$, $k$ and $p\_1$ is based on their joint posterior distribution given $(E\_{xt}, D\_{xt})$, for $x = x\_{\min}, x\_{\min}+1, \dots, x\_{\max}$ and $t = t\_{\min}, t\_{\min}+1, \dots, t\_{\max}$. The first step in the Bayesian estimation is to determine the prior probability functions for these parameters.

#### **The prior distribution for** *k<sup>t</sup>* **and** *θ*

Let $k\_t = k\_{t-1} + \theta + w\_t$, and let $w\_t \sim N(0, \sigma^2\_w)$; hence $k\_t \sim N(\theta, t\sigma^2\_w)$. We assume $\sigma^{-2}\_w \sim gamma(a\_k, b\_k)$ and $\theta \sim N(\theta\_0, \sigma^2\_{\theta})$. The hyper-parameters $\theta\_0$, $a\_k$, $b\_k$ and $\sigma^2\_{\theta}$ are arbitrary initial values.

#### **The prior distribution for** *β<sup>x</sup>*

We assume $\beta\_x \sim N(0, \sigma^2\_{\beta})$ for all $x$, where $\sigma^{-2}\_{\beta} \sim gamma(a\_{\beta}, b\_{\beta})$. The hyper-parameters $a\_{\beta}$ and $b\_{\beta}$ are arbitrary initial values.

#### **The prior distribution for** *α<sup>x</sup>*

We suppose that the prior distribution of $\alpha\_x$ is $N(\alpha\_{0x}, \sigma^2\_{\alpha})$ for all $x$, where $\sigma^{-2}\_{\alpha} \sim gamma(a\_{\alpha}, b\_{\alpha})$. The hyper-parameters $\alpha\_{0x}$, $a\_{\alpha}$ and $b\_{\alpha}$ are arbitrary initial values.

#### **The prior distribution for** *p***<sup>1</sup>**

We let $p\_1 \sim gamma(a\_{p\_1}, b\_{p\_1})$. The hyper-parameters $a\_{p\_1}$ and $b\_{p\_1}$ are arbitrary initial values.

#### *MH (Metropolis–Hastings) Algorithm for Estimating the Parameters α*, *β*, *k and p*<sup>1</sup>

Suppose the $D\_{xt}$'s are independent random variables distributed as (6), and let $g(\Xi)$ be the joint prior distribution of the unknown parameters $\Xi = (\alpha, \beta, k, p\_1)'$. Then, the posterior distribution of $\Xi$, given all available data $D = \{d\_{xt}\}$ and $p\_2$, can be represented as follows:

$$f(\Xi \mid D, p\_2) \propto \prod\_{x} \prod\_{t} \exp\left[ d\_{xt} \left( \ln \frac{E\_{xt} m\_{xt}}{p\_1 + E\_{xt} m\_{xt}} + \sum\_{i=1}^{p\_2 - 1} \frac{1}{i} \frac{p\_1^i}{(p\_1 + E\_{xt} m\_{xt})^i} \right) + \frac{p\_1^{p\_2}}{(p\_2 - 1)(p\_1 + E\_{xt} m\_{xt})^{p\_2 - 1}} \right] \times g(\Xi).$$
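A direct transcription of this log-likelihood kernel into code can be sketched as follows. This is a minimal NumPy version under stated assumptions: terms free of the parameters (the measure $\nu$ and the constants $c\_1$, $c\_2$) are dropped, $p\_2 \geq 2$ is assumed so the closed-form $\kappa$ term applies, and the toy data at the end are purely illustrative:

```python
import numpy as np

def abm_lc_loglik_kernel(D, E, alpha, beta, k, p1, p2):
    """Log-likelihood kernel of the ABM(p1, p2) Lee-Carter model.

    Terms free of the parameters (measure nu, constants c1, c2) are
    dropped.  p2 >= 2 is assumed here; p2 = 0, 1 are the Poisson and
    negative binomial cases, which have their own standard likelihoods.
    D, E: deaths and exposures, shape (n_ages, n_years).
    """
    if p2 < 2:
        raise ValueError("p2 >= 2 assumed in this sketch")
    m = np.exp(alpha[:, None] + beta[:, None] * k[None, :])   # m_xt
    mu = E * m                                                # mu_xt = E_xt * m_xt
    psi = np.log(mu / (p1 + mu))
    for i in range(1, p2):                                    # sum_{i=1}^{p2-1}
        psi += (1.0 / i) * (p1 / (p1 + mu)) ** i
    kappa = -p1 ** p2 / ((p2 - 1) * (mu + p1) ** (p2 - 1))
    return float(np.sum(D * psi - kappa))

# illustrative toy data: 3 ages x 4 years
D = np.array([[48., 52., 50., 47.],
              [60., 58., 61., 59.],
              [70., 72., 69., 71.]])
E = np.full((3, 4), 1000.0)
alpha = np.log(np.array([0.05, 0.06, 0.07]))
beta = np.array([0.3, 0.3, 0.4])
k = np.array([0.5, 0.0, -0.5, -1.0])
val = abm_lc_loglik_kernel(D, E, alpha, beta, k, p1=2.0, p2=3)
```

Multiplying the exponential of this kernel by the prior $g(\Xi)$ gives the unnormalized posterior used in the MH steps below.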

See Appendix A for the marginal posterior distributions of $\alpha$, $\beta$, $k$ and $p\_1$. We now describe the estimation of $\alpha$, $\beta$, $k$ and $p\_1$ using the MH algorithm, conditioned on the data and all other parameters at their respective iterations. The superscript denotes the iteration number of the parameter of interest.

#### **Estimation of** *k<sup>t</sup>* **using the MH algorithm**

Let the marginal posterior distribution of $k\_t$ be $f(k\_t \mid D, \alpha, \beta, k\_{-t}, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)$. The estimation of $k\_t$ is achieved by the following steps:

1. Draw a candidate value $k\_t^\*$ from a symmetric proposal distribution centered at the current value $k\_t^{(i)}$;
2. Compute the acceptance probability:

$$\Psi\left(k\_t^{(i)}, k\_t^\*\right) = \min\left(1, \frac{f(k\_t^\* \mid D, \alpha, \beta, k\_{-t}^{(i)}, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}{f(k\_t^{(i)} \mid D, \alpha, \beta, k\_{-t}^{(i)}, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}\right),$$

where $k\_{-t} = (k\_{t\_{\min}}, \dots, k\_{t-1}, k\_{t+1}, \dots, k\_{t\_{\max}})'$;

3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide in accordance with the following rule:

$$\begin{cases} \text{if } u \leq \Psi\left(k\_t^{(i)}, k\_t^\*\right) & \text{then } k\_t^{(i+1)} = k\_t^\*, \\ \text{if } u > \Psi\left(k\_t^{(i)}, k\_t^\*\right) & \text{then } k\_t^{(i+1)} = k\_t^{(i)}; \end{cases}$$

4. Going over all values of *t*, we have:

$$k^{(i+1)} = \left( k^{(i+1)}\_{t\_{\min}}, \dots, k^{(i+1)}\_{t}, k^{(i)}\_{t+1}, \dots, k^{(i)}\_{t\_{\max}} \right)';$$

5. Transforming $k^{(i+1)}$ and $\alpha^{(i)}$ to ensure identifiability:

$$k^{(i+1)} - \overline{k} \to k^{(i+1)}, \quad \alpha^{(i)} + \beta^{(i)} \overline{k} \to \alpha^{(i)},$$

where

$$\overline{k} = \frac{1}{T} \left( \sum\_{j \le t} k\_j^{(i+1)} + \sum\_{j > t} k\_j^{(i)} \right);$$

6. Repeat steps 1 to 5.
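Steps 1 to 6 above are a standard Metropolis–Hastings update. A generic single-parameter version can be sketched as follows; the Gaussian proposal and the toy standard-normal target are illustrative stand-ins, since the actual marginal posterior is model-specific:

```python
import math
import random

random.seed(42)

def mh_step(current, log_post, proposal_sd=1.0):
    """One Metropolis-Hastings update with a symmetric Gaussian proposal.

    log_post returns the log of the target (marginal posterior) density
    up to an additive constant, so only the ratio Psi is ever needed.
    """
    candidate = random.gauss(current, proposal_sd)         # step 1: propose
    log_ratio = log_post(candidate) - log_post(current)    # step 2: log Psi
    if math.log(random.random()) <= min(0.0, log_ratio):   # step 3: u ~ U(0,1)
        return candidate                                   # accept candidate
    return current                                         # keep current value

# toy target: standard normal log-density (up to a constant)
log_post = lambda x: -0.5 * x * x
chain = [0.0]
for _ in range(5000):                                      # steps 4-6: iterate
    chain.append(mh_step(chain[-1], log_post))

mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
```

With the toy target, the chain's sample mean and variance settle near 0 and 1, the moments of the target; in the mortality model the same accept/reject logic is applied to each $k\_t$, $\beta\_x$, $\alpha\_x$ and $p\_1$ in turn.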

#### **Estimation of** *β<sup>x</sup>* **using MH algorithm**

Let the marginal posterior distribution of $\beta\_x$ be $f(\beta\_x \mid D, \alpha, \beta\_{-x}, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)$. The estimation of $\beta\_x$ is achieved by the following steps:

1. Draw a candidate value $\beta\_x^\*$ from a symmetric proposal distribution centered at the current value $\beta\_x^{(i)}$;
2. Compute the acceptance probability:

$$\Psi\left(\beta\_x^{(i)}, \beta\_x^\*\right) = \min\left(1, \frac{f(\beta\_x^\* \mid D, \alpha, \beta\_{-x}^{(i)}, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}{f(\beta\_x^{(i)} \mid D, \alpha, \beta\_{-x}^{(i)}, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}\right),$$

where

$$
\beta\_{-x} = (\beta\_{x\_{\min}}, \dots, \beta\_{x-1}, \beta\_{x+1}, \dots, \beta\_{x\_{\max}})'.
$$

3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide in accordance with the following rule:

$$\begin{cases} \text{if } u \leq \Psi\left(\beta\_x^{(i)}, \beta\_x^\*\right) & \text{then } \beta\_x^{(i+1)} = \beta\_x^\*, \\ \text{if } u > \Psi\left(\beta\_x^{(i)}, \beta\_x^\*\right) & \text{then } \beta\_x^{(i+1)} = \beta\_x^{(i)}; \end{cases}$$

4. Going over all values of *x*, we have:

$$\beta^{(i+1)} = \left( \beta^{(i+1)}\_{x\_{\min}}, \dots, \beta^{(i+1)}\_{x}, \beta^{(i)}\_{x+1}, \dots, \beta^{(i)}\_{x\_{\max}} \right)';$$

5. Transforming $k^{(i+1)}$ and $\beta^{(i+1)}$ to ensure identifiability:

$$\frac{\beta^{(i+1)}}{\beta\_{sum}} \to \beta^{(i+1)}, \quad k^{(i+1)} \times \beta\_{sum} \to k^{(i+1)},$$

where

$$\beta\_{sum} = \left(\sum\_{j \le x} \beta\_j^{(i+1)} + \sum\_{j > x} \beta\_j^{(i)}\right);$$

6. Repeat steps 1 to 5.

#### **Estimation of** *α<sup>x</sup>* **using MH algorithm**

Let the marginal posterior distribution of $\alpha\_x$ be $f(\alpha\_x \mid D, \alpha\_{-x}, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)$. The estimation of $\alpha\_x$ is achieved by the following steps:

1. Draw a candidate value $\alpha\_x^\*$ from a symmetric proposal distribution centered at the current value $\alpha\_x^{(i)}$;
2. Compute the acceptance probability:

$$\Psi\left(\alpha\_x^{(i)}, \alpha\_x^\*\right) = \min\left(1, \frac{f(\alpha\_x^\* \mid D, \alpha\_{-x}^{(i)}, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}{f(\alpha\_x^{(i)} \mid D, \alpha\_{-x}^{(i)}, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1)}\right),$$

where

$$\alpha\_{-x} = (\alpha\_{x\_{\min}}, \dots, \alpha\_{x-1}, \alpha\_{x+1}, \dots, \alpha\_{x\_{\max}})';$$

3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide in accordance with the following rule:

$$\begin{cases} \text{if } u \leq \Psi\left(\alpha\_x^{(i)}, \alpha\_x^\*\right) & \text{then } \alpha\_x^{(i+1)} = \alpha\_x^\*, \\ \text{if } u > \Psi\left(\alpha\_x^{(i)}, \alpha\_x^\*\right) & \text{then } \alpha\_x^{(i+1)} = \alpha\_x^{(i)}; \end{cases}$$

4. Obtaining $\alpha^{(i+1)}$ in the $(i+1)$th iteration as follows:

$$\alpha^{(i+1)} = \left( \alpha^{(i+1)}\_{x\_{\min}}, \dots, \alpha^{(i+1)}\_{x}, \alpha^{(i)}\_{x+1}, \dots, \alpha^{(i)}\_{x\_{\max}} \right)';$$

5. Repeat steps 1 to 4.

#### **Estimation of** *p***<sup>1</sup> using MH algorithm**

Let the marginal posterior distribution of $p\_1$ be $f(p\_1 \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w)$, proportional to the product of the likelihood (6) and the gamma prior distribution of $p\_1$. The estimation of $p\_1$ is achieved by the following steps:

1. Draw a candidate value $p\_1^\*$ from a symmetric proposal distribution centered at the current value $p\_1^{(i)}$;
2. Compute the acceptance probability:

$$\Psi\left(p\_1^{(i)}, p\_1^\*\right) = \min\left(1, \frac{f(p\_1^\* \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w)}{f(p\_1^{(i)} \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w)}\right);$$

3. Draw a value $u$ from the uniform distribution $U(0, 1)$ and decide in accordance with the following rule:

$$\begin{cases} \text{if } u \leq \Psi\left(p\_1^{(i)}, p\_1^\*\right) & \text{then } p\_1^{(i+1)} = p\_1^\*, \\ \text{if } u > \Psi\left(p\_1^{(i)}, p\_1^\*\right) & \text{then } p\_1^{(i+1)} = p\_1^{(i)}; \end{cases}$$

4. Repeat steps 1 to 3.


#### **Estimation of** $\theta$, $\sigma^2\_{\alpha}$, $\sigma^2\_{\beta}$ **and** $\sigma^2\_w$ **using the Gibbs sampler**

The Gibbs sampler can be used for estimating $\theta$, $\sigma^2\_{\alpha}$, $\sigma^2\_{\beta}$ and $\sigma^2\_w$, since the marginal posterior distributions of these parameters can be written explicitly (see Czado et al. 2005). The following are the marginal posterior sampling distributions of each of these parameters, conditioned on the data and all other parameters at their respective iterations.

1. Sampling *θ*:

The posterior probability function of the parameter $\theta$ is presented as follows:

$$f(\theta \mid D, \alpha, \beta, k, \sigma^2\_{\alpha}, \sigma^2\_{\beta}, \sigma^2\_w, p\_1) = f(\theta \mid k, \sigma^2\_w) \propto f(k \mid \theta, \sigma^2\_w) f(\theta).$$

The prior probability function of the parameter $\theta$ is $N(\theta\_0, \sigma^2\_{\theta})$, and the hyper-parameters $\theta\_0$ and $\sigma^2\_{\theta}$ are set by the user; hence, the posterior probability function of the parameter is:

$$(\theta \mid k, \sigma^2\_w) \sim N\left(\frac{\theta\_0 \sigma^2\_w + \sigma^2\_{\theta} \sum\_{t}(k\_t - k\_{t-1})}{T \sigma^2\_{\theta} + \sigma^2\_w}, \ \frac{\sigma^2\_{\theta}\sigma^2\_w}{T \sigma^2\_{\theta} + \sigma^2\_w}\right).$$
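A draw of the drift $\theta$ can be sketched in a few lines; this implements the standard normal–normal conjugate update applied to the increments $k\_t - k\_{t-1} \sim N(\theta, \sigma^2\_w)$, and the simulated data at the end are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_theta(k, sigma_w2, theta0, sigma_theta2):
    """Draw theta from its normal full conditional.

    Normal-normal conjugacy for the T increments k_t - k_{t-1},
    each N(theta, sigma_w^2), with prior theta ~ N(theta0, sigma_theta^2).
    """
    dk = np.diff(k)                      # increments k_t - k_{t-1}
    T = dk.size
    denom = T * sigma_theta2 + sigma_w2
    mean = (theta0 * sigma_w2 + sigma_theta2 * dk.sum()) / denom
    var = sigma_theta2 * sigma_w2 / denom
    return rng.normal(mean, np.sqrt(var))

# toy check: simulate a random walk with known drift -0.4, sigma_w = 0.1
k = np.cumsum(-0.4 + rng.normal(0.0, 0.1, size=5000))
draw = sample_theta(k, sigma_w2=0.01, theta0=0.0, sigma_theta2=100.0)
```

With a long simulated series and a diffuse prior, the draw concentrates near the true drift, as expected of a conjugate update.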

2. Sampling $\sigma^2\_{\alpha}$:

> The posterior probability function of the parameter $\sigma^2\_{\alpha}$ is presented as follows:

$$f(\sigma^2\_{\alpha} \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\beta}, \sigma^2\_w, p\_1) \propto f(\alpha \mid \sigma^2\_{\alpha}) f(\sigma^2\_{\alpha}).$$

The prior on $\sigma^2\_{\alpha}$ is such that $\sigma^{-2}\_{\alpha} \sim gamma(a\_{\alpha}, b\_{\alpha})$, and the hyper-parameters $a\_{\alpha}$ and $b\_{\alpha}$ are set by the user, so the posterior probability function of the parameter is:

$$(\sigma^{-2}\_{\alpha} \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\beta}, \sigma^2\_w, p\_1) \sim \operatorname{gamma}\left( a\_{\alpha} + \frac{x\_{\max}}{2}, \ b\_{\alpha} + \frac{1}{2}\sum\_{x=x\_{\min}}^{x\_{\max}}\left(\alpha\_x - \overline{\alpha}\right)^2 \right),$$

where

$$\overline{\alpha} = \frac{1}{x\_{\max}}\sum\_{x=x\_{\min}}^{x\_{\max}} \alpha\_x.$$

3. Sampling $\sigma^2\_{\beta}$:

> The posterior probability function of the parameter $\sigma^2\_{\beta}$ is presented as follows:

$$f(\sigma^2\_{\beta} \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_w, p\_1) \propto f(\beta \mid \sigma^2\_{\beta}) f(\sigma^2\_{\beta}).$$

The prior on $\sigma^2\_{\beta}$ is such that $\sigma^{-2}\_{\beta} \sim gamma(a\_{\beta}, b\_{\beta})$, and the hyper-parameters $a\_{\beta}$ and $b\_{\beta}$ are set by the user, so the posterior probability function of the parameter is:

$$(\sigma^{-2}\_{\beta} \mid D, \alpha, \beta, k, \theta, \sigma^2\_{\alpha}, \sigma^2\_w, p\_1) \sim \operatorname{gamma}\left( a\_{\beta} + \frac{x\_{\max}}{2}, \ b\_{\beta} + \frac{1}{2}\sum\_{x=x\_{\min}}^{x\_{\max}} \beta\_x^2 \right).$$

4. Sampling $\sigma_w^2$:

> The posterior probability function of the parameter $\sigma_w^2$ is presented as follows:

$$f(\sigma_w^2 \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, p_1) \propto f(k \mid \sigma_w^2) f(\sigma_w^2).$$

The prior on $\sigma_w^2$ is specified through $\sigma_w^{-2} \sim \operatorname{gamma}(a_k, b_k)$, where the hyper-parameters $a_k$ and $b_k$ are set by the user, so the posterior probability function of the parameter is:

$$(\sigma_w^{-2} \mid D, \alpha, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, p_1) \sim \operatorname{gamma}\left(a_k + \frac{T}{2},\; b_k + \frac{1}{2}\sum_{t=t_{\min}}^{t_{\max}} (k_t - k_{t-1} - \theta)^2\right).$$
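
Because each of these conditionals is a conjugate gamma update, the corresponding Gibbs draws reduce to a single line each. The following Python/NumPy sketch illustrates the three updates; the array sizes, the hyper-parameter values ($a = b = 0.01$), and the stand-in chain states are illustrative assumptions, not the actual sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_precision(a0, b0, residuals, rng):
    """Draw sigma^{-2} ~ gamma(a0 + n/2, b0 + sum(residuals^2)/2)."""
    shape = a0 + residuals.size / 2.0
    rate = b0 + 0.5 * np.sum(residuals ** 2)
    # NumPy parameterizes the gamma by shape and scale, with scale = 1/rate
    return rng.gamma(shape, 1.0 / rate)

# stand-in current states of the chain (illustrative sizes)
alpha = rng.normal(size=101)                   # age effects alpha_x
beta = rng.normal(scale=0.1, size=101)         # age effects beta_x
k = np.cumsum(rng.normal(-0.5, 1.0, size=51))  # time index k_t
theta = np.mean(np.diff(k))                    # drift of the random walk

# the three conjugate updates of steps 2-4
prec_alpha = sample_precision(0.01, 0.01, alpha - alpha.mean(), rng)
prec_beta = sample_precision(0.01, 0.01, beta, rng)
prec_w = sample_precision(0.01, 0.01, np.diff(k) - theta, rng)
sigma2_alpha, sigma2_beta, sigma2_w = 1 / prec_alpha, 1 / prec_beta, 1 / prec_w
```

Precisions are drawn rather than variances because the gamma prior is placed on $\sigma^{-2}$; inverting the draw recovers the variance.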

#### **4. Numerical Experiment**

#### *4.1. Methods*

To test the adequacy of the ABM class, we analyzed mortality data of men in Ireland, Ukraine and the USA, downloaded from the Human Mortality Database (https://www.mortality.org, accessed on 8 March 2013). The data contain the number of deaths and the size of the population exposed to risk by age and year; Ireland's and the USA's data are for 1950–2007, and Ukraine's data are for 1959–2009. This analysis aims to examine sixteen models within the ABM class, $p_2 = 0, \ldots, 15$, with a particular emphasis on forecasting the mortality of the oldest-old (see below an argumentation for the restriction of the values of $p_2$ to $\{0, \ldots, 15\}$). These models include the Poisson and negative binomial models, which feature widely in the literature and for which $p_2 = 0$ and $p_2 = 1$, respectively. Adopting a data mining approach, the models were fitted using training sets and were examined using test sets. The training sets contained data up to 2000, and the test sets, aimed at monitoring the quality of predictions, contained data from 2001 to 2007 for Ireland and the USA, and from 2001 to 2009 for Ukraine. Predictions are carried out with the estimated parameters, $\ln[\hat{m}_{x,t+s}] = \hat{\alpha}_x + \hat{\beta}_x \hat{k}_{t+s}$, $s = 1, 2, \ldots$, where model performance (using the test sets) was checked using the root mean squared error (RMSE), which was calculated in two ways: by age and by cohort.
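
To make the projection and scoring step concrete, the following sketch computes $\ln[\hat{m}_{x,t+s}] = \hat{\alpha}_x + \hat{\beta}_x \hat{k}_{t+s}$ and the RMSE by age over a test window; all arrays are synthetic stand-ins, not HMD data:

```python
import numpy as np

def project_log_m(alpha_hat, beta_hat, k_hat):
    """ln m_{x,t+s} = alpha_x + beta_x * k_{t+s}, for all ages x and horizons s."""
    return alpha_hat[:, None] + beta_hat[:, None] * k_hat[None, :]

def rmse(pred, obs, axis):
    """Root mean squared error; axis=1 pools over years (RMSE by age)."""
    return np.sqrt(np.mean((pred - obs) ** 2, axis=axis))

# synthetic stand-ins for estimated parameters and observed test data
rng = np.random.default_rng(7)
alpha_hat = rng.normal(-4, 1, 40)               # baseline log-mortality by age
beta_hat = rng.uniform(0.0, 0.2, 40)            # age sensitivity to the time index
k_hat = np.linspace(-5, -8, 7)                  # projected k for a 7-year test window
log_m_obs = project_log_m(alpha_hat, beta_hat, k_hat) + rng.normal(0, 0.05, (40, 7))

rmse_by_age = rmse(project_log_m(alpha_hat, beta_hat, k_hat), log_m_obs, axis=1)
```

Pooling over ages instead of years (axis=0) gives the corresponding by-cohort view.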


For every member of the ABM class (controlled by $p_2$), the Markov chains used to obtain posterior distributions and parameter estimates comprised 4000 iterations, with the first 1000 considered a burn-in period. Convergence was established using graphical means, and a sensitivity analysis ascertained that the choice of arbitrary initial hyper-parameters did not affect the final outcomes. We report the outcomes for the Poisson distribution ($p_2 = 0$) and the negative binomial distribution ($p_2 = 1$). In addition, we examined the ABM members for which $p_2 \in \{2, \ldots, 15\}$. We limited our reporting to this range since, for the data under study, increasing $p_2$ beyond 15 (our study explored all models up to $p_2 = 50$) did not alter our findings of the optimal $p_2$ for varying ages and resulted in much larger RMSE than those found up to $p_2 = 15$. In practice, we analyzed all 16 models within the $p_2$ range and, for each, estimated $p_1$ as well as all other unknown parameters. Finally, we reported graphically the RMSE for the Poisson, the negative binomial, and the ABM member which produced the minimal RMSE at various ages.
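
The chain-handling logic described above can be sketched as follows; the update inside the loop is a placeholder for one full Gibbs sweep over all parameters, not the actual sampler:

```python
import numpy as np

n_iter, burn_in = 4000, 1000
rng = np.random.default_rng(1)

chain = np.empty(n_iter)
state = 0.0
for i in range(n_iter):
    state = 0.9 * state + rng.normal()  # placeholder for one full Gibbs sweep
    chain[i] = state

posterior = chain[burn_in:]             # discard the first 1000 draws as burn-in
post_mean = posterior.mean()            # point estimate from the retained draws
```

In practice one chain per parameter is stored, and convergence is inspected graphically (e.g., trace plots of the retained draws) before the posterior summaries are used for projection.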

#### *4.2. Results*

Figure 1a,b shows Ireland's mortality projections by age and cohort, respectively. A superior model for an age range is the one which produced the smallest RMSE. It is evident that the Poisson performs best for most ages above 70, with the negative binomial lagging behind. However, at very old ages, the Poisson diverges, and the best ABM member to be chosen instead is $ABM(\cdot, p_2 = 3)$. A similar result is shown for the USA (Figure 2a,b), except that the best ABM member for the very old is $ABM(\cdot, p_2 = 4)$. A different picture emerges when we focus on Ukraine's mortality projections by age and cohort (Figure 3a,b, respectively). While the Poisson and negative binomial perform well (with the Poisson being better), for ages above 96 (by cohort) or 104 (by age), the negative binomial drifts away, leaving $ABM(\cdot, p_2 = 10)$ as the winner of the ABM class. Clearly, the recommended projection policy for Ukraine is to use the Poisson for most ages but to rely on $ABM(\cdot, p_2 = 10)$ for very old ages. Naturally, other countries, with their specific national datasets, may yield different ABM models (i.e., different $p_2$'s) for mortality projections.

**Figure 1.** RMSE for projecting mortality rate, Ireland, 2001–2007. (**a**) by age; (**b**) by cohort.

**Figure 2.** RMSE for projecting mortality rate, USA, 2001–2007. (**a**) by age; (**b**) by cohort.

**Figure 3.** RMSE for projecting mortality rate, Ukraine, 2001–2009. (**a**) by age; (**b**) by cohort.

#### **5. Discussion**

Several extensions to the LC model assume that the number of deaths follows a Poisson or negative binomial distribution. These distributions have offered adequate mortality projections in several populations reported in the literature. It is not implausible that cases where these two failed were simply not reported. Rather than deciding a priori on a particular distribution, we aimed to enrich the LC model by allowing a richer pool of candidate distributions. The chosen distribution would be the one providing the best projection for the population and age range under study. To achieve this goal, we proposed a new class of counting distributions on the non-negative integers, the ABM class, which belongs to a wider class of natural exponential families characterized by their variance functions. This class includes the Poisson and negative binomial distributions, together with a countably infinite set of additional members. A data mining approach was adopted whereby the model is fitted using a training set and tested using a test set, with the RMSE used to pick the winning model. As an alternative to parameter estimation via the singular value decomposition (SVD) used in the classical LC model, we adopted Bayesian estimation, harnessing the Markov Chain Monte Carlo (MCMC) methodology. While we do not suggest that MCMC is superior to SVD (for a comparison of the two, see Ichikawa et al. 2021), we still promote the former since the Bayesian framework frees us from the burden of calculating the normalizing constant of the ABM. This is indeed a great plus, even though running the MCMC requires more computer time for the mid-size databases of the kind used for national mortality projections. We note, however, that the use of MCMC is rather costly and might fail if the dataset is huge, so other Bayesian techniques, such as Variational Bayes, may be more helpful; employing such techniques is beyond the scope of this paper. We examined ABM models for three countries and established that, for the countries examined, the commonly adopted Poisson distribution is justified except at very old ages, for which an alternative member of the ABM class offers better projections. We do not claim to single out one superior model.
When deciding on an underlying model, one could adopt, for example, the Poisson model or the negative binomial model. Rather than adopting a single model, we suggest adopting a class of models (the ABM) comprising the Poisson, the negative binomial, and numerous other counting distributions. The superiority of this approach lies in the ability to choose among candidate models. Since no single model necessarily fits every population and every age group well, the ABM class could allow picking, for example, the Poisson for members of the population aged under 50, the negative binomial for those aged 50 to 80, and another member of the class for the oldest-old. The suggested criterion for preferring one member of the class over another is the root mean squared projection error (RMSE). This advantage is gained at the cost of additional complexity, which is justified given the financial benefits associated with more accurate modeling. We conclude that it is no longer appropriate to assume a single distribution for the whole process of mortality projection. Instead, for every country and every relevant range of ages, a desirable approach is to pick the member of the ABM class that provides the best mortality projection. In the numerical study reported here, neither the Poisson nor the negative binomial distribution adequately serves the very oldest-old, and superior alternatives are within reach in the suggested novel ABM class of distributions.
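
The per-age selection policy advocated here amounts to reading off, for every age, the candidate $p_2$ with minimal RMSE. A hypothetical sketch, using a random stand-in RMSE table in place of the fitted models:

```python
import numpy as np

rng = np.random.default_rng(42)
p2_values = np.arange(16)            # candidate ABM members, p2 = 0..15
ages = np.arange(60, 111)            # ages of interest, e.g. the oldest-old
# stand-in RMSE table: rows = candidate models, columns = ages
rmse = rng.uniform(0.01, 0.5, size=(p2_values.size, ages.size))

best_p2 = p2_values[rmse.argmin(axis=0)]     # winning ABM member per age
policy = dict(zip(ages.tolist(), best_p2.tolist()))
# policy[100] gives the recommended p2 for projecting mortality at age 100
```

With a real RMSE table, contiguous runs of the same winner would suggest age bands sharing one model, as in the Poisson-below-very-old-ages pattern found for the three countries studied.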

**Author Contributions:** Conceptualization, Y.A., S.K.B.-L. and U.M.; methodology, Y.A., S.K.B.-L. and U.M.; software, Y.A.; validation, Y.A., S.K.B.-L. and U.M.; formal analysis, Y.A., S.K.B.-L. and U.M.; investigation, Y.A., S.K.B.-L. and U.M.; resources, Y.A., S.K.B.-L. and U.M.; data curation, Y.A. and U.M.; writing—original draft preparation, Y.A., S.K.B.-L. and U.M.; writing—review and editing, Y.A., S.K.B.-L. and U.M.; visualization, Y.A., S.K.B.-L. and U.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. These data can be found in the Human Mortality Database (https://www.mortality.org, accessed on 8 March 2013).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

*Appendix A.1*

The marginal posterior probability function of the time index $k_t$ is

$$\begin{aligned} f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto{} & f(D_{t_{\min}} \mid k_{t_{\min}}, \alpha, \beta, p_1) \times f(k_{t_{\min}} \mid \theta, \sigma_w^2) \\ & \times \prod_{j=t_{\min}+1}^{t_{\max}} f(D_j \mid k_j, \alpha, \beta, p_1) \times f(k_j \mid k_{j-1}, \theta, \sigma_w^2), \end{aligned}$$

where

$$f(D_t \mid \alpha, \beta, k_t, p_1) = \prod_{x} \exp\left[d_{xt}\left(\ln \frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\frac{p_1^i}{(p_1 + E_{xt}\mu_{xt})^i}\right) + \frac{p_1^{p_2}}{(p_2-1)(p_1 + E_{xt}\mu_{xt})^{p_2-1}}\right].$$

For the remaining expressions we distinguish between three cases:

1. For $t = t_{\min}$, the marginal posterior probability function of the time index $k_t$ is $f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_t \mid \alpha, \beta, k_t, p_1) \times f(k_t \mid \theta, \sigma_w^2) \times f(k_{t+1} \mid k_t, \theta, \sigma_w^2)$, where

$$f(k_t \mid \theta, \sigma_w^2) = \exp\left(-\frac{1}{2\sigma_w^2}(k_t - \theta)^2\right) \text{ and}$$

$$f(k_{t+1} \mid k_t, \theta, \sigma_w^2) = \exp\left(-\frac{1}{2\sigma_w^2}(k_{t+1} - k_t - \theta)^2\right).$$

2. For $t_{\min} < t < t_{\max}$, the marginal posterior probability function of the time index $k_t$ is:

$$\begin{aligned} f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto{} & f(D_t \mid \alpha, \beta, k_t, p_1) \\ & \times f(k_t \mid k_{t-1}, \theta, \sigma_w^2) \times f(k_{t+1} \mid k_t, \theta, \sigma_w^2), \end{aligned}$$

where

$$f(k_t \mid k_{t-1}, \theta, \sigma_w^2) = \exp\left(-\frac{1}{2\sigma_w^2}(k_t - k_{t-1} - \theta)^2\right) \text{ and}$$

$$f(k_{t+1} \mid k_t, \theta, \sigma_w^2) = \exp\left(-\frac{1}{2\sigma_w^2}(k_{t+1} - k_t - \theta)^2\right).$$

3. For $t = t_{\max}$, the marginal posterior probability function of the time index $k_t$ is $f(k_t \mid D, \alpha, \beta, k_{-t}, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto f(D_t \mid \alpha, \beta, k_t, p_1) \times f(k_t \mid k_{t-1}, \theta, \sigma_w^2)$,

where

$$f(k_t \mid k_{t-1}, \theta, \sigma_w^2) = \exp\left(-\frac{1}{2\sigma_w^2}(k_t - k_{t-1} - \theta)^2\right).$$
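
Unlike the variance parameters, the full conditional of $k_t$ is not a standard distribution because of the ABM likelihood factor, so it would typically be sampled with a Metropolis step inside the Gibbs sweep. A minimal sketch for an interior $t$ (case 2); `log_lik` stands for the log of $f(D_t \mid \alpha, \beta, k_t, p_1)$, and the proposal scale is an arbitrary assumption:

```python
import math
import random

def log_rw_terms(kt, k_prev, k_next, theta, sigma2_w):
    """Log of the two Gaussian random-walk factors for an interior t."""
    return (-(kt - k_prev - theta) ** 2 - (k_next - kt - theta) ** 2) / (2 * sigma2_w)

def mh_update_kt(kt, k_prev, k_next, theta, sigma2_w, log_lik, step=0.1, rng=random):
    """One random-walk Metropolis update of k_t against its full conditional."""
    prop = kt + rng.gauss(0.0, step)
    log_ratio = (log_lik(prop) + log_rw_terms(prop, k_prev, k_next, theta, sigma2_w)
                 - log_lik(kt) - log_rw_terms(kt, k_prev, k_next, theta, sigma2_w))
    if math.log(rng.random()) < log_ratio:
        return prop          # accept the proposal
    return kt                # reject: keep the current value

# toy usage with a flat likelihood (prior-only update)
random.seed(3)
k_new = mh_update_kt(0.0, -0.5, 0.5, 0.0, 1.0, log_lik=lambda k: 0.0)
```

The boundary cases $t = t_{\min}$ and $t = t_{\max}$ differ only in which random-walk factors enter `log_rw_terms`.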

*Appendix A.2*

The marginal posterior probability function of $\beta_x$ is

$$f(\beta_x \mid D, \alpha, \beta_{-x}, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto \prod_{j=x_{\min}}^{x_{\max}} f(D_j \mid \alpha, \beta_j, k, p_1) \times f(\beta_j) \propto f(D_x \mid \alpha, \beta_x, k, p_1) \times f(\beta_x),$$

where

$$f(D_x \mid \alpha, \beta_x, k, p_1) = \prod_{t} \exp\left[d_{xt}\left(\ln \frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\frac{p_1^i}{(p_1 + E_{xt}\mu_{xt})^i}\right) + \frac{p_1^{p_2}}{(p_2-1)(p_1 + E_{xt}\mu_{xt})^{p_2-1}}\right]$$

and

$$f(\beta_x) = \exp\left(-\frac{1}{2\sigma_\beta^2}\beta_x^2\right).$$

*Appendix A.3*

The marginal posterior probability function of $\alpha_x$ is

$$f(\alpha_x \mid D, \alpha_{-x}, \beta, k, \theta, \sigma_\alpha^2, \sigma_\beta^2, \sigma_w^2, p_1) \propto \prod_{j=x_{\min}}^{x_{\max}} f(D_j \mid \alpha_j, \beta, k, p_1) \times f(\alpha_j) \propto f(D_x \mid \alpha_x, \beta, k, p_1) \times f(\alpha_x),$$

where

$$f(D_x \mid \alpha_x, \beta, k, p_1) = \prod_{t} \exp\left[d_{xt}\left(\ln \frac{E_{xt}\mu_{xt}}{p_1 + E_{xt}\mu_{xt}} + \sum_{i=1}^{p_2-1} \frac{1}{i}\frac{p_1^i}{(p_1 + E_{xt}\mu_{xt})^i}\right) + \frac{p_1^{p_2}}{(p_2-1)(p_1 + E_{xt}\mu_{xt})^{p_2-1}}\right]$$

and

$$f(\alpha_x) = \exp\left(-\frac{1}{2\sigma_\alpha^2}(\alpha_x - \alpha_{0x})^2\right).$$

#### **References**

