A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model

Nie, Cong; Liu, Xiaoming; Provost, Serge; Ren, Jiandong

doi:10.3390/stats8030077

Open AccessArticle

A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model

Department of Statistical and Actuarial Sciences, The University of Western Ontario, London, ON N6A 3K7, Canada

^*

Author to whom correspondence should be addressed.

Stats 2025, 8(3), 77; https://doi.org/10.3390/stats8030077

Submission received: 24 July 2025 / Revised: 16 August 2025 / Accepted: 22 August 2025 / Published: 27 August 2025

Download

Browse Figures

Versions Notes

Abstract

The phase-type aging model (PTAM) belongs to a class of Coxian-type Markovian models that can provide a quantitative description of well-known aging characteristics that are part of a genetically determined, progressive, and irreversible process. Due to its unique parameter structure, estimation via the MLE method presents a considerable estimability issue, whereby profile likelihood functions are flat and analytically intractable. In this study, a Markov chain Monte Carlo (MCMC)-based Bayesian methodology is proposed and applied to the PTAM, with a view to improving parameter estimability. The proposed method provides two methodological extensions based on an existing MCMC inference method. First, we propose a two-level MCMC sampling scheme that makes the method applicable to situations where the posterior distributions do not assume simple forms after data augmentation. Secondly, an existing data augmentation technique for Bayesian inference on continuous phase-type distributions is further developed in order to incorporate left-truncated data. While numerical results indicate that the proposed methodology improves parameter estimability via sound prior distributions, this approach may also be utilized as a stand-alone statistical model-fitting technique.

Keywords:

parameter estimation; phase-type aging model; Markov chain Monte Carlo; Bayesian inference; data augmentation; Gibbs sampler; rejection sampling

1. Introduction

The phase-type aging model (PTAM) belongs to a class of Coxian Markovian models that were proposed in a previous study [1]. The purpose of the PTAM is to provide a quantitative description of well-known aging characteristics that are part of a genetically determined, progressive, and irreversible process. It provides a means of quantifying the heterogeneity in aging among individuals and of capturing the anti-selection effects.

Since the PTAM is nonlinear in its parameters, the parameter estimation turns out to be non-stable, which gives rise to an estimability issue (see [2]). The estimability issue of the PTAM originates from two complications. First, the structure of the PTAM, which is complicated by its matrix exponential form, makes it impossible to directly analyze the gradient and Hessian matrix of its likelihood function. In these instances, any hill-climbing optimization algorithms will be subject to the risk of becoming stuck in local maxima. The second aspect is that the structure of the PTAM gives rise to flat profile likelihood functions (see [2]). The estimability issue arising from flat profile likelihood functions is thoroughly discussed in [3]. Frequentists would potentially address the flat likelihood functions using regularization, which can be thought of as using a log prior density (see [4]).

To address both problems, a Bayesian approach is considered. Thus, the parameters are assumed to be random variables, which automatically eliminates the risk of being stuck in local maxima. This addresses the first problem. As for the second one, the Bayesian approach can improve parameter estimability by making use of sound prior information. Then, the posterior distributions will significantly depend on the prior distributions since the profile likelihood functions are flat and become nearly horizontal.

Moreover, there are convincing reasons for applying the Bayesian approach on the PTAM. In this context, it has previously been applied via the data augmentation Gibbs sampler, which consists of two iterative steps—a data augmentation step and a posterior sampling step (see [5]). The data augmentation step in relation to continuous phase-type distributions was thoroughly studied by the authors of [6], where an EM algorithm was proposed for estimating parameters of phase-type distributions. Based on the same data augmentation scheme, the authors of [7] considered Dirichlet and Gamma distributions as the conjugate prior distributions in the posterior sampling step, before developing an MCMC-based Bayesian method for continuous phase-type distributions. Later on, several studies were carried out regarding the data augmentation step in order to improve computational efficiency. The authors of [8] proposed the Exact Conditional Sampling (ECS) algorithm. Another efficient algorithm introduced by the authors of [9] involves uniformization and backward likelihood computation. However, these contributions all focus on the data augmentation step. In the context of the PTAM, the posterior sampling step also becomes more involved because of its parameter structure, since the posterior distributions are then no longer as simple as Dirichlet and Gamma distributions after data augmentation. This situation has not been studied before in the literature. Therefore, the first contribution of this study is to develop an MCMC algorithm for sampling the posterior distribution of the PTAM.

Another area that needs further development is the determination of a method for dealing with left-truncated data in the data augmentation step. Although the authors of [10] developed the EM algorithm for censored data from phase-type distributions, the case of left-truncated data in connection with the MCMC-based Bayesian approach has not previously been studied. The MCMC algorithms proposed in [7,8,9] are indeed only applicable to data that are not left-truncated. However, it is important to develop MCMC-based methods that handle left truncation because it is a common feature of real-life data. In particular, in the context of the PTAM, real-life data are left-truncated because it is unlikely that in practice, individuals will enter the study at the same physiological age. Accordingly, the second contribution of this study is to develop an MCMC algorithm for estimating the PTAM parameters when data are left-truncated.

The proposed MCMC algorithm utilizes a nested structure comprising two levels. In the outer level, augmented data are generated using the ECS algorithm proposed in [8] combined with the technique developed for left-truncated data. In the inner level, Gibbs sampling is applied to draw samples from the posterior distributions based on a newly developed rejection sampling scheme on a logarithmic scale. Thus, the proposed algorithm can be seen as a methodological extension of the existing data augmentation Gibbs sampler for continuous phase-type distributions. It can also be regarded as a further illustration of making use of the MCMC algorithm in the case of sampling from high-dimensional distributions. On applying the proposed algorithm, a Bayesian estimation of the PTAM parameters can be carried out. This will be illustrated with both simulated and actual data.

This paper is organized as follows. Preliminaries on the PTAM are introduced in Section 2. Section 3 presents a literature review on existing MCMC algorithms in connection with continuous phase-type distributions. In Section 4, the proposed MCMC algorithms for Bayesian inference on the PTAM are introduced. A simulation study is provided in Section 5 to validate the proposed approach. Meanwhile, parameter estimability is analyzed by comparing the proposed Bayesian approach with the frequentist one that was employed in [1]. In Section 6, the proposed Bayesian approach is applied to calibrate the PTAM to the Channing House data set which pertains to the residents of a retirement community. Lastly, some concluding remarks are included in Section 7.

2. The Phase-Type Aging Model

The phase-type aging model (PTAM) stems from the phase-type mortality model proposed in [11]. The motivation for analyzing the phase-type mortality model consists of linking its parameters to certain biological and physiological mechanisms of aging, so that the longevity risk facing annuity products can be measured more accurately. Experimental results showed that the phase-type mortality model with a four-state developmental period and a subsequent aging period achieved very satisfactory fitting results with respect to Swedish and USA cohort mortality data (see [11]). Later on, the authors of [12] applied the phase-type mortality model to Australian cohort mortality data.

Furthering the research in [11], the authors of [1] developed a parsimonious yet flexible representation of the PTAM for modeling various aging patterns. Similarly, the main objective of the PTAM is to describe the human aging process in terms of the evolution of the distribution of physiological ages, utilizing mortality rates as aging-related variables. Therefore, although the PTAM can reproduce mortality patterns, it ought not to be treated as a mortality model. In this context, the PTAM is most applicable at human ages beyond the attainment of adulthood, where, relatively speaking, the aging process is the most significant factor that contributes to the variability in lifetimes (see [1]).

2.1. Preliminaries

Definition 1.

Let

{X (t)}_{t \geq 0}

be a continuous time Markov chain (CTMC) defined on a finite state space

S = E \cup Δ = {1, 2, \dots, m} \cup Δ

, where

Δ = {m + 1}

is the absorbing state and

E

is the set of transient states. Let

{X (t)}_{t \geq 0}

have initial distribution

π^{'} = (π_{1}, π_{2}, \dots, π_{m})

over the transient states such that

π^{'} e = 1

, and let the transition intensity matrix be as follows:

\begin{matrix} Λ = [\begin{matrix} S & h \\ 0 & 0 \end{matrix}], \end{matrix}

(1)

where

h = - S e

and

e

is the column vector of ones.

T : = i n f {t \geq 0 | X (t) = m + 1}

is defined as the time until absorption. Then, T is said to follow a continuous phase-type (CPH) distribution denoted by

C P H (π, S)

of the order m, with

h

being defined as the exit vector.

Remark 1.

Given

T \sim C P H (π, S)

of order m,

(i) The probability density function (p.d.f.) of T is $f_{T} (t) = π^{'} e^{S t} h$ .
(ii) The cumulative density function (c.d.f.) of T is $F_{T} (t) = 1 - π^{'} e^{S t} e$ .

There is a long history of using phase-type distributions for survival modeling in the category of “absorbing time” distributions (see [6,11,12,13]).

Definition 2.

A CPH distribution of order m with representation

(π, S)

is said to be a Coxian distribution of order m if

π^{'} = (1, 0, 0, \dots, 0)

and the following holds true:

\begin{matrix} S = [\begin{matrix} - (λ_{1} + h_{1}) & λ_{1} & 0 & 0 & \dots & 0 & 0 \\ 0 & - (λ_{2} + h_{2}) & λ_{2} & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & 0 & \dots & - (λ_{m - 1} + h_{m - 1}) & λ_{m - 1} \\ 0 & 0 & 0 & 0 & \dots & 0 & - h_{m} \end{matrix}], \end{matrix}

(2)

where

λ_{i} > 0, h_{j} > 0

,

i = 1, 2, \dots, m - 1

, and

j = 1, 2, \dots, m

.

The Coxian distribution can often be visualized by a phase diagram such as that displayed in Figure 1 (see [1]).

2.2. The Phase-Type Aging Model

According to the authors of [1], the phase-type aging model (PTAM) belongs to a class of Coxian-type Markovian models, which can provide a quantitative description of the genetically determined, progressive, and irreversible aging process.

Definition 3.

The PTAM of order m is a Coxian distribution of order m with transition intensity matrix

S

and exit rate vector

h

such that the following applies:

\begin{matrix} S = [\begin{matrix} - (λ + h_{1}) & λ & 0 & 0 & \dots & 0 & 0 \\ 0 & - (λ + h_{2}) & λ & 0 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & 0 & 0 & \dots & - (λ + h_{m - 1}) & λ \\ 0 & 0 & 0 & 0 & \dots & 0 & - h_{m} \end{matrix}], h = [\begin{matrix} h_{1} \\ h_{2} \\ ⋮ \\ h_{m - 1} \\ h_{m} \end{matrix}], \end{matrix}

(3)

where

λ > 0

,

0 < h_{1} < h_{m}

, and

\begin{matrix} h_{i} = \{\begin{matrix} {(\frac{m - i}{m - 1} h_{1}^{s} + \frac{i - 1}{m - 1} h_{m}^{s})}^{\frac{1}{s}}, & s \neq 0, \\ h_{1}^{\frac{m - i}{m - 1}} h_{m}^{\frac{i - 1}{m - 1}}, & s = 0, \end{matrix} \end{matrix}

(4)

i = 1, 2, \dots, m

. This is denoted by

P T A M (h_{1}, h_{m}, s, λ, m)

.

As can be seen from Figure 2, the PTAM has a phase diagram that is similar to the Coxian distribution observed in Figure 1, the difference being the constant transition rate and the functionally related exit rates defined in (4).

(i): In Figure 2, each state in the Markov process represents the physiological age—a variable that reflects an individual’s health condition or frailty level. As the aging process progresses, the frailty level will increase, until the last state occurs. At which point, the individual’s health conditions have deteriorated to the point of causing death.
(ii): The transition rate $λ$ is assumed to be constant. The exiting rate $h_{i}$ is the dying rate or force of dying. With this setup, at a given calendar age, an individual will be assigned to a certain state. This mathematically describes the fact that the individuals involved will have different physiological ages at the same calendar age (see [1]).
(iii): The dying rates assume the structure given in (4), which is somewhat reminiscent of the well-known Box–Cox transformation introduced in [14]. The first and last dying rates— $h_{1}$ and $h_{m}$ —are included in the model parameters, whereas the remaining inbetween rates are interpolated based on the parameter s, which is a model parameter related to the curvature of the exit rate pattern. To verify this, Figure 3 presents the effect of s on the pattern of the exit rates. When $s = 1$ , the dying rates have a linear relationship. When $s > 1$ , the rates are concave, and when $s < 1$ , the rates are convex. In particular, when $s = 0$ , the rates behave exponentially. In practice, we believe that it is likely that $s < 1$ when calibrating to mortality data. That is, the dying rates increase faster than in a linear manner as an individual ages (see [1]). Throughout this study, $h_{i}$ will follow the structure given in (4), for $i = 1, 2, \dots, m$ .

The parameter structure of the PTAM proves to be parsimonious and flexible, which allows us to model the internal aging process explicitly. Further information is available in [1]. Since our study pertains to the PTAM, the processes being considered are homogeneous, use the same intensity for all transitions to the next stage, and have an intensity of moving to the absorbing stage consisting of a linear interpolation between the two end points. More general processes would require further consideration.

3. MCMC-Based Bayesian Inference for the CPH Distributions

3.1. The Data Augmentation Gibbs Sampler

Let

y

denote the observed data, which follow some probability distributions

p (y | θ)

, with

θ

being an unknown set of parameters and

x

being the latent data that are not observed. We, respectively, denote the prior and the posterior distributions associated with

θ

by

p (θ)

and

p (θ | y)

. The idea behind the Gibbs sampler with data augmentation is to augment the observed data,

y

, with latent data,

x

, so that the density function after data augmentation, i.e.,

p (θ | x, y)

, will have a more tractable form (see [5]). Figure 4 presents the iterative framework of the data augmentation Gibbs sampler. The technique consists of a data augmentation step and a posterior sampling step, which, respectively, correspond to sampling from

p (x | θ, y)

and

p (θ | x, y)

.

Appealing to the stationarity of the MCMC method, the samples generated from the data augmentation Gibbs sampler will converge to the target posterior distribution,

p (θ | y)

. We assume some familiarity with the theory of MCMC methods. More details are available in [15,16,17,18,19].

3.2. The Data Augmentation Step—Sampling from $p (x | θ, y)$

The data augmentation scheme with respect to the CPH distributions was proposed by the authors of [6]. It is widely applied to EM, MCMC, and variational Bayes algorithms (see [6,7,8,9,20]).

Consider a CPH distribution of order m with

θ = (π, S)

. Its likelihood function, given the data set

y = (y_{1}, y_{2}, \dots, y_{M})

, is as follows:

\begin{matrix} L (π, S; x) = \prod_{i = 1}^{M} π^{'} e^{S y_{i}} h, \end{matrix}

(5)

where

h = - S e

.

According to the authors of [6], a sample path associated with a CPH distribution can be characterized by the initial state, transitions among states, and the sojourn time at each state. Let

X = {\{X {(t)}^{(k)}\}}_{t \geq 0}, k = 1, 2, \dots, M

, be M independent sample paths augmented from observed absorption time data

y = \{y^{(k)}\} k = 1, 2, \dots, M

. Each sample path is generated by a CPH distribution

(π, S)

of the order m. Then, the likelihood function for the augmented data,

(x, y)

, is given as follows:

\begin{matrix} L (θ; x, y) = (\prod_{i = 1}^{m} π_{i}^{B_{i}}) (\prod_{i = 1}^{m} \prod_{j \neq i}^{m} λ_{i j}^{N_{i j}} e^{- λ_{i j} Z_{i}}) (\prod_{i = 1}^{m} h_{i}^{N_{i, m + 1}} e^{- h_{i} Z_{i}}), \end{matrix}

(6)

where

B_{i}

is the number of sample paths starting at state i among M individuals,

N_{i j}

is the total number of transitions from state i to state j among M individuals, and

Z_{i}

is the total sojourn time in state i among M individuals. In that case, the absorption time

y^{(k)}

for the kth individual is equal to the sum of the sojourn times of its corresponding sample path.

Sampling the latent sample path conditional on the absorption time proves to be a difficult problem. To solve it, the authors of [7] initially suggested to make use of a carefully designed Metropolis–Hastings algorithm with proposal distribution

p (x | θ, Y \geq y)

, where convergence to sampling from

p (x | θ, Y = y)

eventurally occurs. Rejection sampling is then utilized to draw the latent sample path from this proposal distribution. However, this method is time-consuming because many sample paths will be rejected if some data points in

y

are large. This will hinder the computational efficiency.

Later on, the authors of [8] further improved the methodology by making the following two main contributions:

(i): A faster and more efficient algorithm was developed to simulate the latent sample path from $p (x | θ, y)$ . The algorithm is named the Exact Conditional Sampling (ECS) algorithm.
(ii): Unlike the CPH distributions considered by the authors of [7], which assumed full and unstructured parameters, the authors of [8] took into account the special parameter structure, as indicated by the context of the experiment. Using a reliability model as an example, they discussed situations where parameters might be zero or have identical values. For identical parameters, they argued to combine all the relevant terms in (6) so that parameters with the same values will be sampled from one single distribution. The zero-valued parameters were simply set to a value of zero.

In this study, we elected to adopt the ECS algorithm for the data augmentation step. This are conveniently presented in Algorithms 1 and 2. It should be noted that since the PTAM belongs to a Coxian distribution whose underlying Markov process is irreversible, the ECS algorithm will be simpler than the original version that is proposed in Algorithms 1 and 2.

Algorithm 1 The ECS algorithm (see [8]) applied to the PTAM-obtained absorption times.

1:: Sample a starting state i from the probability mass function:

$\begin{matrix} P (X (0) = i | π, S, Y = y) = \frac{(e_{i}^{'} e^{S y} h) π_{i}}{π^{'} e^{S y} h} \end{matrix}$

and $t \leftarrow 0$ .
2:: With probability

$\begin{matrix} P (X [t, y) = i \cap Y {y} = m + 1 | S, Y = y, X (t) = i) = \frac{e^{S_{i i} (y - t)} h_{i}}{{e_{i}}^{'} e^{S (y - t)} h} \end{matrix}$

$X [t, y) \leftarrow i$ and $X (y) \leftarrow m + 1$ and end the algorithm; else continue.
3:: Sample the sojourn time d from

$\begin{matrix} p (δ = d | S, Y = y, X [t, t + δ) = i, X (t + δ) \in {1, 2, \dots, m} ∖ i) \\ = \frac{p_{i .}^{'} e^{S (y - t - d)} s (- S_{i i}) e^{S_{i i} d}}{\int_{0}^{y - t} p_{i .}^{'} e^{S (y - t - d)} s (- S_{i i}) e^{S_{i i} d} d δ} \end{matrix}$

and $X [t, t + d) \leftarrow i$ .
4:: $t \leftarrow t + d$ and $i \leftarrow i + 1$ , then go to Step 2.

3.3. The Posterior Sampling Step—Sampling from $p (θ | x, y)$

The next step consists of simulating the posterior distribution of the parameter

θ

from the augmented data. Fortunately, this step is quite straightforward for the CPH distributions. As the likelihood function consists of kernels of Dirichlet and Gamma distributions, it provides an indication to utilize Dirichlet and Gamma distributions as the conjugate prior distributions. According to the authors of [7], the prior distributions are as follows:

\begin{matrix} π^{'} & \sim Dirichlet (β_{1}, β_{2}, \dots, β_{m}), \end{matrix}

(7)

\begin{matrix} λ_{i j} & \sim Gamma (v_{i j}, ξ_{i j}), \end{matrix}

(8)

\begin{matrix} h_{i} & \sim Gamma (v_{i, m + 1}, ξ_{i, m + 1}), \end{matrix}

(9)

and the posterior distributions after data augmentation are as follows:

\begin{matrix} π^{'} | x, y & \sim Dirichlet (β_{1} + B_{1}, β_{2} + B_{2}, \dots, β_{m} + B_{m}), \end{matrix}

(10)

\begin{matrix} λ_{i j} | x, y & \sim Gamma (v_{i j} + N_{i j}, ξ_{i j} + Z_{i j}), \end{matrix}

(11)

\begin{matrix} h_{i} | x, y & \sim Gamma (v_{i, m + 1} + N_{i, m + 1}, ξ_{i, m + 1} + Z_{i, m + 1}), \end{matrix}

(12)

where the Gamma distributions are parameterized as shape and rate parameters.

Algorithm 2 The ECS algorithm (see [8]) applied to the PTAM-obtained right-censored times.

1:: Sample a starting state i from the probability mass function:

$\begin{matrix} P (X (0) = i | π, S, Y \geq y) = \frac{(e_{i}^{'} e^{S y} e) π_{i}}{π^{'} e^{S y} e} \end{matrix}$

and $t \leftarrow 0$ .
2:: With probability $m i n \{1, e^{S_{i i} (y - t)}\}$ , $d \leftarrow m a x {y - t, 0} + T$ , where $T \sim e x p (S_{i i})$ . Else, the sojourn time d is a sample from the finitely supported density on $[0, y - t)$ , i.e.,

$\begin{matrix} p (δ = d | S, Y \geq y, X (t) = i) \propto p_{i .}^{'} e^{S (y - t - d)} e (- S_{i i}) e^{S_{i i} d} \end{matrix}$

and $X [t, t + d) \leftarrow i$ .
3:: $j = i + 1$ .
4:: If $j = m + 1$ , end the algorithm; else, $t \leftarrow t + d$ and $i \leftarrow j$ , go to Step 2.

4. MCMC-Based Bayesian Inference for the PTAM

The MCMC algorithm for Bayesian inference on the PTAM being introduced in this section constitutes the principal contribution of this study. This contribution involves two aspects. Firstly, the proposed MCMC algorithm can be considered as a methodological extension of the existing algorithm in terms of sampling from

p (θ | x, y)

. This is due to the fact that the likelihood function of the PTAM is so involved that no simple conjugate prior distributions such as the Dirichlet and Gamma distributions are adequate. Although the authors of [8] consider special parameter structures such as zero-valued and identical parameters, the prior conjugacy still holds as it simply involves deleting and regrouping parameters. However, further extensions are required in the case of the PTAM, since its parameters exhibit more complicated functional relationships as a result of the constraint specified in (4). Secondly, similarly to the authors of [10], where the EM algorithm was developed for censored data from the CPH, we have developed the MCMC-based Bayesian approach for left-truncated data from the PTAM. This development is crucial for the estimation of the PTAM parameters based on real-life data, since it is unlikely that in practice, each individual will enter the study at the same physiological age. Thus, there exists additional difficulty with respect to analyzing left-truncated data.

With these contributions, a methodologically extended MCMC algorithm is proposed in order to carry out the sampling from

p (θ | x, y)

so that an MCMC-based Bayesian inference on the PTAM could be achieved, particularly for real-life data that are left-truncated.

4.1. Likelihood Function of the PTAM with Left-Truncated Data

Taking into account left-truncated data, the likelihood function for the PTAM after data augmentation is as follows:

\begin{matrix} L (λ, h_{1}, h_{m}, s; x, y) & = \frac{(\prod_{i = 1}^{m - 1} λ^{N_{i, i + 1} - Q_{i, i + 1}} e^{- λ Z_{i}^{A}}) (\prod_{i = 1}^{m} h_{i}^{N_{i, m + 1}} e^{- h_{i} G_{i}})}{(\prod_{i \in A} e^{- λ d_{i}})}, \end{matrix}

(13)

where

d_{i}

is the time at which individual i enters the study;

Q_{i j}

is the total number of transitions from state i to j, which occurred before the entry times;

G_{i}

is the total sojourn time in state i for the portions of the sample paths after the entry times;

N_{i j}

is as defined in Section 3; and

Z_{i}^{A}

is the total sojourn time in state i for the sample paths in

A

, as follows:

\begin{matrix} A : = \{k \in Z^{+} | the k th sample path enters the study before reaching state m\}, \end{matrix}

(14)

where

t_{j}^{(k)}

is the sojourn time at state j for the kth sample path.

The likelihood function (13) can be seen as a generalized version of the likelihood function (6) given in [6]. To verify this, if the data do not involve left truncation, then the

Q_{i j}

’s and

d_{i}

’s will be reduced to zero for all i and j;

A

will be reduced to the set of indices of all sample paths; and both the

G_{i}

’s and

Z_{i}^{A}

’s will be reduced to

Z_{i}

’s for all i. Thus, the likelihood function in (13) will boil down to (6) with

π^{'} = (1, 0, 0, \dots, 0)

. The details of the derivation of the likelihood function (13) are presented in Appendix B.

4.2. Characteristics of the Posterior Distribution of the PTAM

In the PTAM, the posterior distribution of the model parameters is no longer a product of independent kernels. To verify this, we start by substituting (4) into the likelihood function (13), as follows:

\begin{matrix} L (λ, h_{1}, h_{m}, s; x, y) & = \frac{(\prod_{i = 1}^{m - 1} λ^{N_{i, i + 1} - Q_{i, i + 1}} e^{- λ Z_{i}^{A}})}{(\prod_{i \in A} e^{- λ d_{i}})} \\ \times (\prod_{i = 1}^{m} {(\frac{m - i}{m - 1} h_{1}^{s} + \frac{i - 1}{m - 1} h_{m}^{s})}^{\frac{N_{i, m + 1}}{s}} e^{- {(\frac{m - i}{m - 1} h_{1}^{s} + \frac{i - 1}{m - 1} h_{m}^{s})}^{\frac{1}{s}} G_{i}}), \end{matrix}

(15)

where

s \neq 0

.

Then, the posterior distribution

p (λ, h_{1}, h_{m}, s | x, y)

can be written as follows:

\begin{matrix} p (λ, h_{1}, h_{m}, s | x, y) & \propto (π_{1} (λ) L_{1} (λ; x, y)) (π_{2} (h_{1}, h_{m}, s) L_{2} (h_{1}, h_{m}, s; x, y)), \end{matrix}

(16)

where

\begin{matrix} π_{1} (λ) L_{1} (λ; x, y) & = π_{1} (λ) (\prod_{i = 1}^{m - 1} λ^{N_{i, i + 1} - Q_{i, i + 1}} e^{- λ Z_{i}^{A}}) (\prod_{i \in A} e^{λ d_{i}}) \end{matrix}

(17)

and

\begin{matrix} π_{2} (h_{1}, h_{m}, s) L_{2} (h_{1}, h_{m}, s; x, y) & = π_{2} (h_{1}, h_{m}, s) \\ \times \prod_{i = 1}^{m} {(\frac{m - i}{m - 1} h_{1}^{s} + \frac{i - 1}{m - 1} h_{m}^{s})}^{\frac{N_{i, m + 1}}{s}} \\ \times \prod_{i = 1}^{m} e^{- {(\frac{m - i}{m - 1} h_{1}^{s} + \frac{i - 1}{m - 1} h_{m}^{s})}^{\frac{1}{s}} G_{i}}, \end{matrix}

(18)

with

π_{i}

and

L_{i}

denoting the respective prior distributions and likelihood functions, for

i = 1, 2

.

Based on (16), it is straightforward to see that the posterior distribution of the PTAM parameters can be decomposed into two independent posterior distributions—

p (λ | x, y)

and a joint posterior distribution

p (h_{1}, h_{m}, s | x, y)

, where

\begin{matrix} p (λ | x, y) & \propto π_{1} (λ) L_{1} (λ; x, y), \end{matrix}

(19)

\begin{matrix} p (h_{1}, h_{m}, s | x, y) & \propto π_{2} (h_{1}, h_{m}, s) L_{2} (h_{1}, h_{m}, s; x, y) . \end{matrix}

(20)

Thus, we can evaluate the posterior distribution for

λ

separately, using a gamma distribution (21), which will produce the posterior distribution (22) of the same class, as follows:

\begin{matrix} λ & \sim Gamma (v_{λ}, ξ_{λ}), \end{matrix}

(21)

\begin{matrix} λ | x, y & \sim Gamma (v_{λ} + \sum_{i = 1}^{m - 1} N_{i, i + 1} - \sum_{i = 1}^{m - 1} Q_{i, i + 1}, ξ_{λ} + \sum_{i = 1}^{m - 1} Z_{i}^{A} - \sum_{i \in A} d_{i}) . \end{matrix}

(22)

However, the likelihood function of

(h_{1}, h_{m}, s)

does not consist of independent kernels, which prevents one from determining conjugate priors. The prior distributions for

h_{1}, h_{m}

and s, which are then subjectively determined, are taken to be

π_{H_{1}} (h_{1})

,

π_{H_{m}} (h_{m})

and

π_{S} (s)

. We assume, for simplicity, that

h_{1}, h_{m}

and s are independently distributed. Accordingly, their joint prior distribution,

π_{2} (h_{1}, h_{m}, s)

, will be the product of

π_{H_{1}} (h_{1})

,

π_{H_{m}} (h_{m})

, and

π_{S} (s)

.

4.3. The Proposed Methodology for Sampling from $p (h_{1}, h_{m}, s | x, y)$

Next, a methodology needs to be developed to address the sampling from the joint posterior distribution

p (h_{1}, h_{m}, s | x, y)

. The Gibbs algorithm can be utilized again, further taking advantage of the MCMC method. In that case, the proposed algorithm will become a nested MCMC algorithm. The nested Gibbs algorithm samples from the joint posterior distribution, given the augmented data. The algorithm framework is presented in Figure 5, for a p-dimensional posterior distribution.

In the case of the joint posterior distribution for the PTAM,

p (θ | x, y)

in Figure 5 becomes

p (h_{1}, h_{m}, s | x, y)

. For example, in order to sample from

p (h_{1}, h_{m}, s | x, y)

in the

(k + 1)

th iteration, we need to sample from the corresponding conditional distributions. These are also the transition kernels of the Gibbs algorithm, as follows:

\begin{matrix} p [(h_{1}^{(k)}, h_{m}^{(k)}, s^{(k)}) \to (h_{1}^{(k + 1)}, h_{m}^{(k)}, s^{(k)})] & : = p (h_{1} | h_{m}^{(k)}, s^{(k)}, x, y), \end{matrix}

(23)

\begin{matrix} p [(h_{1}^{(k + 1)}, h_{m}^{(k)}, s^{(k)}) \to (h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, s^{(k)})] & : = p (h_{m} | h_{1}^{(k + 1)}, s^{(k)}, x, y), \end{matrix}

(24)

\begin{matrix} p [(h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, s^{(k)}) \to (h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, s^{(k + 1)})] & : = p (s | h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, x, y) . \end{matrix}

(25)

Since general notations are adopted in Figure 5, the concept of the nested MCMC algorithm is likely applicable to other models whose posterior distributions are complicated after data augmentation.

Define

g (h_{1}, h_{m}, s) : = π_{2} (h_{1}, h_{m}, s) L_{2} (h_{1}, h_{m}, s; x, y)

.

First, we introduce a sampling scheme for

p (h_{1} | h_{m}^{(k)}, s^{(k)}, x, y)

, which is the conditional distribution of

h_{1}

. We know the following:

\begin{matrix} p (h_{1} | h_{m}^{(k)}, s^{(k)}, x, y) = \frac{g (h_{1}, h_{m}^{(k)}, s^{(k)})}{\int_{0}^{h_{m}^{(k)}} g (h_{1}, h_{m}^{(k)}, s^{(k)}) d h_{1}} \propto g (h_{1}, h_{m}^{(k)}, s^{(k)}) . \end{matrix}

(26)

Rejection sampling can then be utilized in conjunction with

g (h_{1}, h_{m}^{(k)}, s^{(k)})

, as described in Algorithm 3.

Algorithm 3 The rejection sampling algorithm for

p (h_{1} | h_{m}^{(k)}, s^{(k)}, x, y)

.

1:: Calculate the maximum value of $ln (g (h_{1}, h_{m}^{(k)}, s^{(k)}))$ on $(0, h_{m}^{(k)})$ . Denote it by $m_{1}$ .
2:: Draw a pair of samples $(x, ln (y))$ . $X \sim Uniform (0, h_{m}^{(k)})$ and $ln (Y) = ln (U) + m_{1}$ , where $U \sim Uniform (0, 1)$ .
3:: while $ln (g (x, h_{m}^{(k)}, s^{(k)})) \leq ln (y)$ do
4:: repeat Step 2
5:: end while
6:: $h_{1}^{(k + 1)} \leftarrow x$ .

Secondly, we consider the sampling scheme for

p (h_{m} | h_{1}^{(k + 1)}, s^{(k)}, x, y)

, which is the marginal distribution of

h_{m}

. We know the following:

\begin{matrix} p (h_{m} | h_{1}^{(k + 1)}, s^{(k)}, x, y) = \frac{g (h_{m}, h_{1}^{(k + 1)}, s^{(k)})}{\int_{h_{1}^{(k + 1)}}^{\infty} g (h_{m}, h_{1}^{(k + 1)}, s^{(k)}) d h_{m}} \propto g (h_{m}, h_{1}^{(k + 1)}, s^{(k)}) . \end{matrix}

(27)

Rejection sampling can be utilized in conjunction with

g (h_{m}, h_{1}^{(k + 1)}, s^{(k)})

, as described in Algorithm 4.

Algorithm 4 The rejection sampling algorithm for

p (h_{m} | h_{1}^{(k + 1)}, s^{(k)}, x, y)

,

1:: Calculate the maximum value of $ln (g (h_{m}, h_{1}^{(k + 1)}, s^{(k)}))$ on $(h_{1}^{(k + 1)}, a)$ . Denote it by $m_{2}$ . In this case, a is a large enough truncation point.
2:: Draw a pair of samples $(x, ln (y))$ . $X \sim Uniform (h_{1}^{(k + 1)}, a)$ and $ln (Y) = ln (U) + m_{2}$ , where $U \sim Uniform (0, 1) .$
3:: while $ln (g (x, h_{1}^{(k + 1)}, s^{(k)})) \leq ln (y)$ do
4:: repeat Step 2
5:: end while
6:: $h_{m}^{(k + 1)} \leftarrow x$ .

Thirdly, we consider the sampling scheme for

p (s | h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, x, y)

, which is the marginal distribution of s. We know the following:

\begin{matrix} p (s | h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, x, y) = \frac{g (s, h_{1}^{(k + 1)}, h_{m}^{(k + 1)})}{\int_{- \infty}^{\infty} g (s, h_{1}^{(k + 1)}, h_{m}^{(k + 1)}) d s} \propto g (s, h_{1}^{(k + 1)}, h_{m}^{(k + 1)}) . \end{matrix}

(28)

Rejection sampling can be utilized in conjunction with

g (s, h_{1}^{(k + 1)}, h_{m}^{(k + 1)})

, as described in Algorithm 5.

Algorithm 5 The rejection sampling algorithm for

p (s | h_{1}^{(k + 1)}, h_{m}^{(k + 1)}, x, y)

.

1:: Calculate the maximum value of $ln (g (s, h_{1}^{(k + 1)}, h_{m}^{(k + 1)}))$ on $(b, c)$ . Denote it by $m_{3}$ . In this case, $| b |, c$ are large enough truncation points.
2:: Draw a pair of samples $(x, ln (y))$ . $X \sim Uniform (b, c)$ and $ln (Y) = ln (U) + m_{3}$ , where $U \sim Uniform (0, 1)$ .
3:: while $ln (g (x, h_{1}^{(k + 1)}, h_{m}^{(k + 1)})) \leq ln (y)$ do
4:: repeat Step 2
5:: end while
6:: $s^{(k + 1)} \leftarrow x$ .

The rejection sampling schemes presented in Algorithms 3–5 also constitute important original contributions. Unlike traditional rejection sampling where a proposal function is chosen to fully cover the target density, the proposed rejection sampling transforms them to a logarithmic scale. This differs from returning a log posterior value to a generic MCMC sampler, as the proposed method applies log-scale bounds within Gibbs updates for individual conditional distributions, some of which have truncated support. We note that the values of the posterior kernels are often too small to be handled by making use of the likelihood functions. In fact, sampling on a logarithmic scale is analogous to taking the logarithm of likelihood functions in order to find MLEs, since both frequentist and Bayesian methods will face the same problem caused by small likelihood function values. However, they deal with this problem differently. In some regular problems, frequentist inference may be performed by maximizing a function (often numerically), and using the curvature at the maximum to quantify uncertainty in an estimate. Other authors have addressed frequentist inference in a non-regular context (see [21]). This study focuses on Bayesian inference, in which case, the output is a (posterior) distribution rather than a point estimate. In this context, it will involve random sampling techniques instead of optimization techniques, as is the case for the rejection sampling on a logarithmic scale presented in Algorithms 3–5. Technical details regarding rejection sampling on a logarithmic scale are elaborated on in Appendix A.

4.4. The MCMC Algorithm for the PTAM

Combining all these building blocks in the data augmentation step and the posterior sampling step, Algorithm 6 presents the MCMC algorithm for Bayesian inference on the PTAM.

In Step 14, for the inner Gibbs sampling, at each iteration, the initial values are selected to be the parameter outputs in the previous iteration. Because the parameter outputs themselves also become increasingly accurate as they converge to the true posterior distribution, using the parameter outputs in previous iterations as the initial values is then believed to be more reasonable and objective. In that case, we can make the most of this algorithm.

Algorithm 6 The MCMC algorithm for Bayesian inference on the PTAM.

Require: The number of states, m, based on prior knowledge or subjective judgment.

Input:

1.: The data observations $y$ , and the entry times $d_{i}^{'} s$ if there are left-truncated data.
2.: The hyper-parameters for prior distributions.
3.: The number of states m.
4.: The number of inner and outer iterations: $w_{1}$ and $w_{2}$ .
5.: The size of the burn-in period and thinning rate, if possible.

Output: The posterior samples for

h_{1}

,

h_{m}

, s and

λ

, each of which has

w_{2}

sample points.

1:: Initialization $(λ^{(1)}, h_{1}^{(1)}, h_{m}^{(1)}, s^{(1)})$ .
2:: Initialization $(λ_{G i b b s}^{(1)}, h_{1, G i b b s}^{(1)}, h_{m, G i b b s}^{(1)}, s_{G i b b s}^{(1)}) \leftarrow (λ^{(1)}, h_{1}^{(1)}, h_{m}^{(1)}, s^{(1)})$ .
3:: for $k = 2 : w_{2}$ do
4:: Draw a sample path $x$ from $p (x | λ^{(k - 1)}, h_{1}^{(k - 1)}, h_{m}^{(k - 1)}, s^{(k - 1)}, y)$ , based on Algorithm 1 (or Algorithm 2 for right-censored data).
5:: Based on $x$ , calculate $(N, Q, Z^{A}, A, G)$ .
6:: for $j = 2 : w_{1}$ do
7:: Sample $h_{1, G i b b s}^{(j)}$ from $p (h_{1} | h_{m, G i b b s}^{(j - 1)}, s_{G i b b s}^{(j - 1)}, N, G)$ , based on Algorithm 3.
8:: Sample $h_{m, G i b b s}^{(j)}$ from $p (h_{m} | h_{1, G i b b s}^{(j)}, s_{G i b b s}^{(j - 1)}, N, G)$ , based on Algorithm 4.
9:: Sample $s_{G i b b s}^{(j)}$ from $p (s | h_{1, G i b b s}^{(j)}, h_{m, G i b b s}^{(j)}, N, G)$ , based on Algorithm 5.
10:: end for
11:: Sample $λ^{(k)}$ from $p (λ | N, Q, Z^{A})$ .
12:: $(h_{1}^{(k)}, h_{m}^{(k)}, s^{(k)}) \leftarrow (h_{1, G i b b s}^{(N)}, h_{m, G i b b s}^{(N)}, s_{G i b b s}^{(N)})$ .
13:: Reset the inner Gibbs sampling vector to zeros.
14:: $(λ_{G i b b s}^{(1)}, h_{1, G i b b s}^{(1)}, h_{m, G i b b s}^{(1)}, s_{G i b b s}^{(1)}) \leftarrow (λ^{(k)}, h_{1}^{(k)}, h_{m}^{(k)}, s^{(k)})$ .
15:: end for

5. Simulation Study

In this section, the proposed algorithm is implemented via a simulation study. The aim of the study is to demonstrate that the proposed MCMC algorithm is sound and that the parameter estimability of the PTAM can be improved via sound prior information. Consider the following experimental conditions:

(i): The underlying parameters are $m = 10, λ = 1.99908, h_{1} = 0.0008, h_{m} = 1.65349, s = - 0.11118$ . They were taken from the simulation study on Le Bras limiting distribution carried out in [1], except that m is assumed to take a moderate value of ten.
(ii): The sample size is 50.
(iii): There are 4500 iterations of the Gibbs sampler for data augmentation.
(iv): There are 500 iterations of the inner Gibbs sampling for the posterior distribution.
(v): The first 500 iterations are taken as burn-in, based on cumulative standard deviation plots (see [19]).
(vi): A thinning rate of 10 is adopted, based on autocorrelation functions (ACFs) (see [19]).
(vii): The prior distributions are assumed to be sound; the prior means remain close to the true parameter values with low variances.

$\begin{matrix} π_{H_{1}} (h_{1}) & = Gamma (v_{h_{1}} = 0.01, ξ_{h_{1}} = 10), \\ π_{H_{m}} (h_{m}) & = Gamma (v_{h_{m}} = 3, ξ_{h_{m}} = 1.5), \\ π_{S} (s) & = 8 e^{8 s}, s < 0, \\ π_{Λ} (λ) & = Gamma (v_{λ} = 24, ξ_{λ} = 16) . \end{matrix}$

We assume that $s < 0$ because the dying rate pattern forms a fairly convex increasing pattern, as displayed in Figure 3, which is consistent with the biological interpretation of dying rates.

After implementing Algorithm 6, the results are listed in Table 1 and are illustrated in Figure 6 and Figure 7.

In Table 1, the true parameters are all within the corresponding 95% credible intervals. This indicates that the proposed MCMC algorithm for Bayesian inference is quite satisfactory. It can be seen from Figure 6 that the correlations between

h_{1}, h_{m}

, and s are minimal. This indicates that the dependent structure of

h_{1}, h_{m}

, and s in the likelihood function has little effect on the shape of the posterior distributions, so that

h_{1}, h_{m}

, and s are still nearly independent, as was assumed in the prior distributions. This observation suggests that the estimability of

h_{1}, h_{m}

, and s could be poor. In fact, the same conclusion can also be reached by observing the diagonal panels in Figure 6, which reveal the shapes of their posterior distributions. In particular, the posterior distribution for s closely resembles the prior. This suggests that their distributions are less responsive to data so that the prior effects are, to some degree, still preserved in the behaviour of their posterior distributions. This indicates a weaker inferential power and therefore a poorer estimability. In contrast,

λ

is more estimable, as the shape of its posterior distribution differs substantially from that of its prior. Therefore, the role of prior distributions is crucial for dealing with flat likelihood functions. Sound prior information can improve the accuracy of the parameter estimates as the posterior distributions are highly dependent on the priors.

In Figure 7, the convergence of the proposed MCMC algorithm is being assessed by means of trace plots, ACFs, and ergodic mean plots. First, the trace plots demonstrate the stationarity of the MCMC samples in terms of level-off patterns, though there are occasionally a few spikes for

h_{1}

and s. However, such spikes are a normal phenomenon as the shapes of their posterior densities still remain close to their skewed prior densities due to poor estimability. Secondly, the ACFs for all parameters are within the tolerance range after the second lag. This indicates that the thinning rate effectively reduces the ACFs between the MCMC samples. Thirdly, the ergodic means all converge as the number of iterations increases. This suggests that the number of iterations, that is, 4500, is sufficient to believe that the simulated MCMC samples were approximately generated from the stationary distributions, which are the target posterior distributions.

Prior Sensitivity Analysis

To further validate the vital role of sound prior information in terms of estimability improvement, we now conduct a prior sensitivity analysis. Two alternative types of priors are tested. The first type is taken to be falsely informative, where the prior means deviate noticeably from the true parameter values with low variances. The second type is taken to be non-informative, where parameters are uniformly distributed. The results are listed in Table 2 and illustrated in Figure 8 and Figure 9.

It can be seen from Table 2 and Figure 8 that when priors are taken to be falsely informative, the 95% credible intervals for

h_{1}, h_{m}

, and s all failed to cover the true values. This is as expected because their likelihood functions are flat due to poor estimability. Then, the posterior distributions will be highly dependent on the prior distributions. On the other hand, the interval for

λ

remains narrow and covers the true value, which indicates a better estimability than that of

h_{1}, h_{m}

, and s.

Next, when the priors are taken to be non-informative, the shape of posterior density will be totally determined by the shape of the likelihood function. It can be seen from Table 2 and Figure 9 that while covering their MLEs as expected, the 95% credible intervals for

h_{1}, h_{m}

, and s are extremely wide. This further corroborates the flatness of their likelihood functions with the concomitant poor estimability. On the other hand, the interval for

λ

still remains narrow while covering its MLE, which indicates a better estimability.

Upon completing this prior sensitivity analysis, all conclusions are consistent with each other throughout this simulation study. The poor estimability of

h_{1}, h_{m}, s

, as well as the improved estimability of

λ

, has been demonstrated. The significant prior sensitivity on

h_{1}, h_{m}

, and s indicates that suitable prior information indeed plays a significant role in improving their estimability. Therefore, it is crucial to select priors that are as sound as possible when making Bayesian inference. Otherwise, deficient priors might yield unreliable parameter estimates, particularly when their estimability is poor.

6. Data Analysis

In Section 5, we have shown that the proposed MCMC algorithm can improve parameter estimability for the PTAM by making use of sound prior information. In this section, we will demonstrate that in addition to improving estimability via sound prior information, the proposed algorithm can also be utilized to adapt the PTAM to real-life data.

Consider the data collected from the Channing House, which is a retirement community in Palo Alto, California. The data consist of entry ages, ages at death, and ages at study end for 462 people who resided in the facility between January 1964 and July 1975 (see [22]). The Channing House data are chosen because all the residents in the community are approximately subject to the same circumstances, so that, relatively speaking, the aging process is the most significant factor contributing to the variability in their lifetimes, which is the process we intend to model using the PTAM. Moreover, the female data are chosen to preclude the effects of gender differences. Of the 361 females, 129 died while residing in Channing House, whereas the other 232 survived to the end of the study.

In practice, residents join a retirement community at various physiological ages. According to the Channing House data, the youngest entry age is 61. Thus, for modeling purposes, it will be assumed that the aging process starts at calendar age 50 for all residents. Under that setting, residents are expected to have variable physiological ages at the time of entering the study. As well, letting

m = 20

ought to be more than adequate.

Unlike what was assumed in Section 5, an underlying model does not exist. In that case, the prior distributions are surmised to be as follows:

\begin{matrix} π_{H_{1}} (h_{1}) & = Gamma (v_{h_{1}} = 0.002, ξ_{h_{1}} = 2), \\ π_{H_{m}} (h_{m}) & = Gamma (v_{h_{m}} = 12.5, ξ_{h_{m}} = 5), \\ π_{S} (s) & = e^{s}, s < 0, \\ π_{Λ} (λ) & = Gamma (v_{λ} = 1.5, ξ_{λ} = 5) . \end{matrix}

The priors are deliberately chosen in such a way that the model with parameters taken as the prior mean is far away from the Kaplan–Meier survival function estimates, as plotted in Figure 10. The purpose of proceeding this way is to more persuasively show that the proposed Bayesian approach is valid. In practice, of course, one should select the priors in such a way that the model with parameters taken as the prior mean is as close to the Kaplan–Meier survival function estimates as possible.

Using the proposed Bayesian approach, the parameter estimation results are displayed in Table 3.

In Figure 10, we illustrate the goodness of fit of the PTAM to the Channing House female data by plotting the fitted survival function along with the nonparametric Kaplan–Meier survival function estimates. In addition, for comparison purposes, we also plotted the model with parameters taken as the prior mean, the fitted model using the MLE method, and the fitted model obtained in [1]. It can be observed that the PTAM fits the Channing House female data very well, as the associated fitted survival function stays within the 95% confidence limits of the Kaplan–Meier estimates. The significant difference between the fitted model and the model with parameters taken as the prior mean, as mentioned earlier, very convincingly validates the proposed Bayesian approach. This difference clearly shows that the prior distributions are actually updated to the corresponding posterior distributions for the Channing House female data.

Furthermore, the fitted models with

m = 20

, whether they are estimated based on the MLEs or the proposed Bayesian method, are in very close agreement with the fitted model in [1], where

m = 100

. In fact, the fitted model with

m = 20

fits the data even better for ages between 91 and 101.

7. Concluding Remarks

An MCMC algorithm for Bayesian inference on the PTAM was proposed. Two contributions were made on the basis of existing MCMC algorithms for Bayesian inference on continuous phase-type distributions. First, a sampling scheme was proposed for posterior sampling after data augmentation. Secondly, an existing data augmentation technique was further developed to incorporate left-truncated data. In the simulation study, the proposed approach was applied to a ten-state PTAM. The results showed that with sound prior information, the proposed approach indeed improved parameter estimability by producing narrower credible intervals that captured the true values. Then, the results were also applied to calibrate the PTAM to a mortality data set collected from a retirement community, which produced reasonable results that are comparable to those obtained in previous work. All in all, while numerical results indicate that the proposed methodology improves parameter estimability for the PTAM as opposed to the MLE method, it may also be utilized as a stand-alone model-fitting technique.

Author Contributions

Conceptualization, X.L.; methodology, C.N.; software, C.N.; validation, C.N.; formal analysis, C.N.; investigation, C.N.; resources, X.L.; data curation, C.N.; writing—original draft preparation, C.N.; writing—review and editing, X.L., C.N., S.P. and J.R.; visualization, C.N.; supervision, X.L., S.P. and J.R.; project administration, X.L., S.P. and J.R.; funding acquisition, X.L. and S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada (grant number R0610A01).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The financial support of the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged. We would like to express our sincere thanks to both reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Rejection Sampling on a Logarithmic Scale

We briefly recall the rejection sampling method, which enables one to sample from a given continuous p.d.f.

p (θ)

where

θ \in (a, b)

, given that

f (θ) \propto p (θ)

.

Algorithm A1 The rejection sampling algorithm with a uniform proposal distribution.

1:: Calculate the global maximum of $f (θ)$ on $(a, b)$ . Define it as w.
2:: Draw a pair of uniformly distributed samples $(θ, y)$ ; $Θ \sim Uniform (a, b)$ and $Y \sim Uniform (0, w)$ .
3:: while $f (θ) \leq y$ do
4:: repeat Step 2
5:: end while
6:: Take $θ$ as the sample.

Note that Algorithm A1 utilizes a uniform distribution of

Θ

as the proposal distribution, with p.d.f. defined as

v (θ) = \frac{1}{b - a}

. Then, as required for implementing rejection sampling, a constant

c = \frac{w}{(b - a)}

is selected so that

c v (θ) = w \geq f (θ), \forall θ \in (a, b)

. According to the theory on rejection sampling, the proposal distribution

v (θ)

does not have to be uniform, as long as the requirement

c v (θ) \geq f (θ)

is satisfied. In this study, it is taken as the uniform distribution, which simplifies the process.

However, when f is a posterior kernel, its value is likely to be small by making use of the likelihood function. In fact, in the simulation study on the PTAM, its value is so small that it outputs a value of zero in R. Thus, in order to carry out the rejection sampling scheme, we have to transform the posterior kernel to a logarithmic scale. In other words, instead of comparing

f (θ)

and y, we compare

ln f (θ)

and

ln f (y)

; in which case, the distribution of

ln (Y)

has to be determined.

For

x \in (- \infty, ln (w))

, we have the following:

\begin{matrix} P (ln (Y) \leq x) & = P (Y \leq e^{x}) \\ = \frac{e^{x}}{w} . \end{matrix}

(A1)

Upon inverting the c.d.f., we may achieve the sampling of

ln (Y)

via the following relationship:

\begin{matrix} ln (Y) = ln (w) + ln (U), \end{matrix}

(A2)

where

U \sim Uniform (0, 1)

.

Therefore, one may sample on a logarithmic scale, as in Algorithm A2. This allows one to work with

ln (w)

, when w is so small that it outputs a value of zero in R.

Algorithm A2 Algorithm A1 on a logarithmic scale.

1:: Calculate the global maximum of $ln f (θ)$ on $(a, b)$ . Define it as w.
2:: Draw a pair of samples $(θ, ln (y))$ ; $Θ \sim Uniform (a, b)$ and $ln (Y) = ln (U) + w$ , where $U \sim Uniform (0, 1)$ .
3:: while $ln f (θ) \leq ln (y)$ do
4:: repeat Step 2
5:: end while
6:: Take $θ$ as the sample.

Algorithms 3–5 are then direct applications of Algorithm A2.

Appendix B. Data Augmentation with Left-Truncated Data

Appendix B.1. Case 1

To begin with, consider a sample path of the underlying Markov process of the PTAM presented in Table A1, where

m > 5

.

Table A1. A PTAM sample path generated from data augmentation.

state	1	2	3	4	5	$m + 1$
sojourn time	$t_{1}$	$t_{2}$	$t_{3}$	$t_{4}$	$t_{5}$	0

The likelihood function of this sample path is then as follows:

\begin{matrix} L (h_{1}, h_{m}, s, λ; x, y) = \prod_{i = 1}^{4} (λ e^{- (λ + h_{i}) t_{i}}) h_{5} e^{- (λ + h_{5}) t_{5}}, \end{matrix}

(A3)

as the inter-arrival time of a Markov process is exponentially distributed.

Now, without any loss of generality, suppose that the individual enters the study at time d where

t_{1} < d < t_{1} + t_{2}

. In that case, we only know that the individual is alive at time d with a physiological age in state 2. The data are then left-truncated at time d. According to the proposed likelihood function specified in (13), the likelihood function for this left-truncated data is as follows:

\begin{matrix} L (h_{1}, h_{m}, s, λ; x, y) & = \frac{\prod_{i = 1}^{4} (λ e^{- (λ + h_{i}) t_{i}}) h_{5} e^{- (λ + h_{5}) t_{5}}}{λ e^{- (λ + h_{1}) t_{1}} e^{- (λ + h_{2}) (d - t_{1})}} \\ = λ e^{- (λ + h_{2}) (t_{1} + t_{2} - d)} \prod_{i = 3}^{4} (λ e^{- (λ + h_{i}) t_{i}}) h_{5} e^{- (λ + h_{5}) t_{5}} . \end{matrix}

(A4)

Appendix B.2. Case 2

When the individual enters the study at the last physiological age, the likelihood function will be slightly different. To verify this, consider another case where the simulated sample path is that presented in Table A2.

Table A2. A PTAM sample path generated from data augmentation.

state	1	2	3	4	5	…	m	$m + 1$
sojourn time	$t_{1}$	$t_{2}$	$t_{3}$	$t_{4}$	$t_{5}$	…	$t_{m}$	0

Accordingly, we have

\sum_{i = 1}^{m - 1} t_{i} < d < \sum_{i = 1}^{m} t_{i}

. In that case, we only know that the individual is alive at time d with a chonological age in state m (the last state). The likelihood function is then as follows:

\begin{matrix} L (h_{1}, h_{m}, s, λ; x, y) & = \frac{\prod_{i = 1}^{m - 1} (λ e^{- (λ + h_{i}) t_{i}}) h_{m} e^{- h_{m} t_{m}}}{\prod_{i = 1}^{m - 1} (λ e^{- (λ + h_{i}) t_{i}}) e^{- h_{m} (d - \sum_{i = 1}^{m - 1} t_{i})}} \\ = h_{m} e^{- h_{m} (\sum_{i = 1}^{m} t_{i} - d)} . \end{matrix}

(A5)

Clearly, what makes the two cases different are the rates in the exponents. For the previous

m - 1

states, the rates include

λ

; however, this is not the case for the last state, which is attributable to the definition of the PTAM. Thus, in order to construct the likelihood function for left-truncated data, one needs to consider these two cases separately, which explains why set

A

must be defined.

Appendix B.3. Derivation of the Likelihood Function for the Case of Left-Truncated Data

Now, let us finally consider various sample paths corresponding to M individuals. To consider the two cases separately, let

A

be the set of indices of sample paths in the second case. Let

m^{(i)}

be the state right before absorption for the ith individual. Moreover, let

n^{(i)}

be such that

\sum_{j = 1}^{n^{(i)}} t_{j}^{(i)} < d_{i} < \sum_{j = 1}^{n^{(i)} + 1} t_{j}^{(i)}

. Then, the likelihood function is as follows:

\begin{matrix} L (λ, h_{1}, h_{m}, s; x, y) \\ = \prod_{i \in A} (λ e^{- (λ + h_{n^{(i)} + 1}) (\sum_{j = 1}^{n^{(i)} + 1} t_{j}^{(i)} - d_{i})} \prod_{j = n^{(i)} + 2}^{m^{(i)} - 1} (λ e^{- (λ + h_{j}) t_{j}^{(i)}}) h_{m^{(i)}} e^{- (λ + h_{m^{(i)}}) t_{m^{(i)}}^{(i)}}) \\ \times \prod_{i \notin A} (h_{m} e^{- h_{m} (\sum_{j = 1}^{m} t_{j}^{(i)} - d_{i})}) \\ = (λ^{\sum_{i \in A} (m^{(i)} - n^{(i)} - 1)}) (e^{- λ \sum_{i \in A} \sum_{j = 1}^{m^{(i)}} t_{j}^{(i)}}) (\prod_{i = 1}^{M} h_{m^{(i)}}) \\ \times (\prod_{i \in A} e^{- h_{n^{(i)} + 1} (\sum_{j = 1}^{n^{(i)} + 1} t_{j}^{(i)} - d_{i}) - \sum_{j = n^{(i)} + 2}^{m^{(i)}} h_{j} t_{j}^{(i)}}) (\prod_{i \notin A} e^{- h_{m} (\sum_{j = 1}^{m} t_{j}^{(i)} - d_{i})}) \\ \times (\prod_{i \in A} e^{λ d_{i}}) . \end{matrix}

(A6)

According to the definitions of

Q_{i j}

,

Z_{i}^{A}

, and

M_{i}

, as well as those of

N_{i j}

and

Z_{i}

given in [6], the following is obtained:

\begin{matrix} \sum_{i \in A} (m^{(i)} - n^{(i)} - 1) & = : \sum_{i = 1}^{m - 1} (N_{i, i + 1} - Q_{i, i + 1}), \end{matrix}

(A7)

\begin{matrix} \sum_{i \in A} \sum_{j = 1}^{m^{(i)}} t_{j}^{(i)} & = : \sum_{i = 1}^{m - 1} Z_{i}^{A}, \end{matrix}

(A8)

\begin{matrix} \prod_{i = 1}^{M} h_{m^{(i)}} & = : \prod_{i = 1}^{m} h_{i}^{N_{i, m + 1}}, \end{matrix}

(A9)

and

\begin{matrix} (\prod_{i \in A} e^{- h_{n^{(i)} + 1} (\sum_{j = 1}^{n^{(i)} + 1} t_{j}^{(i)} - d_{i}) - \sum_{j = n^{(i)} + 2}^{m^{(i)}} h_{j} t_{j}^{(i)}}) (\prod_{i \notin A} e^{- h_{m} (\sum_{j = 1}^{m} t_{j}^{(i)} - d_{i})}) = : \prod_{i = 1}^{m} e^{- h_{i} G_{i}} . \end{matrix}

(A10)

Substituting (A7)–(A10) into (A6) yields the final representation given in (13).

References

Cheng, B.; Jones, B.; Liu, X.; Ren, J. The mathematical mechanism of biological aging. N. Am. Actuar. J. 2021, 25, 73–93. [Google Scholar] [CrossRef]
Cheng, B. A Class of Phase-Type Aging Models and their Lifetime Distributions. Ph.D. Thesis, Western University, London, ON, Canada, 2021. [Google Scholar]
Raue, A.; Kreutz, C.; Maiwald, T.; Bachmann, J.; Schilling, M.; Klingmüller, U.; Timmer, J. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 2009, 25, 1923–1929. [Google Scholar] [CrossRef] [PubMed]
Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 1993, 80, 27–38. [Google Scholar] [CrossRef]
Tanner, M.A.; Wong, W.H. From EM to data augmentation: The emergence of MCMC Bayesian computation in the 1980s. Stat. Sci. 2010, 25, 506–516. [Google Scholar] [CrossRef]
Asmussen, S.; Nerman, O.; Olsson, M. Fitting phase-type distributions via the EM algorithm. Scand. J. Stat. 1996, 23, 419–441. [Google Scholar]
Bladt, M.; Gonzalez, A.; Lauritzen, S.L. The estimation of phase-type related functionals using Markov chain Monte Carlo methods. Scand. Actuar. J. 2003, 4, 280–300. [Google Scholar] [CrossRef]
Aslett, L.J.; Wilson, S.P. Markov chain Monte Carlo for inference on phase-type models. In Proceedings of the 2011 International Statistical Institute World Statistics Congress (ISI WSC), Dublin, Ireland, 21–26 August 2011; Volume 120. [Google Scholar]
Watanabe, R.; Okamura, H.; Dohi, T. An efficient MCMC algorithm for continuous PH distributions. In Proceedings of the 2012 Winter Simulation Conference (WSC), Berlin, Germany, 9–12 December 2012; pp. 1–12. [Google Scholar]
Olsson, M. Estimation of phase-type distributions from censored data. Scand. J. Stat. 1996, 23, 443–460. [Google Scholar]
Lin, X.S.; Liu, X. Markov aging process and phase-type law of mortality. N. Am. Actuar. J. 2007, 11, 92–109. [Google Scholar] [CrossRef]
Su, S.; Sherris, M. Heterogeneity of Australian population mortality and implications for a viable life annuity market. Insur. Math. Econ. 2012, 51, 322–332. [Google Scholar] [CrossRef]
Aalen, O.O. Phase-type distributions in survival analysis. Scand. J. Stat. 1995, 22, 447–463. [Google Scholar]
Box, G.E.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. Ser. B (Methodol.) 1964, 26, 211–243. [Google Scholar] [CrossRef]
Brooks, S.; Gelman, A.; Jones, G.; Meng, X.L. Handbook of Markov Chain Monte Carlo; CRC Press: New York, NY, USA, 2011. [Google Scholar]
Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092. [Google Scholar] [CrossRef]
Geman, S.; Geman, D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, PAMI-6, 721–741. [Google Scholar] [CrossRef] [PubMed]
Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 1987, 82, 528–540. [Google Scholar] [CrossRef]
Lynch, S.M. Introduction to Applied Bayesian Statistics and Estimation for Social Scientists; Springer Science & Business Media: Berlin/Heidelberg, Germany; New York, NY, USA, 2007. [Google Scholar]
Okamura, H.; Watanabe, R.; Dohi, T. Variational Bayes for phase-type distribution. Commun. Stat.-Simul. Comput. 2014, 43, 2031–2044. [Google Scholar] [CrossRef]
Kolassa, J.E. Confidence intervals for parameters lying in a random polygon. Can. J. Stat. 1999, 27, 149–161. [Google Scholar] [CrossRef]
Hyde, J. Testing survival with incomplete observations. In Biostatistics Casebook; John Wiley: Hoboken, NJ, USA, 1980; pp. 31–46. [Google Scholar]

Figure 1. Phase diagram for a Coxian distribution.

Figure 2. Phase diagram for the PTAM.

Figure 3. Behaviour of the exit rate vector for various values of s with

m = 100

.

Figure 3. Behaviour of the exit rate vector for various values of s with

m = 100

.

Figure 4. Iterative framework of the data augmentation Gibbs sampler.

Figure 5. The MCMC algorithm framework for the proposed methodology.

Figure 6. Posterior distributions and parameter correlations obtained from the MCMC samples.

Figure 7. Diagnostic plots of the MCMC samples.

Figure 8. (Left) panel: parameter estimates and 95% credible intervals for falsely informative priors. (Right) panel: enlarged plot for

h_{1}

.

Figure 8. (Left) panel: parameter estimates and 95% credible intervals for falsely informative priors. (Right) panel: enlarged plot for

h_{1}

.

Figure 9. (Left) panel: parameter estimates and 95% credible intervals for non-informative priors. (Right) panel: enlarged plot for

h_{1}

.

Figure 9. (Left) panel: parameter estimates and 95% credible intervals for non-informative priors. (Right) panel: enlarged plot for

h_{1}

.

Figure 10. Survival functions of the PTAM calibrated to the Channing House female data using maximum likelihood estimates and the proposed Bayesian approach, the calibrated survival function with parameters taken as the prior mean, the calibrated survival function obtained in [1], and the Kaplan–Meier estimates of the survival function and corresponding 95% confidence limits.

Table 1. Posterior means and 95% credible intervals obtained from the MCMC algorithm and the true parameters.

Parameter	True	Posterior Mean	95% Credible Interval
$h_{1}$	0.00080	0.001326039	(0.00006395895, 0.00353304747)
$h_{m}$	1.65349	1.770213467	(1.230238, 2.478763)
s	$- 0.11118$	$- 0.085397339$	( $- 0.347967221, - 0.001345516$ )
$λ$	1.99908	1.977085034	(1.698050, 2.258719)

Table 2. The 95% credible intervals obtained from falsely informative and non-informative priors, MLEs, and true parameters.

Parameter	True	MLE	Falsely Informative Priors	Non-Informative Priors
$h_{1}$	0.00080	0.001210155	(0.01122067, 0.02576926)	(0.0006370074, 0.0514890336)
$h_{m}$	1.65349	0.957246758	(3.936409, 6.170863)	(0.5202541, 21.9348132)
s	$- 0.11118$	$- 1.989832312$	$(- 4.8561343, - 0.5556046)$	$(- 49.4620012, - 0.7391115)$
$λ$	1.99908	2.514277429	(1.749293, 2.170121)	(1.819338, 3.221812)

Table 3. Posterior means and 95% credible intervals obtained from the MCMC algorithm for the Channing House female data.

Parameter	Posterior Mean	95% Credible Interval
$h_{1}$	0.0045658	(0.00006130736, 0.00923589434)
$h_{m}$	2.475408	(1.459456, 3.422970)
s	$- 1.085645$	$(- 1.8089289, - 0.1331294)$
$λ$	0.4906715	(0.4353424, 0.5284059)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Nie, C.; Liu, X.; Provost, S.; Ren, J. A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model. Stats 2025, 8, 77. https://doi.org/10.3390/stats8030077

AMA Style

Nie C, Liu X, Provost S, Ren J. A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model. Stats. 2025; 8(3):77. https://doi.org/10.3390/stats8030077

Chicago/Turabian Style

Nie, Cong, Xiaoming Liu, Serge Provost, and Jiandong Ren. 2025. "A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model" Stats 8, no. 3: 77. https://doi.org/10.3390/stats8030077

APA Style

Nie, C., Liu, X., Provost, S., & Ren, J. (2025). A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model. Stats, 8(3), 77. https://doi.org/10.3390/stats8030077

Article Menu

A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model

Abstract

1. Introduction

2. The Phase-Type Aging Model

2.1. Preliminaries

2.2. The Phase-Type Aging Model

3. MCMC-Based Bayesian Inference for the CPH Distributions

3.1. The Data Augmentation Gibbs Sampler

3.2. The Data Augmentation Step—Sampling from $p (x | θ, y)$

3.3. The Posterior Sampling Step—Sampling from $p (θ | x, y)$

4. MCMC-Based Bayesian Inference for the PTAM

4.1. Likelihood Function of the PTAM with Left-Truncated Data

4.2. Characteristics of the Posterior Distribution of the PTAM

4.3. The Proposed Methodology for Sampling from $p (h_{1}, h_{m}, s | x, y)$

4.4. The MCMC Algorithm for the PTAM

5. Simulation Study

Prior Sensitivity Analysis

6. Data Analysis

7. Concluding Remarks

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Rejection Sampling on a Logarithmic Scale

Appendix B. Data Augmentation with Left-Truncated Data

Appendix B.1. Case 1

Appendix B.2. Case 2

Appendix B.3. Derivation of the Likelihood Function for the Case of Left-Truncated Data

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

A Markov Chain Monte Carlo Procedure for Efficient Bayesian Inference on the Phase-Type Aging Model

Abstract

1. Introduction

2. The Phase-Type Aging Model

2.1. Preliminaries

2.2. The Phase-Type Aging Model

3. MCMC-Based Bayesian Inference for the CPH Distributions

3.1. The Data Augmentation Gibbs Sampler

3.2. The Data Augmentation Step—Sampling from p ( x | θ , y )

3.3. The Posterior Sampling Step—Sampling from p ( θ | x , y )

4. MCMC-Based Bayesian Inference for the PTAM

4.1. Likelihood Function of the PTAM with Left-Truncated Data

4.2. Characteristics of the Posterior Distribution of the PTAM

4.3. The Proposed Methodology for Sampling from p ( h 1 , h m , s | x , y )

4.4. The MCMC Algorithm for the PTAM

5. Simulation Study

Prior Sensitivity Analysis

6. Data Analysis

7. Concluding Remarks

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Rejection Sampling on a Logarithmic Scale

Appendix B. Data Augmentation with Left-Truncated Data

Appendix B.1. Case 1

Appendix B.2. Case 2

Appendix B.3. Derivation of the Likelihood Function for the Case of Left-Truncated Data

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.2. The Data Augmentation Step—Sampling from $p (x | θ, y)$

3.3. The Posterior Sampling Step—Sampling from $p (θ | x, y)$

4.3. The Proposed Methodology for Sampling from $p (h_{1}, h_{m}, s | x, y)$