Article

Bayesian Computational Methods for Sampling from the Posterior Distribution of a Bivariate Survival Model, Based on AMH Copula in the Presence of Right-Censored Data

by Erlandson Ferreira Saraiva 1, Adriano Kamimura Suzuki 2,* and Luis Aparecido Milan 3
1 Instituto de Matemática, Universidade Federal de Mato Grosso do Sul, Campo Grande 79070-900, Brazil
2 Departamento de Matemática Aplicada e Estatística, Universidade de São Paulo, São Carlos 13566-590, Brazil
3 Departamento de Estatística, Universidade Federal de São Carlos, São Carlos 13565-905, Brazil
* Author to whom correspondence should be addressed.
Entropy 2018, 20(9), 642; https://doi.org/10.3390/e20090642
Submission received: 27 June 2018 / Revised: 20 August 2018 / Accepted: 23 August 2018 / Published: 27 August 2018
(This article belongs to the Special Issue Foundations of Statistics)

Abstract

In this paper, we study the performance of Bayesian computational methods for estimating the parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with marginal distributions given by Weibull distributions. The estimation procedure was based on Markov chain Monte Carlo (MCMC) algorithms. We present three versions of the Metropolis–Hastings algorithm: Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings with a natural candidate-generating density (MH). Since the construction of a good candidate-generating density in IMH and RWM may be difficult, we also describe how to update a parameter of interest using the slice sampling (SS) method. A simulation study was carried out to compare the performances of the IMH, RWM and SS, using the sample root mean square error as an indicator of performance. Results obtained from the simulations show that the SS algorithm is an effective alternative to the IMH and RWM methods for simulating values from the posterior distribution, especially for small sample sizes. We also applied these methods to a real data set.

1. Introduction

In survival studies, it is common to observe two or more lifetimes for the same client, patient or equipment. For instance, in a bivariate scenario, the lifetimes of a pair of organs can be observed, such as a pair of kidneys, liver, or eyes in patients; or the lifetimes of engines in a twin-engine airplane.
These variables are usually correlated and we are interested in the bivariate model that considers the dependence between them. The copula model is useful for modeling this kind of bivariate data. It has been used in several articles, including the following: [1] describes a comparison between bivariate frailty models, and models based on bivariate exponential and Weibull distributions; [2] proposes a copula model to study the association between survival time of individuals infected with HIV and persistence time of infection; [3] models the association of bivariate failure times by copula functions, and investigates two-stage parametric and semi-parametric procedures; and [4] considers a Gaussian copula model and estimates the copula association parameter using a two-stage estimation procedure.
According to [5,6], a copula is a joint distribution function of random variables for which the marginal probability distribution of each variable is uniformly distributed on the interval $[0, 1]$.
There are many parametric copula families in the literature, each one representing a different dependence structure between the random variables. One advantage of a copula model is its simplicity when applied to model bivariate data. This is explored by many authors in survival analysis. Among them are: Romeo et al. [7] and da Cruz et al. [8], who considered the Archimedean copula family; Louzada et al. [9] and Suzuki et al. [10], who considered the Farlie–Gumbel–Morgenstern (FGM) copula; and Romeo et al. [11], who considered the two-parameter Archimedean family of power variance function (PVF) copulas.
In this paper, we apply the Ali–Mikhail–Haq (AMH) copula to model bivariate survival data with random right-censored observations. From a practical point of view, the main reason for using the AMH copula is that it is an Archimedean copula that allows both positive and negative values of the dependence parameter, and whose mathematical formula is simpler than those of other Archimedean copulas. Another advantage is that, assuming the AMH copula, the Kendall rank-order correlation $\tau$ between the bivariate lifetimes is a monotonic function of the dependence parameter $\phi$. According to [12], Kendall's $\tau$ can range approximately from $-0.18$ to $0.33$, with $\tau = 0$ when $\phi = 0$; and the Spearman's $\rho$ associated with $\phi$ can range approximately from $-0.2711$ to $0.4784$, indicating that the AMH copula is adequate for modeling bivariate data with weak correlation.
In order to proceed with the copula model it is necessary to specify the marginal distributions. At this point, several probability distributions could be considered. Generally, the choice for marginal distributions depends on the application. We restrict our analysis to the case where the marginal distributions are Weibull distributions. This is because it is a very flexible distribution for the modeling of various types of lifetime data. In addition, the parametrization of the Weibull distribution—as well as the mathematical expression of the AMH copula—is very attractive from the mathematical point of view, allowing the development of a Bayesian approach to estimate the parameters of interest in a clear and concise way.
As the conditional posterior distributions of the parameters of interest do not follow any familiar distribution, the estimation procedure was carried out using versions of the Metropolis–Hastings algorithm, referred to here as Independent Metropolis–Hastings (IMH), Random Walk Metropolis (RWM) and Metropolis–Hastings (MH). MH refers to the Metropolis–Hastings algorithm with a natural candidate-generating density whose parameters depend on the hyperparameter values and the observed data. Since the construction of a good candidate-generating density in IMH and RWM can be difficult, we also used the slice sampling (SS) algorithm [13].
Combining IMH, RWM, MH and SS in different ways, we developed three MCMC algorithms to estimate the model parameters. A simulation study was carried out with the objective of investigating the behavior of each algorithm. The data sets were generated by considering different sample sizes and percentages of right-censored observations. Based on the root mean square error (RMSE), we identified the algorithms with the best performance when estimating the model parameters. We also compared the performances of the three algorithms using the effective sample size and the integrated autocorrelation time [14]. Results obtained from these simulations show that the algorithm based on SS is an effective alternative to the standard MCMC methods (IMH and RWM) for simulating values from the posterior distribution, especially when the sample size is small.
We applied the three proposed algorithms to a real data set related to diabetic retinopathy, described in The Diabetic Retinopathy Study Research Group [15] and available in the survival package [16] of the R software [17]. For this case, we compared the performances of the algorithms based on the RMSE relative to the empirical distribution function obtained from Kaplan–Meier estimates.
The remainder of the paper is organized as follows. In Section 2, we introduce the bivariate survival model based on the AMH copula with Weibull marginal distributions. The Bayesian approach and the three MCMC algorithms are described in Section 3. In Section 4, the simulation study is reported. In Section 5 we apply the three algorithms to the real data set. Section 6 summarizes our findings.

2. Bivariate Survival Model and Observed Data

Let $(T_1, T_2)$ be the vector of bivariate lifetimes of an item (or an individual) with marginal density functions $(f(t_1|\theta_1), f(t_2|\theta_2))$ and survival functions $(S(t_1|\theta_1), S(t_2|\theta_2))$, where $\theta_1$ and $\theta_2$ are unknown parameters (scalars or vectors).
Consider that $(T_1, T_2)$ comes from the copula $\tilde{C}_\phi$, where $\phi$ is a parameter capturing the dependence between $T_1$ and $T_2$. Then the joint survival function for $(T_1, T_2)$ is given by
$$ S(t_1, t_2|\boldsymbol{\theta}, \phi) = \tilde{C}_\phi\big( S_1(t_1|\theta_1),\, S_2(t_2|\theta_2) \big), $$
where $\boldsymbol{\theta} = (\theta_1, \theta_2)$ and $\phi$ is a dependence parameter.
We also assume that the copula $\tilde{C}_\phi$ is the Ali–Mikhail–Haq copula [18]. Thus, we have
$$ S(t_1, t_2|\boldsymbol{\theta}, \phi) = \tilde{C}_\phi\big( S_1(t_1|\theta_1),\, S_2(t_2|\theta_2) \big) = \frac{S_1(t_1|\theta_1)\, S_2(t_2|\theta_2)}{1 - \phi \big[1 - S_1(t_1|\theta_1)\big]\big[1 - S_2(t_2|\theta_2)\big]}, \tag{1} $$
for $\phi \in [-1, 1)$. Note that under this assumption the survival functions and the dependence structure can be considered separately, with the dependence structure represented by the copula.
Let $(T_{11}, T_{12}), \ldots, (T_{n1}, T_{n2})$ and $(C_{11}, C_{12}), \ldots, (C_{n1}, C_{n2})$ be a sample of size $n$ of bivariate lifetimes and bivariate censoring times, respectively. Suppose $(T_{i1}, T_{i2})$ and $(C_{i1}, C_{i2})$ are independent, for $i = 1, \ldots, n$. Consider $t_{ij} = \min(T_{ij}, C_{ij})$, the $i$-th observed value, and $\delta_{ij}$, a censorship indicator given by
$$ \delta_{ij} = \begin{cases} 1, & \text{if the lifetime is uncensored, i.e., } T_{ij} = t_{ij}; \\ 0, & \text{if the lifetime is censored, i.e., } T_{ij} > t_{ij}, \end{cases} $$
for $j = 1, 2$ and $i = 1, \ldots, n$. We denote the observed values by $\mathbf{t} = (\mathbf{t}_1, \mathbf{t}_2)$ and $\boldsymbol{\delta} = (\boldsymbol{\delta}_1, \boldsymbol{\delta}_2)$, where $\mathbf{t}_1 = (t_{11}, \ldots, t_{n1})$, $\mathbf{t}_2 = (t_{12}, \ldots, t_{n2})$, $\boldsymbol{\delta}_1 = (\delta_{11}, \ldots, \delta_{n1})$ and $\boldsymbol{\delta}_2 = (\delta_{12}, \ldots, \delta_{n2})$.
The likelihood function for $(\boldsymbol{\theta}, \phi)$, given $(\mathbf{t}, \boldsymbol{\delta})$, is (see Lawless [19])
$$ L(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) = \prod_{i=1}^{n} f(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)^{\delta_{i1}\delta_{i2}} \left[ -\frac{\partial S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i1}} \right]^{\delta_{i1}(1-\delta_{i2})} \left[ -\frac{\partial S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i2}} \right]^{(1-\delta_{i1})\delta_{i2}} S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)^{(1-\delta_{i1})(1-\delta_{i2})}, \tag{2} $$
where $f(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi) = \frac{\partial^2 S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i1} \partial t_{i2}}$ is the joint probability density function for $(t_{i1}, t_{i2})$ and $S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)$ is the joint survival function given by (1), for $i = 1, \ldots, n$.
From Equation (1), we have
$$ \frac{\partial^2 S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i1} \partial t_{i2}} = f_1(t_{i1}|\theta_1) f_2(t_{i2}|\theta_2) \frac{(1+\phi)\big(1 + \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big) - 2\phi\big(F_1(t_{i1}|\theta_1) + F_2(t_{i2}|\theta_2)\big)}{\big[1 - \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big]^3}, $$
$$ -\frac{\partial S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i1}} = \frac{f_1(t_{i1}|\theta_1)\, S_2(t_{i2}|\theta_2)\big[1 - \phi F_2(t_{i2}|\theta_2)\big]}{\big[1 - \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big]^2}, $$
$$ -\frac{\partial S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi)}{\partial t_{i2}} = \frac{f_2(t_{i2}|\theta_2)\, S_1(t_{i1}|\theta_1)\big[1 - \phi F_1(t_{i1}|\theta_1)\big]}{\big[1 - \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big]^2}, $$
where $F_j(t_{ij}|\theta_j) = 1 - S_j(t_{ij}|\theta_j)$ is the cumulative distribution function, for $j = 1, 2$ and $i = 1, \ldots, n$.

Weibull Marginal Distribution

Assume that the marginal distributions for $T_1$ and $T_2$ are Weibull distributions [20], i.e.,
$$ T_{i1}|\alpha_1, \beta_1 \sim \mathrm{Weibull}(\alpha_1, \beta_1) \quad \text{and} \quad T_{i2}|\alpha_2, \beta_2 \sim \mathrm{Weibull}(\alpha_2, \beta_2), $$
with shape parameter $\alpha_j$ and scale parameter $\beta_j$ [21], each one having probability density function
$$ f(t_{ij}|\alpha_j, \beta_j) = \beta_j \alpha_j t_{ij}^{\alpha_j - 1} \exp\{-\beta_j t_{ij}^{\alpha_j}\}, $$
for $j = 1, 2$ and $i = 1, \ldots, n$.
The survival function $S_j(t_{ij}|\theta_j)$ and hazard function $h_j(t_{ij}|\theta_j)$ are
$$ S_j(t_{ij}|\theta_j) = \exp\{-\beta_j t_{ij}^{\alpha_j}\} \quad \text{and} \quad h_j(t_{ij}|\theta_j) = \beta_j \alpha_j t_{ij}^{\alpha_j - 1}, $$
respectively, where $\theta_j = (\alpha_j, \beta_j)$, for $j = 1, 2$ and $i = 1, \ldots, n$.
Thus, the joint survival function in (1) is
$$ S(t_{i1}, t_{i2}|\boldsymbol{\theta}, \phi) = \frac{\exp\{-\beta_1 t_{i1}^{\alpha_1}\} \exp\{-\beta_2 t_{i2}^{\alpha_2}\}}{1 - \phi\big(1 - \exp\{-\beta_1 t_{i1}^{\alpha_1}\}\big)\big(1 - \exp\{-\beta_2 t_{i2}^{\alpha_2}\}\big)}, $$
where $\boldsymbol{\theta} = (\theta_1, \theta_2)$. The likelihood function for $(\boldsymbol{\theta}, \phi)$ is
$$ L(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) \propto \prod_{j=1}^{2} \beta_j^{r_j} \alpha_j^{r_j} \exp\left\{ \alpha_j \sum_{i=1}^{n} \delta_{ij}\log(t_{ij}) - \beta_j \sum_{i=1}^{n} t_{ij}^{\alpha_j} \right\} \prod_{i=1}^{n} \Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}), \tag{3} $$
where $r_j = \sum_{i=1}^{n} \delta_{ij}$ is the number of uncensored observations, for $j = 1, 2$; $\Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) = \prod_{k=1}^{4} \Psi_{ik}(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta})$; and
$$ \begin{aligned} \Psi_{i1}(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) &= \Big[ (1+\phi)\big(1 + \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big) - 2\phi\big(F_1(t_{i1}|\theta_1) + F_2(t_{i2}|\theta_2)\big) \Big]^{\delta_{i1}\delta_{i2}}, \\ \Psi_{i2}(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) &= \big[1 - \phi F_2(t_{i2}|\theta_2)\big]^{\delta_{i1}(1-\delta_{i2})}, \\ \Psi_{i3}(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) &= \big[1 - \phi F_1(t_{i1}|\theta_1)\big]^{\delta_{i2}(1-\delta_{i1})}, \\ \Psi_{i4}(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) &= \big[1 - \phi F_1(t_{i1}|\theta_1) F_2(t_{i2}|\theta_2)\big]^{-(\delta_{i1}+\delta_{i2}+1)}, \end{aligned} $$
for $i = 1, \ldots, n$.
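To make the likelihood concrete, the following R sketch evaluates the logarithm of Equation (3) for given parameter values. It is our own illustration, not the authors' code; the function name amh_weibull_loglik and its argument layout are ours.
```r
# Minimal sketch (our own code, not the authors'): log-likelihood of
# Equation (3) for the AMH copula with Weibull margins.
amh_weibull_loglik <- function(alpha, beta, phi, t, delta) {
  # alpha, beta: length-2 vectors of Weibull shape and scale parameters
  # t, delta: n x 2 matrices of observed times and censoring indicators
  F1 <- 1 - exp(-beta[1] * t[, 1]^alpha[1])   # marginal CDFs F_j = 1 - S_j
  F2 <- 1 - exp(-beta[2] * t[, 2]^alpha[2])
  r  <- colSums(delta)                        # numbers of uncensored obs.
  ll <- sum(r * (log(beta) + log(alpha))) +   # Weibull part of Equation (3)
    sum(alpha[1] * delta[, 1] * log(t[, 1]) - beta[1] * t[, 1]^alpha[1]) +
    sum(alpha[2] * delta[, 2] * log(t[, 2]) - beta[2] * t[, 2]^alpha[2])
  d1 <- delta[, 1]; d2 <- delta[, 2]
  # Copula part: the four Psi factors
  psi1 <- ((1 + phi) * (1 + phi * F1 * F2) - 2 * phi * (F1 + F2))^(d1 * d2)
  psi2 <- (1 - phi * F2)^(d1 * (1 - d2))
  psi3 <- (1 - phi * F1)^(d2 * (1 - d1))
  psi4 <- (1 - phi * F1 * F2)^(-(d1 + d2 + 1))
  ll + sum(log(psi1 * psi2 * psi3 * psi4))
}
```
All Metropolis–Hastings ratios used below only require this log-likelihood up to an additive constant, which is why the proportionality in Equation (3) suffices.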

3. Bayesian Approach

In order to develop the Bayesian approach, we need to specify the prior distributions of $\alpha_j$, $\beta_j$ and $\phi$, for $j = 1, 2$. We assume that the priors are independent, i.e., $\pi(\boldsymbol{\theta}, \phi) = \pi(\boldsymbol{\theta})\pi(\phi) = \prod_{j=1}^{2} \pi(\alpha_j)\pi(\beta_j)\pi(\phi)$, and we consider the following prior distributions:
$$ \alpha_j | a_{j1}, a_{j2} \sim \Gamma(a_{j1}, a_{j2}) \quad \text{and} \quad \beta_j | b_{j1}, b_{j2} \sim \Gamma(b_{j1}, b_{j2}), $$
where $\Gamma(\cdot, \cdot)$ denotes the Gamma distribution and $a_{j1}$, $a_{j2}$, $b_{j1}$ and $b_{j2}$ are known positive hyperparameters, for $j = 1, 2$. The parametrization of the Gamma distribution is such that the mean is $a_{j1}/a_{j2}$ and the variance is $a_{j1}/a_{j2}^2$, for $j = 1, 2$. The choice of values for the hyperparameters depends on the application. In the remainder of the article, we set hyperparameter values that give prior distributions with large variances; in particular, we set $a_{j1} = a_{j2} = b_{j1} = b_{j2} = 0.01$, for $j = 1, 2$. For $\phi$ we chose the uniform prior distribution on the interval $(-1, 1)$, $\phi \sim U(-1, 1)$.
Using Bayes' theorem, the joint posterior distribution of $(\boldsymbol{\theta}, \phi)$ is
$$ \pi(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) \propto L(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\boldsymbol{\theta})\, \pi(\phi), $$
where $L(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta})$ is given in Equation (3).
The conditional posterior distributions are
$$ \pi(\alpha_j|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_j}, \phi) \propto \alpha_j^{a_{j1}+r_j-1} \exp\left\{ \alpha_j\left[ \sum_{i=1}^{n} \delta_{ij}\log(t_{ij}) - a_{j2} \right] - \beta_j \sum_{i=1}^{n} t_{ij}^{\alpha_j} \right\} \prod_{i=1}^{n} \Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}), \tag{4} $$
$$ \pi(\beta_j|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\beta_j}, \phi) \propto \beta_j^{b_{j1}+r_j-1} \exp\left\{ -\beta_j\left[ b_{j2} + \sum_{i=1}^{n} t_{ij}^{\alpha_j} \right] \right\} \prod_{i=1}^{n} \Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}) \quad \text{and} \tag{5} $$
$$ \pi(\phi|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}) \propto L(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \mathbb{I}_{(-1,1)}(\phi), \tag{6} $$
where $\boldsymbol{\theta}_{-\nu_j}$, for $\nu_j \in \{\alpha_j, \beta_j\}$, is the vector of parameters $\boldsymbol{\theta}$ without the parameter $\nu_j$, $j = 1, 2$.
The conditional posterior distributions in Equations (4)–(6) are not familiar distributions. Thus, in order to simulate from them, we used the Metropolis–Hastings algorithm. At each iteration, the Metropolis–Hastings algorithm considers a value generated from a proposal distribution, which is accepted according to a properly specified acceptance probability. This procedure guarantees the convergence of the Markov chain to the target density. More details on the Metropolis–Hastings algorithm can be found in [22,23,24,25] and their references.

3.1. MCMC for $\alpha_j$

Without loss of generality, we describe here how to update the parameter $\alpha_1$ conditional on all other parameters, $\boldsymbol{\theta}_{-\alpha_1} = (\beta_1, \alpha_2, \beta_2)$, and $\phi$. The update procedure for $\alpha_2$ is similar.
Let $(\alpha_1, \boldsymbol{\theta}_{-\alpha_1}, \phi)$ be the current state of the Markov chain. Consider $\alpha_1'$, a value generated from a candidate-generating density $q[\alpha_1'|\alpha_1]$. The value $\alpha_1'$ is accepted with probability $\psi(\alpha_1'|\alpha_1) = \min(1, A_{\alpha_1})$, where
$$ A_{\alpha_1} = \frac{L(\alpha_1', \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1')}{L(\alpha_1, \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1)} \frac{q[\alpha_1|\alpha_1']}{q[\alpha_1'|\alpha_1]}, \tag{7} $$
and $L(\cdot|\mathbf{t}, \boldsymbol{\delta})$ is the likelihood function given in Equation (3).
The Metropolis–Hastings algorithm is implemented as follows.
  • Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\big(\alpha_1^{(l-1)}, \boldsymbol{\theta}_{-\alpha_1}^{(l-1)}, \phi^{(l-1)}\big)$, where $l$ indexes the iterations of the algorithm and $\alpha_1^{(l-1)}$, $\boldsymbol{\theta}_{-\alpha_1}^{(l-1)} = \big(\beta_1^{(l-1)}, \alpha_2^{(l-1)}, \beta_2^{(l-1)}\big)$ and $\phi^{(l-1)}$ are the values of $\alpha_1$, $\boldsymbol{\theta}_{-\alpha_1}$ and $\phi$ in the $(l-1)$-th iteration, respectively, for $l = 1, \ldots, L$, with $\alpha_1^{(0)}$, $\boldsymbol{\theta}_{-\alpha_1}^{(0)}$ and $\phi^{(0)}$ being the initial values. At the $l$-th iteration of the algorithm, we update $\alpha_1$ as follows:
    (1) Generate $\alpha_1' \sim q[\alpha_1'|\alpha_1]$;
    (2) Calculate $\psi(\alpha_1'|\alpha_1) = \min(1, A_{\alpha_1})$, where $A_{\alpha_1}$ is given by (7);
    (3) Generate $u \sim U(0, 1)$. If $u \leq \psi(\alpha_1'|\alpha_1)$, accept $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1'$. Otherwise, reject $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.

3.1.1. Two Common Choices for $q[\cdot]$

To implement the Metropolis–Hastings algorithm, the candidate-generating density $q[\alpha_1'|\alpha_1]$ needs to be specified. Generally, one may explore the form of the conditional posterior distribution to set the candidate-generating density. For example, if we can write $\pi(\alpha_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi) \propto \eta(\alpha_1) h(\alpha_1)$, where $h(\alpha_1)$ is a density from which values can easily be generated and $\eta(\alpha_1)$ is uniformly bounded, then we may set $q(\alpha_1'|\alpha_1) = h(\alpha_1')$. However, this is not the case for $\pi(\alpha_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$.
Another option is to generate $\alpha_1'$ from a candidate-generating density that does not depend on the current value $\alpha_1$; that is, we may set $q[\alpha_1'|\alpha_1] = q[\alpha_1']$. This gives a special case of the original MH algorithm, called Independent Metropolis–Hastings (IMH), in which $A_{\alpha_1}$ given in (7) simplifies to
$$ A_{\alpha_1} = \frac{L(\alpha_1', \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1')}{L(\alpha_1, \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1)} \frac{q[\alpha_1]}{q[\alpha_1']}. \tag{8} $$
In order to implement this case, one may set $q[\alpha_1']$ as the prior distribution, i.e., $q[\alpha_1'] = \pi(\alpha_1')$. Then $A_{\alpha_1}$ reduces to the likelihood ratio
$$ A_{\alpha_1} = \frac{L(\alpha_1', \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})}{L(\alpha_1, \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})}. $$
This algorithm is implemented as follows.
  • Independent Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\big(\alpha_1^{(l-1)}, \boldsymbol{\theta}_{-\alpha_1}^{(l-1)}, \phi^{(l-1)}\big)$. For the $l$-th iteration of the algorithm, do the following:
    (1) Generate $\alpha_1'$ from the prior distribution, $\alpha_1' \sim \Gamma(a_{11}, a_{12})$;
    (2) Calculate $\psi(\alpha_1'|\alpha_1) = \min(1, A_{\alpha_1})$, where $A_{\alpha_1}$ is given by (8);
    (3) Generate $u \sim U(0, 1)$. If $u \leq \psi(\alpha_1'|\alpha_1)$, accept $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1'$. Otherwise, reject $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.
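As an illustration, a hedged R sketch of this IMH step, reusing amh_weibull_loglik from the sketch above (the function name is ours, not the authors'):
```r
# One IMH update of alpha_1, with the Gamma prior as proposal, so that
# the acceptance ratio is the likelihood ratio shown above.
imh_update_alpha1 <- function(alpha, beta, phi, t, delta,
                              a11 = 0.01, a12 = 0.01) {
  prop <- alpha
  prop[1] <- rgamma(1, shape = a11, rate = a12)   # step (1): draw from prior
  logA <- amh_weibull_loglik(prop,  beta, phi, t, delta) -
          amh_weibull_loglik(alpha, beta, phi, t, delta)
  if (log(runif(1)) <= logA) prop else alpha      # steps (2)-(3)
}
```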
Although the choice of the prior distribution as the candidate-generating density may be mathematically attractive, it usually leads to slow convergence of the algorithm. This happens when only vague prior information is available and the prior distribution has a large variance; as a consequence, many of the proposed values are rejected.
An alternative is to explore the neighborhood of the current value of the Markov chain to propose a new value. This method is termed the random walk Metropolis (RWM). In the RWM, the candidate value $\alpha_1'$ is generated from a symmetric density $g(\cdot)$; that is, we set $q[\alpha_1'|\alpha_1] = g(|\alpha_1' - \alpha_1|)$, so the probability of generating a move from $\alpha_1$ to $\alpha_1'$ depends only on the distance between them. In this case, $A_{\alpha_1}$ given in (7) simplifies to
$$ A_{\alpha_1} = \frac{L(\alpha_1', \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1')}{L(\alpha_1, \boldsymbol{\theta}_{-\alpha_1}, \phi|\mathbf{t}, \boldsymbol{\delta})\, \pi(\alpha_1)}, \tag{9} $$
since the proposal kernels in the numerator and denominator cancel.
To implement the RWM, we simulate $\alpha_1'$ by setting $\alpha_1' = \alpha_1 + \varepsilon$, where $\varepsilon$ is a random perturbation generated from a normal distribution with mean 0 and variance $\sigma_{\alpha_1}^2$, $\varepsilon \sim N(0, \sigma_{\alpha_1}^2)$, so that $\alpha_1' \sim N(\alpha_1, \sigma_{\alpha_1}^2)$. This algorithm is implemented as follows.
  • Random Walk Metropolis Algorithm: Let the current state of the Markov chain be $\big(\alpha_1^{(l-1)}, \boldsymbol{\theta}_{-\alpha_1}^{(l-1)}, \phi^{(l-1)}\big)$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$, do the following:
    (1) Generate $\varepsilon \sim N(0, \sigma_{\alpha_1}^2)$ and set $\alpha_1' = \alpha_1^{(l-1)} + \varepsilon$;
    (2) Calculate $\psi(\alpha_1'|\alpha_1) = \min(1, A_{\alpha_1})$, where $A_{\alpha_1}$ is given by (9);
    (3) Generate $u \sim U(0, 1)$. If $u \leq \psi(\alpha_1'|\alpha_1)$, accept $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1'$. Otherwise, reject $\alpha_1'$ and set $\alpha_1^{(l)} = \alpha_1^{(l-1)}$.
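A corresponding hedged R sketch of one RWM step (again our own code, building on amh_weibull_loglik):
```r
# One RWM update of alpha_1; the Gaussian proposal is symmetric, so only
# the likelihood-times-prior ratio of Equation (9) appears.
rwm_update_alpha1 <- function(alpha, beta, phi, t, delta,
                              a11 = 0.01, a12 = 0.01, sd_alpha = 1) {
  prop <- alpha
  prop[1] <- alpha[1] + rnorm(1, 0, sd_alpha)   # alpha_1' = alpha_1 + eps
  if (prop[1] <= 0) return(alpha)               # prior density is zero here
  logA <- amh_weibull_loglik(prop,  beta, phi, t, delta) -
          amh_weibull_loglik(alpha, beta, phi, t, delta) +
          dgamma(prop[1],  shape = a11, rate = a12, log = TRUE) -
          dgamma(alpha[1], shape = a11, rate = a12, log = TRUE)
  if (log(runif(1)) <= logA) prop else alpha
}
```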
An issue in RWM is how to choose the value of $\sigma_{\alpha_1}^2$, which has a strong influence on the efficiency of the algorithm. If $\sigma_{\alpha_1}^2$ is too small, the random perturbations will be small in magnitude and almost all will be accepted; consequently, a large number of iterations will be needed to explore the entire state space. On the other hand, if $\sigma_{\alpha_1}^2$ is too large, many of the proposed values will be rejected, also slowing down convergence. More details on this issue can be found in [23,26,27,28].
Typically, one may fix the value of $\sigma_{\alpha_1}^2$ by testing some values in a few pilot runs and then choosing a value whose acceptance ratio lies between 20% and 30% (see, for example, [24,25]). Following a pilot run, we set $\sigma_{\alpha_1}^2 = 1$.

3.1.2. Slice Sampling Algorithm

An alternative to the IMH and RWM for sampling from a generic distribution is the slice sampling (SS) algorithm. This algorithm is a type of Gibbs sampling based on the simulation of suitably defined uniform random variables. Here we explain the slice sampling algorithm in the context of the simulation of $\alpha_1$; the sampling procedure for $\alpha_2$ is similar. More details about SS can be found in [13].
In SS, an auxiliary variable $U$ is introduced and the joint distribution $\pi(\alpha_1, U|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$ is a uniform distribution over the region $\mathcal{U} = \{(\alpha_1, u) : 0 < u < \kappa(\alpha_1)\}$ below the curve defined by $\kappa(\alpha_1)$. From (4), we have
$$ \kappa(\alpha_1) = \alpha_1^{a_{11}+r_1-1} \exp\left\{ \alpha_1\left[ \sum_{i=1}^{n} \delta_{i1}\log(t_{i1}) - a_{12} \right] - \beta_1 \sum_{i=1}^{n} t_{i1}^{\alpha_1} \right\} \prod_{i=1}^{n} \Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta}). \tag{10} $$
Marginalizing $\pi(\alpha_1, U|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$ over $U$ yields $\pi(\alpha_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$, so sampling from the joint distribution and discarding $U$ is equivalent to sampling from $\pi(\alpha_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$.
As sampling directly from $\pi(\alpha_1, U|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\alpha_1}, \phi)$ is not straightforward, we implemented a Gibbs sampling algorithm in which, at every iteration $l$, we first generate $U^{(l)} \sim U\big(0, \kappa(\alpha_1^{(l-1)})\big)$ and then sample $\alpha_1^{(l)} \sim U(A)$, where $A = \{\alpha_1 : u^{(l)} < \kappa(\alpha_1)\}$. However, as the inverse of $\kappa(\alpha_1)$ cannot be obtained analytically, we adopted the following procedure to update $\alpha_1$:
(i) Let $\lambda = 0.01$ and let $\tilde{A}$ be an empty set.
  (a) For $m = 1, 2, \ldots$: set $\alpha_1^{-(m)} = \alpha_1^{(l-1)} - m\lambda$; if $u^{(l)} < \kappa\big(\alpha_1^{-(m)}\big)$, set $\tilde{A} = \tilde{A} \cup \{\alpha_1^{-(m)}\}$, else break.
  (b) For $m = 1, 2, \ldots$: set $\alpha_1^{+(m)} = \alpha_1^{(l-1)} + m\lambda$; if $u^{(l)} < \kappa\big(\alpha_1^{+(m)}\big)$, set $\tilde{A} = \tilde{A} \cup \{\alpha_1^{+(m)}\}$, else break.
(ii) Generate $\alpha_1^{(l)} \sim U\big(\min(\tilde{A}), \max(\tilde{A})\big)$.
This algorithm is implemented as follows.
  • Slice Sampling Algorithm: Let the current state of the Markov chain be $\big(\alpha_1^{(l-1)}, \boldsymbol{\theta}_{-\alpha_1}^{(l-1)}, \phi^{(l-1)}\big)$ and $u^{(l-1)}$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$:
    (1) Generate $U^{(l)} \sim U\big(0, \kappa(\alpha_1^{(l-1)})\big)$, where $\kappa(\cdot)$ is given by (10);
    (2) Obtain $\tilde{A}$, conditional on $u^{(l)}$;
    (3) Generate $\alpha_1^{(l)} \sim U\big(\min(\tilde{A}), \max(\tilde{A})\big)$.
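A hedged R sketch of this update follows (our own code; kappa is assumed to be a function evaluating Equation (10), e.g., a closure over the data and the current values of the other parameters). This variant also keeps the current point in the bracket; in practice one would evaluate κ on the log scale to avoid numerical underflow.
```r
# One slice-sampling update of alpha_1 with the fixed-step bracket
# search described in items (i)-(ii) above (lambda = 0.01).
slice_update_alpha1 <- function(alpha1, kappa, lambda = 0.01) {
  u  <- runif(1, 0, kappa(alpha1))  # step (1): auxiliary level U^(l)
  lo <- alpha1
  hi <- alpha1
  # step (2): grow the bracket A-tilde in steps of lambda on each side;
  # the guard keeps the shape parameter positive
  while (lo - lambda > 0 && u < kappa(lo - lambda)) lo <- lo - lambda
  while (u < kappa(hi + lambda)) hi <- hi + lambda
  runif(1, lo, hi)                  # step (3): uniform draw on the bracket
}
```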

3.2. MCMC for $\beta_j$ and $\phi$

Note from (5) that the conditional posterior distribution of the scale parameter $\beta_1$, $\pi(\beta_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\beta_1}, \phi)$, is given by the kernel of a Gamma distribution with parameters $b_{11} + r_1$ and $b_{12} + \sum_{i=1}^{n} t_{i1}^{\alpha_1}$, multiplied by $\eta(\beta_1) = \prod_{i=1}^{n} \Psi_i(\boldsymbol{\theta}, \phi|\mathbf{t}, \boldsymbol{\delta})$. In other words, it may be written as $\pi(\beta_1|\mathbf{t}, \boldsymbol{\delta}, \boldsymbol{\theta}_{-\beta_1}, \phi) \propto \eta(\beta_1) h(\beta_1)$, where $h(\beta_1)$ is the density of the Gamma distribution $\Gamma\big(b_{11} + r_1,\, b_{12} + \sum_{i=1}^{n} t_{i1}^{\alpha_1}\big)$ and $\eta(\beta_1)$ is uniformly bounded. Thus, we set the candidate-generating density for $\beta_1$ as $q(\beta_1'|\beta_1) = h(\beta_1')$. The acceptance probability for the generated value $\beta_1'$ is $\psi(\beta_1'|\beta_1) = \min(1, A_{\beta_1})$, where
$$ A_{\beta_1} = \frac{\eta(\beta_1')}{\eta(\beta_1)}. \tag{11} $$
This algorithm is implemented as follows.
  • Metropolis–Hastings Algorithm: Let the current state of the Markov chain be $\big(\beta_1^{(l-1)}, \boldsymbol{\theta}_{-\beta_1}^{(l-1)}, \phi^{(l-1)}\big)$, where $\boldsymbol{\theta}_{-\beta_1}^{(l-1)} = \big(\alpha_1^{(l)}, \alpha_2^{(l-1)}, \beta_2^{(l-1)}\big)$. For the $l$-th iteration of the algorithm, $l = 1, \ldots, L$:
    (1) Generate $\beta_1' \sim \Gamma\big(b_{11} + r_1,\, b_{12} + \sum_{i=1}^{n} t_{i1}^{\alpha_1^{(l)}}\big)$;
    (2) Calculate $\psi(\beta_1'|\beta_1) = \min(1, A_{\beta_1})$, where $A_{\beta_1}$ is given by (11);
    (3) Generate $u \sim U(0, 1)$. If $u \leq \psi(\beta_1'|\beta_1)$, accept $\beta_1'$ and set $\beta_1^{(l)} = \beta_1'$. Otherwise, reject $\beta_1'$ and set $\beta_1^{(l)} = \beta_1^{(l-1)}$.
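In R, this step can be sketched as follows (our own code; eta is assumed to be a function returning the product of the Ψᵢ terms as a function of β₁, with the other parameters held fixed):
```r
# One MH update of beta_1 with the natural Gamma candidate-generating
# density; the acceptance ratio is eta(beta_1')/eta(beta_1), Equation (11).
mh_update_beta1 <- function(beta1, eta, alpha1, t1, r1,
                            b11 = 0.01, b12 = 0.01) {
  prop <- rgamma(1, shape = b11 + r1, rate = b12 + sum(t1^alpha1))
  logA <- log(eta(prop)) - log(eta(beta1))
  if (log(runif(1)) <= logA) prop else beta1
}
```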
The Metropolis–Hastings algorithm for updating $\beta_2$ is similar. To update the dependence parameter $\phi$ conditional on the remaining parameters $\boldsymbol{\theta} = (\alpha_1, \beta_1, \alpha_2, \beta_2)$, we used the following IMH algorithm. Let $G_\phi$ be a grid from $-1$ to $1$ with increments of $0.1$. Consider $[I_a, I_{a+1})$, an interval defined by two adjacent grid values of $G_\phi$, where $a$ is the index of the $a$-th value of the grid, for $a = 1, \ldots, 20$. For example, for $a = 1$ we have the interval $[-1, -0.9)$; for $a = 11$, we have the interval $[0, 0.1)$; and for $a = 20$ we have the interval $[0.9, 1)$. Then generate a candidate value $\phi'$ as follows:
(i) If the current value of $\phi$ is in the interval $[I_1, I_2)$, then generate $\phi'$ from one of the two following uniform distributions:
$$ \phi' \sim \begin{cases} U(I_1, I_2), & \text{with probability } 1/2, \\ U(I_2, I_3), & \text{with probability } 1/2. \end{cases} $$
In practice, we generate an auxiliary variable $u \sim U(0, 1)$; if $u \leq 1/2$, we generate $\phi'$ from $U(I_1, I_2)$; otherwise, we generate $\phi'$ from $U(I_2, I_3)$.
(ii) If the current value of $\phi$ is in $[I_{20}, I_{21})$, then generate $\phi'$ from one of the two following uniform distributions:
$$ \phi' \sim \begin{cases} U(I_{19}, I_{20}), & \text{with probability } 1/2, \\ U(I_{20}, I_{21}), & \text{with probability } 1/2. \end{cases} $$
Similarly to item (i), we generate $u \sim U(0, 1)$; if $u \leq 1/2$, then $\phi' \sim U(I_{20}, I_{21})$; otherwise, $\phi' \sim U(I_{19}, I_{20})$.
(iii) If the current value of $\phi$ is in the interval $[I_a, I_{a+1})$, for $a \neq 1$ and $a \neq 20$, then generate $\phi'$ from one of the three following uniform distributions:
$$ \phi' \sim \begin{cases} U(I_{a-1}, I_a), & \text{with probability } 1/3, \\ U(I_a, I_{a+1}), & \text{with probability } 1/3, \\ U(I_{a+1}, I_{a+2}), & \text{with probability } 1/3. \end{cases} $$
In this case, we generate $u \sim U(0, 1)$; if $u \leq 1/3$, we generate $\phi'$ from $U(I_{a-1}, I_a)$; if $1/3 < u \leq 2/3$, we generate $\phi'$ from $U(I_a, I_{a+1})$; and if $u > 2/3$, we generate $\phi'$ from $U(I_{a+1}, I_{a+2})$.
The acceptance probability is given by $\psi[\phi'|\phi] = \min(1, A_\phi)$, where $A_\phi = \frac{L(\phi', \boldsymbol{\theta}|\mathbf{t}, \boldsymbol{\delta})}{L(\phi, \boldsymbol{\theta}|\mathbf{t}, \boldsymbol{\delta})} P_\phi$ and $P_\phi = q[\phi|\phi']/q[\phi'|\phi]$ is the ratio of the proposal probabilities, equal to $1$ when the numbers of candidate intervals for the forward and reverse moves coincide, and to $\frac{1/2}{1/3}$ or $\frac{1/3}{1/2}$ otherwise, according to items (i)–(iii) described above. This algorithm is implemented as follows.
  • IMH Algorithm for $\phi$: Let the current state of the Markov chain be $\big(\boldsymbol{\theta}^{(l)}, \phi^{(l-1)}\big)$. For the $l$-th iteration of the algorithm, $l = 2, \ldots, L$:
    (1) Generate $\phi'$ according to one of the items (i), (ii) or (iii) described above;
    (2) Calculate $\psi(\phi'|\phi) = \min(1, A_\phi)$;
    (3) Generate $u \sim U(0, 1)$. If $u \leq \psi(\phi'|\phi)$, accept $\phi'$ and set $\phi^{(l)} = \phi'$. Otherwise, reject $\phi'$ and set $\phi^{(l)} = \phi^{(l-1)}$.
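A compact, hedged R sketch of this grid-based IMH move (our own code; logL is assumed to be the log-likelihood as a function of ϕ with θ held fixed):
```r
# One grid-based IMH update of phi. The proposal picks one of the 2 or 3
# neighbouring 0.1-wide intervals uniformly at random and then draws
# uniformly inside it; P_phi reduces to the ratio of the mixture weights.
phi_update <- function(phi, logL) {
  grid <- seq(-1, 1, by = 0.1)                      # I_1, ..., I_21
  nbrs <- function(a) max(a - 1, 1):min(a + 1, 20)  # candidate intervals
  a <- findInterval(phi, grid)                      # current interval index
  b <- sample(nbrs(a), 1)
  phi_new <- runif(1, grid[b], grid[b + 1])
  logA <- logL(phi_new) - logL(phi) +
          log(length(nbrs(a)) / length(nbrs(b)))    # log P_phi
  if (log(runif(1)) <= logA) phi_new else phi
}
```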

3.3. MCMC Algorithms

Using the algorithms IMH, RWM, SS and MH described above, we implemented three MCMC algorithms:
  • Algorithm $A_1$: the parameters $\alpha_j$ are updated via IMH;
  • Algorithm $A_2$: the parameters $\alpha_j$ are updated via RWM;
  • Algorithm $A_3$: the parameters $\alpha_j$ are updated via SS.
In all three algorithms, the parameters $\beta_j$ and $\phi$ are updated via MH and IMH, respectively, as described in Section 3.2, for $j = 1, 2$.
After defining the algorithms, we ran each of them for $L$ iterations with a burn-in of $B$. We also considered thinning jumps of size $J$, i.e., only one draw from every $J$ iterations was kept from the original sequence, yielding a subsequence of size $S = \lfloor (L - B)/J \rfloor$ used for inference.
The estimates of the parameters are given by
$$ \tilde{\alpha}_j = \frac{1}{S} \sum_{l=1}^{S} \alpha_j^{(K(l))}, \quad \tilde{\beta}_j = \frac{1}{S} \sum_{l=1}^{S} \beta_j^{(K(l))} \quad \text{and} \quad \tilde{\phi} = \frac{1}{S} \sum_{l=1}^{S} \phi^{(K(l))}, $$
where $\theta^{(K(l))}$ denotes the value generated for $\theta$ in the $K(l) = (B + 1 + lJ)$-th iteration of the algorithm, for $j = 1, 2$ and $l = 1, \ldots, S$.
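For example, with a stored chain this amounts to the following (a sketch with our own variable names):
```r
# Posterior mean from a vector of draws of length L, discarding the
# burn-in B and keeping the iterations K(l) = B + 1 + l*J.
posterior_mean <- function(draws, B = 5000, J = 10) {
  keep <- seq(B + 1 + J, length(draws), by = J)
  mean(draws[keep])
}
```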

4. Simulation Study

In this section, we compare the performances of the three algorithms on simulated data sets. Random samples of sizes $n = 25, 50, 100$ and $250$, with $0\%$, $5\%$, $10\%$, $20\%$ and $30\%$ of random right-censored observations, were generated to represent small, medium and large data sets. For each configuration, we generated simulated data sets with the fixed parameter values specified in Table 1.
Data set $D_1$ has two increasing hazard functions with a positive dependence parameter, while data set $D_2$ has a constant and an increasing hazard function with a negative dependence parameter. Data set $D_3$ has parameters that produce a decreasing and a constant hazard function with weak dependence, while data set $D_4$ has strong dependence and two increasing hazard functions.
The simulation procedure to generate $n$ observations $(t_{i1}, t_{i2})$, for $i = 1, \ldots, n$, is given by the following steps:
(i) Set the sample size $n$ and set $i = 1$;
(ii) Generate the censoring times $C_{ij} \sim U(0, \tau_j)$, where $\tau_j$ controls the percentage of censored observations, for $j = 1, 2$;
(iii) Generate uniform values $u_{ij} \sim U(0, 1)$, $j = 1, 2$, and calculate $w_i$, the solution of the nonlinear equation
$$ u_{i2} - \frac{w_i\big[1 - \phi(1 - w_i)\big]}{\big[1 - \phi(1 - u_{i1})(1 - w_i)\big]^2} = 0. $$
Here we used the rootSolve package and the uniroot.all command of the R software to solve the nonlinear equation and obtain $w_i$;
(iv) Calculate $T_{i1} = \big[-\log(u_{i1})/\beta_1\big]^{1/\alpha_1}$ and $T_{i2} = \big[-\log(w_i)/\beta_2\big]^{1/\alpha_2}$;
(v) Calculate the times $t_{ij} = \min(T_{ij}, C_{ij})$ and the censorship indicators $\delta_{ij}$, which are equal to 1 if $T_{ij} \leq C_{ij}$ (uncensored) and 0 otherwise, for $j = 1, 2$;
(vi) Set $i = i + 1$. If $i > n$, stop. Otherwise, return to step (ii).
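A hedged R sketch of steps (i)–(vi), using rootSolve::uniroot.all as mentioned in step (iii); the function name and the default censoring bounds tau_cens are ours:
```r
library(rootSolve)

# Generate n right-censored pairs from the AMH copula with Weibull margins.
simulate_amh_weibull <- function(n, alpha, beta, phi, tau_cens = c(5, 5)) {
  t <- delta <- matrix(0, n, 2)
  for (i in 1:n) {
    C  <- runif(2, 0, tau_cens)                 # step (ii): censoring times
    u1 <- runif(1); u2 <- runif(1)              # step (iii)
    g  <- function(w) u2 - w * (1 - phi * (1 - w)) /
                           (1 - phi * (1 - u1) * (1 - w))^2
    w  <- uniroot.all(g, c(0, 1))[1]            # conditional inverse for T_2
    T1 <- (-log(u1) / beta[1])^(1 / alpha[1])   # step (iv): Weibull inverse
    T2 <- (-log(w)  / beta[2])^(1 / alpha[2])   #   survival transform
    t[i, ]     <- pmin(c(T1, T2), C)            # step (v)
    delta[i, ] <- as.numeric(c(T1, T2) <= C)    # 1 = uncensored
  }
  list(t = t, delta = delta)
}
```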
We generated $M = 200$ different simulated data sets according to steps (i)–(vi) described above, and the parameters were estimated using algorithms $A_1$, $A_2$ and $A_3$.
We used hyperparameters $a_{j1} = a_{j2} = b_{j1} = b_{j2} = 0.01$ to obtain prior distributions with large variances, for $j = 1, 2$. For the $m$-th generated data set, we applied algorithms $A_1$, $A_2$ and $A_3$ fixing $L = 55{,}000$ iterations, burn-in $B = 5000$ and thinning $J = 10$.
Comparison of the algorithms was made using the sample root mean square error (RMSE), given by
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{M} \sum_{m=1}^{M} \left[ \sum_{j=1}^{2} \left( \big(\hat{\alpha}_j^{(m)} - \alpha_j\big)^2 + \big(\hat{\beta}_j^{(m)} - \beta_j\big)^2 \right) + \big(\hat{\phi}^{(m)} - \phi\big)^2 \right] }. $$
A smaller RMSE indicates better overall quality of the estimates.
Table 2 presents the RMSE values for each simulated data set by algorithm, sample size and censoring percentage; the smallest RMSE for each sample size and censoring percentage identifies the best-performing algorithm. For the three algorithms, fixing the sample size and increasing the censoring percentage (% cens.) increases the RMSE values. Increasing the sample size at a fixed censoring percentage decreases the RMSE values, thus improving the precision of the estimators.
Based on the results presented in Table 2, for the smallest sample size, $n = 25$, algorithm $A_3$ (with SS) outperformed algorithm $A_1$ (with IMH) and algorithm $A_2$ (with RWM), i.e., it gave the smallest RMSE value for all censoring percentages. The same happened for data sets $D_3$ and $D_4$ with $n = 50$. For all other simulated cases, algorithm $A_2$ outperformed algorithms $A_1$ and $A_3$, with the exception of the case with $n = 250$ and $0\%$ censoring in data set $D_2$, in which algorithm $A_1$ performed best. These results suggest a possible complementarity between algorithms $A_2$ and $A_3$: algorithm $A_2$ performs better for larger sample sizes, and algorithm $A_3$ performs better for smaller sample sizes.
We verified the convergence of algorithms $A_1$, $A_2$ and $A_3$ using the effective sample size [14] and the integrated autocorrelation time (IAT). The effective sample size (ESS) is the number of effectively independent draws from the posterior distribution; methods with a larger ESS are more efficient. The IAT is an MCMC diagnostic that estimates the average number of autocorrelated samples required to produce one independent draw; a lower IAT means more efficiency. The ESS and IAT values were obtained using the coda and LaplacesDemon packages, both available in the R software.
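For reference, a minimal sketch of how these diagnostics are obtained in R (the placeholder draws stands for a vector of thinned posterior draws):
```r
library(coda)
library(LaplacesDemon)

draws <- rnorm(5000)               # placeholder for a thinned chain
ess <- effectiveSize(mcmc(draws))  # effective sample size (coda)
iat <- IAT(draws)                  # integrated autocorrelation time
```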
Table A1 and Table A2 in Appendix A show the averages of the ESS and IAT values for each algorithm, by parameter, for data set $D_1$. Algorithm $A_3$ showed a better performance than algorithms $A_1$ and $A_2$, i.e., it had the highest ESS values and the smallest IAT values per parameter for all simulated cases. Note that algorithm $A_1$ had the worst results, especially for the simulated values of $\alpha_j$, $j = 1, 2$. Results for data sets $D_2$, $D_3$ and $D_4$ were similar.
Appendix B presents an empirical convergence check of the sampled values of $\alpha_1$ for each algorithm. As shown in Figure A1, the values of $\alpha_1$ generated by algorithm $A_1$ did not mix well, and the stability of the ergodic mean and the estimated autocorrelation were not satisfactory. On the other hand, the values generated by algorithms $A_2$ and $A_3$ were well mixed and presented satisfactory stability of the ergodic mean and autocorrelation. As an illustration of convergence diagnostics, Figure A1(j–l) shows the Gelman plot for the sequences of $\alpha_1$ values in two chains for each algorithm. As can be seen in the figure, the number of iterations was sufficient for algorithms $A_2$ and $A_3$ to reach convergence, but not for algorithm $A_1$. In addition, the scale reduction factors of the Gelman–Rubin diagnostic [29] for each parameter in algorithms $A_2$ and $A_3$ were smaller than 1.1, meaning that there is no indication of non-convergence. This implies faster convergence of algorithms $A_2$ and $A_3$ relative to algorithm $A_1$. For the sampled values of $\beta_1$, the three algorithms present satisfactory properties, i.e., good mixing and satisfactory stability of the ergodic mean and autocorrelation (see Figure A2 in Appendix B).
The results indicate that algorithm A 3 (SS for α j ) is an effective alternative to algorithms A 1 (with IMH for α j ) and A 2 (with RWM for α j ) to simulate samples from the posterior distribution of bivariate survival models based on the Ali–Mikhail–Haq copula with marginal Weibull distributions.

5. Application to a Real Data Set

Next, we examine the performance of algorithms $A_1$, $A_2$ and $A_3$ on the diabetic retinopathy data set described in [15], which is available in the survival package [16] of the R software. This data set consists of the follow-up times of 197 diabetic patients under 60 years of age. The main objective of the study was to evaluate the effectiveness of photocoagulation treatment for proliferative retinopathy. The treatment was randomly assigned to one eye of each patient, and the other eye was taken as a control.
Let $(T_1, T_2)$ be the bivariate times, where $T_1$ is the time to visual loss for the treated eye and $T_2$ is the time to visual loss for the control eye. The percentages of censored times are $72.59\%$ (143 observations) for $T_1$ and $48.73\%$ (96 observations) for $T_2$.
We used (1) to model these data, with Weibull marginal distributions with parameters $\alpha_j$ and $\beta_j$, and dependence parameter $\phi$.
We compared the performances of the algorithms using the RMSE in relation to the empirical distribution function,
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{2} \big( \hat{F}_j(t_{ij}) - F_j(t_{ij}) \big)^2 }, $$
where $\hat{F}_j(t_{ij})$ is obtained by substituting the estimates of $\alpha_j$, $\beta_j$ and $\phi$ (obtained by each algorithm), and $F_j(t_{ij})$ is the empirical distribution function obtained from the Kaplan–Meier estimates, for $j = 1, 2$ and $i = 1, \ldots, n$.
We ran the three algorithms using the same number of iterations, burn-in, thinning and hyperparameter values used with the simulated data. Table 3 shows the parameter estimates, the $95\%$ credibility intervals and the RMSE values by algorithm. For this data set, algorithm $A_3$ (with SS for $\alpha_j$) gave the smallest RMSE value.
Figure 1 shows the survival functions estimated by algorithms $A_1$ (red line) and $A_3$ (blue line). The step functions (black lines) are the Kaplan–Meier estimates. The curves estimated by algorithms $A_1$ and $A_2$ are very close, so we show only the curve estimated by $A_1$ in order to provide a clear visualization. The Kaplan–Meier estimates were obtained using the survival package and the survfit command in the R software.
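A hedged sketch of this comparison for the treated eyes ($j = 1$), assuming the diabetic data frame from the survival package (one row per eye, with columns trt, time and status); the plugged-in values of α₁ and β₁ are placeholders close to the $A_3$ estimates in Table 3:
```r
library(survival)

data(diabetic)                           # one row per eye; trt = 1 treated
trt_eyes <- subset(diabetic, trt == 1)
km <- survfit(Surv(time, status) ~ 1, data = trt_eyes)
F_km <- stepfun(km$time, c(0, 1 - km$surv))(trt_eyes$time)  # KM-based CDF
alpha1 <- 0.64; beta1 <- 0.029           # placeholders (see Table 3, A3)
F_hat <- 1 - exp(-beta1 * trt_eyes$time^alpha1)             # model CDF
rmse_j1 <- sqrt(mean((F_hat - F_km)^2))  # j = 1 contribution to the RMSE
```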
Table 4 shows the ESS and IAT values for the sequences generated by algorithms A 1 , A 2 , and A 3 . Algorithm A 3 had a better performance than algorithms A 1 and A 2 , i.e., the highest ESS value and the lowest IAT value per parameter.
We also compared the performances of the algorithms in relation to the sequences generated for each parameter. Figure 2 shows the traceplots, the ergodic means, and the autocorrelations for sequences of α 1 values simulated by algorithms A 1 , A 2 and A 3 .
It can be observed in these graphs that the $\alpha_1$ values generated by the IMH (algorithm $A_1$) show poor mixing and no satisfactory stability of the ergodic mean, and their autocorrelation remains high at long lags. On the other hand, the values generated by the RWM (algorithm $A_2$) and SS (algorithm $A_3$) are better mixed and present satisfactory stability of the ergodic mean; moreover, the sequence produced by the SS presents the most steeply decreasing autocorrelation. Figure 3 shows the same graphs for parameter $\beta_1$; as can be seen, for $\beta_1$ the performances of the three algorithms are satisfactory. These results, together with those based on the RMSE, show that for the data set analyzed here SS provides a better performance than IMH or RWM.
Figure 4 shows the Gelman plot for the values of $\alpha_1$, $\beta_1$ and $\phi$ simulated in two chains by each algorithm. As can be seen, the number of iterations was sufficient for algorithms $A_2$ and $A_3$ to reach convergence, but not for algorithm $A_1$ (Figure 4a,b). The scale reduction factors for each parameter in algorithms $A_2$ and $A_3$ are all less than 1.1, while for algorithm $A_1$ only $\phi$ presents a scale reduction factor less than 1.1.

6. Final Remarks

We investigated the performances of three Bayesian computational methods to estimate parameters of a bivariate survival model based on the Ali–Mikhail–Haq copula with marginal Weibull distributions. The performances of the MCMC algorithms were compared using the RMSE criterion. The RMSE values were calculated for different sample sizes and different percentages of censures.
The results obtained from the simulated data sets showed that the RWM and SS algorithms outperformed the IMH algorithm, and that the SS algorithm performed better for smaller sample sizes. The results also show that MCMC sequences obtained with SS, for the same number of iterations $L$, burn-in $B$ and thinning value, have better properties (i.e., higher ESS and lower IAT values) than those obtained with IMH and RWM, which are standard methods for sampling from the joint posterior distribution.
We also illustrated the application of the algorithms using a real data set available in the literature. Algorithm $A_3$ (with SS generating the $\alpha_j$'s) presented the best performance when applied to this data set. The criteria used to reach this conclusion were the stability of the ergodic mean, the autocorrelation, the minimum RMSE value, the maximum ESS value and the minimum IAT value. In addition, the algorithm using SS presented a satisfactory performance in relation to the scale reduction factor and the Gelman plot of the Gelman–Rubin convergence diagnostic.
Our results show that algorithm $A_3$, which combines SS for generating the $\alpha_j$, MH for the $\beta_j$ and IMH for $\phi$, is an effective algorithm for simulating values from the joint posterior distribution of an AMH copula model with Weibull marginal distributions. Moreover, two advantages of SS are that it is easy to implement and that it does not require the specification of a candidate-generating density. A disadvantage in our specific case is that it took longer to run than IMH and RWM, because an iterative method was needed to invert $\kappa(\alpha_j)$, for which an analytical solution is not available. All calculations were implemented in the R software and can be obtained from the authors.
An extension of the results obtained here to other Archimedean copulas, as well as to other marginal distributions, and a possible generalization, would be a fruitful area for future work.

Author Contributions

The authors E.F.S. and A.K.S. developed the theoretical part of the research. The authors E.F.S., A.K.S. and L.A.M. developed the simulation studies and real data application.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. ESS and IAT Values for Simulated Data Sets

In this section, we present the averages of the ESS and IAT values for each algorithm, by parameter, for data set $D_1$. As discussed in Section 4, algorithm $A_3$ presented a better performance than algorithms $A_1$ and $A_2$. The results for data sets $D_2$, $D_3$ and $D_4$ are similar.
Table A1. ESS by algorithm for data set D1.

Sample   % of        Algorithm A1                               Algorithm A2                               Algorithm A3
size     cens.   α1     β1      α2     β2      ϕ       α1      β1      α2      β2      ϕ       α1      β1      α2      β2      ϕ
n = 25    0%    25.4  1149.9   26.0  1168.4   105.9   1741.7  3493.7  1816.3  3511.8  111.2   4547.7  4110.0  4540.0  4136.9  112.2
          5%    26.4  1360.4   27.4  1311.1   100.6   1758.1  3530.2  1823.3  3563.4  106.8   4569.7  4118.5  4622.4  4125.7  112.0
         10%    27.9  1570.5   28.2  1422.5    97.6   1783.3  3543.0  1827.7  3598.9   99.9   4604.9  4220.7  4672.7  4191.9  105.2
         20%    31.8  2178.7   30.1  1988.6    95.6   1869.0  3943.1  1822.2  3738.9   93.9   4681.8  4275.1  4726.5  4182.3   97.9
         30%    32.9  2293.8   32.7  2146.3    88.5   1931.0  4018.4  1772.0  3885.7   88.1   4782.5  4350.3  4744.4  4329.9   89.6
n = 50    0%    19.4   860.7   19.5  1049.2   173.0   1415.2  3259.1  1774.8  3450.9  172.7   4607.7  4132.9  4610.4  4129.5  176.9
          5%    19.6  1061.1   18.7   968.2   167.2   1475.8  3456.2  1796.2  3517.1  167.3   4680.2  4226.3  4698.9  4187.6  169.3
         10%    21.1  1331.7   20.6  1168.2   163.2   1565.6  3662.3  1861.4  3700.1  155.8   4706.1  4237.6  4698.8  4148.0  171.4
         20%    22.5  2134.5   23.1  2005.2   141.6   1668.8  3926.3  1922.5  3804.2  140.0   4825.1  4374.9  4792.8  4299.3  143.6
         30%    24.3  2604.9   24.5  2241.4   127.0   1770.5  4188.2  1989.0  4047.5  132.2   4817.7  4504.1  4819.8  4364.1  133.8
n = 100   0%    14.3   817.5   14.8   826.7   316.7   1107.5  3258.6  1518.9  3429.5  323.9   4609.3  4244.3  4668.7  4169.3  325.2
          5%    14.5   899.7   14.5   807.8   304.1   1136.7  3393.6  1549.6  3522.7  290.0   4639.9  4238.7  4689.2  4222.8  311.4
         10%    15.6  1157.9   15.0   938.3   276.9   1199.2  3617.4  1598.7  3698.5  272.9   4729.9  4311.9  4800.5  4295.0  277.3
         20%    16.3  1846.4   16.4  1540.7   260.7   1297.1  3886.4  1706.2  3834.2  265.2   4833.4  4465.1  4827.2  4399.4  271.4
         30%    17.6  3127.3   17.7  2337.1   224.4   1414.1  4292.0  1831.9  4128.8  211.1   4857.6  4475.2  4862.9  4410.8  226.3
n = 250   0%    10.3   655.3   10.0   662.7   672.9    712.3  2856.1  1055.4  3236.4  687.8   4588.1  4210.6  4655.5  4275.5  698.8
          5%    10.7   800.5   10.5   816.3   672.3    742.5  3106.1  1083.3  3343.3  640.0   4664.5  4333.8  4734.3  4277.8  693.9
         10%    10.7  1024.2   10.8   951.7   602.3    786.7  3369.7  1128.4  3519.9  607.5   4728.8  4362.8  4757.3  4338.3  620.0
         20%    10.7  1735.2   11.8  1494.5   549.7    863.0  3890.0  1226.9  3845.6  539.6   4741.7  4440.4  4805.1  4451.7  550.0
         30%    12.2  3259.7   12.1  2271.8   466.2    936.6  4279.2  1308.9  4147.7  477.2   4872.7  4625.0  4858.4  4552.6  481.6
Table A2. IAT by algorithm for data set D1.

Sample   % of        Algorithm A1                     Algorithm A2                  Algorithm A3
size     cens.   α1      β1     α2      β2     ϕ      α1    β1    α2    β2    ϕ     α1    β1    α2    β2    ϕ
n = 25    0%    162.7   2.4   162.4    2.3   50.6    3.0   1.5   2.9   1.5  50.2   1.1   1.3   1.1   1.2  50.0
          5%    162.3   2.2   154.0    2.3   52.5    2.9   1.5   2.8   1.5  50.2   1.1   1.2   1.1   1.2  50.0
         10%    152.7   2.0   150.9    2.3   54.1    2.9   1.5   2.8   1.5  54.8   1.1   1.2   1.1   1.2  51.3
         20%    136.8   1.7   136.6    1.9   55.4    2.7   1.3   2.8   1.4  55.8   1.1   1.2   1.1   1.2  54.5
         30%    132.2   1.7   130.4    1.7   59.9    2.6   1.3   3.0   1.4  59.8   1.1   1.2   1.1   1.2  57.6
n = 50    0%    208.9   2.3   213.5    2.2   33.2    3.7   1.6   2.9   1.5  32.8   1.1   1.2   1.1   1.2  32.5
          5%    208.7   2.0   233.6    2.2   34.8    3.5   1.5   2.9   1.5  34.5   1.1   1.2   1.1   1.2  34.2
         10%    198.6   1.9   206.5    2.2   35.6    3.3   1.4   2.7   1.4  36.0   1.1   1.2   1.1   1.2  35.2
         20%    183.6   1.6   179.4    1.6   39.5    3.1   1.3   2.7   1.4  39.2   1.1   1.2   1.1   1.2  39.0
         30%    170.5   1.5   170.0    1.6   43.2    2.9   1.2   2.5   1.3  41.9   1.1   1.1   1.1   1.2  40.3
n = 100   0%    288.1   2.1   278.2    2.2   17.9    4.6   1.6   3.4   1.5  18.1   1.1   1.2   1.1   1.2  17.2
          5%    284.7   2.2   287.2    2.2   19.7    4.5   1.5   3.3   1.5  20.3   1.1   1.2   1.1   1.2  18.9
         10%    266.8   1.9   271.9    1.9   21.3    4.2   1.4   3.2   1.4  20.5   1.1   1.2   1.1   1.2  20.3
         20%    250.0   1.6   252.8    1.7   22.8    3.9   1.4   3.0   1.4  22.4   1.1   1.1   1.1   1.2  22.3
         30%    233.4   1.3   227.1    1.5   26.5    3.6   1.2   2.8   1.2  27.0   1.1   1.1   1.1   1.2  26.2
n = 250   0%    417.9   2.0   418.8    2.0    7.9    7.1   1.8   4.8   1.6   7.9   1.1   1.2   1.1   1.2   7.6
          5%    400.6   1.9   399.7    2.0    8.2    6.8   1.7   4.7   1.6   8.4   1.1   1.2   1.1   1.2   8.1
         10%    391.7   1.8   366.7    1.8    9.1    6.5   1.5   4.5   1.5   9.0   1.1   1.2   1.1   1.2   8.8
         20%    374.6   1.5   355.9    1.6   10.2    5.9   1.3   4.1   1.4  10.3   1.1   1.2   1.1   1.2  10.1
         30%    358.9   1.3   339.2    1.4   11.8    5.5   1.5   3.9   2.1  11.7   1.1   1.1   1.1   1.1  11.1

Appendix B. Empirical Illustration of the Convergence

We present here an empirical illustration of the convergence of the simulated sequences for parameters $\alpha_1$ and $\beta_1$. We randomly selected one of the $M = 200$ data sets generated under configuration $D_1$ with $n = 100$ and $5\%$ censoring, and present the traceplot, the ergodic mean and autocorrelation of the sampled values for each algorithm, and the Gelman plot.
Figure A1 shows the performance of the algorithms for the sampled $\alpha_1$ values. It can be observed that the IMH (algorithm $A_1$) does not mix well, does not show stability of the ergodic mean, and its estimated autocorrelation does not decrease as fast as those of the other algorithms. The sequences of $\alpha_1$ values generated by the RWM and SS are well mixed and present satisfactory stability of the ergodic mean, with the autocorrelation decreasing faster and with a clear advantage for algorithm $A_3$. The Gelman plot indicates that the number of iterations used was sufficient for algorithms $A_2$ and $A_3$ to reach convergence.
Figure A2 presents the performance of each algorithm for the sequence generated for $\beta_1$. As can be observed, the three algorithms present satisfactory properties. This is mainly due to the fact that $\beta_1$ has a natural candidate-generating density whose parameters depend on the observed data and the hyperparameter values.
Figure A1. Traceplot, ergodic mean and autocorrelation for the sequences produced by algorithms A1, A2 and A3 for $\alpha_1$.
Figure A2. Traceplot, ergodic mean and autocorrelation for the sequences produced by algorithms A1, A2 and A3 for $\beta_1$.

References

1. Sahu, S.K.; Dey, D.K. A comparison of frailty and other models for bivariate survival data. Lifetime Data Anal. 2000, 6, 207–228.
2. Zhang, S.; Zhang, Y.; Chaloner, K.; Stapleton, J.T. A copula model for bivariate hybrid censored survival data with application to the MACS study. Lifetime Data Anal. 2010, 16, 231–249.
3. Shih, J.H.; Louis, T.A. Inferences on the association parameter in copula models for bivariate survival data. Biometrics 1995, 51, 1384–1399.
4. Othus, M.; Li, Y. A Gaussian copula model for multivariate survival data. Stat. Biosci. 2010, 2, 154–179.
5. Nelsen, R.B. An Introduction to Copulas; Springer: New York, NY, USA, 2006.
6. Durante, F.; Sempi, C. Principles of Copula Theory; CRC/Chapman and Hall: London, UK, 2015.
7. Romeo, J.S.; Tanaka, N.I.; Pedroso-de-Lima, A.C. Bivariate survival modeling: A Bayesian approach based on copulas. Lifetime Data Anal. 2006, 12, 205–222.
8. Da Cruz, J.N.; Ortega, E.M.M.; Cordeiro, G.M.; Suzuki, A.K.; Mialhe, F.L. Bivariate odd-log-logistic-Weibull regression model for oral health-related quality of life. Commun. Stat. Appl. Methods 2017, 24, 271–290.
9. Louzada, F.; Suzuki, A.K.; Cancho, V.G. The FGM long-term bivariate survival copula model: Modeling, Bayesian estimation, and case influence diagnostics. Commun. Stat. Theory Methods 2013, 42, 673–691.
10. Suzuki, A.K.; Louzada, F.; Cancho, V.G. On estimation and influence diagnostics for a bivariate promotion lifetime model based on the FGM copula: A fully Bayesian computation. TEMA 2013, 14, 441–461.
11. Romeo, J.S.; Meyer, R.; Gallardo, D.I. Bayesian bivariate survival analysis using the power variance function copula. Lifetime Data Anal. 2018, 24, 355–383.
12. Kumar, P. Probability distributions and estimation of Ali–Mikhail–Haq copula. Appl. Math. Sci. 2010, 14, 657–666.
13. Neal, R.M. Slice sampling. Ann. Stat. 2003, 31, 705–767.
14. Kass, R.E.; Carlin, B.P.; Gelman, A.; Neal, R.M. Markov chain Monte Carlo in practice: A roundtable discussion. Am. Stat. 1998, 52, 93–100.
15. The Diabetic Retinopathy Study Research Group. Preliminary report on the effect of photocoagulation therapy. Am. J. Ophthalmol. 1976, 81, 383–396.
16. Therneau, T.M. A Package for Survival Analysis in S, Version 2.38. 2015. Available online: https://CRAN.R-project.org/package=survival (accessed on 4 July 2018).
17. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012; ISBN 3-900051-07-0.
18. Ali, M.M.; Mikhail, N.N.; Haq, M.S. A class of bivariate distributions including the bivariate logistic. J. Multivar. Anal. 1978, 8, 405–412.
19. Lawless, J.F. Statistical Models and Methods for Lifetime Data; John Wiley and Sons: New York, NY, USA, 1982.
20. Weibull, W. A statistical distribution function of wide applicability. ASME J. Appl. Mech. 1951, 18, 292–297.
21. Collett, D. Modelling Survival Data in Medical Research, 3rd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2015.
22. Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109.
23. Chib, S.; Greenberg, E. Understanding the Metropolis–Hastings algorithm. Am. Stat. 1995, 49, 327–335.
24. Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall: London, UK, 1995.
25. Gilks, W.R.; Richardson, S.; Spiegelhalter, D.J. Markov Chain Monte Carlo in Practice; Chapman and Hall: London, UK, 1996.
26. Roberts, G.; Gelman, A.; Gilks, W. Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 1997, 7, 110–120.
27. Bedard, M. Weak convergence of Metropolis algorithms for non-i.i.d. target distributions. Ann. Appl. Probab. 2007, 17, 1222–1244.
28. Mattingly, J.C.; Pillai, N.S.; Stuart, A.M. Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 2011, 22, 881–930.
29. Gelman, A.; Rubin, D.B. Inference from iterative simulation using multiple sequences. Stat. Sci. 1992, 7, 457–511.
Figure 1. The estimated survival functions for algorithms $A_1$ and $A_3$.
Figure 2. Traceplot, ergodic mean and autocorrelation for the sequences produced by algorithms A1, A2 and A3 for $\alpha_1$.
Figure 3. Traceplot, ergodic mean and autocorrelation for the sequences produced by algorithms A1, A2 and A3 for $\beta_1$.
Figure 4. Gelman plot for two sequences produced by algorithms A1, A2 and A3 for $\alpha_1$, $\beta_1$ and $\phi$.
Table 1. Parameter values for the simulated data sets.

Data set    α1      β1      α2      β2       ϕ
D1          2.00    1.00    3.00    1.00    0.50
D2          1.00    2.00    2.00    0.50   −0.75
D3          0.75    1.50    1.00    2.00    0.05
D4          1.80    2.40    2.20    1.20    0.95
Table 2. Root mean square error (RMSE) by algorithm for data sets D1, D2, D3 and D4.

Sample   % of       Data set D1                Data set D2                Data set D3                Data set D4
size     cens.   A1      A2      A3         A1      A2      A3         A1      A2      A3         A1      A2      A3
n = 25    0%    0.3678  0.3717  0.3581     0.3774  0.3781  0.3458     0.3375  0.3370  0.3368     1.1085  1.0888  1.0883
          5%    0.4078  0.3869  0.3597     0.3861  0.3901  0.3736     0.3586  0.3573  0.3523     1.1325  1.1305  1.1278
         10%    0.4189  0.4012  0.3670     0.4144  0.4259  0.4135     0.3687  0.3675  0.3611     1.1428  1.1396  1.1323
         20%    0.4245  0.4153  0.3772     0.4472  0.4648  0.4381     0.3772  0.3729  0.3727     1.1726  1.1714  1.1711
         30%    0.4362  0.4543  0.3989     0.5335  0.5614  0.5303     0.3994  0.3990  0.3944     1.2078  1.1946  1.1925
n = 50    0%    0.2595  0.2507  0.2678     0.2633  0.2552  0.2573     0.2162  0.2112  0.2048     1.0397  1.0318  1.0312
          5%    0.2663  0.2652  0.2699     0.2641  0.2601  0.2719     0.2239  0.2283  0.2233     1.0470  1.0442  1.0403
         10%    0.2831  0.2806  0.2814     0.2959  0.2683  0.2844     0.2390  0.2457  0.2269     1.0483  1.0453  1.0433
         20%    0.2846  0.2820  0.2863     0.2966  0.2820  0.3026     0.2719  0.2546  0.2366     1.0517  1.0528  1.0513
         30%    0.2983  0.2885  0.3104     0.3245  0.3170  0.3182     0.2828  0.2776  0.2736     1.0915  1.0666  1.0550
n = 100   0%    0.1822  0.1819  0.1833     0.1917  0.1816  0.1878     0.1664  0.1657  0.1702     1.0153  1.0041  1.0124
          5%    0.1953  0.1851  0.1859     0.1925  0.1857  0.1914     0.1769  0.1755  0.1782     1.0228  1.0063  1.0152
         10%    0.1982  0.1924  0.1927     0.2026  0.2019  0.2023     0.1788  0.1760  0.1791     1.0239  1.0088  1.0157
         20%    0.1996  0.1964  0.2074     0.2029  0.2028  0.2047     0.1934  0.1832  0.1879     1.0282  1.0092  1.0177
         30%    0.2131  0.2122  0.2144     0.2463  0.2112  0.2211     0.2094  0.1967  0.2143     1.0291  1.0128  1.0265
n = 250   0%    0.1138  0.1123  0.1130     0.1075  0.1079  0.1115     0.1156  0.1140  0.1162     0.9934  0.9923  0.9936
          5%    0.1141  0.1136  0.1149     0.1206  0.1141  0.1129     0.1179  0.1146  0.1183     0.9970  0.9963  0.9968
         10%    0.1165  0.1164  0.1167     0.1244  0.1199  0.1237     0.1186  0.1159  0.1197     0.9985  0.9977  0.9972
         20%    0.1224  0.1216  0.1229     0.1258  0.1252  0.1287     0.1303  0.1260  0.1273     0.9991  0.9984  0.9991
         30%    0.1374  0.1333  0.1344     0.1677  0.1398  0.1458     0.1391  0.1328  0.1329     0.9999  0.9993  0.9997
Table 3. Parameter estimates, 95% credibility intervals and RMSE by algorithm.

Algorithm   α1                  β1                  α2                  β2                  ϕ                   RMSE
A1          0.7624              0.0186              0.8399              0.0294              0.7159              0.4227
            (0.5999, 0.9361)    (0.0087, 0.0338)    (0.7607, 0.9353)    (0.0195, 0.0414)    (0.3765, 0.9637)
A2          0.7757              0.0179              0.8308              0.0310              0.7148              0.4619
            (0.5929, 0.9853)    (0.0071, 0.0343)    (0.6897, 0.9679)    (0.0172, 0.0515)    (0.3560, 0.9600)
A3          0.6438              0.0289              0.7015              0.0494              0.7266              0.3562
            (0.5103, 0.7967)    (0.0142, 0.0482)    (0.5910, 0.8273)    (0.0293, 0.0746)    (0.3675, 0.9715)
Table 4. Integrated autocorrelation time (IAT) and effective sample size (ESS) values for algorithms A1, A2 and A3.

Parameter       ESS: A1     ESS: A2     ESS: A3       IAT: A1     IAT: A2     IAT: A3
α1               5.4650    159.8655    791.0559      435.0485    34.2212      6.4039
β1               6.5887    205.4812    880.9221       81.9980    26.8373      5.6359
α2               8.1633    134.7412    227.6705      327.9376    35.6760     24.6754
β2              16.1893    133.8282    230.9487       36.7590    30.5560     21.1668
ϕ             2443.3791   2400.0097   2461.1781        2.3426     2.3348      2.2813
