Article

Model Checking with Right Censored Data Using Relative Belief Ratio

1
Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
2
Department of Mathematics and Statistics, American University of Sharjah, Sharjah 61174, United Arab Emirates
*
Author to whom correspondence should be addressed.
Entropy 2022, 24(11), 1579; https://doi.org/10.3390/e24111579
Submission received: 28 August 2022 / Revised: 28 September 2022 / Accepted: 29 October 2022 / Published: 31 October 2022
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Model checking is a topic of special interest in statistics. When data are censored, the problem becomes more difficult. This paper employs the relative belief ratio and the beta-Stacy process to develop a method for model checking in the presence of right-censored data. The proposed method for the given model of interest compares the concentration of the posterior distribution to the concentration of the prior distribution using a relative belief ratio. We propose a computational algorithm for the method and then illustrate the method through several data analysis examples.

1. Introduction

An important problem in statistics is to check whether a chosen statistical model is in agreement with observed data. This problem is known as model checking. The tests used to describe how well a statistical model fits a set of observations are goodness-of-fit tests. If it is determined that the observed data do not contradict the model, the true values of the parameters can be inferred. If the model fails to pass its checks, there is cause for concern about the accuracy of the inferences used in analysis. Thus, checking a proposed model on the basis of observed data is a critical component of statistical inference.
The time until an event occurs is the variable of interest in survival analysis. It is common to refer to this time as the survival time. In this case, the observed data are typically right censored. That is, the exact survival time is only known to exceed some value, for example, when a subject leaves the study before the event occurs or when the study ends before the event occurs. Such data are common in medical studies where the variable of interest is the time to death or time to relapse of a disease, in reliability studies where the variable of interest is the time until a machine part fails, and in social sciences where the variable of interest is the lifetime of the elderly in certain social programs [1]. While there are several well-known methods for model checking, the approach we want to take is Bayesian in nature. There is interest in developing nonparametric Bayesian procedures for hypothesis testing, and the majority of this work focuses on uncensored data. The main approach to model testing embeds the proposed model as a null hypothesis in a larger family of distributions. Priors are then placed on the null and the alternative, and a Bayes factor is computed (see, for example, [2,3,4,5,6]). A second important approach for model testing is to place a prior on the true distribution generating the data, and then measure the distance between the posterior distribution and the hypothesized distribution (see, for example, [7,8,9]). A third and more recent approach combines the second approach with the relative belief ratio ([10]) (for instance, see the work of [11,12,13]).
When data are censored, goodness-of-fit testing becomes a challenging problem. This explains why there are few and scattered studies on goodness-of-fit tests for right-censored data. In particular, Bayesian goodness-of-fit tests are rarely covered. Some exceptions include the work of [14,15,16]. The authors of [14] considered the two-sample problem by computing Bayes factors with beta process priors ([17]). Al-Labadi and Zarepour [15] used the beta-Stacy process and the Kolmogorov–Smirnov distance to test simple hypotheses. Chen and Hanson [16] used Polya tree priors and computed the Bayes factor. To compute the posteriors, an MCMC procedure was used. Permutation tests were used to compute p-values. These approaches are computationally intensive.
The main problem addressed in this paper is model checking for lifetime distribution functions. A general description of the approach developed in this paper is provided along with a discussion of the benefits that the approach brings to the problem of model checking. The beta-Stacy process $BSP(k(t), F_0(t))$, $t \ge 0$, is considered as a prior on the space of cumulative distribution functions ([18]). Here, $k(\cdot) > 0$ is a function defined on $\mathbb{R}^+ = [0, \infty)$, and $F_0$ is a cumulative distribution function (cdf) on $\mathbb{R}^+$ that represents the prior guess for the beta-Stacy process. How closely realizations from the process are clustered around $F_0$ is determined by the value of $k(\cdot)$. The use of the beta-Stacy process is justified because, unlike the Dirichlet process, it is conjugate to both exact and right-censored observations ([19]). The proposed method then compares the posterior and prior distribution concentrations about the model of interest. If the posterior is more concentrated on the model than the prior, this is evidence in favor of the model; if the posterior is less concentrated, this is evidence against the model. This comparison is produced using a relative belief ratio that measures the evidence in the observed data for or against the model and provides a measure of the strength of this evidence; thus, the methodology is based on a direct measure of statistical evidence. The approach is fairly simple to implement and does not necessitate obtaining a closed form of the relative belief ratio. Theoretical results are developed to support appropriate selections of the hyperparameters $k(\cdot)$ and $F_0$.
The rest of the paper is structured as follows. Section 2 provides the background on the definitions and generic properties of the beta-Stacy process, the relative belief ratio, and the Cramér–von Mises distance between probability measures. Section 3 discusses the proposed methodology and the selection of relevant values for the beta-Stacy process’s hyperparameters. Section 4 develops a computational algorithm for implementing the approach. Section 5 examines the performance of the methodology through a number of examples. This method works well in general and can be used for both censored and uncensored observations. Section 6 wraps up the paper with a summary of the findings. Lastly, the algorithms, the dataset, and the figures are provided in the appendices.

2. Notations and Background

Let the random sample $X_1, \ldots, X_n$ be drawn from an unknown cdf $F$ defined on $\mathbb{R}^+$. Let $(T, \delta) = ((T_1, \delta_1), \ldots, (T_n, \delta_n))$ be observed, where $T_i = \min(X_i, C_i)$, $\delta_i = I(X_i \le C_i)$, and $C_1, \ldots, C_n$ are the censoring times. When $\delta_i = 1$, $X_i$ is observed, and when $\delta_i = 0$, $X_i$ is right censored. This type of data can occur, for instance, when $X_i$ is the lifetime of a patient who enrols in a study of a certain disease. If death occurs from the disease before the study ends, $X_i$ is recorded; however, if the patient is still alive at the end of the study, withdraws, or dies for another reason, $C_i$ records the end of the period during which the patient was under observation ([20]).
In survival analysis, the cdf $F$ is frequently referred to as the lifetime distribution function. We use the same notation throughout this paper for the probability measure and its corresponding cumulative distribution function, i.e., $F(t) = F((-\infty, t])$.
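The construction of $(T, \delta)$ from lifetimes and censoring times can be sketched in a few lines. The snippet below is only an illustration: the function name `right_censor` and the exponential rates used to generate toy data are assumptions, not part of the paper.

```python
import random

def right_censor(lifetimes, censor_times):
    """Return (T, delta) with T_i = min(X_i, C_i) and delta_i = I(X_i <= C_i)."""
    T = [min(x, c) for x, c in zip(lifetimes, censor_times)]
    delta = [1 if x <= c else 0 for x, c in zip(lifetimes, censor_times)]
    return T, delta

# Toy data (hypothetical rates, chosen only for illustration).
rng = random.Random(0)
X = [rng.expovariate(1.0) for _ in range(5)]   # latent lifetimes
C = [rng.expovariate(0.5) for _ in range(5)]   # censoring times
T, delta = right_censor(X, C)
```

An observation with `delta[i] == 0` only tells us that the lifetime exceeds `T[i]`.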

2.1. Beta-Stacy Process

Walker and Muliere [18] introduced the beta-Stacy process, which is a nonparametric prior that is widely used in survival analysis. In this section, a relevant introduction about the beta-Stacy process is provided.
The beta-Stacy process is defined through another process known as the log-beta process ([18]). Let $\beta(\cdot)$ be a positive function defined on $\mathbb{R}^+$, and let $\alpha$ be a measure concentrated on $\mathbb{R}^+$ that is absolutely continuous with respect to the Lebesgue measure and satisfies $\int_0^\infty \beta(z)^{-1}\,\alpha(dz) = \infty$. Following [18], $\{Z(t)\}_{t \ge 0}$ is a log-beta process with parameters $\alpha$ and $\beta$, written as $Z(t) \sim \log BSP(\alpha(t), \beta(t))$, if $\{Z(t)\}_{t \ge 0}$ is a Lévy process with Lévy measure defined, for $s > 0$, by
$$L_t(ds) = \frac{1}{1 - e^{-s}} \int_0^t e^{-s\beta(z)}\,\alpha(dz)\,ds$$
and has the following moment generating function:
$$\log E\left[e^{-uZ(t)}\right] = \int_0^\infty \left(e^{-us} - 1\right) L_t(ds), \quad t \ge 0,\ u \in \mathbb{R}.$$
In order to define the beta-Stacy process, let the positive function $k(\cdot)$ be defined on $\mathbb{R}^+$ and let the absolutely continuous cdf $F_0$ be defined on $\mathbb{R}^+$. We call $\{F(t)\}_{t \ge 0}$ a beta-Stacy process with parameters $k(\cdot)$ and $F_0$, written as $F(t) \sim BSP(k(t), F_0(t))$, if $F(t) = 1 - e^{-Z(t)}$, and $\{Z(t)\}_{t \ge 0}$ is a log-beta process with parameters
$$\alpha(dz) = k(z)\,F_0(dz) \quad \text{and} \quad \beta(z) = k(z)\,F_0([z, \infty)).$$
Walker and Muliere [18] showed that
$$E[F(t)] = F_0(t) = 1 - \exp\left\{-\int_0^t \beta(z)^{-1}\,\alpha(dz)\right\}, \tag{1}$$
rendering $F_0$ the prior guess. The expression in (1) explains the need for the assumption $\int_0^\infty \beta(z)^{-1}\,\alpha(dz) = \infty$. The beta-Stacy process includes various neutral-to-right processes proposed in the literature. For instance, the beta-Stacy process reduces to the Dirichlet process when $k(t) = k > 0$ for all $t \ge 0$. The beta-Stacy process also reduces to a simple homogeneous process ([21]) when $\beta(\cdot)$ is constant.
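As a brief check of the prior mean formula, the integral in the exponent can be evaluated directly for the beta-Stacy parameters. The worked step below assumes that $F_0$ has a density, written here as $f_0$ (a symbol introduced for this illustration), so that $F_0([z,\infty)) = 1 - F_0(z)$.

```latex
\frac{\alpha(dz)}{\beta(z)}
  = \frac{k(z)\,F_0(dz)}{k(z)\,F_0([z,\infty))}
  = \frac{f_0(z)}{1 - F_0(z)}\,dz
\quad\Longrightarrow\quad
\int_0^t \beta(z)^{-1}\,\alpha(dz) = -\log\bigl(1 - F_0(t)\bigr),
\qquad
1 - \exp\Bigl\{-\int_0^t \beta(z)^{-1}\,\alpha(dz)\Bigr\} = F_0(t).
```

The function $k(\cdot)$ cancels, and the integral diverges as $t \to \infty$ because $F_0(t) \to 1$, which is what forces $F(t) \to 1$.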
Next, we describe the posterior distributions of $\{Z(t)\}_{t \ge 0}$ and $\{F(t)\}_{t \ge 0}$. Let the random sample $X_1, \ldots, X_n$ be drawn from $F$, and let $(T, \delta) = ((T_1, \delta_1), \ldots, (T_n, \delta_n))$ be observed, where $T_i = \min(X_i, C_i)$, $\delta_i = I(X_i \le C_i)$, and $C_1, \ldots, C_n$ are censoring times. Define the counting process $N$ by
$$N(t) = \sum_{i=1}^n I(T_i \le t \text{ and } \delta_i = 1)$$
and the (left-continuous) at-risk process $Y$ by
$$Y(t) = \sum_{i=1}^n I(T_i \ge t),$$
where $I$ is the indicator function. In particular, let $N\{t\} = N(t) - N(t^-)$ (i.e., $N\{t\}$ is the number of observed $X_i$’s at the exact position $t$). Let $\{Z(t)\}_{t \ge 0} \sim \log BSP(\alpha(t), \beta(t))$ and $F(t) = 1 - \exp\{-Z(t)\}$. Given the data $(T, \delta)$, the posterior distribution of $Z$ is a log-beta process ([18]) with parameters
$$\alpha^*(t) = \alpha(t) + N(t) \tag{2}$$
and
$$\beta^*(t) = \beta(t) + Y(t) - N\{t\}. \tag{3}$$
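The counting and at-risk processes are simple tallies over the observed pairs, as the following sketch shows. The function names and the restriction to a constant $k$ with a user-supplied prior survival function `surv0` are assumptions made for illustration.

```python
def counting_N(t, T, delta):
    """N(t): number of exact (uncensored) observations with T_i <= t."""
    return sum(1 for ti, di in zip(T, delta) if ti <= t and di == 1)

def at_risk_Y(t, T):
    """Y(t): number of subjects still under observation just before t (T_i >= t)."""
    return sum(1 for ti in T if ti >= t)

def jump_N(t, T, delta):
    """N{t}: number of exact observations located exactly at t."""
    return sum(1 for ti, di in zip(T, delta) if ti == t and di == 1)

def posterior_beta(t, T, delta, k, surv0):
    """beta*(t) = k * F0([t, inf)) + Y(t) - N{t}, assuming a constant k."""
    return k * surv0(t) + at_risk_Y(t, T) - jump_N(t, T, delta)

# Small example: the second subject is censored at time 3.0.
T = [2.0, 3.0, 3.0, 5.0]
delta = [1, 0, 1, 1]
```

Here `counting_N(3.0, T, delta)` counts the exact observations at 2.0 and 3.0, while `at_risk_Y(3.0, T)` counts the three subjects with $T_i \ge 3$.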
The posterior Lévy measure for $Z(t)$ is given by
$$L_t(ds) = \frac{1}{1 - e^{-s}} \int_0^t e^{-s(\beta(z) + Y(z))}\,d\alpha(z)\,ds.$$
There are fixed points of discontinuity in the posterior process. These extra points appear at the exact (uncensored) observations. If $t_i$ is an exact observation with corresponding jump $S_i$, then
$$1 - \exp(-S_i) \sim \text{beta}\left(N\{t_i\},\ \beta(t_i) + Y(t_i) - N\{t_i\}\right). \tag{4}$$
If $N\{t_i\} = 1$, then the random jump $S_i$ follows an exponential density with mean $\left(\beta(t_i) + Y(t_i) - N\{t_i\}\right)^{-1}$.
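The exponential form of a single jump follows from the survival function of a beta variable with first parameter 1. The short derivation below uses the shorthand $b = \beta(t_i) + Y(t_i) - 1$ and $B = 1 - \exp(-S_i)$, both introduced here for illustration.

```latex
B \sim \text{beta}(1, b) \;\Longrightarrow\; P(B > x) = (1 - x)^{b}, \quad 0 \le x \le 1,
\qquad
P(S_i > s) = P\bigl(B > 1 - e^{-s}\bigr) = \bigl(e^{-s}\bigr)^{b} = e^{-bs}, \quad s \ge 0 .
```

Hence $S_i$ is exponential with rate $b$, that is, with mean $1/b$.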
Let $F(t) \sim BSP(k(t), F_0(t))$, $t \ge 0$. Given the data $(T, \delta)$, the posterior distribution of $F$ is a beta-Stacy process with parameters
$$k^*(t) = \frac{\beta^*(t)}{F_0^*([t, \infty))} = \frac{\beta(t) + Y(t) - N\{t\}}{F_0^*([t, \infty))} \tag{5}$$
and
$$F_0^*(t) = 1 - \prod_{[0,t]} \left(1 - \frac{d\alpha^*(z)}{\beta^*(z) + \alpha^*\{z\}}\right) = 1 - \prod_{[0,t]} \left(1 - \frac{k(z)\,dF_0(z) + dN(z)}{k(z)\,F_0([z, \infty)) + Y(z)}\right), \tag{6}$$
where $\alpha^*$ and $\beta^*$ are defined in (2) and (3), and $\prod$ stands for the product integral. Note that, as $k(\cdot)$ tends to zero, the nonparametric Kaplan–Meier estimator of the distribution function is obtained. On the other hand, $F_0^*$ becomes the prior guess $F_0$ as $k(\cdot)$ grows large. Thus, the parameter $k(t)$ can be viewed as the concentration parameter. The posterior consistency of the beta-Stacy process was addressed by Kim and Lee [22].
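The limiting case $k(\cdot) \to 0$ can be made concrete: the product integral in (6) then collapses to a finite product over the exact observation times, which is the Kaplan–Meier form. The sketch below covers only this limiting case (the function name is an assumption), not the general posterior mean with a continuous prior component.

```python
def kaplan_meier_cdf(t, T, delta):
    """1 - prod_{z <= t} (1 - dN(z) / Y(z)), taken over exact observation times z."""
    surv = 1.0
    exact_times = sorted({ti for ti, di in zip(T, delta) if di == 1})
    for z in exact_times:
        if z > t:
            break
        d_z = sum(1 for ti, di in zip(T, delta) if ti == z and di == 1)  # dN(z)
        y_z = sum(1 for ti in T if ti >= z)                              # Y(z)
        surv *= 1.0 - d_z / y_z
    return 1.0 - surv

# Four subjects; the one observed at time 2 is right censored.
T = [1, 2, 3, 4]
delta = [1, 0, 1, 1]
F_at_3 = kaplan_meier_cdf(3, T, delta)   # 1 - (1 - 1/4)(1 - 1/2) = 0.625
```

The censored subject contributes to the at-risk count at time 1 but never produces a jump, which is how censoring enters the estimate.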
Algorithms A and B in Appendix A can be used to sample from prior and posterior beta-Stacy processes. These algorithms were developed by Al-Labadi and Zarepour [15] (see also Lee and Kim [23]).

2.2. Relative Belief Ratio

Evans’ (2015) relative belief ratio has become a widely used tool in statistical hypothesis testing theory. Assume that we have a statistical model defined by the pdfs $\{f_\theta : \theta \in \Theta\}$ with respect to the Lebesgue measure on the parameter space $\Theta$. Let $\pi(\theta)$ be a prior on $\Theta$. After observing the data $(T, \delta)$, the posterior distribution of $\theta$ is given by the conditional density function
$$\pi(\theta \mid (T, \delta)) = \frac{f_\theta(T, \delta)\,\pi(\theta)}{\int_\Theta f_\theta(T, \delta)\,\pi(\theta)\,d\theta}.$$
Assume that we are interested in inference about the parameter $\theta$. Let $\pi$ and $\pi(\cdot \mid (T, \delta))$ be continuous at $\theta$. Then, the relative belief ratio for a hypothesized value $\theta_0$ of $\theta$ is given by
$$RB_\Theta(\theta_0 \mid (T, \delta)) = \pi(\theta_0 \mid (T, \delta)) / \pi(\theta_0). \tag{7}$$
The ratio of the posterior density to the prior density at $\theta_0$, that is, $RB_\Theta(\theta_0 \mid (T, \delta))$, measures how beliefs about $\theta_0$ changed from a priori to a posteriori. When $\pi$ and $\pi(\cdot \mid (T, \delta))$ are discrete, the relative belief ratio is defined through limits. For more information, see Evans (2015).
The quantity $RB_\Theta(\theta_0 \mid (T, \delta))$ is a measure of evidence that $\theta_0$ is the true value. If $RB_\Theta(\theta_0 \mid (T, \delta)) > 1$, then the probability of $\theta_0$ being the true value increases from a priori to a posteriori; thus, there is evidence based on the data that $\theta_0$ is the true value, and there is, hence, evidence in favor of $\theta_0$. If $RB_\Theta(\theta_0 \mid (T, \delta)) < 1$, then the probability of $\theta_0$ being the true value decreases from a priori to a posteriori. As a result, the data provide evidence that $\theta_0$ is not the true value. The case $RB_\Theta(\theta_0 \mid (T, \delta)) = 1$ implies there is no evidence in either direction.
The ability to calibrate relative belief ratios is an appealing feature that renders them desirable in hypothesis testing problems. After calculating the relative belief ratio, it is critical to determine whether the result represents strong or weak evidence for or against $H_0 : \theta = \theta_0$. A typical calibration of $RB_\Theta(\theta_0 \mid (T, \delta))$ is obtained by computing the tail probability (Evans, 2015)
$$Str_\Theta(\theta_0 \mid (T, \delta)) = \Pi\left(RB_\Theta(\theta \mid (T, \delta)) \le RB_\Theta(\theta_0 \mid (T, \delta)) \,\middle|\, (T, \delta)\right), \tag{8}$$
where $\Pi(\cdot \mid (T, \delta))$ is the posterior distribution corresponding to the posterior density $\pi(\cdot \mid (T, \delta))$. Equation (8) represents the posterior probability that the true value of $\theta$ has a relative belief ratio no greater than that of the hypothesized value $\theta_0$. When $RB_\Theta(\theta_0 \mid (T, \delta)) < 1$, there is evidence against $\theta_0$, and a small value of $Str_\Theta(\theta_0 \mid (T, \delta))$ represents a large posterior probability that the true value has a relative belief ratio greater than $RB_\Theta(\theta_0 \mid (T, \delta))$; hence, there is strong evidence against $\theta_0$. Similarly, when $RB_\Theta(\theta_0 \mid (T, \delta)) > 1$, a large value of $Str_\Theta(\theta_0 \mid (T, \delta))$ indicates strong evidence in favour of $\theta_0$, while a small value of $Str_\Theta(\theta_0 \mid (T, \delta))$ indicates weak evidence in favour of $\theta_0$. Figure 1 illustrates the strength of the evidence for both cases, $RB_\Theta(\theta_0 \mid (T, \delta)) < 1$ and $RB_\Theta(\theta_0 \mid (T, \delta)) > 1$.
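The relative belief ratio and its strength are easy to compute once the prior and posterior are tabulated. The sketch below is a toy illustration on a hypothetical four-point parameter grid, where the ratio is taken between probabilities rather than densities; the function names and the numbers are assumptions.

```python
def relative_belief(prior, posterior):
    """Pointwise ratio posterior / prior over a finite grid of parameter values."""
    return [q / p for p, q in zip(prior, posterior)]

def strength(prior, posterior, idx0):
    """Posterior probability of grid points whose RB does not exceed RB at idx0."""
    rb = relative_belief(prior, posterior)
    return sum(q for r, q in zip(rb, posterior) if r <= rb[idx0])

# Hypothetical prior and posterior probabilities on four grid points.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.1, 0.2, 0.3, 0.4]
rb = relative_belief(prior, posterior)     # approximately [0.4, 0.8, 1.2, 1.6]
str_2 = strength(prior, posterior, 2)      # approximately 0.1 + 0.2 + 0.3 = 0.6
```

For the third grid point the ratio exceeds 1, so there is evidence in its favour, and the strength of about 0.6 indicates how much posterior mass sits on values with no more support than it has.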

2.3. Cramér–Von Mises Distance

Let $F$ and $G$ be two cdfs; the Cramér–von Mises distance between $F$ and $G$ is defined as
$$d(F, G) = \int_{-\infty}^{\infty} \left(F(x) - G(x)\right)^2 G(dx).$$
Other distances can be used (see Gibbs and Su (2002)), but d has some computational advantages. The formula for the distance between the beta-Stacy process and a continuous cdf is provided in the following result.
Lemma 1. 
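Let $F_0$ be a continuous cdf and $F_\epsilon(t) = 1 - \exp\{-Z_\epsilon(t)\}$, where $Z_\epsilon(t) = \sum_{i=1}^M J_i\,\delta_{\theta_i}([0, t])$, $t \ge 0$, with $\theta_1, \ldots, \theta_M \overset{i.i.d.}{\sim} F_0$. Let $\theta_{(1)} \le \cdots \le \theta_{(M)}$ denote the order statistics of $\theta_1, \ldots, \theta_M$ and let $J'_1, \ldots, J'_M$ be the associated jump sizes such that $J'_j = J_i$ when $\theta_i = \theta_{(j)}$. Then
$$d(F_\epsilon, F_0) = \frac{1}{3} - \sum_{i=0}^M F_\epsilon(\theta_{(i)}) \left[\left(F_0(\theta_{(i+1)})\right)^2 - \left(F_0(\theta_{(i)})\right)^2\right] + \sum_{i=0}^M \left(F_\epsilon(\theta_{(i)})\right)^2 \left[F_0(\theta_{(i+1)}) - F_0(\theta_{(i)})\right],$$
where $F_\epsilon(\theta_{(i)}) = 1 - \exp\left\{-\sum_{k=1}^i J'_k\right\}$.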
Proof. 
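Note that
$$F_\epsilon(x) = \begin{cases} 0 & \text{if } x < \theta_{(1)}, \\ F_\epsilon(\theta_{(i)}) & \text{if } \theta_{(i)} \le x < \theta_{(i+1)} \ (i = 1, \ldots, M-1), \\ 1 & \text{if } x \ge \theta_{(M)}. \end{cases}$$
Let $\theta_{(0)} = 0$ and $\theta_{(M+1)} = +\infty$. Then,
$$d(F_\epsilon, F_0) = \int_{\theta_{(0)}}^{\theta_{(M+1)}} \left(F_\epsilon(x) - F_0(x)\right)^2 f_0(x)\,dx = \sum_{i=0}^M \int_{\theta_{(i)}}^{\theta_{(i+1)}} \left(F_\epsilon(\theta_{(i)}) - F_0(x)\right)^2 f_0(x)\,dx.$$
Substituting $y = F_0(x)$ and $U_{(i)} = F_0(\theta_{(i)})$ gives
$$\begin{aligned} d(F_\epsilon, F_0) &= \sum_{i=0}^M \int_{U_{(i)}}^{U_{(i+1)}} \left(F_\epsilon(\theta_{(i)}) - y\right)^2 dy \\ &= \frac{1}{3} \sum_{i=0}^M \left\{\left[F_\epsilon(\theta_{(i)}) - U_{(i)}\right]^3 - \left[F_\epsilon(\theta_{(i)}) - U_{(i+1)}\right]^3\right\} \\ &= \frac{1}{3} \sum_{i=0}^M \left(U_{(i+1)}^3 - U_{(i)}^3\right) - \sum_{i=0}^M F_\epsilon(\theta_{(i)}) \left(U_{(i+1)}^2 - U_{(i)}^2\right) + \sum_{i=0}^M F_\epsilon^2(\theta_{(i)}) \left(U_{(i+1)} - U_{(i)}\right) \\ &= \frac{1}{3} - \sum_{i=0}^M F_\epsilon(\theta_{(i)}) \left(U_{(i+1)}^2 - U_{(i)}^2\right) + \sum_{i=0}^M \left(F_\epsilon(\theta_{(i)})\right)^2 \left(U_{(i+1)} - U_{(i)}\right) \\ &= \frac{1}{3} - \sum_{i=0}^M F_\epsilon(\theta_{(i)}) \left[\left(F_0(\theta_{(i+1)})\right)^2 - \left(F_0(\theta_{(i)})\right)^2\right] + \sum_{i=0}^M \left(F_\epsilon(\theta_{(i)})\right)^2 \left[F_0(\theta_{(i+1)}) - F_0(\theta_{(i)})\right]. \end{aligned}$$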
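The closed form in Lemma 1 translates directly into code. The sketch below takes the ordered jump sizes and the values $U_{(i)} = F_0(\theta_{(i)})$ as inputs; the function name and argument layout are assumptions, and the value of $F_\epsilon$ on the last interval is taken from the cumulative-jump expression in the lemma statement.

```python
import math

def cvm_distance(jumps_sorted, u_sorted):
    """Cramér-von Mises distance d(F_eps, F_0) via the closed form of Lemma 1.

    jumps_sorted[j] is the jump at the (j+1)-th smallest atom and
    u_sorted[j] = F_0 evaluated at that atom (values in [0, 1], increasing).
    """
    M = len(u_sorted)
    U = [0.0] + list(u_sorted) + [1.0]       # U_(0) = 0 and U_(M+1) = 1
    a = [0.0]                                # a[i] = F_eps(theta_(i)), a[0] = 0
    cum = 0.0
    for J in jumps_sorted:
        cum += J
        a.append(1.0 - math.exp(-cum))
    lin = sum(a[i] * (U[i + 1] ** 2 - U[i] ** 2) for i in range(M + 1))
    quad = sum(a[i] ** 2 * (U[i + 1] - U[i]) for i in range(M + 1))
    return 1.0 / 3.0 - lin + quad

# One atom at the median of F_0 with jump log 2, so F_eps steps from 0 to 0.5.
d_one_atom = cvm_distance([math.log(2.0)], [0.5])
```

For this single-atom case the distance can be checked by hand: $\int_0^{0.5} y^2\,dy + \int_{0.5}^1 (0.5 - y)^2\,dy = 1/12$.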
When considering the prior and posterior distributions of the Cramér–von Mises distance, the following lemma allows for the use of the approximation to $BSP(k(\cdot), F_0)$.
Lemma 2. 
If $F \sim BSP(k(\cdot), F_0)$ and $F_\epsilon$ is given by Algorithm A, then $d(F_\epsilon, F_0) \xrightarrow{a.s.} d(F, F_0)$ as $\epsilon \to 0$.
Proof. 
By Kim and Lee (2001), $F_\epsilon(x) \xrightarrow{a.s.} F(x)$, where $\xrightarrow{a.s.}$ stands for almost sure convergence. As $\left(F_\epsilon(x) - F_0(x)\right)^2 f_0(x) \le f_0(x)$, where $f_0(x)$ is integrable, the proof follows from the dominated convergence theorem. □

3. Model Checking Using the Relative Belief

Consider the statistical model $\{F_\theta : \theta \in \Theta\}$ of continuous cdfs on $\mathbb{R}^+$. Let $X_1, \ldots, X_n$ be a random sample drawn from an unknown cdf $F$ defined on $\mathbb{R}^+$. Assume that $(T, \delta) = ((T_1, \delta_1), \ldots, (T_n, \delta_n))$ is observed, where $T_i = \min(X_i, C_i)$, $\delta_i = I(X_i \le C_i)$, and $C_1, \ldots, C_n$ are censoring times. The aim is to test the hypothesis $H_0 : F \in \{F_\theta : \theta \in \Theta\}$. Let $BSP(k(\cdot), F_0)$ be the prior on $F$ for some choice of $k(\cdot)$ and $F_0$. Then $F(t) \mid (T, \delta) \sim BSP(k^*(t), F_0^*(t))$, where $k^*(\cdot)$ and $F_0^*$ are defined in (5) and (6), respectively. If $H_0$ is true, then the posterior distribution of the distance between $F$ and the proposed model should be more concentrated around 0 than the prior distribution. As a result, this test involves comparing the concentrations of the prior and the posterior distributions of $d$ (see Lemma 1) about 0 using the relative belief ratio, with the interpretation discussed in Section 2.2.
To perform this test, we must measure the distance and then set relevant values for $k(\cdot)$ and $F_0$. To calculate the distance, similar to Al-Labadi and Evans [24], we computed $d(F, F_{\hat\theta})$, where $\hat\theta$ is the relative belief estimate of $\theta$, which is always the same as the maximum likelihood estimate (MLE) for the full model parameter. In terms of hyperparameters, we set $F_0 = F_{\hat\theta}$ and $k(t) = k$ for all $t$. There are numerous advantages in setting $F_0 = F_{\hat\theta}$. First, it avoids prior-data conflict, a possible contradiction between the data and the prior. This typically happens when the prior places its mass in a region of the parameter space where the data suggest the true value does not lie ([25,26]). In the context of the approach considered in this paper, prior-data conflict arises whenever there is a small overlap between the effective support regions of $F$ and $F_{\hat\theta}$. Note that, by Lemma 1, the distance $d(F, F_{\hat\theta})$ depends on the prior guess $F_0$ through the jump points $\theta_i$. If the $\theta_i$ lie in one tail of $F_{\hat\theta}$, then we get prior-data conflict between $F$ and $F_{\hat\theta}$ because $F_0$ and $F$ have the same effective support. To avoid this, the $\theta_i$ must be selected in a region that includes most of the mass of $F_{\hat\theta}$. When $F_0 = F_{\hat\theta}$, $F_{\hat\theta}$ is the prior mean of $F$; thus, both share the same effective support, which renders it a reasonable choice to avoid prior-data conflict. We refer the reader to Example 1 of Al-Labadi and Evans (2018) for an interesting discussion about prior-data conflict. Nevertheless, the choice $F_0 = F_{\hat\theta}$ must also avoid any impact of the “double use of the data”, which would make the approach conservative in detecting model failure when $H_0$ is false. Although setting $F_0 = F_{\hat\theta}$ appears to induce a data-dependent prior distribution for $d$, the following lemma implies that this is not the case; thus, the approach is prior distribution-free with this choice.
Lemma 3. 
If $F \sim BSP(k(\cdot), F_{\hat\theta})$, then the distribution of $d(F, F_{\hat\theta})$ does not depend on $F_{\hat\theta}$.
Proof. 
Using Lemma 1, since $(\theta_i)_{1 \le i \le M}$ is a sequence of i.i.d. random variables with continuous distribution $F_{\hat\theta}$, for $i \ge 1$, we have $U_i \overset{d}{=} F_{\hat\theta}(\theta_i)$, where $(U_i)_{1 \le i \le M}$ is a sequence of i.i.d. random variables that follow a uniform distribution on $[0, 1]$. Thus,
$$d(F_\epsilon, F_{\hat\theta}) \overset{d}{=} \frac{1}{3} - \sum_{i=0}^M F_\epsilon(\theta_{(i)}) \left[\left(U_{(i+1)}\right)^2 - \left(U_{(i)}\right)^2\right] + \sum_{i=0}^M \left(F_\epsilon(\theta_{(i)})\right)^2 \left[U_{(i+1)} - U_{(i)}\right],$$
where $U_{(i)}$ is the $i$-th order statistic for $(U_i)_{1 \le i \le M}$ i.i.d. uniform $[0, 1]$. Now, as $\epsilon \to 0$, by Lemma 2, we conclude that the distribution of $d(F, F_{\hat\theta})$ does not depend on $F_{\hat\theta}$. □
The following result shows that setting $F_0 = F_{\hat\theta}$ prevents any effect due to the double use of the data. Specifically, as the sample size increases, the posterior distribution of $d(F, F_{\hat\theta})$ becomes concentrated around 0 if and only if $H_0$ is true. For the proof, see Al-Labadi and Evans (2018).
Lemma 4. 
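Let $(T, \delta) = ((T_1, \delta_1), \ldots, (T_n, \delta_n)) \sim F$, where $F \sim BSP(k(\cdot), F_{\hat\theta})$. Suppose that $\hat\theta \xrightarrow{a.s.} \theta_0$ and $\sup_y |F_{\hat\theta}(y) - F_{\theta_0}(y)| \xrightarrow{a.s.} 0$ as $n \to \infty$.
(i) If $H_0$ is true, then, as $n \to \infty$, $d(F \mid (T, \delta), F_{\hat\theta}) \xrightarrow{a.s.} 0$.
(ii) If $H_0$ is false, then, as $n \to \infty$, $\liminf d(F \mid (T, \delta), F_{\hat\theta}) \overset{a.s.}{>} 0$.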
Now, concerning the choice of $k(\cdot)$, we considered $k(t) = k$ for all $t > 0$. In general, larger values of $k$ must be chosen to identify smaller deviations. Consequently, it is possible to consider multiple values of $k$. One way to do this is, for instance, to start with $k = 1$. If a larger (smaller) value of $k$ renders the relative belief ratio below (above) 1, $H_0$ is rejected (accepted). As Section 5 shows, when the null hypothesis is correct (not correct), the relative belief ratio always remains above (below) 1 when larger (smaller) values of $k$ are considered. When using the Dirichlet process, Al-Labadi and Zarepour [27] advised using $k \le 0.5n$ for complete data to avoid the prior becoming too influential. Setting $k$ between 1 and 10 is satisfactory for most purposes. As indicated in Section 2.1, when $k(t) = k$, the beta-Stacy process turns into the Dirichlet process. However, in the presence of right-censored data, the posterior distribution of the Dirichlet process becomes a beta-Stacy process ([19]). This justifies the necessity of using the beta-Stacy process in the approach.

4. Computational Algorithm

Closed forms of the prior and posterior densities of $D = d(F, F_{\hat\theta})$ are required to compute the relative belief ratio as in (7). These are usually not available. As a result, the relative belief ratio must be approximated through simulation. A particular problem of computing (7) arises here when both $\pi_D(0)$ and $\pi_{D \mid (T, \delta)}(0)$ are close to 0, where $\pi_D(\cdot)$ and $\pi_{D \mid (T, \delta)}(\cdot)$ denote the pdfs of $D$ and $D \mid (T, \delta)$, respectively. In such a case, determining $RB_D(0 \mid (T, \delta))$ is difficult. The formal definition of the relative belief ratio, as discussed in Section 2.2, is as a limit that can be approximated at zero by
$$\widehat{RB}_D(0 \mid (T, \delta)) = \frac{\Pi_{D \mid (T, \delta)}([0, d_c))}{\Pi_D([0, d_c))}, \tag{9}$$
for a suitably small value $d_c$, where $\Pi_D(\cdot)$ and $\Pi_{D \mid (T, \delta)}(\cdot)$ denote the cdfs of $D$ and $D \mid (T, \delta)$, respectively. From Equation (8), the strength of the evidence based on the relative belief ratio $RB_D(0 \mid (T, \delta))$ can be computed using
$$Str_D(0 \mid (T, \delta)) = \Pi_{D \mid (T, \delta)}\left(RB_D(d \mid (T, \delta)) \le RB_D(0 \mid (T, \delta))\right). \tag{10}$$
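With simulated draws of the distance under the prior and under the posterior, the ratio at zero can be approximated by comparing the proportions of draws that fall below a small cutoff. The sketch below assumes the ratio is taken as posterior content over prior content, in line with the definition of the relative belief ratio in Section 2.2; the function name and the toy numbers are illustrative assumptions.

```python
def rb_at_zero(prior_d, post_d, d_c):
    """Approximate RB_D(0 | data) by (posterior mass of [0, d_c)) / (prior mass of [0, d_c))."""
    prior_mass = sum(1 for d in prior_d if d < d_c) / len(prior_d)
    post_mass = sum(1 for d in post_d if d < d_c) / len(post_d)
    if prior_mass == 0.0:
        raise ValueError("d_c is too small: no prior draws fall below it")
    return post_mass / prior_mass

# Toy draws of the distance (hypothetical values).
prior_draws = [0.1, 0.2, 0.3, 0.4]
post_draws = [0.05, 0.06, 0.3, 0.5]
rb0 = rb_at_zero(prior_draws, post_draws, d_c=0.15)   # 0.5 / 0.25 = 2.0
```

A value above 1 means the posterior places more mass near zero distance than the prior does. The cutoff has to be small but not so small that the prior proportion becomes unstable.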
Appendix B provides computational Algorithm C for assessing H 0 on the basis of estimates of (9) and (10). A similar algorithm for complete data based on the Dirichlet process was developed by Al-Labadi and Evans [24].

5. Examples

The approach is demonstrated by two main examples in this section. Throughout this section, let $\text{Exp}(\lambda)$, $\text{Weibull}(k, \lambda)$, and $\text{Lognormal}(\mu, \sigma)$ denote the exponential distribution with mean $1/\lambda$, the Weibull distribution with shape parameter $k$ and scale parameter $\lambda$, and the log-normal distribution with mean $\mu$ and standard deviation $\sigma$, respectively. In all examples, the sensitivity to the choice of $k$ is investigated. We set $\epsilon = 0.01$, $i_0 = 1$, $M_0 = 20$, and $r_1 = r_2 = 1000$ in Algorithms A, B, and C, though other values are also possible. The R package parmsurvfit was used to compute the MLE of the distribution parameters.
Example 1. 
This example uses a real dataset from Lee and Wang (2003) on the remission times (in months) of cancer patients. The dataset is given in Appendix C. We tested the hypothesis
$H_0$: $F$, given in Table 1, is the underlying distribution of the observed data,
where $F$ could be a family of distributions (composite hypothesis) or a specific distribution with known parameters (simple hypothesis). Various values of $k$ were considered to investigate the approach’s sensitivity to the selection of the concentration parameter. The p-value of the (frequentist) log-rank test was also computed for comparison purposes. Table 1 summarizes the findings. When $H_0$ is true, we want $RB > 1$ and a strength close to 1, and when $H_0$ is false, we want $RB < 1$ and a strength close to 0. According to Table 1, the proposed test performed well in this example.
Example 2. 
Simulated data. The primary purpose of this example is to investigate how the proposed test performs as the sample size increases. We considered data $(T_1, \delta_1), \ldots, (T_n, \delta_n)$, where $T_i = \min(X_i, C_i)$, the survival times $(X_i)_{1 \le i \le n}$ were generated from $\text{Lognormal}(1, 4)$, and the censoring times $(C_i)_{1 \le i \le n}$ were generated from $\text{Lognormal}(4, 1)$. We tested the hypothesis
$H_0$: $F$ in Table 2 is the underlying distribution of the observed data.
Table 2 summarizes the results that show that the selected models were accepted. Figure A1, Figure A2 and Figure A3 (see Appendix D) give the plots of F 0 = F θ ^ and 5 sample paths each for the prior beta-Stacy process and the posterior beta-Stacy process for each case in Table 2. These figures clearly show that the plots of the sample paths for the posterior process moved toward the plot of F 0 . This supports the previous conclusion regarding the null hypothesis. Furthermore, in this case, the p-values of the (frequentist) log-rank test support the conclusion that the null hypothesis should not be rejected.

6. Concluding Remarks

The beta-Stacy process and relative belief ratio were used to propose a general approach for model checking. This method could be used for both complete and right-censored data. It could also be used to test composite or simple hypotheses. Several examples demonstrated that the approach works very well.
Though the Cramér–von Mises distance was used here, other distance measures such as the Kolmogorov–Smirnov and Anderson–Darling distances are viable alternatives. Testing for families of multivariate distributions is an important extension of the approach presented in this paper. While conceptually similar, computational and inferential issues must be addressed. This problem can be addressed in future work.

Author Contributions

Formal analysis, A.A.; Investigation, L.A.-L. and M.A.; Methodology, L.A.-L. and A.A.; Project administration, L.A.-L.; Software, L.A.-L. and M.A.; Writing – review & editing, A.A. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge that the work in this paper was partly supported by Faculty Research grant FRG21-S-S03 and the Open Access Program from the American University of Sharjah.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Simulation Algorithms

Algorithm A: Simulation algorithm to approximate the prior of the beta-Stacy process.
  • Generate the log-beta process with parameters $\alpha(dz) = k(z)\,F_0(dz)$ and $\beta(z) = k(z)\,F_0([z, \infty))$. The following steps are required to accomplish this:
    (a) Fix a small positive number $\epsilon$.
    (b) Generate the total number of jumps $M \sim \text{Poisson}(\lambda_\epsilon)$, where $\lambda_\epsilon = \alpha([0, \infty))/\epsilon$.
    (c) For $i = 1, \ldots, M$, generate the jump times $\theta_i$ from the probability density function $\alpha(dz)/\alpha([0, \infty))$.
    (d) Generate the jump sizes $J_1, \ldots, J_M$ such that $1 - \exp(-J_i) \mid \theta_i \sim \text{Beta}(\epsilon, \beta(\theta_i))$.
    (e) Set
    $$Z_\epsilon(t) = \sum_{k=1}^M J_k\,I(\theta_k \le t). \tag{A1}$$
  • The approximate prior beta-Stacy process is $F_\epsilon(t) = 1 - \exp\{-Z_\epsilon(t)\}$.
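Steps (a) to (e) of Algorithm A can be sketched with the Python standard library alone. The version below assumes a constant $k$ and a continuous $F_0$, and records each atom on the probability scale $U = F_0(\theta)$, so that $\alpha([0,\infty)) = k$ and $\beta(\theta) = k(1 - U)$. The helper names, the simple Poisson sampler, and the clamp guarding the logarithm are illustrative choices, not the authors' implementation.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return n

def sample_prior_bsp(k, eps, rng):
    """One approximate draw from BSP(k, F_0), recorded on the F_0 scale.

    Returns (u_sorted, jumps_sorted): atom locations U = F_0(theta) in
    increasing order and the jump of Z_eps at each atom.
    """
    M = poisson(k / eps, rng)                      # step (b): alpha([0, inf)) = k
    atoms = []
    for _ in range(M):
        u = rng.random()                           # step (c): F_0(theta) is uniform
        b = rng.betavariate(eps, k * (1.0 - u))    # step (d): beta(theta) = k (1 - u)
        b = min(b, 1.0 - 1e-12)                    # numerical guard so log1p stays finite
        atoms.append((u, -math.log1p(-b)))         # J = -log(1 - b)
    atoms.sort()
    return [u for u, _ in atoms], [j for _, j in atoms]

def prior_cdf_at(u_sorted, jumps_sorted, p):
    """F_eps at the point t with F_0(t) = p, using step (e)."""
    z = sum(j for u, j in zip(u_sorted, jumps_sorted) if u <= p)
    return 1.0 - math.exp(-z)

rng = random.Random(1)
us, js = sample_prior_bsp(k=1.0, eps=0.01, rng=rng)
```

Working on the probability scale keeps the sketch independent of any particular $F_0$; to recover atom locations on the time scale one would apply the inverse of $F_0$ to each entry of `us`.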
Algorithm B: Simulation algorithm to approximate the posterior of the beta-Stacy process.
  • Generate the posterior log-beta process. The following steps are required to accomplish this:
    (a) Generate the process $\{Z_\epsilon^*(t)\}_{t \ge 0}$ on the basis of Algorithm A, with $\beta(\theta_i)$ replaced by $\beta(\theta_i) + Y(\theta_i)$, and denote its jump sizes by $J_i^*$.
    (b) Generate the process $Z_f^*(t) = \sum_{i=1}^l S_i^*\,I(T_i \le t)$, where the random jumps $(S_i^*)_{1 \le i \le l}$ are associated with the fixed points of discontinuity and are drawn from the distribution given in (4), with $l \le n$. These random jumps occur at points where the data are not right-censored.
    (c) The approximate posterior log-beta process is given by:
    $$Z_\epsilon(t) \mid (T, \delta) = Z_\epsilon^*(t) + Z_f^*(t) = \sum_{i=1}^M J_i^*\,I(\theta_i \le t) + \sum_{i=1}^l S_i^*\,I(t_i \le t).$$
  • The approximate posterior beta-Stacy process is $F_\epsilon^*(t) = F_\epsilon(t) \mid (T, \delta) = 1 - \exp\left\{-\left(Z_\epsilon^*(t) + Z_f^*(t)\right)\right\}$.
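A posterior draw combines the continuous part of step (a) with the fixed jumps of step (b). The sketch below assumes a constant $k$ and, purely for illustration, an exponential prior guess $F_0$ with a user-chosen `rate`; all function names and the toy data are assumptions.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    n, t = 0, rng.expovariate(1.0)
    while t <= lam:
        n += 1
        t += rng.expovariate(1.0)
    return n

def jump_from_beta(a, b, rng):
    """Draw J such that 1 - exp(-J) ~ Beta(a, b)."""
    x = min(rng.betavariate(a, b), 1.0 - 1e-12)   # guard against log(0)
    return -math.log1p(-x)

def sample_posterior_bsp(T, delta, k, eps, rate, rng):
    """Atoms (location, jump) of one approximate posterior log-beta path."""
    surv0 = lambda t: math.exp(-rate * t)                  # F_0([t, inf)), exponential F_0
    at_risk = lambda t: sum(1 for ti in T if ti >= t)      # Y(t)
    atoms = []
    # Step (a): continuous part, beta(theta) replaced by beta(theta) + Y(theta).
    for _ in range(poisson(k / eps, rng)):
        theta = rng.expovariate(rate)
        atoms.append((theta, jump_from_beta(eps, k * surv0(theta) + at_risk(theta), rng)))
    # Step (b): fixed jumps at the exact (uncensored) observation times.
    for t in sorted({ti for ti, di in zip(T, delta) if di == 1}):
        n_t = sum(1 for ti, di in zip(T, delta) if ti == t and di == 1)
        atoms.append((t, jump_from_beta(n_t, k * surv0(t) + at_risk(t) - n_t, rng)))
    atoms.sort()
    return atoms

def cdf_from_atoms(atoms, t):
    """Step (c): F(t) = 1 - exp(-sum of jumps located at or before t)."""
    return 1.0 - math.exp(-sum(j for loc, j in atoms if loc <= t))

rng = random.Random(0)
T = [0.5, 1.2, 2.0, 3.1]
delta = [1, 0, 1, 1]
atoms = sample_posterior_bsp(T, delta, k=1.0, eps=0.01, rate=0.5, rng=rng)
```

The censored time 1.2 never receives a fixed jump; it only raises the at-risk count for atoms located at or before it, which shrinks their jumps.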

Appendix B. Algorithm for Model Checking

Algorithm C (Relative belief algorithm for model checking):
(i) Generate a sample from $F_\epsilon$, where $F_\epsilon$ is an approximation of $F \sim BSP(k(\cdot), F_{\hat\theta})$. See Algorithm A.
(ii) Compute $d^{(pri)} = d(F_\epsilon, F_{\hat\theta})$ as described in Lemma 1.
(iii) Repeat Steps (i) and (ii) to obtain a sample of $r_1$ values from the prior of $D$.
(iv) Generate a sample from $F_\epsilon \mid (T, \delta)$, where $F_\epsilon \mid (T, \delta)$ is an approximation of $F \mid (T, \delta)$. See Algorithm B.
(v) Compute $d^{(pos)} = d(F_\epsilon \mid (T, \delta), F_{\hat\theta})$ as described in Lemma 1.
(vi) Repeat Steps (iv) and (v) to obtain a sample of $r_2$ values from the posterior of $D$.
(vii) For a fixed positive number $M_0$, let $\hat F_D$ denote the empirical cdf of $D$ based on the prior sample in (iii) and, for $i = 0, \ldots, M_0$, let $\hat d^{(pri)}_{i/M_0}$ be the estimate of $d^{(pri)}_{i/M_0}$, the $(i/M_0)$-th prior quantile of $D$. Here $\hat d^{(pri)}_0 = 0$, and $\hat d^{(pri)}_1$ is the largest value of $d^{(pri)}$. Let $\hat F_D(\cdot \mid (T, \delta))$ denote the empirical cdf of $D$ based on the posterior sample of $d^{(pos)}$ in (vi). For $d \in [\hat d^{(pri)}_{i/M_0}, \hat d^{(pri)}_{(i+1)/M_0})$, estimate $RB_D(d \mid (T, \delta))$ by the ratio of the estimates of the posterior and prior contents of $[\hat d^{(pri)}_{i/M_0}, \hat d^{(pri)}_{(i+1)/M_0})$. Specifically,
$$\widehat{RB}_D(d \mid (T, \delta)) = M_0 \left\{\hat F_D\left(\hat d^{(pri)}_{(i+1)/M_0} \mid (T, \delta)\right) - \hat F_D\left(\hat d^{(pri)}_{i/M_0} \mid (T, \delta)\right)\right\}. \tag{A2}$$
Moreover, estimate (9) by $\widehat{RB}_D(0 \mid (T, \delta)) = M_0\,\hat F_D\left(\hat d^{(pri)}_{i_0/M_0} \mid (T, \delta)\right)$, where $i_0$ is chosen so that $i_0/M_0$ is not too small (typically $i_0/M_0 \approx 0.05$).
(viii) Estimate (10) by the finite sum
$$\sum_{i \in A} \left(\hat F_D\left(\hat d^{(pri)}_{(i+1)/M_0} \mid (T, \delta)\right) - \hat F_D\left(\hat d^{(pri)}_{i/M_0} \mid (T, \delta)\right)\right), \tag{A3}$$
where $A = \left\{i \ge i_0 : \widehat{RB}_D\left(\hat d^{(pri)}_{i/M_0} \mid (T, \delta)\right) \le \widehat{RB}_D(0 \mid (T, \delta))\right\}$.
By Al-Labadi and Evans (2018), for fixed $M_0$, as $r_1, r_2 \to \infty$, (A2) converges a.s. to $RB_D(d \mid (T, \delta))$ and (A3) converges a.s. to $\Pi_D\left(RB_D(d \mid (T, \delta)) \le RB_D(0 \mid (T, \delta)) \mid (T, \delta)\right)$.
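Steps (vii) and (viii) operate only on the two samples of distances, so they can be sketched independently of the samplers. In the version below the function name and quantile indexing rule are assumptions. Two reading choices are also assumptions worth flagging: the estimate at zero divides the posterior content of $[0, \hat d^{(pri)}_{i_0/M_0})$ by its prior content $i_0/M_0$, which coincides with the printed expression when $i_0 = 1$, and the strength sum also counts the bins below $i_0$, treating them as sharing the estimate at zero, whereas the printed index set starts at $i_0$.

```python
def rb_and_strength(prior_d, post_d, M0=20, i0=1):
    """Estimate RB_D(0 | data) and its strength from prior and posterior distance draws."""
    pri = sorted(prior_d)
    r1, r2 = len(pri), len(post_d)
    # Prior quantile estimates: q[0] = 0 and q[M0] = largest prior draw.
    q = [0.0]
    for i in range(1, M0):
        idx = max(0, -(-i * r1 // M0) - 1)       # ceil(i * r1 / M0) - 1
        q.append(pri[idx])
    q.append(pri[-1])
    # Empirical posterior cdf evaluated at each prior quantile.
    Fq = [sum(1 for d in post_d if d <= x) / r2 for x in q]
    content = [Fq[i + 1] - Fq[i] for i in range(M0)]   # posterior content of each bin
    rb = [M0 * c for c in content]                     # bin-wise RB estimate, as in (A2)
    rb0 = (M0 / i0) * Fq[i0]                           # estimate of RB at zero
    # Posterior mass of bins whose RB estimate does not exceed rb0 (assumption: bins < i0 count).
    strength = sum(c for i, (c, r) in enumerate(zip(content, rb)) if i < i0 or r <= rb0)
    return rb0, strength

# Toy check: prior draws spread over (0, 1], posterior draws all very close to zero.
prior_draws = [j / 1000 for j in range(1, 1001)]
rb0_near, str_near = rb_and_strength(prior_draws, [0.01] * 100)   # (20.0, 1.0)
rb0_far, str_far = rb_and_strength(prior_draws, [0.99] * 100)     # (0.0, 0.0)
```

With $M_0 = 20$ the largest attainable estimate at zero is 20, reached when every posterior draw falls below the first prior quantile.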

Appendix C. Example 1 Dataset

Table A1. Remission times (months) of 137 cancer patients of the dataset used in Example 1. + denotes a right-censored observation.
4.5     19.13   14.24   7.87    5.49    2.02    9.22    3.82    26.31   4.65+
2.62    0.90    21.73   0.87+   0.51    3.36    43.01   0.81    3.36    1.46
24.80+  10.86+  17.14   15.96   22.69   4.33    7.28    2.46    3.48    4.23
6.54    8.65    5.41    2.23    4.34    32.15   4.87    5.71    7.59    3.02
4.51    1.05    9.47    79.05   2.02    4.26    11.25   10.34   10.66   12.03
2.64    14.74   1.19    8.66    14.83   5.62    18.10   25.74   17.36   1.35
9.02    6.94    7.26    4.70+   3.70    3.64    3.57    11.64   6.25    25.82
3.88    3.02+   19.36+  20.28   46.12   5.17    0.20    36.66   10.06   4.94
5.06    16.62   12.07   6.97    0.08    1.40    2.75    7.32    1.26    6.76
8.60+   7.62    3.52    9.74    0.40    5.41    2.54    2.69    8.26    0.50
5.32    5.09    2.09    7.93    12.02   13.08   5.85    7.09    5.32    4.33+
2.83    8.37    14.77   8.53    11.98   1.76    4.40    34.26   2.07    17.12
12.63   7.66    4.18    13.29   23.63   3.25    7.63    2.87    3.31    2.26
2.69    11.79   5.34    6.93    10.75   13.11   7.39
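To feed a table like this into the algorithms, each entry has to be split into an observed time and a censoring indicator. The snippet below is a small sketch of that step; the helper name `parse_remission_times` is hypothetical, and the coding $\delta = 1$ for an observed event and $\delta = 0$ for a right-censored one is the usual survival-analysis convention, assumed here rather than taken from this section.

```python
def parse_remission_times(tokens):
    """Split entries such as '4.65+' into times and censoring indicators.

    Returns (times, deltas), where delta = 0 marks a right-censored value
    (trailing '+') and delta = 1 an observed event. This coding is an
    assumption made for illustration.
    """
    times, deltas = [], []
    for tok in tokens:
        tok = tok.strip()
        censored = tok.endswith("+")
        times.append(float(tok.rstrip("+")))
        deltas.append(0 if censored else 1)
    return times, deltas


# First row of Table A1 as whitespace-separated tokens.
first_row = "4.5 19.13 14.24 7.87 5.49 2.02 9.22 3.82 26.31 4.65+".split()
T, delta = parse_remission_times(first_row)
n_censored = len(delta) - sum(delta)
print(len(T), n_censored)
```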

Appendix D. Example 2 Figures

Figure A1. Plots of the sample paths for the model, prior and posterior of $BS(a = 0.01, F_0)$ with different choices of $n$ and $k$ in Example 2.
Figure A2. Plots of the sample paths for the model, prior and posterior of $BS(a = 0.01, F_0)$ with different choices of $n$ and $k$ in Example 2.
Figure A3. Plots of the sample paths for the model, prior and posterior of $BS(a = 0.01, F_0)$ with different choices of $n$ and $k$ in Example 2.

References

  1. Lawless, J.F. Statistical Models and Methods for Lifetime Data, 2nd ed.; Wiley & Sons, Inc.: Hoboken, NJ, USA, 2003.
  2. Florens, J.P.; Richard, J.F.; Rolin, J.M. Bayesian Encompassing Specification Tests of a Parametric Model against a Nonparametric Alternative; Technical Report 9608; Université Catholique de Louvain, Institut de Statistique: Neuchatel, Switzerland, 1996.
  3. Carota, C.; Parmigiani, G. On Bayes Factors for Nonparametric Alternatives. In Bayesian Statistics 5; Bernardo, J.M., Berger, J., Dawid, A.P., Smith, A.F.M., Eds.; Oxford University Press: London, UK, 1996.
  4. Verdinelli, I.; Wasserman, L. Bayesian goodness-of-fit testing using finite-dimensional exponential families. Ann. Stat. 1998, 26, 1215–1241.
  5. Berger, J.O.; Guglielmi, A. Bayesian testing of a parametric model versus nonparametric alternatives. J. Am. Stat. Assoc. 2001, 96, 174–184.
  6. McVinish, R.; Rousseau, J.; Mengersen, K. Bayesian goodness of fit testing with mixtures of triangular distributions. Scand. J. Stat. 2009, 36, 337–354.
  7. Hsieh, P. A nonparametric assessment of model adequacy based on Kullback-Leibler divergence. Stat. Comput. 2011, 23, 149–162.
  8. Swartz, T.B. Nonparametric goodness-of-fit. Commun. Stat. Theory Methods 1999, 28, 2821–2841.
  9. Viele, K. Evaluating Fit Using Dirichlet Processes; Technical Report 384; University of Kentucky, Dept. of Statistics: Lexington, KY, USA, 2000.
  10. Evans, M. Measuring Statistical Evidence Using Relative Belief; Monographs on Statistics and Applied Probability 144; CRC Press, Taylor & Francis Group: London, UK, 2015.
  11. Abdelrazeq, I.; Al-Labadi, L.; Alzaatreh, A. On one-sample Bayesian tests for the mean. Statistics 2020, 54, 424–440.
  12. Al-Labadi, L.; Patel, V.; Vakiloroayaei, K.; Wan, C. Kullback-Leibler Divergence for Bayesian Nonparametric Model Checking. J. Korean Stat. Soc. 2021, 50, 272–289.
  13. Evans, M.; Tomal, J. Measuring statistical evidence and multiple testing. FACETS 2018, 3, 563–583.
  14. Damien, P.; Walker, S. A Bayesian non-parametric comparison of two treatments. Scand. J. Stat. 2002, 29, 51–56.
  15. Al-Labadi, L.; Zarepour, M. A Bayesian nonparametric goodness of fit test for right censored data based on approximate samples from the beta-Stacy process. Can. J. Stat. 2013, 41, 466–487.
  16. Chen, Y.; Hanson, T. Bayesian nonparametric k-sample tests for censored and uncensored data. Comput. Stat. Data Anal. 2014, 71, 335–346.
  17. Hjort, N.L. Nonparametric Bayes estimators based on Beta processes in models for life history data. Ann. Stat. 1990, 18, 1259–1294.
  18. Walker, S.; Muliere, P. Beta-Stacy processes and a generalisation of the Pólya-urn scheme. Ann. Stat. 1997, 25, 1762–1780.
  19. Phadia, E.G. Prior Processes and Their Applications; Springer: Berlin/Heidelberg, Germany, 2013.
  20. D’Agostino, R.B.; Stephens, M.A. Goodness-of-Fit Techniques; Marcel Dekker Inc.: New York, NY, USA, 1986.
  21. Ferguson, T.S.; Phadia, E.G. Bayesian nonparametric estimation based on censored data. Ann. Stat. 1979, 7, 163–186.
  22. Kim, Y.; Lee, J. On posterior consistency of survival models. Ann. Stat. 2001, 29, 666–686.
  23. Lee, J.; Kim, Y. A new algorithm to generate beta processes. Comput. Stat. Data Anal. 2004, 25, 401–405.
  24. Al-Labadi, L.; Evans, M. Prior based model checking. Can. J. Stat. 2018, 46, 380–398.
  25. Evans, M.; Moshonov, H. Checking for prior-data conflict. Bayesian Anal. 2006, 1, 893–914.
  26. Al-Labadi, L.; Evans, M. Optimal robustness results for relative belief inferences and the relationship to prior-data conflict. Bayesian Anal. 2017, 12, 705–728.
  27. Al-Labadi, L.; Zarepour, M. Two-sample Bayesian nonparametric goodness-of-fit test. Math. Methods Stat. 2017, 26, 212–225.
Figure 1. Strength of evidence. The shaded area represents $Str_\Theta(\theta_0 \mid (T, \delta))$. RB represents the relative belief ratio $RB_\Theta(\theta_0 \mid (T, \delta))$. The smaller the shaded area, the stronger the evidence for case (a). The larger the shaded area, the stronger the evidence for case (b).
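Table 1. Model checking of Example 1.
| $F$ | $F_0$ | $k$ | RB (Str) | p-Value |
|---|---|---|---|---|
| Exp($\lambda$) | Exp($\hat{\lambda} = 0.0962$) | 1 | 20 (1) | 0.6521 |
| | | 5 | 19.84 (1) | |
| | | 10 | 15.98 (1) | |
| Weibull($k, \lambda$) | Weibull($\hat{k} = 1.0536, \hat{\lambda} = 10.1901$) | 1 | 20 (1) | 0.9986 |
| | | 5 | 19.76 (1) | |
| | | 10 | 17.58 (1) | |
| Lognormal($\mu, \sigma$) | Lognormal($\hat{\mu} = 1.8198, \hat{\sigma} = 1.0912$) | 1 | 20 (1) | 0.7550 |
| | | 5 | 18.48 (1) | |
| | | 10 | 15.52 (1) | |
| Exp($\lambda = 2$) | Exp($\lambda = 2$) | 1 | 0 (0) | 0 |
| | | 5 | 0 (0) | |
| | | 10 | 0 (0) | |
| Weibull($2, 0.5$) | Weibull($2, 0.5$) | 1 | 0 (0) | 0 |
| | | 5 | 0 (0) | |
| | | 10 | 0 (0) | |
| Lognormal($3, 1$) | Lognormal($2, 1$) | 1 | 0 (0) | 0 |
| | | 5 | 0 (0) | |
| | | 10 | 0 (0) | |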
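Table 2. Model checking of Example 2.
| $F$ | $n$ | $F_0$ | $c$ | RB (Str) | p-Value |
|---|---|---|---|---|---|
| Exp($\lambda$) | 20 | Exp($\hat{\lambda} = 0.03$) | 1 | 3.4 (0.83) | 0.9995 |
| | | | 5 | 0.76 (0.08) | |
| | | | 10 | 0.36 (0) | |
| | 50 | Exp($\hat{\lambda} = 0.03$) | 1 | 0.24 (0.02) | 0.9977 |
| | | | 5 | 0.02 (0) | |
| | | | 10 | 0 (0) | |
| | 100 | Exp($\hat{\lambda} = 0.05$) | 1 | 0.2 (0.02) | 0.9992 |
| | | | 5 | 0 (0) | |
| | | | 10 | 0 (0) | |
| Weibull($k, \lambda$) | 20 | Weibull($\hat{k} = 0.38, \hat{\lambda} = 20.88$) | 1 | 18.12 (1) | 1 |
| | | | 5 | 9.78 (1) | |
| | | | 10 | 5.22 (0.739) | |
| | 50 | Weibull($\hat{k} = 0.36, \hat{\lambda} = 19.77$) | 1 | 19.88 (1) | 0.9992 |
| | | | 5 | 15.98 (1) | |
| | | | 10 | 11.12 (1) | |
| | 100 | Weibull($\hat{k} = 0.38, \hat{\lambda} = 14.10$) | 1 | 20 (1) | 0.9991 |
| | | | 5 | 19.48 (1) | |
| | | | 10 | 16.46 (1) | |
| Lognormal($\mu, \sigma$) | 20 | Lognormal($\hat{\mu} = 1.14, \hat{\sigma} = 2.40$) | 1 | 11.68 (0.416) | 0.9867 |
| | | | 5 | 3.52 (1) | |
| | | | 10 | 1.88 (1) | |
| | 50 | Lognormal($\hat{\mu} = 1.63, \hat{\sigma} = 3.64$) | 1 | 20 (1) | 0.9454 |
| | | | 5 | 16.4 (1) | |
| | | | 10 | 12.4 (1) | |
| | 100 | Lognormal($\hat{\mu} = 1.34, \hat{\sigma} = 3.44$) | 1 | 20 (1) | 0.9373 |
| | | | 5 | 19.36 (1) | |
| | | | 10 | 16.12 (1) | |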
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


MDPI and ACS Style

Al-Labadi, L.; Alzaatreh, A.; Asuncion, M. Model Checking with Right Censored Data Using Relative Belief Ratio. Entropy 2022, 24, 1579. https://doi.org/10.3390/e24111579

