Model Checking with Right Censored Data Using Relative Belief Ratio

Al-Labadi, Luai; Alzaatreh, Ayman; Asuncion, Mark

doi:10.3390/e24111579


by Luai Al-Labadi ¹,*, Ayman Alzaatreh ² and Mark Asuncion ¹

¹ Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada

² Department of Mathematics and Statistics, American University of Sharjah, Sharjah 61174, United Arab Emirates

* Author to whom correspondence should be addressed.

Entropy 2022, 24(11), 1579; https://doi.org/10.3390/e24111579

Submission received: 28 August 2022 / Revised: 28 September 2022 / Accepted: 29 October 2022 / Published: 31 October 2022

(This article belongs to the Section Information Theory, Probability and Statistics)


Abstract

Model checking is a topic of special interest in statistics. When data are censored, the problem becomes more difficult. This paper employs the relative belief ratio and the beta-Stacy process to develop a method for model checking in the presence of right-censored data. The proposed method for the given model of interest compares the concentration of the posterior distribution to the concentration of the prior distribution using a relative belief ratio. We propose a computational algorithm for the method and then illustrate the method through several data analysis examples.

Keywords: beta-Stacy process; model checking; nonparametric Bayesian statistics; relative belief inferences; right-censored data

1. Introduction

An important problem in statistics is to check whether a chosen statistical model is in agreement with observed data. This problem is known as model checking. The tests used to describe how well a statistical model fits a set of observations are goodness-of-fit tests. If it is determined that the observed data do not contradict the model, the true values of the parameters can be inferred. If the model fails to pass its checks, there is cause for concern about the accuracy of the inferences used in analysis. Thus, checking a proposed model on the basis of observed data is a critical component of statistical inference.

The time until an event occurs is the variable of interest in survival analysis, and it is commonly referred to as the survival time. In this setting, the observed data are typically right censored: the exact survival time is known only to exceed some value, for example, when a subject leaves the study before the event occurs or when the study ends before the event occurs. This type of data is common in medical studies, where the variable of interest is the time to death or time to relapse of a disease; in reliability studies, where it is the time until a machine part fails; and in the social sciences, where it is the lifetime of the elderly in certain social programs [1]. While there are several well-known methods for model checking, the approach taken here is Bayesian in nature. There is interest in developing nonparametric Bayesian procedures for hypothesis testing, and the majority of this work focuses on uncensored data. The main approach to model testing embeds the proposed model as a null hypothesis in a larger family of distributions; priors are then placed on the null and the alternative, and a Bayes factor is computed (see, for example, [2,3,4,5,6]). A second important approach places a prior on the true distribution generating the data and then measures the distance between the posterior distribution and the hypothesized distribution (see, for example, [7,8,9]). A third and more recent approach combines the second approach with the relative belief ratio ([10]) (see, for instance, [11,12,13]).

When data are censored, goodness-of-fit testing becomes a challenging problem, which explains why studies on goodness-of-fit tests for right-censored data are few and scattered. In particular, Bayesian goodness-of-fit tests are rarely covered. Some exceptions include the work of [14,15,16]. The authors of [14] considered the two-sample problem by computing Bayes factors with beta process priors ([17]). Al-Labadi and Zarepour [15] used the beta-Stacy process and the Kolmogorov–Smirnov distance to test simple hypotheses. Chen and Hanson [16] used Polya tree priors and computed the Bayes factor. To compute the posteriors, an MCMC procedure was used, and permutation tests were used to compute p-values. These approaches are computationally intensive.

The main problem addressed in this paper is model checking for lifetime distribution functions. A general description of the approach developed in this paper is provided along with a discussion of the benefits that the approach brings to the problem of model checking. The beta-Stacy process $BSP (k (t), F_{0} (t))$, $t \geq 0$, is considered as a prior on the space of cumulative distribution functions ([18]). Here, $k (\cdot) > 0$ is a function defined on $\mathbb{R}^{+} = [0, \infty)$, and $F_{0}$ is a cumulative distribution function (cdf) on $\mathbb{R}^{+}$ that represents the prior guess for the beta-Stacy process. How closely realizations from the process are clustered around $F_{0}$ is determined by the value of $k (\cdot)$. The use of the beta-Stacy process is justified because, unlike the Dirichlet process, it is conjugate to both exact and right-censored observations ([19]). The proposed method then compares the posterior and prior distribution concentrations about the model of interest. If the posterior is more concentrated on the model than the prior, this is evidence in favor of the model; if the posterior is less concentrated, this is evidence against the model. This comparison is produced using a relative belief ratio that measures the evidence in the observed data for or against the model and provides a measure of the strength of this evidence; thus, the methodology is based on a direct measure of statistical evidence. The approach is fairly simple to implement and does not necessitate obtaining a closed form of the relative belief ratio. Theoretical results are developed to support appropriate selections of the hyperparameters $k (\cdot)$ and $F_{0}$.

The rest of the paper is structured as follows. Section 2 provides background on the definitions and generic properties of the beta-Stacy process, the relative belief ratio, and the Cramér–von Mises distance between probability measures. Section 3 discusses the proposed methodology and the selection of relevant values for the beta-Stacy process’s hyperparameters. Section 4 develops a computational algorithm for implementing the approach. Section 5 examines the performance of the methodology through a number of examples; the method works well in general and can be used for both censored and uncensored observations. Section 6 wraps up the paper with a summary of the findings. Lastly, the datasets and figures are provided in Appendices C and D.

2. Notations and Background

Let the random sample $X_{1}, \dots, X_{n}$ be drawn from an unknown cdf $F$ defined on $\mathbb{R}^{+}$. Let $(T, δ) = ((T_{1}, δ_{1}), \dots, (T_{n}, δ_{n}))$ be observed, where $T_{i} = \min (X_{i}, C_{i})$, $δ_{i} = I (X_{i} \leq C_{i})$, and $C_{1}, \dots, C_{n}$ are the censoring times. When $δ_{i} = 1$, $X_{i}$ is observed, and when $δ_{i} = 0$, $X_{i}$ is right censored. This type of data can occur, for instance, when $X_{i}$ is the lifetime of a patient who enrols in a study of a certain disease. If death occurs from the disease before the study ends, $X_{i}$ is recorded; however, if the patient is still alive at the end of the study, withdraws, or dies from another cause, $C_{i}$ records the end of the period during which the patient was under observation ([20]).
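The observation scheme above can be illustrated with a minimal Python sketch. The function name `simulate_right_censored` and the exponential lifetime and censoring distributions are assumptions chosen only for illustration; they are not taken from the paper.

```python
import random

def simulate_right_censored(n, draw_lifetime, draw_censoring, seed=0):
    """Return a list of pairs (T_i, delta_i) with T_i = min(X_i, C_i)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = draw_lifetime(rng)         # lifetime X_i
        c = draw_censoring(rng)        # censoring time C_i
        t = min(x, c)                  # observed time T_i
        delta = 1 if x <= c else 0     # delta_i = I(X_i <= C_i)
        data.append((t, delta))
    return data

# Hypothetical choice: exponential lifetimes (mean 2) and censoring times (mean 5).
sample = simulate_right_censored(
    200,
    draw_lifetime=lambda rng: rng.expovariate(1 / 2.0),
    draw_censoring=lambda rng: rng.expovariate(1 / 5.0),
)
censored_fraction = sum(1 for _, d in sample if d == 0) / len(sample)
```

Each pair records the observed time and whether it is an exact lifetime ($δ_{i} = 1$) or a censoring time ($δ_{i} = 0$).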

In survival analysis, the cdf $F$ is frequently referred to as the lifetime distribution function. We use the same notation throughout this paper for a probability measure and its corresponding cumulative distribution function, i.e., $F (t) = F ((- \infty, t])$.

2.1. Beta-Stacy Process

Walker and Muliere [18] introduced the beta-Stacy process, which is a nonparametric prior that is widely used in survival analysis. In this section, a relevant introduction about the beta-Stacy process is provided.

The beta-Stacy process is defined through another process known as the log-beta process ([18]). Let $β (\cdot)$ be a positive function defined on $\mathbb{R}^{+}$, and let $α$ be a measure concentrated on $\mathbb{R}^{+}$ that is absolutely continuous with respect to the Lebesgue measure and such that

$\int_{0}^{\infty} {(β (z))}^{- 1} α (d z) = \infty .$

Following [18], ${\{Z (t)\}}_{t \geq 0}$ is a log-beta process with parameters $α$ and $β$, written as $Z (t) \sim \log BSP (α (t), β (t))$, if ${\{Z (t)\}}_{t \geq 0}$ is a Lévy process with Lévy measure defined, for $s > 0$, by

$L_{t} (d s) = \left[\frac{1}{1 - e^{- s}} \int_{0}^{t} e^{- s β (z)} α (d z)\right] d s$

and with the following moment generating function:

$\log E e^{- u Z (t)} = \int_{0}^{\infty} (e^{- u s} - 1) L_{t} (d s), \quad t \geq 0, \ u \in \mathbb{R} .$

To define the beta-Stacy process, let $k (\cdot)$ be a positive function defined on $\mathbb{R}^{+}$ and let $F_{0}$ be an absolutely continuous cdf on $\mathbb{R}^{+}$. We call ${\{F (t)\}}_{t \geq 0}$ a beta-Stacy process with parameters $k (\cdot)$ and $F_{0}$, written as $F (t) \sim BSP (k (t), F_{0} (t))$, if $F (t) = 1 - e^{- Z (t)}$, where ${\{Z (t)\}}_{t \geq 0}$ is a log-beta process with parameters

$α (d z) = k (z) F_{0} (d z) \quad \text{and} \quad β (z) = k (z) F_{0} ([z, \infty)) .$

Walker and Muliere [18] showed that

$E [F (t)] = F_{0} (t) = 1 - \exp \left(- \int_{0}^{t} {(β (z))}^{- 1} α (d z)\right),$

(1)

rendering $F_{0}$ the prior guess. The expression in (1) explains the need for the assumption $\int_{0}^{\infty} {(β (z))}^{- 1} α (d z) = \infty$, which ensures that $F_{0} (t) \to 1$ as $t \to \infty$. The beta-Stacy process includes various neutral-to-right processes proposed in the literature. For instance, the beta-Stacy process reduces to the Dirichlet process when $k (t) = k > 0$ for all $t \geq 0$. It also reduces to a simple homogeneous process ([21]) when $β (\cdot)$ is constant.

Next, we describe the posterior distributions of ${\{Z (t)\}}_{t \geq 0}$ and ${\{F (t)\}}_{t \geq 0}$. Let the random sample $X_{1}, \dots, X_{n}$ be drawn from $F$, and let $(T, δ) = ((T_{1}, δ_{1}), \dots, (T_{n}, δ_{n}))$ be observed, where $T_{i} = \min (X_{i}, C_{i})$, $δ_{i} = I (X_{i} \leq C_{i})$, and $C_{1}, \dots, C_{n}$ are censoring times. Define the counting process $N$ by

$N (t) = \sum_{i = 1}^{n} I (T_{i} \leq t, \ δ_{i} = 1)$

and the (left-continuous) at-risk process $Y$ by

$Y (t) = \sum_{i = 1}^{n} I (T_{i} \geq t),$

where $I$ is the indicator function. In particular, let $N \{t\} = N (t) - N (t^{-})$ (i.e., $N \{t\}$ is the number of uncensored observations $X_{i}$ that are exactly equal to $t$). Let ${\{Z (t)\}}_{t \geq 0} \sim \log BSP (α (t), β (t))$ and $F (t) = 1 - \exp (- Z (t))$. Given the data $(T, δ)$, the posterior distribution of $Z$ is a log-beta process ([18]) with parameters

$α^{*} (t) = α (t) + N (t)$

(2)

and

$β^{*} (t) = β (t) + Y (t) - N \{t\} .$

(3)

The posterior Lévy measure for $Z (t)$ is given by

$L_{t}^{*} (d s) = \left[\frac{1}{1 - e^{- s}} \int_{0}^{t} e^{- s (β (z) + Y (z))} α (d z)\right] d s .$

The posterior process also has fixed points of discontinuity. These extra points appear at the exact (uncensored) observations. If $t_{i}$ is an exact observation with corresponding jump $S_{i}$, then

$1 - \exp (- S_{i}) \sim \text{beta} (N \{t_{i}\}, β (t_{i}) + Y (t_{i}) - N \{t_{i}\}) .$

(4)

If $N \{t_{i}\} = 1$, then the random jump $S_{i}$ follows an exponential distribution with mean

${[β (t_{i}) + Y (t_{i}) - N \{t_{i}\}]}^{- 1} .$
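As an illustration of the quantities entering (2)–(4), the following Python sketch computes $N (t)$, $Y (t)$, and $N \{t\}$ from a small sample and draws the beta variable $1 - \exp (- S_{i})$ of a fixed jump. The toy data, the constant $k = 1$, and the Exp(1) prior guess behind `beta_prior` are assumptions made for this example only.

```python
import math
import random

def counting_process(data, t):
    """N(t): number of exact (uncensored) observations with T_i <= t."""
    return sum(1 for ti, di in data if di == 1 and ti <= t)

def at_risk(data, t):
    """Y(t): number of subjects with T_i >= t (left-continuous)."""
    return sum(1 for ti, _ in data if ti >= t)

def exact_count_at(data, t):
    """N{t}: number of exact observations equal to t."""
    return sum(1 for ti, di in data if di == 1 and ti == t)

def sample_fixed_jump(data, t_i, beta_prior, rng):
    """Draw 1 - exp(-S_i) from beta(N{t_i}, beta(t_i) + Y(t_i) - N{t_i}), as in (4)."""
    n_t = exact_count_at(data, t_i)
    b = beta_prior(t_i) + at_risk(data, t_i) - n_t
    return rng.betavariate(n_t, b)

# Hypothetical toy data: pairs (T_i, delta_i), with a tie at t = 3.
toy = [(1.0, 1), (2.0, 0), (3.0, 1), (3.0, 1), (4.0, 0)]

# Assumed prior: k = 1 and F_0 = Exp(1), so beta(t) = k * F_0([t, inf)) = exp(-t).
beta_prior = lambda t: math.exp(-t)

w = sample_fixed_jump(toy, 3.0, beta_prior, random.Random(0))
```

Because $Y (t_{i}) \geq N \{t_{i}\}$ at every exact observation, the second beta parameter stays positive whenever $β (t_{i}) > 0$.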

Let $F (t) \sim BSP (k (t), F_{0} (t))$, $t \geq 0$. Given the data $(T, δ)$, the posterior distribution of $F$ is a beta-Stacy process with parameters

$k^{*} (t) = \frac{β^{*} (t)}{F_{0}^{*} [t, \infty)} = \frac{β (t) + Y (t) - N \{t\}}{F_{0}^{*} [t, \infty)}$

(5)

and

$F_{0}^{*} (t) = 1 - \prod_{[0, t]} \left(1 - \frac{d α^{*} (z)}{β^{*} (z) + α^{*} \{z\}}\right) = 1 - \prod_{[0, t]} \left(1 - \frac{k (z) d F_{0} (z) + d N (z)}{k (z) F_{0} [z, \infty) + Y (z)}\right),$

(6)

where $α^{*}$ and $β^{*}$ are defined in (2) and (3), and $\prod$ stands for the product integral. Note that, as $k (\cdot)$ tends to zero, $F_{0}^{*}$ becomes the nonparametric Kaplan–Meier estimator of the distribution function. On the other hand, $F_{0}^{*}$ becomes the prior guess $F_{0}$ as $k (\cdot)$ grows large. Thus, the parameter $k (t)$ can be viewed as the concentration parameter. The posterior consistency of the beta-Stacy process was addressed by Kim and Lee [22].
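To make the limiting case concrete, here is a short Python sketch of the Kaplan–Meier estimator, which is what the product in (6) reduces to when $k (\cdot)$ is set to zero, namely $1 - \prod (1 - d N (z) / Y (z))$. The function name and the toy data are assumptions for illustration only.

```python
def kaplan_meier_cdf(data):
    """Return [(t, F_hat(t))] at the distinct exact observation times."""
    times = sorted({t for t, d in data if d == 1})
    surv = 1.0
    out = []
    for t in times:
        deaths = sum(1 for ti, di in data if di == 1 and ti == t)   # N{t}
        at_risk = sum(1 for ti, _ in data if ti >= t)               # Y(t)
        surv *= 1.0 - deaths / at_risk                              # product-limit factor
        out.append((t, 1.0 - surv))
    return out

# Hypothetical toy data: pairs (T_i, delta_i).
toy = [(1.0, 1), (2.0, 0), (3.0, 1), (3.0, 1), (4.0, 0)]
km = kaplan_meier_cdf(toy)
```

Censored observations do not create jumps; they only reduce the number at risk at later times.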

Algorithms A and B in Appendix A can be used to sample from prior and posterior beta-Stacy processes. These algorithms were developed by Al-Labadi and Zarepour [15] (see also Lee and Kim [23]).

2.2. Relative Belief Ratio

Evans’ (2015) relative belief ratio has become a widely used tool in statistical hypothesis testing. Assume that we have a statistical model defined by the family of pdfs $\{f_{θ} : θ \in Θ\}$ with respect to the Lebesgue measure, where $Θ$ is the parameter space. Let $π (θ)$ be a prior on $Θ$. After observing the data $(T, δ)$, the posterior distribution of $θ$ is given by the conditional density function

$π (θ | (T, δ)) = \frac{f_{θ} ((T, δ)) π (θ)}{\int_{Θ} f_{θ} ((T, δ)) π (θ) d θ} .$

Assume that we are interested in making inferences about the parameter $θ$. Let $π$ and $π (\cdot | (T, δ))$ be continuous at $θ_{0}$. Then, the relative belief ratio for a hypothesized value $θ_{0}$ of $θ$ is given by

$RB_{Θ} (θ_{0} | (T, δ)) = π (θ_{0} | (T, δ)) / π (θ_{0}),$

(7)

the ratio of the posterior density to the prior density at $θ_{0}$. That is, $RB_{Θ} (θ_{0} | (T, δ))$ measures how beliefs about $θ_{0}$ changed from a priori to a posteriori. More generally, the relative belief ratio is defined through a limit, which reduces to (7) when $π$ and $π (\cdot | (T, δ))$ are continuous at $θ_{0}$. For more information, see Evans (2015).

The quantity $RB_{Θ} (θ_{0} | (T, δ))$ is a measure of evidence that $θ_{0}$ is the true value. If $RB_{Θ} (θ_{0} | (T, δ)) > 1$, then the probability of $θ_{0}$ being the true value increases from a priori to a posteriori; thus, the data provide evidence in favor of $θ_{0}$. If $RB_{Θ} (θ_{0} | (T, δ)) < 1$, then the probability of $θ_{0}$ being the true value decreases from a priori to a posteriori; as a result, the data provide evidence that $θ_{0}$ is not the true value. The case $RB_{Θ} (θ_{0} | (T, δ)) = 1$ implies that there is no evidence in either direction.

The ability to calibrate relative belief ratios is an appealing feature that makes them desirable in hypothesis testing problems. After calculating the relative belief ratio, it is critical to determine whether the result represents strong or weak evidence for or against $H_{0} : θ = θ_{0}$. A typical calibration of $RB_{Θ} (θ_{0} | (T, δ))$ is obtained by computing the tail probability (Evans, 2015)

$Str_{Θ} (θ_{0} | (T, δ)) = Π (RB_{Θ} (θ | (T, δ)) \leq RB_{Θ} (θ_{0} | (T, δ)) | (T, δ)),$

(8)

where $Π (\cdot | (T, δ))$ is the posterior distribution corresponding to the posterior density $π (\cdot | (T, δ))$. Equation (8) is the posterior probability that the true value of $θ$ has a relative belief ratio no greater than that of the hypothesized value $θ_{0}$. When $RB_{Θ} (θ_{0} | (T, δ)) < 1$, there is evidence against $θ_{0}$, and a small value of $Str_{Θ} (θ_{0} | (T, δ))$ means that there is a large posterior probability that the true value has a relative belief ratio greater than $RB_{Θ} (θ_{0} | (T, δ))$; hence, there is strong evidence against $θ_{0}$. Similarly, when $RB_{Θ} (θ_{0} | (T, δ)) > 1$, a large value of $Str_{Θ} (θ_{0} | (T, δ))$ indicates strong evidence in favour of $θ_{0}$, while a small value of $Str_{Θ} (θ_{0} | (T, δ))$ indicates weak evidence in favour of $θ_{0}$. Figure 1 illustrates the strength of the evidence for both cases: $RB_{Θ} (θ_{0} | (T, δ)) < 1$ and $RB_{Θ} (θ_{0} | (T, δ)) > 1$.
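A small worked example may help fix ideas. The Python sketch below evaluates (7) and a Monte Carlo estimate of (8) in a parametric setting that is not used in the paper and is assumed here only for illustration: exponential lifetimes with rate $θ$, right censoring, and a Gamma($a$, $b$) prior on $θ$. In that setting the likelihood is proportional to $θ^{\sum δ_{i}} \exp (- θ \sum T_{i})$, so the posterior is Gamma($a + \sum δ_{i}$, $b + \sum T_{i}$). All function names and the toy data are hypothetical.

```python
import math
import random

def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution at x > 0."""
    log_pdf = (shape * math.log(rate) + (shape - 1) * math.log(x)
               - rate * x - math.lgamma(shape))
    return math.exp(log_pdf)

def relative_belief(theta, data, a=1.0, b=1.0):
    """Posterior density over prior density at theta, as in (7)."""
    d = sum(delta for _, delta in data)   # number of exact observations
    total = sum(t for t, _ in data)       # total time under observation
    return gamma_pdf(theta, a + d, b + total) / gamma_pdf(theta, a, b)

def strength(theta0, data, a=1.0, b=1.0, draws=5000, seed=0):
    """Monte Carlo estimate of (8): posterior probability that RB(theta) <= RB(theta0)."""
    rng = random.Random(seed)
    d = sum(delta for _, delta in data)
    total = sum(t for t, _ in data)
    rb0 = relative_belief(theta0, data, a, b)
    hits = 0
    for _ in range(draws):
        theta = rng.gammavariate(a + d, 1.0 / (b + total))  # second argument is a scale
        if relative_belief(theta, data, a, b) <= rb0:
            hits += 1
    return hits / draws

# Hypothetical toy data: pairs (T_i, delta_i).
toy = [(0.5, 1), (1.2, 1), (0.3, 1), (2.0, 0), (0.8, 1), (1.5, 0)]
rb_near, str_near = relative_belief(0.6, toy), strength(0.6, toy)
rb_far, str_far = relative_belief(5.0, toy), strength(5.0, toy)
```

A hypothesized rate close to the maximum likelihood estimate gives a ratio above 1 together with a large strength, whereas a rate far in the tail gives a ratio below 1 together with a strength near 0.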

2.3. Cramér–Von Mises Distance

Let $F$ and $G$ be two cdfs. The Cramér–von Mises distance between $F$ and $G$ is defined as

$d (F, G) = \int_{- \infty}^{\infty} {(F (x) - G (x))}^{2} G (d x) .$

Other distances can be used (see Gibbs and Su (2002)), but $d$ has some computational advantages. The following result gives a formula for the distance between an approximate beta-Stacy process and a continuous cdf.

Lemma 1.

Let $F_{0}$ be a continuous cdf and $F_{ϵ} (t) = 1 - \exp (- Z_{ϵ} (t))$, where $Z_{ϵ} (t) = \sum_{i = 1}^{M} J_{i} δ_{θ_{i}} ([0, t])$, $t \geq 0$, with $θ_{1}, \dots, θ_{M} \overset{i.i.d.}{\sim} F_{0}$. Let $θ_{(1)} \leq \dots \leq θ_{(M)}$ denote the order statistics of $θ_{1}, \dots, θ_{M}$, and let $J_{1}^{'}, \dots, J_{M}^{'}$ be the associated jump sizes, such that $J_{i} = J_{j}^{'}$ when $θ_{i} = θ_{(j)}$. Then

$\begin{aligned} d (F_{ϵ}, F_{0}) = {} & \frac{1}{3} - \sum_{i = 0}^{M} F_{ϵ} (θ_{(i)}) \left[{(F_{0} (θ_{(i + 1)}))}^{2} - {(F_{0} (θ_{(i)}))}^{2}\right] \\ & + \sum_{i = 0}^{M} {(F_{ϵ} (θ_{(i)}))}^{2} \left[F_{0} (θ_{(i + 1)}) - F_{0} (θ_{(i)})\right], \end{aligned}$

where $F_{ϵ} (θ_{(i)}) = 1 - \exp (- \sum_{k = 1}^{i} J_{k}^{'})$.

Proof.

Let $θ_{(0)} = 0$ and $θ_{(M + 1)} = + \infty$. Note that

$F_{ϵ} (x) = \begin{cases} 0 & \text{if } x < θ_{(1)}, \\ F_{ϵ} (θ_{(i)}) & \text{if } θ_{(i)} \leq x < θ_{(i + 1)} \ (i = 1, \dots, M) . \end{cases}$

Then, with $f_{0}$ denoting the density of $F_{0}$,

$\begin{aligned} d (F_{ϵ}, F_{0}) & = \int_{θ_{(0)}}^{θ_{(M + 1)}} {[F_{ϵ} (x) - F_{0} (x)]}^{2} f_{0} (x) d x \\ & = \sum_{i = 0}^{M} \int_{θ_{(i)}}^{θ_{(i + 1)}} {[F_{ϵ} (θ_{(i)}) - F_{0} (x)]}^{2} f_{0} (x) d x . \end{aligned}$

Substituting $y = F_{0} (x)$ and $U_{(i)} = F_{0} (θ_{(i)})$, so that $U_{(0)} = 0$ and $U_{(M + 1)} = 1$, gives

$\begin{aligned} d (F_{ϵ}, F_{0}) & = \sum_{i = 0}^{M} \int_{U_{(i)}}^{U_{(i + 1)}} {[F_{ϵ} (θ_{(i)}) - y]}^{2} d y \\ & = \frac{1}{3} \sum_{i = 0}^{M} \left\{{[F_{ϵ} (θ_{(i)}) - U_{(i)}]}^{3} - {[F_{ϵ} (θ_{(i)}) - U_{(i + 1)}]}^{3}\right\} \\ & = \frac{1}{3} \sum_{i = 0}^{M} \left[U_{(i + 1)}^{3} - U_{(i)}^{3}\right] - \sum_{i = 0}^{M} F_{ϵ} (θ_{(i)}) \left[U_{(i + 1)}^{2} - U_{(i)}^{2}\right] + \sum_{i = 0}^{M} {(F_{ϵ} (θ_{(i)}))}^{2} \left[U_{(i + 1)} - U_{(i)}\right] \\ & = \frac{1}{3} - \sum_{i = 0}^{M} F_{ϵ} (θ_{(i)}) \left[U_{(i + 1)}^{2} - U_{(i)}^{2}\right] + \sum_{i = 0}^{M} {(F_{ϵ} (θ_{(i)}))}^{2} \left[U_{(i + 1)} - U_{(i)}\right] \\ & = \frac{1}{3} - \sum_{i = 0}^{M} F_{ϵ} (θ_{(i)}) \left[{(F_{0} (θ_{(i + 1)}))}^{2} - {(F_{0} (θ_{(i)}))}^{2}\right] + \sum_{i = 0}^{M} {(F_{ϵ} (θ_{(i)}))}^{2} \left[F_{0} (θ_{(i + 1)}) - F_{0} (θ_{(i)})\right] . \end{aligned}$

□
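The closed form in Lemma 1 is straightforward to evaluate numerically. The following Python sketch is one possible implementation; the function name `cvm_distance`, its argument layout, and the Exp(1) example are assumptions for illustration. It takes the jump locations on the $F_{0}$ scale, $U_{(i)} = F_{0} (θ_{(i)})$, together with the step heights $F_{ϵ} (θ_{(i)})$.

```python
import math

def cvm_distance(u_sorted, f_eps):
    """Cramér-von Mises distance between a step cdf F_eps and a continuous cdf F0.

    u_sorted: F0(theta_(1)) <= ... <= F0(theta_(M)), jump locations on the F0 scale.
    f_eps:    F_eps(theta_(1)), ..., F_eps(theta_(M)), the step heights.
    """
    u = [0.0] + list(u_sorted) + [1.0]   # U_(0) = 0 and U_(M+1) = 1
    f = [0.0] + list(f_eps)              # F_eps(theta_(0)) = 0 (empty sum of jumps)
    total = 1.0 / 3.0
    for i in range(len(f)):
        total -= f[i] * (u[i + 1] ** 2 - u[i] ** 2)
        total += f[i] ** 2 * (u[i + 1] - u[i])
    return total

# Hypothetical example: F0 is Exp(1), with jumps at theta = 0.5 and theta = 1.5.
thetas = [0.5, 1.5]
jumps = [0.4, 0.9]                       # illustrative ordered jump sizes
u_sorted = [1.0 - math.exp(-t) for t in thetas]
f_eps, z = [], 0.0
for j in jumps:
    z += j                               # cumulative log-beta process
    f_eps.append(1.0 - math.exp(-z))     # F_eps = 1 - exp(-Z_eps)
distance = cvm_distance(u_sorted, f_eps)
```

Working on the $U_{(i)}$ scale means the same function can be reused for any continuous $F_{0}$ once the jump locations have been transformed.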

When considering the prior and posterior distributions of the Cramér–von Mises distance, the following lemma justifies the use of the approximation to $BSP (k (\cdot), F_{0})$.

Lemma 2.

If $F \sim BSP (k (\cdot), F_{0})$ and $F_{ϵ}$ is given by Algorithm A, then $d (F_{ϵ}, F_{0}) \overset{a.s.}{\to} d (F, F_{0})$ as $ϵ \to 0$.

Proof.

By Kim and Lee (2001), $F_{ϵ} (x) \overset{a.s.}{\to} F (x)$, where a.s. stands for almost sure convergence. As ${(F_{ϵ} (x) - F_{0} (x))}^{2} f_{0} (x) \leq f_{0} (x)$, where $f_{0} (x)$ is integrable, the result follows from the dominated convergence theorem. □

3. Model Checking Using the Relative Belief

Consider the statistical model $\{F_{θ} : θ \in Θ\}$ of continuous cdfs on $\mathbb{R}^{+}$. Let $X_{1}, \dots, X_{n}$ be a random sample drawn from an unknown cdf $F$ defined on $\mathbb{R}^{+}$. Assume that $(T, δ) = ((T_{1}, δ_{1}), \dots, (T_{n}, δ_{n}))$ is observed, where $T_{i} = \min (X_{i}, C_{i})$, $δ_{i} = I (X_{i} \leq C_{i})$, and $C_{1}, \dots, C_{n}$ are censoring times. The aim is to test the hypothesis $H_{0} : F \in \{F_{θ} : θ \in Θ\}$. Let $BSP (k (\cdot), F_{0})$ be the prior on $F$ for some choice of $k (\cdot)$ and $F_{0}$. Then $F (t) | (T, δ) \sim BSP (k^{*} (t), F_{0}^{*} (t))$, where $k^{*} (\cdot)$ and $F_{0}^{*}$ are defined in (5) and (6), respectively. If $H_{0}$ is true, then the posterior distribution of the distance between $F$ and the proposed model should be more concentrated around 0 than the prior distribution. As a result, the test compares the concentrations of the prior and posterior distributions of $d$ (see Lemma 1) about 0 using the relative belief ratio, with the interpretation discussed in Section 2.2.

To perform this test, we must measure the distance and then set relevant values for $k (\cdot)$ and $F_{0}$. To calculate the distance, similar to Al-Labadi and Evans [24], we compute $d (F, F_{\hat{θ}})$, where $\hat{θ}$ is the relative belief estimate of $θ$, which is always the same as the maximum likelihood estimate (MLE) for the full model parameter. In terms of hyperparameters, we set $F_{0} = F_{\hat{θ}}$ and $k (t) = k$ for all $t$. There are numerous advantages in setting $F_{0} = F_{\hat{θ}}$. First, it avoids prior-data conflict, a possible contradiction between the data and the prior. This typically happens when the prior places its mass in a region of the parameter space where the data suggest the true value does not lie ([25,26]). In the context of the approach considered in this paper, prior-data conflict arises whenever there is only a small overlap between the effective support regions of $F$ and $F_{\hat{θ}}$. Note that, by Lemma 1, the distance $d (F, F_{\hat{θ}})$ depends on the prior guess $F_{0}$ through the jump points $θ_{i}$. If the $θ_{i}$ lie in one tail of $F_{\hat{θ}}$, then there is prior-data conflict between $F$ and $F_{\hat{θ}}$ because $F_{0}$ and $F$ have the same effective support. To avoid this, the $θ_{i}$ must be selected in a region that includes most of the mass of $F_{\hat{θ}}$. When $F_{0} = F_{\hat{θ}}$, $F_{\hat{θ}}$ is the prior mean of $F$; thus, both share the same effective support, which makes this a reasonable choice for avoiding prior-data conflict. We refer the reader to Example 1 of Al-Labadi and Evans (2018) for an interesting discussion about prior-data conflict. Nevertheless, the choice $F_{0} = F_{\hat{θ}}$ should also avoid any impact due to the “double use of the data”, which would make the approach conservative in detecting model failure when $H_{0}$ is false. Although setting $F_{0} = F_{\hat{θ}}$ appears to induce a data-dependent prior distribution for $d$, the following lemma implies that this is not the case; thus, with this choice, the prior distribution of $d$ does not depend on the data.

Lemma 3.

If $F \sim BSP (k (\cdot), F_{\hat{θ}})$, then the distribution of $d (F, F_{\hat{θ}})$ does not depend on $F_{\hat{θ}}$.

Proof.

Using the notation of Lemma 1, since ${(θ_{i})}_{1 \leq i \leq M}$ is a sequence of i.i.d. random variables with continuous distribution $F_{\hat{θ}}$, we have $F_{\hat{θ}} (θ_{i}) \overset{d}{=} U_{i}$ for $i \geq 1$, where ${(U_{i})}_{1 \leq i \leq M}$ is a sequence of i.i.d. random variables following a uniform distribution on $[0, 1]$. Thus,

$\begin{aligned} d (F_{ϵ}, F_{\hat{θ}}) \overset{d}{=} {} & \frac{1}{3} - \sum_{i = 0}^{M} F_{ϵ} (θ_{(i)}) \left[U_{(i + 1)}^{2} - U_{(i)}^{2}\right] \\ & + \sum_{i = 0}^{M} {(F_{ϵ} (θ_{(i)}))}^{2} \left[U_{(i + 1)} - U_{(i)}\right], \end{aligned}$

where $U_{(i)}$ is the $i$-th order statistic of ${(U_{i})}_{1 \leq i \leq M}$, with $U_{(0)} = 0$ and $U_{(M + 1)} = 1$. Now, letting $ϵ \to 0$, Lemma 2 implies that the distribution of $d (F, F_{\hat{θ}})$ does not depend on $F_{\hat{θ}}$. □

The following result shows that setting $F_{0} = F_{\hat{θ}}$ prevents any effect due to the double use of the data. Specifically, as the sample size increases, the posterior distribution of $d (F, F_{\hat{θ}})$ becomes concentrated around 0 if and only if $H_{0}$ is true. For the proof, see Al-Labadi and Evans (2018).

Lemma 4.

Let $(T, δ) = ((T_{1}, δ_{1}), \dots, (T_{n}, δ_{n})) \sim F$, where $F \sim BSP (k (\cdot), F_{\hat{θ}})$. Suppose that $\hat{θ} \overset{a.s.}{\to} θ_{0}$ and $\sup_{y} | F_{\hat{θ}} (y) - F_{θ_{0}} (y) | \overset{a.s.}{\to} 0$ as $n \to \infty$.

(i): If $H_{0}$ is true, then, as $n \to \infty$, $d (F | (T, δ), F_{\hat{θ}}) \overset{a.s.}{\to} 0$.
(ii): If $H_{0}$ is false, then, as $n \to \infty$, $\liminf d (F | (T, δ), F_{\hat{θ}}) \overset{a.s.}{>} 0$.

Now, concerning the choice of $k (\cdot)$, we consider $k (t) = k$ for all $t > 0$. In general, larger values of $k$ must be chosen to identify smaller deviations. Consequently, it is possible to consider multiple values of $k$. One way to do this is to start with $k = 1$ and then vary $k$: if a larger (smaller) value of $k$ makes the relative belief ratio fall below (rise above) 1, $H_{0}$ is rejected (accepted). In the examples of Section 5, when the null hypothesis is correct (not correct), the relative belief ratio remains above (below) 1 when larger (smaller) values of $k$ are considered. When using the Dirichlet process, Al-Labadi and Zarepour [27] advised using $k \leq 0.5 n$ for complete data to avoid the prior becoming too influential. Setting $k$ between 1 and 10 is satisfactory for most purposes. As indicated in Section 2.1, when $k (t) = k$, the beta-Stacy process reduces to the Dirichlet process. However, in the presence of right-censored data, the posterior distribution of the Dirichlet process becomes a beta-Stacy process ([19]). This justifies the necessity of using the beta-Stacy process in the approach.

4. Computational Algorithm

Closed forms of the prior and posterior densities of $D = d (F, F_{\hat{θ}})$ are required to compute the relative belief ratio as in (7). These are not usually available. As a result, the relative belief ratio must be approximated through simulation. A particular problem in computing (7) arises when both $π_{D} (0)$ and $π_{D | (T, δ)} (0)$ are close to 0, where $π_{D} (\cdot)$ and $π_{D | (T, δ)} (\cdot)$ denote the pdfs of $D$ and $D | (T, δ)$, respectively. In such a case, determining $RB_{D} (0 | (T, δ))$ is difficult. The formal definition of the relative belief ratio, as discussed in Section 2.2, is as a limit, which can be approximated at zero by the ratio of posterior to prior probability

$\widehat{RB}_{D} (0 | (T, δ)) = \frac{Π_{D | (T, δ)} ([0, d_{c}))}{Π_{D} ([0, d_{c}))},$

(9)

for a suitably small value $d_{c}$, where $Π_{D} (\cdot)$ and $Π_{D | (T, δ)} (\cdot)$ denote the prior and posterior distributions of $D$, respectively. From Equation (8), the strength of the evidence based on the relative belief ratio $RB_{D} (0 | (T, δ))$ can be computed using

$Str_{D} (0 | (T, δ)) = Π_{D | (T, δ)} (RB_{D} (d | (T, δ)) \leq RB_{D} (0 | (T, δ))) .$

(10)

Appendix B provides computational Algorithm C for assessing $H_{0}$ on the basis of estimates of (9) and (10). A similar algorithm for complete data based on the Dirichlet process was developed by Al-Labadi and Evans [24].
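A minimal Python sketch of the Monte Carlo estimate of (9) follows. It assumes that samples of $D$ from the prior and from the posterior are already available, and it takes $d_{c}$ to be the $i_{0} / M_{0}$ prior quantile, which is an assumption motivated by the settings $i_{0} = 1$ and $M_{0} = 20$ used in Section 5. The function names and toy samples are hypothetical, and a closed interval $[0, d_{c}]$ is used so that the empirical prior mass is never zero.

```python
import math

def prior_quantile(sample, q):
    """Empirical q-th quantile: smallest value with at least a fraction q at or below it."""
    s = sorted(sample)
    idx = max(0, min(len(s) - 1, math.ceil(q * len(s)) - 1))
    return s[idx]

def mass_below(sample, d_c):
    """Empirical probability of the interval [0, d_c]."""
    return sum(1 for d in sample if d <= d_c) / len(sample)

def rb_at_zero(prior_d, post_d, i0=1, m0=20):
    """Estimate of (9): posterior mass near 0 divided by prior mass near 0."""
    d_c = prior_quantile(prior_d, i0 / m0)
    return mass_below(post_d, d_c) / mass_below(prior_d, d_c)

# Hypothetical toy samples of distances (not output of Algorithms A and B).
prior_d = [i / 1000 for i in range(1, 1001)]
post_close = [i / 10000 for i in range(1, 1001)]    # posterior concentrated near 0
post_far = [1.0 + i / 1000 for i in range(1, 1001)] # posterior far from 0
rb_close = rb_at_zero(prior_d, post_close)
rb_far = rb_at_zero(prior_d, post_far)
```

A posterior sample that piles up near zero yields a ratio above 1, while a posterior sample located away from zero yields a ratio below 1.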

5. Examples

The approach is demonstrated by two main examples in this section. Throughout this section, let Exp$(λ)$, Weibull$(k, λ)$, and Lognormal$(μ, σ)$ denote the exponential distribution with mean $1 / λ$, the Weibull distribution with shape parameter $k$ and scale parameter $λ$, and the log-normal distribution with mean $μ$ and standard deviation $σ$, respectively. In all examples, the sensitivity to the choice of the concentration parameter $k$ is investigated. We set $ϵ = 0.01$, $i_{0} = 1$, $M_{0} = 20$, and $r_{1} = r_{2} = 1000$ in Algorithms A, B, and C, though other values are also possible. The R package parmsurvfit was used to compute the MLE of the distribution parameters.

Example 1.

We consider a real dataset from Lee and Wang (2003) consisting of the remission times (in months) of cancer patients. The dataset is given in Appendix C. We tested the hypothesis

$H_{0}$: $F$, given in Table 1, is the underlying distribution of the observed data,

where $F$ could be a family of distributions (composite hypothesis) or a specific distribution with known parameters (simple hypothesis). Various values of $k$ were considered to investigate the approach’s sensitivity to the choice of the concentration parameter. The p-value of the (frequentist) log-rank test was also computed for comparison purposes. Table 1 summarizes the findings. When $H_{0}$ is true, we want $RB > 1$ and a strength close to 1, and when $H_{0}$ is false, we want $RB < 1$ and a strength close to 0. According to Table 1, the proposed test performed well in this example.

Example 2.

Simulated data. The primary purpose of this example is to investigate how the proposed test performs as the sample size increases. We considered data $(T_{1}, δ_{1}), \dots, (T_{n}, δ_{n})$, where $T_{i} = \min (X_{i}, C_{i})$, the survival times ${(X_{i})}_{1 \leq i \leq n}$ were generated from Lognormal$(1, 4)$, and the censoring times ${(C_{i})}_{1 \leq i \leq n}$ were generated from Lognormal$(4, 1)$. We tested the hypothesis

$H_{0}$: $F$, given in Table 2, is the underlying distribution of the observed data.

Table 2 summarizes the results, which show that the selected models were accepted. Figure A1, Figure A2 and Figure A3 (see Appendix D) give the plots of $F_{0} = F_{\hat{θ}}$ and five sample paths each for the prior beta-Stacy process and the posterior beta-Stacy process for each case in Table 2. These figures clearly show that the sample paths of the posterior process moved toward the plot of $F_{0}$. This supports the previous conclusion regarding the null hypothesis. Furthermore, in this case, the p-values of the (frequentist) log-rank test support the conclusion that the null hypothesis should not be rejected.

6. Concluding Remarks

The beta-Stacy process and relative belief ratio were used to propose a general approach for model checking. This method could be used for both complete and right-censored data. It could also be used to test composite or simple hypotheses. Several examples demonstrated that the approach works very well.

Though the Cramér–von Mises distance was used here, other distance measures such as the Kolmogorov–Smirnov and Anderson–Darling distances are viable alternatives. Testing for families of multivariate distributions is an important extension of the approach presented in this paper. While conceptually similar, computational and inferential issues must be addressed. This problem can be addressed in future work.

Author Contributions

Formal analysis, A.A.; Investigation, L.A.-L. and M.A.; Methodology, L.A.-L. and A.A.; Project administration, L.A.-L.; Software, L.A.-L. and M.A.; Writing – review & editing, A.A. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge that the work in this paper was partly supported by Faculty Research grant FRG21-S-S03 and the Open Access Program from the American University of Sharjah.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Simulation Algorithms

Algorithm A: Simulation algorithm to approximate the prior beta-Stacy process.

Generate a log-beta process with parameters $α (d z) = k (z) F_{0} (d z)$ and $β (z) = k (z) F_{0} [z, \infty)$. The following steps are required to accomplish this:
(a) Fix a small positive number $ϵ$.
(b) Generate the total number of jumps $M \sim Poisson (λ_{ϵ})$, where $λ_{ϵ} = α ([0, \infty)) / ϵ$.
(c) For $i = 1, \dots, M$, generate the jump times $θ_{i}$ from the probability distribution $α (d z) / α ([0, \infty))$.
(d) Generate the jump sizes $J_{1}, \dots, J_{M}$ such that $1 - \exp (- J_{i}) | θ_{i} \sim Beta (ϵ, β (θ_{i}))$.
(e) Set

$Z_{ϵ} (t) = \sum_{k = 1}^{M} J_{k} I (θ_{k} \leq t) .$

(A1)
The approximate prior beta-Stacy process is $F_{ϵ} (t) = 1 - \exp (- Z_{ϵ} (t))$.
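The steps of Algorithm A can be sketched in Python for the special case $k (t) = k$, in which $α ([0, \infty)) = k$ and the jump times are i.i.d. draws from $F_{0}$. This is a minimal illustration rather than the authors' implementation: the function names, the Exp(1) prior guess, and the simple Poisson sampler are assumptions. The path is accumulated through $1 - F_{ϵ} (t) = \prod_{θ_{i} \leq t} \exp (- J_{i})$, which avoids taking logarithms of the beta draws.

```python
import math
import random

def poisson(rng, lam):
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, clock = 0, rng.expovariate(1.0)
    while clock <= lam:
        count += 1
        clock += rng.expovariate(1.0)
    return count

def sample_prior_bsp(k, f0_inverse, f0_survival, eps=0.01, rng=None):
    """Return [(theta_(i), F_eps(theta_(i)))] for one approximate prior path."""
    rng = rng or random.Random()
    m = poisson(rng, k / eps)                                   # step (b)
    thetas = sorted(f0_inverse(rng.random()) for _ in range(m)) # step (c)
    surv, path = 1.0, []
    for theta in thetas:
        # beta(theta) = k * F0([theta, inf)); the tiny floor guards against underflow.
        b = max(k * f0_survival(theta), 1e-300)
        w = rng.betavariate(eps, b)      # step (d): w = 1 - exp(-J_i)
        surv *= 1.0 - w                  # step (e): exp(-Z_eps) accumulates as a product
        path.append((theta, 1.0 - surv))
    return path

# Hypothetical prior guess: F0 = Exp(1), concentration k = 1.
path = sample_prior_bsp(
    k=1.0,
    f0_inverse=lambda u: -math.log(1.0 - u),
    f0_survival=lambda t: math.exp(-t),
    rng=random.Random(1),
)
```

The returned pairs can be passed, after transforming the locations by $F_{0}$, to any routine that evaluates the distance formula of Lemma 1.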

Algorithm B: Simulation algorithm to approximate the posterior beta-Stacy process.

Generate the posterior log-beta process. The following steps are required to accomplish this:
(a) Generate the process ${\{Z_{ϵ}^{*} (t)\}}_{t \geq 0}$ on the basis of Algorithm A, with $β (θ_{i})$ replaced by $β (θ_{i}) + Y (θ_{i})$.
(b) Generate the process $Z_{f}^{*} (t) = \sum_{i = 1}^{l} S_{i}^{*} I (t_{i} \leq t)$, where $t_{1}, \dots, t_{l}$, $l \leq n$, are the fixed points of discontinuity and the random jumps ${(S_{i}^{*})}_{1 \leq i \leq l}$ are drawn from the distribution given in (4). These random jumps occur at points where the data are not right censored.
(c) The approximate posterior log-beta process is given by

$\begin{aligned} Z_{ϵ} (t) | (T, δ) & = Z_{ϵ}^{*} (t) + Z_{f}^{*} (t) \\ & = \sum_{i = 1}^{M} J_{i}^{*} I (θ_{i} \leq t) + \sum_{i = 1}^{l} S_{i}^{*} I (t_{i} \leq t) . \end{aligned}$
The approximate posterior beta-Stacy process is $F_{ϵ}^{*} (t) = F_{ϵ} (t) | (T, δ) = 1 - \exp (- (Z_{ϵ}^{*} (t) + Z_{f}^{*} (t)))$.

Appendix B. Algorithm for Model Checking

Algorithm C (Relative belief algorithm for model checking):

(i) Generate a sample from $F_{ϵ}$, where $F_{ϵ}$ is an approximation of $F \sim \mathrm{BSP}(k(\cdot), F_{\hat{θ}})$. See Algorithm A.

(ii) Compute $d(\mathrm{pri}) = d(F_{ϵ}, F_{\hat{θ}})$ as described in Lemma 2.

(iii) Repeat Steps (i) and (ii) to obtain a sample of $r_{1}$ values from the prior of $D$.

(iv) Generate a sample from $F_{ϵ} \mid (T, δ)$, where $F_{ϵ} \mid (T, δ)$ is an approximation of $F \mid (T, δ)$. See Algorithm B.

(v) Compute $d(\mathrm{pos}) = d(F_{ϵ} \mid (T, δ), F_{\hat{θ}})$ as described in Lemma 2.

(vi) Repeat Steps (iv) and (v) to obtain a sample of $r_{2}$ values from the posterior of $D$.

(vii) For a fixed positive integer $M_{0}$, let ${\hat{F}}_{D}$ denote the empirical cdf of $D$ based on the prior sample in (iii) and, for $i = 0, \dots, M_{0}$, let ${\hat{d}}_{i/M_{0}}(\mathrm{pri})$ be the estimate of $d_{i/M_{0}}(\mathrm{pri})$, the $(i/M_{0})$-th prior quantile of $D$. Here ${\hat{d}}_{0}(\mathrm{pri}) = 0$, and ${\hat{d}}_{1}(\mathrm{pri})$ is the largest value of $d(\mathrm{pri})$. Let ${\hat{F}}_{D}(\cdot \mid (T, δ))$ denote the empirical cdf of $D$ based on the posterior sample of $d(\mathrm{pos})$ in (vi). For $d \in [{\hat{d}}_{i/M_{0}}(\mathrm{pri}), {\hat{d}}_{(i+1)/M_{0}}(\mathrm{pri}))$, estimate $RB_{D}(d \mid (T, δ))$ by the ratio of the estimates of the posterior and prior contents of $[{\hat{d}}_{i/M_{0}}(\mathrm{pri}), {\hat{d}}_{(i+1)/M_{0}}(\mathrm{pri}))$. Specifically,

$\begin{matrix} {\widehat{RB}}_{D}(d \mid (T, δ)) = M_{0} \left\{ {\hat{F}}_{D}({\hat{d}}_{(i+1)/M_{0}}(\mathrm{pri}) \mid (T, δ)) - {\hat{F}}_{D}({\hat{d}}_{i/M_{0}}(\mathrm{pri}) \mid (T, δ)) \right\}. \end{matrix}$

(A2)

Moreover, estimate (9) by ${\widehat{RB}}_{D}(0 \mid (T, δ)) = (M_{0}/i_{0})\, {\hat{F}}_{D}({\hat{d}}_{i_{0}/M_{0}}(\mathrm{pri}) \mid (T, δ))$, the ratio of the estimated posterior and prior contents of $[0, {\hat{d}}_{i_{0}/M_{0}}(\mathrm{pri}))$, where $i_{0}$ is chosen so that $i_{0}/M_{0}$ is not too small (typically $i_{0}/M_{0} \approx 0.05$).

(viii) Estimate (10) by the finite sum

$\begin{matrix} \sum_{i \in A} \left( {\hat{F}}_{D}({\hat{d}}_{(i+1)/M_{0}}(\mathrm{pri}) \mid (T, δ)) - {\hat{F}}_{D}({\hat{d}}_{i/M_{0}}(\mathrm{pri}) \mid (T, δ)) \right), \end{matrix}$

(A3)

where $A = \{ i \geq i_{0} : {\widehat{RB}}_{D}({\hat{d}}_{i/M_{0}}(\mathrm{pri}) \mid (T, δ)) \leq {\widehat{RB}}_{D}(0 \mid (T, δ)) \}$.

By Al-Labadi and Evans (2018), for fixed $M_{0}$, as $r_{1} \to \infty$ and $r_{2} \to \infty$, (A2) converges a.s. to $RB_{D}(d \mid (T, δ))$ and (A3) converges a.s. to $DP_{D}(RB_{D}(d \mid (T, δ)) \leq RB_{D}(0 \mid (T, δ)) \mid (T, δ))$.
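Steps (vii) and (viii) can be sketched in Python once the prior and posterior samples of distances from steps (iii) and (vi) are available. The sketch below is a hedged illustration and not the authors' implementation: the function name `relative_belief_check` is hypothetical, the interior prior quantiles are taken to be order statistics at rank $\lceil i r_{1} / M_{0} \rceil$ (an assumed convention), and the estimate at zero is computed as the posterior content of $[0, {\hat{d}}_{i_{0}/M_{0}}(\mathrm{pri}))$ divided by its prior content $i_{0}/M_{0}$.

```python
import bisect


def relative_belief_check(prior_d, posterior_d, m0=20, i0=1):
    """Estimate RB_D(0 | data) and its strength from sampled distances."""
    prior_sorted = sorted(prior_d)
    post_sorted = sorted(posterior_d)
    r1, r2 = len(prior_sorted), len(post_sorted)

    # Prior quantile estimates: d_hat_0 = 0 and d_hat_1 = largest prior draw.
    # Interior quantiles use the order statistic at rank ceil(i * r1 / m0) (ASSUMED rule).
    quantiles = [0.0]
    for i in range(1, m0):
        rank = -(-i * r1 // m0)  # integer ceiling of i * r1 / m0
        quantiles.append(prior_sorted[rank - 1])
    quantiles.append(prior_sorted[-1])

    # Number of posterior draws <= each prior quantile (r2 times the empirical cdf).
    counts = [bisect.bisect_right(post_sorted, q) for q in quantiles]

    # Posterior content of each interval; every interval has prior content 1/m0.
    contents = [(counts[i + 1] - counts[i]) / r2 for i in range(m0)]
    rb = [m0 * (counts[i + 1] - counts[i]) / r2 for i in range(m0)]  # (A2)

    # RB at zero: posterior content up to d_hat_{i0/m0} over its prior content i0/m0.
    rb_zero = m0 * counts[i0] / (i0 * r2)

    # (A3): posterior mass of intervals i >= i0 whose RB does not exceed RB at zero.
    strength = sum(contents[i] for i in range(i0, m0) if rb[i] <= rb_zero)
    return rb_zero, strength
```

With the defaults `m0=20` and `i0=1`, the ratio $i_{0}/M_{0}$ equals 0.05, the typical value mentioned in step (vii). A returned `rb_zero` above 1 indicates evidence in favor of the model, and a value below 1 indicates evidence against it.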

Appendix C. Example 1 Dataset

Table A1. Remission times (months) of 137 cancer patients of the dataset used in Example 1. + denotes a right-censored observation.

4.5	19.13	14.24	7.87	5.49	2.02	9.22	3.82	26.31	4.65+
2.62	0.90	21.73	0.87+	0.51	3.36	43.01	0.81	3.36	1.46
24.80+	10.86+	17.14	15.96	22.69	4.33	7.28	2.46	3.48	4.23
6.54	8.65	5.41	2.23	4.34	32.15	4.87	5.71	7.59	3.02
4.51	1.05	9.47	79.05	2.02	4.26	11.25	10.34	10.66	12.03
2.64	14.74	1.19	8.66	14.83	5.62	18.10	25.74	17.36	1.35
9.02	6.94	7.26	4.70+	3.70	3.64	3.57	11.64	6.25	25.82
3.88	3.02+	19.36+	20.28	46.12	5.17	0.20	36.66	10.06	4.94
5.06	16.62	12.07	6.97	0.08	1.40	2.75	7.32	1.26	6.76
8.60+	7.62	3.52	9.74	0.40	5.41	2.54	2.69	8.26	0.50
5.32	5.09	2.09	7.93	12.02	13.08	5.85	7.09	5.32	4.33+
2.83	8.37	14.77	8.53	11.98	1.76	4.40	34.26	2.07	17.12
12.63	7.66	4.18	13.29	23.63	3.25	7.63	2.87	3.31	2.26
2.69	11.79	5.34	6.93	10.75	13.11	7.39
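To use Table A1 with the algorithms above, each entry has to be split into an observed time and a censoring indicator. A minimal Python sketch is given below; the function name `parse_remission_times` is hypothetical, and the coding $δ = 1$ for an observed remission time and $δ = 0$ for a right-censored one is an assumed convention.

```python
def parse_remission_times(text):
    """Split whitespace-separated entries into (time, delta) pairs.

    A trailing '+' marks a right-censored observation (delta = 0, ASSUMED coding);
    all other entries are treated as observed (delta = 1).
    """
    records = []
    for token in text.split():
        censored = token.endswith("+")
        time = float(token.rstrip("+"))
        records.append((time, 0 if censored else 1))
    return records


# First row of Table A1, copied as tab-separated text.
first_row = "4.5\t19.13\t14.24\t7.87\t5.49\t2.02\t9.22\t3.82\t26.31\t4.65+"
records = parse_remission_times(first_row)
```

Passing the full table text to `parse_remission_times` should yield 137 pairs, matching the number of patients stated in the caption.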