2. Estimation When the Cure Status Is Partially Available
Let $Y$ be the time until the event of interest, $X$ a vector of covariates, and $F(t \mid x) = P(Y \le t \mid X = x)$ the distribution function of $Y$ conditional on $X = x$. In follow-up studies, the event of interest may not be observed because of, for example, the end of the study or loss to follow-up, which occurs at a censoring time $C^*$ with conditional distribution function $G(t \mid x)$. As a consequence, instead of $Y$, only the possibly censored survival time $T = \min(Y, C^*)$ and the event indicator $\delta = 1(Y \le C^*)$ can be observed. The random variables $Y$ and $C^*$ are assumed to be conditionally independent given $X = x$, a widely used assumption in survival analysis. We set $Y = \infty$ if the subject will never experience the event and so is cured. Let $\nu = 1(Y = \infty)$ be the indicator of being cured. Note that $\nu$ is only partially observed: the individual is known to be uncured ($\nu = 0$) when the event is observed ($\delta = 1$), but in the general situation $\nu$ is unknown when $\delta = 0$. When the cure status is partially known, some censored individuals are identified as cured, so $\nu = 1$ is observed for them.
To accommodate the cure status information, we include an additional random variable $\xi$, which indicates whether the cure status is known ($\xi = 1$) or not ($\xi = 0$). Furthermore, let the censoring distribution be an improper distribution function $G(t \mid x) = (1 - \pi(x))\, G_0(t \mid x)$. Thus, with probability $\pi(x)$, the censoring variable is $C^* = \infty$, and with probability $1 - \pi(x)$ it takes the value of a random variable $C$ with proper continuous distribution function $G_0(t \mid x)$. A cured individual is identified with probability $\pi(x)$. In this setup, the data actually observed are $\{(X_i, T_i, \delta_i, \xi_i, \xi_i \nu_i) : i = 1, \ldots, n\}$, where the observed time is $T = \min(Y, C^*)$, except for individuals identified as cured, for whom $T = C$. Hence, the observations can be classified into three groups: (a) the individual is observed to have experienced the event and, therefore, is known to be uncured, $(\delta_i = 1, \xi_i = 1, \xi_i \nu_i = 0)$; (b) the lifetime is censored and the cure status is unknown, $(\delta_i = 0, \xi_i = 0, \xi_i \nu_i = 0)$; and (c) the lifetime is censored and the individual is known to be cured, $(\delta_i = 0, \xi_i = 1, \xi_i \nu_i = 1)$. In standard cure models, where the cure status is unknown for all the censored observations, only groups (a) and (b) are considered.
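The sampling scheme above can be illustrated with a minimal simulation sketch. Everything distributional here is an assumption made for illustration only and does not come from the paper: a logistic incidence, exponential latency and censoring, a constant identification probability `pi`, and the hypothetical function names `simulate_partially_known_cure` and `classify`.

```python
import numpy as np

def simulate_partially_known_cure(n, pi=0.4, seed=0):
    """Draw (X, T, delta, xi, xi*nu) under illustrative, assumed distributions."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)

    # Assumed incidence: logistic probability of being uncured.
    p_uncured = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    nu = rng.random(n) >= p_uncured                      # nu = 1(Y = inf)
    y = np.where(nu, np.inf, rng.exponential(1.0, n))    # assumed latency

    # Improper censoring: C* = inf with probability pi, otherwise C* = C.
    c = rng.exponential(2.0, n)                          # assumed proper G_0
    c_star = np.where(rng.random(n) < pi, np.inf, c)

    identified = nu & np.isinf(c_star)                   # cured and known to be cured
    t = np.where(identified, c, np.minimum(y, c_star))   # T = C for identified cures
    delta = np.isfinite(y) & (y <= c_star)               # event observed
    xi = delta | identified                              # cure status known
    xi_nu = identified                                   # product xi * nu
    return {"x": x, "t": t, "delta": delta, "xi": xi, "xi_nu": xi_nu}

def classify(delta, xi, xi_nu):
    """Label each observation as group 'a', 'b', or 'c'."""
    groups = np.full(delta.shape, "b")    # censored, cure status unknown
    groups[delta] = "a"                   # event observed, known uncured
    groups[~delta & xi & xi_nu] = "c"     # censored, known cured
    return groups
```

Setting `pi=0.0` in this sketch leaves group (c) empty, which corresponds to the standard cure model setting with only groups (a) and (b).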
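The probability of cure is $1 - p(x) = P(Y = \infty \mid X = x)$, and the conditional survival function of the uncured individuals, also known as the latency, is $S_0(t \mid x) = P(Y > t \mid Y < \infty, X = x)$. The mixture cure model specifies the survival function $S(t \mid x) = P(Y > t \mid X = x)$ as follows:

$$S(t \mid x) = 1 - p(x) + p(x)\, S_0(t \mid x). \tag{1}$$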
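Assuming model (1) and the availability of a suitable estimator of $S(t \mid x)$, estimators of the cure probability and the latency can be derived from the following relationships:

$$1 - p(x) = \lim_{t \to \infty} S(t \mid x), \qquad S_0(t \mid x) = \frac{S(t \mid x) - (1 - p(x))}{p(x)}. \tag{2}$$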
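As a brief worked step, the two relationships follow directly from the mixture cure model $S(t \mid x) = 1 - p(x) + p(x) S_0(t \mid x)$, assuming that the latency $S_0(\cdot \mid x)$ is a proper survival function and that $p(x) > 0$:

```latex
% Assumptions: S_0(t | x) -> 0 as t -> infinity (proper latency), p(x) > 0.
\begin{align*}
\lim_{t \to \infty} S(t \mid x)
  &= 1 - p(x) + p(x) \lim_{t \to \infty} S_0(t \mid x)
   = 1 - p(x), \\
S(t \mid x) - \{1 - p(x)\}
  &= p(x)\, S_0(t \mid x)
  \;\Longrightarrow\;
  S_0(t \mid x) = \frac{S(t \mid x) - \{1 - p(x)\}}{p(x)}.
\end{align*}
```

For instance, with illustrative values $1 - p(x) = 0.3$ and $S(t \mid x) = 0.65$, the latency at $t$ is $(0.65 - 0.3)/0.7 = 0.5$.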
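Safari et al. [2] proposed the following generalized product-limit estimator of the conditional survival function $S(t \mid x)$ when the cure status is partially known:

$$\hat S^c_h(t \mid x) = \prod_{i=1}^{n} \left( 1 - \frac{\delta_{[i]}\, B_{h[i]}(x)\, 1(T_{(i)} \le t)}{\sum_{j=i}^{n} B_{h[j]}(x) + \sum_{j=1}^{i-1} B_{h[j]}(x)\, 1(\xi_{[j]} \nu_{[j]} = 1)} \right), \tag{3}$$

where $X_{[i]}$, $\delta_{[i]}$, $\xi_{[i]}$, and $\nu_{[i]}$ are the concomitants of the ordered observed times $T_{(1)} \le \cdots \le T_{(n)}$, and $B_{h[i]}(x)$ is the Nadaraya–Watson (NW) weight

$$B_{h[i]}(x) = \frac{K_h(x - X_{[i]})}{\sum_{j=1}^{n} K_h(x - X_{[j]})},$$

in which $K_h(\cdot) = K(\cdot / h)/h$ is a kernel function $K$ rescaled with bandwidth $h$. The corresponding estimator of the cure rate $1 - p(x)$ [3] is the following:

$$1 - \hat p^c_h(x) = \hat S^c_h(T^1_{\max} \mid x), \tag{4}$$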
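where $T^1_{\max} = \max_{i : \delta_i = 1} T_i$ is the largest uncensored observed time. In light of (3), (4), and the relationships in (2), a nonparametric estimator of the latency function is given by

$$\hat S^c_{0, b_1, b_2}(t \mid x) = \frac{\hat S^c_{b_1}(t \mid x) - (1 - \hat p^c_{b_2}(x))}{\hat p^c_{b_2}(x)}. \tag{5}$$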
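The optimal bandwidth for $\hat S^c_h(t \mid x)$ in (3) is not necessarily the optimal bandwidth for $1 - \hat p^c_h(x)$ in (4); therefore, the estimator in (5) is a more general estimator that uses two different bandwidths, $b_1$ and $b_2$, for estimating $S(t \mid x)$ and $1 - p(x)$, respectively. Note that if $b_1 = b_2 = b$, then the estimator in (5) reduces to the following estimator:

$$\hat S^c_{0, b}(t \mid x) = \frac{\hat S^c_b(t \mid x) - (1 - \hat p^c_b(x))}{\hat p^c_b(x)}. \tag{6}$$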
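The estimators in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation. It assumes a univariate covariate, an Epanechnikov kernel, and no ties among the observed times. The names `nw_weights`, `generalized_pl`, `cure_rate`, and `latency` are hypothetical, and `xi_nu` denotes the product of the known-status and cure indicators.

```python
import numpy as np

def nw_weights(x0, x, h):
    """Nadaraya-Watson weights at x0 (Epanechnikov kernel: an assumed choice)."""
    u = (x0 - x) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within bandwidth h of x0")
    return k / total  # the 1/h factor of K_h cancels in the ratio

def generalized_pl(t_grid, x0, x, t, delta, xi_nu, h):
    """Generalized product-limit estimate of S(t | x0) on t_grid."""
    order = np.argsort(t, kind="stable")
    t_s, d_s, c_s = t[order], delta[order], xi_nu[order]
    w = nw_weights(x0, x, h)[order]
    # Risk mass at the i-th ordered time: weights of observations j >= i,
    # plus weights of earlier observations already known to be cured.
    tail = np.cumsum(w[::-1])[::-1]
    known_cured_before = np.concatenate(([0.0], np.cumsum(w * c_s)[:-1]))
    denom = tail + known_cured_before
    factors = np.ones_like(w)
    jump = (d_s == 1) & (w > 0)           # only weighted events change the product
    factors[jump] = 1.0 - w[jump] / denom[jump]
    surv_at_obs = np.concatenate(([1.0], np.cumprod(factors)))
    # Number of ordered times <= each grid point selects the running product.
    return surv_at_obs[np.searchsorted(t_s, t_grid, side="right")]

def cure_rate(x0, x, t, delta, xi_nu, h):
    """Estimate of 1 - p(x0): the survival estimate at the largest event time."""
    t_max1 = t[delta == 1].max()
    return generalized_pl(np.array([t_max1]), x0, x, t, delta, xi_nu, h)[0]

def latency(t_grid, x0, x, t, delta, xi_nu, b1, b2=None):
    """Latency estimate with bandwidths b1 (survival) and b2 (cure rate)."""
    b2 = b1 if b2 is None else b2         # single-bandwidth special case
    s = generalized_pl(t_grid, x0, x, t, delta, xi_nu, b1)
    one_minus_p = cure_rate(x0, x, t, delta, xi_nu, b2)
    # Undefined when the estimated probability of being uncured is zero.
    return (s - one_minus_p) / (1.0 - one_minus_p)

if __name__ == "__main__":
    # Toy data with equal weights: the second observation is a known cure.
    x = np.zeros(5)
    t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    delta = np.array([1, 0, 1, 0, 0])
    xi_nu = np.array([0, 1, 0, 0, 0])
    print(generalized_pl(np.array([0.5, 1.0, 3.0]), 0.0, x, t, delta, xi_nu, h=1.0))
    print(latency(np.array([0.5, 1.0, 3.0]), 0.0, x, t, delta, xi_nu, b1=1.0))
```

In the toy data, the known cure at the second observed time stays in the risk set for the later event, which is what distinguishes this estimator from a product-limit estimator that ignores the cure status. Passing `b2` different from `b1` gives the two-bandwidth version; with distinct bandwidths the numerator is not guaranteed to be nonnegative at every `t`, so the sketch returns the raw ratio without truncation.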