Article

Proportional Hazard Model and Proportional Odds Model under Dependent Truncated Data

Department of Mathematics, National Chung Cheng University, Chia-Yi 621301, Taiwan
*
Author to whom correspondence should be addressed.
Axioms 2022, 11(10), 521; https://doi.org/10.3390/axioms11100521
Submission received: 25 August 2022 / Revised: 22 September 2022 / Accepted: 27 September 2022 / Published: 1 October 2022
(This article belongs to the Special Issue Applied Mathematics in Biology and Medicine)

Abstract

Truncated data arise when the event time of interest can be observed only if it satisfies a certain condition. Most existing approaches analyze such data by assuming that the truncation variable is quasi-independent of the event time of interest. In many situations, however, the quasi-independence assumption may not be suitable. In this article, the authors use copulas to relax the quasi-independence assumption, and the survival function of the event time of interest is estimated by a copula-graphic approach. The authors then propose two estimation procedures for the proportional hazard (PH) model and the proportional odds (PO) model, which can be applied to right-truncated data and to left-truncated and right-censored data. The performance of the proposed estimation approaches is assessed via simulation studies. Finally, the proposed methodologies are applied to two real datasets (the retirement center dataset and the transfusion-related AIDS dataset).

1. Introduction

In survival analysis, censoring and truncation are frequently encountered, and both complicate statistical inference. Under censoring, the survival time is not known exactly, but partial information about the censored subject is available. Under truncation, no information at all is available for a truncated subject. In this article, we consider two kinds of data: left-truncated and right-censored data, and right-truncated data. The pair of truncated variables $(X, Y)$ is observed only when $Y > X$. When $Y$ is the variable of interest, the data are called left-truncated data, and the observed data are $\{(X_i, T_i, \delta_i) : X_i < T_i,\ i = 1, \ldots, n\}$, where $T = \min(Y, C)$, $\delta = 1\{Y < C\}$, and $C$ is the right-censoring time. When $X$ is the variable of interest, the data are said to be right-truncated data, and the observed data are $\{(X_i, Y_i) : X_i < Y_i,\ i = 1, \ldots, n\}$. Previous studies assumed that $X$ and $Y$ are quasi-independent for survival function estimation [1,2,3], linear regression models [4,5], and hazard rate models [6,7,8]. However, the quasi-independence assumption may not be suitable in some situations. For instance, ref. [9] rejected it for the transfusion-related AIDS dataset using the conditional Kendall’s tau. Throughout this paper, we formulate a regression model on the event time of interest ($X$ or $Y$), which includes the PH model and the PO model, to describe the relationship between the event time and the covariates. For the regression problem under dependent truncated data, ref. [10] considered semiparametric inference for an accelerated failure time model, incorporating information on the truncation variable into the regression model to relax the independent truncation assumption. Ref. [11] suggested an expectation-maximization algorithm to relax the independent truncation assumption for the Cox regression model. Ref. [12] proposed conditional maximum likelihood estimators to relax the independent truncation assumption. This paper uses copulas to relax the independent truncation assumption between $X$ and $Y$. In addition, we apply the method of [13], which generalizes the copula-graphic estimator, to estimate the survival functions and Kendall’s tau. We then extend the methods of [14] (the area between two survival curves) and [15] (the minimization of the norm distance between two survival curves) to estimate the regression parameters of the PH model and the PO model under dependent truncated data.
The rest of this article is organized as follows. In Section 2, the dependent truncated data, the copula models, and the regression models (the PH model and the PO model) are introduced. In Section 3, using the method of [13], we estimate the survival functions and Kendall’s tau under dependent truncated data. We then build a regression model and propose two methods for estimating the regression parameters under dependent truncated data; these extend the methods of [14,15], which were originally developed for semi-competing risks data and dependent current status data, respectively. The bootstrap procedure is also described there. In Section 4, the performance of the proposed estimation procedures is examined via simulation studies under different sample sizes, values of Kendall’s tau, and copula models. In Section 5, we apply the estimation procedures of Section 3 to two real datasets [16,17]. Finally, we conclude in Section 6.

2. Data and Model Assumptions

2.1. Dependent Truncated Data

Dependent truncated data are a type of incomplete data in survival analysis. The pair of truncated variables $(X, Y)$ can be observed only when $Y > X$. Depending on whether the variable of interest is $Y$ or $X$, the data are termed left-truncated or right-truncated, respectively. In the example by [16], the retirement center dataset is a set of left-truncated and right-censored data, with $X$ the entry age, $Y$ the lifetime, and $C$ the censoring time. Assume that $C$ is independent of $(X, Y)$ and let $\delta = 1\{Y < C\}$. In this example, participants were included only if they lived long enough to enter the retirement center. Define $T = \min(Y, C)$; the observed data are $\{(X_i, T_i, \delta_i) : X_i < T_i,\ i = 1, \ldots, n\}$. In the other case, the dataset by [17] is a set of right-truncated data, with $X$ the induction time and $Y$ the time between the infection onset and the end of the study. In this example, participants were included only if they had developed AIDS within 102 months of the beginning of the study (i.e., they satisfied $X < Y$). The observed data are $\{(X_i, Y_i) : X_i < Y_i,\ i = 1, \ldots, n\}$.
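The sampling mechanism for left-truncated and right-censored data can be sketched in a few lines of Python. This is only an illustration of how a record $(X, T, \delta)$ is formed or lost: the function name `observe_ltrc` and the toy population are hypothetical, and $X$, $Y$, and $C$ are drawn independently here purely for convenience, which is not the dependent setting studied in this paper.

```python
import random

def observe_ltrc(x, y, c):
    """Return the observed record (x, t, delta), or None if the subject is truncated.

    t = min(y, c) is the follow-up time and delta = 1{y < c} is the event indicator.
    A subject with x >= t never enters the sample and leaves no record at all.
    """
    t = min(y, c)
    delta = 1 if y < c else 0
    return (x, t, delta) if x < t else None

# Toy population (assumption: independent draws, for illustration only).
rng = random.Random(0)
population = [
    (rng.expovariate(1.0), rng.expovariate(1.0), rng.uniform(0.0, 10.0))
    for _ in range(1000)
]
sample = [
    obs for obs in (observe_ltrc(x, y, c) for x, y, c in population)
    if obs is not None
]
```

Every retained record satisfies $X_i < T_i$, and the truncated subjects simply do not appear in `sample`, which is what distinguishes truncation from censoring.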

2.2. Semisurvival Copula

A semisurvival copula is used for truncated data [13]. It models the dependence between $X$ and $Y$ when the pairs satisfy $X < Y$. The joint distribution of $X$ and $Y$ is expressed as follows:
$$\pi(x, y) = \Pr(X \le x, Y > y \mid Y > X) = \frac{1}{c}\, C_\alpha\{F_X(x), S_Y(y)\},$$
where $F_X(x)$ is the cumulative distribution function of $X$, $S_Y(y)$ is the survival function of $Y$, $C_\alpha$ is a semisurvival copula function, and $c$ is a normalizing constant representing $\Pr(Y > X)$. The Archimedean copula model (AC model) [18] is a popular subclass of copula models with generating function $\phi_\alpha$, which can be expressed as follows:
$$C_\alpha\{u, v\} = \phi_\alpha^{-1}\big(\phi_\alpha(u) + \phi_\alpha(v)\big), \quad 0 < u, v < 1,$$
where $\phi_\alpha : (0, 1] \to [0, \infty)$ is a continuous, strictly decreasing, convex function with $\phi_\alpha(1) = 0$, $\phi_\alpha'(t) < 0$, and $\phi_\alpha''(t) > 0$. Furthermore, $\alpha$ is an association parameter related to Kendall’s tau. The AC model has a simple form and includes many well-known copula functions, such as the Clayton, Frank, Gumbel, and independence copulas.
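As a concrete instance of the Archimedean construction, the sketch below builds the Clayton copula from its generator and compares it with the closed form. It assumes the common textbook parametrization $\phi_\alpha(t) = (t^{-\alpha} - 1)/\alpha$ with $\alpha > 0$, for which Kendall’s tau is $\alpha/(\alpha + 2)$; the parametrization used in [13] may differ, and all function names are illustrative.

```python
def clayton_generator(t, alpha):
    """Clayton generator phi_alpha(t) = (t^(-alpha) - 1) / alpha, so phi_alpha(1) = 0."""
    return (t ** (-alpha) - 1.0) / alpha

def clayton_generator_inverse(s, alpha):
    """Inverse generator: phi_alpha^{-1}(s) = (1 + alpha * s)^(-1 / alpha)."""
    return (1.0 + alpha * s) ** (-1.0 / alpha)

def archimedean_copula(u, v, alpha):
    """C_alpha(u, v) = phi^{-1}(phi(u) + phi(v)), here with the Clayton generator."""
    total = clayton_generator(u, alpha) + clayton_generator(v, alpha)
    return clayton_generator_inverse(total, alpha)

def clayton_closed_form(u, v, alpha):
    """Closed form of the Clayton copula under the same parametrization."""
    return (u ** (-alpha) + v ** (-alpha) - 1.0) ** (-1.0 / alpha)

def clayton_kendalls_tau(alpha):
    """Kendall's tau of the Clayton copula under this parametrization."""
    return alpha / (alpha + 2.0)

value_from_generator = archimedean_copula(0.3, 0.7, alpha=2.0)
value_closed_form = clayton_closed_form(0.3, 0.7, alpha=2.0)
```

The two values agree up to floating-point error, which illustrates how a single generator determines the whole copula and, through $\alpha$, the strength of dependence.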

2.3. Regression Model

Let $T^*$ be the event time of interest, which could be $Y$ or $X$. We consider the regression model
$$h(T^*) = Z^T\theta + \varepsilon, \tag{3}$$
where $h(\cdot)$ is an unknown monotonically increasing function, $Z$ is a discrete covariate, $\theta$ is a parameter, both $Z$ and $\theta$ are $p \times 1$ vectors, and $\varepsilon$ is an error term whose distribution is known. Two scenarios arise when $h(\cdot)$ is unknown and the distribution of $\varepsilon$ is specified. If $\varepsilon$ follows the Gumbel extreme value distribution, model (3) is the Cox proportional hazard model. If $\varepsilon$ follows the standard logistic distribution, model (3) is the proportional odds model.
The Cox PH model, a regression model commonly employed in survival analysis, links the hazard function with the covariates. It consists of two parts: the baseline hazard function $h_0(y)$ and the exponential term $\exp(-Z^T\theta)$. The form of the Cox PH model is as follows:
$$h(y, Z) = h_0(y)\, e^{-Z^T\theta}.$$
Furthermore, the hazard ratio of $Z_1$ to $Z_2$ can be expressed as follows:
$$HR = \frac{h(y, Z_1)}{h(y, Z_2)} = \exp\{-(Z_1 - Z_2)^T\theta\}.$$
The PO model is designed for studying the effect of covariates on the odds. The form of the PO model is as follows:
$$\phi(y, Z) = \phi_0(y)\, e^{-Z^T\theta},$$
where $\phi(y, Z)$ is the failure odds of an individual at time $y$ with covariate $Z$, and $\phi_0(y)$ is the baseline odds. Furthermore, the odds ratio of $Z_1$ to $Z_2$ can be expressed as follows:
$$OR = \frac{\phi(y, Z_1)}{\phi(y, Z_2)} = \exp\{-(Z_1 - Z_2)^T\theta\}.$$
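The link between the transformation model and the PH and PO forms can be checked with a short derivation. The sketch below assumes the sign convention $h(T^*) = Z^T\theta + \varepsilon$ with $h$ increasing, and writes $S_\varepsilon$ for the survival function of $\varepsilon$; it is a worked check and not taken from the original text.

```latex
% Survival function implied by the transformation model:
S(y \mid Z) = \Pr\{h(T^*) > h(y) \mid Z\}
            = \Pr\{\varepsilon > h(y) - Z^T\theta\}
            = S_\varepsilon\{h(y) - Z^T\theta\}.

% Gumbel extreme value error, S_\varepsilon(y) = \exp\{-\exp(y)\}:
S(y \mid Z) = \exp\{-e^{h(y)} e^{-Z^T\theta}\}
\;\Longrightarrow\;
\Lambda(y \mid Z) = -\log S(y \mid Z) = e^{h(y)} e^{-Z^T\theta},
\qquad
h(y, Z) = \underbrace{\tfrac{d}{dy} e^{h(y)}}_{h_0(y)} \, e^{-Z^T\theta}.

% Standard logistic error, S_\varepsilon(y) = 1 / (1 + e^{y}):
S(y \mid Z) = \frac{1}{1 + e^{h(y)} e^{-Z^T\theta}}
\;\Longrightarrow\;
\phi(y, Z) = \frac{1 - S(y \mid Z)}{S(y \mid Z)}
           = \underbrace{e^{h(y)}}_{\phi_0(y)} \, e^{-Z^T\theta}.
```

Under this convention a larger $Z^T\theta$ lengthens the event time, so it lowers both the hazard and the failure odds, which is why the exponent carries a minus sign.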

3. The Proposed Estimation Procedures

The purpose of this article is to estimate the parameter $\theta$ in the regression model (3) under dependent truncated data; $\theta$ measures the covariate effect on the event time of interest. By [13,19], an estimator of the survival function of $Y$ (or $X$) can be obtained, denoted by $\hat S_Y(y)$ (or $\hat S_X(x)$). In the following two subsections, two methods are proposed to estimate the regression parameter $\theta$. The first, proposed by [14], was originally used for semi-competing risks data. The second, proposed by [15], was originally used for dependent current status data. Here, we extend both methods to left-truncated and right-censored data and to right-truncated data. When $Y$ is the variable of interest, we apply $\hat S_Y(y)$ in Method 1 and Method 2, as in the derivations below. When $X$ is the variable of interest, we apply $\hat S_X(x)$ instead.

3.1. Method 1: The Application of the Area between Two Survival Curves

The following test statistic, for $Z = 0, 1$, can be used to test the hypothesis $H_0 : S_{Y,0}(x) = S_{Y,1}(x)$, where $S_{Y,z}(x) = \Pr(Y > x \mid Z = z)$:
$$U_Y = \sqrt{\frac{n_0 n_1}{n}} \int W(x)\,\{\hat S_{Y,0}(x) - \hat S_{Y,1}(x)\}\,dx.$$
In the above test statistic, $W(x)$ is the weight function, $n_0$ is the sample size of group $Z = 0$, $n_1$ is the sample size of group $Z = 1$, and $n = n_0 + n_1$ is the total sample size. For the general case, define $\{z_k, k = 1, 2, \ldots, K\}$ as the set of all possible values of $Z$ and $\theta_0$ as the true value of $\theta$. If the regression model (3) is true, there exists a functional transformation $\xi_\theta(\cdot)$ such that $\xi_{(z_j - z_k)^T\theta_0}(S_{Y,z_k})(y) = S_{Y,z_j}(y)$. Define $g_{kj}(y, \theta) = \xi_{z_{kj}^T\theta}(S_{Y,z_k})(y) - S_{Y,z_j}(y)$; the estimator of $g_{kj}$ is then $\hat g_{kj}(y, \theta) = \xi_{z_{kj}^T\theta}(\hat S_{Y,z_k})(y) - \hat S_{Y,z_j}(y)$, where $z_{kj} = z_j - z_k$ and $\hat S_{Y,z}(y) = \widehat{\Pr}(Y > y \mid Z = z)$. Therefore, the estimating equation of $\theta$ becomes
$$U(\theta) = \sum_{k<j} w_0(z_{kj}^T\theta)\, z_{kj} \sqrt{\frac{n_k n_j}{n_k + n_j}} \int_0^{t_{kj}} W_{kj}(y)\, \hat g_{kj}(y, \theta)\,dy = 0,$$
where $t_{kj}$ is the largest value of $Y$ in the subsample with $Z = z_k$ and $Z = z_j$; $w_0(\cdot)$ is the weight function for the pair of groups $z_k$ and $z_j$; and $W_{kj}(\cdot)$ is the weight function over time $y$ for the two survival curves. Ref. [13] proved the consistency and asymptotic normality of $\hat S_{Y,z}(x)$. The consistency and asymptotic normality of $\hat\theta$ from Method 1 then follow from Theorem 1 of [14].
The discussion above covers a covariate taking multiple values $z_k$, $k = 1, \ldots, K$. The following examples take $Z = 0, 1$. When $h(y)$ is unknown and the distribution of $\varepsilon$ is specified, the general functional transformation is $\xi_\theta(S)(y) = S_\varepsilon[S_\varepsilon^{-1}\{S(y)\} - \theta]$, where $S_\varepsilon(y) = \Pr(\varepsilon > y)$ is the survival function of $\varepsilon$. The transformation under truncated data with $Z = 0, 1$ can then be described as follows:
$$S_{Y,0}(x) = \Pr(Y > x \mid Z = 0) = \Pr(h(Y) > h(x) \mid Z = 0) = \Pr(\varepsilon > h(x)) = S_\varepsilon(h(x)),$$
$$S_{Y,1}(x) = \Pr(Y > x \mid Z = 1) = \Pr(h(Y) > h(x) \mid Z = 1) = \Pr(\varepsilon > h(x) - \theta) = S_\varepsilon(h(x) - \theta).$$
Plugging $S_{Y,0}(x)$ into $\xi_{\theta_0}(\cdot)$ gives
$$\xi_{\theta_0}(S_{Y,0})(x) = S_\varepsilon[S_\varepsilon^{-1}\{S_\varepsilon(h(x))\} - \theta_0] = S_\varepsilon(h(x) - \theta_0) = S_{Y,1}(x),$$
where $\theta_0$ is the true value of $\theta$. The next two subsections discuss two distinct models in the setting where $Z = 0, 1$, $h(\cdot)$ is unknown, and the distribution of $\varepsilon$ is specified.
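The general transformation $\xi_\theta(S) = S_\varepsilon[S_\varepsilon^{-1}(S) - \theta]$ can be evaluated numerically and compared with the closed forms used in the next two subsections. The sketch below assumes $S_\varepsilon(y) = \exp\{-\exp(y)\}$ for the Gumbel case and $S_\varepsilon(y) = 1/(1 + e^{y})$ for the logistic case; the function names are illustrative only.

```python
import math

def gumbel_survival(y):
    """S_eps(y) = exp(-exp(y)) for the Gumbel extreme value error."""
    return math.exp(-math.exp(y))

def gumbel_survival_inverse(s):
    """Inverse of gumbel_survival on (0, 1)."""
    return math.log(-math.log(s))

def logistic_survival(y):
    """S_eps(y) = 1 / (1 + exp(y)) for the standard logistic error."""
    return 1.0 / (1.0 + math.exp(y))

def logistic_survival_inverse(s):
    """Inverse of logistic_survival on (0, 1)."""
    return math.log((1.0 - s) / s)

def xi(s, theta, surv, surv_inv):
    """General transformation xi_theta(S) = S_eps[S_eps^{-1}(S) - theta]."""
    return surv(surv_inv(s) - theta)

def xi_ph(s, theta):
    """Closed form under the PH model: S^{exp(-theta)}."""
    return s ** math.exp(-theta)

def xi_po(s, theta):
    """Closed form under the PO model: S / {e^{-theta} - S e^{-theta} + S}."""
    e = math.exp(-theta)
    return s / (e - s * e + s)

ph_gap = abs(xi(0.6, 0.3, gumbel_survival, gumbel_survival_inverse) - xi_ph(0.6, 0.3))
po_gap = abs(xi(0.6, 0.3, logistic_survival, logistic_survival_inverse) - xi_po(0.6, 0.3))
```

Both gaps are at the level of rounding error, so either the generic form or the model-specific closed form can be used when evaluating the estimating functions.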

3.1.1. Estimation under Cox Proportional Hazard Model

When $\varepsilon$ has the Gumbel extreme value distribution, the regression model (3) is the Cox PH model. Then $S_\varepsilon(y) = \exp\{-\exp(y)\}$ and $\xi_\theta(S)(y) = S(y)^{\exp(-\theta)}$. In this case,
$$S_{Y,1}(y) = S_{Y,0}(y)^{\exp(-\theta)}.$$
Hence, the equation for estimating $\theta$ becomes
$$\hat U(\theta) = \sqrt{\frac{n_0 n_1}{n}} \int_0^{t_{(n)}} W(t)\,\{\hat S_{Y,0}(t)^{\exp(-\theta)} - \hat S_{Y,1}(t)\}\,dt = 0.$$
Because the estimated survival curves are step functions, the integral reduces to a finite sum, and the estimating equation can be expressed as
$$\hat U(\theta) = \sqrt{\frac{n_0 n_1}{n}} \sum_{i=1}^{n-1} W(t_{(i)})\,(t_{(i+1)} - t_{(i)})\,\{\hat S_{Y,0}(t_{(i)})^{\exp(-\theta)} - \hat S_{Y,1}(t_{(i)})\} = 0,$$
where $t_{(i)}$ is the $i$th ordered survival jump time in the pooled sample of the two groups.
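The discretized estimating equation can be solved with any one-dimensional root finder. The following sketch uses plain bisection and assumes the two estimated survival curves have already been evaluated on a common grid of ordered jump times; the leading constant is dropped because it does not affect the root. The names `u_ph` and `solve_by_bisection`, the unit weight, and the synthetic curves are illustrative assumptions, not the authors' implementation.

```python
import math

def u_ph(theta, times, s0, s1, weight=lambda t: 1.0):
    """Discretized PH estimating function (constant factor omitted).

    times: ordered jump times t_(1) < ... < t_(n)
    s0, s1: estimated survival values of groups Z = 0 and Z = 1 at those times
    """
    total = 0.0
    for i in range(len(times) - 1):
        width = times[i + 1] - times[i]
        total += weight(times[i]) * width * (s0[i] ** math.exp(-theta) - s1[i])
    return total

def solve_by_bisection(func, lo=-10.0, hi=10.0, tol=1e-10):
    """Find a root of func on [lo, hi], assuming a sign change on the interval."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Synthetic check (assumption): curves that satisfy the PH relation exactly.
times = [0.1 * k for k in range(1, 31)]
s0 = [math.exp(-t) for t in times]
theta_true = 0.3
s1 = [s ** math.exp(-theta_true) for s in s0]
theta_hat = solve_by_bisection(lambda th: u_ph(th, times, s0, s1))
```

Bisection is adequate here because, for survival values in $(0, 1)$ and a nonnegative weight, the summand is increasing in $\theta$, so the estimating function crosses zero at most once.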

3.1.2. Estimation under the Proportional Odds Model

When $\varepsilon$ has the standard logistic distribution, the regression model (3) is the PO model. Then $S_\varepsilon(y) = 1/\{1 + e^{y}\}$ and $\xi_\theta(S)(y) = S(y)/\{\exp(-\theta) - S(y)\exp(-\theta) + S(y)\}$. In this case,
$$S_{Y,1}(y) = \frac{S_{Y,0}(y)}{\exp(-\theta) - S_{Y,0}(y)\exp(-\theta) + S_{Y,0}(y)}.$$
Hence, the equation for estimating $\theta$ becomes
$$\hat U(\theta) = \sqrt{\frac{n_0 n_1}{n}} \int_0^{t_{(n)}} W(t) \left\{ \frac{\hat S_{Y,0}(t)}{\exp(-\theta) - \hat S_{Y,0}(t)\exp(-\theta) + \hat S_{Y,0}(t)} - \hat S_{Y,1}(t) \right\} dt = 0.$$
As in the PH case, the integral reduces to a finite sum over the ordered jump times, and the estimating equation of $\theta$ is
$$\hat U(\theta) = \sqrt{\frac{n_0 n_1}{n}} \sum_{i=1}^{n-1} W(t_{(i)})\,(t_{(i+1)} - t_{(i)}) \left\{ \frac{\hat S_{Y,0}(t_{(i)})}{\exp(-\theta) - \hat S_{Y,0}(t_{(i)})\exp(-\theta) + \hat S_{Y,0}(t_{(i)})} - \hat S_{Y,1}(t_{(i)}) \right\} = 0.$$

3.2. Method 2: The Minimization of the Norm Distance between Two Survival Curves

Method 1 uses the area between two survival curves. In Method 2, as proposed by [15], we minimize the norm distance between the two curves. Define the estimator of $g_{kj}$ as $\hat g_{kj}(y, \theta) = \xi_{z_{kj}^T\theta}(\hat S_{Y,z_k})(y) - \hat S_{Y,z_j}(y)$; the norm distance between two survival curves can then be expressed as follows:
$$\begin{aligned}
\tilde U(\theta) &= \sum_{k<j} w_0(z_{kj}^T\theta)\, \|\hat g_{kj}(y, \theta)\| \\
&= \sum_{k<j} w_0(z_{kj}^T\theta)\, \|\xi_{z_{kj}^T\theta}(\hat S_{Y,z_k})(y) - \hat S_{Y,z_j}(y)\| \\
&= \sum_{k<j} w_0(z_{kj}^T\theta) \left[ \sum_{t_{(i)} \in A_{kj}} \left\{ \xi_{z_{kj}^T\theta}(\hat S_{Y,z_k})(t_{(i)}) - \hat S_{Y,z_j}(t_{(i)}) \right\}^2 \right]^{1/2},
\end{aligned}$$
where $w_0(\cdot)$ is the weight function and $A_{kj}$ is the set of survival jump times in the pooled sample with $Z = z_k$ and $Z = z_j$. We obtain the estimate of $\theta$ by minimizing $\tilde U(\theta)$. Following an argument similar to that in Appendix B of [14], together with the large-sample properties of $\hat S_{Y,z_k}(y)$ [13], the consistency of $\hat\theta$ can be obtained. Then, by a Taylor series expansion and the inference procedure in the proof of Theorem 1 of [14], the asymptotic normality of $\sqrt{n}(\hat\theta - \theta_0)$ can be proved.
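For two groups and $w_0(\cdot) = 1$, Method 2 amounts to a one-dimensional minimization. The sketch below does this for the PO model with a golden-section search, under the assumption that both survival curves are available at the pooled jump times. The search routine, the bounds $[-5, 5]$, and the synthetic curves are illustrative choices rather than part of the original method.

```python
import math

def xi_po(s, theta):
    """PO transformation: S / {e^{-theta} - S e^{-theta} + S}."""
    e = math.exp(-theta)
    return s / (e - s * e + s)

def norm_distance(theta, s0, s1):
    """Euclidean norm of xi_theta(S0) - S1 over the pooled jump times."""
    return math.sqrt(sum((xi_po(a, theta) - b) ** 2 for a, b in zip(s0, s1)))

def golden_section_minimize(func, lo=-5.0, hi=5.0, tol=1e-8):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    while hi - lo > tol:
        if func(c) < func(d):
            hi = d
        else:
            lo = c
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
    return 0.5 * (lo + hi)

# Synthetic check (assumption): curves that satisfy the PO relation exactly.
times = [0.1 * k for k in range(1, 31)]
s0 = [math.exp(-t) for t in times]
theta_true = 0.3
s1 = [xi_po(s, theta_true) for s in s0]
theta_hat = golden_section_minimize(lambda th: norm_distance(th, s0, s1))
```

With more than two covariate values, $\theta$ is a vector and a multivariate optimizer would be needed; the single-parameter case is shown only to make the objective concrete.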

3.3. Estimate Variance by the Bootstrap Approach

In this paper, we apply the bootstrap method to estimate the variance of the estimator of $\theta$. Under left-truncated and right-censored data, $(u_i, v_i)$ are generated from the copula model independently of $w_i$, and all three have Uniform(0,1) margins. From [13], $\hat S_{Y,z_k}(t)$, $\hat F_{Y,z_k}(t) = 1 - \hat S_{Y,z_k}(t)$, $\hat F_{X,z_k}(t)$, and $\hat S_{C,z_k}(t)$ for group $Z = z_k$ can be obtained. Therefore, we can obtain the data $\{(X_i^{**}, Y_i^{**}, C_i^{**}),\ i = 1, \ldots, n_{z_k}\}$ with $X_i^{**} < \min(Y_i^{**}, C_i^{**})$, where $X_i^{**} = \hat F_{X,z_k}^{-1}(u_i)$, $Y_i^{**} = \hat F_{Y,z_k}^{-1}(v_i)$, and $C_i^{**} = \hat F_{C,z_k}^{-1}(w_i)$, with $\hat F_{C,z_k}(t) = 1 - \hat S_{C,z_k}(t)$. Define $T_i^{**} = \min(Y_i^{**}, C_i^{**})$ and $\delta_i^{**} = I(Y_i^{**} < C_i^{**})$; the new bootstrapped data are then
$$\{(X_i^{**}, T_i^{**}, \delta_i^{**}) : X_i^{**} < T_i^{**},\ i = 1, \ldots, n\}.$$
In the other case, under right-truncated data, the pairs $(u_i, v_i)$ are generated from the copula model with Uniform(0,1) margins. By [13], $\hat S_{X,z_k}(t)$, $\hat F_{X,z_k}(t) = 1 - \hat S_{X,z_k}(t)$, and $\hat F_{Y,z_k}(t)$ for group $Z = z_k$ are already available. Therefore, we can obtain the data $\{(X_i^{**}, Y_i^{**}),\ i = 1, \ldots, n_{z_k}\}$ with $X_i^{**} < Y_i^{**}$, where $X_i^{**} = \hat F_{X,z_k}^{-1}(u_i)$ and $Y_i^{**} = \hat F_{Y,z_k}^{-1}(v_i)$. The new bootstrapped data are
$$\{(X_i^{**}, Y_i^{**}) : X_i^{**} < Y_i^{**},\ i = 1, \ldots, n\}.$$
Based on the bootstrapped data, we can estimate $\theta$ by the above methods. Repeating the procedure $B$ times, the standard deviation and the confidence interval can be obtained from the $B$ values of $\hat\theta$.
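The final step, turning the $B$ bootstrap estimates into a standard deviation and a confidence interval, can be sketched as follows. The text does not state which interval construction is used, so the percentile interval here is an assumption; the replicate values are made-up numbers used only to exercise the function.

```python
import random
import statistics

def bootstrap_summary(theta_boot):
    """Return (sd, (lower, upper)) from B bootstrap estimates of theta.

    sd is the sample standard deviation of the replicates; (lower, upper)
    is a 95% percentile interval (assumption: percentile construction).
    """
    sd = statistics.stdev(theta_boot)
    # n=40 yields 39 cut points in steps of 2.5%; the first and last are
    # the 2.5% and 97.5% sample quantiles.
    cuts = statistics.quantiles(theta_boot, n=40)
    return sd, (cuts[0], cuts[-1])

# Made-up replicates standing in for B = 100 bootstrap estimates.
rng = random.Random(0)
theta_boot = [rng.gauss(0.3, 0.1) for _ in range(100)]
sd, (lower, upper) = bootstrap_summary(theta_boot)
```

A normal-approximation interval, $\hat\theta \pm 1.96 \times$ `sd`, would be an equally simple alternative if the replicates look roughly symmetric.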

4. Simulation Studies

In this section, we examine the performance of the proposed estimation procedures via simulations. We generate samples of size $n = 100$ (or 200) for each group, using the Clayton, Gumbel, and Frank copulas for the dependence between $X$ and $Y$. The left-truncated and right-censored data and the right-truncated data are discussed separately. When $Y$ is the variable of interest, define $h(Y) = \log(Y)$ for the PH model, $h(Y) = \log(e^{Y} - 1)$ for the PO model, $w_0(\cdot) = W_{kj}(\cdot) = 1$, and generate $C$ from Uniform(0,10). For the Clayton and Gumbel copulas with $\tau = 0.05$ and the Frank copula with $\tau = 0.3, 0.5$, we generate $e^{\varepsilon}$ from $\mathrm{Exp}(1)$ and $X$ from $\mathrm{Exp}(1)$ under the PH model, and $\varepsilon$ from the standard logistic distribution and $X$ from $\mathrm{Exp}(1)$ under the PO model. For the Frank copula with $\tau = 0.7$, we generate $e^{\varepsilon}$ from $\mathrm{Exp}(1)$ and $X$ from $\mathrm{Exp}(0.7)$ under the PH model, and $\varepsilon$ from the standard logistic distribution and $X$ from $\mathrm{Exp}(0.7)$ under the PO model. When $X$ is the variable of interest, define $h(X) = \log(X)$ for the PH model, $h(X) = \log(e^{X} - 1)$ for the PO model, and $w_0(\cdot) = W_{kj}(\cdot) = 1$. We set Kendall’s tau to $\tau = 0.05$ for the Clayton and Gumbel copulas and $\tau = 0.3, 0.5, 0.7$ for the Frank copula. We then generate $e^{\varepsilon}$ from $\mathrm{Exp}(1)$ and $Y$ from $\mathrm{Exp}(1)$ for the PH model, and $\varepsilon$ from the standard logistic distribution and $Y$ from $\mathrm{Exp}(1)$ for the PO model. For the regression model, we consider two cases for $T^*$, the event time of interest ($X$ or $Y$). In Case 1, we consider the regression model with $Z = 0, 1$,
$$\text{Case 1}: \quad h(T^*) = \theta_0 Z + \varepsilon,$$
where $\theta_0 = 0.3$ for the PH model and $\theta_0 = 0.3$ for the PO model. In Case 2, we consider the regression model
$$\text{Case 2}: \quad h(T^*) = \theta_{10} Z_1 + \theta_{20} Z_2 + \varepsilon,$$
where $(Z_1, Z_2) = (0, 0)$ for group 1, $(Z_1, Z_2) = (1, 0)$ for group 2, $(Z_1, Z_2) = (0, 1)$ for group 3, and $(\theta_{10}, \theta_{20}) = (0.3, 0.3)$ for the PH model and $(\theta_{10}, \theta_{20}) = (0.3, 0.3)$ for the PO model. Based on 500 simulation runs, each with 100 bootstrap replications, we report five indices: the bias of the proposed method (Bias), the empirical standard deviation (EmpSD), the average standard deviation based on the bootstrap method (AveSD), the mean squared error (MSE), and the coverage probability (CP) of the 95% confidence interval.
Tables 1–4 show the results of Case 1 and Case 2 when the variable of interest is $Y$. Tables 5–8 show the simulation results when the variable of interest is $X$. According to Tables 1–4, under left-truncated and right-censored data, the proposed methods perform well, and the standard deviation and the mean squared error of Method 1 are smaller than those of Method 2 in most situations. From Tables 5–8, under right-truncated data, the proposed methods also perform well, and the standard deviation and the mean squared error of Method 2 are smaller than those of Method 1 under (i) the Clayton and Gumbel copulas with the PH and the PO model and (ii) the Frank copula with the PO model. However, the standard deviation and the mean squared error of Method 1 are smaller than those of Method 2 under the Frank copula with the PH model. Moreover, across the tables, the standard deviation and the mean squared error decrease as the correlation increases. The coverage probability (CP) of the 95% confidence interval is close to 0.95.
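The two choices of $h$ make the baseline ($Z = 0$) event time exponential with rate 1 under both models, which the sketch below illustrates by inverting $h$. It generates only the marginal event time for a given covariate value; the copula-based dependence on the truncation variable and the truncation step itself are omitted, and `draw_event_time` is a hypothetical helper name.

```python
import math
import random

def draw_event_time(z, theta, model, rng):
    """Draw T* from h(T*) = theta * z + eps for a scalar covariate z.

    PH: h(t) = log(t) and e^eps ~ Exp(1), so T* = exp(theta * z) * e^eps.
    PO: h(t) = log(e^t - 1) and eps ~ standard logistic,
        so T* = log(1 + exp(theta * z + eps)).
    """
    if model == "PH":
        return math.exp(theta * z) * rng.expovariate(1.0)
    if model == "PO":
        u = rng.random()
        while u == 0.0:  # avoid log(0) in the logistic inverse CDF
            u = rng.random()
        eps = math.log(u / (1.0 - u))
        return math.log1p(math.exp(theta * z + eps))
    raise ValueError("model must be 'PH' or 'PO'")

rng = random.Random(1)
ph_baseline = [draw_event_time(0, 0.3, "PH", rng) for _ in range(20000)]
po_baseline = [draw_event_time(0, 0.3, "PO", rng) for _ in range(20000)]
ph_mean = sum(ph_baseline) / len(ph_baseline)
po_mean = sum(po_baseline) / len(po_baseline)
```

Both sample means should be close to 1, the mean of $\mathrm{Exp}(1)$, since $\Pr\{\log(1 + e^{\varepsilon}) > t\} = \Pr\{\varepsilon > \log(e^{t} - 1)\} = e^{-t}$ for a standard logistic $\varepsilon$.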

5. Data Analysis

In this section, we analyze two real datasets [16,17] with the proposed methods. Ref. [16] presented a retirement center dataset from the Channing House retirement community in Palo Alto, California, which includes the age at death (i.e., the failure time), the age at admission into the community (i.e., the truncation time), and the age at leaving the community or at the termination of the study (i.e., the censoring time). In this dataset, the failure time is the variable of interest, and the data are an example of left-truncated and right-censored data. The data include 462 observations: 97 males (46 deceased and 51 censored) and 365 females (130 deceased and 235 censored). All observations in this dataset were covered by the health care program at the center, so residents had easy access to medical care without any additional financial burden. For these data, we divide the observations into two groups by gender. We rescale time so that one unit equals 10 months for the truncation time $X$ and the lifetime $Y$. Define $Z = 0$ for females and $Z = 1$ for males. We employ the PH model and the PO model to study the relationship between the failure time and $Z$. The first plot of Figure 1 shows the survival curves of the failure time with the Frank copula in the female and male groups.
The second dataset was introduced by [17], who studied patients who developed AIDS through contaminated blood transfusions up until 1 July 1986. The data include the infection time $Q$, the induction time $X$, and age in years. Define $Y = 102 - Q$ as the time between the infection onset and the termination of the study. Individuals were observed only when $X < Y$. In this dataset, the induction time is the variable of interest, and the data are an example of right-truncated data. The dataset includes 293 observations, 34 observations aged 0–4 years as child patients, 119 observations aged 5–59 years as adult patients, and 141 observations aged 60 and over as elderly patients. Similar to [9], we divide the observations into three groups by age. Define $(Z_1, Z_2) = (0, 0)$ for child, $(Z_1, Z_2) = (1, 0)$ for adult, and $(Z_1, Z_2) = (0, 1)$ for elderly patients. We use the PH model and the PO model to investigate the relationship between the induction time and $(Z_1, Z_2)$. We rescale time so that one unit equals 10 months for the induction time $X$ and for $Y$. The second plot of Figure 1 shows the survival curves of the induction time with the Frank copula in the three age groups. Ref. [9] claimed that $X$ and $Y$ were not quasi-independent. Hence, we analyze the data with the Frank copula model [13] to specify the relationship between $X$ and $Y$.
In the retirement center dataset, $\hat\tau$ is 0.39 for the male group and 0.09 for the female group. In the transfusion-related AIDS dataset, $\hat\tau$ is 0.13, 0.34, and 0.40 for the child group, the adult group, and the elderly group, respectively. Based on 1000 bootstrap replications, Table 9 and Table 10 show the estimate of $\theta$ ($\hat\theta$), the standard deviation of $\hat\theta$ (SD), the 95% confidence interval of $\theta$ (95% C.I), the $D_R$ statistic (DR), and the p-value of $D_R$ (PV), where $D_R$ is a model selection statistic from [14]. According to these tables, all the p-values of $D_R$ are larger than 0.05. That is, the PH model and the PO model are both adequate for the retirement center dataset and the transfusion-related AIDS dataset. Additionally, the better-fitting model in each dataset can be identified as the one with the smallest DR. The PH model is the better-fitting model for the retirement center dataset, and the PO model is the better-fitting model for the transfusion-related AIDS dataset. In the retirement center dataset, the 95% confidence interval contains 0, which means that the difference in lifetime between males and females is not significant. In the transfusion-related AIDS dataset, the differences in the induction time between (i) child and adult patients and (ii) child and elderly patients are both significant, but the difference in the induction time between adult and elderly patients is not significant. Next, we give some examples to explain the covariate effect based on $\hat\theta$. In the retirement center data, from Table 9 with Method 1 and the PH model, $\widehat{HR} = e^{-(-0.3211)} = 1.3786$, which means that the hazard of death in males is 1.3786 times that in females. In the transfusion-related AIDS data, from Table 10 with Method 1 and the PO model, $\hat\theta = (1.9358, 2.2260, 0.2902)$, which means that the failure odds of AIDS onset in adults are 0.1443 times those in children, the failure odds of AIDS onset in the elderly are 0.1079 times those in children, and the failure odds of AIDS onset in adults are 1.3367 times those in the elderly.
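The arithmetic behind these ratios can be reproduced directly. The sketch below assumes the convention that a ratio equals $\exp\{-(Z_1 - Z_2)^T\theta\}$; the sign of the retirement center estimate is taken as negative so that the reported hazard ratio is recovered, which is an assumption made here for illustration.

```python
import math

def ratio_from_contrast(theta_contrast):
    """Hazard ratio (PH) or failure-odds ratio (PO) for a contrast (z1 - z2)^T theta,
    under the convention ratio = exp(-contrast)."""
    return math.exp(-theta_contrast)

# Retirement center data, PH model (assumed sign: theta_hat = -0.3211).
hr_male_vs_female = ratio_from_contrast(-0.3211)

# Transfusion-related AIDS data, PO model, with children as the reference group.
theta_adult, theta_elderly = 1.9358, 2.2260
or_adult_vs_child = ratio_from_contrast(theta_adult)
or_elderly_vs_child = ratio_from_contrast(theta_elderly)
or_adult_vs_elderly = ratio_from_contrast(theta_adult - theta_elderly)
```

Up to rounding of the reported estimates, these reproduce the ratios quoted in the text (1.3786, 0.1443, 0.1079, and 1.3367).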

6. Conclusions

This paper studies a general regression model, which includes the PH model and the PO model, under dependent truncated data, and applies a copula model to relax the quasi-independence assumption between the truncation time and the failure time. Based on [13], we obtain the estimators of $F_X$, $S_Y$, and $\alpha$. Two proposed methods (the area between two survival curves and the minimization of the norm distance between two survival curves) are used to estimate the regression parameter $\theta$ for dependent truncated data. The simulations show that the suggested approaches perform well, and they allow the two methods to be compared across various settings. When $Y$ is the variable of interest, Method 1 is more appropriate in most situations. When $X$ is the variable of interest, Method 1 is more appropriate under the Frank copula with the PH model. By contrast, Method 2 is more appropriate under (i) the Clayton and Gumbel copulas with the PH and the PO model and (ii) the Frank copula with the PO model. Finally, we analyze two real datasets (the retirement center dataset and the transfusion-related AIDS dataset). In the retirement center dataset, we find that the hazard of death in males is higher than in females, but the difference is not significant, which agrees with the analysis by [20]. In the transfusion-related AIDS dataset, we find that child patients have shorter induction periods than adult and elderly patients. The differences in the induction time between (i) child and adult patients and (ii) child and elderly patients are both significant, which agrees with the analysis by [13]. The advantage of the proposed method is that it can be applied to two kinds of data, right-truncated data and left-truncated and right-censored data, under a general regression model that includes the PH model and the PO model.

Author Contributions

Conceptualization, J.-J.H.; methodology, J.-J.H.; software, J.-J.H. and Y.-J.C.; validation, J.-J.H. and Y.-J.C.; formal analysis, J.-J.H. and Y.-J.C.; investigation, J.-J.H.; resources, J.-J.H.; data curation, J.-J.H. and Y.-J.C.; writing—original draft preparation, J.-J.H. and Y.-J.C.; writing—review and editing, J.-J.H.; visualization, J.-J.H. and Y.-J.C.; supervision, J.-J.H.; project administration, J.-J.H.; funding acquisition, J.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council of Taiwan, grant number MOST 110-2118-M-194-002-MY2.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Turnbull, B.W. The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. Ser. B 1976, 38, 290–295.
  2. Efron, B.; Petrosian, V. Survival analysis of the gamma-ray burst data. J. Am. Stat. Assoc. 1994, 89, 452–462.
  3. Lagakos, S.W.; Barraj, L.M.; Gruttola, V.D. Nonparametric analysis of truncated survival data, with application to AIDS. Biometrika 1988, 75, 515–523.
  4. Bhattacharya, P.K.; Chernoff, H.; Yang, S.S. Nonparametric estimation of the slope of a truncated regression. Ann. Stat. 1983, 11, 505–514.
  5. Tsui, K.L.; Jewell, N.P.; Wu, C.F.J. A nonparametric approach to the truncated regression problem. J. Am. Stat. Assoc. 1988, 83, 785–792.
  6. Alioum, A.; Commenges, D. A proportional hazards model for arbitrarily censored and truncated data. Biometrics 1996, 52, 512–524.
  7. Finkelstein, D.M.; Moore, D.F.; Schoenfeld, D.A. A proportional hazards model for truncated AIDS data. Biometrics 1993, 49, 731–740.
  8. Kim, M.; Paik, M.C.; Jang, J.; Cheung, Y.K.; Willey, J.; Elkind, S.V.; Sacco, R.L. Cox proportional hazards models with left truncation and time-varying coefficient: Application of age at event as outcome in cohort studies. Biom. J. 2017, 59, 405–419.
  9. Tsai, W.Y. Testing the assumption of independence of truncation time and failure time. Biometrika 1990, 77, 169–177.
  10. Emura, T.; Wang, W. Semiparametric inference for an accelerated failure time model with dependent truncation. Ann. Inst. Stat. Math. 2016, 68, 1073–1094.
  11. Rennert, L.; Xie, S.X. Cox regression model under dependent truncation. Biometrics 2022, 78, 460–473.
  12. Shen, P.S.; Hsu, H. Conditional maximum likelihood estimation for semiparametric transformation models with doubly truncated data. Comput. Stat. Data Anal. 2020, 144, 106862.
  13. Chaieb, L.L.; Rivest, L.P.; Abdous, B. Estimating survival under a dependent truncation. Biometrika 2006, 93, 655–669.
  14. Hsieh, J.J.; Wang, W.; Ding, A. Regression analysis based on semi-competing risks data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2008, 70, 3–20.
  15. Hsieh, J.J.; Lai, Y.H. Proportional hazard model and proportional odds model under dependent current status data. Master’s Thesis, National Chung Cheng University, Chia-Yi, Taiwan, 2019.
  16. Hyde, J. Testing survival under right-censoring and left-truncation. Biometrika 1977, 64, 225–230.
  17. Kalbfleisch, J.D.; Lawless, J.F. Inference based on retrospective ascertainment: An analysis of the data on transfusion-related AIDS. J. Am. Stat. Assoc. 1989, 84, 360–372.
  18. Genest, C.; Rivest, L.P. Statistical inference procedures for bivariate Archimedean copulas. J. Am. Stat. Assoc. 1993, 88, 1034–1043.
  19. Lai, T.L.; Ying, Z. Estimating a distribution function with truncated and censored data. Ann. Stat. 1991, 19, 417–442.
  20. Shen, P.S. A class of rank-based test for left-truncated and right-censored data. Ann. Inst. Stat. Math. 2009, 61, 461–476.
Figure 1. The survival curves for the retirement center data and AIDS data.
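Table 1. The estimators of θ under Case 1 with τ = 0.05 when the variable of interest is Y.

| n_Z | Copula | Model | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Clayton | PH | −0.0323 | 0.5006 | 0.4864 | 0.2516 | 0.954 | −0.0261 | 0.5694 | 0.5568 | 0.3249 | 0.952 |
| 100 | Clayton | PO | 0.0257 | 0.6575 | 0.7026 | 0.4329 | 0.948 | 0.0174 | 0.6785 | 0.7195 | 0.4607 | 0.944 |
| 200 | Clayton | PH | −0.0276 | 0.2804 | 0.2882 | 0.0794 | 0.946 | −0.0252 | 0.3196 | 0.3340 | 0.1028 | 0.952 |
| 200 | Clayton | PO | −0.0150 | 0.4551 | 0.5097 | 0.2073 | 0.972 | −0.0172 | 0.4702 | 0.5218 | 0.2214 | 0.960 |
| 100 | Gumbel | PH | −0.0297 | 0.2818 | 0.2960 | 0.0803 | 0.968 | −0.0356 | 0.3564 | 0.3726 | 0.1283 | 0.964 |
| 100 | Gumbel | PO | −0.0293 | 0.4646 | 0.4663 | 0.2167 | 0.938 | −0.0390 | 0.5089 | 0.52463 | 0.2605 | 0.948 |
| 200 | Gumbel | PH | −0.0195 | 0.2094 | 0.2013 | 0.0442 | 0.950 | −0.0264 | 0.2562 | 0.2507 | 0.0663 | 0.948 |
| 200 | Gumbel | PO | −0.0059 | 0.3497 | 0.3423 | 0.1223 | 0.950 | −0.0029 | 0.3787 | 0.3796 | 0.1434 | 0.968 |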
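The column labels used in Tables 1–8 are not defined in this part of the article. Assuming the conventional simulation-study meanings (Bias: mean estimate minus the true value; EmpSd: empirical standard deviation of the estimates; AveSd: average of the estimated standard deviations; MSE: mean squared error; CP: coverage proportion of nominal 95% intervals), a row could be computed from replicate output roughly as in the Python sketch below. The function name `summarize_replicates`, the toy replicates, and the true value 1.0 are hypothetical and are not taken from the paper. Under these assumed definitions MSE = Bias² + EmpSd², which is consistent with the first row of Table 1 (0.0323² + 0.5006² ≈ 0.2516).

```python
import random
import statistics

def summarize_replicates(estimates, std_errors, true_theta, z=1.96):
    """Summarize simulation replicates under assumed (conventional) definitions."""
    n = len(estimates)
    bias = statistics.fmean(estimates) - true_theta
    emp_sd = statistics.pstdev(estimates)   # spread of the point estimates
    ave_sd = statistics.fmean(std_errors)   # mean of the estimated SDs
    mse = statistics.fmean([(e - true_theta) ** 2 for e in estimates])
    # A replicate "covers" when the true value lies within estimate +/- z * SD.
    covered = sum(abs(e - true_theta) <= z * s
                  for e, s in zip(estimates, std_errors))
    return {"Bias": bias, "EmpSd": emp_sd, "AveSd": ave_sd,
            "MSE": mse, "CP": covered / n}

# Toy replicates (hypothetical): estimates scattered around an assumed true value.
rng = random.Random(0)
true_theta = 1.0
estimates = [rng.gauss(true_theta, 0.5) for _ in range(500)]
std_errors = [0.5] * 500
summary = summarize_replicates(estimates, std_errors, true_theta)

# Consistency check against the first Table 1 row (Method 1, Clayton, PH).
row_mse = 0.0323 ** 2 + 0.5006 ** 2
```

Comparing EmpSd with AveSd in a row indicates whether the estimated standard deviation tracks the actual sampling variability, and CP near 0.95 indicates that the nominal intervals are well calibrated.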
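Table 2. The estimators of θ under Case 1 when the variable of interest is Y.

| n_Z | Copula | τ | Model | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Frank | 0.3 | PH | 0.0268 | 0.3638 | 0.3293 | 0.1330 | 0.932 | 0.0326 | 0.4388 | 0.4021 | 0.1936 | 0.926 |
| 100 | Frank | 0.3 | PO | −0.0520 | 0.6403 | 0.5925 | 0.4127 | 0.930 | −0.0528 | 0.6542 | 0.6024 | 0.4308 | 0.930 |
| 100 | Frank | 0.5 | PH | 0.0318 | 0.2944 | 0.2997 | 0.0877 | 0.950 | 0.0352 | 0.3392 | 0.3513 | 0.1163 | 0.956 |
| 100 | Frank | 0.5 | PO | −0.0203 | 0.5456 | 0.5455 | 0.2981 | 0.952 | −0.0232 | 0.5226 | 0.5214 | 0.2737 | 0.954 |
| 100 | Frank | 0.7 | PH | −0.0353 | 0.2007 | 0.2318 | 0.0415 | 0.970 | −0.0355 | 0.2219 | 0.2589 | 0.0505 | 0.980 |
| 100 | Frank | 0.7 | PO | 0.0418 | 0.3286 | 0.3544 | 0.1097 | 0.968 | 0.0324 | 0.2879 | 0.3156 | 0.0839 | 0.974 |
| 200 | Frank | 0.3 | PH | 0.0185 | 0.2796 | 0.2692 | 0.0785 | 0.938 | 0.0200 | 0.3434 | 0.3268 | 0.1183 | 0.942 |
| 200 | Frank | 0.3 | PO | 0.0390 | 0.5467 | 0.4942 | 0.3004 | 0.932 | 0.0418 | 0.5544 | 0.5023 | 0.3091 | 0.936 |
| 200 | Frank | 0.5 | PH | −0.0011 | 0.2285 | 0.2466 | 0.0522 | 0.960 | 0.0005 | 0.2608 | 0.2858 | 0.0680 | 0.964 |
| 200 | Frank | 0.5 | PO | 0.0187 | 0.4188 | 0.4099 | 0.1757 | 0.940 | 0.0154 | 0.3956 | 0.3881 | 0.1567 | 0.946 |
| 200 | Frank | 0.7 | PH | −0.0264 | 0.1493 | 0.1626 | 0.0230 | 0.966 | −0.0217 | 0.1661 | 0.1789 | 0.0281 | 0.966 |
| 200 | Frank | 0.7 | PO | 0.0324 | 0.2133 | 0.2370 | 0.0466 | 0.972 | 0.0260 | 0.1858 | 0.2042 | 0.0352 | 0.968 |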
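Table 3. The estimators of θ under Case 2 with τ = 0.05 when the variable of interest is Y.

| n_Z | Copula | Model | θ | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Clayton | PH | θ_1 | −0.0226 | 0.4171 | 0.3938 | 0.1745 | 0.936 | −0.0141 | 0.4803 | 0.4558 | 0.2309 | 0.940 |
| 100 | Clayton | PH | θ_2 | −0.0183 | 0.4027 | 0.3932 | 0.1625 | 0.936 | −0.0053 | 0.4764 | 0.4576 | 0.2269 | 0.934 |
| 100 | Clayton | PO | θ_1 | −0.0414 | 0.8127 | 0.7505 | 0.6623 | 0.954 | −0.0356 | 0.8131 | 0.7544 | 0.6624 | 0.950 |
| 100 | Clayton | PO | θ_2 | −0.0028 | 0.7608 | 0.7360 | 0.5788 | 0.954 | −0.0021 | 0.7749 | 0.7420 | 0.6004 | 0.946 |
| 200 | Clayton | PH | θ_1 | −0.0249 | 0.3108 | 0.2855 | 0.0972 | 0.924 | −0.0266 | 0.3557 | 0.3303 | 0.1272 | 0.926 |
| 200 | Clayton | PH | θ_2 | −0.0303 | 0.3069 | 0.2842 | 0.0951 | 0.922 | −0.0232 | 0.3504 | 0.3273 | 0.1233 | 0.926 |
| 200 | Clayton | PO | θ_1 | 0.0005 | 0.4345 | 0.4516 | 0.1888 | 0.942 | 0.0061 | 0.4484 | 0.4596 | 0.2011 | 0.934 |
| 200 | Clayton | PO | θ_2 | −0.0083 | 0.4685 | 0.4459 | 0.2196 | 0.938 | −0.0023 | 0.4861 | 0.4559 | 0.2363 | 0.932 |
| 100 | Gumbel | PH | θ_1 | −0.0208 | 0.3037 | 0.2725 | 0.0927 | 0.912 | −0.0284 | 0.3723 | 0.3368 | 0.1394 | 0.908 |
| 100 | Gumbel | PH | θ_2 | −0.0508 | 0.3005 | 0.2803 | 0.0929 | 0.932 | −0.0578 | 0.3693 | 0.3453 | 0.1397 | 0.930 |
| 100 | Gumbel | PO | θ_1 | −0.0112 | 0.5248 | 0.4581 | 0.2755 | 0.944 | −0.0014 | 0.5666 | 0.4924 | 0.3210 | 0.930 |
| 100 | Gumbel | PO | θ_2 | −0.0332 | 0.5089 | 0.4691 | 0.2601 | 0.940 | −0.0219 | 0.5511 | 0.5065 | 0.3041 | 0.946 |
| 200 | Gumbel | PH | θ_1 | −0.0255 | 0.2216 | 0.1968 | 0.0497 | 0.926 | −0.0359 | 0.2705 | 0.2431 | 0.0744 | 0.922 |
| 200 | Gumbel | PH | θ_2 | −0.0316 | 0.2181 | 0.2026 | 0.0485 | 0.950 | −0.0354 | 0.2646 | 0.2501 | 0.0713 | 0.950 |
| 200 | Gumbel | PO | θ_1 | 0.0038 | 0.3387 | 0.3220 | 0.1147 | 0.944 | 0.0068 | 0.3616 | 0.3471 | 0.1308 | 0.940 |
| 200 | Gumbel | PO | θ_2 | −0.0116 | 0.3315 | 0.3242 | 0.1100 | 0.942 | −0.0047 | 0.3635 | 0.3501 | 0.1321 | 0.944 |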
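Table 4. The estimators of θ under Case 2 when the variable of interest is Y.

| n_Z | Copula | τ | Model | θ | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Frank | 0.3 | PH | θ_1 | −0.0053 | 0.3374 | 0.3128 | 0.1139 | 0.934 | −0.0128 | 0.4117 | 0.3778 | 0.1697 | 0.920 |
| 100 | Frank | 0.3 | PH | θ_2 | −0.0316 | 0.3553 | 0.3303 | 0.1273 | 0.928 | −0.0378 | 0.4325 | 0.3974 | 0.1885 | 0.922 |
| 100 | Frank | 0.3 | PO | θ_1 | 0.0028 | 0.6187 | 0.5897 | 0.3828 | 0.938 | −0.0001 | 0.6247 | 0.5961 | 0.3902 | 0.940 |
| 100 | Frank | 0.3 | PO | θ_2 | 0.0081 | 0.6018 | 0.5953 | 0.3622 | 0.948 | 0.0047 | 0.6222 | 0.6027 | 0.3872 | 0.950 |
| 100 | Frank | 0.5 | PH | θ_1 | 0.0253 | 0.3035 | 0.2893 | 0.0927 | 0.934 | 0.0314 | 0.3556 | 0.3414 | 0.1274 | 0.926 |
| 100 | Frank | 0.5 | PH | θ_2 | 0.0154 | 0.3162 | 0.2972 | 0.1002 | 0.928 | 0.0225 | 0.3708 | 0.3484 | 0.1380 | 0.934 |
| 100 | Frank | 0.5 | PO | θ_1 | −0.0125 | 0.5091 | 0.5391 | 0.2593 | 0.956 | −0.0227 | 0.4951 | 0.5116 | 0.2457 | 0.952 |
| 100 | Frank | 0.5 | PO | θ_2 | 0.0378 | 0.5161 | 0.5305 | 0.2677 | 0.948 | 0.0275 | 0.4890 | 0.5026 | 0.2399 | 0.952 |
| 100 | Frank | 0.7 | PH | θ_1 | −0.0066 | 0.2002 | 0.2112 | 0.0401 | 0.968 | −0.0061 | 0.2263 | 0.2372 | 0.0513 | 0.962 |
| 100 | Frank | 0.7 | PH | θ_2 | −0.0315 | 0.2008 | 0.2143 | 0.0413 | 0.956 | −0.0289 | 0.2178 | 0.2390 | 0.0483 | 0.970 |
| 100 | Frank | 0.7 | PO | θ_1 | 0.0319 | 0.3202 | 0.3532 | 0.1035 | 0.964 | 0.0263 | 0.2799 | 0.3109 | 0.0790 | 0.964 |
| 100 | Frank | 0.7 | PO | θ_2 | 0.0263 | 0.3248 | 0.3520 | 0.1062 | 0.974 | 0.0181 | 0.2875 | 0.3106 | 0.0830 | 0.972 |
| 200 | Frank | 0.3 | PH | θ_1 | 0.0067 | 0.3022 | 0.2680 | 0.0914 | 0.936 | 0.0055 | 0.3685 | 0.3275 | 0.1358 | 0.934 |
| 200 | Frank | 0.3 | PH | θ_2 | −0.0126 | 0.2889 | 0.2821 | 0.0836 | 0.938 | −0.0127 | 0.3514 | 0.3420 | 0.1236 | 0.944 |
| 200 | Frank | 0.3 | PO | θ_1 | −0.0073 | 0.5046 | 0.4683 | 0.2547 | 0.930 | −0.0001 | 0.5130 | 0.4746 | 0.2632 | 0.924 |
| 200 | Frank | 0.3 | PO | θ_2 | 0.0117 | 0.5025 | 0.4675 | 0.2527 | 0.942 | 0.0205 | 0.5060 | 0.4747 | 0.2565 | 0.936 |
| 200 | Frank | 0.5 | PH | θ_1 | −0.0028 | 0.2629 | 0.2427 | 0.0691 | 0.924 | −0.0061 | 0.3093 | 0.2840 | 0.0957 | 0.918 |
| 200 | Frank | 0.5 | PH | θ_2 | 0.0056 | 0.2570 | 0.2467 | 0.0661 | 0.926 | 0.0100 | 0.3011 | 0.2862 | 0.0908 | 0.928 |
| 200 | Frank | 0.5 | PO | θ_1 | 0.0181 | 0.4210 | 0.4274 | 0.1776 | 0.956 | 0.0158 | 0.3943 | 0.4027 | 0.1557 | 0.960 |
| 200 | Frank | 0.5 | PO | θ_2 | 0.0376 | 0.4061 | 0.4219 | 0.1663 | 0.942 | 0.0358 | 0.3764 | 0.3975 | 0.1430 | 0.958 |
| 200 | Frank | 0.7 | PH | θ_1 | −0.0171 | 0.1461 | 0.1498 | 0.0216 | 0.964 | −0.0198 | 0.1608 | 0.1671 | 0.0263 | 0.960 |
| 200 | Frank | 0.7 | PH | θ_2 | −0.0186 | 0.1433 | 0.1507 | 0.0209 | 0.966 | −0.0162 | 0.1560 | 0.1660 | 0.0246 | 0.966 |
| 200 | Frank | 0.7 | PO | θ_1 | 0.0254 | 0.2164 | 0.2359 | 0.0475 | 0.976 | 0.0273 | 0.1872 | 0.2022 | 0.0358 | 0.972 |
| 200 | Frank | 0.7 | PO | θ_2 | 0.0234 | 0.2345 | 0.2361 | 0.0555 | 0.962 | 0.0184 | 0.1942 | 0.2022 | 0.0380 | 0.972 |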
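Table 5. The estimators of θ under Case 1 with τ = 0.05 when the variable of interest is X.

| n_Z | Copula | Model | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Clayton | PH | 0.0382 | 0.4689 | 0.5291 | 0.2213 | 0.986 | 0.0079 | 0.4659 | 0.5295 | 0.2171 | 0.984 |
| 100 | Clayton | PO | 0.0372 | 0.9020 | 0.9787 | 0.8149 | 0.968 | 0.0267 | 0.7170 | 0.8316 | 0.5148 | 0.972 |
| 200 | Clayton | PH | 0.0194 | 0.2902 | 0.2891 | 0.0846 | 0.956 | −0.0031 | 0.2557 | 0.2665 | 0.0654 | 0.956 |
| 200 | Clayton | PO | 0.0259 | 0.5889 | 0.5695 | 0.3475 | 0.930 | 0.0147 | 0.4225 | 0.4139 | 0.1787 | 0.940 |
| 100 | Gumbel | PH | 0.0457 | 0.4382 | 0.4152 | 0.1941 | 0.950 | 0.0239 | 0.3944 | 0.3892 | 0.1561 | 0.950 |
| 100 | Gumbel | PO | 0.0480 | 0.8515 | 0.7571 | 0.7273 | 0.934 | 0.0146 | 0.5982 | 0.5819 | 0.3581 | 0.958 |
| 200 | Gumbel | PH | 0.0536 | 0.3378 | 0.3029 | 0.1170 | 0.952 | 0.0261 | 0.2859 | 0.2699 | 0.0824 | 0.936 |
| 200 | Gumbel | PO | 0.0113 | 0.6745 | 0.6106 | 0.4551 | 0.932 | −0.0106 | 0.4164 | 0.4199 | 0.1735 | 0.960 |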
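Table 6. The estimators of θ under Case 1 when the variable of interest is X.

| n_Z | Copula | τ | Model | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Frank | 0.3 | PH | 0.0331 | 0.4416 | 0.4570 | 0.1961 | 0.950 | 0.0179 | 0.4779 | 0.4874 | 0.2287 | 0.948 |
| 100 | Frank | 0.3 | PO | −0.0249 | 0.7691 | 0.8107 | 0.5922 | 0.948 | −0.0349 | 0.7033 | 0.7170 | 0.4959 | 0.946 |
| 100 | Frank | 0.5 | PH | −0.0063 | 0.3680 | 0.4048 | 0.1355 | 0.976 | −0.0397 | 0.4215 | 0.4567 | 0.1793 | 0.970 |
| 100 | Frank | 0.5 | PO | −0.0014 | 0.6859 | 0.7133 | 0.4705 | 0.966 | −0.0064 | 0.6465 | 0.6621 | 0.4180 | 0.964 |
| 100 | Frank | 0.7 | PH | 0.0016 | 0.2539 | 0.3195 | 0.0644 | 0.982 | −0.0587 | 0.2892 | 0.3811 | 0.0871 | 0.982 |
| 100 | Frank | 0.7 | PO | 0.0036 | 0.4650 | 0.5175 | 0.2163 | 0.972 | −0.0005 | 0.4304 | 0.4919 | 0.1853 | 0.974 |
| 200 | Frank | 0.3 | PH | 0.0253 | 0.3209 | 0.3309 | 0.1036 | 0.958 | 0.0031 | 0.3431 | 0.3553 | 0.1177 | 0.950 |
| 200 | Frank | 0.3 | PO | −0.0006 | 0.6926 | 0.6527 | 0.4797 | 0.934 | −0.0176 | 0.5938 | 0.5633 | 0.3530 | 0.926 |
| 200 | Frank | 0.5 | PH | 0.0138 | 0.2236 | 0.2197 | 0.0502 | 0.934 | −0.0115 | 0.2354 | 0.2344 | 0.0556 | 0.930 |
| 200 | Frank | 0.5 | PO | −0.0058 | 0.5275 | 0.5259 | 0.2783 | 0.938 | 0.0070 | 0.4916 | 0.4753 | 0.2417 | 0.938 |
| 200 | Frank | 0.7 | PH | −0.0006 | 0.1581 | 0.1895 | 0.0250 | 0.980 | −0.0305 | 0.1672 | 0.2053 | 0.0289 | 0.972 |
| 200 | Frank | 0.7 | PO | 0.0060 | 0.3061 | 0.2943 | 0.0938 | 0.930 | 0.0143 | 0.2640 | 0.2663 | 0.0699 | 0.928 |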
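Table 7. The estimators of θ under Case 2 with τ = 0.05 when the variable of interest is X.

| n_Z | Copula | Model | θ | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Clayton | PH | θ_1 | 0.0396 | 0.4900 | 0.5181 | 0.2417 | 0.986 | 0.0080 | 0.4824 | 0.5225 | 0.2328 | 0.988 |
| 100 | Clayton | PH | θ_2 | 0.0383 | 0.4941 | 0.5204 | 0.2456 | 0.974 | 0.0114 | 0.4931 | 0.5187 | 0.2432 | 0.974 |
| 100 | Clayton | PO | θ_1 | −0.0181 | 0.9071 | 0.9843 | 0.8231 | 0.968 | −0.0186 | 0.6848 | 0.8196 | 0.4693 | 0.984 |
| 100 | Clayton | PO | θ_2 | 0.0140 | 0.9838 | 1.0189 | 0.9680 | 0.968 | 0.0061 | 0.7876 | 0.8355 | 0.6203 | 0.980 |
| 200 | Clayton | PH | θ_1 | 0.0358 | 0.3247 | 0.3144 | 0.1067 | 0.956 | 0.0041 | 0.2893 | 0.2887 | 0.0837 | 0.952 |
| 200 | Clayton | PH | θ_2 | 0.0290 | 0.3192 | 0.3114 | 0.1027 | 0.940 | 0.0028 | 0.2922 | 0.2885 | 0.0854 | 0.938 |
| 200 | Clayton | PO | θ_1 | −0.0069 | 0.6139 | 0.5803 | 0.3769 | 0.940 | 0.0027 | 0.4234 | 0.4186 | 0.1793 | 0.944 |
| 200 | Clayton | PO | θ_2 | −0.0257 | 0.6227 | 0.5764 | 0.3884 | 0.922 | −0.0314 | 0.4180 | 0.4144 | 0.1757 | 0.930 |
| 100 | Gumbel | PH | θ_1 | 0.0136 | 0.4293 | 0.4003 | 0.1845 | 0.932 | −0.0104 | 0.3868 | 0.3749 | 0.1497 | 0.940 |
| 100 | Gumbel | PH | θ_2 | 0.0429 | 0.4388 | 0.4090 | 0.1944 | 0.936 | 0.0133 | 0.3938 | 0.3821 | 0.1552 | 0.946 |
| 100 | Gumbel | PO | θ_1 | −0.0274 | 0.8631 | 0.7659 | 0.7457 | 0.920 | −0.0197 | 0.5823 | 0.5648 | 0.3395 | 0.942 |
| 100 | Gumbel | PO | θ_2 | 0.0254 | 0.8361 | 0.7653 | 0.6997 | 0.942 | 0.0097 | 0.5812 | 0.5842 | 0.3379 | 0.964 |
| 200 | Gumbel | PH | θ_1 | 0.0105 | 0.3103 | 0.2964 | 0.0964 | 0.956 | −0.0070 | 0.2611 | 0.2655 | 0.0682 | 0.960 |
| 200 | Gumbel | PH | θ_2 | 0.0033 | 0.2987 | 0.2926 | 0.0892 | 0.962 | −0.0126 | 0.2511 | 0.2633 | 0.0632 | 0.958 |
| 200 | Gumbel | PO | θ_1 | 0.0035 | 0.7022 | 0.6143 | 0.4931 | 0.940 | 0.0023 | 0.4135 | 0.4072 | 0.1710 | 0.952 |
| 200 | Gumbel | PO | θ_2 | 0.0396 | 0.7082 | 0.6343 | 0.5031 | 0.940 | 0.0049 | 0.4238 | 0.4273 | 0.1797 | 0.962 |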
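Table 8. The estimators of θ under Case 2 when the variable of interest is X.

| n_Z | Copula | τ | Model | θ | Method 1 Bias | Method 1 EmpSd | Method 1 AveSd | Method 1 MSE | Method 1 CP | Method 2 Bias | Method 2 EmpSd | Method 2 AveSd | Method 2 MSE | Method 2 CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | Frank | 0.3 | PH | θ_1 | 0.0220 | 0.4441 | 0.4629 | 0.1977 | 0.950 | −0.0104 | 0.4907 | 0.4964 | 0.2409 | 0.950 |
| 100 | Frank | 0.3 | PH | θ_2 | 0.0037 | 0.4408 | 0.4589 | 0.1943 | 0.954 | −0.0261 | 0.4749 | 0.4946 | 0.2262 | 0.956 |
| 100 | Frank | 0.3 | PO | θ_1 | −0.0112 | 0.8613 | 0.8231 | 0.7420 | 0.948 | −0.0197 | 0.7354 | 0.7079 | 0.5412 | 0.944 |
| 100 | Frank | 0.3 | PO | θ_2 | 0.0080 | 0.8313 | 0.8118 | 0.6911 | 0.952 | −0.0015 | 0.7363 | 0.7093 | 0.5421 | 0.940 |
| 100 | Frank | 0.5 | PH | θ_1 | −0.0078 | 0.3374 | 0.3986 | 0.1139 | 0.974 | −0.0382 | 0.3856 | 0.4567 | 0.1501 | 0.972 |
| 100 | Frank | 0.5 | PH | θ_2 | −0.0176 | 0.3460 | 0.3982 | 0.1201 | 0.974 | −0.0502 | 0.3854 | 0.4500 | 0.1511 | 0.970 |
| 100 | Frank | 0.5 | PO | θ_1 | −0.0484 | 0.7556 | 0.7170 | 0.5733 | 0.930 | −0.0165 | 0.7168 | 0.6674 | 0.5141 | 0.940 |
| 100 | Frank | 0.5 | PO | θ_2 | −0.0216 | 0.7357 | 0.7159 | 0.5418 | 0.938 | −0.0212 | 0.6886 | 0.6607 | 0.4746 | 0.940 |
| 100 | Frank | 0.7 | PH | θ_1 | −0.0093 | 0.2472 | 0.3085 | 0.0612 | 0.974 | −0.0541 | 0.2935 | 0.3706 | 0.0891 | 0.980 |
| 100 | Frank | 0.7 | PH | θ_2 | −0.0064 | 0.2447 | 0.3103 | 0.0599 | 0.980 | −0.0577 | 0.2891 | 0.3684 | 0.0869 | 0.984 |
| 100 | Frank | 0.7 | PO | θ_1 | −0.0490 | 0.4702 | 0.5188 | 0.2234 | 0.966 | −0.0310 | 0.4429 | 0.4926 | 0.1971 | 0.968 |
| 100 | Frank | 0.7 | PO | θ_2 | −0.0505 | 0.5325 | 0.5164 | 0.2861 | 0.944 | −0.0425 | 0.4922 | 0.4915 | 0.2441 | 0.952 |
| 200 | Frank | 0.3 | PH | θ_1 | 0.0121 | 0.3128 | 0.3122 | 0.0980 | 0.958 | −0.0042 | 0.3419 | 0.3370 | 0.1169 | 0.940 |
| 200 | Frank | 0.3 | PH | θ_2 | 0.0258 | 0.3091 | 0.3164 | 0.0962 | 0.948 | 0.0052 | 0.3300 | 0.3389 | 0.1089 | 0.948 |
| 200 | Frank | 0.3 | PO | θ_1 | 0.0021 | 0.7255 | 0.6964 | 0.5264 | 0.950 | 0.0093 | 0.6142 | 0.5952 | 0.3773 | 0.938 |
| 200 | Frank | 0.3 | PO | θ_2 | −0.0130 | 0.7226 | 0.6886 | 0.5223 | 0.944 | −0.0160 | 0.6081 | 0.5898 | 0.3700 | 0.946 |
| 200 | Frank | 0.5 | PH | θ_1 | 0.0217 | 0.2080 | 0.2108 | 0.0437 | 0.940 | 0.0013 | 0.2284 | 0.2279 | 0.0522 | 0.934 |
| 200 | Frank | 0.5 | PH | θ_2 | 0.0082 | 0.2068 | 0.2104 | 0.0428 | 0.938 | −0.0162 | 0.2212 | 0.2267 | 0.0492 | 0.940 |
| 200 | Frank | 0.5 | PO | θ_1 | 0.0357 | 0.5554 | 0.5235 | 0.3098 | 0.926 | 0.0234 | 0.4832 | 0.4698 | 0.2340 | 0.932 |
| 200 | Frank | 0.5 | PO | θ_2 | 0.0418 | 0.5151 | 0.5183 | 0.2671 | 0.948 | 0.0151 | 0.4612 | 0.4637 | 0.2129 | 0.940 |
| 200 | Frank | 0.7 | PH | θ_1 | 0.0047 | 0.1633 | 0.1881 | 0.0267 | 0.974 | −0.0288 | 0.1747 | 0.2007 | 0.0314 | 0.964 |
| 200 | Frank | 0.7 | PH | θ_2 | 0.0012 | 0.1562 | 0.1845 | 0.0244 | 0.974 | −0.0286 | 0.1722 | 0.1989 | 0.0305 | 0.964 |
| 200 | Frank | 0.7 | PO | θ_1 | 0.0014 | 0.3730 | 0.3817 | 0.1391 | 0.942 | 0.0036 | 0.3279 | 0.3436 | 0.1075 | 0.948 |
| 200 | Frank | 0.7 | PO | θ_2 | 0.0159 | 0.3708 | 0.3827 | 0.1377 | 0.948 | 0.0119 | 0.3114 | 0.3439 | 0.0971 | 0.950 |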
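Table 9. The analysis for the retirement center data.

| Copula | Model | Method 1 θ̂ | Method 1 SD | Method 1 95% C.I. | Method 1 DR | Method 1 PV | Method 2 θ̂ | Method 2 SD | Method 2 95% C.I. | Method 2 DR | Method 2 PV |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Frank | PH | 0.3211 | 0.4908 | (−0.6408, 1.2831) | 0.1651 | 0.669 | 0.1749 | 0.4310 | (−0.6699, 1.0197) | 0.1651 | 0.659 |
| Frank | PO | 0.5155 | 0.6694 | (−0.7965, 1.8275) | 0.1695 | 0.538 | 0.3011 | 0.5881 | (−0.8516, 1.4538) | 0.1695 | 0.553 |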
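The interval endpoints in Table 9 appear consistent with a Wald-type interval, θ̂ ± 1.96 × SD. This is inferred from the reported numbers rather than stated in this part of the article. A minimal Python sketch, using the hypothetical helper `wald_interval`, reproduces the Method 1 PH row to within rounding of the inputs:

```python
def wald_interval(theta_hat, sd, z=1.96):
    """Return (lower, upper) for a normal-approximation interval."""
    half_width = z * sd
    return theta_hat - half_width, theta_hat + half_width

# Method 1, Frank copula, PH model (values copied from Table 9).
lower, upper = wald_interval(0.3211, 0.4908)

# Table 9 reports (−0.6408, 1.2831); small gaps come from the rounded inputs.
includes_zero = lower < 0.0 < upper
```

An interval that contains zero, as in this row, means the estimated covariate effect is not distinguishable from zero at the 5% level under this approximation.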
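Table 10. The analysis for the AIDS data.

| Copula | Model | Parameter | Method 1 θ̂ | Method 1 SD | Method 1 95% C.I. | Method 1 DR | Method 1 PV | Method 2 θ̂ | Method 2 SD | Method 2 95% C.I. | Method 2 DR | Method 2 PV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Frank | PH | θ_1 | −1.0065 | 0.4244 | (−1.8383, −0.1746) | | | −1.2322 | 0.4709 | (−2.1552, −0.3092) | | |
| Frank | PH | θ_2 | −1.1941 | 0.3715 | (−1.9222, −0.4659) | 0.2661 | 0.823 | −1.3928 | 0.4052 | (−2.1870, −0.5986) | 0.3488 | 0.728 |
| Frank | PH | θ_1 − θ_2 | 0.1876 | 0.4918 | (−0.7763, 1.1515) | | | 0.1606 | 0.5246 | (−0.8676, 1.1888) | | |
| Frank | PO | θ_1 | −1.9358 | 0.7267 | (−3.3601, −0.5115) | | | −2.1772 | 0.7509 | (−3.6489, −0.7054) | | |
| Frank | PO | θ_2 | −2.2260 | 0.6684 | (−3.5361, −0.9159) | 0.2254 | 0.918 | −2.3858 | 0.7037 | (−3.7651, −1.0065) | 0.2628 | 0.804 |
| Frank | PO | θ_1 − θ_2 | 0.2902 | 0.7392 | (−1.1586, 1.7390) | | | 0.2086 | 0.7053 | (−1.1737, 1.5909) | | |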
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Hsieh, J.-J.; Chen, Y.-J. Proportional Hazard Model and Proportional Odds Model under Dependent Truncated Data. Axioms 2022, 11, 521. https://doi.org/10.3390/axioms11100521
