Article

Quantile Regression Based on the Weighted Approach with Dependent Truncated Data

Department of Mathematics, National Chung Cheng University, Chia-Yi 621301, Taiwan
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(17), 3669; https://doi.org/10.3390/math11173669
Submission received: 24 July 2023 / Revised: 21 August 2023 / Accepted: 22 August 2023 / Published: 25 August 2023
(This article belongs to the Special Issue Nonparametric Statistical Methods and Their Applications)

Abstract

This paper discusses the estimation of parameters in the quantile regression model for dependent truncated data. To account for the dependence between the survival time and the truncated time, the Archimedean copula model is used to construct the association. The parameters of the Archimedean copula model are estimated using certain existing approaches. An inference procedure based on a weighted approach is proposed, where the weights are set according to the variables of interest in the quantile regression model. The finite sample performance of the proposed approach is examined through simulations, and the method is applied to analyze two real datasets: the transfusion-related AIDS dataset and the retirement community center dataset.

1. Introduction

Survival analysis is a collection of statistical methods for analyzing the time until an event of interest occurs. In biomedical applications, the event times are often incompletely observed because of censoring or truncation. Recently, the quantile regression model [1] has become increasingly important in survival analysis because it provides robust inference and describes covariate effects on the quantiles of the event time of interest. Quantile regression under censoring was first investigated in [2,3], which studied a fixed censoring mechanism. Since then, many researchers have studied quantile regression under different censoring mechanisms. For example, refs. [4,5,6,7,8,9,10,11] studied quantile regression under conditionally independent right censoring. For competing risks data, ref. [12] investigated parameter estimation for quantile regression, and ref. [13] studied the confidence set problem. Ref. [14] discussed the quantile regression model under dependent censoring.
Numerous studies have analyzed truncated data under the assumption that the survival time and the truncation time are (quasi-)independent. For instance, ref. [15] investigated the product-limit estimator of the survival curve for right-censored and left-truncated data. However, the quasi-independence assumption is not always appropriate. For example, ref. [16] rejected it when analyzing the transfusion-related AIDS dataset using a conditional Kendall’s tau test. Thus, in this paper, we discuss the quantile regression model under dependent truncated data. We use a copula model to specify the relationship between the survival time and the truncation time. Then, following [17], we apply the copula-graphic approach and U-statistics to estimate the survival function and the association parameter. Ref. [18] applied a weighted approach to estimate the quantile regression parameter for semicompeting risks data; we extend this weighted method to every observation and construct a weighted quantile loss function for dependent truncated data. Minimizing this loss yields the estimator of the quantile regression parameter, which describes the covariate effect on the quantiles of the event time of interest.
The rest of this paper is organized as follows: Section 2 introduces the concepts of right-truncated and left-truncated data, as well as the copula model and the quantile regression model. In Section 3, we present a weighted approach for estimating the parameters of the quantile regression model. Section 4 describes the simulation studies conducted to evaluate the proposed method in finite samples. In Section 5, we apply our proposed method to analyze two real datasets. Finally, Section 6 presents the conclusion of this paper.

2. Data and Models

2.1. Truncated Data

Truncation refers to a situation in survival analysis in which observations are excluded from a dataset because they fall below or above a certain value. Truncation can occur when a researcher selects a study sample based on certain criteria, such as age or disease stage, and only includes individuals who meet those criteria. In our analysis, we consider pairs of variables $(X_i, Y_i)$ that are observed only when $Y_i > X_i$. If our focus is on the variable $X$, then we are dealing with right-truncated data. For instance, in their study of AIDS data, ref. [19] analyzed right-truncated data. On the other hand, if we are interested in the variable $Y$, then we are working with left-truncated data. For example, ref. [20] studied the retirement community data using left-truncated data.
In Section 5, we analyze two different types of data. The AIDS data are an example of right-truncated data, which can be represented as $\{(X_i, Y_i) : Y_i > X_i,\ i = 1, \ldots, n\}$. In this case, $X$ represents the induction time, while $Y$ represents the time between the start of the infection and the end of the study. On the other hand, the retirement community data are an example of left-truncated data. In this case, $X$ is defined as the entry age, and $Y$ is defined as the length of lifetime. We introduce a censoring variable, denoted by $C$, and define $T = \min(Y, C)$ and $\delta = I(Y \le C)$. The dataset can then be represented as $\{(X_i, T_i, \delta_i) : T_i > X_i,\ i = 1, \ldots, n\}$. In both examples, $X$ and $Y$ may be correlated.

2.2. Semisurvival Copula Model

A copula is a function that models the dependence structure between random variables separately from their marginal distributions. For pairs of variables that satisfy $Y > X$ in the context of truncated data, as detailed in [17,18], we can represent the joint distribution of $(X, Y)$ using the semisurvival copula model as follows:
$$\pi(x, y) = \Pr(X \le x, Y > y \mid Y > X) = \frac{1}{c}\, C_\alpha\{F_X(x), S_Y(y)\}, \quad (y \ge x),$$
where $\alpha$ is an association parameter that is related to Kendall’s tau ($\tau$) of $(X, Y)$, $C_\alpha$ is a copula function, $c = \Pr(Y > X)$, and $F_X(x)$ and $1 - S_Y(y)$ are the cumulative distribution functions of $X$ and $Y$. According to [21], the Archimedean copula (AC) model has the following form:
$$C_\alpha(x, y) = \phi_\alpha^{-1}(\phi_\alpha(x) + \phi_\alpha(y)),$$
where $\phi_\alpha$ is a strictly decreasing function defined on $[0, 1]$ with $\phi_\alpha(1) = 0$.
With covariates $Z$, the joint distribution of $(X, Y)$ can be written using the Archimedean copula model as
$$\pi(x, y \mid Z) = \Pr(X \le x, Y > y \mid Y > X, Z = z) = \frac{1}{c_z}\, \phi_{\alpha_z}^{-1}\{\phi_{\alpha_z}(F_{X|Z}(x \mid z)) + \phi_{\alpha_z}(S_{Y|Z}(y \mid z))\}, \quad (y \ge x), \tag{3}$$
where $Z$ is a $(p + 1) \times 1$ vector of discrete explanatory variables and $c_z = \Pr(Y > X \mid Z = z)$. We consider two well-known Archimedean copulas. The first is the Clayton copula, whose generator is
$$\phi_\alpha(t) = (t^{-\alpha} - 1)/\alpha, \quad \alpha \in (0, \infty).$$
The second is the Frank copula, whose generator is
$$\phi_\alpha(t) = \log\left(\frac{1 - e^{-\alpha}}{1 - e^{-\alpha t}}\right), \quad \alpha \in (-\infty, \infty) \setminus \{0\}.$$
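As an illustration, the two generators can be coded directly and plugged into the Archimedean form $\phi_\alpha^{-1}(\phi_\alpha(u) + \phi_\alpha(v))$. The following is a minimal sketch in Python; the function names are hypothetical and are not taken from the paper, and the inverse generators are derived here by solving $\phi_\alpha(t) = s$ for $t$.

```python
import math

def clayton_phi(t, alpha):
    """Clayton generator (t^(-alpha) - 1) / alpha, for alpha > 0."""
    return (t ** (-alpha) - 1.0) / alpha

def clayton_phi_inv(s, alpha):
    """Inverse Clayton generator, obtained by solving phi(t) = s for t."""
    return (1.0 + alpha * s) ** (-1.0 / alpha)

def frank_phi(t, alpha):
    """Frank generator log((1 - e^(-alpha)) / (1 - e^(-alpha t))), alpha != 0."""
    return math.log((1.0 - math.exp(-alpha)) / (1.0 - math.exp(-alpha * t)))

def frank_phi_inv(s, alpha):
    """Inverse Frank generator, obtained by solving phi(t) = s for t."""
    return -math.log(1.0 - (1.0 - math.exp(-alpha)) * math.exp(-s)) / alpha

def archimedean_copula(u, v, phi, phi_inv, alpha):
    """C_alpha(u, v) = phi^{-1}(phi(u) + phi(v)) for u, v in (0, 1]."""
    return phi_inv(phi(u, alpha) + phi(v, alpha), alpha)

# Example: evaluate both copulas at (F_X(x), S_Y(y)) = (0.3, 0.6).
c_clayton = archimedean_copula(0.3, 0.6, clayton_phi, clayton_phi_inv, 2.0)
c_frank = archimedean_copula(0.3, 0.6, frank_phi, frank_phi_inv, -3.0)
```

Because $\phi_\alpha(1) = 0$ for both generators, either copula returns $u$ when $v = 1$, which is a quick sanity check on the implementation.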

2.3. Quantile Regression Models

Let $Z$ be a $(p + 1) \times 1$ vector of discrete explanatory variables and let $\gamma \in (0, 1)$. When $X$ is the variable of interest, the conditional $\gamma$-quantile of $h(X)$ given $Z$ is defined as
$$\xi_\gamma(h(X) \mid Z) = \inf\{x : \Pr(h(X) \le x \mid Z) \ge \gamma\},$$
where $h(\cdot)$ is a monotonically increasing function. Assuming a linear link between $\xi_\gamma(h(X) \mid Z)$ and $Z$, the quantile regression model is
$$\xi_\gamma(h(X) \mid Z) = Z^T \beta(\gamma), \quad \gamma \in (0, 1). \tag{6}$$
This model is equivalent to the following probability statement:
$$\Pr(h(X) - Z^T \beta(\gamma) \le 0 \mid Z) = \gamma,$$
where the regression parameter $\beta(\gamma)$ represents the effect of the explanatory variables on the $\gamma$-quantile of $h(X)$.
Similarly, when $Y$ is the variable of interest, the conditional $\gamma$-quantile of $h(Y)$ given $Z$ is defined as
$$\xi_\gamma(h(Y) \mid Z) = \inf\{y : \Pr(h(Y) > y \mid Z) \le 1 - \gamma\},$$
and the quantile regression model can be established as
$$\xi_\gamma(h(Y) \mid Z) = Z^T \beta(\gamma), \quad \gamma \in (0, 1). \tag{7}$$
Thus, we have
$$\Pr(h(Y) - Z^T \beta(\gamma) > 0 \mid Z) = 1 - \gamma.$$
Here, the regression parameter $\beta(\gamma)$ represents the effect of the explanatory variables on the $\gamma$-quantile of $h(Y)$.
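The link between a $\gamma$-quantile and the quantile loss $\rho_\gamma(u) = u[\gamma - I(u < 0)]$ used in Section 3 can be checked numerically in the simplest case without covariates. The sketch below is only an illustration under that assumption (complete data, intercept only); the function names are hypothetical.

```python
def check_loss(u, gamma):
    """Quantile loss rho_gamma(u) = u * (gamma - I(u < 0))."""
    return u * (gamma - (1.0 if u < 0 else 0.0))

def objective(q, values, gamma):
    """Sum of quantile losses of the residuals values[i] - q."""
    return sum(check_loss(v - q, gamma) for v in values)

def quantile_by_loss(values, gamma):
    """Minimize the objective over the data points.

    The objective is piecewise linear and convex in q, so a minimizer
    can always be found among the observed values.
    """
    return min(sorted(values), key=lambda q: objective(q, values, gamma))

# With ten equally spaced points, the 0.25-quantile is the third order statistic.
q25 = quantile_by_loss(list(range(1, 11)), 0.25)
```

With covariates, the same loss is applied to the residuals $h(X_i) - Z_i^T b$ and minimized over the vector $b$.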

3. The Proposed Inference Procedure

The objective of this section is to estimate $\beta(\gamma)$ in the quantile regression model for truncated data using a weighted approach. This requires estimates of several quantities, namely $\alpha$, $c$, $S_Y$, and $F_X$. The estimation of $\beta(\gamma)$ when $X$ is the variable of interest is discussed in Section 3.1 and Section 3.2. Similarly, the estimation of $\beta(\gamma)$ when $Y$ is the variable of interest is presented in Section 3.3 and Section 3.4.

3.1. Estimations of Survival Function and Copula Parameter

Ref. [17] described the method for calculating the survival function and the cumulative distribution function using truncated data. The observations lie in the upper wedge, that is, at points $(x, y)$ that satisfy $y \ge x$. Define $R(x, y) = \sum_{i=1}^{n} 1\{X_i \le x, Y_i \ge y\}$; an estimator of $\pi(x, y)$ can then be expressed as
$$\hat{\pi}(x, y) = \frac{R(x, y)}{n}. \tag{8}$$
$\tilde{R}(x) = R(x, x)$ is the size of the risk set at time $x$. Ref. [17] supposed that there are $2n$ distinct data values $(x_1, \ldots, x_n, y_1, \ldots, y_n)$ and that the copula parameter $\alpha$ is known. Setting $x = y = t$ in (8), the estimating equation for $F_X(t)$ and $S_Y(t)$ is
$$\frac{c\,\tilde{R}(t)}{n} = \phi_\alpha^{-1}[\phi_\alpha\{\hat{S}_Y(t)\} + \phi_\alpha\{\hat{F}_X(t)\}], \tag{9}$$
where $\hat{S}_Y$ is a decreasing right-continuous survival function with jumps at $y_1, \ldots, y_n$ and $\hat{F}_X$ is an increasing right-continuous distribution function with jumps at $x_1, \ldots, x_n$.
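The counting function $R(x, y)$ and the risk-set size $\tilde{R}(t)$ are simple to compute from the observed pairs. The following sketch uses hypothetical function names and a toy dataset invented purely for illustration.

```python
def risk_count(x, y, xs, ys):
    """R(x, y): number of observed pairs with X_i <= x and Y_i >= y."""
    return sum(1 for xi, yi in zip(xs, ys) if xi <= x and yi >= y)

def risk_set_size(t, xs, ys):
    """R~(t) = R(t, t): number of pairs with X_i <= t <= Y_i."""
    return risk_count(t, t, xs, ys)

def pi_hat(x, y, xs, ys):
    """Empirical estimator R(x, y) / n of pi(x, y) on the wedge y >= x."""
    return risk_count(x, y, xs, ys) / len(xs)

# Toy truncated sample (every pair satisfies Y_i > X_i); values are made up.
xs = [1.0, 2.0, 4.0]
ys = [3.0, 5.0, 6.0]
size_at_4 = risk_set_size(4.0, xs, ys)  # pairs 2 and 3 are at risk at t = 4
```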
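Following the derivations in [17], the estimators can be written as
$$\phi_\alpha\{\hat{S}_Y(t)\} = -\sum_{y_i < t} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(y_i)}{n}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(y_i) - 1)}{n}\right\}\right], \tag{10}$$
$$\phi_\alpha\{\hat{F}_X(t)\} = -\sum_{x_i > t} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(x_i)}{n}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(x_i) - 1)}{n}\right\}\right]. \tag{11}$$
Next, substituting Equations (10) and (11) into Equation (9) yields the estimating equation for $c$:
$$\sum_{y_i < t} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(y_i)}{n}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(y_i) - 1)}{n}\right\}\right] + \sum_{x_i > t} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(x_i)}{n}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(x_i) - 1)}{n}\right\}\right] + \phi_\alpha\left\{\frac{c\,\tilde{R}(t)}{n}\right\} = 0. \tag{12}$$
Equation (12) does not depend on the value of $t$. Taking $t = x_0$, where $x_0$ is chosen so that no observed $x_i$ exceeds $x_0$, the second sum vanishes and Equation (12) simplifies to
$$H_1(\alpha, c) = \sum_{y_i < x_0} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(y_i)}{n}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(y_i) - 1)}{n}\right\}\right] + \phi_\alpha\left\{\frac{c\,\tilde{R}(x_0)}{n}\right\} = 0. \tag{13}$$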
If the parameter $\alpha$ is unknown, a second estimating equation is needed to estimate both $\alpha$ and $c$. As described by [17], the second estimating equation for the parameters $\alpha$ and $c$ is
$$H_2(\alpha, c) = \binom{n}{2}^{-1} \sum_{i < j} 1\{A_{i,j}\} \left[1\{(X_i - X_j)(Y_i - Y_j) > 0\} - \frac{1}{\theta_\alpha\{c\,R(\tilde{X}_{i,j}, \tilde{Y}_{i,j})/n\} + 1}\right] = 0, \tag{14}$$
where $A_{i,j} = \{\min(Y_i, Y_j) > \max(X_i, X_j)\}$ is the event that the pairs $(X_i, Y_i)$ and $(X_j, Y_j)$ are comparable, $\tilde{X}_{i,j} = \max(X_i, X_j)$, $\tilde{Y}_{i,j} = \min(Y_i, Y_j)$, and $\theta_\alpha(t) = -t\,\phi_\alpha''(t)/\phi_\alpha'(t)$ is the cross-ratio function of the Archimedean copula. Hence, $\hat{\alpha}$ and $\hat{c}$ are the solution of
$$H(\alpha, c) = \begin{pmatrix} H_1(\alpha, c) \\ H_2(\alpha, c) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

3.2. The Estimation of $\beta(\gamma)$ for Right-Truncated Data

In this section, we estimate $\beta(\gamma)$ in model (6) using right-truncated data. When $X$ is completely observed, $\beta(\gamma)$ can be estimated by minimizing the following objective function:
$$S_1^*(b) = \sum_i \rho_\gamma[h(X_i) - Z_i^T b],$$
where $\rho_\gamma$ is the quantile loss function, defined as $\rho_\gamma(u) = u[\gamma - I(u < 0)]$. However, when $X$ is right-truncated by $Y$, the objective function $S_1^*(b)$ is no longer valid. We propose the use of inverse untruncated probability weights to address this issue. For subject $i$ with $x_i$ and $z_i$, the weight under the AC model is
$$W_i = \frac{\Pr(X = x_i \mid Z = z_i)}{\Pr(X = x_i, Y > x_i \mid Z = z_i)} = \Pr(Y > x_i \mid X = x_i, Z = z_i)^{-1} = \frac{1}{(\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(x_i \mid z_i)\} + \phi_{\alpha_z}\{S_{Y|Z}(x_i \mid z_i)\}] \cdot \phi_{\alpha_z}'(F_{X|Z}(x_i \mid z_i))}, \tag{15}$$
where the prime denotes differentiation. Thus, the objective function for right-truncated data is constructed as
$$S_1(b) = \sum_i \hat{W}_i \times \rho_\gamma[h(X_i) - Z_i^T b],$$
where $\hat{W}_i$ is the estimator of $W_i$ in (15) obtained by plugging in $\hat{\alpha}_z$, $\hat{F}_{X|Z}(x_i \mid z_i)$, and $\hat{S}_{Y|Z}(x_i \mid z_i)$ from Section 3.1. The estimator of $\beta(\gamma)$ is the minimizer of $S_1(b)$ with respect to $b$.
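To make the minimization concrete, consider the special case of a single binary covariate, as used later in the simulations and data analyses. In that case the weighted objective separates by group: the intercept is a weighted $\gamma$-quantile of $h(X)$ in the group $Z = 0$, and the slope is the difference between the two group quantiles. The sketch below assumes that the weights $\hat{W}_i$ have already been computed and are passed in as plain numbers; the function names and the toy data are hypothetical, and this is not presented as the authors' implementation.

```python
import math

def weighted_quantile(values, weights, gamma):
    """Smallest value whose cumulative weight reaches gamma * total weight.

    This value minimizes sum_i w_i * rho_gamma(values[i] - q) over q.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= gamma * total:
            return value
    return pairs[-1][0]

def fit_binary_quantile_regression(x, z, w, gamma, h=math.log):
    """Return (b0, b1) minimizing the weighted loss when Z takes values 0 and 1."""
    transformed = {0: [], 1: []}
    weights = {0: [], 1: []}
    for xi, zi, wi in zip(x, z, w):
        transformed[zi].append(h(xi))
        weights[zi].append(wi)
    q0 = weighted_quantile(transformed[0], weights[0], gamma)
    q1 = weighted_quantile(transformed[1], weights[1], gamma)
    return q0, q1 - q0

# Toy example with made-up data and weights, using the identity for h.
b0, b1 = fit_binary_quantile_regression(
    x=[1, 2, 3, 4, 10, 20, 30, 40],
    z=[0, 0, 0, 0, 1, 1, 1, 1],
    w=[1, 1, 1, 5, 1, 1, 1, 1],
    gamma=0.4,
    h=lambda v: v,
)
```

In the toy example the large weight on the fourth observation pulls the group-0 quantile up to 4, which shows how observations with a small probability of being untruncated gain influence. For general covariate vectors, a linear programming or other convex solver would be needed instead of this group-wise shortcut.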

3.3. Extension to Right-Censored Data

This section extends the estimators of Section 3.1 to left-truncated and right-censored data, in which the lifetime is subject to both left truncation and independent right censoring. Let $C$ be the censoring variable, $T = \min(Y, C)$, and $\delta = I(Y \le C)$, so that the dataset has the form $\{(X_i, T_i, \delta_i) : T_i > X_i,\ i = 1, \ldots, n\}$. Model (3) becomes
$$\pi(x, y) = \Pr(X \le x, T > y \mid T > X) = \frac{S_C(y)}{c}\, \phi_\alpha^{-1}[\phi_\alpha\{F_X(x)\} + \phi_\alpha\{S_Y(y)\}], \quad (y \ge x),$$
where $S_C$ is the survival function of $C$. The estimator of $S_C$ is
$$\hat{S}_C(t) = \prod_{w \le t} \left[1 - \frac{\sum_{j=1}^{n} I(T_j = w, 1 - \delta_j = 1, X_j < w)}{\sum_{j=1}^{n} I(T_j \ge w, X_j < w)}\, I\left\{\sum_{j=1}^{n} I(T_j \ge w, X_j < w) \ge m n^{\alpha}\right\}\right],$$
where the constants $\alpha = 1/4$ and $m = 1$ are given by [22] (this exponent $\alpha$ is a fixed constant, not the copula parameter) and $n$ is the sample size.
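Let $R(x, y) = \sum_{i=1}^{n} 1\{X_i \le x, T_i \ge y\}$ and $\tilde{R}(x) = R(x, x)$. By [17], Equation (9) becomes
$$\phi_\alpha\left\{\frac{c\,\tilde{R}(t)}{n\,\hat{S}_C(t)}\right\} = \phi_\alpha\{\hat{F}_X(t)\} + \phi_\alpha\{\hat{S}_Y(t)\};$$
then the estimators for $S_Y$ and $F_X$ are rewritten as
$$\phi_\alpha\{\hat{S}_Y(t)\} = -\sum_{y_i \le t,\ \delta_i = 1} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(y_i)}{n\,\hat{S}_C(y_i)}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(y_i) - 1)}{n\,\hat{S}_C(y_i)}\right\}\right],$$
$$\phi_\alpha\{\hat{F}_X(t)\} = -\sum_{x_i > t} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(x_i)}{n\,\hat{S}_C(x_i)}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(x_i) - 1)}{n\,\hat{S}_C(x_i)}\right\}\right],$$
and Equation (13) becomes
$$H_1(\alpha, c) = \sum_{y_i < x_0,\ \delta_i = 1} \left[\phi_\alpha\left\{\frac{c\,\tilde{R}(y_i)}{n\,\hat{S}_C(y_i)}\right\} - \phi_\alpha\left\{\frac{c\,(\tilde{R}(y_i) - 1)}{n\,\hat{S}_C(y_i)}\right\}\right] + \phi_\alpha\left\{\frac{c\,\tilde{R}(x_0)}{n\,\hat{S}_C(x_0)}\right\} = 0.$$
Similarly, Equation (14) becomes
$$H_2(\alpha, c) = \binom{n}{2}^{-1} \sum_{i < j} 1\{B_{i,j}\} \left[1\{(X_i - X_j)(T_i - T_j) > 0\} - \left\{\theta_\alpha\left(\frac{c\,R(\tilde{X}_{i,j}, \tilde{T}_{i,j})}{n\,\hat{S}_C(\tilde{T}_{i,j})}\right) + 1\right\}^{-1}\right] = 0,$$
where
$$B_{i,j} = \{\min(T_i, T_j) \ge \max(X_i, X_j)\} \cap \{\{\delta_i \times \delta_j = 1\} \cup \{\delta_i \times 1\{T_i - T_j < 0\} = 1\} \cup \{\delta_j \times 1\{T_j - T_i < 0\} = 1\}\},$$
$\tilde{X}_{i,j} = \max(X_i, X_j)$, and $\tilde{T}_{i,j} = \min(T_i, T_j)$.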

3.4. The Estimation of $\beta(\gamma)$ for Left-Truncated and Right-Censored Data

In this section, we estimate $\beta(\gamma)$ in model (7) for left-truncated and right-censored data, denoted as $\{(X_i, T_i, \delta_i) : T_i > X_i,\ i = 1, \ldots, n\}$. When $Y$ is completely observed, $\beta(\gamma)$ can be estimated by minimizing the following objective function:
$$S_2^*(b) = \sum_i \rho_\gamma[h(Y_i) - Z_i^T b].$$
However, when $Y$ is left-truncated by $X$ and right-censored by $C$, the objective function $S_2^*(b)$ is no longer valid.
When $\delta = 1$, $Y$ is observed and is subject only to left truncation by $X$. In this case, we need to account for the truncation probability by using the inverse untruncated probability weight. The weight under the Archimedean copula model is
$$W_i^t = \frac{\Pr(Y = y_i \mid Z = z_i)}{\Pr(X \le y_i, Y = y_i \mid Z = z_i)} = \Pr(X \le y_i \mid Y = y_i, Z = z_i)^{-1} = \frac{1}{(\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(y_i \mid z_i)\} + \phi_{\alpha_z}\{S_{Y|Z}(y_i \mid z_i)\}] \cdot \phi_{\alpha_z}'(S_{Y|Z}(y_i \mid z_i))}, \tag{21}$$
where the prime denotes differentiation.
When $\delta = 0$, $Y$ is left-truncated by $X$ and right-censored by $C$, so that $Y > C$ and $Y > X$. We first need to determine whether $C < Y \le h^{-1}(Z^T \beta(\gamma))$ or $Y > h^{-1}(Z^T \beta(\gamma))$. Therefore, we consider the following proportion weight:
$$W_i^c = \frac{\Pr(h(Y) > Z_i^T \beta(\gamma) \mid h(X) = h(x_i), Z = z_i)}{\Pr(h(Y) > h(c_i) \mid h(X) = h(x_i), Z = z_i)} = \frac{(\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(x_i \mid z_i)\} + \phi_{\alpha_z}(1 - \gamma)]}{(\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(x_i \mid z_i)\} + \phi_{\alpha_z}\{S_{Y|Z}(c_i \mid z_i)\}]}. \tag{22}$$
Within each of the two cases, $C < Y \le h^{-1}(Z^T \beta(\gamma))$ and $Y > h^{-1}(Z^T \beta(\gamma))$, we employ inverse observable probability weights to handle the truncation and censoring. For the case $C < Y \le h^{-1}(Z^T \beta(\gamma))$, the inverse observable probability weight is
$$W_i^A(\beta(\gamma)) = \Pr(Y > X \mid C_i < Y \le h^{-1}(Z_i^T \beta(\gamma)), Z = z_i)^{-1} = \left\{\frac{\Pr(C_i < Y \le h^{-1}(Z_i^T \beta(\gamma)), Y > X \mid Z = z_i)}{\Pr(C_i < Y \le h^{-1}(Z_i^T \beta(\gamma)) \mid Z = z_i)}\right\}^{-1} = \frac{\gamma - \{1 - S_{Y|Z}(c_i \mid z_i)\}}{\int_{c_i^+}^{h^{-1}(Z_i^T \beta(\gamma))} (\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(y \mid z_i)\} + \phi_{\alpha_z}\{S_{Y|Z}(y \mid z_i)\}] \cdot \phi_{\alpha_z}'(S_{Y|Z}(y \mid z_i)) \cdot f_{Y|Z}(y \mid z_i)\, dy}. \tag{23}$$
For the case $Y > h^{-1}(Z^T \beta(\gamma))$, the inverse observable probability weight is
$$W_i^B(\beta(\gamma)) = \Pr(Y > X \mid Y > h^{-1}(Z_i^T \beta(\gamma)), Z = z_i)^{-1} = \left\{\frac{\Pr(Y > h^{-1}(Z_i^T \beta(\gamma)), Y > X \mid Z = z_i)}{\Pr(Y > h^{-1}(Z_i^T \beta(\gamma)) \mid Z = z_i)}\right\}^{-1} = \frac{1 - \gamma}{\int_{h^{-1}(Z_i^T \beta(\gamma))}^{\infty} (\phi_{\alpha_z}^{-1})'[\phi_{\alpha_z}\{F_{X|Z}(y \mid z_i)\} + \phi_{\alpha_z}\{S_{Y|Z}(y \mid z_i)\}] \cdot \phi_{\alpha_z}'(S_{Y|Z}(y \mid z_i)) \cdot f_{Y|Z}(y \mid z_i)\, dy}. \tag{24}$$
For left-truncated and right-censored data, the objective function is constructed from $W^t$, $W^c$, $W^A$, and $W^B$ as follows:
$$S_2(b) = \sum_{\delta_i = 1} \hat{W}_i^t \times \rho_\gamma[h(Y_i) - Z_i^T b] + \sum_{\delta_i = 0} \left\{(1 - \hat{W}_i^c) \times \hat{W}_i^A(b) \times \rho_\gamma[h(C_i) - Z_i^T b] + \hat{W}_i^c \times \hat{W}_i^B(b) \times \rho_\gamma[Y^* - Z_i^T b]\right\},$$
where $Y^*$ is a sufficiently large value and $\rho_\gamma$ is the quantile loss function, defined as $\rho_\gamma(t) = t[\gamma - I(t < 0)]$. Here, $\hat{W}^t$, $\hat{W}^c$, $\hat{W}^A$, and $\hat{W}^B$ are the estimators of $W^t$, $W^c$, $W^A$, and $W^B$ in (21)–(24), obtained by plugging in $\hat{\alpha}_z$, $\hat{F}_{X|Z}(y_i \mid z_i)$, and $\hat{S}_{Y|Z}(y_i \mid z_i)$ from Section 3.3 together with $\hat{f}_{Y|Z}(y_i \mid z_i) = \hat{S}_{Y|Z}(y_i^- \mid z_i) - \hat{S}_{Y|Z}(y_i \mid z_i)$. The estimator of $\beta(\gamma)$ is the minimizer of $S_2(b)$ with respect to $b$.
From [17], the large sample properties of $\hat{\alpha}_z$, $\hat{F}_{X|Z}(x_i \mid z_i)$, and $\hat{S}_{Y|Z}(x_i \mid z_i)$ can be obtained. Subsequently, by referring to Appendixes 1, 2, and 3 in [18], the consistency and asymptotic normality of $\hat{\beta}(\gamma)$ in Section 3.2 and Section 3.4 can be established.

3.5. Bootstrap Approach

Because the standard deviation of $\hat{\beta}(\gamma)$ is difficult to estimate analytically, we employ the bootstrap approach. This method repeatedly samples observations with replacement from the original dataset. In the case of right-truncated data, the resampled data have the form
$$\{(X_i^*, Y_i^*, Z_i^*) : Y_i^* > X_i^*,\ i = 1, \ldots, n\}.$$
For left-truncated and right-censored data, the resampled data have the form
$$\{(X_i^*, T_i^*, \delta_i^*, Z_i^*) : T_i^* > X_i^*,\ i = 1, \ldots, n\}.$$
Using the methods of Section 3.1 and Section 3.3, we compute $\hat{\alpha}$, $\hat{c}$, $\hat{S}_{Y|Z}(y \mid z)$, and $\hat{F}_{X|Z}(x \mid z)$ from the bootstrap sample, for both right-truncated data and left-truncated and right-censored data. Subsequently, we apply the methods outlined in Section 3.2 and Section 3.4 to obtain an estimator of $\beta(\gamma)$. Repeating the bootstrap process $B$ times results in a set of estimators denoted as $\{\hat{\beta}_b(\gamma) : b = 1, \ldots, B\}$. The estimates of the variance and standard deviation of $\hat{\beta}(\gamma)$ are then
$$\widehat{Var} = \frac{1}{B - 1} \sum_{b=1}^{B} (\hat{\beta}_b(\gamma) - \bar{\beta}(\gamma))^2, \quad \widehat{SD} = \sqrt{\widehat{Var}},$$
and the $95\%$ confidence interval for $\beta(\gamma)$ is
$$\hat{\beta}(\gamma) \pm Z_{0.975} \times \widehat{SD},$$
where $\bar{\beta}(\gamma)$ is the mean of $\{\hat{\beta}_b(\gamma) : b = 1, \ldots, B\}$, $Z_{0.975} = \Phi^{-1}(0.975) \approx 1.96$, and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution.
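The bootstrap recipe above can be sketched generically. In the sketch below, `estimator` stands for the whole pipeline that maps a dataset to a scalar component of $\hat{\beta}(\gamma)$; it is a hypothetical placeholder, and the sample median is used in the usage example only so that the code runs on its own. Each element of `data` represents one subject's full record, so records are resampled as units.

```python
import math
import random
from statistics import NormalDist, median

def bootstrap_sd_ci(data, estimator, n_boot=200, level=0.95, seed=0):
    """Bootstrap standard deviation and normal-approximation confidence interval.

    data      : list of per-subject records (resampled with replacement)
    estimator : function mapping a list of records to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(data)
    estimate = estimator(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    mean = sum(replicates) / n_boot
    variance = sum((r - mean) ** 2 for r in replicates) / (n_boot - 1)
    sd = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 when level = 0.95
    return estimate, sd, (estimate - z * sd, estimate + z * sd)

# Usage with a stand-in estimator (the sample median of made-up data).
est, sd, (lower, upper) = bootstrap_sd_ci(list(range(1, 101)), median, seed=1)
```

Note that the interval is centered at the estimate from the original sample, while the bootstrap replicates are used only for the standard deviation, which matches the formulas above.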

4. Simulation Studies

In this section, we conduct simulations to evaluate the finite-sample performance of the estimators proposed in Section 3. The simulation results are divided into two parts: the first concerns the case in which $X$ is the variable of interest, and the second the case in which $Y$ is the variable of interest. Each part is discussed separately within the framework of truncated data in the quantile regression model.
In the first setting, the quantile regression model is
$$\log(X) = \beta_0(\gamma) + \beta_1(\gamma) Z + \epsilon_\gamma, \tag{26}$$
where $Z = 0$ or $1$ is a grouping variable. The parameter values are set as $(\beta_0(\gamma), \beta_1(\gamma)) = (1, 0.5)$, and $\epsilon_\gamma \sim U(-3\gamma, 3 - 3\gamma)$, so that $\Pr(\epsilon_\gamma \le 0) = \gamma$. The truncation variable is generated as $Y \sim \exp(\lambda_1)$ when $Z = 0$ and $Y \sim \exp(\lambda_2)$ when $Z = 1$. The simulation results obtained using the Clayton copula model and the Frank copula model are presented in Table A1, Table A2, Table A3 and Table A4 of Appendix A. We consider Kendall’s $\tau = -0.05$ for the Clayton copula and $\tau = 0.3, 0.5, 0.7$ for the Frank copula. We use the quantiles $\gamma = 0.1, 0.3, 0.5$, with a sample size of $n = 100$ or $200$. Each simulation is replicated 500 times, with 100 bootstrap resamples per replication. We report five measures for evaluating the parameter estimation: the bias of the estimators of $\beta_0(\gamma)$ and $\beta_1(\gamma)$ (Bias), the empirical standard deviation (EMPSD), the average of the standard deviations estimated using the bootstrap method (AVESD), the mean square error (MSE), and the coverage probability of the $95\%$ confidence interval (CP).
In the second setting, the quantile regression model is
$$\log(Y) = \beta_0(\gamma) + \beta_1(\gamma) Z + \epsilon_\gamma, \tag{27}$$
where $Z$ is a grouping variable that takes the values 0 or 1. The parameter values are set as $(\beta_0(\gamma), \beta_1(\gamma)) = (1, 0.5)$, and $\epsilon_\gamma \sim U(-3.5\gamma, 3.5 - 3.5\gamma)$, so that $\Pr(\epsilon_\gamma \le 0) = \gamma$. The truncation variable is generated as $X \sim \exp(\lambda_1)$ if $Z = 0$ and $X \sim \exp(\lambda_2)$ if $Z = 1$. The censoring time is generated as $C \sim U(0, k_1)$ if $Z = 0$ and $C \sim U(0, k_2)$ if $Z = 1$, where $k_1 = 12, 14, 15$ and $k_2 = 8, 10, 12$.
From Table A1, Table A2, Table A3 and Table A4 of Appendix A, the bias of $\hat{\beta}_0(\gamma)$ and $\hat{\beta}_1(\gamma)$ is small, and the empirical standard deviation (EMPSD) is close to the average bootstrap standard deviation (AVESD). The mean square error (MSE) is smaller for $n = 200$ than for $n = 100$. The parameters $\lambda_1$, $\lambda_2$, $k_1$, and $k_2$ are adjusted to control the truncation and censoring rates. Additionally, as the quantile $\gamma$ increases, both the EMPSD and MSE tend to increase. Finally, the coverage probability of the $95\%$ confidence interval is generally close to the nominal level of 0.95, although it falls to roughly 0.88–0.91 in several of the Frank copula settings.
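The choice of the error distribution in model (26) can be verified by simulation: with $\epsilon_\gamma \sim U(-3\gamma, 3 - 3\gamma)$, a fraction $\gamma$ of the values of $\log(X)$ should fall at or below $\beta_0(\gamma) + \beta_1(\gamma) Z$. The sketch below generates only the untruncated regression part; it assumes, for illustration, that $Z$ is drawn as a fair coin flip (the text does not state the distribution of $Z$), and it does not generate the dependent truncation variable.

```python
import random

def simulate_log_x(n, gamma, beta0=1.0, beta1=0.5, width=3.0, seed=0):
    """Draw (Z, log X) pairs from log X = beta0 + beta1 * Z + eps.

    eps ~ U(-width * gamma, width - width * gamma), so P(eps <= 0) = gamma.
    Z ~ Bernoulli(0.5) is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)
        eps = rng.uniform(-width * gamma, width - width * gamma)
        rows.append((z, beta0 + beta1 * z + eps))
    return rows

def share_below_quantile_line(rows, beta0=1.0, beta1=0.5):
    """Fraction of observations with log X <= beta0 + beta1 * Z."""
    return sum(1 for z, log_x in rows if log_x <= beta0 + beta1 * z) / len(rows)

rows = simulate_log_x(n=20000, gamma=0.3, seed=42)
share = share_below_quantile_line(rows)  # should be close to 0.3
```

Setting `width=3.5` gives the error distribution used in model (27).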

5. Real Data Analysis

In this section, we apply the proposed method, with the Frank copula, to two real datasets.

5.1. Transfusion-Related AIDS Data

The dataset records the induction time $X$ (months), the infection time $T$ (months), and the age (years) of 293 patients who were infected with AIDS between January 1978 and July 1986. The right truncation time is $Y = 102 - T$, and $Y \ge X$ holds for every observation. These data were studied by [19]. We treat them as right-truncated data ($Y > X$). The dataset includes four observations with $Y = X$; to handle these ties, we added random values from a uniform distribution $U(0, 0.1)$ to both $X$ and $Y$. Furthermore, we divide the patients into two age groups: those younger than 13 years, denoted as $Z = 0$, and those older than 13 years, denoted as $Z = 1$. There are 36 patients in the younger group and 257 patients in the older group.
For the two groups, Figure 1 presents the estimated cumulative distribution functions of the induction time, $\hat{F}_{X|Z}(x \mid z)$. The figure indicates that adults have clearly longer induction periods than children. Next, we consider the quantile regression model for these data, which takes the following form:
$$\xi_\gamma(\log(X) \mid Z) = \beta_0(\gamma) + \beta_1(\gamma) Z.$$
We consider the quantiles $\gamma = 0.1, 0.3, 0.5$ and use a bootstrap procedure with 1000 iterations to estimate the standard deviation and the $95\%$ confidence interval. The estimation results are presented in Table 1. For $\gamma = 0.1$, $\exp(\hat{\beta}_1(\gamma)) = \exp(0.6931) = 1.9999$; for $\gamma = 0.3$, $\exp(\hat{\beta}_1(\gamma)) = \exp(0.8023) = 2.2308$; and for $\gamma = 0.5$, $\exp(\hat{\beta}_1(\gamma)) = \exp(0.5534) = 1.7392$. Thus, the $10\%$ quantile of the induction time for adults is about 1.9999 times that for children. Similarly, the $30\%$ quantile for adults is approximately 2.2308 times that for children, and the $50\%$ quantile for adults is around 1.7392 times that for children. These findings are consistent with Figure 1. Additionally, the $95\%$ confidence intervals of $\beta_1(\gamma)$ in Table 1 do not include 0, which indicates that the age effect is statistically significant at each of the three quantiles.
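Because the model is fitted on the log scale, $\exp(\hat{\beta}_1(\gamma))$ is the ratio of the $\gamma$-quantile of the induction time for adults to that for children. The short check below maps the three reported ratios back to the log scale; it uses only the ratios quoted in the text and is meant as an arithmetic aid.

```python
import math

# Quantile ratios exp(beta1_hat(gamma)) as reported in the text.
reported_ratios = {0.1: 1.9999, 0.3: 2.2308, 0.5: 1.7392}

# Coefficient implied by each ratio: beta1_hat(gamma) = log(ratio).
implied_beta1 = {gamma: math.log(ratio) for gamma, ratio in reported_ratios.items()}

for gamma, beta1 in implied_beta1.items():
    print(f"gamma = {gamma}: implied beta1 = {beta1:.4f}")
```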

5.2. Retirement Community Center Data

The data are left-truncated and right-censored and consist of 462 samples. Among these samples, there are 365 females (with 235 being censored) and 97 males (with 51 being censored). The data include the entry time (in months) to the retirement center (X), the exit time (in months) from the retirement center (T), and the censoring indicator ( δ ). In this context, δ = 0 indicates that the observation was censored, while δ = 1 indicates that the observation was not censored. Furthermore, the data used in this study were collected by [20] from 1964 to 1 July 1975. We categorized the 462 samples into two groups based on sex, where Z = 0 represents females and Z = 1 represents males. Considering the large values of X and T, we transformed them into units of 10 months each.
Figure 2 shows the estimated survival functions of the lifetime $Y$, denoted as $\hat{S}_{Y|Z}(y \mid z)$; the survival function for the female group closely resembles that of the male group. As before, we use the quantile regression model to describe the relationship within this dataset:
$$\xi_\gamma(\log(Y) \mid Z) = \beta_0(\gamma) + \beta_1(\gamma) Z.$$
After running the bootstrap procedure 1000 times, we obtain Table 2, which provides the estimates of the quantile regression parameters along with their standard deviations (SD) and 95% confidence intervals (CI). We use Table 2 to check whether the results agree with Figure 2. First, we compute the quantile ratios: $\exp(\hat{\beta}_1(0.1)) = \exp(-0.0312) = 0.9693$, $\exp(\hat{\beta}_1(0.3)) = \exp(-0.0211) = 0.9791$, and $\exp(\hat{\beta}_1(0.5)) = \exp(-0.0115) = 0.9886$. All three values are close to 1, indicating that the lifetime quantiles of females and males are similar. Additionally, the 95% confidence intervals of $\beta_1(\gamma)$ include 0, suggesting that the sex effect is not statistically significant.

6. Concluding Remarks

In this paper, we study quantile regression models for dependent truncated data. We begin by introducing the quantile regression model and discussing its relevance. Next, we show how the Archimedean copula can be used to model the relationship between the variables $(X, Y)$ and to construct the quantile regression model when either $X$ or $Y$ is of interest. Section 3 focuses on estimating the parameters $\alpha$, $c$, $S_Y$, and $F_X$ under truncated data, following the approach proposed by [17]. Furthermore, we introduce a weighted approach that estimates $\beta(\gamma)$ in the quantile regression model by minimizing a weighted objective function. To assess the variability of the parameter estimators, we employ the bootstrap approach to compute the variance and standard deviation. In Section 4, we present simulation results for two scenarios: when $X$ is of interest and when $Y$ is of interest. For each scenario, we use the Clayton and Frank copula models and evaluate the performance of the estimators.
In the two real data analyses, we first examined the transfusion-related AIDS data and found that adult patients had longer induction periods than child patients. This finding agrees with the real data analysis reported in [17]. In the analysis of the retirement community center data, we investigated the effect of sex on the lifetime of individuals in the retirement center and found no significant effect. This outcome is consistent with the findings reported by [23] in a separate analysis.

Author Contributions

Methodology, J.-J.H.; Software, J.-J.H. and C.-C.H.; Formal analysis, J.-J.H.; Investigation, J.-J.H.; Data curation, C.-C.H.; Writing—original draft, J.-J.H.; Supervision, J.-J.H.; Funding acquisition, J.-J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Science and Technology Council of Taiwan, grant number: NSTC 110-2118-M-194-002-MY2.

Data Availability Statement

Transfusion-Related AIDS Data: Ref. [19]; Retirement Community Center Data: Ref. [20].

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

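Table A1. Estimations of $\beta_0(\gamma)$ and $\beta_1(\gamma)$ in model (26) with the Clayton copula.

| $n$ | $\tau$ | $\gamma$ | $\lambda_1$ | $\beta_0(\gamma)$ Bias | EMPSD | AVESD | MSE | CP | $\lambda_2$ | $\beta_1(\gamma)$ Bias | EMPSD | AVESD | MSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | −0.05 | 0.1 | 0.05 | −0.0309 | 0.1958 | 0.2268 | 0.0383 | 0.972 | 0.05 | 0.0061 | 0.2868 | 0.2760 | 0.0836 | 0.936 |
| 100 | −0.05 | 0.3 | 0.1 | −0.0197 | 0.2110 | 0.2217 | 0.0449 | 0.948 | 0.1 | −0.0056 | 0.3036 | 0.3337 | 0.0922 | 0.948 |
| 100 | −0.05 | 0.5 | 0.13 | −0.0247 | 0.2343 | 0.2501 | 0.0555 | 0.922 | 0.15 | 0.0346 | 0.3158 | 0.3408 | 0.1009 | 0.958 |
| 200 | −0.05 | 0.1 | 0.05 | −0.0101 | 0.0883 | 0.0978 | 0.0079 | 0.94 | 0.05 | −0.0009 | 0.1223 | 0.1401 | 0.0150 | 0.962 |
| 200 | −0.05 | 0.3 | 0.1 | −0.0134 | 0.1437 | 0.1481 | 0.0208 | 0.944 | 0.1 | 0.0123 | 0.1963 | 0.2181 | 0.0387 | 0.962 |
| 200 | −0.05 | 0.5 | 0.13 | −0.0108 | 0.1623 | 0.1660 | 0.0265 | 0.938 | 0.15 | 0.0157 | 0.2189 | 0.2308 | 0.0482 | 0.952 |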
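Table A2. Estimations of $\beta_0(\gamma)$ and $\beta_1(\gamma)$ in model (26) with the Frank copula.

| $n$ | $\tau$ | $\gamma$ | $\lambda_1$ | $\beta_0(\gamma)$ Bias | EMPSD | AVESD | MSE | CP | $\lambda_2$ | $\beta_1(\gamma)$ Bias | EMPSD | AVESD | MSE | CP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.3 | 0.1 | 0.13 | −0.0422 | 0.1334 | 0.1395 | 0.0196 | 0.902 | 0.13 | −0.0378 | 0.2153 | 0.2024 | 0.0478 | 0.928 |
| 100 | 0.3 | 0.3 | 0.2 | −0.0516 | 0.2091 | 0.2111 | 0.0464 | 0.93 | 0.2 | −0.0534 | 0.2958 | 0.3188 | 0.0903 | 0.952 |
| 100 | 0.3 | 0.5 | 0.4 | −0.0692 | 0.2470 | 0.2370 | 0.0658 | 0.91 | 0.4 | −0.0287 | 0.3331 | 0.3446 | 0.1118 | 0.96 |
| 100 | 0.5 | 0.1 | 0.13 | −0.0748 | 0.1504 | 0.1595 | 0.0282 | 0.92 | 0.13 | −0.0361 | 0.2090 | 0.2262 | 0.0450 | 0.962 |
| 100 | 0.5 | 0.3 | 0.25 | −0.0895 | 0.1759 | 0.1825 | 0.0390 | 0.914 | 0.25 | −0.0714 | 0.2878 | 0.3140 | 0.0880 | 0.95 |
| 100 | 0.5 | 0.5 | 0.5 | −0.0953 | 0.2149 | 0.2190 | 0.0552 | 0.9 | 0.5 | −0.0231 | 0.2872 | 0.3121 | 0.0830 | 0.956 |
| 100 | 0.7 | 0.1 | 0.065 | −0.0473 | 0.1254 | 0.1339 | 0.0180 | 0.94 | 0.08 | −0.0682 | 0.2003 | 0.2141 | 0.0448 | 0.942 |
| 100 | 0.7 | 0.3 | 0.19 | −0.0653 | 0.1692 | 0.1799 | 0.0329 | 0.93 | 0.16 | −0.0561 | 0.2762 | 0.2769 | 0.0794 | 0.932 |
| 100 | 0.7 | 0.5 | 0.4 | −0.0628 | 0.1891 | 0.1897 | 0.0397 | 0.90 | 0.4 | −0.0480 | 0.2880 | 0.2874 | 0.0853 | 0.924 |
| 200 | 0.3 | 0.1 | 0.13 | −0.0297 | 0.0896 | 0.0921 | 0.0089 | 0.92 | 0.13 | −0.0254 | 0.1366 | 0.1494 | 0.0193 | 0.978 |
| 200 | 0.3 | 0.3 | 0.2 | −0.0429 | 0.1250 | 0.1332 | 0.0175 | 0.932 | 0.2 | −0.0261 | 0.2029 | 0.2278 | 0.0419 | 0.962 |
| 200 | 0.3 | 0.5 | 0.4 | −0.0432 | 0.1491 | 0.1442 | 0.0241 | 0.908 | 0.4 | −0.0326 | 0.2399 | 0.2470 | 0.0586 | 0.946 |
| 200 | 0.5 | 0.1 | 0.13 | −0.0516 | 0.1048 | 0.1062 | 0.0140 | 0.916 | 0.13 | −0.0369 | 0.1484 | 0.1510 | 0.0242 | 0.934 |
| 200 | 0.5 | 0.3 | 0.25 | −0.0634 | 0.1375 | 0.1414 | 0.0229 | 0.924 | 0.25 | −0.0653 | 0.1962 | 0.2091 | 0.0428 | 0.948 |
| 200 | 0.5 | 0.5 | 0.5 | −0.0655 | 0.1389 | 0.1398 | 0.0236 | 0.902 | 0.5 | −0.0367 | 0.2024 | 0.2199 | 0.0423 | 0.954 |
| 200 | 0.7 | 0.1 | 0.065 | −0.0562 | 0.0964 | 0.1011 | 0.0125 | 0.916 | 0.08 | −0.0438 | 0.1427 | 0.1473 | 0.0223 | 0.946 |
| 200 | 0.7 | 0.3 | 0.19 | −0.0735 | 0.1257 | 0.1303 | 0.0212 | 0.908 | 0.16 | −0.0334 | 0.1876 | 0.1927 | 0.0363 | 0.936 |
| 200 | 0.7 | 0.5 | 0.4 | −0.0590 | 0.1343 | 0.1406 | 0.0215 | 0.902 | 0.4 | −0.0289 | 0.1966 | 0.2033 | 0.0395 | 0.94 |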
Table A3. Estimations of β_0(γ) and β_1(γ) in model (27) with the Clayton copula.
n | τ | γ | λ_1 | k_1 | β_0(γ) Bias | β_0(γ) EMPSD | β_0(γ) AVESD | β_0(γ) MSE | β_0(γ) CP | λ_2 | k_2 | β_1(γ) Bias | β_1(γ) EMPSD | β_1(γ) AVESD | β_1(γ) MSE | β_1(γ) CP
100 | −0.05 | 0.1 | 3 | 12 | −0.0086 | 0.1445 | 0.1456 | 0.0210 | 0.926 | 4 | 8 | 0.0061 | 0.2868 | 0.2760 | 0.0836 | 0.936
100 | −0.05 | 0.3 | 5 | 12 | −0.0194 | 0.1832 | 0.1833 | 0.0339 | 0.914 | 7 | 8 | 0.0243 | 0.2836 | 0.2641 | 0.0810 | 0.934
100 | −0.05 | 0.5 | 7 | 12 | −0.0010 | 0.2237 | 0.1950 | 0.0500 | 0.912 | 10 | 8 | 0.0485 | 0.3317 | 0.3370 | 0.1124 | 0.932
200 | −0.05 | 0.1 | 3 | 12 | −0.0267 | 0.0895 | 0.0847 | 0.0087 | 0.914 | 4 | 8 | 0.0179 | 0.1345 | 0.1288 | 0.0184 | 0.934
200 | −0.05 | 0.3 | 5 | 12 | −0.0169 | 0.1371 | 0.1271 | 0.0191 | 0.936 | 7 | 8 | 0.0143 | 0.2048 | 0.1939 | 0.0696 | 0.94
200 | −0.05 | 0.5 | 7 | 12 | 0.0019 | 0.1686 | 0.1647 | 0.0284 | 0.922 | 10 | 8 | 0.0284 | 0.2840 | 0.2851 | 0.0994 | 0.943
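Table A4. Estimations of β_0(γ) and β_1(γ) in model (27) with the Frank copula.
n | τ | γ | λ_1 | k_1 | β_0(γ) Bias | β_0(γ) EMPSD | β_0(γ) AVESD | β_0(γ) MSE | β_0(γ) CP | λ_2 | k_2 | β_1(γ) Bias | β_1(γ) EMPSD | β_1(γ) AVESD | β_1(γ) MSE | β_1(γ) CP
100 | 0.3 | 0.1 | 5 | 12 | 0.0271 | 0.1324 | 0.1178 | 0.0146 | 0.902 | 5 | 8 | −0.061 | 0.2393 | 0.2346 | 0.0551 | 0.934
100 | 0.3 | 0.3 | 7 | 12 | 0.0803 | 0.2120 | 0.2020 | 0.0473 | 0.88 | 10 | 8 | −0.0537 | 0.2870 | 0.2881 | 0.0859 | 0.894
100 | 0.3 | 0.5 | 10 | 12 | 0.0279 | 0.2132 | 0.2114 | 0.0455 | 0.904 | 12 | 8 | −0.0338 | 0.3176 | 0.3048 | 0.0940 | 0.946
100 | 0.5 | 0.1 | 4 | 14 | 0.0189 | 0.1361 | 0.1331 | 0.0189 | 0.9 | 4 | 10 | 0.0293 | 0.2353 | 0.2167 | 0.0562 | 0.932
100 | 0.5 | 0.3 | 7 | 14 | 0.0737 | 0.1905 | 0.1895 | 0.0413 | 0.88 | 8 | 10 | 0.0308 | 0.2580 | 0.2662 | 0.0718 | 0.898
100 | 0.5 | 0.5 | 11 | 14 | 0.0702 | 0.2311 | 0.2210 | 0.0538 | 0.91 | 12 | 10 | 0.0733 | 0.3187 | 0.3039 | 0.0977 | 0.918
100 | 0.7 | 0.1 | 5 | 15 | 0.0162 | 0.1264 | 0.1125 | 0.0162 | 0.9 | 6 | 12 | 0.0165 | 0.2284 | 0.2289 | 0.0524 | 0.922
100 | 0.7 | 0.3 | 9 | 15 | 0.0761 | 0.1662 | 0.1772 | 0.0372 | 0.886 | 11 | 12 | 0.0494 | 0.2484 | 0.2504 | 0.0651 | 0.892
100 | 0.7 | 0.5 | 11 | 15 | 0.0749 | 0.1828 | 0.1718 | 0.0351 | 0.892 | 13 | 12 | 0.0717 | 0.2493 | 0.2698 | 0.0779 | 0.904
200 | 0.3 | 0.1 | 5 | 12 | 0.0141 | 0.1069 | 0.0925 | 0.0068 | 0.91 | 5 | 8 | 0.0081 | 0.2124 | 0.2139 | 0.0458 | 0.952
200 | 0.3 | 0.3 | 7 | 12 | 0.0682 | 0.1875 | 0.1748 | 0.0286 | 0.908 | 10 | 8 | 0.0271 | 0.2396 | 0.2439 | 0.0560 | 0.923
200 | 0.3 | 0.5 | 10 | 12 | 0.0082 | 0.1975 | 0.1836 | 0.0389 | 0.914 | 12 | 8 | 0.0254 | 0.2751 | 0.2678 | 0.0874 | 0.96
200 | 0.5 | 0.1 | 4 | 14 | 0.0016 | 0.1007 | 0.0992 | 0.0136 | 0.92 | 4 | 10 | 0.0170 | 0.2100 | 0.1882 | 0.0361 | 0.968
200 | 0.5 | 0.3 | 7 | 14 | 0.0652 | 0.1553 | 0.1464 | 0.0365 | 0.906 | 8 | 10 | 0.0255 | 0.2253 | 0.2340 | 0.0504 | 0.945
200 | 0.5 | 0.5 | 11 | 14 | 0.0672 | 0.2145 | 0.2078 | 0.0383 | 0.908 | 12 | 10 | 0.0231 | 0.2756 | 0.2669 | 0.0746 | 0.926
200 | 0.7 | 0.1 | 5 | 15 | 0.0151 | 0.1036 | 0.0983 | 0.0153 | 0.912 | 6 | 12 | 0.0068 | 0.1913 | 0.1915 | 0.0367 | 0.928
200 | 0.7 | 0.3 | 9 | 15 | 0.0611 | 0.1520 | 0.1628 | 0.0235 | 0.91 | 11 | 12 | −0.0054 | 0.2087 | 0.2084 | 0.0468 | 0.938
200 | 0.7 | 0.5 | 11 | 15 | 0.0271 | 0.1589 | 0.1425 | 0.0291 | 0.916 | 13 | 12 | 0.0075 | 0.2137 | 0.2101 | 0.0501 | 0.943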
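
The summary columns in Tables A1–A4 (Bias, EMPSD, AVESD, MSE, CP) can be illustrated with a minimal Python sketch. It assumes the usual Monte Carlo definitions: bias is the mean estimate minus the true value, EMPSD is the empirical standard deviation of the estimates, AVESD is the average of the estimated standard deviations, and CP is the proportion of nominal 95% normal-approximation intervals that cover the true value. The function name `summarize` and the toy replicates are hypothetical and are not taken from the paper; the replicates are drawn from a normal distribution purely for illustration.

```python
import random
import statistics


def summarize(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo replicates of a single coefficient.

    estimates  : one point estimate per replicate
    std_errors : the estimated standard deviation reported in each replicate
    true_value : the coefficient value used to generate the data
    z          : normal quantile for the nominal interval (1.96 for 95%)
    """
    m = len(estimates)
    bias = statistics.fmean(estimates) - true_value
    empsd = statistics.stdev(estimates)      # spread of the estimates themselves
    avesd = statistics.fmean(std_errors)     # average estimated standard deviation
    mse = statistics.fmean([(b - true_value) ** 2 for b in estimates])
    covered = sum(
        1
        for b, s in zip(estimates, std_errors)
        if b - z * s <= true_value <= b + z * s
    )
    return {"Bias": bias, "EMPSD": empsd, "AVESD": avesd,
            "MSE": mse, "CP": covered / m}


# Toy replicates (an assumption for illustration, not the paper's simulation):
# estimates scattered around a true value of 1.0 with standard deviation 0.2.
rng = random.Random(0)
toy_estimates = [rng.gauss(1.0, 0.2) for _ in range(500)]
toy_std_errors = [0.2] * 500

for name, value in summarize(toy_estimates, toy_std_errors, 1.0).items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, an AVESD close to the EMPSD suggests the standard deviation estimator is well calibrated, and a CP near 0.95 suggests the interval procedure attains its nominal level.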

References

  1. Koenker, R.; Bassett, G. Regression Quantiles. Econometrica 1978, 46, 33–50.
  2. Powell, J. Least Absolute Deviations Estimation for the Censored Regression Model. J. Econom. 1984, 25, 303–325.
  3. Powell, J. Censored Regression Quantiles. J. Econom. 1986, 32, 143–155.
  4. Ying, Z.; Jung, S.; Wei, L. Survival Analysis With Median Regression Models. J. Am. Stat. Assoc. 1995, 90, 178–184.
  5. Fitzenberger, B. A Guide to Censored Quantile Regressions. In Handbooks of Statistics: Robust Inference; Maddala, G.S., Rao, C.R., Eds.; North-Holland: Amsterdam, The Netherlands, 1997; Volume 15, pp. 405–437.
  6. Buchinsky, M.; Hahn, J. An Alternative Estimator for Censored Quantile Regression. Econometrica 1998, 66, 653–671.
  7. Yang, S. Censored Median Regression Using Weighted Empirical Survival and Hazard Functions. J. Am. Stat. Assoc. 1999, 94, 137–145.
  8. Portnoy, S. Censored regression quantiles. J. Am. Stat. Assoc. 2003, 98, 1001–1012.
  9. Peng, L.; Huang, Y. Survival Analysis Based on Quantile Regression Models. J. Am. Stat. Assoc. 2008, 103, 637–649.
  10. Yin, G.; Zeng, D.; Li, H. Power-Transformed Linear Quantile Regression With Censored Data. J. Am. Stat. Assoc. 2008, 103, 1214–1224.
  11. Portnoy, S.; Lin, G. Asymptotics for Censored Regression Quantiles. J. Nonparametr. Stat. 2010, 22, 115–130.
  12. Peng, L.; Fine, J.P. Competing Risks Quantile Regression. J. Am. Stat. Assoc. 2009, 104, 1440–1453.
  13. Fan, Y.; Liu, R. Partial Identification and Inference in Censored Quantile Regression. J. Econom. 2018, 206, 1–38.
  14. Ji, S.; Peng, L.; Li, R.; Lynn, M. Analysis of Dependently Censored Data Based on Quantile Regression. Stat. Sin. 2014, 24, 1411–1432.
  15. Tsai, W.Y.; Jewell, N.P.; Wang, M.C. A note on the product-limit estimator under right censoring and left truncation. Biometrika 1987, 74, 883–886.
  16. Tsai, W.Y. Testing the assumption of independence of truncation time and failure time. Biometrika 1990, 77, 169–177.
  17. Chaieb, L.L.; Rivest, L.P.; Abdous, B. Estimating survival under a dependent truncation. Biometrika 2006, 93, 655–669.
  18. Hsieh, J.J.; Hsiao, M.F. Quantile regression based on a weighted approach under semi-competing risks data. J. Stat. Comput. Simul. 2015, 85, 2793–2807.
  19. Kalbfleisch, J.D.; Lawless, J.F. Inference based on retrospective ascertainment: An analysis of the data on transfusion-related AIDS. J. Am. Stat. Assoc. 1989, 84, 360–372.
  20. Hyde, J. Testing Survival with Incomplete Observations. In Biostatistics Casebook; Miller, R.G., Efron, B., Brown, B.W., Moses, L.E., Eds.; John Wiley: Hoboken, NJ, USA, 1980; pp. 31–46.
  21. Genest, C.; Rivest, L.P. Statistical inference procedures for bivariate Archimedean copulas. J. Am. Stat. Assoc. 1993, 88, 1034–1043.
  22. Lai, T.L.; Ying, Z. Estimating a distribution function with truncated and censored data. Ann. Stat. 1991, 19, 417–442.
  23. Shen, P.S. A class of rank-based test for left-truncated and right-censored data. Ann. Inst. Stat. Math. 2009, 61, 461–476.
Figure 1. The cumulative distribution functions of induction time for two age groups. The Y-axis represents the cumulative distribution functions of induction time, while the X-axis denotes the induction time in months.
Figure 2. The survival curves of the lifetime for the sex groups. The Y-axis represents the survival curves of the lifetime, while the X-axis denotes the lifetime in units of 10 months.
Table 1. Estimations of β(γ) in model (28) under the AIDS data.
γ | β̂_0(γ) | SD | 95% CI | β̂_1(γ) | SD | 95% CI
0.1 | 2.0794 | 0.1394 | (1.8063, 2.3526) | 0.6931 | 0.1767 | (0.3468, 1.0395)
0.3 | 2.5651 | 0.1891 | (2.1944, 2.9355) | 0.8023 | 0.2097 | (0.3913, 1.2133)
0.5 | 3.1354 | 0.2601 | (2.6257, 3.6451) | 0.5534 | 0.2740 | (0.0163, 1.0905)
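
As a reading aid for Table 1, the following Python sketch checks the reported intervals against a normal-approximation interval of the form estimate ± 1.96 × SD. This form is an assumption inferred from the numbers in the table, not a construction stated in this section; the helper name `wald_ci` is hypothetical.

```python
def wald_ci(estimate, sd, z=1.96):
    """Return the normal-approximation interval estimate ± z * sd."""
    half_width = z * sd
    return estimate - half_width, estimate + half_width


# (gamma, estimate, SD, reported 95% CI) copied from Table 1.
table1_rows = [
    (0.1, 2.0794, 0.1394, (1.8063, 2.3526)),  # beta_0
    (0.3, 2.5651, 0.1891, (2.1944, 2.9355)),
    (0.5, 3.1354, 0.2601, (2.6257, 3.6451)),
    (0.1, 0.6931, 0.1767, (0.3468, 1.0395)),  # beta_1
    (0.3, 0.8023, 0.2097, (0.3913, 1.2133)),
    (0.5, 0.5534, 0.2740, (0.0163, 1.0905)),
]

for gamma, estimate, sd, reported in table1_rows:
    lower, upper = wald_ci(estimate, sd)
    # Differences are expected to be at the level of rounding in the table.
    print(f"gamma={gamma}: computed=({lower:.4f}, {upper:.4f}) reported={reported}")
```

If the assumption holds, the computed and reported endpoints should differ only by rounding. None of the three reported intervals for β_1(γ) contains zero.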
Table 2. Estimations of β(γ) in model (29) under the retirement data.
γ | β̂_0(γ) | SD | 95% CI | β̂_1(γ) | SD | 95% CI
0.1 | 4.4377 | 0.0191 | (4.4001, 4.4752) | −0.0312 | 0.0211 | (−0.0727, 0.0103)
0.3 | 4.4376 | 0.0413 | (4.3560, 4.5179) | −0.0211 | 0.0502 | (−0.1196, 0.0773)
0.5 | 4.4378 | 0.0612 | (4.3170, 4.5584) | −0.0115 | 0.0802 | (−0.1687, 0.1457)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hsieh, J.-J.; Hsieh, C.-C. Quantile Regression Based on the Weighted Approach with Dependent Truncated Data. Mathematics 2023, 11, 3669. https://doi.org/10.3390/math11173669

Chicago/Turabian Style

Hsieh, Jin-Jian, and Cheng-Chih Hsieh. 2023. "Quantile Regression Based on the Weighted Approach with Dependent Truncated Data" Mathematics 11, no. 17: 3669. https://doi.org/10.3390/math11173669
