
Two-Stage Estimation of Partially Linear Varying Coefficient Quantile Regression Model with Missing Data

1 School of Science, Xi’an Polytechnic University, Xi’an 710048, China
2 School of Economics and Finance, Xi’an Jiaotong University, Xi’an 710061, China
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(4), 578; https://doi.org/10.3390/math12040578
Submission received: 11 December 2023 / Revised: 9 February 2024 / Accepted: 9 February 2024 / Published: 14 February 2024

Abstract
In this paper, the statistical inference of the partially linear varying coefficient quantile regression model is studied under responses missing at random. A two-stage estimation procedure is developed to estimate the parametric and nonparametric components of the model, and the asymptotic properties of the resulting estimators are established under mild regularity conditions. In addition, an imputation-based empirical log-likelihood ratio statistic is proposed and shown to follow the standard Chi-square distribution asymptotically, which allows empirical likelihood confidence intervals for the parametric components to be constructed. Finally, simulation results show that the proposed estimation method is feasible and effective.

1. Introduction

The partially linear varying coefficient model, originally proposed by Zhang et al. [1], is a very flexible model that includes parametric, nonparametric, and semiparametric models as special cases. It combines the advantages of the general linear model and the robust nonparametric model and can dynamically express the relationship between the covariates and the response, while retaining the interpretability of the parametric part and the flexibility of the nonparametric part (see [2,3]). The model has therefore been widely studied. It is usually estimated using the least-squares (LS) method. Although the LS method is optimal for normally distributed data, it may produce large deviations when the data contain outliers or follow heavy-tailed distributions. To remedy these defects, Koenker and Bassett [4] proposed the quantile regression method, which can be used to explore the potential relationship between the response and the covariates. Further, to avoid dependence on a single quantile level and to improve estimation efficiency, Zou and Yuan [5] proposed the composite quantile regression (CQR) method. The CQR method has since been applied to many problems. For example, Kai et al. [6] proposed new estimation and variable selection procedures for the semiparametric partially linear varying coefficient model and showed that the CQR method is much more efficient than the least-squares-based method for many non-normal error distributions. Jiang et al. [7] proposed a functional single-index composite quantile regression method and estimated the unknown slope function and link function using B-spline basis functions. Song et al. [8] proposed a penalized composite quantile regression estimator based on SCAD and the Laplacian error penalty (LEP), which performs variable selection and estimation simultaneously.
Despite significant advances in CQR theory and its applications, missing data remain a challenge. Missing values arise in various fields, including economics, engineering, biology, and epidemiology, and can easily occur in large numbers due to man-made or other unknown factors. Several methods have been proposed to deal with missing data, such as the imputation method [9,10,11], the complete-case (CC) analysis method [12], the likelihood-based method [13], and the inverse probability weighted (IPW) method [14,15,16]. Accordingly, some scholars have applied the CQR method in the presence of missing data. Based on inverse probability weighting and B-spline approximations, Jin et al. [17] proposed a weighted B-spline composite quantile regression method to estimate the nonparametric functions and regression coefficients of the partially linear varying coefficient model with missing covariates.
Further, the empirical likelihood method, introduced by Owen [18,19], is a nonparametric statistical inference method for complete samples with sampling characteristics similar to those of the bootstrap. Compared with classical and modern statistical methods, it has many outstanding advantages: it respects the range of the parameter, the shape of the confidence region is determined by the data, it admits Bartlett correction, and it does not require the construction of pivotal statistics. As a result, this method has attracted the attention of many statisticians and has been applied to linear, nonparametric, and semiparametric regression models. For example, Zhao and Xue [20] used this method to construct an adjusted empirical likelihood ratio function for the parametric part and proved that it asymptotically follows the standard Chi-square distribution. Wang and Zhu [21] developed two empirical-likelihood-based inference methods for longitudinal data under the quantile regression framework. Yan et al. [22] proposed an imputation-based empirical likelihood method combined with quantile regression to construct confidence intervals for the parametric and nonparametric components. In this paper, we combine the imputation method and empirical likelihood inference to propose a composite quantile regression estimation procedure for the partially linear varying coefficient model when the responses are missing at random.
The rest of this paper is organized as follows. In Section 2, a two-stage estimation procedure is developed to estimate the parametric and nonparametric components; based on this, imputation-based empirical likelihood confidence intervals for the parametric components are constructed, and the asymptotic properties of the proposed estimators are studied. In Section 3, simulation studies and a real data application are used to evaluate the performance of the proposed method. The proofs of the main results are given in Section 4.

2. Two-Stage Estimation Method

In what follows, we use the two-stage estimation procedure [23] to estimate the parametric and nonparametric components involved in the following partially linear varying coefficient model:
$$Y = X^{T}\alpha(U) + Z^{T}\beta + \varepsilon, \qquad(1)$$
where $Y$ is the response, $U$ is a one-dimensional variable, and $X$ and $Z$ are the corresponding covariate vectors. Suppose that $\alpha(\cdot) = (\alpha_1(\cdot), \alpha_2(\cdot), \ldots, \alpha_p(\cdot))^{T}$ is an unknown vector whose components are smooth coefficient functions, $\beta = (\beta_1, \beta_2, \ldots, \beta_q)^{T}$ is a $q \times 1$ parameter vector, $\varepsilon$ is the model error, and $\varepsilon$ and $(U, X^{T}, Z^{T})$ are independent of each other. If $\{(Y_i, X_i, U_i, Z_i)\}$ is a random sample from the model, we have
$$Y_i = X_i^{T}\alpha(U_i) + Z_i^{T}\beta + \varepsilon_i, \qquad i = 1, 2, \ldots, n. \qquad(2)$$
An indicator variable $\delta_i$ is introduced in this paper such that
$$\delta_i = \begin{cases} 1, & \text{if } Y_i \text{ is observed}, \\ 0, & \text{if } Y_i \text{ is missing}. \end{cases}$$
Further, $Y_i$ is assumed to be missing at random; that is,
$$P(\delta_i = 1 \mid Y_i, X_i, U_i, Z_i) = P(\delta_i = 1 \mid X_i, U_i, Z_i).$$
Assuming $\beta$ is known, the model in (2) can be regarded as a general varying coefficient model. Next, we use the local linear composite quantile estimation method to estimate the varying coefficient function $\alpha(\cdot)$. Specifically, for $U_i$ in a neighborhood of $u$, the unknown coefficient function $\alpha_j(U_i)$ can be locally linearly approximated as
$$\alpha_j(U_i) \approx \alpha_j(u) + \alpha_j'(u)(U_i - u) \equiv a_j + b_j(U_i - u), \qquad j = 1, \ldots, p.$$
Let $0 < \tau_1 < \cdots < \tau_K < 1$ be quantile levels; for a given positive integer $K$, we take $\tau_k = k/(1 + K)$. Let $\rho_\tau(v) = v(\tau - I(v < 0))$ be the quantile loss function for $\tau \in (0, 1)$, and let $c_k$ denote the $\tau_k$ quantile of $\varepsilon_i$, which is uniquely determined for any $0 < \tau_k < 1$. Then, the initial estimators $\tilde{c}_1, \ldots, \tilde{c}_K, \tilde{a}, \tilde{b}, \tilde{\beta}$ of $c_1, \ldots, c_K, a, b, \beta$ under the complete data can be obtained by minimizing
$$\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\,\rho_{\tau_k}\big\{Y_i - c_k - Z_i^{T}\beta - X_i^{T}[a + b(U_i - u)]\big\}K_h(U_i - u), \qquad(5)$$
where $a = (a_1, \ldots, a_p)^{T}$, $b = (b_1, \ldots, b_p)^{T}$, $K_h(\cdot) = h^{-1}K(\cdot/h)$, and $K(\cdot)$ is the kernel function. Then, we have $\tilde{a} = \tilde{\alpha}(u; \beta)$ and $\tilde{b} = \tilde{\alpha}'(u; \beta)$. Further, by substituting $\tilde{\alpha}(U_i; \beta)$ into the model in (2), we can obtain
$$\tilde{Y}_i = Z_i^{T}\beta + \varepsilon_i, \qquad(6)$$
where $\tilde{Y}_i = Y_i - X_i^{T}\tilde{\alpha}(U_i; \beta)$. Now, the model in (6) can be regarded as a general linear model, and the estimator of $\beta$ is obtained using the composite quantile regression estimation method:
$$\hat{\beta} = \arg\min_{\beta}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\,\rho_{\tau_k}\big(\tilde{Y}_i - c_k - Z_i^{T}\beta\big).$$
The estimation efficiency of $\alpha(u)$ can be further improved by substituting $\hat{\beta}$ into (5). Then, we have $\hat{a} = \hat{\alpha}(u; \hat{\beta})$ and $\hat{b} = \hat{\alpha}'(u; \hat{\beta})$.
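As an illustration, the two-stage procedure above can be sketched in numpy-only Python for the simplest case $p = q = 1$. The function names (`check_loss`, `local_cqr_alpha`, `stage2_beta`), the omission of the local slope $b$, the grid-search minimization, and the profiling-out of the $c_k$ as local sample quantiles are all simplifications introduced here for readability; the paper's estimator minimizes the full local-linear composite objective jointly.

```python
import numpy as np

def check_loss(v, tau):
    """Quantile check loss rho_tau(v) = v * (tau - I(v < 0))."""
    return v * (tau - (v < 0))

def local_cqr_alpha(u, Y, X, Z, U, delta, beta, taus, h, a_grid):
    """Stage 1 (simplified, p = q = 1): estimate alpha(u) by grid search
    on the kernel-weighted composite check loss, with the c_k profiled
    out as local sample quantiles and the local slope b omitted."""
    t = (U - u) / h
    w = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0) * delta
    local = w > 0
    best_a, best_val = a_grid[0], np.inf
    for a in a_grid:
        r = Y - Z * beta - X * a            # residuals at candidate a
        val = 0.0
        for tau in taus:
            c = np.quantile(r[local], tau)  # profiled-out c_k
            val += np.sum(w * check_loss(r - c, tau))
        if val < best_val:
            best_a, best_val = a, val
    return best_a

def stage2_beta(Y, X, Z, U, delta, taus, h, a_grid, b_grid):
    """Stage 2: form pseudo-responses Y_i - X_i * alpha_tilde(U_i) and
    estimate beta by composite quantile regression (again grid search)."""
    alpha_t = np.array([local_cqr_alpha(u, Y, X, Z, U, delta, 0.0,
                                        taus, h, a_grid) for u in U])
    Y_t = Y - X * alpha_t
    obs = delta > 0
    best_b, best_val = b_grid[0], np.inf
    for b in b_grid:
        r = Y_t - Z * b
        val = 0.0
        for tau in taus:
            c = np.quantile(r[obs], tau)
            val += np.sum(delta * check_loss(r - c, tau))
        if val < best_val:
            best_b, best_val = b, val
    return best_b
```

On data generated from the simulation model of Section 3.1 with complete observations, `stage2_beta` recovers $\beta = 3$ up to grid and smoothing error.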

2.1. Imputation-Based Empirical Likelihood Inference for β

We now construct a confidence interval for $\beta$ using the imputation-based empirical likelihood method. When a large amount of data are missing, an empirical likelihood method based on imputation [24] can improve the accuracy of the confidence interval. Let $\hat{\beta}$ denote the estimator of $\beta$, and let $Z_i^{T}\hat{\beta}$ denote the imputed value of $\tilde{Y}_i$. The auxiliary random vectors based on imputation are defined as
$$\tilde{\eta}_i(\beta) = \sum_{k=1}^{K}Z_i\big[\tau_k - I\big(\delta_i\tilde{Y}_i + (1 - \delta_i)Z_i^{T}\hat{\beta} - Z_i^{T}\beta \le \tilde{c}_k\big)\big],$$
where I ( · ) is an indicator function. The empirical log-likelihood ratio for β is defined as
$$\tilde{R}(\beta) = -2\max\Big\{\sum_{i=1}^{n}\log(np_i) \ \Big|\ p_i \ge 0,\ \sum_{i=1}^{n}p_i = 1,\ \sum_{i=1}^{n}p_i\tilde{\eta}_i(\beta) = 0\Big\}. \qquad(9)$$
If zero is inside the convex hull of the points $\big(\tilde{\eta}_1(\beta), \tilde{\eta}_2(\beta), \ldots, \tilde{\eta}_n(\beta)\big)$, the optimization problem (9) has a unique solution. It follows from the Lagrange multiplier method that $\tilde{R}(\beta)$ can be rewritten as
$$\tilde{R}(\beta) = 2\sum_{i=1}^{n}\log\big\{1 + \lambda^{T}\tilde{\eta}_i(\beta)\big\},$$
where $\lambda$ is a Lagrange multiplier that satisfies
$$\sum_{i=1}^{n}\frac{\tilde{\eta}_i(\beta)}{1 + \lambda^{T}\tilde{\eta}_i(\beta)} = 0.$$
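Given the auxiliary vectors, the Lagrange multiplier $\lambda$ can be computed numerically. The following sketch (the function names `el_lambda` and `el_log_ratio` are ours) solves the score equation above by a damped Newton iteration, assuming zero lies inside the convex hull of the $\tilde{\eta}_i(\beta)$:

```python
import numpy as np

def el_lambda(eta, max_iter=50, tol=1e-10):
    """Solve sum_i eta_i / (1 + lam^T eta_i) = 0 for lam by damped
    Newton, maximizing the concave map lam -> sum_i log(1 + lam^T eta_i).
    eta has shape (n, q); assumes 0 is inside the convex hull of the rows."""
    n, q = eta.shape
    lam = np.zeros(q)
    for _ in range(max_iter):
        w = 1.0 + eta @ lam                      # EL weight denominators
        grad = (eta / w[:, None]).sum(axis=0)    # score-equation residual
        hess = -(eta[:, :, None] * eta[:, None, :]
                 / (w ** 2)[:, None, None]).sum(axis=0)
        step = np.linalg.solve(hess, -grad)      # Newton direction
        t = 1.0
        while np.any(1.0 + eta @ (lam + t * step) <= 1e-8):
            t /= 2.0                             # keep all 1 + lam^T eta > 0
        lam = lam + t * step
        if np.linalg.norm(grad) < tol:
            break
    return lam

def el_log_ratio(eta):
    """Empirical log-likelihood ratio 2 * sum_i log(1 + lam^T eta_i)."""
    lam = el_lambda(eta)
    return 2.0 * np.sum(np.log1p(eta @ lam))
```

Evaluating `el_log_ratio` on the $\tilde{\eta}_i(\beta)$ over a range of $\beta$ values then traces out $\tilde{R}(\beta)$.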

2.2. Asymptotic Properties of Estimators

For convenience, we introduce the following notation: $f(\cdot)$ and $F(\cdot)$ denote the density and distribution functions of the model error $\varepsilon$, respectively, and $f_U(u)$ denotes the marginal density of $U$. Write $\tau_{kk'} = \min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}$. To prove the asymptotic properties, we assume that the following regularity conditions hold:
C1: The random variable $U$ has bounded support $\Omega$, and its density function $f_U(u)$ is continuous and satisfies the Lipschitz condition.
C2: $\{\alpha_j(\cdot), j = 1, \ldots, p\}$ have continuous second derivatives.
C3: The density function $f(\cdot)$ of the model error $\varepsilon$ satisfies $f(\cdot) > 0$, and its derivative is continuous and bounded.
C4: The random variable $X$ has bounded support.
C5: The kernel function $K(\cdot)$ is a symmetric probability density function with bounded support and satisfies the Lipschitz condition. Let
$$\mu_j = \int u^{j}K(u)\,du < \infty, \qquad v_j = \int u^{j}K^{2}(u)\,du < \infty, \qquad j = 0, 1, 2, \ldots$$
C6: For any given $u \in \Omega$, $D_1(u)$, $\Sigma$, and $\Sigma_1$ are positive definite matrices.
C7: The covariates $X_i$, $i = 1, \ldots, n$, are i.i.d. centred random vectors and satisfy
$$\max_{1 \le i \le n}\|X_i\|/n^{1/2} \to 0 \quad \text{as } n \to \infty.$$
C8: Suppose that $\pi(x, u, z) = E(\delta_i \mid X_i = x, U_i = u, Z_i = z)$ and $\pi(u) = E(\delta_i \mid U_i = u)$. Furthermore, we let $\pi(x, u, z) > 0$ for all $x$, $u$, and $z$.
Theorem 1.
Assume that conditions C1–C8 hold. If $\beta$ is the true value of the parameter, then
$$\sqrt{n}\big(\hat{\beta} - \beta\big) \xrightarrow{L} N\Big(0,\ c^{-2}\Sigma^{-1}\sum_{k=1}^{K}\sum_{k'=1}^{K}\tau_{kk'}\,\Delta\,\Sigma^{-1}\Big),$$
where $\xrightarrow{L}$ denotes convergence in distribution, $c = \sum_{k=1}^{K}f(c_k)$, $\Sigma = E\big[\pi(U, X, Z)ZZ^{T}\big]$, $\Delta = E\big(\pi(U, X, Z)\big)E\big\{[\delta Z - \omega(U, X, Z)][\delta Z - \omega(U, X, Z)]^{T}\big\}$, $\omega(U, X, Z) = \pi(U, X, Z)E\big[Z(0^{T}, X^{T}, 0^{T}) \mid U = u\big]D^{-1}(u)\big(I_K, \mathbf{1}^{T}X, \mathbf{1}^{T}Z\big)^{T}$, and
$$D(u) = E\left[\begin{pmatrix} I & \mathbf{1}X^{T} & \mathbf{1}Z^{T} \\ X\mathbf{1}^{T} & XX^{T} & XZ^{T} \\ Z\mathbf{1}^{T} & ZX^{T} & ZZ^{T} \end{pmatrix} \Bigm| U = u\right],$$
where $I = \mathrm{diag}(1, \ldots, 1)$ and $\mathbf{1} = (1, \ldots, 1)^{T}$.
Theorem 2.
Assume that conditions C1–C8 hold. Then, as $n \to \infty$,
$$\sqrt{nh}\Big(\hat{a} - \alpha(u) - \frac{1}{2}h^{2}\mu_2\alpha''(u)\Big) \xrightarrow{L} N\Big(0,\ \sum_{k=1}^{K}\sum_{k'=1}^{K}\tau_{kk'}\,E\big(\pi(u)\big)\,v_0\,f_U^{-1}(u)\,\Sigma_1^{-1}\Big),$$
where $\xrightarrow{L}$ denotes convergence in distribution, $\mu_2 = \int u^{2}K(u)\,du$, $v_0 = \int K^{2}(u)\,du$, and $\Sigma_1 = \mathrm{Var}(X \mid U = u)$.
Theorem 3.
Assume that conditions C1–C8 hold. If $\beta$ is the true value of the parameter, then
$$\tilde{R}(\beta) \xrightarrow{L} \chi_q^{2},$$
where $\xrightarrow{L}$ denotes convergence in distribution and $\chi_q^{2}$ is the Chi-square distribution with $q$ degrees of freedom.
Based on this result, a confidence region for $\beta$ can be constructed. For a given $\alpha$ with $0 < \alpha < 1$, let $\chi_q^{2}(1 - \alpha)$ satisfy $P\big(\chi_q^{2} \le \chi_q^{2}(1 - \alpha)\big) = 1 - \alpha$. Then, an approximate $1 - \alpha$ confidence region for $\beta$ is
$$C_\alpha(\beta) = \big\{\beta \mid \tilde{R}(\beta) \le \chi_q^{2}(1 - \alpha)\big\}.$$
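In practice, the confidence region is obtained by inverting the ratio statistic: evaluate $\tilde{R}(\beta)$ over a grid and keep the values below the Chi-square cutoff. A minimal sketch for scalar $\beta$ ($q = 1$), with the 0.95 quantile of $\chi_1^2$ hard-coded and the function name ours:

```python
import numpy as np

CHI2_1_95 = 3.841  # 0.95 quantile of the chi-square(1) distribution

def el_confidence_interval(beta_grid, R_values, cutoff=CHI2_1_95):
    """Invert the EL ratio over a grid of scalar beta values: keep the
    points with R(beta) <= cutoff and report the interval endpoints.
    Returns None if no grid point is retained."""
    beta_grid = np.asarray(beta_grid, dtype=float)
    keep = beta_grid[np.asarray(R_values, dtype=float) <= cutoff]
    if keep.size == 0:
        return None
    return keep.min(), keep.max()
```

A finer grid gives a correspondingly finer localization of the interval endpoints.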

3. Numerical Results

3.1. Simulation Studies

A numerical simulation experiment was carried out to study the finite-sample performance of the proposed method. The data were generated from the following partially linear varying coefficient model:
$$Y_i = Z_i\beta + X_i\sin(2\pi U_i) + \varepsilon_i,$$
where $\beta = 3$, $X_i \sim N(0, 1)$, $U_i \sim U(0, 1)$, $Z_i \sim N(0, 1)$, $\varepsilon_i \sim N(0, 0.01)$, and the response $Y_i$ is generated according to the model. Throughout the simulation study, we used the Epanechnikov kernel $K(u) = 0.75(1 - u^{2})I(|u| \le 1)$, and the optimal bandwidth $h_{CV}$ was selected by minimizing the cross-validation criterion
$$CV(h) = \sum_{i=1}^{n}\delta_i\,\rho_\tau\big(Y_i - X_i^{T}\hat{\alpha}_{[i]}(U_i) - Z_i^{T}\hat{\beta}_{[i]}\big),$$
where $\rho_\tau(v) = v(\tau - I(v < 0))$ is the quantile loss function, and $\hat{\alpha}_{[i]}(U_i)$ and $\hat{\beta}_{[i]}$ are the estimators of $\alpha(u)$ and $\beta$ obtained after deleting the $i$th subject. Next, we considered the following three cases of the missing probability $P(\delta_i = 0 \mid X_i = x, Z_i = z, U_i = u)$: (1) $\pi_1(x, z, u) = \{2.5 + \exp(-0.6x - 0.3z - 0.2u - 3)\}^{-1}$; (2) $\pi_2(x, z, u) = \{1 + \exp(-0.4x - 0.2z - 0.1u - 3)\}^{-1}$; (3) $\pi_3(x, z, u) = \{1 + 2\exp(-0.5x - 0.2z - 0.1u - 3)\}^{-1}$.
Thus, the missing probabilities corresponding to the scenarios were approximately 0.1, 0.25, and 0.4, respectively. In this simulation, the quantile vector was taken as τ = ( 0.1 , , 0.9 ) T with K = 9 , and the sample sizes were set to n = 100 , n = 300 , and n = 1000 , respectively. For each case, we conducted 1000 simulation runs. The estimation errors, standard deviations, and mean squared errors of the estimator with the three missing probabilities are summarized in Table 1. For the nonparametric component, we provide an estimation curve for the component of the coefficient α ( u ) with a missing probability of 0.25 and n = 300 .
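The simulation design above can be reproduced with a short generator. In this sketch (the function name and interface are ours), `pi_fn` returns the probability that $Y_i$ is observed ($\delta_i = 1$):

```python
import numpy as np

def simulate(n, pi_fn, beta=3.0, seed=0):
    """Generate (Y, X, Z, U) from Y = Z*beta + X*sin(2*pi*U) + eps with
    eps ~ N(0, 0.01), and mask responses at random: pi_fn(x, z, u) is
    the probability that Y_i is observed (delta_i = 1)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=n)
    Z = rng.normal(size=n)
    U = rng.uniform(size=n)
    eps = rng.normal(scale=0.1, size=n)          # Var(eps) = 0.01
    Y = Z * beta + X * np.sin(2.0 * np.pi * U) + eps
    delta = (rng.uniform(size=n) < pi_fn(X, Z, U)).astype(int)
    Y_obs = np.where(delta == 1, Y, np.nan)      # missing responses -> NaN
    return Y_obs, X, Z, U, delta
```

Passing the selection probabilities of the three scenarios for `pi_fn` reproduces the corresponding missing rates on average.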
The following conclusions can be drawn from Table 1:
(1) For a given missing probability, with the increase in the sample size, the standard deviation and mean square error of the given estimator decrease.
(2) For a given sample size, with the increase in the missing probability, the deviation and mean square error of the given estimator increase.
In addition, by comparing the real curve with the estimated curve, as shown in Figure 1, we can see that the method proposed in this paper is effective.
Next, we used the empirical likelihood method to construct the confidence interval for β . In order to evaluate the performance of the proposed statistical inference method, two methods were compared in the following simulation study: the imputation-based empirical likelihood method (IEL) proposed in this paper and the complete data-based empirical likelihood method (CEL). Under three different missing probabilities, the upper and lower 95% confidence limits for β and the corresponding coverage probabilities were computed. The sample sizes were set to n = 100 , n = 300 , and n = 1000 , respectively. The upper and lower confidence limits, the average length of the confidence intervals, and the coverage probabilities are summarized in Table 2.
As can be seen in Table 2, under a given missing probability, the confidence interval lengths of both methods decreased and the coverage probabilities increased as the sample size increased; conversely, under a given sample size, the interval lengths increased and the coverage probabilities decreased as the missing probability increased. In both respects, the IEL method yielded shorter confidence intervals and higher coverage probabilities than the CEL method, so the IEL method is superior to the CEL method.
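The Table 2 summaries (coverage probability and average interval length) can be computed from the replicated confidence limits as follows; the helper name is ours:

```python
import numpy as np

def summarize_intervals(intervals, true_beta):
    """Empirical coverage probability and average length of a list of
    (lower, upper) confidence limits for a scalar parameter."""
    arr = np.asarray(intervals, dtype=float)
    covered = (arr[:, 0] <= true_beta) & (true_beta <= arr[:, 1])
    return covered.mean(), (arr[:, 1] - arr[:, 0]).mean()
```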

3.2. Application to a Real Data Example

In this section, we apply the proposed method to the NCCTG lung cancer data, which are available in R (v. 4.2.3). This data set contains 228 patients with lung cancer; the survival times of 63 of these patients are unavailable for various reasons, giving a missing rate of about 27.63%. There are 10 variables in the data set, of which we focus on the following: time (survival time), ph.karno (Karnofsky performance score, assessed by doctors), meal.cal (calories consumed at meals), and age (age of the lung cancer patients). Here, we use the model in (1) to fit the lung cancer data, where $Y$ represents the survival time, $U$ represents age, $Z$ represents meal.cal, and $X$ represents ph.karno. For comparison, we considered composite levels $K$ of 5 and 9 in the CQR method, denoted as CQR5 and CQR9, respectively.
Next, we compare the results obtained using the CQR5 and CQR9 methods for estimating $\beta$ and give an estimation curve for the coefficient function $\alpha(\cdot)$ using the CQR9 method. We then present the estimator and its corresponding 95% confidence interval for $\beta$ based on the proposed IEL method. The results for the CQR methods and the confidence interval for $\beta$ are summarized in Table 3, and the estimated varying coefficient curve $\alpha(\cdot)$ is shown in Figure 2. From Table 3, we can see that the confidence interval of the CQR9 method is shorter than that of the CQR5 method, which indicates that the CQR9 method synthesizes information from more quantile levels; this is consistent with the theory. In addition, it can be seen in Figure 2 that the varying coefficient function fluctuates with age.

4. Proofs of Theorems

Lemma 1.
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. random vectors, where the $Y_i$, $i = 1, \ldots, n$, are one-dimensional random variables. Assume that $E|Y|^{s} < \infty$ and $\sup_x \int |y|^{s}f(x, y)\,dy < \infty$, where $f(x, y)$ denotes the joint density of $(X, Y)$. Let $K(\cdot)$ be a bounded positive function with bounded support, satisfying the Lipschitz condition. Then, we have
$$\sup_{x \in D}\left|\frac{1}{n}\sum_{i=1}^{n}\big[K_h(X_i - x)Y_i - E\big(K_h(X_i - x)Y_i\big)\big]\right| = O_p\left(\Big(\frac{\log(1/h)}{nh}\Big)^{1/2}\right),$$
provided that $n^{2\varepsilon - 1}h \to \infty$ for some $\varepsilon < 1 - s^{-1}$.
For the proof of Lemma 1, the reader is referred to Fan and Huang [3].
Lemma 2.
Suppose that $A_n(s)$ is convex and can be represented as $\frac{1}{2}s^{T}Vs + U_n^{T}s + C_n + r_n(s)$, where $V$ is a symmetric and positive definite matrix, $U_n$ is stochastically bounded, $C_n$ is arbitrary, and $r_n(s)$ converges to 0 in probability for each fixed $s$. Then the minimizer $\alpha_n$ of $A_n$ and the minimizer $\beta_n = -V^{-1}U_n$ of $\frac{1}{2}s^{T}Vs + U_n^{T}s + C_n$ differ by $o_p(1)$. Further, if $U_n \xrightarrow{L} U$, then $\alpha_n \xrightarrow{L} -V^{-1}U$, where $\xrightarrow{L}$ denotes convergence in distribution.
Lemma 2 is taken from the basic propositions in Hjort and Pollard [25].
Lemma 3.
Suppose that conditions C1–C8 hold. Then, we have
$$\sqrt{nh}\big(\tilde{c}_1 - c_1, \ldots, \tilde{c}_K - c_K, (\tilde{a} - \alpha(u))^{T}, (\tilde{\beta} - \beta)^{T}\big)^{T} = -\pi(u)^{-1}f_U(u)^{-1}D_1^{-1}(u)W_n(u) + o_p(1),$$
where, with $\bar{c} = \sum_{k=1}^{K}f(c_k)$,
$$D_1(u) = E\left[\begin{pmatrix} C & cX^{T} & cZ^{T} \\ Xc^{T} & \bar{c}XX^{T} & \bar{c}XZ^{T} \\ Zc^{T} & \bar{c}ZX^{T} & \bar{c}ZZ^{T} \end{pmatrix} \Bigm| U = u\right],$$
$\pi(u) = E(\delta \mid U = u)$, $C = \mathrm{diag}\big(f(c_1), \ldots, f(c_K)\big)$, $c = \big(f(c_1), \ldots, f(c_K)\big)^{T}$,
$$W_n(u) = \frac{1}{\sqrt{nh}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,\eta_{i,k}^{*}(u)\,(e_k^{T}, X_i^{T}, Z_i^{T})^{T},$$
$$\eta_{i,k}^{*}(u) = I\big(\varepsilon_i \le c_k - r_i(u)\big) - \tau_k, \qquad r_i(u) = X_i^{T}\big[\alpha(U_i) - \alpha(u) - \alpha'(u)(U_i - u)\big].$$
Proof. 
Let $\theta = \sqrt{nh}\big(\tilde{c}_1 - c_1, \ldots, \tilde{c}_K - c_K, (\tilde{a} - \alpha(u))^{T}, (\tilde{\beta} - \beta)^{T}, h(\tilde{b} - \alpha'(u))^{T}\big)^{T}$. Then, the estimator of $\theta$ can be obtained by minimizing the loss function
$$\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,\rho_{\tau_k}\big\{Y_i - c_k - Z_i^{T}\beta - X_i^{T}[a + b(U_i - u)]\big\}, \qquad(12)$$
which is equivalent to minimizing
$$L_n(\theta) = \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\big[\rho_{\tau_k}\big(\varepsilon_i - c_k + r_i(u) - d_{i,k}\big) - \rho_{\tau_k}\big(\varepsilon_i - c_k + r_i(u)\big)\big],$$
where $K_i(u) = K_h(U_i - u)$, $d_{i,k} = \tilde{X}_{i,k}(u)^{T}\theta/\sqrt{nh}$, and $\tilde{X}_{i,k}(u) = \big[e_k^{T}, X_i^{T}, Z_i^{T}, X_i^{T}(U_i - u)/h\big]^{T}$; here, $e_k$ is the $K$-dimensional unit vector whose $k$th component is 1 and whose remaining components are 0. According to the identity proposed by Knight [26],
$$\rho_\tau(u - v) - \rho_\tau(u) = -v\big(\tau - I(u < 0)\big) + \int_{0}^{v}\big(I(u \le s) - I(u \le 0)\big)\,ds,$$
it is easy to obtain
$$L_n(\theta) = \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,d_{i,k}\,\eta_{i,k}^{*}(u) + \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\int_{0}^{d_{i,k}}\big[I\big(\varepsilon_i \le c_k - r_i(u) + s\big) - I\big(\varepsilon_i \le c_k - r_i(u)\big)\big]\,ds = W_n^{*}(u)^{T}\theta + \sum_{k=1}^{K}M_{n,k}(\theta),$$
where
$$W_n^{*}(u) = \frac{1}{\sqrt{nh}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,\eta_{i,k}^{*}(u)\,\tilde{X}_{i,k}(u),$$
$$M_{n,k}(\theta) = \sum_{i=1}^{n}\delta_i K_i(u)\int_{0}^{d_{i,k}}\big[I\big(\varepsilon_i \le c_k - r_i(u) + s\big) - I\big(\varepsilon_i \le c_k - r_i(u)\big)\big]\,ds.$$
According to Lemma 1, we can obtain
$$M_{n,k}(\theta) = E[M_{n,k}(\theta)] + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).$$
Moreover,
$$\sum_{k=1}^{K}E[M_{n,k}(\theta)] = \sum_{k=1}^{K}\sum_{i=1}^{n}\pi(u)K_i(u)\int_{0}^{d_{i,k}}\big[F\big(c_k - r_i(u) + s\big) - F\big(c_k - r_i(u)\big)\big]\,ds = \sum_{k=1}^{K}\sum_{i=1}^{n}\pi(u)K_i(u)\int_{0}^{d_{i,k}} s\,f\big(c_k - r_i(u)\big)[1 + o(1)]\,ds = \frac{1}{2}\theta^{T}S_n(u)\theta + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big),$$
where $S_n(u) = \frac{1}{nh}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(u)K_i(u)f\big(c_k - r_i(u)\big)\tilde{X}_{i,k}(u)\tilde{X}_{i,k}(u)^{T}$. Thus, it is easy to obtain
$$L_n(\theta) = W_n^{*}(u)^{T}\theta + \sum_{k=1}^{K}E[M_{n,k}(\theta)] + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big) = W_n^{*}(u)^{T}\theta + \frac{1}{2}\theta^{T}E[S_n(u)]\theta + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).$$
Since $E[S_n(u)] = \pi(u)f_U(u)S(u) + O(h^{2})$, where $S(u) = \mathrm{diag}\big(D_1(u),\ \mu_2\,\bar{c}\,E[XX^{T} \mid U = u]\big)$ with $\bar{c} = \sum_{k=1}^{K}f(c_k)$, we have
$$L_n(\theta) = W_n^{*}(u)^{T}\theta + \frac{\pi(u)f_U(u)}{2}\,\theta^{T}S(u)\theta + O_p\big(\log^{1/2}(1/h)/\sqrt{nh}\big).$$
From Lemma 2, the minimizer $\tilde{\theta}$ of $L_n(\theta)$ can be written as
$$\tilde{\theta} = -\pi(u)^{-1}f_U(u)^{-1}S^{-1}(u)W_n^{*}(u) + o_p(1).$$
Because $S(u)$ is block diagonal, Lemma 3 is proven. □
Lemma 4.
Assume that conditions C1–C8 hold. If $\beta$ is the true value of the parameter, then
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{\eta}_i(\beta) \xrightarrow{L} N(0, \Lambda),$$
where $\xrightarrow{L}$ denotes convergence in distribution and $\Lambda = \sum_{k=1}^{K}\sum_{k'=1}^{K}\big(\min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}\big)E\big(\delta^{2}ZZ^{T}\big)$.
Proof. 
It is easy to show that
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{\eta}_i(\beta) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{k=1}^{K}Z_i\big\{\tau_k - I\big(\delta_i\tilde{Y}_i + (1 - \delta_i)Z_i^{T}\hat{\beta} - Z_i^{T}\beta \le \tilde{c}_k\big)\big\} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{k=1}^{K}\delta_i Z_i\big[\tau_k - I(\varepsilon_i \le \tilde{c}_k)\big] + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{k=1}^{K}(1 - \delta_i)Z_i\big[\tau_k - I\big(Z_i^{T}\hat{\beta} - Z_i^{T}\beta \le \tilde{c}_k\big)\big] = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i1} + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2}.$$
Note that
$$E\{A_{i1}\} = E\Big\{\sum_{k=1}^{K}\delta_i Z_i\big[\tau_k - I(\varepsilon_i \le \tilde{c}_k)\big]\Big\};$$
then, we have the following calculation:
$$\mathrm{Var}\{A_{i1}\} = E\Big[\sum_{k=1}^{K}\delta_i Z_i\big(\tau_k - I(\varepsilon_i \le \tilde{c}_k)\big)\Big]\Big[\sum_{k'=1}^{K}\delta_i Z_i\big(\tau_{k'} - I(\varepsilon_i \le \tilde{c}_{k'})\big)\Big]^{T} = E\sum_{k=1}^{K}\sum_{k'=1}^{K}\delta_i^{2}\big(\tau_k - I(\varepsilon_i \le \tilde{c}_k)\big)\big(\tau_{k'} - I(\varepsilon_i \le \tilde{c}_{k'})\big)Z_iZ_i^{T}.$$
Then, we can obtain
$$E\big\{\big(\tau_k - I(\varepsilon_i \le \tilde{c}_k)\big)\big(\tau_{k'} - I(\varepsilon_i \le \tilde{c}_{k'})\big)\big\} = E\big\{\tau_k\tau_{k'} - \tau_k I(\varepsilon_i \le \tilde{c}_{k'}) - I(\varepsilon_i \le \tilde{c}_k)\tau_{k'} + I(\varepsilon_i \le \tilde{c}_k)I(\varepsilon_i \le \tilde{c}_{k'})\big\} = \min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}.$$
Thus, we have
$$\mathrm{Var}\{A_{i1}\} = \sum_{k=1}^{K}\sum_{k'=1}^{K}\big(\min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}\big)E\big(\delta_i^{2}Z_iZ_i^{T}\big).$$
Moreover, using the central limit theorem, we can obtain
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i1} \xrightarrow{L} N(0, \Lambda). \qquad(15)$$
Next, we prove that $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2} = o_p(1)$. It is easy to show that
$$A_{i2} = \sum_{k=1}^{K}(1 - \delta_i)Z_i\big[\tau_k - I\big(Z_i^{T}\hat{\beta} - Z_i^{T}\beta \le \tilde{c}_k\big)\big] = \sum_{k=1}^{K}(1 - \delta_i)f(\tilde{c}_k)Z_iZ_i^{T}\big(\hat{\beta} - \beta\big) + O_p\big(\|\hat{\beta} - \beta\|^{2}\big). \qquad(16)$$
Let $A_{i2,j}$ be the $j$th component of $A_{i2}$. Then, using Lemma 2 of Zhao and Xue [20], we can obtain
$$\Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2,j}\Big| = \Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{k=1}^{K}(1 - \delta_i)f(\tilde{c}_k)Z_iZ_i^{T}\big(\hat{\beta} - \beta\big)\Big| + o_p(1) \le \frac{1}{\sqrt{n}}\max_{1\le i\le n}\Big|\sum_{k=1}^{K}(1 - \delta_i)f(\tilde{c}_k)Z_i^{T}\big(\hat{\beta} - \beta\big)\Big|\max_{1\le s\le n}\Big|\sum_{i=1}^{s}Z_{ij}\Big| = o_p(1).$$
Thus, we have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2} = o_p(1). \qquad(17)$$
Then, by invoking (15)–(17), we complete the proof of Lemma 4. □
Lemma 5.
Assume that conditions C1–C8 hold. If $\beta$ is the true value of the parameter, then
$$\frac{1}{n}\sum_{i=1}^{n}\tilde{\eta}_i(\beta)\tilde{\eta}_i^{T}(\beta) \xrightarrow{P} \Lambda,$$
where $\xrightarrow{P}$ denotes convergence in probability and $\Lambda = \sum_{k=1}^{K}\sum_{k'=1}^{K}\big(\min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}\big)E\big(\delta^{2}ZZ^{T}\big)$.
Proof. 
We use the same notation as in the proof of Lemma 4. Then, we obtain
$$\frac{1}{n}\sum_{i=1}^{n}\tilde{\eta}_i(\beta)\tilde{\eta}_i^{T}(\beta) = \frac{1}{n}\sum_{i=1}^{n}A_{i1}A_{i1}^{T} + \frac{1}{n}\sum_{i=1}^{n}A_{i1}A_{i2}^{T} + \frac{1}{n}\sum_{i=1}^{n}A_{i2}A_{i1}^{T} + \frac{1}{n}\sum_{i=1}^{n}A_{i2}A_{i2}^{T} = B_1 + B_2 + B_3 + B_4.$$
Using the law of large numbers, we can derive that $B_1 \xrightarrow{P} \Lambda$. Let $B_{2,rs}$ be the $(r, s)$ component of $B_2$ and $A_{ij,r}$ be the $r$th component of $A_{ij}$, $j = 1, 2$. Using the Cauchy–Schwarz inequality, we can obtain
$$|B_{2,rs}| \le \Big(\frac{1}{n}\sum_{i=1}^{n}A_{i1,r}^{2}\Big)^{1/2}\Big(\frac{1}{n}\sum_{i=1}^{n}A_{i2,s}^{2}\Big)^{1/2}.$$
From the proof of Lemma 4, we have $\frac{1}{n}\sum_{i=1}^{n}A_{i1,r}^{2} = O_p(1)$ and $\frac{1}{n}\sum_{i=1}^{n}A_{i2,s}^{2} = o_p(1)$. Hence, $B_2 \xrightarrow{P} 0$; $B_3 \xrightarrow{P} 0$ and $B_4 \xrightarrow{P} 0$ are easy to prove using similar arguments. Hence, Lemma 5 is proven. □
Proof of Theorem 1.
Note that $\beta^{*} = \sqrt{n}(\hat{\beta} - \beta)$, $\eta_{i,k} = I(\varepsilon_i \le c_k) - \tau_k$, and $\tilde{r}_{i,k} = X_i^{T}\big(\tilde{a} - \alpha(U_i)\big)$. Then, $\hat{\beta}$ is obtained by minimizing the loss function
$$\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\,\rho_{\tau_k}\big(Y_i - c_k - X_i^{T}\tilde{a} - Z_i^{T}\beta\big), \qquad(19)$$
which is equivalent to minimizing
$$L_n(\beta^{*}) = \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\big\{\rho_{\tau_k}\big(\varepsilon_i - c_k - \tilde{r}_{i,k} - Z_i^{T}\beta^{*}/\sqrt{n}\big) - \rho_{\tau_k}\big(\varepsilon_i - c_k - \tilde{r}_{i,k}\big)\big\}. \qquad(20)$$
According to the identity proposed by Knight,
$$\rho_\tau(u - v) - \rho_\tau(u) = -v\big(\tau - I(u < 0)\big) + \int_{0}^{v}\big(I(u \le s) - I(u \le 0)\big)\,ds,$$
(20) can be expressed equivalently as
$$L_n(\beta^{*}) = \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\frac{Z_i^{T}\beta^{*}}{\sqrt{n}}\big(I(\varepsilon_i < c_k) - \tau_k\big) + \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\int_{\tilde{r}_{i,k}}^{\tilde{r}_{i,k} + Z_i^{T}\beta^{*}/\sqrt{n}}\big[I(\varepsilon_i \le c_k + s) - I(\varepsilon_i \le c_k)\big]\,ds = \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}Z_i\Big)^{T}\beta^{*} + M_n(\beta^{*}),$$
where $M_n(\beta^{*}) = \sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\int_{\tilde{r}_{i,k}}^{\tilde{r}_{i,k} + Z_i^{T}\beta^{*}/\sqrt{n}}\big[I(\varepsilon_i \le c_k + s) - I(\varepsilon_i \le c_k)\big]\,ds$.
Next, we calculate the expectation of $M_n(\beta^{*})$:
$$E\big[M_n(\beta^{*}) \mid U_i, X_i, Z_i\big] = \sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)\int_{\tilde{r}_{i,k}}^{\tilde{r}_{i,k} + Z_i^{T}\beta^{*}/\sqrt{n}}\big[F(c_k + s) - F(c_k)\big]\,ds = \sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)\int_{\tilde{r}_{i,k}}^{\tilde{r}_{i,k} + Z_i^{T}\beta^{*}/\sqrt{n}} s\,f(c_k)\big(1 + o(1)\big)\,ds = \frac{1}{2}\beta^{*T}\Big(\frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)Z_iZ_i^{T}\Big)\beta^{*} + \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)\tilde{r}_{i,k}Z_i\Big)^{T}\beta^{*} + o_p(1).$$
Note that $R_n(\beta^{*}) = M_n(\beta^{*}) - E\big[M_n(\beta^{*}) \mid U_i, X_i, Z_i\big]$; then $R_n(\beta^{*}) = o_p(1)$. Thus,
$$L_n(\beta^{*}) = \frac{1}{2}\beta^{*T}\Sigma_n\beta^{*} + \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}Z_i\Big)^{T}\beta^{*} + \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)\tilde{r}_{i,k}Z_i\Big)^{T}\beta^{*} + o_p(1),$$
where $\Sigma_n = \frac{1}{n}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)Z_iZ_i^{T} = E(\Sigma_n) + o_p(1) = c\,E\big[\pi(U, X, Z)ZZ^{T}\big] + o_p(1) = c\Sigma + o_p(1)$, with $c = \sum_{k=1}^{K}f(c_k)$. Further, we obtain
$$L_n(\beta^{*}) = \frac{c}{2}\beta^{*T}\Sigma\beta^{*} + \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}Z_i\Big)^{T}\beta^{*} + \Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)\tilde{r}_{i,k}Z_i\Big)^{T}\beta^{*} + o_p(1) \equiv L_1(\beta^{*}) + L_2^{T}\beta^{*} + L_3^{T}\beta^{*} + o_p(1). \qquad(22)$$
Then, by Lemma 3, $L_3$ in (22) can be represented as
$$L_3 = \frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\pi(U_i, X_i, Z_i)f(c_k)\tilde{r}_{i,k}Z_i = -\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}\,\omega(U_i, X_i, Z_i) + o_p(1),$$
where $\omega(U_i, X_i, Z_i) = \pi(U_i, X_i, Z_i)E\big[Z_i(0^{T}, X_i^{T}, 0^{T}) \mid U = U_i\big]D^{-1}(U_i)\big(I_K, \mathbf{1}^{T}X_i, \mathbf{1}^{T}Z_i\big)^{T}$. Below, we give the calculation
$$L_n(\beta^{*}) = \frac{c}{2}\beta^{*T}\Sigma\beta^{*} + \Big[\frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}\big(\delta_iZ_i - \omega(U_i, X_i, Z_i)\big)\Big]^{T}\beta^{*} + o_p(1) = \frac{c}{2}\beta^{*T}\Sigma\beta^{*} + C_n^{*T}\beta^{*} + o_p(1),$$
where $C_n^{*} = \frac{1}{\sqrt{n}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i\eta_{i,k}\big(\delta_iZ_i - \omega(U_i, X_i, Z_i)\big)$. According to Lemma 2, the minimizer $\hat{\beta}^{*}$ of $L_n(\beta^{*})$ can be expressed as
$$\hat{\beta}^{*} = -\frac{1}{c}\Sigma^{-1}C_n^{*} + o_p(1).$$
By using the Cramér–Wold device and the central limit theorem, it is easy to obtain
$$\mathrm{Var}(C_n^{*}) \to \sum_{k=1}^{K}\sum_{k'=1}^{K}\tau_{kk'}\,E\big(\pi(U, X, Z)\big)E\big\{[\delta Z - \omega(U, X, Z)][\delta Z - \omega(U, X, Z)]^{T}\big\} = \sum_{k=1}^{K}\sum_{k'=1}^{K}\tau_{kk'}\,\Delta.$$
Further, by the Lindeberg–Feller central limit theorem, we can obtain
$$\sqrt{n}\big(\hat{\beta} - \beta\big) \xrightarrow{L} N\Big(0,\ c^{-2}\Sigma^{-1}\sum_{k=1}^{K}\sum_{k'=1}^{K}\tau_{kk'}\,\Delta\,\Sigma^{-1}\Big). \ \square$$
Proof of Theorem 2.
From Lemma 3, we can obtain
$$\sqrt{nh}\big(\tilde{c}_1 - c_1, \ldots, \tilde{c}_K - c_K, (\hat{a} - \alpha(u))^{T}\big)^{T} = -\pi(u)^{-1}f_U^{-1}(u)D_2^{-1}(u)W_{n1}(u) + o_p(1),$$
where, with $\bar{c} = \sum_{k=1}^{K}f(c_k)$,
$$D_2(u) = E\left[\begin{pmatrix} C & cX^{T} \\ Xc^{T} & \bar{c}XX^{T} \end{pmatrix} \Bigm| U = u\right], \qquad W_{n1}(u) = \frac{1}{\sqrt{nh}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,\eta_{i,k}^{*}(u)\,(e_k^{T}, X_i^{T})^{T}.$$
It is easy to obtain
$$\big(D_2^{-1}(u)\big)_{22} = \big(D_{22}(u) - D_{21}(u)D_{11}^{-1}(u)D_{12}(u)\big)^{-1} = \Big(\sum_{k=1}^{K}f(c_k)\Big)^{-1}\Sigma_1^{-1},$$
$$\big(D_2^{-1}(u)\big)_{21} = -\big(D_2^{-1}(u)\big)_{22}D_{21}(u)D_{11}^{-1}(u) = -\Big(\sum_{k=1}^{K}f(c_k)\Big)^{-1}\Sigma_1^{-1}E[X \mid U = u]\mathbf{1}_K^{T}.$$
Then, we have
$$\sqrt{nh}\big(\hat{a} - \alpha(u)\big) = -\frac{\Sigma_1^{-1}}{\pi(u)f_U(u)\sum_{k=1}^{K}f(c_k)}\cdot\frac{1}{\sqrt{nh}}\sum_{k=1}^{K}\sum_{i=1}^{n}\delta_i K_i(u)\,\eta_{i,k}^{*}(u)\big(X_i - E(X \mid U = u)\big) + o_p(1).$$
Further, since $E\big[\delta_i\eta_{i,k}^{*}(u) \mid U_i, X_i\big] = \pi(u)\big[F\big(c_k - r_i(u)\big) - F(c_k)\big] = -\pi(u)f(c_k)r_i(u)\big(1 + o(1)\big)$ and $r_i(u) = \frac{1}{2}X_i^{T}\alpha''(u)(U_i - u)^{2}\big(1 + o(1)\big)$, we obtain
$$E\big[\hat{a} - \alpha(u)\big] = \frac{\Sigma_1^{-1}}{nh\,f_U(u)}\sum_{i=1}^{n}K_i(u)\big(X_i - E(X \mid U = u)\big)r_i(u)\big(1 + o_p(1)\big) = \frac{1}{2}\alpha''(u)\mu_2h^{2} + o_p(h^{2}).$$
Thus, we can obtain
$$\mathrm{Cov}\big(\hat{a} \mid U = u\big) = \Big(\sum_{k=1}^{K}\sum_{k'=1}^{K}\big(\min(\tau_k, \tau_{k'}) - \tau_k\tau_{k'}\big)\Big)E\big(\pi(u)\big)\,v_0\,f_U^{-1}(u)\,\Sigma_1^{-1}.$$
Consequently, we complete the proof of Theorem 2. □
Proof of Theorem 3.
Based on the definition of $\tilde{\eta}_i(\beta)$ and using the same argument as in Owen [18], we obtain
$$\max_{1\le i\le n}\|\tilde{\eta}_i(\beta)\| = o_p(n^{1/2}) \qquad(23)$$
and
$$\lambda = O_p(n^{-1/2}). \qquad(24)$$
Invoking (23) and (24) and using arguments similar to those of Xue and Zhu [27], we can obtain
$$\tilde{R}(\beta) = \Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{\eta}_i(\beta)\Big\}^{T}\hat{\Lambda}^{-1}\Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\tilde{\eta}_i(\beta)\Big\} + o_p(1),$$
where $\hat{\Lambda} = n^{-1}\sum_{i=1}^{n}\tilde{\eta}_i(\beta)\tilde{\eta}_i^{T}(\beta)$. Combining Lemma 4 and Lemma 5, we complete the proof of this theorem. □

5. Conclusions

In this paper, we studied statistical inference for the partially linear varying coefficient composite quantile regression model with missing data. By introducing a two-stage estimation procedure, we effectively addressed the challenges posed by randomly missing responses and established the asymptotic properties of the resulting estimators under mild regularity conditions. Notably, this paper contributes a new imputation-based empirical log-likelihood ratio statistic and shows that it asymptotically follows the standard Chi-square distribution. The feasibility and performance of the proposed method were verified through a comprehensive simulation study and a real data application.
However, much work remains for more complex missing data problems; for example, new imputation methods should be developed to address them. In future work, we will also use bootstrap methods to study the varying coefficient functions in simulation, in terms of coverage probability and average interval length [28].

Author Contributions

Methodology, S.L. and Y.Y.; software, Y.Y.; formal analysis, C.-y.Z.; investigation, S.L.; writing-original draft, Y.Y.; writing-review and editing, S.L. and C.-y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 12271420) and the Natural Science Foundation of Shaanxi Province of China (No. 2024JC-YBMS-007).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors are grateful to all the reviewers for their constructive comments and suggestions, which led to significant improvements to the original manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, W.; Lee, S.; Song, X. Local polynomial fitting in semi-varying coefficient models. J. Multivar. Anal. 2002, 82, 166–188. [Google Scholar] [CrossRef]
  2. Zhou, X.; You, J.H. Wavelet estimation in varying coefficient partially linear regression models. Stat. Probablity Lett. 2004, 68, 91–104. [Google Scholar] [CrossRef]
  3. Fan, J.Q.; Huang, T. Profile likelihood inferences on semiparametric varying coefficient partially linear models. Bernoulli 2005, 11, 1031–1057. [Google Scholar] [CrossRef]
  4. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica 1978, 46, 33–50. [Google Scholar] [CrossRef]
  5. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Stat. 2008, 36, 1108–1126. [Google Scholar] [CrossRef]
  6. Kai, B.; Li, R.; Zou, H. New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Stat. 2011, 39, 305–332. [Google Scholar] [CrossRef] [PubMed]
  7. Jiang, Z.; Huang, Z.; Zhang, J. Functional single-index composite quantile regression. Metrika 2022, 86, 595–603. [Google Scholar] [CrossRef]
  8. Song, Y.; Li, Z.; Fang, M. Robust variable selection based on penalized composite quantile regression for high-dimensional single-index models. Mathematics 2022, 10, 2000. [Google Scholar] [CrossRef]
  9. Rubin, D. Multiple Imputations for Nonresponse in Surveys; John Wiley & Sons Inc.: New York, NY, USA, 1987. [Google Scholar]
  10. Lipsitz, S.R.; Zhao, L.P.; Molenberghs, G. A semiparametric method of multiple imputation. J. R. Stat. Soc. Ser. B 1998, 60, 127–144. [Google Scholar] [CrossRef]
  11. Aerts, M.; Claeskens, G.; Hens, N.; Molenberghs, G. Local multiple imputation. Biometrika 2002, 89, 375–388. [Google Scholar] [CrossRef]
  12. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  13. Ibrahim, J. Incomplete data in generalized linear models. J. Am. Stat. Assoc. 1990, 85, 765–769. [Google Scholar] [CrossRef]
  14. Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952, 47, 663–685. [Google Scholar] [CrossRef]
  15. Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 1994, 89, 846–866. [Google Scholar] [CrossRef]
  16. Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Stat. Assoc. 1995, 90, 106–121. [Google Scholar] [CrossRef]
  17. Jin, J.; Ma, T.F.; Dai, J.J. Penalized weighted composite quantile regression for partially linear varying coefficient models with missing covariates. Comput. Stat. 2020, 1, 1–35. [Google Scholar] [CrossRef]
  18. Owen, A. Empirical likelihood ratio confidence intervals for a single function. Biometrika 1988, 75, 237–249. [Google Scholar] [CrossRef]
  19. Owen, A. Empirical likelihood ratio confidence regions. Ann. Stat. 1990, 18, 90–120. [Google Scholar] [CrossRef]
  20. Zhao, P.X.; Xue, L.G. Empirical likelihood inferences for semiparametric varying coefficient partially linear models with missing responses at random. Commun. Stat.-Theory Methods 2010, 27, 771–780. [Google Scholar]
  21. Wang, H.J.; Zhu, Z. Empirical likelihood for quantile regression models with longitudinal data. J. Stat. Plan. Inference 2011, 141, 1603–1615. [Google Scholar] [CrossRef]
  22. Yan, Y.X.; Luo, S.H.; Zhang, C.Y. Statistical inference for partially linear varying coefficient quantile models with missing responses. Symmetry 2022, 14, 2258. [Google Scholar] [CrossRef]
  23. Xue, L.G. Two-stage estimation and bias-corrected empirical likelihood in a partially linear single-index varying-coefficient model. Stat. Methodol. 2023, 85, 1299–1325. [Google Scholar] [CrossRef]
  24. Zhao, P.X.; Tang, X.R. Imputation based statistical inference for partially linear quantile regression models with missing responses. Metrika 2016, 79, 991–1009. [Google Scholar] [CrossRef]
  25. Hjort, N.; Pollard, D. Asymptotics for minimizers of convex processes. arXiv 2011, arXiv:1107.3806. [Google Scholar]
  26. Knight, K. Limiting distributions for l1 regression estimators under general conditions. Ann. Stat. 1998, 26, 755–770. [Google Scholar] [CrossRef]
  27. Xue, L.G.; Zhu, L.X. Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika 2007, 94, 921–937. [Google Scholar] [CrossRef]
  28. Xue, L.G.; Zhu, L.X. Empirical likelihood in a partially linear single-index model with censored response data. Comput. Stat. Data Anal. 2024, 193, 107912. [Google Scholar] [CrossRef]
Figure 1. Estimation curve (dashed line) and true curve (solid line) of α(u).
Figure 2. Estimation curve of α(·).
Table 1. The bias, SD, and MSE for β with three missing probabilities.

π     n          Bias      SD       MSE
π1    n = 100    −0.0127   0.0956   0.0093
π1    n = 300    −0.0004   0.0474   0.0022
π1    n = 1000   −0.0002   0.0247   0.0010
π2    n = 100    −0.0102   0.2559   0.0655
π2    n = 300    −0.0044   0.1187   0.0142
π2    n = 1000   −0.0027   0.0827   0.0101
π3    n = 100    0.0212    0.1893   0.0363
π3    n = 300    0.0030    0.0764   0.0058
π3    n = 1000   0.0019    0.0402   0.0039
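The reported columns of Table 1 can be cross-checked against the identity MSE = Bias² + Var, which should hold up to rounding of the published digits. A minimal sketch, with a few (Bias, SD, MSE) triples transcribed from the table:

```python
# (Bias, SD, MSE) transcribed from Table 1 (pi_1 rows n = 100/300, pi_3 rows n = 100/300)
rows = [(-0.0127, 0.0956, 0.0093),
        (-0.0004, 0.0474, 0.0022),
        (0.0212, 0.1893, 0.0363),
        (0.0030, 0.0764, 0.0058)]
for bias, sd, mse in rows:
    # MSE = Bias^2 + SD^2, up to rounding in the reported four decimals
    assert abs(bias**2 + sd**2 - mse) < 5e-4
print("Table 1 rows are internally consistent")
```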
Table 2. Confidence intervals and coverage probabilities for β under two different methods.

                     IEL                                 CEL
n          π     LOW      UP       LEN     CP        LOW      UP       LEN     CP
n = 100    π1    2.8469   3.1435   0.2966  0.946     2.7012   3.3273   0.6261  0.931
           π2    2.8495   3.1526   0.3031  0.940     2.6794   3.3398   0.6604  0.928
           π3    2.8430   3.1498   0.3068  0.932     2.6558   3.3424   0.6866  0.920
n = 300    π1    2.9137   3.0791   0.1654  0.952     2.8765   3.1177   0.2412  0.942
           π2    2.9166   3.0833   0.1666  0.948     2.8496   3.2121   0.3625  0.940
           π3    2.9119   3.0785   0.1665  0.940     2.8270   3.2215   0.3945  0.937
n = 1000   π1    2.9731   3.0515   0.0784  0.957     2.9626   3.0617   0.0991  0.952
           π2    2.9705   3.0689   0.0984  0.956     2.9186   3.1044   0.1858  0.950
           π3    2.9698   3.0949   0.1251  0.954     2.9067   3.1328   0.2261  0.949
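In Table 2, LEN equals UP − LOW, and the IEL intervals are consistently shorter than the CEL ones. Transcribing the n = 100 rows makes both properties checkable with a short sketch:

```python
# (IEL LOW, UP, LEN; CEL LOW, UP, LEN) transcribed from the n = 100 rows of Table 2
rows = [(2.8469, 3.1435, 0.2966, 2.7012, 3.3273, 0.6261),
        (2.8495, 3.1526, 0.3031, 2.6794, 3.3398, 0.6604),
        (2.8430, 3.1498, 0.3068, 2.6558, 3.3424, 0.6866)]
for ilo, iup, ilen, clo, cup, clen in rows:
    assert abs((iup - ilo) - ilen) < 1e-3  # LEN = UP - LOW
    assert abs((cup - clo) - clen) < 1e-3
    assert ilen < clen  # IEL produces shorter intervals than CEL
print("Table 2 n = 100 rows are internally consistent")
```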
Table 3. The results and confidence intervals for β under the CQR5 method and CQR9 method.

Method   Estimator   Confidence Intervals
CQR5     0.6164      (0.1436, 0.8375)
CQR9     0.4297      (0.2547, 0.7155)

Share and Cite

MDPI and ACS Style

Luo, S.; Yan, Y.; Zhang, C.-y. Two-Stage Estimation of Partially Linear Varying Coefficient Quantile Regression Model with Missing Data. Mathematics 2024, 12, 578. https://doi.org/10.3390/math12040578
