Statistical Inference for Partially Linear Varying Coefficient Quantile Models with Missing Responses

1 School of Science, Xi'an Polytechnic University, Xi'an 710048, China
2 School of Economics and Finance, Xi'an Jiaotong University, Xi'an 710061, China
* Authors to whom correspondence should be addressed.
Symmetry 2022, 14(11), 2258; https://doi.org/10.3390/sym14112258
Submission received: 25 September 2022 / Revised: 16 October 2022 / Accepted: 20 October 2022 / Published: 27 October 2022

Abstract
We investigate the construction of confidence intervals for the partially linear varying coefficient quantile model with randomly missing responses. Combining quantile regression with an imputation step, we propose an imputation-based empirical likelihood method to construct confidence intervals for both the parametric and the varying coefficient components, and we prove that the proposed empirical log-likelihood ratios are asymptotically Chi-square. Finally, simulation studies construct symmetric confidence intervals for the parametric components and pointwise confidence intervals for the varying coefficient components, demonstrating that the proposed method yields shorter confidence intervals with higher coverage probabilities.

1. Introduction

The partially linear varying coefficient model, originally proposed by [1], is a very important semi-parametric model: it combines the flexibility of a semi-parametric model with the easy interpretation of a parametric model. In recent years, the model has been studied by many scholars. Zhou and You [2] combined wavelets and least squares to estimate the parametric part and the varying coefficient part of the model. Zhang et al. [3] proposed an estimation method for the parametric and varying coefficient parts of the model, and derived the asymptotic conditional bias and variance to characterize the mean square error of the estimator. Xia et al. [4] investigated variable selection for semi-parametric varying coefficient partially linear models with randomly missing responses, presented a bias-corrected variable selection procedure, and established the oracle property of the regularized estimators.
On the other hand, the quantile regression (QR) model, proposed by [5], has been extensively applied in environmental monitoring, population census, biomedicine, etc. One significant reason is that, compared to the mean model, quantile regression estimates the effects of the covariates at different quantiles directly; the quantile regression estimator therefore plays an important role in characterizing the entire conditional distribution of a dependent variable given the regressors, and it is robust to outlying observations. Cai and Xiao [6] studied quantile regression for dynamic models whose coefficients are functions of covariates, and recommended estimating both the parametric and nonparametric function coefficients. Jin et al. [7] studied partially linear varying coefficient models with missing covariates and proposed a weighted B-spline composite quantile regression method, based on inverse probability weighting and B-spline approximations, to estimate the nonparametric function and the regression coefficients.
Despite significant advances in QR theory and its applications, QR analysis has received little attention when data samples contain missing values. In many practical problems, such as clinical trials and medical tracking trials, large amounts of missing data are easily generated by various human or other unknown factors. To handle missing data, several methods are available, such as the complete-case (CC) analysis method [8], the imputation method [9,10,11], the inverse probability weighted (IPW) method [12,13,14] and the likelihood-based method [15]. Among these, the imputation method is the most popular and effective for managing data that are missing at random (MAR). In this paper, we study the partially linear varying coefficient quantile regression model with missing response variables and propose an imputation-based empirical likelihood inference method, which makes full use of the information in the non-missing observations.
The empirical likelihood method has been widely researched in recent years. Owen [16] first proposed the empirical likelihood method for nonparametric statistical problems. Under certain regularity conditions, the estimators obtained by this method have good statistical properties, so the method has attracted the interest of many statisticians and is applied in various statistical fields. You and Zhou [17] investigated empirical likelihood inference for the parametric component of the partially linear varying coefficient model. Huang and Zhang [18] considered statistical inference for the nonparametric component of the partially linear varying coefficient model and showed that their method obtains more desirable coverage probabilities and average areas of confidence regions. Chen [19] investigated an empirical likelihood estimator based on imputation of missing values using quantile regression, and showed that it has competitive advantages over some of the most widely used parametric and nonparametric imputation estimators. Wang and Rao [20] constructed empirical likelihood confidence intervals for the mean of the response variable in linear and nonparametric regression models under random design and missing data; their results show that the adjusted empirical likelihood method performs competitively and that the use of auxiliary information improves inference.
The rest of this paper is organized as follows. In Section 2, the confidence interval construction method based on imputation empirical likelihood is presented for the parametric and varying coefficient components, and the asymptotic properties of the proposed empirical log-likelihood ratios are investigated. In Section 3, simulation studies are conducted to assess the performance of the proposed method. The proofs of the main results are given in Section 4.

2. Quantile Regression Estimates for Partially Linear Varying Coefficient Model

We consider the partially linear varying coefficient regression model:
$$Y = X^{T}\alpha(U) + Z^{T}\beta + \varepsilon,$$
where Y is the response, X is a p-dimensional covariate vector, $\alpha(\cdot) = (\alpha_1(\cdot), \alpha_2(\cdot), \ldots, \alpha_p(\cdot))^{T}$ is an unknown smooth coefficient function vector, $\beta = (\beta_1, \beta_2, \ldots, \beta_q)^{T}$ and Z are q-dimensional vectors, and U is a scalar covariate whose support, to avoid the curse of dimensionality and without loss of generality, is assumed to be the unit interval [0, 1]. In addition, $\varepsilon$ is the model error with $P(\varepsilon \le 0 \mid X, U, Z) = \tau$, and $\varepsilon$ and (X, U, Z) are independent of each other. Suppose $\{Y_i, X_i, U_i, Z_i\}_{i=1}^{n}$ are random samples; then we have
$$Y_i = X_i^{T}\alpha(U_i) + Z_i^{T}\beta + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$
Throughout this paper, we define an indicator variable $\delta_i$ such that $\delta_i = 1$ means that $Y_i$ is observed and $\delta_i = 0$ indicates that $Y_i$ is missing. We assume that the missing data mechanism satisfies
$$P(\delta_i = 1 \mid Y_i, X_i, U_i, Z_i) = P(\delta_i = 1 \mid X_i, U_i, Z_i).$$
Since $P(\varepsilon_i \le 0 \mid X_i, U_i, Z_i) = \tau$, we have
$$E\big[\tau - I(Y_i - X_i^{T}\alpha(U_i) - Z_i^{T}\beta \le 0) \mid X_i, U_i, Z_i\big] = 0,$$
where $I(\cdot)$ is an indicator function. Then, we approximate $\alpha(U_i)$ by means of basis functions. Generally, let $W(U_i) = (B_1(U_i), B_2(U_i), \ldots, B_p(U_i))^{T}$ be B-spline basis functions of order M, where $p = K + M + 1$ and K denotes the number of interior knots. Then, $\alpha(U_i)$ can be approximated as
$$\alpha(U_i) \approx W(U_i)^{T}\gamma,$$
where $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_p)^{T}$ is a vector of basis function coefficients. Further, the quantile regression estimators of $\beta$ and $\alpha(U_i)$ under the complete data can be obtained by minimizing
$$\sum_{i=1}^{n}\delta_i\,\rho_\tau\big(Y_i - X_i^{T}W(U_i)^{T}\gamma - Z_i^{T}\beta\big),$$
where $\rho_\tau(v) = v(\tau - I(v < 0))$ is the quantile loss function. Then, the objective function (6) can be written as
$$\sum_{i=1}^{n}\delta_i\,\rho_\tau\big(Y_i - Q_i^{T}\gamma - Z_i^{T}\beta\big),$$
where $Q_i = W(U_i)\cdot X_i$. Differentiating (7) with respect to the parameters and setting the result to zero yields
$$\sum_{i=1}^{n}\delta_i (Q_i^{T}, Z_i^{T})^{T}\big[\tau - I(Y_i - Q_i^{T}\gamma - Z_i^{T}\beta \le 0)\big] = 0.$$
Let $\hat{\beta}_c$ and $\hat{\gamma}_c$ be the solution of (8); then $\hat{\beta}_c$ is the estimator of $\beta$, and the estimator of $\alpha(u)$ is given by $\hat{\alpha}_c(u) = W(u)^{T}\hat{\gamma}_c$. Using $X_i^{T}\hat{\alpha}_c(U_i) + Z_i^{T}\hat{\beta}_c$ to estimate each missing $Y_i$, the imputed responses are given by
$$Y_i^{*} = \delta_i Y_i + (1 - \delta_i)\big[X_i^{T}\hat{\alpha}_c(U_i) + Z_i^{T}\hat{\beta}_c\big].$$
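To make the estimation and imputation steps concrete, the following Python sketch fits the complete-case quantile regression and then imputes the missing responses as in (9). It is an illustration only: X and Z are taken to be scalar, the knot placement and spline degree are our own arbitrary choices, and the fit relies on SciPy (version 1.8 or later, for BSpline.design_matrix) and statsmodels, neither of which is prescribed by the paper.

```python
import numpy as np
from scipy.interpolate import BSpline
from statsmodels.regression.quantile_regression import QuantReg

def bspline_basis(u, n_knots=5, degree=3):
    """Evaluate a B-spline basis W(u) on [0, 1]; the knot layout is illustrative."""
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.r_[np.zeros(degree + 1), interior, np.ones(degree + 1)]
    return BSpline.design_matrix(u, knots, degree).toarray()  # (n, n_knots + degree + 1)

def impute_responses(y, x, u, z, delta, tau=0.5, n_knots=5):
    """Complete-case quantile fit, then the imputation Y* (scalar X and Z)."""
    W = bspline_basis(u, n_knots)
    design = np.column_stack([W * x[:, None], z])    # columns: Q_i = W(U_i) X_i, then Z_i
    obs = delta == 1
    fit = QuantReg(y[obs], design[obs]).fit(q=tau)   # tau-th quantile regression
    gamma_c, beta_c = fit.params[:-1], fit.params[-1]
    y_hat = x * (W @ gamma_c) + z * beta_c           # X_i alpha_hat_c(U_i) + Z_i beta_hat_c
    return np.where(obs, y, y_hat), beta_c, gamma_c  # impute only where delta_i = 0
```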

2.1. The Imputation Empirical Likelihood for β

In order to construct a confidence interval for β based on the empirical likelihood method, we define the following imputation-based auxiliary random vector
$$\eta_i(\beta) = Z_i\big[\tau - I(Y_i^{*} - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big].$$
The empirical log-likelihood ratio for β is as follows:
$$R(\beta) = -2\max\Big\{\sum_{i=1}^{n}\log(np_i)\,\Big|\, p_i \ge 0,\ \sum_{i=1}^{n}p_i = 1,\ \sum_{i=1}^{n}p_i\eta_i(\beta) = 0\Big\}.$$
If zero is inside the convex hull of the points $(\eta_1(\beta), \eta_2(\beta), \ldots, \eta_n(\beta))$, a unique value of $R(\beta)$ exists. By the Lagrange multiplier method and some calculations, $R(\beta)$ can be written as follows:
$$R(\beta) = 2\sum_{i=1}^{n}\log\{1 + \lambda^{T}\eta_i(\beta)\},$$
where λ is a Lagrange multiplier, which satisfies
$$\sum_{i=1}^{n}\frac{\eta_i(\beta)}{1 + \lambda^{T}\eta_i(\beta)} = 0.$$
Under some regularity conditions, we can show that $R(\beta)$ is asymptotically Chi-square distributed with q degrees of freedom when $\beta$ is the true parameter value.
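Computationally, $R(\beta)$ is obtained by solving the Lagrange multiplier equation above for $\lambda$ and substituting back. The Newton iteration below is a minimal sketch of that step (our own code, with the usual convergence safeguards omitted), not an implementation taken from the paper.

```python
import numpy as np

def el_log_ratio(eta, n_iter=50, tol=1e-10):
    """Empirical log-likelihood ratio R = 2 * sum_i log(1 + lam' eta_i), eta of shape (n, q).

    Solves sum_i eta_i / (1 + lam' eta_i) = 0 for lam by Newton-Raphson; assumes zero
    lies inside the convex hull of the eta_i, so the denominators stay positive.
    """
    eta = np.asarray(eta, dtype=float)
    lam = np.zeros(eta.shape[1])
    for _ in range(n_iter):
        denom = 1.0 + eta @ lam
        grad = (eta / denom[:, None]).sum(axis=0)      # the Lagrange equation
        hess = -(eta / denom[:, None] ** 2).T @ eta    # its Jacobian in lam
        step = np.linalg.solve(hess, -grad)            # Newton step
        lam += step
        if np.linalg.norm(step) < tol:
            break
    return 2.0 * np.sum(np.log1p(eta @ lam))
```

Comparing the returned value with the $\chi_q^2$ critical value yields the test and confidence region described below.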

2.2. The Imputation Empirical Likelihood for α ( u )

In this section, we construct the confidence interval for $\alpha(u)$ based on the empirical likelihood method. Since $P(\varepsilon_i \le 0 \mid X_i, U_i, Z_i) = \tau$, it is easy to prove that
$$E\{\tau - I(Y_i - X_i^{T}\alpha(U_i) - Z_i^{T}\beta \le 0) \mid U_i = u\}\, f(u) = 0,$$
where $f(u)$ is the density function of $U_i$. We then construct the empirical log-likelihood ratio for $\alpha(u)$ by a similar method, using the following imputation-based auxiliary random vector:
$$\psi_i(\alpha(u)) = X_i\big[\tau - I(Y_i^{*} - X_i^{T}\alpha(u) - Z_i^{T}\hat{\beta}_c \le 0)\big]K_h(u - U_i),$$
where $\hat{\beta}_c$ is given by (8), $K_h(\cdot) = h^{-1}K(\cdot/h)$ and $K(\cdot)$ is a kernel function. However, by existing results, the empirical log-likelihood ratio based on $\psi_i(\alpha(u))$ is not asymptotically Chi-squared; one way around this is to undersmooth.
Instead, we propose a bias correction for $\psi_i(\alpha(u))$ as follows:
$$\hat{\psi}_i(\alpha(u)) = X_i\big[\tau - I(Y_i^{*} - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i),$$
where $\hat{r}(u) = \hat{\alpha}_c(U_i) - \hat{\alpha}_c(u)$, and $\hat{\alpha}_c(\cdot)$ and $\hat{\beta}_c$ are obtained from (8). In what follows, we define a bias-corrected empirical log-likelihood ratio function for $\alpha(u)$ as follows:
$$\hat{l}(\alpha(u)) = -2\max\Big\{\sum_{i=1}^{n}\log(np_i)\,\Big|\, p_i \ge 0,\ \sum_{i=1}^{n}p_i = 1,\ \sum_{i=1}^{n}p_i\hat{\psi}_i(\alpha(u)) = 0\Big\}.$$
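For a fixed point $u_0$, the bias-corrected auxiliary variables can be computed directly from the complete-case fit. The sketch below assumes scalar X, uses an Epanechnikov kernel (our choice; the paper only requires a kernel $K(\cdot)$), and expects a callable alpha_c that evaluates $\hat{\alpha}_c(\cdot)$ on arrays, e.g. built from the spline coefficients of the earlier sketch.

```python
def epanechnikov(t):
    """Epanechnikov kernel, an illustrative choice of K."""
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

def psi_hat(alpha_u0, u0, y_star, x, u, z, beta_c, alpha_c, tau, h):
    """Bias-corrected auxiliary variables psi_hat_i(alpha(u0)) for scalar X."""
    kh = epanechnikov((u0 - u) / h) / h              # K_h(u0 - U_i)
    r_hat = alpha_c(u) - alpha_c(np.array([u0]))     # r_hat(u0) at each U_i
    resid = y_star - z * beta_c - x * alpha_u0 - x * r_hat
    return x * (tau - (resid <= 0.0)) * kh
```

Feeding psi_hat(...).reshape(-1, 1) into el_log_ratio from the previous sketch gives the pointwise ratio $\hat{l}(\alpha(u_0))$.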

2.3. Asymptotic Properties of Estimators

To prove the asymptotic properties, we suppose that the following regularity conditions hold. For convenience and brevity, let c denote a generic positive constant that may take different values at different places.
C1. The function $\alpha(u)$ has continuous rth derivatives, where $r \ge 2$.
C2. Let f ( · | X , U , Z ) be the conditional density function of ε given X , U and Z. Then f ( · | X , U , Z ) has continuous and uniformly bounded first-order and second-order derivatives.
C3. If $c_1, \ldots, c_K$ are the interior knots of [0, 1], there exists a constant $C_0$ such that
$$\frac{\max\{c_i - c_{i-1}\}}{\min\{c_i - c_{i-1}\}} \le C_0, \qquad \max\{|c_i - c_{i-1}|\} = o(K^{-1}),$$
where $c_0 = 0$ and $c_{K+1} = 1$.
C4. Suppose $\pi(x, u, z) = E(\delta_i \mid X_i = x, U_i = u, Z_i = z)$; then $\pi(x, u, z)$ has bounded partial derivatives up to order $\kappa$, where $\kappa \ge 2$. We assume $\pi(x, u, z) > 0$ for all x, u and z.
C5. Assume $E\|X\|^{2} < \infty$ holds.
C6. We assume the matrix $E(\pi(X,U,Z)ZZ^{T})$ is non-singular and finite, where T denotes the transpose of a matrix.
Theorem 1.
Suppose that conditions C1–C6 hold, and the number of knots satisfies $K = O(n^{1/(2r+1)})$. Then
(1) $\|\hat{\alpha}_c(u) - \alpha(u)\| = O_p(n^{-2r/(2r+1)})$,
(2) $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\eta_i(\beta) \xrightarrow{L} N(0, \Lambda)$,
where $\xrightarrow{L}$ denotes convergence in distribution and $\Lambda = \tau(1-\tau)E\{\pi(X,U,Z)ZZ^{T}\}$.
Theorem 2.
Suppose that conditions C1–C6 hold. If β is the true value of the parameter, then
$$R(\beta) \xrightarrow{L} \chi_q^2,$$
where $\xrightarrow{L}$ denotes convergence in distribution and $\chi_q^2$ is the Chi-square distribution with q degrees of freedom.
Based on this result, we can construct a confidence region for $\beta$. For a given $\alpha$ with $0 < \alpha < 1$, let $c_\alpha$ satisfy $P(\chi_q^2 \le c_\alpha) = 1 - \alpha$. Then the approximate $1 - \alpha$ confidence region for $\beta$ can be defined as
$$C_\alpha(\beta) = \{\beta \mid R(\beta) \le c_\alpha\}.$$
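Numerically, this region can be found by inverting the ratio over a grid of candidate values, keeping those whose empirical log-likelihood ratio stays below the Chi-square cutoff. The helper below reuses el_log_ratio from the earlier sketch and handles the one-dimensional case (q = 1) of the simulations; the grid-search strategy and names are ours.

```python
from scipy.stats import chi2

def el_confidence_interval(eta_fn, grid, df=1, level=0.95):
    """Invert {b : R(b) <= c_alpha} over a 1-D grid; eta_fn(b) returns the (n, df) etas."""
    c_alpha = chi2.ppf(level, df)
    kept = [b for b in grid if el_log_ratio(eta_fn(b)) <= c_alpha]
    return (min(kept), max(kept)) if kept else None
```

The pointwise intervals for $\alpha(u)$ in Theorem 3 can be inverted in the same way, with $\hat{l}(\alpha(u))$ in place of $R(\beta)$ and p degrees of freedom.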
Theorem 3.
Suppose that conditions C1–C6 hold. If $\alpha(u)$ is the true value of the coefficient function at u, then
$$\hat{l}(\alpha(u)) \xrightarrow{L} \chi_p^2,$$
where $\xrightarrow{L}$ denotes convergence in distribution and $\chi_p^2$ is the Chi-square distribution with p degrees of freedom.
Based on Theorem 3, we can obtain an approximate $1 - \alpha$ pointwise confidence interval for $\alpha(u)$. For a given $\alpha$ with $0 < \alpha < 1$, let $c_\alpha$ satisfy $P(\chi_p^2 \le c_\alpha) = 1 - \alpha$. Then the approximate $1 - \alpha$ confidence region for $\alpha(u)$ can be defined as
$$C_\alpha(\alpha(u)) = \{\alpha(u) \mid \hat{l}(\alpha(u)) \le c_\alpha\}.$$
The proofs of the theorems rely on the lemmas in Section 4.

3. Simulation Studies

To demonstrate the finite sample performance of the proposed method, we consider the following partially linear varying coefficient model:
$$Y_i = Z_i\beta + X_i(2.2 - U_i^2) + \varepsilon_i,$$
where $\beta = 2$, $X_i \sim N(0,1)$, $U_i \sim U(0,1)$, $Z_i \sim N(0,1)$, and the response $Y_i$ is generated according to the model. The model error $\varepsilon_i$ is generated as $\varepsilon_i = e_i - b_\tau$, where $e_i$ follows the Chi-square distribution with one degree of freedom and $b_\tau$ is the $\tau$th quantile of $e_i$; it is then easy to see that $P(\varepsilon_i \le 0) = \tau$. In the following simulation, we take $\tau = 0.5$. We consider two cases of the selection probability $\Omega(x,u,z) = P(\delta = 1 \mid X_i = x, U_i = u, Z_i = z)$: (1) $\Omega_1(x,u,z) = 0.5 + 0.2u + 0.1x + 0.1z$; (2) $\Omega_2(x,u,z) = e^{0.2u + 0.5x + 0.1z}/(0.6 + e^{0.2u + 0.5x + 0.1z})$. The missing rates corresponding to the two scenarios are approximately 0.1 and 0.3, respectively. We take $h = c\,n^{-1/5}$, where c is chosen as the standard deviation of $U_i$; the number of interior knots K enters through (5) and the bandwidth h through (14). Further, K is estimated by minimizing the cross-validation score
$$CV(K) = \sum_{i=1}^{n}\delta_i\,\rho_\tau\big(Y_i - X_i^{T}\hat{\alpha}_{[i]}(U_i) - Z_i^{T}\hat{\beta}_{[i]}\big),$$
where $\rho_\tau(u) = u(\tau - I(u < 0))$ is the quantile loss function, and $\hat{\alpha}_{[i]}(U_i)$ and $\hat{\beta}_{[i]}$ are the estimators of $\alpha(u)$ and $\beta$ obtained from (8) after deleting the ith subject.
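The design above can be reproduced with a short script. The data-generating code follows the description (with the coefficient function as reconstructed here), while the grid of candidate K values and the computationally naive leave-one-out loop are our own choices; impute_responses and bspline_basis are reused from the sketch in Section 2.

```python
import numpy as np
from scipy.stats import chi2

def rho(v, tau):
    """Quantile loss rho_tau(v) = v(tau - I(v < 0))."""
    return v * (tau - (v < 0.0))

def generate_data(n, tau=0.5, scheme=1, seed=0):
    """One sample from the simulation design, with missingness scheme 1 or 2."""
    rng = np.random.default_rng(seed)
    x, z = rng.standard_normal(n), rng.standard_normal(n)
    u = rng.uniform(0.0, 1.0, n)
    eps = rng.chisquare(1, n) - chi2.ppf(tau, 1)       # e_i - b_tau, so P(eps <= 0) = tau
    y = 2.0 * z + x * (2.2 - u ** 2) + eps             # beta = 2
    lin = 0.2 * u + 0.5 * x + 0.1 * z
    p_obs = (0.5 + 0.2 * u + 0.1 * x + 0.1 * z) if scheme == 1 \
        else np.exp(lin) / (0.6 + np.exp(lin))
    delta = (rng.uniform(size=n) < p_obs).astype(int)  # delta_i = 1 means Y_i observed
    return y, x, u, z, delta

def cv_score(n_knots, y, x, u, z, delta, tau=0.5):
    """Leave-one-out CV(K) over the observed cases."""
    score = 0.0
    for i in np.where(delta == 1)[0]:
        keep = np.arange(len(y)) != i
        _, beta_c, gamma_c = impute_responses(y[keep], x[keep], u[keep],
                                              z[keep], delta[keep], tau, n_knots)
        alpha_i = (bspline_basis(np.array([u[i]]), n_knots) @ gamma_c).item()
        score += rho(y[i] - x[i] * alpha_i - z[i] * beta_c, tau)
    return score

# K would then be chosen as, e.g., min(range(2, 9), key=lambda K: cv_score(K, *data)).
```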
We compare the following three methods to evaluate the performance of the proposed statistical inference procedure: the imputation-based empirical likelihood method (IEL) proposed in this paper, the complete-data-based empirical likelihood method (CEL), and the full-data (i.e., no missing data) empirical likelihood method (FEL). In this simulation, the sample size is set to 100, 500 and 1000, and we perform 500 simulation runs for each case. Based on these runs, the averages of the confidence intervals for the parametric component $\beta$ are summarized in Table 1, and the corresponding interval lengths and coverage probabilities are summarized in Table 2.
From Table 1 and Table 2, we can draw the following conclusions:
(1) The IEL method outperforms the CEL method, since it yields shorter confidence intervals and higher coverage probabilities.
(2) For a given missing rate, the confidence intervals of both the IEL and CEL methods shorten as the sample size increases, but the IEL interval is always shorter than the CEL interval.
(3) As n increases, the performance of the IEL method approaches that of the FEL procedure. These results suggest that, compared to the CEL method, the proposed IEL procedure can weaken the effect of the missing rate.
For the varying coefficient part, we compare the IEL method with the CEL method, reporting the cases $n = 500$ and $n = 1000$; the results for $n = 100$ differ little from those for $n = 500$ and are therefore omitted. Figure 1 and Figure 2 summarize the finite sample performance of the IEL and CEL methods for the varying coefficient part under different missing rates and sample sizes. In each figure, panel (a) shows the averages of the pointwise confidence intervals over 500 simulation runs under the first selection probability $\Omega_1$, and panel (b) shows the corresponding averages under the second selection probability $\Omega_2$.
For the varying coefficient part, the following can be seen from the figures:
(1) For a given missing rate, the IEL method outperforms the CEL method, giving shorter pointwise interval lengths.
(2) As the missing rate increases, the pointwise confidence intervals of both methods widen, but the IEL intervals remain shorter than the CEL intervals.
(3) As the sample size increases, the pointwise confidence intervals of both methods shorten, with the IEL intervals always shorter than the CEL intervals.

4. Proofs of Theorems

Lemma 1.
Suppose that conditions C1–C6 hold, and the number of knots satisfies $K = O(n^{1/(2r+1)})$. Then
$$\|\hat{\beta}_c - \beta\| = O_p(n^{-r/(2r+1)}).$$
Proof of Lemma 1.
Let $\kappa = n^{-r/(2r+1)}$, $\tilde{\beta} = \beta + \kappa M_1$, $\tilde{\gamma} = \gamma + \kappa M_2$, and $M = (M_1^{T}, M_2^{T})^{T}$. For any given $\varepsilon > 0$, there exists a constant C such that
$$P\Big(\inf_{\|M\| = C}(\beta^{T} - \tilde{\beta}^{T}, \gamma^{T} - \tilde{\gamma}^{T})\sum_{i=1}^{n}\tilde{\eta}_i(\tilde{\beta}, \tilde{\gamma}) > 0\Big) \ge 1 - \varepsilon,$$
where $\tilde{\eta}_i(\tilde{\beta}, \tilde{\gamma}) = \delta_i(Q_i^{T}, Z_i^{T})^{T}\big[\tau - I(Y_i - Q_i^{T}\tilde{\gamma} - Z_i^{T}\tilde{\beta} \le 0)\big]$. Clearly, (17) implies, with probability at least $1 - \varepsilon$, that there exists a local solution $\hat{\beta}_c$ such that $\|\hat{\beta}_c - \beta\| = O_p(\kappa)$.
Based on the definition of $\tilde{\eta}_i(\tilde{\beta}, \tilde{\gamma})$, we compute
$$\begin{aligned}
\tilde{\eta}_i(\tilde{\beta}, \tilde{\gamma}) &= \delta_i(Q_i^{T}, Z_i^{T})^{T}\big[\tau - I(Y_i - Q_i^{T}\tilde{\gamma} - Z_i^{T}\tilde{\beta} \le 0)\big] \\
&= \delta_i(Q_i^{T}, Z_i^{T})^{T}\big[\tau - I(X_i^{T}\alpha(U_i) + Z_i^{T}\beta + \varepsilon_i - Q_i^{T}\tilde{\gamma} - Z_i^{T}\tilde{\beta} \le 0)\big] \\
&= \delta_i(Q_i^{T}, Z_i^{T})^{T}\big[\tau - I(\varepsilon_i + Z_i^{T}(\beta - \tilde{\beta}) + Q_i^{T}(\gamma - \tilde{\gamma}) + R(U_i) \le 0)\big] \\
&= \delta_i f(0 \mid X_i, U_i, Z_i)(Q_i^{T}, Z_i^{T})^{T}\big[Z_i^{T}(\beta - \tilde{\beta}) + Q_i^{T}(\gamma - \tilde{\gamma}) + R(U_i)\big] + O_p(\|\beta - \tilde{\beta}\|^2) + O_p(\|\gamma - \tilde{\gamma}\|^2),
\end{aligned}$$
where $R(U_i) = X_i^{T}\alpha(U_i) - Q_i^{T}\gamma$. Let $\Delta(\tilde{\beta}, \tilde{\gamma}) = K^{-1}(\beta^{T} - \tilde{\beta}^{T}, \gamma^{T} - \tilde{\gamma}^{T})\sum_{i=1}^{n}\tilde{\eta}_i(\tilde{\beta}, \tilde{\gamma})$. Then we have
$$\begin{aligned}
\Delta(\tilde{\beta}, \tilde{\gamma}) &= \frac{\kappa}{K}(M_1^{T}, M_2^{T})\sum_{i=1}^{n}\delta_i f(0 \mid X_i, U_i, Z_i)(Q_i^{T}, Z_i^{T})^{T}\big[Z_i^{T}(\kappa M_1) + Q_i^{T}(\kappa M_2)\big] \\
&\quad - \frac{\kappa}{K}(M_1^{T}, M_2^{T})\sum_{i=1}^{n}\delta_i f(0 \mid X_i, U_i, Z_i)(Q_i^{T}, Z_i^{T})^{T}R(U_i) + O_p(nK^{-1}\kappa^2) \\
&= \frac{\kappa^2}{K}\sum_{i=1}^{n}\delta_i f(0 \mid X_i, U_i, Z_i)(Z_i^{T}M_1 + Q_i^{T}M_2)^2 - \frac{\kappa}{K}\sum_{i=1}^{n}\delta_i f(0 \mid X_i, U_i, Z_i)(Z_i^{T}M_1 + Q_i^{T}M_2)R(U_i) + O_p(1) \\
&= I_1 + I_2.
\end{aligned}$$
From conditions C1 and C3 and Corollary 6.21 in [21], we get $R(U_i) = O(K^{-r})$. Combining conditions C2 and C5, we obtain $I_1 = O_p(\kappa^2 nK^{-1})\|M\|^2 = O_p(\|M\|^2)$ and $I_2 = O_p(\kappa nK^{-1}K^{-r})\|M\| = O_p(\|M\|)$. Therefore, by choosing a sufficiently large C, $I_1$ dominates $I_2$ uniformly in $\|M\| = C$. This means that for any given $\varepsilon > 0$, if we choose C large enough, we obtain
$$P\Big(\inf_{\|M\| = C}\Delta(\tilde{\beta}, \tilde{\gamma}) > 0\Big) \ge 1 - \varepsilon.$$
According to (17), there exists a local minimizer $\hat{\beta}_c$ such that $\|\hat{\beta}_c - \beta\| = O_p(\kappa)$ with probability at least $1 - \varepsilon$. The proof of Lemma 1 is completed. □
Proof of Theorem 1.
Note that
$$\|\hat{\alpha}_c(u) - \alpha(u)\|^2 = \int_0^1\{\hat{\alpha}_c(u) - \alpha(u)\}^2\,du = \int_0^1\{W^{T}(u)\hat{\gamma}_c - W^{T}(u)\gamma\}^2\,du \le 2(\hat{\gamma}_c - \gamma)^{T}H(\hat{\gamma}_c - \gamma),$$
where $H = \int_0^1 W(u)W^{T}(u)\,du$. It is easy to obtain that $H = O_p(1)$. Arguing as in the proof of Lemma 1, we obtain $\hat{\gamma}_c - \gamma = O_p(n^{-r/(2r+1)})$. Thus, we have
$$(\hat{\gamma}_c - \gamma)^{T}H(\hat{\gamma}_c - \gamma) = O_p(n^{-2r/(2r+1)}).$$
Invoking (19) and (20), we complete the proof of Theorem 1(1).
Next, we prove Theorem 1(2). Let $\hat{Y}_i = X_i^{T}\hat{\alpha}_c(U_i) + Z_i^{T}\hat{\beta}_c$; then we obtain
$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\eta_i(\beta) &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_i\big[\tau - I(Y_i^{*} - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big] \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\delta_i Z_i\big[\tau - I(Y_i - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big] + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1 - \delta_i)Z_i\big[\tau - I(\hat{Y}_i - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big] \\
&= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i1} + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2}.
\end{aligned}$$
Note that
$$\begin{aligned}
A_{i1} &= \delta_i Z_i\big[\tau - I(Y_i - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big] \\
&= \delta_i Z_i\big[\tau - I(X_i^{T}\alpha(U_i) - X_i^{T}\hat{\alpha}_c(U_i) + \varepsilon_i \le 0)\big] \\
&= \delta_i Z_i\big[\tau - I(\varepsilon_i \le 0)\big] + \delta_i Z_i\big[I(\varepsilon_i \le 0) - I(\varepsilon_i + X_i^{T}\alpha(U_i) - X_i^{T}\hat{\alpha}_c(U_i) \le 0)\big] \\
&= \delta_i Z_i\big[\tau - I(\varepsilon_i \le 0)\big] + \delta_i Z_i f(0 \mid X_i, U_i, Z_i)\big[X_i^{T}\alpha(U_i) - X_i^{T}\hat{\alpha}_c(U_i)\big] + O_p(\|\alpha(U_i) - \hat{\alpha}_c(U_i)\|^2).
\end{aligned}$$
Thus, we have
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i1} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\delta_i Z_i\big[\tau - I(\varepsilon_i \le 0)\big] + \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\delta_i Z_i f(0 \mid X_i, U_i, Z_i)\big[X_i^{T}\alpha(U_i) - X_i^{T}\hat{\alpha}_c(U_i)\big] + o_p(1) = B_1 + B_2.$$
Moreover, we can obtain $E(B_1) = 0$ and $\mathrm{Var}(B_1) = \tau(1-\tau)E\{\pi(X,U,Z)ZZ^{T}\} = \Lambda$. Then, using the central limit theorem, we can obtain
$$B_1 \xrightarrow{L} N(0, \Lambda).$$
Next, we prove $B_2 = o_p(1)$. Let $B_{2,j}$ denote the jth component of $B_2$ and $Z_{ij}$ the jth component of $Z_i$; note that $Z_i$ is the centered covariate. By Lemma A.2 in [22], we obtain
$$\max_{1 \le s \le n}\Big|\sum_{i=1}^{s}Z_{ij}\Big| = O_p\big(\sqrt{n\log n}\,\big).$$
In addition, by Theorem 1(1), we obtain
$$\|\hat{\alpha}_c(u) - \alpha(u)\| = O_p(n^{-2r/(2r+1)}).$$
Invoking (24) and (25), and using Abel’s inequality, it is easy to obtain that
$$\begin{aligned}
|B_{2,j}| &= \Big|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\delta_i Z_{ij} f(0 \mid X_i, U_i, Z_i)\big[\alpha(U_i) - \hat{\alpha}_c(U_i)\big]\Big| \\
&\le \frac{1}{\sqrt{n}}\max_{1 \le i \le n}\big\|\delta_i f(0 \mid X_i, U_i, Z_i)\big[\alpha(U_i) - \hat{\alpha}_c(U_i)\big]\big\|\,\max_{1 \le s \le n}\Big|\sum_{i=1}^{s}Z_{ij}\Big| \\
&= O_p\big(n^{-1/2}\cdot n^{-2r/(2r+1)}\cdot \sqrt{n\log n}\,\big) = o_p(1).
\end{aligned}$$
Thus, we can get $B_2 = o_p(1)$. Furthermore, combining (22) and (23), we obtain
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i1} \xrightarrow{L} N(0, \Lambda).$$
In the following, we prove $n^{-1/2}\sum_{i=1}^{n}A_{i2} = o_p(1)$. A direct calculation gives
$$\begin{aligned}
A_{i2} &= (1 - \delta_i)Z_i\big[\tau - I(\hat{Y}_i - X_i^{T}\hat{\alpha}_c(U_i) - Z_i^{T}\beta \le 0)\big] \\
&= (1 - \delta_i)Z_i\big[\tau - I(Z_i^{T}\hat{\beta}_c - Z_i^{T}\beta + \varepsilon_i \le 0)\big] \\
&= (1 - \delta_i)f(0 \mid X_i, U_i, Z_i)Z_i Z_i^{T}(\hat{\beta}_c - \beta) + O_p(\|\hat{\beta}_c - \beta\|^2).
\end{aligned}$$
Let $A_{i2,j}$ be the jth component of $A_{i2}$. By analogy with (25), we can obtain
$$\begin{aligned}
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2,j} &= \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(1 - \delta_i)f(0 \mid X_i, U_i, Z_i)Z_{ij}Z_i^{T}(\hat{\beta}_c - \beta) + o_p(1) \\
&\le \frac{1}{\sqrt{n}}\max_{1 \le i \le n}\big\|(1 - \delta_i)f(0 \mid X_i, U_i, Z_i)Z_i^{T}(\hat{\beta}_c - \beta)\big\|\,\max_{1 \le s \le n}\Big|\sum_{i=1}^{s}Z_{ij}\Big| \\
&= O_p\big(n^{-1/2}\cdot n^{-r/(2r+1)}\cdot \sqrt{n\log n}\,\big) = o_p(1).
\end{aligned}$$
Thus, we can obtain
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}A_{i2} = o_p(1).$$
Then, by invoking (21), (27) and (28), the proof of Theorem 1(2) is completed. □
Proof of Theorem 2.
Based on the definition of $\eta_i(\beta)$, and using arguments similar to those in [23], we obtain
$$\max_{1 \le i \le n}\|\eta_i(\beta)\| = o_p(n^{1/2}),$$
and
$$\|\lambda\| = O_p(n^{-1/2}).$$
Invoking (29) and (30), and using the same proof as the proof of Theorem 1 in [16], we obtain
$$R(\beta) = \Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\eta_i(\beta)\Big\}^{T}\hat{\Lambda}^{-1}\Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\eta_i(\beta)\Big\} + o_p(1),$$
where $\hat{\Lambda} = n^{-1}\sum_{i=1}^{n}\eta_i(\beta)\eta_i^{T}(\beta)$. Combining the proof of Theorem 1 and the law of large numbers, we have
$$\hat{\Lambda} = \frac{1}{n}\sum_{i=1}^{n}\eta_i(\beta)\eta_i^{T}(\beta) \xrightarrow{P} \Lambda.$$
Then invoking (31), (32) and Theorem 1, we complete the proof of Theorem 2. □
Lemma 2.
Suppose that conditions C1–C6 hold, and the number of knots satisfies $K = O(n^{1/(2r+1)})$. Then
$$\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat{\psi}_i(\alpha(u)) \xrightarrow{L} N(0, \sigma^2(u)),$$
where $\sigma^2(u) = \tau(1-\tau)f(u)E(\pi(X,U,Z))\int K^2(t)\,dt$.
Proof of Lemma 2.
Let $\hat{Y}_i = X_i^{T}\hat{\alpha}_c(U_i) + Z_i^{T}\hat{\beta}_c$ and $\hat{r}(u) = \hat{\alpha}_c(U_i) - \hat{\alpha}_c(u)$. A direct calculation gives
$$\begin{aligned}
\hat{\psi}_i(\alpha(u)) &= X_i\big[\tau - I(Y_i^{*} - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&= \delta_i X_i\big[\tau - I(Y_i - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&\quad + (1 - \delta_i)X_i\big[\tau - I(\hat{Y}_i - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&= J_{i1} + J_{i2}.
\end{aligned}$$
Thus, we can have
$$\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat{\psi}_i(\alpha(u)) = \frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i1} + \frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i2}.$$
Moreover, we have the calculation as follows:
$$\begin{aligned}
J_{i1} &= \delta_i X_i\big[\tau - I(Z_i^{T}\beta + X_i^{T}\alpha(U_i) + \varepsilon_i - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&= \delta_i X_i\big[I(\varepsilon_i \le 0) - I(\varepsilon_i + Z_i^{T}(\beta - \hat{\beta}_c) + X_i^{T}(\alpha(U_i) - \alpha(u)) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) + \delta_i X_i\big[\tau - I(\varepsilon_i \le 0)\big]K_h(u - U_i) \\
&= \delta_i X_i\big[\tau - I(\varepsilon_i \le 0)\big]K_h(u - U_i) + \delta_i X_i f(0 \mid X_i, U_i, Z_i)Z_i^{T}(\beta - \hat{\beta}_c)K_h(u - U_i) \\
&\quad + \delta_i X_i f(0 \mid X_i, U_i, Z_i)\big[X_i^{T}(\alpha(U_i) - \alpha(u)) - X_i^{T}\hat{r}(u)\big]K_h(u - U_i).
\end{aligned}$$
Thus,
$$\begin{aligned}
\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i1} &= \frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\delta_i X_i\big[\tau - I(\varepsilon_i \le 0)\big]K_h(u - U_i) + \frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\delta_i X_i f(0 \mid X_i, U_i, Z_i)Z_i^{T}(\beta - \hat{\beta}_c)K_h(u - U_i) \\
&\quad + \frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\delta_i X_i f(0 \mid X_i, U_i, Z_i)\big[X_i^{T}(\alpha(U_i) - \alpha(u)) - X_i^{T}\hat{r}(u)\big]K_h(u - U_i) \\
&= D_1 + D_2 + D_3.
\end{aligned}$$
We can obtain $E(D_1) = 0$ and $\mathrm{Var}(D_1) = \sigma^2(u)$. Using the central limit theorem, we can obtain
$$D_1 \xrightarrow{L} N(0, \sigma^2(u)).$$
Next, we prove $D_2 = o_p(1)$. According to Theorem 1, and using a conclusion similar to that used in the proof of Theorem 2 in [24], we obtain $\sqrt{n}(\hat{\beta}_c - \beta) = O_p(1)$. In addition, using conditions C5 and C6, we can obtain
$$\frac{1}{nh}\sum_{i=1}^{n}X_i Z_i^{T}K_h(u - U_i) = O_p(1).$$
Therefore,
$$D_2 = \frac{1}{\sqrt{n}}\cdot\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\delta_i X_i f(0 \mid X_i, U_i, Z_i)K_h(u - U_i)Z_i^{T}\big\{\sqrt{n}(\beta - \hat{\beta}_c)\big\} = o_p(1).$$
Next, we prove $D_3 = o_p(1)$. Applying Taylor expansions to $\alpha(U_i) - \alpha(u)$ and $\hat{\alpha}_c(U_i) - \hat{\alpha}_c(u)$ at u, we can obtain
$$X_iX_i^{T}\big[\alpha(U_i) - \alpha(u) - \hat{r}(u)\big] = X_iX_i^{T}\big[\alpha(U_i) - \alpha(u) - (\hat{\alpha}_c(U_i) - \hat{\alpha}_c(u))\big] = X_iX_i^{T}\big[(\alpha'(u) - \hat{\alpha}_c'(u))(U_i - u) + o_p((U_i - u)^2)\big].$$
By conditions C1–C6, we have
$$\frac{1}{nh}\sum_{i=1}^{n}(U_i - u)^{l}K_h(u - U_i) = O_p(1), \quad l = 1, 2,$$
and
$$\alpha'(u) - \hat{\alpha}_c'(u) \to 0.$$
Combining these with (37), we get $D_3 = o_p(1)$. Hence, from (34)–(36), we obtain
$$\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i1} \xrightarrow{L} N(0, \sigma^2(u)).$$
Next, we prove $\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i2} = o_p(1)$. A direct calculation gives
$$\begin{aligned}
J_{i2} &= (1 - \delta_i)X_i\big[\tau - I(\hat{Y}_i - Z_i^{T}\hat{\beta}_c - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&= (1 - \delta_i)X_i\big[\tau - I(X_i^{T}\hat{\alpha}_c(U_i) - X_i^{T}\alpha(u) - X_i^{T}\hat{r}(u) \le 0)\big]K_h(u - U_i) \\
&= (1 - \delta_i)X_i f(0 \mid X_i, U_i, Z_i)X_i^{T}\big(\hat{\alpha}_c(u) - \alpha(u)\big)K_h(u - U_i).
\end{aligned}$$
Then, using a method similar to (26) based on Abel's inequality, we can obtain
$$\begin{aligned}
\Big\|\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}J_{i2}\Big\| &= \frac{1}{\sqrt{nh}}\Big\|\sum_{i=1}^{n}(1 - \delta_i)f(0 \mid X_i, U_i, Z_i)X_iX_i^{T}\big(\hat{\alpha}_c(u) - \alpha(u)\big)K_h(u - U_i)\Big\| \\
&\le \sqrt{nh}\,\big\|\hat{\alpha}_c(u) - \alpha(u)\big\|\,\Big\|\frac{1}{nh}\sum_{i=1}^{n}(1 - \delta_i)f(0 \mid X_i, U_i, Z_i)X_iX_i^{T}K_h(u - U_i)\Big\| \\
&= O_p\big((nh)^{1/2}\cdot n^{-2r/(2r+1)}\big) = o_p(1).
\end{aligned}$$
Invoking (33), (38) and (39), the proof of Lemma 2 is completed. □
Proof of Theorem 3.
Similar to the proof of Theorem 2, for given u, we have
$$\hat{l}(\alpha(u)) = \Big\{\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat{\psi}_i(\alpha(u))\Big\}^{T}\hat{v}(\alpha(u))^{-1}\Big\{\frac{1}{\sqrt{nh}}\sum_{i=1}^{n}\hat{\psi}_i(\alpha(u))\Big\} + o_p(1),$$
where $\hat{v}(\alpha(u)) = (nh)^{-1}\sum_{i=1}^{n}\hat{\psi}_i^{2}(\alpha(u))$. According to Lemma 2, we can get $\hat{v}(\alpha(u)) \xrightarrow{P} \sigma^2(u)$. Further, combined with Lemma 2, we obtain $\hat{l}(\alpha(u)) \xrightarrow{L} \chi_p^2$. □

Author Contributions

Methodology, S.L.; writing—original draft, Y.Y.; writing—review and editing, C.-y.Z. All authors derived the main conclusions and approved the current version of this manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 11601409) and the Natural Science Foundation of Shaanxi Province of China (Nos. 2020JM571, 2021JM-002).

Data Availability Statement

The data presented in this paper are obtained through computer simulation.

Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and suggestions, which greatly improved this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fan, J.Q.; Huang, T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 2005, 11, 1031–1057.
2. Zhou, X.; You, J.H. Wavelet estimation in varying coefficient partially linear regression models. Stat. Probab. Lett. 2004, 68, 91–104.
3. Zhang, W.; Lee, S.; Song, X. Local polynomial fitting in semivarying coefficient models. J. Multivar. Anal. 2002, 82, 166–188.
4. Xia, Y.F.; Qu, Y.R.; Sun, N.L. Variable selection for semiparametric varying coefficient partially linear model based on modal regression with missing data. Commun. Stat.-Theory Methods 2019, 48, 5121–5137.
5. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica 1978, 46, 33–50.
6. Cai, Z.W.; Xiao, Z.J. Semiparametric quantile regression estimation in dynamic models with partially varying coefficients. J. Econom. 2012, 167, 413–425.
7. Jin, J.; Ma, T.F.; Dai, J.J.; Liu, S.Z. Penalized weighted composite quantile regression for partially linear varying coefficient models with missing covariates. Comput. Stat. 2021, 36, 541–575.
8. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: Hoboken, NJ, USA, 2014.
9. Rubin, D. Multiple Imputation for Nonresponse in Surveys; John Wiley & Sons: New York, NY, USA, 1987.
10. Lipsitz, S.R.; Zhao, L.P.; Molenberghs, G. A semiparametric method of multiple imputation. J. R. Stat. Soc. Ser. B 1998, 60, 127–144.
11. Aerts, M.; Claeskens, G.; Hens, N.; Molenberghs, G. Local multiple imputation. Biometrika 2002, 89, 375–388.
12. Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952, 47, 663–685.
13. Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Estimation of regression coefficients when some regressors are not always observed. J. Am. Stat. Assoc. 1994, 89, 846–866.
14. Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Stat. Assoc. 1995, 90, 106–121.
15. Ibrahim, J. Incomplete data in generalized linear models. J. Am. Stat. Assoc. 1990, 85, 765–769.
16. Owen, A.B. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 1988, 75, 237–249.
17. You, J.H.; Zhou, Y. Empirical likelihood for semi-parametric varying coefficient partially linear models. Stat. Probab. Lett. 2006, 76, 412–422.
18. Huang, Z.S.; Zhang, R.Q. Empirical likelihood for nonparametric parts in semiparametric varying coefficient partially linear models. Stat. Probab. Lett. 2009, 79, 1798–1808.
19. Chen, S.N. Imputation of Missing Values Using Quantile Regression. Ph.D. Thesis, Iowa State University, Ames, IA, USA, 2014.
20. Wang, Q.; Rao, J.N.K. Empirical likelihood-based inference under imputation for missing response data. Ann. Stat. 2002, 30, 896–924.
21. Schumaker, L.L. Spline Functions; Wiley: Hoboken, NJ, USA, 2007.
22. Zhao, P.X.; Xue, L.G. Empirical likelihood inferences for semiparametric varying coefficient partially linear models with longitudinal data. Commun. Stat.-Theory Methods 2010, 39, 1898–1914.
23. Xue, L.G.; Zhu, L.X. Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika 2007, 94, 921–937.
24. Lv, X.F.; Li, R. Smoothed empirical likelihood analysis of partially linear quantile regression models with missing response variables. AStA Adv. Stat. Anal. 2013, 97, 317–347.
Figure 1. Average of 95% pointwise confidence intervals under the two selection probabilities for the varying coefficient part with n = 500.
Figure 2. Average of 95% pointwise confidence intervals under the two selection probabilities for the varying coefficient part with n = 1000.
Table 1. Confidence intervals for β for different selection probability functions under three different methods.

Ω     n       IEL               CEL               FEL
Ω1    100     (1.667, 2.292)    (1.661, 2.329)    (1.739, 2.357)
Ω1    500     (1.813, 2.175)    (1.801, 2.186)    (1.872, 2.149)
Ω1    1000    (1.902, 2.122)    (1.864, 2.147)    (1.921, 2.112)
Ω2    100     (1.683, 2.325)    (1.611, 2.275)    (1.739, 2.357)
Ω2    500     (1.801, 2.193)    (1.734, 2.199)    (1.872, 2.149)
Ω2    1000    (1.895, 2.139)    (1.817, 2.176)    (1.921, 2.112)
Table 2. Confidence interval lengths (LEN) and coverage probabilities (CP) for β under three different methods.

n       Ω     IEL LEN   IEL CP   CEL LEN   CEL CP   FEL LEN   FEL CP
100     Ω1    0.625     0.947    0.668     0.940    0.618     0.948
100     Ω2    0.642     0.946    0.664     0.934    0.618     0.948
500     Ω1    0.362     0.949    0.385     0.947    0.277     0.950
500     Ω2    0.392     0.948    0.465     0.945    0.277     0.950
1000    Ω1    0.220     0.952    0.283     0.949    0.191     0.953
1000    Ω2    0.235     0.950    0.359     0.947    0.191     0.953
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
