Article

Sampling Importance Resampling Algorithm with Nonignorable Missing Response Variable Based on Smoothed Quantile Regression

1 School of Statistics and Data Science, Beijing Wuzi University, Beijing 101149, China
2 School of Statistics and Data Science, Xinjiang University of Finance and Economics, Urumqi 830012, China
3 School of Mathematics and Data Science, Changji University, Changji 831100, China
4 Department of Information Management and Finance, National Yang Ming Chiao Tung University, Taiwan 30010, China
5 Institute of Digital Assets, Academy of Economic Sciences, 010374 Bucharest, Romania
6 School of Business and Economics, Humboldt-Universität zu Berlin, 10117 Berlin, Germany
7 Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011, China
* Author to whom correspondence should be addressed.
The first two authors contributed equally to this work.
Mathematics 2023, 11(24), 4906; https://doi.org/10.3390/math11244906
Submission received: 19 October 2023 / Revised: 24 November 2023 / Accepted: 29 November 2023 / Published: 8 December 2023
(This article belongs to the Section Mathematics and Computer Science)

Abstract:
The presence of nonignorable missing response variables often leads to complex conditional distribution patterns that cannot be effectively captured through mean regression. In contrast, quantile regression offers valuable insights into the conditional distribution. Consequently, this article places emphasis on the quantile regression approach to address nonrandom missing data. Taking inspiration from fractional imputation, this paper proposes a novel smoothed quantile regression estimation equation based on a sampling importance resampling (SIR) algorithm instead of nonparametric kernel regression methods. Additionally, we present an augmented inverse probability weighting (AIPW) smoothed quantile regression estimation equation to reduce the influence of potential misspecification in a working model. The consistency and asymptotic normality of the empirical likelihood estimators corresponding to the above estimating equations are proven under the assumption of a correctly specified parameter working model. Furthermore, we demonstrate that the AIPW estimation equation converges to an IPW estimation equation when a parameter working model is misspecified, thus illustrating the robustness of the AIPW estimation approach. Through numerical simulations, we examine the finite sample properties of the proposed method when the working models are both correctly specified and misspecified. Furthermore, we apply the proposed method to analyze HIV—CD4 data, thereby exploring variations in treatment effects and the influence of other covariates across different quantiles.

1. Introduction

Missing data analysis has gained significant attention in recent years. To analyze missing data, it is crucial to understand the response mechanism that leads to missing data. If the missingness of the variable of interest is conditionally independent of that variable, the response mechanism is considered to be random or ignorable. Otherwise, the response mechanism is considered to be nonrandom or nonignorable. Dealing with nonrandom missing data presents greater challenges, which are evident in two aspects: Firstly, the assumed response model cannot be validated solely based on observed data; secondly, the model parameters may be unidentifiable.
To obtain meaningful inferences from incomplete data with nonrandom missingness, it is necessary to satisfy a set of identifying conditions [1,2]. Moreover, the accuracy of the methods based on parameter models is greatly influenced by the correct specification of the assumed parameter model [3]. Consequently, researchers aim to impose weaker model assumptions on the response mechanism to achieve robust results. The semiparametric response model was initially considered by Kim and Yu [4], but their proposed method necessitated a validation sample to estimate the model parameters. Similarly, Shao and Wang [5] examined the same semiparametric exponential tilting model and proposed a parameter estimation approach based on calibration estimation equations. A comprehensive review of parameter estimation methods for nonrandom missing data is provided by Kim and Shao [6].
Quantile regression, introduced by Koenker and Bassett [7], has become a widely used statistical analysis tool. It offers more adaptability and flexibility compared to mean regression. Notably, quantile regression does not require the assumption of error term distribution and demonstrates robustness against heavy-tailed errors and outliers. Furthermore, by considering regressions at different quantiles of the response variable, quantile regression enables the assessment of covariate effects at various quantiles and yields a more comprehensive understanding of the conditional distribution. However, there is a scarcity of literature on quantile regression for nonrandom missing data.
The nonsmoothness of the check function for standard quantile estimators makes it impossible to directly estimate the asymptotic covariance matrix [8]. As a result, the existing theoretical results for nonrandom missing mean regression cannot be directly extended to quantile regression.
The idea of smoothing nondifferentiable objective functions can be traced back to Horowitz [9], while Whang [10] introduced the smoothed empirical likelihood approach for quantile regression. Luo et al. [11] extended this method to data with random missingness, and Zhang and Wang [12] further expanded it to handle nonignorable missingness.
However, on the one hand, this method relies on the assumption of a parametric propensity missingness model, which introduces the risk of model misspecification. On the other hand, this method corrects estimation biases caused by missing data through inverse probability weighting but may not fully utilize the information from incomplete observations.
Regarding nonrandom missing data, previous studies have addressed the issue in different regression settings. Specifically, Niu et al. [13] and Bindele and Zhao [14] focused on estimation equation imputation in linear regression and rank regression, respectively. In the context of quantile regression, Chen et al. [15] introduced three missing quantile regression estimation equations: inverse probability weighting, estimation equation imputation, and an enhanced approach combining both methods. It is important to note that these studies assume a response mechanism with random missingness.
Moreover, the existing literature commonly utilizes kernel estimation methods [16] to estimate the conditional means involved in the imputation estimation equation. However, when the dimension of the covariates is high, the kernel estimation results can become unstable. To overcome the curse of dimensionality associated with multivariate nonparametric kernel estimation, Kim [17] proposed a parametric fractional imputation method for handling missing data. Additionally, Riddles et al. [18] extended this method to address the scenario of nonignorable missing data. They developed an EM algorithm based on a parameter working model derived from observed data and incorporated the parametric fractional imputation (FI) method. Nevertheless, these approaches heavily rely on parameter-based response models, which renders them sensitive to model misspecification. Furthermore, the likelihood-based EM algorithm is not directly applicable to quantile regression.
Utilizing estimation equations, Paik and Larsen [19] incorporated a working model for the observed data and employed a sampling importance resampling (SIR) algorithm to estimate the missing data and the corresponding estimation equations. Building upon this, Wang et al. [20] and Song et al. [21] extended the logistic response model used in these approaches to a semiparametric exponential tilting model. However, in the absence of knowledge about the tilting parameter, these methods relied on validation samples.
In this paper, we propose a smoothed empirical likelihood approach for imputing quantile regression estimation equations with nonignorable missing data based on a semiparametric response model. The novel estimation equation guarantees the second-order differentiability of the objective function with respect to the parameter vector. The imputed values for the missing data are derived from a parametric working model and obtained using sampling importance resampling.
Although imputation estimation equations can enhance estimation efficiency relative to IPW estimation equations by exploiting the information in the incomplete observations, both theoretical and numerical experiments show that imputation estimation equations are sensitive to misspecification of the parametric working model. Therefore, to mitigate the impact of misspecification in the working model, this paper further proposes the AIPW smoothed quantile regression estimation equation. It is demonstrated that, when the working model is correctly specified, the asymptotic variance of the AIPW estimation equation shares the same form as the asymptotic variance of the nonparametric model estimator. Furthermore, even when the working model is misspecified, the estimates remain consistent.
The remaining sections of this paper are organized as follows. Section 2 establishes the semiparametric response model and the AIPW quantile regression estimation equation, along with the algorithmic procedure for estimating the tilting parameter and the quantile regression coefficients using importance resampling. Section 3 presents the large-sample properties of the parameter estimators. Section 4 demonstrates the finite-sample properties of the estimators through numerical simulations. Section 5 applies the proposed methodology to analyze the HIV CD4 dataset. Section 6 concludes with a discussion.

2. Proposed Method

Consider a linear quantile regression model as follows:
Y_i = Z_i^⊤ θ_τ + ε_i,  i = 1, …, n,
where Y_i is the response variable, Z_i is a fully observed q-dimensional covariate vector, θ_τ represents the unknown regression coefficient vector, ε_i denotes the random error term satisfying P(ε_i ≤ 0 | Z_i) = τ, τ ∈ (0, 1), and the ε_i values are mutually independent. In the subsequent discussion, we abbreviate θ_τ as θ.
If the response variables Y_i, i = 1, …, n, are fully observed, the quantile regression estimator of θ is obtained by minimizing the following objective:
θ̂ = argmin_{θ∈Θ} (1/n) ∑_{i=1}^n ρ_τ(Y_i − Z_i^⊤θ),
where ρ_τ(u) = u{τ − I(u < 0)} is the check function, and I(·) is the indicator function. For a given τ, θ̂ satisfies the following estimation equation:
∑_{i=1}^n ψ(Z_i, Y_i; θ) ≈ 0,
where ψ(Z_i, Y_i; θ) = Z_i ψ_τ(Y_i − Z_i^⊤θ) when Y_i − Z_i^⊤θ ≠ 0, and ψ(Z_i, Y_i; θ) = 0 otherwise. Here, ψ_τ(u) = I(u < 0) − τ.
In the scenario where the missingness of the response variable Y_i is nonignorable, let δ_i denote the missing indicator: δ_i = 1 if Y_i is observed, and δ_i = 0 otherwise. (Z_i, Y_i, δ_i), i = 1, …, n, is an independent and identically distributed sample from (Z, Y, δ). We establish a semiparametric exponential tilting model for the missing propensity as follows:
P(δ = 1 | Z, Y) = 1/{1 + exp(g(X) + γY)},
where g(·) is an unspecified function, X ⊆ Z is a d-dimensional subvector, and the remaining covariates V = Z∖X serve as an instrumental variable that is unrelated to δ given (X, Y).
Let f_κ(Y, Z | X), κ ∈ {0, 1}, denote the conditional density of (Z, Y) given X when δ = κ. Specifically, we have the following:
f_0(Y, Z | X, γ) = f_1(Y, Z | X) × O(X, Y)/E{O(X, Y) | X, δ = 1},
where O(X, Y) = P(δ = 0 | X, Y)/P(δ = 1 | X, Y) = exp(g(X) + γY). For the quantile estimation equation ψ(Z, Y; θ), let
ψ_eei(Z, Y, δ; θ, γ) = δψ(Z, Y; θ) + (1 − δ)E{ψ(Z, Y; θ) | X, δ = 0, γ};
it can easily be shown that E{ψ(Z, Y; θ) | X} = E{ψ_eei(Z, Y, δ; θ, γ) | X}, where
E{ψ(Z, Y; θ) | X, δ = 0, γ} = E{δ exp(γY)ψ(Z, Y; θ) | X}/E{δ exp(γY) | X} := m_{ψ0}(X; θ, γ).
The nonparametric kernel estimate of Equation (6) is given by
m̂_0(X; θ, γ) = ∑_{i=1}^n ω_i(γ) ψ(Z_i, Y_i; θ),
where ω_i(γ) = δ_i exp(γY_i) K_h(X − X_i)/∑_{j=1}^n δ_j exp(γY_j) K_h(X − X_j), K_h(u) = h^{−d} K(u/h), and K(·) is a d-dimensional kernel function with bandwidth h.
Because the nonparametric multivariate kernel estimate of this conditional expectation can be unstable, this paper adopts Monte Carlo methods to estimate m_{ψ0}(X; θ, γ). For simplicity, we impose a parametric assumption on the conditional distribution f(Y | Z, δ = 1; β) of the observed responses; this assumption can easily be checked using the fully observed cases. Consequently, the conditional distribution of the nonrandomly missing responses satisfies
f_0(Y | Z; β, γ) = f_1(Y | Z; β) × exp(γY)/E{exp(γY) | Z, δ = 1; β}.
Let Y_i^{*(j)}, j = 1, …, M, be independent and identically distributed draws from f(Y | Z_i, δ = 0; β, γ). According to the law of large numbers, as M → ∞, we have
m̂_{ψ0}(Z_i; θ, β, γ) = (1/M) ∑_{j=1}^M ψ(Z_i, Y_i^{*(j)}(β, γ); θ) →_p m_{ψ0}(Z_i; θ, β, γ).
To obtain a set of random realizations from f_0(Y | Z; β, γ), the SIR algorithm [19] can be employed based on the parametric representation in (7) for a given (β, γ); a code sketch is given after the following steps:
(1) Draw random samples S_i = {Ỹ_i^{(k)}, k = 1, …, M_2} from f(Y | Z_i, δ = 1; β).
(2) Calculate the adjustment weight for each sample point in S_i as
ω_{ik}(γ) = exp(γỸ_i^{(k)})/∑_{j=1}^{M_2} exp(γỸ_i^{(j)}),  k = 1, 2, …, M_2.
(3) Resample from S_i according to the probabilities ω_{i1}(γ), …, ω_{iM_2}(γ) to obtain Y_i^{*(1)}, …, Y_i^{*(M)}. To ensure the convergence of this procedure, it is crucial to have M_2 → ∞ and M/M_2 → 0.
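As an illustration only, the following Python sketch implements steps (1)–(3) for a single incomplete case under the additional assumption, made here for concreteness and not taken from the paper, that the observed-data working model f(y | z_i, δ = 1; β) is normal with mean mu_i and standard deviation sigma_i; the function name and the parameter values in the example call are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_impute(mu_i, sigma_i, gamma, M=100, M2=5000):
    """Sampling importance resampling for one incomplete case.

    Draws M2 candidates from an assumed normal working model for
    f(y | z_i, delta = 1), reweights them by exp(gamma * y), and
    resamples M values approximating f(y | z_i, delta = 0).
    """
    # Step (1): candidates from the observed-data working model.
    y_tilde = rng.normal(mu_i, sigma_i, size=M2)
    # Step (2): exponential tilting weights, normalized to probabilities
    # (the max is subtracted for numerical stability).
    w = np.exp(gamma * y_tilde - np.max(gamma * y_tilde))
    w /= w.sum()
    # Step (3): resample M << M2 values with these probabilities.
    return rng.choice(y_tilde, size=M, replace=True, p=w)

# Imputed draws that can be averaged to approximate m_psi0(z_i; theta, beta, gamma).
y_star = sir_impute(mu_i=1.2, sigma_i=0.7, gamma=0.2)
```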
The SIR-based quantile regression estimation equation is given by
ψ_eei(Y_i, Z_i, δ_i; θ, β, γ) = δ_i ψ(Z_i, Y_i; θ) + (1 − δ_i)(1/M) ∑_{j=1}^M ψ(Z_i, Y_i^{*(j)}(β, γ); θ).
Due to the nonsmoothness of this estimation equation, the sandwich estimator of the asymptotic covariance matrix cannot be obtained directly. Therefore, this paper replaces the indicator function I(Y_i − Z_i^⊤θ < 0) in the quantile estimation equation with the smooth function G_h(Z_i^⊤θ − Y_i), thus resulting in a smooth approximation of ψ_τ(Y_i − Z_i^⊤θ):
ψ_h(Y_i, Z_i; θ) = Z_i{G_h(Z_i^⊤θ − Y_i) − τ},
where G_h(u) = G(u/h), G(u) = ∫_{v<u} K(v) dv, and K(·) is a kernel function supported on [−1, 1].
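For concreteness, the sketch below evaluates the smoothed estimating function ψ_h for all observations at once, assuming the Epanechnikov kernel K(v) = 0.75(1 − v²) on [−1, 1] (our illustrative choice; the paper only requires the conditions in (C7)), for which G has the closed form G(u) = 0.5 + 0.75u − 0.25u³ on [−1, 1].

```python
import numpy as np

def G(u):
    """Integrated Epanechnikov kernel: integral of K(v) dv over v < u, clipped to [-1, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3

def psi_h(Z, Y, theta, tau, h):
    """Smoothed quantile score psi_h(Y_i, Z_i; theta) for an n x q design matrix Z."""
    u = (Z @ theta - Y) / h              # argument of G_h = G(./h)
    return Z * (G(u) - tau)[:, None]     # n x q matrix; row i is Z_i * (G_h(.) - tau)
```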
For nonignorable missing data, the smoothed SIR-based quantile regression estimation equation is
ψ_h^eei(Y_i, Z_i, δ_i; θ, β, γ) = δ_i ψ_h(Z_i, Y_i; θ) + (1 − δ_i)(1/M) ∑_{j=1}^M ψ_h(Z_i, Y_i^{*(j)}(β, γ); θ).
The imputation-based estimation equation is susceptible to misspecification of f(Y | Z, δ = 1; β). Owing to the relative robustness of the semiparametric response model, we also consider the AIPW (augmented inverse probability weighting) estimation equation:
ψ_h^aipw(Y_i, Z_i, δ_i; θ, β, γ) = δ_i ψ_h(Z_i, Y_i; θ)/π(X_i, Y_i; ĝ_γ, γ) + {1 − δ_i/π(X_i, Y_i; ĝ_γ, γ)}·(1/M) ∑_{j=1}^M ψ_h(Z_i, Y_i^{*(j)}(β, γ); θ);
it can be proven that the AIPW estimation equation remains consistent when the parametric model f(Y | Z, δ = 1; β) is misspecified.
In practice, (β, γ) are often unknown and need to be estimated separately. The maximum likelihood estimator of β, denoted β̂, is the solution to the following score equation:
∑_{i=1}^n δ_i ∂ ln f(y_i | X_i, δ_i = 1; β)/∂β = 0.
Next, we consider the estimation of the tilting parameter γ. The semiparametric missing propensity model is handled with two components: a profile two-step generalized method of moments (GMM) estimator for the tilting parameter γ and a kernel regression estimator for the nonparametric component g(·). To estimate γ, we define the profile estimating function as follows:
ξ(Z_i, Y_i, δ_i; g_γ, γ) = {δ_i/π(X_i, Y_i; g_γ, γ) − 1} h(V_i) := ξ_i(g_γ, γ),
where h(V) is an arbitrary specified function of the instrumental variable V, and g_γ(·) satisfies the following:
exp(g_γ(X_i)) = E(1 − δ_i | X_i)/E{δ_i exp(γY_i) | X_i}.
Under the assumption of a correctly specified missingness mechanism, it holds that E{ξ_i(g_γ, γ)} = 0, and the vector ξ_i(g_γ, γ) is overidentified with respect to γ. The profile two-step GMM estimator of γ is given by the following:
γ̂ = argmin_{γ∈R} ξ̄(ĝ_γ, γ)^⊤ W_n^{−1} ξ̄(ĝ_γ, γ),
where ξ̄(ĝ_γ, γ) = (1/n) ∑_{i=1}^n ξ(Z_i, Y_i, δ_i; ĝ_γ, γ), and W_n = (1/n) ∑_{i=1}^n ξ_i(ĝ_γ, γ)ξ_i(ĝ_γ, γ)^⊤. The estimator ĝ_γ(·) is the kernel regression estimate of g(·) and satisfies the following equation:
exp(ĝ_γ(X_i)) = ∑_{j=1}^n (1 − δ_j) K_h(X_j − X_i)/∑_{j=1}^n δ_j exp(γY_j) K_h(X_j − X_i),
where K_h(u_1, …, u_d) represents the d-dimensional kernel function with bandwidth h.
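To make the two-step procedure concrete, the following Python sketch estimates γ for a single continuous covariate X, assuming a scalar instrument with h(V) = V and a Gaussian kernel; the function names, the bounded search interval, and the one-dimensional setting are our illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_gmm_gamma(X, Y, delta, V, h):
    """Profile two-step GMM sketch for the tilting parameter gamma
    with one continuous X, instrument h(V) = V, and a Gaussian kernel."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    Yfill = np.where(delta == 1, Y, 0.0)          # missing Y never enters (delta_j = 0)
    Kmat = K((X[:, None] - X[None, :]) / h)       # Kmat[j, i] ~ K_h(X_j - X_i) up to 1/h

    def objective(gamma):
        num = Kmat.T @ (1.0 - delta)              # sum_j (1 - delta_j) K_h(X_j - X_i)
        den = Kmat.T @ (delta * np.exp(gamma * Yfill))
        exp_g = num / den                         # exp(g_hat_gamma(X_i))
        inv_pi = 1.0 + exp_g * np.exp(gamma * Yfill)   # 1 / pi(X_i, Y_i; g_hat, gamma)
        xi = (delta * inv_pi - 1.0) * V           # profile estimating function xi_i
        xi_bar, W = xi.mean(), np.mean(xi**2)
        return xi_bar**2 / W                      # scalar GMM objective

    return minimize_scalar(objective, bounds=(-2.0, 2.0), method="bounded").x
```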
Define ψ̂_{hi}^{(l)}(θ, β̂, γ̂), l = 1, 2, as follows:
ψ̂_{hi}^{(1)}(θ, β̂, γ̂) = δ_i ψ_h(Z_i, Y_i; θ) + (1 − δ_i) m̂_{ψ_h0}(Z_i; θ, β̂, γ̂),
ψ̂_{hi}^{(2)}(θ, β̂, γ̂) = δ_i ψ_h(Z_i, Y_i; θ)/π(X_i, Y_i; ĝ_{γ̂}, γ̂) + {1 − δ_i/π(X_i, Y_i; ĝ_{γ̂}, γ̂)} m̂_{ψ_h0}(Z_i; θ, β̂, γ̂),
where m̂_{ψ_h0}(Z_i; θ, β̂, γ̂) = (1/M) ∑_{j=1}^M ψ_h(Z_i, Y_i^{*(j)}(β̂, γ̂); θ).
Let p_i denote the probability mass placed on ψ̂_{hi}^{(l)}(θ, β̂, γ̂), i = 1, 2, …, n. The empirical log-likelihood ratio function with respect to θ is defined as follows:
R̂^{(l)}(θ) = −2 sup{∑_{i=1}^n log(np_i) | p_i ≥ 0, ∑_{i=1}^n p_i = 1, ∑_{i=1}^n p_i ψ̂_{hi}^{(l)}(θ, β̂, γ̂) = 0}.
Using the method of Lagrange multipliers, it can be shown that R̂^{(l)}(θ) can be expressed as follows:
R̂^{(l)}(θ) = 2 ∑_{i=1}^n log{1 + λ^⊤ ψ̂_{hi}^{(l)}(θ, β̂, γ̂)},
where λ satisfies the following:
(1/n) ∑_{i=1}^n ψ̂_{hi}^{(l)}(θ, β̂, γ̂)/{1 + λ^⊤ ψ̂_{hi}^{(l)}(θ, β̂, γ̂)} = 0.
The empirical likelihood estimators of the quantile regression coefficients based on the two proposed estimation equations, denoted θ̂^{(l)}, l = 1, 2, are given by the following:
θ̂^{(l)} = argmin_θ R̂^{(l)}(θ).
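Computationally, evaluating R̂^{(l)}(θ) at a candidate θ requires solving the Lagrange-multiplier equation above. The sketch below does this with a damped Newton iteration and leaves the outer minimization over θ to a generic optimizer; it is a minimal illustration, not the authors' implementation, and the helper name psi_scores for assembling the n × q matrix of ψ̂_{hi}^{(l)}(θ, β̂, γ̂) values is hypothetical.

```python
import numpy as np

def el_log_ratio(psi):
    """Given the n x q matrix of estimating scores psi_hat_i(theta),
    solve (1/n) sum_i psi_i / (1 + lam' psi_i) = 0 for lam by damped
    Newton steps and return R_hat(theta) = 2 * sum_i log(1 + lam' psi_i)."""
    n, q = psi.shape
    lam = np.zeros(q)
    for _ in range(100):
        denom = 1.0 + psi @ lam
        grad = (psi / denom[:, None]).mean(axis=0)
        if np.linalg.norm(grad) < 1e-10:
            break
        hess = -(psi[:, :, None] * psi[:, None, :] / (denom**2)[:, None, None]).mean(axis=0)
        step = np.linalg.solve(hess, -grad)
        t = 1.0
        while np.any(1.0 + psi @ (lam + t * step) <= 1e-8):   # keep weights positive
            t *= 0.5
        lam = lam + t * step
    return 2.0 * np.sum(np.log(1.0 + psi @ lam))

# Outer problem (sketch): theta_hat = argmin_theta R_hat(theta), e.g.
#   from scipy.optimize import minimize
#   res = minimize(lambda th: el_log_ratio(psi_scores(th)), theta_init, method="Nelder-Mead")
```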

3. Theoretical Analysis

To elucidate the theoretical properties of the proposed estimators, we first define the following matrices (where a^{⊗2} = aa^⊤ for a column vector a):
A_1 = E[π(X, Y)ψ(Z, Y; θ_0)^{⊗2} + {1 − π(X, Y)}m_{ψ0}(Z; θ_0)^{⊗2}],
A_2 = E[π(X, Y)^{−1}{ψ(Z, Y; θ_0) − m_{ψ0}(Z; θ_0)}^{⊗2} + m_{ψ0}(Z; θ_0)^{⊗2}],
B_1 = Var(H_1Ψ(β_0) + H_2Φ(γ_0)),  B_2 = Var(H_3Φ(γ_0)),  T_l = A_l + B_l, l = 1, 2,
H_1 = E[(1 − δ)Cov_0(ψ(Z, Y; θ_0), s(Z, Y; β_0) | Z)],
H_2 = E[(1 − δ)Cov_0(ψ(Z, Y; θ_0), Y | Z)],
H_3 = E[(1 − δ)(Y − m_{Y0}(X))(ψ(Z, Y; θ_0) − m_{ψ0}(Z; θ_0))].
(C1) (a) The density of Z is bounded and has continuous and bounded second-order derivatives; (b) the density of Z and the propensity π(Z, Y; γ_0) are bounded away from 0; and (c) E{exp(2γY)} is finite and E{π(Z, Y; γ_0)^{−1} | Z} < ∞ almost surely.
(C2) Let K_h(u) = K(u/h)/h^d, where K denotes a generic d-dimensional kernel function and the value of d is determined by the context of use. K is a bounded, uniformly continuous, symmetric kernel of order m satisfying ∫K(s)ds = 1 with s = (s_1, …, s_d)^⊤, ∫ s_l^t K(s)ds = 0, and ∫ s_l^m K(s)ds ≠ 0 for any l = 1, …, d and t = 1, …, m − 1.
(C3) The bandwidth sequence h satisfies nh^{2d} → ∞, nh^d/log n → ∞, and nh^{2m} → 0 as n → ∞; the order m satisfies m ≥ 2 and 2m > d.
(C4) Let W(γ) = E{ξ(g_γ, γ)^{⊗2}}, Ξ(γ) = E{(1 − δ)(h(V) − m_{V0}(X, γ))(Y − m_{Y0}(X, γ))}, and Λ(Z, Y, δ, γ) = {δ/π(X, Y) − 1}(h(V) − m_{V0}(X, γ)), where m_{V0}(X, γ) = E{h(V) | X, δ = 0; γ} and m_{Y0}(X, γ) = E{Y | X, δ = 0; γ}. Define
Φ(Z, Y, δ, γ) = {Ξ(γ)^⊤ W(γ)^{−1} Ξ(γ)}^{−1} Ξ(γ)^⊤ W(γ)^{−1} Λ(Z, Y, δ, γ).
We assume E‖Φ_i(γ_0)‖² < ∞, ∂Φ_i(γ)/∂γ exists at γ_0, E{sup_γ ‖ξ(g_γ, γ)‖} < ∞, γ_0 is the unique solution to E{ξ(g_γ, γ)} = 0, and W(γ_0) is positive definite.
(C5) (Z_i, Y_i, δ_i), i = 1, …, n, are independent and identically distributed random vectors. The support of θ, denoted by B, is a compact set in R^q, and θ_0 ∈ B is the unique solution to E{ψ(Z_i, Y_i; θ)} = 0. Furthermore, ‖∂ψ(Z_i, Y_i; θ)/∂θ‖, ‖∂²ψ(Z_i, Y_i; θ)/∂θ∂θ^⊤‖, and ‖ψ(Z_i, Y_i; θ)‖³ are bounded by an integrable function H(Z, Y) within a neighborhood of θ_0.
(C6) For all ε in a neighborhood of zero and almost every Z, F(ε | Z), f(ε | Z), and f(ε | Z, δ = 0) exist, are bounded away from zero, and are r times continuously differentiable with r ≥ 2. There exists a function C(Z) such that |f^{(s)}(ε | Z)| ≤ C(Z) and |f^{(s)}(ε | Z, δ = 0)| ≤ C(Z) for s = 0, 1, …, r, for almost all Z and all ε in a neighborhood of zero, and E{C(Z)‖Z‖²} < ∞.
(C7) The kernel function K(·) is a probability density function such that (a) it is bounded and has compact support; (b) K(·) is an rth-order kernel, i.e., ∫ u^j K(u)du equals 1 if j = 0, equals 0 if 1 ≤ j ≤ r − 1, and equals C_K ≠ 0 if j = r for some constant C_K; and (c) letting G̃(u) = (G(u), G²(u), …, G^{L+1}(u))^⊤ for some L ≥ 1, where G(u) = ∫_{v<u} K(v)dv, for any ι ∈ R^{L+1} with ‖ι‖ = 1 there is a partition of [−1, 1], −1 = a_0 < a_1 < ⋯ < a_{L+1} = 1, such that ι^⊤ G̃(u) is either strictly positive or strictly negative on (a_{l−1}, a_l) for l = 1, …, L + 1.
(C8) The positive bandwidth parameter h satisfies nh^{2r} → 0 and nh/log(n) → ∞ as n → ∞.
(C9) Z has bounded support, E‖Z‖⁴ < ∞, and the matrices Γ and T_l, l = 1, 2, are nonsingular.
(C10) Under complete observation of (Z_i, Y_i), i = 1, …, n, the unique solution β̂ to the score equation in (3) satisfies
√n(β̂ − β_0) →_d N(0, Σ)
as n → ∞ for some covariance matrix Σ.
The conditions (C1)–(C4), which are commonly found in the literature on missing data and nonparametric methods [24,25], are employed primarily to meet the requirements of Lemma 8.11 in Newey and McFadden [22] and Theorem 6.18 in van der Vaart [23]. These conditions encompass the following: (1) stochastic equicontinuity conditions; (2) linearity conditions on the objective function with respect to the nonparametric components and convergence rate conditions for the nonparametric estimators; and (3) differentiability and continuity conditions on the estimating equations with respect to the parameter of interest. Conditions (C5)–(C9) ensure the consistency and asymptotic normality of the smoothed empirical likelihood estimator for quantile regression [10]. To simplify the discussion of the asymptotic properties of the maximum likelihood estimator in the working model, we introduce condition (C10).
Under the assumed conditions, the following expansions hold:
√n(γ̂ − γ_0) = (1/√n) ∑_{i=1}^n Φ_i(γ_0) + o_p(1),  √n(β̂ − β_0) = (1/√n) ∑_{i=1}^n Ψ_i(β_0) + o_p(1).
In addition, we have the following lemma, whose proof is given in Appendix A:
Lemma 1.
Under conditions (C5)–(C9), we have
E{ψ_h(Z_i, Y_i; θ_0)} = O(h^r),  E{m_{ψ_h0}(Z_i; θ_0)} = E{m_{ψ0}(Z_i; θ_0)} + O(h^r).
Lemma 2.
Under conditions (C1)–(C10) and with the notation of Section 3, the following results hold as n → ∞:
(1) (1/√n) ∑_{i=1}^n ψ̂_{hi}^{(l)}(θ_0, β̂, γ̂) →_d N(0, T_l);
(2) (1/n) ∑_{i=1}^n ψ̂_{hi}^{(l)}(θ_0, β̂, γ̂)^{⊗2} →_p A_l;
(3) (1/n) ∑_{i=1}^n ∂ψ̂_{hi}^{(l)}(θ_0, β̂, γ̂)/∂θ →_p Γ;
(4) max_i ‖ψ̂_{hi}^{(l)}(θ_0, β̂, γ̂)‖ = o_p(n^{1/2}).
Theorem 1.
Under conditions (C1)–(C10), if the parametric working model is correctly specified, then as n → ∞, for l = 1, 2, we have
√n(θ̂^{(l)} − θ_0) →_d N(0, Γ^{−1} T_l Γ^{−1}),
where Γ = E{f(0 | Z) ZZ^⊤}.
If there are no missing data, we have P(δ = 0 | Z, Y) = 0, which implies that H_1, H_2, and H_3 are all zero. Additionally, we have
A_1 = A_2 = E{ψ(Z, Y; θ_0)^{⊗2}} = τ(1 − τ) E(ZZ^⊤).
The above results are consistent with the asymptotic normality conclusion of classical quantile regression.
The different forms of Λ_1 and Λ_2 indicate that if the parametric working model f(Y | Z, δ = 1; β) is misspecified, θ̂^{(1)} is no longer consistent, while θ̂^{(2)} remains a consistent estimator of θ_0. The following argument demonstrates the double robustness of the AIPW estimation equation. For a misspecified f(Y | Z, δ = 1), there exists a limit C(Z; θ, γ̂) such that
m̂_{ψ0}(Z_i; θ, β̂, γ̂) = C(Z_i; θ, γ̂) + o_p(1).
It can then be shown that
(1/n) ∑_{i=1}^n ψ̂_{hi}^{(2)}(θ, β̂, γ̂) = (1/n) ∑_{i=1}^n [δ_iψ_h(Z_i, Y_i; θ)/π(X_i, Y_i; ĝ_{γ̂}, γ̂) + {1 − δ_i/π(X_i, Y_i; ĝ_{γ̂}, γ̂)} C(Z_i; θ, γ̂)] + o_p(1)
 = (1/n) ∑_{i=1}^n δ_iψ_h(Z_i, Y_i; θ)/π(X_i, Y_i; ĝ_{γ̂}, γ̂) + o_p(1)
 =: (1/n) ∑_{i=1}^n ψ̂_{hi}^{ipw}(θ, β̂, γ̂) + o_p(1).
If π ( X i , Y i ) is correctly specified, the IPW estimation equation is consistent, which implies that the AIPW quantile regression estimation equation remains consistent in this case.
Theorem 2.
Under the conditions of Theorem 1, if the parametric working model is correctly specified, then for l = 1, 2 and as n → ∞,
R̂^{(l)}(θ_0) →_d r_1^{(l)} χ²_{1,1} + r_2^{(l)} χ²_{1,2} + ⋯ + r_q^{(l)} χ²_{1,q},
where the r_i^{(l)} are the eigenvalues of A_l^{−1}Λ_l, and χ²_{1,1}, χ²_{1,2}, …, χ²_{1,q} are q independent χ²_1-distributed random variables.
First, if there is no missingness, we have T_l = A_l = τ(1 − τ)E(ZZ^⊤), which leads to R̂^{(l)}(θ_0) →_d χ²_q, and Wilks' theorem holds. Furthermore, it is worth noting that if β and γ are known, we still have T_l = A_l, and in this case Wilks' theorem still holds. This conclusion is consistent with Zhao [26].
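In practice, the weighted chi-squared limit in Theorem 2 can be calibrated by Monte Carlo once the weights r_i^{(l)} have been estimated by plugging sample versions of the relevant matrices into the eigenvalue computation. The sketch below is a generic helper for that last step; the example weights are hypothetical.

```python
import numpy as np

def weighted_chisq_quantile(r, alpha=0.05, B=100_000, seed=1):
    """Monte Carlo (1 - alpha) quantile of sum_k r_k * chi2_1,
    the limiting law of R_hat^{(l)}(theta_0) in Theorem 2."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, size=(B, len(r))) @ np.asarray(r)
    return np.quantile(draws, 1.0 - alpha)

# Example with two (hypothetical) estimated eigenvalue weights:
crit = weighted_chisq_quantile([1.3, 0.8])   # cutoff for an EL confidence region
```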

4. Simulation Study

To investigate the finite-sample properties of the proposed method, this study conducted numerical simulations under both correctly specified and misspecified working model scenarios.

4.1. Simulation 1: Correctly Specified Working Model

In the numerical simulation, we generated a random vector (x, y, δ), where x is the covariate, y is the response variable of interest, and δ is the observation indicator for y: when δ = 1, y is observed; otherwise, y is missing. Let x ~ N(0, 0.5), and generate the response variable y according to the following equation:
y = μ(x) + e,
where μ(x) = 1 + x. We considered two different distributions for the random error term e: (a) N(0, 0.9) and (b) N(0, 0.49(1 + x²)). For the working model f(y | x, δ = 1; β), the former follows a homoscedastic structure N(μ(x), σ_1²), while the latter follows a heteroscedastic structure N(μ(x), σ_2²(x)).
The indicator variable δ follows a Bernoulli distribution with parameter p, i.e., δ ~ Bernoulli(p). The conditional probability of δ = 1 given (x, y) is defined as follows:
p(ϕ) = P(δ = 1 | x, y) = {1 + exp(ϕ_0 + ϕ_1 y)}^{−1},
where (ϕ_0, ϕ_1) = (0.8, 0.2). In this case, the missingness mechanism for the response variable y is nonrandom, with x serving as a missingness instrument. The average observed rate in the sample was approximately 73%.
We establish a quantile regression model of the response variable y on x as follows:
Q_τ(y | x) = θ_0 + θ_1 x,
where τ represents the quantile of interest, specifically τ ∈ {0.25, 0.5, 0.75}.
Consider the following five quantile regression estimation equations:
(1) Full estimation equation: ψ_h^full(x_i, y_i; θ) = ψ_h(x_i, y_i; θ);
(2) Complete case (CC) estimation equation: ψ_h^cc(x_i, y_i, δ_i; θ) = δ_i ψ_h(x_i, y_i; θ);
(3) IPW estimation equation: ψ_h^ipw(x_i, y_i, δ_i; θ, ϕ̂);
(4) EEI estimation equation: ψ_h^eei(x_i, y_i, δ_i; θ, ϕ̂, β̂);
(5) AIPW estimation equation: ψ_h^aipw(x_i, y_i, δ_i; θ, ϕ̂, β̂).
To generate a sample of size n = 500 that satisfies the simulation design, we can use the law of total probability and express f(y | x) as follows:
f(y | x) = P(δ = 1 | x) f(y | x, δ = 1) + P(δ = 0 | x) f(y | x, δ = 0),
where
f(y | x, δ = 0) = f(y | x, δ = 1) × O(x, y)/E{O(x, Y) | x, δ = 1}.
Under the specified nonrandom missingness mechanism, we have O(x, y) = exp(ϕ_0 + ϕ_1 y). For the homoscedastic case f(y | x, δ = 1) = N(μ, σ_1²), the ratio of the conditional probabilities can be expressed as follows:
P(δ = 0 | x)/P(δ = 1 | x) = E{O(x, y) | x, δ = 1} = exp(ϕ_0 + ϕ_0*),
where ϕ_0* = (1/(2σ_1²))(2μσ_1²ϕ_1 + σ_1⁴ϕ_1²). Thus, we have f(y | x, δ = 0) = N(μ + σ_1²ϕ_1, σ_1²) and P(δ = 1 | x) = {1 + exp(ϕ_0 + ϕ_0*)}^{−1}.
Since x is completely observed, we can draw a sample of size n = 500 from the mixture distribution f(y | x). A similar approach can be applied under the heteroscedastic assumption.
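As a sketch of this data-generating scheme (homoscedastic case (a), with the sign convention reconstructed above and the second argument of N(·,·) read as a variance), the following Python code first draws x, computes P(δ = 1 | x), and then draws y from the appropriate mixture component; the function name and the default value of σ_1 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_sim1(n=500, phi0=0.8, phi1=0.2, sigma1=np.sqrt(0.9)):
    """Simulation 1, homoscedastic case: y | x, delta = 1 ~ N(mu(x), sigma1^2),
    y | x, delta = 0 is the exponentially tilted N(mu(x) + sigma1^2*phi1, sigma1^2)."""
    x = rng.normal(0.0, np.sqrt(0.5), n)
    mu = 1.0 + x
    phi0_star = (2 * mu * sigma1**2 * phi1 + sigma1**4 * phi1**2) / (2 * sigma1**2)
    p_obs = 1.0 / (1.0 + np.exp(phi0 + phi0_star))           # P(delta = 1 | x)
    delta = rng.binomial(1, p_obs)
    y = np.where(delta == 1,
                 rng.normal(mu, sigma1),                      # observed-data component
                 rng.normal(mu + sigma1**2 * phi1, sigma1))   # tilted missing component
    y_obs = np.where(delta == 1, y, np.nan)                   # y is recorded only when observed
    return x, y_obs, delta
```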
It should be noted that the response variable y originates from a distribution with a complex mixture form. As a result, the true values of the parameters (θ_0, θ_1) are not available in closed form, which makes it difficult to assess the performance of the estimation methods using conventional measures such as the bias or the root mean squared error (RMSE). Consequently, we introduce the following approximate relative evaluation metric:
ARE(Method, Full) = ARMSE(Method)/ARMSE(Full),
where ARMSE(Method) = [SD(Method)² + {Mean(Method) − Mean(Full)}²]^{1/2}.
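Computed over Monte Carlo replicates of a single coefficient estimate, this metric is straightforward to evaluate; a minimal helper is sketched below (the square root in ARMSE follows the reconstruction above).

```python
import numpy as np

def are(est_method, est_full):
    """ARE(Method, Full) from arrays of Monte Carlo estimates of one coefficient."""
    ref = np.mean(est_full)
    armse = lambda e: np.sqrt(np.var(e, ddof=1) + (np.mean(e) - ref) ** 2)
    return armse(est_method) / armse(est_full)
```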
Table 1 and Table 2 summarize the mean and variance of the five coefficient estimates at different quantiles based on 1000 Monte Carlo simulations under the homoscedastic case (a) and heteroscedastic case (b) of f ( y | x , δ = 1 ) . From the estimation results, it can be observed that the coefficient estimates based on complete observations have larger bias compared to the other estimation methods. When the working model f ( y | x , δ = 1 ; β ) was correctly specified, the proposed imputation estimates yielded smaller variances compared to the IPW estimates. In this case, the performance of the AIPW estimates was similar to the IPW estimates. Comparing the results at different quantiles, it can be seen that the variances of the five estimation methods at the 0.5 quantile were smaller than those at the 0.25 and 0.75 quantiles, which is due to the larger sample size at the central quantile compared to the tails. Under the homoscedastic assumption, the variances of the estimates at the 0.25 and 0.75 quantiles were similar. Under the existing missing mechanism, as the value of the response variable y increased, the missing propensity also increased, thereby indicating higher missing rates at the upper quantiles. Consequently, the estimation variances of the IPW and AIPW estimates were higher at the high quantile of τ = 0.75 compared to the low quantile of τ = 0.25 . However, proper imputation could greatly improve the estimation efficiency at the high quantile of τ = 0.75 . This improvement was more pronounced under the heteroscedastic model. These results demonstrate that the imputation estimates are nearly unbiased when the working model is correctly specified and have higher estimation efficiency compared to the IPW and AIPW estimates.

4.2. Simulation 2: Misspecification of the Working Model

In practical situations, the true data-generating mechanism is unknown, and it is challenging to accurately specify the working model f(y | x, δ = 1; β) for the observed data. In this study, we investigated the finite-sample properties of the proposed imputation and AIPW estimators under misspecification of the working model. The simulation model includes two covariates, X_1 ~ N(0, 1) and X_2 ~ Exp(0.2). Given the covariates, the response variable Y is generated as follows:
Y = 1 + X_1 + X_2 + 0.25(2 + X_2)ε,
where ε ~ N(0, 1), and X_1, X_2, and ε are mutually independent.
In this simulation setup, the error term of Y is heteroscedastic. The missing data mechanism for Y is nonrandom and follows
P(R = 1 | X, Y) = 1/{1 + exp(0.1 + 0.5X_1² + 0.15Y)}.
The average observed rate in the model was approximately 73%, and X_2 served as an instrumental variable. We generated a random sample of size n = 500, denoted {(X_i, Y_i, δ_i): i = 1, …, n}. For the aforementioned simulation model, we consider the following quantile regression model:
Q_τ(Y | X) = (1, X^⊤)θ_0(τ),
where θ_0(τ) = (1 + 0.5Q_τ(ε), 1, 1 + 0.25Q_τ(ε))^⊤.
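A data-generating sketch for this design is given below; it follows the displayed formulas literally, and it assumes that Exp(0.2) denotes an exponential distribution with rate 0.2 (mean 5), which is our reading rather than something stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_sim2(n=500):
    """Simulation 2: heteroscedastic response with nonignorable missingness in Y."""
    X1 = rng.normal(0.0, 1.0, n)
    X2 = rng.exponential(scale=1.0 / 0.2, size=n)        # Exp(0.2): rate 0.2, mean 5
    eps = rng.normal(0.0, 1.0, n)
    Y = 1 + X1 + X2 + 0.25 * (2 + X2) * eps
    p_obs = 1.0 / (1.0 + np.exp(0.1 + 0.5 * X1**2 + 0.15 * Y))   # P(R = 1 | X, Y)
    R = rng.binomial(1, p_obs)
    return X1, X2, np.where(R == 1, Y, np.nan), R        # Y recorded only when R = 1
```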
Under the aforementioned data-generating mechanism, obtaining an explicit expression for f(Y | X, δ = 1; β) is challenging, and the working model must be specified based on the observed data. In this simulation, we consider two candidate working models: (1) N(μ̂(X), σ̂²) and (2) N(μ̂(X), (0.5 + 0.25X_2)²). Figure 1 and Figure 2 illustrate that the residual distribution under working model (1) was markedly peaked, violating the normality assumption and indicating model misspecification. In contrast, working model (2) correctly specified the variance structure.
The estimation results of the five types of quantile regression estimates obtained from 1000 random simulations at different quantiles are summarized in Table 3, Table 4 and Table 5. These tables include two types of imputation estimates based on the parameter working models (1) and (2), as well as the corresponding AIPW estimates based on the parameter working models (1) and (2), and the combined estimation equations. The results show that the imputation estimates based on the erroneously specified working model (1) exhibited significant estimation bias. On the other hand, although the working model (2) was also misspecified, it took into account the heteroscedasticity in the conditional distribution of the response variable, thus resulting in smaller estimation bias compared to model (1) and better estimation performance. These findings highlight the sensitivity of imputation methods to misspecified working models. Across the three quantiles, the IPW estimates performed well, thus indicating the robustness of the semiparametric response assumption. Even in the presence of misspecified parameter working models, both of the AIPW estimates had similar median absolute deviations to IPW, which were significantly smaller than the misspecified imputation estimates, thereby demonstrating the robustness of the AIPW estimation. Comparing the two AIPW estimates, it is observed that the estimate based on the correctly specified parameter working model had smaller estimation bias and higher estimation efficiency.

5. Real Data Application

We applied our proposed method to the data of 2139 HIV-infected patients enrolled in the ACTG175 study [27]. The ACTG175 study evaluated the efficacy of monotherapy or combination therapy in HIV-infected patients with CD4 cell counts between 200 and 500 cells/mm³. Following the studies by Davidian et al. [28] and Zhang et al. [29], we categorized all the treatment regimens into two groups. The first group consisted of the standard zidovudine (ZDV) monotherapy arm, while the second group included three newer treatment arms: ZDV combined with didanosine (ddI), ZDV combined with zalcitabine (ddC), and ddI monotherapy. The first group comprised 532 subjects, while the second group comprised 1607 subjects. We investigated the effect of the treatment arm (trt, 0 = ZDV monotherapy only) on the τ quantile of the CD4 cell count measured at 96 ± 5 weeks (CD4_96), adjusting for the baseline CD4 cell count (CD4_0) and other baseline covariates, including age, weight, race (0 = Caucasian), gender (0 = female), history of reverse transcriptase inhibitor use (0 = no), and whether the subject discontinued treatment before 96 weeks (offtrt, 0 = no).
Consider fitting a linear quantile regression model as follows:
Q_τ(CD4_96 | X) = β_1(τ) + β_2(τ)·trt + β_3(τ)·CD4_0 + β_4(τ)·age + β_5(τ)·weight + β_6(τ)·race + β_7(τ)·gender + β_8(τ)·history + β_9(τ)·offtrt.
The dataset used in this study is sourced from the R package “speff2trial”. The study population consists of 1522 Caucasian individuals and 617 non-Caucasian individuals, with 1771 males and 368 females. The average age of the participants is 35 years, with a standard deviation of 8.7 years. Among the participants, 1253 individuals had a history of antiretroviral therapy, and 776 individuals discontinued treatment before the 96th week.
Due to attrition during the study period, approximately 37% of the participants have missing values for the variable CD4_96. Although complete measurements of other variables related to CD4_96, such as the baseline CD4 and CD8 cell counts (CD4_0 and CD8_0) as well as the CD4 and CD8 cell counts at 20 ± 5 weeks (CD4_20 and CD8_20), were obtained at baseline and follow-up visits, these variables may not fully explain the propensity for participants to drop out. In other words, we cannot assume that the missingness of CD4_96 is random. Therefore, in our analysis, we consider a more comprehensive semiparametric nonrandom missingness mechanism:
P(R = 1 | X, s, Y) = π(s, Y) = 1/{1 + exp(g(s) + γY)},
where s represents the set of variables associated with attrition, and g ( s ) is a function capturing the relationship between these variables and the missingness indicator R.
Figure 3 displays the histograms of the observed CD4_96 and its logarithm. From the figure, it can be observed that the conditional distribution f(y | X, R = 1) of the observed CD4_96 is right-skewed. However, the logarithmic transformation did not result in improved symmetry, thus indicating that the normality assumption does not hold. In our analysis, we therefore assume that CD4_96 follows a truncated normal distribution with left truncation at 0, where its mean is primarily determined by the eight covariates and three auxiliary variables.
The parameters β in the working model f(y | X, R = 1; β) are estimated using the truncated regression model in the R package “truncreg”. The parameter γ in π(s, Y) is estimated using the profile generalized method of moments (GMM).
Figure 4 and Figure 5 illustrate the normality properties of the residuals from the truncated regression working model. Visually, the distribution of residuals appears to be symmetric. The calculated sample skewness is 0.05, thus indicating a slight deviation from perfect symmetry. The Q-Q plot reveals that the distribution of residuals has a kurtosis less than 3. Further computation reveals a kurtosis of 2.11, thus indicating that the residual distribution is flatter than a standard normal distribution.
Table 6 presents the estimates of the quantile regression coefficients and the corresponding 95% confidence intervals at the quantile levels τ = 0.25, 0.5, 0.75. The four estimation methods considered are complete case (CC) estimation, inverse probability weighting (IPW) estimation, multiple imputation (MI) estimation, and augmented inverse probability weighting (AIPW) estimation. The MI estimate averages over L = 20 randomly generated sets of imputed values. Confidence intervals for the coefficient estimates were obtained using the bootstrap method with B = 200 resampling iterations.
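For readers who wish to reproduce the simplest of these fits, the sketch below runs the complete-case quantile regression with bootstrap percentile confidence intervals in Python; it is only the CC baseline, not the paper's IPW, MI, or AIPW procedures, and the column names (cd496, treat, cd40, age, wtkg, race, gender, str2, offtrt) are taken from the speff2trial::ACTG175 documentation as an assumption.

```python
import numpy as np
import statsmodels.api as sm

def cc_quantreg_boot(df, tau=0.5, B=200, seed=0):
    """Complete-case tau-quantile regression of cd496 (pandas DataFrame df) with bootstrap 95% CIs."""
    rng = np.random.default_rng(seed)
    cols = ["treat", "cd40", "age", "wtkg", "race", "gender", "str2", "offtrt"]
    cc = df.dropna(subset=["cd496"])                      # keep complete cases only
    fit = sm.QuantReg(cc["cd496"], sm.add_constant(cc[cols])).fit(q=tau)
    boot = np.empty((B, len(cols) + 1))
    for b in range(B):
        idx = rng.integers(0, len(cc), len(cc))           # resample rows with replacement
        samp = cc.iloc[idx]
        boot[b] = sm.QuantReg(samp["cd496"], sm.add_constant(samp[cols])).fit(q=tau).params
    lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
    return fit.params, lo, hi
```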
From Table 6, it can be observed that for the three given quantile levels and four estimation methods, patients receiving the three new combined treatment methods had significantly higher CD4 cell counts at 96 ± 5 weeks compared to the traditional treatment method. In other words, the new treatment methods had significantly slowed down the progression of AIDS compared to the traditional method. Comparing the four estimation methods, it is evident that the complete case estimation overestimated the performance of the treatment group. The results of the IPW estimation and AIPW estimation were similar and higher than the imputation estimation. When comparing the treatment effects at different quantile levels, both the IPW estimation and imputation estimation reflected a decreasing trend in treatment effect from the 0.25th to the 0.75th quantile. Although the AIPW estimation and complete case estimation did not show a similar trend, the coefficient estimates of the AIPW estimation also indicate a more significant improvement in treatment effect for patients at lower quantiles.
Upon examining the effects of the other covariates, it is found that for all four estimation methods, the baseline CD4 level CD4_0 had a positive impact on the CD4 cell count at 96 ± 5 weeks, while patients with a history of antiretroviral therapy or early treatment discontinuation exhibited poorer CD4 cell levels at 96 ± 5 weeks. In comparison to the covariates directly related to disease progression mentioned above, the effects of age, weight, race, gender, and the other covariates on the CD4 cell count at 96 ± 5 weeks were minimal, and the directions and significance of their estimated effects were not consistent across methods. Therefore, although these variables needed to be considered in the modeling process, conclusions regarding their effects should be drawn with caution.

6. Discussion

In this study, we address the bias in quantile regression estimates by constructing imputation and AIPW estimation equations, both of which involve the estimation of conditional means under nonrandom missingness. Many existing methods rely on kernel regression to estimate these conditional means. However, nonparametric estimation methods may suffer from the curse of dimensionality when the dimension of the covariates is high. Paik and Larsen [19] proposed using importance resampling to obtain Monte Carlo estimates of conditional means, and Song et al. [21] further applied this method to estimation equations. In this study, we extend these methods to quantile regression and overcome the theoretical and computational challenges caused by the nonsmoothness of the check function in classical quantile regression by employing convolution smoothing.
Common parameter working models are based on linear regression for observed data. Song et al.’s [21] simulation results showed that model misspecification does not lead to estimation bias. However, their simulation study was based on a regression model that satisfied the Gauss–Markov assumption, with missing response variables following a normal distribution with homoscedasticity concerning the covariates. Misspecification was reflected in the estimation of the mean or location variables. However, the advantages of quantile regression are more evident in situations involving skewness, heavy tails, and heteroscedasticity. In this study, our simulation results under heteroscedasticity showed that imputation estimation based on the assumption of a linear regression working model leads to significant estimation bias, while the AIPW estimation equation can mitigate the impact of model misspecification. We also provide theoretical proof of the consistency of AIPW estimation.
Our simulation results demonstrate that, under the ideal scenario of correctly specified parameter working models, the imputation estimator is more efficient than the IPW and AIPW estimators. The AIPW estimator based on the correctly specified model was found to be more efficient than that based on the misspecified model. Therefore, in practical applications, it is crucial to appropriately specify the parameter working model based on the observed data. Fortunately, the effectiveness of the model specification can be assessed using various methods such as Q-Q plots and histograms. For the observed response conditional distributions that do not conform to the linear regression assumption, a Box–Cox transformation can be applied to approximate a normal parameter working model. If such a parameter working model is difficult to obtain, the AIPW estimator proposed in this study can still provide relatively reliable estimates. This is because the proposed response mechanism model is semiparametric and offers certain flexibility. However, the response model constructed in this study does not consider the interaction effects between covariates X and the response variable Y or the potential nonlinear effects of the response variable Y on the missingness propensity.

Author Contributions

The first two authors contributed equally to this work. J.G.: Methodology, software, validation, data curation, writing; F.L.: visualization, data curation, review; W.K.H.: writing—review and editing; X.Z.: supervision, validation; K.W.: formal analysis; T.Z.: investigation; L.Y.: resources; M.T.: Conceptualization, project administration, funding acquisition, and the corresponding author. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (22XNL016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The researchers can download the ACTG175 dataset from the R package “speff2trial”.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SIR   Sampling Importance Resampling
IPW   Inverse Probability Weighting
AIPW  Augmented Inverse Probability Weighting
EEI   Estimation Equation Imputation

Appendix A

Appendix A.1. Proof of Lemma 1

E{ψ_h(Z_i, Y_i; θ_0)} = E[Z_i{G_h(Z_i^⊤θ_0 − Y_i) − τ}] = E[Z_i{G(−ε_i/h) − τ}]
 = E[Z_i{∫_{u<−ε_i/h} K(u) du − τ}]
 = E[Z_i E{∫ I(ε_i < −hu) K(u) du − τ | Z_i}]
 = E[Z_i{∫ F(−hu | Z_i) K(u) du − τ}].
By assumption (C7), we can utilize a Taylor expansion, thus yielding
∫ F(−hu | Z_i) K(u) du = F(0 | Z_i) + ∑_{k=1}^{r} F^{(k)}(0 | Z_i)(−h)^k ∫ (u^k/k!) K(u) du + ∫ F^{(r+1)}(h̃u | Z_i)(−h)^{r+1}(u^{r+1}/(r+1)!) K(u) du
 = F(0 | Z_i) + F^{(r)}(0 | Z_i)(−h)^r ∫ (u^r/r!) K(u) du + ∫ F^{(r+1)}(h̃u | Z_i)(−h)^{r+1}(u^{r+1}/(r+1)!) K(u) du
 = τ + f^{(r−1)}(0 | Z_i)(−h)^r ∫ (u^r/r!) K(u) du + ∫ f^{(r)}(h̃u | Z_i)(−h)^{r+1}(u^{r+1}/(r+1)!) K(u) du,  |h̃| ≤ h.
Thus, we have
E[Z_i{G_h(Z_i^⊤θ_0 − Y_i) − τ}] = ((−h)^r/r!) E{Z_i f^{(r−1)}(0 | Z_i)} ∫ u^r K(u) du + E[Z_i ∫ f^{(r)}(h̃u | Z_i) u^r K(u) du] · O(h^{r+1}).
By assumption (C6), we have
E‖Z_i ∫ f^{(r)}(h̃u | Z_i) u^r K(u) du‖ ≤ E{C(Z)‖Z‖} ∫ |u^r| K(u) du = O(1).
Therefore,
E{ψ_h(Z_i, Y_i; θ_0)} = ((−h)^r/r!) C_K E[Z_i f^{(r−1)}(0 | Z_i)] + o(h^r) = O(h^r).
Similarly, we have
E{m_{ψ_h0}(Z_i; θ_0)} = E[E{ψ_h(Z_i, Y_i; θ_0) | Z_i, δ_i = 0}]
 = E[E{Z_i(G_h(Z_i^⊤θ_0 − Y_i) − τ) | Z_i, δ_i = 0}]
 = E[E{Z_i(G(−ε_i/h) − τ) | Z_i, δ_i = 0}]
 = E[E{Z_i(∫ I(ε_i < −hu) K(u) du − τ) | Z_i, δ_i = 0}]
 = E[E{Z_i(∫ F(−hu | Z_i, δ_i = 0) K(u) du − τ) | Z_i, δ_i = 0}].
Based on the assumptions and a Taylor expansion, we have
∫ F(−hu | Z_i, δ_i = 0) K(u) du = F(0 | Z_i, δ_i = 0) + ∑_{k=1}^{r} F^{(k)}(0 | Z_i, δ_i = 0)(−h)^k ∫ (u^k/k!) K(u) du + ∫ F^{(r+1)}(h̃u | Z_i, δ_i = 0)(−h)^{r+1}(u^{r+1}/(r+1)!) K(u) du
 = F(0 | Z_i, δ_i = 0) + f^{(r−1)}(0 | Z_i, δ_i = 0)(−h)^r ∫ (u^r/r!) K(u) du + ∫ f^{(r)}(h̃u | Z_i, δ_i = 0)(−h)^{r+1}(u^{r+1}/(r+1)!) K(u) du,  |h̃| ≤ h.
Notice that
E[E{Z_i(F(0 | Z_i, δ_i = 0) − τ) | Z_i, δ_i = 0}] = E[E{Z_i(I(Y_i < Z_i^⊤θ_0) − τ) | Z_i, δ_i = 0}] = E{m_{ψ0}(Z_i; θ_0)},
which implies
E{m_{ψ_h0}(Z_i; θ_0)} = E[E{ψ_h(Z_i, Y_i; θ_0) | Z_i, δ_i = 0}]
 = E{m_{ψ0}(Z_i; θ_0)} + ((−h)^r/r!) E{Z_i f^{(r−1)}(0 | Z_i, δ_i = 0)} ∫ u^r K(u) du + E[Z_i ∫ f^{(r)}(h̃u | Z_i, δ_i = 0) u^r K(u) du] · O(h^{r+1}).
Under the assumed conditions, we have
E‖Z_i ∫ f^{(r)}(h̃u | Z_i, δ_i = 0) u^r K(u) du‖ ≤ E{C(Z)‖Z‖} ∫ |u^r| K(u) du = O(1),
which implies
E{m_{ψ_h0}(Z_i; θ_0)} = E{m_{ψ0}(Z_i; θ_0)} + ((−h)^r/r!) C_K E[Z_i f^{(r−1)}(0 | Z_i, δ_i = 0)] + o(h^r) = E{m_{ψ0}(Z_i; θ_0)} + O(h^r).

Appendix A.2. Proof of Lemma 2

To prove (1), we start from the decomposition
(1/√n) ∑_{i=1}^n ψ̂_{hi}^{(1)}(θ_0, β̂, γ̂) = (1/√n) ∑_{i=1}^n {δ_i ψ_h(Z_i, Y_i; θ_0) + (1 − δ_i) m̂_{ψ_h0}(Z_i; θ_0, β̂, γ̂)}
 = (1/√n) ∑_{i=1}^n {δ_i ψ_h(Z_i, Y_i; θ_0) + (1 − δ_i) m_{ψ_h0}(Z_i; θ_0, β_0, γ_0)}
  + (1/√n) ∑_{i=1}^n (1 − δ_i){m̂_{ψ_h0}(Z_i; θ_0, β_0, γ_0) − m_{ψ_h0}(Z_i; θ_0, β_0, γ_0)}
  + (1/√n) ∑_{i=1}^n (1 − δ_i){m̂_{ψ_h0}(Z_i; θ_0, β̂, γ̂) − m̂_{ψ_h0}(Z_i; θ_0, β_0, γ_0)}
 = (1/√n) ∑_{i=1}^n {δ_i ψ_h(Z_i, Y_i; θ_0) + (1 − δ_i) m_{ψ_h0}(Z_i; θ_0, β_0, γ_0)}
  + (1/√n) ∑_{i=1}^n (1 − δ_i) ∂m̂_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂β (β̂ − β_0)
  + (1/√n) ∑_{i=1}^n (1 − δ_i) ∂m̂_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂γ (γ̂ − γ_0) + o_p(1)
 =: (1/√n) ∑_{i=1}^n I_{i1} + (1/√n) ∑_{i=1}^n I_{i2} + (1/√n) ∑_{i=1}^n I_{i3} + o_p(1).
Based on the fact that E{ψ_h(Z_i, Y_i; θ_0)} = O(h^r) and
E{m_{ψ_h0}(Z_i; θ_0, β_0, γ_0)} = E{m_{ψ0}(Z_i; θ_0, β_0, γ_0)} + O(h^r),
we have
E(I_{i1}) = E{ψ_h(Z_i, Y_i; θ_0)} = O(h^r).
Additionally, we have
E(I_{i1}^{⊗2}) = E[Z_iZ_i^⊤{δ_i(G_h(Z_i^⊤θ_0 − Y_i) − τ) + (1 − δ_i)(E{G_h(Z_i^⊤θ_0 − Y_i) | Z_i, δ_i = 0} − τ)}²].
According to the assumptions, as n → ∞,
lim_{n→∞, nh^{2r}→0} E(I_{i1}^{⊗2}) = E[{δ_i ψ(Z_i, Y_i; θ_0) + (1 − δ_i) m_{ψ0}(Z_i; θ_0)}^{⊗2}]
 = E[π(X_i, Y_i) ψ(Z_i, Y_i; θ_0)^{⊗2} + {1 − π(X_i, Y_i)} m_{ψ0}(Z_i; θ_0)^{⊗2}] := A_1,
thus yielding
(1/√n) ∑_{i=1}^n I_{i1} →_d N(0, A_1).
For I_{i2}, we have
(1/√n) ∑_{i=1}^n I_{i2} = {(1/n) ∑_{i=1}^n (1 − δ_i) ∂m̂_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂β} √n(β̂ − β_0)
 = {(1/n) ∑_{i=1}^n (1 − δ_i) ∂m_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂β} √n(β̂ − β_0) + o_p(1),
where
lim_{n→∞, nh^{2r}→0} E{(1 − δ_i) ∂m_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂β} = E{(1 − δ_i) Cov_0(ψ(Z_i, Y_i; θ_0), s(Z_i, Y_i; β_0) | Z_i)} + o_p(1).
Let H_1 = E{(1 − δ_i) Cov_0(ψ(Z_i, Y_i; θ_0), s(Z_i, Y_i; β_0) | Z_i)}. By assumption, β̂ − β_0 = O_p(n^{−1/2}) and √n(β̂ − β_0) →_d N(0, Σ). Therefore, as n → ∞,
(1/√n) ∑_{i=1}^n I_{i2} →_d N(0, H_1ΣH_1^⊤).
Similarly, let H_2 = E{(1 − δ_i) Cov_0(ψ(Z_i, Y_i; θ_0), Y_i | Z_i)}. According to Shao and Wang [5], γ̂ − γ_0 = O_p(n^{−1/2}) and √n(γ̂ − γ_0) →_d N(0, σ²). Thus, as n → ∞,
(1/√n) ∑_{i=1}^n I_{i3} →_d N(0, σ²H_2H_2^⊤).
It can be shown that E(I_{i1} + I_{i2}) = o(1). We have
{(1/√n) ∑_{i=1}^n I_{i1}}{(1/√n) ∑_{i=1}^n I_{i2}}^⊤ = (1/n) ∑_{i=1}^n I_{i1}I_{i2}^⊤ + (1/n) ∑_{i≠j} I_{i1}I_{j2}^⊤,
where
(1/n) ∑_{i=1}^n I_{i1}I_{i2}^⊤ = (1/n) ∑_{i=1}^n ψ_{hi}^{(1)}(θ_0, β_0, γ_0){∂m̂_{ψ_h0}(Z_i; θ_0, β*, γ*)/∂β (β̂ − β_0)}^⊤ = o_p(1),
(1/n) ∑_{i≠j} I_{i1}I_{j2}^⊤ = (1/n) ∑_{i≠j} ψ_{hi}^{(1)}(θ_0, β_0, γ_0){∂m_{ψ_h0}(Z_j; θ_0, β_0, γ_0)/∂β (β̂ − β_0)}^⊤ + o_p(n^{−1/2}).
For i ≠ j, ψ_{hi}^{(1)}(θ_0, β_0, γ_0) and ∂m_{ψ_h0}(Z_j; θ_0, β_0, γ_0)/∂β are independent; therefore,
E{ψ_{hi}^{(1)}(θ_0, β_0, γ_0)} = O(h^r),  ∂m_{ψ_h0}(Z_j; θ_0, β_0, γ_0)/∂β (β̂ − β_0) = O_p(n^{−1/2}).
By assumption, we have (1/n) ∑_{i≠j} I_{i1}I_{j2}^⊤ = (n − 1)·O(h^r)·O_p(n^{−1/2}) = O_p(n^{1/2}h^r) = o_p(1). Hence, we can conclude that {(1/√n) ∑_{i=1}^n I_{i1}}{(1/√n) ∑_{i=1}^n I_{i2}}^⊤ = o_p(1), which implies
Cov{(1/√n) ∑_{i=1}^n I_{i1}, (1/√n) ∑_{i=1}^n I_{i2}} = o(1).
Similarly, we can show that Cov{(1/√n) ∑_{i=1}^n I_{i1}, (1/√n) ∑_{i=1}^n I_{i3}} = o(1). Consequently,
Cov{(1/√n) ∑_{i=1}^n I_{i1}, (1/√n) ∑_{i=1}^n (I_{i2} + I_{i3})} = o(1).
To establish the joint asymptotic properties of (1/√n) ∑_{i=1}^n I_{i2} and (1/√n) ∑_{i=1}^n I_{i3}, we employ Taylor expansions, thus yielding the following:
√n(γ̂ − γ_0) = (1/√n) ∑_{i=1}^n Φ_i(γ_0) + o_p(1),  √n(β̂ − β_0) = (1/√n) ∑_{i=1}^n Ψ_i(β_0) + o_p(1).
Consequently, as n → ∞, we have
(1/√n) ∑_{i=1}^n (I_{i2} + I_{i3}) →_d N(0, B_1),
where B_1 := Var(H_1Ψ(β_0) + H_2Φ(γ_0)). Furthermore, we obtain
(1/√n) ∑_{i=1}^n ψ̂_{hi}^{(1)}(θ_0, β̂, γ̂) →_d N(0, T_1),
where T_1 = A_1 + B_1.
To investigate the asymptotic properties of (1/√n) ∑_{i=1}^n ψ̂_{hi}^{(2)}(θ_0, β̂, γ̂), we write
(1/√n) ∑_{i=1}^n ψ̂_{hi}^{(2)}(θ_0, β̂, γ̂) = (1/√n) ∑_{i=1}^n ψ_{hi}^{(2)}(θ_0, β_0, γ_0)
 + (1/√n) ∑_{i=1}^n {δ_i/π(X_i, Y_i; ĝ_{γ_0}, γ_0) − δ_i/π(X_i, Y_i)}{ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)}
 + (1/√n) ∑_{i=1}^n {δ_i/π(X_i, Y_i; ĝ_{γ̂}, γ̂) − δ_i/π(X_i, Y_i; ĝ_{γ_0}, γ_0)}{ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)} + o_p(1)
 =: (1/√n) ∑_{i=1}^n L_{i1} + (1/√n) ∑_{i=1}^n L_{i2} + (1/√n) ∑_{i=1}^n L_{i3} + o_p(1).
Similar to the previous proof, as n → ∞, we have
(1/√n) ∑_{i=1}^n L_{i1} →_d N(0, A_2),
where A_2 = E[π(X_i, Y_i)^{−1}{ψ(Z_i, Y_i; θ_0) − m_{ψ0}(Z_i; θ_0)}^{⊗2}] + E{m_{ψ0}(Z_i; θ_0)^{⊗2}}.
To analyze the asymptotic behavior of (1/√n) ∑_{i=1}^n L_{i2}, we have
(1/√n) ∑_{i=1}^n L_{i2} = (1/√n) ∑_{i=1}^n {1 − δ_i/π(X_i, Y_i)} E[{ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)} | X_i, δ_i = 0] + o_p(1) = o_p(1).
Accordingly, (1/√n) ∑_{i=1}^n L_{i2} converges to zero in probability, i.e., (1/√n) ∑_{i=1}^n L_{i2} = o_p(1).
To establish the asymptotic properties of (1/√n) ∑_{i=1}^n L_{i3}, we have
(1/√n) ∑_{i=1}^n L_{i3} = {(1/n) ∑_{i=1}^n δ_i ∂π^{−1}(X_i, Y_i; ĝ_γ, γ)/∂γ {ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)}} √n(γ̂ − γ_0)
 = {(1/n) ∑_{i=1}^n δ_i {ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)} exp(ĝ_{γ_0}(X_i) + γ_0Y_i){Y_i − m̂_{Y0}(X_i; γ_0)}} √n(γ̂ − γ_0)
 = {(1/n) ∑_{i=1}^n δ_i O(X_i, Y_i){ψ_h(Z_i, Y_i; θ_0) − m_{ψ_h0}(Z_i; θ_0)}{Y_i − m_{Y0}(X_i; γ_0)}} √n(γ̂ − γ_0) + o_p(1)
 =: H_{3n} √n(γ̂ − γ_0) + o_p(1),
where H_3 = E[(1 − δ)(Y − m_{Y0}(X))(ψ(Z, Y; θ_0) − m_{ψ0}(Z; θ_0))].
According to Slutsky's theorem, we have
(1/√n) ∑_{i=1}^n L_{i3} →_d N(0, B_2),
where B_2 = Var(H_3Φ(γ_0)). Similarly to the previous proof, it can be shown that
Cov{(1/√n) ∑_{i=1}^n L_{i1}, (1/√n) ∑_{i=1}^n L_{i3}} = o_p(1),
which implies that, as n → ∞,
(1/√n) ∑_{i=1}^n ψ̂_{hi}^{(2)}(θ_0, β̂, γ̂) →_d N(0, T_2),
where T_2 = A_2 + B_2.
To prove (2), we first establish the asymptotic property of 1 n i = 1 n ψ ^ h i ( 1 ) ( θ 0 , β ^ , γ ^ ) 2 . By the law of large numbers and the fact that γ ^ γ 0 = o p ( 1 ) and β ^ β 0 = o p ( 1 ) , we have
1 n i = 1 n ψ ^ h i ( 1 ) ( θ 0 , β ^ , γ ^ ) 2 = 1 n i = 1 n ψ h i ( 1 ) ( θ 0 , β 0 , γ 0 ) 2 + o p ( 1 ) .
As n , under the assumption, we have
lim n h 2 r 0 E δ i ψ h ( Z i , Y i ; θ 0 ) + ( 1 δ i ) m ψ h 0 ( Z i ; θ 0 ) 2 = lim n h 2 r 0 E δ i ψ h ( Z i , Y i ; θ 0 ) 2 + ( 1 δ i ) m ψ h 0 ( Z i ; θ 0 ) 2 + lim n h 2 r 0 E 2 δ i ( 1 δ i ) ψ h ( Z i , Y i ; θ 0 ) m ψ h 0 ( Z i ; θ 0 ) = E δ i ψ ( Z i , Y i ; θ 0 ) 2 + ( 1 δ i ) m ψ 0 ( Z i ; θ 0 ) 2 + 2 δ i ( 1 δ i ) ψ ( Z i , Y i ; θ 0 ) m ψ 0 ( Z i ; θ 0 ) = E δ i ψ ( Z i , Y i ; θ 0 ) 2 + ( 1 δ i ) m ψ 0 ( Z i ; θ 0 ) 2 : = A 1 .
Therefore,
1 n i = 1 n ψ ^ h i ( 1 ) ( θ 0 , β ^ , γ ^ ) 2 p A 1 .
Similarly, for $\frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(2)}(\theta_0, \hat\beta, \hat\gamma)\}^{2}$, we have
$$ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(2)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} = \frac{1}{n} \sum_{i=1}^{n} \{\psi_{hi}^{(2)}(\theta_0, \beta_0, \gamma_0)\}^{2} + o_p(1). $$
As $n \to \infty$, under the stated assumptions, we have
$$ \begin{aligned} \lim_{n h^{2r} \to 0} E\Big[ \frac{\delta_i}{\pi(X_i, Y_i)} \{ \psi_h(Z_i, Y_i; \theta_0) - m_{\psi h 0}(Z_i; \theta_0) \} + m_{\psi h 0}(Z_i; \theta_0) \Big]^{2} &= \lim_{n h^{2r} \to 0} E\Big[ \frac{\delta_i}{\pi(X_i, Y_i)^{2}} \{ \psi_h(Z_i, Y_i; \theta_0) - m_{\psi h 0}(Z_i; \theta_0) \}^{2} + \{ m_{\psi h 0}(Z_i; \theta_0) \}^{2} \Big] \\ &= E\Big[ \pi(X_i, Y_i)^{-1} \{ \psi(Z_i, Y_i; \theta_0) - m_{\psi 0}(Z_i; \theta_0) \}^{2} + \{ m_{\psi 0}(Z_i; \theta_0) \}^{2} \Big] = A_2. \end{aligned} $$
Therefore, we have
$$ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(2)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \xrightarrow{p} A_2. $$
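As an editorial aside, the quantities appearing in results (1) and (2) of Lemma 2 are straightforward to compute from data. The sketch below assumes the smoothed score takes the common form $\psi_h(Z, Y; \theta) = Z\{G_h(Z^\top\theta - Y) - \tau\}$ with an integrated Gaussian kernel, and that plug-in values $\hat\pi(X_i, Y_i)$ and $\hat m_{\psi 0}(Z_i; \theta)$ from the working models are supplied by the user; the function names, the kernel choice, and the array layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def psi_h(Z, Y, theta, tau, h):
    """Smoothed quantile score psi_h(Z, Y; theta) = Z * {G_h(Z'theta - Y) - tau}.
    G_h is the integrated Gaussian kernel; as h -> 0 it approaches I(Y <= Z'theta)."""
    G = norm.cdf((Z @ theta - Y) / h)           # smoothed indicator, shape (n,)
    return Z * (G - tau)[:, None]               # n x p matrix of scores

def psi_eei(Z, Y, delta, m_hat, theta, tau, h):
    """EEI-type function delta * psi_h + (1 - delta) * m_hat (cf. the limit defining A_1).
    Missing Y entries should be set to any finite value; they are multiplied by delta = 0."""
    return delta[:, None] * psi_h(Z, Y, theta, tau, h) + (1 - delta)[:, None] * m_hat

def psi_aipw(Z, Y, delta, pi_hat, m_hat, theta, tau, h):
    """AIPW-type function (delta/pi_hat) * (psi_h - m_hat) + m_hat (cf. the limit defining A_2)."""
    w = (delta / pi_hat)[:, None]
    return w * (psi_h(Z, Y, theta, tau, h) - m_hat) + m_hat

def second_moment(psi_mat):
    """Sample second moment (1/n) * sum_i psi_i psi_i', the plug-in analogue of A_l."""
    return psi_mat.T @ psi_mat / psi_mat.shape[0]
```

Applying `second_moment` to either estimating-function matrix gives the sample analogue whose limits are $A_1$ and $A_2$ above.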
Next, we prove (3). Note that, for $l = 1, 2$, we have
$$ \begin{aligned} E\left[ \frac{\partial \psi_{hi}^{(l)}(\theta_0, \beta_0, \gamma_0)}{\partial \theta^\top} \right] &= E\left[ \frac{\partial \psi_{h}(Z_i, Y_i; \theta)}{\partial \theta^\top} \right] \bigg|_{\theta = \theta_0} = \frac{\partial}{\partial \theta^\top} E\big[ Z_i\, G_h(Z_i^\top \theta - Y_i) \big] \bigg|_{\theta = \theta_0} \\ &= \frac{\partial}{\partial \theta^\top} E\left[ Z_i \int I(Y_i < Z_i^\top \theta - u h)\, K(u)\, du \right] \bigg|_{\theta = \theta_0} = \frac{\partial}{\partial \theta^\top} E\left[ Z_i \int F_Y(Z_i^\top \theta - u h \mid Z_i)\, K(u)\, du \right] \bigg|_{\theta = \theta_0} \\ &= E\left[ Z_i Z_i^\top \int f_Y(Z_i^\top \theta_0 - u h \mid Z_i)\, K(u)\, du \right] = E\big[ Z_i Z_i^\top f_Y(Z_i^\top \theta_0 \mid Z_i) \big] + o_p(1) \\ &= E\big[ Z_i Z_i^\top f(0 \mid Z_i) \big] + o_p(1) := \Gamma + o_p(1). \end{aligned} $$
By the law of large numbers, as $n \to \infty$, we have $\frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)}{\partial \theta^\top} \xrightarrow{p} \Gamma$.
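The limit $\Gamma$ can likewise be estimated by a plug-in average, since the derivative of the smoothed indicator is the rescaled kernel: if $G_h(v) = \int K(u) I(uh \le v)\, du$, then $\partial G_h(v)/\partial v = K(v/h)/h$, so $\partial \psi_h(Z, Y; \theta)/\partial \theta^\top = Z Z^\top K((Z^\top\theta - Y)/h)/h$. The minimal sketch below, written for the fully observed score with a Gaussian kernel (the imputed and weighted versions are averaged in the same way), is an illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def gamma_hat(Z, Y, theta, h):
    """Plug-in estimate of Gamma = E[Z Z' f(0 | Z)] via the Jacobian of the smoothed score:
    (1/n) * sum_i Z_i Z_i' * K((Z_i'theta - Y_i)/h) / h, with K the Gaussian density."""
    w = norm.pdf((Z @ theta - Y) / h) / h        # kernel weight K_h(Z'theta - Y), shape (n,)
    return (Z * w[:, None]).T @ Z / Z.shape[0]   # p x p matrix
```

Combining `gamma_hat` with an estimate of $T_l$ yields the plug-in sandwich variance $\hat\Gamma^{-1} \hat T_l \hat\Gamma^{-1}$ corresponding to Theorem 1.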
Finally, we demonstrate (4). From
$$ \frac{ n^{-1} \big( \max_{1 \le i \le n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \| \big)^{2} }{ n^{-1} \sum_{i=1}^{n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \|^{2} } \longrightarrow 0 $$
and result (2), it can be easily shown that, for $l = 1, 2$, $\max_{1 \le i \le n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \| = o_p(n^{1/2})$. □
Proof of Theorem 1.
By applying the Lagrange multiplier method, we obtain the empirical log-likelihood ratio function with respect to the parameter vector $\theta$:
$$ \hat R^{(l)}(\theta) = 2 \sum_{i=1}^{n} \log\big\{ 1 + \lambda^\top \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) \big\}, $$
where $\lambda = \lambda(\theta)$ is the solution to the following equation:
$$ g(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) }{ 1 + \lambda^\top \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) } = 0. $$
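Numerically, $\lambda(\theta)$ is the maximizer of the concave function $\lambda \mapsto \sum_{i=1}^{n} \log\{1 + \lambda^\top \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma)\}$ over the region where every argument of the logarithm stays positive, and $g(\lambda) = 0$ is its first-order condition. The damped-Newton sketch below, which takes the $n \times p$ matrix of estimating-function values as input, is illustrative only; any standard empirical-likelihood inner solver can be substituted.

```python
import numpy as np

def solve_lambda(psi, n_iter=50, tol=1e-10):
    """Damped Newton solver for g(lambda) = (1/n) sum_i psi_i / (1 + lambda'psi_i) = 0,
    keeping every 1 + lambda'psi_i strictly positive. psi is an n x p array."""
    n, p = psi.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        d = 1.0 + psi @ lam                              # 1 + lambda'psi_i for each i
        grad = psi.T @ (1.0 / d) / n                     # g(lambda)
        hess = -(psi / d[:, None] ** 2).T @ psi / n      # Hessian of the concave dual
        step = np.linalg.solve(hess, -grad)              # Newton (ascent) direction
        t = 1.0
        while np.any(1.0 + psi @ (lam + t * step) <= 1e-8):
            t *= 0.5                                     # damp to stay feasible
        lam = lam + t * step
        if np.linalg.norm(grad) < tol:
            break
    return lam

def el_ratio(psi):
    """Empirical log-likelihood ratio 2 * sum_i log(1 + lambda'psi_i) at the solved lambda."""
    lam = solve_lambda(psi)
    return 2.0 * np.sum(np.log(1.0 + psi @ lam))
```

Calling `el_ratio` on the matrix of $\hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma)$ values evaluates $\hat R^{(l)}(\theta)$ at a candidate $\theta$.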
In other words, $\hat\theta$ and $\hat\lambda = \lambda(\hat\theta)$ simultaneously satisfy $T_{1n}^{(l)}(\hat\theta, \hat\lambda) = 0$ and $T_{2n}^{(l)}(\hat\theta, \hat\lambda) = 0$, where
$$ T_{1n}^{(l)}(\theta, \lambda) = \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) }{ 1 + \lambda^\top \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) }, \qquad T_{2n}^{(l)}(\theta, \lambda) = \frac{1}{n} \sum_{i=1}^{n} \frac{ \{ \partial \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) / \partial \theta \}^\top \lambda }{ 1 + \lambda^\top \hat\psi_{hi}^{(l)}(\theta, \hat\beta, \hat\gamma) }. $$
Note that $T_{1n}^{(l)}(\hat\theta, 0) = n^{-1} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\hat\theta, \hat\beta, \hat\gamma)$ and $T_{2n}^{(l)}(\hat\theta, 0) = 0$. Under the assumed conditions, according to Lemma A1 in Newey and Smith [30] and Theorem 1(a) in Leng and Tang [31], it can be shown that $\hat\theta$ is a consistent estimator of $\theta_0$. Taylor expanding $T_{1n}^{(l)}(\hat\theta, \hat\lambda)$ and $T_{2n}^{(l)}(\hat\theta, \hat\lambda)$ around $(\theta_0, 0)$, we have
$$ \begin{aligned} 0 &= T_{1n}^{(l)}(\theta_0, 0) + \frac{\partial T_{1n}^{(l)}(\theta_0, 0)}{\partial \theta^\top}(\hat\theta - \theta_0) + \frac{\partial T_{1n}^{(l)}(\theta_0, 0)}{\partial \lambda^\top}\hat\lambda + o_p(u_n), \\ 0 &= T_{2n}^{(l)}(\theta_0, 0) + \frac{\partial T_{2n}^{(l)}(\theta_0, 0)}{\partial \theta^\top}(\hat\theta - \theta_0) + \frac{\partial T_{2n}^{(l)}(\theta_0, 0)}{\partial \lambda^\top}\hat\lambda + o_p(u_n), \end{aligned} $$
where $u_n = \| \hat\theta - \theta_0 \| + \| \hat\lambda \|$.
The above equations can be rewritten as follows:
$$ \begin{pmatrix} \hat\lambda \\ \hat\theta - \theta_0 \end{pmatrix} = \begin{pmatrix} -\frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} & \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)}{\partial \theta^\top} \\ \frac{1}{n} \sum_{i=1}^{n} \Big( \frac{\partial \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)}{\partial \theta^\top} \Big)^{\top} & 0 \end{pmatrix}^{-1} \begin{pmatrix} -\frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) + o_p(u_n) \\ o_p(u_n) \end{pmatrix}. $$
Based on the results of Lemma 2, we have
$$ \sqrt{n}\, (\hat\theta^{(l)} - \theta_0) = -\left[ \frac{1}{n} \sum_{i=1}^{n} \frac{\partial \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)}{\partial \theta^\top} \right]^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) + o_p(1). $$
Therefore, for $l = 1, 2$ and as $n \to \infty$, we have
$$ \sqrt{n}\, (\hat\theta^{(l)} - \theta_0) \xrightarrow{d} N\big( 0, \Gamma^{-1} T_l \Gamma^{-1} \big). $$
□
Proof of Theorem 2.
First, we show that $\|\lambda\| = O_p(n^{-1/2})$. Write $\lambda = \lambda(\theta_0) = \rho u$, where $\rho = \|\lambda\|$, $u = \lambda / \|\lambda\|$, and $\|u\| = 1$. We have
$$ 0 = \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) }{ 1 + \lambda(\theta_0)^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) } = \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) }{ 1 + \rho u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) } = \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) - \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top u\, \rho }{ 1 + \rho u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) }. $$
Multiplying both sides of this equation on the left by $u^\top$, we obtain
$$ u^\top \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) = \frac{1}{n} \sum_{i=1}^{n} \frac{ u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top u\, \rho }{ 1 + \rho u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) } \ge \frac{ 1 }{ 1 + \rho \max_{1 \le i \le n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \| } \cdot \frac{\rho}{n} \sum_{i=1}^{n} u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top u. $$
Thus, we can conclude that
$$ \frac{\rho}{n} \sum_{i=1}^{n} u^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top u \le u^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \Big] \Big\{ 1 + \rho \max_{1 \le i \le n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \| \Big\}. $$
Based on Lemma 2, we have
$$ \rho\, \big\{ u^\top A_l u + o_p(1) \big\} \le O_p(n^{-1/2}) \big\{ 1 + \rho\, o_p(n^{1/2}) \big\}. $$
Consequently, it follows that $\rho = O_p(n^{-1/2})$, that is, $\|\lambda\| = O_p(n^{-1/2})$. Furthermore, we can observe that
$$ \max_{1 \le i \le n} \big| \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \big| \le \|\lambda\| \max_{1 \le i \le n} \big\| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \big\| = O_p(n^{-1/2})\, o_p(n^{1/2}) = o_p(1). $$
By expanding the function $g(\lambda)$, we obtain
$$ 0 = g(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \left[ 1 - \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) + \frac{ \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{2} }{ (1 + \eta_i)^{3} } \right] = \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) - \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big] \lambda + \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{2} }{ (1 + \eta_i)^{3} }, $$
where $\eta_i$ lies between $0$ and $\lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)$. From the fact that $\max_{1 \le i \le n} | \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) | = o_p(1)$, it follows that $\max_{1 \le i \le n} |\eta_i| = o_p(1)$.
Note that
$$ \bigg\| \frac{1}{n} \sum_{i=1}^{n} \frac{ \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{2} }{ (1 + \eta_i)^{3} } \bigg\| \le \frac{ \max_{1 \le i \le n} \| \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \| }{ \{ 1 - \max_{1 \le i \le n} |\eta_i| \}^{3} }\, \lambda^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big] \lambda = o_p(n^{1/2})\, O_p(n^{-1}) = o_p(n^{-1/2}). $$
Therefore,
$$ \lambda = \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big]^{-1} \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) + \zeta, $$
where $\zeta = o_p(n^{-1/2})$.
Expanding the logarithm in $\hat R^{(l)}(\theta_0)$ by a Taylor series, we obtain
$$ \begin{aligned} \hat R^{(l)}(\theta_0) &= 2 \sum_{i=1}^{n} \Big[ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) - \frac{1}{2} \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{2} + \frac{1}{3} \frac{ \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{3} }{ (1 + \xi_i)^{3} } \Big] \\ &= 2 \lambda^\top \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) - \sum_{i=1}^{n} \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top \lambda + \frac{2}{3} \sum_{i=1}^{n} \frac{ \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{3} }{ (1 + \xi_i)^{3} }, \end{aligned} $$
where $\xi_i$ lies between $0$ and $\lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)$, so that $\max_{1 \le i \le n} |\xi_i| = o_p(1)$.
Similarly,
$$ \bigg| \sum_{i=1}^{n} \frac{ \{ \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \}^{3} }{ (1 + \xi_i)^{3} } \bigg| \le \frac{ \max_{1 \le i \le n} | \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) | }{ \{ 1 - \max_{1 \le i \le n} |\xi_i| \}^{3} } \sum_{i=1}^{n} \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top \lambda = o_p(1)\, n\, O_p(n^{-1}) = o_p(1). $$
Therefore, we obtain
$$ \hat R^{(l)}(\theta_0) = 2 \lambda^\top \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) - \sum_{i=1}^{n} \lambda^\top \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)^\top \lambda + o_p(1). $$
Substituting the expression for $\lambda$ obtained above, we can express
$$ \hat R^{(l)}(\theta_0) = n \Big( \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \Big)^{\!\top} \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big]^{-1} \Big( \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \Big) - n\, \zeta^\top \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big] \zeta + o_p(1). $$
Here, $n\, \zeta^\top \big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \big] \zeta = n\, o_p(n^{-1/2})\, O_p(1)\, o_p(n^{-1/2}) = o_p(1)$. Thus,
$$ \hat R^{(l)}(\theta_0) = n \Big( \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \Big)^{\!\top} \Big[ \frac{1}{n} \sum_{i=1}^{n} \{\hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma)\}^{2} \Big]^{-1} \Big( \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{hi}^{(l)}(\theta_0, \hat\beta, \hat\gamma) \Big) + o_p(1). $$
Based on results (1) and (2) of Lemma 2, it can be easily demonstrated that, as $n$ tends to infinity, $\hat R^{(l)}(\theta_0)$ converges in distribution to a linear combination of independent chi-squared random variables:
$$ \hat R^{(l)}(\theta_0) \xrightarrow{d} r_1^{(l)} \chi_{1,1}^{2} + r_2^{(l)} \chi_{1,2}^{2} + \cdots + r_q^{(l)} \chi_{1,q}^{2}, $$
where $r_1^{(l)}, \ldots, r_q^{(l)}$ are the eigenvalues of $A_l^{-1} T_l$, and $\chi_{1,1}^{2}, \chi_{1,2}^{2}, \ldots, \chi_{1,q}^{2}$ are $q$ independent chi-squared random variables, each with one degree of freedom. This completes the proof of the theorem. □
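Because the limiting law in Theorem 2 is a weighted sum of independent $\chi^2_1$ variables rather than a pivotal chi-squared distribution, critical values have to be computed from estimated weights. A minimal Monte Carlo sketch, assuming plug-in estimates `A_hat` and `T_hat` of $A_l$ and $T_l$ are available (for instance, from the sample moments discussed in Lemma 2), is:

```python
import numpy as np

def weighted_chisq_quantile(A_hat, T_hat, prob=0.95, n_mc=100_000, seed=0):
    """Approximate the prob-quantile of sum_i r_i * chi2_1, where r_i are the
    eigenvalues of A^{-1} T, by Monte Carlo simulation."""
    r = np.linalg.eigvals(np.linalg.solve(A_hat, T_hat)).real   # eigenvalues of A^{-1} T
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(df=1, size=(n_mc, r.size)) @ r        # draws from the limit law
    return np.quantile(draws, prob)
```

An empirical-likelihood confidence region for $\theta$ then collects the values of $\theta$ whose ratio $\hat R^{(l)}(\theta)$ does not exceed the simulated quantile.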

References

1. Robins, J.M.; Ritov, Y. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat. Med. 1997, 16, 285–319.
2. Wang, S.; Shao, J.; Kim, J.K. An instrumental variable approach for identification and estimation with nonignorable nonresponse. Stat. Sin. 2014, 24, 1097–1116.
3. Kenward, M.G. Selection models for repeated measurements with non-random dropout: An illustration of sensitivity. Stat. Med. 1998, 17, 2723–2732.
4. Kim, J.K.; Yu, C.L. A semiparametric estimation of mean functionals with nonignorable missing data. J. Am. Stat. Assoc. 2011, 106, 157–165.
5. Shao, J.; Wang, L. Semiparametric inverse propensity weighting for nonignorable missing data. Biometrika 2016, 103, 175–187.
6. Kim, J.K.; Shao, J. Statistical Methods for Handling Incomplete Data; CRC Press: New York, NY, USA, 2022.
7. Koenker, R.; Bassett, G. Regression quantiles. Econometrica 1978, 46, 33–50.
8. Koenker, R.; Bassett, G. Tests of linear hypotheses and L1 estimation. Econometrica 1982, 50, 1577–1583.
9. Horowitz, J.L. Bootstrap methods for median regression models. Econometrica 1998, 66, 1327–1351.
10. Whang, Y.J. Smoothed empirical likelihood methods for quantile regression models. Econ. Theory 2006, 22, 173–205.
11. Luo, S.H.; Mei, C.L.; Zhang, C.Y. Smoothed empirical likelihood for quantile regression models with response data missing at random. Adv. Stat. Anal. 2017, 101, 95–116.
12. Zhang, T.; Wang, L. Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response. Comput. Stat. Data Anal. 2020, 144, 106888.
13. Niu, C.; Guo, X.; Xu, W.; Zhu, L. Empirical likelihood inference in linear regression with nonignorable missing response. Comput. Stat. Data Anal. 2014, 79, 91–112.
14. Bindele, H.F.; Zhao, Y.C. Rank-based estimating equation with non-ignorable missing responses via empirical likelihood. Stat. Sin. 2018, 28, 1787–1820.
15. Chen, X.R.; Wan, A.T.K.; Zhou, Y. Efficient quantile regression analysis with missing observations. J. Am. Stat. Assoc. 2015, 110, 723–741.
16. Tang, N.S.; Zhao, P.Y.; Zhu, H.T. Empirical likelihood for estimating equations with nonignorably missing data. Stat. Sin. 2014, 24, 723–747.
17. Kim, J.K. Parametric fractional imputation for missing data analysis. Biometrika 2011, 98, 119–132.
18. Riddles, M.K.; Kim, J.K.; Im, J. A propensity-score-adjustment method for nonignorable nonresponse. J. Surv. Stat. Methodol. 2016, 4, 215–245.
19. Paik, M.; Larsen, M.D. Handling nonignorable nonresponse with respondent modeling and the SIR algorithm. J. Stat. Plan. Inference 2014, 145, 179–189.
20. Wang, X.L.; Song, Y.Q.; Lin, L. Handling estimating equation with nonignorably missing data based on SIR algorithm. J. Comput. Appl. Math. 2017, 326, 62–70.
21. Song, Y.Q.; Zhu, Y.J.; Wang, X.L.; Lin, L. Robust inference for estimating equations with nonignorably missing data based on SIR algorithm. J. Stat. Comput. Simul. 2019, 89, 3196–3212.
22. Newey, W.K.; McFadden, D. Large sample estimation and hypothesis testing. In Handbook of Econometrics; Engle, R.F., McFadden, D., Eds.; Elsevier: Amsterdam, The Netherlands, 1994; pp. 2111–2245.
23. Van der Vaart, A.W. Semiparametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1999); Bernard, P., Ed.; Springer: Berlin, Germany, 2002; pp. 331–457.
24. Morikawa, K.; Kim, J.K.; Kano, Y. Semiparametric maximum likelihood estimation with data missing not at random. Can. J. Stat. 2017, 45, 393–409.
25. Morikawa, K.; Kim, J.K. Semiparametric optimal estimation with nonignorable nonresponse data. Ann. Stat. 2021, 49, 2991–3014.
26. Zhao, P.; Wang, L.; Shao, J. Empirical likelihood and Wilks phenomenon for data with nonignorable missing values. Scand. J. Stat. 2019, 46, 1003–1024.
27. Hammer, S.M.; Katzenstein, D.A.; Hughes, M.D.; Gundacker, H.; Schooley, R.T.; Haubrich, R.H.; Henry, W.K.; Lederman, M.M.; Phair, J.P.; Niu, M.; et al. A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 1996, 335, 1081–1090.
28. Davidian, M.; Tsiatis, A.A.; Leon, S. Semiparametric estimation of treatment effect in a pretest–posttest study with missing data. Stat. Sci. 2005, 20, 261–301.
29. Zhang, M.; Tsiatis, A.A.; Davidian, M. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 2008, 64, 707–715.
30. Newey, W.K.; Smith, R.J. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 2004, 72, 219–255.
31. Leng, C.L.; Tang, C.Y. Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 2012, 99, 703–716.
Figure 1. Histogram of the residual distribution for the parameter working model $N(\hat\mu(X), \hat\sigma^{2})$.
Figure 2. Q-Q plot of the residual distribution for the parameter working model $N(\hat\mu(X), \hat\sigma^{2})$.
Figure 3. Histogram of the completely observed data $\mathrm{CD4}_{96}$ in ACTG175.
Figure 4. Histogram of residuals from the parameterized working model.
Figure 5. Q-Q plot of residuals from the parameterized working model.
Table 1. Monte Carlo mean, standard deviation (SD), and approximate relative performance (ARE) of the five methods for error term (a).

| τ | Statistic | FULL $\theta_0$ | FULL $\theta_1$ | CC $\theta_0$ | CC $\theta_1$ | EEI $\theta_0$ | EEI $\theta_1$ | IPW $\theta_0$ | IPW $\theta_1$ | AIPW $\theta_0$ | AIPW $\theta_1$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | Mean | −1.606 | 1.009 | −1.652 | 1.002 | −1.598 | 1.008 | −1.605 | 1.008 | −1.608 | 1.008 |
| | SD | 0.054 | 0.105 | 0.063 | 0.123 | 0.049 | 0.095 | 0.062 | 0.121 | 0.061 | 0.122 |
| | ARE | 1.000 | 1.000 | 1.445 | 1.173 | 0.919 | 0.905 | 1.148 | 1.152 | 1.130 | 1.162 |
| 0.5 | Mean | −0.950 | 1.010 | −0.999 | 1.002 | −0.953 | 1.008 | −0.950 | 1.009 | −0.950 | 1.008 |
| | SD | 0.049 | 0.098 | 0.057 | 0.116 | 0.044 | 0.089 | 0.057 | 0.117 | 0.057 | 0.118 |
| | ARE | 1.000 | 1.000 | 1.534 | 1.186 | 0.900 | 0.908 | 1.163 | 1.194 | 1.163 | 1.204 |
| 0.75 | Mean | −0.298 | 1.009 | −0.349 | 1.001 | −0.310 | 1.008 | −0.297 | 1.008 | −0.293 | 1.008 |
| | SD | 0.052 | 0.107 | 0.061 | 0.126 | 0.046 | 0.097 | 0.063 | 0.130 | 0.062 | 0.131 |
| | ARE | 1.000 | 1.000 | 1.529 | 1.180 | 0.914 | 0.907 | 1.212 | 1.215 | 1.196 | 1.224 |
Table 2. Monte Carlo mean, standard deviation (SD), and approximate relative performance (ARE) of the five methods for error term (b).

| τ | Statistic | FULL $\theta_0$ | FULL $\theta_1$ | CC $\theta_0$ | CC $\theta_1$ | EEI $\theta_0$ | EEI $\theta_1$ | IPW $\theta_0$ | IPW $\theta_1$ | AIPW $\theta_0$ | AIPW $\theta_1$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.25 | Mean | −1.503 | 1.004 | −1.534 | 1.004 | −1.492 | 1.005 | −1.502 | 1.006 | −1.507 | 1.005 |
| | SD | 0.045 | 0.099 | 0.052 | 0.112 | 0.041 | 0.088 | 0.051 | 0.109 | 0.051 | 0.110 |
| | ARE | 1.000 | 1.000 | 1.345 | 1.131 | 0.943 | 0.889 | 1.134 | 1.101 | 1.137 | 1.111 |
| 0.5 | Mean | −0.968 | 1.004 | −1.000 | 0.0997 | −0.970 | 1.003 | −0.968 | 1.004 | −0.968 | 1.003 |
| | SD | 0.040 | 0.092 | 0.046 | 0.104 | 0.036 | 0.081 | 0.046 | 0.105 | 0.047 | 0.106 |
| | ARE | 1.000 | 1.000 | 1.401 | 1.133 | 0.901 | 0.881 | 1.150 | 1.141 | 1.175 | 1.152 |
| 0.75 | Mean | −0.433 | 1.006 | −0.467 | 0.996 | −0.447 | 1.007 | −0.432 | 1.007 | −0.428 | 1.006 |
| | SD | 0.044 | 0.101 | 0.051 | 0.119 | 0.039 | 0.090 | 0.053 | 0.124 | 0.052 | 0.123 |
| | ARE | 1.000 | 1.000 | 1.392 | 1.182 | 0.942 | 0.891 | 1.205 | 1.228 | 1.187 | 1.218 |
Table 3. The bias (Bias), standard deviation (SD), and median absolute deviation (MAD) of the five types of quantile regression coefficient estimates at $\tau = 0.25$.

| Method | $\theta_0$ Bias | $\theta_0$ SD | $\theta_0$ MAD | $\theta_1$ Bias | $\theta_1$ SD | $\theta_1$ MAD | $\theta_2$ Bias | $\theta_2$ SD | $\theta_2$ MAD |
|---|---|---|---|---|---|---|---|---|---|
| full | −0.021 | 0.100 | 0.067 | 0.002 | 0.069 | 0.046 | 0.004 | 0.038 | 0.026 |
| cc | 0.001 | 0.122 | 0.086 | −0.003 | 0.079 | 0.052 | 0.020 | 0.041 | 0.032 |
| ipw | −0.039 | 0.149 | 0.081 | −0.001 | 0.103 | 0.053 | 0.006 | 0.065 | 0.029 |
| eei.0 | −0.362 | 0.158 | 0.358 | 0.019 | 0.098 | 0.067 | 0.047 | 0.044 | 0.049 |
| eei.1 | −0.003 | 0.137 | 0.093 | 0.002 | 0.082 | 0.054 | 0.003 | 0.043 | 0.029 |
| aipw.0 | −0.026 | 0.120 | 0.086 | 0.002 | 0.081 | 0.054 | 0.005 | 0.041 | 0.028 |
| aipw.1 | −0.024 | 0.115 | 0.082 | 0.002 | 0.078 | 0.052 | 0.005 | 0.040 | 0.027 |
Table 4. The bias (Bias), standard deviation (SD), and median absolute deviation (MAD) of the five types of quantile regression coefficient estimates at $\tau = 0.5$.

| Method | $\theta_0$ Bias | $\theta_0$ SD | $\theta_0$ MAD | $\theta_1$ Bias | $\theta_1$ SD | $\theta_1$ MAD | $\theta_2$ Bias | $\theta_2$ SD | $\theta_2$ MAD |
|---|---|---|---|---|---|---|---|---|---|
| full | 0.001 | 0.087 | 0.056 | 0.001 | 0.064 | 0.044 | 0.001 | 0.034 | 0.022 |
| cc | 0.157 | 1.089 | 0.076 | −0.039 | 0.398 | 0.048 | −0.003 | 0.151 | 0.027 |
| ipw | 0.029 | 0.103 | 0.067 | −0.017 | 0.395 | 0.047 | −0.001 | 0.065 | 0.024 |
| eei.0 | −0.089 | 0.118 | 0.104 | 0.006 | 0.076 | 0.048 | 0.013 | 0.039 | 0.026 |
| eei.1 | 0.007 | 0.127 | 0.085 | 0.001 | 0.077 | 0.049 | 0.001 | 0.039 | 0.026 |
| aipw.0 | −0.008 | 0.159 | 0.066 | 0.001 | 0.110 | 0.047 | 0.002 | 0.037 | 0.025 |
| aipw.1 | −0.002 | 0.124 | 0.069 | 0.003 | 0.079 | 0.046 | 0.002 | 0.036 | 0.024 |
Table 5. The bias (Bias), standard deviation (SD), and median absolute deviation (MAD) of the five types of quantile regression coefficient estimates at $\tau = 0.75$.

| Method | $\theta_0$ Bias | $\theta_0$ SD | $\theta_0$ MAD | $\theta_1$ Bias | $\theta_1$ SD | $\theta_1$ MAD | $\theta_2$ Bias | $\theta_2$ SD | $\theta_2$ MAD |
|---|---|---|---|---|---|---|---|---|---|
| full | 0.019 | 0.095 | 0.063 | 0.001 | 0.072 | 0.048 | −0.001 | 0.037 | 0.025 |
| cc | 0.048 | 0.117 | 0.082 | −0.003 | 0.080 | 0.054 | 0.009 | 0.041 | 0.029 |
| ipw | 0.011 | 0.246 | 0.070 | 0.002 | 0.079 | 0.051 | 0.001 | 0.049 | 0.026 |
| eei.0 | 0.092 | 0.127 | 0.106 | −0.003 | 0.084 | 0.055 | −0.009 | 0.041 | 0.029 |
| eei.1 | 0.019 | 0.131 | 0.084 | −0.001 | 0.081 | 0.054 | −0.002 | 0.0419 | 0.029 |
| aipw.0 | −0.047 | 0.510 | 0.073 | 0.020 | 0.361 | 0.052 | 0.001 | 0.072 | 0.026 |
| aipw.1 | −0.017 | 0.424 | 0.072 | 0.006 | 0.248 | 0.052 | −0.001 | 0.064 | 0.025 |
Table 6. Analysis results of the ACTG175 dataset.

| Covariate | AIPW Est | AIPW CI | IPW Est | IPW CI | EEI Est | EEI CI | CC Est | CC CI |
|---|---|---|---|---|---|---|---|---|
| τ = 0.25 | | | | | | | | |
| intercept | −0.527 | (−0.576, −0.478) | −0.493 | (−0.656, −0.362) | −0.528 | (−0.708, −0.416) | −0.509 | (−0.671, −0.386) |
| age | −0.001 | (−0.019, 0.021) | 0.001 | (−0.058, 0.056) | 0.091 | (0.029, 0.141) | −0.006 | (−0.061, 0.059) |
| wtkg | 0.007 | (−0.012, 0.026) | 0.029 | (−0.025, 0.083) | −0.147 | (−0.198, −0.085) | 0.031 | (−0.025, 0.091) |
| race | −0.057 | (−0.091, −0.022) | −0.089 | (−0.218, 0.026) | −0.056 | (−0.152, 0.032) | −0.092 | (−0.204, 0.027) |
| gender | −0.002 | (−0.042, 0.037) | −0.061 | (−0.148, 0.075) | 0.024 | (−0.079, 0.176) | −0.057 | (−0.131, 0.076) |
| history | −0.231 | (−0.266, −0.197) | −0.234 | (−0.349, −0.142) | −0.216 | (−0.293, −0.123) | −0.227 | (−0.331, −0.134) |
| offtrt | −0.549 | (−0.584, −0.513) | −0.528 | (−0.716, −0.416) | −0.399 | (−0.449, −0.268) | −0.553 | (−0.717, −0.434) |
| $\mathrm{CD4}_{0}$ | 0.474 | (0.459, 0.489) | 0.496 | (0.446, 0.549) | 0.472 | (0.416, 0.506) | 0.493 | (0.442, 0.543) |
| trt | 0.367 | (0.332, 0.401) | 0.369 | (0.254, 0.475) | 0.239 | (0.168, 0.344) | 0.377 | (0.261, 0.479) |
| τ = 0.5 | | | | | | | | |
| intercept | −0.072 | (−0.144, −0.001) | −0.006 | (−0.221, 0.199) | 0.025 | (−0.128, 0.147) | −0.008 | (−0.253, 0.180) |
| age | −0.021 | (−0.042, −0.001) | −0.045 | (−0.091, 0.031) | 0.134 | (0.076, 0.179) | −0.051 | (−0.091, 0.029) |
| wtkg | −0.005 | (−0.024, 0.012) | 0.016 | (−0.035, 0.091) | −0.175 | (−0.210, −0.104) | 0.016 | (−0.032, 0.095) |
| race | −0.084 | (−0.121, −0.041) | −0.115 | (−0.251, 0.017) | −0.051 | (−0.182, 0.049) | −0.134 | (−0.246, 0.013) |
| gender | −0.013 | (−0.068, 0.042) | −0.046 | (−0.212, 0.113) | −0.002 | (−0.116, 0.096) | −0.066 | (−0.197, 0.108) |
| history | −0.243 | (−0.279, −0.208) | −0.276 | (−0.406, −0.148) | −0.217 | (−0.285, −0.099) | −0.287 | (−0.407, −0.160) |
| offtrt | −0.384 | (−0.422, −0.345) | −0.493 | (−0.621, −0.302) | −0.321 | (−0.383, −0.183) | −0.500 | (−0.623, −0.331) |
| $\mathrm{CD4}_{0}$ | 0.509 | (0.493, 0.529) | 0.531 | (0.481, 0.597) | 0.523 | (0.492, 0.571) | 0.517 | (0.478, 0.585) |
| trt | 0.372 | (0.329, 0.415) | 0.366 | (0.254, 0.524) | 0.231 | (0.136, 0.306) | 0.385 | (0.261, 0.530) |
| τ = 0.75 | | | | | | | | |
| intercept | 0.456 | (0.397, 0.515) | 0.553 | (0.319, 0.721) | 0.591 | (0.506, 0.896) | 0.547 | (0.328, 0.692) |
| age | 0.037 | (0.013, 0.059) | 0.005 | (−0.051, 0.068) | 0.181 | (0.119, 0.223) | 0.009 | (−0.051, 0.069) |
| wtkg | 0.026 | (0.004, 0.047) | 0.045 | (−0.009, 0.102) | −0.163 | (−0.207, −0.106) | 0.046 | (−0.007, 0.106) |
| race | −0.122 | (−0.158, −0.087) | −0.182 | (−0.291, −0.057) | −0.095 | (−0.131, 0.068) | −0.188 | (−0.289, −0.056) |
| gender | 0.018 | (−0.029, 0.066) | −0.039 | (−0.183, 0.114) | 0.041 | (−0.205, 0.132) | −0.037 | (−0.195, 0.084) |
| history | −0.207 | (−0.242, −0.173) | −0.250 | (−0.350, −0.139) | −0.221 | (−0.306, −0.109) | −0.255 | (−0.348, −0.143) |
| offtrt | −0.214 | (−0.251, −0.177) | −0.403 | (−0.559, −0.230) | −0.226 | (−0.358, −0.114) | −0.457 | (−0.579, −0.266) |
| $\mathrm{CD4}_{0}$ | 0.557 | (0.536, 0.577) | 0.566 | (0.514, 0.674) | 0.539 | (0.489, 0.589) | 0.559 | (0.505, 0.649) |
| trt | 0.283 | (0.235, 0.330) | 0.316 | (0.181, 0.438) | 0.142 | (0.118, 0.179) | 0.318 | (0.203, 0.447) |