Doubly Robust Estimation and Semiparametric Efficiency in Generalized Partially Linear Models with Missing Outcomes

Wang, Lu; Ouyang, Zhongzhe; Lin, Xihong

doi:10.3390/stats7030056


by Lu Wang ¹,*, Zhongzhe Ouyang ¹ and Xihong Lin ²

¹ Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA

² Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

* Author to whom correspondence should be addressed.

Stats 2024, 7(3), 924-943; https://doi.org/10.3390/stats7030056

Submission received: 9 March 2024 / Revised: 20 July 2024 / Accepted: 24 July 2024 / Published: 31 August 2024

(This article belongs to the Special Issue Novel Semiparametric Methods)


Abstract

We investigate a semiparametric generalized partially linear regression model that accommodates missing outcomes, with some covariates modeled parametrically and others nonparametrically. We propose a class of augmented inverse probability weighted (AIPW) kernel–profile estimating equations. The nonparametric component is estimated using AIPW kernel estimating equations, while parametric regression coefficients are estimated using AIPW profile estimating equations. We demonstrate the doubly robust nature of the AIPW estimators for both nonparametric and parametric components. Specifically, these estimators remain consistent if either the assumed model for the probability of missing data or that for the conditional mean of the outcome, given covariates and auxiliary variables, is correctly specified, though not necessarily both simultaneously. Additionally, the AIPW profile estimator for parametric regression coefficients is consistent and asymptotically normal under the semiparametric model defined by the generalized partially linear model on complete data, assuming that the missing data mechanism is missing at random. When both working models are correctly specified, this estimator achieves semiparametric efficiency, with its asymptotic variance reaching the efficiency bound. We validate our approach through simulations to assess the finite sample performance of the proposed estimators and apply the method to a study that investigates risk factors associated with myocardial ischemia.

Keywords:

asymptotics; augmented inverse probability weighting; kernel smoothing; missing data at random; profile-kernel estimating equation; semiparametric efficiency

1. Introduction

Generalized partially linear models,

$$E (Y | X, Z) = μ \{X^{T} β + θ (Z)\},$$

(1)

where $μ (\cdot)$ is a known monotonic link function (McCullagh and Nelder 1989 [1]), $Y$ is an outcome of interest, $X$ is a $p \times 1$ vector of primary covariates, $Z$ is an additional scalar covariate, $β$ is an unknown parameter vector of dimension $p$, and $θ (\cdot)$ is an unknown smooth function, have been extensively studied without missing data (Severini and Staniswalis 1994 [2]; Hastie and Tibshirani 1990 [3]; Fan, et al. 1995 [4]; Carroll, et al. 1997 [5]; Lin and Carroll 2001a [6], 2001b [7]; Muller 2001 [8]; Hu and Cui 2010 [9]; Rahman, et al. 2020 [10]). In model (1), $X^{T} β$ summarizes the dependence of the outcome mean on the covariates $X$ of interest, whereas the unknown smooth function $θ (\cdot)$ allows for model flexibility in the dependence on a secondary covariate $Z$. Our contribution in this paper is to study the estimation of $β$ and $θ (\cdot)$ and the associated asymptotics when the outcome $Y$ is missing at random (MAR), i.e., missingness depends only on observed data (Little and Rubin 2002 [11]), while some additional auxiliary variables and information exist (Chu and Halloran 2004 [12]).

Our work is motivated by the investigation of risk factors for myocardial ischemia (reduced blood flow due to obstruction in the vessels) from data collected at the radiology clinic of a nuclear imaging group. The standard technique for screening myocardial ischemia at the time of data collection was dual-isotope myocardial perfusion single-photon emission computed tomography (SPECT), which was not only expensive but also involved ingestion of radioactive tracing material. Because of this, only a subset of the subjects who attended the radiology clinic were actually referred to have the SPECT test performed. Instead, all subjects attending the clinic were screened with electron beam computed tomography (EBCT). This device is routinely used to measure the degree of calcification in the arteries (Braun, et al. 1996 [13]). Doctors decided whether or not to refer a subject to the SPECT test based on the information available, including the results of the EBCT test. In our investigation, we wish to make inferences about the parameter $β$ in the logistic partially linear regression model (1), i.e., with $μ (u) = \{1 + \exp (- u)\}^{- 1}$, where $Y$ is a binary indicator of a positive SPECT test, $Z$ is age, and the $X$ variables include gender, smoking status, blood pressure status (high/low), cholesterol status (high/low), and the presence of chest pain, under the assumption that whether to refer a patient to the SPECT test depends only on the recorded covariates and the EBCT test. Consequently, the outcome $Y$ is MAR in this study.

The literature on inference for the regression coefficients $β$ in parametric generalized linear models of the form $E (Y | X) = μ (X^{T} β)$ when outcomes are missing at random is vast. Both likelihood-based approaches (Little 1982 [14], 1995 [15]; Little and Rubin 2002 [11]) and estimating equation-based approaches (Robins and Rotnitzky 1995 [16]; Robins et al. 1995 [17]) have been extensively studied. Inference on the nonparametric function $θ (\cdot)$ in generalized nonparametric models $E (Y | Z) = μ \{θ (Z)\}$ with outcomes missing at random has also been studied in the literature (e.g., Wang et al. 1998 [18]; Chen et al. 2006 [19]; Wang et al. 2010 [20]; and Kennedy et al. 2017 [21]). Our primary interest in this paper lies in estimating the finite dimensional parameter vector $β$ while treating the infinite dimensional parameter $θ (\cdot)$ as a nuisance parameter under semiparametric model (1) in the presence of missing outcomes. Liang et al. (2004) [22], Liang (2008) [23], and Wang (2009) [24] considered semiparametric models in the presence of missing covariates. Wang et al. (2004) [25] and Wang and Sun (2007) [26] considered imputation and weighted estimators in partially linear models for Gaussian outcomes when outcomes are missing at random. Liang et al. (2007) [27] also extended the work to a scenario in which covariates are measured with error. Chen and Keilegom (2013) [28] proposed an imputation method for semiparametric models, and Kennedy et al. (2017) [21] proposed a kernel-smoothing method for estimating continuous treatment effects, but none of these authors allow for auxiliary covariates. To the best of our knowledge, there is no existing literature on the semiparametric efficiency bound and semiparametric efficient estimators in generalized semiparametric regression models (1) for both continuous and discrete outcomes when outcomes are missing at random in the presence of auxiliary covariates. This paper aims to fill this gap.

Specifically, this paper makes the following three major contributions and provides a comprehensive investigation of inference in the generalized semiparametric regression model (1) when outcomes are missing at random: (i) Unlike previous authors, we allow for the possibility that some auxiliary covariate(s) $U$ are available. For example, in the analysis of the myocardial ischemia data, $U$ is the EBCT test result. The auxiliary covariates are not of primary interest in the sense that we are concerned with the estimation of $E (Y | X, Z)$ rather than $E (Y | X, Z, U)$. They allow for a weaker assumption, namely that missingness is independent of the outcome conditional on the covariates and the auxiliary covariates. They can also help improve the efficiency in estimation of both $θ (\cdot)$ and $β$. (ii) We derive the explicit form of the semiparametric efficient score and efficiency bound in generalized partially linear models in the presence of auxiliary covariates when outcomes are missing at random. (iii) We propose a locally semiparametric efficient estimator of $β$ in model (1) that reaches the semiparametric efficiency bound when $Y$ is missing at random. Specifically, we propose augmented inverse probability weighted (AIPW) kernel–profile estimating equations where, for a given $β$, the nonparametric function $θ (\cdot)$ is estimated using the AIPW kernel estimating equation and the parametric regression coefficient $β$ is estimated using the AIPW profile estimating equation of $β$ given in Equation (6). The joint estimation of $θ (z)$ and $β$ proceeds by iteratively solving the two sets of equations. Construction of the proposed estimators requires the specification of a parametric model for the missing data mechanism and a parametric model for $E (Y | X, Z, U)$. Yet, consistency of the proposed estimators of $θ (\cdot)$ and $β$ requires only that one of these models is correctly specified, not necessarily both; that is, the proposed estimators of both $β$ and $θ (\cdot)$ are doubly robust (Robins and Rotnitzky, 1995 [16]; Robins et al. 1994 [29], 1995 [17]; Rotnitzky et al. 1998 [30]; and Bang and Robins 2005 [31]). In addition, the proposed estimator of $β$ achieves the semiparametric efficiency bound when both models are correctly specified.

The rest of this paper is organized as follows. Section 2 formalizes the inferential problem. Section 3 delineates our proposed method for constructing estimators for both $β$ and $θ (\cdot)$. In Section 4, we study asymptotic efficiency for the estimation of $β$, presenting the derived semiparametric efficient score and efficiency bound. Section 5 examines the asymptotic properties of the proposed estimators for both $β$ and $θ (\cdot)$, emphasizing the local semiparametric efficiency of our proposed estimator for $β$. Section 6 then conducts a simulation study to evaluate the finite sample performance of the proposed methods, while Section 7 applies these methods to analyze data from the myocardial ischemia study. Finally, we offer concluding remarks in Section 8.

2. A Formalization of the Inferential Problem

Suppose we would ideally like to measure variables $(Y, X, Z)$ on a random sample of $n$ subjects from a population of interest, where the variables follow model (1) with a known monotonic link function $μ (\cdot)$ that has a continuous first derivative, $β \in \underline{β}$, an open set in $R^{p}$, and an unknown smooth function $θ (\cdot)$. For example, in the myocardial ischemia study, we include age as a nonparametric predictor due to the potential nonlinear effect of aging on the risk of myocardial ischemia, and we model all other covariates parametrically to avoid the curse of dimensionality. In this paper, we discuss the estimation of $β$ and $θ (\cdot)$ in settings where $Y$ is only observed on a subsample, but $(X, Z)$ and additional auxiliary variables $U$ are always observed, under the assumption that $Y$ is missing at random (Little and Rubin 2002 [11]), i.e.,

$$\Pr (R = 1 | X, Z, U, Y) = \Pr (R = 1 | X, Z, U),$$

(2)

where $R = 1$ if $Y$ is observed and $R = 0$ otherwise. Under assumption (2), missingness of the outcome $Y$ may depend on $X$, $Z$, and $U$ but is independent of $Y$ given $(X, Z, U)$. In the myocardial ischemia study described in the Introduction, assumption (2) would hold if the variables $X$, $Z$ and the EBCT test result $U$ were the only correlates of the SPECT test result that were used by doctors to decide the SPECT test referral status.

In our context, it is worth noting that the variables $U$ are not our primary focus; that is, our concern lies in estimating $E (Y | X, Z)$ rather than $E (Y | X, Z, U)$. However, these auxiliary variables may be necessary to ensure that the missingness is conditionally independent of the outcome $Y$. For instance, in the myocardial ischemia study, our primary interest does not lie in the relationship between EBCT and SPECT test results but rather in understanding the risk of a positive SPECT test result, indicative of myocardial ischemia, in relation to $Z$ and $X$ (factors like age, gender, smoking, and other health indicators). Nevertheless, if the referral to the SPECT exam within the strata of $X$ and $Z$ were influenced by the EBCT test values, then (2) would fail if $U$ were omitted from the conditioning sets on both sides of the equation, because the EBCT and SPECT test results are correlated within the strata defined by $(X, Z)$.
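To make the role of the auxiliary variable concrete, the following sketch simulates data in which the missingness of $Y$ depends on $(X, Z, U)$ but not on $Y$ itself, so that assumption (2) holds by construction. It is an illustration only: the identity link (rather than the logistic link of the motivating study), the function name `simulate_mar_data`, and every numeric constant are assumptions chosen for simplicity and are not taken from the paper.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_mar_data(n, seed=0):
    """Draw (X, Z, U, R, Y) where Y is observed only when R = 1.

    All numeric constants are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, size=n).astype(float)  # primary covariate X
    z = rng.uniform(0.0, 1.0, size=n)               # scalar covariate Z
    u = rng.normal(size=n)                          # auxiliary U, mean zero, independent of (X, Z)
    # Identity link: E(Y | X, Z) = X*beta + theta(Z) because E(U | X, Z) = 0.
    y = 1.0 * x + np.sin(2.0 * np.pi * z) + 0.8 * u + rng.normal(scale=0.5, size=n)
    # The selection probability depends on (X, Z, U) only, never on Y.
    pi0 = expit(0.3 + 0.5 * x - 0.5 * z + 1.0 * u)
    r = rng.binomial(1, pi0)
    y_obs = np.where(r == 1, y, np.nan)             # Y is unavailable when R = 0
    return {"x": x, "z": z, "u": u, "r": r, "y_obs": y_obs, "y_full": y, "pi0": pi0}

data = simulate_mar_data(5000, seed=1)
observed = data["r"] == 1
print("fraction observed:         ", observed.mean())
print("mean of Y, full sample:    ", data["y_full"].mean())
print("mean of Y, complete cases: ", data["y_obs"][observed].mean())
```

In this design $U$ raises both $Y$ and the chance of being observed, so a complete-case summary that ignores $U$ is shifted relative to the full sample; this is the situation in which conditioning on the auxiliary variable matters.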

We consider the estimation of $β$ and $θ (\cdot)$ when $Y$ is missing either by happenstance or by design. In the former case, as in the myocardial ischemia study, $π_{0} (X, Z, U) \equiv \Pr (R = 1 | X, Z, U)$, which is sometimes abbreviated as $π_{0}$ when no confusion exists, is an unknown function of $(X, Z, U)$. In the latter case, $π_{0}$ is a known function, as in a designed two-stage study (Pepe, 1992 [32]; Reilly and Pepe, 1995 [33]): $(X, Z, U)$ are measured across the entire sample in the initial stage, a subsample is then selected in the subsequent stage with selection probabilities that depend on the data from the first stage, and $Y$ is measured within this subsample.

3. The Estimation Procedure

3.1. The AIPW Kernel–Profile Estimating Equations

Our estimating procedure is based on augmented inverse probability weighted (AIPW) kernel–profile estimating equations, where $θ (\cdot)$ is estimated using AIPW kernel estimating equations (Wang et al. 2010 [20]) and $β$ is estimated using profile-type AIPW estimating equations. An AIPW kernel- or profile-type estimating function is constructed as the sum of an inverse probability weighted (IPW) estimating function of the corresponding kernel or profile type and a specific augmentation term. The weights equal either the inverse of $π_{0}$ if $π_{0}$ is known, as in designed two-stage studies, or the inverse of $\hat{π} \equiv π (X, Z, U; \hat{τ})$ if $π_{0}$ is unknown, as when $Y$ is missing by happenstance, where $\hat{τ}$ is the maximum likelihood estimator of $τ \in R^{k}$ under a postulated parametric model,

$$\Pr (R = 1 | X, Z, U) = π (X, Z, U; τ),$$

(3)

and $π (X, Z, U; τ)$ is a known function that is smooth in $τ$. For example, $π (X, Z, U; τ) = \mathrm{expit} \{τ_{0} + τ_{1}^{T} X + τ_{2} Z + τ_{3}^{T} U\}$, where $τ = (τ_{0}, τ_{1}^{T}, τ_{2}, τ_{3}^{T})^{T}$ and $\mathrm{expit} (x) = \exp (x) / \{1 + \exp (x)\}$. The special case with no augmentation terms is referred to as the IPW kernel–profile estimating equations, which are similar to those described in Carroll et al. (1997) [5] and Wang et al. (2005) [34]; they are based on units with $Y$ observed, with each contributing unit weighted by the inverse of its selection probability, if known, or an estimate of it otherwise. The IPW estimators are easier to compute than the AIPW estimator. However, as we shall see, the IPW estimators are generally neither efficient nor doubly robust.
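As an illustration of how the weights might be obtained when $π_{0}$ is unknown, the sketch below fits the logistic working model (3) by maximum likelihood using Newton-Raphson iterations and then forms the inverse probability weights $R_{i} / {\hat{π}}_{i}$. The function name `fit_selection_model`, the simulated design, and the coefficient values are assumptions made for this example; any standard logistic regression routine could be used instead.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_selection_model(design, r, n_iter=50, tol=1e-10):
    """MLE of tau in Pr(R = 1 | design) = expit(design @ tau) by Newton-Raphson."""
    tau = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ tau)
        score = design.T @ (r - p)                           # gradient of the log-likelihood
        info = (design * (p * (1.0 - p))[:, None]).T @ design  # Fisher information
        step = np.linalg.solve(info, score)
        tau = tau + step
        if np.max(np.abs(step)) < tol:
            break
    return tau

# Simulated (X, Z, U); the coefficients in tau_true are illustrative assumptions.
rng = np.random.default_rng(0)
n = 5000
x = rng.binomial(1, 0.5, size=n).astype(float)
z = rng.uniform(0.0, 1.0, size=n)
u = rng.normal(size=n)
design = np.column_stack([np.ones(n), x, z, u])   # rows (1, X, Z, U)
tau_true = np.array([0.5, 0.5, -0.5, 1.0])
r = rng.binomial(1, expit(design @ tau_true))

tau_hat = fit_selection_model(design, r)
pi_hat = expit(design @ tau_hat)
ipw_weights = r / pi_hat                          # zero for units with Y missing
print("tau_hat:", tau_hat)
```

Only $(R, X, Z, U)$ enter this step, so it can be carried out on the full sample regardless of which outcomes are missing.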

To construct an AIPW kernel–profile estimator $\{{\hat{θ}}_{\mathrm{AIPW}} (\cdot), {\hat{β}}_{\mathrm{AIPW}}\}$, we initially input a user-specified $p \times 1$ function $δ (X, Z, U)$ and postulate a working model

$$\mathrm{var} (ϵ_{δ}^{*} | X, Z) = V (X, Z; ζ)$$

(4)

for the conditional variance of

$$ϵ_{δ}^{*} \equiv \frac{R}{π_{0}} ϵ - \left(\frac{R}{π_{0}} - 1\right) \times [δ (X, Z, U) - μ \{X^{T} β + θ (Z)\}],$$

where $ϵ \equiv Y - μ \{X^{T} β_{0} + θ_{0} (Z)\}$, $β_{0}$ and $θ_{0} (\cdot)$ are the true values of $β$ and $θ (\cdot)$, $V (X, Z; \cdot)$ is a known smooth function, and $ζ \in R^{r}$ is an unknown finite dimensional parameter vector.

For conciseness, here we describe AIPW local linear kernel–profile estimating equations. Extensions to weighted local polynomial estimating equations are straightforward. In what follows, $K_{h} (s) = h^{- 1} K (s / h)$, where $K (\cdot)$ is a mean-zero density function, $α = (α_{0}, α_{1})^{T}$, and for any scalar $u$ we let $G (u) = (1, u)^{T}$. We describe the algorithm for the case in which $π_{0}$ is unknown. For the case where $π_{0}$ is known, the algorithm differs only in that all instances of $\hat{π}$ are replaced by $π_{0}$.

We start the algorithm with an initial estimator $\{\check{θ} (Z_{1}), \dots, \check{θ} (Z_{n}), \check{β}\}$, satisfying $n^{1 / 2} (\check{β} - β) = O_{p} (1)$. Such an initial estimator can be obtained by modifying the estimators described in Carroll et al. (1997) [5] based on the complete units (those with $Y$ observed) weighted by the inverse of their selection probability, if known, or an estimate of it otherwise, with $V_{i}$ replaced by $V (X_{i}, Z_{i}; \tilde{ζ})$ for any user-specified fixed constant $\tilde{ζ}$. To compute $\hat{ζ}$, we calculate

$$S_{i} = \left[R_{i} {\hat{π}}_{i}^{- 1} (Y_{i} - Q_{i}) - (R_{i} {\hat{π}}_{i}^{- 1} - 1) \{δ (X_{i}, Z_{i}, U_{i}) - Q_{i}\}\right]^{2},$$

where $Q_{i} = μ \{X_{i}^{T} \check{β} + \check{θ} (Z_{i})\}$, $i = 1, \dots, n$. The estimator $\hat{ζ}$ is obtained by nonlinear least squares regression of $S_{i}$ on $(X_{i}, Z_{i})$ under the model $E (S_{i} | X_{i}, Z_{i}) = V (X_{i}, Z_{i}; ζ)$. Then we iterate the following two steps until convergence:

1. For the fixed $β$ and any given $z$, we calculate $\hat{θ} (z, β)$ using an AIPW kernel estimating equation similar to Wang et al. (2010) [20]. Specifically, $\hat{θ} (z, β)$ is defined as ${\hat{α}}_{0} (β)$, the first component of the vector $\hat{α} (β) = \{{\hat{α}}_{0} (β), {\hat{α}}_{1} (β)\}^{T}$, solving the following AIPW kernel estimating equations in $α$,

$$\sum_{i = 1}^{n} K_{h} (Z_{i} - z) μ_{i, z}^{(1)} (α) V_{i}^{- 1} G (Z_{i} - z) \left[\frac{R_{i}}{{\hat{π}}_{i}} \{Y_{i} - μ_{i, z} (α)\} - \left(\frac{R_{i}}{{\hat{π}}_{i}} - 1\right) \{δ (X_{i}, Z_{i}, U_{i}) - μ_{i, z} (α)\}\right] = 0,$$

(5)

where, to simplify notation, $μ_{i, z} (α) = μ \{X_{i}^{T} β + G (Z_{i} - z)^{T} α\}$, $μ_{i, z}^{(1)} (α)$ is the first derivative of $μ (r)$ with respect to $r$ evaluated at $r = X_{i}^{T} β + G (Z_{i} - z)^{T} α$, and $V_{i} = V (X_{i}, Z_{i}; \hat{ζ})$, with $\hat{ζ}$ defined above.

2. We compute $\hat{β}$ by solving the following AIPW profile estimating equation in $β$,

$$\sum_{i = 1}^{n} {\tilde{μ}}_{i}^{(1)} (β) V_{i}^{- 1} \left\{X_{i} + \frac{\partial \hat{θ} (Z_{i}, β)}{\partial β}\right\} \left[\frac{R_{i}}{{\hat{π}}_{i}} \{Y_{i} - {\tilde{μ}}_{i} (β)\} - \left(\frac{R_{i}}{{\hat{π}}_{i}} - 1\right) \{δ (X_{i}, Z_{i}, U_{i}) - {\tilde{μ}}_{i} (β)\}\right] = 0,$$

(6)

where ${\tilde{μ}}_{i} (β) = μ \{X_{i}^{T} β + \hat{θ} (Z_{i}, β)\}$ and ${\tilde{μ}}_{i}^{(1)} (β)$ is the first derivative of $μ (r)$ with respect to $r$ evaluated at $r = X_{i}^{T} β + \hat{θ} (Z_{i}, β)$.

At convergence, we obtain ${\hat{β}}_{\mathrm{AIPW}}$ and ${\hat{θ}}_{\mathrm{AIPW}} (\cdot) = \hat{θ} (\cdot, {\hat{β}}_{\mathrm{AIPW}})$. When the link function $μ (\cdot)$ is the identity and the $V_{i}$ are constants, both ${\hat{β}}_{\mathrm{AIPW}}$ and ${\hat{θ}}_{\mathrm{AIPW}} (z)$ have a closed form and are linear functions of $Y$.
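The closed form in the identity-link case can be sketched as follows. With $μ (u) = u$ and constant $V_{i}$, the bracketed term in (5) and (6) equals $Y_{i}^{*} - X_{i}^{T} β - θ$ with pseudo-outcome $Y_{i}^{*} = R_{i} Y_{i} / {\hat{π}}_{i} - (R_{i} / {\hat{π}}_{i} - 1) δ (X_{i}, Z_{i}, U_{i})$, so step 1 is a local linear smooth of $Y^{*} - Xβ$ on $Z$ and step 2 reduces to least squares on partial residuals. This is a minimal sketch under those assumptions, not the authors' implementation; the Epanechnikov kernel, the function names, the bandwidth, and the simulated design are all choices made for illustration.

```python
import numpy as np

def epanechnikov(s):
    return 0.75 * np.clip(1.0 - s**2, 0.0, None)

def local_linear_smoother(z, h):
    """n x n matrix S such that (S @ v)[j] is the local linear fit of v at z[j]."""
    n = len(z)
    S = np.empty((n, n))
    for j in range(n):
        d = z - z[j]
        w = epanechnikov(d / h) / h              # K_h(Z_i - z_j)
        G = np.column_stack([np.ones(n), d])     # rows G(Z_i - z_j)^T
        GW = (G * w[:, None]).T                  # 2 x n matrix G^T W
        S[j] = np.linalg.solve(GW @ G, GW)[0]    # row giving the intercept alpha_0
    return S

def aipw_profile_identity(y_obs, r, x, z, pi, delta, h):
    """AIPW kernel-profile estimates for identity link and constant V (sketch)."""
    y0 = np.where(r == 1, y_obs, 0.0)            # unobserved Y never enters
    y_star = r / pi * y0 - (r / pi - 1.0) * delta
    S = local_linear_smoother(z, h)
    x_tilde = x - S @ x                          # X_i + d theta_hat(Z_i, beta) / d beta
    y_tilde = y_star - S @ y_star
    beta = np.linalg.solve(x_tilde.T @ x_tilde, x_tilde.T @ y_tilde)
    theta = S @ (y_star - x @ beta)              # theta_hat(Z_i, beta_hat)
    return beta, theta

# Illustrative data; true beta = 1 and theta(z) = sin(2 pi z) are assumptions.
rng = np.random.default_rng(0)
n = 400
x = rng.binomial(1, 0.5, size=(n, 1)).astype(float)
z = rng.uniform(0.0, 1.0, size=n)
u = rng.normal(size=n)
mean_given_xzu = 1.0 * x[:, 0] + np.sin(2.0 * np.pi * z) + 0.5 * u
y = mean_given_xzu + 0.5 * rng.normal(size=n)
pi0 = 1.0 / (1.0 + np.exp(-(0.5 + u)))
r = rng.binomial(1, pi0)
y_obs = np.where(r == 1, y, np.nan)

# Here pi and delta are the true functions, which is possible only in a simulation.
beta_hat, theta_hat = aipw_profile_identity(y_obs, r, x, z, pi0, mean_given_xzu, h=0.15)
print("beta_hat:", beta_hat)
```

Because both estimates are linear in the pseudo-outcome, no iteration between the two steps is needed in this special case; for a nonlinear link the two estimating equations would have to be solved iteratively as described above.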

We can similarly define the simpler-to-compute nonaugmented inverse probability weighted (IPW) estimators ${\hat{β}}_{\mathrm{IPW}}$ and ${\hat{θ}}_{\mathrm{IPW}} (z)$ of $β$ and $θ (\cdot)$, which are the output of the iterative two-step procedure described above with Equations (5) and (6) replaced by

$$\sum_{i = 1}^{n} \frac{R_{i}}{{\hat{π}}_{i}} K_{h} (Z_{i} - z) μ_{i, z}^{(1)} (α) V_{i}^{- 1} G (Z_{i} - z) \{Y_{i} - μ_{i, z} (α)\} = 0$$

(7)

and

$$\sum_{i = 1}^{n} \frac{R_{i}}{{\hat{π}}_{i}} {\tilde{μ}}_{i}^{(1)} (β) V_{i}^{- 1} \left\{X_{i} + \frac{\partial \hat{θ} (Z_{i}, β)}{\partial β}\right\} \{Y_{i} - {\tilde{μ}}_{i} (β)\} = 0$$

(8)

with $\hat{ζ}$ obtained by regressing ${\tilde{S}}_{i} \equiv \left[R_{i} {\hat{π}}_{i}^{- 1} (Y_{i} - μ \{X_{i}^{T} \check{β} + \check{θ} (Z_{i})\})\right]^{2}$ on $X_{i}$ and $Z_{i}$, $i = 1, \dots, n$, under a model $V (X_{i}, Z_{i}; ζ)$ for $E ({\tilde{S}}_{i} | X_{i}, Z_{i})$.

Choosing an appropriate bandwidth parameter $h$ is important when estimating $θ (\cdot)$. We generalize the empirical bias bandwidth selection (EBBS) method of Ruppert (1997) [35] to derive a data-driven bandwidth selection approach in practice; for details, refer to Section 4.3 in Wang et al. 2010 [20]. In Section 5, we derive the asymptotic properties of both $\{{\hat{β}}_{\mathrm{AIPW}}, {\hat{θ}}_{\mathrm{AIPW}} (z)\}$ and $\{{\hat{β}}_{\mathrm{IPW}}, {\hat{θ}}_{\mathrm{IPW}} (z)\}$.

3.2. Doubly Robust, Locally Efficient Estimation

If $π_{0}$ is unknown, the consistency of the estimators ${\hat{β}}_{\mathrm{AIPW}}$ and ${\hat{θ}}_{\mathrm{AIPW}} (z)$ requires model (3) for the selection probabilities to be correctly specified. This can be relaxed by a slight modification to the preceding algorithm. Specifically, consider new estimators ${\hat{β}}_{\mathrm{DR}}$ and ${\hat{θ}}_{\mathrm{DR}} (z)$ obtained by replacing $δ (X, Z, U)$ with $δ (X, Z, U; \hat{η})$, where $\hat{η}$ is the possibly weighted least squares estimator of $η$ under the model

$$E (Y | X, Z, U) = δ (X, Z, U; η) .$$

(9)

In Section 5, we show that under regularity conditions, ${\hat{β}}_{\mathrm{DR}}$ and ${\hat{θ}}_{\mathrm{DR}} (z)$ are consistent for $β$ and $θ (z)$ provided either model (3) or model (9) is correctly specified but not necessarily both. This property is often referred to as double-robustness.
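A short heuristic calculation, which is a sketch and not the formal argument of Section 5, indicates why either correct model suffices. The shorthand here is introduced only for this illustration: $m = E (Y | X, Z, U)$, $μ_{0} = μ \{X^{T} β_{0} + θ_{0} (Z)\}$, and $\tilde{π}$ and $\tilde{δ}$ denote the large-sample limits of the fitted selection probability and the fitted outcome regression. Under the MAR assumption (2), $E (R | X, Z, U, Y) = π_{0}$, so the bracketed term of the estimating functions evaluated at the truth has conditional mean

```latex
\begin{aligned}
E\left[\frac{R}{\tilde{\pi}}(Y-\mu_0)
      -\left(\frac{R}{\tilde{\pi}}-1\right)(\tilde{\delta}-\mu_0)
      \,\middle|\, X,Z,U\right]
&= \frac{\pi_0}{\tilde{\pi}}\,(m-\mu_0)
   -\left(\frac{\pi_0}{\tilde{\pi}}-1\right)(\tilde{\delta}-\mu_0) \\
&= (m-\mu_0)
   +\left(\frac{\pi_0}{\tilde{\pi}}-1\right)(m-\tilde{\delta}).
\end{aligned}
```

The second term vanishes if $\tilde{π} = π_{0}$ (model (3) correct) or if $\tilde{δ} = m$ (model (9) correct), and the first term has mean zero given $(X, Z)$ by model (1), since $E (m | X, Z) = E (Y | X, Z) = μ_{0}$.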

In addition to offering more protection against model misspecification, the estimators ${\hat{β}}_{\mathrm{DR}}$ and ${\hat{θ}}_{\mathrm{DR}} (z)$ have attractive asymptotic efficiency properties. Specifically, if model (3) and model (9) are both correctly specified, then ${\hat{θ}}_{\mathrm{DR}} (z)$ has asymptotic variance equal to the smallest possible asymptotic variance of ${\hat{θ}}_{\mathrm{AIPW}} (z)$ as $δ (X, Z, U)$ ranges over all possible functions. In addition, ${\hat{β}}_{\mathrm{DR}}$ is locally semiparametric efficient under the semiparametric model defined by the restrictions (1)–(3) at the submodel defined by the additional restrictions (4) and (9). That is, under regularity conditions, ${\hat{β}}_{\mathrm{DR}}$ is consistent and asymptotically normal for $β$ when the selection probability satisfies (3); if, in addition, the true data generating process satisfies the working models (4) and (9), then its limiting distribution has variance equal to the semiparametric variance bound for regular estimators of $β$ in the semiparametric model defined by restrictions (1)–(3). We explicitly demonstrate these properties in Section 4 and Section 5.

4. Semiparametric Efficiency Theory for Estimation of $β$

The semiparametric variance bound for estimators of a finite-dimensional parameter $β$ within an arbitrary semiparametric model serves as the counterpart to the Cramér–Rao bound in parametric models. This bound is defined as the supremum of the Cramér–Rao bounds for $β$ across all regular parametric submodels (Begun et al., 1983 [36]; Newey, 1990 [37]; Bickel et al., 1993 [38]). Analogous to its parametric counterpart, this bound offers a benchmark against which to assess the efficiency of estimators of $β$ that are consistent and asymptotically normal (more precisely, regular and asymptotically linear (RAL)) under the semiparametric model. Notably, the semiparametric bound is the inverse of the variance of the semiparametric efficient score for $β$.

In this section, we derive the semiparametric efficient score and semiparametric variance bound for $β$ within the semiparametric model $A$ for the law $F_{O}$ of the observed data $O = (R, R Y, X, Z, U)$, defined by restriction (1) on the full data and the MAR restriction (2) on the missing data mechanism. To achieve this, we draw upon the general theory established by Ibragimov and Hasminskii (1981) [39], Robins and Rotnitzky (1992) [40], Robins et al. (1994) [29], and Rotnitzky and Robins (1997) [41], and discussed in van der Laan and Robins (2003) [42] and Tsiatis (2006) [43], among others. The derivation requires the characterization of $Λ_{\mathrm{nuis}}^{⊥}$, the orthocomplement of the nuisance tangent space, i.e., of the closed linear span of nuisance scores under model $A$, in the Hilbert space $L_{2} (F_{O})$ of mean-zero, finite variance scalar functions $T = t (O)$ with covariance inner product. We characterize $Λ_{\mathrm{nuis}}^{⊥}$ through $Λ_{\mathrm{nuis}}^{⊥, \mathrm{full}}$, the orthocomplement to the nuisance tangent space for $β$ under the semiparametric model $A_{\mathrm{full}}$ for the law $F_{W}$ of the full data $W = (Y, X, Z, U)$ defined by restriction (1), in the Hilbert space $L_{2} (F_{W})$. This is possible because, as shown in Robins and Rotnitzky (1992) [40],

$$Λ_{\mathrm{nuis}}^{⊥} = \left\{\frac{R}{π_{0}} Q - \left(\frac{R}{π_{0}} - 1\right) E (Q | R = 1, X, Z, U) : Q \in Λ_{\mathrm{nuis}}^{⊥, \mathrm{full}}\right\} .$$

(10)

In Supplementary Materials S1, we show that $Λ_{\mathrm{nuis}}^{⊥, \mathrm{full}}$ is composed of all finite variance functions of the form $b (X, Z) ϵ$, where $ϵ$ is defined in Section 3.1, with $b (X, Z)$ satisfying

$$E [b (X, Z) μ^{(1)} \{X^{T} β_{0} + θ_{0} (Z)\} | Z] = 0,$$

(11)

where $μ^{(1)} \{\cdot\}$ is the first derivative of $μ (\cdot)$. Robins et al. (1994) [29] derived $Λ_{\mathrm{nuis}}^{⊥, \mathrm{full}}$ for $μ (u) = u$, Bickel et al. (Sec 4.3, 1993) [38] for $μ (u) = \{1 + \exp (- u)\}^{- 1}$, and Robins and Rotnitzky (2001) [44] for $μ (u) = \exp (u)$.

According to Bickel et al. 1993 [38], the semiparametric efficient score $S_{\mathrm{eff}}$ for $β$ in model $A$ at $F_{O}$ is a $p \times 1$ vector whose elements belong to $Λ_{\mathrm{nuis}}^{⊥}$. Consequently, in view of (10), $S_{\mathrm{eff}}$ must be equal to $b_{\mathrm{eff}} (X, Z) ϵ^{*}$ for some $p \times 1$ function $b_{\mathrm{eff}} (X, Z)$, whose elements satisfy (11), and with

$$ϵ^{*} = \frac{R}{π_{0}} ϵ - \left(\frac{R}{π_{0}} - 1\right) E (ϵ | X, Z, U) .$$

(12)

In Supplementary Materials S1, we show that $b_{\mathrm{eff}} (X, Z) = σ^{- 2} (X, Z) μ^{(1)} \{X^{T} β_{0} + θ_{0} (Z)\} (X - φ_{\mathrm{eff}})$, where $σ^{2} (X, Z) = \mathrm{var} (ϵ^{*} | X, Z)$ and

$$φ_{\mathrm{eff}} = \frac{E \{[μ^{(1)} \{X^{T} β_{0} + θ_{0} (Z)\}]^{2} σ^{- 2} (X, Z) X | Z\}}{E \{[μ^{(1)} \{X^{T} β_{0} + θ_{0} (Z)\}]^{2} σ^{- 2} (X, Z) | Z\}} .$$

It then follows that the semiparametric variance bound $V_{\mathrm{eff}} = \{E (S_{\mathrm{eff}} S_{\mathrm{eff}}^{T})\}^{- 1}$ for $β$ in the observed data model is equal to

$$V_{\mathrm{eff}} = \left\{E \left[σ^{- 2} (X, Z) [μ^{(1)} \{X^{T} β_{0} + θ_{0} (Z)\}]^{2} (X - φ_{\mathrm{eff}}) (X - φ_{\mathrm{eff}})^{T}\right]\right\}^{- 1} .$$

(13)

In fact, the semiparametric efficient score and efficiency bound given above are also the ones corresponding to a model which additionally imposes model (3) on the selection probabilities. This is so because under MAR, the likelihood factorizes into a part that depends on the selection probabilities and another part that depends on $β$ and $θ (\cdot)$.
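As a sanity check on (13), consider a special case that is chosen here for illustration and is not discussed in the text: no missing data ($π_{0} \equiv 1$, so $R = 1$ with probability 1), identity link $μ (u) = u$, and homoscedastic errors with $\mathrm{var} (ϵ | X, Z) = σ_{ϵ}^{2}$, where $σ_{ϵ}^{2}$ is a constant introduced for this example. Then $ϵ^{*} = ϵ$, $σ^{2} (X, Z) = σ_{ϵ}^{2}$, and $μ^{(1)} \equiv 1$, so the constant weights cancel in $φ_{\mathrm{eff}}$ and

```latex
\varphi_{\mathrm{eff}} = E(X \mid Z), \qquad
V_{\mathrm{eff}} = \sigma_{\epsilon}^{2}
  \left\{ E\!\left[ \{X - E(X \mid Z)\}\{X - E(X \mid Z)\}^{T} \right] \right\}^{-1}.
```

This agrees with the familiar efficiency bound for the partially linear model with complete data, in which only the variation in $X$ not explained by $Z$ carries information about $β$.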

5. Asymptotic Properties

In this section, we investigate the asymptotic properties of the AIPW and IPW profile–kernel estimators. For conciseness, we only present our asymptotic results for the local linear kernel–profile estimators. The results can be extended to local polynomial regression. Here and throughout, we make the following assumptions: (I) $n \to \infty$, $h \to 0$, and $n h \to \infty$; (II) $z$ is in the interior of the support of $Z$; (III) for some constant $c$, $\Pr (R = 1 | X, Z, U) > c > 0$ with probability 1 in a neighborhood of $Z = z$; and (IV) the regularity conditions stated at the beginning of the Supplementary Materials hold.

Under the aforementioned assumptions and MAR, both the IPW and AIPW kernel estimators are consistent for $θ (z)$ provided that the estimating equations use either the true selection probabilities or $\sqrt{n}$-consistent estimates under a correctly specified model (3). Furthermore, the AIPW estimator of $θ (z)$ that uses $δ (X_{i}, Z_{i}, U_{i}) = δ (X_{i}, Z_{i}, U_{i}; \hat{η})$, as defined before, remains consistent if model (9) for the conditional mean of $Y$ given $X$, $Z$, and $U$ is correctly specified, even if model (3) for the selection probability is misspecified. These results are similar to the findings in Wang et al. (2010) [20], but because the model in this paper additionally contains the parametric component $X^{T} β$, some of the expressions differ slightly. We therefore briefly summarize the asymptotic distributions of ${\hat{θ}}_{\mathrm{IPW}} (z)$ and ${\hat{θ}}_{\mathrm{AIPW}} (z)$ in the following Theorems 1 and 2.

Theorem 1.

Suppose that Equation (5) uses (a) ${\hat{π}}_{i}$ that is computed under model (3) or is replaced by fixed probabilities $π_{i}^{*}$, and (b) a fixed function $δ^{*} (X, Z, U)$ or $δ (X, Z, U) = δ (X, Z, U; \hat{η})$, where $\hat{η}$ is a $\sqrt{n}$-consistent estimator of $η$ under model (9). Suppose the MAR assumption (2) and assumptions (I)–(IV) above hold and further that either of the following holds: (i) model (3) is correct or, if $π_{i}^{*}$ is used, $π_{i}^{*} = π_{i 0}$ for all $i$; or (ii) $δ^{*} (X, Z, U) = E (Y | X, Z, U)$ or, if $δ (X, Z, U; \hat{η})$ is used, model (9) is correctly specified. Then:

(1) There exists a sequence of solutions ${\hat{θ}}_{\mathrm{AIPW}} (z; β_{0})$ of (5) such that

$$\sqrt{n h} \left\{{\hat{θ}}_{\mathrm{AIPW}} (z; β_{0}) - θ (z) - \frac{1}{2} h^{2} θ^{''} (z) c_{2} (K) + o (h^{2})\right\} \to N \{0, Σ_{θ, \mathrm{AIPW}} (z)\},$$

(14)

where

$$\begin{aligned} Σ_{θ, \mathrm{AIPW}} (z) = \frac{c_{0} (K^{2})}{t (z) f_{Z} (z)} E \Big[ r (X, Z) \Big\{ & \frac{π_{0} (X, Z, U)}{{\tilde{π}}^{2} (X, Z, U)} \mathrm{var} (Y | X, Z, U) + \frac{π_{0} (X, Z, U)}{\tilde{π} (X, Z, U)} [E (Y | X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]^{2} \\ & + \left\{\frac{π_{0} (X, Z, U)}{{\tilde{π}}^{2} (X, Z, U)} - \frac{π_{0} (X, Z, U)}{\tilde{π} (X, Z, U)}\right\} \{E (Y | X, Z, U) - \tilde{δ} (X, Z, U)\}^{2} \\ & + \left\{1 - \frac{π_{0} (X, Z, U)}{\tilde{π} (X, Z, U)}\right\} [\tilde{δ} (X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]^{2} \Big\} \Big| Z = z \Big], \end{aligned}$$

(15)

$θ^{''} (\cdot)$ is the second derivative of $θ (\cdot)$, $c_{0} (K^{2}) = \int K^{2} (s) d s$, $f_{Z} (\cdot)$ denotes the density function of $Z$, $t (z) = E ([μ^{(1)} \{θ (Z) + X^{T} β_{0}\}]^{2} V^{- 1} \{θ (Z) + X^{T} β_{0}\} | Z = z)$, $r (X, Z) = [μ^{(1)} \{θ (Z) + X^{T} β_{0}\}]^{2} V^{- 2} \{θ (Z) + X^{T} β_{0}\}$, $c_{2} (K) = \int s^{2} K (s) d s$, $\tilde{π} (X, Z, U)$ denotes $π^{*} (X, Z, U)$ if $π_{i}^{*}$ is used or the probability limit of $π (X, Z, U; \hat{τ})$ if ${\hat{π}}_{i}$ is used, and $\tilde{δ} (X, Z, U)$ denotes $δ^{*} (X, Z, U)$ if $δ^{*} (X, Z, U)$ is used or the probability limit of $δ (X, Z, U; \hat{η})$ if $δ (X, Z, U; \hat{η})$ is used.

(2) If model (3) is correctly specified or if $π_{i}^{*} = π_{i 0}$ for all $i$, then $\tilde{π} (X, Z, U) = π (X, Z, U)$, and $Σ_{θ, \mathrm{AIPW}} (z)$ simplifies to

$$\frac{c_{0} (K^{2})}{t (z) f_{Z} (z)} E \Big[ r (X, Z) \Big\{ \frac{\mathrm{var} (Y | X, Z, U)}{π_{0} (X, Z, U)} + [E (Y | X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]^{2} + \left\{\frac{1}{π_{0} (X, Z, U)} - 1\right\} [E (Y | X, Z, U) - \tilde{δ} (X, Z, U)]^{2} \Big\} \Big| Z = z \Big],$$

which is minimized when $\tilde{δ} (X, Z, U) = E (Y | X, Z, U)$. The corresponding ${\hat{θ}}_{\mathrm{opt}, \mathrm{AIPW}} (z; β_{0})$ using either $δ^{*} (X, Z, U) = E (Y | X, Z, U)$ or $δ (X, Z, U; \hat{η})$ from a correctly specified model (9) for $E (Y | X, Z, U)$ has the smallest asymptotic variance among all ${\hat{θ}}_{\mathrm{AIPW}} (z; β_{0})$. The asymptotic variance of ${\hat{θ}}_{\mathrm{opt}, \mathrm{AIPW}} (z; β_{0})$ is equal to

$$\frac{c_{0} (K^{2})}{t (z) f_{Z} (z)} E \Big[ r (X, Z) \Big\{ \frac{\mathrm{var} (Y | X, Z, U)}{π_{0} (X, Z, U)} + [E (Y | X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]^{2} \Big\} \Big| Z = z \Big] .$$
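The kernel constants $c_{2} (K) = \int s^{2} K (s) d s$ and $c_{0} (K^{2}) = \int K^{2} (s) d s$ that enter the bias term in (14) and the variance expressions can be evaluated numerically for any given kernel. The sketch below does so for two common choices; the Epanechnikov and Gaussian kernels, the helper names, and the midpoint rule are assumptions made for illustration, since the theorem does not fix a particular kernel.

```python
import numpy as np

def midpoint_integral(f, lo, hi, m=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    edges = np.linspace(lo, hi, m + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return np.sum(f(mid)) * (hi - lo) / m

def epanechnikov(s):
    return 0.75 * np.clip(1.0 - s**2, 0.0, None)

def gaussian(s):
    return np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)

def kernel_constants(kernel, lo, hi):
    """Return (c_2(K), c_0(K^2)) for a kernel supported (effectively) on [lo, hi]."""
    c2 = midpoint_integral(lambda s: s**2 * kernel(s), lo, hi)
    c0_sq = midpoint_integral(lambda s: kernel(s) ** 2, lo, hi)
    return c2, c0_sq

def leading_bias(h, theta_second_derivative, c2):
    """Leading bias term (1/2) h^2 theta''(z) c_2(K) from (14)."""
    return 0.5 * h**2 * theta_second_derivative * c2

c2_epa, c0_epa = kernel_constants(epanechnikov, -1.0, 1.0)
c2_gau, c0_gau = kernel_constants(gaussian, -10.0, 10.0)
print("Epanechnikov: c2 =", c2_epa, " c0(K^2) =", c0_epa)
print("Gaussian:     c2 =", c2_gau, " c0(K^2) =", c0_gau)
```

For the Epanechnikov kernel the exact values are $c_{2} (K) = 1 / 5$ and $c_{0} (K^{2}) = 3 / 5$, and for the standard Gaussian kernel they are $1$ and $1 / (2 \sqrt{π})$, which the numerical values should reproduce closely.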

Part (1) of Theorem 1 formally states the important double-robustness property of

{\hat{θ}}_{A I P W} (z; β_{0})

. It stipulates that

{\hat{θ}}_{A I P W} (z; β_{0})

is asymptotically unbiased as

h \to 0

when

n \to \infty

if either the model (3) for the selection probability or the model (9) for

E (Y | X, Z, U)

is correctly specified but not necessarily both. Part (1) of Theorem 1 also states that

{\hat{θ}}_{A I P W} (z; β_{0})

converges to

θ (z; β_{0})

at the rate

\sqrt{n h}

, and it provides the general form of its asymptotic variance, which does not depend on the working variance

V (\cdot)

. Thus, misspecification of the working model for

v a r \{ϵ_{δ}^{*} (β, θ) | X_{i}, Z_{i}\}

does not impact the asymptotic efficiency of

{\hat{θ}}_{A I P W} (z; β_{0})

. When model (3) for

π (X, Z, U)

is misspecified but model (9) for

E (Y | X, Z, U)

is correctly specified, the asymptotic variance of

{\hat{θ}}_{A I P W} (z; β_{0})

that uses

\hat{π}

and

δ (X, Z, U) = δ (X, Z, U; \hat{η})

simplifies to

\frac{c_{0} (K^{2}) E [r (X, Z) \{\frac{π_{0} (X, Z, U) v a r (Y | X, Z, U)}{{\tilde{π}}^{2} (X, Z, U)} + {[E (Y | X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]}^{2}\} | Z = z]}{t (z) f_{Z} (z)} .

Part (2) of this Theorem gives the asymptotic variance of

{\hat{θ}}_{A I P W} (z; β)

when

\hat{π}

is computed under a correctly specified model or

π_{i}^{*} = π_{i 0}

for all i. The result shows that in such cases, the most efficient AIPW kernel estimator is obtained when

E (Y | X, Z, U)

is used for

δ (X, Z, U)

or when

δ (X, Z, U; \hat{η})

from a correctly specified model (9) for

E (Y | X, Z, U)

is used.

In contrast to the AIPW approach, inference based on the IPW estimator

{\hat{θ}}_{I P W} (z)

is valid only when the selection probabilities are correctly specified, as summarized in the following theorem.

Theorem 2.

If model (3) for the selection probability is correctly specified and the MAR assumption (2) and assumptions (I)–(IV) above hold, then there exists a sequence of solutions

{\hat{θ}}_{I P W} (z; β_{0})

of (7) with β fixed at

β_{0}

such that as

n \to \infty

,

h \to 0

, and

n h \to \infty

,

\sqrt{n h} \{{\hat{θ}}_{I P W} (z; β_{0}) - θ (z) - \frac{1}{2} h^{2} θ^{''} (z) c_{2} (K) + o (h^{2})\} \to N \{0, Σ_{θ, I P W} (z)\}

where

Σ_{θ, I P W} (z)

is equal to

\frac{c_{0} (K^{2})}{t (z) f_{Z} (z)} E [r (X, Z) \{\frac{v a r (Y | X, Z, U)}{π_{0} (X, Z, U)} + \frac{{[E (Y | X, Z, U) - μ \{θ (Z) + X^{T} β_{0}\}]}^{2}}{π_{0} (X, Z, U)}\} | Z = z] .
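The efficiency comparison between Theorems 1 and 2 can be made explicit with one subtraction. As a worked step that uses only the two variance expressions stated above, and assuming model (3) is correctly specified in both cases, write b(X, Z, U) as shorthand (introduced here, not in the original text) for the difference between E(Y | X, Z, U) and the marginal mean, and write Σ_{θ, opt, AIPW}(z) as a label for the asymptotic variance of the optimal AIPW kernel estimator:

```latex
\Sigma_{\theta, IPW}(z) - \Sigma_{\theta, opt, AIPW}(z)
  = \frac{c_0(K^2)}{t(z) f_Z(z)}
    E\left[ r(X, Z) \left\{ \frac{1}{\pi_0(X, Z, U)} - 1 \right\} b(X, Z, U)^2
    \,\middle|\, Z = z \right] \ge 0 ,
\qquad
b(X, Z, U) = E(Y \mid X, Z, U) - \mu\{\theta(Z) + X^T \beta_0\} .
```

The difference is nonnegative because 0 < π_0 ≤ 1 and r(X, Z) ≥ 0. It is zero only if, conditional on Z = z, either π_0 = 1 or b = 0 almost surely, that is, when there is no missingness or when the auxiliary variables add nothing to the mean of Y beyond X and Z.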

In the following, we primarily focus on the properties of the AIPW profile estimator

{\hat{β}}_{A I P W}

in Section 5.1 and compare them with those of

{\hat{β}}_{I P W}

in Section 5.2. Lemma 1 and Theorem 3 establish that the AIPW profile estimator of

β

is consistent and asymptotically normal when either model (9) for the conditional mean of Y given

X

, Z, and

U

is correctly specified or model (3) for the missing data mechanism is correctly specified. This property is commonly referred to as double robustness. In contrast, the IPW profile estimator

{\hat{β}}_{I P W}

is inconsistent for

β

if model (3) for the selection probabilities is misspecified, as shown in Theorem 4. Theorem 3 also establishes that when model (3) for the selection probabilities is correctly specified, then among the class of AIPW profile estimators of

β

computed under the same working model for

v a r (ϵ_{δ}^{*} (β, θ))

, the one that uses

δ (X_{i}, Z_{i}, U_{i}) = δ (X_{i}, Z_{i}, U_{i}; \hat{η})

with

\hat{η}

computed under a correctly specified model for

E (Y_{i} | X_{i}, Z_{i}, U_{i})

has the smallest asymptotic variance. This asymptotic variance aligns with the semiparametric variance bound derived in Section 4 for

β

in the model defined by restriction (1) on the full data and restriction (2) on the missing data mechanism.

5.1. Asymptotic Results of the AIPW Profile Estimator ${\hat{β}}_{A I P W}$

In this subsection, our focus lies on the asymptotic properties of the AIPW profile estimators of

β

, demonstrating that the optimal estimator among this class has an asymptotic variance equivalent to the semiparametric variance bound derived in Section 4. We define

{\hat{φ}}_{A I P W} (z, β) = \frac{\partial {\hat{θ}}_{A I P W} (z, β)}{\partial β}

. Lemma 1 (proved in Supplementary Materials S2) establishes that

{\hat{φ}}_{A I P W} (z, β_{0})

converges in probability to

φ_{e f f}

defined in (4) when

V [μ \{X^{T} β + θ (Z)\}; ζ]

is a correctly specified model for

σ^{2} (X, Z) = v a r [ϵ^{*} (β_{0}, θ) | X, Z]

, where

ϵ^{*} (β_{0}, θ)

is defined in (12). This result, along with the subsequent theorem, is used to argue below that

{\hat{β}}_{A I P W}

is locally semiparametric efficient.

Let

φ_{A I P W} (z)

denote the probability limit of

{\hat{φ}}_{A I P W} (z, β_{0})

and V denote the probability limit of

V [μ \{X^{T} β + θ (Z)\}; \hat{ζ}]

, as

n \to \infty

. Let

ϵ_{δ}^{*} (β_{0}, θ)

be

ϵ_{δ}^{*}

evaluated at

β_{0}

and let

σ_{δ}^{2} (X, Z) = v a r [ϵ_{δ}^{*} (β_{0}, θ) | X, Z]

. Then we have the following lemma.

Lemma 1.

Under regularity conditions, we have

φ_{A I P W} (z) = - \frac{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} V^{- 1} X | Z = z)}{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} V^{- 1} | Z = z)} .

In particular, if

V [μ \{X^{T} β + θ (Z)\}; ζ]

is a correctly specified model for

σ_{δ}^{2} (X, Z)

, then

φ_{A I P W} (z) = - \frac{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} σ_{δ}^{- 2} (X, Z) X | Z = z)}{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} σ_{δ}^{- 2} (X, Z) | Z = z)} .

Note that

φ_{A I P W} (z)

is affected by the choice of function

δ (X, Z, U)

used in the AIPW equations only through the working variance model. If

δ (X, Z, U) = E (Y | X, Z, U)

or

δ (X, Z, U) = δ (X, Z, U; \hat{η})

and

\hat{η}

is calculated under a correctly specified model (9), a direct result from Lemma 1 is that the limit of the AIPW profile estimating function is proportional to the semiparametric efficient score of

β

derived in Section 4.

The next theorem establishes the asymptotic distribution of

{\hat{β}}_{A I P W}

. Throughout, we use the subscript

δ

to emphasize the dependence of

{\hat{β}}_{A I P W}

and the asymptotic variance of

{\hat{β}}_{A I P W}

on the choice of function

δ (X, Z, U)

used in the AIPW equations. Let

τ^{*}, η^{*}

, and

ζ^{*}

be the probability limits of

\hat{τ}, \hat{η}

, and

\hat{ζ}

. Let

S (R, X, Z, U; τ) = \partial \log [{π (X, Z, U; τ)}^{R} {\{1 - π (X, Z, U; τ)\}}^{1 - R}] / \partial τ

be the estimating function for

τ

and

l (Y, X, Z, U; η)

be the estimating function for

η

. Denote

D^{*} = μ^{(1)} \{X^{T} β_{0} + θ (Z)\} V^{- 1} [μ \{X^{T} β_{0} + θ (Z)\}; ζ^{*}] \tilde{X}

, where

\tilde{X} = X + φ_{A I P W} (Z)

. For any

τ

,

η

,

β

, and

θ (\cdot)

, define

ϵ_{i} (τ, η, β, θ)

as

\{R_{i} [Y_{i} - μ \{X_{i}^{T} β + θ (Z_{i})\}] - \{R_{i} - π (X_{i}, Z_{i}, U_{i}; τ)\} [δ (X_{i}, Z_{i}, U_{i}; η) - μ \{X_{i}^{T} β + θ (Z_{i})\}]\} / π (X_{i}, Z_{i}, U_{i}; τ)

. In what follows, for any symmetric matrices A and B,

A \geq B

stands for “

A - B

is positive semidefinite”.
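The residual ϵ_i defined above is what carries double robustness, and this can be checked by direct arithmetic. The following is a minimal sketch in Python, not the authors' implementation: the function names are hypothetical, a single unit is considered, and the expectation over the selection indicator R is computed exactly rather than by simulation. Because the residual is linear in Y, a fixed value y can stand in for E(Y | X, Z, U) in the second case.

```python
def aipw_residual(r, y, mu, pi, delta):
    """AIPW residual for one unit; y is ignored when r == 0 (outcome missing)."""
    observed = (y - mu) if r == 1 else 0.0
    return (observed - (r - pi) * (delta - mu)) / pi


def mean_over_selection(y, mu, pi_true, pi_work, delta):
    """Exact mean of the residual over R ~ Bernoulli(pi_true), other inputs fixed."""
    selected = aipw_residual(1, y, mu, pi_work, delta)
    not_selected = aipw_residual(0, y, mu, pi_work, delta)
    return pi_true * selected + (1.0 - pi_true) * not_selected


# Case 1: working pi equals the true pi, delta is badly wrong.
case_pi_correct = mean_over_selection(
    y=2.0, mu=0.5, pi_true=0.3, pi_work=0.3, delta=10.0)

# Case 2: working pi is wrong, delta equals the conditional mean of Y
# (y = 2.0 plays the role of E(Y | X, Z, U)).
case_delta_correct = mean_over_selection(
    y=2.0, mu=0.5, pi_true=0.3, pi_work=0.8, delta=2.0)

# Case 3: both working quantities are wrong.
case_both_wrong = mean_over_selection(
    y=2.0, mu=0.5, pi_true=0.3, pi_work=0.8, delta=10.0)
```

In the first two cases the mean residual equals y − mu, so the estimating equation remains centred on the full-data residual; in the third case it does not, which mirrors the "either but not necessarily both" condition of Theorem 3.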

Theorem 3.

Suppose that Equation (6) uses

{\hat{π}}_{i}

computed under a model (3) and

δ (X_{i}, Z_{i}, U_{i}; \hat{η})

, where

\hat{η}

is calculated as in Section 3. Under the MAR assumption (2) and assumptions (I)–(IV) above, if either model (3) is correct or model (9) is correct, then there exists a sequence of solutions

{\hat{β}}_{A I P W, δ}

that satisfy

\sqrt{n} \{{\hat{β}}_{A I P W, δ} - β_{0}\} \to N (0, Ω_{δ} (V)),

(16)

where

Ω_{δ} (V) = A {(V)}^{- 1} B_{δ} (V) A {(V)}^{- 1}

,

A (V) = E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} V {[μ \{X^{T} β_{0} + θ (Z)\}; ζ^{*}]}^{- 1} \tilde{X} {\tilde{X}}^{T})

, and

\begin{matrix} B_{δ} (V) = v a r \{D^{*} ϵ (τ^{*}, η^{*}, β_{0}, θ) - E [D^{*} \frac{\partial}{\partial τ^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] E {[\frac{\partial}{\partial τ^{T}} S (R, X, Z, U; τ^{*})]}^{- 1} \\ S (R, X, Z, U; τ^{*}) - E [D^{*} \frac{\partial}{\partial η^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] E {[\frac{\partial}{\partial η^{T}} l (Y, X, Z, U; η^{*})]}^{- 1} l (Y, X, Z, U; η^{*})\} . \end{matrix}

Theorem 3 establishes that

{\hat{β}}_{A I P W}

is consistent and asymptotically normal as long as either model (3) or model (9) is correct, but not necessarily both. This is the double-robustness property, which protects inference against misspecification of one of the two working models.

When model (3) is correctly specified for the selection probability,

E [D^{*} \partial ϵ (τ^{*}, η^{*}, β_{0}, θ) / \partial τ^{T}] = E [D^{*} ϵ (τ^{*}, η^{*}, β_{0}, θ) S_{τ}^{T}]

, where

S_{τ}

is the score function of

τ

,

E [\partial S (R, X, Z, U; τ^{*}) / \partial τ^{T}] = E [S_{τ} S_{τ}^{T}]

,

π (X_{i}, Z_{i}, U_{i}; τ^{*}) = π_{0} (X_{i}, Z_{i}, U_{i})

,

E [D^{*} \partial ϵ (τ^{*}, η^{*}, β_{0}, θ) / \partial η^{T}] = 0

, and

ϵ (τ^{*}, η^{*}, β_{0}, θ) = ϵ_{δ}^{*} (β_{0}, θ)

. Therefore,

B_{δ} (V)

reduces to

v a r \{D^{*} ϵ_{δ}^{*} (β_{0}, θ) - E [D^{*} ϵ_{δ}^{*} (β_{0}, θ) S_{τ}^{T}] E {[S_{τ} S_{τ}^{T}]}^{- 1} S_{τ}\}

. Then we have the following two corollaries.

Corollary 1.

(i) If

{\hat{π}}_{i}

in (6) is computed under a correctly specified model (3), then

B_{δ} (V) = v a r \{D^{*} ϵ_{δ}^{*} (β_{0}, θ) - E [D^{*} ϵ_{δ}^{*} (β_{0}, θ) S_{τ}^{T}] E {[S_{τ} S_{τ}^{T}]}^{- 1} S_{τ}\}

. (ii) If the true selection probabilities are used in Equation (6) instead of

{\hat{π}}_{i},

then (16) holds with

Ω_{δ} (V)

replaced by

{\tilde{Ω}}_{δ} (V) = A {(V)}^{- 1} {\tilde{B}}_{δ} (V) A {(V)}^{- 1}

and

{\tilde{B}}_{δ} (V) = v a r \{D^{*} {\tilde{ϵ}}_{δ}^{*} (β_{0}, θ)\}

.

Part (ii) of Corollary 1 holds since

E [D^{*} \partial ϵ (τ^{*}, η^{*}, β_{0}, θ) / \partial η^{T}] = 0

and

S_{τ} = 0

when the true selection probabilities are used in (6). Notice that in (i),

D^{*} ϵ_{δ}^{*} (β_{0}, θ) - E [D^{*} ϵ_{δ}^{*} (β_{0}, θ) S_{τ}^{T}] E {[S_{τ} S_{τ}^{T}]}^{- 1} S_{τ}

is the residual from the population least squares of

D^{*} ϵ_{δ}^{*} (β_{0}, θ)

on

S_{τ}

, and residual variances are always less than or equal to the variance of the outcomes in a regression; therefore,

B_{δ} (V) \leq {\tilde{B}}_{δ} (V)

whenever model (3) is correct. Thus, Corollary 1 implies that using

\sqrt{n} -

consistent estimates of the selection probabilities, even when the true probabilities are known, can improve efficiency for estimating

β

. This property has also been observed for parametric regression estimation by Robins et al. (1994) [29]. The asymptotic variance of

{\hat{β}}_{A I P W, δ}

varies with different choices of

δ

and working model V. The following corollary implies that for a fixed working model V,

{\hat{β}}_{A I P W, δ}

has the smallest variance when

δ_{o p t} (X, Z, U) = E (Y | X, Z, U)

is used in the augmentation term. Furthermore, if we correctly specify the working model V as

σ_{δ_{o p t}}^{2}

, the asymptotic variance of

{\hat{β}}_{A I P W, δ}

is minimized and the semiparametric variance bound derived in Section 4 is achieved.
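The projection argument behind the inequality between B_δ(V) and its known-probability counterpart can be illustrated numerically in the scalar case. The sketch below is only an analogy under stated assumptions: `s` is a simulated stand-in for the score S_τ, `g` is a simulated stand-in for D* ϵ*_δ that is correlated with the score, and the slope 0.6, the seed, and the sample size are arbitrary choices made here.

```python
import random


def variance(values):
    """Population-style variance (divide by n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def projection_residuals(g, s):
    """Residuals from least squares of g on s (scalar case, both centred)."""
    n = len(g)
    g_bar = sum(g) / n
    s_bar = sum(s) / n
    cov_gs = sum((gi - g_bar) * (si - s_bar) for gi, si in zip(g, s)) / n
    slope = cov_gs / variance(s)
    return [(gi - g_bar) - slope * (si - s_bar) for gi, si in zip(g, s)]


rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # stand-in for the score S_tau
g = [0.6 * si + rng.gauss(0.0, 1.0) for si in s]   # stand-in for D* eps*, correlated with s

var_known_pi = variance(g)                               # analogue of B-tilde (true pi used)
var_estimated_pi = variance(projection_residuals(g, s))  # analogue of B (pi estimated)
```

The residual variance equals the variance of `g` minus the squared covariance divided by the variance of `s`, so it can never exceed the variance of `g`; the reduction is larger the more strongly the estimating function is correlated with the score.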

Corollary 2.

If Equation (6) uses

{\hat{π}}_{i}

computed under a correctly specified model (3) or the true selection probabilities, we have the following: (i)

Ω_{δ_{o p t}} (V) = {\tilde{Ω}}_{δ_{o p t}} (V)

when

δ_{o p t} (X, Z, U) = E (Y | X, Z, U)

. (ii) For any

δ,

Ω_{δ_{o p t}} (V) \leq Ω_{δ} (V)

and

Ω_{δ_{o p t}} (V) \leq {\tilde{Ω}}_{δ} (V)

. Furthermore,

B_{δ_{o p t}} (V) = {\tilde{B}}_{δ_{o p t}} (V) = v a r \{D^{*} ϵ_{δ_{o p t}}^{*} (β_{0}, θ)\}

. (iii) For any

V,

Ω_{δ_{o p t}} (V) \geq Ω_{δ_{o p t}} (σ_{δ_{o p t}}^{2})

. Furthermore,

Ω_{δ_{o p t}} (σ_{δ_{o p t}}^{2}) = A {(σ_{δ_{o p t}}^{2})}^{- 1}

.

Part (i) of Corollary 2 shows that when

E (Y | X, Z, U)

is employed as the

δ

function, then estimating the selection probabilities does not yield efficiency gains for estimating

β

compared to using the true selection probabilities when they are known. As indicated by part (ii) of Corollary 2, among all

δ

functions,

δ_{o p t} (X, Z, U) = E (Y | X, Z, U)

is the optimal one to achieve the smallest asymptotic variance of

\hat{β}

. In part (iii), because

σ_{δ_{o p t}}^{2} = v a r [ϵ^{*} (β_{0}, θ) | X, Z]

and by Lemma 1

\tilde{X} = X - φ_{e f f} (Z)

when

V = σ_{δ_{o p t}}^{2},

it follows that

Ω_{δ_{o p t}} (σ_{δ_{o p t}}^{2})

is equal to the semiparametric variance bound (13) for RAL estimators of

β

. We therefore conclude from Theorem 3 and Corollary 2 that the profile AIPW estimator

{\hat{β}}_{A I P W}

is locally semiparametric efficient in the semiparametric model defined by restriction (1) on the full data and restrictions (2) and (3) on the selection probabilities, at the models (4) and (9). That is, it is consistent and asymptotically normal if (1), (2), and (3) hold regardless of whether (4) and (9) hold, and it has asymptotic variance equal to the semiparametric variance bound if, in addition, models (4) and (9) hold.

Corollary 1 and Corollary 2 present the properties of

{\hat{β}}_{A I P W}

when model (3) is correctly specified for the selection probability

π (X, Z, U)

or the true selection probabilities are used. If instead model (3) is misspecified but model (9) is correctly specified,

{\hat{β}}_{A I P W}

is still consistent and asymptotically normal due to its double-robustness property, established in Theorem 3. In this situation,

E [D^{*} \partial ϵ (τ^{*}, η^{*}, β_{0}, θ) / \partial τ^{T}] = 0

, and that leads to the following Corollary 3.

Corollary 3.

If Equation (6) uses

δ (X_{i}, Z_{i}, U_{i}; \hat{η})

under a correctly specified model (9), then

B_{δ} (V)

simplifies to

\begin{matrix} B_{δ} (V) & = & v a r \{D^{*} ϵ (τ^{*}, η^{*}, β_{0}, θ) - E [D^{*} \frac{\partial}{\partial η^{T}} \tilde{ϵ} (τ^{*}, η^{*}, β_{0}, θ)] \\ E {[\frac{\partial}{\partial η^{T}} l (Y, X, Z, U; η^{*})]}^{- 1} l (Y, X, Z, U; η^{*})\} \end{matrix}

Following Theorem 3, we can estimate the variance of the AIPW profile estimator

{\hat{β}}_{A I P W}

using a sandwich formula after invoking a uniform law of large numbers. Specifically, let

\hat{δ} (X, Z, U) = δ (X, Z, U; \hat{η})

,

{\hat{X}}_{i} = X_{i} + {\hat{φ}}_{A I P W} (Z_{i}, {\hat{β}}_{A I P W, \hat{δ}})

, where

{\hat{φ}}_{A I P W} (\cdot, \cdot)

is as defined in the first paragraph of Section 5.1,

{\hat{V}}_{i} = V [μ \{X_{i}^{T} {\hat{β}}_{A I P W, \hat{δ}} + {\hat{θ}}_{A I P W, \hat{δ}} (Z_{i}; {\hat{β}}_{A I P W, \hat{δ}})\}; \hat{ζ}]

, and

{\hat{μ}}_{i}^{(1)} = μ^{(1)} \{X_{i}^{T} {\hat{β}}_{A I P W, \hat{δ}} + {\hat{θ}}_{A I P W, \hat{δ}} (Z_{i}; {\hat{β}}_{A I P W, \hat{δ}})\}

.

Ω_{δ} (V)

can be estimated consistently when either model (3) is correct or model (9) is correct, as

A_{n} {(\hat{V})}^{- 1} {\tilde{B}}_{\hat{δ}, n} (\hat{V}) A_{n} {(\hat{V})}^{- 1}

where

A_{n} (\hat{V}) = \frac{1}{n} \sum_{i = 1}^{n} {\{{\hat{μ}}_{i}^{(1)}\}}^{2} {\hat{V}}_{i}^{- 1} {\hat{X}}_{i} {\hat{X}}_{i}^{T},

and

\begin{matrix} {\tilde{B}}_{\hat{δ}, n} (\hat{V}) = \frac{1}{n} \sum_{i = 1}^{n} \{{\hat{D}}_{i} ϵ_{i} \{\hat{τ}, \hat{η}, {\hat{β}}_{A I P W, \hat{δ}}, {\hat{θ}}_{A I P W, \hat{δ}} (\cdot; {\hat{β}}_{A I P W, \hat{δ}})\} \\ - \hat{E} [D^{*} \frac{\partial}{\partial τ^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] \hat{E} {[\frac{\partial}{\partial τ^{T}} S (R, X, Z, U; τ^{*})]}^{- 1} S (R_{i}, X_{i}, Z_{i}, U_{i}; \hat{τ}) \\ {- \hat{E} [D^{*} \frac{\partial}{\partial η^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] \hat{E} {[\frac{\partial}{\partial η^{T}} l (Y, X, Z, U; η^{*})]}^{- 1} l (Y_{i}, X_{i}, Z_{i}, U_{i}; \hat{η})\}}^{\otimes 2} \end{matrix}

with

{\hat{D}}_{i} = {\hat{μ}}_{i}^{(1)} {\hat{V}}_{i}^{- 1} \{X_{i} + \frac{\partial \hat{θ} (Z_{i}, {\hat{β}}_{A I P W, \hat{δ}})}{\partial β^{T}}\},

\hat{E} [D^{*} \frac{\partial}{\partial τ^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] = \frac{1}{n} \sum_{i = 1}^{n} {\hat{D}}_{i} \frac{\partial}{\partial τ^{T}} ϵ_{i} \{\hat{τ}, \hat{η}, {\hat{β}}_{A I P W, \hat{δ}}, {\hat{θ}}_{A I P W, \hat{δ}} (\cdot; {\hat{β}}_{A I P W, \hat{δ}})\},

\hat{E} [\frac{\partial}{\partial τ^{T}} S (R, X, Z, U; τ^{*})] = \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial}{\partial τ^{T}} S (R_{i}, X_{i}, Z_{i}, U_{i}; \hat{τ}),

\hat{E} [D^{*} \frac{\partial}{\partial η^{T}} ϵ (τ^{*}, η^{*}, β_{0}, θ)] = \frac{1}{n} \sum_{i = 1}^{n} {\hat{D}}_{i} \frac{\partial}{\partial η^{T}} ϵ_{i} \{\hat{τ}, \hat{η}, {\hat{β}}_{A I P W, \hat{δ}}, {\hat{θ}}_{A I P W, \hat{δ}} (\cdot; {\hat{β}}_{A I P W, \hat{δ}})\},

\hat{E} [\frac{\partial}{\partial η^{T}} l (Y, X, Z, U; η^{*})] = \frac{1}{n} \sum_{i = 1}^{n} \frac{\partial}{\partial η^{T}} l (Y_{i}, X_{i}, Z_{i}, U_{i}; \hat{η}) .
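The structure of the sandwich formula above can be seen in a stripped-down scalar version. This is a sketch under strong simplifying assumptions, not the authors' code: β is taken to be scalar, the per-unit inputs are assumed to have been computed already (including the τ and η correction terms inside the bracket of the B matrix), and the function names and toy numbers are invented for illustration.

```python
import math


def sandwich_variance(a_terms, psi_terms):
    """Scalar sandwich A_n^{-1} B_n A_n^{-1}.

    a_terms:   per-unit values of (mu1_i ** 2) * x_tilde_i ** 2 / V_i
    psi_terms: per-unit values of the bracketed quantity inside B_n, i.e.
               D_i * eps_i minus the tau- and eta-correction terms
    """
    n = len(a_terms)
    a_n = sum(a_terms) / n
    b_n = sum(p * p for p in psi_terms) / n  # scalar version of the outer product
    return b_n / (a_n * a_n)


def standard_error(a_terms, psi_terms):
    """SE of beta-hat: sqrt(Omega_hat / n), from sqrt(n)(beta-hat - beta_0) -> N(0, Omega)."""
    return math.sqrt(sandwich_variance(a_terms, psi_terms) / len(a_terms))


# Toy inputs, made up for illustration only.
a_terms = [1.0, 2.0, 3.0, 2.0]
psi_terms = [0.5, -1.0, 1.5, -1.0]
omega_hat = sandwich_variance(a_terms, psi_terms)
se_hat = standard_error(a_terms, psi_terms)
```

For a vector β the same recipe applies with matrix inverses and outer products in place of the scalar division and squares.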

5.2. Asymptotic Results of the IPW Profile Estimator ${\hat{β}}_{I P W}$

In comparison, we present the asymptotic properties of the IPW profile estimator

{\hat{β}}_{I P W}

in this subsection. The results stated in the next lemma and theorem imply that

{\hat{β}}_{I P W}

is generally not efficient. This is not surprising after noticing that

{\hat{β}}_{I P W}

actually solves the AIPW equation that uses a fixed function

δ (X, Z, U) = μ \{X^{T} β_{0} + θ (Z)\}

; however, this specific

δ

function is not the optimal one among the AIPW class and also makes the IPW profile estimator

{\hat{β}}_{I P W}

lose the double-robustness property. In order for

{\hat{β}}_{I P W}

to be consistent and asymptotically normal, model (3) for the selection probability

π (X, Z, U)

needs to be correctly specified.
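The reduction of the AIPW residual to its IPW counterpart is a one-line substitution. Using the definition of ϵ_i(τ, η, β, θ) from Section 5.1, and writing μ_i for μ{X_i^T β + θ(Z_i)} and π_i for π(X_i, Z_i, U_i; τ) as shorthand introduced only for this display, setting δ equal to μ_i gives

```latex
\epsilon_i
  = \frac{R_i (Y_i - \mu_i) - (R_i - \pi_i)(\mu_i - \mu_i)}{\pi_i}
  = \frac{R_i (Y_i - \mu_i)}{\pi_i} .
```

The augmentation term vanishes identically, leaving a purely inverse-probability-weighted residual. With no outcome model left in the estimating function, nothing can compensate for a misspecified π, which is why the double-robustness property is lost.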

Lemma 2.

Let

{\hat{φ}}_{I P W} (z, β) = \partial {\hat{θ}}_{I P W} (z, β) / \partial β

be the partial derivative of the final IPW kernel estimator of θ with respect to β, and let

φ_{I P W} (z)

be the probability limit of

{\hat{φ}}_{I P W} (z, β)

as

n \to \infty

. Then

φ_{I P W} (z) = - \frac{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} V^{- 1} X | Z = z)}{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} V^{- 1} | Z = z)} .

If

V [μ \{X^{T} β + θ (Z)\}; ζ]

is correctly specified for

σ_{I P W}^{2} = v a r [ϵ_{I P W}^{*} (β_{0}, θ) | X, Z]

, where

ϵ_{I P W}^{*} (β_{0}, θ) = R \cdot {π_{0} (X, Z, U)}^{- 1} ϵ (β_{0}, θ)

, then

φ_{I P W} (z) = - \frac{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} X / σ_{I P W}^{2} | Z = z)}{E ({[μ^{(1)} \{X^{T} β_{0} + θ (Z)\}]}^{2} / σ_{I P W}^{2} | Z = z)} .

Theorem 4 on the asymptotic distribution of

{\hat{β}}_{I P W}

follows directly from Lemma 2, Theorem 3, and Corollaries 1 and 2 with

δ (X, Z, U) = μ \{X^{T} β_{0} + θ (Z)\}

.

Theorem 4.

Under the assumptions of Theorem 3, if model (3) holds, we have

\sqrt{n} ({\hat{β}}_{I P W} - β_{0}) \to N \{0, Ω_{I P W} (V)\},

where

Ω_{I P W} (V) = A {(V)}^{- 1} B_{I P W} (V) {\{A {(V)}^{- 1}\}}^{T},

B_{I P W} (V) = v a r \{D^{*} ϵ_{I P W}^{*} (β_{0}, θ) - E [D^{*} ϵ_{I P W}^{*} (β_{0}, θ) S_{τ}^{T}] E {[S_{τ} S_{τ}^{T}]}^{- 1} S_{τ}\} .

In general,

D^{*} ϵ_{I P W}^{*} (β_{0}, θ) - E [D^{*} ϵ_{I P W}^{*} (β_{0}, θ) S_{τ}^{T}] E {[S_{τ} S_{τ}^{T}]}^{- 1} S_{τ}

is not proportional to the efficient score derived in Section 4, and thus, the IPW profile estimator

{\hat{β}}_{I P W}

is generally an inefficient estimator.

6. Simulations

In this section, we conduct simulation studies to compare the finite-sample performance of the AIPW and IPW kernel–profile estimators for

θ (\cdot)

and

β

, as well as the naive approach, which solves unweighted kernel–profile estimating equations using only units with no missing data (complete cases). We generate data in the spirit of a two-stage study design. In the first stage, we observe the covariates of interest X (e.g., treatment), nuisance covariates Z (e.g., age), and auxiliary variables U for every subject. In the second stage, we measure the outcome of interest Y only on a subset sampled from the first-stage cohort according to selection probabilities that depend on the first-stage variables, especially U. For each replication, we generate random samples of

(X, Z, U, Y, R)

, where Z is generated from a

Uniform (0, 1)

distribution, X is generated from

Norm \{{(Z - 0.5)}^{2}, 1\}

, U is generated as

Uniform (0, 6) + Norm (X, 0.05^{2}) + Norm (Z, 0.05^{2})

, and the outcome Y is generated from a normal distribution with mean

E (Y | X, Z, U) = X β_{1} + m (Z) + U β_{2}

(17)

and variance

σ_{Y | X, Z, U}^{2}

, where

β_{1} = β_{2} = 1

,

σ_{Y | X, Z, U}^{2} = 1

,

m (Z) = 2 \cdot F_{8, 8} (Z)

, and

F_{p, q} (Z) = Γ (p + q) {Γ (p) Γ (q)}^{- 1} Z^{p - 1} {(1 - Z)}^{q - 1}

, a unimodal function. Note that Z is correlated with X, U, and Y, while U is correlated with X, Z, and Y. We generate R, the selection indicator, according to

logit \{π_{i}\} = τ_{0} + τ_{1} \cdot (U_{i} - a_{1}) I (a_{1} < U_{i} \leq a_{2}) + τ_{1} \cdot (a_{2} - a_{1}) I (U_{i} > a_{2})

(18)

where

π_{i} = P (R_{i} = 1 | X_{i}, Z_{i}, U_{i})

is the probability that subject i is selected to the second stage,

τ_{0} = - 2,

τ_{1} = 1

,

a_{1} = 0.5

, and

a_{2} = 5.5

. Based on this selection mechanism, the Monte Carlo median missing percentage of the outcome Y is around

35 %

. Since the selection probability depends on U only, the assumption of missing at random holds. Note that

E (Y | X, Z) = X β + θ (Z),

where the true

β = 2

, and true

θ (Z) = m (Z) + Z + 3 .

Our primary interest lies in estimating

β

and the nonparametric curve

θ (z)

.
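As a reading aid, the following sketch generates one dataset following the design described above, using only Python's standard library. The function names, the seed, and the convention of storing unobserved outcomes as `None` are assumptions made here for illustration and are not taken from the authors' R code. The stated true values follow from E(U | X, Z) = 3 + X + Z, which gives E(Y | X, Z) = X β_1 + m(Z) + β_2 (3 + X + Z) = 2X + m(Z) + Z + 3 when β_1 = β_2 = 1.

```python
import math
import random


def beta_density(z, p=8, q=8):
    """F_{p,q}(z): Beta(p, q) density, the unimodal bump used in m(Z)."""
    const = math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
    return const * z ** (p - 1) * (1.0 - z) ** (q - 1)


def selection_probability(u, tau0=-2.0, tau1=1.0, a1=0.5, a2=5.5):
    """pi_i from model (18): logit is flat, then linear in U, then flat again."""
    if u <= a1:
        linear = tau0
    elif u <= a2:
        linear = tau0 + tau1 * (u - a1)
    else:
        linear = tau0 + tau1 * (a2 - a1)
    return 1.0 / (1.0 + math.exp(-linear))


def simulate(n, seed=0, beta1=1.0, beta2=1.0):
    """Draw n units (x, z, u, y, r); y is stored as None when r == 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(0.0, 1.0)
        x = rng.gauss((z - 0.5) ** 2, 1.0)
        u = rng.uniform(0.0, 6.0) + rng.gauss(x, 0.05) + rng.gauss(z, 0.05)
        y = rng.gauss(x * beta1 + 2.0 * beta_density(z) + u * beta2, 1.0)
        r = 1 if rng.random() < selection_probability(u) else 0
        data.append((x, z, u, y if r == 1 else None, r))
    return data


sample = simulate(2000)
missing_fraction = sum(1 for row in sample if row[4] == 0) / len(sample)
```

The resulting `missing_fraction` can be compared with the missing percentage of around 35% reported in the text; the exact value depends on the seed and sample size.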

We generated 100 replicated datasets for each of the sample sizes

n = 500

and

n = 1000

. For each simulated dataset, we applied the naive approach, which uses the complete cases directly, as well as the IPW and the proposed AIPW kernel–profile methods to estimate the nonparametric function

θ (\cdot)

and the semiparametric parameter

β

. We employed the generalized EBBS method to select the optimal local bandwidth.

Figure 1 displays the average estimated nonparametric functions

\hat{θ} (\cdot)

over 100 replications using the naive, IPW, and AIPW approaches. The plot shows the estimates when the weighted kernel–profile estimating equations use the true

π_{i 0}

and

E [Y | X = x_{i}, Z = z_{i}, U = u_{i}]

. The IPW and AIPW kernel estimates closely matched the true curve

θ (\cdot)

, while the naive approach produced an estimate biased away from the true curve.

Table 1 summarizes the performance of each kernel estimator using integrated relative bias, integrated empirical standard error (SE), integrated estimated SE, and empirical mean integrated squared error (MISE), over the support of Z. The integrated estimated SEs are close to the integrated empirical SEs. As expected, the naive kernel estimate exhibits a much larger relative bias than the IPW and AIPW kernel estimates. The AIPW kernel estimate using the true

π_{i 0}

and

E [Y | X = x_{i}, Z = z_{i}, U = u_{i}]

is optimal, having both a smaller SE and a smaller MISE compared to the IPW kernel estimate. The efficiency gain for

\hat{θ} (z)

in terms of MISE is about 40%. Figure 2 illustrates the estimated pointwise variance of

{\hat{θ}}_{I P W} (\cdot)

and

{\hat{θ}}_{A I P W} (\cdot)

based on 100 replications. It shows that the AIPW approach is more efficient than the IPW approach at each point z in this simulation setup.

In Table 1, we also evaluate the performance of each profile estimator using the averaged relative bias, empirical SE, estimated SE, and mean squared error (MSE). For all estimates, the estimated SEs are close to the empirical SEs, demonstrating that the sandwich estimator we proposed for the variance of

\hat{β}

in Section 5.1 performs well. The bias of

{\hat{β}}_{n a i v e}

was relatively large. When the true

π_{i 0}

or consistent estimates of

π_{i 0}

were used,

{\hat{β}}_{I P W}

had very little bias. Otherwise,

{\hat{β}}_{I P W}

was biased. By contrast, the simulation results in Table 1 demonstrate the double robustness of the AIPW kernel-profile estimators. We computed

{\hat{θ}}_{A I P W} (\cdot)

and

{\hat{β}}_{A I P W}

under three scenarios: (i) with an incorrectly specified model of

π_{i 0}

, e.g.,

τ_{0}^{'} + τ_{1}^{'} \cdot | X_{i} | + τ_{2}^{'} \cdot Z_{i}

on the right side of (18), but with

δ_{i 0}

computed from a correctly specified model (17); (ii) with

δ_{i 0}

computed from an incorrectly specified model, e.g.,

β_{0}^{'} + β_{1}^{'} Z + β_{2}^{'} X

on the right side of (17), but with

π_{i 0}

derived from the correctly specified model (18); and (iii) with both

{\hat{π}}_{i}

and

{\hat{δ}}_{i}

computed from incorrectly specified models. When either the true

π_{i 0}

or the true

E [Y | X = x_{i}, Z = z_{i}, U = u_{i}]

was used, or a consistent estimate of one of them was used as in scenarios (i) and (ii), the AIPW kernel-profile estimates remained close to the true values. However, when both models were incorrectly specified, as in scenario (iii), the AIPW estimates were biased. Comparing the IPW estimator using the true

π_{i 0}

and the AIPW estimator using both true

π_{i 0}

and true

E [Y | X = x_{i}, Z = z_{i}, U = u_{i}]

, the SE and the MSE of the IPW profile estimate are much larger than those of the AIPW profile estimate. Under the true

π_{i 0}

model and the true

E [Y | X = x_{i}, Z = z_{i}, U = u_{i}]

model, the efficiency gain of

{\hat{β}}_{A I P W}

in terms of the MSE is about 47% relative to

{\hat{β}}_{I P W}

.

7. Application to the SPECT Data

To illustrate the proposed methods, we applied the AIPW kernel–profile estimating equations to analyze the SPECT data described in Section 1. Our primary objective was to investigate the potential risk factors of myocardial ischemia while controlling for patient age. The data analysis indicated that the risk of myocardial ischemia varies nonlinearly with age, prompting us to model the age effect nonparametrically. The data were collected at the Radiology Clinic of the Nuclear Imaging Group at Cedars Sinai Medical Center. Since myocardial ischemia is relatively rare in younger individuals, we focused on 6185 patients aged 45 and older. This two-stage study involved all patients undergoing EBCT in the first stage. Based on the initial results and other health variables, 458 patients with high-risk factors for coronary artery disease were referred by their doctors to undergo SPECT in the second stage.

The SPECT test serves as the gold standard for screening myocardial ischemia. Consequently, we treated the true myocardial ischemia status of the 5727 (93%) patients who did not undergo SPECT as unknown, resulting in missing outcomes. We employed model (1) with myocardial ischemia status as the outcome variable. The covariates of interest include patient age, a continuous variable, and patient gender, smoking status, presence of chest pain, high blood pressure status, and cholesterol status, all binary variables. While polynomial regression with terms like

a g e^{2}

or

a g e^{3}

can model nonlinear relationships, it may overlook local variations. Kernel-profile estimating equations offer greater flexibility, capturing local features of the data that fixed polynomial terms might miss. We used a partially linear logistic regression model to estimate the probability of myocardial ischemia, with patient gender, smoking status, presence of chest pain, high blood pressure status, and cholesterol status entering as linear predictors and the effect of age modeled nonparametrically. The bandwidth was selected using the generalized EBBS method.

Figure 3 presents the estimated nonparametric curve of the risk of myocardial ischemia in relation to patient age. The curves estimated by the IPW and AIPW methods closely resemble each other, whereas the naive unweighted approach tends to overestimate the risk of myocardial ischemia. Given that primarily high-risk patients underwent the SPECT exam, relying solely on complete cases using the naive approach is likely to lead to bias in estimating the nonparametric relationship between myocardial ischemia risk and age, as well as the relationship on the logistic scale with other covariates. Our analysis utilizing IPW and AIPW kernel–profile estimating equations suggests that the risk of myocardial ischemia increases nonlinearly with age, with a notable change point around age 70.

Table 2 displays the estimates of regression coefficients from the model, along with the corresponding p-values. As with the age curve, the IPW and AIPW profile-kernel estimates are similar to each other and differ from the naive estimates. Our weighted analysis indicates that women tend to have a lower risk of myocardial ischemia compared to men, while patients who smoke or experience chest pain, high blood pressure, or high cholesterol tend to have a heightened risk of myocardial ischemia. The effects of gender and high blood pressure on myocardial ischemia risk are statistically significant. According to the AIPW analysis, patients experiencing high blood pressure have odds approximately

3.6

times higher of developing myocardial ischemia. Additionally, men have approximately 5 times higher odds

(p = 0.044)

of experiencing myocardial ischemia compared to women.
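Because the fitted model is a partially linear logistic regression, each odds ratio quoted above is the exponential of the corresponding regression coefficient, holding age and the other covariates fixed. The snippet below shows the conversion in both directions. The coefficients are back-calculated from the rounded odds ratios in the text purely for illustration; they are not values copied from Table 2, and the variable names are hypothetical.

```python
import math


def odds_ratio(log_odds_coefficient):
    """Odds ratio implied by a logistic-regression coefficient."""
    return math.exp(log_odds_coefficient)


# Illustrative, back-calculated coefficients (NOT values from Table 2):
coef_high_bp = math.log(3.6)  # odds ratio of about 3.6 for high blood pressure
coef_male = math.log(5.0)     # odds ratio of about 5 for men versus women

or_high_bp = odds_ratio(coef_high_bp)
or_male = odds_ratio(coef_male)
```

On the log-odds scale these correspond to coefficients of roughly 1.28 and 1.61.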

8. Discussion

In this paper, we propose weighted local polynomial kernel-profile estimation methods for generalized semiparametric partially linear regression when outcomes are missing at random and auxiliary variables are available. We demonstrate that the estimators based on the IPW and AIPW kernel-profile estimating equations are consistent and asymptotically normal if the selection probability model

π

is correctly specified. When the

π

model is misspecified, the IPW approach fails to provide consistent estimators. However, the AIPW kernel-profile estimators maintain consistency and asymptotic normality if either the

π

model or the model for

E (Y | X, Z, U)

is correctly specified. This double-robustness property of the AIPW approach allows investigators two avenues for making valid inferences. Furthermore, the AIPW kernel-profile estimators optimally utilize information in observed data: when both the

π_{i}

selection probability model and the

E (Y | X, Z, U)

model are correctly specified, the corresponding AIPW kernel-profile estimators are the most efficient within their class, with the AIPW profile estimator achieving the semiparametric efficiency bound. User-friendly R code is available on GitHub at https://github.com/Team-Wang-Lab/AIPWKPEE.git (accessed on 23 July 2024).

When

E (Y | X, Z, U)

is not correctly specified, the modified AIPW kernel estimator for nonparametric regression proposed by Wang et al. (2010) [20] is guaranteed to be more efficient than the IPW kernel estimator while remaining doubly robust. The same idea can be applied to the semiparametric IPW and AIPW kernel-profile estimators proposed in this paper. The IPW and AIPW kernel-profile estimating equations provide consistent estimators when the selection probability model

π

is correctly specified and is bounded away from 0. However, when some

π

’s are close to 0 with moderate sample sizes, the associated large weights can dramatically inflate the influence of a few observations, and the IPW and AIPW estimators may then be unstable. Special caution is therefore needed when applying the proposed methods to studies in which the selection probability is very small for some sample units.
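The instability caused by small selection probabilities is easy to see with a simple arithmetic check. The function below is a hypothetical diagnostic written for this illustration and is not part of the proposed method: it reports the share of the total inverse-probability weight that is carried by the single largest weight.

```python
def largest_weight_share(selection_probs):
    """Share of the total inverse-probability weight carried by the largest weight."""
    weights = [1.0 / p for p in selection_probs]
    return max(weights) / sum(weights)


# 100 units, all selected with probability 0.5: every unit has weight 2.
balanced = largest_weight_share([0.5] * 100)

# Same cohort, but one unit has selection probability 0.001 (weight 1000).
one_tiny = largest_weight_share([0.5] * 99 + [0.001])
```

In the balanced case each unit carries 1% of the total weight, whereas a single unit with selection probability 0.001 carries more than 80% of it, so the weighted estimating equation is driven almost entirely by that one observation.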

The proposed method can be extended to the situation where multiple covariates need to be modeled nonparametrically, e.g., using additive models. For simplicity, we concentrate on local linear kernel estimators for the nonparametric function, but these methods can be readily extended to higher-order local polynomial kernel regression with similar asymptotic results. Although we adopt a parametric model for the missingness probability

π_{i}

in this paper, future research could explore the nonparametric estimation of

π_{i}

and its impact on the semiparametric efficiency of both IPW and AIPW profile estimators of

β

. However, fully nonparametric modeling of

π_{i}

is challenged by the curse of dimensionality, particularly when

π_{i}

depends on a set of covariates.

If some covariates of interest among

X

and Z are missing in addition to Y, the general AIPW profile-kernel theory remains applicable, but the efficient score and efficiency bound may change, and adjustments may be necessary for the corresponding estimating procedure. This topic exceeds the scope of our current paper, and further research is needed on complex scenarios where missingness occurs in both outcomes and covariates. We assume in this paper that the outcome is missing at random (MAR). However, justification for the MAR assumption may be required in observational studies, particularly when the missing data mechanism is not well understood. There is a substantial literature on statistical methods for parametric regression when data are not missing at random (NMAR) in specific cases. Extending these methods and our proposed methods to fit the semiparametric model (1) under NMAR conditions represents a future research direction.

There are several challenges in practical implementation. For example, bandwidth selection by grid search can be time-consuming without prior knowledge of a plausible range. To reduce the computational burden, one can first search for the bandwidth on a coarse grid and then refine the search on a finer grid around the coarse optimum. Additionally, selecting the appropriate auxiliary variable(s) and specifying the $π$ model are crucial; insights from subject-matter experts would be beneficial here.
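The coarse-then-fine bandwidth search can be sketched as follows. This is an illustrative sketch under stated assumptions: the selection criterion used here (leave-one-out cross-validation for a Nadaraya–Watson smoother on fully observed toy data) is a stand-in chosen for brevity and is not the criterion used in the paper, and the names `loo_cv_score` and `coarse_to_fine_bandwidth` are hypothetical.

```python
import numpy as np

def loo_cv_score(h, z, y):
    """Leave-one-out squared-error CV score of a Nadaraya-Watson smoother
    with a Gaussian kernel and bandwidth h (a stand-in criterion)."""
    d = (z[:, None] - z[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    np.fill_diagonal(k, 0.0)                   # exclude each point from its own fit
    denom = np.maximum(k.sum(axis=1), 1e-300)  # guard against empty neighbourhoods
    fit = (k @ y) / denom
    return float(np.mean((y - fit) ** 2))

def coarse_to_fine_bandwidth(criterion, lo, hi, n_coarse=8, n_fine=15):
    """Minimize criterion(h) on a coarse grid over [lo, hi], then on a finer
    grid spanning the two coarse neighbours of the coarse minimizer."""
    coarse = np.linspace(lo, hi, n_coarse)
    j = int(np.argmin([criterion(h) for h in coarse]))
    left = coarse[max(j - 1, 0)]
    right = coarse[min(j + 1, n_coarse - 1)]
    fine = np.linspace(left, right, n_fine)
    return float(fine[int(np.argmin([criterion(h) for h in fine]))])

# Toy data (hypothetical): smooth signal plus noise.
rng = np.random.default_rng(0)
z = rng.uniform(0.0, 1.0, 300)
y = np.sin(2.0 * np.pi * z) + rng.normal(0.0, 0.3, 300)
h_opt = coarse_to_fine_bandwidth(lambda h: loo_cv_score(h, z, y), 0.01, 0.5)
print(h_opt)
```

With the defaults above the criterion is evaluated 23 times, whereas a single grid at the fine resolution over the whole range would need about 50 evaluations. The two-stage search can miss the global minimizer if the criterion has several local minima, so the coarse grid should not be too sparse.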

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/stats7030056/s1. These include detailed regularity conditions; Section S1: derivation of the semiparametric efficient score presented in Section 4; Section S2: proof of Lemmas 1 and 2; Section S3: proof of Theorem 3 and Corollary 2; and Section S4: additional simulation results and a sensitivity analysis for the application.

Author Contributions

Conceptualization, L.W. and X.L.; methodology, L.W. and X.L.; software, L.W. and Z.O.; validation, L.W.; formal analysis, L.W.; data curation, L.W. and Z.O.; writing-original draft preparation, L.W.; writing-review & editing, L.W. and X.L.; visualization, L.W. and Z.O.; supervision, L.W. and X.L.; project administration, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

Wang’s research is partially funded by NIH Grants P50-DA-054039-02, P30-ES-017885-10-A1, and CDC Grant R01-CE-003497-01.

Institutional Review Board Statement

Not applicable. This is a statistical method paper, and does not involve new data collection.

Informed Consent Statement

Not applicable. This is a statistical method paper, and does not involve new data collection.

Data Availability Statement

Available upon request.

Acknowledgments

The authors would like to thank Andrea Rotnitzky for her invaluable guidance, support, and expert advice throughout this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

McCullagh, P.; Nelder, J. Generalized Linear Models; Chapman & Hall: London, UK, 1989.
Severini, T.A.; Staniswalis, J.G. Quasi-Likelihood Estimation in Semiparametric Models. J. Am. Stat. Assoc. 1994, 89, 501–511.
Hastie, T.; Tibshirani, R. Generalized Additive Models; Chapman & Hall/CRC: Boca Raton, FL, USA, 1990.
Fan, J.; Heckman, N.E.; Wand, M.P. Local Polynomial Kernel Regression for Generalized Linear Models and Quasi-Likelihood Functions. J. Am. Stat. Assoc. 1995, 90, 141–150.
Carroll, R.J.; Fan, J.; Gijbels, I.; Wand, M.P. Generalized Partially Linear Single-Index Models. J. Am. Stat. Assoc. 1997, 92, 477–489.
Lin, X.; Carroll, R.J. Semiparametric Regression for Clustered Data Using Generalized Estimating Equations. J. Am. Stat. Assoc. 2001, 96, 1045–1056.
Lin, X.; Carroll, R.J. Semiparametric Regression for Clustered Data. Biometrika 2001, 88, 1179–1185.
Müller, M. Estimation and Testing in Generalized Partial Linear Models: A Comparative Study. Stat. Comput. 2001, 11, 299–309.
Hu, T.; Cui, H. Robust estimates in generalised varying-coefficient partially linear models. J. Nonparametr. Stat. 2010, 22, 737–754.
Rahman, J.; Luo, S.; Fan, Y.; Liu, X. Semiparametric efficient inferences for generalised partially linear models. J. Nonparametr. Stat. 2020, 32, 704–724.
Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; John Wiley: New York, NY, USA, 2002.
Chu, H.; Halloran, M.E. Estimating vaccine efficacy using auxiliary outcome data and a small validation sample. Stat. Med. 2004, 23, 2697–2711.
Braun, J.; Oldendorf, M.; Moshage, W.; Heidler, R.; Zeitler, E.; Luft, F.C. Electron beam computed tomography in the evaluation of cardiac calcifications in chronic dialysis patients. Am. J. Kidney Dis. 1996, 27, 394–401.
Little, R.J.A. Models for nonresponse in sample surveys. J. Am. Stat. Assoc. 1982, 77, 237–250.
Little, R.J.A. Modeling the Drop-Out Mechanism in Repeated-Measures Studies. J. Am. Stat. Assoc. 1995, 90, 1112–1121.
Robins, J.M.; Rotnitzky, A. Semiparametric Efficiency in Multivariate Regression Models with Missing Data. J. Am. Stat. Assoc. 1995, 90, 122–129.
Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data. J. Am. Stat. Assoc. 1995, 90, 106–121.
Wang, C.Y.; Wang, S.; Gutierrez, R.G.; Carroll, R.J. Local Linear Regression for Generalized Linear Models with Missing Data. Ann. Stat. 1998, 26, 1028.
Chen, J.; Fan, J.; Li, K.H.; Zhou, H. Local quasi-likelihood estimation with data missing at random. Stat. Sin. 2006, 16, 1044–1070.
Wang, L.; Rotnitzky, A.; Lin, X. Nonparametric Regression with Missing Outcomes Using Weighted Kernel Estimating Equations. J. Am. Stat. Assoc. 2010, 105, 1135–1146.
Kennedy, E.H.; Ma, Z.; McHugh, M.D.; Small, D.S. Non-parametric methods for doubly robust estimation of continuous treatment effects. J. R. Stat. Soc. Ser. B Stat. Methodol. 2017, 79, 1229–1245.
Liang, H.; Wang, S.; Robins, J.M.; Carroll, R.J. Estimation in partially linear models with missing covariates. J. Am. Stat. Assoc. 2004, 99, 357–367.
Liang, H. Generalized partially linear models with missing covariates. J. Multivar. Anal. 2008, 99, 880–895.
Wang, Q. Statistical estimation in partial linear models with covariate data missing at random. Ann. Inst. Stat. Math. 2009, 61, 47–84.
Wang, Q.; Linton, O.; Härdle, W. Semiparametric Regression Analysis With Missing Response at Random. J. Am. Stat. Assoc. 2004, 99, 334–345.
Wang, Q.; Sun, Z. Estimation in partially linear models with missing responses at random. J. Multivar. Anal. 2007, 98, 1470–1493.
Liang, H.; Wang, S.; Carroll, R.J. Partially linear models with missing response variables and error-prone covariates. Biometrika 2007, 94, 185–198.
Chen, S.; Van Keilegom, I. Estimation in semiparametric models with missing data. Ann. Inst. Stat. Math. 2013, 65, 785–805.
Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. J. Am. Stat. Assoc. 1994, 89, 846–866.
Rotnitzky, A.; Robins, J.M.; Scharfstein, D.O. Semiparametric Regression for Repeated Outcomes with Nonignorable Nonresponse. J. Am. Stat. Assoc. 1998, 93, 1321–1339.
Bang, H.; Robins, J.M. Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics 2005, 61, 962–972.
Pepe, M.S. Inference Using Surrogate Outcome Data and a Validation Sample. Biometrika 1992, 79, 355–365.
Reilly, M.; Pepe, M.S. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 1995, 82, 299–314.
Wang, N.; Carroll, R.J.; Lin, X. Efficient Semiparametric Marginal Estimation for Longitudinal/Clustered Data. J. Am. Stat. Assoc. 2005, 100, 147–157.
Ruppert, D. Empirical-Bias Bandwidths for Local Polynomial Nonparametric Regression and Density Estimation. J. Am. Stat. Assoc. 1997, 92, 1049.
Begun, J.M.; Hall, W.J.; Huang, W.M.; Wellner, J.A. Information and Asymptotic Efficiency in Parametric-Nonparametric Models. Ann. Stat. 1983, 11, 432–452.
Newey, W.K. Semiparametric Efficiency Bounds. J. Appl. Econom. 1990, 5, 99–135.
Bickel, P.J.; Klaassen, C.A.; Ritov, Y.; Wellner, J.A. Efficient and Adaptive Estimation for Semiparametric Models; Springer: New York, NY, USA, 1998.
Ibragimov, I.; Hasminskii, R. Statistical Estimation: Asymptotic Theory; Springer: New York, NY, USA, 1981.
Robins, J.M.; Rotnitzky, A. Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology: Methodological Issues; Jewell, N., Dietz, K., Farewell, V., Eds.; Birkhäuser: Boston, MA, USA, 1992; pp. 297–331.
Rotnitzky, A.; Holcroft, C.; Robins, J.M. Efficiency comparisons in multivariate multiple regression with missing outcomes. J. Multivar. Anal. 1997, 61, 102–128.
van der Laan, M.; Robins, J.M. Unified Methods for Censored Longitudinal Data and Causality; Springer: New York, NY, USA, 2003.
Tsiatis, A.A. Semiparametric Theory and Missing Data; Springer Series in Statistics; Springer: New York, NY, USA, 2006.
Robins, J.M.; Rotnitzky, A. Comment on the Bickel and Kwon article, Inference for semiparametric models: Some questions and an answer. Stat. Sin. 2001, 11, 920–936.

Figure 1. The true $θ(z)$ and the estimated nonparametric functions $\hat{θ}(z)$ through naive, IPW, and AIPW kernel estimating equations based on 100 replications.

Figure 2. Empirical pointwise variance of the IPW and AIPW estimated nonparametric functions $\hat{θ}(z)$ based on 100 replications.

Figure 3. Estimate of $θ(\text{age})$ for the risk of myocardial ischemia controlled for other potential risk factors and confounders.

Table 1. Simulation results of the naive, IPW, and AIPW kernel–profile estimates based on 100 replications (sample size n = 500). Note: ¹ relative bias is defined as $\int | \widehat{\mathrm{bias}}\{\hat{θ}(z)\} / θ(z) | \, dF(z)$; ² EMP S.E. is the empirical S.E., defined as $\int \widehat{SE}_{EMP}\{\hat{θ}(z)\} \, dF(z)$, where $\widehat{SE}_{EMP}\{\hat{θ}(z)\}$ is the sampling S.E. of the replicated $\hat{θ}(z)$; ³ EST S.E. is the estimated S.E., defined as $\int \widehat{SE}_{EST}\{\hat{θ}(z)\} \, dF(z)$, where $\widehat{SE}_{EST}\{\hat{θ}(z)\}$ is the sampling average of the replicated sandwich estimates $\widehat{SE}\{\hat{θ}(z)\}$; ⁴ EMP MISE is the empirical MISE, defined as $\int \{\hat{θ}(z) - θ(z)\}^{2} \, dF(z)$; and ⁵ $δ$ represents $E[Y \mid X, Z, U]$.

	Kernel Estimator of $θ(\cdot)$				Profile Estimator of $β$			
Estimator	Relative Bias ¹	EMP S.E. ²	EST S.E. ³	EMP MISE ⁴	Bias of $\hat{β}$	EMP S.E.	EST S.E.	EMP MSE
Naive Estimator	0.166	0.231	0.226	0.672	0.120	0.108	0.102	0.070
IPW Estimator
True $π$	0.066	0.338	0.308	0.167	0.055	0.140	0.124	0.019
Consistent $\hat{π}$	0.064	0.329	0.311	0.158	0.051	0.130	0.125	0.017
Wrong $π$	0.162	0.222	0.226	0.638	0.137	0.115	0.102	0.088
AIPW Estimator
True $π$ and $δ$ ⁵	0.049	0.228	0.228	0.100	0.041	0.099	0.101	0.010
Consistent $\hat{π}$ and consistent $\hat{δ}$	0.047	0.231	0.233	0.092	0.040	0.099	0.100	0.010
Wrong $π$ and consistent $\hat{δ}$	0.048	0.124	0.213	0.096	0.046	0.110	0.092	0.012
Consistent $\hat{π}$ and wrong $δ$	0.077	0.399	0.404	0.218	0.067	0.169	0.153	0.029
Both wrong	0.162	0.279	0.286	0.653	0.109	0.125	0.114	0.060
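The integrated summary measures defined in the note to Table 1 can be approximated from replicated curve estimates evaluated on a grid of $z$ values. The sketch below is an illustration under stated assumptions: the integral with respect to $F(z)$ is replaced by a weighted sum over grid points, the MISE is additionally averaged over replications, $θ(z)$ is assumed to be nonzero on the grid, and the function name `kernel_summary` and its arguments are hypothetical.

```python
import numpy as np

def kernel_summary(est, truth, weights=None):
    """Approximate relative bias, empirical S.E., and empirical MISE.

    est     : array of shape (R, G), R replicated estimates on G grid points
    truth   : array of shape (G,), true function values (must be nonzero)
    weights : array of shape (G,), approximating dF(z); uniform if None
    """
    est = np.asarray(est, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if weights is None:
        weights = np.ones_like(truth)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize to sum to one

    bias = est.mean(axis=0) - truth              # pointwise empirical bias
    rel_bias = np.sum(weights * np.abs(bias / truth))
    emp_se = np.sum(weights * est.std(axis=0, ddof=1))   # pointwise sampling S.E.
    emp_mise = np.sum(weights * np.mean((est - truth) ** 2, axis=0))
    return {
        "relative_bias": float(rel_bias),
        "emp_se": float(emp_se),
        "emp_mise": float(emp_mise),
    }

# Tiny hypothetical example: two replications on a two-point grid.
truth = np.array([1.0, 2.0])
est = np.array([[1.1, 2.2],
                [0.9, 2.2]])
summary = kernel_summary(est, truth)
print(summary)
```

In practice the weights would come from the empirical distribution of the observed $z$ values or from a quadrature rule for the design density, and the grid should avoid points where $θ(z)$ is close to zero, since the relative bias is unstable there.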

Table 2. Estimates of $β$ in the semiparametric logistic regression for evaluating the risk factors of myocardial ischemia.

	Naive			IPW			AIPW
Risk Factors	$\hat{β}$	SE	p-Value	$\hat{β}$	SE	p-Value	$\hat{β}$	SE	p-Value
female	−1.56	0.76	0.040	−1.55	0.76	0.043	−1.54	0.76	0.044
smoking	0.36	0.27	0.173	0.51	0.27	0.059	0.51	0.27	0.060
chest pain	0.32	0.49	0.514	0.39	0.49	0.430	0.39	0.49	0.433
blood pressure med.	1.14	0.34	0.001	1.28	0.35	<0.001	1.28	0.35	<0.001
cholesterol med.	1.13	0.84	0.177	1.21	0.84	0.152	1.22	0.84	0.148
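The entries of Table 2 can be related to one another with a short calculation. The sketch below assumes that the reported p-values are two-sided Wald tests based on a standard normal reference distribution, which is an assumption made here and not stated in the table. The function name `wald_summary` is hypothetical. Because the table entries are rounded to two decimals, recomputed p-values agree with the reported ones only approximately.

```python
import math

def wald_summary(beta_hat, se):
    """Wald z statistic, two-sided normal p-value, and odds ratio
    for a logistic regression coefficient."""
    z = beta_hat / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return {"z": z, "p_value": p_value, "odds_ratio": math.exp(beta_hat)}

# Naive estimate for "female" from Table 2: beta_hat = -1.56, SE = 0.76.
female_naive = wald_summary(-1.56, 0.76)
print(female_naive)
```

For the naive estimate of the female coefficient this gives a p-value of about 0.040, in line with the tabulated value, and an odds ratio of about 0.21 on the scale of the logistic model.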

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, L.; Ouyang, Z.; Lin, X. Doubly Robust Estimation and Semiparametric Efficiency in Generalized Partially Linear Models with Missing Outcomes. Stats 2024, 7, 924-943. https://doi.org/10.3390/stats7030056

