Article

Estimation in Semi-Varying Coefficient Heteroscedastic Instrumental Variable Models with Missing Responses

1 College of Science, Inner Mongolia Agricultural University, Hohhot 010018, China
2 School of Statistics, Beijing Normal University, Beijing 100875, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(23), 4853; https://doi.org/10.3390/math11234853
Submission received: 17 October 2023 / Revised: 29 November 2023 / Accepted: 30 November 2023 / Published: 2 December 2023
(This article belongs to the Special Issue Computational Statistics and Data Analysis, 2nd Edition)

Abstract

This paper studies the estimation problem for semi-varying coefficient heteroscedastic instrumental variable models with missing responses. First, we propose adjusted estimators for the unknown parameters and smooth functional coefficients utilizing the ordinary profile least squares method and the instrumental variable adjustment technique with complete data. Second, we present an adjusted estimator of the stochastic error variance by employing the Nadaraya–Watson kernel estimation technique. Third, we apply the inverse probability-weighted method and the instrumental variable adjustment technique to construct adaptive-weighted adjusted estimators for the unknown parameters and smooth functional coefficients. The asymptotic properties of our proposed estimators are established under some regularity conditions. Finally, numerous simulation studies and a real data analysis are conducted to examine the finite sample performance of the proposed estimators.

1. Introduction

As an important category of statistical regression models, the varying-coefficient partially linear model possesses strong explanatory power and flexibility. It has been widely used in scientific research, such as econometrics, biomedicine, and engineering technology. Its general mathematical expression is:
Y = X^⊤θ(U) + Z^⊤β + ε,
where Y is the response variable, and X ∈ R^p and Z ∈ R^q are covariates. To avoid the "curse of dimensionality", the covariate U is confined to be one-dimensional. θ(·) = [θ₁(·), …, θ_p(·)]^⊤ is a p × 1 vector of smooth functional coefficients, β = (β₁, β₂, …, β_q)^⊤ is a q × 1 vector of constant coefficients, and ε is the model error, which is independent of (X, Z, U). The mean of ε is zero, and its variance is assumed to follow a heteroscedastic structure satisfying Var(ε | X, Z, U) = σ²(U). As an extension of classical linear regression models, varying-coefficient partially linear models have been investigated by many scholars under the hypothesis of homoscedasticity; see [1,2,3,4,5,6,7,8]. However, in regression analysis, some important explanatory variables may be omitted, and the sample observations may be measured with error. In such cases, the model error can be heteroscedastic. In recent years, statisticians have developed many statistical inference methods for model (1) with heteroscedasticity. For instance, Shen et al. [9] introduced a re-weighted estimation procedure for unknown parameters based on the generalized least squares method. Zhao et al. [10] proposed a two-stage iterative estimation method using the orthogonal projection technique, which estimates the unknown parameters and functional coefficients separately. Zhao et al. [11] proposed a re-weighted estimation procedure when the covariates contain an additive measurement error. Zhang and Li [12] proposed a weighted estimation and testing method when the covariates suffer from an additive measurement error. Yuan and Zhou [13] introduced an adaptive-weighted estimation method for model (1), which improves the estimation accuracy of the resulting estimators.
However, the above works do not consider the endogeneity of covariates. In practice, there may be endogenous explanatory variables in model (1); see [14,15]. In such cases, the above methods will induce endogeneity bias, which leads to inconsistent estimators. The instrumental variable method provides an effective way to eliminate this bias. In the past ten years, semi-parametric instrumental variable models have been widely studied. For instance, Cai and Xiong [16] suggested a three-stage estimation approach for semi-varying coefficient models with endogenous covariates. Zhao and Li [17] developed an effective variable selection approach for classical varying coefficient models with endogenous covariates. Zhao and Xue [18] considered interval estimation for semi-parametric instrumental variable models using the empirical likelihood method. Yuan et al. [19] proposed an effective method to identify important variables by combining the SCAD penalty and the instrumental variable adjustment technique for semi-varying coefficient models with endogenous covariates. Zhao et al. [20] applied the popular empirical likelihood approach, together with an orthogonal decomposition technique, to study interval estimation for semi-varying coefficient instrumental variable models. For more related research, please refer to [21,22,23,24]. In this paper, the covariates U and Z are assumed to be exogenous, the covariate X is endogenous, and ζ ∈ R^r is an instrumental variable related to X. Similar to [16], the dimension r of ζ is taken to be greater than or equal to the dimension p of X for identifiability, and X and ζ are specified to satisfy the following parametric model:
X = Ψζ + e,
where Ψ is an unknown p × r constant matrix, and the error term e satisfies E(e | ζ, Z, U) = 0. Since the covariates are endogenous, E(ε | X, Z, U) ≠ 0, which indicates that X is associated with the model error ε. Moreover, we further suppose that E(ε | ζ, Z, U) = 0.
In applications, missing data can occur in many areas, such as market surveys, medical research, opinion polls, and other scientific experiments. When we encounter missing data, classical statistical inference methods cannot be used directly. Thus, scholars have developed corresponding methods to handle missing data, chiefly the complete-sample method, the inverse probability-weighted technique, and imputation. Rubin [25] discussed the complete-sample method in detail, but this method reduces estimation efficiency, especially when the observed data are missing at random. Robins et al. [26] suggested an inverse probability-weighted method that assigns weights to the observed data, which can effectively diminish the bias caused by missing data. Wang and Rao [27] and Wang et al. [28] developed imputation methods for linear and semi-parametric regression models, respectively. To date, many scholars have studied statistical inference for model (1) with missing data, but few have considered heteroscedasticity and endogeneity. For instance, Li and Xue [29] constructed an imputation estimator for unknown parameters with a missing response. When the explanatory variables are missing at random, Chen et al. [30] constructed inverse probability-weighted estimators of the unknown constant and functional coefficients. For more recent work on missing data for model (1), the reader can refer to [31,32,33], among others. In this paper, the response variable may be missing, while the explanatory variables are fully observed. An indicator variable δ describes the missing mechanism: δ = 1 if Y is observed, and δ = 0 otherwise. Furthermore, we suppose that the data are missing at random (MAR), which is expressed as:
P(δ = 1 | U, X, Z, Y) = P(δ = 1 | U, X, Z) = π(U, X, Z),
where π ( · ) is referred to as the propensity score.
Although many scholars have discussed statistical inference procedures for various semi-parametric models with endogenous covariates, the existing works have not considered heteroscedasticity and missing data simultaneously. Therefore, we consider the estimation problem for models (1) and (2) with heteroscedasticity and a missing response. The adaptive-weighted adjusted estimators of the unknown parameters and functional coefficients are proposed using the profile least squares method, the instrumental variable adjustment technique, Nadaraya–Watson kernel estimation, the inverse probability-weighted method, and the weighted least squares method, and we also establish the asymptotic properties of the proposed estimators.
The rest of the paper is organized as follows. In Section 2, we introduce an adaptive-weighted adjusted estimation method to obtain the estimators of the unknown parameters and functional coefficients, and the corresponding asymptotic properties are established. In Section 3, numerous simulation studies are conducted to demonstrate the effectiveness and feasibility of the proposed estimators. A real data analysis is performed in Section 4. Section 5 summarizes the research results of this paper with some conclusions. The technical proofs are presented in Appendix A.

2. Estimation Methods and Main Results

2.1. Adjusted Profile Least Squares Estimation

In this subsection, we apply the local linear smoothing technique and instrumental variable adjustment technique to estimate unknown parameters β and smooth functional coefficients θ ( · ) . Assume that { Y i , X i , U i , Z i , ζ i , δ i } i = 1 n are independent and identically distributed (i.i.d.) samples, which come from the semi-varying coefficient heteroscedastic instrumental variable models (1)–(3); then, we have:
δ_i Y_i = δ_i X_i^⊤θ(U_i) + δ_i Z_i^⊤β + δ_i ε_i,  X_i = Ψζ_i + e_i,  i = 1, 2, …, n.
For any u in a small neighborhood of u₀, each functional coefficient θ_j(u) (j = 1, 2, …, p) can be expanded by a Taylor expansion as follows:
θ_j(u) ≈ θ_j(u₀) + θ_j′(u₀)(u − u₀),  j = 1, 2, …, p.
If β is given, then the estimator of θ j ( u 0 ) is given by minimizing the following weighted least squares objective function:
∑_{i=1}^{n} [ Y_i − Z_i^⊤β − ∑_{j=1}^{p} { θ_j(u₀) + θ_j′(u₀)(U_i − u₀) } X_{ij} ]² K_{h₁}(U_i − u₀) δ_i,
where K h 1 ( · ) = K ( · / h 1 ) / h 1 , K ( · ) is a kernel function, which is chosen as a symmetric probability density function, and h 1 is a bandwidth. For ease of presentation, we denote:
Y = (Y₁, Y₂, …, Y_n)^⊤,  X = (X₁, X₂, …, X_n)^⊤,  Z = (Z₁, Z₂, …, Z_n)^⊤,
Δ₀ = diag(δ₁, δ₂, …, δ_n),  M = [X₁^⊤θ(U₁), …, X_n^⊤θ(U_n)]^⊤,
w_{h₁}(u₀) = diag[K_{h₁}(U₁ − u₀), K_{h₁}(U₂ − u₀), …, K_{h₁}(U_n − u₀)],
X_{h₁}(u₀) = [ X₁^⊤, h₁⁻¹(U₁ − u₀)X₁^⊤ ; ⋮ ; X_n^⊤, h₁⁻¹(U_n − u₀)X_n^⊤ ], an n × 2p matrix whose i-th row is (X_i^⊤, h₁⁻¹(U_i − u₀)X_i^⊤).
Thus, the first formula of model (4) can be transformed into:
Δ₀Y − Δ₀Zβ = Δ₀M + Δ₀ε.
By minimizing the weighted least squares objective (5), the estimators of functional coefficients θ ( u 0 ) are given by:
θ̃(u₀, β) = (I_p, 0_p) { X_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ X_{h₁}(u₀) }⁻¹ X_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ (Y − Zβ),
where I_p and 0_p denote the p × p identity matrix and zero matrix, respectively. It is noteworthy that the explanatory variable X in this paper is endogenous, which indicates that E(ε | X) ≠ 0. Thus, the estimators of the functional coefficients in (7) are inconsistent. We therefore correct θ̃(u₀, β) using the available instrumental variables ζ. For model (2), we can easily obtain:
E(Xζ^⊤) = Ψ E(ζζ^⊤).
Therefore, a usual moment estimator of the unknown constant matrix Ψ is given by:
Ψ̂ = ( ∑_{i=1}^{n} X_i ζ_i^⊤ ) ( ∑_{i=1}^{n} ζ_i ζ_i^⊤ )⁻¹.
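The moment estimator above is a single matrix computation. The following sketch (hypothetical helper, not from the paper) illustrates it with NumPy, assuming X is stored as an n × p array and the instruments ζ as an n × r array with r ≥ p:

```python
import numpy as np

def estimate_psi(X, zeta):
    """Moment estimator of Psi in X = Psi * zeta + e:
    Psi_hat = (sum_i X_i zeta_i^T) (sum_i zeta_i zeta_i^T)^{-1}.
    X: (n, p) endogenous covariates; zeta: (n, r) instruments, r >= p."""
    Sxz = X.T @ zeta      # p x r cross-moment sum
    Szz = zeta.T @ zeta   # r x r instrument Gram matrix (symmetric)
    return np.linalg.solve(Szz, Sxz.T).T  # equals Sxz @ inv(Szz)
```

The fitted values X̂_i = Ψ̂ζ_i then replace X_i in the local-linear design below.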
Let X ^ i = Ψ ^ ζ i . Invoking (7), the adjusted estimators of functional coefficients θ ( u ) are given by:
θ̂(u₀, β) = (I_p, 0_p) { X̂_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ X̂_{h₁}(u₀) }⁻¹ X̂_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ (Y − Zβ),
where X̂_{h₁}(u₀) has the same structure as X_{h₁}(u₀), but with X_i replaced by X̂_i. Then, invoking (8), the estimator of M is given by:
M̂(β) = (X̂₁^⊤θ̂(U₁, β), …, X̂_n^⊤θ̂(U_n, β))^⊤ = S(Y − Zβ),
and:
S = [ (X̂₁^⊤, 0^⊤) { X̂_{h₁}(U₁)^⊤ w_{h₁}(U₁) Δ₀ X̂_{h₁}(U₁) }⁻¹ X̂_{h₁}(U₁)^⊤ w_{h₁}(U₁) Δ₀ ; ⋮ ; (X̂_n^⊤, 0^⊤) { X̂_{h₁}(U_n)^⊤ w_{h₁}(U_n) Δ₀ X̂_{h₁}(U_n) }⁻¹ X̂_{h₁}(U_n)^⊤ w_{h₁}(U_n) Δ₀ ],
where 0 denotes a 1 × p zero vector. Replacing M with M̂(β) in (6), we obtain:
Δ₀(I − S)Y = Δ₀(I − S)Zβ + Δ₀ε.
For the model (10), a least squares approach is implemented, and then the adjusted estimators of unknown parameters β are given by:
β̂ = (Z̃^⊤ Δ₀ Z̃)⁻¹ Z̃^⊤ Δ₀ Ỹ,
where Ỹ = (I − S)Y and Z̃ = (I − S)Z. Combining (8) and (11), the adjusted estimators of the functional coefficients θ(u) at u₀ are given by:
θ̂(u₀, β̂) = (I_p, 0_p) { X̂_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ X̂_{h₁}(u₀) }⁻¹ X̂_{h₁}(u₀)^⊤ w_{h₁}(u₀) Δ₀ (Y − Zβ̂).
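The local-linear step underlying the estimators of this subsection can be sketched as follows; this is a hypothetical illustration (not the authors' code), assuming a Gaussian kernel, with the instrument-adjusted covariates X̂, the missingness indicators δ, and a given β passed in:

```python
import numpy as np

def gauss_kernel(t):
    """Standard Gaussian kernel K(t)."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def local_linear_theta(u0, U, Xhat, Z, Y, delta, beta, h, kernel=gauss_kernel):
    """Local-linear estimator of theta(u0) for a given beta (a sketch):
    minimize sum_i delta_i K_h(U_i - u0) [Y_i - Z_i'beta - D_i'(a, b)]^2,
    where D_i = (Xhat_i, (U_i - u0)/h * Xhat_i); returns a = theta_hat(u0)."""
    n, p = Xhat.shape
    t = (U - u0) / h
    w = delta * kernel(t) / h                 # K_h weights times missingness indicator
    D = np.hstack([Xhat, t[:, None] * Xhat])  # n x 2p local-linear design
    r = Y - Z @ beta
    A = D.T @ (w[:, None] * D)
    b = D.T @ (w * r)
    return np.linalg.solve(A, b)[:p]          # first p entries estimate theta(u0)
```

Profiling out θ(·) this way at each U_i and then solving the weighted least squares problem for β reproduces the two-step structure above.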

2.2. Adaptive-Weighted Adjusted Profile Least Squares Estimation

In this subsection, we develop an adaptive-weighted adjusted estimation method for the unknown parameters and functional coefficients based on weighted least squares estimation. First, using the estimated model residuals, we suggest an adjusted Nadaraya–Watson kernel estimation method for the variance function. By (11) and (12), the residuals can be estimated by:
ε̂ = (ε̂₁, ε̂₂, …, ε̂_n)^⊤ = Δ₀(Y − M̂(β̂) − Zβ̂).
Note that Var ( ε i | X i , Z i , U i ) = σ 2 ( U i ) , and by using the Nadaraya–Watson kernel estimation method, an adjusted estimator of variance function σ 2 ( u 0 ) is given by:
σ̂²(u₀) = [ ∑_{i=1}^{n} δ_i ε̂_i² K_{h̃}(U_i − u₀) ] / [ ∑_{i=1}^{n} δ_i K_{h̃}(U_i − u₀) ],
where K_{h̃}(·) has the same structure as K_{h₁}(·), except that the bandwidth h₁ is replaced by h̃. Furthermore, replacing u₀ by U_i (i = 1, …, n), we can obtain:
σ̂²(U_i) = [ ∑_{k=1}^{n} δ_k ε̂_k² K_{h̃}(U_k − U_i) ] / [ ∑_{k=1}^{n} δ_k K_{h̃}(U_k − U_i) ],  i = 1, …, n.
The estimator σ̂²(u₀) of the variance function σ²(u₀) is consistent. The proof of this property is similar to that of Theorem 1 in [9]; thus, we omit the details.
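A minimal sketch of the adjusted Nadaraya–Watson variance estimator above (hypothetical helper names; Gaussian kernel assumed):

```python
import numpy as np

def nw_variance(u0, U, resid, delta, h):
    """Nadaraya-Watson estimator of sigma^2(u0): a kernel-weighted average of
    squared residuals over the observed cases (delta_i = 1); Gaussian kernel."""
    t = (U - u0) / h
    K = np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)
    w = delta * K                 # the common 1/h factor cancels in the ratio
    return float(np.sum(w * resid**2) / np.sum(w))
```

Evaluating this at each U_i yields the diagonal of Σ̂ used in the weighting step below.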
Then, we consider how to deal with the missing data. In general, the selection probability function π(U_i, X_i, Z_i) defined in (3) is unknown. Nonparametric methods, such as kernel estimation and local polynomial estimation, could be used to estimate it, but these may suffer from the curse of dimensionality. Therefore, similar to the method in [31], we suppose that the selection probability function satisfies:
π(U_i, X_i, Z_i; w) = exp(w₀ + w₁U_i + w₂^⊤X_i + w₃^⊤Z_i) / [ 1 + exp(w₀ + w₁U_i + w₂^⊤X_i + w₃^⊤Z_i) ],
where w = (w₀, w₁, w₂^⊤, w₃^⊤)^⊤ is an unknown parameter vector whose estimator ŵ can be obtained by the quasi-likelihood estimation method. Writing V_i = (U_i, X_i^⊤, Z_i^⊤)^⊤, we abbreviate the estimated selection probability π(U_i, X_i, Z_i; ŵ) as π(V_i, ŵ).
Based on the estimator of variance function σ ^ 2 ( U i ) and selection probability function π ( V i , w ^ ) , the adaptive-weighted adjusted estimators for functional coefficients are given by minimizing:
∑_{i=1}^{n} [ Y_i − Z_i^⊤β − ∑_{j=1}^{p} { θ_j(u₀) + θ_j′(u₀)(U_i − u₀) } X_{ij} ]² · δ_i / { σ̂²(U_i) π(V_i, ŵ) } · K_{h₂}(U_i − u₀),
where K h 2 ( · ) = K ( · / h 2 ) / h 2 with bandwidth h 2 .
Minimizing objective function (16), the adaptive-weighted estimators of the functional coefficients can be expressed as:
θ̃_w(u₀, β) = (I_p, 0_p) { X_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ X_{h₂}(u₀) }⁻¹ X_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ (Y − Zβ),
where Σ̂ = diag[σ̂²(U₁), σ̂²(U₂), …, σ̂²(U_n)] and Δ̂ = diag[δ₁/π(V₁, ŵ), …, δ_n/π(V_n, ŵ)]. Since the explanatory variable X is endogenous, the instrumental variable adjustment technique is used to correct θ̃_w(u₀, β). The adaptive-weighted adjusted estimators of the functional coefficients θ(u₀) are then given by:
θ̂_w(u₀, β) = (I_p, 0_p) { X̂_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ X̂_{h₂}(u₀) }⁻¹ X̂_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ (Y − Zβ),
where X̂_{h₂}(u₀) and w_{h₂}(u₀) have the same forms as X̂_{h₁}(u₀) and w_{h₁}(u₀), respectively, except that h₁ is replaced by h₂. Then, the estimator of M is given by:
M̂_w = (X̂₁^⊤θ̂_w(U₁, β), …, X̂_n^⊤θ̂_w(U_n, β))^⊤ = Ŝ(Y − Zβ),
where:
Ŝ = [ (X̂₁^⊤, 0^⊤) { X̂_{h₂}(U₁)^⊤ w_{h₂}(U₁) Σ̂⁻¹ Δ̂ X̂_{h₂}(U₁) }⁻¹ X̂_{h₂}(U₁)^⊤ w_{h₂}(U₁) Σ̂⁻¹ Δ̂ ; ⋮ ; (X̂_n^⊤, 0^⊤) { X̂_{h₂}(U_n)^⊤ w_{h₂}(U_n) Σ̂⁻¹ Δ̂ X̂_{h₂}(U_n) }⁻¹ X̂_{h₂}(U_n)^⊤ w_{h₂}(U_n) Σ̂⁻¹ Δ̂ ].
Substituting M ^ w into (1), we have:
(I − Ŝ)Y = (I − Ŝ)Zβ + ε.
In order to eliminate the impact of heteroscedasticity, left-multiplying (19) by the matrix Σ̂^{−1/2}, we obtain:
Σ̂^{−1/2}(I − Ŝ)Y = Σ̂^{−1/2}(I − Ŝ)Zβ + Σ̂^{−1/2}ε,
where Σ̂^{−1/2} = diag[σ̂⁻¹(U₁), σ̂⁻¹(U₂), …, σ̂⁻¹(U_n)].
By employing the inverse probability-weighted method and combining the idea of weighted least squares, we can derive the estimators of unknown parameters β by minimizing
Q(β) = [ (I − Ŝ)Y − (I − Ŝ)Zβ ]^⊤ Σ̂⁻¹ Δ̂ [ (I − Ŝ)Y − (I − Ŝ)Zβ ].
Solving the minimum problem with respect to β , the proposed adaptive-weighted adjusted estimator of β is given by:
β̂_w = (Ẑ^⊤ Σ̂⁻¹ Δ̂ Ẑ)⁻¹ Ẑ^⊤ Σ̂⁻¹ Δ̂ Ŷ,
where Ŷ = (I − Ŝ)Y and Ẑ = (I − Ŝ)Z. Moreover, substituting β̂_w into (17), we give the proposed adaptive-weighted adjusted estimators of the functional coefficients θ(u₀) as follows:
θ̂_w(u₀, β̂_w) = (I_p, 0_p) { X̂_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ X̂_{h₂}(u₀) }⁻¹ X̂_{h₂}(u₀)^⊤ w_{h₂}(u₀) Σ̂⁻¹ Δ̂ (Y − Zβ̂_w).
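The closed-form weighted least squares step for β̂_w has the generic shape below; this is a sketch with hypothetical names, taking the partialled-out quantities Ŷ = (I − Ŝ)Y and Ẑ = (I − Ŝ)Z as inputs together with the estimated variances and selection probabilities:

```python
import numpy as np

def beta_weighted(Z_t, Y_t, sigma2_hat, delta, pi_hat):
    """Adaptive-weighted estimator (a sketch):
    beta_hat = (Z~' W Z~)^{-1} Z~' W Y~ with W = diag(delta_i / (sigma2_i * pi_i)).
    Z_t, Y_t: the partialled-out covariates and responses from the profile step."""
    w = delta / (sigma2_hat * pi_hat)   # diagonal of Sigma^{-1} Delta_hat
    A = Z_t.T @ (w[:, None] * Z_t)
    b = Z_t.T @ (w * Y_t)
    return np.linalg.solve(A, b)
```

Down-weighting high-variance and over-represented observations in this way is what drives the efficiency gain over the unweighted profile estimator.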

2.3. Asymptotic Properties

In this subsection, we establish the asymptotic properties of the proposed estimators. First, some regularity conditions are needed. These conditions are mild, and similar conditions can be found in [9,10,11], and other varying-coefficient partially linear heteroscedasticity literature.
(C1)
The random variable U has a bounded support U and its density function f ( u ) is Lipschitz-continuous and has a second-order continuous derivative. Moreover, f ( u ) is bounded away from zero.
(C2)
The kernel K ( · ) is a symmetric probability density function with a compact support and is Lipschitz-continuous.
(C3)
For each u ∈ U, the matrix Π(u) = E(ζ₁ζ₁^⊤ | U = u) is invertible, and the matrices Π(u), Π⁻¹(u), and Ξ(u) = E(ζ₁Z₁^⊤ | U = u) are Lipschitz-continuous.
(C4)
For each u U , the functional coefficients { θ j ( u ) , j = 1 , 2 , , p } are Lipschitz-continuous and have continuous second derivatives.
(C5)
There is an s > 2 such that E‖ζ₁‖^{2s} < ∞ and E‖Z₁‖^{2s} < ∞.
(C6)
There is a δ < 1 − s⁻¹ such that lim_{n→∞} n^{2δ−1} h_i = ∞, i = 1, 2.
(C7)
The variance function σ 2 ( · ) has a continuous second derivative and is uniformly bounded on the domain.
(C8)
For the bandwidths h_i (i = 1, 2), nh_i² → ∞, nh_i⁸ → 0, and [log(1/h_i)]²/(nh_i²) → 0 as n → ∞. In addition, h_i and h̃ satisfy O(c_{ni})·O(c̃_n) = o(n^{−1/2}), where c_{ni} = h_i² + [log(1/h_i)/(nh_i)]^{1/2}, i = 1, 2, and c̃_n = h̃² + [log(1/h̃)/(nh̃)]^{1/2}.
(C9)
As a function of (X_i, Z_i, U_i), π(·) has a second-order continuous derivative. Moreover, π(·) is bounded away from zero.
Conditions (C1)–(C6) are quite generally required in the semi-varying coefficient model. Conditions (C7) and (C8) are mainly to obtain the consistent estimator of the variance function, which can be found in [9]. Condition (C9) provides a guarantee for the inverse probability weighted technique.
Theorem 1.
Suppose that regularity conditions (C1)–(C9) hold. Then, we have:
sup_{u₀ ∈ U} | σ̂²(u₀) − σ²(u₀) | = o_p(c̃_n).
Theorem 2.
Suppose that regularity conditions (C1)–(C9) hold; then, the proposed adaptive-weighted adjusted estimator of β satisfies:
√n (β̂_w − β) →_D N(0, Λ₁⁻¹Λ₂Λ₁⁻¹) as n → ∞,
where Λ₁ = E{ σ⁻²(U₁) [ Z₁ − Ξ(U₁)^⊤Ψ^⊤(ΨΠ(U₁)Ψ^⊤)⁻¹Ψζ₁ ]^{⊗2} }, Λ₂ = E{ π(V₁, w)⁻¹ σ⁻⁴(U₁) (e₁^⊤θ(U₁) + ε₁)² [ Z₁ − Ξ(U₁)^⊤Ψ^⊤(ΨΠ(U₁)Ψ^⊤)⁻¹Ψζ₁ ]^{⊗2} }, and H^{⊗2} = HH^⊤.
Theorem 3.
Suppose that regularity conditions (C1)–(C9) hold; then, the proposed adaptive-weighted adjusted estimator of θ ( u 0 ) satisfies:
√(nh₂) [ θ̂_w(u₀, β̂_w) − θ(u₀) − ½h₂²μ₂θ″(u₀) ] →_D N(0, Λ(u₀)) as n → ∞,
where Λ(u₀) = ν₀ f⁻¹(u₀) E{ π(V₁, w)⁻¹ (e₁^⊤θ(U₁) + ε₁)² } [ΨΠ(u₀)Ψ^⊤]⁻¹,
μ₂ = ∫ u²K(u) du,  ν₀ = ∫ K²(u) du.
Theorems 2 and 3 give the asymptotic distributions of our proposed adaptive-weighted adjusted estimators. These results can be utilized to conduct statistical inference for the unknown parameters and functional coefficients. Additionally, the above theorems broaden the scope of application of semi-varying coefficient models to meet practical modeling requirements. When there are no missing responses or endogenous covariates, the asymptotic variance of our proposed estimators possesses the same structure as that of the estimators in [13]. On the other hand, when the missing response and heteroscedasticity are not considered, the asymptotic variance of our proposed estimators is the same as that of the estimators in [16].

3. Simulation Studies

In this section, we carry out some simulations to evaluate the finite sample performance of the proposed adaptive-weighted adjusted estimation method. We generate the data from the semi-varying coefficient heteroscedastic instrumental variables model:
Y = Z₁β₁ + Z₂β₂ + Xθ(U) + ε,
where the explanatory variables Z₁ and Z₂ are both independently drawn from N(2, 1), the univariate covariate U is drawn from the uniform distribution U(0, 1), and the explanatory variable X is an endogenous variable generated from the model X = ζ + kε, where ζ is an instrumental variable generated from the normal distribution N(1, 1), and k is taken as 0.2 and 0.4 to represent different levels of endogeneity. We set the parameters β₁ = 1.5, β₂ = 2, and θ(U) = sin(2πU). The model error ε ~ N(0, σ²(U)) with σ²(U) = 0.25 + [c·sin(2πU)]², for c = 2, 4, respectively. The Gaussian kernel K(x) = (1/√(2π))exp(−x²/2) is adopted. The leave-one-out cross-validation (LOOCV) method is applied to choose h₁, which is derived by minimizing
CV(h₁) = (1/n) ∑_{i=1}^{n} δ_i [ Y_i − X_i^⊤θ̂_{[i]}(U_i) − Z_i^⊤β̂_{[i]} ]²,
where β̂_{[i]} and θ̂_{[i]}(·) are the adjusted profile least squares estimators given in (11) and (12), respectively, computed with the i-th observation deleted. We choose the bandwidths h̃ and h₂ by a similar method. To compare the performance of the proposed adaptive-weighted adjusted estimators under different missing probabilities, two selection probability functions are chosen as follows:
π₁(x, u, z) = P(δ = 1 | X = x, U = u, Z = (z₁, z₂)^⊤) = [ 1 + exp(1 + 1.1x − z₁ − 0.4z₂ − 0.7u) ]⁻¹,
π₂(x, u, z) = P(δ = 1 | X = x, U = u, Z = (z₁, z₂)^⊤) = [ 1 + exp(1 + 1.5x − 0.8z₁ − 0.4z₂ − 0.5u) ]⁻¹.
The corresponding average response rates are about 0.9 and 0.8 when c = 2 and k = 0.2. To show the performance of our proposed adaptive-weighted adjusted profile least squares estimation based on the instrumental variable adjustment technique (denoted IAWPLS), we contrast it with the two approaches below: (1) the naive adaptive-weighted profile least squares estimation, denoted NAWPLS; (2) the instrumental variable weighted profile least squares estimation, denoted IWPLS. The former ignores the endogeneity of the explanatory covariate and is derived by combining the inverse probability-weighted method and the adaptive-weighted profile least squares method in [13]. The latter ignores the heteroscedasticity of the model error and combines the inverse probability-weighted method and the instrumental variable adjustment method in [16]. We set the sample size to 50, 100, 200, and 300. The results are based on 500 replications for each case. For the parametric components, we use the following measures to compare the performance of the different methods: (1) Mean: the average of the estimated values; (2) MSE: the mean squared error of the corresponding estimators. The results are shown in Table 1 and Table 2 for π = π₁, π₂, c = 2, 4, k = 0.2, 0.4, and n = 50, 100, 200, 300, respectively.
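One replication of the data-generating process above can be sketched as follows (hypothetical function names, not the authors' code; the response-deletion step is omitted since it only applies the chosen selection probability to each observation):

```python
import numpy as np

def simulate(n, k=0.2, c=2.0, seed=None):
    """One replication of the simulation design:
    Y = 1.5*Z1 + 2*Z2 + X*sin(2*pi*U) + eps, Var(eps|U) = 0.25 + (c*sin(2*pi*U))^2,
    X = zeta + k*eps (endogenous), zeta ~ N(1,1), Z1, Z2 ~ N(2,1), U ~ U(0,1)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(2.0, 1.0, size=(n, 2))
    U = rng.uniform(size=n)
    sigma = np.sqrt(0.25 + (c * np.sin(2.0 * np.pi * U))**2)
    eps = sigma * rng.normal(size=n)       # heteroscedastic model error
    zeta = rng.normal(1.0, 1.0, size=n)    # instrumental variable
    X = zeta + k * eps                     # endogeneity: X is correlated with eps
    Y = Z @ np.array([1.5, 2.0]) + X * np.sin(2.0 * np.pi * U) + eps
    return Y, X, Z, U, zeta, eps
```

By construction, X is correlated with ε while ζ is not, which is exactly the setting the instrumental variable adjustment is designed to handle.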
According to Table 1 and Table 2, we have the following results.
(1)
The IAWPLS and IWPLS estimators are asymptotically unbiased, but the NAWPLS estimators are biased. For fixed π , c , k , with the increase of n, the MSEs of all three estimators decrease.
(2)
For fixed π , c , k , when the sample size n = 50 , the MSEs of our proposed IAWPLS estimators are slightly larger than those of NAWPLS in some cases, but obviously smaller than those of the NAWPLS and IWPLS estimators when the sample size is greater than 100.
(3)
For fixed c , k , n , with the increase in the missing probability, the MSEs of all three estimators increase.
(4)
For fixed π , k , n , with the increase in c, the MSEs of all three estimators increase.
(5)
For fixed π , c , n , with the increase in k, the MSEs of all three estimators increase.
Subsequently, we further consider the behavior of the adaptive-weighted adjusted estimation method for the variance functions and functional coefficients. The corresponding estimated values are computed at n = 200 equally spaced points U_i = i/n ∈ [0, 1], and the ultimate estimated value at each point U_i is the mean over 500 simulations. Due to the similarity of the estimated curves for different sample sizes and missing probabilities, we only plot the estimated curves of the variance functions and functional coefficients when c = 2, 4, k = 0.2, 0.4, n = 200, and π = π₁. The estimated curves are shown in Figure 1 and Figure 2. To demonstrate the effectiveness of the proposed estimation method for the variance function, two methods are taken for contrast: (1) the adjusted Nadaraya–Watson kernel estimation method based on the instrumental variable adjustment technique; (2) the naive Nadaraya–Watson kernel estimation method, which ignores the endogeneity of the explanatory variables and uses the standard Nadaraya–Watson kernel estimation.
Figure 1 shows that the proposed adjusted Nadaraya–Watson kernel estimators are asymptotically unbiased, but the naive Nadaraya–Watson kernel estimators are biased, and the deviation increases with the increase of c or k. Note that the performance of our proposed adjusted Nadaraya–Watson kernel estimators may be affected by larger c and k. From Figure 2, we find that the estimated curves obtained by the IAWPLS and IWPLS methods both approach the true curves, but the estimated curves obtained by the NAWPLS method are biased, and the deviation increases with the increase of c or k.
Since the estimated curves obtained by the IAWPLS and IWPLS methods are close to each other, we further utilize the root mean squared error (RMSE) to evaluate the estimation of the functional coefficients:
RMSE = { (1/N) ∑_{k=1}^{N} ‖ θ̂(U_k) − θ(U_k) ‖² }^{1/2},
where U_k (k = 1, 2, …, N) are grid points on the bounded support U. In this case, we set N = 100 and take the U_k equally spaced on the interval [0, 1]. The RMSEs of the functional coefficient estimators are presented in Table 3 for π = π₁, π₂, c = 2, 4, k = 0.2, 0.4, and n = 50, 100, 200, 300.
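The RMSE criterion is a direct computation over the grid; a sketch with hypothetical names:

```python
import numpy as np

def rmse(theta_hat_grid, theta_grid):
    """RMSE over N grid points: [ (1/N) sum_k ||theta_hat(U_k) - theta(U_k)||^2 ]^{1/2}.
    Inputs are (N,) or (N, p) arrays of estimated and true coefficient values."""
    d = np.asarray(theta_hat_grid) - np.asarray(theta_grid)
    d = d.reshape(d.shape[0], -1)     # (N, p); p = 1 for a scalar coefficient
    return float(np.sqrt(np.mean(np.sum(d**2, axis=1))))
```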
According to Table 3, the proposed IAWPLS estimation method for the functional coefficients has smaller RMSEs than the NAWPLS and IWPLS methods for given π, k, c, n. The RMSEs of all three estimators decrease as n increases. Nevertheless, the RMSEs increase with c, k, or the missing probability.

4. Real Data Analysis

We applied our adaptive-weighted adjusted estimation method to the National Longitudinal Survey of Young Men (NLSYM) dataset, which includes 3010 samples from 1976. This dataset has been widely used to analyze endogeneity issues in parametric and semi-parametric models, such as in [18,21,23,34]. We aim to study the potential relationship between the log of hourly wage in cents (Y) and six other explanatory variables: the years of schooling (Z₁), the dummy variables black (Z₂), south (Z₃), and standard metropolitan statistical area (Z₄, smsa), age (U), and work experience (X), constructed as U − Z₁ − 6; further details regarding the variables in the dataset can be found in [34]. Similar to [20,34], we constructed the following semi-varying coefficient model:
Y = Xθ(U) + Z₁β₁ + Z₂β₂ + Z₃β₃ + Z₄β₄ + ε.
Based on the idea of [34], we took proximity to a four-year college as the corresponding instrumental variable, since the years of schooling are not randomly assigned and are endogenous. For the missing data, we used the following selection probability model to randomly delete approximately 11% of the responses:
π(x, u, z) = P(δ = 1 | X = x, U = u, Z = (z₁, z₂, z₃, z₄)^⊤) = [ 1 + exp(1 + 1.5x − 0.4z₁ − 0.4z₂ − 0.5z₃ − 0.7z₄ − 0.5u) ]⁻¹.
In addition, the Gaussian kernel function was chosen in this case, and the bandwidths h₁, h₂, and h̃ were all set to 0.3 for ease of calculation. We first applied the adjusted profile least squares method to obtain the initial estimators of the unknown parameters and functional coefficients. Then, the fitted values Ŷ and residuals Y − Ŷ were derived by a simple calculation, and a scatter plot of Ŷ against Y − Ŷ is presented in Figure 3.
Figure 3 demonstrates that the model residuals show a certain linear trend instead of random variation. Therefore, we concluded that there should be a heteroscedasticity structure. To further reveal the heteroscedasticity, an adjusted Nadaraya–Watson kernel estimation method is suggested for estimating the variance function, and the estimated curve is presented in Figure 4.
From Figure 4, we found that the variance function shows a significant downward trend as age U increases, indicating the existence of heteroscedasticity in this specified model. Then, we employed the proposed adaptive-weighted adjusted estimation method to estimate the unknown parameters and functional coefficients. For comparison, the three estimation methods IAWPLS, NAWPLS, and IWPLS were included, and the estimation results for the unknown parameters and functional coefficients are shown in Table 4 and Figure 5, respectively.
According to Table 4 and Figure 5, we found that although the IAWPLS and IWPLS estimators are relatively close to each other, the IWPLS estimation method slightly overestimated the parameter vector. In addition, there is a significant deviation in the NAWPLS estimators compared with the other estimators. Overall, our proposed estimation method can effectively eliminate the endogeneity and heteroscedasticity for semi-varying coefficient models with missing data.

5. Conclusions

In this paper, we study an estimation problem for semi-varying coefficient instrumental variable models with missing data in which the model error is simultaneously subject to heteroscedasticity. An adaptive-weighted adjusted estimation procedure is proposed based on the instrumental variable adjustment technique. The consistency of the variance function estimator is established, and the asymptotic distributions of the estimators of the unknown parameters and functional coefficients are also derived under some regularity conditions. Moreover, numerous simulation studies and an NLSYM data analysis further demonstrate the effectiveness of the proposed estimation method. However, we only discuss the estimation problem for the semi-varying coefficient instrumental variable model with heteroscedasticity and missing data in this study. More interesting research topics can be explored in the future, including variable selection and model averaging. In addition, high-dimensional data have become a focus of statistical research, and how to develop statistical inference methods and theory for semi-varying coefficient heteroscedastic instrumental variable models with high-dimensional data is an interesting research direction. These issues will be studied in future work.

Author Contributions

Conceptualization, W.Z., S.M. and J.L.; Methodology, W.Z., S.M. and J.L.; Validation, W.Z., S.M. and J.L.; Formal analysis, W.Z.; Investigation, W.Z.; Data curation, W.Z.; Writing—original draft preparation, W.Z.; Writing—review and editing, W.Z., S.M. and J.L.; Supervision, W.Z., S.M. and J.L.; Project administration, W.Z. and S.M.; Funding acquisition, W.Z. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Inner Mongolia Autonomous Region of China (Nos. 2023QN01001, 2022MS07014), the National Natural Science Foundation of China (No. 12271046 and 71661027), and the Research Program of Humanities and Social Sciences at Universities of Inner Mongolia Autonomous Region of China (No. NJSY22497).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available to protect sensitive information.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Lemma A1
(Mack and Silverman [35]). Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. random vectors, where the $Y_i\ (i = 1, \ldots, n)$ are scalar random variables. Assume that $E|Y_i|^s < \infty$ and $\sup_x \int |y|^s f(x, y)\,dy < \infty$, where $f(x, y)$ denotes the joint probability density of the random variables $X$ and $Y$, and $K(\cdot)$ is a bounded positive function defined on a bounded support. Moreover, $K(\cdot)$ satisfies a Lipschitz condition. Given $n^{2r-1}h \to \infty$ for some $r < 1 - s^{-1}$, we have:
$$\sup_x\left| \frac{1}{n}\sum_{i=1}^{n}\left\{ K_h(X_i - x)Y_i - E\left[ K_h(X_i - x)Y_i \right] \right\} \right| = O_p\!\left( \left\{ \frac{\log(1/h)}{nh} \right\}^{1/2} \right).$$
Lemma A2
(Shi and Lau [36]). Let $T_1, \ldots, T_n$ be i.i.d. random variables. If $E|T_i|^s$ is bounded for some $s > 1$, then $\max_{1 \le i \le n}|T_i| = o(n^{1/s})$, a.s.
Lemma A3
(Chen et al. [31]). Let $\tau_i = (1, X_i', U_i, Z_i')'$, let $\lambda_{\min}$ denote the smallest eigenvalue of $\sum_{i=1}^{n}\tau_i\tau_i'$, and assume that $\sup_{i \ge 1}\|\tau_i\| < \infty$ and $\lambda_{\min} \to \infty$; then, the quasi-likelihood estimator $\hat{w} = (\hat{w}_0, \hat{w}_1, \hat{w}_2, \hat{w}_3)'$ of $w = (w_0, w_1, w_2, w_3)'$ satisfies:
$$\sqrt{n}(\hat{w} - w) = A^{-1} n^{-1/2}\sum_{i=1}^{n}\tau_i(\delta_i - \pi_i) + o_p(1),$$
where $A = E[\tau_1\tau_1'\pi_1(1 - \pi_1)]$.
Lemma A4.
Under regularity conditions (C1)–(C9), we have:
$$\max_{1 \le i \le n}\left| \frac{\delta_i}{\pi(V_i, \hat{w})} - \frac{\delta_i}{\pi(V_i, w)} \right| = o_p\!\left( n^{-\frac{1}{2}+\frac{1}{2s}} \right).$$
Proof of Lemma A4.
According to Lemma A3, we use a first-order Taylor expansion of $\delta_i/\pi(V_i, \hat{w})$ at $w$:
$$\frac{\delta_i}{\pi(V_i, \hat{w})} = \frac{\delta_i}{\pi(V_i, w)} + \left[ \frac{\partial}{\partial w}\frac{\delta_i}{\pi(V_i, w)} \right]'(\hat{w} - w) + o_p(\|\hat{w} - w\|) = \frac{\delta_i}{\pi(V_i, w)} - \frac{\delta_i\,\dot{\pi}(V_i, w)'}{\pi^2(V_i, w)}(\hat{w} - w) + o_p(1).$$
By condition (C9) and Lemma A2, we have:
$$\left| \frac{\delta_i}{\pi(V_i, \hat{w})} - \frac{\delta_i}{\pi(V_i, w)} \right| \le \max_{1 \le i \le n}\left\| \frac{\tau_i\,\delta_i\left[ 1 - \pi(V_i, w) \right]}{\pi(V_i, w)} \right\|\,\|\hat{w} - w\| + o_p(1) = o_p\!\left( n^{-\frac{1}{2}+\frac{1}{2s}} \right).$$
This completes the proof of Lemma A4. □
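The first-order expansion at the heart of this proof is easy to check numerically. The sketch below uses a hypothetical scalar logistic propensity pi(v, w) = 1/(1 + e^{-wv}) as a stand-in for the model's selection probability; for this link, the derivative of delta/pi(v, w) in w is -delta*v*(1 - pi)/pi, matching the bound used above.

```python
import numpy as np

def pi_fn(v, w):
    """Illustrative logistic propensity standing in for pi(V, w)."""
    return 1.0 / (1.0 + np.exp(-w * v))

v, w, delta = 0.7, 1.2, 1.0
dw = 1e-3                                   # small perturbation playing the role of (w_hat - w)

p = pi_fn(v, w)
# For the logistic link, d(pi)/dw = v * pi * (1 - pi), hence
# d/dw [delta / pi] = -delta * v * (1 - pi) / pi.
deriv = -delta * v * (1.0 - p) / p

exact = delta / pi_fn(v, w + dw)
first_order = delta / p + deriv * dw        # Taylor expansion used in the proof
zeroth_order = delta / p
```

The first-order approximation error is of order dw^2, i.e. o_p of the perturbation, which is exactly the remainder the proof discards.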
Lemma A5.
Under regularity conditions (C1)–(C9), as $n \to \infty$, it holds that:
$$\frac{1}{n}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) = \sigma^{-2}(u_0) f(u_0)\left[ \Psi'\Pi(u_0)\Psi \right] \otimes \begin{pmatrix} 1 & 0 \\ 0 & \mu_2 \end{pmatrix}\{1 + O_p(c_{n_2})\}, \tag{A1}$$
$$\frac{1}{n}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} Z = \sigma^{-2}(u_0) f(u_0)\left[ \Psi'\Xi(u_0) \right] \otimes (1,\ 0)'\{1 + O_p(c_{n_2})\}, \tag{A2}$$
$$\frac{1}{n}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} M = \sigma^{-2}(u_0) f(u_0)\left[ \Psi'\Pi(u_0)\Psi\,\theta(u_0) \right] \otimes (1,\ 0)'\{1 + O_p(c_{n_2})\}, \tag{A3}$$
$$\frac{1}{n}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\varepsilon = O_p(c_{n_2}), \tag{A4}$$
where $\otimes$ denotes the Kronecker product.
Proof of Lemma A5.
For any $u_0 \in \mathcal{U}$, by some simple calculation, we have:
$$\frac{1}{n}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) = \frac{1}{n}\begin{pmatrix} \sum_{i=1}^{n} B_i & \sum_{i=1}^{n} B_i h_2^{-1}(U_i - u_0) \\ \sum_{i=1}^{n} B_i h_2^{-1}(U_i - u_0) & \sum_{i=1}^{n} B_i h_2^{-2}(U_i - u_0)^2 \end{pmatrix},$$
where $B_i = [\delta_i/\pi(V_i, \hat{w})]\,\hat{\sigma}^{-2}(U_i)\,\hat{X}_i\hat{X}_i' K_{h_2}(U_i - u_0)$.
We first consider the term in the upper-left corner of the matrix. It is noteworthy that $\hat{\Psi}$ is the usual moment estimator of $\Psi$; according to [16], we obtain $\hat{\Psi} = \Psi + O_p(n^{-1/2})$. Therefore, combining Theorem 1, Lemma A4, and condition (C8), we can obtain:
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}^{-2}(U_i)\frac{\delta_i}{\pi(V_i, \hat{w})}\hat{X}_i\hat{X}_i' K_{h_2}(U_i - u_0)
={}& \frac{1}{n}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\Psi'\zeta_i\zeta_i'\Psi K_{h_2}(U_i - u_0) \\
&+ \frac{1}{n}\sum_{i=1}^{n}\left[ \hat{\sigma}^{-2}(U_i) - \sigma^{-2}(U_i) \right]\frac{\delta_i}{\pi(V_i, w)}\Psi'\zeta_i\zeta_i'\Psi K_{h_2}(U_i - u_0) \\
&+ \frac{1}{n}\sum_{i=1}^{n}\sigma^{-2}(U_i)\left[ \frac{\delta_i}{\pi(V_i, \hat{w})} - \frac{\delta_i}{\pi(V_i, w)} \right]\Psi'\zeta_i\zeta_i'\Psi K_{h_2}(U_i - u_0) \\
&+ \frac{1}{n}\sum_{i=1}^{n}\left[ \hat{\sigma}^{-2}(U_i) - \sigma^{-2}(U_i) \right]\left[ \frac{\delta_i}{\pi(V_i, \hat{w})} - \frac{\delta_i}{\pi(V_i, w)} \right]\Psi'\zeta_i\zeta_i'\Psi K_{h_2}(U_i - u_0) + O_p(n^{-1/2}) \\
={}& \sigma^{-2}(u_0) f(u_0)\Psi'\Pi(u_0)\Psi + o_p(\tilde{c}_n) + o_p(n^{-\frac{1}{2}+\frac{1}{2s}}) + o_p(\tilde{c}_n)\, o_p(n^{-\frac{1}{2}+\frac{1}{2s}}) + O_p(n^{-1/2}) \\
={}& \sigma^{-2}(u_0) f(u_0)\Psi'\Pi(u_0)\Psi + O_p(\tilde{c}_{n_2}).
\end{aligned}$$
By using the same argument as above, the other entries of the matrix can be handled, which leads to the result in (A1). The proofs of (A2)–(A4) are similar to that of (A1), so we omit the details here. □
Lemma A6.
Under regularity conditions (C1)–(C9), as $n \to \infty$, we have:
$$\frac{1}{n}\hat{Z}'\Sigma^{-1}\Delta\hat{Z} \to \Lambda_1, \quad a.s.,$$
where $\Sigma = \operatorname{diag}[\sigma^2(U_1), \sigma^2(U_2), \ldots, \sigma^2(U_n)]$, $\Delta = \operatorname{diag}\!\left[ \delta_1/\pi(V_1, \hat{w}), \ldots, \delta_n/\pi(V_n, \hat{w}) \right]$, and $\Lambda_1$ is given in Theorem 2.
Proof of Lemma A6.
Invoking (A1) and (A2), it is easy to show that:
$$(\hat{X}',\ 0')\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} Z = (\Psi'\zeta)'\left( \Psi'\Pi(u_0)\Psi \right)^{-1}\Psi'\Xi(u_0)\{1 + O_p(c_{n_2})\}.$$
Then, we have:
$$\hat{S}Z = \begin{pmatrix} (\Psi'\zeta_1)'(\Psi'\Pi(U_1)\Psi)^{-1}\Psi'\Xi(U_1) \\ \vdots \\ (\Psi'\zeta_n)'(\Psi'\Pi(U_n)\Psi)^{-1}\Psi'\Xi(U_n) \end{pmatrix}\{1 + O_p(c_{n_2})\}. \tag{A5}$$
Then, invoking (A5), some calculations yield:
$$\frac{1}{n}\hat{Z}'\Sigma^{-1}\Delta\hat{Z} = \frac{1}{n}(Z - \hat{S}Z)'\Sigma^{-1}\Delta(Z - \hat{S}Z) = \frac{1}{n}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i\eta_i' + O_p(c_{n_2}),$$
where $\eta_i = Z_i - \Xi(U_i)'\Psi(\Psi'\Pi(U_i)\Psi)^{-1}\Psi'\zeta_i$. Then, by a law of large numbers, Lemma A6 can be easily proven. □
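Lemma A6 is a weighted law of large numbers. The toy check below (with illustrative, assumed distributions for U, eta, the variance function, and the selection probability, not the paper's model) averages inverse-probability, inverse-variance weighted outer products and compares the average with its population limit.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000
U = rng.uniform(0.0, 1.0, n)
eta = rng.normal(size=(n, 2))               # stand-in for the adjusted residual vectors eta_i
sigma2 = 1.0 + U                            # assumed variance function sigma^2(U)
pi = 0.6 + 0.3 * U                          # assumed selection probability pi(V_i, w)
delta = rng.binomial(1, pi)                 # missingness indicator delta_i

w = delta / (pi * sigma2)                   # combined inverse-probability / inverse-variance weight
G = (eta * w[:, None]).T @ eta / n          # (1/n) * sum_i w_i * eta_i eta_i'

# Here eta_i is independent of U_i with identity covariance and E[delta_i | U_i] = pi,
# so the population limit is E[1/sigma^2(U)] * I_2 = log(2) * I_2.
limit = np.log(2.0) * np.eye(2)
```

The sample average settles on the deterministic limit matrix, which plays the role of Lambda_1 in this simplified setting.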
Lemma A7.
Under regularity conditions (C1)–(C9), as $n \to \infty$, we can obtain:
$$n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta(\hat{M}_w + \hat{\varepsilon}) \xrightarrow{D} N(0, \Lambda_2),$$
where $\hat{\varepsilon} = (I - \hat{S})\varepsilon$, $\hat{M}_w = (I - \hat{S})M$, and $\Lambda_2$ is given in Theorem 2.
Proof of Lemma A7.
Invoking (A1) and (A3), it is easy to check that:
$$(\hat{X}',\ 0')\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} M = (\Psi'\zeta)'\theta(u_0)\{1 + O_p(c_{n_2})\}.$$
Then, invoking (A5), some calculations yield:
$$\begin{aligned}
n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta\hat{M}_w &= n^{-1/2} Z'(I - \hat{S})'\Sigma^{-1}\Delta(I - \hat{S})M \\
&= n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i\{1 + O_p(c_{n_2})\}\left[ X_i'\theta(U_i) - (\Psi'\zeta_i)'\theta(U_i)(1 + O_p(c_{n_2})) \right] \\
&= n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i(\Psi'\zeta_i)'\theta(U_i)\, O_p(c_{n_2}) + n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i e_i'\theta(U_i).
\end{aligned}$$
Note that $E[\eta_i(\Psi'\zeta_i)'\theta(U_i) \mid U_i] = 0$, and then we can prove:
$$n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i(\Psi'\zeta_i)'\theta(U_i)\, O_p(c_{n_2}) = O_p(n^{1/2} c_{n_2}^{2}).$$
Therefore, we derive:
$$n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta\hat{M}_w = n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i e_i'\theta(U_i) + O_p(n^{1/2} c_{n_2}^{2}). \tag{A6}$$
In addition, invoking (A1) and (A4), we have:
$$(\hat{X}',\ 0')\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\varepsilon = O_p(c_{n_2}).$$
Then, invoking (A5), some calculations yield:
$$n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta\hat{\varepsilon} = n^{-1/2} Z'(I - \hat{S})'\Sigma^{-1}\Delta(I - \hat{S})\varepsilon = n^{-1/2}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\eta_i\varepsilon_i + O_p(c_{n_2}). \tag{A7}$$
Hence, combining (A6), (A7), and condition (C8), with the help of the central limit theorem, the result of Lemma A7 can be obtained. □
The proof of Theorem 1 is similar to that of Theorem 1 in [9]. We omit the details here.
Proof of Theorem 2.
Note that:
$$\sqrt{n}(\hat{\beta}_w - \beta) = \sqrt{n}(\hat{\beta}_w - \hat{\beta}_v) + \sqrt{n}(\hat{\beta}_v - \beta),$$
where $\hat{\beta}_v = (\hat{Z}'\Sigma^{-1}\Delta\hat{Z})^{-1}\hat{Z}'\Sigma^{-1}\Delta\hat{Y}$. Then, we need to complete the proof of
$$\sqrt{n}(\hat{\beta}_v - \beta) \xrightarrow{D} N(0, \Lambda_1^{-1}\Lambda_2\Lambda_1^{-1}), \quad n \to \infty, \tag{A8}$$
and
$$\sqrt{n}(\hat{\beta}_w - \hat{\beta}_v) = o_p(1). \tag{A9}$$
Multiplying both sides of model (1) by $(I - \hat{S})$, we obtain:
$$\hat{Y} = \hat{Z}\beta + \hat{M}_w + \hat{\varepsilon}.$$
Then, we have:
$$\sqrt{n}(\hat{\beta}_v - \beta) = (n^{-1}\hat{Z}'\Sigma^{-1}\Delta\hat{Z})^{-1}\, n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta(\hat{M}_w + \hat{\varepsilon}).$$
Invoking Lemma A6 and Lemma A7, we can derive the result of (A8) using the Slutsky Theorem. Since:
$$\begin{aligned}
\sqrt{n}(\hat{\beta}_w - \hat{\beta}_v) &= \sqrt{n}\left[ (\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Z})^{-1}\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Y} - (\hat{Z}'\Sigma^{-1}\Delta\hat{Z})^{-1}\hat{Z}'\Sigma^{-1}\Delta\hat{Y} \right] \\
&= \sqrt{n}\left[ (\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Z})^{-1} - (\hat{Z}'\Sigma^{-1}\Delta\hat{Z})^{-1} \right]\hat{Z}'\Sigma^{-1}\Delta(\hat{M}_w + \hat{\varepsilon}) + \sqrt{n}\,(\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Z})^{-1}\hat{Z}'(\hat{\Sigma}^{-1}\hat{\Delta} - \Sigma^{-1}\Delta)(\hat{M}_w + \hat{\varepsilon}).
\end{aligned}$$
Thus, to obtain the result of (A9), we have to prove:
$$n^{-1}(\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Z} - \hat{Z}'\Sigma^{-1}\Delta\hat{Z}) = o_p(1), \qquad n^{-1/2}\hat{Z}'\Sigma^{-1}\Delta(\hat{M}_w + \hat{\varepsilon}) = O_p(1),$$
$$n^{-1}\hat{Z}'\hat{\Sigma}^{-1}\hat{\Delta}\hat{Z} = O_p(1), \qquad n^{-1/2}\hat{Z}'(\hat{\Sigma}^{-1}\hat{\Delta} - \Sigma^{-1}\Delta)(\hat{M}_w + \hat{\varepsilon}) = o_p(1).$$
Using a similar method to the proof of Theorem 3 in [9], it is easy to prove the above conclusions under regularity conditions (C1)–(C9), so the details are omitted here. Then, combining (A8) and (A9) yields the result in Theorem 2. □
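Stripped of the profiling and instrument adjustment, the estimator analyzed above is a weighted least-squares solve with inverse-variance, inverse-probability weights. The sketch below implements that generic solve on simulated exogenous data; the design, the variance function, and the selection probability are illustrative assumptions, not the paper's full procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta = 50000, np.array([1.5, 2.0])
Z = rng.normal(size=(n, 2))                   # exogenous covariates in this toy version
U = rng.uniform(0.0, 1.0, n)
sigma2 = np.exp(-U)                           # heteroscedastic error variance sigma^2(U)
pi = 0.7 + 0.2 * U                            # known selection probability
delta = rng.binomial(1, pi)                   # response observed iff delta = 1
Y = Z @ beta + rng.normal(0.0, np.sqrt(sigma2))

w = delta / (pi * sigma2)                     # the Sigma^{-1} Delta weights
ZtW = Z.T * w                                 # Z' Sigma^{-1} Delta
beta_hat = np.linalg.solve(ZtW @ Z, ZtW @ Y)  # (Z' W Z)^{-1} Z' W Y
```

Reweighting by delta/pi undoes the missingness and the 1/sigma^2 factor downweights noisy observations, which is why the estimator remains consistent under heteroscedasticity with responses missing at random.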
Proof of Theorem 3.
Recall the definition of θ ^ w ( u 0 , β ^ w ) in (21); we have:
$$\begin{aligned}
\hat{\theta}_w(u_0, \hat{\beta}_w) ={}& (I_p, 0_p)\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}(Y - Z\hat{\beta}_w) \\
={}& (I_p, 0_p)\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} M_\zeta \\
&+ (I_p, 0_p)\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta} Z(\beta - \hat{\beta}_w) \\
&+ (I_p, 0_p)\left\{ \hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}\hat{X}_{h_2}(u_0) \right\}^{-1}\hat{X}_{h_2}(u_0)' w_{h_2}(u_0)\hat{\Sigma}^{-1}\hat{\Delta}(\varepsilon + e) \\
=:{}& K_1 + K_2 + K_3,
\end{aligned}$$
where:
$$M_\zeta = \left[ (\Psi'\zeta_1)'\theta(U_1), (\Psi'\zeta_2)'\theta(U_2), \ldots, (\Psi'\zeta_n)'\theta(U_n) \right]', \qquad e = \left( e_1'\theta(U_1), e_2'\theta(U_2), \ldots, e_n'\theta(U_n) \right)'.$$
Let us consider $K_1$ first; for any point $U_i$ in the neighborhood of $u_0$, each functional coefficient $\theta(U_i)$ can be approximated by:
$$\theta(U_i) = \theta(u_0) + h_2\,\theta'(u_0)\frac{U_i - u_0}{h_2} + \frac{h_2^2}{2}\,\theta''(u_0)\left( \frac{U_i - u_0}{h_2} \right)^2 + o_p(h_2^2).$$
Then, we have:
$$M_\zeta = \begin{pmatrix} (\Psi'\zeta_1)'\theta(U_1) \\ \vdots \\ (\Psi'\zeta_n)'\theta(U_n) \end{pmatrix} = \hat{X}_{h_2}(u_0)\begin{pmatrix} \theta(u_0) \\ h_2\,\theta'(u_0) \end{pmatrix} + \frac{h_2^2}{2}\begin{pmatrix} \hat{X}_1'\left( \frac{U_1 - u_0}{h_2} \right)^2 \\ \vdots \\ \hat{X}_n'\left( \frac{U_n - u_0}{h_2} \right)^2 \end{pmatrix}\theta''(u_0) + o_p(h_2^2).$$
By Lemma 1, we obtain:
$$K_1 = \theta(u_0) + \frac{h_2^2}{2}\,\mu_2\,\theta''(u_0) + o_p(h_2^2). \tag{A10}$$
For K 2 , combining (A1), (A2), Theorem 2, and condition (C8), we can obtain that:
$$\sqrt{n h_2}\, K_2 = \sqrt{n h_2}\,\left( \Psi'\Pi(u_0)\Psi \right)^{-1}\Psi'\Xi(u_0)\{1 + O_p(c_{n_2})\}\, O_p(n^{-1/2}) = o_p(1). \tag{A11}$$
Now, we consider $K_3$. Combining (A1), Theorem 1, and Lemma A4, we can derive:
$$\sqrt{n h_2}\, K_3 = \sigma^{2}(u_0) f^{-1}(u_0)\left( \Psi'\Pi(u_0)\Psi \right)^{-1} \times \sqrt{n h_2}\,\frac{1}{n}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\hat{X}_i K_{h_2}(U_i - u_0)\left( \varepsilon_i + e_i'\theta(U_i) \right) + o_p(1).$$
Note that $\sqrt{n h_2}\,\frac{1}{n}\sum_{i=1}^{n}\sigma^{-2}(U_i)\frac{\delta_i}{\pi(V_i, w)}\hat{X}_i K_{h_2}(U_i - u_0)(\varepsilon_i + e_i'\theta(U_i))$ follows an asymptotic normal distribution with mean zero and variance
$$\nu_0 f(u_0)\sigma^{-4}(u_0)\, E\!\left\{ \pi(V_1, w)^{-1}\left( e_1'\theta(U_1) + \varepsilon_1 \right)^2 \right\}\Psi'\Pi(u_0)\Psi.$$
Then, by the Slutsky Theorem, we have:
$$\sqrt{n h_2}\, K_3 \xrightarrow{D} N\!\left( 0,\ \nu_0 f^{-1}(u_0)\, E\!\left\{ \pi(V_1, w)^{-1}\left( e_1'\theta(U_1) + \varepsilon_1 \right)^2 \right\}\left( \Psi'\Pi(u_0)\Psi \right)^{-1} \right). \tag{A12}$$
Combining (A10)–(A12) together with the Slutsky Theorem yields the result in Theorem 3. □
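The functional coefficient estimator analyzed in Theorem 3 rests on the local linear Taylor approximation used above. As a self-contained illustration, the sketch below recovers an assumed functional coefficient theta(u) = sin(2*pi*u) by kernel-weighted local linear least squares; the design, noise level, and bandwidth are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, h = 5000, 0.05
U = rng.uniform(0.0, 1.0, n)
X = rng.normal(size=n)
theta = lambda u: np.sin(2.0 * np.pi * u)     # assumed functional coefficient
Y = theta(U) * X + 0.3 * rng.normal(size=n)

def local_linear_theta(u0):
    """Kernel-weighted local linear fit: a estimates theta(u0), b estimates h * theta'(u0)."""
    K = np.exp(-0.5 * ((U - u0) / h) ** 2)    # Gaussian kernel weights
    D = np.column_stack([X, X * (U - u0) / h])
    W = D.T * K
    a, b = np.linalg.solve(W @ D, W @ Y)
    return a

est_low, est_high = local_linear_theta(0.25), local_linear_theta(0.75)
```

The fitted values track theta near its peak (+1 at u = 0.25) and trough (−1 at u = 0.75), up to the usual O(h^2) smoothing bias that appears in (A10).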

References

  1. Zhang, W.Y.; Lee, S.Y.; Song, X.Y. Local polynomial fitting in semi-varying coefficient models. J. Multivar. Anal. 2002, 82, 166–188. [Google Scholar] [CrossRef]
  2. Zhou, X.; You, J.H. Wavelet estimation in varying-coefficient partially linear regression model. Stat. Probab. Lett. 2004, 68, 91–104. [Google Scholar] [CrossRef]
  3. Fan, J.Q.; Huang, T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 2005, 11, 1031–1057. [Google Scholar] [CrossRef]
  4. Zhao, P.X.; Xue, L.G. Variable selection for semiparametric varying coefficient partially linear models. Stat. Probab. Lett. 2009, 79, 2148–2157. [Google Scholar] [CrossRef]
  5. Kai, B.; Li, R.Z.; Zou, H. New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Stat. 2011, 39, 305–332. [Google Scholar] [CrossRef] [PubMed]
  6. Yang, J.; Lu, F.; Yang, H. Quantile regression for robust estimation and variable selection in partially linear varying-coefficient models. Stat. J. Theor. Appl. Stat. 2017, 51, 1–21. [Google Scholar] [CrossRef]
  7. Li, Y.J.; Li, G.R.; Lian, H.; Tong, T.J. Profile forward regression screening for ultra-high dimensional semiparametric varying coefficient partially linear models. J. Multivar. Anal. 2017, 155, 133–150. [Google Scholar] [CrossRef]
  8. Zhao, P.X.; Yang, Y.P. A new orthogonality-based estimation for varying-coefficient partially linear models. J. Korean Stat. Soc. 2019, 48, 29–39. [Google Scholar] [CrossRef]
  9. Shen, S.L.; Cui, J.L.; Mei, C.L.; Wang, C.L. Estimation and inference of semi-varying coefficient models with heteroscedastic errors. J. Multivar. Anal. 2014, 124, 70–93. [Google Scholar] [CrossRef]
  10. Zhao, Y.Y.; Lin, J.G.; Xu, P.R.; Ye, X.G. Orthogonality-projection-based estimation for semi-varying coefficient models with heteroscedastic errors. Comput. Stat. Data Anal. 2015, 89, 204–221. [Google Scholar] [CrossRef]
  11. Zhao, F.R.; Song, W.X.; Shi, J.H. Statistical inference for heteroscedastic semi-varying coefficient EV models. Commun. Stat.-Theory Methods 2018, 48, 2432–2455. [Google Scholar] [CrossRef]
  12. Zhang, W.W.; Li, G.R. Weighted bias-corrected restricted statistical inference for heteroscedastic semiparametric varying-coefficient errors-in-variables model. J. Korean Stat. Soc. 2021, 50, 1098–1128. [Google Scholar] [CrossRef]
  13. Yuan, Y.Z.; Zhou, Y. Adaptive-weighted estimation of semi-varying coefficient models with heteroscedastic errors. J. Stat. Comput. Simul. 2021, 91, 3029–3047. [Google Scholar] [CrossRef]
  14. Greenland, S. An introduction to instrumental variables for epidemiologists. Int. J. Epidemiol. 2000, 29, 722–729. [Google Scholar] [CrossRef] [PubMed]
  15. Fan, J.Q.; Liao, Y. Endogeneity in high dimensions. Ann. Stat. 2014, 42, 872–917. [Google Scholar] [CrossRef]
  16. Cai, Z.W.; Xiong, H.Y. Partially varying coefficient instrumental variables models. Stat. Neerl. 2012, 66, 85–110. [Google Scholar] [CrossRef]
  17. Zhao, P.X.; Li, G.R. Modified SEE variable selection for varying coefficient instrumental variable models. Stat. Methodol. 2013, 12, 60–70. [Google Scholar] [CrossRef]
  18. Zhao, P.X.; Xue, L.G. Empirical likelihood inferences for semiparametric instrumental variable models. J. Appl. Math. Comput. 2013, 43, 75–90. [Google Scholar] [CrossRef]
  19. Yuan, J.Y.; Zhao, P.X.; Zhang, W.G. Semiparametric variable selection for partially varying coefficient models with endogenous variables. Comput. Stat. 2016, 31, 693–707. [Google Scholar] [CrossRef]
  20. Zhao, P.X.; Zhou, X.S.; Wang, X.L.; Huang, X.S. A new orthogonality empirical likelihood for varying coefficient partially linear instrumental variable models with longitudinal data. Commun. Stat. Simul. Comput. 2020, 49, 3328–3344. [Google Scholar] [CrossRef]
  21. Yao, F. Efficient semiparametric instrumental variable estimation under conditional heteroskedasticity. J. Quant. Econ. 2012, 10, 32–55. [Google Scholar]
  22. Yang, Y.P.; Chen, L.F.; Zhao, P.X. Empirical likelihood inference in partially linear single-index models with endogenous covariates. Commun. Stat.-Theory Methods 2017, 46, 3297–3307. [Google Scholar] [CrossRef]
  23. Huang, J.T.; Zhao, P.X. Orthogonal weighted empirical likelihood-based variable selection for semiparametric instrumental variable models. Commun. Stat.-Theory Methods 2018, 47, 4375–4388. [Google Scholar] [CrossRef]
  24. Tang, X.R.; Zhao, P.X.; Yang, Y.P.; Yang, W.M. Adjusted empirical likelihood inferences for varying coefficient partially non linear models with endogenous covariates. Commun. Stat.-Theory Methods 2022, 51, 953–973. [Google Scholar] [CrossRef]
  25. Rubin, D.B. Inference and missing data. Biometrika 1976, 63, 581–592. [Google Scholar] [CrossRef]
  26. Robins, J.M.; Rotnitzky, A.; Zhao, L.P. Estimation of regression coefficient when some regressors are not always observed. J. Am. Stat. Assoc. 1994, 89, 846–866. [Google Scholar] [CrossRef]
  27. Wang, Q.H.; Rao, J.N.K. Empirical likelihood-based inference in linear models with missing response data. Scand. J. Stat. 2002, 29, 563–576. [Google Scholar] [CrossRef]
  28. Wang, Q.H.; Linton, O.; Härdle, W. Semiparametric regression analysis with missing response at random. J. Am. Stat. Assoc. 2004, 99, 334–345. [Google Scholar] [CrossRef]
  29. Li, Z.Q.; Xue, L.G. The imputation estimators of semiparametric varying-coefficient models with missing data. Acta Math. Appl. Sin. 2009, 32, 422–430. (In Chinese) [Google Scholar]
  30. Chen, P.P.; Feng, S.Y.; Xue, L.G. Statistical inference for semiparametric varying coefficient partially linear model with missing data. Acta Math. Sci. 2015, 35A, 345–358. (In Chinese) [Google Scholar]
  31. Xu, H.X.; Fan, G.L.; Wu, C.X. Statistical inference for varying-coefficient partially linear errors-in-variables models with missing data. Commun. Stat.-Theory Methods 2019, 48, 5621–5636. [Google Scholar] [CrossRef]
  32. Xiao, Y.T.; Li, F.X. Estimation in partially linear varying-coefficient errors-in-variables models with missing response variables. Comput. Stat. 2020, 35, 1637–1658. [Google Scholar] [CrossRef]
  33. Yan, Y.X.; Lan, S.H.; Zhang, C.Y. Statistical inference for partially linear varying coefficient quantile models with missing responses. Symmetry 2022, 14, 2258. [Google Scholar] [CrossRef]
  34. Card, D. Using Geographic Variation in College Proximity to Estimate the Return to Schooling; Nber Working Papers; University of Toronto Press: Toronto, ON, Canada, 1993; pp. 1127–1160. [Google Scholar]
  35. Mack, Y.P.; Silverman, B.W. Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1982, 61, 405–415. [Google Scholar] [CrossRef]
  36. Shi, J.; Lau, T.S. Empirical likelihood for partially linear models. J. Multivar. Anal. 2000, 72, 132–148. [Google Scholar] [CrossRef]
Figure 1. Plot of the variance function estimates through the use of the adjusted Nadaraya–Watson kernel estimation method (denoted using dashed lines) and the naive Nadaraya–Watson kernel estimation method (denoted using dot-dashed lines); the solid lines represent the true curves.
Figure 2. Plot of the functional coefficient estimates through the use of the IAWPLS method (denoted using dot-dashed lines), the NAWPLS method (denoted using dashed lines), and the IWPLS method (denoted using dotted lines); the solid lines represent the true curves.
Figure 3. Plot of the model residuals for NLSYM data.
Figure 4. Plot of the variance function estimate for NLSYM data.
Figure 5. Plot of the functional coefficient estimates through the use of the IAWPLS method (solid lines), the NAWPLS method (dashed lines), and the IWPLS method (dotted lines) for NLSYM data.
Table 1. Sample means and MSEs for β 1 based on the IAWPLS, NAWPLS, and IWPLS methods.
π    c   k    n     IAWPLS Mean  IAWPLS MSE   NAWPLS Mean  NAWPLS MSE   IWPLS Mean  IWPLS MSE
π1   2   0.2  50    1.5107       0.0355       1.4335       0.0329       1.5153      0.0607
π1   2   0.2  100   1.5070       0.0128       1.4379       0.0141       1.5124      0.0261
π1   2   0.2  200   1.5062       0.0056       1.4366       0.0089       1.5081      0.0121
π1   2   0.2  300   1.5019       0.0036       1.4424       0.0064       1.5023      0.0094
π1   2   0.4  50    1.5182       0.0429       1.3808       0.0377       1.5289      0.0778
π1   2   0.4  100   1.5129       0.0164       1.3898       0.0232       1.5192      0.0383
π1   2   0.4  200   1.5091       0.0074       1.3920       0.0165       1.5115      0.0203
π1   2   0.4  300   1.5030       0.0046       1.3954       0.0140       1.5041      0.0113
π1   4   0.2  50    1.5231       0.1056       1.2857       0.2360       1.5226      0.2382
π1   4   0.2  100   1.5058       0.0367       1.3025       0.0716       1.5214      0.1063
π1   4   0.2  200   1.4963       0.0143       1.3116       0.0510       1.5146      0.0481
π1   4   0.2  300   1.5019       0.0094       1.3172       0.0430       1.5062      0.0323
π1   4   0.4  50    1.5274       0.1598       1.2252       0.1324       1.5264      0.3682
π1   4   0.4  100   1.5110       0.0449       1.2361       0.0917       1.5255      0.1377
π1   4   0.4  200   1.5088       0.0182       1.2468       0.0749       1.5227      0.0813
π1   4   0.4  300   1.5053       0.0105       1.2517       0.0686       1.5204      0.0510
π2   2   0.2  50    1.5187       0.0352       1.4274       0.0364       1.5245      0.0630
π2   2   0.2  100   1.5050       0.0169       1.4289       0.0186       1.5112      0.0346
π2   2   0.2  200   1.4974       0.0072       1.4316       0.0109       1.5009      0.0151
π2   2   0.2  300   1.5027       0.0049       1.4386       0.0077       1.5028      0.0112
π2   2   0.4  50    1.4929       0.0535       1.3581       0.0502       1.4994      0.1011
π2   2   0.4  100   1.5083       0.0202       1.3785       0.0264       1.5187      0.0438
π2   2   0.4  200   1.5044       0.0079       1.3871       0.0184       1.5140      0.0221
π2   2   0.4  300   1.5032       0.0058       1.3882       0.0163       1.5093      0.0153
π2   4   0.2  50    1.5438       0.1646       1.2972       0.1531       1.5601      0.3148
π2   4   0.2  100   1.5180       0.0417       1.3103       0.0723       1.5446      0.1141
π2   4   0.2  200   1.5159       0.0192       1.3054       0.0538       1.5262      0.0625
π2   4   0.2  300   1.5062       0.0124       1.3210       0.0442       1.5165      0.0491
π2   4   0.4  50    1.5455       0.2016       1.2323       0.1424       1.5895      0.4071
π2   4   0.4  100   1.5397       0.0555       1.2566       0.0871       1.5802      0.1955
π2   4   0.4  200   1.5226       0.0271       1.2469       0.0784       1.5559      0.1111
π2   4   0.4  300   1.5223       0.0181       1.2482       0.0725       1.5511      0.1043
Table 2. Sample means and MSEs for β 2 based on the IAWPLS, NAWPLS, and IWPLS methods.
π    c   k    n     IAWPLS Mean  IAWPLS MSE   NAWPLS Mean  NAWPLS MSE   IWPLS Mean  IWPLS MSE
π1   2   0.2  50    1.9900       0.0350       1.9224       0.0344       1.9810      0.0573
π1   2   0.2  100   1.9911       0.0134       1.9292       0.0157       1.9882      0.0288
π1   2   0.2  200   1.9962       0.0058       1.9374       0.0092       1.9927      0.0121
π1   2   0.2  300   2.0009       0.0039       1.9423       0.0067       2.0015      0.0093
π1   2   0.4  50    1.9875       0.0475       1.8815       0.0384       1.9836      0.0820
π1   2   0.4  100   1.9898       0.0171       1.8880       0.0234       1.9885      0.0384
π1   2   0.4  200   1.9960       0.0070       1.8877       0.0171       1.9971      0.0200
π1   2   0.4  300   2.0014       0.0045       1.8896       0.0152       2.0079      0.0117
π1   4   0.2  50    2.0024       0.1140       1.7800       0.1347       2.0073      0.2454
π1   4   0.2  100   2.0017       0.0354       1.8183       0.0623       1.9927      0.1010
π1   4   0.2  200   2.0148       0.0159       1.8275       0.0447       2.0154      0.0525
π1   4   0.2  300   2.0065       0.0089       1.8270       0.0392       2.0043      0.0336
π1   4   0.4  50    1.9688       0.1469       1.7602       0.1441       1.9604      0.3306
π1   4   0.4  100   1.9893       0.0488       1.7568       0.0829       1.9933      0.1606
π1   4   0.4  200   2.0001       0.0195       1.7594       0.0702       2.0082      0.0887
π1   4   0.4  300   1.9996       0.0118       1.7596       0.0645       1.9995      0.0554
π2   2   0.2  50    1.9764       0.0349       1.9124       0.0362       1.9805      0.0602
π2   2   0.2  100   1.9994       0.0181       1.9349       0.0189       1.9985      0.0339
π2   2   0.2  200   2.0021       0.0073       1.9414       0.0098       1.9999      0.0155
π2   2   0.2  300   1.9979       0.0047       1.9396       0.0075       2.0023      0.0105
π2   2   0.4  50    2.0004       0.0520       1.8870       0.0409       2.0089      0.0970
π2   2   0.4  100   1.9956       0.0177       1.8896       0.0242       2.0035      0.0406
π2   2   0.4  200   1.9970       0.0087       1.8872       0.0183       2.0002      0.0230
π2   2   0.4  300   2.0024       0.0056       1.8919       0.0157       2.0018      0.0164
π2   4   0.2  50    1.9750       0.1558       1.7766       0.1635       1.9782      0.3179
π2   4   0.2  100   1.9977       0.0405       1.8048       0.0739       1.9958      0.1295
π2   4   0.2  200   1.9880       0.0178       1.8085       0.0529       1.9887      0.0647
π2   4   0.2  300   1.9898       0.0120       1.8103       0.0472       1.9911      0.0497
π2   4   0.4  50    1.9523       0.2195       1.7392       0.1803       1.9487      0.4049
π2   4   0.4  100   1.9748       0.0688       1.7389       0.0978       1.9570      0.2404
π2   4   0.4  200   1.9781       0.0241       1.7510       0.0763       1.9552      0.1095
π2   4   0.4  300   1.9933       0.0171       1.7532       0.0695       1.9933      0.0171
Table 3. Sample RMSEs for θ ( · ) based on the IAWPLS, NAWPLS, and IWPLS methods.
π    c   k    n     IAWPLS   NAWPLS   IWPLS
π1   2   0.2  50    0.4113   0.5122   0.4348
π1   2   0.2  100   0.2703   0.4154   0.2845
π1   2   0.2  200   0.1974   0.3665   0.2043
π1   2   0.2  300   0.1666   0.3657   0.1710
π1   2   0.4  50    0.4413   0.6413   0.4713
π1   2   0.4  100   0.2843   0.5921   0.3020
π1   2   0.4  200   0.2121   0.5615   0.2205
π1   2   0.4  300   0.1775   0.5600   0.1819
π1   4   0.2  50    0.7796   1.1731   0.8454
π1   4   0.2  100   0.4833   1.0768   0.5332
π1   4   0.2  200   0.3495   1.0242   0.3827
π1   4   0.2  300   0.2854   1.0083   0.3107
π1   4   0.4  50    0.8237   1.3033   0.9124
π1   4   0.4  100   0.5442   1.2269   0.6099
π1   4   0.4  200   0.3816   1.2144   0.4253
π1   4   0.4  300   0.3208   1.2050   0.3562
π2   2   0.2  50    0.5148   0.5856   0.5424
π2   2   0.2  100   0.3174   0.4392   0.3376
π2   2   0.2  200   0.2380   0.3876   0.2473
π2   2   0.2  300   0.2005   0.3746   0.2066
π2   2   0.4  50    0.5494   0.7295   0.5908
π2   2   0.4  100   0.3509   0.6042   0.3740
π2   2   0.4  200   0.2532   0.5756   0.2674
π2   2   0.4  300   0.2176   0.5659   0.2296
π2   4   0.2  50    0.9109   1.2606   0.9901
π2   4   0.2  100   0.6003   1.1050   0.6560
π2   4   0.2  200   0.4252   1.0332   0.4671
π2   4   0.2  300   0.3603   1.0185   0.3962
π2   4   0.4  50    1.0109   1.3693   1.1332
π2   4   0.4  100   0.6934   1.2575   0.7847
π2   4   0.4  200   0.5296   1.2201   0.5875
π2   4   0.4  300   0.4517   1.2134   0.5030
Table 4. The estimates for the parameters’ vector based on the IAWPLS, IWPLS, and NAWPLS methods for NLSYM data.
Method    β1       β2        β3       β4
IAWPLS    0.0806   −0.6135   1.5279   0.4864
IWPLS     0.0841   −0.5959   1.6003   0.5702
NAWPLS    0.3069   −0.1681   0.1772   −0.1093
