Estimation and Inference for Spatio-Temporal Single-Index Models

Wang, Hongxia; Zhao, Zihan; Hao, Hongxia; Huang, Chao

doi:10.3390/math11204289

Open AccessArticle

Estimation and Inference for Spatio-Temporal Single-Index Models

by

Hongxia Wang

¹,

Zihan Zhao

¹,

Hongxia Hao

¹ and

Chao Huang

^2,*

¹

School of Statistics and Data Science, Nanjing Audit University, Nanjing 211815, China

²

Department of Statistics, Florida State University, Tallahassee, FL 32306, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(20), 4289; https://doi.org/10.3390/math11204289

Submission received: 17 August 2023 / Revised: 18 September 2023 / Accepted: 12 October 2023 / Published: 14 October 2023

(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

Download

Browse Figure

Versions Notes

Abstract

:

To better fit the actual data, this paper will consider both spatio-temporal correlation and heterogeneity to build the model. In order to overcome the “curse of dimensionality” problem in the nonparametric method, we improve the estimation method of the single-index model and combine it with the correlation and heterogeneity of the spatio-temporal model to obtain a good estimation method. In this paper, assuming that the spatio-temporal process obeys the

α

mixing condition, a nonparametric procedure is developed for estimating the variance function based on a fully nonparametric function or dimensional reduction structure, and the resulting estimator is consistent. Then, a reweighting estimation of the parametric component can be obtained via taking the estimated variance function into account. The rate of convergence and the asymptotic normality of the new estimators are established under mild conditions. Simulation studies are conducted to evaluate the efficacy of the proposed methodologies, and a case study about the estimation of the air quality evaluation index in Nanjing is provided for illustration.

Keywords:

spatio-temporal correlation; spatio-temporal heterogeneity; reweighting estimation; local linear method; single-index models

MSC:

62H11

1. Introduction

Recently, statistical analysis of spatio-temporal data has been widely used in environmental science, economics and social science, and other fields, and it plays an important role in both academic and industrial circles. Spatio-temporal data estimation methods have also been developed. Spatio-temporal correlation and spatio-temporal heterogeneity are two significant features of spatio-temporal data.

In the process of solving practical problems, spatio-temporal data models usually consider these two properties. Various spatio-temporal modeling methods have been applied to explore the effect of spatio-temporal correlation and heterogeneity. For example, ref. [1] proposed multiscale geographically weighted regression (MGWR). The authors of [2] extended geographically and temporally weighted regression (GTWR) ([3]) to multiscale geographically and temporally weighted regression (MGTWR) in consideration of scale effects. The authors of [4] proposed the hetero-convolutional long short-term memory model based on the convolutional long short-term memory (ConvLSTM) neural network model. The authors of [5] adopted a local linear method to model spatio-temporal heterogeneity. A new nonparametric spatio-temporal inversion model was proposed by [6] for the gas emission problem considering both spatio-temporal heterogeneity and atmospheric inversion. Ref. [7] proposed to estimate the variance–covariance structure of the residuals by variography and removed the correlation by spatial filtering residuals.

In addition, the spatio-temporal trend is more complex, and spatio-temporal data can be regarded as a time series dataset with spatial information, which has the characteristics of being dynamic, massive, and high-dimensional. In order to accurately predict the change in the spatio-temporal trend function, this paper combines the idea of a semi-parametric model, adds the influence behavior of multidimensional independent variables on dependent variables, and considers the spatio-temporal error model. The single-index model is one of the most popular semi-parametric models. The spatio-temporal single-index model that we are interested in here is:

Y_{i, t} = g (β_{0}^{T} X_{i, t}) + ε_{i, t} i \in I_{n} = \prod_{k = 1}^{N} {1, \dots, n_{k}}, t \in T_{n} = {1, \dots, n_{N + 1}},

(1)

where

Y_{i, t} \in R

,

X_{i, t} \in R^{d}

,

i = (i_{1} \dots i_{N})

represents the spatial location, t represents the time,

n = (n_{1}, \dots, n_{N + 1})

,

g (\cdot)

is an unknown link function,

β_{0}

is an unknown coefficient with

β_{0}^{T} β_{0} = 1

and the first component of

β_{0}

is positive [8], and

ε

is a random error which has

E (ε | X) = 0

almost surely.

The single-index model is one of the most popular semi-parametric models in applied statistics. Many authors have studied the estimation of the index coefficient

β_{0}

, focusing on issues of

\sqrt{n}

estimability and efficiency. The methods include the average derivative method [9,10], the local linear method [11,12], the least squares method [13,14], the functional additive regression [15], and the empirical likelihood method [16,17]. Most regressions in single-index models assume that the random variables are independently and identically distributed. However, spatio-temporal correlation and heterogeneity are often found in the model error terms. For spatio-temporal correlation, we portray it through an

α

-mixing condition that is consistent with the spatio-temporal characterization. There are usually two types of assumptions for heterogeneity: the first assumption is that the variance function is purely nonparametric, and the second requires that the variance function has a dimension reduction structure like the mean function. This is often the case with models with a dimensionality reduction structure, such as generalized linear models. The second case holds in more general semiparametric settings for the central subspace and the central mean subspace, which have the same dimensions; see [18,19,20].

This work mainly deals with the estimation problem of spatio-temporal heterogeneity. By combining the idea of the local linear method and Nadaraya–Watson smoothing techniques, we propose a method for estimating the variance function in the single index model with heteroscedastic errors, and the resulting estimator based on a fully nonparametric variance function or a dimensionality reduction structure is proved to be consistent. For the parametric part, we use an efficient estimator of the parametric component by applying the iterative generalized least squares method in heteroscedastic generalized linear models and taking into account the estimated heteroscedasticity. We call this model fitting method reweighting estimation, and it is shown that the resulting constant coefficient estimators have smaller asymptotic variances than the rMAVE estimators, which neglect error heteroscedasticity but retain the same biases.

Throughout the rest of the paper, the symbols ‘

\overset{P}{⟶}

’ and ‘

\overset{D}{⟶}

’ denote convergence in probability and convergence in distribution, respectively. The symbol

A^{+}

indicates the Moore–Penrose inverse of the symmetric matrix A. The notation

∥ \cdot ∥

denotes the Euclidean distance.

This article is structured as follows. In Section 2, the estimation process of the rMAVE method is briefly described. In Section 3, two estimators for variance are proposed, namely, a nonparametric estimator of kernel smoothing type and a nonparametric estimator integrated with the dimensionality reduction structure. In Section 4, a reweighting estimation method and its asymptotic properties are given. In Section 5, the effectiveness of the proposed method is verified through simulations. The analyses for Nanjing air quality data are found in Section 6. Conclusions are presented in Section 7. The assumptions required for the theorem and the proof are in Appendix A and Appendix B.

2. A Brief Description of the rMAVE

Consider the regression model (1). Let

(X, Y)

be the random variable in model (1). The parameter

β_{0}

is obtained by minimizing the expectation of the joint distribution

(X, Y)

; see [21],

\min_{β} [E {Y - E (Y | β^{T} X)}^{2}] .

(2)

For any

β

, the conditional variance given

β^{T} X

is:

σ_{β}^{2} (X) = E [E {Y - E (Y | β^{T} X)}^{2} | β^{T} X] .

(3)

It follows that:

E {Y - E (Y | β^{T} X)}^{2} = E {σ_{β}^{2} (X)} .

Therefore, minimizing Expression (2) is equivalent to minimizing, with respect to

β

,

E {σ_{β}^{2} (X)} subject to β^{T} β = 1 .

(4)

Suppose that

{(X_{i, t}, Y_{i, t})}

is a random sample from Formula (2). Let

g (v) = E (Y | β^{T} X = v)

. For any given

X_{j, τ}

, a local linear expansion of

E (Y_{i, t} | β^{T} X_{i, t})

at

X_{j, τ}

is:

E (Y_{i, t} | β^{T} X_{i, t}) \approx a + b β^{T} X_{i j},

(5)

where

a = g (β^{T} X_{j, τ})

,

b = g^{'} (β^{T} X_{j, τ})

, and

X_{i, t, j, τ} = X_{i, t} - X_{j, τ}

. Following the idea of local linear smoothing estimation, we can estimate

σ_{β}^{2} (X_{j, τ})

by exploiting the approximation:

\sum_{i \in I_{n}} \sum_{t \in T_{n}} {\{Y_{i, t} - E (Y_{i, t} | β^{T} X_{i, t})\}}^{2} w_{(i, t), (j, τ)} \approx \sum_{i \in I_{n}} \sum_{t \in T_{n}} {[Y - {a + b β^{T} X_{(i, t), (j, τ)}}]}^{2} w_{(i, t), (j, τ)},

(6)

where

w_{(i, t), (j, τ)} = K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) / \sum_{l \in I_{n}} \sum_{r \in T_{n}} K_{h_{n}} (β^{T} X_{(l, r), (j, τ)})

,

K_{h_{n}} (\cdot) =

{h_{n}}^{- 1} K (\cdot / h_{n})

,

h_{n}

is a bandwidth, and

K (\cdot)

is an univariate symmetric density function. Therefore, the estimator of

σ_{β}^{2} (\cdot)

at

β^{T} X_{j, τ}

is just the minimum value of expression (6), namely:

{\hat{σ}}_{β}^{2} (X_{j, τ}) = \min_{a, b} \sum_{i \in I_{n}} \sum_{t \in T_{n}} {[Y_{i, t} - {a + b β^{T} X_{(i, t), (j, τ)}}]}^{2} w_{(i, t), (j, τ)} .

(7)

Under some mild conditions, we have

{\hat{σ}}_{β}^{2} (X_{j, τ}) - σ_{β}^{2} (X_{j, τ}) = o_{P} (1)

. On the basis of Expressions (2), (4), and (7), we can estimate

β

by solving the minimization problem:

\min_{β : β^{T} β = 1} \{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\hat{σ}}_{β}^{2} (X_{j, τ})\} = \min_{\underset{a_{j, τ}, b_{j, τ}, j \in I_{n}}{β : β^{T} β = 1}} (\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {[Y_{i, t} - {a_{j, τ} + b_{j, τ} β^{T} X_{(i, t), (j, τ)}}]}^{2} w_{(i, t), (j, τ)}) .

(8)

Let

G (\cdot) = {(g (\cdot), g^{'} (\cdot))}^{T}

. The estimation algorithm for

β

and

G (\cdot)

can be described as follows.

Step 0.: Give the calculation of the initial value $β$ of $β_{0}$ .
Step 1.: Calculate:

${\hat{f}}_{β} (β^{T} X_{j, τ}) = n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)})$

and:

$\begin{matrix} (\begin{matrix} a_{j, τ}^{β} \\ b_{j, τ}^{β} h_{n} \end{matrix}) & = & {\{\sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}) (\begin{matrix} 1 \\ β^{T} X_{(i, τ), (j, τ)} / h_{n} \end{matrix}) {(\begin{matrix} 1 \\ β^{T} X_{(i, τ), (j, τ)} / h_{n} \end{matrix})}^{T}\}}^{- 1} \\ \times \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}) (\begin{matrix} 1 \\ β^{T} X_{(i, τ), (j, τ)} / h_{n} \end{matrix}) Y_{i, t} . \end{matrix}$
Step 2.: Calculate:

$\begin{matrix} β & = & {\{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}) {\hat{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} X_{(i, τ), (j, τ)} X_{(i, τ), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j, τ})\}}^{- 1} \\ \times \sum_{j \in I_{n}} \sum_{τ \in T_{n}} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}) {\hat{ρ}}_{j, τ}^{β} b_{j, τ}^{β} X_{(i, τ), (j, τ)} (Y_{i, t} - {\hat{a}}_{j, τ}^{β}) / {\hat{f}}_{β} (β^{T} X_{j, τ}), \end{matrix}$

where ${\hat{ρ}}_{j, τ}^{β} = ρ_{n} (n^{- 1} {\sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}))$ , and $ρ_{n} (\cdot)$ is a trimming function.
Step 3.: Repeat steps 1–3 with $β : = β / ∥ β ∥$ , where $∥ \cdot ∥$ denotes the Euclidean distance, until convergence. The vector obtained in the last iteration is defined as the rMAVE estimator of $β$ , denoted by ${\hat{β}}_{n}$ .
Step 4.: Put ${\hat{β}}_{n}$ into step 1 and obtain the estimators of $g (\cdot)$ and $g^{'} (\cdot)$ , denoted by ${\hat{G}}_{n} (\cdot) = {({\hat{g}}_{n} (\cdot), {\hat{g}}_{n}^{'} (\cdot))}^{T}$ .

Combining

{\hat{β}}_{n}

,

{\hat{g}}_{n} (\cdot)

and model (1) leads to the residuals

{\hat{ε}}_{i, t} = Y_{i, t} - {\hat{g}}_{n} ({\hat{β}}_{n}^{T} X_{i, t}), i \in I_{n}, t \in T_{n} .

(9)

Remark 1.

The calculation of the initial value

{\hat{β}}_{(1)}

can refer to [10,13,22]. To deal with the large deviation of the boundary points, a suitable trimming function is introduced; see [11], which is of the form:

ρ_{n} (v) = \{\begin{matrix} 1, & v \geq 2 c_{0} n^{- ϵ}, \\ \frac{\exp {{(2 c_{0} n^{- ϵ} - v)}^{- 1}}}{\exp {{(2 c_{0} n^{- ϵ} - v)}^{- 1}} + \exp {{(v - c_{0} n^{- ϵ})}^{- 1}}}, & 2 c_{0} n^{- ϵ} > v > c_{0} n^{- ϵ}, \\ 0, & v \leq c_{0} n^{- ϵ}, \end{matrix}

where

0 < ϵ < \frac{1}{20}

and

0 < c_{0} < \frac{1}{20}

.

3. Estimation of the Variance Function

In this section, we focus on estimating the variance function in single-index heteroscedastic models. The estimation methods and the convergence properties of their estimations are given respectively.

3.1. Estimation of the Variance Function with Fully Nonparametric Function

For any given point

X_{s_{0}, t_{0}}

and the estimation

{\hat{β}}_{n}

, an estimation of

E (ε^{2} | X)

at

X_{s_{0}, t_{0}}

is:

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{s_{0}, t_{0}}) = \frac{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\hat{ε}}_{j, τ}^{2} L_{l_{n}} (X_{(j, τ) (s_{0}, t_{0})})}{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} L_{l_{n}} (X_{(j, τ) (s_{0}, t_{0})})},

(10)

where

{\hat{ε}}_{j, τ}

are the residuals calculated by (9),

L_{l_{n}} (\cdot) = {l_{n}}^{- 1} L (\cdot / l_{n})

,

L (\cdot)

is a d-dimensional symmetric density function such that

\int L (u) d u = 1

and

\int u L (u) d u = 0

, and

l_{n}

is a bandwidth. Taking

X_{s_{0}, t_{0}} = X_{i, t}

((

i \in I_{n}, t \in T_{n}

)), the estimations at all the designed points

X_{i, t}

can be obtained, i.e.,

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{i, t})

(

i \in I_{n}, t \in T_{n}

).

The following theorems are about the asymptotic result for the estimator

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{i, t})

.

Theorem 1.

According to the assumptions (C1)–(C4), we have:

\sup_{X_{s_{0}, t_{0}} \in R^{d}} |{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{s_{0}, t_{0}}) - σ^{2} (X_{s_{0}, t_{0}})| = o_{P} \{l_{n}^{2 d} + {(\frac{\log (1 / l_{n}^{d})}{n l_{n}^{d}})}^{\frac{1}{2}}\} .

Theorem 1 shows that the estimator

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{s_{0}, t_{0}})

is consistent, and the proof is found in the Appendix B.

3.2. Estimation of the Variance Function with Dimension Reduction Structure

If the dimension of the covariates is large, the core estimates given in the previous section suffer from the curse of dimensionality and, furthermore, the proposed test statistic may not be very informative. If the model contains a dimensionality reduction structure, we should use it to make the test more powerful. Let us assume the following model structure:

Y ⊢ X | β^{T} X .

(11)

Formula (11) notes that, given

β^{T} X

, Y and X are conditionally independent. According to [18,19,20], this is a general dimension reduction structure that includes the model

Y = G (β^{T} X, ε)

as its special case.

If the above structure holds for model (1), the mean and variance have the same index

β

. For any given point

X_{s_{0}, t_{0}}

and the estimated

{\hat{β}}_{n}

, the variance function

E (ε^{2} | β^{T} X)

at

X_{s_{0}, t_{0}}

is:

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{s_{0}, t_{0}}) = {\hat{σ}}^{2} ({\hat{β}}_{n}^{T} X_{s_{0}, t_{0}}) = \frac{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\hat{ε}}_{j}^{2} Q_{b_{n}} ({\hat{β}}_{n}^{T} X_{(j, τ) (s_{0}, t_{0})})}{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} Q_{b_{n}} ({\hat{β}}_{n}^{T} X_{(j, τ) (s_{0}, t_{0})})},

(12)

where

{\hat{σ}}^{2} ({\hat{β}}_{n}^{T} X_{s_{0}, t_{0}})

can also be denoted by

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{s_{0}, t_{0}})

,

{\hat{ε}}_{j, τ}

(j \in I_{n}, τ \in T_{n})

are the residuals of the rMAVE,

Q_{b_{n}} (\cdot) = b_{n}^{- 1} Q (\cdot / b_{n})

,

Q (\cdot)

is a univariate kernel function, and

b_{n}

is a bandwidth. Particularly, taking

X_{s_{0}, t_{0}} = X_{i, t}

, (

i \in I_{n}, t \in T_{n}

), we can obtain the estimations of the variance function at

X_{i, t}

, i.e.,

{\hat{σ}}^{2} ({\hat{β}}_{n}^{T} X_{i, t})

.

For the estimate

{\hat{σ}}^{2} ({\hat{β}}_{n}^{T} X_{s_{0}, t_{0}})

of the variance function, the consistency can be proven.

Theorem 2.

According to the assumptions (C1)–(C3) and (C5), we have:

\sup_{X_{s_{0}, t_{0}} \in R^{d}} |{\hat{σ}}^{2} ({\hat{β}}_{n}^{T} X_{s_{0}, t_{0}}) - σ^{2} (β_{0}^{T} X_{s_{0}, t_{0}})| = o_{P} \{b_{n}^{2} + {(\frac{\log (1 / b_{n})}{n b_{n}})}^{\frac{1}{2}}\} .

The proof of Theorem 2 is given in the Appendix B.

4. Reweighting Estimation and Asymptotic Properties

In this section, we provide the estimation method for the spatio-temporal single-index model and provide the asymptotic properties of the proposed estimator.

4.1. Reweighting Estimation

If the model exhibits heteroscedasticity, the usual approach can be taken by considering weighted estimates. In particular, we describe a reweighting procedure of the single-index model with heteroscedastic errors with the estimated values of the variance function at all designed points. For model (1), weighted versions of the proposed estimation methods following the idea of [23] can be considered. When calculating, we only need to modify the algorithms in Step 2, by replacing

{\hat{ρ}}_{j, τ}^{β} = ρ_{n} (n^{- 1} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)}))

with

{\tilde{ρ}}_{j, τ}^{β} = ρ_{n} (n^{- 1} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} K_{h_{n}} (β^{T} X_{(i, τ), (j, τ)})) / {\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{j, τ}),

where

{\hat{σ}}_{{\hat{β}}_{n}}^{2} (X_{j, τ})

is defined in (10) or (12). In terms of the modified algorithms, the reweighted estimators for the parameter vector

β

and the link function

g (\cdot)

can be obtained, denoted by

{\hat{β}}_{R}

and

{\hat{g}}_{R} (\cdot)

, respectively.

4.2. Asymptotic Properties

In this subsection, under the assumption of the

α

mixing condition, we provide the asymptotic distributions of two type estimators of

β

and show that the reweighted estimator

{\hat{β}}_{R}

has no greater asymptotic variance than the rMAVE estimator

{\hat{β}}_{n}

. Then, the asymptotic distributions of

{\hat{g}}_{n} (\cdot)

and

{\hat{g}}_{R} (\cdot)

are shown.

There are some notations to determine asymptotic results. Let

μ_{β} (x) = E (X | β^{T} X = β^{T} x)

,

ν_{β} (x) = μ_{β} (x) - x

,

w_{β} (x) = E (X X^{T} | β^{T} X = β^{T} x)

, and

W_{0} (x) = ν_{β_{0}} (x) ν_{β_{0}}^{T} (x)

.

Theorem 3.

Assuming

n \to \infty

, according to the assumptions (C1)–(C3), we have:

\sqrt{n} ({\hat{β}}_{n} - β_{0}) \overset{D}{⟶} N (0, W_{g 0}^{+} Δ W_{g 0}^{+}),

where

W_{g 0} = E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X)}

and

Δ = E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X) σ_{β_{0}}^{2} (X)}

.

Theorem 4.

Assuming

n \to \infty

, according to the assumptions (C1)–(C3) and (C4) or (C5), we have:

\sqrt{n} ({\hat{β}}_{R} - β_{0}) \overset{D}{⟶} N (0, {\tilde{W}}_{g 0}^{+}),

where

{\tilde{W}}_{g 0} = E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X) σ_{β_{0}}^{- 2} (X)}

.

Theorem 5.

In addition to the assumptions (C1)–(C3) and (C4) or (C5), if ε is independent of X, then we have,

{\tilde{W}}_{g 0}^{+} \leq W_{g 0}^{+} Δ W_{g 0}^{+} .

Remark 2.

From the result of Theorem 5, a conclusion can be obtained that

{\hat{β}}_{R}

is asymptotically more efficient than

{\hat{β}}_{n}

in terms of asymptotic variance.

For the rMAVE estimator

{\hat{g}}_{n} (\cdot)

and reweighted estimator

{\hat{g}}_{R} (\cdot)

of the link function

g (\cdot)

at

X = X_{s_{0}, t_{0}}

, their asymptotic distributions are also derived. These are the following theorems.

Theorem 6.

According to the assumptions (C1)–(C3), for any

v = β_{0}^{T} x

, as

n \to \infty

,

\sqrt{n h_{n}} \{{\hat{g}}_{n} (v) - g (v) - \frac{h_{n}^{2}}{2} g^{″} (v)\} \overset{D}{⟶} N (0, \frac{σ_{β_{0}}^{2} (x)}{f_{β_{0}} (v)}) .

Theorem 7.

According to the assumptions (C1)–(C3), for any

v = β_{0}^{T} x

, as

n \to \infty

,

\sqrt{n h_{n}} \{{\hat{g}}_{R} (v) - g (v) - \frac{h_{n}^{2}}{2} g^{″} (v)\} \overset{D}{⟶} N (0, \frac{σ_{β_{0}}^{2} (x)}{f_{β_{0}} (v)}) .

Remark 3.

By comparing Theorems 6 and 7, the asymptotic distributions obtained by rMAVE and reweighting methods are the same, which reflects the characteristic of local regression in nonparametric models.

5. Monte Carlo Study

We use the spectral method to simulate the spatio-temporal process:

X_{i, j, t} = {(2 / Ω)}^{1 / 2} \sum_{k = 1}^{Ω} \cos (w (1, k) * i + w (2, k) * j + q (k) * t + r (k)),

where

w (i, k), i = 1, 2, q (k), k = 1, \dots, Ω

are i.i.d. from a standard normal distribution, independent of

r (k), k = 1, \dots, Ω,

i.i.d. uniform random variables on

[- π, π] .

As

n \to \infty

,

X_{i, j, t}

converges to a Gaussian ergodic process (see [24]). Additionally, the

ε_{i, j, t}, i = 1, \dots, n_{1}, j = 1, \dots, n_{2}, t = 1, \dots, n_{3}

are i.i.d. from a standard normal distribution. We conducted 100 simulation studies with

Ω = 1000

and a sample size of

8 \times 8 \times 8 (n_{1} = 8, n_{2} = 8, n_{3} = 8)

,

10 \times 10 \times 10

. Comparisons were made between the rMAVE estimator and two reweighted estimators. For the convenience of expression, we use RWEFN and RWEDR respectively to represent reweighting estimation of a fully nonparametric function and reweighting estimation by the dimensionality reduction structure.

For simplicity, the rule of thumb [25] is used to select the bandwidth for the rMAVE and reweighted methods. To verify the performance of the proposed methods, we design the following two examples. We take the Gaussian kernel functions

K (u) = Q (u) = 1 / (\sqrt{2 π}) e^{- \frac{u^{2}}{2}}

and

L (u_{1}, \dots, u_{p}) = l (u_{1}) \times \dots \times l (u_{p})

with

l (u) = 1 / (\sqrt{2 π}) e^{- \frac{u^{2}}{2}}

. The trimming function with

ϵ = \frac{1}{10}

and

c_{0} = 0.01

is considered. In the example, we also take

β_{0} = {(1, 2, 3, 0)}^{T} / \sqrt{14}

, and the samples

X_{i, j, t} = (X_{i, j, t (1)}, X_{i, j, t (2)}, X_{i, j, t (3)}, X_{i, j, t (4)})

are simulated by the spectral method on the 4-dimensional cube

{[- 1, 1]}^{4}

.

Example 1.

We consider the following model:

Y_{i, t} = 2 + 2 (β_{0}^{T} X_{i, t}) + (θ_{0} | β_{0}^{T} X_{i, t} | + 0.5) ε_{i, t} .

Example 2.

We consider the following example:

Y_{i, t} = 1 + 2 (β_{0}^{T} X_{i, t} + (θ_{0} | X_{i, t} (1) + X_{i, t} (2) | + 1) ε .

The simulated results of Example 1 are reported in Table 1 and Table 2, and the results of Example 2 are shown in Table 3 and Table 4.

We assume the value of

θ_{0} = 1, 1.5, 2

to evaluate the influence of error heteroscedasticity on coefficient estimates. To show the performance of the estimators

{\hat{β}}_{n}

,

{\hat{β}}_{R}^{F N}

(RWEFN), and

{\hat{β}}_{R}^{D R}

(RWEDR), two indices are defined: the sampling standard deviation (SSD) and the relative sampling efficiency (SRE), respectively. In particular, for rMAVE, RWEFN, or RWEDR, let

{\hat{β}}_{j (1)}, \dots, {\hat{β}}_{j (M)}

be estimates of

β_{j}

in M replicates. The SSD for

β_{j}

is defined by:

S S D (β_{j}) = {[\frac{1}{M} \sum_{k = 1}^{M} {({\hat{β}}_{j (k)} - {\bar{β}}_{j})}^{2}]}^{\frac{1}{2}},

and the SRE for

β_{j}

is defined by:

S R E (β_{j}) = \frac{\frac{1}{M} \sum_{k = 1}^{M} {({\hat{β}}_{j (k)} - β_{j})}^{2}}{\frac{1}{M} \sum_{k = 1}^{M} {({\tilde{β}}_{j (k)} - β_{j})}^{2}},

where

{\bar{β}}_{j} = \frac{1}{M} \sum_{k = 1}^{M} {\hat{β}}_{j (k)}

, and

\tilde{β}

has the same form as that of

{\hat{β}}_{R}^{F N}

or

{\hat{β}}_{R}^{D R}

, except that the weighting uses the true variance function. The calculation in different sample size results of

{\hat{β}}_{n}

,

{\hat{β}}_{R}^{F N}

and

{\hat{β}}_{R}^{D R}

, in terms of the sample mean, sample standard deviation, and relative sampling efficiency, are shown in Table 1 and Table 2.

The following conclusions can be obtained from Table 1, Table 2, Table 3 and Table 4. First, in all cases, the values of the sample mean are reasonably close to their true values, suggesting that

{\hat{β}}_{n}

,

{\hat{β}}_{R}^{F N}

, and

{\hat{β}}_{R}^{D R}

are asymptotically unbiased. Second, for

θ_{0} \neq 0

, the values of SSD and SRE for the RWEFN and RWEDR methods are smaller than those for the rMAVE method. Third, from tables, the values of

{\hat{β}}_{R}^{D R}

slightly outperform

{\hat{β}}_{R}^{F N}

when error heteroscedasticity exists. This is also explainable because the example has a dimensionality reduction structure. Fourth,

{\hat{β}}_{R}^{F N}

and

{\hat{β}}_{R}^{D R}

have better performance when the error heteroscedasticity is larger, which implies that the improvement in

{\hat{β}}_{R}^{F N}

and

{\hat{β}}_{R}^{D R}

over

{\hat{β}}_{n}

becomes obvious with the growth of errors. Overall,

{\hat{β}}_{R}^{F N}

and

{\hat{β}}_{R}^{D R}

work much better than

{\hat{β}}_{n}

, and

{\hat{β}}_{R}^{D R}

is the best. Finally, because the composition of Example 2 is more complex, the average value of the obtained estimators is worse than that of Example 1.

6. Real Data Analysis

Based on the reweighting method proposed in this article, the air pollution data of Nanjing were studied, with the data coming from the China Meteorological Data Network and the real air data from the Environment Big Data Center. The data contain the air quality index (AQI) PM

_{2.5}

(

μ

g/m

^{3}

), PM

_{10}

(

μ

g/m

^{3}

), CO (mg/m

^{3}

), SO

_{2}

(

μ

g/m

^{3}

), NO

_{2}

(

μ

g/m

^{3}

), and ozone eight-hour O₃_8h (

μ

g/m

^{3}

) of Nanjing from 23 October 2020 to 23 October 2022, including the AQI in response to a variable Y and the rest of the variables as covariate

X_{1}, X_{2}, \dots, X_{6}

.

When the data were estimated using the rMAVE method, the value of the estimated parameter was

{\hat{β}}_{n} = {(0.6606, - 0.1529, 0.8679, 11.1681, 0.3236, 0.4541)}^{T}

. The results obtained by reweighting the data are as follows: first,

{\hat{β}}_{R}^{D R} = {(0.04167, - 0.0102, 0.0637, 0.9964, 0.0202, 0.02915)}^{T}

was obtained by adopting a dimensional-reduction structure. The other difference is estimated by taking the kernel regression, resulting in

{\hat{β}}_{R}^{F N} = (0.04238, - 0.0108, 0.0675,

{0.9961, 0.0200, 0.0293)}^{T}

. The fitting result is shown in Figure 1, which contains the true values as well as the values fitted by the three methods, where

R_{R m a v e}^{2} = 0.7586

,

R_{D R}^{2} = 0.7606

,

R_{F N}^{2} = 0.7606

. Due to the four lines being too close to distinguish, the right side of Figure 1 is the enlarged part. The estimators obtained by the reweighting method have a better fitting effect.

7. Conclusions

This work considers an estimation problem in single-index models with spatio-temporal correlation and heterogeneity. We propose a parametric component reweighting estimated method based on the variance function of the error. Theoretical results show that the proposed reweighting estimators have smaller asymptotic variances while maintaining the same biases. Numerical simulations show that the estimators revealed that heterogeneity is closer to the true value. A real data analysis was conducted to illustrate the proposed methods.

Author Contributions

Methodology, H.H.; Formal analysis, Z.Z.; Writing—original draft, H.W.; Writing—review & editing, C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Social Science Fund of China under Grant No. 22BTJ021.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Assumptions

In order to obtain asymptotic results, we will assume throughout the paper that at any fixed moment

t^{'},

{ε_{i, t^{'}}, i \in I_{n}}

satisfies the following mixing condition: there exists a function

φ_{1} (t) ↘ 0

as

t \to \infty

,

φ_{1} (0) = 1

, such that whenever

E, E^{'} \subset I_{n}

with finite cardinals

\begin{matrix} α (β (E), β (E^{'})) & = & \sup {P (A B) - P (A) P (B) | A \in β (E), B \in β (E^{'})} \\ \leq & ψ_{1} (c a r d (E), c a r d (E^{'})) φ_{1} (d (E, E^{'})), \end{matrix}

(A1)

where

β (E)

(resp,

β (E^{'})

) denotes the Borel

σ

—field generated by

{ε_{i, t^{'}}}_{i \in E}

(resp,

{ε_{i, t^{'}}}_{i \in E^{'}}

), Card E (resp, Card

E^{'}

) is the cardinality of E(resp,

E^{'}

) and

d (E, E^{'})

is the ordinary Euclidean distance.

ψ_{1} : N^{2} \to R^{+}

is a symmetric positive function that is nondecreasing in each variable.

Similarly, we assume that at any fixed location

i_{0},

{ε_{i_{0}, t}, t \in T_{n}}

satisfies the following mixing condition: there exists a function

φ_{2} (t) ↘ 0

as

t \to \infty

,

φ_{2} (0) = 1

, such that whenever

G, G^{'} \subset T_{n}

with finite cardinals

\begin{matrix} α (β (G), β (G^{'})) & = & \sup {P (A B) - P (A) P (B) | A \in β (G), B \in β (G^{'})} \\ \leq & ψ_{2} (c a r d (G), c a r d (G^{'}) φ_{2} (d (G^{'}, G^{'})), \end{matrix}

(A2)

where

β (G)

(resp,

β (G^{'})

) denote the Borel

σ

—field generated by

{ε_{i_{0}, t}}_{t \in G}

(resp,

{ε_{i_{0}, t}}_{t \in G^{'}}

),

ψ_{2} : N^{2} \to R^{+}

is a symmetric positive function that is nondecreasing in each variable. Furthermore, the random field

{(X_{i, t}, Y_{i, t}), i \in I_{n}, t \in T_{n}}

also satisfies the above assumption.

The following conditions are imposed to obtain the asymptotic properties of the resulting estimators. They are not the weakest, but they are introduced to make the evidence easier.

(C1): The density function $f_{β} (v)$ of $β^{T} X$ and its derivatives up to the third order are bounded on R for all $β$ : $∥ β - β_{0} ∥ < δ$ where $δ > 0$ is a constant, ${E ∥ X ∥}^{6} < \infty$ , and ${E | Y |}^{3} < \infty$ .
(C2): The conditional mean $g_{β} (v) = E (Y | β^{T} X = v)$ and its derivatives up to the third order are bounded for all $β$ : $∥ β - β_{0} ∥ < δ$ where $δ > 0$ .
(C3): $K (\cdot)$ is a symmetric univariate density function with finite moments of all orders and a bounded derivative. Bandwidth $h_{n} \propto n^{- 1 / 5}$ and $lim_{n \to \infty} \log (1 / h_{n}) / n h_{n}^{2} = 0$ .
(C4): $L (\cdot)$ is a symmetric multivariate density function with finite moments and bounded derivatives. Bandwidth $l_{n} \propto n^{- d / 5}$ and $lim_{n \to \infty} \log (1 / l_{n}^{d}) / n l_{n}^{2 d} = 0$ . $\exists s > 2, \tilde{δ} < 1 - s^{- 1}$ , such that $lim_{n \to \infty} n^{2 \tilde{δ} - 1} l_{n} = + \infty$ .
(C5): $Q (\cdot)$ is a symmetric univariate density function with finite moments and bounded derivatives. Bandwidth $b_{n} \propto n^{- 1 / 5}$ and $lim_{n \to \infty} \log (1 / b_{n}) / n b_{n}^{2} = 0$ . $\exists s > 2, \tilde{ρ} < 1 - s^{- 1}$ , such that $lim_{n \to \infty} n^{2 \tilde{ρ} - 1} b_{n} = + \infty$ .

According to (C1), covariate X can have discrete components provided that

β^{T} X

is continuous for

β

in a small neighbor of

β_{0}

; see also [23]. In order to be able to use the optimal bandwidth, there needs to be a moment of order higher than the second order moment for the variable Y. The smoothness requirement of the link function in hypothesis (C2) can be relaxed to the existence of a bounded second-order derivative but requires a more complex proof process and a smaller bandwidth. The hypotheses (C3)–(C5) are common hypotheses in kernel regression; see [13].

Let

ξ_{n} = {(n h_{n} / \log n)}^{- 1 / 4}

,

κ_{n} = h_{n}^{2} + ξ_{n}

, and

δ_{β} = ∥ β - β_{0} ∥

. For any k,

e_{k}

denotes a k-dimension vector with all elements equal to 1, and

E_{k}

denotes a

p \times p

matrix with all elements equal to 1. For any vector

V (v)

of functions of v, we define

{(V (v))}^{'} = d V (v) / d v

. Recall that

K (\cdot)

is a symmetric density function. Thus,

μ_{0} = \int K (v) d v = 1

and

μ_{1} = \int v K (v) d v = 0

. For ease of exposition, we assume that

μ_{2} = \int v^{2} K (v) d v = 1

. Otherwise,

K (v) = μ_{2}^{1 / 2} K (μ_{2}^{1 / 2} v)

. Let

D_{n} = {x : ∥ x ∥ < n^{c}, f_{β} (β^{T} x) > n^{- ϵ}, β \in B_{n}}

, where

B_{n} = {β : ∥ β - β_{0} ∥ < n^{- 1 + 2 c_{0}}}

,

c \geq 1

. In order to prove the result of the theorems, we first present the following lemmas.

Appendix B. Proof

Lemma A1.

Suppose

m_{n} (χ, Z)

,

n = 1, 2, \dots,

are measurable functions of Z with index

χ \in R^{d}

, where d is any integer number, such that

(i)

| m_{n} (χ, Z) | \leq a_{n} M (Z)

with

E (M {(Z)}^{r}) < \infty

for some

r > 2

, and

a_{n}

increases with n such that

a_{n} < c_{0} n^{1 - 2 / r}

;

(i i)

E (m_{n} (χ, Z)) \leq a_{n} m_{0}^{2} (χ)

with

| m_{0} (χ) - m_{0} (χ^{'}) | \leq c ∥ χ - χ^{'} ∥^{α_{1}}

, where

α_{1} > 0

and

c > 0

are two constants (without loss of generality, we assume

m_{0} (χ) \geq 1

); and that

(i i i)

| m_{n} (χ, Z) - m_{n} (χ^{'}, Z) | \leq | χ - χ^{'} |^{α_{1}} n^{α_{2}} G (Z)

with some

α_{2} > 0

, and

E G^{2} (Z)

exists. Suppose

{Z_{i}, i = 1, \dots, n}

is a random sample from Z. Then, for any positive

α_{0}

, we have:

\sup_{| χ | \leq n^{α_{0}}} | {m_{0} (χ)}^{- 1} n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {m_{n} (χ, Z_{i, t}) - E m_{n} (χ, Z_{i, t})} | = O_{P} {{(a_{n} \log n / n)}^{1 / 4}} .

Proof of Lemma A1.

For the “continuity argument” approach, see [13,25]. For simplicity, let

m_{n} (χ, Z)

denote

m_{n} (χ, Z) / m_{0} (χ)

. Let

D_{n} \overset{def}{=} \{| χ | < n^{α_{0}}\}

be bounded and its Borel measure be less than

n^{d + α_{0}}

. There are

n^{α_{4}} (α_{4} > d + α_{0})

balls

B_{n_{k}}

centered at

χ_{n_{k}}, 1 \leq k \leq n^{α_{4}}

with a diameter smaller than

c n^{2 (2 + α_{2}) / α_{1}}

, so that

D_{n} \subset ⋃_{1 \leq k \leq n^{α_{4}}} B_{n_{k}}

. Then,

\begin{matrix} \underset{χ \in D_{n}}{s u p} |\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} \{m_{n} (χ, Z_{i, t}) - E m_{n} (χ, Z_{i, t})\}| \\ \leq \underset{1 \leq k \leq n^{α_{4}}}{m a x} |\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} \{m_{n} (χ_{n_{k}}, Z_{i, t}) - E m_{n} (χ_{n_{k}}, Z_{i, t})\}| \\ + \max_{1 \leq k \leq n^{α_{4}}} \sup_{χ \in B_{n_{k}}} |\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} [\{m_{n} (χ, Z_{i, t}) - m_{n} (χ_{n_{k}}, Z_{i, t})\} - E \{m_{n} (χ, Z_{i, t}) - m_{n} (χ_{n_{k}}, Z_{i, t})\}]| \\ \overset{def}{=} \underset{1 \leq k \leq n^{α_{4}}}{m a x} |R_{n, k, 1}| + \underset{1 \leq k \leq n^{α_{4}}}{m a x} \underset{χ \in B_{n_{k}}}{s u p} |R_{n, 2}| . \end{matrix}

(A3)

By

(i i)

, we have:

\begin{matrix} \underset{1 \leq k \leq n^{α_{+}}}{m a x} \underset{χ \in B_{n_{k}}}{s u p} |m (χ, Z_{i, t}) - m (χ_{n_{k}}, Z_{i, t})| & \leq \underset{1 \leq k \leq n^{α_{+}}}{m a x} \underset{χ \in B_{n_{k}}}{s u p} n^{α_{2}} |χ - χ_{n_{k}}|^{α_{1}} G (Z_{i, t}) \\ \leq c n^{- 2} G (Z_{i, t}) . \end{matrix}

By assumption

(i i i)

,

\underset{1 \leq k \leq n^{α_{+}}}{m a x} \underset{χ \in B_{n_{k}}}{s u p} |R_{n, 2}| \leq c n^{- 2} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} G (Z_{i, t}) = O (n^{- 1}) .

(A4)

Let

L_{n} = n l o g (n) a_{n}^{- \frac{8}{9}}

,

m_{n}^{L} (χ, Z_{i, t}) = m_{n} (χ_{n_{k}}, Z_{i, t}) I \{|M (Z_{i, t})| \geq L_{n}\}

, and

m_{n}^{O} (χ, Z_{i, t}) = m_{n} (χ_{n_{k}}, Z_{i, t}) - m_{n}^{L} (χ, Z_{i, t})

; then, we have:

R_{n, k, 1} = \frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} [m_{n}^{L} (χ_{n_{k}}, Z_{i, t}) - E \{m_{n}^{L} (χ_{n_{k}}, Z_{i, t})\}] + \frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ξ_{n_{k}, i, t},

(A5)

where

ξ_{n_{k}, i, t} = m_{n}^{O} (χ_{n_{k}}, Z_{i, t}) - E \{m_{n}^{O} (χ_{n_{k}}, Z_{i, t})\}

. Because of

P (| M (Z) | > L_{n}) \leq L_{n}^{- r} E {| M (Z) |}^{r}

and

\sum_{n = 1}^{\infty} T_{n}^{- r} < \infty

, for all sufficiently large

n

, with probability 1,

| M (Z_{i, t}) | \leq L_{n}

for

(i, t)

. This implies that the first term on the right-hand side of (A5) is eventually 0. By assumption

(i i)

, we have:

\underset{1 \leq k \leq n^{α_{4}}}{m a x} Var (\sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} n ξ_{n_{k}, i, t}) \leq c_{1} n a_{n} \overset{def}{=} S_{1} .

(A6)

By

(i)

, we can obtain:

\underset{1 \leq k \leq n^{α}}{m a x} |ξ_{n_{k}, i, t}| \leq c_{0} a_{n} T_{n} \leq c {\{n a_{n} \log n\}}^{1 / 9} \overset{def}{=} S_{2} .

(A7)

Let

S_{3} = c {\{n a_{n} \log n\}}^{\frac{3}{4}}

,

N = \frac{S_{3}}{c S_{2}}

. According to Theorem 2.1 of [26],

\begin{matrix} P (|\sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ξ_{n_{k}, i, t}| > S_{3}) & \leq 4 \exp \{- S_{3}^{2} {(64 \frac{n}{N} S_{1} + \frac{8}{3} S_{3} N S_{2})}^{- 1}\} + 4 \frac{n}{N} α_{N} \\ \leq \exp c a_{n} \log n n^{\frac{1}{8}} + 4 \frac{n}{N} α_{N} . \end{matrix}

(A8)

According to the

α

mixing condition,

α_{N} \to 0

when N is large enough. Hence,

\sum_{n = 1}^{\infty} \Pr (\max_{1 \leq k \leq n^{α_{4}}} |\sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ξ_{n_{k}, i, t}| \geq S_{3}) < \infty .

By the Borel-Cantelli lemma, we have:

\max_{1 \leq k \leq n^{α_{4}}} |\sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ξ_{n_{k}, i, t}| = O (S_{3}) .

(A9)

Combining (A4) and (A9), we have:

\max_{1 \leq k \leq n^{α_{4}}} |R_{n, k, 1}| = O (S_{3}) .

(A10)

Therefore, Lemma A1 follows. □

Lemma A2 ([11]).

Assume that

φ (β)

and

ϕ (β)

are two measurable functions of

(X, Y)

such that

\sup_{β, θ} | φ (β) - φ (θ) | < | β - θ | ζ a . s .

,

\sup_{β, θ} | ϕ (β) - ϕ (θ) | < | β - θ | ζ a . s .

, with

E ζ^{r} < c

for some

r > 2

, and

E {ϕ (β) | β^{T} X} = 0

for all

β \in B

. Suppose that

(X_{i, t}, Y_{i, t})

,

φ_{i} (β)

, and

ϕ_{i} (β)

,

i = 1, \dots, n

are random copies of

(X, Y)

,

φ (β)

and

ϕ (β)

, respectively. If (C1), (C3), and (C4) hold, then:

\sup_{β \in B} |\frac{1}{2} \sum_{i = 1}^{n} \sum_{j = 1}^{n} [K_{h_{n}} (β^{T} X_{i j}) φ_{j} (β) - E_{j} {K_{h_{n}} (β^{T} X_{i j}) φ_{j} (β)}] ϕ_{i} (β)| = O (ξ_{n}^{2}) .

Lemma A3.

Suppose

E (Z | β^{T} X = β^{T} x) = m (β^{T} x)

and its derivatives up to the second order are bounded for all

β \in B

, and suppose that

{E ∥ Z ∥}^{r}

exists for some

r > 3

. Let

(X_{i, t}, Z_{i, t})

be a sample from

(X, Z)

and

X_{(i, t) x} = X_{i, t} - x

. If conditions (C1), (C3), and (C4) hold, then:

\begin{matrix} n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t) x}) {(β^{T} X_{(i, t) x} / h_{n})}^{p} Z_{i} & = & f_{β} (β^{T} x) m (β^{T} x) μ_{p} \\ + {f_{β} (β^{T} x) m (β^{T} x)}^{'} μ_{p + 1} h_{n} + O_{P} (κ_{n}), \end{matrix}

where

μ_{p} = \int K (v) v^{p} d v

,

p = 0, 1, \dots

.

Proof of Lemma A3.

Since:

\begin{matrix} E {K_{h_{n}} (β^{T} X_{(i, t) x}) {(β^{T} X_{(i, t) x} / h_{n})}^{p} m (β^{T} X_{i, t})} \\ = & f_{β} (β^{T} x) m (β^{T} x) μ_{p} + {f_{β} (β^{T} x) m (β^{T} x)}^{'} μ_{p + 1} h_{n} + O_{P} (h_{n}^{2}), \end{matrix}

where

μ_{p} = \int v^{p} K (v) d v

, combining it with Lemma A1, we complete the proof of Lemma A3. □

Lemma A4.

Let

Σ_{n}^{β} (x) = n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t) x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t) x} / h_{n} \end{matrix}) {(\begin{matrix} 1 \\ β^{T} X_{(i, t) x} / h_{n} \end{matrix})}^{T},

and

(\begin{matrix} a_{x}^{β} \\ b_{x}^{β} h_{n} \end{matrix}) = {n Σ_{n}^{β} (x)}^{- 1} \sum_{i = 1}^{n} K_{h_{n}} (β^{T} X_{(i, t) x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t) x} / h_{n} \end{matrix}) Y_{i, t} .

Under conditions (C1)–(C4), we have:

\begin{matrix} a_{x}^{β} & = & g (β_{0}^{T} x) + g^{'} (β_{0}^{T} x) v_{β_{0}}^{T} (x) (β_{0} - β) + \frac{1}{2} g^{″} (β_{0}^{T} x) h_{n}^{2} + Θ_{n, 1} (x) \\ + O_{P} {(h_{n} κ_{n} + δ_{β^{2}}) n^{ϵ}} {(1 + ∥ x ∥}^{4}), \end{matrix}

b_{x}^{β} h_{n} = g^{'} (β_{0}^{T} x) h_{n} + Θ_{n, 2} (x) + O_{P} {(h_{n} κ_{n} + δ_{β^{2}}) n^{ϵ} e_{p}} {(1 + ∥ x ∥}^{4}),

where:

Θ_{n, 1} (x) = {n f_{β} (β^{T} x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t) x}) ε_{i, t},

and

Θ_{n, 2} (x) = {n h_{n} f_{β} (β^{T} x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t) x}) β^{T} X_{(i, t) x} ε_{i, t} .

Proof of Lemma A4.

By the Taylor expansion of

g (β_{0}^{T} X_{i, t})

at

β_{0}^{T} x

and

β_{0} = β + (β_{0} - β)

, we have:

\begin{matrix} Y_{i, t} & = & g (β_{0}^{T} x) + g^{'} (β_{0}^{T} x) β_{0}^{T} + \frac{1}{2} g^{″} (β_{0}^{T} x) {(β_{0}^{T} X_{(i, t) x})}^{2} + ε_{i, t} + O (| β_{0}^{T} X_{(i, t) x} |^{3}) \\ = & g (β_{0}^{T} x) + g^{'} (β_{0}^{T} x) β_{0}^{T} X_{(i, t) x} + \frac{1}{2} g^{″} (β_{0}^{T} x) {(β_{0}^{T} X_{(i, t) x})}^{2} + ε_{i, t} + Ω_{n} (x, X_{i, t}, β) \\ = & g (β_{0}^{T} x) + g^{'} (β_{0}^{T} x) β^{T} X_{(i, t), x} + \frac{1}{2} g^{″} (β_{0}^{T} x) {(β_{0}^{T} X_{(i, t), x})}^{2} + g^{'} (β_{0}^{T} x) {(β_{0} - β)}^{T} X_{(i, t), x} \\ + ε_{i, t} + Ω_{n} (x, X_{i, t}, β), \end{matrix}

(A11)

where

Ω_{n} (x, X_{i}, β) = O (| β^{T} X_{(i, t), x} |^{3} + | β^{T} X_{(i, t), x} | \cdot ∥ X_{(i, t), x} ∥ δ_{β} + ∥ X_{(i, t), x} ∥^{2} δ_{β}^{2})

. By Equation (A11), it follows that:

\begin{matrix} (\begin{matrix} a_{x}^{β} \\ b_{x}^{β} h_{n} \end{matrix}) & = & {n Σ_{n}^{β} (x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) \\ \cdot & {g (β_{0}^{T} x) + g^{'} (β_{0}^{T} x) β^{T} X_{(i, t), x}} \\ + & {n Σ_{n}^{β} (x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) \frac{1}{2} g^{″} (β_{0}^{T} x) {(β_{0}^{T} X_{(i, t), x})}^{2} \\ + & {n Σ_{n}^{β} (x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) g^{'} (β_{0}^{T} x) {(β_{0} - β)}^{T} X_{(i, t), x} \\ + & {n Σ_{n}^{β} (x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) ε_{i, t} \\ + & {n Σ_{n}^{β} (x)}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) Ω_{n} (x, X_{i, t}, β) \\ : = & A_{1} + A_{2} + A_{3} + A_{4} + A_{5} . \end{matrix}

(A12)

According to Lemma A3, we have:

Σ_{n}^{β} (x) = (\begin{matrix} f_{β} (β^{T} x) & f_{β}^{'} (β^{T} x) h_{n} \\ f_{β}^{'} (β^{T} x) h_{n} & f_{β} (β^{T} x) \end{matrix}) + O_{p} (κ_{n}) .

If

f_{β} (β^{T} x) > n^{ϵ}

, we have:

{Σ_{n}^{β} (x)}^{- 1} = f_{β}^{- 1} (β^{T} x) \{I - f_{β}^{- 1} (\begin{matrix} 0 & f_{β}^{'} (β^{T} x) h_{n} \\ f_{β}^{'} (β^{T} x) h_{n} & 0 \end{matrix})\} + O_{p} (κ_{n} n^{ϵ}) .

It is easy to check that

A_{1} = (\begin{matrix} g (β_{0}^{T} x) \\ g^{'} (β_{0}^{T} x) h_{n} \end{matrix}) .

(A13)

According to Lemma A3 and Equation (A11), we have:

A_{2} = (\begin{matrix} g^{″} (β_{0}^{T} x) h_{n}^{2} \\ g^{″} (β_{0}^{T} x) f_{β}^{'} (β^{T} x) h_{n}^{3} \end{matrix}) + O_{p} (h_{n} κ_{n} n^{ϵ} e_{d + 1}) .

(A14)

By Lemma A3, we have:

A_{3} = (\begin{matrix} g^{'} (β_{0}^{T} x) v_{β}^{T} (x) (β_{0} - β) \\ O (h_{n} δ_{β}) (1 + ∥ x ∥) \end{matrix}) + O_{p} (κ_{n} δ_{β} n^{ϵ} e_{d + 1}) (1 + ∥ x ∥) .

(A15)

For the noise term, we have:

A_{4} = {n f_{β} (β^{T} x)}^{- 1} \sum_{i = 1}^{n} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) ε_{i, t} + O_{p} (κ_{n} δ_{β} n^{ϵ} e_{d + 1}) .

(A16)

It follows from Lemma A2 that:

n^{- 1} \sum_{i = 1}^{n} K_{h_{n}} (β^{T} X_{(i, t), x}) (\begin{matrix} 1 \\ β^{T} X_{(i, t), x} / h_{n} \end{matrix}) | β^{T} X_{(i, t), x} |^{k} ∥ X_{(i, t), x} ∥^{l} = O_{p} (h_{n}^{k}) {(1 + ∥ x ∥}^{l}) .

Then, we have:

A_{5} = O_{p} {(h_{n}^{3} + δ_{β}^{2}) n^{ϵ} e_{d + 1}} {(1 + ∥ x ∥}^{3}) .

(A17)

Thus, Lemma A4 follows from (A13)–(A17). □

Lemma A5.

Under conditions (C1)–(C4), we have:

\begin{matrix} {\{n^{- 2} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} \sum_{i \in I_{n}} \sum_{t \in T_{n}} {\hat{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j, τ})\}}^{- 1} \\ = & β_{0} β_{0}^{T} {[E {g^{'} {(β_{0}^{T} X)}^{2}}]}^{- 1} h_{n}^{- 2} - \frac{1}{2} {[E {g^{'} {(β_{0}^{T} X)}^{2}}]}^{- 1} (β_{0} F^{T} W_{g 0}^{+} + W_{g 0}^{+} F β_{0}^{T}) + \frac{1}{2} W_{g 0}^{+} \\ + O_{P} {(κ_{n} h_{n} n^{ϵ} + δ_{β} / h_{n}^{2}) E_{d}}, \end{matrix}

where

W_{g 0} = E {g^{'} {(β_{0}^{T} X)}^{2} v_{β_{0}} (X) v_{β_{0}}^{T} (X)}

and

F = E {\frac{{[g^{'} (β_{0}^{T} X)]}^{2} {(f_{β_{0}} (β_{0}^{T} X) v_{β_{0}} (X))}^{'}}{f_{β_{0}} (β_{0}^{T} X)}}

.

Proof of Lemma A5.

According to condition (C1), we have:

\begin{matrix} \sum_{i \in I_{n}} \sum_{t \in T_{n}} P (⋃_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {X_{i, t} \notin D_{n}}) & \leq & \sum_{i \in I_{n}} \sum_{t \in T_{n}} n P (X_{i, t} \notin D_{n}) \leq \sum_{i \in I_{n}} \sum_{t \in T_{n}} n P (| X_{i, t} | \geq n^{c}) \\ < & \sum_{i \in I_{n}} \sum_{t \in T_{n}} n n^{- 6 c} E {| X |}^{6} < \infty . \end{matrix}

By the Borel–Cantelli lemma, we have:

P (⋃_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {X_{i, t} \notin D_{n}}, i . o .) = 0 .

(A18)

Let

{\tilde{D}}_{n} = {x : f_{β} (β^{T} x) > 2 n^{- ϵ}}

. Similarly, we have:

P (⋃_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {X_{i, t} \notin {\tilde{D}}_{n}}, i . o .) = 0 .

(A19)

Hence, we can exchange summations over

{X_{j, τ} : j \in I_{n}, τ \in T_{n}}

,

{X_{j, τ} : X_{j, τ} \in D_{n}, j \in I_{n}, τ \in T_{n}}

,

{X_{j, τ} : X_{j, τ} \in {\tilde{D}}_{n}, j \in I_{n}, τ \in T_{n}}

in the sense of almost sure consistently. By Lemma A3 and condition (C1), we have:

\begin{matrix} {\hat{f}}_{β} (β^{T} x) = f_{β} (β^{T} x) + O (κ_{n}) . \end{matrix}

By Equation (A19) and the smoothness of

ρ_{n} (\cdot)

with

| ρ_{n}^{'} (\cdot) | < c n^{ϵ}

for some

c > 0

, we have:

ρ_{n} ({\hat{f}}_{β} (β^{T} x)) = ρ_{n} (f_{β} (β^{T} x)) + O (κ_{n} n^{ϵ}) = 1 + O (κ_{n} n^{ϵ}) .

(A20)

Denote by

(β, D)

an orthogonal matrix. According to Lemma A3, we have:

\begin{matrix} n^{- 1} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, t), x}) β^{T} X_{(i, t), x} X_{(i, t), x}^{T} β & = & f_{β} (β^{T} x) h_{n}^{2} + O_{p} (h_{n}^{2} κ_{n}), \\ n^{- 1} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, t), x}) β^{T} X_{(i, t), x} X_{(i, t), x}^{T} D & = & f_{β}^{'} (β^{T} x) v_{β}^{T} (x) D h_{n}^{2} + O_{p} (h_{n}^{2} κ_{n} e_{p - 1}), \\ n^{- 1} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, t), x}) D^{T} X_{(i, t), x} X_{(i, t), x}^{T} D & = & f_{β} (β^{T} x) C^{β} (x) + O_{p} (κ_{n} E_{p - 1}) {(1 + | x |}^{2}), \end{matrix}

where

C^{β} (x) = D^{T} {\tilde{w}}_{β} (x) D

,

{\tilde{w}}_{β} (x) = w_{β} (x) - x μ_{β}^{T} (x) - μ_{β} (x) x^{T} + x x^{T}

. It follows from (A19) that:

\begin{matrix} {n {\hat{f}}_{β} (β^{T} x)}^{- 1} \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, t), x}) X_{(i, t), x} X_{(i, t), x}^{T} \\ = & (β, D) (\begin{matrix} h_{n}^{2} & F_{β}^{T} (x) D h_{n}^{2} \\ D^{T} F_{β} (x) h_{n}^{2} & C^{β} (x) \end{matrix}) {(β, D)}^{T} + O_{p} (h_{n} κ_{n} n^{ϵ} E_{d}) {(1 + ∥ x ∥}^{2}), \end{matrix}

where

F_{β} (x) = f_{β}^{'} (β^{T} x) v_{β} (x) / f_{β} (β^{T} x)

. According to (A20), we have:

b_{x}^{β} = g^{'} (β_{0}^{T} x) + O (κ_{n} n^{2 ϵ} / h_{n}) {(1 + ∥ x ∥}^{4})

. Note that

E {g^{'} {(β_{0}^{T} X_{j, τ})}^{2} C^{β_{0}} (X_{j, τ})} = 2 D_{0}^{T} W_{g 0} D_{0}

. Thus,

\begin{matrix} n^{- 2} \sum_{i \in I_{n}} \sum_{t \in T_{n}} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\hat{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j, τ}) \\ = & n^{- 1} \sum_{X_{j, τ} \in D_{n}} ρ_{n} ({\hat{f}}_{β} (β^{T} X_{j, τ})) {(b_{j, τ}^{β})}^{2} {n {\hat{f}}_{β} (β^{T} X_{j, τ})}^{- 1} \\ \cdot & \sum_{i \in I_{n}} \sum_{t \in T_{n}} K_{h_{n}} (β^{T} X_{(i, t), x}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} \\ = & (β, D) n^{- 1} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \{g^{'} {(β_{0}^{T} X_{j, τ})}^{2} (\begin{matrix} h_{n}^{2} & F_{β}^{T} (X_{j, τ}) D h_{n}^{2} \\ D^{T} F_{β} (X_{j τ}) h_{n}^{2} & C^{β} (X_{j, τ}) \end{matrix}) + O (h_{n} κ_{n} n^{ϵ}) (1 + | X_{j, τ} |^{6})\} {(β, D)}^{T} \\ = & (β, D) (\begin{matrix} E {g^{'} {(β_{0}^{T} X)}^{2}} h_{n}^{2} & F^{T} D h_{n}^{2} \\ D^{T} F h_{n}^{2} & 2 D^{T} W_{g 0} D \end{matrix}) {(β, D)}^{T} + O (h_{n} κ_{n} n^{ϵ} + δ_{β} E_{d}), \end{matrix}

where

F = E {{[g^{'} (β_{0}^{T} X)]}^{2} F_{β_{0}} (X)}

. Therefore, by the matrix inversion formula in blocks [27], we have:

\begin{matrix} {\{n^{- 2} \sum_{i \in I_{n}} \sum_{t \in T_{n}} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\hat{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j})\}}^{- 1} \\ = & β β^{T} {[E {g^{'} {(β_{0}^{T} X_{j, τ})}^{2}}]}^{- 1} h_{n}^{- 2} - \frac{1}{2} {[E {g^{'} {(β_{0}^{T} X_{j, τ})}^{2}}]}^{- 1} (D H D^{T} F β^{T} + β F^{T} D^{T} H D) \\ + \frac{1}{2} D H D^{T} + O {h_{n}^{- 2} (h_{n} κ_{n} n^{ϵ} + δ_{β}) E_{d}}, \end{matrix}

where

H = {(D^{T} W_{g 0} D)}^{- 1}

. Note that

D = D_{0} + δ_{β}

. Then,

H = {(D_{0}^{T} W_{g 0} D_{0})}^{- 1} + δ_{β}

. By the definition of the Moore–Penrose inverse, we have:

D_{0} {(D_{0}^{T} W_{g 0} D_{0})}^{- 1} D_{0}^{T} = W_{g 0}^{+} .

Combining the facts that

D D^{T} = I - β β^{T} = I - β_{0} β_{0}^{T} + O (δ_{β})

and

β_{0}^{T} W_{g 0} = 0

, we complete the proof. □

Lemma A6.

Under conditions (C1)–(C4), we have:

\begin{matrix} n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} (Y_{i, t} - a_{j, τ}^{β} - b_{j, τ}^{β} β_{0}^{T} X_{(i, t), (j, τ)})) / {\hat{f}}_{β} (β^{T} X_{j, τ}) \\ = & W_{g 0} (β - β_{0}) + n^{- 1} \sum_{i \in I_{n}} \sum_{t \in T_{n}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} + o_{P} (n^{- 1 / 2} e_{d}) . \end{matrix}

Proof of Lemma A6.

According to Lemma A4, we have:

\begin{matrix} n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} (Y_{i} - a_{j, τ}^{β} - b_{j, τ}^{β} β_{0}^{T} X_{(i, t), (j, τ)}) / {\hat{f}}_{β} (β^{T} X_{j, τ}) \\ = & n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {\hat{f}}_{β}^{- 1} (β^{T} X_{j, τ}) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} ε_{i, t} \\ + n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {\hat{f}}_{β}^{- 1} (β^{T} X_{j, τ}) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} g^{'} (β_{0}^{T} X_{j, τ}) v_{β}^{T} (X_{j, τ}) (β - β_{0}) \\ + n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {\hat{f}}_{β}^{- 1} (β^{T} X_{j, τ}) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} \frac{1}{2} g^{″} (β_{0}^{T} X_{j, τ}) \\ \times {{(β^{T} X_{(i, t), (j, τ)})}^{2} - h_{n}^{2}} \\ - n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {\hat{f}}_{β}^{- 1} (β^{T} X_{j, τ}) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) \\ \times b_{j, τ}^{β} X_{(i, t), (j, τ)} {Θ_{n, 1} (X_{j, τ}) + Θ_{n, 2} (X_{j, τ}) β_{0}^{T} X_{(i, t), (j, τ)} / h_{n}} \\ + n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {\hat{f}}_{β}^{- 1} (β^{T} X_{j, τ}) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} \\ \times O {(h_{n} κ_{n} + δ_{β}^{2}) n^{ϵ}} (1 + | X_{j, τ} |^{4}) (1 + | β_{0}^{T} X_{(i, t), (j, τ)} | / h_{n}) \\ : = & B_{1} + B_{2} + B_{3} + B_{4} + B_{5} . \end{matrix}

Denote by

U_{(i, t), (j, τ)} = ρ_{n} (f_{β} (β^{T} X_{j, τ})) K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)}

. We have

E {E_{j, τ} (U_{(i, t), (j, τ)}) - g^{'} (β_{0}^{T} X_{j, τ}) v_{β} (X_{j, τ})}^{2} = O {{(h_{n} + δ_{β})}^{2}}

. Note that

Θ_{n, 1} = O (κ_{n} n^{ϵ})

and

Θ_{n, 2} = O (κ_{n} n^{ϵ})

. By Lemmas A1 and A2, we have:

\begin{matrix} B_{1} & = & n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {U_{(i, t), (j, τ)} - E_{j} (U_{(i, t), (j, τ)})} ε_{i, t} + n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} E_{j} (U_{(i, t), (j, τ)}) ε_{i, t} + o_{d} (n^{- 1 / 2} e_{d}) \\ = & n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β} (X_{i, t}) ε_{i} + o_{p} (n^{- 1 / 2} e_{d}) . \end{matrix}

(A21)

According to Lemma A3, (A18) and (A19), we have:

B_{2} = W_{g 0} (β - β_{0}) + O_{p} {(κ_{n} + δ_{β}) e_{d} / h_{n}} .

(A22)

By Lemmas A3 and A4, we have:

B_{3} = O_{p} {(h_{n}^{2} κ_{n} n^{ϵ} e_{d}} .

(A23)

Letting

V_{(i, t), (j, τ)} = K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)}

, then we have

E_{i} (V_{(i, t), (j, τ)}) = v_{β} (X_{j, τ}) + O_{p} (h_{n}^{2} e_{d}) (1 + ∥ X_{j, τ} ∥)

. Note that

E {v_{β} (X_{j, τ})} = 0

. According to Lemma A2, we have:

\begin{matrix} B_{4} & = & n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ρ_{n} (f_{β} (β^{T} X_{j, τ})) V_{(i, t), (j, τ)} g^{'} (β^{T} X_{j, τ}) f_{β}^{- 1} (β^{T} X_{j, τ}) \\ \times & {Θ_{n, 1} (X_{j, τ}) + Θ_{n, 2} (X_{j, τ}) β_{0}^{T} X_{(i, t) (j, τ)} / h_{n}} \\ + O_{p} (n^{- 1 / 2} e_{d}) \\ = & n^{- 1} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ρ_{n} (f_{β} (β^{T} X_{j, τ})) v_{β} (X_{j, τ}) g^{'} (β^{T} X_{j, τ}) f_{β}^{- 1} (β^{T} X_{j, τ}) \\ \times & {Θ_{n, 1} (X_{j, τ}) + Θ_{n, 2} (X_{j, τ}) β_{0}^{T} X_{(i, t), (j, τ)} / h_{n}} \\ + O_{p} (n^{- 1 / 2} e_{d}) \\ = & O_{p} (n^{- 1 / 2}) e_{d} . \end{matrix}

(A24)

By Lemma A1 and Corollary 2 in [28] ( p. 122), we have:

B_{5} = O_{p} {(h_{n} κ_{n} + δ_{β}^{2} e_{d}) n^{ϵ}} O (1) n^{- 1} \sum_{j \in I_{n}} \sum_{τ \in T_{n}} {| X_{j, τ} |}^{5} = O_{p} {(h_{n} κ_{n} + δ_{β}^{2}) n^{ϵ} e_{d}} .

(A25)

Combining (A21)–(A25), we complete the proof of this lemma. □

Lemma A7.

According to Theorem 3 in [29], for

δ < 1 - s^{- 1}

lim_{n \to + \infty} n^{2 δ - 1} \tilde{h} = + \infty

, we have:

sup_{x \in R^{d}} | \frac{1}{n} \sum_{i \in I_{n}} \sum_{t \in T_{n}} [{\tilde{K}}_{\tilde{h}} ({\tilde{X}}_{i, t} - x) {\tilde{Y}}_{i, t} - E ({\tilde{K}}_{\tilde{h}} ({\tilde{X}}_{i, t} - x) {\tilde{Y}}_{i, t})] | = O_{p} [(\frac{l o g (n)}{n {\tilde{h}}^{d}})] .

Lemma A8.

Under conditions (C1)–(C5), we have:

sup_{x \in R^{d}} |σ_{β_{0}}^{2} (x) - \sum_{i \in I_{n}} \sum_{t \in T_{n}} {\tilde{w}}_{i, t} (x) σ_{β_{0}}^{2} (x)| = O_{p} \{{\tilde{h}}^{2 d} + {[\frac{l o g (1 / {\tilde{h}}^{d})}{n {\tilde{h}}^{d}}]}^{\frac{1}{2}}\},

where:

{\tilde{w}}_{i, t} (x) = \frac{{\tilde{K}}_{\tilde{h}} (X_{i, t} - x)}{\sum_{j \in I_{n}} \sum_{τ \in T_{n}} {\tilde{K}}_{\tilde{h}} (X_{j, τ} - x)}, j \in I_{n}, τ \in T_{n} .

Proof of Theorem 1.

For any

x \in R^{d}

, let:

ϖ_{j, τ} (x) = \frac{L_{l_{n}} (X_{j, τ} - x)}{\sum_{i \in I_{n}} \sum_{t \in T_{n}} L_{l_{n}} (X_{i, t} - x)} .

According to Equations (9) and (10), by Lemma A4, we have:

\begin{matrix} {\hat{σ}}_{{\hat{β}}_{n}}^{2} (x) \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} {\hat{ε}}_{j, τ}^{2} ϖ_{j, τ} (x) = \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} {(Y_{j, τ} - a_{i, t}^{{\hat{β}}_{n}})}^{2} ϖ_{j, τ} (x) \\ = & \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} {[Y_{j, τ} - g (β_{0}^{T} X_{j, τ}) - g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) - \frac{1}{2} g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} - Θ_{n, 1} (X_{j, τ})]}^{2} \\ \cdot & ϖ_{j, τ} (x) \\ = & \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ε_{j, τ}^{2} ϖ_{j, τ} (x) + \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} {(β_{0}^{T} X_{j, τ})}^{2} v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) ϖ_{j, τ} (x) \\ + & \frac{1}{4} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} {(β_{0}^{T} X_{j, τ})}^{2} h_{n}^{4} ϖ_{j, τ} (x) + \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} Θ_{n, 1} {(X_{j, τ})}^{2} ϖ_{j, τ} (x) \\ - & 2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) ε_{j, τ} ϖ_{j, τ} (x) - \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} ε_{j, τ} ϖ_{j, τ} (x) \\ - & 2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} Θ_{n, 1} (X_{j, τ}) ε_{j, τ} ϖ_{j, τ} (x) + \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} Θ_{n, 1} (X_{j, τ}) ϖ_{j, τ} (x) \\ + & \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} ϖ_{j, τ} (x) \\ + & 2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) Θ_{n, 1} (X_{j, τ}) ϖ_{j, τ} (x) \\ : = & C_{1} + C_{2} + C_{3} + C_{4} + C_{5} + C_{6} + C_{7} + C_{8} + C_{9} + C_{10} . \end{matrix}

(A26)

Thus, we can obtain from the results in Lemmas A7 and A8 that:

\begin{matrix} sup_{x \in R^{d}} |C_{1} - σ_{β_{0}}^{2} (x)| = sup_{x \in R^{d}} |C_{1} - \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ϖ_{j, τ} (x) σ_{β_{0}}^{2} (X_{j, τ}) + \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ϖ_{j, τ} (x) σ_{β_{0}}^{2} (X_{j, τ}) - σ_{β_{0}}^{2} (x)| \\ \leq & sup_{x \in R^{d}} |C_{1} - \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ϖ_{j, τ} (x) σ_{β_{0}}^{2} (X_{j, τ})| + sup_{x \in R^{d}} |\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} ϖ_{j, τ} (x) σ_{β_{0}}^{2} (X_{j, τ}) - σ_{β_{0}}^{2} (x)| \\ = & O_{p} (l_{n}^{2 d} + {[\frac{l o g (1 / l_{n}^{d})}{n l_{n}^{d}}]}^{\frac{1}{2}}) . \end{matrix}

(A27)

Since

{\hat{β}}_{n} \in B_{n}

, applying the result of Lemma A7 obtains:

\begin{matrix} sup_{x \in R^{d}} | C_{2} | & = sup_{x \in R^{d}} |\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} {(β_{0}^{T} X_{j, τ})}^{2} v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) ϖ_{j, τ} (x)| \\ = O_{p} (n^{- 1 + 2 c_{0}}), \end{matrix}

(A28)

sup_{x \in R^{d}} | C_{3} | = sup_{x \in R^{d}} |\frac{1}{4} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} {(β_{0}^{T} X_{j, τ})}^{2} h_{n}^{4} ϖ_{j, τ} (x)| = O_{p} (h_{n}^{4}),

(A29)

sup_{x \in R^{d}} | C_{4} | = sup_{x \in R^{d}} |\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} Θ_{n, 1} {(X_{j, τ})}^{2} ϖ_{j, τ} (x)| = O_{p} (\frac{l o g (n)}{n l_{n}^{d}}),

(A30)

\begin{matrix} sup_{x \in R^{d}} | C_{5} | & = sup_{x \in R^{d}} |2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) ε_{j, τ} ϖ_{j, τ} (x)| \\ = O_{p} [n^{- 1 + 2 c_{0}} \cdot {(\frac{l o g (n)}{n l_{n}^{d}})}^{\frac{1}{2}}], \end{matrix}

(A31)

sup_{x \in R^{d}} | C_{6} | = sup_{x \in R^{d}} |- \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} ε_{j, τ} ϖ_{j, τ} (x)| = O_{p} (h_{n}^{2}) \cdot O_{p} [{(\frac{l o g (n)}{n l_{n}^{d}})}^{\frac{1}{2}}],

(A32)

sup_{x \in R^{d}} | C_{7} | = sup_{x \in R^{d}} |- 2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} Θ_{n, 1} (X_{j, τ}) ε_{j, τ} ϖ_{j, τ} (x)| = O_{p} (\frac{l o g (n)}{n l_{n}^{d}}),

(A33)

sup_{x \in R^{d}} | C_{8} | = sup_{x \in R^{d}} |\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} Θ_{n, 1} (X_{j, τ}) ϖ_{j, τ} (x)| = O_{p} (h_{n}^{2}) \cdot O_{p} [{(\frac{l o g (n)}{n l_{n}^{d}})}^{\frac{1}{2}}],

(A34)

\begin{matrix} sup_{x \in R^{d}} | C_{9} | & = sup_{x \in R^{d}} |\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) g^{″} (β_{0}^{T} X_{j, τ}) h_{n}^{2} ϖ_{j, τ} (x)| \\ = O_{p} (h_{n}^{2} \cdot n^{- 1 + 2 c_{0}}), \end{matrix}

(A35)

\begin{matrix} sup_{x \in R^{d}} | C_{10} | & = sup_{x \in R^{d}} |2 \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}}^{T} (X_{j, τ}) (β_{0} - {\hat{β}}_{n}) Θ_{n, 1} (X_{j, τ}) ϖ_{j, τ} (x)| \\ = O_{p} (n^{- 1 + 2 c_{0}}) \cdot O_{p} [{(\frac{l o g (n)}{n l_{n}^{d}})}^{\frac{1}{2}}] . \end{matrix}

(A36)

Thus, combining (A26)–(A36), we complete the proof of Theorem 1. □

Proof of Theorem 2.

Theorem 2 can be proven in the same way as Theorem 1, so it is omitted here. □

Proof of Theorem 3.

After one iteration, by Lemmas A5 and A6, we have that the new

β^{(1)}

is:

\begin{matrix} β^{(1)} & = & β_{0} + {\{\sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j, τ})\}}^{- 1} \\ \times & \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\hat{ρ}}_{j, τ}^{β} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} (Y_{i, t} - a_{j, τ}^{β} - b_{j, τ}^{β} β_{0}^{T} X_{(i, t), (j, τ)}) / {\hat{f}}_{β} (β^{T} X_{j, τ}) \\ = & β_{0} + h_{n}^{- 2} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} β_{0} β_{0}^{T} W_{g 0} (β - β_{0}) \\ + h_{n}^{- 2} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} \frac{1}{n} \sum_{i = 1}^{n} g^{'} (β_{0}^{T} X_{i, t}) β_{0} β_{0}^{T} v_{β_{0}} (X_{i, t}) ε_{i, t} \\ - \frac{1}{2} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} β_{0} F^{T} W_{g 0}^{+} W_{g 0} (β - β_{0}) \\ - \frac{1}{2 n} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) β_{0} F^{T} W_{g 0}^{+} v_{β_{0}} (X_{i, t}) ε_{i, t} \\ - \frac{1}{2} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} W_{g 0}^{+} F β_{0}^{T} W_{g 0} (β - β_{0}) \\ - \frac{1}{2 n} {E {[g^{'} (β_{0}^{T} X)]}^{2}}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) W_{g 0}^{+} F β_{0}^{T} v_{β_{0}} (X_{i, t}) ε_{i, t} \\ + \frac{1}{2} W_{g 0}^{+} W_{g 0} (β - β_{0}) + \frac{1}{2 n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} \\ = & β_{0} + \frac{1}{2} (I - β_{0} β_{0}^{T}) (β - β_{0}) + \frac{1}{2 n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} + O_{p} (n^{- \frac{1}{2}} e_{d}); \end{matrix}

the last equation holds because

β_{0}^{T} v_{β_{0}} (X_{i, t}) = 0

,

β_{0}^{T} W_{g 0} = 0

,

W_{g 0}^{+} W_{g 0} = I - β_{0} β_{0}^{T}

, and

\frac{1}{2 n} F^{T} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} + \frac{1}{2} F^{T} W_{g 0}^{+} W_{g 0} (β - β_{0}) = O_{p} (e_{d}) .

It is easy to check that

∥ β^{(1)} ∥ = 1 + o (1)

, so then

β^{(1)} / ∥ β^{(1)} ∥ = β^{(1)}

. Let

β^{(k)}

be the value of

β

after k iterations. We have:

β^{(k + 1)} = β_{0} + \frac{1}{2} (I - β_{0} β_{0}^{T}) (β^{(k)} - β_{0}) + \frac{1}{2 n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} + + O_{p} (n^{- \frac{1}{2}} e_{d}) .

Recursing the preceding equation, we have, as the iteration

k \to \infty

:

\begin{matrix} β^{(k)} & = & β_{0} + \frac{1}{2^{k - 1}} (I - β_{0} β_{0}^{T}) (β^{(1)} - β_{0}) + (1 - \frac{1}{2^{k - 1}}) \frac{1}{n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} \\ \to & β_{0} + \frac{1}{n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} . \end{matrix}

The following lemmas are used to prove Theorem 3. □

Lemma A9.

(i) Suppose (A1) holds, and let

L_{l_{1}} (F)

denote the class of

F

—measurable r.v’s X satisfying

{∥ X ∥}_{l_{1}} = {(E | X |}^{l_{1}})^{{\frac{1}{l}}_{1}} < \infty

. Let

X \in L_{l_{1}} (B (E)), Y \in L_{l_{2}} (B (E))

. Suppose

1 \leq l_{1}, l_{2}, l_{3} < \infty,

and

l_{1}^{- 1} + l_{2}^{- 1} + l_{3}^{- 1} = 1

; then:

| E (X Y) - E X E Y | \leq {c ∥ X ∥}_{l_{1}} {∥ Y ∥}_{l_{2}} {ψ_{1} (C a r d E, C a r d E^{'}) φ_{1} (d (E, E^{'}))}^{\frac{1}{l_{3}}} .

(A37)

(ii) If, moreover,

| X |

and

| Y |

are

P_{- a . s}

bounded, the right-hand side of (A37) can be replaced with

c ψ_{1} (C a r d E, C a r d E^{'}) φ_{1} (d (E, E^{'}))

.

Lemma A10.

(i) Suppose (A2) holds, and let

L_{l_{1}} (F)

denote the class of

F

—measurable r.v’s X satisfying

{∥ X ∥}_{l_{1}} = {(E | X |}^{l_{1}})^{\frac{1}{l_{1}}} < \infty

. Let

X \in L_{l_{1}} (B (G)), Y \in L_{l_{2}} (B (G))

. Suppose

1 \leq l_{1}, l_{2}, l_{3} < \infty, a n d l_{1}^{- 1} + l_{2}^{- 1} + l_{3}^{- 1} = 1

; then:

| E (X Y) - E X E Y | \leq {c ∥ X ∥}_{l_{1}} {∥ Y ∥}_{l_{2}} {ψ_{2} (C a r d G, C a r d G^{'}) φ_{2} (d (G, G^{'}))}^{\frac{1}{l_{3}}} .

(A38)

(ii) If, moreover,

| X |

and

| Y |

are

P_{- a . s}

bounded, the right-hand side of (A38) can be replaced with

c ψ_{2} (C a r d G, C a r d G^{'}) φ_{2} (d (G, G^{'}))

.

Lemma A11.

Under assumptions (C1) and (C2), one has:

E [\sqrt{n} (β^{(k)} - β_{0})] = 0 .

(A39)

Proof of Lemma A11.

\begin{matrix} E [\sqrt{n} (β^{(k)} - β_{0})] & = \sqrt{n} E (β_{0} + \frac{1}{n} W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} - β_{0}) \\ = \frac{1}{\sqrt{n}} E W_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} \\ = 0 . \end{matrix}

□

Lemma A12.

Under assumptions (C1)–(C5), one has:

V a r [\sqrt{n} (β^{(k)} - β_{0})] = W_{g 0}^{+} Δ W_{g 0}^{+},

(A40)

where

W_{g 0} = E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X)}

and

Δ = E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X) σ_{β_{0}}^{2} (X)}

.

Proof of Lemma A12.

\begin{matrix} V a r [\sqrt{n} (β^{(k)} - β_{0})] & = \frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} V a r (W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} - β_{0}) \\ + \frac{1}{n} \sum_{\begin{matrix} i \neq j \\ t \neq τ \end{matrix}} E W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} W_{g 0}^{+} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}} (X_{j, τ}) ε_{j, τ} \\ = J_{1} + J_{2}, \end{matrix}

\begin{matrix} J_{1} & = \frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} V a r (W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t}) \\ = V a r (W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} - β_{0}) \\ = W_{g 0}^{+} E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X) σ_{β_{0}}^{2} (X)} W_{g 0}^{+} . \end{matrix}

By the boundedness of

(W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t})

for

i \in I_{n}, t \in T_{n}

, and then by the mixing condition, and similar to the proof of

I_{22} \to 0

in Lemma 4.4 in [5],

\begin{matrix} J_{2} & = \frac{1}{n} \sum_{\begin{matrix} i \neq j \\ t \neq τ \end{matrix}} E W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} W_{g 0}^{+} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}} (X_{j, τ}) ε_{j, τ} \\ \leq \frac{c}{n} \sum_{\begin{matrix} i \neq j \\ t \neq τ \end{matrix}} E ε_{i, t} ε_{j, τ} \to 0 . \end{matrix}

□

Lemma A13.

Under assumptions (C1)–(C5), one has:

\sqrt{n} (β^{(k)} - β_{0}) - E [\sqrt{n} (β^{(k)} - β_{0})] \overset{D}{⟶} N (0, W_{g 0}^{+} Δ W_{g 0}^{+}) .

(A41)

Proof of Lemma A13.

To simplify, let

H_{n} = W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t}

,

\sqrt{n} (β^{(k)} - β_{0}) - E [\sqrt{n} (β^{(k)} - β_{0})] = \frac{1}{\sqrt{n}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} (H_{n} - E H_{n})

. Then, let

\frac{1}{\sqrt{n}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} (H_{n} - E H_{n}) = \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} ▵_{i, t} .

Let us now introduce a space–time block decomposition, which has been used by [30]. Fix integers

p_{k} = O {(\log n_{k})}^{\frac{1}{8 (N + 1)}}, k = 1, \dots, N + 1, q = O {(\log n_{N + 1})}^{\frac{1}{16 (N + 1)}}, k = 1, \dots, N + 1

, and assume that, for some integer r,

n_{k} = r (p_{k} + q) a n d w e h a v e lim_{n \to \infty} \frac{p_{k}}{q} = \infty, k = 1, \dots, N + 1 .

The random variables

▵_{i, t}

are now set into blocks of different sizes. Let:

\begin{matrix} U (1, n, j) & = & \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + 1 \\ k = 1, \dots, N \end{matrix}}^{j_{k} (p_{k} + q) + p_{k}} \sum_{t = j_{N + 1} (p_{N + 1} + q) + 1}^{j_{N + 1} (p_{N + 1} + q) + p_{N + 1}} ▵_{i, t}, \\ U (2, n, j) & = & \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + 1 \\ k = 1, \dots, N \end{matrix}}^{j_{k} (p_{k} + q) + p_{k}} \sum_{t = j_{N + 1} (p_{N + 1} + q) + p_{N + 1} + 1}^{(j_{N + 1} + 1) (p_{N + 1} + q)} ▵_{i, t}, \\ U (3, n, j) & = & \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + 1 \\ k = 1, \dots, N - 1 \end{matrix}}^{j_{k} (p_{k} + q) + p_{k}} \sum_{i_{N} = j_{N} (p_{N} + q) + p_{N} + 1}^{(j_{N} + 1) (p_{N + 1} + q)} \sum_{t = j_{N + 1} (p_{N + 1} + q) + 1}^{j_{N + 1} (p_{N + 1} + q) + p_{N + 1}} ▵_{i, t}, \\ U (4, n, j) & = & \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + 1 \\ k = 1, \dots, N - 1 \end{matrix}}^{j_{k} (p_{k} + q) + p_{k}} \sum_{i_{N} = j_{N} (p_{N} + q) + p_{N} + 1}^{(j_{N} + 1) (p_{N} + q)} \sum_{t = j_{N + 1} (p_{N + 1} + q) + p_{N + 1} + 1}^{(j_{N + 1} + 1) (p_{N + 1} + q)} ▵_{i, t}, \end{matrix}

and so on. Note that:

U (2^{N + 1} - 1, n, j) = \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + p_{k} + 1 \\ k = 1, \dots, N \end{matrix}}^{(j_{k} + 1) (p_{k} + q)} \sum_{t = j_{N + 1} (p_{N + 1} + q) + 1}^{j_{N + 1} (p_{N + 1} + q) + p_{N + 1}} ▵_{i, t} .

Finally,

U (2^{N + 1}, n, j) = \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + p_{k} + 1 \\ k = 1, \dots, N \end{matrix}}^{(j_{k} + 1) (p_{k} + q)} \sum_{t = j_{N + 1} (p_{N + 1} + q) + p_{N + 1} + 1}^{(j_{N + 1} + 1) (p_{N + 1} + q)} ▵_{i, t} .

Setting

R = {0, \dots, r - 1} \times \dots \times {0, \dots, r - 1}

, we define for each integer

i = 1, \dots, 2^{N + 1}

,

T (n, i) = \sum_{j \in R} U (i, n, j) .

Then, with this notation, we obtain the decomposition

\frac{1}{\sqrt{n}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} (H_{n} - E H_{n}) = \sum_{i = 1}^{2^{N + 1}} T (n, i) .

Note that

T (n, 1)

is the sum of the random variables

▵_{i, t}

over “large” blocks, whereas

T (n, i), 2 \leq i \leq 2^{N + 1}

are sums over “small” blocks. If it is not the case that

n_{k} = r (p_{k} + q)

for r, then an additional term

T (n, 2^{N + 1} + 1)

, say, containing all the

▵_{i, t}

terms that are not included in the big or small blocks, can be considered. This term will not change the proof much.

The main idea is to show that, as

n \to \infty

:

\begin{matrix} Q_{1} = | E \exp [i u T (n, 1)] - \prod_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \end{matrix}}^{r - 1} E \exp [i u U (1, n, j)] | \to 0, \end{matrix}

(A42)

\begin{matrix} Q_{2} = {(\sum_{i = 2}^{2^{N + 1}} T (n, i))}^{2} \to 0, \end{matrix}

(A43)

\begin{matrix} Q_{3} = \sum_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \end{matrix}}^{r - 1} E {[U (1, n, j)]}^{2} \to W_{g 0}^{+} Δ W_{g 0}^{+}, \end{matrix}

(A44)

\begin{matrix} Q_{4} = \sum_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \end{matrix}}^{r - 1} {E [(U (1, n, j))}^{2} I {| U (1, n, j) | > ε W_{g 0}^{+} Δ W_{g 0}^{+}}] \to 0 . \end{matrix}

(A45)

□

Proof of (A42)).

The proof is similar to the proof of (5.42) in [31]. □

Proof of (A43).

According to assumption (C1), if for each

2 \leq i \leq 2^{N + 1}, E {(T (n, i))}^{2} \to 0

exists, (A43) can be proven. Without loss of generality, it suffices to prove that, for

E {[T (n, 2)]}^{2} \to 0

. Enumerate the r.v’s

U (2, n, j)

in an arbitrary manner and refer to them as

{\hat{U}}_{1}, \dots, {\hat{U}}_{M},

where

M = r^{N + 1} = (\frac{n_{1}}{p_{1} + q}) \dots (\frac{n_{N + 1}}{p_{N + 1} + q})

. □

Now,

\begin{matrix} E {[T (n, 2)]}^{2} & = & \sum_{j = 1}^{M} V a r ({\hat{U}}_{i}) + 2 \sum_{\begin{matrix} i = 1 \\ i \end{matrix}}^{M} \sum_{\begin{matrix} j = 1 \\ \neq j \end{matrix}}^{M} C o v ({\hat{U}}_{i}, {\hat{U}}_{j}) \\ = & I_{1} + I_{2} . \end{matrix}

(A46)

If

I_{1} \to 0,

and

I_{2} \to 0

can be shown,

Q_{2} \to 0

is obvious.

\begin{matrix} V a r ▵_{i, t} & = & V a r (\frac{1}{\sqrt{n}} (H_{n} - E H_{n})) = \frac{1}{n} E {(H_{n} - E H_{n})}^{2} \\ = & \frac{1}{n} E W_{g 0}^{+} E {g^{'} {(β_{0}^{T} X)}^{2} W_{0} (X) σ_{β_{0}}^{2} (X)} W_{g 0}^{+} \end{matrix}

(A47)

\begin{matrix} \leq & \frac{c}{n} . \end{matrix}

(A48)

Similar to (A47), we have:

\begin{matrix} E ▵_{i, t} ▵_{j, τ} & = & \frac{1}{n} E W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} W_{g 0}^{+} g^{'} (β_{0}^{T} X_{j, τ}) v_{β_{0}} (X_{j, τ}) ε_{j, τ} \end{matrix}

(A49)

\begin{matrix} \leq & c \frac{1}{n} φ (∥ i - j ∥, | t - {τ |)}^{\frac{δ}{2 + δ}} . \end{matrix}

(A50)

Then, by Equations (A47) and (A49), we have:

\begin{matrix} V a r ({\hat{U}}_{i}) & = & V a r (\sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \end{matrix}}^{p_{k}} \sum_{t = 1}^{q} ▵_{i, t}) \\ \leq & \sum_{\begin{matrix} i_{k} = 1 \\ k \leq 1, \dots, N \end{matrix}}^{p_{k}} \sum_{t = 1}^{q} V a r ▵_{i, t} + \sum_{\begin{matrix} j_{k} = 1 \\ k = 1, \dots, N \\ i_{k} \neq j_{k} f o r \end{matrix}}^{p_{k}} \sum_{\begin{matrix} τ = 1 \\ s o m e \end{matrix}}^{q} \sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \\ 1 \leq k \leq N o r \end{matrix}}^{p_{k}} \sum_{\begin{matrix} t = 1 \\ t \neq τ \end{matrix}}^{q} | E ▵_{j, τ} ▵_{i, t} | \\ \leq & \sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \end{matrix}}^{p_{k}} \sum_{t = 1}^{q} V a r ▵_{i, t} + \sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \end{matrix}}^{p_{k}} \sum_{\begin{matrix} t = 1 \\ t \neq \end{matrix}}^{q} \sum_{\begin{matrix} τ = 1 \\ τ \end{matrix}}^{q} | E ▵_{i, τ} ▵_{i, t} | \\ + & \sum_{\begin{matrix} i_{k} = 1 \\ f o r s o m e \end{matrix}}^{p_{k}} \sum_{\begin{matrix} j_{k} \neq i_{k} \\ k = 1, \dots, N \end{matrix}}^{p_{k}} \sum_{\begin{matrix} t = 1 \\ t \neq \end{matrix}}^{q} \sum_{\begin{matrix} τ = 1 \\ τ \end{matrix}}^{q} | E ▵_{j, τ} ▵_{i, t} | + \sum_{t = 1}^{q} \sum_{\begin{matrix} i_{k} = 1 \\ f o r s o m e \end{matrix}}^{p_{k}} \sum_{\begin{matrix} j_{k} \neq i_{k} \\ k = 1, \dots, N \end{matrix}}^{p_{k}} | E ▵_{j, t} ▵_{i, t} | . \end{matrix}

By Lemma A9, the above inequality continues as:

\begin{matrix} \leq \frac{c p_{1} \dots p_{N} q}{n} + \frac{c p_{1} \dots p_{N} q}{n} [\sum_{t = 1}^{\infty} φ_{2} {(t)}^{\frac{δ}{2 + δ}} + \sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \end{matrix}}^{\infty} φ_{1} (∥ i ∥)) + q \sum_{\begin{matrix} i_{k} = 1 \\ k = 1, \dots, N \end{matrix}}^{\infty} φ_{1} (∥ i ∥)] \\ \leq \frac{c p_{1} \dots p_{N} q}{n} [1 + \sum_{t = 1}^{\infty} φ_{2} {(t)}^{\frac{δ}{2 + δ}} + q \sum_{i = 1}^{\infty} i^{N} φ_{1} {(i)}^{\frac{δ}{2 + δ}}] \leq \frac{c p_{1} \dots p_{N} q^{2}}{n} . \end{matrix}

Consequently,

I_{1} \leq M \frac{c p_{1} \dots p_{N} q^{2}}{n^{*}} = \frac{n^{*}}{(p_{1} + q) \dots (p_{N + 1} + q)} \cdot \frac{c p_{1} \dots p_{N} q^{2}}{n^{*}} \leq \frac{q^{2}}{p_{N + 1}},

which tends to zero by the definition of

p_{k} = O {(\log n_{k})}^{\frac{1}{8 (N + 1)}}

and

q = O {(\log n_{N + 1})}^{\frac{1}{16 (N + 1)}}

.

Let:

\begin{matrix} I (2, n, j) = {(i, t) : & j_{k} (p_{k} + q) + 1 \leq i_{k} \leq j_{k} (p_{k} + q) + p_{k}, 1 \leq k \leq N, \\ j_{N + 1} (p_{N + 1} + q) + p_{N + 1} + 1 \leq t \leq (j_{N + 1} + 1) (p_{N + 1} + q)} . \end{matrix}

Then,

U (2, n, j)

is the sum of

▵_{i, t}

with sites in

I (2, n, j)

. Since

p_{k} > q

, if

j

and

j^{'}

belong to the two distinct sets

I (2, n, j)

and

I (2, n, j^{'})

, then

j_{k} \neq j_{k}^{'}

for some

1 \leq k \leq N + 1

, and

∥ j - j^{'} ∥ \geq q

. To simplify, denoting

p = p_{1} \dots p_{N}

, then we obtain:

I_{2} \leq c p^{2} q^{2} {\sum_{t = τ} \sum_{∥ i - j ∥ \geq q} | E ▵_{j, τ} ▵_{i, t} | + \sum_{i = j} \sum_{| t - τ | \geq q} | E ▵_{j, τ} ▵_{i, t} | + \sum_{| t - τ | \geq q} \sum_{∥ i - j ∥ \geq q} | E ▵_{j, τ} ▵_{i, t} |};

by Lemma A9, we have the following equation:

\begin{matrix} \leq \frac{c p^{2} q^{2}}{n^{*}} {n_{N + 1} \sum_{\begin{matrix} k = 1 \\ k q \leq ∥ i - j ∥ < (k + 1) q \end{matrix}}^{\infty} φ_{1} (∥ i - {j ∥)}^{\frac{δ}{2 + δ}} + n_{1} \dots n_{N} \\ \cdot \sum_{\begin{matrix} k = 1 \\ k q \leq | t - τ | < (k + 1) q \end{matrix}}^{\infty} φ_{2} (| t - {τ |)}^{\frac{δ}{2 + δ}} + n_{N + 1}^{2} \sum_{\begin{matrix} k = 1 \\ k q \leq ∥ i - j ∥ < (k + 1) q \end{matrix}}^{\infty} φ_{1} (∥ i - {j ∥)}^{\frac{δ}{2 + δ}}} \\ \leq \frac{c p^{2} q^{2}}{n^{*}} [n_{1} \dots n_{N} \sum_{k = 1}^{\infty} φ_{2} {(k q)}^{\frac{δ}{2 + δ}} + (n_{N + 1} + n_{N + 1}^{2}) \sum_{k = 1}^{\infty} k^{N} φ_{1} {(k q)}^{\frac{δ}{2 + δ}}] \\ \leq \frac{c p^{2} q^{2}}{n_{N + 1}}, \end{matrix}

which tends to 0 by the definition of

p_{k}

and q. Hence, (A43) holds.

Proof of (A44).

Let

S_{n} = \sum_{i = 1}^{2^{N + 1}} T (n, i)

,

S_{n}^{^{'}} = T (n, 1)

and

S_{n}^{^{″}} = \sum_{i = 2}^{2^{N + 1}} T (n, i)

. Then,

S_{n}^{^{'}}

is the sum of

▵_{i, t}

terms over the “large” blocks, and

S_{n}^{^{″}}

is over the "small" ones. Lemmas A11 and A12 imply that

E S_{n}^{2} \to W_{g 0}^{+} Δ W_{g 0}^{+}

; this, combined with (A43), entails

E S_{n}^{^{'} 2} \to W_{g 0}^{+} Δ W_{g 0}^{+}

. Now,

E S_{n}^{^{'} 2} = Q_{3} + \sum_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \\ i_{k} \neq j_{k} f o r \end{matrix}}^{r - 1} \sum_{\begin{matrix} i_{k} = 0 \\ k = 1, \dots, N + 1 \\ s o m e k \end{matrix}}^{r - 1} C o v (U (1, n, j), U (1, n, i)) .

(A51)

If the last term of Equation (A51) tends to zero as

n \to \infty

, then Equation (A44) can be obtained. By the same argument used in obtaining

I_{2} \to 0

, the last term of (A51) is bounded by:

\begin{matrix} c {(p_{1} \dots p_{N + 1})}^{2} {\sum_{t = τ} \sum_{∥ i - j ∥ \geq q} | E ▵_{j, τ} ▵_{i, t} | + \sum_{i = j} \sum_{| t - τ | \geq q} | E ▵_{j, τ} ▵_{i, t} | \\ + \sum_{| t - τ | \geq q} \sum_{∥ i - j ∥ \geq q} | E ▵_{j, τ} ▵_{i, t} |} \\ \leq & \frac{c {(p_{1} \dots p_{N + 1})}^{2}}{n} [n_{1} \dots n_{N} \sum_{k = 1}^{\infty} φ_{2} {(k q)}^{\frac{δ}{2 + δ}} + (n_{N + 1} + n_{N + 1}^{2}) \sum_{k = 1}^{\infty} k^{N} φ_{1} {(k q)}^{\frac{δ}{2 + δ}}] \\ \leq & \frac{c {(p_{1} \dots p_{N + 1})}^{2}}{n_{N + 1}}, \end{matrix}

which tends to 0 by the assumptions and the definition of

p_{k}

and q. □

Proof of (A45).

We need a truncation argument, so set

L = L_{n} = O {(n)}^{\frac{1}{5}}

,

H_{n} = W_{g 0}^{+} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t}

,

▵_{i, t}^{L} = \frac{1}{\sqrt{n}} [H_{n}^{L} - E H_{n}^{L}]

, and define

U^{L} (1, n, j) = \sum_{\begin{matrix} i_{k} = j_{k} (p_{k} + q) + 1 \\ k = 1, \dots, N \end{matrix}}^{j_{k} (p_{k} + q) + p_{k}} \sum_{t = j_{N + 1} (p_{N + 1} + q) + 1}^{j_{N + 1} (p_{N + 1} + q) + p_{N + 1}} ▵_{i, t}^{L}

. Set

Q_{4}^{L} = \sum_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \end{matrix}}^{r - 1} E [{(U^{L} (1, n, j))}^{2} I {| U^{L} (1, n, j) | > ε W_{g 0}^{+} Δ W_{g 0}^{+}}] .

Clearly,

| ▵_{i, t} |^{L} \leq \frac{c L}{{(n)}^{\frac{1}{2}}}

; therefore,

| U^{L} (1, n, j) | \leq \frac{c L (p_{1} \dots p_{N + 1})}{{(n)}^{\frac{1}{2}}}

. Hence,

Q_{4}^{L} \leq \frac{c L^{2} {(p_{1} \dots p_{N + 1})}^{2}}{n} \sum_{\begin{matrix} j_{k} = 0 \\ k = 1, \dots, N + 1 \end{matrix}}^{r - 1} P (| U^{L} (1, n, j) | > ε σ^{2} W_{g 0}^{+} Δ W_{g 0}^{+}) .

Now,

| U^{L} (1, n, j) | \leq \frac{c L (p_{1} \dots p_{N + 1})}{{(n)}^{\frac{1}{2}}} \to 0

by the definition of

p_{k} = O {(\log n_{k})}^{\frac{1}{8 (N + 1)}}

. Thus,

P (| U^{L} (1, n, j) | > ε σ^{2} W_{g 0}^{+} Δ W_{g 0}^{+}) = 0

at all

j

for sufficiently large n, so

Q_{4}^{L} = 0

for large n. Hence,

\sum ▵_{i, t}^{L} \overset{L}{⟶} N (0, σ^{2} C^{T} Σ C) .

(A52)

Define

S^{*} = \sum (▵_{i, t} - ▵_{i, t}^{L})

, and we know that

\frac{1}{\sqrt{n}} (H_{n} - E H_{n}) = S^{*} + \sum ▵_{i, t}^{L}

. By Equation (A52), to prove the lemma, it suffices to prove

E S^{* 2} \to 0

. Similar to Lemma A12, we can show that

E S^{* 2} \to 0

. According to the above lemmas, we have:

\sqrt{n} (β^{(k)} - β_{0}) - E [\sqrt{n} (β^{(k)} - β_{0})] \overset{D}{⟶} N (0, W_{g 0}^{+} Δ W_{g 0}^{+}) .

(A53)

Additionally, by Lemma A11, we have

E [\sqrt{n} (β^{(k)} - β_{0})] \overset{P}{⟶} 0

, and then we obtain

\sqrt{n} (β^{(k)} - β_{0}) \overset{D}{⟶} N (0, W_{g 0}^{+} Δ W_{g 0}^{+}) .

□

Proof of Theorem 4.

By Theorems 1 and 2, similar to Lemma A5, under conditions (C1)–(C4), we have:

\begin{matrix} {\{n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\tilde{ρ}}_{j, τ}^{β} {(b_{j, τ}^{β})}^{2} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) X_{(i, t), (j, τ)} X_{(i, t), (j, τ)}^{T} / {\hat{f}}_{β} (β^{T} X_{j, τ})\}}^{- 1} \\ = & β_{0} β_{0}^{T} {[E {g^{'} {(β_{0}^{T} X)}^{2} / σ_{β_{0}}^{2} (X)}]}^{- 1} h_{n}^{- 2} - \frac{1}{2} {[E {g^{'} {(β_{0}^{T} X)}^{2} / σ_{β_{0}}^{2} (X)}]}^{- 1} (β_{0} {\tilde{F}}^{T} {\tilde{W}}_{g 0}^{+} + {\tilde{W}}_{g 0}^{+} \tilde{F} β_{0}^{T}) \\ + \frac{1}{2} {\tilde{W}}_{g 0}^{+} + O_{P} {(κ_{n} h_{n} n^{ϵ} + δ_{β} / h_{n}^{2}) E_{d}}, \end{matrix}

where

\tilde{F} = E {{[g^{'} (β_{0}^{T} X)]}^{2} / σ_{β_{0}}^{2} (X) {(f_{β_{0}} (β_{0}^{T} X) v_{β_{0}} (X))}^{'} / f_{β_{0}} (β_{0}^{T} X)}

. Similar to Lemma A6, under conditions (C1)–(C4), we have:

\begin{matrix} n^{- 2} \sum_{\begin{matrix} j \in I_{n} \\ τ \in T_{n} \end{matrix}} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} {\tilde{ρ}}_{j, τ}^{β} K_{h_{n}} (β^{T} X_{(i, t), (j, τ)}) b_{j, τ}^{β} X_{(i, t), (j, τ)} (Y_{i, t} - a_{j, τ}^{β} - b_{j, τ}^{β} β_{0}^{T} X_{(i, t), (j, τ)}) / {\hat{f}}_{β} (β^{T} X_{j, τ}) \\ = & {\tilde{W}}_{g 0} (β - β_{0}) + n^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} / σ_{β_{0}}^{2} (X_{i, t}) + o_{P} (n^{- 1 / 2}) . \end{matrix}

Similar to Theorem 3,

β_{0}^{T} {\tilde{W}}_{g 0} = 0

,

{\tilde{W}}_{g 0}^{+} {\tilde{W}}_{g 0} = I - β_{0} β_{0}^{T}

, recursing the preceding equation, and we have, as the iteration

k \to \infty

:

\begin{matrix} β^{(k)} & = & β_{0} + \frac{1}{2^{k - 1}} (I - β_{0} β_{0}^{T}) (β^{(1)} - β_{0}) + (1 - \frac{1}{2^{k - 1}}) \frac{1}{n} {\tilde{W}}_{g 0}^{+} \\ \cdot & \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) ε_{i, t} / σ_{β_{0}} (X_{i, t}) \\ \to & β_{0} + \frac{1}{n} {\tilde{W}}_{g 0}^{+} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t}) / σ_{β_{0}} (X_{i, t}) \cdot ε_{i, t} / σ_{β_{0}} (X_{i, t}) . \end{matrix}

Similar to the proof of Theorem 3, as

n \to \infty

, Theorem 4 holds. □

Proof of Theorem 5.

Set

η_{i, t} = g^{'} (β_{0}^{T} X_{i, t}) v_{β_{0}} (X_{i, t})

,

Σ = d i a g {σ_{β_{0}}^{2} (X_{1}), \dots, σ_{β_{0}}^{2} (X_{n})}

,

η = (η_{1}^{T}, η_{2}^{T}, \dots, η_{n}^{T})

. According to the definitions of

W_{g 0}, Δ

, and

{\tilde{W}}_{g 0}

, we have:

{\tilde{W}}_{g 0}^{+} = lim_{n \to + \infty} {(\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} η_{i, t} η_{i, t}^{T} σ_{β_{0}}^{- 2} (X_{n}))}^{+} = lim_{n \to + \infty} n {(η^{T} Σ^{- 1} η)}^{+},

and

\begin{matrix} W_{g 0}^{+} Δ W_{g 0}^{+} & = & lim_{n \to + \infty} {(\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} η_{i, t} η_{i, t}^{T})}^{+} lim_{n \to + \infty} (\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} η_{i, t} η_{i, t}^{T} σ_{β_{0}}^{2} (X_{n})) lim_{n \to + \infty} {(\frac{1}{n} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} η_{i, t} η_{i, t}^{T})}^{+} \\ = & n {(η^{T} η)}^{+} (η^{T} Σ η) {(η^{T} η)}^{+} . \end{matrix}

For any d-dimensional vector

c \in R^{d}

, let

{∥ c ∥}_{Σ} = c^{T} Σ c

. Since

Σ

is a positive matrix, then

{∥ \cdot ∥}_{Σ}

is a norm of space

R^{p}

. Therefore, we have:

\begin{matrix} c^{T} {(η^{T} η)}^{+} (η^{T} Σ η) {(η^{T} η)}^{+} c = {∥ η {(η^{T} η)}^{+} c ∥}_{Σ} \\ = & ∥ η {(η^{T} η)}^{+} c - Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} c + Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} {c ∥}_{Σ} \\ = & ∥ η {(η^{T} η)}^{+} c - Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} {c ∥}_{Σ} + {∥ Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} c ∥}_{Σ} \\ + 2 c^{T} {(η^{T} Σ^{- 1} η)}^{+} η^{T} Σ^{- 1} Σ [η {(η^{T} η)}^{+} c - Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} c] \\ = & ∥ η {(η^{T} η)}^{+} c - Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} {c ∥}_{Σ} + {∥ Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} c ∥}_{Σ} \\ \geq & ∥ Σ^{- 1} η {(η^{T} Σ^{- 1} η)}^{+} {c ∥}_{Σ} = c^{T} {(η^{T} Σ^{- 1} η)}^{+} c . \end{matrix}

By the universality of the column vector c, we can obtain Theorem 5. □

Proof of Theorem 6.

For simplicity, set

v = β_{0}^{T} x

, by Lemma A4; then, we have:

{\hat{g}}_{n} (v) = a_{x}^{β_{0}} = g (v) + \frac{1}{2} g^{″} (v) h_{n}^{2} + Θ_{n, 1} (x) + O_{P} {(h_{n} κ_{n} + δ_{β}^{2}) n^{ϵ}} {(1 + ∥ x ∥}^{4}) .

Thus,

\sqrt{n h_{n}} \{{\hat{g}}_{n} (v) - g (v) - \frac{h_{n}^{2}}{2} g^{″} (v) + O_{p} (h_{n}^{2})\} = \sqrt{h_{n}} {(\sqrt{n} f_{β_{0} (v)})}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β_{0}^{T} X_{(i, t) x}) ε_{i, t} .

After some simple computations, by Lemma A1, we have:

E [\sqrt{h_{n}} {(\sqrt{n} f_{β_{0} (v)})}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β_{0}^{T} X_{(i, t) x}) ε_{i, t}] = 0,

V a r [\sqrt{h_{n}} {(\sqrt{n} f_{β_{0} (v)})}^{- 1} \sum_{\begin{matrix} i \in I_{n} \\ t \in T_{n} \end{matrix}} K_{h_{n}} (β_{0}^{T} X_{(i, t) x}) ε_{i, t}] = \frac{σ_{β_{0}}^{2} (x)}{f_{β_{0} (v)}} + o (1) .

Similar to the proof of Theorem 3, as

n \to \infty

, Theorem 6 holds. □

References

Fotheringham, A.S.; Yang, W.; Kang, W. Multiscale geographically weighted regression (MGWR). Ann. Am. Assoc. Geogr. 2017, 107, 1247–1265. [Google Scholar] [CrossRef]
Wu, C.; Ren, F.; Hu, W.; Du, Q. Multiscale geographically and temporally weighted regression: Exploring the spatio-temporal determinants of housing prices. Int. J. Geogr. Inf. Sci. 2019, 33, 489–511. [Google Scholar] [CrossRef]
Chu, H.J.; Huang, B.; Lin, C.Y. Modeling the spatio-temporal heterogeneity in the PM₁₀-PM_2.5 relationship. Atmos. Environ. 2015, 102, 176–182. [Google Scholar] [CrossRef]
Yuan, Z.; Zhou, X.; Yang, T. Hetero-ConvLSTM: A deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 984–992. [Google Scholar]
Wang, H.; Wang, J.; Huang, B. Prediction for spatio-temporal models with autoregression in errors. J. Nonparametric Stat. 2012, 24, 1–28. [Google Scholar] [CrossRef]
Wang, H.; Zhao, Z.; Wu, Y.; Luo, X. B-spline method for spatio-temporal inverse model. J. Syst. Sci. Complex. 2022, 35, 2336–2360. [Google Scholar] [CrossRef]
Římalovȧ, V.; Fišerováa, E.; Menafogliob, A.; Pinic, A. Inference for spatial regression models with functional response using a permutational approach. J. Multivar. Anal. 2022, 189, 104893. [Google Scholar] [CrossRef]
Lin, W.; Kulasekera, K.B. Identifiability of single-index models and additive-index models. Biometrika 2007, 94, 496–501. [Google Scholar] [CrossRef]
Stoker, T.M. Consistent estimation of scaled coefficients. Econometrica 1986, 54, 1461–1481. [Google Scholar] [CrossRef]
Powell, J.L.; Stock, J.H.; Stoker, T.M. Semiparametric estimation of index coefficients. Econometrica 1989, 57, 1403–1430. [Google Scholar] [CrossRef]
Xia, Y. Asymptotic distributions for two estimators of the single-index model. Econom. Theory 2006, 22, 1112–1137. [Google Scholar] [CrossRef]
Xia, Y. A constructive approach to the estimation of dimension reduction directions. Ann. Stat. 2007, 35, 2654–2690. [Google Scholar] [CrossRef]
Härdle, W.; Hall, H.; Ichimura, H. Optimal smoothing in single-index models. Ann. Stat. 1993, 21, 157–178. [Google Scholar] [CrossRef]
Li, G.; Peng, H.; Dong, K.; Tong, T. Simultaneous confidence bands and hypothesis testing for single-index models. Stat. Sin. 2014, 24, 937–955. [Google Scholar] [CrossRef]
Fan, Y.; James, G.M.; Radchenko, P. Functional additive regression. Ann. Stat. 2015, 43, 2296–2325. [Google Scholar] [CrossRef]
Xue, L. Estimation and empirical likelihood for single-index models with missing data in the covariates. Comput. Stat. Data Anal. 2013, 52, 1458–1476. [Google Scholar] [CrossRef]
Xue, L.; Zhu, L. Empirical likelihood for single-index model. J. Multivar. Anal. 2006, 97, 1295–1312. [Google Scholar] [CrossRef]
Cook, R.D.; Li, B. Dimension reduction for conditional mean in regression. Ann. Stat. 2002, 30, 455–474. [Google Scholar] [CrossRef]
Zhu, L.; Ferré, L.; Wang, T. Sufficient dimension reduction through discretization-expectation estimation. Biometrika 2010, 97, 295–304. [Google Scholar] [CrossRef]
Ma, Y.; Zhu, L. A review on dimension reduction. Int. Stat. Rev. 2013, 81, 134–150. [Google Scholar] [CrossRef]
Zhao, Y.; Li, J.; Wang, H.; Zhao, H.; Chen, X. Efficient estimation in heteroscedastic single-index models. J. Nonparametric Stat. 2021, 33, 273–298. [Google Scholar] [CrossRef]
Horowitz, J.L.; Härdle, W. Direct semiparametric estimation of single-index models with discrete covariates. J. Am. Stat. Assoc. 1996, 91, 1632–1640. [Google Scholar] [CrossRef]
Ichimura, H. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econom. 1993, 58, 71–120. [Google Scholar] [CrossRef]
Cressie, N.A.C. Statistics for Spatial Data; John Wiley & Sons: New York, NY, USA, 1993. [Google Scholar]
Mack, Y.P.; Silverman, B.W. Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie Und Verwandte Geb. 1982, 61, 405–415. [Google Scholar] [CrossRef]
Liebscher, E. Strong convergence of sums of α-mixing random variables with applications to density estimation. Stoch. Process. Their Appl. 1996, 65, 69–80. [Google Scholar] [CrossRef]
Schott, J.R. Matrix Analysis for Statistics; Wiley: Hoboken, NJ, USA, 1997. [Google Scholar]
Chow, Y.S.; Teicher, H. Probability Theory: Independence, Interchangeability, Martin-Gales; Springer: Berlin/Heidelberg, Germany, 1978; p. 122. [Google Scholar]
Lu, Z.; Arvid, L.; Dag, T.; Yao, Q. Exploring spatial nonlinearity using additive approximation. Bernoulli 2007, 13, 447–472. [Google Scholar] [CrossRef]
Wang, H.; Wang, J. Estimation of the trend function for spatio-temporal models. J. Nonparametric Stat. 2009, 21, 567–588. [Google Scholar] [CrossRef]
Lu, Z.; Linton, O. Local linear fitting under near epoch dependence. Econ. Theory 2007, 23, 37–70. [Google Scholar] [CrossRef]

Figure 1. Air pollution data fitting.

Table 1. Sample means, standard deviations, and relative efficiencies in Example 1 when

n = 8 \times 8 \times 8

.

Table 1. Sample means, standard deviations, and relative efficiencies in Example 1 when

n = 8 \times 8 \times 8

.

Item	Estimator	$θ_{0} = 1.0$	$θ_{0} = 1.5$	$θ_{0} = 2.0$
Mean	${\hat{β}}_{1, n}$	0.2793	0.2745	0.2615
	${\hat{β}}_{1, R}^{F N}$	0.2761	0.2772	0.2578
	${\hat{β}}_{1, R}^{D R}$	0.2779	0.2793	0.2600
	${\hat{β}}_{2, n}$	0.5487	0.5582	0.5291
	${\hat{β}}_{2, R}^{F N}$	0.5358	0.5390	0.5289
	${\hat{β}}_{2, R}^{D R}$	0.5420	0.5438	0.5277
	${\hat{β}}_{3, n}$	0.7858	0.7784	0.7933
	${\hat{β}}_{3, R}^{F N}$	0.7963	0.7941	0.7947
	${\hat{β}}_{3, R}^{D R}$	0.7916	0.7892	0.7937
	${\hat{β}}_{4, n}$	−0.007	−0.0070	0.0015
	${\hat{β}}_{4, R}^{F N}$	−0.0004	−0.0055	0.0009
	${\hat{β}}_{4, R}^{D R}$	−0.0031	−0.0036	0.0017
SSD	${\hat{β}}_{1, n}$	0.0279	0.0362	0.2145
	${\hat{β}}_{1, R}^{F N}$	0.0318	0.0334	0.1653
	${\hat{β}}_{1, R}^{D R}$	0.0349	0.0421	0.1597
	${\hat{β}}_{2, n}$	0.0430	0.0626	0.2183
	${\hat{β}}_{2, R}^{F N}$	0.0235	0.0319	0.1642
	${\hat{β}}_{2, R}^{D R}$	0.0228	0.0283	0.1583
	${\hat{β}}_{3, n}$	0.0255	0.0386	0.2253
	${\hat{β}}_{3, R}^{F N}$	0.0125	0.0178	0.1633
	${\hat{β}}_{3, R}^{D R}$	0.0151	0.0190	0.1558
	${\hat{β}}_{4, n}$	0.0144	0.0189	0.2306
	${\hat{β}}_{4, R}^{F N}$	0.0063	0.0113	0.1668
	${\hat{β}}_{4, R}^{D R}$	0.0064	0.0081	0.1572
SRE	${\hat{β}}_{1, n}$	0.6758	0.6974	2.5971
	${\hat{β}}_{1, R}^{F N}$	0.7974	0.6210	1.4813
	${\hat{β}}_{1, R}^{D R}$	0.9713	0.9779	1.3349
	${\hat{β}}_{2, n}$	4.1839	6.5845	2.6142
	${\hat{β}}_{2, R}^{F N}$	1.1334	1.5277	1.4925
	${\hat{β}}_{2, R}^{D R}$	1.1822	1.3079	1.3617
	${\hat{β}}_{3, n}$	3.0887	4.7931	2.7372
	${\hat{β}}_{3, R}^{F N}$	0.6304	0.8887	1.6809
	${\hat{β}}_{3, R}^{D R}$	1.1264	1.2255	1.6126
	${\hat{β}}_{4, n}$	4.0594	4.0605	2.6925
	${\hat{β}}_{4, R}^{F N}$	0.6396	1.5820	1.6784
	${\hat{β}}_{4, R}^{D R}$	0.8161	0.8023	1.6117

Table 2. Sample means, standard deviations, and relative efficiencies in Example 1 when

n = 10 \times 10 \times 10

.

Table 2. Sample means, standard deviations, and relative efficiencies in Example 1 when

n = 10 \times 10 \times 10

.

Item	Estimator	$θ_{0} = 1.0$	$θ_{0} = 1.5$	$θ_{0} = 2.0$
Mean	${\hat{β}}_{1, n}$	0.2602	0.2558	0.2642
	${\hat{β}}_{1, R}^{F N}$	0.2563	0.2587	0.2577
	${\hat{β}}_{1, R}^{D R}$	0.2516	0.2548	0.2644
	${\hat{β}}_{2, n}$	0.5345	0.5501	0.5096
	${\hat{β}}_{2, R}^{F N}$	0.5319	0.5423	0.5038
	${\hat{β}}_{2, R}^{D R}$	0.5391	0.5427	0.5312
	${\hat{β}}_{3, n}$	0.8029	0.7912	0.8169
	${\hat{β}}_{3, R}^{F N}$	0.8052	0.7998	0.8227
	${\hat{β}}_{3, R}^{D R}$	0.8032	0.7993	0.8044
	${\hat{β}}_{4, n}$	−0.0113	0.0493	−0.0018
	${\hat{β}}_{4, R}^{F N}$	−0.0059	0.0161	−0.0094
	${\hat{β}}_{4, R}^{D R}$	−0.0039	0.0229	−0.0028
SSD	${\hat{β}}_{1, n}$	0.0306	0.0446	0.0256
	${\hat{β}}_{1, R}^{F N}$	0.0238	0.0186	0.0346
	${\hat{β}}_{1, R}^{D R}$	0.0185	0.0125	0.0138
	${\hat{β}}_{2, n}$	0.0260	0.0179	0.0323
	${\hat{β}}_{2, R}^{F N}$	0.0142	0.0133	0.0278
	${\hat{β}}_{2, R}^{D R}$	0.0148	0.0141	0.0198
	${\hat{β}}_{3, n}$	0.0158	0.0214	0.0226
	${\hat{β}}_{3, R}^{F N}$	0.0041	0.0090	0.0257
	${\hat{β}}_{3, R}^{D R}$	0.0088	0.0120	0.0135
	${\hat{β}}_{4, n}$	0.0210	0.0305	0.0219
	${\hat{β}}_{4, R}^{F N}$	0.0207	0.0252	0.0214
	${\hat{β}}_{4, R}^{D R}$	0.0224	0.0212	0.0203
SRE	${\hat{β}}_{1, n}$	1.6501	9.6640	3.6479
	${\hat{β}}_{1, R}^{F N}$	1.1514	1.9198	7.0692
	${\hat{β}}_{1, R}^{D R}$	0.9846	1.4218	1.0852
	${\hat{β}}_{2, n}$	3.3675	2.855	4.9831
	${\hat{β}}_{2, R}^{F N}$	1.0456	1.2121	5.1506
	${\hat{β}}_{2, R}^{D R}$	1.1884	1.3611	1.2101
	${\hat{β}}_{3, n}$	3.9555	5.358	4.8345
	${\hat{β}}_{3, R}^{F N}$	0.4494	0.7964	7.1304
	${\hat{β}}_{3, R}^{D R}$	1.2435	1.4159	1.2323
	${\hat{β}}_{4, n}$	1.0402	4.8466	1.1845
	${\hat{β}}_{4, R}^{F N}$	0.8493	1.2902	1.3420
	${\hat{β}}_{4, R}^{D R}$	0.9447	1.4113	1.0247

Table 3. Sample means, standard deviations, and relative efficiencies in Example 2 when

n = 8 \times 8 \times 8

.

Table 3. Sample means, standard deviations, and relative efficiencies in Example 2 when

n = 8 \times 8 \times 8

.

Item	Estimator	$θ_{0} = 1.0$	$θ_{0} = 1.5$	$θ_{0} = 2.0$
Mean	${\hat{β}}_{1, n}$	0.3180	0.3203	0.3091
	${\hat{β}}_{1, R}^{F N}$	0.3026	0.3060	0.3146
	${\hat{β}}_{1, R}^{D R}$	0.3013	0.2936	0.3070
	${\hat{β}}_{2, n}$	0.5422	0.5131	0.5647
	${\hat{β}}_{2, R}^{F N}$	0.5417	0.5147	0.5581
	${\hat{β}}_{2, R}^{D R}$	0.5412	0.5152	0.5587
	${\hat{β}}_{3, n}$	0.7760	0.7945	0.7602
	${\hat{β}}_{3, R}^{F N}$	0.7824	0.7973	0.7681
	${\hat{β}}_{3, R}^{D R}$	0.7842	0.8022	0.7635
	${\hat{β}}_{4, n}$	0.0114	−0.0101	0.0185
	${\hat{β}}_{4, R}^{F N}$	0.0083	0.0015	0.0301
	${\hat{β}}_{4, R}^{D R}$	0.0076	0.0014	0.0045
SSD	${\hat{β}}_{1, n}$	0.0239	0.0599	0.1064
	${\hat{β}}_{1, R}^{F N}$	0.0228	0.0576	0.1007
	${\hat{β}}_{1, R}^{D R}$	0.0205	0.0317	0.0672
	${\hat{β}}_{2, n}$	0.0127	0.0278	0.0515
	${\hat{β}}_{2, R}^{F N}$	0.0095	0.0235	0.0421
	${\hat{β}}_{2, R}^{D R}$	0.0082	0.0202	0.0115
	${\hat{β}}_{3, n}$	0.0081	0.0373	0.0762
	${\hat{β}}_{3, R}^{F N}$	0.0098	0.0350	0.0632
	${\hat{β}}_{3, R}^{D R}$	0.0069	0.0308	0.0360
	${\hat{β}}_{4, n}$	0.0509	0.0200	0.0337
	${\hat{β}}_{4, R}^{F N}$	0.0357	0.0133	0.0337
	${\hat{β}}_{4, R}^{D R}$	0.0344	0.0055	0.0089
SRE	${\hat{β}}_{1, n}$	1.8127	0.8880	0.7741
	${\hat{β}}_{1, R}^{F N}$	1.0206	1.1188	1.2239
	${\hat{β}}_{1, R}^{D R}$	0.9112	0.9941	0.9501
	${\hat{β}}_{2, n}$	0.6660	1.5348	0.8369
	${\hat{β}}_{2, R}^{F N}$	1.2903	1.1565	1.0343
	${\hat{β}}_{2, R}^{D R}$	0.9032	0.9960	0.9485
	${\hat{β}}_{3, n}$	1.7353	0.8257	1.5072
	${\hat{β}}_{3, R}^{F N}$	1.0690	1.1678	1.3741
	${\hat{β}}_{3, R}^{D R}$	0.9846	1.0108	0.9238
	${\hat{β}}_{4, n}$	1.8545	4.6052	3.8478
	${\hat{β}}_{4, R}^{F N}$	0.8656	4.2492	1.9594
	${\hat{β}}_{4, R}^{D R}$	0.9830	0.9644	1.3002

Table 4. Sample means, standard deviations, and relative efficiencies in Example 2 when

n = 10 \times 10 \times 10

.

Table 4. Sample means, standard deviations, and relative efficiencies in Example 2 when

n = 10 \times 10 \times 10

.

Item	Estimator	$θ_{0} = 1.0$	$θ_{0} = 1.5$	$θ_{0} = 2.0$
Mean	${\hat{β}}_{1, n}$	0.3033	0.3071	0.3112
	${\hat{β}}_{1, R}^{F N}$	0.3022	0.3011	0.3018
	${\hat{β}}_{1, R}^{D R}$	0.2986	0.2960	0.2952
	${\hat{β}}_{2, n}$	0.5532	0.5267	0.5474
	${\hat{β}}_{2, R}^{F N}$	0.5492	0.5288	0.5448
	${\hat{β}}_{2, R}^{D R}$	0.5449	0.5298	0.5380
	${\hat{β}}_{3, n}$	0.7700	0.8020	0.7742
	${\hat{β}}_{3, R}^{F N}$	0.7764	0.8017	0.7799
	${\hat{β}}_{3, R}^{D R}$	0.7781	0.8016	0.7810
	${\hat{β}}_{4, n}$	0.0552	−0.0040	−0.012
	${\hat{β}}_{4, R}^{F N}$	0.0468	−0.0033	−0.0089
	${\hat{β}}_{4, R}^{D R}$	0.0463	−0.0031	-0.0063
SSD	${\hat{β}}_{1, n}$	0.0296	0.0154	0.0150
	${\hat{β}}_{1, R}^{F N}$	0.0238	0.0146	0.0117
	${\hat{β}}_{1, R}^{D R}$	0.0225	0.0100	0.0098
	${\hat{β}}_{2, n}$	0.0209	0.0162	0.0132
	${\hat{β}}_{2, R}^{F N}$	0.0188	0.0125	0.0098
	${\hat{β}}_{2, R}^{D R}$	0.0146	0.0102	0.0026
	${\hat{β}}_{3, n}$	0.0088	0.0256	0.0087
	${\hat{β}}_{3, R}^{F N}$	0.0073	0.0230	0.0033
	${\hat{β}}_{3, R}^{D R}$	0.0023	0.0160	0.0010
	${\hat{β}}_{4, n}$	0.0582	0.0062	0.0920
	${\hat{β}}_{4, R}^{F N}$	0.0566	0.0017	0.0832
	${\hat{β}}_{4, R}^{D R}$	0.0508	0.0009	0.0708
SRE	${\hat{β}}_{1, n}$	0.9646	1.5420	1.7484
	${\hat{β}}_{1, R}^{F N}$	0.9738	0.9465	0.8725
	${\hat{β}}_{1, R}^{D R}$	0.9994	1.0162	1.1460
	${\hat{β}}_{2, n}$	1.1077	1.8686	1.9866
	${\hat{β}}_{2, R}^{F N}$	1.0915	1.4366	0.9473
	${\hat{β}}_{2, R}^{D R}$	1.0115	0.7647	0.9885
	${\hat{β}}_{3, n}$	1.3524	1.3291	1.5706
	${\hat{β}}_{3, R}^{F N}$	0.8750	1.2429	0.8803
	${\hat{β}}_{3, R}^{D R}$	0.9261	1.0108	1.1370
	${\hat{β}}_{4, n}$	1.0374	1.4991	0.7267
	${\hat{β}}_{4, R}^{F N}$	1.0193	0.8973	0.9798
	${\hat{β}}_{4, R}^{D R}$	0.9941	0.9644	1.1907

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, H.; Zhao, Z.; Hao, H.; Huang, C. Estimation and Inference for Spatio-Temporal Single-Index Models. Mathematics 2023, 11, 4289. https://doi.org/10.3390/math11204289

AMA Style

Wang H, Zhao Z, Hao H, Huang C. Estimation and Inference for Spatio-Temporal Single-Index Models. Mathematics. 2023; 11(20):4289. https://doi.org/10.3390/math11204289

Chicago/Turabian Style

Wang, Hongxia, Zihan Zhao, Hongxia Hao, and Chao Huang. 2023. "Estimation and Inference for Spatio-Temporal Single-Index Models" Mathematics 11, no. 20: 4289. https://doi.org/10.3390/math11204289

APA Style

Wang, H., Zhao, Z., Hao, H., & Huang, C. (2023). Estimation and Inference for Spatio-Temporal Single-Index Models. Mathematics, 11(20), 4289. https://doi.org/10.3390/math11204289

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Estimation and Inference for Spatio-Temporal Single-Index Models

Abstract

1. Introduction

2. A Brief Description of the rMAVE

3. Estimation of the Variance Function

3.1. Estimation of the Variance Function with Fully Nonparametric Function

3.2. Estimation of the Variance Function with Dimension Reduction Structure

4. Reweighting Estimation and Asymptotic Properties

4.1. Reweighting Estimation

4.2. Asymptotic Properties

5. Monte Carlo Study

6. Real Data Analysis

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Assumptions

Appendix B. Proof

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI