Abstract
Functional data, which provide information about curves, surfaces or anything else varying over a continuum, have become a commonly encountered type of data. The k-nearest neighbor (kNN) method, as a nonparametric method, has become one of the most popular supervised machine learning algorithms for solving both classification and regression problems. This paper is devoted to the kNN estimator of the nonparametric functional regression model when the observed variables take values from negatively associated (NA) sequences. The uniform almost complete convergence rate of the proposed kNN estimator is first established. Then, numerical assessments, including a simulation study and a real data analysis, are conducted to evaluate the performance of the proposed method and compare it with the standard nonparametric kernel approach.
Keywords:
convergence rate; NA samples; functional data; nonparametric regression model; k-nearest neighbor estimator
MSC:
62G08; 62G20
1. Introduction
Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function.
Popularized by Ramsay and Silverman [1,2], functional data analysis has attracted considerable research interest because of its wide applications in many practical fields, such as medicine, economics and linguistics. For an introduction to the topic, we refer to the monographs of Ramsay and Silverman [3] for parametric models and Ferraty and Vieu [4] for nonparametric models.
In this paper, the following functional nonparametric regression model is considered:
$$Y = r(\chi) + \varepsilon, \qquad (1)$$
where $Y$ is a scalar response variable, $\chi$ is a covariate taking values in a subset $\mathcal{S}_{\mathcal{F}}$ of an infinite-dimensional functional space $\mathcal{F}$ endowed with a semi-metric $d(\cdot, \cdot)$, $r(\cdot)$ is the unknown regression operator from $\mathcal{F}$ to $\mathbb{R}$, and the random error $\varepsilon$ satisfies $E(\varepsilon \mid \chi) = 0$ almost surely.
For the estimation of model (1), Ferraty and Vieu [5] investigated the classical functional Nadaraya-Watson (N-W) kernel-type estimator of $r(\cdot)$ and obtained its asymptotic properties with rates in the case of $\alpha$-mixing functional data. Ling and Wu [6] studied a modified N-W kernel estimate and derived its asymptotic distribution for strong mixing functional time series data, and Baíllo and Grané [7] proposed a functional local linear estimate based on the local linear idea. In this paper, we focus on the k-nearest neighbors (kNN) method for regression model (1). The kNN method, one of the simplest and most traditional nonparametric techniques, is often used as a nonparametric classification method. It was first developed by Evelyn Fix and Joseph Hodges in 1951 [8] and then expanded by Thomas Cover [9]. In kNN regression, the input consists of the k closest training examples in a dataset, whereas the output is the property value for the object, namely the average of the values of its k nearest neighbors. Under independent samples, research in kNN regression mostly focuses on the estimation of the continuous regression function $r(\cdot)$. For example, Burba et al. [10] investigated the kNN estimator based on the idea of a locally adaptive bandwidth for functional explanatory variables. The papers [11,12,13,14,15,16,17,18], among others, obtained the asymptotic behavior of nonparametric regression estimators for functional data in independent and dependent cases. Further, Kudraszow and Vieu [19] obtained asymptotic results for a kNN generalized regression estimator when the observed variables take values in an abstract space, and Kara-Zaitri et al. [20] provided an asymptotic theory for several different target operators, including regression, conditional density, conditional distribution and hazard operators, together with some simulated experiments. However, functional observations often exhibit correlation, including some forms of negative dependence or negative association.
Negatively associated (NA) sequences were introduced by Joag-Dev and Proschan in [21]. Random variables $X_1, X_2, \ldots, X_n$, $n \ge 2$, are said to be NA if for every pair of disjoint subsets $A_1, A_2$ of $\{1, 2, \ldots, n\}$,
$$\mathrm{Cov}\bigl( f(X_i,\, i \in A_1),\; g(X_j,\, j \in A_2) \bigr) \le 0,$$
or equivalently,
$$E\bigl[ f(X_i,\, i \in A_1)\, g(X_j,\, j \in A_2) \bigr] \le E\bigl[ f(X_i,\, i \in A_1) \bigr]\, E\bigl[ g(X_j,\, j \in A_2) \bigr],$$
where $f$ and $g$ are coordinatewise non-decreasing functions such that this covariance exists. An infinite sequence $\{X_n,\, n \ge 1\}$ is NA if every finite subcollection is NA.
For example, if $(X_1, X_2, \ldots, X_n)$ follows a permutation distribution, that is, it takes the $n!$ permutations of $(x_1, x_2, \ldots, x_n)$ with equal probability $1/n!$, where $x_1, x_2, \ldots, x_n$ are $n$ real numbers, then $X_1, X_2, \ldots, X_n$ are NA.
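To make the definition concrete, the following minimal Python sketch (an illustration added here, not part of the original analysis) checks the covariance inequality empirically for a Gaussian vector with non-positive off-diagonal covariances, a classical NA example from Joag-Dev and Proschan [21]:

```python
import numpy as np

rng = np.random.default_rng(0)

# A Gaussian vector with non-positive off-diagonal covariances is NA
# (Joag-Dev and Proschan [21]); here every pair has correlation -0.2.
cov = np.array([[1.0, -0.2, -0.2],
                [-0.2, 1.0, -0.2],
                [-0.2, -0.2, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=200_000)

# Disjoint index sets A1 = {1, 2} and A2 = {3}, with coordinatewise
# non-decreasing functions f and g.
f = X[:, 0] + np.maximum(X[:, 1], 0.0)   # non-decreasing in (X_1, X_2)
g = np.exp(X[:, 2])                      # non-decreasing in X_3

# NA requires Cov(f, g) <= 0; the Monte Carlo estimate should be
# negative up to sampling error.
print(np.cov(f, g)[0, 1])
```

Running this sketch produces a clearly negative covariance estimate, in line with the definition above.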
Since kNN regression under NA sequences has not yet been explored in the literature, in this paper we extend the kNN estimation of functional data from the case of independent samples to NA sequences.
Let $(\chi_i, Y_i)_{1 \le i \le n}$ be a sample of NA pairs valued in $\mathcal{F} \times \mathbb{R}$, where $(\mathcal{F}, d)$ is a semi-metric space; $\mathcal{F}$ is not necessarily of finite dimension, and we do not suppose the existence of a density for the functional random variable $\chi$. For a fixed $x \in \mathcal{F}$, the closed ball with $x$ as the center and $h$ as the radius is denoted as:
$$B(x, h) = \{ x' \in \mathcal{F} : d(x, x') \le h \}.$$
The kNN regression estimator [10] is defined as follows:
$$\hat{r}_{kNN}(x) = \frac{\sum_{i=1}^{n} Y_i\, K\bigl( d(x, \chi_i) / H_{n,k}(x) \bigr)}{\sum_{i=1}^{n} K\bigl( d(x, \chi_i) / H_{n,k}(x) \bigr)},$$
where $K$ is the kernel function supported on $[0, 1]$, and $H_{n,k}(x)$ is a positive random variable that depends on $(\chi_1, \ldots, \chi_n)$ and is defined by:
$$H_{n,k}(x) = \min\Bigl\{ h > 0 : \sum_{i=1}^{n} \mathbb{1}_{B(x, h)}(\chi_i) = k \Bigr\}.$$
Obviously, the kNN estimator can be seen as an extension, to a random locally adaptive neighborhood, of the traditional kernel estimator [5] defined as:
$$\hat{r}_n(x) = \frac{\sum_{i=1}^{n} Y_i\, K\bigl( d(x, \chi_i) / h_n \bigr)}{\sum_{i=1}^{n} K\bigl( d(x, \chi_i) / h_n \bigr)},$$
where $h_n$ is a sequence of positive real numbers such that $h_n \to 0$ a.s. as $n \to \infty$.
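To fix ideas, both estimators can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: curves are discretized on a common grid, the semi-metric is the discretized $L^2$ distance, and the function names are illustrative choices rather than the authors' code.

```python
import numpy as np

def l2_semimetric(x, curves, dt):
    """Discretized L2 distance d(x, chi_i) between a curve x and each
    sample curve, all evaluated on a common grid with mesh dt."""
    return np.sqrt(np.sum((curves - x) ** 2, axis=1) * dt)

def quadratic_kernel(u):
    """Quadratic (Epanechnikov-type) kernel supported on [0, 1]."""
    return np.where((u >= 0) & (u <= 1), 1.5 * (1.0 - u ** 2), 0.0)

def knn_regression(x, curves, y, k, dt):
    """kNN estimator: the bandwidth H_{n,k}(x) is the distance to the
    k-th nearest curve, so exactly k curves lie in the closed ball
    B(x, H_{n,k}(x)).  Assumes k >= 2 so some weight is positive."""
    d = l2_semimetric(x, curves, dt)
    h = np.sort(d)[k - 1]              # random, locally adaptive bandwidth
    w = quadratic_kernel(d / h)
    return np.sum(w * y) / np.sum(w)

def nw_regression(x, curves, y, h, dt):
    """Classical Nadaraya-Watson estimator with a fixed bandwidth h_n."""
    d = l2_semimetric(x, curves, dt)
    w = quadratic_kernel(d / h)
    return np.sum(w * y) / np.sum(w)
```

The only difference between the two functions is the bandwidth: fixed and deterministic for the kernel estimator, random and locally adaptive (through the k-th nearest-neighbor distance) for the kNN estimator.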
This paper is organized as follows. The main results on the asymptotic behavior of the kNN estimator with a data-driven random number of neighbors are given in Section 2. Section 3 illustrates the numerical performance of the proposed method, including a nonparametric functional regression analysis of the sea level surface temperature (SST) data for the El Niño area (0–10° S, 80–90° W). The technical proofs are postponed to Section 4. Finally, Section 5 is devoted to comments on the results and related perspectives for future work.
2. Assumptions and Main Results
In this section, we study the asymptotic properties of the kNN regression estimator. To state the convergence rate, we first recall the notion of almost complete convergence.
One says that the rate of almost complete convergence of a sequence $(Y_n)_{n \ge 1}$ to $Y$ is of order $u_n$ if and only if for any $\epsilon > 0$,
$$\sum_{n=1}^{\infty} P\bigl( |Y_n - Y| > \epsilon\, u_n \bigr) < \infty,$$
and we write $Y_n - Y = O_{a.co.}(u_n)$ (see, for instance, [5]). By the Borel-Cantelli lemma, this implies that $(Y_n - Y)/u_n \to 0$ almost surely, so almost complete convergence is stronger than almost sure convergence.
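For completeness, the Borel-Cantelli step can be spelled out as a short derivation (standard material, added here only for the reader's convenience):

```latex
\sum_{n \ge 1} P\bigl(|Y_n - Y| > \epsilon\, u_n\bigr) < \infty
\;\Longrightarrow\;
P\Bigl(\limsup_{n \to \infty} \bigl\{ |Y_n - Y| > \epsilon\, u_n \bigr\}\Bigr) = 0
\;\Longrightarrow\;
\limsup_{n \to \infty} \frac{|Y_n - Y|}{u_n} \le \epsilon \quad \text{a.s.}
```

Since $\epsilon > 0$ is arbitrary, intersecting these almost sure events over a countable sequence $\epsilon \downarrow 0$ gives $(Y_n - Y)/u_n \to 0$ almost surely.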
Our results are stated under some mild assumptions, which we gather below for easy reference. Throughout the paper, we denote by $C$ and $C'$ some positive generic constants, which may take different values at different places.
Assumption 1.
For any $x \in \mathcal{S}_{\mathcal{F}}$ and any $h > 0$, $P\bigl( \chi \in B(x, h) \bigr) =: \varphi_x(h) > 0$, and $\varphi_x(\cdot)$ is a continuous function, strictly monotonically increasing at the origin, with $\varphi_x(0) = 0$.
Assumption 2.
There exist a function $\phi(\cdot)$ and a bounded function $f(\cdot)$ such that:
- (i)
- $\phi(0) = 0$ and $\phi(h) > 0$ for $h > 0$, with $\phi(h) \to 0$ as $h \to 0$;
- (ii)
- $0 < C \le f(x) \le C' < \infty$ for any $x \in \mathcal{S}_{\mathcal{F}}$;
- (iii)
- for all $x \in \mathcal{S}_{\mathcal{F}}$ and all $h > 0$, $\varphi_x(h) = \phi(h)\, f(x)$, such that the small ball probability factorizes into a function of $h$ alone and a function of $x$ alone.
Assumption 3.
$K(\cdot)$ is a nonnegative bounded kernel function with support $[0, 1]$, and if $K(1) = 0$, the derivative $K'(\cdot)$ exists on $[0, 1]$, satisfying:
$$-\infty < C \le K'(t) \le C' < 0 \quad \text{for all } t \in [0, 1].$$
Assumption 4.
$r(\cdot)$ is a bounded Lipschitz operator of order β on $\mathcal{S}_{\mathcal{F}}$; that is, there exists $C > 0$ such that:
$$|r(x) - r(x')| \le C\, d(x, x')^{\beta} \quad \text{for all } x, x' \in \mathcal{S}_{\mathcal{F}}.$$
Assumption 5.
For all $m \ge 2$, $E\bigl( |Y|^m \mid \chi = x \bigr) = \sigma_m(x) < \infty$, with $\sigma_m(\cdot)$ continuous on $\mathcal{S}_{\mathcal{F}}$.
Assumption 6.
Kolmogorov’s ϵ-entropy of $\mathcal{S}_{\mathcal{F}}$ satisfies:
$$\sum_{n=1}^{\infty} \exp\Bigl\{ (1 - \tau)\, \psi_{\mathcal{S}_{\mathcal{F}}}\Bigl( \frac{\log n}{n} \Bigr) \Bigr\} < \infty \quad \text{for some } \tau > 1.$$
For $\epsilon > 0$, the Kolmogorov ϵ-entropy of a set $\mathcal{S} \subset \mathcal{F}$ is defined by $\psi_{\mathcal{S}}(\epsilon) = \log N_{\epsilon}(\mathcal{S})$, where $N_{\epsilon}(\mathcal{S})$ is the minimal number of open balls in $\mathcal{F}$, with centers in $\mathcal{F}$ and ϵ as the radius, which can cover $\mathcal{S}$.
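The entropy condition is purely theoretical, but for a finite set of discretized curves the covering number can be bounded numerically. The greedy scheme below is a rough illustrative sketch (the function name and the toy data are assumptions, not the paper's procedure):

```python
import numpy as np

def covering_number(curves, eps, dt):
    """Greedy upper bound on N_eps(S): the number of balls of radius
    eps (discretized L2 semi-metric) centered at data points that are
    needed to cover the whole set of curves."""
    remaining = list(range(len(curves)))
    n_balls = 0
    while remaining:
        center = curves[remaining[0]]
        d = np.sqrt(np.sum((curves[remaining] - center) ** 2, axis=1) * dt)
        remaining = [i for i, di in zip(remaining, d) if di >= eps]
        n_balls += 1
    return n_balls

# Toy usage: the entropy estimate is psi_S(eps) = log N_eps(S).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
curves = rng.normal(size=(50, 1)) * np.sin(2 * np.pi * t)
print(np.log(covering_number(curves, eps=0.5, dt=t[1] - t[0])))
```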
Remark 1.
Assumption 1, Assumption 2((i)–(iii)) and Assumption 4 are the standard assumptions on small ball probabilities and regression operators in nonparametric FDA; see Kudraszow and Vieu [19]. Assumption 2(ii) plays a key role in the methodology, particularly in computing the asymptotic variance and making it explicit, as in Ling and Wu [6]. Assumption 2(iii) states that the small ball probability can be written as the product of the two independent functions $\phi(h)$ and $f(x)$, which has been used many times in Masry [11], Laib and Louani [12] and elsewhere in the literature. Assumption 5 is standard in the nonparametric setting and concerns the existence of the conditional moments, as in Masry [11] and Burba et al. [10]; it is needed to obtain the rate of uniform almost complete convergence. Assumption 6 is the Kolmogorov ϵ-entropy condition, which we will use in the proof of the rate of uniform almost complete convergence.
Theorem 1.
Under Assumptions 1–6, suppose that the sequence $k = k_n$ satisfies $k/n \to 0$, and for $n$ large enough,
$$\frac{(\log n)^2}{k} < \psi_{\mathcal{S}_{\mathcal{F}}}\Bigl( \frac{\log n}{n} \Bigr) < \frac{k}{\log n};$$
then we have:
$$\sup_{x \in \mathcal{S}_{\mathcal{F}}} \bigl| \hat{r}_{kNN}(x) - r(x) \bigr| = O\Bigl( \bigl[ \phi^{-1}(k/n) \bigr]^{\beta} \Bigr) + O_{a.co.}\Biggl( \sqrt{ \frac{\psi_{\mathcal{S}_{\mathcal{F}}}(\log n / n)}{k} } \Biggr).$$
Remark 2.
This theorem extends the kNN estimation result of Theorem 2 in Kudraszow and Vieu [19] from the independent case to the NA dependent case and obtains the same convergence rate under the stated assumptions. Moreover, the almost complete convergence rate of the prediction operator splits into two parts: one part is affected by the dependence and Kolmogorov's $\epsilon$-entropy, while the other depends on the smoothness of the regression operator and the smoothing parameter $k$.
Corollary 1.
Under the conditions of Theorem 1, we have:
Corollary 2.
Under the conditions of Theorem 1, we have:
3. Simulation
3.1. A Simulation Study
In this section, we aim to illustrate the performance of the nonparametric functional regression estimator and to compare it with the traditional kernel estimation method. We consider the nonparametric functional regression model:
where $\varepsilon_i$ is the random error term, and the functional curve $\chi_i(t)$ is generated in the following way:
where the coefficient vector follows a multivariate normal distribution $N(\mathbf{0}, \Sigma)$, $\mathbf{0}$ represents the zero vector, and the covariance matrix $\Sigma$ is defined as:
By the definition of NA, since the off-diagonal entries of $\Sigma$ are non-positive, it can be seen that the coefficient vector is an NA vector for each $i$, with finite moments of any order (see Wu and Wang [22]).
We choose the remaining constants casually; the sample sizes $n$ are as given in Table 1, and $t$ takes 1000 equispaced values. We carry out the simulation of the curves $\chi_i(t)$ for the $n = 330$ samples (see Figure 1).
Figure 1.
Sample of curves with sample size n = 330.
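A data-generating sketch in Python of the above design might look as follows; the basis functions, the regression operator and the noise level are stand-ins (the exact simulation constants are not recoverable from the text), but the negative off-diagonal covariance does make each coefficient vector NA, as noted above:

```python
import numpy as np

rng = np.random.default_rng(2022)
n, m = 330, 1000                      # sample size and grid resolution
t = np.linspace(0.0, 1.0, m)          # illustrative design interval

# Coefficient pairs from a bivariate normal with negative off-diagonal
# covariance; such Gaussian vectors are NA (see [21,22]).
cov = np.array([[1.0, -0.5],
                [-0.5, 1.0]])
coef = rng.multivariate_normal(np.zeros(2), cov, size=n)

# Illustrative NA functional covariates and noisy scalar responses.
curves = (coef[:, [0]] * np.cos(2 * np.pi * t)
          + coef[:, [1]] * np.sin(2 * np.pi * t))
y = np.mean(curves ** 2, axis=1) + rng.normal(0.0, 0.1, size=n)
```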
We consider the Epanechnikov kernel given by $K(u) = \frac{3}{2}(1 - u^2)\,\mathbb{1}_{[0,1]}(u)$, and the semi-metrics based on derivatives of order $q$:
$$d_q(x_1, x_2) = \sqrt{ \int \bigl( x_1^{(q)}(t) - x_2^{(q)}(t) \bigr)^2 \, dt }.$$
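For completeness, a crude finite-difference sketch of this derivative-based semi-metric is given below; in practice a smooth (e.g., B-spline) representation of the curves is used before differentiating, as discussed in Ferraty and Vieu [4]:

```python
import numpy as np

def deriv_semimetric(x1, x2, q, dt):
    """Semi-metric based on derivatives of order q:
    d_q(x1, x2) ~ sqrt( integral of (x1^(q) - x2^(q))^2 dt ),
    with the q-th derivative approximated by finite differences."""
    diff = x1 - x2
    for _ in range(q):
        diff = np.diff(diff) / dt      # crude q-th order differentiation
    return np.sqrt(np.sum(diff ** 2) * dt)
```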
Our purpose is to compare the mean square error (MSE) of the kNN method with that of the NW kernel approach on finite simulated datasets. The finite-sample simulation proceeds by the following steps.
Step 1: We take 300 curves to construct the training sample, and the other 30 curves constitute the test sample.
Step 2: Within the training sample, the parameter k in the kNN method and the bandwidth h in the NW kernel method are automatically selected by cross-validation, respectively (a sketch of this selection is given after Step 4 below).
Step 3: Based on the MSE criterion (see [4] for details), the semi-metric order q is selected for the kNN method and the NW method, respectively.
Step 4: The predicted responses of the test sample are calculated using the kNN method and the NW method, respectively; their MSEs and the scatter plots of predictions against the true values are presented in Figure 2.
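The cross-validation of Step 2 can be sketched as follows, reusing the knn_regression function from the sketch in Section 1; the grid of candidate values is an illustrative assumption:

```python
import numpy as np

def loocv_select_k(curves, y, k_grid, dt):
    """Leave-one-out cross-validation over a grid of neighbor counts k;
    knn_regression is the sketch from Section 1."""
    n = len(y)
    cv_err = np.zeros(len(k_grid))
    for j, k in enumerate(k_grid):
        for i in range(n):
            keep = np.arange(n) != i
            pred = knn_regression(curves[i], curves[keep], y[keep], k, dt)
            cv_err[j] += (y[i] - pred) ** 2 / n
    return k_grid[int(np.argmin(cv_err))]

# Example call: k_best = loocv_select_k(curves, y, range(2, 31), 1 / 1000)
```

The bandwidth h of the NW method is selected analogously, replacing the grid of neighbor counts by a grid of bandwidth values.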
Figure 2.
Prediction effects of the two estimation methods. (a) kNN estimation method. (b) NW estimation method.
As we can see in Figure 2, the MSE of the kNN method is much smaller than that of the NW method, and its scatter points are more densely distributed around the line $y = x$, which shows that the kNN method achieves a better fit and higher prediction accuracy for NA dependent functional samples.
The kNN method and the NW method were each used to conduct 100 independent replicated experiments at the different sample sizes. The AMSE was calculated for both methods at each sample size using the following equation:
$$\mathrm{AMSE} = \frac{1}{100} \sum_{m=1}^{100} \mathrm{MSE}_m,$$
where $\mathrm{MSE}_m$ denotes the mean square error over the test sample in the $m$-th replication.
As can be seen from Table 1, the AMSE of the kNN method is much smaller than that of the NW kernel method at each fixed sample size; when the estimation method is fixed, the AMSEs of the two estimation methods show the same trend: both decrease as the sample size increases. However, the decrease is significantly faster for the kNN method than for the NW kernel method.
Table 1.
The AMSE of the predicted response variables of the two methods under different sample sizes.
3.2. A Real Data Study
This section applies the proposed kNN regression to the analysis of data consisting of the sea level surface temperature (SST) for the El Niño area (0–10° S, 80–90° W) over a total of 31 years, from 1 January 1990 to 31 December 2020. The data are available online at https://www.cpc.ncep.noaa.gov/data/indices/ (accessed on 1 January 2022). More relevant discussions of these data can be found in Ezzahrioui et al. [13,14], Delsol [23], and Ferraty et al. [24]. The 1618 weekly SST records from the original data were preprocessed and averaged by month to obtain 372 monthly average SST values. Figure 3 displays the multiplicative time series decomposition of the monthly SST.
Figure 3.
Monthly mean SST factor decomposition fitting comprehensive output diagram.
Figure 4 shows that the monthly average SST in the El Niño region from 1990 to 2020 had a clear seasonal variation, and the monthly trend of the SST can also be clearly observed from the seasonal index plot of the monthly mean SST.
Figure 4.
Time series curve of the SST in the El Niño area during 31 years.
The main factors affecting the temperature variation can generally be summarized as seasonal factors and random fluctuations. If the seasonal factor is removed, the SST is left with only random fluctuations, i.e., the values fluctuate up and down around some mean value. Conversely, if the effect of random fluctuations is not considered, the SST is left with only the seasonal factor, i.e., the SST takes similar values in the same month of different years.
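The decomposition in Figure 3 is a standard multiplicative time series decomposition; a sketch of how such a decomposition can be produced is given below, assuming the 372 monthly means are held in a pandas Series (the placeholder values only make the sketch runnable; the real series comes from the NOAA file cited above):

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# 372 monthly mean SST values, January 1990 to December 2020.
idx = pd.date_range("1990-01-01", periods=372, freq="MS")
sst = pd.Series(24.0, index=idx)      # placeholder values

# Multiplicative decomposition into trend, seasonal and residual
# components, the same type of decomposition displayed in Figure 3.
result = seasonal_decompose(sst, model="multiplicative", period=12)
result.plot()   # panels: observed, trend, seasonal, residual
```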
The following steps implement the kNN regression estimation method for the analysis of the SST data; the comparison with the NW estimation method is displayed in Figure 5.
Figure 5.
Forecast values of the SST in 2020 by the kNN method and the NW method.
Step 1: Transform 372 months (31 years) of SST data into functional data.
Step 2: Divide the 31 samples of data into two parts: 30 training samples of data for model fitting and 1 test sample of data for prediction assessment.
Step 3: Here, a semi-metric based on functional principal component analysis (FPCA) is used, which is suitable for rough curves such as the SST data (see Chapter 3 of Ferraty and Vieu [4] for the methodology). The quadratic kernel function of Section 3.1 is used in the kNN regression.
Step 4: The SST values for 12 months in 2020 are predicted by the kNN method and the NW method, respectively, along with obtaining their MSEs for both methods.
Then, in Step 1, we split the discrete monthly average temperature data of 372 months into 31 yearly temperature curves $\chi_i$, $i = 1, \ldots, 31$. The response variable for each curve is constructed from the following year's SST, so that the pairs $(\chi_i, Y_i)$, $i = 1, \ldots, 30$, form a dependent functional sample of size 30, where $\chi_i$ is functional data and $Y_i$ is a real value.
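In code, Step 1 amounts to a simple reshaping; the file name and the exact pairing of curves with responses below are assumptions made only for illustration:

```python
import numpy as np

# The 372 preprocessed monthly means (hypothetical file name).
sst_monthly = np.loadtxt("sst_monthly.txt")

# Step 1: reshape into the 31 yearly curves chi_1, ..., chi_31.
yearly_curves = sst_monthly.reshape(31, 12)

# Pairing (an assumption): the curve of year i is the functional
# covariate and a monthly SST value of year i + 1 is the scalar
# response, giving a dependent functional sample of size 30.
X_train = yearly_curves[:30]          # chi_1, ..., chi_30
y_train = yearly_curves[1:, 0]        # e.g., January of the next year
```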
In Step 3, the choice of the parameter q for the kNN method and the NW method is performed via cross-validation in R. The selection of the parameters k and h is similar to Section 3.1.
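The FPCA-based semi-metric of Step 3 projects each curve on the first q empirical principal components and measures the Euclidean distance between the score vectors; a minimal sketch under these assumptions:

```python
import numpy as np

def fpca_scores(curves, q):
    """Scores of discretized curves on the first q empirical principal
    components, computed via an SVD of the centered data matrix."""
    centered = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:q].T

def fpca_semimetric(s1, s2):
    """FPCA semi-metric: Euclidean distance between score vectors."""
    return np.sqrt(np.sum((s1 - s2) ** 2, axis=-1))

# Usage sketch (q = 4 is illustrative):
# scores = fpca_scores(yearly_curves, q=4)
# d_ij = fpca_semimetric(scores[i], scores[j])
```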
From Figure 5, which compares the MSE values calculated by the two methods, it can be seen that the MSE of the kNN method is much smaller than that of the NW method. Further, regarding the degree of fit between the fitted curves and the true curve (dotted line), the curves predicted by the two methods are both generally close to the true curve, indicating that the prediction effect of both methods is very good. However, a closer look reveals that the predictions of the kNN method fit obviously better at the inflection points of the curve, such as January, February, March, November and December, which reflects the fact that the kNN method pays more attention to local variation than the NW method when processing data like these, including abnormal or extreme distributions of the response variable.
4. Proof of Theorem
In order to prove the main results, we give some lemmas. Let be n random pairs valued in , where is a general measurable space. Let be a fixed subset of , be a measurable function, for ,
is a sequence of random real variables (r.r.v.), and is a nonrandom function such that . For , we define:
Lemma 1
([10]). Let be a decreasing positive real sequence satisfying . For any increasing sequences and , there exist two real random sequences and such that:
- (L1)
- ,
- (L2)
- a.co.
- (L3)
- (L4)
- (L5)
then, we have:
The proof of Lemma 1 is not presented here because it follows, step by step, the same arguments as in Burba et al. [10] and Kudraszow and Vieu [19].
Lemma 2
([26]). Let be an NA random sequence with zero mean, and there exists a positive constant such that , let . For any , we get:
and
Lemma 3.
Suppose that Assumptions 1–6 hold, and a.s. in model (3) satisfying:
and for n large enough,
then we have:
where
Proof of Lemma 3.
In order to simplify the proof, we first introduce some notation. Let the mixed operator covariance be denoted as follows,
where ,
For the fixed in model (3), we have the decomposition as follows:
where:
In order to establish (9), it suffices to prove the following three results.
We first show Equation (11). In fact, we have the following decomposition:
For , by Assumption 3, it is easily seen that:
thus,
for , we have:
According to (4) in Lemma 2 and Assumption 6, we have:
Hence, it follows that:
For the next term, similarly to the preceding proof, we have:
Thus,
Finally, for the last term, the proof process is similar, and we can obtain:
Therefore, combining Equations (13)–(15), Equation (11) is established.
Similarly, we may prove Equation (12). Hence, the proof of Lemma 3 is completed. □
Proof of Theorem 1.
According to Lemma 1, let , , , , , , . Let be an increasing sequence such that , where is a decreasing positive real sequence such that and . Let and be two real random sequences such that:
Firstly, we verify the corresponding conditions in Lemma 1. It follows easily that the local bandwidth satisfies condition (5); combining this with Assumption 2, it follows that condition (6) is satisfied. From Assumption 2(i), we obtain that the required bound is satisfied. Hence, according to the conditions of the Theorem, Equations (7) and (8) in Lemma 3 hold. Thus, by Lemma 3, we have:
Similarly, for , we can also get:
Secondly, we check the next conditions in Lemma 1; combining (16) and (17), it clearly follows that:
By Assumption 1 we get:
According to (5) and (18), for , we have:
That is:
Therefore, by Assumption 1 we can get:
Thus,
is checked.
Finally, we establish the remaining condition in Lemma 1. Similarly to Kudraszow and Vieu [19], we denote:
and let:
Then, can be decomposed as follows:
By Assumption 3, it follows that:
and for the remaining term, referring to Ferraty et al. [25], we have:
Therefore,
Moreover, according to Lemma 1 in Ezzahrioui and Ould-Saïd [13] and Assumption 2(iii), there exists a constant such that,
by , holds. Hence, for , it follows that
Combining (19)–(22), we obtain:
is established.
Thus, conditions (L1)–(L5) in Lemma 1 have all been established. By Lemma 1, we can get:
The proof of Theorem 1 is completed. □
5. Conclusions and Future Research
Functional data analysis deals with the analysis and theory of data that come in the form of functions, images, shapes or more general objects. In a way, correlation really lies at the heart of data science. The dependence between variables may be complicated, ranging from simple independence to $\alpha$-mixing or other structures, such as negative association (NA). The kNN method, as one of the nonparametric methods, is very useful in statistical estimation and machine learning. While regression analysis of functional data has been explored under many dependence structures, the NA case had not been addressed. This paper builds a kNN regression estimator of the functional regression model under NA sequences. In particular, we obtain the almost complete convergence rate of the kNN estimator. Simulated experiments and a real data analysis illustrate the feasibility and the finite-sample behavior of the method. Further work includes introducing the kNN machine learning algorithm into broader functional data analysis and kNN high-dimensional modeling with NA sequences.
Author Contributions
Conceptualization, X.H. and J.W.; methodology, X.H.; software, J.W.; writing—original draft preparation, X.H. and J.W.; writing—review and editing, K.Y. and L.W.; visualization, K.Y.; supervision, X.H.; project administration, K.Y.; funding acquisition, K.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Social Science Foundation (Grant No. 21BTJ040).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
https://www.cpc.ncep.noaa.gov/data/indices/ (accessed on 9 January 2022).
Acknowledgments
The authors are most grateful to the Editor and the anonymous referee for carefully reading the manuscript and for valuable suggestions which helped in improving an earlier version of this paper.
Conflicts of Interest
The authors declare no conflict of interest in this paper.
Abbreviations
The following abbreviations are used in this manuscript:
| NA | Negatively Associated |
| kNN | k-Nearest Neighbor |
References
- Ramsay, J.; Dalzell, C. Some Tools for Functional Data Analysis. J. R. Stat. Soc. Ser. B Methodol. 1991, 53, 539–561. [Google Scholar] [CrossRef]
- Ramsay, J.; Silverman, B. Functional Data Analysis; Springer: New York, NY, USA, 1997. [Google Scholar]
- Ramsay, J.; Silverman, B. Functional Data Analysis, 2nd ed.; Springer: New York, NY, USA, 2005. [Google Scholar]
- Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: New York, NY, USA, 2006. [Google Scholar]
- Ferraty, F.; Vieu, P. Nonparametric Models for Functional Data, with Application in Regression, Time Series Prediction and Curve Discrimination. J. Nonparametr. Stat. 2004, 16, 111–125. [Google Scholar] [CrossRef]
- Ling, N.X.; Wu, Y.H. Consistency of Modified Kernel Regression Estimation with Functional Data. Statistics 2012, 46, 149–158. [Google Scholar] [CrossRef]
- Baíllo, A.; Grané, A. Local Linear Regression for Functional Predictor and Scalar Response. J. Multivar. Anal. 2009, 100, 102–111. [Google Scholar] [CrossRef]
- Fix, E.; Hodges, J. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238–247. [Google Scholar] [CrossRef]
- Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185. [Google Scholar]
- Burba, F.; Ferraty, F.; Vieu, P. k-Nearest Neighbour Method in Functional Nonparametric Regression. J. Nonparametr. Stat. 2009, 21, 453–469. [Google Scholar] [CrossRef]
- Masry, E. Nonparametric Regression Estimation for Dependent Functional Data: Asymptotic Normality. Stoch. Process. Appl. 2005, 115, 155–177. [Google Scholar] [CrossRef]
- Laib, N.; Louani, D. Rates of strong consistencies of the regression function estimator for functional stationary ergodic data. J. Stat. Plan. Inference 2011, 141, 359–372. [Google Scholar] [CrossRef]
- Ezzahrioui, M.; Ould-Saïd, E. Asymptotic Normality of a Nonparametric Estimator of the Conditional Mode Function for Functional Data. J. Nonparametr. Stat. 2008, 20, 3–18. [Google Scholar] [CrossRef]
- Ezzahrioui, M.; Ould-Saïd, E. Some Asymptotic Results of a Nonparametric Conditional Mode Estimator for Functional Time-Series Data. Stat. Neerl. 2010, 64, 171–201. [Google Scholar] [CrossRef]
- Horvath, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012. [Google Scholar]
- Ling, N.X.; Wang, C.; Ling, J. Modified Kernel Regression Estimation with Functional Time Series data. Stat. Probab. Lett. 2016, 114, 78–85. [Google Scholar] [CrossRef]
- Abdelmalek, G.; Abdelhak, C. Strong uniform consistency rates of the local linear estimation of the conditional hazard estimator for functional data. Int. J. Appl. Math. Stat. 2020, 59, 1–13. [Google Scholar]
- Mustapha, M.; Salim, B.; Ali, L. The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data. J. Multivar. Anal. 2021, 181. [Google Scholar] [CrossRef]
- Kudraszow, N.L.; Vieu, P. Uniform Consistency of kNN Regressors for Functional Variables. Stat. Probab. Lett. 2013, 83, 1863–1870. [Google Scholar] [CrossRef]
- Kara-Zaitri, L.; Laksaci, A.; Rachdi, M.; Vieu, P. Data-driven kNN Estimation in Nonparametric Functional Data Analysis. J. Multivar. Anal. 2017, 153, 176–188. [Google Scholar] [CrossRef]
- Joag-Dev, K.; Proschan, F. Negative Association of Random Variables with Application. Ann. Stat. 1983, 11, 286–295. [Google Scholar] [CrossRef]
- Wu, Y.; Wang, X.; Sung, S.H. Complete Moment Convergence for Arrays of Rowwise Negatively Associated Random Variables and its Application in Non-parametric Regression Model. Probab. Eng. Inf. Sci. 2017, 32, 37–57. [Google Scholar]
- Delsol, L. Advances on Asymptotic Normality in Nonparametric Functional Time Series Analysis. Statistics 2009, 43, 13–33. [Google Scholar] [CrossRef]
- Ferraty, F.; Rabhi, A.; Vieu, P. Conditional Quantiles for Dependent Functional Data with Application to the Climatic El Niño Phenomenon. Sankhyā Indian J. Stat. 2005, 67, 378–398. [Google Scholar]
- Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of Uniform Consistency for Nonparametric Estimates with Functional Variables. J. Stat. Plan. Inference 2010, 140, 335–352. [Google Scholar] [CrossRef]
- Christofides, T.C.; Hadjikyriakou, M. Exponential Inequalities for N-demimartingales and Negatively Associated Random Variables. Stat. Probab. Lett. 2009, 79, 2060–2065. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).