Article

k-Nearest Neighbour Estimation of the Conditional Set-Indexed Empirical Process for Functional Data: Asymptotic Properties

1
Laboratory of Stochastic Models, Statistics and Applications, University of Saida-Dr. Moulay Tahar, P.O. Box 138 EN-NASR, Saïda 20000, Algeria
2
Laboratoire de Mathématiques Appliquées de Compiègne (L.M.A.C.), Université de Technologie de Compiègne, 60200 Compiègne, France
*
Author to whom correspondence should be addressed.
Axioms 2025, 14(2), 76; https://doi.org/10.3390/axioms14020076
Submission received: 24 December 2024 / Revised: 7 January 2025 / Accepted: 16 January 2025 / Published: 21 January 2025
(This article belongs to the Section Mathematical Analysis)

Abstract

The main aim of this paper is to improve the existing limit theorems for set-indexed conditional empirical processes involving functional strong mixing random variables. To achieve this, we propose using the k-nearest neighbor approach to estimate the regression function, as opposed to the traditional kernel method. For the first time, we establish the weak consistency, asymptotic normality, and density of the proposed estimator. Our results are derived under certain assumptions about the richness of the index class $\mathcal{C}$, specifically in terms of metric entropy with bracketing. This work builds upon our previous papers, which focused on the technical performance of empirical process methodologies, and further refines the prior estimator. We highlight that the k-nearest neighbor method outperforms the classical approach due to several advantages.

1. Introduction

The framework of empirical processes has long been recognized as a fundamental component of statistical methodology, underpinning theoretical developments and applications in areas such as testing for independence, parameter estimation, and bootstrap procedures. Furthermore, this framework has offered insights into the asymptotic properties of semi-parametric mixture models and various other sophisticated methods. For detailed expositions of both theoretical and applied dimensions of empirical process theory, the reader is referred to [1,2]. Over approximately the last three decades, empirical process theory in finite contexts has undergone marked expansion, as reflected by numerous contributions focusing on the case of independent and identically distributed (i.i.d.) observations.
An early milestone in this line of inquiry was provided by [3], who examined the asymptotic distribution of the standard empirical process when sample sizes are random. The work of [3] established weak convergence for continuous empirical processes within suitably chosen metric spaces. Subsequently, ref. [4] investigated the classes C of sets for which the Glivenko–Cantelli theorem holds (up to measurability), thus solidifying the basis for uniform convergence of empirical distribution functions. Building on these results, ref. [5] introduced both strong and weak approximations of empirical processes by leveraging Gaussian processes, ultimately highlighting associated invariance principles. In a related direction, ref. [6] studied empirical processes pertaining to particular classes of sets, laying out the conditions under which weak convergence to Gaussian processes emerges in terms of metric entropy with inclusion. This analysis extends well beyond the classical framework of Donsker’s theorem and accommodates independent random variables. Additional seminal results are available in [7,8,9], among other contributions, underscoring the continued advancement of empirical process theory.
Over subsequent years, attention turned toward mixing processes and their conditional aspects in separable metric spaces. ref. [10] provided broad generalizations of the central limit theorem for three varieties of mixing variables defined on the interval [ 0 , 1 ] . ref. [11] later investigated weak convergence under ϕ -mixing conditions, while ref. [12] addressed specific regularity constraints leading to the asymptotic normality of parameter estimators. Extending these ideas further, ref. [13] examined set-indexed empirical processes in α -mixing settings to establish results involving weak convergence in probability and asymptotic normality, as well as deriving variance expressions and continuity properties vital to asymptotic investigations. Subsequent research by [14] refined these theoretical insights. For comprehensive treatments and related studies on mixing empirical processes, see [15,16,17,18,19,20]. A parallel body of work addresses the need to analyze infinite-dimensional data—commonly referred to as functional data or functional variables—across fields such as environmental science, chemometrics, biometrics, medicine, and econometrics. These functional data structures often capture intricate phenomena more effectively than finite-dimensional representations, which is particularly evident in research areas like finance and biology. Comprehensive overviews of functional data analysis are provided by [21,22,23,24,25,26]. In recent years, this domain has experienced rapid expansion, as evidenced by a substantial increase in scholarly work and specialized monographs, highlighting the evolving and dynamic nature of functional data analysis.
Despite the significant advancements in functional data analysis, the integration of functional data within the framework of set-indexed empirical processes remains relatively underexplored. Notably, recent work by [27] made a substantial contribution by extending the results of [13], thereby proving the invariance principle for the set-indexed conditional empirical process. ref. [27] further applied these theoretical advancements to the problem of conditional independence testing, demonstrating the practical utility of the extended framework. This line of research was further developed in subsequent works by [28,29], which addressed the complexities associated with ergodic data. More recently, ref. [30] incorporated the presence of missing data into their analysis, leveraging the strengths of the k-nearest neighbors (k-NN) method for estimation purposes. The k-NN method has enjoyed widespread application within finite frameworks, with foundational contributions from [8], ref. [31], as well as adaptations for functional data models. Notably, ref. [32] investigated the asymptotic properties of the k-NN kernel estimator for regression tasks, providing comparative analyses with traditional kernel methods. ref. [33] offered asymptotic results for k-NN generalized regression estimators in abstract spaces. For further exploration of k-NN methods in the context of infinite-dimensional data, researchers can consult the works of [34,35,36,37].

1.1. Contribution

Despite the substantial rise in research on empirical processes and the application of k-NN methods to functional data, studies that examine the asymptotic properties of empirical processes unifying functional data with k-NN methods are still relatively scarce. Nonetheless, recent contributions have begun to address this gap. In particular, refs. [28,38,39] offered important theoretical and practical insights by establishing both the uniform consistency and weak convergence of k-NN empirical conditional processes and k-NN conditional U-processes adapted to functional mixing data. These developments hold considerable promise for set-indexed conditional U-statistics, Kendall’s rank correlation coefficient, classification tasks, and the prediction of time series. Building upon these findings, the present paper introduces an innovative methodological framework aimed at efficiently tackling the challenges inherent in set-indexed conditional empirical processes.
The main distinction between this work and [28] or [29] lies in our consideration of a more intricate data-driven bandwidth structure. In addition, we establish novel results concerning the uniform consistency with respect to the number of neighborhoods. Compared with [28,38,39], our results are derived under bracketing numbers that capture the complexity of the relevant sets in a manner that differs from the entropy-based techniques explored in our previous research. Furthermore, the strong mixing condition assumed here is more general than the regular processes context adopted in those earlier works to prove weak convergence.

1.2. Organization of the Paper

The structure of this article is meticulously organized to facilitate a coherent and comprehensive exposition of the subject matter. Section 2 delineates the notation and fundamental definitions essential for understanding the ensuing discussions. It also introduces the k-NN set-indexed conditional empirical process, laying the groundwork for the theoretical developments that follow. Section 3 is dedicated to presenting the main results of this study, highlighting the key theoretical contributions and their implications. In Section 4, an application of these theoretical results to the domain of statistical classification is discussed, demonstrating the practical relevance and utility of the proposed methods. Section 3.3 addresses the pragmatic aspects related to the selection of the parameter k, which is pivotal for the performance of the k-NN method. This section provides guidance on optimal bandwidth selection, balancing bias and variance considerations. The concluding remarks, along with prospective directions for future research, are provided in Section 5, with all proofs collected in Appendix A.

2. Presentation of the k-NN Set-Indexed Conditional Empirical Process

We consider a sequence of random pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, where each pair is a realization of the random variable $(X, Y)$, which takes values in the space $\mathcal{E} \times \mathbb{R}^d$. The functional space $\mathcal{E}$ is equipped with a semi-metric $d_{\mathcal{E}}(\cdot, \cdot)$ (a semi-metric, or pseudo-metric, allows $d(x_1, x_2) = 0$ even if $x_1 \ne x_2$). Our goal is to examine the relationship between $X$ and $Y$ by estimating functional operators related to the conditional distribution of $Y$ given $X$, such as the regression operator. For any measurable set $C$ in a class of sets $\mathcal{C}$, the conditional probability is defined as:
$$\Gamma(C \mid x) = \mathbb{E}\big[\mathbf{1}_{\{Y \in C\}} \mid X = x\big].$$
To model this relationship, we use a Nadaraya–Watson estimator [40,41] and present a k-nearest neighbors (k-NN) version of the conditional empirical distribution. Specifically, we define:
$$\Gamma_n(C, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_i \in C\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)},$$
where $K(\cdot)$ is a real-valued kernel function defined on $[0, \infty)$, and $H_{n,k}(x)$ is a smoothing parameter that satisfies $H_{n,k}(x) \to 0$ as $n \to \infty$. Here, $C$ is a measurable set, and $x \in \mathcal{E}$. When $C = (-\infty, z]$ for some $z \in \mathbb{R}^d$, the estimator reduces to the conditional empirical distribution function $F_n(z \mid x) = \Gamma_n((-\infty, z], x)$. Hence, the corresponding class of sets is $\mathcal{C} = \{(-\infty, z] : z \in \mathbb{R}^d\}$. We further define $H_{n,k}(x)$ as the smallest value $h \in \mathbb{R}_+$ such that:
$$\sum_{i=1}^{n} \mathbf{1}_{B(x,h)}(X_i) = k,$$
where B ( x , h ) represents the ball of radius h around x. It is clear that H n , k ( x ) is a positive random variable with respect to the semi-metric topology on E . To relate this work to the existing literature and to highlight the differences between the k-NN and classical kernel approaches, we recall the functional Nadaraya–Watson estimator from [23] and refer to [27,29,30] for more information on the asymptotic properties of nonparametric functional distribution estimators. The classical kernel-based estimator is defined as:
$$\widehat{\Gamma}_n(C, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_i \in C\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{h_n}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{h_n}\right)},$$
where h n is a smoothing parameter. To further clarify the notation, we use:
$$B(x, t) = \{x_1 \in \mathcal{E} : d_{\mathcal{E}}(x_1, x) \le t\}$$
to represent the ball in E with center x and radius t and define the small ball probability function as:
$$F(t; x) = \mathbb{P}\big(d_{\mathcal{E}}(x, X_i) \le t\big) = \mathbb{P}\big(X_i \in B(x, t)\big) = \mathbb{P}(D_i \le t),$$
where $D_i = d_{\mathcal{E}}(x, X_i)$. The behavior of $F(t; x)$ as $t \to 0$ is of particular interest, and [42] assumed that $F(h; x) = \phi(h)\, f_1(x)$ as $h \to 0$, where $f_1(x)$ represents the (functional) probability density. When $\mathcal{E} = \mathbb{R}^m$, the small-ball probability function takes the form $F(h; x) = \mathbb{P}(\|x - X_i\| \le h)$, with $\phi(h) = C(m)\, h^m$, where $C(m)$ is the volume of the unit ball in $\mathbb{R}^m$. This result motivates the assumption (H1)(i)–(ii), as discussed in [43].
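To make the preceding definitions concrete, the following is a minimal numerical sketch of $H_{n,k}(x)$ and $\Gamma_n(C, x)$ for curves observed on a common grid. The semi-metric, the Epanechnikov-type kernel, the simulated data, and all function names (`semi_metric`, `knn_bandwidth`, `gamma_n`) are illustrative choices of ours, not objects taken from the paper.

```python
import numpy as np

def semi_metric(x1, x2):
    # Illustrative L2-type semi-metric between two curves discretized on a common grid.
    return np.sqrt(np.mean((x1 - x2) ** 2))

def epanechnikov(u):
    # Kernel supported on [0, 1], as required by (H3)(i).
    return np.where((u >= 0) & (u <= 1), 0.75 * (1.0 - u ** 2), 0.0)

def knn_bandwidth(x, X, k):
    # H_{n,k}(x): smallest radius h such that the ball B(x, h) contains k of the X_i
    # (the k-th smallest distance, assuming no ties).
    dists = np.array([semi_metric(x, Xi) for Xi in X])
    return np.sort(dists)[k - 1]

def gamma_n(C_indicator, x, X, Y, k):
    # k-NN Nadaraya-Watson estimate of Gamma(C | x) = E[1{Y in C} | X = x].
    h = knn_bandwidth(x, X, k)
    w = epanechnikov(np.array([semi_metric(x, Xi) for Xi in X]) / h)
    return np.sum(C_indicator(Y) * w) / np.sum(w)

# Toy usage: functional covariates on a grid of 50 points, scalar responses.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = np.array([np.sin(2 * np.pi * (t + rng.uniform())) + 0.1 * rng.standard_normal(t.size)
              for _ in range(200)])
Y = X.mean(axis=1) + 0.05 * rng.standard_normal(200)
C = lambda y: (y <= 0.0).astype(float)      # the set C = (-infinity, 0]
print(gamma_n(C, X[0], X, Y, k=15))         # estimate of Gamma((-infinity, 0] | x = X_0)
```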
In statistics, observations are often weakly dependent rather than independent. Ignoring this dependence can have serious consequences for statistical inference. The concept of dependence quantifies the extent to which a sequence of random variables deviates from independence and is useful for extending classical results to weakly dependent or mixing sequences. We refer to [44] for further details on dependence structures. For our analysis, we assume that the sequence { ( X i , Y i ) } i = 1 n is strongly mixing.
Definition 1.
A sequence $\{\zeta_k, k \ge 1\}$ is said to be α-mixing if the α-mixing coefficient
$$\alpha(n) = \sup_{k \ge 1} \sup\big\{ |\mathbb{P}(A \cap B) - \mathbb{P}(A)\,\mathbb{P}(B)| : A \in \mathcal{F}_{k+n}^{\infty},\ B \in \mathcal{F}_1^{k} \big\}$$
converges to zero as $n \to \infty$, where $\mathcal{F}_l^m = \sigma(\zeta_l, \zeta_{l+1}, \ldots, \zeta_m)$ denotes the σ-algebra generated by $\zeta_l, \zeta_{l+1}, \ldots, \zeta_m$ with $l \le m$. A sequence is geometrically strong mixing if, for some $a > 0$ and $\beta > 1$,
$$\alpha(j) \le a\, j^{-\beta},$$
and exponentially strong mixing if, for some $b > 0$ and $0 < \gamma < 1$,
$$\alpha(k) \le b\, \gamma^{k}.$$
In the exploration of mixing conditions, α -mixing stands out as a somewhat lenient criterion that is nonetheless broadly applicable across various stochastic processes, including many time series models. Works by [45,46] pinpointed the necessary conditions for α -mixing in linear processes. These studies showed that both linear autoregressive models and the more complex bilinear time series models exhibit robust mixing properties characterized by mixing coefficients that decrease exponentially. Additionally, the study [47] shed light on the importance of α -mixing, including aspects like geometric ergodicity, in understanding nonlinear time series models, as further discussed in [48,49,50]. Similarly, ref. [51] illustrated that functional autoregressive processes can achieve geometric ergodicity when certain conditions are met. Moreover, the studies [52,53] showed that with only minimal assumptions, both autoregressive conditional heteroscedastic processes and nonlinear additive autoregressive models that include exogenous variables maintain stationarity and α -mixing.
Definition 2.
A class $\mathcal{C}$ of subsets of a set $C$ is called a VC-class if there exists a polynomial $B(\cdot)$ such that, for every set of $N$ points in $C$, the class $\mathcal{C}$ can select at most $B(N)$ distinct subsets.
Definition 3.
A class of functions F is called a VC-subgraph class if the graphs of the functions in F form a VC-class of sets. Specifically, the subgraph of a real-valued function f on a set S is defined as
$$G_f = \{(s, t) : 0 \le t \le f(s) \ \text{or} \ f(s) \le t \le 0\},$$
and the collection $\{G_f : f \in \mathcal{F}\}$ is a VC-class of sets on $S \times \mathbb{R}$.
Definition 4.
Let $S_{\mathcal{E}} \subseteq \mathcal{E}$ be a subset of a semi-metric space $\mathcal{E}$, and let $N_\varepsilon$ be a positive integer. A finite set $\{e_1, \ldots, e_{N_\varepsilon}\} \subseteq \mathcal{E}$ is called an ε-net of $S_{\mathcal{E}}$ if:
$$S_{\mathcal{E}} \subseteq \bigcup_{j=1}^{N_\varepsilon} B(e_j; \varepsilon),$$
where $B(e_j; \varepsilon)$ is the ball of radius $\varepsilon$ around $e_j$. The Kolmogorov entropy (or metric entropy) of $S_{\mathcal{E}}$ is defined as
$$\psi_{S_{\mathcal{E}}}(\varepsilon) = \log N_\varepsilon(S_{\mathcal{E}}),$$
where $N_\varepsilon(S_{\mathcal{E}})$ is the cardinality of the smallest ε-net required to cover $S_{\mathcal{E}}$.
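The covering numbers $N_\varepsilon(S_{\mathcal{E}})$ entering the Kolmogorov entropy can be bounded numerically for a finite collection of curves. Below is a rough sketch of a greedy ε-net construction under the same kind of illustrative semi-metric used above; the greedy strategy only yields an upper bound on the minimal covering number, and the names used are hypothetical.

```python
import numpy as np

def semi_metric(x1, x2):
    # Same illustrative L2-type semi-metric as above.
    return np.sqrt(np.mean((x1 - x2) ** 2))

def greedy_eps_net(S, eps):
    # Greedily pick centers until every curve in S lies within eps of some center.
    # len(net) upper-bounds the minimal covering number N_eps(S), so
    # log(len(net)) upper-bounds the Kolmogorov entropy psi_S(eps).
    centers, uncovered = [], list(range(len(S)))
    while uncovered:
        j = uncovered[0]
        centers.append(S[j])
        uncovered = [i for i in uncovered if semi_metric(S[i], S[j]) > eps]
    return centers

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
S = [np.sin(2 * np.pi * a * t) for a in rng.uniform(0.5, 2.0, size=100)]
net = greedy_eps_net(S, eps=0.3)
print(len(net), np.log(len(net)))    # covering-number bound and entropy bound
```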
The concept of semi-metrics plays a central role in this study. For further discussions on how to choose the appropriate semi-metric, we refer the reader to [23] (see Chapters 3 and 13).

2.1. Notations and Hypotheses

Throughout this paper, we fix x as an element of the functional space E . We introduce the notion of metric entropy with inclusion, which quantifies the complexity or richness of the class of sets C . For each ε > 0 , the covering number is defined as follows:
$$N\big(\varepsilon, \mathcal{C}, \Gamma(\cdot \mid x)\big) = \inf\Big\{ n \in \mathbb{N} : \exists\, C_1, \ldots, C_n \in \mathcal{C} \ \text{such that for all } C \in \mathcal{C}\ \exists\, 1 \le i, j \le n \ \text{with}\ C_i \subseteq C \subseteq C_j \ \text{and}\ \Gamma(C_j \setminus C_i \mid x) < \varepsilon \Big\}.$$
The term $\log N\big(\varepsilon, \mathcal{C}, \Gamma(\cdot \mid x)\big)$ is called the metric entropy with inclusion of $\mathcal{C}$ with respect to $\Gamma(\cdot \mid x)$. Estimates for such covering numbers are available for many classes; see, for example, [54]. We will often assume below that either $\log N\big(\varepsilon, \mathcal{C}, \Gamma(\cdot \mid x)\big)$ or $N\big(\varepsilon, \mathcal{C}, \Gamma(\cdot \mid x)\big)$ behaves like a power of $\varepsilon^{-1}$. We say that the condition $(R_\gamma)$ holds if
$$\log N\big(\varepsilon, \mathcal{C}, \Gamma(\cdot \mid x)\big) \le H_\gamma(\varepsilon), \quad \text{for all } \varepsilon > 0,$$
where
$$H_\gamma(\varepsilon) = \begin{cases} \log(A/\varepsilon) & \text{if } \gamma = 0, \\ A\,\varepsilon^{-\gamma} & \text{if } \gamma > 0, \end{cases}$$
for some constant $A > 0$. As in [13], it is worth noting that the condition (3), for $\gamma = 0$, holds for intervals, rectangles, balls, ellipsoids, and for classes constructed from these by performing finitely many set operations such as union, intersection, and complement. The classes of convex sets in $\mathbb{R}^d$ ($d \ge 2$) satisfy the condition (3) with $\gamma = (d-1)/2$. Other classes of sets satisfying (3) with $\gamma > 0$ can be found in [54].
In this section, we derive the almost complete convergence and central limit theorems for the conditional empirical process { Λ n ( C , x ) : C C } , defined by
$$\Lambda_n(C, x) := \sqrt{n\,\phi(h_n)}\,\big(\Gamma_n(C, x) - \Gamma(C, x)\big).$$
Next, we list the conditions required for our analysis:
(H1) 
On the distributions/small-ball probabilities
(H1)(i)
For $h \in \mathbb{R}_+$ and $x \in \mathcal{E}$, and positive constants $C_1, C_2$, we assume that
$$0 < C_1\,\phi(h_n) \le \phi_x(h_n) \le C_2\,\phi(h_n) < \infty.$$
(H1)(ii)
For all $i \ge 1$,
$$0 < c_5\,\phi(h_n)\, f_1(x) \le \mathbb{P}\big(X_i \in B(x, h)\big) = F(h; x) \le c_6\,\phi(h_n)\, f_1(x),$$
where $\phi(h_n) \to 0$ as $h \to 0$ and $f_1(x)$ is a nonnegative functional in $x \in \mathcal{E}$.
(H1)(iii)
We have
$$\sup_{i \ne j} \mathbb{P}\big((X_i, X_j) \in B(x, h) \times B(x, h)\big) = \sup_{i \ne j} \mathbb{P}(D_i \le h,\ D_j \le h) \le \psi(h)\, f_2(x),$$
where $\psi(h) \to 0$ as $h \to 0$ and $f_2(x)$ is a nonnegative functional in $x \in \mathcal{E}$. We assume that the ratio $\psi(h)/\phi^2(h)$ is bounded.
(H2) 
On the smoothness of the model
(H2)(i)
There exist constants $\beta > 0$ and $\eta_1 > 0$ such that for all $x_1, x_2 \in N_x$, a neighborhood of $x$, we have
$$|\Gamma(C \mid x_1) - \Gamma(C \mid x_2)| \le \eta_1\, d_{\mathcal{E}}^{\beta}(x_1, x_2).$$
(H2)(ii)
Let $g_2(u) = \mathrm{Var}\big(\mathbf{1}_{\{Y_j \in C\}} \mid X_j = u\big)$ for $u \in \mathcal{E}$. Assume that $g_2(u)$ is independent of $j$ and continuous in some neighborhood of $x$; as $h \to 0$,
$$\sup_{\{u :\, d_{\mathcal{E}}(x, u) \le h\}} |g_2(u) - g_2(x)| = o(1),$$
and assume that
$$g_\nu(u) = \mathbb{E}\Big[\big(\mathbf{1}_{\{Y_i \in C\}} - \Gamma(C \mid x)\big)^{\nu} \,\Big|\, X_i = u\Big]$$
is continuous in some neighborhood of x.
(H2)(iii)
Define, for $i \ne j$ and $u, v \in \mathcal{E}$,
$$g(u, v; x) = \mathbb{E}\Big[\big(\mathbf{1}_{\{Y_i \in C\}} - \Gamma(C \mid x)\big)\big(\mathbf{1}_{\{Y_j \in C\}} - \Gamma(C \mid x)\big) \,\Big|\, X_i = u,\ X_j = v\Big].$$
Assume that g ( u , v ; x ) does not depend on i , j and is continuous in some neighborhood of ( x , x ) .
(H3) 
On the kernel function
(H3)(i)
The kernel function $K(\cdot)$ is supported within $[0, 1]$, and there exist constants $0 < \eta_1 \le \eta_2 < \infty$ such that:
$$\int_0^1 K(x)\,dx = 1,$$
and
$$0 < \eta_1\,\mathbf{1}_{(0,1)}(\cdot) \le K(\cdot) \le \eta_2\,\mathbf{1}_{(0,1)}(\cdot).$$
(H3)(ii)
The kernel $K(\cdot)$ is a positive, differentiable function on $[0, 1]$ with derivative $K'(\cdot)$ such that
$$-\infty < \eta_3 < K'(\cdot) < \eta_4 < 0.$$
(H4) 
On the classes of sets
(H4)(i)
The metric entropy of the class $\mathcal{C}$ satisfies, for some $1 \le p < \infty$,
$$\int_0^{\infty} \Big(\log N\big(u, \mathcal{C}_\sigma, \|\cdot\|_p\big)\Big)^{1/2}\, du < \infty,$$
where
$$\mathcal{C}_\sigma = \big\{ C_1, C_2 \in \mathcal{C} : \Gamma(C_1 \,\triangle\, C_2 \mid X = x) \le \sigma \big\}.$$
(H4)(ii)
The class of sets $\mathcal{C}$ is assumed to be of VC-type with the envelope function previously defined. Hence, there are two finite positive constants $a$ and $b$ such that:
$$N\big(\epsilon, \mathcal{C}, \|\cdot\|_{L_2(Q)}\big) \le b \left(\frac{\|F_\kappa\|_{L_2(Q)}}{\epsilon}\right)^{a},$$
for any $\epsilon > 0$ and each probability measure $Q$ such that $Q(F^2) < \infty$.
(H5) 
On the dependence of the random variables
(H5)(i)
For some $v > 2$ and $\varsigma > 1 - \frac{2}{v}$, we have
$$\sum_{\ell = 1}^{\infty} \ell^{\varsigma}\, \big[\alpha(\ell)\big]^{1 - \frac{2}{v}} < \infty.$$
(H5)(ii)
There exists a sequence of positive integers $\{s_n\}_{n \in \mathbb{N}^*}$ such that, as $n \to \infty$,
$$s_n \to \infty, \qquad s_n = o\Big(\sqrt{n\,\phi(h(x))}\Big),$$
and
$$\big(n\,\phi(h(x))\big)^{1/2}\, \alpha(s_n) \to 0.$$
(H6) 
On the entropy. For large enough $n$ and some $\omega > 1$, Kolmogorov's entropy satisfies:
$$\frac{(\log n)^2}{n\,\phi(h_K)} < \psi_{S_{\mathcal{E}}}\!\left(\frac{\log n}{n}\right) < \frac{n\,\phi(h_K)}{\log n},$$
$$\sum_{n=1}^{\infty} \exp\left\{(1 - \omega)\,\psi_{S_{\mathcal{E}}}\!\left(\frac{\log n}{n}\right)\right\} < \infty,$$
where $\psi_{S_{\mathcal{E}}}(\varepsilon) := \log N_\varepsilon(S_{\mathcal{E}})$, and $N_\varepsilon(S_{\mathcal{E}})$ is the minimal number of open balls of radius $\varepsilon$ in $\mathcal{E}$ needed to cover $S_{\mathcal{E}}$.
(H7) 
Condition on the smoothing parameter. The smoothing parameter $h_n$ satisfies:
$$h_n \to 0 \quad \text{and} \quad \frac{\log n}{n\,\min\big(h_n^2,\ \phi^2(h_n)\big)} \to 0 \quad \text{as } n \to \infty.$$
(H8) 
There exist constants $0 < \mu \le \nu < \infty$, a sequence $\rho_n \in (0, 1)$, and sequences of positive integers $\{k_{1,n}\}, \{k_{2,n}\} \subset \mathbb{Z}^{+}$ such that
$$\mu\,\phi^{-1}\!\left(\frac{\rho_n\, k_{1,n}}{n}\right) \le \phi_x^{-1}\!\left(\frac{\rho_n\, k_{1,n}}{n}\right),$$
and
$$\phi_x^{-1}\!\left(\frac{k_{2,n}}{\rho_n\, n}\right) \to 0,$$
and
$$\min\left(\frac{1 - \rho_n}{4}\,\frac{k_{1,n}}{\log n},\ \frac{(1 - \rho_n)^2}{4\,\rho_n}\,\frac{k_{1,n}}{\log n}\right) > 2,$$
and
$$\frac{\log n}{n} \le \min\left(\mu\,\phi^{-1}\!\left(\frac{\rho_n\, k_{1,n}}{n}\right),\ \phi\!\left(\mu\,\phi^{-1}\!\left(\frac{\rho_n\, k_{1,n}}{n}\right)\right)\right).$$
Additional/Alternative conditions
(C1) 
For all $t > 0$, we have $\phi(t) > 0$. For all $t \in (0, 1)$, the function $\tau_0(t)$ exists, where
$$\tau_0(t) = \lim_{r \to 0} \frac{\phi(rt)}{\phi(r)} = \lim_{r \to 0} \frac{\mathbb{P}\big(d_{\mathcal{E}}(x, X) \le rt\big)}{\mathbb{P}\big(d_{\mathcal{E}}(x, X) \le r\big)} < \infty.$$
(C2) 
The kernel function $K(\cdot)$ is supported within $[0, 1]$, and there exist constants $0 < \vartheta_1 \le \vartheta_2 < \infty$ such that for $j = 1, 2$:
$$\int_0^1 K(x)\,dx = 1, \qquad K(\cdot) \le \vartheta_2\,\mathbf{1}_{[0,1]}(\cdot), \qquad \frac{h_n}{\phi(h_K(x))}\int_0^1 K^j(\nu)\,\phi(\nu\, h_K)\,d\nu \to \vartheta_j \quad \text{as } n \to \infty.$$

2.2. Comments on the Hypotheses

We address a complex theoretical problem within the framework of a nonparametric functional regression model, focusing on functional central limit theorems for set-indexed conditional empirical processes under functionally strong mixing data. Specifically, we utilize k-nearest neighbors (k-NN)-based random (or data-dependent) bandwidths. The investigation begins with Assumption ( H 1 ) , which serves as a standard condition on the small-ball probability. This condition, adapted from [43], pertains to the case where E is an infinite-dimensional space. Under this assumption, ϕ ( h k ) converges to 0 exponentially as n , and the function ϕ x ( · ) is employed to control behavior around zero. Notably, the small-ball probability can be approximated as the product of two independent functions, ϕ ˜ ( · ) and f 1 ( · ) . For related examples, see [55] for diffusion processes, ref. [56] for Gaussian measures, and ref. [57] for general Gaussian processes. To adjust for the bias in nonparametric estimators, some understanding of the variability of the small-ball probability is typically required when dealing with functional data. This is achieved by assuming:
$$\tau_0(t) = \lim_{r \to 0} \frac{\phi(rt)}{\phi(r)} = \lim_{r \to 0} \frac{\mathbb{P}\big(d_{\mathcal{E}}(x, X) \le rt\big)}{\mathbb{P}\big(d_{\mathcal{E}}(x, X) \le r\big)} < \infty.$$
Assumption ( H 2 ) pertains to the model’s regularity, encompassing mild requirements such as the continuity of certain conditional moments and the usual Lipschitz condition for regression ( ( H 2 ) ( i ) ). ( H 3 ) is a standard condition in functional nonparametric estimation, primarily concerning the kernel function K ( · ) . Notably, ( H 3 ) ( i ) can be substituted with hypothesis ( C 2 ) to derive the asymptotic variance. Assumption ( H 4 ) is invoked for bounded sets. For ( H 4 ) ( iii ) , see examples in [58,59] (§4.7), [60] (Theorem 2.6.7), and [61] (§9.1), which provide necessary conditions. For further discussions, see [62] (§3.2). The class of functions is generally assumed to be pointwise measurable, a condition satisfied when K ( · ) is defined on R d . See, for instance, [63,64,65]. Assumption ( H 5 ) concerns the decay rate of the mixing coefficient α ( n ) , a critical requirement for studying the weak convergence of empirical processes and establishing asymptotic equicontinuity. ( H 6 ) addresses topological considerations by controlling the Kolmogorov entropy of the sets S E . This is standard in nonparametric models, aiding in the uniform convergence and consistency of estimators. It quantifies the complexity of function classes using tools such as covering numbers or ε -nets. Examples include closed balls in Sobolev spaces, unit balls in Cameron-Martin spaces, and compact subsets in Hilbert spaces with projection semi-metrics. See [33,66,67] for further details. For specific functional spaces E and subsets S E , it is noted in [33] that ψ S E ( log ( n ) / n ) log ( n ) . Assumption ( H 7 ) is fundamental, as convergence would not hold without it. This condition adapts ( H 8 ) to the context of functional conditional processes in the k-NN framework.

3. Main Results

We adopt the notation $Z \overset{\mathcal{D}}{=} \mathcal{N}(\mu, \sigma^2)$ to denote that the random variable $Z$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$. Convergence in distribution is denoted by $\overset{\mathcal{D}}{\longrightarrow}$, while convergence in probability is indicated by $\overset{\mathbb{P}}{\longrightarrow}$. Let $(z_n)_{n \in \mathbb{N}}$ be a sequence of real random variables. We say that $z_n$ converges almost completely (a.co.) to zero if and only if:
$$\forall\, \epsilon > 0, \quad \sum_{n=1}^{\infty} \mathbb{P}(|z_n| > \epsilon) < \infty.$$
Moreover, for a sequence $(u_n)_{n \in \mathbb{N}^*}$ of positive real numbers, we say $z_n = O_{a.co.}(u_n)$ if and only if:
$$\exists\, \epsilon > 0, \quad \sum_{n=1}^{\infty} \mathbb{P}(|z_n| > \epsilon\, u_n) < \infty.$$
This type of convergence implies both almost-sure convergence and convergence in probability.

3.1. Uniform in the Number of Neighbors Consistency

Now, we can state the main results of this section concerning the k-NN functional estimator. Recall the bandwidths k 1 , n and k 2 , n given in the condition (H8).
Theorem 1.
Consider the hypotheses (H1)(i), (H2)(i), (H3)(i), (H4)(i), (H4)(ii), (H6), (H7), and (H8). Let $(X_t, Y_t)$ be a geometrically strong mixing sequence with $\beta > 2$. Then, for every $C \in \mathcal{C}$, as $n \to \infty$, we have:
$$\sup_{C \in \mathcal{C}} \sup_{k_{1,n} \le k \le k_{2,n}} \sup_{x \in S_{\mathcal{E}}} \big|\Gamma_n(C, x) - \Gamma(C, x)\big| = O\left(\phi^{-1}\!\left(\frac{k_{2,n}}{\rho_n\, n}\right)\right) + O_{a.co.}\left(\sqrt{\frac{\psi_{S_x}\!\big(\frac{\log n}{n}\big)}{n\,\phi\Big(\mu\,\phi^{-1}\big(\frac{\rho_n\, k_{1,n}}{n}\big)\Big)}}\right).$$

3.2. Uniform Central Limit Theorems

The main objective of this section is to investigate the central limit theorems for the functional conditional empirical process defined by (4).
Theorem 2.
Assume that hypotheses (H1)(ii), (H1)(iii), (H2)(i), (H2)(ii), (C2), (H5), and (H8) hold, and that $(X_i, Y_i)$ is geometrically strong mixing with $\beta > 2$. Furthermore, suppose that $n\,\phi(h_n) \to \infty$ as $n \to \infty$. For $m \ge 1$ and $C_1, \ldots, C_m \in \mathcal{C}$, if the smoothing parameter $k$ satisfies the condition:
$$k\,\Big(\phi^{-1}\Big(\frac{k}{n}\Big)\Big)^{2\gamma} \to 0 \quad \text{as } n \to \infty,$$
then the sequence
$$\big(\Lambda_n(C_i, x)\big)_{i=1,\ldots,m} \overset{\mathcal{D}}{\longrightarrow} \mathcal{N}(0, \Sigma),$$
where $\Sigma = \big(\sigma_{ij}(x)\big)_{i,j=1,\ldots,m}$ is the covariance matrix given by:
$$\sigma_{ij}(x) = \frac{\vartheta_2\,\big[\Gamma(C_i \cap C_j, x) - \Gamma(C_i, x)\,\Gamma(C_j, x)\big]}{\vartheta_1\, f_1(x)},$$
whenever f 1 ( x ) > 0 .
We limit our discussion to the beta mixing setting in the theorems presented below.
Theorem 3.
Assume that conditions (H3)(i), (H4)(i)–(H5)(i), and (H8) are satisfied. Then the family $\{\Lambda_n(C) : C \in \mathcal{C}\}$ is dense in the space $\big(L^{\infty}(\mathcal{C}), \|\cdot\|_{\mathcal{C}}\big)$, which can be expressed as
$$\lim_{\sigma \to 0} \limsup_{n \to \infty} \mathbb{P}\left( \sup_{C \in \mathcal{C}_\sigma} |\Lambda_n(C)| > \epsilon \right) = 0,$$
where
$$\mathcal{C}_\sigma = \big\{ C_1, C_2 \in \mathcal{C} : \Gamma(C_1 \,\triangle\, C_2 \mid X = x) \le \sigma \big\}.$$
Theorem 4.
Under the conditions of Theorems 2 and 3, and with the additional conditions (H2) and (H4)(iii), the process:
$$\big\{\Lambda_n(C, x) : C \in \mathcal{C}\big\}$$
converges in law to a Gaussian process:
$$\big\{\Lambda(C, x) : C \in \mathcal{C}\big\},$$
which admits a version with uniformly bounded and uniformly continuous paths with respect to the $\|\cdot\|_2$-norm, with covariance $\sigma_{ij}(x)$ given in Theorem 2.
Remark 1.
An important consideration when working with infinite-dimensional data, particularly functional data, is the need to account for the local structures inherent in the data. However, the bandwidth (denoted by h) used in kernel-based estimators is fixed and does not depend on the covariate component x, meaning it fails to capture such local features. An alternative and widely used approach in nonparametric and semiparametric statistics incorporates ideas from k-nearest neighbors (kNN). This method is especially advantageous when the data exhibit local structures, as kNN-based estimators use a location-dependent bandwidth (i.e., a bandwidth that varies with x), allowing them to effectively capture local features. In other words, kNN provides location-adaptive methods. The kNN-based estimators offer at least two key practical advantages over kernel-based methods. First, although the number of neighbors k is fixed, the bandwidth H n , k ( X ) varies with x, which provides the local-adaptive property of kNN estimators, thus allowing adaptation to heterogeneous designs. Second, the computational cost associated with selecting the smoothing parameter k is lower than that of selecting h, since k takes values from the finite set { 1 , 2 , , n } . However, the theoretical analysis of kNN statistics presents greater challenges, primarily because H n , k ( X ) is a random variable that depends on X i (for i = 1 , , n ), introducing additional technical complexities in the proofs. It is important to note that this paper extends the work of [29].

3.3. The Parameter k Selection Criterion

Various methods have been developed to establish bandwidth selection rules for nonparametric kernel estimators, with a focus on achieving asymptotic optimality, especially for the Nadaraya–Watson regression estimator. Notable contributions in this area include [68,69,70]. This parameter has been appropriately chosen either in finite-dimensional settings or infinite-dimensional settings, aiming to highlight good practical performance. To define the leave-out-$(X_j, Y_j)$ estimator for the regression function, we use the following expression:
$$\Gamma_n(C, x, k) = \frac{\displaystyle\sum_{i=1,\, i \ne j}^{n} \mathbf{1}_{\{Y_i \in C\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1,\, i \ne j}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}.$$
To minimize the quadratic loss function, we consider the following criterion for some known non-negative weight function W :
$$CV(C, k) = \frac{1}{n} \sum_{j=1}^{n} \Big(\mathbf{1}_{\{Y_j \in C\}} - \Gamma_n(C, X_j, k)\Big)^2\, W(X_j).$$
This criterion is based on the practical approach developed by [69], where the smoothing parameter is chosen by minimizing the above criterion. The goal is to select k ^ n that minimizes the following expression:
$$\sup_{C \in \mathcal{C}} CV(C, k).$$
In our analysis, we aim to derive the asymptotic properties of our estimate, even when the bandwidth parameter is treated as a random variable, as suggested in the previous equation. We can refine (13) by compensating with the following:
$$CV(C, k) = \frac{1}{n} \sum_{j=1}^{n} \Big(\mathbf{1}_{\{Y_j \in C\}} - \Gamma_n(C, X_j, k)\Big)^2\, \widehat{W}(X_j, x).$$
In practice, the global uniform weights W ( X j ) = 1 are typically used for j = 1 , , n , and the local weights are defined as:
$$\widehat{W}(X_j, x) = \begin{cases} 1 & \text{if } d(X_j, x) \le h, \\ 0 & \text{otherwise.} \end{cases}$$
In this work, we adopt the popular method of cross-validation to select the bandwidth; though this approach can be extended to other bandwidth selection methods, such as those based on Bayesian ideas, as discussed in [71].
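As a rough illustration of the selection rule above, the following sketch evaluates the leave-one-out criterion $CV(C, k)$ over a grid of candidate values of $k$ for a single fixed set $C$, using uniform weights $W(X_j) = 1$; taking the supremum over a class of sets, as in the criterion above, would simply loop this computation over a grid of sets. The helper names and the candidate grid are our own illustrative choices.

```python
import numpy as np

def cv_select_k(X, Y, C_indicator, k_grid, d, kernel):
    # Leave-one-out criterion CV(C, k) with uniform weights W(X_j) = 1.
    n = len(X)
    D = np.array([[d(X[i], X[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(D, np.inf)              # leave out the pair (X_j, Y_j) itself
    Z = C_indicator(Y)
    scores = []
    for k in k_grid:
        err = 0.0
        for j in range(n):
            h = np.sort(D[j])[k - 1]         # H_{n,k}(X_j) among the remaining curves
            w = kernel(D[j] / h)
            err += (Z[j] - np.sum(Z * w) / np.sum(w)) ** 2
        scores.append(err / n)
    return k_grid[int(np.argmin(scores))]

# Example call (uniform kernel on [0, 1]; X, Y, C and semi_metric as in the earlier sketch):
# k_hat = cv_select_k(X, Y, C, np.arange(5, 51, 5), semi_metric,
#                     lambda u: ((u >= 0) & (u <= 1)).astype(float))
```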
Remark 2.
Establishing the asymptotic theory for a random, cross-validated bandwidth requires a specialized stochastic equicontinuity argument. Cross-validation, as employed by [72], is used to investigate whether two functions (both unconditional and conditional) are equal in mixed categorical and continuous settings. Although cross-validation performs optimally for estimation, its optimality may not extend to nonparametric kernel testing. In the context of testing a parametric model for the conditional mean function against a nonparametric alternative, ref. [73] introduced an adaptive-rate-optimal rule. An alternative approach to bandwidth selection is proposed by [74], who leverage the Edgeworth expansion of the test’s asymptotic distribution to identify the bandwidth that maximizes test power while controlling its size.

4. Application

4.1. Statistical Classification

In this section, we apply the findings obtained in the previous sections to the problem of statistical classification. We are given a sample of random elements $(X_1, Y_1), \ldots, (X_n, Y_n)$ drawn from the joint distribution of $(X, Y)$, where $X$ takes values in a space $\mathcal{E}$ and $Y$ in $\mathbb{R}^d$. In this classification problem, the objective is to predict the integer-valued label $Y$ based on the covariate $X$. More formally, we aim to find a function (classifier) $\theta : \mathcal{E} \to \mathbb{R}^d$ for which the probability of misclassification (incorrect prediction), i.e., $\mathbb{P}(\theta(X) \ne Y)$, is minimized. Let
$$\tau_k(x) = \mathbb{P}(Y = k \mid X = x), \quad x \in \mathcal{E}, \quad 1 \le k \le n.$$
It can be shown that the optimal classifier, i.e., the one with the minimum probability of error, is given by
$$\theta_B(x) = \arg\max_{1 \le k \le n} \tau_k(x),$$
i.e., the best classifier $\theta_B$ satisfies
$$\max_{1 \le k \le n} \tau_k(x) = \tau_{\theta_B(x)}(x).$$
As $\theta_B$ is unknown, the data are utilized to construct estimates of $\theta_B$. Specifically, let $\mathcal{D}_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ represent a random sample from the distribution of $(X, Y)$, where each $(X_i, Y_i)$ is fully observable. Let $\hat{\theta}_n$ be any sample-based classifier. In other words, $\hat{\theta}_n(X)$ is the predicted value of $Y$, based on $\mathcal{D}_n$ and $X$. Let
$$L_n(\hat{\theta}_n) = \mathbb{P}\big(\hat{\theta}_n(X) \ne Y \mid \mathcal{D}_n\big)$$
be the conditional probability of error of the sample-based classifier $\hat{\theta}_n$. Then $\hat{\theta}_n$ is said to be consistent if $L_n(\hat{\theta}_n) \to L(\theta_B) = \mathbb{P}(\theta_B(X) \ne Y)$ as $n \to \infty$. For $k = 1, \ldots, n$, let $\hat{\tau}_k(x)$ be any sample-based estimator of $\tau_k(x) = \mathbb{P}(Y = k \mid X = x)$ and define the classification rule $\hat{\theta}_n$ by
$$\hat{\theta}_n(x) = \arg\max_{1 \le k \le n} \hat{\tau}_k(x).$$
In other words, θ ^ n satisfies
$$\max_{1 \le k \le n} \hat{\tau}_k(x) = \hat{\tau}_{\hat{\theta}_n(x)}(x).$$
To show that
$$L_n(\hat{\theta}_n) - L_n(\theta_B) \to 0,$$
it is sufficient to show that, for all $k$,
$$\hat{\tau}_k(x) - \tau_k(x) \to 0.$$
We can rewrite the conditional distribution estimator as
$$\hat{\tau}_k(x) = \Gamma_n(\{k\}, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_i = k\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}.$$
Theorem 5.
Under the conditions of Theorem 1, we have the convergence
$$L_n(\hat{\theta}_n) - L_n(\theta_B) \to 0.$$
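A schematic implementation of the plug-in rule $\hat{\theta}_n$ built from the k-NN estimates $\hat{\tau}_k(x)$ is sketched below; the label handling and function names are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def knn_posterior(x, X, Y_labels, k, d, kernel):
    # tau_hat_m(x) = Gamma_n({m}, x) for each observed label m, with the k-NN bandwidth.
    dists = np.array([d(x, Xi) for Xi in X])
    h = np.sort(dists)[k - 1]                    # H_{n,k}(x)
    w = kernel(dists / h)
    labels = np.unique(Y_labels)
    tau = np.array([np.sum(w * (Y_labels == m)) / np.sum(w) for m in labels])
    return labels, tau

def theta_hat(x, X, Y_labels, k, d, kernel):
    # Plug-in rule: predict the label with the largest estimated posterior probability.
    labels, tau = knn_posterior(x, X, Y_labels, k, d, kernel)
    return labels[int(np.argmax(tau))]
```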

4.2. Testing Conditional Independence

Conditional independence is an essential concept that unifies many diverse aspects of statistical inference; see [75]. The tasks of quantifying and detecting conditional dependence underlie a wide range of statistical problems, including limit theorems, Markov chain theory, sufficiency, and causality [76], among others. Conditional independence also plays a pivotal role in graphical models [77], causal inference [78], and artificial intelligence [79]; see also [80] for more recent developments. The notion of treating conditional independence as an abstract concept with a dedicated calculus was first introduced by [76], who demonstrated that many results and theorems related to statistical ideas (such as ancillarity, sufficiency, and causality) can be viewed as specific applications of the fundamental properties of conditional independence, extended to include both stochastic and non-stochastic variables. For more details, see [81].
Let $\mathcal{C}_1, \mathcal{C}_2$ be two classes of sets. In this section, we focus on a sample of random elements
$$\big(X_1, Y_{1,1}, Y_{1,2}\big), \ldots, \big(X_n, Y_{n,1}, Y_{n,2}\big),$$
which are i.i.d. copies of $(X, Y_1, Y_2)$ taking values in $\mathcal{E} \times \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$. For $(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2$, define
$$G_n(C_1 \times C_2, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_{i,1} \in C_1\}}\,\mathbf{1}_{\{Y_{i,2} \in C_2\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)},$$
$$G_{n,1}(C_1, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_{i,1} \in C_1\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)},$$
$$G_{n,2}(C_2, x) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_{i,2} \in C_2\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}.$$
We investigate the following empirical processes for $(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2$:
$$\hat{\nu}_n(C_1, C_2, x) = \sqrt{n\,\phi(h_n)}\,\Big(G_n(C_1 \times C_2, x) - \mathbb{E}\big[G_n(C_1, x)\big]\,\mathbb{E}\big[G_n(C_2, x)\big]\Big),$$
$$\breve{\nu}_n(C_1, C_2, x) = \sqrt{n\,\phi(h_n)}\,\Big(G_n(C_1 \times C_2, x) - G_{n,1}(C_1, x)\,G_{n,2}(C_2, x)\Big).$$
Observe that
$$\breve{\nu}_n(C_1, C_2, x) = \sqrt{n\,\phi(h_n)}\,\Big(G_n(C_1 \times C_2, x) - \mathbb{E}\big[G_n(C_1, x)\big]\,\mathbb{E}\big[G_n(C_2, x)\big]\Big) + \sqrt{n\,\phi(h_n)}\,\mathbb{E}\big[G_n(C_2, x)\big]\Big(G_n(C_1, x) - \mathbb{E}\big[G_n(C_1, x)\big]\Big) - \sqrt{n\,\phi(h_n)}\,G_n(C_1, x)\Big(G_n(C_2, x) - \mathbb{E}\big[G_n(C_2, x)\big]\Big).$$
Hence,
$$\breve{\nu}_n(C_1, C_2, x) \overset{d}{=} \sqrt{n\,\phi(h_n)}\,\Big(G_n(C_1 \times C_2, x) - \mathbb{E}\big[G_n(C_1, x)\big]\,\mathbb{E}\big[G_n(C_2, x)\big]\Big) + \sqrt{n\,\phi(h_n)}\,\mathbb{E}\big[G_n(C_2, x)\big]\Big(G_n(C_1, x) - \mathbb{E}\big[G_n(C_1, x)\big]\Big) - \sqrt{n\,\phi(h_n)}\,\mathbb{E}\big[G_n(C_1, x)\big]\Big(G_n(C_2, x) - \mathbb{E}\big[G_n(C_2, x)\big]\Big) = \hat{\nu}_n(C_1, C_2, x) + \mathbb{E}\big[G_n(C_2, x)\big]\,\tilde{\nu}_n(C_1, x) - \mathbb{E}\big[G_n(C_1, x)\big]\,\tilde{\nu}_n(C_2, x),$$
where ν ˜ n ( C k , x ) is defined analogously.
It can be shown that for A 1 , B 1 , A 2 , B 2 C 1 × C 2 ,
$$\mathrm{cov}\big(\hat{\nu}_n(A_1, B_1, x),\ \hat{\nu}_n(A_2, B_2, x)\big) = \frac{C_2}{C_1^2\, f_1(x)}\Big[\mathbb{E}\big(\mathbf{1}_{\{Y \in A_1 \cap A_2\}} \mid X = x\big) - \mathbb{E}\big(\mathbf{1}_{\{Y \in A_1\}} \mid X = x\big)\,\mathbb{E}\big(\mathbf{1}_{\{Y \in A_2\}} \mid X = x\big)\Big] \times \Big[\mathbb{E}\big(\mathbf{1}_{\{Y \in B_1 \cap B_2\}} \mid X = x\big) - \mathbb{E}\big(\mathbf{1}_{\{Y \in B_1\}} \mid X = x\big)\,\mathbb{E}\big(\mathbf{1}_{\{Y \in B_2\}} \mid X = x\big)\Big],$$
provided f 1 ( x ) > 0 . Although the decomposition in (21) suggests the structure of ν ˘ n ( C 1 , C 2 , x ) , its covariance is more delicate to derive. Let { ν ^ ( C 1 , C 2 , x ) : ( C 1 , C 2 ) C 1 × C 2 } be a Gaussian process whose covariance is given by (22). Define the following limiting process for C 1 , C 2 C 1 × C 2 :
$$\breve{\nu}(C_1, C_2, x) = \hat{\nu}(C_1, C_2, x) + G(C_2, x)\,\tilde{\nu}(C_1, x) - G(C_1, x)\,\tilde{\nu}(C_2, x).$$
We aim to test the null hypothesis
H 0 : Y 1 and Y 2 are conditionally independent given X = x ,
against the alternative
H 1 : Y 1 and Y 2 are conditionally dependent .
To this end, one may use the following test statistics:
$$S_{1,n} = \sup_{(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2} \hat{\nu}_n(C_1, C_2, x),$$
$$S_{2,n} = \sup_{(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2} \breve{\nu}_n(C_1, C_2, x).$$
By combining Theorem 4 with the continuous mapping theorem, we obtain the following result.
Theorem 6.
Under the conditions of Theorem 4, as n , it holds that
$$S_{1,n} \overset{\mathcal{D}}{\longrightarrow} \sup_{(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2} \hat{\nu}(C_1, C_2, x),$$
$$S_{2,n} \overset{\mathcal{D}}{\longrightarrow} \sup_{(C_1, C_2) \in \mathcal{C}_1 \times \mathcal{C}_2} \breve{\nu}(C_1, C_2, x).$$
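In practice, the suprema defining $S_{1,n}$ and $S_{2,n}$ can only be evaluated over finite grids of sets. The sketch below computes the fully empirical statistic $S_{2,n}$ over a grid of product half-lines $C_1 = (-\infty, z_1]$, $C_2 = (-\infty, z_2]$ (the statistic $S_{1,n}$ involves the population quantities $\mathbb{E}\,G_n$ and is therefore not directly computable from data). The grid, the use of an absolute value, and the function name are our own illustrative choices, and the calibration of critical values is not addressed here.

```python
import numpy as np

def ci_test_statistic(x, X, Y1, Y2, k, d, kernel, z1_grid, z2_grid, phi_h):
    # S_{2,n}: supremum over a grid of product half-lines of |nu_breve_n(C1, C2, x)|,
    # comparing G_n(C1 x C2, x) with the product G_{n,1}(C1, x) * G_{n,2}(C2, x).
    dists = np.array([d(x, Xi) for Xi in X])
    h = np.sort(dists)[k - 1]                    # k-NN bandwidth H_{n,k}(x)
    w = kernel(dists / h)
    w = w / np.sum(w)                            # Nadaraya-Watson weights at x
    rate = np.sqrt(len(X) * phi_h)               # sqrt(n * phi(h_n)) normalization
    stat = 0.0
    for z1 in z1_grid:
        ind1 = (Y1 <= z1)
        g1 = np.sum(w * ind1)                    # G_{n,1}((-inf, z1], x)
        for z2 in z2_grid:
            ind2 = (Y2 <= z2)
            g2 = np.sum(w * ind2)                # G_{n,2}((-inf, z2], x)
            g12 = np.sum(w * ind1 * ind2)        # G_n((-inf, z1] x (-inf, z2], x)
            stat = max(stat, rate * abs(g12 - g1 * g2))
    return stat
```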

5. Concluding Remarks

In this paper, we present new limit theorems for an improved version of the set-indexed conditional empirical process, with particular attention to the interplay between mixing conditions and functional aspects of the data. A central feature of our contribution is the incorporation of the widely recognized k-nearest neighbors (k-NN) approach for estimating the regression function. Building on the results in [29], we establish analogous theorems on weak convergence in probability and asymptotic normality. Furthermore, our findings bolster existing research by refining the density properties inherent to the process under consideration.
The k-NN algorithm, valued for its non-parametric nature and relative simplicity, is well-suited to both classification and regression tasks. One principal advantage of applying k-NN in time series settings lies in its capacity to function effectively without strict assumptions regarding the underlying data distribution, thus making it particularly appropriate for non-stationary data. In addition to its methodological adaptability, k-NN is notable for its computational tractability and straightforward implementation. Consequently, the insights provided in this study hold the potential to guide future methodological developments. We also advocate for the utility of bootstrap procedures in functional data contexts, with the functional form of the wild bootstrap illustrating a promising avenue for selecting smoothing parameters. Nevertheless, the theoretical underpinnings that justify this bandwidth selection strategy via the functional wild bootstrap remain an open question.
It is widely recognized that bias constitutes a key challenge in kernel smoothing. Although local linear fitting offers a theoretical avenue for bias mitigation, there is a pressing need for more extensive investigation within the realm of nonparametric functional data analysis in order to comprehensively address bias-related issues. Empirical evidence suggests that bias correction strategies can markedly improve estimation accuracy. At the same time, further theoretical rigor is essential for understanding and controlling the asymptotic properties of these enhanced approaches. To the best of our knowledge, such formal examinations are still underrepresented in nonparametric functional data analysis, particularly in the context of locally stationary processes. Against this backdrop, a promising objective is to refine estimation procedures through the integration of local linear smoothing with established bias-correction techniques. Contributions from [82] and related foundational works [83] offer a natural basis for developing such integrative methods. Extending our current framework to simultaneously incorporate k-NN and single index models could generate more robust and practically relevant outcomes, thereby paving the way for subsequent empirical investigations.
It is worth noting that the notion of mixing, while commonly adopted due to its relative tractability, may be ill-suited for data exhibiting substantial dependence. Extending non-parametric functional methodologies to more general dependence structures remains an underexplored area. In this regard, ergodic frameworks avoid the stringent requirements of strong mixing assumptions and the intricate probabilistic arguments they necessitate. Nonetheless, adapting our approach to accommodate functional ergodic data would involve considerable mathematical complexity and thus exceeds the present scope.

Author Contributions

Conceptualization, S.B.; methodology, S.B.; validation, S.B. and Y.S.; formal analysis, S.B. and Y.S.; investigation, S.B. and Y.S.; original draft preparation, S.B. and Y.S.; writing—review and editing, S.B. and Y.S.; supervision, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the Editor-in-Chief, an Associate-Editor, and the three referees for their extremely helpful remarks, which resulted in a substantial improvement of the original form of the work and a presentation that was more sharply focused.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A

We present Lemma A1 in a more general context, as seen in [28]. This generalization extends a result found in [32] and may prove useful in a broader range of scenarios than the one we discuss here. More generally, this technique is valuable for handling random bandwidths.
Let ( A i , B i ) n 1 be a sequence of random vectors valued in ( Ω , Ω ; A , B ) , a general space. Let S Ω be a fixed subset of Ω , and define the function G : R × ( S Ω × Ω ) R + such that, for all t S Ω , the function G ( · , ( t , · ) ) is measurable. Moreover, for all x , x R , the following condition holds:
(L0)
$$x \le x' \;\Longrightarrow\; G(x, z) \le G(x', z), \qquad \forall\, z \in S_\Omega \times \Omega.$$
Next, let ( D n ( t ) ) n N be a sequence of random real vectors (r.r.v.) such that for every t S Ω and for each measurable function 1 { · } : Ω R belonging to a class of sets C , define the nonrandom function M : C × S Ω R with the condition:
$$\sup_{C \in \mathcal{C}} \sup_{t \in S_\Omega} |M(C, t)| < \infty.$$
For all t S Ω and n N { 0 } , we define the sequence
$$M_n(C, t) = \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{B_i \in C\}}\, G(t, A_i)}{\displaystyle\sum_{i=1}^{n} G(t, A_i)}.$$
Lemma A1.
Let $(U_n)_{n \in \mathbb{N}}$ be a decreasing sequence of positive numbers such that $\lim_{n \to \infty} U_n = 0$. For every increasing sequence $\xi_n \in (0, 1)$ with $\xi_n - 1 = O(U_n)$, there exist two sequences of random real vectors $\big(D_{n,k}^{-}(\xi_n, t)\big)_{n \in \mathbb{N}}$ and $\big(D_{n,k}^{+}(\xi_n, t)\big)_{n \in \mathbb{N}}$ satisfying the following conditions:
(L1)
$$\forall\, n \in \mathbb{N},\ \forall\, t \in S_\Omega, \qquad D_{n,k}^{-}(\xi_n, t) \le D_{n,k}^{+}(\xi_n, t).$$
(L2)
$$\mathbf{1}_{\{D_{n,k}^{-}(\xi_n, t)\, \le\, D_{n,k}(t)\, \le\, D_{n,k}^{+}(\xi_n, t)\}} \longrightarrow 1, \quad a.s., \qquad \forall\, t \in S_\Omega.$$
(L3)
$$\sup_{C \in \mathcal{C}} \sup_{t \in S_\Omega} \left| \frac{\displaystyle\sum_{i=1}^{n} G\big(D_{n,k}^{-}(\xi_n, t), (t, A_i)\big)}{\displaystyle\sum_{i=1}^{n} G\big(D_{n,k}^{+}(\xi_n, t), (t, A_i)\big)} - \xi_n \right| = O_{a.s.}(U_n),$$
(L4)
$$\sup_{C \in \mathcal{C}} \sup_{t \in S_\Omega} \Big| M_n\big(C, t; D_{n,k}^{-}(\xi_n, t)\big) - M(C, t) \Big| = O_{a.s.}(U_n),$$
(L5)
$$\sup_{C \in \mathcal{C}} \sup_{t \in S_\Omega} \Big| M_n\big(C, t; D_{n,k}^{+}(\xi_n, t)\big) - M(C, t) \Big| = O_{a.s.}(U_n).$$
Then, as $n \to \infty$, the following holds:
$$\sup_{C \in \mathcal{C}} \sup_{t \in S_\Omega} \Big| M_n\big(C, t; D_{n,k}(t)\big) - M(C, t) \Big| = O_{a.s.}(U_n).$$
For the proof of Lemma A1, we refer to [28].
Here is a reformulation of the lemma with the original content preserved:
Lemma A2
(Lemma 6.1 in [34], p. 186). Let $X_1, \ldots, X_n$ be independent Bernoulli random variables with $\mathbb{P}(X_i = 1) = p$ for each $i = 1, \ldots, n$. Define
$$U = X_1 + X_2 + \cdots + X_n$$
and $\mu = pn$. Then, for any $\omega > 0$, the following bound holds:
$$\mathbb{P}\big(U \ge (1 + \omega)\mu\big) \le \exp\left(-\frac{\mu\,\min(\omega^2, \omega)}{4}\right),$$
and if $\omega \in (0, 1)$, we have
$$\mathbb{P}\big(U \le (1 - \omega)\mu\big) \le \exp\left(-\frac{\mu\,\omega^2}{2}\right).$$
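As a quick sanity check of the two exponential bounds in Lemma A2, one can compare them with Monte Carlo estimates of the corresponding tail probabilities; the particular values of $n$, $p$, and $\omega$ below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, omega, reps = 500, 0.1, 0.5, 20_000
mu = n * p
U = rng.binomial(n, p, size=reps)                 # U = X_1 + ... + X_n, X_i ~ Bernoulli(p)

upper_emp = np.mean(U >= (1 + omega) * mu)        # P(U >= (1 + omega) mu)
upper_bound = np.exp(-mu * min(omega ** 2, omega) / 4)
lower_emp = np.mean(U <= (1 - omega) * mu)        # P(U <= (1 - omega) mu)
lower_bound = np.exp(-mu * omega ** 2 / 2)
print(upper_emp, "<=", upper_bound)
print(lower_emp, "<=", lower_bound)
```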
The following notation is necessary for determining the convergence rates:
$$K_i\big(x, h_k(x)\big) = K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{h_k(x)}\right),$$
$$\Gamma_{n,2}\big(C, x; h_k(x)\big) = \frac{1}{n\,\mathbb{E}\big(K_1(x; h_k(x))\big)} \sum_{i=1}^{n} \mathbf{1}_{\{Y_i \in C\}}\, K_i\big(x, h_k(x)\big),$$
$$\Gamma_{n,1}\big(C, x; h_k(x)\big) = \frac{1}{n\,\mathbb{E}\big(K_1(x; h_k(x))\big)} \sum_{i=1}^{n} K_i\big(x, h_k(x)\big).$$
This leads to the expression:
$$\Gamma_n\big(C, x; h_k(x)\big) = \frac{\Gamma_{n,2}\big(C, x; h_k(x)\big)}{\Gamma_{n,1}\big(C, x; h_k(x)\big)}.$$
Now, consider the following decomposition:
$$\Gamma_n\big(C, x; h_k(x)\big) - \Gamma(C, x) = \frac{1}{\Gamma_{n,1}\big(C, x; h_k(x)\big)}\Big[\Gamma_{n,2}\big(C, x; h_k(x)\big) - \mathbb{E}\,\Gamma_{n,2}\big(C, x; h_k(x)\big)\Big] + \frac{1}{\Gamma_{n,1}\big(C, x; h_k(x)\big)}\Big[\mathbb{E}\,\Gamma_{n,2}\big(C, x; h_k(x)\big) - \Gamma(C, x)\Big] + \frac{\Gamma(C, x)}{\Gamma_{n,1}\big(C, x; h_k(x)\big)}\Big[1 - \Gamma_{n,1}\big(C, x; h_k(x)\big)\Big].$$

Appendix A.1. Proof of Theorem 1

As in [28], we proceed to prove Theorem 1. To ascertain the condition in (A1), we first identify the variables as follows: $S_\Omega = S_{\mathcal{E}}$, $A_i = X_i$, $\mathbf{1}_{\{B_i \in C\}} = \mathbf{1}_{\{Y_i \in C\}}$,
$$G\big(H, (x; A_i)\big) = K\big(H^{-1}\, d_{\mathcal{E}}(x, X_i)\big);$$
$$D_{n,k}(x) = H_{n,k}(x);$$
$$M_n\big(C, x; H_{n,k}(x)\big) = \Gamma_n\big(C, x; H_{n,k}(x)\big);$$
$$M(C, x) = \Gamma(C, x).$$
Next, choose $D_{n,k}^{-}(\xi_n, x)$ and $D_{n,k}^{+}(\xi_n, x)$ such that:
$$\phi_x\big(D_{n,k}^{-}(\xi_n, x)\big) = \frac{\xi_n\, k}{n},$$
$$\phi_x\big(D_{n,k}^{+}(\xi_n, x)\big) = \frac{k}{n\,\xi_n}.$$
We define $h^{-}(x) = D_{n,k}^{-}(\xi_n, x)$, $h^{+}(x) = D_{n,k}^{+}(\xi_n, x)$, and consider an increasing sequence $\xi_n \in (0, 1)$ such that $\xi_n - 1 = O(U_n)$. For all $\xi_n \in (0, 1)$ and $x \in S_{\mathcal{E}}$, with $k_{1,n} \le k \le k_{2,n}$, we have:
$$\phi_x^{-1}\!\left(\frac{\xi_n\, k_{1,n}}{n}\right) \le h^{-}(x) \le \phi_x^{-1}\!\left(\frac{\xi_n\, k_{2,n}}{n}\right),$$
$$\phi_x^{-1}\!\left(\frac{k_{1,n}}{n\,\xi_n}\right) \le h^{+}(x) \le \phi_x^{-1}\!\left(\frac{k_{2,n}}{n\,\xi_n}\right).$$
Using the condition (8), we can deduce that the bandwidths $h^{-}(x)$ and $h^{+}(x)$ both belong to the interval:
$$\big[h_{n,1},\, h_{n,2}\big] = \left[\mu\,\phi^{-1}\!\left(\frac{\rho_n\, k_{1,n}}{n}\right),\ \nu\,\phi^{-1}\!\left(\frac{k_{1,n}}{\rho_n\, n}\right)\right].$$
Checking conditions ( L 4 ) and ( L 5 ) To check ( L 4 ) , we begin by noting that since ξ n is bounded by 1, the local bandwidth h ( x ) leads to:
sup C C sup h n , 1 h ( x ) h n , 2 sup x S E M n ( C , x ; D n , k ( ξ n , x ) ) M n ( C , x ) = sup h n , 1 h ( x ) h n , 2 sup x S E i = 1 n 1 { Y i C } K d E ( x , X i ) h ( x ) i = 1 n K d E ( x , X i ) h ( x ) Γ ( C , x ) = sup h n , 1 h ( x ) h n , 2 sup x S E Γ n ( C , x ; h ( x ) ) Γ ( C , x ) = O a . c o h n , 2 γ + O a . c o ψ S x log n n n ϕ ( h n , 1 ) .
This is equivalent to:
sup C C sup k n , 1 k k n , 2 sup x S E Γ n ( C , x ; h ( x ) ) Γ ( C , x ) = O a . c o ϕ 1 k 2 , n ξ n n γ + ψ S x log n n n ϕ μ ϕ 1 ρ n k 1 , n n = O a . c o ( U n ) .
We use similar reasoning to verify condition ( L 5 ) , and we obtain:
sup C C sup k n , 1 k k n , 2 sup x S E M n ( C , x ; D n , k ( ξ n , x ) ) M n ( C , x ) = O a . c o ( U n ) .
Thus, conditions ( L 4 ) and ( L 5 ) are verified.
Verifying condition ( L 2 ) To verify condition ( L 2 ) , we show that for all x S E and ε 0 ,
n 1 P 1 D n , k ( ξ n , x ) H n , k ( x ) D n , k + ( ξ n , x ) 1 > ε 0 < .
For fixed ε 0 , let { x 1 , , x N E } be an ε -net for S E . For each x S E , we have:
P 1 D n , k ( ξ n , x ) H n , k ( x ) D n , k + ( ξ n , x ) 1 > ε 0 P H n , k ( x ) ϕ x 1 ξ n k 1 , n n + P H n , k ( x ) ϕ x 1 k 2 , n ξ n n = 1 N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 2 , n ξ n n + = 1 N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 ξ n k 1 , n n N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 ξ n k 1 , n n + N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 2 , n ξ n n .
We begin by stating a lemma similar to the one presented in [34] (see Lemma A2). For the sake of completeness, we include the proof here.
By applying Lemma A2, we deduce that
P H n , k ( x ) ϕ x 1 α k 1 , n n = P i = 1 n 1 B x , ϕ x 1 α k 1 , n n ( X i ) > k = P i = 1 n 1 B x , ϕ x 1 α k 1 , n n ( X i ) > k α k 1 , n α k 1 , n exp k α k 1 , n 4 .
From this, we obtain the following inequality:
N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 1 , n α n N ε n ( S E ) k 2 , n exp 1 α k 1 , n 4 N ε n ( S E ) n 1 ( 1 α ) 4 k 1 , n ln n .
Similarly, we derive the bound
P H n , k ( x ) ϕ x 1 k 2 , n α n exp ( k 2 , n α k ) 2 2 α K 2 , n .
Thus, we conclude that
N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 2 , n α n N ε n ( S E ) k 2 , n exp 1 α k 1 , n 4 N ε n ( S E ) n 1 ( 1 α ) 4 k 2 , n ln n .
Therefore, as k i , n / ln n for i = 1 , 2 , we conclude
N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 1 , n α n N ε n ( S E ) n 1 ( 1 α ) 4 k 1 , n ln n ,
N ε n ( S E ) k = k 1 , n k = k 2 , n P H n , k ( x ) ϕ x 1 k 2 , n α n N ε n ( S E ) n 1 ( 1 α ) 4 k 2 , n ln n .
Next, to verify the condition ( L 3 ), we introduce the following quantities:
Q n 1 : = 1 ( x , D n ( ξ n , x ) ) 1 ( x , D n + ( ξ n , x ) ) ,
Q n 2 : = Γ n , 1 ( x , D n ( ξ n , x ) ) Γ n , 1 ( C , x ; D n + ( ξ n , x ) ) 1 ,
Q n 3 : = 1 ( x , D n ( ξ n , x ) ) 1 ( x , D n + ( ξ n , x ) ) ξ n 1 .
The condition ( L 3 ) is then expressed as
i = 1 n K d E ( x , X i ) D n ( ξ n , x ) i = 1 n K d E ( x , X i ) D n + ( ξ n , x ) ξ n | Q n 1 | | Q n 2 | + | Q n 1 | | Q n 3 | .
Given that ξ n 1 , the desired result follows, leading to
sup K n , 1 K K n , 2 sup x S E i = 1 n K d E ( x , X i ) D n ( ξ n , x ) i = 1 n K d E ( x , X i ) D n + ( ξ n , x ) ξ n = O a . c o ( U n ) .
Proof of (A9).
We use the following results:
sup k n , 1 k k n , 2 sup x S E Q n 1 C ,
sup k n , 1 k k n , 2 sup x S E Q n 2 = O a . c o ψ S x log n n n ϕ μ ϕ 1 ρ n k 1 , n n ,
sup k n , 1 k k n , 2 sup x S E Q n 3 = O ϕ 1 k 2 , n ρ n n .
Proof of (A10).
We use the condition ( H 3 (i)) to conclude that
E K d E ( X 1 , 1 ) h ( x ) η 2 ϕ ( h ( x ) ) .
Then, by applying condition ( H 1 (ii)), we immediately obtain
sup k n , 1 k k n , 2 sup x S E Q n 1 C .
Proof of (A11).
We begin with the following expression:
sup k n , 1 k k n , 2 sup x S E Q n 2 = sup k n , 1 k k n , 2 sup x S E Q n ( k , x , h ( x ) ) Q n ( k , x , h + ( x ) ) 1 1 inf k n , 1 k k n , 2 inf x S E Q n ( K , x , h + ( x ) ) sup k n , 1 k k n , 2 sup x S E Q n ( K , x , h ( x ) ) 1 + sup k n , 1 k k n , 2 sup x S E Q n ( K , x , h + ( x ) ) 1 ) .
To establish this inequality, which provides the following results:
sup k n , 1 k k n , 2 sup x S E Q n ( K , x , h ( x ) ) 1 = O a . c o ψ S x log n n n ϕ μ ϕ 1 ρ n k 1 , n n ,
and
sup k n , 1 k k n , 2 sup x S E Q n ( K , x , h + ( x ) ) 1 = O a . c o ψ S x log n n n ϕ μ ϕ 1 ρ n k 1 , n n .
Additionally, combining (A15), (A16) with the fact that
n 1 P inf k n , 1 k k n , 2 inf x S E Γ n , 1 ( C , x ; D n + ( ξ n , x ) ) < C < ,
it follows that
sup k n , 1 k k n , 2 sup x S E Q n 2 = O a . c o ψ S x log n n n ϕ μ ϕ 1 ρ n k 1 , n n .
Proof of (A12).
As used in [33], part of the proof follows from Lemma 1 in [84]. Additionally, we will rely on computations similar to those in the proof of Lemma 5.7.0.3 in [85]. Consider the following quantity:
B ^ ( x , C K ) = E [ Γ n , 2 ( C , x ; D n + ( ξ n , x ) ) ] E [ Γ n , 1 ( C , x ; D n ( ξ n , x ) ) ] Γ ( C , x ) .
Now, analyzing the quantity:
B ^ ( x , C K ) = ϕ x ( D n ( ξ n , x ) ) ϕ x ( D n + ( ξ n , x ) ) E K d E ( x , X 1 ) D n + ( ξ n , x ) E 1 { Y 1 C } X 1 E K d E ( x , X 1 ) D n ( ξ n , x ) Γ ( C , x ) 1 E K d E ( x , X 1 ) D n ( ξ n , x ) ϕ x ( D n ( ξ n , x ) ) ϕ x ( D n + ( ξ n , x ) ) E K d E ( x , X 1 ) D n + ( ξ n , x ) E 1 { Y 1 C } X 1 Γ ( C , x ) .
Using the fact that ϕ x ( D n ( ξ n , x ) ) ϕ x ( D n + ( ξ n , x ) ) = ξ n , and assuming the condition (H2(i)) holds, i.e.,
| Γ ( X x ) Γ ( C x ) | η 1 d E γ ( X , x ) ,
and that conditions (H1(ii)) and (H3(i)) are satisfied, we then have for all x S E , D n ( ξ n , x ) , and D n + ( ξ n , x ) :
E [ Γ n , 2 ( C , x ; D n + ( ξ n , x ) ) ] E [ Γ n , 1 ( C , x ; D n + ( ξ n , x ) ) ] Γ ( C , x ) η 3 κ 2 ξ n η 1 ϕ x ( D n ( ξ n , x ) ) E 1 B ( x , D n + ( ξ n , x ) ) ( X ) d E γ ( X , x ) η 1 η 2 ξ n η 1 ϕ x ( D n ( ξ n , x ) ) ϕ x ( D n + ( ξ n , x ) ) ( D n + ( ξ n , x ) ) γ C 2 C 3 η 2 ξ n C 1 η 1 ( D n + ( ξ n , x ) ) γ C ( D n + ( ξ n , x ) ) γ .
Taking into account the condition (H3(i)) and the fact that ξ n 1 , we conclude:
sup ( h ( x ) , h + ( x ) ) [ h n , 1 , h n , 2 ] 2 sup x S E E [ K d E ( x , X 1 ) D n + ( ξ n , x ) ] E [ K d E ( x , X 1 ) D n ( ξ n , x ) ] ξ n 1 C h n , 2 γ .
Finally, rewriting (A19) with C 1 , we obtain:
sup ( h ( x ) , h + ( x ) ) [ h n , 1 , h n , 2 ] 2 sup x S E E [ K d E ( x , X 1 ) D n + ( ξ n , x ) ] E [ K d E ( x , X 1 ) D n ( ξ n , x ) ] ξ n 1 C h n , 2 γ .
This is equivalent to
sup k n , 1 k k n , 2 sup x S E Q n 3 = O ϕ 1 k 2 , n ρ n n .
Combining the results from (A10), (A18), and (A20), and noting that ξ n 1 , we conclude:
sup k n , 1 k k n , 2 sup x S E i = 1 n K d E ( x , X i ) D n ( ξ n , x ) i = 1 n K d E ( x , X i ) D n + ( ξ n , x ) ξ n = O a . c o ( U n ) .
Thus, condition ( L 3 ) is satisfied. It is clear that condition ( L 0 ) is satisfied by (H3(i)), and condition (L1) is trivially satisfied due to the construction of D n ( ξ n , x ) and D n + ( ξ n , x ) , as well as the application of (A1). This completes the proof of Theorem 1. □

Appendix A.2. Proof Theorem 2

To prove Theorem 2, we use the Cramér-Wold device, which means that it is enough to establish the convergence of the one-dimensional distribution. Moreover, due to the linearity of Λ ( C ; k K x ) , it suffices to show that
$$\Lambda\big(\Phi; H_{n,k}(x) \mid x\big) \longrightarrow \mathcal{N}\big(0, \sigma^2(\Phi, x)\big),$$
for all Φ of the form
$$\Phi = \sum_{p=1}^{L} c_p\, \mathbf{1}_{C_p}, \qquad c_1, \ldots, c_L \in \mathbb{R}, \quad C_1, \ldots, C_L \in \mathcal{C}.$$
Thus, our task reduces to proving convergence in one dimension. Recall that we are working with
$$\Lambda\big(C; H_{n,k}(x) \mid x\big) = \sqrt{n\,\phi(h_n)}\,\Big(\Gamma_n\big(C, x; H_{n,k}(x)\big) - \Gamma(C, x)\Big) = \sqrt{n\,\phi(h_n)}\left( \frac{\displaystyle\sum_{i=1}^{n} \mathbf{1}_{\{Y_i \in C\}}\, K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)}{\displaystyle\sum_{i=1}^{n} K\!\left(\frac{d_{\mathcal{E}}(x, X_i)}{H_{n,k}(x)}\right)} - \Gamma(C, x) \right),$$
To achieve the desired result, we decompose as follows:
Γ n ( C , x ; H n , k ( x ) ) Γ ( C , x ) = Γ n , 2 ( C , x ; H n , k ( x ) ) Γ n , 1 ( 1 , x ; H n , k ( x ) ) Γ ( C , x ) = 1 Γ n , 1 ( 1 , x ; H n , k ( x ) ) Γ n , 2 ( C , x ; H n , k ( x ) ) Γ n , 2 ( C M , x ; H n , k ( x ) ) Γ ( C , x ) E ( Γ n , 2 ( C , x ; H n , k ( x ) ) ) + E ( Γ n , 2 ( C M , x ; H n , k ( x ) ) ) E ( Γ n , 2 ( C , x ; H n , k ( x ) ) ) + Γ n , 2 ( C M , x ; H n , k ( x ) ) E ( Γ n , 2 ( C M , x ; H n , k ( x ) ) ) Γ ( C , x ) · ( Γ n , 1 ( 1 , x ; H n , k ( x ) ) 1 ) = 1 Γ n , 1 ( 1 , x ; H n , k ( x ) ) I 1 ( x ; H n , k ) + I 2 ( x ; H n , k ) ,
where
$$I_1(x; H_{n,k}) = \Big[\Gamma_{n,2}\big(C, x; H_{n,k}(x)\big) - \Gamma_{n,2}\big(C_M, x; H_{n,k}(x)\big)\Big] - \mathbb{E}\Big[\Gamma_{n,2}\big(C, x; H_{n,k}(x)\big) - \Gamma_{n,2}\big(C_M, x; H_{n,k}(x)\big)\Big] + \mathbb{E}\big[\Gamma_{n,2}\big(C_M, x; H_{n,k}(x)\big)\big] - \Gamma(C, x),$$
and
$$I_2(x; H_{n,k}) = \Gamma_{n,2}\big(C_M, x; H_{n,k}(x)\big) - \mathbb{E}\big[\Gamma_{n,2}\big(C_M, x; H_{n,k}(x)\big)\big] - \Gamma(C, x)\,\Big(\Gamma_{n,1}\big(1, x; H_{n,k}(x)\big) - 1\Big).$$
Lemma A3.
Under the assumptions (H2)(i), (H2)(ii), (H3)(ii), (H4)(i), and (H8), we have
$$\sqrt{n\,\phi(h_k)}\; I_1\big(x, H_{n,k}(x)\big) \overset{\mathbb{P}}{\longrightarrow} 0 \quad \text{as } n \to \infty.$$

Appendix A.3. Proof of Lemma A3

Let us consider the term I 1 ( x , H n , k ( x ) ) , where 1 D n H n , k ( x ) D n + holds as k n 0 . By applying condition (H3) (ii), we obtain the inequality:
$$\Gamma_{n,2}\big(C, x; D_n^{+}\big) \le \Gamma_{n,2}\big(C, x; H_{n,k}(x)\big) \le \Gamma_{n,2}\big(C, x; D_n^{-}\big).$$
Next, we have:
n ϕ ( h k ) S 1 ( x , H n , k ( x ) ) E ( S 1 ( x , H n , k ( x ) ) )
n ϕ ( h k ) Γ n , 2 ( C , x , H n , k ( x ) ) E ( Γ n , 2 ( C , x , H n , k ( x ) ) )
+ n ϕ ( h k ) E ( Γ n , 2 ( C M , x , H n , k ( x ) ) ) Γ n , 2 ( C M , x , H n , k ( x ) ) = I 11 ( x , H n , k ( x ) ) + I 12 ( x , H n , k ( x ) ) .
From Equation (A23), we have:
I 11 ( x , H n , k ( x ) ) = n ϕ ( h k ) Γ n , 2 ( C , x , H n , k ( x ) ) E ( Γ n , 2 ( C , x , H n , k ( x ) ) ) n ϕ ( h k ) Γ n , 2 ( C , x , D n ) E ( Γ n , 2 ( C , x , D n + ) ) n ϕ ( h k ) Γ n , 2 ( C , x , D n ) E ( Γ n , 2 ( C , x , D n ) )
+ n ϕ ( h k ) E Γ n , 2 ( C , x , D n ) + E Γ n , 2 ( C , x , D n + ) .
Similarly, from Equation (A24), we have:
I 12 ( x , H n , k ( x ) ) = n ϕ ( h k ) Γ n , 2 ( C M , x , H n , k ( x ) ) E ( Γ n , 2 ( C M , x , H n , k ( x ) ) ) n ϕ ( h k ) Γ n , 2 ( C M , x , D n ) E ( Γ n , 2 ( C M , x , D n ) ) + n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n ) ) + E ( Γ n , 2 ( C M , x , D n + ) ) .
We can apply the following decomposition to handle Equation (A25):
n ϕ ( h k ) Γ n , 2 ( C , x , D n ) E ( Γ n , 2 ( C , x , D n ) )
n ϕ ( h k ) Γ n , 2 ( C , x , D n ) Γ n , 2 ( C M , x , D n )
+ n ϕ ( h k ) Γ n , 2 ( C M , x , D n ) E ( Γ n , 2 ( C M , x , D n ) )
+ n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n ) ) E ( Γ n , 2 ( C , x , D n ) ) .
For some ξ n ( 0 , 1 ) , recall that D n and D n + are defined in Equations (A2) and (A3), respectively. Thus, observing:
E ( Γ n , 2 ( C M , x , D n ) ) E ( Γ n , 2 ( C , x , D n ) ) = 1 n E ( 1 ( x , D n ) ) E i = 1 n 1 { Y i C } 1 { | F ( Y i ) | > δ n } i ( x , D n ) E 1 { Y C } 1 { | F ( Y ) | > δ n } 1 ( x , D n ) E ( 1 ( x , D n ) ) 1 .
Under condition (H8), using Hölder's inequality with $\beta_{1}=p/2$ and $\beta_{2}$ such that $1/\beta_{1}+1/\beta_{2}=1$, we can write:
n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n ) ) E ( Γ n , 2 ( C , x , D n ) ) n ϕ ( h k ) ϕ ( D n ) E 1 / β 1 | F ( Y ) | p 1 { | F ( Y ) | > δ n } E 1 / β 2 1 ( x , D n ) β 2 n ϕ ( h k ) ϕ ( D n ) δ n 1 E 1 / β 1 | F ( Y ) p | X ϕ 1 / β 2 ( D n ) C n ϕ ( h k ) δ n ϕ 2 / p ( D n ) C ζ 1 / p n δ n k n 1 / 2 2 / p 0 as n .
To establish convergence in probability for Equation (A28), we use assumption (H8) together with Markov's inequality. Thus, for all $\varepsilon>0$, we obtain:
P n ϕ ( h k ) Γ n , 2 ( C M , x , D n ) Γ n , 2 ( C , x , D n ) > n ϕ ( h k ) ε C ϕ ( 1 β 2 ) / β 2 ( D n ) ε δ n n ϕ ( h k ) C ε δ n ϕ 2 / p ( D n ) n ϕ ( h k ) 0 as n .
For the second term (A27), by (Lemma 8.2 in [38]) we have
\[ \Gamma_{n,2}\big(C_{M}, x, D_{n}^{-}\big) - \mathbb{E}\big(\Gamma_{n,2}(C_{M}, x, D_{n}^{-})\big) \longrightarrow 0 \quad\text{as } n\to\infty. \]
Using the fact that $\sqrt{k}\,\big(\phi^{-1}(k/n)\big)^{2\gamma}\to 0$ and condition (H2)(i), we have, for $u=u_{n}=\phi^{-1}(k/n)$:
n ϕ ( h k ) E ( Γ n , 2 ( C , x , D n ) ) E ( Γ n , 2 ( C , x , D n + ) ) n ϕ ( h k ) E ( Γ n , 2 ( C , x , D n ) ) Γ ( C , x ) + n ϕ ( h k ) E ( Γ n , 2 ( C , x , D n + ) ) Γ ( C , x ) n ϕ ( h k ) E 1 E 1 ( x , D n ) 1 ( x , D n ) 1 { Y i C } Γ ( C , x ) + n ϕ ( h k ) E 1 E 1 ( x , D n + ) 1 ( x , D n + ) 1 { Y i C } Γ ( C , x ) n ϕ ( h k ) u = D n , D n + 1 E 1 ( x , u ) E 1 ( x , u ) 1 { B ( x , u ) } Γ ( C , X 1 ) Γ ( C , x ) C 1 { B ( x , D n ) } + C 1 { B ( x , D n ) } d E ( x , X 1 ) γ 2 C ζ γ k ϕ 1 k n 2 γ 0 as n .
On the other hand, recall that $\mathbf{1}_{\{Y\in C_{M}\}}:=\mathbf{1}_{\{Y\in C\}}\,\mathbf{1}_{\{F(Y)<M\}}$, and, under the given conditions, using the fact that the regression function satisfies the Lipschitz-type condition, we obtain:
n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n ) ) E ( Γ n , 2 ( C M , x , D n + ) ) n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n ) ) Γ ( C M , x ) + n ϕ ( h k ) E ( Γ n , 2 ( C M , x , D n + ) ) Γ ( C M , x ) n ϕ ( h k ) E 1 E 1 ( x , D n ) 1 ( x , D n ) 1 { Y C M } Γ ( C M , x ) + n ϕ ( h k ) E 1 E 1 ( x , D n + ) 1 ( x , D n + ) 1 { Y C M } Γ ( C M , x ) n ϕ ( h k ) u = D n , D n + 1 E 1 ( x , u ) E 1 ( x , u ) 1 { B ( x , u ) } Γ ( C M , X 1 ) Γ ( C M , x ) C 1 { B ( x , D n ) } + C 1 { B ( x , D n ) } d E ( x , X 1 ) γ 2 C ζ γ k ϕ 1 k n 2 γ 0 as n .
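For completeness, the continuity step underlying the last two displays is the following elementary bound, which is our restatement of the Hölder-type regularity imposed on $\Gamma(C,\cdot)$ (the constant $C$ and the exponent $\gamma$ are those of the corresponding assumption):
\[ \big|\Gamma(C, X_{1}) - \Gamma(C, x)\big|\,\mathbf{1}_{B(x,u)}(X_{1}) \;\le\; C\, d_{E}(x, X_{1})^{\gamma}\,\mathbf{1}_{B(x,u)}(X_{1}) \;\le\; C\, u^{\gamma}, \qquad u\in\{D_{n}^{-}, D_{n}^{+}\}, \]
with the same bound for the truncated class $C_{M}$; it is this inequality that produces the polynomial factor in $\phi^{-1}(k/n)$ appearing in the rates above.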
For the second term on the right-hand side of Equation (A21), we have:
\[ \big|S_{2}\big(x, H_{n,k}(x)\big)\big| = \Big|\mathbb{E}\big(\Gamma_{n,2}(C, x, H_{n,k}(x))\big) - \Gamma(C,x)\Big| \le \Big|\mathbb{E}\big(\Gamma_{n,2}(C, x, D_{n}^{-})\big) - \Gamma(C,x)\Big|. \]
Using Lemma A1, we readily deduce:
\[ \sqrt{n\,\phi(h_{k})}\; S_{2}\big(x, H_{n,k}(x)\big) = o\!\left(\zeta^{\gamma}\,\sqrt{k}\,\big(\phi^{-1}(k/n)\big)^{2\gamma}\right). \]
This completes the treatment of $S_{2}(x, H_{n,k}(x))$. Therefore, combining the results from Equations (A21), (A31), (A32), (A33) and (A36), we conclude that $I_{1}(x, H_{n,k}(x))$ converges to 0.
Lemma A4.
Under the assumptions (H1)(ii), (H2)(i), (H2)(ii), (C2), and (H5), and if
\[ \sqrt{k}\,\big(\phi^{-1}(k/n)\big)^{2\gamma} \longrightarrow 0 \quad\text{as } n\to\infty, \]
we have
\[ \sqrt{n\,\phi(h_{k})}\; I_{2}\big(C, x, H_{n,k}(x)\big) \rightsquigarrow \mathcal{N}\big(0, \sigma^{2}(x)\big) \quad\text{as } n\to\infty. \]

Appendix A.4. Proof of Lemma A4

Recall that
\[ I_{2}(x,u) = \Big(\Gamma_{n,2}(C_{M},x,u) - \mathbb{E}\big(\Gamma_{n,2}(C_{M},x,u)\big)\Big) - \Gamma(C,x)\big(\Gamma_{n,1}(1,x,u)-1\big) = \Big(\Gamma_{n,2}(C_{M},x,u) - \Gamma(C,x)\,\Gamma_{n,1}(1,x,u)\Big) - \mathbb{E}\Big(\Gamma_{n,2}(C_{M},x,u) - \Gamma(C,x)\,\Gamma_{n,1}(1,x,u)\Big), \]
where the second equality uses $\mathbb{E}\big(\Gamma_{n,1}(1,x,u)\big)=1$.
Thus, the decomposition below is an essential step in proving this lemma:
n ϕ ( h K ) I 2 ( C , x , H n , k ( x ) ) = n ϕ ( h K ) Γ n , 2 ( C M , x , H n , k ( x ) ) Γ ( C , t ) Γ n , 1 ( 1 , x , H n , k ( x ) ) E Γ n , 2 ( C M , x , H n , k ( x ) ) Γ ( C , t ) Γ n , 1 ( 1 , x , H n , k ( x ) ) = n ϕ ( h K ) Γ n , 2 ( C M , x , H n , k ( x ) ) Γ n , 2 ( C M , x , D n + ) + n ϕ ( h K ) Γ n , 2 ( C M , x , D n + ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) E Γ n , 2 ( C M , x , D n + ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) + n ϕ ( h K ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) E Γ n , 1 ( 1 , x , D n + ) + n ϕ ( h K ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) E Γ n , 1 ( 1 , x , D n + ) + n ϕ ( h K ) Γ ( C , x ) E Γ n , 1 ( 1 , x , H n , k ( x ) ) Γ n , 1 ( 1 , x , H n , k ( x ) ) + n ϕ ( h K ) E Γ n , 2 ( C M , x , D n + ) E Γ n , 2 ( C M , x , H n , k ( x ) ) = J 1 ( t ) + J 2 ( t ) + J 3 ( t ) + J 4 ( t ) + J 5 ( t ) .
We apply the same arguments used in the proof of I 1 ( C , x , H n , k ( x ) ) , which gives us:
| J 1 ( t ) | = n ϕ ( h K ) | Γ n , 2 ( C M , x , H n , k ( x ) ) Γ n , 2 ( C M , x , D n + ) | n ϕ ( h K ) | Γ n , 2 ( C M , x , D n ) Γ n , 2 ( C M , x , D n + ) | n ϕ ( h K ) | Γ n , 2 ( C M , x , D n ) E Γ n , 2 ( C M , x , D n ) | + n ϕ ( h K ) | E Γ n , 2 ( C M , x , D n + ) Γ n , 2 ( C M , x , D n + ) |
+ n ϕ ( h K ) | E Γ n , 2 ( C M , x , D n ) E Γ n , 2 ( C M , x , D n + ) | ,
and
| J 5 ( t ) | = n ϕ ( h K ) E Γ n , 2 ( C M , x , D n + ) E Γ n , 2 ( C M , x , H n , k ( x ) ) n ϕ ( h K ) | E Γ n , 2 ( C M , x , D n + ) E Γ n , 2 ( C M , x , D n + ) |
+ n ϕ ( h K ) | E Γ n , 2 ( C M , x , D n + ) E Γ n , 2 ( C M , x , H n , k ( x ) ) |
2 n ϕ ( h K ) | E Γ n , 2 ( C M , x , D n + ) E Γ n , 2 ( C M , x , D n + ) | .
By combining Equations (A38) and (A39) and using similar arguments as in the analysis of I 11 ( C M , x , H n , k ( x ) ) and I 12 ( C M , x , H n , k ( x ) ) , we obtain:
| J 1 ( t ) + J 5 ( t ) | n ϕ ( h K ) Γ n , 2 ( C M , x , D n ) E Γ n , 2 ( C M , x , D n ) + 3 n ϕ ( h K ) E Γ n , 2 ( C M , x , D n + ) Γ n , 2 ( C M , x , D n + ) + n ϕ ( h K ) E Γ n , 2 ( C M , x , D n ) E Γ n , 2 ( C M , x , D n + ) .
Thus, as $n\to\infty$, we find that $|J_{1}(t)+J_{5}(t)|\to 0$. Now, consider the term $J_{2}(t)$ on the right-hand side of (A37). We introduce the following sum:
\[ J_{2}(t) = \sum_{i=1}^{n} Z_{ni}, \]
where
Z n i = n ϕ ( h K ) n E ( 1 ( x , D n + ) ) 1 { Y i C M } Γ ( C M , x ) i ( x , D n + ) E 1 { Y i C M } Γ ( C M , x ) i ( x , D n + ) ,
and
J 2 ( t ) = n ϕ ( h K ) Γ n , 2 ( C M , x , D n + ) Γ ( C , x ) Γ n , 2 ( 1 , x , D n + ) E Γ n , 2 ( C M , x , D n + ) Γ ( C , x ) Γ n , 2 ( 1 , x , D n + ) .
Thus, the result we claim is:
\[ J_{2}(t) \rightsquigarrow \mathcal{N}\big(0, \sigma^{2}(x)\big). \]
The asymptotic normality of $J_{2}(t)$ was shown in (Lemma 8.8 in [38]) by choosing the bandwidth parameter as $u = D_{n}^{+}$. For $J_{3}(t)$ and $J_{4}(t)$, we obtain, by (Lemma 8.2 in [38]) and the fact that $\mathbb{E}\big(\Gamma_{n,1}(1,x,u)\big)=1$ with $u = D_{n}^{-}$ or $D_{n}^{+}$:
| J 3 ( t ) + J 4 ( t ) | = n ϕ ( h K ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) E ( Γ n , 1 ( 1 , x , D n + ) ) + n ϕ ( h K ) Γ ( C , x ) E ( Γ n , 1 ( 1 , x , H n , k ( x ) ) ) Γ n , 1 ( 1 , x , H n , k ( x ) ) n ϕ ( h K ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) 1 + n ϕ ( h K ) 1 Γ n , 1 ( 1 , x , D n + ) 2 n ϕ ( h K ) Γ ( C , x ) Γ n , 1 ( 1 , x , D n + ) 1 .
We find that
\[ \sqrt{n\,\phi(h_{K})}\,\Big(\Gamma_{n,1}\big(1, x, D_{n}^{+}\big) - 1\Big) \longrightarrow 0 \quad\text{as } n\to\infty. \]
Finally, we conclude that
\[ \big|J_{3}(t) + J_{4}(t)\big| \longrightarrow 0. \]
Hence, the proof is complete.
Lemma A5.
Under the assumptions (H1)(ii), (H2)(i), (H2)(ii), (C2), and (H5), and provided that
\[ \sqrt{k}\,\big(\phi^{-1}(k/n)\big)^{2\gamma} \longrightarrow 0 \quad\text{as } n\to\infty, \]
it follows that for each x C :
\[ \sqrt{n\,\phi(h_{k})}\,\Big(\Gamma_{n,1}\big(1, x; H_{n,k}(x)\big) - 1\Big) \xrightarrow{\;\mathbb{P}\;} 0 \quad\text{as } n\to\infty. \]

Appendix A.5. Proof of Lemma A5

To prove this lemma, it suffices to apply the results from [32] in inequality (A22), together with Chebyshev's inequality. For any $\varepsilon > 0$, we derive the following:
\[ \mathbb{P}\Big(\big|\Gamma_{n,1}(1, x, H_{n,k}(x)) - 1\big| > \varepsilon\Big) \le \mathbb{P}\Big(\big|\Gamma_{n,1}(1, x, D_{n}^{-}) - \mathbb{E}\big(\Gamma_{n,1}(1, x, D_{n}^{-})\big)\big| > \varepsilon\Big) \le \frac{\operatorname{Var}\big(\Gamma_{n,1}(1, x, D_{n}^{-})\big)}{\varepsilon^{2}}. \]
Using the fact that
\[ \operatorname{Var}\big(\Gamma_{n,1}(1, x, D_{n}^{-})\big) = o\!\left(\frac{1}{n\,\phi(x, D_{n}^{-})}\right), \]
we ultimately obtain the result
\[ \Gamma_{n,1}\big(1, x; H_{n,k}(x)\big) - 1 \xrightarrow{\;\mathbb{P}\;} 0 \quad\text{as } n\to\infty. \]
Thus, the proof is complete.
Lemma A6.
Given the conditions (H1)(ii), (H3), and (H5)(i), along with the assumption that $n\,\phi(h_{K})\to\infty$, we have, for each fixed $x\in E$,
\[ \Gamma_{n,1}\big(C, x; H_{K}(x)\big) \xrightarrow{\;\mathrm{a.s.}\;} \mathbb{E}\Big(\Gamma_{n,1}\big(C, x; H_{K}(x)\big)\Big) \quad\text{as } n\to\infty. \]

Appendix A.6. Proof of Lemma A6

To prove Lemma A6, we first express the following:
Γ n , 2 ( C , x , h K ( x ) ) E Γ n , 2 ( C , x , h K ( t ) ) = 1 n E 1 ( x , h K ( x ) ) i = 1 n 1 { Y i C } i ( x , h K ( x ) ) E 1 { Y i C } i ( x , h K ( x ) ) = 1 n E 1 ( x , h K ( x ) ) i = 1 n Z i ( x , h K ( x ) ) ,
where
Z i ( x , h K ( x ) ) = 1 { Y i C } i ( x , h K ( x ) ) E 1 { Y i C } i ( x , h K ( x ) ) .
Taking into account assumption H4, and using Hölder’s inequality with 2 < q < p such that 1 p + 1 q = 1 , we obtain for i j :
E 1 { Y i C } i ( x ; h K ( x ) ) 1 { Y j C } j ( x ; h K ( x ) ) = E i ( x ; h K ( x ) ) j ( x ; h K ( x ) ) E 1 { Y i C } 1 { Y j C } E i ( x ; h K ( x ) ) j ( x ; h K ( x ) ) E 1 / p [ F p ( Y ) | X ] E 1 / q [ F q ( Y ) | X ] C E i ( x ; h K ( x ) ) j ( x ; h K ( x ) ) C sup i j P ( X i , X j ) B ( x , h K ( x ) ) × B ( x , h K ( x ) ) C Ψ ( h K ( x ) ) f 2 ( x ) ,
where the last inequality follows from condition H1(iii). Next, we examine the variance of the sum:
Var 1 n E 1 ( x , h K ( x ) ) i = 1 n Z i ( x , h K ( x ) ) = 1 n 2 E 2 1 ( x , h K ( x ) ) i = 1 n j = 1 n Cov 1 { Y i C } i ( x , h K ( x ) ) ; 1 { Y j C } j ( x , h K ( x ) ) = 1 n 2 E 2 1 ( x , h K ( x ) ) Var 1 { Y 1 C } 1 ( x , h K ( x ) ) + 1 n 2 E 2 1 ( x , h K ( x ) ) i j j = 1 n Cov 1 { Y i C } i ( x , h K ( x ) ) ; 1 { Y j C } j ( x , h K ( x ) ) = : V 1 + V 2 .
We begin with the term $V_{1}$. Under assumptions (H1)(ii) and (H2)(i), we have the following bounds:
c 5 η 1 ϕ ( h K ( x ) ) f 1 ( x ) E 1 j ( x , h K ( x ) ) c 6 η 2 ϕ ( h K ( x ) ) f 1 ( x ) ,
and for the variance:
Var 1 { Y 1 C } 1 ( x ; h K ( x ) ) = E 1 { Y 1 C } 2 1 2 ( x , h K ( x ) ) E 2 1 { Y 1 C } 1 ( x , h K ( x ) ) = E 1 { Y 1 C } 2 1 2 ( x , h K ( x ) ) = E E F 2 ( Y ) | X 1 2 ( x , h K ( x ) ) = E 2 / p F p ( Y ) | X E 2 / q 1 q ( x , h K ( x ) ) = C 2 / p C 2 2 / q η 2 2 ϕ 2 ( h K ( x ) ) f 1 2 ( x ) .
Hence, combining this with (A42) gives:
\[ \frac{\mathrm{const}_{1}}{n\, f_{1}(x)\,\phi\big(h_{K}(x)\big)} \;\le\; V_{1} \;\le\; \frac{\mathrm{const}_{2}}{n\, f_{1}(x)\,\phi\big(h_{K}(x)\big)}, \]
for f 1 ( x ) > 0 and constants const 1 < const 2 .
Next, consider V 2 :
V 2 = 1 n 2 E 2 1 ( x , h K ( x ) ) 0 < | i j | ω n j = 1 n Cov 1 { Y i C } i ( x , h K ( x ) ) ; 1 { Y j C } j ( x , h K ( x ) ) + | i j | ω n j = 1 n Cov 1 { Y i C } i ( x , h K ( x ) ) ; 1 { Y j C } j ( x , h K ( x ) )
= : V 2 , 1 + V 2 , 2 ,
where ω n satisfies ω n = o ( n ) . Using conditions H1(iii) and H3(i), we deduce that:
Cov 1 { Y i C } i ( x ; h K ( x ) ) , 1 { Y j C } j ( x ; h K ( x ) ) C Ψ ( h K ( x ) ) f 2 ( x ) C 2 / p C 2 2 / q η 2 2 ϕ 2 ( h K ( x ) ) f 1 2 ( x ) .
This yields the bound:
V 2 , 1 Ψ ( h K ( x ) ) f 2 ( x ) C 2 / p C 2 2 / q η 2 2 ϕ 2 ( h K ( x ) ) f 1 2 ( x ) n 2 E 2 1 ( x , h K ( x ) ) n ω n = η 2 2 f 2 ( x ) Ψ ( h K ) ω n n E 2 1 ( x , h K ( x ) ) C 2 / p C 2 2 / q η 2 2 ϕ 2 ( h K ( x ) ) f 1 2 ( x ) ω n n .
Using Equation (A42), we obtain:
V 2 , 1 const f 2 ( x ) Ψ ( h K ) ω n n f 1 2 ( x ) ϕ 2 ( h K ) const ω n n ϕ 2 ( h K ( x ) ) f 1 2 ( x ) .
This, in combination with (A43), implies:
V 2 , 1 V 1 const f 2 ( x ) f 1 ( x ) Ψ ( h K ) ω n ϕ ( h K ) ω n ϕ ( h K ) f 1 ( x ) C 2 / p C 2 2 / q η 2 2 ϕ 2 ( h K ( x ) ) f 1 2 ( x ) const f 2 ( x ) f 1 ( x ) Ψ ( h K ) ω n ϕ ( h K ) const ω n ϕ 3 ( h K ) f 1 2 ( x ) .
By choosing $\omega_{n}$ such that the above bound tends to 0 as $n\to\infty$, we conclude that $V_{2,1}$ is negligible compared with $V_{1}$. Now consider $V_{2,2}$. By applying Davydov's lemma for strong mixing sequences and using hypothesis (H5)(i), we have:
Cov 1 { Y i C } i ( x ; h K ( x ) ) , 1 { Y j C } j ( x ; h K ( x ) ) 8 E 1 { Y i C } i ( x ; h K ( x ) ) p 2 / p α ( | i j | ) 1 2 / p 8 E E [ | F ( Y i ) | p X ] | i ( x ; h K ( x ) ) | p 2 / p α ( | i j | ) 1 2 / p 8 C E | i ( x ; h K ( x ) ) | p 2 / p α ( | i j | ) 1 2 / p .
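The covariance bound just used is the classical Davydov inequality for strongly mixing sequences; in a standard form with absolute constant 8 (both factors controlled in $L^{p}$, as used above) it reads:
\[ \big|\operatorname{Cov}(U, V)\big| \;\le\; 8\,\Big[\alpha\big(\sigma(U), \sigma(V)\big)\Big]^{1-\frac{1}{p}-\frac{1}{q}}\, \|U\|_{p}\, \|V\|_{q}, \qquad \frac{1}{p}+\frac{1}{q}<1, \]
applied above with $q = p$ and with the mixing coefficient $\alpha(|i-j|)$ between the $\sigma$-fields generated by the $i$-th and $j$-th observations.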
Thus, we obtain:
V 2 , 2 const f 1 2 / p ( x ) ϕ ( h K ) 2 / p n 2 E 2 1 ( x , h K ( x ) ) | i j | ω n α ( | i j | ) 1 2 / p .
By using (A42) and reducing the double sum, we get:
V 2 , 2 const n ω n δ f 1 2 ( 1 1 / p ) ( x ) ϕ ( h K ) 2 ( 1 1 / p ) k = ω n + 1 k δ α ( k ) 1 2 / p .
Combining the above with the bound on V 1 , we get:
V 2 , 2 V 1 const ω n δ ( log ( ω n ) ) δ ( 1 1 / p ) f 1 1 2 / p ( x ) ϕ ( h K ) ( 1 2 / p ) k = ω n + 1 k δ log ( k ) 1 2 / p ( α ( k ) ) 1 2 / p .
By choosing ω n = ϕ ( h K ) ( 1 2 / p ) / δ and using assumption H5(i), we get:
V 2 , 2 V 1 const 1 δ 1 2 p log ( ϕ ( h K ) ) δ ( 1 1 / p ) f 1 1 2 / p ( x ) k = ω n + 1 k δ log ( k ) 1 2 / p ( α ( k ) ) 1 2 / p
0 as n .
Thus, by the choice of ω n and the boundedness of Ψ ( h K ) ϕ 2 ( h K ) , we conclude that:
V 2 , 2 V 1 const f 2 ( x ) Ψ ( h K ) f 1 ( x ) ϕ ( h K ) 2 ω n ϕ ( h K ) + ϕ ( h K ) 1 ( 1 2 / p ) / δ .
Since both terms tend to 0 as $n\to\infty$, the proof is complete.

Appendix A.7. Proof of Theorem 3

Here, we apply the same technique as in [20] and the earlier work [86], based on the blocking approach. This method splits the strictly stationary sequence $(X_{1},\dots,X_{n})$ into $2\upsilon_{n}$ blocks of equal length $a_{n}\simeq n/(2\upsilon_{n})$, keeping the notation introduced above. The goal is to establish the asymptotic equi-continuity of the conditional empirical process:
\[ \Lambda_{n}(C, x) := \sqrt{n\,\phi(h_{n})}\,\Big(\Gamma_{n}\big(C, x; H_{n,k}(x)\big) - \Gamma(C, x)\Big), \qquad C\in\mathscr{C}. \]
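Concretely, the Bernstein-type blocking used throughout this proof can be recalled as follows (a standard construction, stated here only to fix ideas; the remainder block of length $n-2\upsilon_{n}a_{n}$ is negligible and omitted):
\[ H_{j} = \big\{(j-1)\,a_{n}+1,\ \dots,\ j\,a_{n}\big\}, \qquad j=1,\dots,2\upsilon_{n}, \]
so that odd-indexed and even-indexed blocks alternate; sums over the original dependent blocks are then compared with the corresponding sums over independent blocks of the same length, at the price of the mixing term $(\upsilon_{n}-1)\,\alpha(a_{n})$ appearing below.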
Let us define, for any C K C K and x E :
\[ W_{n}\big(x, C; H_{n,k}(x)\big) = \sum_{i=1}^{n} \mathbf{1}_{\{Y_{i}\in C\}}\, K\!\left(\frac{d_{E}(x, X_{i})}{H_{n,k}(x)}\right) - n\,\mathbb{E}\left(\mathbf{1}_{\{Y_{1}\in C\}}\, K\!\left(\frac{d_{E}(x, X_{1})}{H_{n,k}(x)}\right)\right), \]
and
Λ ( C x ) = n ϕ ( h n ) Γ n ( C , x ; H n , k ( x ) ) Γ ( C , x ) = n ϕ ( h n ) Γ n ( C , x ; H n , k ( x ) ) Γ ( C , x ) .
Thus, we have:
Λ ( C x ) = n ϕ ( h n ) Γ n ( C , x ; H n , k ( x ) ) Γ ( C , x ) = n ϕ ( h n ) i = 1 n 1 { Y i C } K d E ( x , X i ) H n , k ( x ) i = 1 n K d E ( x , X i ) H n , k ( x ) Γ ( C , x ) = 1 Γ ˘ n , 1 ( C , x ; H n , k ( x ) ) 1 n ϕ ( h n ) W n ( x , C ; H n , k ( x ) ) E ( Γ ˘ n , 2 ( C , x ; H n , k ( x ) ) ) Γ ˘ n , 1 ( C , x ) E ( Γ ˘ n , 1 ( C , x ; H n , k ( x ) ) ) 1 n ϕ ( h n ) W n ( x , 1 ; H n , k ( x ) ) n ϕ ( h n ) ( x ; H n , k ( x ) ) ,
where for h K > 0 , we define:
Γ ˘ n , 2 ( C , x ; h k ) : = 1 n ϕ ( h K ( x ) ) i = 1 n 1 { Y i C } i ( x , h K ) ,
Γ ˘ n , 1 ( 1 , x ; h k ) : = 1 n ϕ ( h K ( x ) ) i = 1 n i ( x , h K ) .
We now study the asymptotic equi-continuity of each of the terms above. For a class of functions G , let α n ( · ) be an empirical process based on the sample ( X 1 , Y 1 ) , , ( X n , Y n ) , indexed by G :
\[ \alpha_{n}(g) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n}\Big( g(X_{i}, Y_{i}) - \mathbb{E}\big(g(X_{i}, Y_{i})\big)\Big), \qquad \big\|\alpha_{n}(g)\big\|_{\mathcal{G}} = \sup_{g\in\mathcal{G}}\big|\alpha_{n}(g)\big|, \]
and, for the indicator function $\mathbf{1}_{\{\cdot\,\in C\}}$ and $x\in E$, define:
\[ \eta_{n,x,C,K}(u, v, h_{K}) = \mathbf{1}_{\{v\in C\}}\, K\!\left(\frac{d_{E}(u, x)}{h_{K}}\right), \qquad \text{for } u, v\in E. \]
Thus, we have:
\[ \frac{1}{\sqrt{n\,\phi\big(h_{K}(x)\big)}}\; W_{n}\big(x, C; H_{n,k}(x)\big) = \frac{1}{\sqrt{\phi\big(h_{K}(x)\big)}}\;\alpha_{n}\big(\eta_{n,x,C,K}(u, v, h_{K})\big). \]
Recall that $\mathbf{1}_{\{D_{n}^{-}\le H_{n,k}(x)\le D_{n}^{+}\}}\to 1$ almost completely as $k/n\to 0$. We aim to prove the asymptotic equi-continuity of:
\[ \Big\{ \sqrt{\tfrac{n}{k}}\;\alpha_{n}\big(\eta_{n,x,C,K}\big) :\ \eta_{n,x,C,K}\in\mathcal{E}_{K} \Big\}, \]
which means, for every ε > 0 , we need to show:
\[ \lim_{b\to 0}\ \limsup_{n\to\infty}\ \mathbb{P}\left( \Big\| \sqrt{\tfrac{n}{k}}\;\alpha_{n}\big(\eta_{n,x,C,K}\big) \Big\|_{\mathcal{E}_{K}(b,\cdot)} > \varepsilon \right) = 0, \]
where
\[ \mathcal{E}_{K}(b,\cdot) = \Big\{ \eta^{1}_{n,x,C,K} - \eta^{2}_{n,x,C,K} :\ \big\|\eta^{1}_{n,x,C,K} - \eta^{2}_{n,x,C,K}\big\|_{p} < b,\ \ \eta^{1}_{n,x,C,K},\, \eta^{2}_{n,x,C,K}\in\mathcal{E}_{K} \Big\}. \]
The idea is to work with the independent block sequence $\{\xi_{j}=(\zeta_{j},\varsigma_{j})\}_{j\ge 1}$ rather than with the original dependent one; by ((A1) in the proof of Lemma 8.2 in [38]), we have:
P k 1 / 2 j = 1 n 1 { Y i C } K d E ( x , X i ) H n , k ( x ) P ( η n , x , C , K 1 ( H n , k ( x ) ) ) E K ( b , · ) > δ 2 P k 1 / 2 j = 1 υ n i H j 1 { Y i C } K d E ( ς i , x ) H n , k ( x , ς ) P ( η n , x , C , K 1 ( H n , k ( x , ς ) ) ) E K ( b , · ) > δ
+ 2 ( υ n 1 ) α a n ,
where H n , k ( x , ς ) is defined as:
\[ H_{n,k}(x,\varsigma) = \min\Big\{ h\in\mathbb{R}^{+} :\ \sum_{i=1}^{n} \mathbf{1}_{B(x,h)}(\varsigma_{i}) = k \Big\}. \]
We choose:
a n = ( log n ) 1 n p 2 ϕ ( h K ) 1 2 ( p 2 ) and υ n = n 2 a n .
Note that a n in our setting is equivalent to:
( log n ) 1 n 2 k p 1 2 ( p 2 ) .
Using assumption (H5)(i), we obtain $(\upsilon_{n}-1)\,\alpha(a_{n})\to 0$ as $n\to\infty$. Therefore, it remains to analyze the remaining term on the right-hand side of (A49). Let us begin by treating the blocks as independent. We symmetrize using a sequence of i.i.d. Rademacher variables $\{\epsilon_{j}\}_{j\in\mathbb{N}^{*}}$, where $\mathbb{P}(\epsilon_{j}=1)=\mathbb{P}(\epsilon_{j}=-1)=1/2$. Importantly, for all $\delta>0$, we have:
lim b 0 lim n P k 1 / 2 j = 1 υ n ϵ j i H j 1 { Y i C } K d E ( ς i , x ) H n , k ( x , ς ) E K ( b , · ) > δ = 0 .
Finally, using the fact that 1 { D n H n , k ( x ) D n + } a . c . 1 as k n 0 , and ((A9) proof of Lemma 8.10 in [38]) it suffices to prove:
lim b 0 lim n P k 1 / 2 j = 1 υ n ϵ j i H j 1 { Y i C } K d E ( ς i , x ) D n E K ( b , · ) > δ = 0 .
Since the $p$-th conditional moment of $F(Y)$ is finite, we can truncate and obtain, for each $\lambda>0$, as $n\to\infty$:
k 1 / 2 j = 1 n E η 2 F ( ς i ) 1 { F ( ς i ) λ ( M n ) 1 / 2 ( p 1 ) } = k 1 / 2 0 P η 2 F 1 { F λ ( M n ) 1 / 2 ( p 1 ) } t d t = k 1 / 2 0 λ ( M n ) 1 / 2 ( p 1 ) P F λ ( M n ) 1 / 2 ( p 1 ) d t + k 1 / 2 0 λ ( M n ) 1 / 2 ( p 1 ) P F x d t
n 0 .
Hence, there exists a sequence $\lambda_{n}\to 0$ as $n\to\infty$ such that:
k 1 / 2 E η 2 F 1 { F ( ς i ) λ ( M n ) 1 / 2 ( p 1 ) } 0 as n .
Next, we aim to demonstrate the following:
lim b 0 lim n P k 1 / 2 j = 1 υ n ϵ j i H j 1 { Y i C } K d E ( ς i , x ) D n 1 { F ( ς i ) λ ( M n ) 1 / 2 ( p 1 ) } E K ( b , · ) > δ = 0 .
We now define:
Λ n ( 2 ) ( η n , x , C , K ) = k 1 / 2 j = 1 υ n ϵ j i H j 1 { Y i C } K d E ( ς i , x ) D n 1 { F ( ς i ) λ ( M n ) 1 / 2 ( p 1 ) } .
The chaining approach from [86] uses the choice $b_{q}=2^{-q}$, $q=0,\dots,q_{n}$, where $q_{n}$ satisfies:
\[ 2^{-1}\,\lambda_{n}\,(\log n)^{-1} \;\le\; b_{q_{n}} \;\le\; 2\,\lambda_{n}\,(\log n)^{-1}. \]
For the class $\mathcal{C}_{K_{q}}$ of measurable sets, we write $N_{q}=N\big(b_{q}, \mathcal{C}_{K_{q}}, \|\cdot\|_{p}\big)$ for the covering number, and we have the following bound on the minimal distance:
\[ \sup_{\eta_{n,x,C_{1},K_{1}}\in\mathcal{C}_{K}}\ \min_{\eta_{n,x,C_{2},K_{2}}\in\mathcal{C}_{K_{q}}} \big\| \eta_{n,x,C_{1},K_{1}} - \eta_{n,x,C_{2},K_{2}} \big\|_{p} \le b_{q}. \]
There exists a map $\pi_{q}:\mathcal{C}_{K}\to\mathcal{C}_{K_{q}}$ that sends each $\eta_{n,x,C,K}\in\mathcal{C}_{K}$ to its closest element in $\mathcal{C}_{K_{q}}$, so that:
\[ \big\| \eta_{n,x,C,K} - \pi_{q}\big(\eta_{n,x,C,K}\big) \big\|_{p} \le b_{q}. \]
By utilizing the chaining technique, we get the following inequality:
sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K η n , x , C 1 , K 1 η n , x , C 2 , K 2 p b Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K η n , x , C 1 , K 1 η n , x , C 2 , K 2 p b q n Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) + 2 q = 1 q n sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K q 1 η n , x , C 1 , K 1 η n , x , C 2 , K 2 p 3 b q Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) + sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K 0 η n , x , C 1 , K 1 η n , x , C 2 , K 2 p 2 b Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) .
Let δ q be defined as:
δ q = ( b q ) 1 / 2 3 b q ( 8 + c p , α 2 ) 1 / 2 ( log N q ) 1 / 2 .
Let Γ be chosen such that:
\[ 2\sum_{q=1}^{+\infty}\delta_{q} \le \delta. \]
From the above, we deduce that:
P sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K η n , x , C 1 , K 1 η n , x , C 2 , K 2 p b Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) 3 δ P sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K η n , x , C 1 , K 1 η n , x , C 2 , K 2 p b q n Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) δ + 2 q = 1 q n P sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K q 1 η n , x , C 1 , K 1 η n , x , C 2 , K 2 p 3 b q Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) δ q + P sup η n , x , C 1 , K 1 , η n , x , C 2 , K 2 C K 0 η n , x , C 1 , K 1 η n , x , C 2 , K 2 p 2 b Λ n ( 2 ) ( η n , x , C 1 , K 1 η n , x , C 2 , K 2 ) .
By the boundedness of the terms in Λ n ( 2 ) ( η n , x , C , K ) , we apply Bernstein’s inequality to obtain:
B 2 q = 1 q n exp 2 log N q δ q 2 k n b q 2 c p , α + ( 4 / 3 ) δ q a n λ n n p / 2 ( p 1 ) ϕ ( h K ) ( p 2 ) / 2 ( p 1 ) .
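The exponential bound above is an application of the classical Bernstein inequality for sums of independent, centred and bounded random variables, recalled here in a standard form (the variance proxy $v$ and the uniform bound $M$ correspond to the two terms in the denominator of the exponent in the display above; this identification is our reading and is given only for orientation):
\[ \mathbb{P}\left(\Big|\sum_{j=1}^{m}\big(\xi_{j}-\mathbb{E}\,\xi_{j}\big)\Big| > t\right) \;\le\; 2\exp\left(-\,\frac{t^{2}}{2\big(v + \tfrac{M t}{3}\big)}\right), \qquad v \ge \sum_{j=1}^{m}\operatorname{Var}(\xi_{j}),\quad |\xi_{j}|\le M\ \text{a.s.} \]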
Using the expression for b q , we have:
δ q a n λ n n p / 2 ( p 1 ) ϕ ( h K ) ( p 2 ) / 2 ( p 1 ) = ( 4 / 3 ) δ q λ n k ( log ( n ) ) 1 ( 8 / 3 ) n b q 2 δ q 8 n b q 2 .
Thus:
B 2 q = 1 q n exp 2 log N q δ q 2 ( 8 + c p , α 2 ) b q 2 2 q = 1 q n exp δ q 2 2 ( 8 + c p , α 2 ) b q 2 2 q = 1 q n exp 2 q 2 ( 8 + c p , α 2 ) b 0 as b 0 .
Finally, we deduce that for each δ > 0 :
\[ \lim_{n\to\infty}\ \mathbb{P}\left( \big\|\Lambda_{n}^{(2)}\big(\eta_{n,x,C,K}\big)\big\|_{\mathcal{C}_{K}\big(\lambda_{n}^{1/2}(\log n)^{-1/2},\ \|\cdot\|_{p}\big)} \ge \delta \right) = 0. \]
By applying the square root trick (Lemma 5.2 in [87]), using methods similar to those in [88], and following the arguments in [86], we obtain:
\[ A_{2} \longrightarrow 0. \]
Thus, the theorem is proved.

References

  1. Pollard, D. Empirical Processes: Theory and Applications; NSF-CBMS Regional Conference Series in Probability and Statistics; Institute of Mathematical Statistics: Hayward, CA, USA; American Statistical Association: Alexandria, VA, USA, 1990; Volume 2, p. viii+86. [Google Scholar]
  2. Shorack, G.R.; Wellner, J.A. Empirical Processes with Applications to Statistics; Classics in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2009; Volume 59, p. xli+956. [Google Scholar] [CrossRef]
  3. Pyke, R. The weak convergence of the empirical process with random sample size. Proc. Camb. Philos. Soc. 1968, 64, 155–160. [Google Scholar] [CrossRef]
  4. Vapnik, V.N.; Červonenkis, A.J. The uniform convergence of frequencies of the appearance of events to their probabilities. Teor. Verojatnost. Primenen. 1971, 16, 264–279. [Google Scholar] [CrossRef]
  5. Csörgő, M. Invariance principles for empirical processes. Handb. Stat. 1984, 4, 431–462. [Google Scholar] [CrossRef]
  6. Dudley, R.M. Central limit theorems for empirical measures. Ann. Probab. 1978, 6, 899–929. [Google Scholar] [CrossRef]
  7. Pollard, D. A central limit theorem for empirical processes. J. Aust. Math. Soc. 1982, 33, 235–248. [Google Scholar] [CrossRef]
  8. Stute, W. Conditional empirical processes. Ann. Statist. 1986, 14, 638–647. [Google Scholar] [CrossRef]
  9. Ossiander, M. A central limit theorem under metric entropy with L2 bracketing. Ann. Probab. 1987, 15, 897–919. [Google Scholar] [CrossRef]
  10. Withers, C.S. Convergence of empirical processes of mixing rv’s on [0,1]. Ann. Statist. 1975, 3, 1101–1108. [Google Scholar] [CrossRef]
  11. Philipp, W. Invariance principles for sums of mixing random elements and the multivariate empirical process. Colloq. Math. Soc. János Bolyai 1984, 36, 843–873. [Google Scholar]
  12. Yoshihara, K.i. Conditional empirical processes defined by ϕ-mixing sequences. Comput. Math. Appl. 1990, 19, 149–158. [Google Scholar] [CrossRef]
  13. Polonik, W.; Yao, Q. Set-indexed conditional empirical and quantile processes based on dependent data. J. Multivar. Anal. 2002, 80, 234–255. [Google Scholar] [CrossRef]
  14. Poryvaĭ, D.V. An invariance principle for conditional empirical processes formed by dependent random variables. Izv. Math. 2005, 69, 129–148. [Google Scholar] [CrossRef]
  15. Yu, B. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 1994, 22, 94–116. [Google Scholar] [CrossRef]
  16. Doukhan, P.; Massart, P.; Rio, E. Invariance principles for absolutely regular empirical processes. Ann. l’IHP Probab. Stat. 1995, 31, 393–427. [Google Scholar]
  17. Eberlein, E. Weak convergence of partial sums of absolutely regular sequences. Statist. Probab. Lett. 1984, 2, 291–293. [Google Scholar] [CrossRef]
  18. Bouzebda, S. Weak convergence of the conditional single index U-statistics for locally stationary functional time series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  19. Bouzebda, S.; Nemouchi, B. Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Statist. 2019, 28, 169–207. [Google Scholar] [CrossRef]
  20. Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2023, 26, 33–88. [Google Scholar] [CrossRef]
  21. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; p. xx+426. [Google Scholar]
  22. Bosq, D. Linear Processes in Function Spaces: Theory and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149, p. xiv+283. [Google Scholar] [CrossRef]
  23. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006; p. xx+258. [Google Scholar]
  24. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; p. xiv+422. [Google Scholar] [CrossRef]
  25. Agua, B.M.; Bouzebda, S. Single index regression for locally stationary functional time series. AIMS Math. 2024, 9, 36202–36258. [Google Scholar] [CrossRef]
  26. Bouzebda, S.; Taachouche, N. Oracle inequalities and upper bounds for kernel conditional U-statistics estimators on manifolds and more general metric spaces associated with operators. Stochastics 2024, 96, 2135–2198. [Google Scholar] [CrossRef]
  27. Bouzebda, S.; Madani, F.; Souddi, Y. Some asymptotic properties of the conditional set-indexed empirical process based on dependent functional data. Int. J. Math. Stat 2022, 22, 77–105. [Google Scholar]
  28. Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 2022, 5, 431–533. [Google Scholar] [CrossRef]
  29. Souddi, Y.; Madani, F.; Bouzebda, S. Some characteristics of the conditional set-indexed empirical process involving functional ergodic data. Bull. Inst. Math. Acad. Sin. (N.S.) 2021, 16, 367–399. [Google Scholar] [CrossRef]
  30. Bouzebda, S.; Souddi, Y.; Madani, F. Weak Convergence of the Conditional Set-Indexed Empirical Process for Missing at Random Functional Ergodic Data. Mathematics 2024, 12, 448. [Google Scholar] [CrossRef]
  31. Horváth, L.; Yandell, B.S. Asymptotics of conditional empirical processes. J. Multivariate Anal. 1988, 26, 184–206. [Google Scholar] [CrossRef]
  32. Burba, F.; Ferraty, F.; Vieu, P. k-nearest neighbour method in functional nonparametric regression. J. Nonparametr. Stat. 2009, 21, 453–469. [Google Scholar] [CrossRef]
  33. Kudraszow, N.L.; Vieu, P. Uniform consistency of kNN regressors for functional variables. Statist. Probab. Lett. 2013, 83, 1863–1870. [Google Scholar] [CrossRef]
  34. Kara, L.Z.; Laksaci, A.; Rachdi, M.; Vieu, P. Data-driven kNN estimation in nonparametric functional data analysis. J. Multivariate Anal. 2017, 153, 176–188. [Google Scholar] [CrossRef]
  35. Almanjahie, I.M.; Bouzebda, S.; Chikr Elmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model. 2022, 38, 47–63. [Google Scholar] [CrossRef]
  36. Bouzebda, S.; Laksaci, A.; Mohammedi, M. The k-nearest neighbors method in single index regression model for functional quasi-associated time series data. Rev. Mat. Complut. 2023, 36, 361–391. [Google Scholar] [CrossRef]
  37. Almanjahie, I.M.; Chikr Elmezouar, Z.; Laksaci, A.; Rachdi, M. kNN local linear estimation of the conditional cumulative distribution function: Dependent functional data case. C. R. Math. Acad. Sci. Paris 2018, 356, 1036–1039. [Google Scholar] [CrossRef]
  38. Bouzebda, S.; Nezzal, A. Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data. AIMS Math. 2024, 9, 4427–4550. [Google Scholar] [CrossRef]
  39. Bouzebda, S. Uniform in Number of Neighbor Consistency and Weak Convergence of k-Nearest Neighbor Single Index Conditional Processes and k-Nearest Neighbor Single Index Conditional U-Processes Involving Functional Mixing Data. Symmetry 2024, 16, 1576. [Google Scholar] [CrossRef]
  40. Nadaraja, E.A. On a regression estimate. Teor. Verojatnost. Primenen. 1964, 9, 157–159. [Google Scholar]
  41. Watson, G.S. Smooth regression analysis. Sankhyā Ser. A 1964, 26, 359–372. [Google Scholar]
  42. Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691. [Google Scholar] [CrossRef]
  43. Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stochastic Process. Appl. 2005, 115, 155–177. [Google Scholar] [CrossRef]
  44. Bradley, R.C. Introduction to Strong Mixing Conditions. Vol. 3; Kendrick Press: Heber City, UT, USA, 2007; p. xii+597. [Google Scholar]
  45. Gorodeckiĭ, V.V. The strong mixing property for linearly generated sequences. Teor. Verojatnost. Primenen. 1977, 22, 421–423. [Google Scholar]
  46. Withers, C.S. Conditions for linear processes to be strong-mixing. Z. Wahrsch. Verw. Gebiete 1981, 57, 477–480. [Google Scholar] [CrossRef]
  47. Auestad, B.; Tjøstheim, D. Identification of nonlinear time series: First order characterization and order determination. Biometrika 1990, 77, 669–687. [Google Scholar] [CrossRef]
  48. Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Matemática Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef] [PubMed]
  49. Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
  50. Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Commun.-Stat.-Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
  51. Chen, R.; Tsay, R.S. Functional-coefficient autoregressive models. J. Am. Stat. Assoc. 1993, 88, 298–308. [Google Scholar] [CrossRef]
  52. Masry, E.; Tjøstheim, D. Nonparametric estimation and identification of nonlinear ARCH time series. Econom. Theory 1995, 11, 258–289. [Google Scholar] [CrossRef]
  53. Masry, E.; Tjøstheim, D. Additive nonlinear ARX time series and projection estimates. Econom. Theory 1997, 13, 214–252. [Google Scholar] [CrossRef]
  54. Dudley, R.M. A course on empirical processes. In École d’été de probabilités de Saint-Flour, XII—1982; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1984; Volume 1097, pp. 1–142. [Google Scholar] [CrossRef]
  55. Mayer-Wolf, E.; Zeitouni, O. The probability of small Gaussian ellipsoids and associated conditional moments. Ann. Probab. 1993, 21, 14–24. [Google Scholar] [CrossRef]
  56. Bogachev, V.I. Gaussian Measures; Mathematical Surveys and Monographs; American Mathematical Society: Providence, RI, USA, 1998; Volume 62, p. xii+433. [Google Scholar] [CrossRef]
  57. Li, W.V.; Shao, Q.M. Gaussian processes: Inequalities, small ball probabilities and applications. Handb. Stat. 2001, 19, 533–597. [Google Scholar] [CrossRef]
  58. Pollard, D. Convergence of Stochastic Processes; Springer Series in Statistics; Springer: New York, NY, USA, 1984; p. xiv+215. [Google Scholar] [CrossRef]
  59. Dudley, R.M. Uniform Central Limit Theorems, 2nd ed.; Cambridge Studies in Advanced Mathematics; Cambridge University Press: New York, NY, USA, 2014; Volume 142, p. xii+472. [Google Scholar]
  60. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes: With Applications to Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1996; p. xvi+508. [Google Scholar] [CrossRef]
  61. Kosorok, M.R. Introduction to Empirical Processes and Semiparametric Inference; Springer Series in Statistics; Springer: New York, NY, USA, 2008. [Google Scholar] [CrossRef]
  62. Deheuvels, P. One bootstrap suffices to generate sharp uniform bounds in functional estimation. Kybernetika 2011, 47, 855–865. [Google Scholar]
  63. Hardy, G.H. On double Fourier series, and especially those which represent the double zeta-function with real and incommensurable parameters. Quart. J. Math 1906, 37, 53–79. [Google Scholar]
  64. Krause, M. Über mittelwertsätze im gebiete der doppelsummen and doppelintegrale. Leipz. Ber 1903, 55, 239–263. [Google Scholar]
  65. Volkonskiĭ, V.A.; Rozanov, Y.A. Some limit theorems for random functions. I. Theor. Probab. Appl. 1959, 4, 178–197. [Google Scholar] [CrossRef]
  66. Kolmogorov, A.N.; Tihomirov, V.M. ε-entropy and ε-capacity of sets in functional space. Am. Math. Soc. Transl. 1961, 17, 277–364. [Google Scholar]
  67. van der Vaart, A.; van Zanten, H. Bayesian inference with rescaled Gaussian process priors. Electron. J. Stat. 2007, 1, 433–448. [Google Scholar] [CrossRef]
  68. Härdle, W.; Marron, J.S. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist. 1985, 13, 1465–1481. [Google Scholar] [CrossRef]
  69. Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Stat. Plan. Inference 2007, 137, 2784–2801. [Google Scholar] [CrossRef]
  70. Bouzebda, S.; El-hadjali, T. Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametric Stat. 2020, 32, 864–914. [Google Scholar] [CrossRef]
  71. Shang, H.L. Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametric Stat. 2014, 26, 599–615. [Google Scholar] [CrossRef]
  72. Li, Q.; Maasoumi, E.; Racine, J.S. A nonparametric test for equality of distributions with mixed categorical and continuous data. J. Econom. 2009, 148, 186–200. [Google Scholar] [CrossRef]
  73. Horowitz, J.L.; Spokoiny, V.G. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631. [Google Scholar] [CrossRef]
  74. Gao, J.; Gijbels, I. Bandwidth selection in nonparametric kernel testing. J. Am. Stat. Assoc. 2008, 103, 1584–1594. [Google Scholar] [CrossRef]
  75. Dawid, A.P. Conditional independence for statistical operations. Ann. Statist. 1980, 8, 598–617. [Google Scholar] [CrossRef]
  76. Dawid, A.P. Conditional independence in statistical theory. J. R. Stat. Soc. Ser. B Stat. Methodol. 1979, 41, 1–31. [Google Scholar] [CrossRef]
  77. Koller, D.; Friedman, N. Probabilistic Graphical Models; Adaptive Computation and Machine Learning Seriess; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  78. Pearl, J. Causality, 2nd ed.; Models, Reasoning, and Inference; Cambridge University Press: Cambridge, UK, 2009; p. xx+464. [Google Scholar] [CrossRef]
  79. Zhang, K.; Peters, J.; Janzing, D.; Schölkopf, B. Kernel-based Conditional Independence Test and Application in Causal Discovery. arXiv 2012, arXiv:1202.3775. [Google Scholar] [CrossRef]
  80. Zhou, Y.; Liu, J.; Zhu, L. Test for conditional independence with application to conditional screening. J. Multivar. Anal. 2020, 175, 104557. [Google Scholar] [CrossRef]
  81. Bouzebda, S. General tests of conditional independence based on empirical processes indexed by functions. Jpn. J. Stat. Data Sci. 2023, 6, 115–177. [Google Scholar] [CrossRef]
  82. Yao, W. A bias corrected nonparametric regression estimator. Statist. Probab. Lett. 2012, 82, 274–282. [Google Scholar] [CrossRef]
  83. Karlsson, A. Bootstrap methods for bias correction and confidence interval estimation for nonlinear quantile regression of longitudinal data. J. Stat. Comput. Simul. 2009, 79, 1205–1218. [Google Scholar] [CrossRef]
  84. Ezzahrioui, M.; Ould-Saïd, E. Asymptotic normality of a nonparametric estimator of the conditional mode function for functional data. J. Nonparametric Stat. 2008, 20, 3–18. [Google Scholar] [CrossRef]
  85. Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametric Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
  86. Arcones, M.A.; Yu, B. Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theor. Probab. 1994, 7, 47–71. [Google Scholar] [CrossRef]
  87. Giné, E.; Zinn, J. Some limit theorems for empirical processes. Ann. Probab. 1984, 12, 929–998, With discussion. [Google Scholar] [CrossRef]
  88. Le Cam, L. A remark on empirical measures. In A Festschrift for Erich Lehmann in Honor of His Sixty-Fifth Birthday; Wadsworth: Belmont, CA, USA, 1983; pp. 305–327. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
