Article

Non-Parametric Conditional U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design

by
Salim Bouzebda
*,† and
Inass Soukarieh
Laboratory of Applied Mathematics of Compiègne (LMAC), Université de Technologie de Compiègne, 60203 Compiègne CEDEX, France
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(1), 16; https://doi.org/10.3390/math11010016
Submission received: 21 October 2022 / Revised: 14 December 2022 / Accepted: 16 December 2022 / Published: 20 December 2022
(This article belongs to the Special Issue Current Developments in Theoretical and Applied Statistics)

Abstract:
Stute presented the so-called conditional U-statistics, generalizing the Nadaraya–Watson estimates of the regression function, and demonstrated their pointwise consistency and asymptotic normality. In this paper, we extend these results to a more abstract setting. We develop an asymptotic theory of conditional U-statistics for locally stationary random fields $\{X_{\mathbf{s}, A_n} : \mathbf{s} \in R_n\}$ observed at irregularly spaced locations in $R_n = [0, A_n]^d$, a subset of $\mathbb{R}^d$. We employ a stochastic sampling scheme that may generate irregularly spaced sampling sites in a flexible manner and that includes both the pure and the mixed increasing domain frameworks. We specifically examine the rate of strong uniform convergence and the weak convergence of conditional U-processes when the explanatory variable is functional. We examine the weak convergence when the class of functions is either bounded or unbounded and satisfies specific moment conditions. These results are established under fairly general structural conditions on the classes of functions and the underlying models. The theoretical results developed in this paper are (or will be) essential building blocks for several future breakthroughs in functional data analysis.

1. Introduction

The regression problem has been studied by statisticians and probability theorists for many years, resulting in a vast array of approaches. Various themes have been covered, such as modeling, estimation methods, applications, tests, and other related topics. In addition to the parametric framework, in which one must estimate a finite number of parameters based on an a priori specified model structure, the non-parametric framework is devoted to data that lack a priori structural information. As inherent disadvantages, non-parametric procedures are susceptible to estimation biases and to reduced convergence rates compared to parametric methods. Kernel non-parametric function estimation techniques have long been of great interest; for good references to the research literature and statistical applications in this area, see [1,2,3,4,5,6] and the references therein. Despite their popularity, kernel methods are just one of several possible approaches to building reliable function estimators; nearest neighbor, spline, neural network, and wavelet methods are further examples, and these techniques have been applied to a vast range of data types. In this article, our focus is narrowed to the development of consistent kernel-type estimators for conditional U-statistics in the context of spatial data. Spatial data are generated in numerous research fields, such as econometrics, epidemiology, environmental science, image analysis, oceanography, meteorology, geostatistics, etc. These data are typically collected at measurement sites and treated statistically accordingly. Consult [7,8,9,10], as well as the references contained in these works, for reliable sources on the research literature in this area and for some statistical applications. In the context of non-parametric estimation for spatial data, the existing papers are mostly concerned with estimating probability density and regression functions; we cite the important references [11,12,13,14,15] and the references therein. By considering conditional U-processes, we place this line of research in a more general and abstract context. With many possible applications, U-statistics (introduced in the landmark work [16]) and U-processes have attracted a great deal of interest over the past few decades. U-processes are effective for resolving intricate statistical problems: density estimation, non-parametric regression tests, and goodness-of-fit tests are among the examples. Specifically, U-processes emerge in statistics in a variety of contexts, such as the higher-order terms of von Mises expansions. In particular, U-statistics assist in the analysis of estimators and function estimators with varying degrees of smoothness. For example, Ref. [17] analyzed the product limit estimator for truncated data by employing almost sure uniform bounds for $P$-canonical U-processes. In addition, Ref. [18] introduced two novel normality tests based on U-processes. Likewise, new tests for normality that use as test statistics weighted $L_1$-distances between the standard normal density and local U-statistics based on standardized data were introduced by [19,20,21]. In addition, Ref. [22] addressed the estimation of the mean of multivariate functions under possibly heavy-tailed distributions and presented the median-of-means based on U-statistics.
The applications of U-processes in statistics also include tests for qualitative features of functions in non-parametric statistics (cf. [23,24,25]), cross-validation for density estimation [26], and establishing the limiting distributions of M-estimators (see, e.g., Refs. [27,28,29]). Historically, Ref. [27] furnished the necessary and sufficient criteria for the law of large numbers and sufficient conditions for the central limit theorem for U-processes, complementing [16,30,31], who provided (among others) the first asymptotic results for the case where the underlying random variables are independent and identically distributed. Under weak dependence assumptions, asymptotic results are given in [32,33,34], more recently in [35], and in a more general setting in [36,37,38,39,40,41]. The applicability of U-statistics in estimation and machine learning is comprehensive. We refer to [40,42,43,44,45] for U-statistics with random kernels of diverging order. Infinite-order U-statistics are helpful tools for constructing simultaneous prediction intervals that quantify the uncertainty of ensemble methods such as subbagging and random forests; for additional information on the topic, cf. [46]. The MeanNN estimator of differential entropy, first described by [47], is a remarkable instance of a U-statistic. A novel test statistic for goodness-of-fit tests was proposed by [48] using U-statistics. Using U-statistics, Ref. [49] proposed a measure to quantify the clustering quality of a partition. The interested reader may refer to [50,51,52] for outstanding sources of references on U-statistics. The book [29] provides a profound and in-depth view of the notion of U-processes.
In this work, our primary focus is on the scenario involving spatial–functional data. We quote from [53]: “Functional data analysis (FDA) is a branch of statistics concerned with the analysis of infinite-dimensional variables such as curves, sets, and images. It has undergone phenomenal growth over the past 20 years, stimulated in part by major advances in data collection technology that have brought about the ‘Big Data’ revolution. Often perceived as a somewhat arcane area of research at the turn of the century, FDA is now one of the most active and relevant fields of investigation in data science.” The reader is directed to the reference works [54,55] for an overview of this subject area. These references include fundamental approaches to functional data analysis and a wide range of case studies from diverse disciplines, such as criminology, economics, archaeology, and neurophysiology. It is important to note that the extension of probability theory to random variables taking values in normed vector spaces (for example, Banach and Hilbert spaces), including extensions of certain classical asymptotic limit theorems, predates the recent literature on functional data; the reader is referred to the book [56] for more information on this topic. Density and mode estimation for data with values in a normed vector space is the focus of the work of [57], which also discusses the curse of dimensionality affecting functional data and potential remedies. Non-parametric models were shown to be useful for regression estimation in the functional setting by [55]; we also refer to [58,59,60].
Modern empirical process theory has recently been applied to functional data. Ref. [61] provided consistency rates for several conditional models, such as the regression function, the conditional cumulative distribution, and the conditional density, uniformly over a subset of the explanatory variable. Ref. [62] extended the uniform-in-bandwidth consistency of [63] to the ergodic setting. Ref. [64] considered local linear estimation of the regression function when the regressor is functional and showed strong convergence, with specified rates, uniformly in the bandwidth parameters. Ref. [65] examined the k-nearest neighbors (kNN) estimate of the non-parametric regression model for strongly mixing functional time series data and determined the uniform almost complete convergence rate of the kNN estimator under mild conditions. Ref. [66] treated ergodic data and offered a variety of results on the limiting distribution of the conditional mode in the functional setting; for recent references, cf. [38,67,68,69,70,71,72].
Ref. [73] introduced a class of estimators for $r^{(m)}(\varphi, \mathbf{t})$, known as conditional U-statistics, generalizing the Nadaraya–Watson estimates of the regression function. We first present Stute's estimators. Consider a sequence of random elements $\{(X_i, Y_i), i \in \mathbb{N}^*\}$ with $X_i \in \mathbb{R}^d$ and $Y_i \in \mathcal{Y}$, some Polish space, where $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. Let $\varphi : \mathcal{Y}^m \to \mathbb{R}$ be a measurable function. In this study, the estimation of the conditional expectation, or regression function, is our primary concern:
$$r^{(m)}(\varphi, \mathbf{t}) = \mathbb{E}\left[\varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = \mathbf{t}\right], \quad \text{for } \mathbf{t} \in \mathbb{R}^{dm},$$
whenever it exists, i.e., whenever
$$\mathbb{E}\left|\varphi(Y_1, \ldots, Y_m)\right| < \infty.$$
We now introduce a kernel function $K : \mathbb{R}^d \to \mathbb{R}$ with support contained in $[-B, B]^d$, $B > 0$, satisfying:
$$\sup_{\mathbf{x} \in \mathbb{R}^d} |K(\mathbf{x})| =: \kappa < \infty \quad \text{and} \quad \int K(\mathbf{x})\, d\mathbf{x} = 1.$$
Hence, the class of estimators for $r^{(m)}(\varphi, \mathbf{t})$ given by [73] is defined, for each $\mathbf{t} \in \mathbb{R}^{dm}$, as follows:
$$\hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi(Y_{i_1}, \ldots, Y_{i_m})\, K\!\left(\frac{t_1 - X_{i_1}}{h_n}\right) \cdots K\!\left(\frac{t_m - X_{i_m}}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} K\!\left(\frac{t_1 - X_{i_1}}{h_n}\right) \cdots K\!\left(\frac{t_m - X_{i_m}}{h_n}\right)},$$
where
$$I_n^m = \left\{\mathbf{i} = (i_1, \ldots, i_m) : 1 \le i_j \le n \text{ and } i_j \ne i_r \text{ if } j \ne r\right\}$$
denotes the set of all $m$-tuples of distinct integers $i_j$ between 1 and $n$, and $\{h_n\}_{n \ge 1}$ is a sequence of positive constants converging to zero at a rate such that $n h_n^m \to \infty$.
For $m = 1$, $r^{(m)}(\varphi, \mathbf{t})$ becomes
$$r^{(1)}(\varphi, t) = \mathbb{E}(\varphi(Y) \mid X = t),$$
and Stute's estimate reduces to the Nadaraya–Watson estimator of $r^{(1)}(\varphi, t)$.
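To make the definition concrete, here is a minimal numerical sketch of the estimator (3) for $m = 2$ and $d = 1$; the simulated model, the Epanechnikov kernel, and the choice $\varphi(y_1, y_2) = y_1 y_2$ are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Stute's conditional U-statistic (3), m = 2, d = 1.
# Model, kernel, and phi are illustrative assumptions.
import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)
n, h = 200, 0.4
X = rng.uniform(-2, 2, size=n)
Y = np.sin(X) + 0.1 * rng.standard_normal(n)

def K(x):
    # Epanechnikov kernel: compactly supported, integrates to one
    return 0.75 * (1 - x**2) * (np.abs(x) <= 1)

def r_hat(t1, t2, phi=lambda y1, y2: y1 * y2):
    # sum over all pairs of distinct indices, as in I_n^m
    num = den = 0.0
    for i, j in permutations(range(n), 2):
        w = K((t1 - X[i]) / h) * K((t2 - X[j]) / h)
        num += phi(Y[i], Y[j]) * w
        den += w
    return num / den if den > 0 else np.nan

# For this model, r^(2)(phi, (t1, t2)) = sin(t1) * sin(t2).
print(r_hat(0.8, -0.5), np.sin(0.8) * np.sin(-0.5))
```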
Subsequently, Ref. [74] estimated the rate of uniform convergence in $\mathbf{t}$ of $\hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n)$ to $r^{(m)}(\varphi, \mathbf{t})$. Meanwhile, Ref. [75] developed the limit distributions of $\hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n)$, discussing and contrasting the findings of Stute. Correspondingly, under appropriate mixing conditions, Ref. [76] extended the results of [73] to weakly dependent data and employed these findings to validate the Bayes risk consistency of the corresponding discrimination rules. Ref. [77] suggested symmetrized nearest neighbor conditional U-statistics as alternatives to conventional kernel estimators. Ref. [78] considered the functional conditional U-statistic and established its finite-dimensional asymptotic normality. Nevertheless, the non-parametric estimation of conditional U-statistics in the functional data framework has not received significant attention, despite the subject's relevance. Some recent developments are discussed in [79,80], in which the authors examine the challenges associated with uniform-in-bandwidth consistency in a general framework. A test of independence in the functional framework was based on Kendall statistics, which may be thought of as instances of U-statistics; see, for instance, [81]. The extension of the investigations described above to conditional empirical U-processes is theoretically attractive, practically helpful, and technically challenging.
The primary objective of this study is to examine, in a general framework, the characterization of the weak convergence of conditional U-processes based on sequences of random spatial functions. This inquiry is far from simple, as it is difficult to establish asymptotic equicontinuity under minimal conditions in this general setting, which constitutes a fundamentally unresolved open problem in the literature. We intend to fill this gap by merging the findings of [37,82,83] with techniques for handling functional data given in [84,85,86,87]. However, as demonstrated in the following sections, the challenge requires much more than “just” merging concepts from existing results. In fact, complex mathematical derivations are necessary to deal with functional data in our context. This requires the successful application of large-sample theoretical tools established for empirical processes, for which we use the results of [37,82,83].
The structure of the present article is as follows. Section 2 introduces the functional framework and the definitions required in our work, together with the assumptions used in our asymptotic analysis and a brief discussion of them. Section 3 gives the uniform rates of strong convergence. Section 4 contains the paper's main results concerning the uniform weak convergence of the conditional U-processes. In Section 5, we provide some potential applications. In Section 6, we consider conditional U-statistics in the right-censored data framework. In Section 7, we describe how to select the bandwidth through cross-validation procedures. Some concluding remarks and possible future developments are relegated to Section 8. All proofs are gathered in Section 9 to avoid interrupting the flow of the presentation. Finally, some relevant technical results are given in Appendix A.

2. The Functional Framework

2.1. Notation

For any set $A \subseteq \mathbb{R}^d$, $|A|$ represents the Lebesgue measure of $A$ and $[[A]]$ denotes the number of elements in $A$. For any positive sequences $a_n, b_n$, we write $a_n \lesssim b_n$ if a constant $C > 0$ independent of $n$ exists such that $a_n \le C b_n$ for all $n$; $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$; and $a_n \ll b_n$ if $a_n / b_n \to 0$ as $n \to \infty$. We use the notation $\stackrel{d}{\to}$ to indicate convergence in distribution, and we write $X \stackrel{d}{=} Y$ if the random variables $X$ and $Y$ have the same distribution. $\mathbb{P}_S$ will denote the joint probability distribution of the sequence of independent and identically distributed (i.i.d.) random vectors $\{S_{0,j}\}_{j \ge 1}$, and $\mathbb{P}_{\cdot \mid S}$ is the conditional probability distribution given $\{S_{0,j}\}_{j \ge 1}$. Let $\mathbb{E}_{\cdot \mid S}$ represent the conditional expectation and $\mathrm{Var}_{\cdot \mid S}$ the conditional variance given $\{S_{0,j}\}_{j \ge 1}$.

2.2. Generality on the Model

In this investigation, we examine the following model:
$$\varphi\left(Y_{s_{i_1}, A_n}, \ldots, Y_{s_{i_m}, A_n}\right) = r^{(m)}\!\left(\varphi, X_{s_{i_1}, A_n}, \ldots, X_{s_{i_m}, A_n}, \frac{s_{i_1}}{A_n}, \ldots, \frac{s_{i_m}}{A_n}\right) + \sum_{j=1}^m \sigma\!\left(\frac{s_{i_j}}{A_n}, \mathbf{X}\right) \epsilon_{i_j} = r^{(m)}\!\left(\varphi, X_{s_{i_1}, A_n}, \ldots, X_{s_{i_m}, A_n}, \frac{s_{i_1}}{A_n}, \ldots, \frac{s_{i_m}}{A_n}\right) + \sum_{j=1}^m \epsilon_{s_{i_j}, A_n}, \quad s_{i_j} \in R_n, \; j = 1, \ldots, m,$$
where $\mathbb{E}[\epsilon_{s, A_n} \mid X_{s, A_n}] = 0$ and $R_n = [0, A_n]^d \subset \mathbb{R}^d$ denotes a sampling region with $A_n \to \infty$ as $n \to \infty$. Here, $Y_{s, A_n}$ and $X_{s, A_n}$ denote random elements in $\mathcal{Y}$ and $\mathcal{H}$, respectively. We consider $\{X_{s, A_n} : s \in R_n\}$ as a locally stationary random function field on $R_n \subset \mathbb{R}^d$ ($d \ge 2$). As suggested by [88], locally stationary processes are nonstationary time series whose parameters can change over time; locally in time, they can be approximated by a stationary time series, which makes it possible to use asymptotic theory to estimate the parameters of time-dependent models. Time series analyses have mostly studied locally stationary models in a parametric framework with time-varying coefficients.

2.3. Local Stationarity

A random function field $\{X_{s, A_n} : s \in R_n\}$ ($A_n \to \infty$ as $n \to \infty$) is considered locally stationary if it exhibits approximately stationary behavior locally in space. To guarantee local stationarity around each rescaled space point $\mathbf{u}$, the process $\{X_{s, A_n}\}$ is required to admit a stochastic approximation by a stationary random function field $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$; for instance, see [89]. The following is one possible way to define this idea.
Definition 1.
The $\mathcal{H}$-valued stochastic process $\{X_{s, A_n} : s \in R_n\}$ is said to be locally stationary if, for each rescaled space point $\mathbf{u} \in [0, 1]^d$, there exists an associated $\mathcal{H}$-valued process $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ with the following properties:
(i) 
$\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ is strictly stationary.
(ii) 
It holds that
$$d\left(X_{s, A_n}, X_{\mathbf{u}}(s)\right) \le \left(\left\|\frac{s}{A_n} - \mathbf{u}\right\|_2 + \frac{1}{A_n}\right) U_{s, A_n}(\mathbf{u}) \quad a.s.,$$
where $\{U_{s, A_n}(\mathbf{u})\}$ denotes a process of positive variables satisfying $\mathbb{E}[(U_{s, A_n}(\mathbf{u}))^{\rho}] \le C$ for some $\rho > 0$ and $C < \infty$; $C$ is independent of $\mathbf{u}$, $s$, and $A_n$, and $\|\cdot\|_2$ is an arbitrary norm on $\mathbb{R}^d$.
The concept of local stationarity for real-valued time series was first presented by [88], and Definition 1 is a natural extension of that idea.
In addition, the definition we offer coincides with Definition 2.1 of [90] when $\mathcal{H}$ is the Hilbert space $L^2_{\mathbb{R}}([0, 1])$ of all real-valued functions that are square integrable with respect to the Lebesgue measure on the unit interval $[0, 1]$, with the $L^2$-norm given by
$$\|f\|_2 = \sqrt{\langle f, f \rangle}, \qquad \langle f, g \rangle = \int_0^1 f(t) g(t)\, dt,$$
where $f, g \in L^2_{\mathbb{R}}([0, 1])$. In addition, the authors of [90] provide sufficient conditions under which an $L^2_{\mathbb{R}}([0, 1])$-valued stochastic process $\{X_{t, T}\}$ satisfies (5) with $d(f, g) = \|f - g\|_2$ and $\rho = 2$.

2.4. Sampling Design

We now describe the stochastic sampling scheme used to accommodate irregularly spaced data. First, define $R_n$ as the sampling region. Let $\{A_n\}_{n \ge 1}$ be a sequence of positive numbers such that $A_n \to \infty$ as $n \to \infty$. We consider the sampling region
$$R_n = [0, A_n]^d.$$
We will now discuss the (random) sampling designs we use. Let $f_S(s_0)$ be a continuous, everywhere positive probability density function on $R_0 = [0, 1]^d$, and let $\{S_{0,j}\}_{j \ge 1}$ be a sequence of i.i.d. random vectors with density $f_S$ such that $\{S_{0,j}\}_{j \ge 1}$ and $\{X_{s, A_n} : s \in R_n\}$ are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and are independent. The sampling sites $s_1, \ldots, s_n$ are obtained from the realizations $s_{0,1}, \ldots, s_{0,n}$ of the random vectors $S_{0,1}, \ldots, S_{0,n}$ via the relation
$$s_j = A_n s_{0,j}, \quad j = 1, \ldots, n.$$
Herein, we assume that $n / A_n^d \to \infty$ as $n \to \infty$.
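For illustration, the following sketch generates sampling sites according to the design (7); the choice of $f_S$ (independent Beta(2,2) marginals) and all numerical values are assumptions made for the example only.

```python
# Sketch of the stochastic sampling design (7): s_j = A_n * S_{0,j}
# with S_{0,j} i.i.d. from a density f_S on [0,1]^d. The Beta(2,2)
# marginals are an illustrative choice of a continuous, everywhere
# positive f_S; they are not prescribed by the paper.
import numpy as np

rng = np.random.default_rng(1)
d, A_n, n = 2, 25.0, 500              # here n / A_n^d = 0.8 (increasing domain regime)

S0 = rng.beta(2.0, 2.0, size=(n, d))  # i.i.d. draws on R_0 = [0,1]^d
sites = A_n * S0                      # sampling sites s_1, ..., s_n in R_n = [0, A_n]^d
print(sites.min(axis=0), sites.max(axis=0))
```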
Remark 1.
In practice, $A_n$ can be derived by taking the diameter of the sampling region. We can extend the assumption (6) on $R_n$ to a broader range of situations, i.e.,
$$R_n = \prod_{j=1}^d [0, A_{j,n}],$$
where the $A_{j,n}$ are sequences of positive constants with $A_{j,n} \to \infty$ as $n \to \infty$. To avoid more cumbersome expressions, we assume (6). For additional discussion, please refer to [85,87,91,92,93] and ([94], Chapter 12).

2.5. Mixing Condition

The sequence $Z_1, Z_2, \ldots$ is said to be $\beta$-mixing, or absolutely regular (refer to [95,96]), if
$$\beta(k) := \sup_{l \ge 1} \mathbb{E}\left[\sup\left\{\left|\mathbb{P}(A \mid \sigma_1^l) - \mathbb{P}(A)\right| : A \in \sigma_{l+k}^{\infty}\right\}\right] \to 0 \quad \text{as } k \to \infty,$$
where $\sigma_a^b$ denotes the $\sigma$-field generated by $Z_a, \ldots, Z_b$.
Notably, Ref. [97] produced a comprehensive description of stationary Gaussian processes satisfying the last condition. We now define $\beta$-mixing coefficients for a random function field $\tilde{X}$. Let $\sigma_{\tilde{X}}(T) = \sigma(\{\tilde{X}(s) : s \in T\})$ be the $\sigma$-field generated by the variables $\{\tilde{X}(s) : s \in T\}$, $T \subset \mathbb{R}^d$. For subsets $T_1$ and $T_2$ of $\mathbb{R}^d$, let
$$\bar{\beta}(T_1, T_2) = \sup \frac{1}{2} \sum_{j=1}^{J} \sum_{k=1}^{K} \left|\mathbb{P}(A_j \cap B_k) - \mathbb{P}(A_j)\,\mathbb{P}(B_k)\right|,$$
where the supremum is taken over all pairs of (finite) partitions $\{A_1, \ldots, A_J\}$ and $\{B_1, \ldots, B_K\}$ of the sample space such that $A_j \in \sigma_{\tilde{X}}(T_1)$ and $B_k \in \sigma_{\tilde{X}}(T_2)$. Furthermore, let
$$d(T_1, T_2) = \inf\{|x - y| : x \in T_1, \; y \in T_2\},$$
where $|x| = \sum_{j=1}^d |x_j|$ for $x \in \mathbb{R}^d$, and let $\mathcal{R}(b)$ be the collection of all finite disjoint unions of cubes in $\mathbb{R}^d$ with total volume not exceeding $b$. The $\beta$-mixing coefficients for the random field $\tilde{X}$ can then be defined as
$$\beta(a; b) = \sup\{\bar{\beta}(T_1, T_2) : d(T_1, T_2) \ge a, \; T_1, T_2 \in \mathcal{R}(b)\}.$$
We assume that there exist a non-increasing function $\beta_1$ with $\lim_{a \to \infty} \beta_1(a) = 0$ and a non-decreasing function $g_1$ such that the $\beta$-mixing coefficient $\beta(a; b)$ satisfies the following inequality:
$$\beta(a; b) \le \beta_1(a)\, g_1(b), \quad a > 0, \; b > 0,$$
where $g_1$ may be unbounded for $d \ge 2$.
Remark 2
(Some remarks about mixing conditions). The sizes of the index sets $T_1$ and $T_2$ in the definition of $\beta(a; b)$ must be restricted. Let us explain this point. Suppose the $\beta$-mixing coefficients of a random field $X$ were defined in analogy with the $\beta$-mixing coefficients for time series, as follows. Let $O_1$ and $O_2$ be half-planes with boundaries $L_1$ and $L_2$, respectively. For each real number $a > 0$, define
$$\beta(a) = \sup \bar{\beta}(O_1, O_2),$$
where the supremum is taken over all pairs of parallel lines $L_1$ and $L_2$ such that $d(L_1, L_2) \ge a$. Then, ([98], Theorem 1) shows that if $\{X(s) : s \in \mathbb{R}^2\}$ is a strictly stationary mixing random field and $a > 0$ is a real number, then $\beta(a) = 1$ or $0$. This means that if a random field $X$ is $\beta$-mixing ($\lim_{a \to \infty} \beta(a) = 0$), then for some $a > \eta$, with $\eta$ a positive constant, the random field $X$ is “$m$-dependent”, i.e., $\beta(a) = 0$. However, this is highly restrictive in practice. In order to loosen these requirements and make them more flexible for practical purposes, it is necessary to restrict the sizes of $T_1$ and $T_2$ and to adopt the definition of $\beta(a; b)$ given above. We refer to [85,87,99,100,101] for additional information on mixing coefficients for random fields.
Ref. [93] imposes a mixing condition of the form (8) on the $\alpha$-mixing coefficients, and such a condition was also considered in the works of [102,103]. We have considered the $\beta$-mixing case, and it is well known that $\beta$-mixing implies $\alpha$-mixing. In the expression (8), $\beta_1$ is a function that may depend on $n$, since the random field $X_{s, A_n}$ depends on $n$, whereas $g_1$ does not; this is merely for the sake of simplicity, and the general case where $g_1$ changes with $n$ poses no additional difficulty. We note that the random field $Y_{s, A_n}$ (or $\varphi(Y_{s, A_n})$) does not necessarily satisfy the mixing condition (8), since the mixing condition is assumed for $X_{s, A_n}$; with the regression form represented by the model (4), $Y_{s, A_n}$ (or $\varphi(Y_{s, A_n})$) may have a flexible dependence structure.

2.6. Generality on the Model

Let $\{X_{s, A_n}, Y_{s, A_n} : s \in R_n\}$ be random variables where $Y_{s, A_n}$ takes values in $\mathcal{Y}$ and $X_{s, A_n}$ takes values in some semi-metric space $\mathcal{H}$ with a semi-metric $d(\cdot, \cdot)$ (a semi-metric, sometimes called a pseudo-metric, is a metric that allows $d(x_1, x_2) = 0$ for some $x_1 \ne x_2$) defining a topology to measure the proximity between two elements of $\mathcal{H}$; the semi-metric is dissociated from the definition of $X$ in order to prevent measurability concerns. This study aims to establish the weak convergence of the conditional U-process based on the following U-statistic:
$$\hat{r}_n^{(m)}(\mathbf{x}, \mathbf{u}; h_n) := \hat{r}_n^{(m)}(\varphi, \mathbf{x}, \mathbf{u}; h_n) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi\left(Y_{s_{i_1}, A_n}, \ldots, Y_{s_{i_m}, A_n}\right) \prod_{j=1}^m \bar{K}\!\left(\frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \bar{K}\!\left(\frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)} = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi\left(Y_{s_{i_1}, A_n}, \ldots, Y_{s_{i_m}, A_n}\right) \prod_{j=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)},$$
where
$$I_n^m := \left\{\mathbf{i} = (i_1, \ldots, i_m) : 1 \le i_j \le n \text{ and } i_r \ne i_j \text{ if } r \ne j\right\}, \qquad \bar{K}(\mathbf{u}) = \prod_{\ell=1}^d K_1(u_{\ell}),$$
and $\varphi : \mathcal{Y}^m \to \mathbb{R}$ is a symmetric, measurable function belonging to some class of functions $\mathcal{F}_m$, and $\{h_n\}_{n \in \mathbb{N}^*}$ is a sequence of positive real numbers satisfying $h_n \to 0$ as $n \to \infty$. In order to examine the weak convergence of the conditional empirical process and the conditional U-process under functional data, we must introduce new notation. Let
$$\mathcal{F}_m = \{\varphi : \mathcal{Y}^m \to \mathbb{R}\}$$
be a point-wise measurable class of real-valued symmetric measurable functions on $\mathcal{Y}^m$ with a measurable envelope function:
$$F(\mathbf{y}) \ge \sup_{\varphi \in \mathcal{F}_m} |\varphi(\mathbf{y})|, \quad \text{for } \mathbf{y} \in \mathcal{Y}^m.$$
For a kernel function $K(\cdot)$, we define the point-wise measurable class of functions, for $1 \le m \le n$,
$$\mathcal{K}_m := \left\{(x_1, \ldots, x_m) \mapsto \prod_{i=1}^m K\!\left(\frac{d(x_i, \cdot)}{h_n}\right), \; 0 < h_n < 1 \text{ and } (x_1, \ldots, x_m) \in \mathcal{H}^m\right\}.$$
We use the notation
$$\psi(\cdot, \cdot) \in \mathcal{F}_m \mathcal{K}_m := \left\{\varphi_1(\cdot)\varphi_2(\cdot) : \varphi_1 \in \mathcal{F}_m, \; \varphi_2 \in \mathcal{K}_m\right\},$$
and
$$\psi(\cdot, \cdot) \in \mathcal{F}_1 \mathcal{K}_1 := \mathcal{F}\mathcal{K} = \left\{\varphi_1(\cdot)\varphi_2(\cdot) : \varphi_1 \in \mathcal{F}_1, \; \varphi_2 \in \mathcal{K}_1\right\}.$$

2.6.1. Small Ball Probability

In the absence of a universal reference measure, such as the Lebesgue measure, the density function of a functional variable does not exist, which is one of the technical challenges in infinite-dimensional spaces. To circumvent this obstacle, we employ the concept of “small-ball probability”. The function $\phi_x(\cdot)$ controls the concentration of the probability measure of the functional variable on a small ball and is defined, for a fixed $x \in \mathcal{H}$ and all $r > 0$, by:
$$\mathbb{P}\left(X \in B(x, r)\right) =: \phi_x(r) > 0,$$
where the space $\mathcal{H}$ is equipped with the semi-metric $d(\cdot, \cdot)$, and
$$B(x, r) = \{z \in \mathcal{H} : d(z, x) \le r\}$$
is a ball in $\mathcal{H}$ with center $x$ and radius $r$.
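As an illustration of how quickly $\phi_x(r)$ can decay in an infinite-dimensional setting, the following Monte Carlo sketch estimates the small-ball probability of a standard Brownian motion on $[0, 1]$ around $x = 0$, with the sup-norm playing the role of the semi-metric; all modelling choices are illustrative.

```python
# Monte Carlo estimate of the small-ball probability phi_x(r) for a
# standard Brownian motion on [0,1], x = 0, sup-norm semi-metric.
# All modelling choices are illustrative.
import numpy as np

rng = np.random.default_rng(2)
m, reps = 200, 20000                     # grid size and replications
dW = rng.standard_normal((reps, m)) * np.sqrt(1.0 / m)
W = np.cumsum(dW, axis=1)                # Brownian paths on a grid

def phi_hat(r):
    # empirical P(d(X, 0) <= r) with d the sup-norm
    return np.mean(np.max(np.abs(W), axis=1) <= r)

for r in (0.2, 0.4, 0.8):
    print(r, phi_hat(r))                 # decays rapidly as r -> 0
```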

2.6.2. VC-Type Classes of Functions

The asymptotic analysis of functional data is related to concentration properties expressed in terms of the small-ball probability concept. When considering a process indexed by a class of functions, one must also account for other topological concepts, including metric entropy and VC-subgraph classes (“VC” for Vapnik and Červonenkis).
Definition 2.
Let $S_{\mathcal{E}}$ denote a subset of a semi-metric space $\mathcal{E}$, and let $N_{\varepsilon}$ be a positive integer. A finite set of points $\{e_1, \ldots, e_{N_{\varepsilon}}\} \subset \mathcal{E}$ is called, for a given $\varepsilon > 0$, an $\varepsilon$-net of $S_{\mathcal{E}}$ if:
$$S_{\mathcal{E}} \subset \bigcup_{j=1}^{N_{\varepsilon}} B(e_j, \varepsilon).$$
If $N_{\varepsilon}(S_{\mathcal{E}})$ denotes the cardinality of the smallest $\varepsilon$-net (the minimal number of open balls of radius $\varepsilon$) needed to cover $S_{\mathcal{E}}$, then Kolmogorov's entropy (metric entropy) of the set $S_{\mathcal{E}}$ is the quantity
$$\psi_{S_{\mathcal{E}}}(\varepsilon) := \log N_{\varepsilon}(S_{\mathcal{E}}).$$
As its name suggests, Kolmogorov invented this idea of metric entropy (cf. Ref. [104]), which was then explored for different metric spaces. This concept was utilized by [105] to provide sufficient conditions for the continuity of Gaussian processes, and it served as the foundation for remarkable extensions of Donsker's theorem on the weak convergence of empirical processes. If $B_{\mathcal{H}}$ and $S_{\mathcal{H}}$ represent two subsets of the space $\mathcal{H}$ with Kolmogorov entropies (for the radius $\varepsilon$) $\psi_{B_{\mathcal{H}}}(\varepsilon)$ and $\psi_{S_{\mathcal{H}}}(\varepsilon)$, respectively, then the Kolmogorov entropy of the subset $B_{\mathcal{H}} \times S_{\mathcal{H}}$ of the semi-metric space $\mathcal{H}^2$ is given by:
$$\psi_{B_{\mathcal{H}} \times S_{\mathcal{H}}}(\varepsilon) = \psi_{B_{\mathcal{H}}}(\varepsilon) + \psi_{S_{\mathcal{H}}}(\varepsilon).$$
Hence, $m\,\psi_{S_{\mathcal{H}}}(\varepsilon)$ is the Kolmogorov entropy of the subset $S_{\mathcal{H}}^m$ of the semi-metric space $\mathcal{H}^m$. We denote by $d$ the semi-metric on $\mathcal{H}$; the semi-metric $d_{\mathcal{H}^m}$ is then defined on $\mathcal{H}^m$ by:
$$d_{\mathcal{H}^m}(\mathbf{x}, \mathbf{y}) := \frac{1}{m} d(x_1, y_1) + \cdots + \frac{1}{m} d(x_m, y_m)$$
for
$$\mathbf{x} = (x_1, \ldots, x_m), \; \mathbf{y} = (y_1, \ldots, y_m) \in \mathcal{H}^m.$$
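A simple way to get a numerical handle on Kolmogorov's entropy is to build an $\varepsilon$-net greedily; the following sketch does this for a finite point cloud in the plane, an illustrative stand-in for $S_{\mathcal{E}}$ (the greedy construction only upper bounds the minimal $N_{\varepsilon}$).

```python
# Greedy construction of an eps-net for a finite point cloud; its size
# upper bounds N_eps, hence psi(eps) = log N_eps of Definition 2.
# The point cloud and the Euclidean semi-metric are illustrative.
import numpy as np

rng = np.random.default_rng(3)
S = rng.uniform(0, 1, size=(1000, 2))   # stand-in for the set S_E

def greedy_net(points, eps):
    centers = []
    uncovered = np.ones(len(points), dtype=bool)
    while uncovered.any():
        c = points[np.argmax(uncovered)]           # first uncovered point
        centers.append(c)
        dist = np.linalg.norm(points - c, axis=1)
        uncovered &= dist > eps                    # drop newly covered points
    return np.array(centers)

for eps in (0.2, 0.1, 0.05):
    N = len(greedy_net(S, eps))
    print(eps, N, np.log(N))    # entropy of the unit square grows like 2 log(1/eps)
```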
In this type of study, the semi-metric plays a crucial role. The reader can discover helpful discussions on how to select the semi-metric in [55] (see Chapter 3 and Chapter 13). We must additionally consider another topological term: namely, VC-subgraph classes (“VC” for Vapnik and Červonenkis).
Definition 3.
We call a class $\mathscr{C}$ of subsets of a set $C$ a VC-class if there exists a polynomial $P(\cdot)$ such that, for every set of $N$ points in $C$, the class $\mathscr{C}$ picks out at most $P(N)$ distinct subsets.
Definition 4.
A class of functions $\mathcal{F}$ is called a VC-subgraph class if the graphs of the functions in $\mathcal{F}$ form a VC-class of sets; that is, if we define the subgraph of a real-valued function $f$ on $S$ as the subset $G_f$ of $S \times \mathbb{R}$ given by
$$G_f = \{(s, t) : 0 \le t \le f(s) \text{ or } f(s) \le t \le 0\},$$
then the class $\{G_f : f \in \mathcal{F}\}$ is a VC-class of sets on $S \times \mathbb{R}$. Informally, VC-classes of functions are identified by their polynomial covering number (the minimal number of functions required to cover the entire class).
A VC-class of functions $\mathcal{F}$ with envelope function $F$ has the following entropy property: for a given $1 \le q < \infty$, there are constants $a$ and $b$ such that
$$N(\epsilon, \mathcal{F}, \|\cdot\|_{L_q(Q)}) \le a \left(\frac{(Q F^q)^{1/q}}{\epsilon}\right)^b$$
for any $\epsilon > 0$ and each probability measure $Q$ such that $Q F^q < \infty$. For instance, the references ([26], Lemma 22), ([106], §4.7), ([107], Theorem 2.6.7), and ([108], §9.1) provide a number of sufficient conditions under which (13) holds; refer to ([109], §3.2) for further discussion.

2.7. Conditions and Comments

Assumption 1.
(Model and distribution assumptions)
(M1) 
The $\mathcal{H}$-valued stochastic process $\{X_{s, A_n} : s \in R_n\}$ is locally stationary. Hence, for each space point $\mathbf{u} \in [0, 1]^d$, a strictly stationary process $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ exists such that, for $\|\cdot\|$ an arbitrary norm on $\mathbb{R}^d$,
$$d\left(X_{s, A_n}, X_{\mathbf{u}}(s)\right) \le \left(\left\|\frac{s}{A_n} - \mathbf{u}\right\|_2 + \frac{1}{A_n}\right) U_{s, A_n}(\mathbf{u}) \quad a.s.,$$
with $\mathbb{E}[(U_{s, A_n}(\mathbf{u}))^{\rho}] \le C$ for some $\rho \ge 1$ and $C < \infty$ that is independent of $\mathbf{u}$, $s$, and $A_n$.
(M2) 
For $i = 1, \ldots, m$, let $B(x_i, h) = \{y \in \mathcal{H} : d(x_i, y) \le h\}$ be a ball centered at $x_i \in \mathcal{H}$ with radius $h$, and let $c_d < C_d$ be positive constants. For all $\mathbf{u} \in [0, 1]^d$,
$$\phi_{\mathbf{x}}(h_n) := \mathbb{P}\left(X_{\mathbf{u}}(s_1) \in B(x_1, h_n), \ldots, X_{\mathbf{u}}(s_m) \in B(x_m, h_n)\right) = F_{\mathbf{u}}(h_n, x_1, \ldots, x_m)$$
satisfies:
$$0 < c_d\, \phi(h) f_1(\mathbf{x}) \le \phi_{\mathbf{x}}(h) \le C_d\, \phi(h) f_1(\mathbf{x}),$$
where $\phi(h) \to 0$ as $h \to 0$, and $f_1(\mathbf{x})$ is a non-negative functional in $\mathbf{x} \in \mathcal{H}^m$. Moreover, there exist constants $C_{\phi} > 0$ and $\varepsilon_0 > 0$ such that for any $0 < \varepsilon < \varepsilon_0$,
$$\int_0^{\varepsilon} \phi(u)\, du > C_{\phi}\, \varepsilon\, \phi(\varepsilon).$$
(M3) 
Let $\mathbf{X}_{\mathbf{s}, A_n} = (X_{s_1, A_n}, \ldots, X_{s_m, A_n})$ and $\mathbf{X}_{\mathbf{v}, A_n} = (X_{v_1, A_n}, \ldots, X_{v_m, A_n})$, and let $B(\mathbf{x}, h) = \prod_{i=1}^m B(x_i, h)$. Assume
$$\sup_{\mathbf{s}, \mathbf{x}, A_n} \sup_{\mathbf{s} \ne \mathbf{v}} \mathbb{P}\left((\mathbf{X}_{\mathbf{s}, A_n}, \mathbf{X}_{\mathbf{v}, A_n}) \in B(\mathbf{x}, h) \times B(\mathbf{x}, h)\right) \le \psi(h) f_2(\mathbf{x}),$$
where $\psi(h) \to 0$ as $h \to 0$, and $f_2(\mathbf{x})$ is a non-negative functional in $\mathbf{x} \in \mathcal{H}^m$. We assume that the ratio $\psi(h)/\phi^2(h)$ is bounded.
(M4) 
$\sigma : [0, 1]^d \times \mathcal{H}^m \to \mathbb{R}$ is bounded from above by some constant $C_{\sigma} < \infty$ and from below by some constant $c_{\sigma} > 0$; that is, $0 < c_{\sigma} \le \sigma(\mathbf{u}, \mathbf{x}) \le C_{\sigma} < \infty$ for all $\mathbf{u}$ and $\mathbf{x}$.
(M5) 
$\sigma(\cdot, \cdot)$ is Lipschitz continuous with respect to $\mathbf{u}$.
(M6) 
$\sup_{\mathbf{u} \in [0, 1]^m} \sup_{z : d(x, z) \le h} |\sigma(\mathbf{u}, x) - \sigma(\mathbf{u}, z)| = o(1)$ as $h \to 0$.
(M7) 
$r^{(m)}(\mathbf{u}, \mathbf{x})$ is Lipschitz; that is, for all $\mathbf{u}_1, \mathbf{u}_2 \in [0, 1]^{dm}$ it satisfies
$$\left|r^{(m)}(\mathbf{u}_1, \mathbf{x}) - r^{(m)}(\mathbf{u}_2, \mathbf{z})\right| \le c_m \left(d_{\mathcal{H}^m}(\mathbf{x}, \mathbf{z})^{\alpha} + \|\mathbf{u}_1 - \mathbf{u}_2\|^{\alpha}\right)$$
for some $c_m > 0$ and $\alpha > 0$, where the semi-metric $d_{\mathcal{H}^m}(\mathbf{x}, \mathbf{z})$ is defined on $\mathcal{H}^m$ by:
$$d_{\mathcal{H}^m}(\mathbf{x}, \mathbf{z}) := \frac{1}{m} d(x_1, z_1) + \cdots + \frac{1}{m} d(x_m, z_m)$$
for $\mathbf{x} = (x_1, \ldots, x_m), \; \mathbf{z} = (z_1, \ldots, z_m) \in \mathcal{H}^m$; moreover, $r^{(m)}$ is twice continuously partially differentiable with first derivatives
$$\partial_{u_i} r^{(m)}(\mathbf{u}, \mathbf{x}) = \frac{\partial}{\partial u_i} r^{(m)}(\mathbf{u}, \mathbf{x})$$
and second derivatives
$$\partial^2_{u_i u_j} r^{(m)}(\mathbf{u}, \mathbf{x}) = \frac{\partial^2}{\partial u_i \partial u_j} r^{(m)}(\mathbf{u}, \mathbf{x}).$$
Assumption 2.
(Kernel assumptions)
(KB1) 
The kernel $K_2(\cdot)$ is non-negative, bounded by $\tilde{\kappa}$, and has support in $[0, 1]$, with $K_2(0) > 0$ and $K_2(1) = 0$. Moreover, $K_2'(v) = dK_2(v)/dv$ exists on $[0, 1]$ and satisfies $C_1 \le K_2'(v) \le C_2$ for two real constants $-\infty < C_1 < C_2 < 0$.
(KB2) 
The kernel $\bar{K} : \mathbb{R}^d \to [0, \infty)$ is bounded and has compact support $[-C, C]^d$. Moreover,
$$\int_{[-C, C]^d} \bar{K}(\mathbf{x})\, d\mathbf{x} = 1, \qquad \int_{[-C, C]^d} x^{\alpha} \bar{K}(\mathbf{x})\, d\mathbf{x} = 0 \quad \text{for any } \alpha \in \mathbb{Z}_+^d \text{ with } |\alpha| = 1,$$
and $|\bar{K}(\mathbf{u}) - \bar{K}(\mathbf{v})| \le C \|\mathbf{u} - \mathbf{v}\|$.
(KB3) 
The bandwidth $h$ converges to zero at least at a polynomial rate; that is, there exists a small $\xi_1 > 0$ such that $h \le C n^{-\xi_1}$ for some constant $0 < C < \infty$.
Assumption 3.
(Sampling design assumptions)
(S1) 
For any $\alpha \in \mathbb{N}^d$ with $|\alpha| = 1, 2$, $\partial^{\alpha} f_S(s)$ exists and is continuous on $(0, 1)^d$.
(S2) 
$C_0 \le n/A_n^d \le C_1 n^{\eta_1}$ for some $C_0, C_1 > 0$ and small $\eta_1 \in (0, 1)$.
Assumption 4.
(Block decomposition assumptions)
(B1) 
Let $\{A_{1,n}\}_{n \ge 1}$ and $\{A_{2,n}\}_{n \ge 1}$ be two sequences of positive numbers such that $A_{1,n} \to \infty$, $A_{2,n} \to \infty$, $A_{2,n} = o(A_{1,n})$, and $A_{1,n} = o(A_n)$, with $\frac{A_{1,n}}{A_n} + \frac{A_{2,n}}{A_{1,n}} \le C_0^{-1} n^{-\eta_0}$ for some $C_0 > 0$ and $\eta_0 > 0$.
(B2) 
We have $\lim_{n \to \infty} n/A_n^d = \kappa \in (0, \infty]$ with $A_n \lesssim n^{\bar{\kappa}}$ for some $\bar{\kappa} > 0$.
(B3) 
We have
$$\left(\frac{1}{n h^{md} \phi(h)}\right)^{1/3} \left(\frac{A_{1,n}}{A_n}\right)^{2d/3} \left(\frac{A_{2,n}}{A_{1,n}}\right)^{2/3} g_1^{1/3}\left(A_{1,n}^d\right) \sum_{k=1}^{\lceil A_n / A_{1,n} \rceil} k^{d-1} \beta_1^{1/3}\left(k A_{1,n} + A_{2,n}\right) \to 0.$$
(B4) 
We have $\lim_{n \to \infty} \frac{A_n^d}{A_{1,n}^d}\, \beta\!\left(A_{2,n}; A_n^d\right) = 0$.
Assumption 5.
(Regularity conditions) Let $\alpha_n = \sqrt{\log n / (n h^{md} \phi(h))}$. As $n \to \infty$,
(R1) 
$\left(h^{md} \phi(h)\right)^{-1} \alpha_n^{md}\, \frac{A_n^d}{A_{1,n}^d}\, \beta(A_{2,n}; A_n^d) \to 0$ and $\frac{A_{1,n}^d}{A_n^d} \sqrt{\frac{n h^{md} \phi(h)}{\log n}} \to 0$,
(R2) 
$n^{1/2} h^{md/2} \phi(h)^{1/2} / (A_{1,n}^d\, n^{1/\zeta}) \ge C_0\, n^{\eta}$ for some $0 < C_0 < \infty$, $\eta > 0$, and $\zeta > 2$,
(R3) 
$A_n^{-dp} \ll \phi(h)$, where $p$ is defined in the sequel.
Assumption 6.
(E1) 
For $W_{\mathbf{s_i}, A_n} = \sum_{j=1}^m \epsilon_{s_{i_j}, A_n}$, it holds that $\sup_{\mathbf{x} \in \mathcal{H}^m} \mathbb{E}|W_{\mathbf{s}, A_n}|^{\zeta} \le C$ and
$$\sup_{\mathbf{x} \in \mathcal{H}^m} \mathbb{E}\left[|W_{\mathbf{s}, A_n}|^{\zeta} \mid \mathbf{X}_{\mathbf{i}, n} = \mathbf{x}\right] \le C$$
for $\zeta > 2$ and $C < \infty$.
(E2) 
The $\beta$-mixing coefficients of the array $\{X_{s, A_n}, W_{s, A_n}\}$ satisfy $\beta(a; b) \le \beta_1(a)\, g_1(b)$ with $\beta_1(a) \to 0$ as $a \to \infty$.
Assumption 7.
(Class of functions assumptions)
The classes of functions $\mathcal{K}_m$ and $\mathcal{F}_m$ are such that:
(C1) 
The class of functions $\mathcal{F}_m$ is bounded, and its envelope function satisfies, for some $0 < M < \infty$:
$$F(\mathbf{y}) \le M, \quad \mathbf{y} \in \mathcal{Y}^m.$$
(C2) 
The class of functions $\mathcal{F}_m$ is unbounded, and its envelope function satisfies, for some $p > 2$:
$$\theta_p := \sup_{\mathbf{t} \in S_{\mathcal{H}^m}} \mathbb{E}\left[F^p(\mathbf{Y}) \mid \mathbf{X} = \mathbf{t}\right] < \infty.$$
(C3) 
The metric entropy of the class $\mathcal{F}_m \mathcal{K}_m$ satisfies, for some $2 < p < \infty$:
$$\int_0^{\infty} \left(\log N(u, \mathcal{F}_m \mathcal{K}_m, L_1(\mathbb{P}^m))\right)^{\frac{1}{2}} du < \infty, \quad \int_0^{\infty} \left(\log N(u, \mathcal{F}_m \mathcal{K}_m, L_2(\mathbb{P}^m))\right)^{\frac{1}{2}} du < \infty, \quad \int_0^{\infty} \left(\log N(u, \mathcal{F}_m \mathcal{K}_m, L_p(\mathbb{P}^m))\right)^{\frac{1}{2}} du < \infty.$$

Comments

When it comes to functional data, traditional statistical methods are entirely ineffective. In our non-parametric functional regression model, we took on the complex theoretical challenge of establishing functional central limit theorems for the conditional U-process under functional absolutely regular data in a two-fold framework. The imposed assumptions reflect properties of the infinite-dimensional setting: the topological structure on $\mathcal{H}^m$, the probability distribution of $\mathbf{X}$, and the measurability concept for the classes $\mathcal{F}_m$ and $\mathcal{K}_m$; consequently, a discussion of the aforementioned assumptions is in order. The majority of these assumptions were motivated by [37,55,57,84,85,86,110]. Assumption 1 begins the formalization of the local stationarity of $X_{s, A_n}$ and continues by placing certain restrictions on the distributional behavior of the variables, which allows us to formalize the property more precisely. Condition (M1) refers to the idea of a locally stationary time series, and various random fields fulfill this requirement; Ref. [85] gave some examples and, in particular, proved that this condition is satisfied for locally stationary versions of Lévy-driven moving average random fields. Condition (M2) was adopted by [84], who in turn was inspired by [57] on non-parametric density estimation for functional observations. Ref. [84] clarifies that if $\mathcal{H}^m = \mathbb{R}^m$, then the condition overlaps with the fundamental axioms of probability calculus; furthermore, if $\mathcal{H}^m$ is an infinite-dimensional Hilbert space, then $\phi(h_n)$ can decay toward 0 at an exponential speed as $n \to \infty$. Equation (15) controls the behavior of the small ball probability around zero and is a quite usual condition on the small ball probability. It shows, roughly, that the small ball probability can be written approximately as the product of two independent functions $\phi^m(\cdot)$ and $f_1(\cdot)$; for instance, for $m = 1$, refer to [111] for diffusion processes, Ref. [112] for a Gaussian measure, and Ref. [113] for a general Gaussian process; Ref. [84] employed these assumptions for strongly mixing processes. For example, the function $\phi(\cdot)$ can be expressed as $\phi(\epsilon) = \epsilon^{\delta} \exp(-C/\epsilon^a)$ with $\delta \ge 0$ and $a \ge 0$; this corresponds to the Ornstein–Uhlenbeck and general diffusion processes (for such processes, $a = 2$ and $\delta = 0$) and to fractal processes (for such processes, $\delta > 0$ and $a = 0$). We refer to the paper [114] for other examples. Conditions (M4), (M5), (M6) and Assumption 2 represent the regularity conditions, and they are the umbrella that covers the limiting theorems of such a process. Due to the sampling design strategy employed in Section 2.4, a non-uniform density is possible across the sampling region, whereby the number of sampling sites can grow at a different rate than the region's volume $O(A_n^d)$. This sampling design allows the pure increasing domain case $\lim_{n \to \infty} n/A_n^d = \kappa \in (0, \infty)$ and the mixed increasing domain case $\lim_{n \to \infty} n/A_n^d = \infty$. Assumption 3 is imposed to address this sampling design and the infill sampling criteria in the stochastic design case, which can also be seen in [94,115].
In addition to allowing a non-uniform sampling density, an approach for irregularly spaced sampling sites based on a homogeneous Poisson point process was discussed in ([10], Chapter 8), where the sampling sites must be uniformly distributed over the sampling region. This makes the sampling design used in this work more flexible than the homogeneous Poisson point process and more useful for practical applications. Condition (B1) in Assumption 4 is related to the blocking technique used to decompose the sampling region $R_n$ into big and small blocks. The sequences $A_{1,n}$ and $A_{2,n}$ correspond to the large-block–small-block argument commonly used in proving CLTs for sums of mixing random variables; see [94]. Precisely, $A_{1,n}$ corresponds to the side length of the large blocks, while $A_{2,n}$ corresponds to the side length of the small blocks. Furthermore, Assumption 6 helps in deriving the weak convergence of the conditional U-statistic $\hat{\psi}$ defined in Section 3. Condition (C1) says that we are dealing with bounded functions; since we are also interested in establishing the functional central limit theorem for conditional U-processes indexed by an unbounded class of functions, this condition will then be replaced by (C2). Each of these generic assumptions is sufficiently weak in connection with the many objects described in our preliminary results. They reflect the four key axes of this work: the topological structure of the functional variables, the probability measure on the functional space, the notion of measurability for the class of functions, and the uniformity governed by the entropy characteristics.
Remark 3.
Note that Assumption (C2) in Assumption 7 might be substituted by more general hypotheses on the moments of $\mathbf{Y}$, as in [109]. That is:
(C4)
We denote by $\{M(x) : x \ge 0\}$ a non-negative continuous function, increasing on $[0, \infty)$, and such that, for some $s > 2$, ultimately as $x \to \infty$,
$$x^{-s} M(x) \searrow; \qquad x^{-1} M(x) \nearrow.$$
For each $t \ge M(0)$, we define $M^{\mathrm{inv}}(t) \ge 0$ by $M(M^{\mathrm{inv}}(t)) = t$. We assume further that:
$$\mathbb{E}\left[M(F(\mathbf{Y}))\right] < \infty.$$
The following choices of $M(\cdot)$ are of particular interest:
(i) 
$M(x) = x^p$ for some $p > 2$;
(ii) 
$M(x) = \exp(sx)$ for some $s > 0$.

3. Uniform Convergence Rates for Kernel Estimators

Before describing the asymptotic behavior of our estimator (9), we generalize the study to a U-statistic estimator defined by:
$$\hat{\psi}(\mathbf{u}, \mathbf{x}) = \frac{(n - m)!}{n!\, h^{md} \phi^m(h)} \sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \bar{K}\!\left(\frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right) W_{\mathbf{s_i}, A_n},$$
where $W_{\mathbf{s_i}, A_n}$ is an array of one-dimensional random variables. In this study, we use the results with $W_{\mathbf{s_i}, A_n} = 1$ and $W_{\mathbf{s_i}, A_n} = \sum_{j=1}^m \epsilon_{s_{i_j}, A_n}$.

3.1. Hoeffding’s Decomposition

Note that $\hat{\psi}(\mathbf{u}, \mathbf{x})$ is a standard U-statistic with a kernel depending on $n$. We define
$$\xi_j := \frac{1}{h^d} \bar{K}\!\left(\frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n}\right), \qquad H(Z_1, \ldots, Z_m) := \prod_{j=1}^m \frac{1}{\phi(h)} K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right) W_{\mathbf{s_i}, A_n};$$
thus, the U-statistic in (19) can be viewed as a weighted U-statistic of degree $m$:
$$\hat{\psi}(\mathbf{u}, \mathbf{x}) = \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I_n^m} \xi_{i_1} \cdots \xi_{i_m} H(Z_{i_1}, \ldots, Z_{i_m}).$$
We can write Hoeffding's decomposition in this case. Since we do not assume symmetry for $W_{\mathbf{s_i}, A_n}$ or $H$, we must define:
• The expectation of $H(Z_{i_1}, \ldots, Z_{i_m})$:
$$\theta(\mathbf{i}) := \mathbb{E}\left[H(Z_{i_1}, \ldots, Z_{i_m})\right] = \int W_{\mathbf{s_i}, A_n} \prod_{j=1}^m \frac{1}{\phi(h)} K_2\!\left(\frac{d(x_j, \nu_{s_{i_j}, A_n})}{h_n}\right) d\mathbb{P}_{\mathbf{i}}(\mathbf{z_i}).$$
• For each $\ell \in \{1, \ldots, m\}$, the position of the argument, construct the function $\pi_{\ell}$ such that:
$$\pi_{\ell}(z; z_1, \ldots, z_{m-1}) := (z_1, \ldots, z_{\ell - 1}, z, z_{\ell}, \ldots, z_{m-1}).$$
• Define:
$$H^{(\ell)}\left(z; z_1, \ldots, z_{m-1}\right) := H\left(\pi_{\ell}\left(z; z_1, \ldots, z_{m-1}\right)\right), \qquad \theta^{(\ell)}\left(i; i_1, i_2, \ldots, i_{m-1}\right) := \theta\left(\pi_{\ell}\left(i; i_1, i_2, \ldots, i_{m-1}\right)\right).$$
Hence, the first-order expansion of $H(\cdot)$ is given by:
$$\tilde{H}^{(\ell)}(z) := \mathbb{E}\left[H^{(\ell)}\left(z, Z_1, \ldots, Z_{m-1}\right)\right] = \frac{1}{\phi(h)} K_2\!\left(\frac{d(x_{\ell}, X_{s_{\ell}, A_n})}{h}\right) W_{s_{\ell}, A_n} \int W_{\mathbf{s}_{(1, \ldots, \ell-1, \ell, \ldots, m-1)}, A_n} \prod_{\substack{j=1 \\ j \ne \ell}}^{m-1} \frac{1}{\phi(h)} K_2\!\left(\frac{d(x_j, \nu_{s_j, A_n})}{h}\right) \mathbb{P}\left(d\nu_1, \ldots, d\nu_{\ell - 1}, d\nu_{\ell}, \ldots, d\nu_{m-1}\right),$$
with $\mathbb{P}$ the underlying probability measure, and define
$$f_{i, i_1, \ldots, i_{m-1}} := \sum_{\ell = 1}^m \xi_{i_1} \cdots \xi_{i_{\ell - 1}}\, \xi_i\, \xi_{i_{\ell}} \cdots \xi_{i_{m-1}} \left(\tilde{H}^{(\ell)}(z) - \theta^{(\ell)}\left(i; i_1, \ldots, i_{m-1}\right)\right).$$
Then, the first-order projection can be defined as:
$$\hat{H}_{1,i}(\mathbf{u}, \mathbf{x}) := \frac{(n - m)!}{(n - 1)!} \sum_{I_{n-1}^{m-1}(i)} f_{i, i_1, \ldots, i_{m-1}},$$
where
$$I_{n-1}^{m-1}(i) := \left\{1 \le i_1 < \cdots < i_{m-1} \le n \text{ and } i_j \ne i \text{ for all } j \in \{1, \ldots, m-1\}\right\}.$$
For the remainder terms, we write $\mathbf{i}_{-\ell} := (i_1, \ldots, i_{\ell - 1}, i_{\ell + 1}, \ldots, i_m)$ and, for $\ell \in \{1, \ldots, m\}$, let
$$H_{2,\mathbf{i}}(\mathbf{z}) := H(\mathbf{z}) - \sum_{\ell = 1}^m \tilde{H}^{(\ell)}_{\mathbf{i}_{-\ell}}(z_{\ell}) + (m - 1)\, \theta(\mathbf{i}),$$
where
$$\tilde{H}^{(\ell)}_{\mathbf{i}_{-\ell}}(z) = \mathbb{E}\left[H\left(Z_1, \ldots, Z_{\ell - 1}, z, Z_{\ell + 1}, \ldots, Z_{m}\right)\right]$$
as defined in (22); this projection leads us to the following remainder term:
$$\hat{\psi}_2(\mathbf{u}, \mathbf{x}) := \frac{(n - m)!}{n!} \sum_{\mathbf{i} \in I_n^m} \xi_{i_1} \cdots \xi_{i_m} H_{2, \mathbf{i}}(\mathbf{z}).$$
Finally, using Equations (24) and (26), and under the conditions
$$\mathbb{E}\left[\hat{H}_{1,i}(\mathbf{u}, \mathbf{X})\right] = 0, \qquad \mathbb{E}\left[H_{2,\mathbf{i}}(\mathbf{Z}) \mid Z_k\right] = 0 \quad \text{a.s.},$$
we obtain the Hoeffding [16] decomposition:
$$\hat{\psi}(\mathbf{u}, \mathbf{x}) - \mathbb{E}\left[\hat{\psi}(\mathbf{u}, \mathbf{x})\right] = \frac{1}{n} \sum_{i=1}^n \hat{H}_{1,i}(\mathbf{u}, \mathbf{x}) + \hat{\psi}_2(\mathbf{u}, \mathbf{x}) =: \hat{\psi}_1(\mathbf{u}, \mathbf{x}) + \hat{\psi}_2(\mathbf{u}, \mathbf{x}).$$
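The decomposition can be checked numerically in the simplest setting. The sketch below verifies, for a plain (unweighted) i.i.d. degree-2 U-statistic with kernel $H(z_1, z_2) = z_1 z_2$, that $\hat{\psi} - \mathbb{E}\hat{\psi}$ splits into the first-order (Hájek) projection plus a degenerate remainder of smaller order; the spatial weights $\xi_i$ of our setting are deliberately dropped, and all choices are illustrative.

```python
# Numerical check of the Hoeffding decomposition for an unweighted
# i.i.d. degree-2 U-statistic with kernel H(z1, z2) = z1 * z2; the
# weights xi_i and the spatial structure above are dropped on purpose.
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)
n = 300
Z = rng.standard_normal(n) + 1.0        # i.i.d. sample with mean mu = 1

H = lambda z1, z2: z1 * z2
theta = 1.0                              # E H(Z1, Z2) = mu^2

U = np.mean([H(Z[i], Z[j]) for i, j in permutations(range(n), 2)])

# first-order projection: H_tilde(z) = E H(z, Z') - theta = z * mu - theta
lin = 2.0 * np.mean(Z * 1.0 - theta)     # Hajek (linear) term, O_P(n^{-1/2})
rem = U - theta - lin                    # degenerate remainder, O_P(n^{-1})
print(U - theta, lin, rem)
```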

3.2. Strong Uniform Convergence Rate

We start by giving the following general result concerning the rate of convergence of the U-process presented in (19).
Proposition 1.
Let $\mathcal{F}_m \mathcal{K}_m$ be a measurable VC-subgraph class of functions satisfying Assumption 7, and assume that Assumptions 2 and 3, Condition (B1) in Assumption 4, and Assumptions 5 and 6 are also satisfied. Then, the following result holds:
$$\sup_{\mathcal{F}_m \mathcal{K}_m} \sup_{\mathbf{x} \in \mathcal{H}^m} \sup_{\mathbf{u} \in [0, 1]^{dm}} \left|\hat{\psi}(\mathbf{u}, \mathbf{x}) - \mathbb{E}[\hat{\psi}(\mathbf{u}, \mathbf{x})]\right| = O_{\mathbb{P}_{\cdot \mid S}}\left(\sqrt{\frac{\log n}{n h^{md} \phi^m(h)}}\right), \quad \mathbb{P}_S\text{-a.s.}$$
Next, the uniform rate of convergence of the estimator (9) of the mean function r ( m ) in the model (4) will be given, using the results of the last proposition.
Theorem 1.
Let $\mathcal{F}_m \mathcal{K}_m$ be a measurable VC-subgraph class of functions satisfying Assumption 7. Let $I_h = [C_1 h, 1 - C_1 h]^{dm}$ and let $S_c$ be a compact subset of $\mathcal{H}^m$. Suppose that
$$\inf_{\mathbf{u} \in [0, 1]^d} f_S(\mathbf{u}) > 0.$$
Then, under Assumptions 1–3, Condition (B1) in Assumption 4, and Assumptions 5 and 6 (with $W_{\mathbf{s_i}, A_n} = 1$ and $W_{\mathbf{s_i}, A_n} = \sum_{j=1}^m \epsilon_{s_{i_j}, A_n}$), the following result holds $\mathbb{P}_S$-almost surely:
$$\sup_{\mathcal{F}_m \mathcal{K}_m} \sup_{\mathbf{x} \in S_c} \sup_{\mathbf{u} \in I_h} \left|\hat{r}_n^{(m)}(\mathbf{x}, \mathbf{u}; h_n) - r^{(m)}(\mathbf{x}, \mathbf{u})\right| = O_{\mathbb{P}_{\cdot \mid S}}\left(\sqrt{\frac{\log n}{n h^{md} \phi^m(h)}} + h^{2\alpha} + \frac{1}{A_n^{dp}\, \phi(h)}\right),$$
where $p = \min\{1, \rho\}$ and $\rho > 0$ is given in Definition 1.
It is worth noting here that the approximation of the functional random field $X_{s, A_n}$ by the stationary functional random field $X_{\mathbf{u}}(s)$ produces the error term $A_n^{-dp}\, \phi^{-1}(h)$.

4. Weak Convergence for Kernel Estimators

In this section, we are interested in studying the weak convergence of the conditional U-process defined by Equation (9) under absolutely regular observations. The following theorem is the main result of this work concerning the weak convergence of the functional locally stationary random field estimator. Let us define, for $\varphi_1, \varphi_2 \in \mathcal{F}_m$,
$$\sigma(\varphi_1, \varphi_2) = \mathbb{E}_{\cdot \mid S}\left[\sqrt{n h^{md} \phi^m(h)}\left(\hat{r}_n^{(m)}(\varphi_1, \mathbf{x}, \mathbf{u}; h_n) - r^{(m)}(\varphi_1, \mathbf{x}, \mathbf{u})\right) \times \sqrt{n h^{md} \phi^m(h)}\left(\hat{r}_n^{(m)}(\varphi_2, \mathbf{x}, \mathbf{u}; h_n) - r^{(m)}(\varphi_2, \mathbf{x}, \mathbf{u})\right)\right].$$
Theorem 2.
Let $\mathcal{F}_m \mathcal{K}_m$ be a measurable VC-subgraph class of functions satisfying Assumption 7. Suppose that $f_S(\mathbf{u}) > 0$ and $\epsilon_{s_{i_j}, A_n} = \sigma(s_{i_j}/A_n, \mathbf{x})\, \epsilon_{i_j}$, where $\sigma(\cdot, \cdot)$ is continuous and $\{\epsilon_i\}_{i=1}^n$ is a sequence of i.i.d. random variables with mean zero and variance 1. Moreover, suppose $n h^{m(d+1)+4} \to c_0$ for a constant $c_0$. If all the assumptions of Theorem 1 hold, in addition to Conditions (B2), (B3), and (B4), then, $\mathbb{P}_S$-almost surely,
$$\sqrt{n h^{md} \phi^m(h)}\left(\hat{r}_n^{(m)}(\varphi, \mathbf{x}, \mathbf{u}; h) - r^{(m)}(\varphi, \mathbf{x}, \mathbf{u}) - B_{\mathbf{u}, \mathbf{x}}\right)$$
converges to a Gaussian process over $\mathcal{F}_m \mathcal{K}_m$, whose sample paths are bounded and uniformly continuous with respect to the $\|\cdot\|_2$-norm, with covariance function given by (29), and where the bias term satisfies $B_{\mathbf{u}, \mathbf{x}} = O_{\mathbb{P}_{\cdot \mid S}}(h^{2\alpha})$.
Remark 4.
Set $A_n^d = O(n^{1 - \bar{\eta}_1})$ for some $\bar{\eta}_1 \in [0, 1)$, $A_{1,n} = O(A_n^{\gamma_{A_1}})$, and $A_{2,n} = O(A_n^{\gamma_{A_2}})$ with $0 < \gamma_{A_2} < \gamma_{A_1} < 1/3$ and $p = \min\{1, \rho\} = 1$. Assume that we can take a sufficiently large $\zeta > 2$ such that $\frac{2}{\zeta} < (1 - \bar{\eta}_1)(1 - 3\gamma_{A_1})$. Then, Assumption 4 is satisfied for $d \ge 1$.
Remark 5.
It is simple to modify the proofs of our results to demonstrate that they still hold when the entropy condition is replaced by the bracketing condition:
$$\int_0^{\infty} \left(\log N_{[\,]}(u, \mathcal{F}\mathcal{K}, L_p(\mathbb{P}^m))\right)^{\frac{1}{2}} du < \infty.$$
Refer to p. 270 of [116] for the definition of $N_{[\,]}(u, \mathcal{F}\mathcal{K}, L_p(\mathbb{P}^m))$.
Remark 6.
There are basically no restrictions on the choice of the kernel function in our setup, apart from some mild conditions. The selection of the bandwidth, however, is more problematic. It is worth noticing that the choice of the bandwidth is crucial to obtaining a good rate of consistency; for example, it has a big influence on the size of the estimator's bias. In general, we are interested in a bandwidth selection that achieves a good balance between the bias and the variance of the considered estimators. It is then more appropriate to let the bandwidth vary according to the criterion applied and to the available data and location, which cannot be achieved using classical methods. The interested reader may refer to [117] for more details and discussion on the subject. It would be of interest to establish uniform-in-bandwidth central limit theorems in our setting; i.e., we would let $h > 0$ vary in such a way that $h'_n \le h \le h''_n$, where $\{h'_n\}_{n \ge 1}$ and $\{h''_n\}_{n \ge 1}$ are two sequences of positive constants such that $0 < h'_n \le h''_n < \infty$ and, for either choice $h = h'_n$ or $h = h''_n$, our conditions are fulfilled. It would be of interest to show that
$$\sup_{h'_n \le h \le h''_n} \sqrt{n h^{md} \phi^m(h)}\left(\hat{r}_n^{(m)}(\varphi, \mathbf{x}, \mathbf{u}; h) - r^{(m)}(\varphi, \mathbf{x}, \mathbf{u}) - B_{\mathbf{u}, \mathbf{x}}\right)$$
converges to a Gaussian process over $\mathcal{F}_m \mathcal{K}_m$.
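As a pointer to the data-driven selection discussed here (and developed in Section 7), the following sketch illustrates leave-one-out cross-validation for the bandwidth of a basic kernel regression estimator; the model and the bandwidth grid are illustrative assumptions, and the full spatial-functional procedure is more involved.

```python
# Leave-one-out cross-validation for the bandwidth of a basic kernel
# regression estimator; model and grid are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(9)
n = 300
X = rng.uniform(-2, 2, n)
Y = np.cos(X) + 0.2 * rng.standard_normal(n)

def K(x):
    return 0.75 * (1 - x**2) * (np.abs(x) <= 1)   # Epanechnikov

def cv_score(h):
    err = 0.0
    for i in range(n):
        w = K((X[i] - X) / h)
        w[i] = 0.0                                 # leave observation i out
        if w.sum() == 0.0:
            return np.inf
        err += (Y[i] - np.dot(w, Y) / w.sum()) ** 2
    return err / n

grid = np.linspace(0.1, 1.0, 10)
print(grid[int(np.argmin([cv_score(h) for h in grid]))])
```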

5. Applications

5.1. Metric Learning

Metric learning aims to adapt the metric to the data and has garnered significant interest in recent years; for an overview of metric learning and its applications, see [118,119]. It is prompted by applications ranging from computer vision to information retrieval and bioinformatics. As an example of the utility of this notion, we describe the metric learning problem for supervised classification as in [119]. Consider dependent copies $(X_{s_1, A_n}, Y_1), \ldots, (X_{s_n, A_n}, Y_n)$ of an $\mathcal{H} \times \mathcal{Y}$-valued random couple $(X, Y)$, where $\mathcal{H}$ is some feature space and $\mathcal{Y} = \{1, \ldots, C\}$, with $C \ge 2$, a finite set of labels. Let $\mathcal{D}$ be a set of distance measures $D : \mathcal{H} \times \mathcal{H} \to \mathbb{R}_+$. The intuitive objective of metric learning in this context is to identify a measure under which points with the same label are close together and those with different labels are far apart. The standard way to define the risk of a metric $D$ is as follows:
$$R(D) = \mathbb{E}\left[\phi\left(\left(1 - D(X, X')\right)\cdot\left(2\, \mathbb{1}\{Y = Y'\} - 1\right)\right)\right],$$
where $(X', Y')$ denotes an independent copy of $(X, Y)$ and $\phi(u)$ is a convex loss function upper bounding the indicator function $\mathbb{1}\{u \le 0\}$: for instance, the hinge loss $\phi(u) = \max(0, 1 - u)$. To estimate $R(D)$, we consider the natural empirical estimator
$$R_n(D) = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \bar{K}\!\left(\frac{\mathbf{u}_i - s_i/A_n}{h_n}\right) \bar{K}\!\left(\frac{\mathbf{u}_j - s_j/A_n}{h_n}\right) \phi\left(\left(1 - D(X_{s_i, A_n}, X_{s_j, A_n})\right)\cdot\left(2\, \mathbb{1}\{Y_i = Y_j\} - 1\right)\right),$$
which is a one-sample U-statistic of degree two with kernel given by:
$$\varphi_D\left((x, y), (x', y')\right) = \phi\left(\left(1 - D(x, x')\right)\cdot\left(2\, \mathbb{1}\{y = y'\} - 1\right)\right).$$
The convergence to (30) of a minimizer of (31) has been studied, in the non-spatial setting, within the frameworks of algorithmic stability [120] and algorithmic robustness [121], and based on the theory of U-processes under appropriate regularization [122].
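As a concrete illustration of (31), the following sketch evaluates the empirical risk for a Mahalanobis-type distance with the hinge loss; the spatial kernel weights are set to one, and the data, labels, and matrices are assumptions made for the example.

```python
# Empirical metric-learning risk (31) with hinge loss and a
# Mahalanobis-type distance; spatial kernel weights are set to one and
# data, labels, and matrices are illustrative assumptions.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
n, p = 200, 3
X = rng.standard_normal((n, p))
Y = (X[:, 0] > 0).astype(int)            # labels tied to the first feature

def emp_risk(M):
    # one-sample U-statistic of degree two with kernel phi_D
    total = 0.0
    for i, j in combinations(range(n), 2):
        diff = X[i] - X[j]
        u = (1.0 - diff @ M @ diff) * (2.0 * (Y[i] == Y[j]) - 1.0)
        total += max(0.0, 1.0 - u)       # hinge loss
    return 2.0 * total / (n * (n - 1))

print(emp_risk(np.eye(p)), emp_risk(np.diag([4.0, 0.1, 0.1])))
```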

5.2. Multipartite Ranking

Let us recall the problem from [119]. Let $X \in \mathcal{H}$ be a random vector of attributes/features and $Y \in \{1, \ldots, K\}$ the (temporarily hidden) ordinal label assigned to it. The goal of multipartite ranking is to use a training set of labeled examples to rank the features in the same order as the labels. This statistical learning problem arises in many fields (e.g., medicine, finance, search engines, e-commerce). Rankings are usually defined by a scoring function $s : \mathcal{H} \to \mathbb{R}$, which transports the natural order on the real line to the feature space. The ROC manifold, or its usual summary, the VUS criterion (VUS stands for Volume Under the ROC Surface), is the gold standard for evaluating the ranking performance of $s(x)$; see [123] and the references therein. The optimal scoring functions, according to [124], are those that are optimal for all bipartite subproblems; more specifically, they are increasing transformations of the likelihood ratio $dF_{k+1}/dF_k$, where $F_k$ is the class-conditional distribution for the $k$th class. When the set of optimal scoring functions is not empty, the authors showed that it coincides with the set of functions that maximize the volume under the ROC surface
$$\mathrm{VUS}(s) = \mathbb{P}\left(s(X_1) < \cdots < s(X_K) \mid Y_1 = 1, \ldots, Y_K = K\right).$$
Given $K$ independent samples $X^{(k)}_{s_1, A_{n_k}}, \ldots, X^{(k)}_{s_{n_k}, A_{n_k}}$ with distribution $F_k(dx)$ for $k = 1, \ldots, K$, the empirical counterpart of the VUS can be written in the following way:
$$\widehat{\mathrm{VUS}}(s) = \frac{1}{\prod_{k=1}^K n_k} \sum_{i_1 = 1}^{n_1} \cdots \sum_{i_K = 1}^{n_K} \prod_{j=1}^K \bar{K}\!\left(\frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n}\right) \mathbb{1}\left\{s\!\left(X^{(1)}_{s_{i_1}, A_{n_1}}\right) < \cdots < s\!\left(X^{(K)}_{s_{i_K}, A_{n_K}}\right)\right\}.$$
The empirical VUS (32) is a $K$-sample U-statistic of degree $(1, \ldots, 1)$ with kernel given by:
$$\varphi_s\left(x_1, \ldots, x_K\right) = \mathbb{1}\left\{s(x_1) < \cdots < s(x_K)\right\}.$$
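The following sketch computes the empirical VUS for $K = 3$ ordered classes, with the spatial kernel weights set to one; the Gaussian class-conditional samples and the identity scoring function are illustrative assumptions.

```python
# Empirical VUS (32) for K = 3 ordered classes, spatial weights set to
# one; samples and the identity scoring function are illustrative.
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
X1 = rng.normal(0.0, 1.0, 40)            # class 1
X2 = rng.normal(1.0, 1.0, 40)            # class 2
X3 = rng.normal(2.0, 1.0, 40)            # class 3

def vus(s, samples):
    # K-sample U-statistic: fraction of K-tuples ranked in label order
    count, total = 0, 0
    for xs in product(*samples):
        scores = [s(x) for x in xs]
        count += all(scores[k] < scores[k + 1] for k in range(len(xs) - 1))
        total += 1
    return count / total

print(vus(lambda x: x, (X1, X2, X3)))    # chance level would be 1/3! = 1/6
```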

5.3. Set Indexed Conditional U-Statistics

We aim to study the links between $X$ and $Y$ by estimating functional operators associated with the conditional distribution of $Y$ given $X$, such as the regression operator, for $C_1 \times \cdots \times C_m := \tilde{C}$ in a class of sets $\mathscr{C}^m$,
$$G^{(m)}\left(C_1 \times \cdots \times C_m \mid \mathbf{t}, \mathbf{u}\right) = \mathbb{E}\left[\prod_{i=1}^m \mathbb{1}\{Y_i \in C_i\} \,\Big|\, (X_1, \ldots, X_m) = (t_1, \ldots, t_m) = \mathbf{t}\right] \quad \text{for } \mathbf{t} \in S_c,$$
where $\mathbf{u} = (\mathbf{u}_1, \ldots, \mathbf{u}_d)$. We define metric entropy with inclusion for the class of sets $\mathscr{C}$. For each $\varepsilon > 0$, the covering number is defined as:
$$N\left(\varepsilon, \mathscr{C}, G^{(1)}(\cdot \mid x)\right) = \inf\left\{n \in \mathbb{N} : \exists\, C_1, \ldots, C_n \in \mathscr{C} \text{ such that } \forall\, C \in \mathscr{C}, \; \exists\, 1 \le i, j \le n \text{ with } C_i \subset C \subset C_j \text{ and } G^{(1)}(C_j \setminus C_i \mid x) < \varepsilon\right\};$$
the quantity $\log N(\varepsilon, \mathscr{C}, G^{(1)}(\cdot \mid x))$ is called the metric entropy with inclusion of $\mathscr{C}$ with respect to the conditional distribution $G^{(1)}(\cdot \mid x)$. Estimates of such covering numbers are known for many classes (see, e.g., [125]). We will often assume below that either $\log N(\varepsilon, \mathscr{C}, G^{(1)}(\cdot \mid x))$ or $N(\varepsilon, \mathscr{C}, G^{(1)}(\cdot \mid x))$ behaves like a power of $\varepsilon^{-1}$: we say that condition $(R_{\gamma})$ holds if
$$\log N\left(\varepsilon, \mathscr{C}, G^{(1)}(\cdot \mid x)\right) \le H_{\gamma}(\varepsilon), \quad \text{for all } \varepsilon > 0,$$
where
$$H_{\gamma}(\varepsilon) = \begin{cases} \log(A/\varepsilon) & \text{if } \gamma = 0, \\ A \varepsilon^{-\gamma} & \text{if } \gamma > 0, \end{cases}$$
for some constants $A, r > 0$. As in [126], it is worth noticing that condition (33) with $\gamma = 0$ holds for intervals, rectangles, balls, ellipsoids, and for classes constructed from these by performing the set operations union, intersection, and complement finitely many times. The class of convex sets in $\mathbb{R}^d$ ($d \ge 2$) fulfills condition (33) with $\gamma = (d - 1)/2$. This and other classes of sets satisfying (33) with $\gamma > 0$ can be found in [125]. As a particular case of (9), we estimate $G^{(m)}(C_1 \times \cdots \times C_m \mid \mathbf{t}, \mathbf{u})$ by
$$\hat{G}_n^{(m)}(\tilde{C}, \mathbf{t}, \mathbf{u}) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \mathbb{1}\{Y_{s_{i_j}, A_n} \in C_j\} \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}.$$
One can apply Theorem 1 to infer that, in probability,
$$\sup_{\tilde{C} \in \mathscr{C}^m} \sup_{\mathbf{t} \in S_c, \, \mathbf{u} \in I_h} \left|\hat{G}_n^{(m)}(\tilde{C}, \mathbf{t}, \mathbf{u}) - G^{(m)}(\tilde{C} \mid \mathbf{t}, \mathbf{u})\right| \to 0.$$
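For a concrete special case, the sketch below estimates a set-indexed conditional probability $G(C \mid t) = \mathbb{P}(Y \in C \mid X = t)$ over intervals $C = [a, b]$ for $m = 1$ and scalar data, as an elementary analogue of (34); all modelling choices are illustrative.

```python
# Set-indexed conditional estimate for m = 1 and scalar data:
# G_hat(C | t) for intervals C = [a, b]; all choices are illustrative.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(10)
n, h = 400, 0.3
X = rng.uniform(-1, 1, n)
Y = X + 0.5 * rng.standard_normal(n)     # Y | X = t  ~  N(t, 0.25)

def K(x):
    return 0.75 * (1 - x**2) * (np.abs(x) <= 1)   # Epanechnikov

def G_hat(a, b, t):
    w = K((t - X) / h)
    return np.sum(w * ((Y >= a) & (Y <= b))) / np.sum(w)

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))      # standard normal cdf
t, a, b = 0.2, 0.0, 1.0
true = Phi((b - t) / 0.5) - Phi((a - t) / 0.5)    # exact conditional probability
print(G_hat(a, b, t), true)
```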
Remark 7.
Another point of view is to consider the following situation: for a compact $J \subset \mathbb{R}^{dm}$,
$$G^{(m)}\left(y_1, \ldots, y_m \mid \mathbf{t}, \mathbf{u}\right) = \mathbb{E}\left[\prod_{i=1}^m \mathbb{1}\{Y_i \le y_i\} \,\Big|\, (X_1, \ldots, X_m) = \mathbf{t}\right] \quad for \ \mathbf{t} \in S_c, \; (y_1, \ldots, y_m) \in J.$$
Let $L(\cdot)$ be a distribution function on $\mathbb{R}^d$ and $h_n$ a sequence of positive real numbers. One can estimate $G^{(m)}(y_1, \ldots, y_m \mid \mathbf{t}, \mathbf{u}) = G^{(m)}(\mathbf{y} \mid \mathbf{t}, \mathbf{u})$ by
$$\hat{G}_n^{(m)}(\mathbf{y}, \mathbf{t}, \mathbf{u}) := \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m L\!\left(\frac{y_j - Y_{s_{i_j}, A_n}}{h_n}\right) \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_j, X_{s_{i_j}, A_n})}{h_n}\right)}.$$
One can use Theorem 1 to infer that, in probability,
$$\sup_{\mathbf{t} \in S_c, \, \mathbf{u} \in I_h} \sup_{\mathbf{y} \in J} \left|\hat{G}_n^{(m)}(\mathbf{y}, \mathbf{t}, \mathbf{u}) - G^{(m)}(\mathbf{y} \mid \mathbf{t}, \mathbf{u})\right| \to 0.$$

5.4. Discrimination

Now, we apply our results to the problem of discrimination described in Section 3 of [127]; refer also to [128]. We will use similar notation and settings. Let $\varphi(\cdot)$ be any function taking at most finitely many values, say $1, \ldots, M$. The sets
$$A_j = \left\{(y_1, \ldots, y_m) : \varphi(y_1, \ldots, y_m) = j\right\}, \quad 1 \le j \le M,$$
then yield a partition of the feature space. Predicting the value of $\varphi(y_1, \ldots, y_m)$ is tantamount to predicting the set in the partition to which $(Y_1, \ldots, Y_m)$ belongs. For any discrimination rule $g(\cdot)$, we have
$$\mathbb{P}\left(g(X_1, \ldots, X_m) = \varphi(Y_1, \ldots, Y_m)\right) \le \sum_{j=1}^M \int_{\{\mathbf{t} : g(\mathbf{t}) = j\}} \max_{1 \le j' \le M} m_{j'}(\mathbf{t})\, d\mathbb{P}(\mathbf{t}),$$
where
$$m_j(\mathbf{t}) = \mathbb{P}\left(\varphi(Y_1, \ldots, Y_m) = j \mid (X_1, \ldots, X_m) = \mathbf{t}\right), \quad \mathbf{t} \in S_c.$$
The above inequality becomes an equality if
$$g_0(\mathbf{t}) = \arg\max_{1 \le j \le M} m_j(\mathbf{t}).$$
The function $g_0(\cdot)$ is called the Bayes rule, and the corresponding probability of error
$$L^* = 1 - \mathbb{P}\left(g_0(X_1, \ldots, X_m) = \varphi(Y_1, \ldots, Y_m)\right) = 1 - \mathbb{E}\left[\max_{1 \le j \le M} m_j(\mathbf{X})\right]$$
is called the Bayes risk. Each of the unknown functions $m_j(\cdot)$ can be consistently estimated by one of the methods discussed in the preceding sections. Let, for $1 \le j \le M$,
$$m_{nj}(\mathbf{x}, \mathbf{u}) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \mathbb{1}\left\{\varphi(Y_{s_{i_1}, A_n}, \ldots, Y_{s_{i_m}, A_n}) = j\right\} \prod_{j'=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j',\ell} - s_{i_{j'},\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_{j'}, X_{s_{i_{j'}}, A_n})}{h_n}\right)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j'=1}^m \prod_{\ell=1}^d K_1\!\left(\frac{u_{j',\ell} - s_{i_{j'},\ell}/A_n}{h_n}\right) K_2\!\left(\frac{d(x_{j'}, X_{s_{i_{j'}}, A_n})}{h_n}\right)}.$$
Set
$$g_{0,n}(\mathbf{t}) = \arg\max_{1 \le j \le M} m_{nj}(\mathbf{t})$$
and introduce
$$L_n = \mathbb{P}\left(g_{0,n}(X_1, \ldots, X_m) \ne \varphi(Y_1, \ldots, Y_m)\right).$$
Then, one can show that the discrimination rule $g_{0,n}(\cdot)$ is asymptotically Bayes-risk consistent:
$$L_n \to L^*.$$
This follows from the obvious relation
$$0 \le L_n - L^* \le 2\, \mathbb{E}\left[\max_{1 \le j \le M} \left|m_{nj}(\mathbf{X}, \mathbf{u}) - m_j(\mathbf{X})\right|\right].$$
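A minimal sketch of the plug-in rule $g_{0,n}$ for $m = 1$ and scalar covariates is given below: the posteriors $m_j$ are estimated by kernel smoothing and the label with the largest estimate is returned; kernel, data, and the label function $\varphi$ are illustrative choices.

```python
# Plug-in discrimination rule g_{0,n} for m = 1 and scalar X: kernel
# estimates of the posteriors m_j, classified by argmax. Kernel, data,
# and the label function phi are illustrative choices.
import numpy as np

rng = np.random.default_rng(7)
n, h = 400, 0.3
X = rng.uniform(-2, 2, n)
labels = (X + 0.3 * rng.standard_normal(n) > 0).astype(int) + 1   # phi in {1, 2}

def K(x):
    return 0.75 * (1 - x**2) * (np.abs(x) <= 1)   # Epanechnikov

def m_hat(j, t):
    w = K((t - X) / h)
    return np.sum(w * (labels == j)) / np.sum(w)

def g0n(t):
    # mimics g_{0,n}(t) = argmax_j m_nj(t)
    return max((1, 2), key=lambda j: m_hat(j, t))

print([g0n(t) for t in (-1.0, -0.1, 0.1, 1.0)])   # roughly [1, 1, 2, 2]
```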

6. Extension to the Censored Case

Consider a triple $(Y, C, X)$ of random variables defined in $\mathbb{R} \times \mathbb{R} \times \mathcal{H}$. Here, $Y$ is the variable of interest, $C$ is a censoring variable, and $X$ is a concomitant variable. Throughout, we use the notation of [129] and work with a sample $\{(Y_i, C_i, X_{s_i, A_n})\}$ of identically distributed replications of $(Y, C, X)$, $n \ge 1$. Actually, in the right censorship model, the pairs $(Y_i, C_i)$, $1 \le i \le n$, are not directly observed, and the corresponding information is given by $Z_i := \min\{Y_i, C_i\}$ and $\delta_i := \mathbb{1}\{Y_i \le C_i\}$, $1 \le i \le n$. Accordingly, the observed sample is
$$\mathcal{D}_n = \{(Z_i, \delta_i, X_{s_i, A_n}), \; i = 1, \ldots, n\}.$$
Survival data in clinical trials or failure time data in reliability studies, for example, are often subject to such censoring. More specifically, many statistical experiments result in incomplete samples, even under well-controlled conditions. For example, clinical data for surviving most types of disease are usually censored by other competing risks to life which result in death. In the sequel, we impose the following assumptions upon the distribution of ( X , Y ) . For < t < , set
F Y ( t ) = P ( Y t ) , G ( t ) = P ( C t ) , a n d H ( t ) = P ( Z t ) ,
the right-continuous distribution functions of Y, C and Z respectively. For any right-continuous distribution function L defined on R , denote by
T L = sup { t R : L ( t ) < 1 }
the upper point of the corresponding distribution. Now, consider a point-wise measurable class F of real measurable functions defined on R , and assume that F is of VC-type. We recall the regression function of ψ ( Y ) evaluated at X = x , for ψ F and x H , given by
r ( 1 ) ( ψ , x ) = E ( ψ ( Y ) X = x ) ,
when Y is right-censored. To estimate r ( 1 ) ( ψ , · ) , we make use of the Inverse Probability of Censoring Weighted (I.P.C.W.) estimators have recently gained popularity in the censored data literature (see [130,131,132]). The key idea of I.P.C.W. estimators is as follows. Introduce the real-valued function Φ ψ ( · , · ) defined on R 2 by
Φ ψ ( y , c ) = 1 { y c } ψ ( y c ) 1 G ( y c ) .
Assuming the function G ( · ) to be known, first note that Φ ψ ( Y i , C i ) = δ i ψ ( Z i ) / ( 1 G ( Z i ) ) is observed for every 1 i n . Moreover, under the Assumption ( I ) below,
( I )
C and ( Y , X ) are independent.
We have
r ( 1 ) ( Φ ψ , x ) : = E ( Φ ψ ( Y , C ) X = x ) = E 1 { Y C } ψ ( Z ) 1 G ( Z ) X = x = E ψ ( Y ) 1 G ( Y ) E ( 1 { Y C } X , Y ) X = x = r ( 1 ) ( ψ , x ) .
Therefore, any estimate of r ( 1 ) ( Φ ψ , · ) , which can be built on fully observed data, turns out to be an estimate for r ( 1 ) ( ψ , · ) too. Thanks to this property, most statistical procedures known to provide estimates of the regression function in the uncensored case can be naturally extended to the censored case. For instance, kernel-type estimates are particularly easy to construct. Set, for x H , h l n , 1 i n ,
ω ¯ n , K 1 , 2 , h n , i ( 1 ) ( x , u ) : = = 1 d K 1 u s j , A n h n K 2 d ( x , X s j , A n ) h n j = 1 n = 1 d K 1 u s j , A n h n K 2 d ( x , X s j , A n ) h n .
In view of (37)–(39), whenever G ( · ) is known, a kernel estimator of r ( 1 ) ( ψ , · ) is given by
r ˘ n ( 1 ) ( ψ , x , u ; h n ) = i = 1 n ω ¯ n , K 1 , 2 , h n , i ( 1 ) ( x , u ) δ i ψ ( Z i ) 1 G ( Z i ) .
The distribution function G ( · ) is generally unknown and has to be estimated. We will denote by G n ( · ) the Kaplan–Meier estimator of the function G ( · ) [133]. Namely, adopting the conventions
= 1
and 0 0 = 1 and setting
N n ( u ) = i = 1 n 1 { Z i u } ,
we have
G n ( u ) = 1 i : Z i u N n ( Z i ) 1 N n ( Z i ) ( 1 δ i ) , for u R .
Given this notation, we will investigate the following estimator of r ( 1 ) ( ψ , · )
r ˘ n ( 1 ) ( ψ , x , u ; h n ) = i = 1 n ω ¯ n , K 1 , 2 , h n , i ( 1 ) ( x , u ) δ i ψ ( Z i ) 1 G n ( Z i ) ,
refer to [129,130]. Adopting the convention 0 / 0 = 0 , this quantity is well defined, since G n ( Z i ) = 1 if and only if Z i = Z ( n ) and δ ( n ) = 0 , where Z ( k ) is the kth-ordered statistic associated with the sample ( Z 1 , , Z n ) for k = 1 , , n and δ ( k ) is the δ j corresponding to Z k = Z j . When the variable of interest is right-censored, the functional of the (conditional) law can generally not be estimated on the complete support (see [132]). To obtain our results, we will work under the following assumptions.
(A.1) 
F = { ψ : = ψ 1 1 { ( , τ ) m } , ψ 1 F 1 } , where τ < T H and F 1 is a point-wise measurable class of real measurable functions defined on R and of type VC.
(A.2) 
The class of functions F has a measurable and uniformly bounded envelope function Υ with,
Υ ( y 1 , , y k ) sup ψ F ψ ( y 1 , , y k ) , y i T H .
We now have all the ingredients to state the result corresponding to the censored case. By combining the results of Proposition 9.6 and Lemma 9.7 of [134], Theorem 1, we have, in probability,
sup x , u r ˘ n ( 1 ) ( ψ , x , u ; h n ) E ^ ( r ˘ n ( 1 ) ( ψ , x , u ; h n ) ) 0 .
A right-censored version of an unconditional U-statistic with a kernel of degree m 1 is introduced by the principle of a mean preserving reweighting scheme in [135]. Ref. [136] has proved almost sure convergence of multi-sample U-statistics under random censorship and provided application by considering the consistency of a new class of tests designed for testing equality in distribution. To overcome potential biases arising from right-censoring of the outcomes and the presence of confounding covariates, Ref. [137] proposed adjustments to the classical U-statistics. Ref. [138] proposed a different way in the estimation procedure of the U-statistic by using a substitution estimator of the conditional kernel given the observed data. To our best knowledge, the problem of the estimation of the conditional U-statistics was opened up to the present, and it gives our main motivation to the study of this section. A natural extension of the function defined in (37) is given by
Φ ψ ( y 1 , , y m , c 1 , , c m ) = i = 1 m { 1 { y i c i } ψ ( y 1 c 1 , , y m c m ) i = 1 m { 1 G ( y i c i ) } .
From this, we have an analogous relation to (38) given by
E ( Φ ψ ( Y 1 , , Y m , C 1 , , C m ) ( X 1 , , X m ) = t ) = E i = 1 m { 1 { Y i C i } ψ ( Y 1 C 1 , , Y k C m ) i = 1 m { 1 G ( Y i C i ) } ( X 1 , , X m ) = t = E ψ ( Y 1 , , Y m ) i = 1 m { 1 G ( Y i ) } E i = 1 m { 1 { Y i C i } ( Y 1 , X 1 ) , ( Y m , X m ) ( X 1 , , X m ) = t = E ψ ( Y 1 , , Y m ) ( X 1 , , X m ) = t = m ψ ( t ) .
An analogue estimator to (9) in the censored case is given by
r ˘ n ( m ) ( ψ , t , u ; h n ) = ( i 1 , , i m ) I ( m , n ) δ i 1 δ i m ψ ( Z i 1 , , Z i m ) ( 1 G ( Z i 1 ) ( 1 G ( Z i k ) ) ω ¯ n , K 1 , 2 , h n , i ( m ) ( t , u ) ,
where, for i = ( i 1 , , i k ) I ( k , n ) ,
ω ¯ n , K 1 , 2 , h n , i ( k ) ( x , u ) j = 1 m = 1 d K 1 u j , s i j , A n h n K 2 d ( x j , X s i j , A n ) h n i I n m j = 1 m = 1 d K 1 u j , s i j , A n h n K 2 d ( x j , X s i j , A n ) h n .
The estimator that we will investigate is given by
r ˘ n ( m ) ( ψ , t , u ; h n ) = ( i 1 , , i k ) I ( m , n ) δ i 1 δ i k ψ ( Z i 1 , , Z i m ) ( 1 G n ( Z i 1 ) ( 1 G n ( Z i m ) ) ω ¯ n , K 1 , 2 , h n , i ( k ) ( t , u ) .
Corollary 1.
Under the assumptions (A.1)(A.2) and the conditions of Theorem 1, we have
sup x H m sup u I h , x S c r ˘ n ( m ) ( ψ , t , u ; h n ) E r ˘ n ( m ) ( ψ , t , u ; h n ) = O P . | S log n / n h m d ϕ m ( h ) + 1 A n d p ϕ ( h ) ,
In the last corollary, we use the law of iterated logarithm for G n ( · ) established in [139] ensuring that
sup t τ | G n G ( t ) | = O log log n n almost surely as n .
At this point, we may refer to [69,134,140].

7. The Bandwidth Selection Criterion

Many methods have been established and developed to construct, in asymptotically optimal ways, bandwidth selection rules for non-parametric kernel estimators especially for the Nadaraya–Watson regression estimator we quote among them [141,142,143]. This parameter has to be selected suitably, either in the standard finite dimensional case, or in the infinite dimensional framework for ensuring good practical performances. However, according to our knowledge, such studies do not presently exist for treating a such general functional conditional U-statistic. Nevertheless, an extension of the leave-one-out cross-validation procedure allows us to define, for any fixed j = ( j 1 , , j m ) I n m :
r ^ n , j ( m ) ( x , u ; h n ) = i I n m ( j ) φ ( Y s i 1 , A n , , Y s i m , A n ) j = 1 m = 1 d K 1 u j , s i j , A n h n K 2 d ( x j , X s i j , A n ) h n i I n m j = 1 m = 1 d K 1 u j , s i j , A n h n K 2 d ( x j , X s i j , A n ) h n ,
where
I n m ( j ) : = i I n m and i j = I n m { j } .
The Equation (47) represents the leave-one-out- X j , Y j estimator of the functional regression and also could be considered as a predictor of φ ( Y s j 1 , A n , , Y s j m , A n ) : = φ ( Y j ) . In order to minimize the quadratic loss function, we introduce the following criterion: we have for some (known) non-negative weight function W ( · ) :
C V φ , h n : = ( n m ) ! n ! j I n m φ Y j r ^ n , j ( m ) ( X j , u ; h n ) 2 W ˜ X j ,
where X j = ( X s j 1 , A n , , X s j m , A n ) . Following the ideas developed by [143], a natural way for choosing the bandwidth is to minimize the precedent criterion, so let us choose h ^ n [ a n , b n ] minimizing among h [ a n , b n ] :
C V φ , h n .
The main interest of our results is the possibility to derive the asymptotic properties of our estimate even if the bandwidth parameter is a random variable, as in the last equation. Following [144] where the bandwidths are locally chosen by a data-driven method based on the minimization of a functional version of a cross-validated criterion, one can replace (48) by
C V φ , h n : = ( n m ) ! n ! j I n m φ Y j r ^ n , j ( m ) ( X j , u ; h n ) 2 W ^ X j , x ,
where
W ^ s , t : = i = 1 m W ^ ( s i , t i ) .
In practice, one takes for i I n m the uniform global weights W ˜ X i = 1 , and the local weights
W ^ ( X i , t ) = 1 if d ( X i , t ) h , 0 otherwise .
For the sake of brevity, we have just considered the most popular method: that is, the cross-validated selected bandwidth. This may be extended to any other bandwidth selector such as the bandwidth based on Bayesian ideas [145].
Remark 8.
For notational convenience, we have chosen the same bandwidth sequence for each margin. This assumption can be dropped easily if one wants to use the vector bandwidths (see, in particular, Chapter 12 of [6]). With obvious changes of notation, our results and their proofs remain true when h n is replaced by a vector bandwidth h n = ( h n ( 1 ) , , h n ( d ) ) , where min h n ( i ) > 0 . In this situation, we set h n = i = 1 d h n ( i ) , and for any vector v = ( v 1 , , v d ) , we replace v / h n by ( v 1 / h n ( 1 ) , , v 1 / h n ( d ) ) . For ease of presentation, we chose to use real-valued bandwidths throughout.
Remark 9.
We mention that a different bandwidth criterion suggested by [1] is the rule of thumb. Strictly speaking, since the cross-validated bandwidth is random, the asymptotic theory can only be justified with this random bandwidth via a specific stochastic equicontinuity argument. Cross-validation is employed by [146] to examine the equality of two unconditional and conditional functions in the context of mixed categorical and continuous data. However, this approach, which is optimal for estimation, loses its optimality when applied to non-parametric kernel testing. For testing a parametric model for conditional mean function against a non-parametric alternative, Ref. [147] proposed an adaptive-rate-optimal rule. Ref. [148] present the other method for selecting a proper bandwidth. Ref. [148] propose, utilizing the Edgeworth expansion of the asymptotic distribution of the test, to select the bandwidth such that the power function of the test is maximized while the size function is controlled. Future investigation will focus on the aforementioned three approaches.

8. Concluding Remarks

In this paper, we considered the kernel type estimator for conditional U-statistics, including a particular case, the Nadaraya–Watson estimator, in a functional setting with random fields. To obtain our results, we ought to make assumptions requiring some regularity on the conditional U-statistics and conditional moments, some decay rates on the probability of the variables belonging to shrinking open balls, and suitable decreasing rates on the mixing coefficients. Mainly, the conditional moment assumption enables the consideration of unbounded classes of functions. The proof of the weak convergence respects a typical technique: finite dimensional convergence and equicontinuity of the conditional U-processes.
Both results, the uniform rate of convergence and the weak convergence, are grounded on a general blocking technique adjusted for irregularly spaced sampling sites, where we need to pay attention to the effect of the non-equidistant sampling sites. We intricately reduce the work to the independent setting to address this issue. Indeed, as there is no practical guidance for introducing order to spatial points as opposed to time series, not asymptotically but exactly independent blocks of observations have been constructed by ([149], Corollary 2.7) (Lemma A4) and then results of independent data could be applied directly to the independent blocks. Here, Ref. [149] declares that the uniform convergence result requires the β -mixing condition to connect the original sequence with the sequence of the independent blocks, and this connection still holds under the ϕ -mixing condition but is not necessary under the α -mixing conditions. Therefore, we use the β -mixing sequence as we aim to derive the weak convergence for processes indexed by classes of functions.
Ref. [85] in his work gives us a possible extension of the sampling region inspired by [93]. This extension can be explained as follows. It is feasible to generalize the definition of the sample region R n to include non-standard forms. For instance, we may use the sample region concept [93] as follows: First, let R n be the sampling region. Define R 0 as an open connected subset of ( 2 , 2 ] d containing [ 1 , 1 ] d and R 0 as a Borel set such that R 0 R 0 R ¯ 0 , and where for any set S R d , S ¯ signifies its closure. Let A n n 1 be a sequence of positive numbers such that A n as n and define R n = A n R 0 as a sampling region. In addition, for any sequence of positive numbers a n n 1 with a n 0 as n , let O a n d + 1 , as n , be the number of cubes of the form a n i + [ 0 , 1 ) d , i Z d with their lower left corner a n i on the lattice a n Z d that intersects both R 0 and R 0 c (see Condition B in [93], Chapter 12, Section 12.2) (This condition is the prototype R 0 boundary’s condition; it must always be assumed on the region R n to prevent pathological situations, and it is satisfied by the majority of areas of practical significance. This condition is satisfied in the plane (d = 2), for instance, if the boundary R 0 of R 0 is defined by a simple rectifiable curve of limited length. When sample sites are defined on the integer grid Z d , this condition means that the effect of data points toward the boundary of R n is small compared to the overall number of data points). In addition, define f as a continuous, everywhere positive probability density function on R 0 , and let S 0 , i i 1 be a sequence of i.i.d. random vectors with density f. Assume that S 0 , i i 1 and X s , A n are independent. Replacing our setting in Section 2.4 with this new one, our results still hold, and it will be possible to show uniform convergence and weak convergence under the same assumptions and identical proofs. For future investigation, it will be interesting to relax the mixing conditions to the weak dependence (or the ergodicity framework). This generalization is nontrivial, since we need some maximal moment inequalities in our asymptotic results that are not available in this setting. Another interesting direction is to consider the incomplete data setting (missing at random, censored in different schemes) for locally spatial–functional data. A natural question is how to adapt our results to the wavelet-based estimators, the delta sequence estimators, the kNN estimators, and the local linear estimators.

9. Mathematical Developments

The proofs for our results are covered in this section. The following continues to use the notations that were previously presented.
To avoid the repetition of the Blocking technique and the notation used, we will devote the following subsection to introducing all notations needed for this decomposition.

9.1. Preliminaries

This treatment requires an extension of the Blocking techniques of Bernstein to the spacial process, refer to [85]. Let us introduce some notations related to this technique. Recall that A 1 , n and A 2 . n are sequences of positive numbers such that
A 1 , n / A n + A 2 , n / A 1 , n 0 as n .
Let
A 3 , n = A 1 , n + A 2 , n .
We consider a partition of R d by hypercubes of the form Γ n ( ; 0 ) = + ( 0 , 1 ] d A 3 , n , = 1 , , d Z d and divide Γ n ( ; 0 ) into 2 d hypercubes as follows:
Γ n ( ; ϵ ) = j = 1 d I j ϵ j , ϵ = ϵ 1 , , ϵ d { 1 , 2 } d ,
where for j = 1 , , d ,
I j ϵ j = j A 3 , n , j A 3 , n + A 1 , n if ϵ j = 1 , j A 3 , n + A 1 , n , j + 1 A 3 , n if ϵ j = 2 .
We note that
Γ n ( ; ϵ ) = A 1 , n q ( ϵ ) A 2 , n d q ( ϵ )
for any Z d and ϵ { 1 , 2 } d , where
q ( ϵ ) = 1 j d : ϵ j = 1 .
Let ϵ 0 = ( 1 , , 1 ) . The partitions Γ n ; ϵ 0 correspond to “large blocks” and the partitions Γ ( ; ϵ ) for ϵ ϵ 0 correspond to “small blocks”.
Let
L 1 , n = Z d : Γ n ( , 0 ) R n
be the index set of all hypercubes Γ n ( , 0 ) that are contained in R n , and let
L 2 , n = Z d : Γ n ( , 0 ) R n 0 , Γ n ( , 0 ) R n c
denote the boundary hypercubes index set. Define L n = L 1 , n L 2 , n .

9.2. Proof of Proposition 1

As we mentioned, our statistic is a weighted U-statistic that can be decomposed into a sum of U-statistics using the Hoeffding decomposition. We will treat this decomposition detailed in Section 3.1 to achieve the desired results. In the mentioned section, we have seen that
ψ ^ ( u , x ) E ψ ^ ( u , x ) = ψ ^ 1 ( u , x ) + ψ ^ 2 ( u , x ) ,
where the linear term ψ ^ 1 ( u , x ) and the remainder term ψ ^ 2 ( u , x ) are well defined in (24) and (26), respectively. We aim to prove that the linear term leads the rate of convergence of this statistic while the remaining one converges to zero almost surely as n . We will deal with the first term in the decomposition. For B = [ 0 , 1 ] , α n = log n / n h m d ϕ m ( h ) and τ n = ρ n n 1 / ζ , where ζ is a positive constant given in Assumption 6 part (i), with ρ n = ( log n ) ζ 0 for some ζ 0 > 0 . Define
H ˜ 1 ( ) ( z ) : = H ˜ ( ) ( z ) 1 W s i , A n τ n ,
H ˜ 2 ( z ) : = H ˜ ( ) ( z ) 1 W s i , A n > τ n ,
and
ψ ^ 1 ( 1 ) ( u , x ) θ ( i ) = 1 n i = 1 n ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 H ˜ 1 ( ) ( z ) , ψ ^ 1 ( 2 ) ( u , x ) θ ( i ) = 1 n i = 1 n ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 H ˜ 2 ( ) ( z ) .
Clearly, we have
ψ ^ 1 ( u , x ) E ψ ^ 1 ( u , x ) = ψ ^ 1 ( 1 ) ( u , x ) E ψ ^ 1 ( 1 ) ( u , x ) + ψ ^ 1 ( 2 ) ( u , x ) E ψ ^ 1 ( 2 ) ( u , x ) .
To begin, it is plain to see that
P · S sup F m K m sup x H m sup u B m ψ ^ 1 ( 2 ) ( u , x ) θ ( i ) > α n = P · S sup F m K m sup x H m sup u B m ψ ^ 1 ( 2 ) ( u , x ) θ ( i ) > α n sup F m K m sup x H m i = 1 n W s i , A n > τ n sup F m K m sup x H m i = 1 n W s i , A n > τ n c P · S sup F m K m sup x H m sup u B m ψ ^ 1 ( 2 ) ( u , x , φ ) θ ( i ) > α n sup F m K m sup x H m sup u B m i = 1 n W s i , A n > τ n + P · S sup F m K m sup x H m sup u B m ψ ^ 2 ( 1 ) ( u , x , φ ) θ ( i ) > α n sup F m K m sup x H m sup u B m i = 1 n W s i , A n > τ n c P · S sup F m K m sup x H m sup u B m W s i , A n > τ n for some i = 1 , , n + P · S ( ) τ n ζ i = 1 n E · S sup F m K m sup x H m sup u B m W s i , A n ζ n τ n ζ = ρ n ζ 0 .
We infer that
E · S ψ ^ 1 ( 2 ) ( u , x ) 1 n i = 1 n ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 E · S H ˜ 2 ( ) ( z ) ,
where
E · S H ˜ 2 ( ) ( z ) = E · S 1 ϕ ( h ) K 2 d ( x i , X s i , A n ) h W s i , A n × W s ( 1 , , 1 , , , m 1 ) , A n j = 1 j i m 1 1 ϕ ( h ) K 2 d ( x j , ν s j , A n ) h P ( d ν 1 , , d ν 1 , d ν , , d ν m 1 ) E · S 1 ϕ ( h ) K 2 d x i , X s i , A n h + K 2 d x i , X u i ( s i ) h K 2 d x i , X u i ( s i ) h W s i , A n 1 W s i , A n > τ n τ n ( ζ 1 ) ϕ ( h ) E · S K 2 d x i , X s i , A n h K 2 d x i , X u i ( s i ) h W s i , A n ζ + K 2 d x i , X u i ( s i ) h W s i , A n ζ τ n ( ζ 1 ) ϕ ( h ) E · S h 1 d x i , X s i , A n d x i , X u i ( s i ) W s i , A n ζ + E · S K 2 d x i , X u i ( s i ) h W s i , A n ζ τ n ( ζ 1 ) ϕ ( h ) × 1 n h + ϕ ( h ) τ n ( ζ 1 ) n h ϕ ( h ) + τ n ( ζ 1 ) .
Hence, we have
E · S ψ ^ 1 ( 2 ) ( u , x ) 1 n i = 1 n ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 τ n ( ζ 1 ) τ n ( ζ 1 ) 1 n m i I n m j = 1 m 1 h d K ¯ u j s i j / A n h n = C τ n ( ζ 1 ) f S ( u ) + O log n n h m d + h 2 ( Using   Lemma   A 1 ) C τ n ( ζ 1 ) = C ρ n ( ζ 1 ) n ( ζ 1 ) / ζ C α n P S a . s .
As a result, we obtain that
sup F m K m sup x H m sup u B m ψ ^ 1 ( 2 ) ( u , x ) E · S ψ ^ 1 ( 2 ) ( u , x ) = O P · S ( α n ) .
Second, let us treat
sup F m K m sup x H m sup u B m ψ ^ 1 ( 1 ) ( u , x , φ ) E ψ ^ 1 ( 1 ) ( u , x , φ ) .
Recall the large blocks and small blocks and the notation given in Section 9.1, and define
S s , A n ( u , x ) : = ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 H ˜ 1 ( ) ( z ) , S n ( ; ϵ ) = i : s i Γ n ( ; ϵ ) R n S s , A n ( u , x ) = S n ( 1 ) ( ; ϵ ) , , S n ( p ) ( ; ϵ ) .
Then, we have
S n = S n ( 1 ) , , S n ( m ) = i = 1 n S s , A n ( u , x ) = L n S n ; ϵ 0 + ϵ ϵ 0 L 1 , n S n ( ; ϵ ) = : S 2 , n ( ϵ ) + ϵ ϵ 0 L 2 , n S n ( ; ϵ ) = : S 3 , n ( ϵ ) = : S 1 , n + ϵ ϵ 0 S 2 , n ( ϵ ) + ϵ ϵ 0 S 3 , n ( ϵ ) .
In order to achieve our result, we will pass by the following two steps.
Step 1 (Reduction to independence). Recall
S n ( ; ϵ ) = i : s i Γ n ( ; ϵ ) R n S s , A n ( u , x ) .
For each ϵ { 1 , 2 } d , let S ˘ n ( ; ϵ ) : L n be a sequence of independent random variables in R under P · S such that
S ˘ n ( ; ϵ ) = d S n ( ; ϵ ) , under P . S , L n .
Define
S ˘ 1 , n = L n S ˘ n ; ϵ 0 = S ˘ 1 , n ( 1 ) , , S ˘ 1 , n ( m )
and for ϵ ϵ 0 , define
S ˘ 2 , n ( ϵ ) = L 1 , n S ˘ n ( ; ϵ )
and
S ˘ 3 , n ( ϵ ) = L 2 , n S ˘ n ( ; ϵ ) .
We start by confirming the following results:
sup t > 0 P · S S 1 , n > t P · S S ˘ 1 , n > t C A n A 1 , n d β A 2 , n ; A n d ,
sup t > 0 P · S S 2 , n ( ϵ ) > t P · S S ˘ 2 , n ( ϵ ) > t C A n A 1 , n d β A 2 , n ; A n d ,
sup t > 0 P · S S 3 , n ( ϵ ) > t P · S S ˘ 3 , n ( ϵ ) > t C A n A 1 , n d β A 2 , n ; A n d .
Keep in mind that
L n = O A n / A 3 , n d A n / A 1 , n d .
For ϵ { 1 , 2 } d and 1 , 2 L n with 1 2 , let
J 1 ( ϵ ) = 1 i 1 n : s i 1 Γ n 1 ; ϵ , J 2 ( ϵ ) = 1 i 2 n : s i 2 Γ n 2 ; ϵ .
For any s i k = s 1 , i k , , s d , i k , k = 1 , 2 in such a way that i 1 J 1 ( ϵ ) and i 2 J 2 ( ϵ ) , we obtain max 1 u d s u , i 1 s u , i 2 A 2 , n using the definition of Γ ( ; ϵ ) . This gives
s i 1 s i 2 A 2 , n .
For any ϵ { 1 , 2 } d , let S n 1 ; ϵ , , S n L n ; ϵ be an arrangement of S n ( ; ϵ ) : L n . Let P . S ( a ) be the marginal distribution of S n a ; ϵ and let P · S ( a : b ) be the joint distribution of S n k ; ϵ : a k b } . The β -mixing property of X gives that for 1 k L n 1 ,
P · S P · S ( 1 : k ) × P · S k + 1 : L n TV β A 2 , n ; A n d .
The inequality is independent of the arrangement of S n ( ; ϵ ) : L n . Therefore, the Assumption A2 in Lemma A4 is fulfilled for S n ( ; ϵ ) : L n with τ β A 2 , n ; A n d and m A n / A 1 , n d . Combining the boundary condition on R n and Lemma A4, we get (59)–(61).
Remark 10.
Since
ϵ { 1 , 2 } d : ϵ ϵ 0 = 2 d 1 , L 1 , n A n / A 3 , n d A n / A 1 , n d
and
L 2 , n A n / A 3 , n d 1 A n / A 1 , n d 1 L 1 , n ,
Lemma A5 and Equation (52) give for sufficiently large n the summands numbers of S 2 , n and S 3 , n are at most
O A 1 , n d 1 A 2 , n n A n d A n / A 1 , n d = O A 2 , n A 1 , n n
and
O A 1 , n d 1 A 2 , n n A n d A n / A 1 , n d 1 = O A 2 , n A n n ,
respectively.
Step 2. Recall that we aim to treat
sup F m K m sup x H m sup u B m ψ ^ 1 ( 1 ) ( u , x , φ ) E · S ψ ^ 1 ( 1 ) ( u , x , φ ) .
To achieve the intended result, we will cover the region B m = [ 0 , 1 ] d m by
k 1 , , k m = 1 N ( u ) j = 1 m B ( u k j , r ) ,
for some radius r. Hence, for each u = ( u 1 , , u m ) [ 0 , 1 ] d m , there exists l ( u ) = ( l ( u 1 ) , , l ( u m ) ) , where 1 i m , 1 l ( u i ) N ( u ) in such a way that
u i = 1 m B ( u l ( u i ) , r ) and | u i u l ( u i ) | r , for 1 i m ,
then for each u [ 0 , 1 ] d m , the closest center will be u l ( u ) , and the ball with the closest center will be defined by
B ( u , l ( u ) , r ) : = j = 1 m B ( u k j , r ) .
In the same way, H m should be covered by
k 1 , , k m = 1 N ( x ) j = 1 m B ( x k j , r ) ,
for some radius r. Hence, for each x = ( x 1 , , x m ) H m , there exists l ( x ) = ( l ( x 1 ) , , l ( x m ) ) , where 1 i m , 1 l ( x i ) N ( x ) in such a way that
x i = 1 m B ( u l ( x i ) , r ) and d ( x i , x l ( u i ) ) r , for 1 i m ,
then for each x H m , the closest center will be x l ( x ) , and the ball with the closest center will be defined by
B ( x , l ( x ) , r ) : = i = 1 m B ( x l ( x i ) , r ) .
We define:
K ( ω , v ) C 0 j = 1 m = 1 d 1 ( | ω j , | 2 C 1 ) j = 1 m K 2 ( v k ) for ( ω , v ) R 2 .
We can show that for ( u , x ) B j , n and n large enough,
K ¯ u s / A n h n K 2 d ( x i , X s i , A n ) h K ¯ u n s / A n h n K 2 d ( x n , i , X s i , A n ) h α n K u n s / A n , d ( x n , i , X s i , A n ) h n .
Then, for
ψ ^ 1 ( 1 ) ( u , x ) = 1 n i = 1 n ξ i 1 ϕ ( h ) K 2 d ( x i , X s i , A n ) h W s i , A n 1 W s i , A n τ n × ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i m 1 × W s ( 1 , , 1 , , , m 1 ) , A n j = 1 j i m 1 1 ϕ ( h ) K 2 d ( x j , ν s j , A n ) h P ( d ν 1 , , d ν 1 , d ν , , d ν m 1 ) .
Let us define
ψ ¯ 1 ( 1 ) ( u , x ) = 1 n h d ϕ ( h ) i = 1 n K u n s i / A n , d ( x n , i , X s i , A n ) h n W s i , A n 1 W s i , A n τ n × ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i m 1 × W s ( 1 , , 1 , , , m 1 ) , A n j = 1 j i m 1 1 ϕ ( h ) K 2 d ( x j , ν s j , A n ) h P ( d ν 1 , , d ν 1 , d ν , , d ν m 1 ) : = 1 n h d ϕ ( h ) i = 1 n S s , A n ( u , x ) .
We mention that
E · S ψ ¯ 1 ( 1 ) ( u , x , φ ) M < ,
for some M large enough. Let N F m K m N ( x ) m N ( u ) denote the covering number related, respectively, to the class of functions F m K m , the balls that cover [ 0 , 1 ] m and the balls that cover H m . Then, we obtain
sup F m K m sup x H m sup u B ψ ^ 1 ( 1 ) ( u , x , φ ) E · S ψ ^ 1 ( 1 ) ( u , x , φ ) N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ψ ^ 1 ( 1 ) ( u , x , φ ) E · S ψ ^ 1 ( 1 ) ( u , x , φ ) N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ψ ^ 1 ( 1 ) u n , x E · S ψ ^ 1 ( 1 ) u n , x + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) α n ψ ¯ 1 ( 1 ) u n , x + E · S ψ ¯ 1 ( 1 ) u n , x N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ψ ^ 1 ( 1 ) u n , x E · S ψ ^ 1 ( 1 ) u n , x + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ψ ¯ 1 ( 1 ) u n , x E · S ψ ¯ 1 ( 1 ) u n , x + 2 M F ( y ) α n N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) L n S n ; ϵ 0 + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ϵ ϵ 0 L 1 , n S n ( ; ϵ ) + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ϵ ϵ 0 L 2 , n S n ( ; ϵ ) + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) L n S n ; ϵ 0 + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ϵ ϵ 0 L 1 , n S n ( ; ϵ ) + N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ϵ ϵ 0 L 2 , n S n ( ; ϵ ) + 2 M F ( y ) α n .
Even more, for each ϵ { 1 , 2 } d , let S ˘ n ( ; ϵ ) : L n denote a sequence of independent random vectors in R m under P · S such that
S ˘ n ( ; ϵ ) = d S n ( ; ϵ ) , under P . S , L n .
Show that
P · S N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r )
max 1 i 1 < < i m m sup B ( u i ( u ) , r ) ψ ^ 1 ( 1 ) ( u , x ) E · S ψ ^ 1 ( 1 ) ( u , x ) > 2 m d + 1 M a n N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) P · S sup ( u , x ) B k ψ ^ 1 ( u , x ) E · S ψ ^ 1 ( u , x ) > 2 m d + 1 M a n
ϵ { 1 , 2 } d Q ^ n ( ϵ ) + ϵ { 1 , 2 } d Q ¯ n ( ϵ ) + 2 m d + 1 N F m K m N ( x ) m N ( u ) m A n A 1 , n d β A 2 , n ; A n d ,
where
Q ^ n ϵ 0 = N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) P · S L n S ˘ n ( ; ϵ 0 ) > M a n n m h m d ϕ ( h ) , Q ¯ n ϵ 0 = N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) P · S L n S ˘ n ( ; ϵ 0 ) > M a n n m h m d ϕ ( h ) ,
and for ϵ ϵ 0
Q ^ n ϵ = N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) P · S L n S ˘ n ( ; ϵ ) > M a n n m h m d ϕ ( h ) , Q ¯ n ϵ = N F m K m N ( x ) m N ( u ) m max 1 i 1 < < i m m sup B ( x i ( x ) , r ) max 1 i 1 < < i m m sup B ( u i ( u ) , r ) P · S L n S ˘ n ( ; ϵ ) > M a n n m h m d ϕ ( h ) .
Due to the similarity between the two cases, ϵ ϵ 0 and ϵ = ϵ 0 , we are going to treat Q ^ n only for ϵ ϵ 0 . An application of Lemma A5, with the fact that S ˘ n ( ; ϵ ) are zero-mean random variables, shows us that:
P · S L n S ˘ n ( ; ϵ ) > M a n n h m d ϕ ( h ) 2 P · S L n S ˘ n ( ; ϵ ) > M a n n h m d ϕ ( h )
and
S ˘ n ( ; ϵ ) C A 1 , n d 1 A 2 , n ( log n ) τ n , P S a . s . ( from Lemma A 5 ) E · S S ˘ n ( ; ϵ ) 2 C h m d ϕ ( h ) A 1 , n d 1 A 2 , n ( log n ) , P S a . s . ( By Lemma A 6 )
Using Bernstein’s inequality represented in Lemma A7, we have
P · S L n S ˘ n ( ; ϵ ) > M a n n h m d ϕ ( h ) exp 1 2 × M n h m d ϕ ( h ) log n A n A 1 , n d A 1 , n d 1 A 2 , n h m d ϕ ( h ) ( log n ) + 1 3 × M 1 / 2 n 1 / 2 h m d / 2 ϕ ( h ) 1 / 2 ( log n ) 1 / 2 A 1 , n d 1 A 2 , n τ n .
Observe that
n h m d log n A n A 1 , n d A 1 , n d 1 A 2 , n h m d ( log n ) = n A n d A 1 , n A 2 , n A 1 , n A 2 , n n η ,
n h m d ϕ ( h ) log n n 1 / 2 h m d / 2 ϕ ( h ) 1 / 2 ( log n ) 1 / 2 A 1 , n d 1 A 2 , n τ n = n 1 / 2 h m d / 2 ϕ ( h ) 1 / 2 ( log n ) 1 / 2 A 1 , n m d A 2 , n A 1 , n ρ n n 1 / ζ C 0 n η / 2 .
Taking M > 0 to be sufficiently large, and for N C h m d ϕ ( h ) α n m , this shows the desired result.
We must move on to the nonlinear part of the Hoeffding decomposition. Accordingly, the goal is to prove that:
P · S sup F m K m sup x H m sup u B m ψ ^ 2 ( u , x ) > λ 0 as n .
In the following, we will give a lemma that can be viewed as a technical result in the proof of our proposition, and it helps us to achieve our goal in Expression (69). The proof of this lemma used the Blocking technique defined before but for the U-statistic, making the block treatment more complicated.
Lemma 1.
Let F m K m be a uniformly bounded class of measurable canonical functions, m 2 . Assume that there are finite constants a and b in such a way that the F m K m covering number fulfills:
N ( ϵ , F m K m , · L 2 ( Q ) ) a ϵ b ,
for all ϵ > 0 and all probability measures Q. If the mixing coefficients β of the local stationary sequence { Z i = ( X s i , A n , W s i , A n ) i N satisfy Condition (E2) in Assumption 6, then, for some r > 1 , we have:
sup F m K m sup x H m sup u B m P h m d / 2 ϕ m / 2 ( h ) n m + 1 / 2 i I n m ξ i 1 ξ i m H ( Z i 1 , , Z i m ) 0 .
Remark 11.
As mentioned before, W s i , A n will be equal to 1 or ϵ s i , A n = σ s i A n , X s i , A n ϵ s i . In the proof of the previous lemma, W s i , A n will be equal ε i , n = σ i n , X i , n ϵ i , and we will use the notation W s i , A n ( u ) to indicate σ u , x ϵ i .

9.2.1. Proof of Lemma 1

This lemma’s proof is based on the blocking technique employed by [82], and it is called Bernstein’s method, referred to [150], in which we are enabled to apply the symmetrization and the many other techniques available for the i.i.d random variables. We will extend this technique to the spacial processes in the U-statistics setting, in the same line as in [93]. In addition to the notation in Section 9.1, define
L n : = L 1 , n L 2 , n ,
Δ 1 = { 2 : min 1 i d | 1 i 2 i | 1 }
Δ 2 = { 2 : min 1 i d | 1 i 2 i | 2 }
With the notation introduced above, it is easy to show that, for m = 2 ,
1 h 2 d ϕ 2 ( h ) i I n 2 j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n = 1 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n + 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n + 2 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n + 2 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n + 1 h 2 d ϕ 2 ( h ) 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 1 : s i 1 Γ n ( 1 ; ϵ ) R n i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n + 1 h 2 d ϕ 2 ( h ) 1 L 1 , n L 2 , n ϵ ϵ 0 i 1 < i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n : = I + II + III + IV + V + VI .
(I):
The Same Type of Blocks but Not the Same Block
Let { η i } i N be a sequence of independent blocks. An application of Lemma A4 shows that:
P sup F m K m sup x H m sup u B m n 3 / 2 1 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n > δ P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n > δ ) + P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n W s i , A n ( u ) > δ ) + P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n ( u ) > δ ) P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) > δ + C A n A 1 , n d β A 2 , n ; A n d + o P ( 1 ) + o P ( 1 ) ,
Because:
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x j , X u j ( s i j ) h W s i , A n = 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n E . S j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x j , X u j ( s i j ) h W s i , A n = 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n E . S j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x j , X u j ( s i j ) h j = 1 m ϵ s i j , A n = 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n E . S j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x j , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n ϵ s i j = 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x j , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n j = 1 m σ u j , x j + j = 1 m σ u j , x j 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S C j = 1 m K 2 d ( x j , X s i j , A n ) h n K 2 d x j , X s i j A n ( s i j ) h p j = 1 m σ u j , x j + o P ( 1 ) ( Using a telescoping argument , and the boundedness of K 2 for p = min ( ρ , 1 ) and C < ) 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S ϕ m 1 ( h ) C A n d U s i j , A n s i j A n p j = 1 m σ u j , x j + o P ( 1 ) o P ( 1 ) ,
and
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n W s i , A n ( u ) = 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S j = 1 2 K 2 d x i , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n j = 1 m σ u j , x j 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j × ( o P ( 1 ) ) 0 h k = 1 m K 2 y k h d F i k / n ( y k , x k ) 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j × ( o P ( 1 ) ) ( ϕ 2 ( h ) ) o P ( 1 ) .
Under the assumptions of the lemma, we have β ( a ; b ) β 1 ( a ) g 1 ( b ) with β 1 ( a ) 0 as a and n , so the term to consider is the first summand. For the second part of the inequality, we will use the work of [27] in the non-fixed kernels settings, precisely, we will define f i 1 , , i m = k = 1 m ξ i k × H and F i 1 , , i m , respectively, as a collection of kernels and the class of functions related to this kernel; then, we will use ([29], Theorem 3.1.1 and Remarks 3.5.4 part 2) for decoupling and randomization. As we mentioned above, we will suppose that m = 2 . Then, we can see that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W i , φ , n ( u ) F 2 K 2 = E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2 c 2 E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2 c 2 E . S 0 D n h ( U 1 ) N t , F i 1 , i 2 , d ˜ n h , 2 ( 1 ) d t , ( By Lemma A 9 and Proposition A 1 . )
where D n h ( U 1 ) is the diameter of F i 1 , i 2 according to the distance d ˜ n h , 2 ( 1 ) , respectively, which is defined as
D n h ( U 1 ) : = E ϵ 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2 = E ϵ 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
and:
d ˜ n h , 2 ( 1 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) : = E ϵ 1 n 3 / 2 h d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ξ 1 i 1 ξ 1 i 2 k = 1 2 K 1 , 2 d ( x k , η i k ) h W s i , A n ( u ) p q L n ϵ p ϵ q i H p ( U ) j H q ( U ) ξ 2 i 1 ξ 2 i 2 k = 1 2 K 2 , 2 d ( x k , η i k ) h W s i , A n ( u ) .
Let consider another semi-norm d ˜ n h , 2 ( 2 ) :
d ˜ n h , 2 ( 2 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) = 1 n h d ϕ 2 ( h ) 1 2 L n ξ 1 i 1 ξ 1 i 2 k = 1 2 K 1 , 2 d ( x k , η i k ) h W s i , A n ( u ) p q υ n ϵ p ϵ q i H p ( U ) j H q ( U ) ξ 2 i 1 ξ 2 i 2 k = 1 2 K 2 , 2 d ( x k , η i k ) h W s i , A n ( u ) 2 1 / 2 .
One can see that
d ˜ n h , 2 ( 1 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) A 1 , n n 1 / 2 h d ϕ ( h ) d ˜ n h , 2 ( 2 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) .
We readily infer that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W i , φ , n ( u ) F 2 K 2 c 2 E . S 0 D n h ( U 1 ) N t A 1 , n d n 1 / 2 , F i , j , d ˜ n h , 2 ( 2 ) d t c 2 A 1 , n d n 1 / 2 P D n h ( U 1 ) A 1 , n d n 1 / 2 λ n + c m A 1 , n d n 1 / 2 0 λ n log t 1 d t ,
where λ n 0 . We have
0 λ n log t 1 d t λ n log λ n 1 0 ,
where λ n must be chosen in such a way that the following relation will be achieved
A 1 , n d λ n n 1 / 2 log λ n 1 0 .
By utilizing the triangle inequality in conjunction with Hoeffding’s trick, we are easily able to derive that
A 1 , n d n 1 / 2 P D n h ( U 1 ) λ n A 1 , n d n 1 / 2 λ n 2 A 1 , n d n 5 / 2 h ϕ 1 ( h ) E . S 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 c 2 [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ 1 ( h ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 ,
where η i i N are independent copies of ( η i ) i N . By imposing
λ n 2 A 1 , n d r n 1 / 2 0 ,
we readily infer that
[ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ 1 ( h ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 k = 1 2 K 2 d ( x k , η i k ) h W s i , A n ( u ) 2 F 2 K 2 O λ n 2 A 1 , n d r n 1 / 2 .
A symmetrization of the last inequality in (78) succeeded by an application of the Proposition A1 in the Appendix A gives
[ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ 1 ( h ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ϵ p ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 c 2 E . S 0 D n h ( U 2 ) log N ( u , F i , j , d ˜ n h , 2 ) 1 / 2 ,
where
D n h ( U 2 ) = E ϵ | [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ 1 ( h ) 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 .
and for ξ 1 . K 2 , 1 W , ξ 2 . K 2 , 2 W F i j :
d ˜ n h , 2 ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) : = E ϵ [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ 1 ( h ) 1 L n ϵ p i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ 1 i 1 ξ 1 i 2 K 2 , 1 d ( x 1 , η i 1 ) h K 2 , 1 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 i 1 , i 2 H p ( U ) ξ 2 i ξ 2 j K 2 , 2 d ( x 1 , η i 1 ) h K 2 , 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 .
By the fact that:
E ϵ [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ 1 ( h ) 1 L n ϵ p i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 A 1 , n 3 d / 2 λ n 2 n 1 [ [ L n ] ] 1 A 1 , n 2 d ϕ 2 ( h n ) 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d ( x i , η i 1 ) h K 2 d ( x 2 , η j ) h W s i , A n ( u ) 4 1 / 2 ,
so:
A 1 , n d 3 / 2 λ n 2 n 1 0 ,
we have the convergence of (80) to zero. Recall that
L n = O A n / A 3 , n d A n / A 1 , n d .
(II):
The Same Block
P sup F m K m sup x H m sup u B m 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n > δ P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , X s i j , A n ) h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n > δ ) + P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n W s i , A n ( u ) > δ ) + P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d x i , X u j ( s i j ) h W s i , A n ( u ) > δ ) P ( sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) > δ + C A n A 1 , n d β A 2 , n ; A n d + o P ( 1 ) + o P ( 1 ) ,
In the same manner as I , we can show that the first and the second term in the previous inequality is of order o P ( 1 ) . So, as the preceding proof, it suffices to prove that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 0 .
Notice that we treat uniformly bounded classes functions in which we obtain uniformly in B m × F 2 K 2
E . S i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) = O ( a n ) .
This implies that we have to prove that, for u B m
E E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) E . S j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 0 .
As for empirical processes, to prove (82), it is enough to symmetrize and show that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ϵ p j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 0 .
Similarly to how in (75), we have
E . S 1 n 3 / 2 h d + 1 ϕ ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ϵ p j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 E 0 D n h ( U 3 ) log N u , F i 1 , i 2 , d ˜ n h , 2 ( 3 ) 1 / 2 d u ,
where
D n h ( U 3 ) = E ϵ 1 n 3 / 2 h d ϕ ( h ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
and the semi-metric d ˜ n h , 2 ( 3 ) is defined by
d ˜ n h , 2 ( 3 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) = E ϵ 1 n 3 / 2 h d ϕ ( h ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ 1 i ξ 1 j K 2 , 1 d ( x 1 , η i 1 ) h K 2 , 1 d ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i ξ 2 j K 2 , 2 d ( x 1 , η i 1 ) h K 2 , 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) .
Since we are considering uniformly bounded classes of functions, we obtain
E ϵ n 3 / 2 h ϕ 1 ( h n ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) A 1 , n 3 d / 2 n 1 h ϕ 1 ( h n ) 1 [ [ L n ] ] A 1 , n 2 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 1 / 2 O A 1 , n 3 d / 2 n 1 ϕ 1 ( h n ) .
Since A 1 , n 3 d / 2 n 1 ϕ 1 ( h ) 0 , D n h ( U 3 ) 0 , we obtain II 0 as n .
(III):
Different Types of Blocks
Avoiding the repetition, we can directly see that:
P sup F m K m sup x H m sup u B m 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n > δ P sup F m K m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) > δ + C A n A 1 , n d β A 2 , n ; A n d + o P ( 1 ) + o P ( 1 ) .
For p = 1 and p = ν n :
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 = E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 .
For 2 p υ n 1 , we obtain:
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 = E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 4 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
therefore, it suffices to show that:
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 .
By similar arguments as in [82], the usual symmetrization is applied and:
E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 2 E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 = 2 E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 1 D n h ( U 4 ) γ n + 2 E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) F 2 K 2 1 D n h ( U 4 ) > γ n = 2 III 1 + 2 III 2 ,
where
D n h ( U 4 ) = L n n 3 / 2 h 2 d ϕ 2 ( h ) 2 : min 1 i d 2 i = 3 ϵ ϵ 0 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j h W s i , A n ( u ) 2 1 / 2 F 2 K 2 .
In a similar way as in (75), we infer that
III 1 c 2 0 γ n log N t , F i 1 , i 2 , d ˜ n h , 2 ( 4 ) 1 / 2 d t ,
where
d ˜ n h , 2 ( 4 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) : = E ϵ L n n 3 / 2 h ϕ 1 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q ξ 1 i 1 ξ 1 i 2 K 2 , 1 d ( x 1 , η i 1 ) h K 2 , 1 d ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i 1 ξ 2 i 2 K 2 , 2 d ( x 1 , η i 1 ) h K 2 , 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) .
Since we have
E ϵ L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) A 1 , n d / 2 A 2 , n d h d + 1 ϕ ( h ) 1 A 1 , n d A 2 , n d L n h d 1 ϕ 4 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ξ i 1 ξ i 2 K 2 d ( x 1 , η i 1 ) h K 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 1 / 2 ,
and considering the semi-metric
d ˜ n h , 2 ( 5 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) : = 1 A 1 , n d A 2 , n d L n h d 1 ϕ 4 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ξ 1 i 1 ξ 1 i 2 K 2 , 1 d ( x 1 , η i 1 ) h K 2 , 1 d ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i 1 ξ 2 i 2 K 2 , 2 d ( x 1 , η i 1 ) h K 2 , 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 1 / 2 .
We demonstrate that the statement in (88) is bounded as follows
L n 1 / 2 A 2 , n d n 1 / 2 h 2 ϕ ( h ) 0 L n 1 / 2 A 2 , n d n 1 / 2 h 2 d γ n log N t , F i 1 , i 2 , d ˜ n h , 2 ( 5 ) 1 / 2 d t ,
by choosing γ n = n α for some α > ( 17 r 26 ) / 60 r , we obtain the convergence of the preceding quantity to zero. In order to bound the second term on the right-hand side of (86), we can mention that
III 2 = E L n n 3 / 2 h ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j ) h W s i , A n ( u ) F 2 K 2 1 D n h ( U 4 ) > γ n A 1 , n 1 A 2 , n n 1 / 2 h d ϕ 1 ( h ) P L n 2 n 3 h 2 ϕ 2 ( h n ) 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j ) h W s i , A n ( u ) 2 F 2 K 2 γ n 2 .
We are going to use the square root method on the last expression conditionally on Γ n ( 1 ; ϵ 0 ) R n . We denote by E ϵ ϵ 0 the expectation with respect to σ η i 2 , ϵ ϵ 0 and we will suppose that any class of functions F m is unbounded and its envelope function satisfies for some p > 2 :
θ p : = sup x S H m E F p ( Y ) | X = x < ,
for 2 r / ( r 1 ) < s < (in the notation in of [151] [Lemma 5.2]).
M n = L n 1 / 2 E ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d ( x j , η i j ) h W s i , A n ( u ) 2 ,
where
x = γ n 2 A 1 , n 5 d / 2 n 1 / 2 h m d / 2 ϕ m / 2 ( h ) , ρ = λ = 2 4 γ n A 1 , n 5 d / 4 n 1 / 4 h m d / 4 ϕ m / 4 ( h ) ,
and
m = exp γ n 2 n h 2 d ϕ 2 ( h n ) A 2 , n 2 d .
However, since we need t > 8 M n , and m , by similar arguments as in ([82], p. 69), we reach the convergence of (88) and (89) to zero.
(IV):
Blocks of Different Types
The target here is to prove that:
P sup F m K m sup x H m sup u B m 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n > δ 0 .
We have
n 3 / 2 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n W s i , A n F 2 K 2 c 2 L n A 1 , n d A 2 , n d n 3 / 2 h d ϕ 1 ( h ) 0 .
Hence, the proof of the lemma is complete.
The final step in the proof of Proposition 1 lies in the use of Lemma 1 to prove that the nonlinear term converges to zero. □

9.2.2. Proof of Theorem 1

We have
r ^ n ( m ) ( φ , x , u ; h n ) r ( m ) ( φ , x , u ) = 1 r ˜ 1 ( φ , x , u ) g ^ 1 ( u , x ) + g ^ 2 ( u , x ) r ( m ) ( φ , x , u ) r ˜ 1 ( φ , x , u ) ,
where
r ˜ 1 ( φ , u , x ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n , g ^ 1 ( u , x ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n j = 1 m ϵ s i j , A n , g ^ 2 ( u , x ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d ( x j , X s i j , A n ) h n × r ( m ) φ , X s i 1 , A n , , X s i m , A n , s i 1 A n , , s i m A n .
The proof of this theorem is involved and divided into the following four steps, where in each one, we aim to show that
Step 1.
sup F m K m sup x H m sup u B m | g ^ 1 ( u , x ) | = O P log n / n h m d ϕ m ( h ) .
Step 2.
sup F m K m sup x H m sup u B m | g ^ 2 ( u , x ) r ( m ) ( φ , u , x ) r ˜ 1 ( φ , u , x ; h n ) E . S ( g ^ 2 ( u , x ) r ( m ) ( φ , u , x ) r ˜ 1 ( φ , u , x ; h n ) ) | = O P log n / n h m d ϕ m ( h ) .
Step 3.
Let κ 2 = R x 2 K ( x ) d x .
sup F m K m sup x H m sup u B m E . S g ^ 2 ( u , x ) r ( m ) ( φ , u , x ) r ˜ 1 ( φ , u , x ; h n ) = O 1 A n d p ϕ ( h ) + o h 2 , P S a . s .
Step 4.
sup F m K m sup x H m sup u B m r ˜ 1 ( φ , u , x ) E . S r ˜ 1 ( φ , u , x ) = o P . S ( 1 ) .
It is clear that Step 1 follows directly from Proposition 1 for W s i , A n = j = 1 m ϵ s i j , A n . The second one (Step 2) holds also if we replace W s i , A n with g ^ 2 ( u , x ) r ( m ) ( φ , u , x ) r ˜ 1 ( φ , u , x ; h n ) then applying Proposition 1.
We now pass to Step 4. Observe that, for $W_{s_i,A_n}\equiv1$, the aforementioned proposition shows that
\[
\sup_{\mathcal F^m\mathcal K^m}\sup_{\mathbf x\in\mathcal H^m}\sup_{\mathbf u\in B^m}\big|\widetilde r_1(\varphi,\mathbf u,\mathbf x)-\mathbb E_{\cdot S}\,\widetilde r_1(\varphi,\mathbf u,\mathbf x)\big|=o_{\mathbb P_{\cdot S}}(1).
\]
Step 3 is treated in what follows.
Let $K_0:\mathbb R\to\mathbb R$ be a Lipschitz continuous function, compactly supported on $[-qC_1,qC_1]$ for some $q>1$, and such that $K_0(x)=1$ for $x\in[-C_1,C_1]$. We show that
\[
\mathbb E_{\cdot S}\big(\widehat g_2(\mathbf u,\mathbf x)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\widetilde r_1(\varphi,\mathbf u,\mathbf x;h_n)\big)=\sum_{i=1}^{4}Q_i(\mathbf u,\mathbf x),
\]
where the $Q_i$ are defined as follows:
\[
Q_i(\mathbf u,\mathbf x)=\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,q_i(\mathbf u,\mathbf x),
\]
such that
\[
\begin{aligned}
q_1(\mathbf u,\mathbf x)&=\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)\Bigg\{\prod_{j=1}^{m}K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)-\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg\}\\
&\hspace{4.5cm}\times\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i,A_n}\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big\}\Bigg],\\
q_2(\mathbf u,\mathbf x)&=\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i,A_n}\Big)-r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)\Big\}\Bigg],\\
q_3(\mathbf u,\mathbf x)&=\mathbb E_{\cdot S}\Bigg[\Bigg\{\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)-\prod_{j=1}^{m}K_0\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg\}\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\\
&\hspace{4.5cm}\times\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big\}\Bigg],\\
q_4(\mathbf u,\mathbf x)&=\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big\}\Bigg].
\end{aligned}
\]
Observe that
\[
|Q_1(\mathbf u,\mathbf x)|\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)\Bigg|\prod_{j=1}^{m}K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)-\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg|\,\Big|r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i,A_n}\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big|\Bigg];
\]
using the properties of $r^{(m)}(\mathbf u,\mathbf x)$, we can show that
\[
\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)\Big|r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i,A_n}\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big|\le Ch^{m},
\]
so that
\[
\begin{aligned}
|Q_1(\mathbf u,\mathbf x)|&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,\mathbb E_{\cdot S}\Bigg[Ch^{m}\,C\Bigg|\prod_{j=1}^{m}K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)-K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg|^{p}\Bigg]\\
&\qquad\text{(using the telescoping argument and the boundedness of }K_2\text{, for }p=\min(\rho,1)\text{ and }C<\infty)\\
&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,\mathbb E_{\cdot S}\Bigg[Ch^{m}\prod_{j=1}^{m}C\Big(\frac{A_n^{-d}}{h}U_{s_{i_j},A_n}\Big(\frac{s_{i_j}}{A_n}\Big)\Big)^{p}\Bigg]\\
&\le\frac{C}{A_n^{pd}\,\phi^{m}(h)\,h^{pm}}\quad\text{uniformly in }\mathbf u.
\end{aligned}
\]
In a similar way, using
\[
\mathbb E\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg]\le C\phi^{m-1}(h),
\]
and since $r^{(m)}(\cdot)$ is Lipschitz, $d\big(X_{s_{i_j},A_n},X_{s_{i_j}/A_n}(s_{i_j})\big)\le CA_n^{-d}U_{s_{i_j},A_n}\big(s_{i_j}/A_n\big)$, and the variable $U_{s_{i_j},A_n}\big(s_{i_j}/A_n\big)$ has a finite $p$-th moment, we see that
\[
\begin{aligned}
|Q_2(\mathbf u,\mathbf x)|&=\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\Bigg|\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_0\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\\
&\hspace{4.5cm}\times\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i,A_n}\Big)-r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)\Big\}\Bigg]\Bigg|\\
&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,\mathbb E_{\cdot S}\Bigg[\phi^{m-1}(h)\,\Big(CA_n^{-d}U_{s_{i_j},A_n}\Big(\frac{s_{i_j}}{A_n}\Big)\Big)^{p}\Bigg]\le\frac{C}{A_n^{pd}\,\phi(h)},
\end{aligned}
\]
and
\[
\sup_{\mathcal F^m\mathcal K^m}\sup_{\mathbf x\in\mathcal H^m}\sup_{\mathbf u\in I_h^m}|Q_3(\mathbf u,\mathbf x)|\lesssim\frac{1}{A_n^{pd}\,\phi^{m}(h)\,h^{pm}}.
\]
For the last term, we have
\[
Q_4(\mathbf u,\mathbf x)=\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Big\{r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big\}\Bigg].
\]
Using Lemma A1 and inequality (17) and under Assumption 1, it follows that
\[
\begin{aligned}
|Q_4(\mathbf u,\mathbf x)|&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Big|r^{(m)}\Big(\varphi,\frac{s_i}{A_n},X_{s_i/A_n}(s_i)\Big)-r^{(m)}(\varphi,\mathbf u,\mathbf x)\Big|\Bigg]\\
&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Big\{d_{\mathcal H^m}\big(X_{s_i/A_n}(s_i),\mathbf x\big)+\Big|\mathbf u-\frac{s_i}{A_n}\Big|\Big\}^{\alpha}\Bigg]\\
&\le\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\int_0^1\!\!\cdots\!\!\int_0^1\frac{1}{h^{m}}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-v_j}{h}\Big)dv_j\,\mathbb E_{\cdot S}\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d\big(x_j,X_{s_{i_j}/A_n}(s_{i_j})\big)}{h}\Big)\Bigg]\,h^{\alpha}\\
&\qquad+\frac{(n-m)!}{n!\,h^{md}\phi^{m}(h)}\sum_{\mathbf i\in I_n^m}\int_0^1\!\!\cdots\!\!\int_0^1\frac{1}{h^{md}}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-v_j}{h}\Big)dv_j\,\mathbb E_{\cdot S}\big[\phi^{m-1}(h)\big]\,h^{\alpha}\\
&=O_{\mathbb P_S}\big(h^{2\alpha}\big).
\end{aligned}
\]
Combining the results obtained for the $Q_i$, $1\le i\le4$, Step 3 yields the rate of convergence of the estimator. □

9.2.3. Proof of Theorem 2

Recall that
\[
\widehat r_n^{(m)}(\varphi,\mathbf x,\mathbf u;h_n)=\frac{\displaystyle\sum_{\mathbf i\in I_n^m}\varphi\big(Y_{s_{i_1},A_n},\dots,Y_{s_{i_m},A_n}\big)\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}{\displaystyle\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}.
\]
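To fix ideas, the display above can be computed directly. The following is a minimal numerical sketch, assuming $m=2$, $d=1$, sites on $[0,A_n]$, and a generic semi-metric `dist` on the functional space; the kernel choices and all helper names (`K_bar`, `K2`, `r_hat_m2`) are illustrative and not taken from the paper or from any package.

```python
import numpy as np

def K_bar(t):
    # spatial kernel (Epanechnikov, an admissible choice)
    return 0.75 * np.maximum(1.0 - t ** 2, 0.0)

def K2(t):
    # functional kernel supported on [0, 1]
    return np.maximum(1.0 - t, 0.0) * (t >= 0.0)

def r_hat_m2(phi, x, u, s, X, Y, A_n, h, dist):
    """Conditional U-statistic estimator for m = 2.
    phi  : symmetric function of two responses
    x, u : pairs (x1, x2) and (u1, u2) of evaluation points
    s, X, Y : sampling sites, functional covariates, responses."""
    n = len(s)
    num = den = 0.0
    for i1 in range(n):
        for i2 in range(n):
            if i1 == i2:
                continue
            w = (K_bar((u[0] - s[i1] / A_n) / h) * K2(dist(x[0], X[i1]) / h)
                 * K_bar((u[1] - s[i2] / A_n) / h) * K2(dist(x[1], X[i2]) / h))
            num += w * phi(Y[i1], Y[i2])
            den += w
    return num / den if den > 0 else np.nan
```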
For $\mathbf x\in\mathcal H^m$, $\mathbf y\in\mathcal Y^m$, define
\[
\begin{aligned}
G_{\varphi,\mathbf i}(\mathbf x,\mathbf y)&:=\frac{\prod_{j=1}^{m}K_2\big(d(x_j,X_{s_{i_j},A_n})/h_n\big)\,\varphi\big(Y_{s_{i_1},A_n},\dots,Y_{s_{i_m},A_n}\big)}{\mathbb E\Big[\prod_{j=1}^{m}K_2\big(d(x_j,X_{s_{i_j},A_n})/h_n\big)\Big]};\\
\mathcal G&:=\big\{G_{\varphi,\mathbf i}(\cdot,\cdot)\ :\ \varphi\in\mathcal F_m,\ \mathbf i=(i_1,\dots,i_m)\big\};\\
\mathcal G^{(k)}&:=\big\{\pi_{k,m}G_{\varphi,\mathbf i}(\cdot,\cdot)\ :\ \varphi\in\mathcal F_m\big\};\\
U_n(\varphi)=U_n^{(m)}(G_{\varphi,\mathbf i})&:=\frac{(n-m)!}{n!}\sum_{\mathbf i\in I_n^m}\prod_{j=1}^{m}\xi_{i_j}\,G_{\varphi,\mathbf i}(\mathbf X_{\mathbf i},\mathbf Y_{\mathbf i});
\end{aligned}
\]
and the U-empirical process is defined to be
\[
\mu_n(\varphi):=\sqrt{nh^{m}\phi(h)}\,\big(U_n(\varphi)-\mathbb E(U_n(\varphi))\big).
\]
Then
\[
\widetilde r_n^{(m)}(\varphi,\mathbf x,\mathbf u;h_n)=\frac{U_n(\varphi)}{U_n(1)}.
\]
In order to establish the weak convergence of our estimator, it must first be established for $\mu_n(\varphi)$. As mentioned before, we deal with unbounded classes of functions; this is why we truncate the function $G_{\varphi,\mathbf i}(\mathbf x,\mathbf y)$. Indeed, for $\lambda_n=n^{1/p}$ with $p>0$, we write:
\[
G_{\varphi,\mathbf i}(\mathbf x,\mathbf y)=G_{\varphi,\mathbf i}(\mathbf x,\mathbf y)\mathbf 1\{F(\mathbf y)\le\lambda_n\}+G_{\varphi,\mathbf i}(\mathbf x,\mathbf y)\mathbf 1\{F(\mathbf y)>\lambda_n\}:=G^{(T)}_{\varphi,\mathbf i}(\mathbf x,\mathbf y)+G^{(R)}_{\varphi,\mathbf i}(\mathbf x,\mathbf y).
\]
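The following small sketch shows this truncation step, under the assumption that the values of $G_{\varphi,\mathbf i}$ and of the envelope $F$ are available as arrays (the names `G_values` and `F_values` are hypothetical):

```python
import numpy as np

def split_truncated(G_values, F_values, n, p):
    # lambda_n = n^{1/p}; the split reproduces G = G^{(T)} + G^{(R)}
    lam = n ** (1.0 / p)
    mask = F_values <= lam
    G_T = np.where(mask, G_values, 0.0)    # truncated part G^{(T)}
    G_R = np.where(~mask, G_values, 0.0)   # remainder part G^{(R)}
    return G_T, G_R                        # G_T + G_R == G_values
```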
We can write the U-statistic as follows:
\[
\begin{aligned}
\mu_n(\varphi)&=\sqrt{nh^{m}\phi(h)}\Big(U_n^{(m)}\big(G^{(T)}_{\varphi,\mathbf i}\big)-\mathbb E\,U_n^{(m)}\big(G^{(T)}_{\varphi,\mathbf i}\big)\Big)+\sqrt{nh^{m}\phi(h)}\Big(U_n^{(m)}\big(G^{(R)}_{\varphi,\mathbf i}\big)-\mathbb E\,U_n^{(m)}\big(G^{(R)}_{\varphi,\mathbf i}\big)\Big)\\
&:=\sqrt{nh^{m}\phi(h)}\Big(U_n^{(T)}(\varphi,\mathbf i)-\mathbb E\,U_n^{(T)}(\varphi)\Big)+\sqrt{nh^{m}\phi(h)}\Big(U_n^{(R)}(\varphi)-\mathbb E\,U_n^{(R)}(\varphi)\Big)\\
&:=\mu_n^{(T)}(\varphi)+\mu_n^{(R)}(\varphi).
\end{aligned}
\]
The first term is the truncated part and the second is the remaining one. We have to prove that:
  • μ n ( T ) ( φ ) converges to a Gaussian process.
  • The remainder part does not matter much, in the sense that
    \[
    \sqrt{nh^{m}\phi(h)}\,\Big\|U_n^{(R)}(\varphi)-\mathbb E\,U_n^{(R)}(\varphi)\Big\|_{\mathcal F_m\mathcal K_m}\xrightarrow{\ \mathbb P\ }0.
    \]
For the first point, we use the Hoeffding decomposition, which is the same as the decomposition in Section 3.1, except that we replace $W_{i,n}$ by $\varphi(Y_{i,n})$:
\[
U_n^{(T)}(\varphi)-\mathbb E\,U_n^{(T)}(\varphi):=U_{1,n}(\varphi)+U_{2,n}(\varphi),
\]
where
\[
U_{1,n}(\varphi):=\frac1n\sum_{i=1}^{n}\widehat H_{1,i}(\mathbf u,\mathbf x,\varphi),\qquad
U_{2,n}(\varphi):=\frac{(n-m)!}{n!}\sum_{\mathbf i\in I_n^m}\xi_{i_1}\cdots\xi_{i_m}H_{2,\mathbf i}(z).
\]
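To illustrate how this decomposition isolates the dominant linear part, the following toy computation exhibits the Hájek term and the degenerate remainder for a plain order-2 U-statistic; the symmetric kernel g and its closed-form projection are specific to this toy example and only stand in for the weighted kernel of $U_n^{(T)}$.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
n = 200
Z = rng.standard_normal(n)
g = lambda a, b: a * b + np.abs(a - b)          # toy symmetric kernel

# population projection g1(z) = E[g(z, Z')], Z' ~ N(0,1), in closed form:
# E[z Z'] = 0 and E|z - Z'| = z(2 Phi(z) - 1) + 2 phi(z)
Phi = lambda z: 0.5 * (1.0 + erf(z / np.sqrt(2.0)))
phi = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
g1 = Z * (2.0 * Phi(Z) - 1.0) + 2.0 * phi(Z)
theta = 2.0 / np.sqrt(np.pi)                    # = E g(Z, Z')

U_n = np.mean([g(Z[i], Z[j]) for i in range(n) for j in range(n) if i != j])
U_1n = 2.0 * np.mean(g1 - theta)                # linear (Hajek) term
U_2n = U_n - theta - U_1n                       # degenerate remainder
print(U_n - theta, U_1n, U_2n)                  # U_2n is of smaller order
```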
The convergence of $U_{2,n}(\varphi)$ to zero in probability follows from Lemma 1. Hence, it is enough to show that $U_{1,n}(\varphi)$ converges weakly to a Gaussian process, denoted $\mathbb G(\varphi)$. To achieve this, we go through finite-dimensional convergence and equicontinuity.
The finite-dimensional convergence asserts that, for every finite set of functions $f_1,\dots,f_q$ in $L_2$ and with $\widetilde U$ denoting the centered form of $U$, the vector
\[
\Big(\sqrt{nh^{m}\phi(h)}\,\widetilde U_{1,n}(f_1),\dots,\sqrt{nh^{m}\phi(h)}\,\widetilde U_{1,n}(f_q)\Big)
\]
converges to the corresponding finite-dimensional distributions of the process $\mathbb G(\varphi)$. It is sufficient to show that, for every fixed collection $(a_1,\dots,a_q)\in\mathbb R^{q}$, we have
\[
\sum_{j=1}^{q}a_j\widetilde U_{1,n}(f_j)\rightsquigarrow\mathcal N\big(0,v^{2}\big),
\]
where
\[
v^{2}=\sum_{j=1}^{q}a_j^{2}\,\mathrm{Var}\big(\widetilde U_{1,n}(f_j)\big)+\sum_{s\neq r}a_sa_r\,\mathrm{Cov}\big(\widetilde U_{1,n}(f_s),\widetilde U_{1,n}(f_r)\big).
\]
Take
\[
h(\cdot)=\sum_{j=1}^{q}a_jf_j(\cdot).
\]
By linearity of $h(\cdot)$, it suffices to show that
\[
\widetilde U_{1,n}(h,\mathbf i)\rightsquigarrow\mathbb G(h).
\]
Let
\[
N=\mathbb E\Bigg[\prod_{j=1}^{m}K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)\Bigg].
\]
We have:
\[
\begin{aligned}
\widetilde U_{1,n}(h_n)&=N^{-1}\times\frac1n\sum_{i=1}^{n}\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\int\frac{1}{\phi(h)}K_2\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big)\\
&\qquad\times h\big(y_1,\dots,y_{\ell-1},Y_i,y_\ell,\dots,y_{m-1}\big)\prod_{\substack{j=1\\ j\neq i}}^{m-1}\frac{1}{\phi(h)}K_2\Big(\frac{d(x_j,X_{s_{i_j},A_n})}{h_n}\Big)\,\mathbb P\big(d(\nu_1,y_1),\dots,d(\nu_{\ell-1},y_{\ell-1}),d(\nu_\ell,y_\ell),\dots,d(\nu_{m-1},y_{m-1})\big)\\
&:=N^{-1}\,\frac1n\sum_{i=1}^{n}\xi_i\,\frac{1}{\phi(h)}K_2\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big)\widetilde h(Y_i).
\end{aligned}
\]
The next step requires an extension of Bernstein's blocking technique to the spatial process; all the relevant notions are defined in Section 9.1.
Recall that $L_n=L_{1,n}\cup L_{2,n}$ and define:
\[
Z_{s,A_n}(\mathbf u,\mathbf x):=\xi_i\,\frac{1}{\phi(h)}K_2\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big)\widetilde h(Y_i),
\]
and
\[
Z_n(\ell;\epsilon)=\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}Z_{s,A_n}(\mathbf u,\mathbf x)=\Big(Z_n^{(1)}(\ell;\epsilon),\dots,Z_n^{(p)}(\ell;\epsilon)\Big).
\]
Then, we have
\[
\widetilde U_{1,n}(h_n)=\sum_{i=1}^{n}Z_{s,A_n}(\mathbf u,\mathbf x)=\sum_{\ell\in L_n}Z_n(\ell;\epsilon_0)+\sum_{\epsilon\neq\epsilon_0}\underbrace{\sum_{\ell\in L_{1,n}}Z_n(\ell;\epsilon)}_{=:Z_{2,n}(\epsilon)}+\sum_{\epsilon\neq\epsilon_0}\underbrace{\sum_{\ell\in L_{2,n}}Z_n(\ell;\epsilon)}_{=:Z_{3,n}(\epsilon)}=:Z_{1,n}+\sum_{\epsilon\neq\epsilon_0}Z_{2,n}(\epsilon)+\sum_{\epsilon\neq\epsilon_0}Z_{3,n}(\epsilon).
\]
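The construction behind this decomposition can be visualized schematically. Below is a one-dimensional caricature of the blocking (the construction of Section 9.1 is $d$-dimensional and uses the lengths $A_{1,n}$, $A_{2,n}$): each site falls either in a "large" block, whose sums form $Z_{1,n}$, or in a "small" block, whose sums form the negligible pieces $Z_{2,n}$ and $Z_{3,n}$. All numerical values are arbitrary.

```python
import numpy as np

def assign_blocks(sites, A1, A2):
    """Classify sites on [0, A_n] into alternating large/small blocks."""
    period = A1 + A2
    ell = (sites // period).astype(int)   # block index (the role of l)
    large = (sites % period) < A1         # True -> large block, else small
    return ell, large

sites = np.sort(np.random.default_rng(1).uniform(0.0, 100.0, size=200))
ell, large = assign_blocks(sites, A1=8.0, A2=2.0)
# sums over {large} sites give Z_{1,n}; sums over {~large} give Z_{2,n}, Z_{3,n}
```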
Lemma A8 shows that $Z_{2,n}$ and $Z_{3,n}$, for $\epsilon\neq\epsilon_0$, are asymptotically negligible. The treatment of the variance of $Z_{1,n}$ is then clear: first, mixing conditions are used to replace the large blocks with independent random variables, and then Lyapunov's condition for the central limit theorem is applied to the sum of independent random variables. Similarly to the proof of Proposition 1, using Lemma A4 as in Equation (59), observe that
\[
\sup_{t>0}\Big|\mathbb P_{\cdot S}\big(\|Z_{1,n}\|>t\big)-\mathbb P_{\cdot S}\big(\|\breve Z_{1,n}\|>t\big)\Big|\le C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^{d}\big),
\]
where $\{\breve Z_n(\ell;\epsilon):\ \ell\in L_n\}$ denotes a sequence of independent random vectors in $\mathbb R^{p}$ under $\mathbb P_{\cdot S}$ such that
\[
\breve Z_n(\ell;\epsilon)\overset{d}{=}Z_n(\ell;\epsilon)\quad\text{under }\mathbb P_{S},\ \ell\in L_n.
\]
Applying Lyapunov's condition for the central limit theorem for sums of independent random variables, the remaining part of the finite-dimensional convergence is established.
We conclude with the asymptotic equicontinuity. We have to prove that:
\[
\lim_{\delta\to0}\limsup_{n\to\infty}\mathbb P\Big(\sqrt{nh^{m}\phi(h)}\,\big\|\widetilde U_{1,n}(h_n,\mathbf i)\big\|_{\mathcal F\mathcal K(\delta,\|\cdot\|_p)}>\epsilon\Big)=0,
\]
where
\[
\mathcal F\mathcal K(\delta,\|\cdot\|_p):=\Big\{\widetilde U_{1,n}(h_n)-\widetilde U'_{1,n}(h_n)\ :\ \big\|\widetilde U_{1,n}(h_n)-\widetilde U'_{1,n}(h_n)\big\|<\delta,\ \widetilde U_{1,n}(h_n),\widetilde U'_{1,n}(h_n)\in\mathcal F\mathcal K\Big\},
\]
for
\[
\begin{aligned}
\widetilde U_{1,n}(h_n)&=N^{-1}\frac1n\sum_{i=1}^{n}\xi_i\frac{1}{\phi(h)}K_{2,1}\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big)\widetilde h_1(Y_i)-\mathbb E\,U_{1,n}(h_n),\\
\widetilde U'_{1,n}(h_n)&=N^{-1}\frac1n\sum_{i=1}^{n}\xi_i\frac{1}{\phi(h)}K_{2,2}\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big)\widetilde h_2(Y_i)-\mathbb E\,U_{1,n}(h_n).
\end{aligned}
\]
At this point, we adapt the chaining technique of [82], used for the conditional setting with locally stationary processes in [152], to random fields, as in Lemma 1.
Using the same strategy as in Lemma 1 to pass from the sequence of locally stationary random variables to the stationary one, we find, for $\zeta_i=(\eta_i,\varsigma_i)$ the independent block sequences:
\[
\begin{aligned}
&\mathbb P\Bigg(\Big\|(n\phi(h))^{-1/2}h^{m/2}N^{-1}\sum_{i=1}^{n}\xi_iK_2\Big(\frac{d(x_i,X_i)}{h}\Big)\widetilde h(Y_i)-\mathbb E\,U_{1,n}(h_n)\Big\|_{\mathcal F\mathcal K(b,\|\cdot\|_p)}>\epsilon\Bigg)\\
&\quad\le2\,\mathbb P\Bigg(\Bigg\|(n\phi(h))^{-1/2}h^{m/2}N^{-1}\sum_{\ell\in L_n}\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon_0)\cap R_n}\xi_iK_2\Big(\frac{d(x_i,\eta_i)}{h}\Big)\widetilde h(\varsigma_i)-\frac{1}{n\phi(h_n)}\sum_{p=1}^{\upsilon_n}\sum_{i\in H_p(U)}\sum_{q:\,|q-p|\ge2}\mathbb E\,U_{1,n}(h_n)\Bigg\|_{\mathcal F\mathcal K(b,\|\cdot\|_p)}>\epsilon\Bigg)\\
&\qquad+C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^{d}\big)+o_{\mathbb P}(1).
\end{aligned}
\]
Taking advantage of condition (E2) in Assumption 6, we obtain $\beta\big(A_{2,n};A_n^{d}\big)\to0$ as $n\to\infty$; it then suffices to handle the first term on the right-hand side of (109). Since the blocks are independent, we symmetrize using a sequence $\{\epsilon_j\}_{j\in\mathbb N}$ of i.i.d. Rademacher variables, i.e., random variables with
\[
\mathbb P(\epsilon_j=1)=\mathbb P(\epsilon_j=-1)=1/2.
\]
It is important to notice that the sequence $\{\epsilon_j\}_{j\in\mathbb N}$ is independent of the sequence $\big(\xi_i=(\varsigma_i,\zeta_i)\big)_{i\in\mathbb N}$; therefore, it remains to establish, for all $\epsilon>0$ and $\delta>0$,
\[
\lim_{\delta\to0}\limsup_{n\to\infty}\mathbb P\Bigg(\Bigg\|(n\phi(h))^{-1/2}h^{m/2}N^{-1}\sum_{\ell\in L_n}\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon_0)\cap R_n}\xi_iK_2\Big(\frac{d(x_i,\eta_i)}{h}\Big)\widetilde h(\varsigma_i)-\frac{1}{n\phi(h_n)}\sum_{p=1}^{\upsilon_n}\sum_{i\in H_p(U)}\sum_{q:\,|q-p|\ge2}\mathbb E\,U_{1,n}(h_n,\mathbf i)\Bigg\|_{\mathcal F\mathcal K(b,\|\cdot\|_p)}>\epsilon\Bigg)<\delta.
\]
Define the semi-norm:
\[
\widetilde d_{n\phi,2}\big(\widetilde U_{1,n},\widetilde U'_{1,n}\big):=\Bigg((n\phi(h))^{-1}h^{m}N^{-2}\sum_{\ell\in L_n}\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon_0)\cap R_n}\Big[\Big(\xi_iK_{2,1}\Big(\frac{d(x_i,\eta_i)}{h}\Big)\widetilde h_1(\varsigma_i)-\mathbb E\,U_{1,n}(h_n,\mathbf i)\Big)-\Big(\xi_iK_{2,2}\Big(\frac{d(x_i,\eta_i)}{h}\Big)\widetilde h_2(\varsigma_i)-\mathbb E\,U_{1,n}(h_n,\mathbf i)\Big)\Big]^{2}\Bigg)^{1/2}
\]
and the covering number, defined for any class of functions $\mathcal E$ by:
\[
\widetilde N_{n\phi,2}(u,\mathcal E):=N_{n\phi,2}\big(u,\mathcal E,\widetilde d_{n\phi,2}\big).
\]
Thanks to the latter, we are able to bound (109) (more details are given in [83]). In the same way as in [83], and earlier in [82], owing to the independence between the blocks and Assumption 7 (C3), and by applying ([151], Lemma 5.2), the equicontinuity is achieved, and then the weak convergence follows.
Now, we need to show that:
\[
\big\|\mu_n^{(R)}(\varphi,t)\big\|_{\mathcal F_m\mathcal K_m}\xrightarrow{\ \mathbb P\ }0\quad\text{as }n\to\infty.
\]
For clarity, we restrict ourselves to $m=2$. Using the same notation as in Lemma 1, we have the following decomposition:
\[
\begin{aligned}
\mu_n^{(R)}(\varphi,\mathbf i)&=\sqrt{nh^{m+d}\phi(h_n)}\Big(U_n^{(R)}(\varphi,\mathbf i)-\mathbb E\,U_n^{(R)}(\varphi,\mathbf i)\Big)\\
&=\frac{\sqrt{nh^{m+d}\phi(h_n)}}{n(n-1)}\sum_{i_1\neq i_2}^{n}\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf t}\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)\Big]\\
&\le\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((X_i,X_j),(Y_i,Y_j)\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}(\cdots)\Big]\\
&\quad+\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{L_n}\sum_{\substack{i_1\neq i_2:\\ s_{i_1},s_{i_2}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((X_i,X_j),(Y_i,Y_j)\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}(\cdots)\Big]\\
&\quad+2\,\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}(\cdots)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}(\cdots)\Big]\\
&\quad+2\,\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_1}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf t}(\cdots)-\mathbb E\,G^{(R)}_{\varphi,\mathbf t}(\cdots)\Big]\\
&\quad+\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{2L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf t}(\cdots)-\mathbb E\,G^{(R)}_{\varphi,\mathbf t}(\cdots)\Big]\\
&\quad+\frac{1}{\sqrt{nh^{m+d}\phi(h_n)}}\frac{1}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{\substack{i_1<i_2:\\ s_{i_1},s_{i_2}\in\Gamma_n(\ell_1;\epsilon)\cap R_n}}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf t}(\cdots)-\mathbb E\,G^{(R)}_{\varphi,\mathbf t}(\cdots)\Big]\\
&=:\ \mathrm I+\mathrm{II}+\mathrm{III}+\mathrm{IV}+\mathrm V+\mathrm{VI},
\end{aligned}
\]
where $(\cdots)$ abbreviates the arguments $\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)$ of the corresponding term.
We shall employ blocking arguments and evaluate the resulting terms. We begin by examining the first term $\mathrm I$. We obtain
\[
\begin{aligned}
&\mathbb P\Bigg(\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf t}\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg)\\
&\quad\le\mathbb P\Bigg(\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg)\\
&\qquad+C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^{d}\big).
\end{aligned}
\]
Recall that, for all $\varphi\in\mathcal F_m$ and all $\mathbf x\in\mathcal H^{2}$, $\mathbf y\in\mathcal Y^{2}$:
\[
\mathbf 1\big\{d(\mathbf x,\mathbf X_{i,n})\le h\big\}\,F(\mathbf y)\ \ge\ \varphi(\mathbf y)\,K_2\Big(\frac{d(x_i,X_{s_i,A_n})}{h_n}\Big).
\]
Hence, by the symmetry of F ( · ) :
\[
\begin{aligned}
&\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf t}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}\\
&\quad\le\frac{1}{n\phi(h_n)}\sum_{p\neq q}^{\upsilon_n}\sum_{i\in H_p(U)}\sum_{j\in H_q(U)}\phi(h_n)\,\xi_i\xi_j\Big|F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}-\mathbb E\big(F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\big)\Big|.
\end{aligned}
\]
We apply Chebyshev's inequality, Hoeffding's trick, and Hoeffding's inequality, respectively, to obtain:
\[
\begin{aligned}
&\mathbb P\Bigg(\frac{1}{n\phi(h)}\Bigg|\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}-\mathbb E\big(F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\big)\Big]\Bigg|>\delta\Bigg)\\
&\quad\le\delta^{-2}n^{-1}\phi^{-1}(h)\,\mathrm{Var}\Bigg(\frac{1}{2L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\Bigg)\\
&\quad\le c\,2L_n\,\delta^{-2}n^{-1}\phi^{-1}(h)\,\mathrm{Var}\Bigg(\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\Bigg)\\
&\quad\le2c^{2}L_n\,\delta^{-2}n^{-2}\phi^{-1}(h)\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\,\mathbb E\Big[F(\zeta_1,\zeta_2)^{2}\mathbf 1\{F>\lambda_n\}\Big].
\end{aligned}
\]
Under Assumption 7 (iii), we have for each λ > 0 :
\[
\begin{aligned}
&c\,2L_n\,\delta^{-2}n^{-2}\phi^{-1}(h_n)\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\,\mathbb E\Big[F(\zeta_1,\zeta_2)^{2}\mathbf 1\{F>\lambda_n\}\Big]\\
&\quad=c\,2L_n\,\delta^{-2}n^{-2}\phi^{-1}(h_n)\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\int_0^{\infty}\mathbb P\Big(F(\zeta_1,\zeta_2)^{2}\mathbf 1\{F>\lambda_n\}\ge t\Big)\,dt\\
&\quad=c\,2L_n\,\delta^{-2}n^{-2}\phi^{-1}(h_n)\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon_0)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Bigg[\int_0^{\lambda_n}\mathbb P\big(F>\lambda_n\big)\,dt+\int_{\lambda_n}^{\infty}\mathbb P\big(F^{2}>t\big)\,dt\Bigg],
\end{aligned}
\]
which converges to 0 as $n\to\infty$. Terms $\mathrm{II}$, $\mathrm V$ and $\mathrm{VI}$ are handled in the same way as the last term, although $\mathrm{II}$ and $\mathrm{VI}$ do not follow exactly the same lines, because the variables $\{\zeta_i,\zeta_j\}_{\epsilon=\epsilon_0}$ (or $\{\zeta_i,\zeta_j\}_{\epsilon\neq\epsilon_0}$ for $\mathrm{VI}$) belong to the same blocks. Term $\mathrm{IV}$ can be deduced from the study of Terms $\mathrm I$ and $\mathrm{III}$. Considering the term $\mathrm{III}$, we have
\[
\begin{aligned}
&\mathbb P\Bigg(\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((X_i,X_j),(Y_i,Y_j)\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((X_{i_1},X_{i_2}),(Y_{i_1},Y_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg)\\
&\quad\le\mathbb P\Bigg(\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg)\\
&\qquad+\frac{L_nA_{1,n}^{d}A_{2,n}^{d}\,\beta\big(A_{2,n};A_n^{d}\big)}{n\phi(h_n)}.
\end{aligned}
\]
We also have
\[
\begin{aligned}
&\mathbb P\Bigg(\Bigg\|\frac{1}{\sqrt{n\phi(h_n)}}\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)\Big]\Bigg\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg)\\
&\quad\le\mathbb P\Bigg(\frac{1}{n\phi(h_n)}\sum_{p=1}^{\upsilon_n}\sum_{i\in H_p(U)}\sum_{q:\,|q-p|\ge2}\Big\|\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)-\mathbb E\,G^{(R)}_{\varphi,\mathbf i}\big((\varsigma_{i_1},\varsigma_{i_2}),(\zeta_{i_1},\zeta_{i_2})\big)\Big]\Big\|_{\mathcal F^2\mathcal K^2}>\delta\Bigg).
\end{aligned}
\]
Since (111) is still true, the problem can be reduced to
\[
\begin{aligned}
&\mathbb P\Bigg(\frac{1}{n\phi(h_n)}\Bigg|\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\Big[F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}-\mathbb E\big(F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\big)\Big]\Bigg|>\delta\Bigg)\\
&\quad\le\delta^{-2}n^{-1}\phi^{-1}(h_n)\,\mathrm{Var}\Bigg(\frac{1}{L_n}\sum_{i_1:\,s_{i_1}\in\Gamma_n(\ell_1;\epsilon_0)\cap R_n}^{\Delta_2}\frac{2}{L_{1,n}L_{2,n}}\sum_{\epsilon\neq\epsilon_0}\sum_{i_2:\,s_{i_2}\in\Gamma_n(\ell_2;\epsilon)\cap R_n}\phi(h_n)\,\xi_{i_1}\xi_{i_2}\,F(\zeta_i,\zeta_j)\mathbf 1\{F>\lambda_n\}\Bigg),
\end{aligned}
\]
and the identical technique as in (112) applies. The remainder term has thus been shown to be asymptotically negligible. Finally, combining the weak convergence of the numerator of $\widehat r^{(m)}(\varphi,\mathbf x,\mathbf u)$ around $\mathbb E\,U_n(\varphi,\mathbf i)$ with $U_n(1,\mathbf i)\xrightarrow{\ \mathbb P\ }1$, the weak convergence of our estimator is accomplished. □

Author Contributions

I.S. and S.B.: conceptualization, methodology, investigation, writing—original draft, writing—review and editing. All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the Special Issue Editor of the Special Issue on “Current Developments in Theoretical and Applied Statistics”, Christophe Chesneau, for the invitation. We extend our sincere thanks to the Editor-in-Chief, the Associate Editor, and the referees for their constructive comments, which led to numerous improvements over a previous version.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix contains supplementary material that is essential for a more comprehensive understanding of the paper.
Assumption A1.
(KD1) 
(KB2) in Assumption 2 holds.
(KD2) 
For any $\alpha\in\mathbb Z^{d}$ with $|\alpha|=1,2$, $\partial^{\alpha}f_S(s)$ exists and is continuous on $(0,1)^{d}$.
Define
\[
\widehat f_S(u)=\frac{1}{nh^{d}}\sum_{j=1}^{n}\bar K\Big(\frac{u-S_{0,j}}{h}\Big).
\]
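As an illustration only, a direct implementation of $\widehat f_S$ for $d=1$ with a Gaussian kernel (any kernel satisfying Assumption 8 would serve; all names are ours) reads:

```python
import numpy as np

def f_S_hat(u, S0, h):
    # S0: rescaled sites S_{0,j} in [0, 1]; Gaussian kernel, d = 1
    return np.mean(np.exp(-0.5 * ((u - S0) / h) ** 2)
                   / np.sqrt(2.0 * np.pi)) / h

S0 = np.random.default_rng(2).uniform(size=1000)
print(f_S_hat(0.5, S0, h=0.1))   # approximately 1 for uniform sites
```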
Lemma A1
([153], Theorem 2). Under Assumption 8 and $h\to0$ such that $nh^{d}/\log n\to\infty$ as $n\to\infty$, we have that
\[
\sup_{u\in[0,1]^{d}}\big|\widehat f_S(u)-f_S(u)\big|=O\Bigg(\sqrt{\frac{\log n}{nh^{d}}}+h^{2}\Bigg),\quad\mathbb P_S\text{-a.s.}
\]
Lemma A2.
Let $I_h=[C_1h,\,1-C_1h]$. Suppose that the kernel $K_1$ satisfies Assumption 8 part (i). Then, for $q=0,1,2$ and $m>1$:
\[
\sup_{u\in I_h^{m}}\Bigg|\frac{1}{n^{m}h^{md}}\sum_{\mathbf i\in I_n^{m}}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-S_{0,i_j}}{h_n}\Big)\Big(\frac{u_j-S_{0,i_j}}{h}\Big)^{q}-\int_{\mathbb R^{md}}\frac{1}{h^{md}}\prod_{j=1}^{m}\bar K\Big(\frac{u_j-\omega_j}{h_n}\Big)\Big(\frac{u_j-\omega_j}{h}\Big)^{q}f_S(\omega_j)\prod_{j=1}^{m}d\omega_j\Bigg|=O\Bigg(\sqrt{\frac{\log n}{nh^{dm}}}\Bigg),\quad\mathbb P_S\text{-a.s.}
\]
Lemma A3.
Suppose that the kernel $\bar K$ satisfies Assumption 8. Let $g:[0,1]^{md}\times\mathcal H^{m}\to\mathbb R$, $(\mathbf u,\mathbf x)\mapsto g(\mathbf u,\mathbf x)$, be continuously partially differentiable with respect to $u_j$. For $k=1,2$, we have
\[
\sup_{u\in I_h^{m}}\Bigg|\frac{1}{n^{m}h^{md}}\sum_{\mathbf i\in I_n^{m}}\prod_{j=1}^{m}\bar K^{k}\Big(\frac{u_j-S_{0,j}}{h_n}\Big)g\big(S_{0,j},x_j\big)-\prod_{j=1}^{m}\kappa_kf_S(u_j)g(u_j,x_j)\Bigg|=O\Bigg(\sqrt{\frac{\log n}{nh^{md}}}\Bigg)+o(h),\quad\mathbb P_S\text{-a.s.},
\]
where
\[
\kappa_k=\int_{\mathbb R^{d}}\bar K^{k}(x)\,dx.
\]
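For instance, for $d=1$ and the Epanechnikov kernel (an illustrative choice; SciPy is assumed available for the quadrature), the constants evaluate to $\kappa_1=1$ and $\kappa_2=3/5$:

```python
from scipy.integrate import quad

K = lambda x: 0.75 * max(1.0 - x ** 2, 0.0)        # Epanechnikov kernel
kappa_1 = quad(lambda x: K(x), -1.0, 1.0)[0]       # = 1
kappa_2 = quad(lambda x: K(x) ** 2, -1.0, 1.0)[0]  # = 3/5
print(kappa_1, kappa_2)
```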
For any probability measure $Q$ on a product measurable space $(\Omega_1\times\Omega_2,\ \Sigma_1\times\Sigma_2)$, we may define the $\beta$-mixing coefficient as follows:
Definition A1
([149], Definition 2.5). Let $Q_1$ and $Q_2$ be the marginal probability measures of $Q$ on $(\Omega_1,\Sigma_1)$ and $(\Omega_2,\Sigma_2)$, respectively. We set
\[
\beta\big(\Sigma_1,\Sigma_2,Q\big)=\mathbb E\,\sup\Big\{\big|Q(B\mid\Sigma_1)-Q_2(B)\big|\ :\ B\in\Sigma_2\Big\}.
\]
The following lemma holds for every finite $n$ and is essential for the construction of independent blocks for $\beta$-mixing sequences.
Lemma A4
([149], Corollary 2.7). Let $m\in\mathbb N$ and let $Q$ denote a probability measure on a product space $\big(\prod_{i=1}^{m}\Omega_i,\ \prod_{i=1}^{m}\Sigma_i\big)$ with the associated marginal measures $Q_i$ on $(\Omega_i,\Sigma_i)$. Assume that $h$ is a bounded measurable function on the product probability space such that $|h|\le M_h<\infty$. For $1\le a\le b\le m$, let $Q_a^{b}$ be the marginal measure on $\big(\prod_{i=a}^{b}\Omega_i,\ \prod_{i=a}^{b}\Sigma_i\big)$. For a given $\tau>0$, suppose that, for all $1\le k\le m-1$,
\[
\big\|Q-Q_1^{k}\times Q_{k+1}^{m}\big\|_{TV}\le2\tau,
\]
where $Q_1^{k}\times Q_{k+1}^{m}$ is the product measure and $\|\cdot\|_{TV}$ denotes the total variation norm. Then
\[
|Qh-Ph|\le2M_h(m-1)\tau,
\]
where $P=\prod_{i=1}^{m}Q_i$, $Qh=\int h\,dQ$ and $Ph=\int h\,dP$.
Lemma A5.
Let
\[
\mathcal I_n=\big\{\mathbf i\in\mathbb Z^{d}\ :\ \mathbf i+(0,1]^{d}\subset R_n\big\}.
\]
Then, we have
\[
\mathbb P_S\Bigg(\sum_{j=1}^{n}\mathbf 1\Big\{A_nS_{0,j}\in\big(\mathbf i+(0,1]^{d}\big)\cap R_n\Big\}>2\big(\log n+nA_n^{-d}\big)\ \text{for some }\mathbf i\in\mathcal I_n,\ \text{i.o.}\Bigg)=0
\]
and
\[
\mathbb P_S\Bigg(\sum_{j=1}^{n}\mathbf 1\big\{A_nS_{0,j}\in\Gamma_n(\ell;\epsilon)\big\}>CA_{1,n}^{q(\epsilon)}A_{2,n}^{d-q(\epsilon)}\,nA_n^{-d}\ \text{for some }\ell\in L_{1,n},\ \text{i.o.}\Bigg)=0
\]
for any $\epsilon\in\{1,2\}^{d}$, where “i.o.” stands for infinitely often.
Proof. 
See the proof in ([93], Lemma A.1) for each statement. □
Remark A1.
Lemma A5 implies that each $\Gamma_n(\ell;\epsilon)$ contains at most $CA_{1,n}^{q(\epsilon)}A_{2,n}^{d-q(\epsilon)}\,nA_n^{-d}$ samples, $\mathbb P_S$-almost surely.
Lemma A6.
Under Assumptions 2 and 3, Condition (B1) in Assumptions 4–6, and Assumption 8, we have:
\[
\mathbb E_{\cdot S}\big\|\bar S_n(\ell;\epsilon)\big\|^{2}\le CA_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h).
\]

Appendix A.1. Proof of Lemma A6

We have
\[
\mathbb E_{\cdot S}\big\|\bar S_n(\ell;\epsilon)\big\|^{2}=\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}\mathbb E_{\cdot S}\big[\bar S^{2}_{s,A_n}(\mathbf u,\mathbf x)\big]+\sum_{i\neq j:\,s_i,s_j\in\Gamma_n(\ell;\epsilon)\cap R_n}\mathbb E_{\cdot S}\big[\bar S_{s_i,A_n}(\mathbf u,\mathbf x)\bar S_{s_j,A_n}(\mathbf u,\mathbf x)\big],
\]
where
\[
\begin{aligned}
\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}\mathbb E_{\cdot S}\big[\bar S^{2}_{s,A_n}(\mathbf u,\mathbf x)\big]
&\le\Big(\frac{(n-m)!}{(n-1)!}\Big)^{2}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi^{2}_{i_1}\cdots\xi^{2}_{i_{\ell-1}}\xi^{2}_{i_\ell}\cdots\xi^{2}_{i_{m-1}}\\
&\qquad\times\int W_{s(\ell_1,\dots,\ell_{\ell-1},\ell,\dots,\ell_{m-1}),A_n}\prod_{\substack{j=1\\ j\neq i}}^{m-1}\frac{1}{\phi(h)}K_2\Big(\frac{d(x_j,\nu_{s_j,A_n})}{h}\Big)\mathbb P\big(d\nu_1,\dots,d\nu_{\ell-1},d\nu_\ell,\dots,d\nu_{m-1}\big)^{2}\\
&\qquad\times\Bigg\{\mathbb E_{\cdot S}\Bigg[\frac{1}{\phi^{2}(h)}K_2^{2}\Big(\frac{d(x_i,X_{s_i,A_n})}{h}\Big)W^{2}_{s_i,A_n}\Bigg]+\mathbb E_{\cdot S}\Bigg[\frac{1}{\phi^{2}(h)}K_2\Big(\frac{d(x_i,X_{s_i,A_n})}{h}\Big)W_{s_i,A_n}\Bigg]^{2}\Bigg\}\\
&\le\frac{C}{\phi^{2}(h)}\Big(\frac{(n-m)!}{(n-1)!}\Big)^{2}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi^{2}_{i_1}\cdots\xi^{2}_{i_{\ell-1}}\xi^{2}_{i_\ell}\cdots\xi^{2}_{i_{m-1}},\quad\mathbb P_S\text{-a.s.}
\end{aligned}
\]
Likewise, we can see that
\[
\mathbb E_{\cdot S}\big[\bar S_{s_i,A_n}(\mathbf u,\mathbf x)\bar S_{s_j,A_n}(\mathbf u,\mathbf x)\big]\le\frac{C}{\phi^{2}(h)}\Big(\frac{(n-m)!}{(n-1)!}\Big)^{2}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi^{2}_{i_1}\cdots\xi^{2}_{i_{\ell-1}}\xi^{2}_{i_\ell}\cdots\xi^{2}_{i_{m-1}},\quad\mathbb P_S\text{-a.s.}
\]
Applying Lemmas A5 and A1, we find that
\[
\begin{aligned}
\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}\bar K^{2}_h\Big(u-\frac{s_j}{A_n}\Big)\times\Big(\frac{(n-m)!}{(n-1)!}\Big)^{2}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi^{2}_{i_1}\cdots\xi^{2}_{i_{m-1}}
&\le C\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}\bar K_h\Big(u-\frac{s_i}{A_n}\Big)\times\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{m-1}}\\
&\le Ch^{md}\sum_{i:\,s_i\in\Gamma_n(\ell;\epsilon)\cap R_n}1\\
&\le Ch^{md}A_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big),\quad\mathbb P_S\text{-a.s.},
\end{aligned}
\]
and
\[
\begin{aligned}
&\sum_{i\neq j:\,s_i,s_j\in\Gamma_n(\ell,\epsilon)\cap R_n}\bar K_h\Big(u-\frac{s_i}{A_n}\Big)\bar K_h\Big(u-\frac{s_j}{A_n}\Big)\times\frac{(n-m)!}{(n-1)!}\Bigg(\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{m-1}}\Bigg)\Bigg(\sum_{I_{n-1}^{m-1}(j)}\sum_{\ell=1}^{m}\xi_{j_1}\cdots\xi_{j_{m-1}}\Bigg)\\
&\quad\le\Bigg(\sum_{j:\,s_j\in\Gamma_n(\ell;\epsilon)\cap R_n}\bar K_h\Big(u-\frac{s_j}{A_n}\Big)\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(j)}\sum_{\ell=1}^{m}\xi_{j_1}\cdots\xi_{j_{m-1}}\Bigg)^{2}\\
&\quad\le Ch^{2md}\Bigg(\sum_{j:\,s_j\in\Gamma_n(\ell;\epsilon)\cap R_n}1\Bigg)^{2}\le Ch^{2md}A_{1,n}^{2(d-1)}A_{2,n}^{2}\big(nA_n^{-d}+\log n\big)^{2},\quad\mathbb P_S\text{-a.s.}
\end{aligned}
\]
Since
\[
A_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h)\le A_{1,n}^{d}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h)=o(1),
\]
we have
\[
\mathbb E_{\cdot S}\big\|\bar S_n(\ell;\epsilon)\big\|^{2}\le C\Big[A_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h)+A_{1,n}^{2(d-1)}A_{2,n}^{2}\big(n^{2}A_n^{-2d}+\log^{2}n\big)h^{2md}\phi^{2}(h)\Big]\le CA_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h),\quad\mathbb P_S\text{-a.s.}
\]
Lemma A7 (Bernstein’s inequality).
Let $X_1,\dots,X_n$ be zero-mean, independent random variables. Assume that
\[
\max_{1\le i\le n}|X_i|\le M<\infty,\quad\text{a.s.}
\]
Then, for all $t>0$, we have
\[
\mathbb P\Bigg(\Big|\sum_{i=1}^{n}X_i\Big|\ge t\Bigg)\le\exp\Bigg(-\frac{t^{2}/2}{\sum_{i=1}^{n}\mathbb E X_i^{2}+Mt/3}\Bigg).
\]
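The inequality is easy to check by simulation. The sketch below compares the empirical tail of a sum of bounded, centered uniform variables with the Bernstein bound (all parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, M, reps = 200, 1.0, 20000
X = rng.uniform(-M, M, size=(reps, n))     # zero mean, |X_i| <= M
S = X.sum(axis=1)
var_sum = n * M ** 2 / 3.0                 # sum of E X_i^2
for t in (10.0, 20.0, 30.0):
    emp = np.mean(S >= t)
    bound = np.exp(-t ** 2 / (2.0 * (var_sum + M * t / 3.0)))
    print(t, emp, bound)                   # empirically, emp <= bound
```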
Lemma A8.
Under Assumptions 2–4 and 6, we have
\[
\frac{1}{nh^{md}\phi^{m}(h)}\mathrm{Var}_{\cdot S}\Bigg(\sum_{\ell\in L_{1,n}}Z_n(\ell;\epsilon)\Bigg)=o(1),\quad\mathbb P_S\text{-a.s.},\tag{A5}
\]
\[
\frac{1}{nh^{md}\phi^{m}(h)}\mathrm{Var}_{\cdot S}\Bigg(\sum_{\ell\in L_{2,n}}Z_n(\ell;\epsilon)\Bigg)=o(1),\quad\mathbb P_S\text{-a.s.}\tag{A6}
\]

Appendix A.2. Proof of Lemma A8

We have
\[
\frac{1}{nh^{md}\phi(h)}\mathrm{Var}_{\cdot S}\Bigg(\sum_{\ell\in L_{1,n}}Z_n(\ell;\epsilon)\Bigg)=\frac{1}{nh^{md}\phi(h)}\sum_{\ell\in L_{1,n}}\mathbb E_{\cdot S}\big\|Z_n(\ell;\epsilon)\big\|^{2}+\frac{1}{nh^{md}\phi(h)}\sum_{\ell_1\neq\ell_2\in L_{1,n}}\mathbb E_{\cdot S}\big[Z_n(\ell_1;\epsilon)Z_n(\ell_2;\epsilon)\big]:=I_1+I_2.
\]
Using Lemma A6 and Assumption 4, it is easy to see that
\[
I_1\le C\,\frac{1}{nh^{md}\phi(h)}\Big(\frac{A_n}{A_{1,n}}\Big)^{d}A_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}\phi(h)\le C\,\frac{A_{2,n}}{A_{1,n}}\,\log n=o(1).
\]
For $I_2$, using ([154], Theorem 1.1), we have:
\[
\begin{aligned}
\mathbb E_{\cdot S}\big[Z_n(\ell_1;\epsilon)Z_n(\ell_2;\epsilon)\big]&\le\Big(\mathbb E_{\cdot S}\big|Z_n(\ell_1;\epsilon)\big|^{3}\Big)^{1/3}\Big(\mathbb E_{\cdot S}\big|Z_n(\ell_2;\epsilon)\big|^{3}\Big)^{1/3}\beta^{1/3}\big(d(\ell_1,\ell_2)A_{2,n},A_{1,n}^{md}\big)\\
&\le\Big(\mathbb E_{\cdot S}\big|Z_n(\ell_1;\epsilon)\big|^{3}\Big)^{1/3}\Big(\mathbb E_{\cdot S}\big|Z_n(\ell_2;\epsilon)\big|^{3}\Big)^{1/3}\beta_1^{1/3}\big(d(\ell_1,\ell_2)A_{2,n}\big)g_1^{1/3}\big(A_{1,n}^{md}\big).
\end{aligned}
\]
The first inequality holds by Equation (8), with $d(\ell_1,\ell_2)=\max_{1\le j\le d}|\ell_{1j}-\ell_{2j}|$. Using the same strategy as in Lemma A6, we have
\[
\mathbb E_{\cdot S}\big|Z_n(\ell_1;\epsilon)\big|^{3}\le CA_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md},
\]
and
\[
\mathbb E_{\cdot S}\big|Z_n(\ell_2;\epsilon)\big|^{3}\le CA_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{md}.
\]
Note that, for $\ell_1,\ell_2\in L_{1,n}$, $\Gamma(\ell_1;\epsilon_0)$ and $\Gamma(\ell_2;\epsilon_0)$ in $R_n$ are separated by the distance
\[
d\big(\Gamma(\ell_1;\epsilon_0),\Gamma(\ell_2;\epsilon_0)\big)\ge|\ell_1-\ell_2|_{d}+A_{3,n}+A_{2,n}.
\]
Consequently,
\[
\begin{aligned}
I_2&\le\frac{C\Big[A_{1,n}^{d-1}A_{2,n}\big(nA_n^{-d}+\log n\big)h^{p+d}\Big]^{2/3}}{nh^{d+p}}\times\sum_{\substack{\ell_1,\ell_2\in L_{1,n}\\ \ell_1\neq\ell_2}}\beta_1^{1/3}\big(|\ell_1-\ell_2|_{d}+A_{3,n}+A_{2,n}\big)g_1^{1/3}\big(A_{1,n}^{d}\big)\\
&\le C\Big(\frac{1}{nh^{d+p}}\Big)^{1/3}\Bigg[\Big(\frac{A_{1,n}}{A_n}\Big)^{2d/3}\Big(\frac{A_{2,n}}{A_{1,n}}\Big)^{2/3}+\frac{A_{1,n}^{(d-1)/3}A_{2,n}^{1/3}(\log n)^{1/3}}{\big(nh^{d+p}\big)^{1/3}}\Bigg]\times g_1^{1/3}\big(A_{1,n}^{d}\big)\Bigg[\beta_1^{1/3}\big(A_{2,n}\big)+\sum_{k=1}^{A_n/A_{1,n}}k^{d-1}\beta_1^{1/3}\big(kA_{3,n}+A_{2,n}\big)\Bigg]=o(1).
\end{aligned}
\]
The last inequality follows from Assumption 4 and
\[
|\ell_1-\ell_2|=\sum_{j=1}^{d}|\ell_{1,j}-\ell_{2,j}|.
\]
Equation (A6) can be treated similarly to (A5). □
Remark A2.
In order to prove that the summation over the small blocks is asymptotically negligible, we can use the method of [87], where one first passes from the dependence structure of the variables to independence, and then proves the convergence of the second-order expectation to zero by means of a maximal inequality. This method avoids the treatment of covariances altogether.
Proposition A1
([27], Proposition 3.6). Let $\{X_i:\ i\in T\}$ be a process satisfying, for $m\ge1$:
\[
\big(\mathbb E|X_i-X_j|^{p}\big)^{1/p}\le\Big(\frac{p-1}{q-1}\Big)^{m/2}\big(\mathbb E|X_i-X_j|^{q}\big)^{1/q},\qquad1<q<p<\infty,
\]
and the semi-metric:
\[
\rho(j,i)=\big(\mathbb E|X_i-X_j|^{2}\big)^{1/2}.
\]
There exists a constant K = K ( m ) such that:
\[
\mathbb E\sup_{i,j\in T}|X_i-X_j|\le K\int_{0}^{D}\big[\log N(\epsilon,T,\rho)\big]^{m/2}\,d\epsilon,
\]
where $D$ is the $\rho$-diameter of $T$.
Lemma A9
([155]). Let $X_1,\dots,X_n$ be a sequence of independent random elements taking values in a Banach space $(B,\|\cdot\|)$ with $\mathbb EX_i=0$ for all $i$. Let $(\varepsilon_i)$ be a sequence of independent Bernoulli random variables, independent of $(X_i)$. Then, for any convex increasing function $\Phi$,
\[
\mathbb E\,\Phi\Bigg(\frac12\Big\|\sum_{i=1}^{n}X_i\varepsilon_i\Big\|\Bigg)\le\mathbb E\,\Phi\Bigg(\Big\|\sum_{i=1}^{n}X_i\Big\|\Bigg)\le\mathbb E\,\Phi\Bigg(2\Big\|\sum_{i=1}^{n}X_i\varepsilon_i\Big\|\Bigg).
\]
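A quick Monte Carlo illustration of this symmetrization inequality, taking $B=\mathbb R$ with the absolute value and $\Phi(x)=x$ (a convex increasing choice on the relevant range), shows the stated ordering of the three expectations:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 20000
X = rng.standard_normal((reps, n))              # centered, independent
eps = rng.choice([-1.0, 1.0], size=(reps, n))   # Rademacher, independent of X
sym = np.abs((X * eps).sum(axis=1))
raw = np.abs(X.sum(axis=1))
print(0.5 * sym.mean(), raw.mean(), 2.0 * sym.mean())
# the middle value lies between the outer two, as Lemma A9 asserts
```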

References

  1. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman & Hall: London, UK, 1986; pp. x+175.
  2. Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Mathematics and Its Applications (Soviet Series); Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1989; Volume 20, pp. x+213.
  3. Härdle, W. Applied Nonparametric Regression; Econometric Society Monographs; Cambridge University Press: Cambridge, UK, 1990; Volume 19, pp. xvi+333.
  4. Wand, M.P.; Jones, M.C. Kernel Smoothing; Monographs on Statistics and Applied Probability; Chapman and Hall, Ltd.: London, UK, 1995; Volume 60, pp. xii+212.
  5. Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation. Density Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; Volume I, pp. xviii+510.
  6. Devroye, L.; Lugosi, G. Combinatorial Methods in Density Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; pp. xii+208.
  7. Ripley, B.D. Spatial statistics: Developments 1980–1983. Internat. Statist. Rev. 1984, 52, 141–150.
  8. Rosenblatt, M. Stationary Sequences and Random Fields; Birkhäuser Boston, Inc.: Boston, MA, USA, 1985; p. 258.
  9. Guyon, X. Random Fields on a Network; Probability and Its Applications (New York); Springer: New York, NY, USA, 1995; pp. xii+255.
  10. Cressie, N.A.C. Statistics for Spatial Data, revised ed.; Wiley Classics Library, John Wiley & Sons, Inc.: New York, NY, USA, 2015; pp. xx+900.
  11. Tran, L.T. Kernel density estimation on random fields. J. Multivar. Anal. 1990, 34, 37–53.
  12. Tran, L.T.; Yakowitz, S. Nearest neighbor estimators for random fields. J. Multivar. Anal. 1993, 44, 23–46.
  13. Biau, G.; Cadre, B. Nonparametric spatial prediction. Stat. Inference Stoch. Process. 2004, 7, 327–349.
  14. Dabo-Niang, S.; Yao, A.F. Kernel spatial density estimation in infinite dimension space. Metrika 2013, 76, 19–52.
  15. Ndiaye, M.; Dabo-Niang, S.; Ngom, P. Nonparametric prediction for spatial dependent functional data under fixed sampling design. Rev. Colomb. Estadíst. 2022, 45, 391–428.
  16. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Statistics 1948, 19, 293–325.
  17. Stute, W. Almost sure representations of the product-limit estimator for truncated data. Ann. Statist. 1993, 21, 146–156.
  18. Arcones, M.A.; Wang, Y. Some new tests for normality based on U-processes. Statist. Probab. Lett. 2006, 76, 69–82.
  19. Giné, E.; Mason, D.M. Laws of the iterated logarithm for the local U-statistic process. J. Theoret. Probab. 2007, 20, 457–485.
  20. Giné, E.; Mason, D.M. On local U-statistic processes and the estimation of densities of functions of several sample variables. Ann. Statist. 2007, 35, 1105–1145.
  21. Schick, A.; Wang, Y.; Wefelmeyer, W. Tests for normality based on density estimators of convolutions. Statist. Probab. Lett. 2011, 81, 337–343.
  22. Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stoch. Process. Appl. 2016, 126, 3760–3773.
  23. Lee, S.; Linton, O.; Whang, Y.J. Testing for stochastic monotonicity. Econometrica 2009, 77, 585–602.
  24. Ghosal, S.; Sen, A.; van der Vaart, A.W. Testing monotonicity of regression. Ann. Statist. 2000, 28, 1054–1082.
  25. Abrevaya, J.; Jiang, W. A nonparametric approach to measuring and testing curvature. J. Bus. Econom. Statist. 2005, 23, 1–19.
  26. Nolan, D.; Pollard, D. U-processes: Rates of convergence. Ann. Statist. 1987, 15, 780–799.
  27. Arcones, M.A.; Giné, E. Limit theorems for U-processes. Ann. Probab. 1993, 21, 1494–1542.
  28. Sherman, R.P. Maximal inequalities for degenerate U-processes with applications to optimization estimators. Ann. Statist. 1994, 22, 439–459.
  29. de la Peña, V.H.; Giné, E. Decoupling. From Dependence to Independence, Randomly Stopped Processes. U-Statistics and Processes. Martingales and Beyond; Probability and Its Applications (New York); Springer: New York, NY, USA, 1999; pp. xvi+392.
  30. Halmos, P.R. The theory of unbiased estimation. Ann. Math. Stat. 1946, 17, 34–43.
  31. von Mises, R. On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 1947, 18, 309–348.
  32. Yoshihara, K.i. Limiting behavior of U-statistics for stationary, absolutely regular processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 1976, 35, 237–252.
  33. Borovkova, S.; Burton, R.; Dehling, H. Limit theorems for functionals of mixing processes with applications to U-statistics and dimension estimation. Trans. Amer. Math. Soc. 2001, 353, 4261–4318.
  34. Denker, M.; Keller, G. On U-statistics and v. Mises’ statistics for weakly dependent processes. Z. Wahrsch. Verw. Gebiete 1983, 64, 505–522.
  35. Leucht, A. Degenerate U- and V-statistics under weak dependence: Asymptotic theory and bootstrap consistency. Bernoulli 2012, 18, 552–585.
  36. Leucht, A.; Neumann, M.H. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386.
  37. Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2022, 1–56.
  38. Bouzebda, S.; Nezzal, A.; Zari, T. Uniform consistency for functional conditional U-statistics using delta-sequences. Mathematics 2022, 24, 3745.
  39. Soukarieh, I.; Bouzebda, S. Exchangeably Weighted Bootstraps of General Markov U-Process. Mathematics 2022, 10, 3745.
  40. Bouzebda, S.; Soukarieh, I. Renewal type bootstrap for increasing degree U-process of a Markov chain. J. Multivar. Anal. 2022, 195, 105143.
  41. Bouzebda, S.; Soukarieh, I. Renewal type bootstrap for U-process Markov chains. Markov Process. Related Fields 2022, 13, 1–50.
  42. Frees, E.W. Infinite order U-statistics. Scand. J. Statist. 1989, 16, 29–45.
  43. Rempala, G.; Gupta, A. Weak limits of U-statistics of infinite order. Random Oper. Stochastic Equ. 1999, 7, 39–52.
  44. Heilig, C.; Nolan, D. Limit theorems for the infinite-degree U-process. Statist. Sinica 2001, 11, 289–302.
  45. Song, Y.; Chen, X.; Kato, K. Approximating high-dimensional infinite-order U-statistics: Statistical and computational guarantees. Electron. J. Stat. 2019, 13, 4794–4848.
  46. Peng, W.; Coleman, T.; Mentch, L. Rates of convergence for random forests via generalized U-statistics. Electron. J. Stat. 2022, 16, 232–292.
  47. Faivishevsky, L.; Goldberger, J. ICA based on a Smooth Estimation of the Differential Entropy. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008; Koller, D., Schuurmans, D., Bengio, Y., Bottou, L., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2008; Volume 21.
  48. Liu, Q.; Lee, J.; Jordan, M. A Kernelized Stein Discrepancy for Goodness-of-fit Tests. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Balcan, M.F., Weinberger, K.Q., Eds.; PMLR: New York, NY, USA, 2016; Volume 48, pp. 276–284.
  49. Clémençon, S. On U-processes and clustering performance. In Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2011; Volume 24.
  50. Borovskikh, Y.V. U-Statistics in Banach Spaces; VSP: Utrecht, The Netherlands, 1996; pp. xii+420.
  51. Koroljuk, V.S.; Borovskich, Y.V. Theory of U-Statistics; Mathematics and Its Applications; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1994; Volume 273, pp. x+552.
  52. Lee, A.J. U-Statistics. Theory and Practice; Statistics: Textbooks and Monographs; Marcel Dekker Inc.: New York, NY, USA, 1990; Volume 110, pp. xii+302.
  53. Aneiros, G.; Cao, R.; Fraiman, R.; Genest, C.; Vieu, P. Recent advances in functional data analysis and high-dimensional statistics. J. Multivar. Anal. 2019, 170, 3–9.
  54. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis. Methods and Case Studies; Springer Series in Statistics; Springer: New York, NY, USA, 2002; pp. x+190.
  55. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis. Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006; pp. xx+258.
  56. Araujo, A.; Giné, E. The Central Limit Theorem for Real and Banach Valued Random Variables; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1980; pp. xiv+233.
  57. Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691.
  58. Bosq, D. Linear Processes in Function Spaces. Theory and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2000; Volume 149, pp. xiv+283.
  59. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; pp. xiv+422.
  60. Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949.
  61. Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of uniform consistency for nonparametric estimates with functional variables. J. Statist. Plann. Inference 2010, 140, 335–352.
  62. Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872.
  63. Kara-Zaitri, L.; Laksaci, A.; Rachdi, M.; Vieu, P. Uniform in bandwidth consistency for various kernel estimators involving functional data. J. Nonparametr. Stat. 2017, 29, 85–107.
  64. Attouch, M.; Laksaci, A.; Rafaa, F. On the local linear estimate for functional regression: Uniform in bandwidth consistency. Comm. Statist. Theory Methods 2019, 48, 1836–1853.
  65. Ling, N.; Meng, S.; Vieu, P. Uniform consistency rate of kNN regression estimation for functional time series data. J. Nonparametr. Stat. 2019, 31, 451–468.
  66. Bouzebda, S.; Chaouch, M.; Laïb, N. Limiting law results for a class of conditional mode estimates for functional stationary ergodic data. Math. Methods Statist. 2016, 25, 168–195.
  67. Mohammedi, M.; Bouzebda, S.; Laksaci, A. The consistency and asymptotic normality of the kernel type expectile regression estimator for functional data. J. Multivar. Anal. 2021, 181, 104673.
  68. Bouzebda, S.; Mohammedi, M.; Laksaci, A. The k-Nearest Neighbors method in single index regression model for functional quasi-associated time series data. Rev. Mat. Complut. 2022, 1–30.
  69. Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 2022, 5, 431–533.
  70. Didi, S.; Al Harby, A.; Bouzebda, S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics 2022, 10, 3433.
  71. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat. 2022, 34, 250–281.
  72. Almanjahie, I.M.; Bouzebda, S.; Chikr Elmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model. 2022, 38, 47–63.
  73. Stute, W. Conditional U-statistics. Ann. Probab. 1991, 19, 812–825.
  74. Sen, A. Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A 1994, 56, 179–194.
  75. Prakasa Rao, B.L.S.; Sen, A. Limit distributions of conditional U-statistics. J. Theoret. Probab. 1995, 8, 261–301.
  76. Harel, M.; Puri, M.L. Conditional U-statistics for dependent random variables. J. Multivar. Anal. 1996, 57, 84–100.
  77. Stute, W. Symmetrized NN-conditional U-statistics. In Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231–237.
  78. Fu, K.A. An application of U-statistics to nonparametric functional data analysis. Comm. Statist. Theory Methods 2012, 41, 1532–1542.
  79. Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509.
  80. Bouzebda, S.; Elhattab, I.; Nemouchi, B. On the uniform-in-bandwidth consistency of the general conditional U-statistics based on the copula representation. J. Nonparametr. Stat. 2021, 33, 321–358.
  81. Jadhav, S.; Ma, S. Kendall’s Tau for Functional Data Analysis. arXiv 2019, arXiv:1912.03725.
  82. Arcones, M.A.; Yu, B. Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theoret. Probab. 1994, 7, 47–71.
  83. Bouzebda, S.; Nemouchi, B. Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Statist. 2019, 28, 169–207.
  84. Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stoch. Process. Appl. 2005, 115, 155–177.
  85. Kurisu, D. Nonparametric regression for locally stationary random fields under stochastic sampling design. Bernoulli 2022, 28, 1250–1275.
  86. Kurisu, D. Nonparametric regression for locally stationary functional time series. Electron. J. Stat. 2022, 16, 3973–3995.
  87. Kurisu, D.; Kato, K.; Shao, X. Gaussian approximation and spatially dependent wild bootstrap for high-dimensional spatial data. arXiv 2021, arXiv:2103.10720.
  88. Dahlhaus, R. Fitting time series models to nonstationary processes. Ann. Statist. 1997, 25, 1–37.
  89. Dahlhaus, R.; Subba Rao, S. Statistical inference for time-varying ARCH processes. Ann. Statist. 2006, 34, 1075–1114.
  90. van Delft, A.; Eichler, M. Locally stationary functional time series. Electron. J. Stat. 2018, 12, 107–170.
  91. Hall, P.; Patil, P. Properties of nonparametric estimators of autocovariance for stationary random fields. Probab. Theory Related Fields 1994, 99, 399–424.
  92. Matsuda, Y.; Yajima, Y. Fourier analysis of irregularly spaced data on ℝd. J. R. Stat. Soc. Ser. B Stat. Methodol. 2009, 71, 191–217.
  93. Lahiri, S.N. Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā 2003, 65, 356–388.
  94. Lahiri, S.N. Resampling Methods for Dependent Data; Springer Series in Statistics; Springer: New York, NY, USA, 2003; pp. xiv+374.
  95. Volkonskiĭ, V.A.; Rozanov, Y.A. Some limit theorems for random functions. I. Theor. Probability Appl. 1959, 4, 178–197.
  96. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 1956, 27, 832–837.
  97. Ibragimov, I.A.; Solev, V.N. A condition for the regularity of a Gaussian stationary process. Dokl. Akad. Nauk SSSR 1969, 185, 509–512.
  98. Bradley, R.C. A caution on mixing conditions for random fields. Statist. Probab. Lett. 1989, 8, 489–491.
  99. Bradley, R.C. Some examples of mixing random fields. Rocky Mountain J. Math. 1993, 23, 495–519.
  100. Doukhan, P. Mixing. Properties and Examples; Lecture Notes in Statistics; Springer: New York, NY, USA, 1994; Volume 85, pp. xii+142.
  101. Dedecker, J.; Doukhan, P.; Lang, G.; León, R.; Louhichi, S.; Prieur, C. Weak Dependence: With Examples and Applications; Lecture Notes in Statistics; Springer: New York, NY, USA, 2007; Volume 190, pp. xiv+318.
  102. Lahiri, S.N.; Zhu, J. Resampling methods for spatial regression models under a class of stochastic designs. Ann. Statist. 2006, 34, 1774–1813.
  103. Bandyopadhyay, S.; Lahiri, S.N.; Nordman, D.J. A frequency domain empirical likelihood method for irregularly spaced spatial data. Ann. Statist. 2015, 43, 519–545.
  104. Kolmogorov, A.N.; Tihomirov, V.M. ε-entropy and ε-capacity of sets in functional space. Amer. Math. Soc. Transl. 1961, 17, 277–364.
  105. Dudley, R.M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1967, 1, 290–330.
  106. Dudley, R.M. Uniform Central Limit Theorems; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1999; Volume 63, pp. xiv+436.
  107. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes. With Applications to Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1996; pp. xvi+508.
  108. Kosorok, M.R. Introduction to Empirical Processes and Semiparametric Inference; Springer Series in Statistics; Springer: New York, NY, USA, 2008; pp. xiv+483.
  109. Deheuvels, P. One bootstrap suffices to generate sharp uniform bounds in functional estimation. Kybernetika 2011, 47, 855–865.
  110. Vogt, M. Nonparametric regression for locally stationary time series. Ann. Statist. 2012, 40, 2601–2633.
  111. Mayer-Wolf, E.; Zeitouni, O. The probability of small Gaussian ellipsoids and associated conditional moments. Ann. Probab. 1993, 21, 14–24.
  112. Bogachev, V.I. Gaussian Measures; Mathematical Surveys and Monographs; American Mathematical Society: Providence, RI, USA, 1998; Volume 62, pp. xii+433.
  113. Li, W.V.; Shao, Q.M. Gaussian processes: Inequalities, small ball probabilities and applications. In Stochastic Processes: Theory and Methods; Handbook of Statist: Amsterdam, The Netherlands, 2001; Volume 19, pp. 533–597.
  114. Ferraty, F.; Mas, A.; Vieu, P. Nonparametric regression on functional data: Inference and practical aspects. Aust. N. Z. J. Stat. 2007, 49, 267–286.
  115. Lahiri, S.N.; Kaiser, M.S.; Cressie, N.; Hsu, N.J. Prediction of spatial cumulative distribution functions using subsampling. J. Amer. Statist. Assoc. 1999, 94, 86–110.
  116. van der Vaart, A.W. Asymptotic Statistics; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998; Volume 3, pp. xvi+443.
  117. Mason, D.M. Proving consistency of non-standard kernel estimators. Stat. Inference Stoch. Process. 2012, 15, 151–176.
  118. Bellet, A.; Habrard, A.; Sebban, M. A Survey on Metric Learning for Feature Vectors and Structured Data. arXiv 2013, arXiv:1306.6709.
  119. Clémençon, S.; Colin, I.; Bellet, A. Scaling-up empirical risk minimization: Optimization of incomplete U-statistics. J. Mach. Learn. Res. 2016, 17, 76.
  120. Jin, R.; Wang, S.; Zhou, Y. Regularized Distance Metric Learning: Theory and Algorithm. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., Culotta, A., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2009; Volume 22.
  121. Bellet, A.; Habrard, A. Robustness and generalization for metric learning. Neurocomputing 2015, 151, 259–267.
  122. Cao, Q.; Guo, Z.C.; Ying, Y. Generalization bounds for metric and similarity learning. Mach. Learn. 2016, 102, 115–132.
  123. Clémençon, S.; Robbiano, S. The TreeRank Tournament algorithm for multipartite ranking. J. Nonparametr. Stat. 2015, 27, 107–126.
  124. Clémençon, S.; Robbiano, S.; Vayatis, N. Ranking data with ordinal labels: Optimality and pairwise aggregation. Mach. Learn. 2013, 91, 67–104.
  125. Dudley, R.M. A course on empirical processes. In École d’été de Probabilités de Saint-Flour, XII—1982; Lecture Notes in Math; Springer: Berlin, Germany, 1984; Volume 1097, pp. 1–142.
  126. Polonik, W.; Yao, Q. Set-indexed conditional empirical and quantile processes based on dependent data. J. Multivar. Anal. 2002, 80, 234–255.
  127. Stute, W. Universally consistent conditional U-statistics. Ann. Statist. 1994, 22, 460–473.
  128. Stute, W. Lp-convergence of conditional U-statistics. J. Multivar. Anal. 1994, 51, 71–82.
  129. Maillot, B.; Viallon, V. Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Statist. 2009, 18, 159–184.
  130. Kohler, M.; Máthé, K.; Pintér, M. Prediction from randomly right censored data. J. Multivar. Anal. 2002, 80, 73–100.
  131. Carbonez, A.; Györfi, L.; van der Meulen, E.C. Partitioning-estimates of a regression function under random censoring. Statist. Decis. 1995, 13, 21–37.
  132. Brunel, E.; Comte, F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006, 15, 233–255.
  133. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Statist. Assoc. 1958, 53, 457–481.
  134. Bouzebda, S.; El-hadjali, T. Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 2020, 32, 864–914.
  135. Datta, S.; Bandyopadhyay, D.; Satten, G.A. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Stat. 2010, 37, 680–700.
  136. Stute, W.; Wang, J.L. Multi-sample U-statistics for censored data. Scand. J. Statist. 1993, 20, 369–374.
  137. Chen, Y.; Datta, S. Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Statist. Data Anal. 2019, 135, 1–14.
  138. Yuan, A.; Giurcanu, M.; Luta, G.; Tan, M.T. U-statistics with conditional kernels for incomplete data models. Ann. Inst. Statist. Math. 2017, 69, 271–302.
  139. Földes, A.; Rejto, L. A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Gebiete 1981, 56, 75–86.
  140. Bouzebda, S.; El-hadjali, T.; Ferfache, A.A. Uniform in bandwidth consistency of conditional U-statistics adaptive to intrinsic dimension in presence of censored data. Sankhya A 2022, 1–59.
  141. Hall, P. Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Gebiete 1984, 67, 175–196.
  142. Härdle, W.; Marron, J.S. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist. 1985, 13, 1465–1481.
  143. Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Statist. Plann. Inference 2007, 137, 2784–2801.
  144. Benhenni, K.; Ferraty, F.; Rachdi, M.; Vieu, P. Local smoothing regression with functional data. Comput. Statist. 2007, 22, 353–369.
  145. Shang, H.L. Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametr. Stat. 2014, 26, 599–615.
  146. Li, Q.; Maasoumi, E.; Racine, J.S. A nonparametric test for equality of distributions with mixed categorical and continuous data. J. Econom. 2009, 148, 186–200.
  147. Horowitz, J.L.; Spokoiny, V.G. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631.
  148. Gao, J.; Gijbels, I. Bandwidth selection in nonparametric kernel testing. J. Am. Statist. Assoc. 2008, 103, 1584–1594.
  149. Yu, B. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 1994, 22, 94–116.
  150. Bernstein, S. Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 1927, 97, 1–59.
  151. Giné, E.; Zinn, J. Some limit theorems for empirical processes. Ann. Probab. 1984, 12, 929–998.
  152. Bouzebda, S.; Soukarieh, I. Weak Convergence of the Conditional U-statistics for Locally Stationary Functional Time Series. Stat. Inference Stoch. Process. 2022.
  153. Masry, E. Multivariate local polynomial regression for time series: Uniform strong consistency and rates. J. Time Ser. Anal. 1996, 17, 571–599.
  154. Rio, E. Inequalities and Limit Theorems for Weakly Dependent Sequences. Available online: https://cel.hal.science/cel-00867106/ (accessed on 20 October 2022).
  155. de la Peña, V.H. Decoupling and Khintchine’s inequalities for U-statistics. Ann. Probab. 1992, 20, 1877–1892.