Article

Limit Theorems in the Nonparametric Conditional Single-Index U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design

Laboratory of Applied Mathematics of Compiègne (LMAC), Université de Technologie de Compiègne, CS 60319-60203 Compiègne Cedex, France
Mathematics 2024, 12(13), 1996; https://doi.org/10.3390/math12131996
Submission received: 18 February 2024 / Revised: 18 June 2024 / Accepted: 24 June 2024 / Published: 27 June 2024
(This article belongs to the Section Probability and Statistics)

Abstract

In his work published in (Ann. Probab. 19, No. 2 (1991), 812–825), W. Stute introduced the notion of conditional U-statistics, expanding upon the Nadaraya–Watson estimates used for regression functions. Stute illustrated the pointwise consistency and asymptotic normality of these statistics. Our research extends these concepts to a broader scope, establishing, for the first time, an asymptotic framework for single-index conditional U-statistics applicable to locally stationary random fields $\{X_{s,A_n} : s \in \mathcal{R}_n\}$ observed at irregularly spaced locations in $\mathcal{R}_n$, a subset of $\mathbb{R}^d$. We introduce an estimator for the single-index conditional U-statistics operator that accommodates the nonstationary nature of the data-generating process. Our method employs a stochastic sampling approach that allows for the flexible creation of irregularly spaced sampling sites, covering both pure and mixed increasing domain frameworks. We establish the uniform convergence rate and weak convergence of the single-index conditional U-processes. Specifically, we examine weak convergence under bounded or unbounded function classes that satisfy specific moment conditions. These findings are established under general structural conditions on the function classes and underlying models. The theoretical advancements outlined in this paper form essential foundations for potential breakthroughs in functional data analysis, laying the groundwork for future research in this field. Moreover, in the same context, we show the uniform consistency of the nonparametric inverse probability of censoring weighted (I.P.C.W.) estimators of the regression function under random censorship, which is of independent interest. Potential applications of our findings encompass, among many others, set-indexed conditional U-statistics, the Kendall rank correlation coefficient, and discrimination problems.

1. Introduction

Statisticians and probability theorists have thoroughly examined the regression problem, resulting in a diverse array of techniques. Extensive research has been conducted on various aspects, including modeling, estimation methods, applications, and tests. Nonparametric methods are particularly notable for their flexibility, as they do not rely on predefined structural information, unlike parametric methods, which estimate a specific number of parameters based on a predetermined model structure. However, nonparametric methods can suffer from estimation biases and slower convergence rates. Among these, nonparametric kernel function estimation techniques have garnered significant interest. For detailed references and insights into the research literature and statistical applications in this field, see [1,2,3,4] and other pertinent works. In addition to nonparametric methods, approaches such as neural networks, wavelet analysis, splines, and nearest-neighbor methods are widely employed for creating reliable function estimators. These methodologies are applied across various data types and domains. In the setting of spatial data analysis, for instance, this article constructs kernel-type estimators of conditional U-statistics. Numerous scientific disciplines, including econometrics, epidemiology, environmental science, image analysis, oceanography, meteorology, and geostatistics, routinely generate spatial data that are statistically analyzed at specific measurement locations. For comprehensive information and insights into statistical applications in spatial data analysis, refer to the research literature such as [5,6,7,8], and the references within these works. Recent advancements in nonparametric estimation for spatial data primarily focus on regression function and probability density estimation. Key references in this domain include works by [9,10,11,12,13,14], and other relevant sources cited within these works.
This research introduces a more general and abstract framework by exploring conditional U-processes for locally stationary random fields. The concept of U-statistics and U-processes has garnered significant interest due to their wide-ranging applications, including density estimation, nonparametric regression tests, and goodness-of-fit tests. U-processes effectively address complex statistical challenges across various contexts, such as higher-order terms in von Mises expansions. Notably, U-statistics play a crucial role in analyzing estimators and function estimators with varying degrees of smoothness. For instance, [15] analyzed the product limit estimator for truncated data, employing almost sure uniform bounds for $P$-canonical U-processes. Additionally, [16] introduced two novel normality tests based on U-processes, while [17] proposed new normality tests using weighted $L_1$-distances between the standard normal density and local U-statistics from standardized data. Further applications include the estimation of the mean of multivariate functions under possibly heavy-tailed distributions by [18]. For a comprehensive understanding of U-statistics and U-processes, see references such as [19,20,21]. The versatility of U-statistics extends to estimation and machine learning applications, with notable works on U-statistics with random kernels of divergent orders by [22,23,24]. Infinite-order U-statistics are particularly valuable for constructing simultaneous prediction intervals, crucial for quantifying uncertainty in ensemble methods like sub-bagging and random forests. Additionally, the MeanNN estimation method for differential entropy, introduced by [25], showcases the versatility of the U-statistic. Novel test statistics for goodness-of-fit tests using U-statistics were proposed by [26].
This study primarily focuses on situations involving locally stationary spatial–functional data. For a comprehensive understanding of functional data analysis, readers are encouraged to explore the foundational works [27,28], which offer diverse case studies from disciplines such as criminology, economics, archaeology, and neurophysiology. It is worth noting that the extension of probability theory to random variables taking values in normed vector spaces predates the recent literature on functional data. Works by [29,30] address challenges posed by functional data with infinite dimensions, while nonparametric models for regression estimation are discussed in works by [31,32,33]. In recent years, modern empirical process theory has found new applications in processing functional data. Ref. [34] established consistency rates for various conditional models, including regression functions and conditional cumulative distributions, uniformly over subsets of the explanatory variables. Additionally, ref. [35] demonstrated strong convergence rates for local linear estimation of the regression function with a functional regressor, uniformly across bandwidth parameters. In the domain of strong mixing functional time series data, ref. [36] explored the k-nearest neighbors (kNN) estimator of the nonparametric regression model, determining its uniform, almost complete convergence rate under mild conditions. Recent references on U-statistics include works by [37,38,39], among others.
Stute [40] introduced a set of estimators for $r^{(m)}(\varphi, \mathbf{t})$, known as conditional U-statistics, to extend the Nadaraya–Watson regression function estimators. First, we present Stute's estimators. Consider a regular sequence of random elements $\{(X_i, Y_i), i \in \mathbb{N}^*\}$, where $X_i \in \mathbb{R}^d$ and $Y_i \in \mathcal{Y}$, a Polish space (a topological space $\mathcal{Y}$ is Polish if its topology can be defined by a metric $\delta$ for which it is complete and separable; this means there exists a countable dense subset of $\mathcal{Y}$ or a countable base of open sets (e.g., p. 209, [41])), and $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. Let $\varphi : \mathcal{Y}^m \to \mathbb{R}$ be a measurable function. In this investigation, our primary focus is on estimating the conditional expectation, or regression function,
$$ r^{(m)}(\varphi, \mathbf{t}) = \mathbb{E}\big[ \varphi(Y_1, \ldots, Y_m) \mid (X_1, \ldots, X_m) = \mathbf{t} \big], \quad \text{for } \mathbf{t} \in \mathbb{R}^{dm}, $$
whenever it exists, i.e., whenever
$$ \mathbb{E}\big| \varphi(Y_1, \ldots, Y_m) \big| < \infty. $$
We now introduce a kernel function $K : \mathbb{R}^d \to \mathbb{R}$ with support contained in $[-B, B]^d$, $B > 0$, satisfying
$$ \sup_{x \in \mathbb{R}^d} |K(x)| =: \kappa < \infty \quad \text{and} \quad \int K(x)\, dx = 1. $$
Hence, the class of estimators for $r^{(m)}(\varphi, \mathbf{t})$, given by [40], is defined, for each $\mathbf{t} \in \mathbb{R}^{dm}$, as follows:
$$ \hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi(Y_{i_1}, \ldots, Y_{i_m})\, K\Big( \frac{t_1 - X_{i_1}}{h_n} \Big) \cdots K\Big( \frac{t_m - X_{i_m}}{h_n} \Big)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} K\Big( \frac{t_1 - X_{i_1}}{h_n} \Big) \cdots K\Big( \frac{t_m - X_{i_m}}{h_n} \Big)}, $$
where
$$ I_n^m = \big\{ \mathbf{i} = (i_1, \ldots, i_m) : 1 \le i_j \le n \text{ and } i_j \ne i_r \text{ if } j \ne r \big\} $$
denotes the set of all $m$-tuples of distinct integers $i_j$ between 1 and $n$, and $\{h_n\}_{n \ge 1}$ is a sequence of positive constants converging to zero at a rate such that $n h_n^m \to \infty$. For $m = 1$, $r^{(m)}(\varphi, \mathbf{t})$ becomes $r^{(1)}(\varphi, t) = \mathbb{E}(\varphi(Y) \mid X = t)$, and Stute's estimator reduces to the Nadaraya–Watson estimator of $r^{(1)}(\varphi, t)$. In this framework, Sen [42] aimed to estimate the rate of uniform convergence in $\mathbf{t}$ of $\hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n)$ to $r^{(m)}(\varphi, \mathbf{t})$. Meanwhile, the study by Prakasa Rao and Sen [43] explored the limit distributions of $\hat{r}_n^{(m)}(\varphi, \mathbf{t}; h_n)$, discussing and contrasting Stute's findings. Similarly, under appropriate mixing conditions, Harel and Puri [44] extended the results of [40] to weakly dependent data and applied their findings to validate the Bayes risk consistency of related discrimination rules. In another line of research, Stute [45] proposed symmetrized nearest-neighbor conditional U-statistics as alternatives to conventional kernel estimators. Fu [46] considered the functional conditional U-statistic and established its finite-dimensional asymptotic normality. Despite the relevance of the subject, the nonparametric estimation of conditional U-statistics in the functional data framework has not received significant attention. Recent developments are discussed in reference [47], where the authors examine challenges related to uniform-in-bandwidth consistency in a general framework.
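To make the ratio above concrete, the following minimal sketch evaluates a degree-two conditional U-statistic on simulated data. Everything here is illustrative: the data-generating process, the Epanechnikov product kernel, the bandwidth, and the Kendall-type U-kernel $\varphi(y_1, y_2) = \mathbb{1}\{y_1 < y_2\}$ are hypothetical choices, not the paper's specification.

```python
import numpy as np
from itertools import permutations

def stute_estimator(X, Y, t, phi, h, K):
    """Conditional U-statistic of degree m evaluated at t = (t_1, ..., t_m).

    X: (n, d) covariates, Y: (n,) responses, t: (m, d) evaluation points,
    phi: U-kernel taking m responses, h: bandwidth, K: kernel on R^d.
    """
    n, m = len(X), len(t)
    num = den = 0.0
    for idx in permutations(range(n), m):  # the index set I_n^m
        w = np.prod([K((t[j] - X[idx[j]]) / h) for j in range(m)])
        num += phi(*Y[list(idx)]) * w
        den += w
    return num / den if den > 0 else np.nan

# Toy usage with m = 2 and the Kendall-type kernel phi(y1, y2) = 1{y1 < y2}.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 1))
Y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(60)
epanechnikov = lambda u: float(np.prod(0.75 * np.maximum(1.0 - u**2, 0.0)))
t = np.array([[0.3], [0.7]])
print(stute_estimator(X, Y, t, lambda a, b: float(a < b), h=0.2, K=epanechnikov))
```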
The necessity for regression models facilitating dimension reduction has been strongly advocated by [28]. Recent developments in Functional Data Analysis underscore the significance of devising models to mitigate dimensionality effects (see [33,48,49] for recent surveys). Among these models, single-index models are prevalent, operating on the premise that the impact of predictors on the response can be simplified to a single index. This index represents a projection onto a specified direction and is coupled with a nonparametric link function, condensing predictors into a single-variable index while preserving essential characteristics. Importantly, utilizing a nonparametric link function solely on a one-dimensional index helps alleviate challenges associated with high-dimensional data, commonly referred to as the curse of dimensionality. The single-index model extends beyond traditional linear regression by integrating a link function equivalent to the identity function. For further insights, interested readers may refer to [50,51,52].
Semiparametric concepts naturally emerge as promising candidates for this purpose. In this context, the Functional Single-Index Model (FSIM) has been examined by [53,54]. Functional single-index quantile regression, introduced by [55], estimates the unknown slope function and link function using B-spline basis functions. A novel compact functional single-index model, with the coefficient function only nonzero in a subregion, was proposed by [56]. The estimation of a general functional single-index model, where the conditional distribution of the response depends on the functional predictor via a functional single-index structure, was investigated by [57]. Tang et al. [58] devised a unique estimation procedure combining functional principal component analysis of functional predictors, B-spline models for parameters, and profile estimation of unknown parameters and functions in the model. For strong mixing time series data, Ling et al. [59,60] investigated the estimation of the functional single-index regression model with missing responses at random. Feng et al. [61] introduced a functional single-index varying coefficient model with the functional predictor as the single-index part, utilizing functional principal components analysis and basis function approximation for estimating slope and coefficient functions through an iterative procedure. Novo et al. [62] developed an automatic and location-adaptive procedure for estimating regression in an FSIM based on k-Nearest Neighbours (kNN) ideas. In the realm of imaging data analysis, Li et al. [63] proposed a novel functional varying-coefficient single-index model for regression analysis of functional response data on a set of covariates. Attaoui and Ling [64] investigated a functional Hilbertian regressor for nonparametric estimation of the conditional cumulative distribution with a scalar response variable in the single-index structure. An alternative approach was presented by [65], encompassing the multi-index case and not anchoring the true parameter on a pre-specified sieve. Their methodology includes a detailed theoretical analysis of a direct kernel-based estimation scheme, establishing a polynomial convergence rate. In comparison to the aforementioned studies on time series analysis, research concerning nonparametric methods for locally stationary random fields is scarce, even for nonparametric density estimation and regression, despite empirical interest in modeling spatially varying dependence structures. Moreover, for the analysis of spatial data on $\mathbb{R}^d$, it is generally assumed that the underlying model is a parametric Gaussian process. For empirical motivation and discussion regarding the modeling of nonstationary random fields, we refer to [66], and for discussions regarding the importance of nonparametric, nonstationary, and non-Gaussian spatial models, see [67]. For additional references, see [13,68,69,70,71,72,73].

1.1. Paper Contribution

This study aims to establish a comprehensive framework and elucidate the weak convergence and consistency of locally stationary random function fields utilizing conditional U-processes within a single-index structure. Locally stationary processes, as introduced by [74], denote time series exhibiting nonstationary behavior, allowing parameters to vary over time. These processes can be approximated locally by stationary time series, facilitating the establishment of asymptotic theories for estimating time-varying characteristics. In time series analysis, locally stationary models are primarily examined within a parametric framework featuring coefficients that vary with time. Refer to [75] for general theory in the literature of locally stationary processes, and also see [76]. The analysis is intricate due to the inherent challenge of ensuring asymptotic equicontinuity with minimal conditions in this broad context, which remains unresolved in the existing literature. To address this gap, we propose integrating insights from [49,77,78], while incorporating techniques relevant to functional data from [73,79,80]. However, as detailed in the subsequent sections, addressing this challenge goes beyond a mere amalgamation of concepts from existing studies. Complex mathematical derivations are indispensable to tackle the specific characteristics of functional data in our context. Successfully addressing this necessitates applying large-sample theoretical tools previously established for empirical processes, with reference to the works of [13,49,77,78]. This paper rigorously addresses several technical challenges. The first issue pertains to the nonlinear extension of the single-index idea in the framework of the conditional U-processes. The second challenge involves extending the Hoeffding decomposition for non-stationary time series. Finally, the third problem arises from the unbounded nature of the function class, resulting in lengthy and technical proofs. Our main results concerning the uniform convergence with rate are presented in Proposition 1 and Theorem 1. In Theorem 2, we present our main result concerning the weak convergence. In the realm of spatial functional data, there exists a paucity of literature regarding single-index models. In a previous study [81], we examined the expectile single index within spatial functional data. However, the current paper addresses a broader and more abstract problem, amalgamating various techniques from functional data analysis, spatial data observed at irregular intervals, and the theory of empirical processes and U-processes within a dependent framework. The proofs of our results diverge significantly from those applicable to equally spaced time series (mixing sequence) or spatial data observed on lattice points. Our emphasis lies on spatial dependence and irregularly spaced sampling points. In numerous scientific domains like ecology, meteorology, seismology, and spatial econometrics, irregular sampling points are intrinsic due to physical constraints. Measurement stations cannot be uniformly distributed on a regular grid. The stochastic sampling design employed in this study accommodates non-uniform density among sampling sites across the region. This flexibility allows the number of sampling sites to vary at different rates relative to the region's volume, which is of order $O(A_n^d)$. For more comprehensive details and rationale, please refer to [73]. This paper extends our previous research [13] to encompass the single-index context.
Numerous significant questions within this framework remain unanswered, as outlined in Section 8.

1.2. Paper Organization

The structure of this article is as follows. In Section 2, we present the functional framework along with the necessary definitions for our study, and briefly outline the assumptions that form the basis of our asymptotic analysis. Section 3 discusses the uniform rates of strong convergence, while Section 4 reveals the main results concerning the uniform weak convergence of the conditional U-processes. Exploring potential applications is the focus of Section 5, including metric learning in Section 5.1, multipartite ranking in Section 5.3, discrimination in Section 5.4, and the Kendall rank correlation coefficient in Section 5.5. Section 6 delves into conditional U-statistics within the right-censored data framework, including conditional U-statistics for left-truncated and right-censored data in Section 6.1. The selection of bandwidth through cross-validation procedures is detailed in Section 7. Concluding remarks and possible future developments are covered in Section 8. To ensure a smooth flow of presentation, all proofs are consolidated in Section 9. Finally, Section 10 provides relevant technical results, and includes examples of classes of functions in Section 10.1 and examples of U-kernels in Section 10.2.

2. The Functional Framework

2.1. Notation

For each set $A \subset \mathbb{R}^d$, $|A|$ denotes the Lebesgue measure of $A$, and $[\![ A ]\!]$ represents the number of elements of $A$. For positive sequences $a_n$ and $b_n$, we write $a_n \lesssim b_n$ if there exists a constant $C > 0$, independent of $n$, such that $a_n \le C b_n$ for all $n$; $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$; and $a_n \ll b_n$ if $a_n / b_n \to 0$ as $n \to \infty$. The symbol $\stackrel{d}{\to}$ indicates convergence in distribution, and we write $X \stackrel{d}{=} Y$ to denote that the random variables $X$ and $Y$ have the same distribution. $\mathbb{P}_S$ denotes the joint probability distribution of the sequence of independent and identically distributed (i.i.d.) random vectors $\{\mathbf{S}_{0,j}\}_{j \ge 1}$, and $\mathbb{P}_{\cdot \mid S}$ is the conditional probability distribution given $\{\mathbf{S}_{0,j}\}_{j \ge 1}$. Additionally, $\mathbb{E}_{\cdot \mid S}$ represents the conditional expectation, and $\mathrm{Var}_{\cdot \mid S}$ the conditional variance, given $\{\mathbf{S}_{0,j}\}_{j \ge 1}$.

2.2. Generality on the Model

Let $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ represent the inner product and the associated norm on the Hilbert space $\mathcal{H}$, respectively, with $(e_p)_{p \ge 1}$ a complete orthonormal system of $\mathcal{H}$. Let $\mathcal{R}_n = [0, A_n]^d \subset \mathbb{R}^d$ be a sampling region with $A_n \to \infty$ as $n \to \infty$. We consider the semi-metric $d_{\theta}(\cdot, \cdot)$ associated with the single index $\theta \in \mathcal{H}$, defined by
$$ d_{\theta}(u, v) := |\langle \theta, u - v \rangle|, \quad \text{for } u, v \in \mathcal{H}. $$
In this investigation, we assume the existence of a fixed $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_m) \in \Theta^m \subset \mathcal{H}^m$ and $\mathbf{x} = (x_1, \ldots, x_m) \in \mathcal{H}^m$, and that the observations are related by the following relations, for $s_i \in \mathcal{R}_n$, $i = 1, \ldots, m$, $\mathbf{i} = (1, \ldots, m)$, and $\mathbf{s_i} = (s_1, \ldots, s_m)$:
$$ r^{(m)}\Big(\varphi, \mathbf{x}, \boldsymbol{\theta}, \frac{\mathbf{s_i}}{A_n}\Big) := \mathbb{E}\Big[ \varphi(Y_{s_1,A_n}, \ldots, Y_{s_m,A_n}) \,\Big|\, \langle X_{s_1,A_n}, \theta_1 \rangle = \langle x_1, \theta_1 \rangle, \ldots, \langle X_{s_m,A_n}, \theta_m \rangle = \langle x_m, \theta_m \rangle \Big], $$
and
$$ \varphi(Y_{s_1,A_n}, \ldots, Y_{s_m,A_n}) = r^{(m)}\Big(\varphi, \mathbf{x}, \boldsymbol{\theta}, \frac{\mathbf{s_i}}{A_n}\Big) + \sum_{j=1}^m \sigma\Big(\frac{s_j}{A_n}, X_{s_j,A_n}\Big)\,\epsilon_{i_j} =: r^{(m)}\Big(\varphi, \mathbf{x}, \boldsymbol{\theta}, \frac{\mathbf{s_i}}{A_n}\Big) + \sum_{j=1}^m \epsilon_{s_j,A_n}, $$
where $\mathbb{E}[\epsilon_{s,A_n} \mid X_{s,A_n}] = 0$ and $\frac{\mathbf{s_i}}{A_n} = (\frac{s_1}{A_n}, \ldots, \frac{s_m}{A_n})$. Here, $X_{s,A_n}$ and $Y_{s,A_n}$ represent random elements of $\mathcal{H}$ and $\mathcal{Y}$, respectively. To address the identifiability issue, we assume that the regression function is differentiable and that, for $i = 1, \ldots, m$, $\langle \theta_i, e_1 \rangle = 1$, where $e_1$ denotes the first element of the orthonormal basis of $\mathcal{H}$. For a more in-depth exploration of the identifiability problem in the single functional index model, one can refer to [53]. We view $\{X_{s,A_n} : s \in \mathcal{R}_n\}$ as a locally stationary random function field on $\mathcal{R}_n \subset \mathbb{R}^d$ ($d \ge 2$). As suggested by [74], locally stationary processes are nonstationary time series whose parameters can change over time; locally in time, they can be modeled by a stationary time series, allowing the use of asymptotic theories to estimate the parameters of models that depend on time. Time series analysis predominantly centers on locally stationary models within a parametric framework, characterized by coefficients that vary over time. Without assuming the single-index structure, the model (4) is given by
$$ r^{(m)}\Big(\varphi, \mathbf{x}, \frac{\mathbf{s_i}}{A_n}\Big) := \mathbb{E}\Big[ \varphi(Y_{s_1,A_n}, \ldots, Y_{s_m,A_n}) \,\Big|\, X_{s_1,A_n} = x_1, \ldots, X_{s_m,A_n} = x_m \Big], $$
and
$$ \varphi(Y_{s_1,A_n}, \ldots, Y_{s_m,A_n}) = r^{(m)}\Big(\varphi, \mathbf{x}, \frac{\mathbf{s_i}}{A_n}\Big) + \sum_{j=1}^m \sigma\Big(\frac{s_j}{A_n}, X_{s_j,A_n}\Big)\,\epsilon_{i_j} =: r^{(m)}\Big(\varphi, \mathbf{x}, \frac{\mathbf{s_i}}{A_n}\Big) + \sum_{j=1}^m \epsilon_{s_j,A_n}, $$
where $\mathbb{E}[\epsilon_{s,A_n} \mid X_{s,A_n}] = 0$. The following examples, given for the particular case $m = 1$ in the finite-dimensional setting and presented in [82], may help to illustrate the model (5), leaving aside for the moment the dependence structure of the processes we will consider.
Example 1
(Multivariate nonlinear time series). Consider the scenario where we observe multivariate time series $\{\mathbf{Y}_t = (Y_{1,t}, \ldots, Y_{p,t})^{\top}\}_{t=1}^T$ and $\{\mathbf{X}_t = (X_{1,t}, \ldots, X_{d,t})^{\top}\}_{t=1}^T$ such that
$$ Y_{j,t} = r_j(\mathbf{X}_t) + \sigma_j(\mathbf{X}_t)\,\varepsilon_{j,t}, \quad j = 1, \ldots, p. $$
The model (6) corresponds to (i) a multivariate nonlinear AR model when $\mathbf{X}_t = (\mathbf{Y}_{t-1}^{\top}, \ldots, \mathbf{Y}_{t-q}^{\top})^{\top}$ for some $q \ge 1$ and (ii) a multivariate nonlinear time series regression with exogenous variables when $\sigma_j(\cdot) \equiv 1$ and $\{\mathbf{X}_t\}_{t=1}^T$ is uncorrelated with $\{\boldsymbol{\varepsilon}_t = (\varepsilon_{1,t}, \ldots, \varepsilon_{p,t})^{\top}\}_{t=1}^T$. If one seeks to estimate the mean function $\mathbf{r} = (r_1, \ldots, r_p) : \mathbb{R}^d \to \mathbb{R}^p$, then it suffices to estimate each component $r_j(\cdot)$.
Example 2
(Time-varying nonlinear models). Consider a nonlinear time-varying model:
$$ Y_t = r^{(1)}\Big(\frac{t}{T}, Y_{t-1}, \ldots, Y_{t-p}\Big) + \sigma\Big(\frac{t}{T}, Y_{t-1}, \ldots, Y_{t-q}\Big) v_t, $$
where $1 \le p, q \le d - 1$. This example corresponds to the model (5) with $m = 1$ and $\mathbf{X}_t = (t/T, Y_{t-1}, \ldots, Y_{t-d+1})$, with $r^{(1)}(\cdot)$ and $\sigma(\cdot)$ considered as functions on $\mathbb{R}^d$ in the canonical way. If the random variables $v_t$ are i.i.d., then the model (7) corresponds to that considered in [83]. Moreover, the model (7) covers, for instance, time-varying AR(p)-ARCH(q) models when $r^{(1)}(u, x_1, \ldots, x_p) = r_0(u) + \sum_{j=1}^p r_j(u) x_j$ and $\sigma(u, x_1, \ldots, x_q) = \big(\sigma_0(u) + \sum_{j=1}^q \sigma_j(u) x_j^2\big)^{1/2}$, with some functions $r_j : [0,1] \to \mathbb{R}$, $\sigma_j : [0,1] \to [0,\infty)$, as referenced in [84].
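A minimal simulation sketch of the time-varying AR(1)-ARCH(1) special case of the model (7) may help fix ideas; the coefficient curves $r_0, r_1, \sigma_0, \sigma_1$ below are hypothetical smooth choices on $[0,1]$, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 1000
r0 = lambda u: 0.5 * np.sin(2.0 * np.pi * u)   # time-varying intercept
r1 = lambda u: 0.6 * u                         # time-varying AR(1) coefficient
s0 = lambda u: 0.2 + 0.3 * u                   # baseline volatility level
s1 = lambda u: 0.3 * (1.0 - u)                 # time-varying ARCH(1) coefficient

Y = np.zeros(T)
for t in range(1, T):
    u = t / T                                  # rescaled time in [0, 1]
    vol = np.sqrt(s0(u) + s1(u) * Y[t - 1] ** 2)
    Y[t] = r0(u) + r1(u) * Y[t - 1] + vol * rng.standard_normal()
```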

2.3. Local Stationarity

Consider a random function field $\{X_{s,A_n} : s \in \mathcal{R}_n\}$, where $A_n \to \infty$ as $n \to \infty$. This field is regarded as locally stationary if it behaves approximately stationarily in a neighborhood of each location. (For example, consider a continuous function $f : [0,1] \to \mathbb{R}$ and a sequence of i.i.d. random variables $\{\varepsilon_t\}_{t \in \mathbb{N}}$. The stochastic process $X_{t,T} = f(t/T) + \varepsilon_t$, $t \in \{1, \ldots, T\}$, $T \in \mathbb{N}$, can be expected to behave "almost" stationarily for $t$ close to $t^*$, for some $t^* \in \{1, \ldots, T\}$, since in this case $f(t^*/T) \approx f(t/T)$; but this process is not weakly stationary.) To ensure local stationarity around each rescaled space point $\mathbf{u}$, a process $X_{s,A_n}$ can be stochastically approximated by a stationary random function field $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$, as discussed, for example, in [85]. The following definition represents one possible approach to formalizing this concept.
Definition 1.
The $\mathcal{H}$-valued stochastic process $\{X_{s,A_n} : s \in \mathcal{R}_n\}$ is considered locally stationary if, for each rescaled time point $\mathbf{u} \in [0,1]^d$, there exists an associated $\mathcal{H}$-valued process $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ with the following properties:
(i) 
$\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ is strictly stationary.
(ii) 
It holds that
$$ d_{\theta}\big( X_{s,A_n}, X_{\mathbf{u}}(s) \big) \le \Big( \Big\| \frac{s}{A_n} - \mathbf{u} \Big\|_2 + \frac{1}{A_n^d} \Big)\, U_{s,A_n}(\mathbf{u}), \quad \text{a.s.}, $$
where $\{U_{s,A_n}(\mathbf{u})\}$ is a process of positive variables satisfying $\mathbb{E}[(U_{s,A_n}(\mathbf{u}))^{\rho}] < C$ for some $\rho > 0$ and $C < \infty$; $C$ is independent of $\mathbf{u}$, $s$, and $A_n$. $\|\cdot\|_2$ denotes an arbitrary norm on $\mathbb{R}^d$.
The concept of local stationarity for real-valued time series was initially introduced by [74]; Definition 1 is a natural extension of that idea. Additionally, the definition we provide aligns with (Definition 2.1, [86]) when $\mathcal{H}$ is the Hilbert space $L^2_{\mathbb{R}}([0,1])$ of all real-valued functions that are square integrable with respect to the Lebesgue measure on the unit interval $[0,1]$, with the $L^2$-norm given by
$$ \|f\|_2 = \sqrt{\langle f, f \rangle}, \qquad \langle f, g \rangle = \int_0^1 f(t)\, g(t)\, dt, $$
where $f, g \in L^2_{\mathbb{R}}([0,1])$. Additionally, the authors establish necessary conditions for an $L^2_{\mathbb{R}}([0,1])$-valued stochastic process $X_{t,A_n}$ to satisfy (8) with $d(f, g) = \|f - g\|_2$ and $\rho = 2$.

2.4. Sampling Design

We will explore a stochastic sampling strategy tailored to accommodate irregularly spaced data. Let $\mathcal{R}_n$ denote the sampling region. Consider a sequence of positive numbers $\{A_n\}_{n \ge 1}$ such that $A_n \to \infty$ as $n \to \infty$. The sampling region is defined as follows:
$$ \mathcal{R}_n = [0, A_n]^d. $$
Next, we will discuss the (random) sample designs to be employed. Let $f_S(\mathbf{s}_0)$ be a continuous, everywhere positive probability density function on $\mathcal{R}_0 = [0,1]^d$, and let $\{\mathbf{S}_{0,j}\}_{j \ge 1}$ be a sequence of i.i.d. random vectors with probability density $f_S(\mathbf{s}_0)$, such that $\{\mathbf{S}_{0,j}\}_{j \ge 1}$ and $\{X_{s,A_n} : s \in \mathcal{R}_n\}$ are defined on a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and are independent. The sampling sites $s_1, \ldots, s_n$ are obtained from the realizations $\mathbf{s}_{0,1}, \ldots, \mathbf{s}_{0,n}$ of the random vectors $\mathbf{S}_{0,1}, \ldots, \mathbf{S}_{0,n}$ via the relation
$$ s_j = A_n \mathbf{s}_{0,j}, \quad j = 1, \ldots, n. $$
Herein, we assume that $n / A_n^d \to \kappa$ for some $\kappa \in (0, \infty]$ as $n \to \infty$.
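The following short sketch generates such a stochastic sampling design; the Beta(2,2) product density standing in for $f_S$ is an arbitrary illustrative choice of a continuous, everywhere-positive density on $[0,1]^d$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, A_n = 2, 500, 20.0

# Hypothetical non-uniform design density f_S on [0, 1]^d: a product of
# Beta(2, 2) marginals (any continuous, everywhere-positive density works).
s0 = rng.beta(2.0, 2.0, size=(n, d))  # i.i.d. draws S_{0,j} from f_S
sites = A_n * s0                      # sampling sites s_j = A_n * S_{0,j} in R_n = [0, A_n]^d
```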
Remark 1.
In practical applications, $A_n$ can be determined by considering the diameter of the sampling region. We can extend the applicability of the assumption (9) to $\mathcal{R}_n$ for a broader range of situations, namely,
$$ \mathcal{R}_n = \prod_{j=1}^d [0, A_{j,n}], $$
where the $A_{j,n}$ are sequences of positive constants with $A_{j,n} \to \infty$ as $n \to \infty$. To avoid more cumbersome statements, we have retained the assumption (9). For further discussion, please refer to [73,80,87,88,89,90] and (Chapter 12, [91]).

2.5. Mixing Condition

The sequence $Z_1, Z_2, \ldots$ is considered $\beta$-mixing, or absolutely regular, as described in [92,93], if
$$ \beta(k) := \sup_{l \ge 1} \mathbb{E}\Big[ \sup\big\{ |\mathbb{P}(A \mid \sigma_1^l) - \mathbb{P}(A)| : A \in \sigma_{l+k}^{\infty} \big\} \Big] \longrightarrow 0 \quad \text{as } k \to \infty, $$
where $\sigma_a^b$ denotes the $\sigma$-field generated by $Z_a, \ldots, Z_b$. Notably, Ibragimov and Solev [94] provided a comprehensive description of the stationary Gaussian processes satisfying this condition. Now, we define $\beta$-mixing coefficients for a random function field $\tilde{X}$. Let $\sigma_{\tilde{X}}(T) = \sigma(\{\tilde{X}(s) : s \in T\})$ be the $\sigma$-field generated by the variables $\{\tilde{X}(s) : s \in T\}$, $T \subset \mathbb{R}^d$. For subsets $T_1$ and $T_2$ of $\mathbb{R}^d$, let
$$ \bar{\beta}(T_1, T_2) = \sup \frac{1}{2} \sum_{j=1}^{J} \sum_{k=1}^{K} |\mathbb{P}(A_j \cap B_k) - \mathbb{P}(A_j)\,\mathbb{P}(B_k)|, $$
where the supremum is taken over all pairs of (finite) partitions $\{A_1, \ldots, A_J\}$ and $\{B_1, \ldots, B_K\}$ such that $A_j \in \sigma_{\tilde{X}}(T_1)$ and $B_k \in \sigma_{\tilde{X}}(T_2)$. Furthermore, let
$$ d(T_1, T_2) = \inf\{ |x - y| : x \in T_1,\ y \in T_2 \}, $$
where $|x| = \sum_{j=1}^d |x_j|$ for $x \in \mathbb{R}^d$, and let $\mathcal{R}(b)$ be the collection of all finite disjoint unions of cubes in $\mathbb{R}^d$ with total volume not exceeding $b$. Subsequently, the $\beta$-mixing coefficients for the random field $\tilde{X}$ can be defined as
$$ \beta(a; b) = \sup\{ \bar{\beta}(T_1, T_2) : d(T_1, T_2) \ge a,\ T_1, T_2 \in \mathcal{R}(b) \}. $$
We assume that there exist a non-increasing function $\beta_1(\cdot)$ with $\lim_{a \to \infty} \beta_1(a) = 0$ and a non-decreasing function $g_1(\cdot)$ such that the $\beta$-mixing coefficient $\beta(a; b)$ satisfies the inequality
$$ \beta(a; b) \le \beta_1(a)\, g_1(b), \quad a > 0,\ b > 0, $$
where $g_1(\cdot)$ may be unbounded for $d \ge 2$.
Remark 2
(Some remarks about mixing conditions). The size of the index sets $T_1$ and $T_2$ in the definition of $\beta(a; b)$ must be constrained. To elaborate on this point, suppose the $\beta$-mixing coefficients of a random field $X$ were defined similarly to the $\beta$-mixing coefficients for time series, as follows. Let $O_1$ and $O_2$ be half-planes with boundaries $L_1$ and $L_2$, respectively. For each real number $a > 0$, define
$$ \beta(a) = \sup\big\{ \bar{\beta}(O_1, O_2) : d(O_1, O_2) \ge a \big\}, $$
where the supremum is taken over all pairs of parallel lines $L_1$ and $L_2$ such that $d(L_1, L_2) \ge a$. Then (Theorem 1, [95]) shows that if $\{X(s) : s \in \mathbb{R}^2\}$ is a strictly stationary mixing random field and $a > 0$ is a real number, then $\beta(a) = 1$ or $0$. This means that if a random field $X$ is $\beta$-mixing, i.e., $\lim_{a \to \infty} \beta(a) = 0$, then there exists $\ell > 0$ such that the random field $X$ is "$\ell$-dependent", i.e., $\beta(a) = 0$ for $a > \ell$. However, this is highly restrictive in practice. To relax these results and make them more flexible for practical purposes, it is necessary to limit the size of $T_1$ and $T_2$ and adopt the definition (10) of $\beta$-mixing. For additional information on mixing coefficients for random fields, we refer to [73,80,96,97,98].
Lahiri [89] discussed the mixing condition given in Equation (11) for the $\alpha$-mixing case, as also explored in the studies of [99,100]. While we specifically focus on the $\beta$-mixing case, it is well established that $\beta$-mixing implies $\alpha$-mixing. In general, within the expression (11), $\beta_1(\cdot)$ is a function that could potentially depend on $n$, considering that the random field $X_{s,A_n}$ depends on $n$. However, for simplicity, we assume that $g_1(\cdot)$ does not depend on $n$, although addressing cases where $g_1(\cdot)$ changes with $n$ is not inherently complex. It is crucial to note that the random field $Y_{s,A_n}$ (or $\varphi(Y_{s,A_n})$) may not necessarily adhere to the mixing condition (11), as the mixing condition is postulated for $X_{s,A_n}$. Nevertheless, with the regression form represented by the model in (4), $Y_{s,A_n}$ (or $\varphi(Y_{s,A_n})$) may exhibit a flexible dependence structure.
Remark 3.
Kurisu [73] explored examples of locally stationary random fields on $\mathbb{R}^d$ that satisfy our mixing conditions and the other regularity conditions outlined in Section 5 of the same reference. To achieve this, the author introduced the notion of approximately $\ell_n$-dependent locally stationary random fields (where $\ell_n$ tends to infinity as $n$ approaches infinity) and extended the framework of continuous autoregressive and moving average (CARMA) random fields, as developed in [101], to encompass locally stationary CARMA-type random fields. CARMA random fields are identified by solutions to (fractional) stochastic partial differential equations, as delineated in [102], and are acknowledged as a versatile class of models for spatial data, as indicated in [69,101]. It has been demonstrated that a broad class of Lévy-driven moving average random fields, inclusive of locally stationary CARMA-type random fields, constitute (approximately $\ell_n$-dependent) locally stationary random fields. One of the distinguishing characteristics of CARMA random fields is their capability to represent non-Gaussian as well as Gaussian random fields, provided the driving Lévy random measures are purely non-Gaussian. Conversely, the statistical models prevalent in most of the existing spatial data literature pertaining to $\mathbb{R}^2$ rely heavily on Gaussian processes. For non-Gaussian processes, verifying mixing conditions, as exemplified in works such as [99,100], can often pose significant challenges. In [73], locally stationary Lévy-driven MA random fields were introduced by considering the expressions:
$$ X_{s,A_n} = \int_{\mathbb{R}^d} g\Big( \frac{s}{A_n},\, s - v \Big)\, L(dv), \qquad X_{\mathbf{u}}(s) = \int_{\mathbb{R}^d} g(\mathbf{u},\, s - v)\, L(dv), $$
where $g : [0,1]^d \times \mathbb{R}^d \to \mathbb{R}$ is a bounded function satisfying specific conditions. Notably, $X_{\mathbf{u}}(s)$ denotes a strictly stationary random field for each $\mathbf{u}$. Assuming $\mathbb{E}|L(A)|^q < \infty$ for any $A \in \mathcal{B}(\mathbb{R}^d)$ with bounded Lebesgue measure $|A|$ and for $q = 1, 2$, it follows that:
$$ \mathbb{E}[X_{\mathbf{u}}(s)] = \mu_0 \int_{\mathbb{R}^d} g(\mathbf{u}, s)\, ds, \qquad \mathbb{E}[X_{\mathbf{u}}(s)^2] = \bar{\sigma}_0^2 \int_{\mathbb{R}^d} g^2(\mathbf{u}, s)\, ds, $$
where $\mu_0 = -i\psi'(0)$ and $\bar{\sigma}_0^2 = -\psi''(0)$. Let $\ell > 0$. Define the function $\iota(\cdot : \ell) : [0, \infty) \to [0, 1]$ as
$$ \iota(x : \ell) = \begin{cases} 1, & \text{if } 0 \le x \le \ell/2, \\ 2 - 2x/\ell, & \text{if } \ell/2 < x \le \ell, \\ 0, & \text{if } \ell < x. \end{cases} $$
Now consider the process
$$ X_{\mathbf{u}}(s : A_{2,n}) = \int_{\mathbb{R}^d} g(\mathbf{u},\, s - v)\, \iota(|s - v| : A_{2,n})\, L(dv). $$
Remarkably, $X_{\mathbf{u}}(s : A_{2,n})$ also constitutes a $2A_{2,n}$-dependent strictly stationary random field, whereby the $\beta$-mixing coefficients $\beta(a; b) = \beta_1(a)\, g_1(b)$ of $X_{\mathbf{u}}(s : A_{2,n})$ satisfy $\beta_1(a) = 0$ for $a \ge 2A_{2,n}$.
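To illustrate Remark 3, here is a hedged simulation sketch of a locally stationary Lévy-driven MA random field on a grid. A Gaussian Lévy basis and a hypothetical kernel $g$ whose decay rate varies smoothly with the rescaled location are used; this is a discretized stand-in for the integrals above, not a faithful CARMA construction.

```python
import numpy as np

rng = np.random.default_rng(4)
A_n, step = 20.0, 0.5
axis = np.arange(0.0, A_n, step)                 # one-dimensional grid per axis
# Discretized Gaussian Levy basis: independent cell masses with variance step^2.
L = step * rng.standard_normal((axis.size, axis.size))

def g(u, r):
    """Hypothetical MA kernel: exponential decay whose rate varies with u."""
    return np.exp(-(1.0 + u.sum()) * r)

X = np.zeros_like(L)
for i, si in enumerate(axis):
    for j, sj in enumerate(axis):
        u = np.array([si, sj]) / A_n             # rescaled location s / A_n
        r = np.hypot(axis[:, None] - si, axis[None, :] - sj)
        X[i, j] = np.sum(g(u, r) * L)            # X_{s,A_n} ~ sum of g(s/A_n, s-v) L(dv)
```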

2.6. Estimation Procedures

Let $\{X_{s,A_n}, Y_{s,A_n} : s \in \mathcal{R}_n\}$ represent random variables, where $Y_{s,A_n}$ belongs to $\mathcal{Y}$, and $X_{s,A_n}$ assumes values in a semi-metric space $\mathcal{H}$ equipped with a semi-metric $d_{\theta}(\cdot, \cdot)$. This semi-metric defines a topology to measure the proximity between two elements of $\mathcal{H}$ and is independent of the definition of $X$, to avoid concerns related to measurability. The aim of this investigation is to establish the weak convergence of the single-index conditional U-process employing the following single-index U-statistic. Given $\mathbf{x} = (x_1, \ldots, x_m) \in \mathcal{H}^m$, $\mathbf{u} = (\mathbf{u}_1, \ldots, \mathbf{u}_m) \in [0,1]^{dm}$, and $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_m) \in \Theta^m$,
$$ \hat{r}_n^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}; h_n) := \hat{r}_n^{(m)}(\varphi, \mathbf{x}, \boldsymbol{\theta}, \mathbf{u}; h_n) = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi(Y_{s_{i_1},A_n}, \ldots, Y_{s_{i_m},A_n}) \prod_{j=1}^m \bar{K}\Big( \frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n} \Big) K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \bar{K}\Big( \frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n} \Big) K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)} = \frac{\displaystyle\sum_{\mathbf{i} \in I_n^m} \varphi(Y_{s_{i_1},A_n}, \ldots, Y_{s_{i_m},A_n}) \prod_{j=1}^m \prod_{\ell=1}^d K_1\Big( \frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n} \Big) K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)}{\displaystyle\sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \prod_{\ell=1}^d K_1\Big( \frac{u_{j,\ell} - s_{i_j,\ell}/A_n}{h_n} \Big) K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)}, $$
where
$$ \bar{K}(\mathbf{u}) = \prod_{\ell=1}^d K_1(u_{\ell}). $$
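The next sketch assembles the estimator (12) for $m = 2$ from its ingredients: the single-index semi-metric $d_{\theta}$, the spatial kernel $K_1$, and the functional kernel $K_2$. Curves are discretized on a common grid, the inner product is approximated by a Riemann sum, and all kernel and bandwidth choices are illustrative assumptions.

```python
import numpy as np
from itertools import permutations

def d_theta(theta, u, v, dx):
    """Single-index semi-metric d_theta(u, v) = |<theta, u - v>| (Riemann sum)."""
    return abs(np.sum(theta * (u - v)) * dx)

def K1(u):                       # spatial kernel per coordinate (Epanechnikov)
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def K2(v):                       # functional kernel on [0, 1] with K2(1) = 0
    return np.maximum(1.0 - v, 0.0) * (v >= 0.0)

def single_index_U(curves, Y, sites, A_n, thetas, xs, us, h, phi, dx):
    """Sketch of estimator (12) for m = 2 on curves discretized with step dx."""
    num = den = 0.0
    for i, j in permutations(range(len(Y)), 2):     # the index set I_n^2
        w = 1.0
        for k, idx in enumerate((i, j)):
            w *= np.prod(K1((us[k] - sites[idx] / A_n) / h))
            w *= K2(d_theta(thetas[k], xs[k], curves[idx], dx) / h)
        num += phi(Y[i], Y[j]) * w
        den += w
    return num / den if den > 0 else np.nan
```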
To explore the weak convergence of the conditional empirical process and the conditional U-process within the context of functional data, it becomes essential to introduce new notation. Consider a symmetric, measurable function $\varphi : \mathcal{Y}^m \to \mathbb{R}$ that belongs to a class of functions denoted by $\mathcal{F}_m$. Additionally, let $\{h_n\}_{n \in \mathbb{N}^*}$ be a sequence of positive real numbers such that $h_n \to 0$ as $n \to \infty$. To formally express this, define
$$ \mathcal{F}_m = \{ \varphi : \mathcal{Y}^m \to \mathbb{R} \}, $$
which constitutes a point-wise measurable class of real-valued symmetric measurable functions on $\mathcal{Y}^m$ with a measurable envelope function
$$ F(\mathbf{y}) \ge \sup_{\varphi \in \mathcal{F}_m} |\varphi(\mathbf{y})|, \quad \text{for } \mathbf{y} \in \mathcal{Y}^m. $$
For a kernel function $K(\cdot)$, we define the point-wise measurable class of functions, for $1 \le m \le n$,
$$ \mathcal{K}_{\theta}^m := \Big\{ (x_1, \ldots, x_m) \mapsto \prod_{i=1}^m K\Big( \frac{d_{\theta_i}(x_i, \cdot)}{h_n} \Big) : (x_1, \ldots, x_m) \in \mathcal{H}^m \Big\}, $$
and
$$ \mathcal{K}_{\Theta}^m := \bigcup_{\boldsymbol{\theta} \in \Theta^m} \mathcal{K}_{\theta}^m = \bigcup_{\boldsymbol{\theta} \in \Theta^m} \Big\{ (x_1, \ldots, x_m) \mapsto \prod_{i=1}^m K\Big( \frac{d_{\theta_i}(x_i, \cdot)}{h_n} \Big) : (x_1, \ldots, x_m) \in \mathcal{H}^m \Big\}. $$
We use the notation
$$ \psi(\cdot, \cdot) \in \mathcal{F}_m \mathcal{K}_{\Theta}^m := \big\{ \varphi_1(\cdot)\, \varphi_2(\cdot) : \varphi_1 \in \mathcal{F}_m,\ \varphi_2 \in \mathcal{K}_{\Theta}^m \big\}, $$
and
$$ \psi(\cdot, \cdot) \in \mathcal{F}_1 \mathcal{K}_{\Theta}^1 := \mathcal{F}\mathcal{K}_{\Theta} = \big\{ \varphi_1(\cdot)\, \varphi_2(\cdot) : \varphi_1 \in \mathcal{F}_1,\ \varphi_2 \in \mathcal{K}_{\Theta}^1 \big\}. $$

2.6.1. Small Ball Probability

In the absence of a universal reference measure, such as the Lebesgue measure, the density function of the functional variable is not defined, presenting a technical challenge in infinite-dimensional spaces. To tackle this issue, we utilize the concept of "small-ball probability". Consider a ball $B_{\theta}(t, r) = \{ z \in \mathcal{H} : d_{\theta}(z, t) \le r \}$ in $\mathcal{H}$ with center $t \in \mathcal{H}$ and radius $r$. To characterize the topological structure of functional spaces, we introduce the small-ball probability, for a fixed $t \in \mathcal{H}$, $\theta \in \Theta$, and for all $r > 0$, as follows:
$$ \mathbb{P}( X \in B_{\theta}(t, r) ) =: \phi_{\theta, t}(r). $$
This concept is widely utilized in nonparametric functional data analysis to circumvent the need for density assumptions on the functional variable X and address the challenges associated with the infinite-dimensional nature of functional spaces, as discussed in [28,30,79,103].
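For orientation, two commonly cited shapes of the small-ball probability function are the fractal-type and exponential-type behaviors; the constants and exponents below are generic placeholders rather than quantities from this paper (see, e.g., [28,30]):
$$ \phi_{\theta,t}(h) \approx C_t\, h^{\gamma} \quad \text{(fractal-type processes)}, \qquad \phi_{\theta,t}(h) \approx C_t\, h^{\gamma} \exp\big( -C/h^{\delta} \big) \quad \text{(exponential-type processes)}. $$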

2.6.2. VC-Type Classes of Functions

The asymptotic analysis of functional data involves concentration properties articulated through the concept of small-ball probability. In examining a process indexed by a class of functions, it becomes essential to incorporate additional topological concepts, such as metric entropy and VC-subgraph classes (with “VC” standing for Vapnik and Červonenkis).
Definition 2.
Let $S_{\mathcal{E}}$ be a subset of a semi-metric space $\mathcal{E}$, and let $N_{\varepsilon}$ be a positive integer. A finite set of points $\{e_1, \ldots, e_{N_{\varepsilon}}\} \subset \mathcal{E}$ is called an $\varepsilon$-net of $S_{\mathcal{E}}$ if:
$$ S_{\mathcal{E}} \subset \bigcup_{j=1}^{N_{\varepsilon}} B_{\theta}(e_j, \varepsilon). $$
If $N_{\varepsilon}(S_{\mathcal{E}})$ denotes the cardinality of the smallest $\varepsilon$-net (the minimal number of open balls of radius $\varepsilon$) needed to cover $S_{\mathcal{E}}$ in $\mathcal{E}$, then Kolmogorov's entropy (metric entropy) of the set $S_{\mathcal{E}}$ is given by the quantity:
$$ \psi_{S_{\mathcal{E}}}(\varepsilon) := \log N_{\varepsilon}(S_{\mathcal{E}}). $$
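As a concrete companion to Definition 2, the following greedy sketch builds an $\varepsilon$-net for a finite point cloud; the set it returns is an $\varepsilon$-net, so its size upper-bounds nothing smaller than $N_{\varepsilon}(S_{\mathcal{E}})$, and its logarithm gives a computable proxy for the Kolmogorov entropy. The metric, point cloud, and radius are illustrative choices.

```python
import numpy as np

def greedy_eps_net(points, eps, dist):
    """Greedily pick centers; every point ends up within eps of the net."""
    net = []
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net

rng = np.random.default_rng(3)
cloud = list(rng.uniform(0.0, 1.0, size=(200, 2)))
net = greedy_eps_net(cloud, 0.2, lambda a, b: float(np.linalg.norm(a - b)))
print(len(net), np.log(len(net)))  # net size and an entropy proxy log N_eps
```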
The concept of metric entropy, named after Kolmogorov (cf. [104]), was introduced by him and later extended to various metric spaces. Dudley [105] applied this idea to establish necessary conditions for the continuity of Gaussian processes and laid the groundwork for significant expansions of Donsker's theorem concerning the weak convergence of empirical processes. Let $B_{\mathcal{H}}$ and $S_{\mathcal{H}}$ be two subsets of the space $\mathcal{H}$ with Kolmogorov entropies (for the radius $\varepsilon$) denoted by $\psi_{B_{\mathcal{H}}}(\varepsilon)$ and $\psi_{S_{\mathcal{H}}}(\varepsilon)$, respectively. The Kolmogorov entropy of the subset $B_{\mathcal{H}} \times S_{\mathcal{H}}$ of the semi-metric space $\mathcal{H}^2$ is then given by:
$$ \psi_{B_{\mathcal{H}} \times S_{\mathcal{H}}}(\varepsilon) = \psi_{B_{\mathcal{H}}}(\varepsilon) + \psi_{S_{\mathcal{H}}}(\varepsilon). $$
Hence, $m\,\psi_{S_{\mathcal{H}}}(\varepsilon)$ is the Kolmogorov entropy of the subset $S_{\mathcal{H}}^m$ of the semi-metric space $\mathcal{H}^m$. If we denote by $d$ the semi-metric on $\mathcal{H}$, then a semi-metric on $\mathcal{H}^m$ is defined by
$$ d_{\mathcal{H}^m}(\mathbf{x}, \mathbf{y}) := \frac{1}{m}\, d(x_1, y_1) + \cdots + \frac{1}{m}\, d(x_m, y_m), $$
for
$$ \mathbf{x} = (x_1, \ldots, x_m),\ \mathbf{y} = (y_1, \ldots, y_m) \in \mathcal{H}^m. $$
Let
$$ d_{\mathcal{H}^m, \boldsymbol{\theta}}(\mathbf{x}, \mathbf{y}) := \frac{1}{m}\, d_{\theta_1}(x_1, y_1) + \cdots + \frac{1}{m}\, d_{\theta_m}(x_m, y_m), \quad \text{for } \boldsymbol{\theta} = (\theta_1, \ldots, \theta_m) \in \Theta^m. $$
In investigations of this nature, the choice of semi-metric plays a crucial role. For in-depth discussions on selecting the semi-metric, readers are encouraged to refer to [28] (specifically Chapters 3 and 13). Another topological term worth considering is VC-subgraph classes.
Definition 3.
A class $\mathcal{C}$ of subsets of a set $C$ is called a VC-class if there exists a polynomial $P(\cdot)$ such that, from every set of $N_{\varepsilon}$ points in $C$, the class $\mathcal{C}$ selects at most $P(N_{\varepsilon})$ distinct subsets.
Definition 4.
A class of functions $\mathcal{F}$ is termed a VC-subgraph class if the graphs of the functions in $\mathcal{F}$ form a VC-class of sets. In other words, if we define the subgraph of a real-valued function $f$ on $S$ as the subset $G_f$ of $S \times \mathbb{R}$ given by:
$$ G_f = \{ (s, t) : 0 \le t \le f(s) \text{ or } f(s) \le t \le 0 \}, $$
then the class $\{G_f : f \in \mathcal{F}\}$ is a VC-class of sets on $S \times \mathbb{R}$. Informally, VC-classes of functions are identified by their polynomial covering number, which is the minimum number of functions required to form a covering of the entire class of functions.
A VC-class of functions $\mathcal{F}$ with envelope function $F$ exhibits the following entropy property: for a given $1 \le q < \infty$, there exist constants $a$ and $b$ such that
$$ N\big( \epsilon, \mathcal{F}, \|\cdot\|_{L_q(Q)} \big) \le a \left( \frac{(Q F^q)^{1/q}}{\epsilon} \right)^{b}, $$
for any $\epsilon > 0$ and each probability measure $Q$ such that $Q F^q < \infty$. For further insights, see (Lemma 22, [106]), (Section 4.7, [107]), (Theorem 2.6.7, [108]), and (Section 9.1, [109]). Additionally, Deheuvels (Section 3.2, [110]) provides further discussion of sufficient conditions under which (16) holds. We provide some classical examples of classes of functions in Section 10.1.

2.7. Conditions and Comments

Assumption 1.
[Model and distribution assumptions]
(M1) 
The stochastic process $\{X_{s,A_n} : s \in \mathcal{R}_n\}$, taking values in the Hilbert space $\mathcal{H}$, exhibits local stationarity. Therefore, for each rescaled point $\mathbf{u} \in [0,1]^d$, there exists a strictly stationary process $\{X_{\mathbf{u}}(s) : s \in \mathbb{R}^d\}$ satisfying, for an arbitrary norm $\|\cdot\|_2$ on $\mathbb{R}^d$,
$$ d_{\theta}\big( X_{s,A_n}, X_{\mathbf{u}}(s) \big) \le \Big( \Big\| \frac{s}{A_n} - \mathbf{u} \Big\|_2 + \frac{1}{A_n^d} \Big)\, U_{s,A_n}(\mathbf{u}) \quad \text{a.s.}, $$
where $\mathbb{E}[(U_{s,A_n}(\mathbf{u}))^{\rho}] < C$ holds for some $\rho \ge 1$ and $C < \infty$, and these constants are independent of $\mathbf{u}$, $s$, and $A_n$.
(M2) 
For $i = 1, \ldots, m$, consider $B_{\theta_i}(x_i, h) = \{ y \in \mathcal{H} : d_{\theta_i}(x_i, y) \le h \}$, a ball centered at $x_i \in \mathcal{H}$ with radius $h$. Let $c_d < C_d$ be positive constants, and for all $\mathbf{u} \in [0,1]^d$, define
$$ \phi_{\mathbf{x}, \boldsymbol{\theta}}(h_n) := \mathbb{P}\big( X_{\mathbf{u}}(s_1) \in B_{\theta_1}(x_1, h_n), \ldots, X_{\mathbf{u}}(s_m) \in B_{\theta_m}(x_m, h_n) \big) = F_{\mathbf{u}, \boldsymbol{\theta}}(h_n, x_1, \ldots, x_m), $$
which satisfies the inequalities
$$ 0 < c_d\, \phi(h)\, f_1(\mathbf{x}) \le \phi_{\mathbf{x}, \boldsymbol{\theta}}(h) \le C_d\, \phi(h)\, f_1(\mathbf{x}), $$
where $\phi(h) \to 0$ as $h \to 0$ and $f_1(\mathbf{x})$ is a non-negative functional of $\mathbf{x} \in \mathcal{H}^m$. Additionally, there exist constants $C_{\phi} > 0$ and $\varepsilon_0 > 0$ such that, for any $0 < \varepsilon < \varepsilon_0$,
$$ \int_0^{\varepsilon} \phi(u)\, du > C_{\phi}\, \varepsilon\, \phi(\varepsilon). $$
(M3) 
Let $\mathbf{X}_{\mathbf{s},A_n} = (X_{s_1,A_n}, \ldots, X_{s_m,A_n})$, $\mathbf{X}_{\mathbf{v},A_n} = (X_{v_1,A_n}, \ldots, X_{v_m,A_n})$, and $\mathbf{B}_{\boldsymbol{\theta}}(\mathbf{x}, h) = \prod_{i=1}^m B_{\theta_i}(x_i, h)$. Assume
$$ \sup_{\mathbf{s}, \mathbf{x}, A_n} \sup_{\mathbf{s} \ne \mathbf{v}} \mathbb{P}\big( (\mathbf{X}_{\mathbf{s},A_n}, \mathbf{X}_{\mathbf{v},A_n}) \in \mathbf{B}_{\boldsymbol{\theta}}(\mathbf{x}, h) \times \mathbf{B}_{\boldsymbol{\theta}}(\mathbf{x}, h) \big) \le \psi(h)\, f_2(\mathbf{x}), $$
where $\psi(h) \to 0$ as $h \to 0$ and $f_2(\mathbf{x})$ is a non-negative functional of $\mathbf{x} \in \mathcal{H}^m$. We assume that the ratio $\psi(h)/\phi^2(h)$ is bounded.
(M4) 
$\sigma : \mathcal{H}^m \times \Theta^m \times [0,1]^{dm} \to \mathbb{R}$ is bounded from above by some constant $C_{\sigma} < \infty$ and from below by some constant $c_{\sigma} > 0$; that is, $0 < c_{\sigma} \le \sigma(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) \le C_{\sigma} < \infty$ for all $\mathbf{u}$, $\mathbf{x}$, and $\boldsymbol{\theta}$.
(M5) 
$\sigma(\cdot, \cdot, \cdot)$ is Lipschitz continuous with respect to $\mathbf{u}$.
(M6) 
$\sup_{\mathbf{u} \in [0,1]^{dm}} \sup_{\boldsymbol{\theta} \in \Theta^m} \sup_{\mathbf{z} : d_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{z}) \le h} | \sigma(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) - \sigma(\mathbf{z}, \boldsymbol{\theta}, \mathbf{u}) | = o(1)$ as $h \to 0$.
(M7) 
$r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u})$ is Lipschitz continuous; it satisfies
$$ \sup_{\boldsymbol{\theta} \in \Theta^m} \big| r^{(m)}(\mathbf{x}_1, \boldsymbol{\theta}, \mathbf{u}_1) - r^{(m)}(\mathbf{x}_2, \boldsymbol{\theta}, \mathbf{u}_2) \big| \le c_m \Big( d_{\mathcal{H}^m, \boldsymbol{\theta}}(\mathbf{x}_1, \mathbf{x}_2)^{\alpha} + \| \mathbf{u}_1 - \mathbf{u}_2 \|^{\alpha} \Big), $$
for some $c_m > 0$, $\alpha > 0$, and $\mathbf{x}_1 = (x_{1,1}, \ldots, x_{1,m})$, $\mathbf{x}_2 = (x_{2,1}, \ldots, x_{2,m}) \in \mathcal{H}^m$; and it is twice continuously partially differentiable, with first derivatives
$$ \partial_{u_i} r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) = \frac{\partial}{\partial u_i} r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) $$
and second derivatives
$$ \partial^2_{u_i u_j} r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) = \frac{\partial^2}{\partial u_i\, \partial u_j} r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}). $$
Assumption 2.
[Kernel assumptions]
(KB1) 
The kernel $K_2(\cdot)$ is non-negative, bounded by $\tilde{\kappa}$, and has support in $[0,1]$, with $K_2(0) > 0$ and $K_2(1) = 0$. Moreover, the derivative $K_2'(v) = dK_2(v)/dv$ exists on $[0,1]$ and satisfies $C_1 \le K_2'(v) \le C_2$ for two real constants $-\infty < C_1 < C_2 < 0$.
(KB2) 
The kernel $\bar{K} : \mathbb{R}^d \to [0, \infty)$ is bounded and has compact support $[-C, C]^d$. Moreover,
$$ \int_{[-C,C]^d} \bar{K}(\mathbf{x})\, d\mathbf{x} = 1, \qquad \int_{[-C,C]^d} \mathbf{x}^{\boldsymbol{\alpha}}\, \bar{K}(\mathbf{x})\, d\mathbf{x} = 0 \quad \text{for any } \boldsymbol{\alpha} \in \mathbb{Z}_+^d \text{ with } |\boldsymbol{\alpha}| = 1, $$
and
$$ |\bar{K}(\mathbf{u}) - \bar{K}(\mathbf{v})| \le C\, \|\mathbf{u} - \mathbf{v}\|. $$
(KB3) 
The bandwidth $h$ converges to zero at least at a polynomial rate; that is, there exists a small $\xi_1 > 0$ such that $h \le C n^{-\xi_1}$ for some constant $0 < C < \infty$.
Assumption 3.
[Sampling design assumptions]
(S1) 
For any $\boldsymbol{\alpha} \in \mathbb{N}^d$ with $|\boldsymbol{\alpha}| = 1, 2$, the derivative $\partial^{\boldsymbol{\alpha}} f_S(\mathbf{s})$ exists and is continuous on $(0,1)^d$.
(S2) 
$C_0 \le n/A_n^d \le C_1 n^{\eta_1}$ for some $C_0, C_1 > 0$ and small $\eta_1 \in (0, 1)$.
Assumption 4.
[Block decomposition assumptions]
(B1) 
Let $\{A_{1,n}\}_{n \ge 1}$ and $\{A_{2,n}\}_{n \ge 1}$ be two sequences of positive numbers such that $A_{1,n} \to \infty$, $A_{2,n} \to \infty$, $A_{2,n} = o(A_{1,n})$, and $A_{1,n} = o(A_n)$, with $\frac{A_{1,n}}{A_n} + \frac{A_{2,n}}{A_{1,n}} \le C_0^{-1} n^{-\eta_0}$ for some $C_0 > 0$ and $\eta_0 > 0$.
(B2) 
We have $\lim_{n \to \infty} n / A_n^d = \kappa \in (0, \infty]$, with $A_n \lesssim n^{\bar{\kappa}}$ for some $\bar{\kappa} > 0$.
(B3) 
We have
$$ \Big( \frac{1}{n h^{md} \phi(h)} \Big)^{1/3} \Big( \frac{A_{1,n}}{A_n} \Big)^{2d/3} \Big( \frac{A_{2,n}}{A_{1,n}} \Big)^{2/3} g_1^{1/3}\big( A_{1,n}^d \big) \sum_{k=1}^{\lfloor A_n/A_{1,n} \rfloor} k^{d-1}\, \beta_1^{1/3}\big( k A_{1,n} + A_{2,n} \big) \to 0. $$
(B4) 
We have $\lim_{n \to \infty} \dfrac{A_n^d}{A_{1,n}^d}\, \beta\big( A_{2,n}; A_n^d \big) = 0$.
Assumption 5.
[Regularity conditions] Let $\alpha_n = \sqrt{\log n / (n h^{md} \phi(h))}$. As $n \to \infty$,
(R1) 
$h^{-md}\, \phi(h)^{-1}\, \alpha_n^{md}\, \dfrac{A_n^d}{A_{1,n}^d}\, \beta(A_{2,n}; A_n^d) \to 0$ and $\dfrac{A_{1,n}^d}{A_n^d} \sqrt{n h^{md} \phi(h)}\, (\log n) \to 0$,
(R2) 
$\dfrac{n^{1/2} h^{md/2} \phi(h)^{1/2}}{A_{1,n}^d\, n^{1/\zeta}} \ge C_0 n^{\eta}$ for some $0 < C_0 < \infty$, $\eta > 0$, and $\zeta > 2$,
(R3) 
$A_n^{-dp} \lesssim \phi(h)$, where $p$ is defined in the sequel.
Assumption 6.
(E1) 
For $W_{\mathbf{s_i},A_n} = \sum_{j=1}^m \epsilon_{s_{i_j},A_n}$, it holds that $\sup_{\mathbf{x} \in \mathcal{H}^m} \mathbb{E}|W_{\mathbf{s},A_n}|^{\zeta} \le C$ and
$$ \sup_{\mathbf{x} \in \mathcal{H}^m} \mathbb{E}\big[ |W_{\mathbf{s},A_n}|^{\zeta} \,\big|\, \mathbf{X}_{\mathbf{i},n} = \mathbf{x} \big] \le C, $$
for some $\zeta > 2$ and $C < \infty$.
(E2) 
The $\beta$-mixing coefficients of the array $\{(X_{s,A_n}, W_{s,A_n})\}$ satisfy $\beta(a; b) \le \beta_1(a)\, g_1(b)$ with $\beta_1(a) \to 0$ as $a \to \infty$.
Assumption 7.
[Class of functions assumptions]
The classes of functions $\mathcal{K}_{\Theta}^m$ and $\mathcal{F}_m$ satisfy the following conditions:
(C1) 
The class of functions $\mathcal{F}_m$ is bounded, and its envelope function satisfies, for some $0 < M < \infty$,
$$ F(\mathbf{y}) \le M, \quad \mathbf{y} \in \mathcal{Y}^m. $$
(C2) 
The class of functions $\mathcal{F}_m$ is unbounded, and its envelope function satisfies, for some $\zeta > 2$,
$$ \theta_{\zeta} := \sup_{\mathbf{t} \in S_{\mathcal{H}}^m} \mathbb{E}\big[ F^{\zeta}(\mathbf{Y}) \,\big|\, \mathbf{X} = \mathbf{t} \big] < \infty. $$
(C3) 
The metric entropy of the class $\mathcal{F}_m \mathcal{K}_{\Theta}^m$ satisfies, for some $2 < \zeta < \infty$,
$$ \int_0^{\infty} \big( \log N\big( u, \mathcal{F}_m \mathcal{K}_{\Theta}^m, L_{\zeta}(P_m) \big) \big)^{1/2}\, du < \infty. $$

Comments on the Assumptions

Traditional statistical methods face challenges when applied to functional data. In our nonparametric functional regression model, we addressed the complex theoretical task of establishing functional central limit theorems for the conditional U-process under functional absolutely regular data in a dual framework. This involved aligning assumptions with certain properties of the infinite-dimensional space, covering the topological structure on $\mathcal{H}^m$, the probability distribution of $\mathbf{X}$, and the measurability of the classes $\mathcal{F}_m$ and $\mathcal{K}_{\Theta}^m$. A discussion of these assumptions is crucial, drawing inspiration from works such as [28,30,49,71,73,79,83]. Assumption 1 initiates the formalization of the property of $\{X_{s,A_n}\}$ being locally stationary, with subsequent conditions providing additional precision. (M1) reflects the notion of a locally stationary time series, and various random fields satisfy this requirement, as demonstrated by [73]. Condition (M2), employed by [79], aligns with the fundamental axioms of probability calculus when $\mathcal{H}^m = \mathbb{R}^m$ and exhibits exponential decay for an infinite-dimensional Hilbert space. Equation (18) controls the behavior of the small-ball probability around zero, indicating that it can be approximated by the product of two independent functions $\phi(\cdot)$ and $f_1(\cdot)$. This condition is typical for various processes, including Ornstein–Uhlenbeck, general diffusion, and fractal processes. Conditions (M4), (M5), (M6) and Assumption 2 represent the regularity conditions, forming the foundation for the limiting theorems. Moreover, Assumption 3 addresses the flexibility of the non-uniform sampling density and the infill sampling criteria. The proposed sampling design accommodates both pure and mixed increasing domain cases. Condition (B1) in Assumption 4 relates to the blocking technique used for decomposing the sampling region $\mathcal{R}_n$ into large and small blocks. The sequences $A_{1,n}$ and $A_{2,n}$ are associated with the large-block-small-block argument, facilitating the proof of central limit theorems for sums of mixing random variables. Assumption 6 contributes to deriving the weak convergence of the conditional U-statistic $\hat{\psi}$, while condition (C1) lays the groundwork for bounded function classes. However, (C2) replaces it to establish the functional central limit theorem for conditional U-processes indexed by an unbounded class of functions.
For Assumption (C3), see (Examples 26 and 38, [111]); (Lemma 22, [106]), (§4.7, [107]), (Theorem 2.6.7, [108]), and (§9.1, [109]) provide a number of sufficient conditions under which (C3) holds, and we may also refer to (§3.2, [37,111,112,113]) for further discussions. For instance, it is satisfied, for general $d \ge 1$, whenever $K(x) = \Psi(p(x))$, where $p(x)$ is either a polynomial in $d$ variables or the $\alpha$-th power of the absolute value of a real polynomial for some $\alpha > 0$, and $\Psi(\cdot)$ is a real-valued function of bounded variation; this covers commonly used kernels, such as the Gaussian, Epanechnikov, and uniform kernels (we refer the reader to (p. 1381, [114])). We also mention that condition (C3) is satisfied whenever the class of functions consists of functions of bounded variation on $\mathbb{R}^q$ (in the sense of Hardy and Kauser [115,116,117]; see, e.g., [118,119,120,121]). Assumption (C3) ensures that the class under consideration is of VC type with characteristics $C$ and $\nu$. The condition (C3), as in [122], can be formulated as follows for the kernel part:
(C4)
$K(x) > 0$ is a bounded and compactly supported measurable function that belongs to the linear span (the set of finite linear combinations) of functions $k(x) \ge 0$ satisfying the following property: the subgraph of $k(\cdot)$, $\{(s, u) : k(s) \ge u\}$, can be represented as a finite number of Boolean operations among sets of the form
$$ \{ (s, u) : p(s, u) \ge \varphi(u) \}, $$
where $p$ is a polynomial on $\mathbb{R} \times \mathbb{R}$ and $\varphi$ is an arbitrary real function.
Indeed, for a fixed polynomial $p$, the family of sets
$$ \big\{ \{ (s, u) : p((s - t)/h,\, u) \ge \varphi(u) \} : t \in \mathbb{R},\ h > 0 \big\} $$
is contained in the family of positivity sets of a finite-dimensional space of functions, and then the entropy bound follows by Theorems 4.2.1 and 4.2.4 in [107]. These assumptions collectively provide a robust foundation for the investigation, encompassing the topological structure, probability measure, measurability, and uniformity in entropy characteristics of functional variables.
Remark 4.
Note that Assumption (C2) in Assumption 7 may be replaced by more general hypotheses on the moments of $\mathbf{Y}$, as in [111,123]. Specifically, we can consider the following alternative assumption:
(C4)’ 
We denote by $\{M(x) : x \ge 0\}$ a non-negative continuous function, increasing on $[0, \infty)$, and such that, for some $s > 2$, ultimately as $x \uparrow \infty$:
$$ x^{-s} M(x) \uparrow \infty; \qquad x^{-1} \log M(x) \downarrow . $$
For each $t \ge M(0)$, we define $M^{\mathrm{inv}}(t) \ge 0$ by $M(M^{\mathrm{inv}}(t)) = t$. We further assume that:
$$ \mathbb{E}\big( M( F(\mathbf{Y}) ) \big) < \infty. $$
The following choices of M ( · ) are of particular interest:
(i) 
$M(x) = x^{\zeta}$ for some $\zeta > 2$;
(ii) 
$M(x) = \exp(sx)$ for some $s > 0$.

3. Uniform Convergence Rates for Kernel Estimators

Before detailing the asymptotic behavior of our estimator (12), we extend our analysis to a U-statistic estimator defined by:
$$ \hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) = \frac{(n-m)!}{n!} \frac{1}{h^{md}\, \phi^m(h)} \sum_{\mathbf{i} \in I_n^m} \prod_{j=1}^m \bar{K}\Big( \frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n} \Big)\, K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)\, W_{\mathbf{s_i},A_n}, $$
where $W_{\mathbf{s_i},A_n}$ is an array of one-dimensional random variables. In this study, we consider results with $W_{\mathbf{s_i},A_n} = 1$ and $W_{\mathbf{s_i},A_n} = \sum_{j=1}^m \epsilon_{s_{i_j},A_n}$.

3.1. Hoeffding’s Decomposition

Note that $\hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u})$ is a standard U-statistic with a kernel depending on $n$. We define
$$ \xi_j := \frac{1}{h^d}\, \bar{K}\Big( \frac{\mathbf{u}_j - s_{i_j}/A_n}{h_n} \Big), \qquad H(Z_1, \ldots, Z_m) := \prod_{j=1}^m \frac{1}{\phi(h)}\, K_2\Big( \frac{d_{\theta_j}(x_j, X_{s_{i_j},A_n})}{h_n} \Big)\, W_{\mathbf{s_i},A_n}; $$
thus, the U-statistic in (22) can be viewed as a weighted U-statistic of degree $m$:
$$ \hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) = \frac{(n-m)!}{n!} \sum_{\mathbf{i} \in I_n^m} \xi_{i_1} \cdots \xi_{i_m}\, H(Z_{i_1}, \ldots, Z_{i_m}). $$
We can express Hoeffding's decomposition in this scenario as in [124]. In the absence of symmetry assumptions on $W_{\mathbf{s_i},A_n}$ or $H$, it is necessary to define:
• The expectation of $H(Z_{i_1}, \ldots, Z_{i_m})$:
$$ \vartheta(\mathbf{i}) := \mathbb{E}\big[ H(Z_{i_1}, \ldots, Z_{i_m}) \big] = \int W_{\mathbf{s_i},A_n} \prod_{j=1}^m \frac{1}{\phi(h)}\, K_2\Big( \frac{d_{\theta_j}(x_j, \nu_{s_{i_j},A_n})}{h_n} \Big)\, d\mathbb{P}_{\mathbf{i}}(\mathbf{z_i}). $$
• For each $\ell \in \{1, \ldots, m\}$, $\ell$ denoting the position of the argument, construct the function $\pi_{\ell}(\cdot)$ such that
$$ \pi_{\ell}(z; z_1, \ldots, z_{m-1}) := (z_1, \ldots, z_{\ell-1}, z, z_{\ell}, \ldots, z_{m-1}). $$
• Define
$$ H^{(\ell)}\big( z; z_1, \ldots, z_{m-1} \big) := H\big( \pi_{\ell}( z; z_1, \ldots, z_{m-1} ) \big), \qquad \vartheta^{(\ell)}\big( i; i_1, \ldots, i_{m-1} \big) := \vartheta\big( \pi_{\ell}( i; i_1, \ldots, i_{m-1} ) \big). $$
Hence, the first-order expansion of $H(\cdot)$ is
$$ \widetilde{H}^{(\ell)}(z) := \mathbb{E}\big[ H^{(\ell)}(z, Z_1, \ldots, Z_{m-1}) \big] = \frac{1}{\phi(h)}\, K_2\Big( \frac{d_{\theta_{\ell}}(x_{\ell}, X_{s_{\ell},A_n})}{h} \Big)\, W_{s_{\ell},A_n} \int \prod_{\substack{j=1 \\ j \ne \ell}}^{m-1} \frac{1}{\phi(h)}\, K_2\Big( \frac{d_{\theta_j}(x_j, \nu_{s_j,A_n})}{h} \Big)\, \mathbb{P}\big( d\nu_1, \ldots, d\nu_{\ell-1}, d\nu_{\ell}, \ldots, d\nu_{m-1} \big), $$
where $\mathbb{P}$ is the underlying probability measure. Define
$$ f^{(\ell)}_{i, i_1, \ldots, i_{m-1}} := \sum_{\ell=1}^m \xi_{i_1} \cdots \xi_{i_{\ell-1}}\, \xi_i\, \xi_{i_{\ell}} \cdots \xi_{i_{m-1}} \Big[ \widetilde{H}^{(\ell)}(z) - \vartheta^{(\ell)}( i; i_1, \ldots, i_{m-1} ) \Big]. $$
Then, the first-order projection can be defined as
$$ \widehat{H}_{1,i}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) := \frac{(n-m)!}{(n-1)!} \sum_{I_{n-1}^{m-1}(i)} f^{(\ell)}_{i, i_1, \ldots, i_{m-1}}, $$
where
$$ I_{n-1}^{m-1}(i) := \big\{ (i_1, \ldots, i_{m-1}) : 1 \le i_1 < \cdots < i_{m-1} \le n \text{ and } i_j \ne i \text{ for all } j \in \{1, \ldots, m-1\} \big\}. $$
For the remainder terms, we denote $\mathbf{i}_{-\ell} := (i_1, \ldots, i_{\ell-1}, i_{\ell+1}, \ldots, i_m)$ and, for $\ell \in \{1, \ldots, m\}$, let
$$ H_{2,\mathbf{i}}(\mathbf{z}) := H(\mathbf{z}) - \sum_{\ell=1}^m \widetilde{H}^{(\ell)}_{\mathbf{i}_{-\ell}}(z_{\ell}) + (m-1)\, \vartheta(\mathbf{i}), $$
where
$$ \widetilde{H}^{(\ell)}_{\mathbf{i}_{-\ell}}(z) = \mathbb{E}\big[ H( Z_1, \ldots, Z_{\ell-1}, z, Z_{\ell+1}, \ldots, Z_m ) \big], $$
as defined in (25). This projection gives the following remainder term:
$$ \hat{\psi}_2(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) := \frac{(n-m)!}{n!} \sum_{\mathbf{i} \in I_n^m} \xi_{i_1} \cdots \xi_{i_m}\, H_{2,\mathbf{i}}(\mathbf{z}). $$
Finally, using Equations (27) and (29), and under the conditions
$$ \mathbb{E}\big[ \widehat{H}_{1,i}(\mathbf{X}, \boldsymbol{\theta}, \mathbf{u}) \big] = 0, $$
$$ \mathbb{E}\big[ H_{2,\mathbf{i}}(\mathbf{Z}) \mid Z_k \big] = 0 \quad \text{a.s.}, $$
we obtain the Hoeffding [125] decomposition:
$$ \hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) - \mathbb{E}\big[ \hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) \big] = \frac{1}{n} \sum_{i=1}^n \widehat{H}_{1,i}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) + \hat{\psi}_2(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) =: \hat{\psi}_1(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) + \hat{\psi}_2(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}). $$
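For orientation, in the classical unweighted i.i.d. case of degree $m = 2$ with a symmetric kernel $H$, the decomposition above reduces to the familiar form
$$ U_n - \vartheta = \frac{2}{n} \sum_{i=1}^n h_1(Z_i) + \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} h_2(Z_i, Z_j), $$
with $h_1(z) = \mathbb{E}[H(z, Z_2)] - \vartheta$ and $h_2(z_1, z_2) = H(z_1, z_2) - h_1(z_1) - h_1(z_2) - \vartheta$; the linear (Hájek) projection drives the central limit behavior, while $h_2$ is degenerate in the sense that $\mathbb{E}[h_2(z, Z_2)] = 0$ for all $z$.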

3.2. Strong Uniform Convergence Rate

We begin by presenting the following general result regarding the convergence rate of the U-process defined in (22).
Proposition 1.
Let $\mathcal{F}_m \mathcal{K}_{\Theta}^m$ be a measurable VC-subgraph class of functions satisfying Assumption 7. Additionally, assume that Assumptions 2, 3, 4 (B1), 5, and 6 are satisfied. Then, the following result holds:
$$ \sup_{\mathcal{F}_m \mathcal{K}_{\Theta}^m} \sup_{\boldsymbol{\theta} \in \Theta^m} \sup_{\mathbf{x} \in \mathcal{H}^m} \sup_{\mathbf{u} \in [0,1]^{dm}} \big| \hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) - \mathbb{E}[\hat{\psi}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u})] \big| = O_{\mathbb{P}_{\cdot \mid S}}\Big( \sqrt{\frac{\log n}{n h^{md} \phi(h)}} \Big), \quad \mathbb{P}_S\text{-a.s.} $$
Next, using the results of the last proposition, we give the uniform rate of convergence of the estimator (12) of the regression function $r^{(m)}$ in the model (4).
Theorem 1.
Let $\mathcal{F}_m \mathcal{K}_{\Theta}^m$ be a measurable VC-subgraph class of functions satisfying Assumption 7. Let $I_h = [C_1 h, 1 - C_1 h]^{dm}$ and let $S_c$ be a compact subset of $\mathcal{H}^m$. Suppose that
$$ \inf_{\mathbf{u} \in [0,1]^d} f_S(\mathbf{u}) > 0. $$
Then, under Assumptions 1–3, Condition (B1) in Assumption 4, and Assumptions 5 and 6 (with $W_{\mathbf{s_i},A_n} = 1$ and $W_{\mathbf{s_i},A_n} = \sum_{j=1}^m \epsilon_{s_{i_j},A_n}$), the following result holds $\mathbb{P}_S$-almost surely:
$$ \sup_{\mathcal{F}_m \mathcal{K}_{\Theta}^m} \sup_{\boldsymbol{\theta} \in \Theta^m} \sup_{\mathbf{x} \in S_c} \sup_{\mathbf{u} \in I_h} \big| \hat{r}_n^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}; h_n) - r^{(m)}(\mathbf{x}, \boldsymbol{\theta}, \mathbf{u}) \big| = O_{\mathbb{P}_{\cdot \mid S}}\Big( \sqrt{\frac{\log n}{n h^{md} \phi(h)}} + h^{2\alpha} + \frac{1}{A_n^{dp}\, \phi^{1/m}(h)} \Big), $$
where $p = \min\{1, \rho\}$, and $\rho > 0$ is given in Definition 1.
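For intuition on the interplay between the terms in Theorem 1, suppose, for illustration only, that the small-ball function has the fractal-type form $\phi(h) \asymp h^{\gamma}$. Balancing the variance and bias terms then gives
$$ \sqrt{\frac{\log n}{n\, h^{md + \gamma}}} \asymp h^{2\alpha} \quad \Longleftrightarrow \quad h \asymp \Big( \frac{\log n}{n} \Big)^{1/(4\alpha + md + \gamma)}, $$
which yields the rate $(\log n / n)^{2\alpha/(4\alpha + md + \gamma)}$ for the first two terms, provided the approximation term $A_n^{-dp} \phi^{-1/m}(h)$ is of smaller order.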
Remark 5.
We can consider the scenario where $\Theta = \Theta_n$, where $\Theta_n$ satisfies the conditions $\mathrm{card}(\Theta_n) = n^{\alpha}$ with $\alpha > 0$, and, for every $\theta \in \Theta_n$, $\langle \theta - \theta_0, \theta - \theta_0 \rangle^{1/2} \le C_7 b_n$, where $b_n$ converges to zero, as discussed in [62].
The set of functional directions, $\Theta_n$, is constructed following an approach similar to [54], as outlined below:
(i) Each direction $\theta \in \Theta_n$ is derived from a $d_n$-dimensional space spanned by B-spline basis functions, denoted by $e_1(\cdot), \ldots, e_{d_n}(\cdot)$. Thus, we express directions as
$$ \theta(\cdot) = \sum_{j=1}^{d_n} \alpha_j e_j(\cdot), \quad \text{where } (\alpha_1, \ldots, \alpha_{d_n}) \in \mathcal{V}. $$
(ii) The set of coefficient vectors in (32), denoted by $\mathcal{V}$, is generated through the following steps:
Step 1
For each $(\beta_1, \ldots, \beta_{d_n}) \in \mathcal{C}^{d_n}$, where $\mathcal{C} = \{c_1, \ldots, c_J\} \subset \mathbb{R}$ represents a set of $J$ 'seed-coefficients', construct the initial functional direction as
$$ \theta_{\mathrm{init}}(\cdot) = \sum_{j=1}^{d_n} \beta_j e_j(\cdot). $$
Step 2 
For each $\theta_{\mathrm{init}}$ from Step 1 satisfying $\theta_{\mathrm{init}}(t_0) > 0$, where $t_0$ denotes a fixed value in the domain of $\theta_{\mathrm{init}}(\cdot)$, compute $\langle \theta_{\mathrm{init}}, \theta_{\mathrm{init}} \rangle$ and form $(\alpha_1, \ldots, \alpha_{d_n}) = (\beta_1, \ldots, \beta_{d_n}) / \langle \theta_{\mathrm{init}}, \theta_{\mathrm{init}} \rangle^{1/2}$.
Step 3 
Define $\mathcal{V}$ as the collection of vectors $(\alpha_1,\ldots,\alpha_{d_n})$ obtained in  Step 2 (a numerical sketch of Steps 1–3 is given after this construction). Consequently, the final set of admissible functional directions is represented as
$$
\Theta_n=\Big\{\theta(\cdot)=\sum_{j=1}^{d_n}\alpha_j e_j(\cdot)\,;\ (\alpha_1,\ldots,\alpha_{d_n})\in\mathcal{V}\Big\}.
$$
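The following minimal sketch illustrates Steps 1–3 numerically. It assumes degree-one B-splines (hat functions) on $[0,1]$, seed coefficients $\mathcal{C}=\{-1,0,1\}$, and a trapezoidal approximation of the $L_2$ inner product; all of these choices are illustrative and not prescribed by the construction above.

```python
import itertools
import numpy as np

# Grid on the domain of the curves; inner products are approximated by
# trapezoidal integration (an assumption made for this illustration).
t = np.linspace(0.0, 1.0, 201)
d_n = 4                                     # dimension of the spline space

# Degree-1 B-spline (hat) basis e_1, ..., e_{d_n} with equispaced knots.
knots = np.linspace(0.0, 1.0, d_n)
E = np.array([np.interp(t, knots, np.eye(d_n)[j]) for j in range(d_n)])

seeds = [-1.0, 0.0, 1.0]                    # the 'seed-coefficients' C
t0_idx = len(t) // 2                        # fixed point t0 in the domain

V = []                                      # admissible coefficient vectors
for beta in itertools.product(seeds, repeat=d_n):
    theta_init = np.asarray(beta) @ E       # Step 1: candidate direction
    if theta_init[t0_idx] <= 0.0:           # Step 2: keep theta_init(t0) > 0
        continue
    # <theta_init, theta_init> by the trapezoid rule.
    norm2 = float(np.sum(0.5 * (theta_init[:-1] ** 2 + theta_init[1:] ** 2)
                         * np.diff(t)))
    V.append(np.asarray(beta) / np.sqrt(norm2))  # Step 3: normalized vector

print(f"{len(V)} admissible directions out of {len(seeds) ** d_n} seeds")
```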
Remark 6.
It is important to highlight that the error term $A_n^{-dp}\,\phi_{x,\theta}^{-1/m}(h_n)$ arises from approximating the locally stationary functional random field $X_{s,A_n}$ by the stationary functional random field $X_u(s)$.

4. Weak Convergence for Kernel Estimators

In this section, our focus lies in examining the weak convergence of the conditional U-process, as expressed in Equation (12), when dealing with absolutely regular spatial observations. The following theorem is the primary result of this study, describing the weak convergence of the functional locally stationary random field estimator. Let us denote, for $\varphi_1,\varphi_2\in\mathscr{F}^m$,
$$
\sigma(\varphi_1,\varphi_2)=\lim_{n\to\infty}\mathbb{E}_{\cdot\mid S}\Big[\sqrt{n h^{md}\phi_{x,\theta}^{1/m}(h_n)}\,\Big(\widehat{r}_n^{(m)}(\varphi_1,x,\theta,u;h_n)-r^{(m)}(\varphi_1,x,\theta,u)\Big)\times\sqrt{n h^{md}\phi_{x,\theta}^{1/m}(h_n)}\,\Big(\widehat{r}_n^{(m)}(\varphi_2,x,\theta,u;h_n)-r^{(m)}(\varphi_2,x,\theta,u)\Big)\Big].
$$
Theorem 2.
Let $\mathscr{F}^m\mathscr{K}^m\Theta^m$ be a measurable VC-subgraph class of functions satisfying Assumption 7. Suppose that $f_S(u)>0$ and $\epsilon_{s_{i_j},A_n}=\sigma\big(s_{i_j}/A_n,\,X_{s_{i_j},A_n}\big)\,\epsilon_{i_j}$, where $\sigma(\cdot,\cdot)$ is continuous and $\{\epsilon_i\}_{i=1}^{n}$ is a sequence of i.i.d. random variables with mean 0 and variance 1. Moreover, suppose $n h^{m(d+1)+4}\to c_0$ for a constant $c_0$. If all the assumptions of Theorem 1 hold, in addition to Conditions (B2), (B3), and (B4), then the following result holds $\mathbb{P}_S$-almost surely:
$$
\sqrt{n h^{md}\phi_{x,\theta}^{1/m}(h)}\,\Big(\widehat{r}_n^{(m)}(\varphi,x,\theta,u;h)-r^{(m)}(\varphi,x,\theta,u)-B_{u,x,\theta}\Big)
$$
converges to a Gaussian process $\mathbb{G}$ over $\mathscr{F}^m\mathscr{K}^m\Theta^m$, whose sample paths are bounded and uniformly continuous with respect to the $\|\cdot\|_2$ norm, with covariance function given by (33), and where the bias term satisfies $B_{u,x,\theta}=O_{\mathbb{P}_{\cdot\mid S}}(h^{2\alpha})$.
Remark 7.
Set $A_n^d=O(n^{1-\bar{\eta}_1})$ for some $\bar{\eta}_1\in[0,1)$, $A_{1,n}=O(A_n^{\gamma_{A_1}})$, and $A_{2,n}=O(A_n^{\gamma_{A_2}})$ with $0<\gamma_{A_2}<\gamma_{A_1}<1/3$ and $p=\min\{1,\rho\}=1$. Assume that we can take a sufficiently large $\zeta>2$ such that $2/\zeta<(1-\bar{\eta}_1)(1-3\gamma_{A_1})$. Then, Assumption 4 is satisfied for $d\ge 1$.
Remark 8.
It is straightforward to modify the proofs of our results to show that they remain true when the entropy condition is replaced by the bracketing condition. (Given two functions $l$ and $u$, the interval $[l,u]$ represents the set of all functions $f$ such that $l\le f\le u$. An $\varepsilon$-bracket is a bracket $[l,u]$ with $\|u-l\|<\varepsilon$. The bracketing number $N_{[\,]}(\mathscr{F},\|\cdot\|,\varepsilon)$ is the minimum number of $\varepsilon$-brackets required to cover the class $\mathscr{F}$. The entropy with bracketing is the logarithm of the bracketing number. It is important to note that, in the definition of the bracketing number, the upper and lower bounds $u$ and $l$ of the brackets need not belong to $\mathscr{F}$ itself, but they are assumed to have finite norms; refer to Definition 2.1.6 in [108].) The condition reads: for some $C_0>0$ and $v_0>0$,
$$
N_{[\,]}\big(\mathscr{F},L_2(\mathbb{P}),\epsilon\big)\le C_0\,\epsilon^{-v_0},\quad 0<\epsilon<1.
$$
Remark 9.
There are essentially no constraints on the choice of the kernel function in our framework, as long as it satisfies some mild conditions. However, selecting the bandwidth is a more intricate task. It is crucial for achieving a good consistency rate, since it significantly affects the size of the bias of the estimate. Generally, we aim for a bandwidth selection that strikes a balance between bias and variance, adapting to the applied criteria and the available data. This flexibility cannot be achieved through classical methods. For more details and discussions on this subject, interested readers can refer to [126,127]. It would be interesting to establish uniform-in-bandwidth central limit theorems in our setting. Specifically, we want to let $h>0$ vary in such a way that $h_n'\le h\le h_n''$, where $(h_n')_{n\ge 1}$ and $(h_n'')_{n\ge 1}$ are two sequences of positive constants ensuring $0<h_n'\le h_n''<\infty$. For either choice $h_n=h_n'$ or $h_n=h_n''$ fulfilling our conditions, it will be of interest to demonstrate that
$$
\sup_{h_n'\le h\le h_n''}\sqrt{n h^{md}\phi_{x,\theta}^{1/m}(h_n)}\,\Big(\widehat{r}_n^{(m)}(\varphi,x,\theta,u;h)-r^{(m)}(\varphi,x,\theta,u)-B_{u,x,\theta}\Big)
$$
converges to a Gaussian process $\mathbb{G}$ over $\mathscr{F}^m\mathscr{K}^m\Theta^m$.

5. Applications

5.1. Metric Learning

Metric learning has attracted considerable attention in recent years due to its ability to tailor the metric to the underlying data. For an extensive overview of metric learning and its diverse applications, please refer to [128,129]. This concept finds utility in various fields, ranging from computer vision to bioinformatics-driven information retrieval. To illustrate the practical significance of metric learning, consider the supervised classification scenario described in [129]. We examine dependent copies
$$
(X_{s_1,A_n},Y_1),\ldots,(X_{s_n,A_n},Y_n)
$$
of an $\mathcal{H}\times\mathcal{Y}$-valued random couple $(X,Y)$, where $\mathcal{H}$ represents a feature space and $\mathcal{Y}=\{1,\ldots,C\}$ is a finite set of labels with $C\ge 2$. The objective of metric learning in this context is to learn a metric $D$ under which points with the same label are close together, while those with different labels are far apart. The risk of a metric $D$ is conventionally defined as follows:
$$
R(D)=\mathbb{E}\Big[\phi\Big(\big(D(X,X')-1\big)\cdot\big(2\,\mathbb{1}\{Y=Y'\}-1\big)\Big)\Big],
$$
where $(X',Y')$ is an independent copy of $(X,Y)$ and $\phi(u)$ is a convex loss function upper bounding the indicator function $\mathbb{1}\{u\ge 0\}$, for instance, the hinge loss $\phi(u)=\max(0,1+u)$. To estimate $R(D)$, the natural empirical estimator is given by
$$
R_n(D)=\frac{2}{n(n-1)}\sum_{1\le i<j\le n}\bar{K}\Big(\frac{u_i-s_i/A_n}{h_n}\Big)\bar{K}\Big(\frac{u_j-s_j/A_n}{h_n}\Big)\times\phi\Big(\big(D(X_{s_i,A_n},X_{s_j,A_n})-1\big)\cdot\big(2\,\mathbb{1}\{Y_i=Y_j\}-1\big)\Big),
$$
which is a one-sample U-statistic of degree two with kernel
$$
\varphi_D\big((x,y),(x',y')\big)=\phi\Big(\big(D(x,x')-1\big)\cdot\big(2\,\mathbb{1}\{y=y'\}-1\big)\Big).
$$
The convergence of a minimizer of (35) to (34) in the non-spatial setting has been studied within the frameworks of algorithmic stability [130], algorithmic robustness [131], and based on the theory of U-processes under appropriate regularization [132].
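As an illustration of the empirical risk (35) as a degree-two U-statistic, the following sketch evaluates $R_n(D)$ for the Euclidean distance and the hinge loss, dropping the spatial kernel weights; the distance, the loss, and the data-generating choices are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim = 200, 5
X = rng.normal(size=(n, dim))
y = rng.integers(0, 3, size=n)               # labels in {0, 1, 2}

def hinge(u):
    return np.maximum(0.0, 1.0 + u)          # convex surrogate for 1{u >= 0}

def empirical_risk(D, X, y):
    """Degree-2 U-statistic R_n(D) over all pairs i < j (weights dropped)."""
    total, count = 0.0, 0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            sign = 2.0 * (y[i] == y[j]) - 1.0   # +1 similar pair, -1 otherwise
            total += hinge((D(X[i], X[j]) - 1.0) * sign)
            count += 1
    return total / count

euclid = lambda a, b: np.linalg.norm(a - b)  # illustrative choice of metric
print(f"R_n(euclidean) = {empirical_risk(euclid, X, y):.4f}")
```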

5.2. Ranking Problems

Owing to its great importance, the problem of ranking instances has received particular attention in machine learning. Ordering problems have many applications in different areas, such as banking (data mining for direct marketing), document classification, and so on. In some specific ranking problems, it is necessary to compare two different observations on the basis of their observed characteristics and to decide which one is better, instead of simply classifying them. Ordering/ranking problems are frequent settings in which U-statistics come into play. In this kind of challenge, the aim is to establish a universal and consistent ordering method. Suppose that we want to establish an order between the first components of two pairs $(X,Y)$, $(X',Y')$ of independent and identically distributed observations in $\mathcal{X}\times\mathbb{R}$. The variables $Y$ and $Y'$ are the respective labels of the variables $X$ and $X'$, which we want to order by observing them (and not their labels). Usually, we decide that $X$ is better than $X'$ if $Y>Y'$. To see things more clearly, we introduce the new variable
$$
Z=\frac{Y-Y'}{2},
$$
so that $Y>Y'$ is equivalent to $Z>0$. As mentioned, the goal is to establish a classification rule between $X$ and $X'$ with minimal risk, i.e., such that the probability that the label of the highest-ranked variable is the smallest remains small. Mathematically speaking, the decision rule is given by the function
$$
r(x,x')=\begin{cases} 1 & \text{if } x>x',\\ -1 & \text{otherwise}.\end{cases}
$$
The following ranking risk gives the performance measure of r:
$$
L(r)=\mathbb{P}\big\{Z\cdot r(X,X')<0\big\}.
$$
A natural estimate for L ( · ) according to [133] is
$$
L_n(r):=\frac{1}{n(n-1)}\sum_{i\neq j}\mathbb{1}\big\{Z_{i,j}\cdot r(X_i,X_j)<0\big\},
$$
where $(X_1,Y_1),\ldots,(X_n,Y_n)$ are $n$ independent, identically distributed copies of $(X,Y)$ and $Z_{i,j}=(Y_i-Y_j)/2$. One can easily see that $L_n$ is a U-statistic of degree $k=2$. For more details, the reader is invited to consult [133,134].
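The next sketch computes the empirical ranking risk $L_n(r)$ for the naive rule $r(x,x')=\mathrm{sign}(x-x')$ on simulated scalar data; the data-generating model is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=n)
Y = X + 0.5 * rng.normal(size=n)            # higher X tends to mean higher Y

def r(x, xp):
    """Illustrative ranking rule: predict +1 when x is ranked above x'."""
    return 1.0 if x > xp else -1.0

# L_n(r): proportion of discordant pairs, a degree-2 U-statistic.
errors, pairs = 0, 0
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        z_ij = (Y[i] - Y[j]) / 2.0
        errors += z_ij * r(X[i], X[j]) < 0
        pairs += 1
print(f"L_n(r) = {errors / pairs:.4f}")
```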

5.3. Multipartite Ranking

Let us present the problem following [129]. Consider a random vector $X\in\mathcal{H}$ representing attributes or features and an associated ordinal label $Y\in\{1,\ldots,K\}$. In multipartite ranking, the objective, given a training set of labeled examples, is to order the features according to their assigned labels. This statistical learning problem finds applications in various fields, such as medicine, finance, search engines, and e-commerce. Rankings are typically established using a scoring function $s:\mathcal{H}\to\mathbb{R}$, which maps the feature space to the natural order on the real line. The ROC manifold, or its summary measure, the VUS (Volume Under the ROC Surface), serves as the benchmark for evaluating the ranking performance of $s(x)$. According to [135], the most effective scoring functions are those that are optimal for all bipartite subproblems. Specifically, they are increasing transformations of the likelihood ratio $dF_{k+1}/dF_k$, where $F_k$ denotes the class-conditional distribution of the $k$th class. When the set of optimal scoring functions is nonempty, ref. [135] demonstrated that it coincides with the set of functions maximizing the volume under the ROC surface
$$
\mathrm{VUS}(s)=\mathbb{P}\big\{s(X_{s_1})<\cdots<s(X_{s_K})\mid Y_1=1,\ldots,Y_K=K\big\}.
$$
Given $K$ independent samples $X_{s_1,A_{n_k}}^{(k)},\ldots,X_{s_{n_k},A_{n_k}}^{(k)}$ with distribution $F_k(dx)$ for $k=1,\ldots,K$, the empirical counterpart of the VUS can be written in the following way:
$$
\widehat{\mathrm{VUS}}(s)=\frac{1}{\prod_{k=1}^{K}n_k}\sum_{i_1=1}^{n_1}\cdots\sum_{i_K=1}^{n_K}\ \prod_{j=1}^{K}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)\,\mathbb{1}\Big\{s\big(X_{s_{i_1},A_{n_1}}^{(1)}\big)<\cdots<s\big(X_{s_{i_K},A_{n_K}}^{(K)}\big)\Big\}.
$$
The empirical VUS (36) is a $K$-sample U-statistic of degree $(1,\ldots,1)$ with kernel
$$
\varphi_s(x_1,\ldots,x_K)=\mathbb{1}\big\{s(x_1)<\cdots<s(x_K)\big\}.
$$
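A direct evaluation of the empirical VUS, with the spatial kernel weights dropped for readability, can be sketched as follows; the Gaussian class-conditional samples and the identity scoring function are illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
K = 3
samples = [rng.normal(loc=k, size=30) for k in range(K)]   # class-k features
score = lambda x: x                                        # scoring function s

# Empirical VUS: K-sample U-statistic of degree (1, ..., 1).
concordant, tuples = 0, 0
for xs in itertools.product(*samples):
    s_vals = [score(x) for x in xs]
    concordant += all(a < b for a, b in zip(s_vals, s_vals[1:]))
    tuples += 1
# Chance level for K = 3 is 1/K! = 1/6; separated classes score higher.
print(f"VUS_hat(s) = {concordant / tuples:.4f}")
```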

5.4. Discrimination

Now, we implement the findings in addressing the discrimination problem detailed in Section 3 of [136], with additional reference to [137]. We adopt similar notations and configurations. Consider any function $\varphi(\cdot)$ with finite range, say $\{1,\ldots,M\}$. Define the sets
$$
A_j=\big\{(y_1,\ldots,y_k):\ \varphi(y_1,\ldots,y_k)=j\big\},\quad 1\le j\le M.
$$
These sets form a partition of the feature space. Predicting the value of φ ( y 1 , , y k ) is equivalent to predicting the set in the partition to which ( Y 1 , , Y k ) belongs. For any discrimination rule g ( · ) , the following inequality holds:
$$
\mathbb{P}\big(g(X_1,\ldots,X_m)=\varphi(Y_1,\ldots,Y_m)\big)\le \sum_{j=1}^{M}\int_{\{t\,:\,g(t)=j\}}\max_{1\le \ell\le M} m_{\ell}(t,\theta)\,d\mathbb{P}(t),
$$
where, for $u=s_i/A_n$,
$$
m_j(t,\theta)=m_j(t,\theta,u)=\mathbb{P}\big(\varphi(Y_1,\ldots,Y_m)=j\ \big|\ \langle (X_1,\ldots,X_m),\theta\rangle=\langle t,\theta\rangle\big),\quad t\in\mathcal{H}^m.
$$
Equality in the above holds if
$$
g_0(t,\theta)=\arg\max_{1\le j\le M} m_j(t,\theta).
$$
The function $g_0(\cdot)$ is the Bayes rule, and the associated probability of error
$$
L^{*}=1-\mathbb{P}\big(g_0(X_1,\ldots,X_m)=\varphi(Y_1,\ldots,Y_m)\big)=1-\mathbb{E}\Big[\max_{1\le j\le M} m_j(X,\theta)\Big]
$$
is the Bayes risk. Each of the unknown functions m j ( · ) can be consistently estimated by methods discussed in the preceding sections. For 1 j M , let
$$
m_{nj}(x,\theta,u)=\sum_{\mathbf{i}\in I_n^m}\mathbb{1}\big\{\varphi(Y_{s_{i_1},A_n},\ldots,Y_{s_{i_m},A_n})=j\big\}\,W_{\mathbf{i}}^{(m)}(x,\theta,u),
$$
where
$$
W_{\mathbf{i}}^{(m)}(x,\theta,u)=\frac{\displaystyle\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}{\displaystyle\sum_{\mathbf{i}\in I_n^m}\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}.
$$
Define
$$
g_{0,n}(t,\theta)=\arg\max_{1\le j\le M} m_{nj}(t,\theta,u).
$$
Introduce
$$
L_n^{*}=\mathbb{P}\big(g_{0,n}(X_1,\ldots,X_m)\neq \varphi(Y_1,\ldots,Y_m)\big).
$$
It can be demonstrated that the discrimination rule $g_{0,n}(\cdot)$ is asymptotically Bayes risk consistent:
$$
L_n^{*}\to L^{*}.
$$
This follows from the evident relation
$$
L_n^{*}-L^{*}\le 2\,\mathbb{E}\Big[\max_{1\le j\le M}\big|m_{nj}(X,\theta,u)-m_j(X,\theta)\big|\Big].
$$
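To make the plug-in rule concrete, the following sketch estimates $m_{nj}$ with Nadaraya–Watson weights in the simplest case $m=1$ with a scalar covariate and classifies by $\arg\max_j m_{nj}$; the Gaussian kernel, the logistic label model, and the bandwidth are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, M = 400, 2
X = rng.uniform(-1, 1, size=n)
# phi(Y) in {1, 2}: label 2 becomes likelier as X grows (logistic model).
labels = (rng.uniform(size=n) < 1 / (1 + np.exp(-3 * X))).astype(int) + 1
h = 0.2                                      # illustrative bandwidth

def m_nj(x, j):
    """Kernel-weighted estimate of m_j(x) = P(phi(Y) = j | X = x)."""
    w = np.exp(-0.5 * ((X - x) / h) ** 2)    # Gaussian kernel, an assumption
    return np.sum(w * (labels == j)) / np.sum(w)

def g0n(x):
    """Plug-in (empirical Bayes) discrimination rule."""
    return max(range(1, M + 1), key=lambda j: m_nj(x, j))

grid = np.linspace(-1, 1, 5)
print([(round(x, 2), g0n(x)) for x in grid])
```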

5.5. Kendall Rank Correlation Coefficient

To test the independence of one-dimensional random variables $Y_1$ and $Y_2$, ref. [138] proposed a method based on the U-statistic $K_n$ with the kernel function:
$$
\varphi\big((s_1,t_1),(s_2,t_2)\big)=\mathbb{1}\big\{(s_2-s_1)(t_2-t_1)>0\big\}-\mathbb{1}\big\{(s_2-s_1)(t_2-t_1)\le 0\big\}.
$$
The rejection region is of the form $\big\{\sqrt{n}\,K_n>\gamma\big\}$. In this example, we consider a multivariate case. To test the conditional independence of $\xi$ and $\eta$, where $Y=(\xi,\eta)$, given $X$, we propose a method based on the conditional U-statistic:
$$
\widehat{r}_n^{(2)}(\varphi,x,\theta,u)=\sum_{\mathbf{i}\in I_n^2}\varphi\big(Y_{i_1},Y_{i_2}\big)\,W_{\mathbf{i}}^{(2)}(x,\theta,u),
$$
where $x=(x_1,x_2)\in\mathbb{R}^2$ and $\varphi(\cdot)$ is Kendall's kernel (38). Suppose that $\xi$ and $\eta$ are $d_1$- and $d_2$-dimensional random vectors, respectively, with $d_1+d_2=d$. Furthermore, suppose that $Y_1,\ldots,Y_n$ are observations of $(\xi,\eta)$; we are interested in testing:
$$
H_0:\ \xi \text{ and } \eta \text{ are conditionally independent given } X\quad \text{vs.}\quad H_a:\ H_0 \text{ is not true}.
$$
Let $a=(a_1,a_2)\in\mathbb{R}^d$ be such that $\|a\|=1$ with $a_1\in\mathbb{R}^{d_1}$ and $a_2\in\mathbb{R}^{d_2}$, and let $F(\cdot)$ and $G(\cdot)$ be the distribution functions of $\xi$ and $\eta$, respectively. Suppose that $F_{a_1}(\cdot)$ and $G_{a_2}(\cdot)$ are continuous for any unit vector $a=(a_1,a_2)$, where $F_{a_1}(t)=\mathbb{P}(a_1^{\top}\xi<t)$, $G_{a_2}(t)=\mathbb{P}(a_2^{\top}\eta<t)$, and $a_i^{\top}$ denotes the transpose of the vector $a_i$, $1\le i\le 2$. For $n=2$, let $Y^{(1)}=(\xi^{(1)},\eta^{(1)})$ and $Y^{(2)}=(\xi^{(2)},\eta^{(2)})$ be such that $\xi^{(i)}\in\mathbb{R}^{d_1}$ and $\eta^{(i)}\in\mathbb{R}^{d_2}$ for $i=1,2$, and set
$$
\varphi_a\big(Y^{(1)},Y^{(2)}\big)=\varphi\big(a_1^{\top}\xi^{(1)},a_2^{\top}\eta^{(1)},a_1^{\top}\xi^{(2)},a_2^{\top}\eta^{(2)}\big).
$$
An application of Theorem 1 gives
$$
\big|\widehat{r}_n^{(2)}(\varphi_a,x,\theta,u)-r^{(2)}(\varphi_a,x,\theta,u)\big|\to 0,\quad \text{in probability}.
$$
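For the unconditional building block of this test, the following sketch evaluates Kendall's U-statistic $K_n$ with the kernel (38) on simulated data; the bivariate dependence model is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
xi = rng.normal(size=n)
eta = 0.6 * xi + 0.8 * rng.normal(size=n)   # dependent pair (xi, eta)

def kendall_kernel(s1, t1, s2, t2):
    """Kernel (38): +1 for a concordant pair, -1 for a discordant one."""
    return 1.0 if (s2 - s1) * (t2 - t1) > 0 else -1.0

# Kendall U-statistic K_n; under independence sqrt(n) K_n is asymptotically
# centered, so a large value of sqrt(n) K_n leads to rejection of H_0.
total = 0.0
for i in range(n):
    for j in range(i + 1, n):
        total += kendall_kernel(xi[i], eta[i], xi[j], eta[j])
K_n = 2.0 * total / (n * (n - 1))
print(f"K_n = {K_n:.3f},  sqrt(n) K_n = {np.sqrt(n) * K_n:.2f}")
```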

5.6. Set Indexed Conditional U-Statistics

Our goal is to examine the connections between $X$ and $Y$ by estimating functional operators associated with the conditional distribution of $Y$ given $X$, such as the regression operator, for $C_1\times\cdots\times C_m:=\widetilde{C}$ within a class of sets $\mathscr{C}^m$,
$$
G^{(m)}\Big(C_1\times\cdots\times C_m,\ x,\theta,\frac{\mathbf{s_i}}{A_n}\Big)=\mathbb{E}\Big[\prod_{i=1}^{m}\mathbb{1}_{\{Y_i\in C_i\}}\ \Big|\ \langle X_{s_1,A_n},\theta_1\rangle=\langle x_1,\theta_1\rangle,\ldots,\langle X_{s_m,A_n},\theta_m\rangle=\langle x_m,\theta_m\rangle\Big],\quad \text{for } x\in\mathcal{H}^m.
$$
We define the metric entropy with inclusion of a class of sets $\mathscr{C}$ of $\mathbb{R}^d$. For each $\varepsilon>0$, the covering number is defined as:
$$
N\big(\varepsilon,\mathscr{C},G^{(1)}(\cdot\mid\theta,x)\big)=\inf\Big\{n\in\mathbb{N}:\ \exists\, C_1,\ldots,C_n\in\mathscr{C} \text{ such that } \forall\, C\in\mathscr{C},\ \exists\, 1\le i,j\le n \text{ with } C_i\subset C\subset C_j \text{ and } G^{(1)}\big(C_j\setminus C_i\mid x\big)<\varepsilon\Big\},
$$
and $\log N(\varepsilon,\mathscr{C},G^{(1)}(\cdot\mid\theta,x))$ is called the metric entropy with inclusion of $\mathscr{C}$ with respect to $G^{(1)}(\cdot\mid\theta,x)$. Estimates of such covering numbers are known for many classes (e.g., [139]). We will often assume below that either $\log N(\varepsilon,\mathscr{C},G^{(1)}(\cdot\mid\theta,x))$ or $N(\varepsilon,\mathscr{C},G^{(1)}(\cdot\mid\theta,x))$ behaves like a power of $\varepsilon^{-1}$: we say that the condition $(R_\gamma)$ holds if
$$
\log N\big(\varepsilon,\mathscr{C},G^{(1)}(\cdot\mid\theta,x)\big)\le H_{\gamma}(\varepsilon),\quad \text{for all } \varepsilon>0,
$$
where
$$
H_{\gamma}(\varepsilon)=\begin{cases}\log(A/\varepsilon) & \text{if } \gamma=0,\\[2pt] A\,\varepsilon^{-\gamma} & \text{if } \gamma>0,\end{cases}
$$
for some constant $A>0$. As in [140], it is worth noticing that condition (41) with $\gamma=0$ holds for intervals, rectangles, balls, ellipsoids, and for classes constructed from these by performing the set operations of union, intersection, and complementation finitely many times. The class of convex sets in $\mathbb{R}^d$ ($d\ge 2$) fulfills condition (41) with $\gamma=(d-1)/2$. This and other classes of sets satisfying (41) with $\gamma>0$ can be found in [139]. As a particular case of (12), we estimate $G^{(m)}(C_1\times\cdots\times C_m\mid x,\theta)$ by
$$
\widehat{G}_n^{(m)}\big(\widetilde{C},x,\theta,u,h_n\big)=\sum_{(i_1,\ldots,i_m)\in I(m,n)}\ \prod_{j=1}^{m}\mathbb{1}_{\{Y_{i_j}\in C_j\}}\ W_{\mathbf{i}}^{(m)}(x,\theta,u),
$$
where $W_{\mathbf{i}}^{(m)}$ is defined in (37). One can apply Theorem 1 to infer that
$$
\sup_{\widetilde{C}\times\widetilde{K}\in\mathscr{C}^m\mathscr{K}^m}\ \sup_{\theta\in\Theta^m}\ \sup_{x\in S_c}\ \sup_{u\in I_h}\Big|\widehat{G}_n^{(m)}\big(\widetilde{C},x,\theta,u;h_n\big)-G^{(m)}\big(\widetilde{C},x,\theta,u\big)\Big|\to 0,\quad \text{in probability}.
$$

5.7. Generalized U-Statistics

The extension to the case of several samples is straightforward. Consider $k$ independent collections of independent observations
$$
\big(X_1^{(1)},Y_1^{(1)}\big),\big(X_2^{(1)},Y_2^{(1)}\big),\ldots;\ \ldots;\ \big(X_1^{(k)},Y_1^{(k)}\big),\big(X_2^{(k)},Y_2^{(k)}\big),\ldots
$$
Let, for $t\in\mathcal{H}^{(m_1+\cdots+m_k)}$,
$$
r^{(m,k)}(\varphi,t,\theta,u)=r^{(m,k)}\Big(\varphi,t_1,\ldots,t_k,\theta_1,\ldots,\theta_k,\frac{s_1}{A_n},\ldots,\frac{s_k}{A_n}\Big)
=\mathbb{E}\Big[\varphi\big(Y_1^{(1)},\ldots,Y_{m_1}^{(1)};\ldots;Y_1^{(k)},\ldots,Y_{m_k}^{(k)}\big)\ \Big|\ \big\langle\big(X_{1,A_n}^{(j)},\ldots,X_{m_j,A_n}^{(j)}\big),\theta_j\big\rangle=\langle t_j,\theta_j\rangle,\ j=1,\ldots,k\Big],
$$
where $\varphi$ is assumed, without loss of generality, to be symmetric within each of its $k$ blocks of arguments. Corresponding to the "kernel" $\varphi$, and assuming $n_1\ge m_1,\ldots,n_k\ge m_k$, the conditional U-statistic for the estimation of $r^{(m,k)}(\varphi,t)$ is defined as
$$
\widehat{r}_n^{(m,k)}(\varphi,t,u,\theta,h_n)=\frac{\displaystyle\sum_{c}\varphi\big(Y_{i_{11}}^{(1)},\ldots,Y_{i_{1m_1}}^{(1)};\ldots;Y_{i_{k1}}^{(k)},\ldots,Y_{i_{km_k}}^{(k)}\big)\,\mathbf{K}\big(X_{i_{11}}^{(1)},\ldots,X_{i_{1m_1}}^{(1)};\ldots;X_{i_{k1}}^{(k)},\ldots,X_{i_{km_k}}^{(k)}\big)}{\displaystyle\sum_{c}\mathbf{K}\big(X_{i_{11}}^{(1)},\ldots,X_{i_{1m_1}}^{(1)};\ldots;X_{i_{k1}}^{(k)},\ldots,X_{i_{km_k}}^{(k)}\big)},
$$
where
$$
\mathbf{K}\big(X_{i_{11}}^{(1)},\ldots,X_{i_{1m_1}}^{(1)};\ldots;X_{i_{k1}}^{(k)},\ldots,X_{i_{km_k}}^{(k)}\big)=\prod_{\nu=1}^{k}\prod_{j_\nu=1}^{m_\nu}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j_\nu,\ell}-s_{i_{j_\nu},\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_{j_\nu}}(x_{j_\nu},X_{s_{i_{j_\nu}},A_n})}{h_n}\Big).
$$
Here, $(i_{j1},\ldots,i_{jm_j})$ denotes a set of $m_j$ distinct elements of the set $\{1,2,\ldots,n_j\}$, $1\le j\le k$, and $\sum_c$ denotes summation over all such combinations. The extension of the treatment of one-sample U-statistics in [125] to the $k$-sample case is due to [141,142]. One can use Theorem 1 to infer that
$$
\max_{1\le j\le k}\ \sup_{t\in S\subset\mathcal{H}^{m_1+\cdots+m_k}}\Big|\widehat{r}_n^{(m,k)}(\varphi,t,u,\theta,h_n)-r^{(m,k)}(\varphi,t,u,\theta)\Big|\to 0\quad \text{in probability}.
$$

6. Extension to the Censored Case

Consider the triple $(Y,C,X)$ of random variables defined in $\mathbb{R}\times\mathbb{R}\times\mathcal{H}$, where $Y$ is the variable of interest, $C$ is a censoring variable, and $X$ is a concomitant variable. We work with a sample $\{(Y_i,C_i,X_{s_i,A_n})\}$ of identically distributed replicates of $(Y,C,X)$, $n\ge 1$. In the right-censorship model, the pairs $(Y_i,C_i)$, $1\le i\le n$, are not directly observed; instead, the available information consists of $Z_i:=\min\{Y_i,C_i\}$ and $\delta_i:=\mathbb{1}\{Y_i\le C_i\}$, $1\le i\le n$. The observed sample is denoted $\mathcal{D}_n=\{(Z_i,\delta_i,X_{s_i,A_n}),\ i=1,\ldots,n\}$. Such censoring is common for survival data in clinical trials and failure-time data in reliability studies. We impose some assumptions on the distribution of $(X,Y)$. For $-\infty<t<\infty$, let $F_Y(t)$, $G(t)$, and $H(t)$ be the right-continuous distribution functions of $Y$, $C$, and $Z$, respectively. Define $T_L=\sup\{t\in\mathbb{R}: L(t)<1\}$ for any right-continuous distribution function $L$. Consider a pointwise measurable class $\mathscr{F}$ of real measurable functions defined on $\mathbb{R}$, and assume that $\mathscr{F}$ is of VC type. We recall that the regression function of $\psi(Y)$ evaluated at $X=x$ is denoted $r^{(1)}(\psi,x)=\mathbb{E}(\psi(Y)\mid X=x)$, where $Y$ is right-censored. To estimate $r^{(1)}(\psi,\cdot)$, we utilize Inverse Probability of Censoring Weighted (I.P.C.W.) estimators, which have gained popularity in the censored-data literature (see [143,144]). The fundamental idea behind I.P.C.W. estimators can be outlined as follows. Introduce the real-valued function $\Phi_\psi(\cdot,\cdot)$ defined on $\mathbb{R}^2$ by
$$
\Phi_{\psi}(y,c)=\frac{\mathbb{1}\{y\le c\}\,\psi(y\wedge c)}{1-G(y\wedge c)}.
$$
Assuming the function $G(\cdot)$ is known, observe that $\Phi_\psi(Y_i,C_i)=\delta_i\,\psi(Z_i)/(1-G(Z_i))$ is observable for every $1\le i\le n$. Additionally, under Assumption $(\mathcal{I})$, stating that $C$ and $(Y,X)$ are independent, we have
$$
r^{(1)}(\Phi_\psi,x):=\mathbb{E}\big(\Phi_\psi(Y,C)\mid X=x\big)=\mathbb{E}\Big(\frac{\mathbb{1}\{Y\le C\}\,\psi(Z)}{1-G(Z)}\ \Big|\ X=x\Big)=\mathbb{E}\Big(\frac{\psi(Y)}{1-G(Y)}\,\mathbb{E}\big(\mathbb{1}\{Y\le C\}\mid X,Y\big)\ \Big|\ X=x\Big)=r^{(1)}(\psi,x).
$$
Consequently, any estimate of $r^{(1)}(\Phi_\psi,\cdot)$ derived from fully observed data serves as an estimate of $r^{(1)}(\psi,\cdot)$ as well. This property facilitates the natural extension of most statistical procedures designed for estimating the regression function in the uncensored case to the censored case. For example, constructing kernel-type estimates is particularly straightforward. For $x\in\mathcal{H}$, $h_n>0$, and $1\le i\le n$, define
$$
\bar{\omega}_{n,K_{1,2},h_n,i}^{(1)}(x,\theta,u):=\frac{\displaystyle\prod_{\ell=1}^{d}K_1\Big(\frac{u_{\ell}-s_{i,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta}(x,X_{s_i,A_n})}{h_n}\Big)}{\displaystyle\sum_{j=1}^{n}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{\ell}-s_{j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta}(x,X_{s_j,A_n})}{h_n}\Big)}.
$$
Given (45)–(47), and assuming knowledge of G ( · ) , a kernel estimator of r ( 1 ) ( ψ , · ) is expressed as
$$
\breve{r}_n^{(1)}(\psi,x,\theta,u;h_n)=\sum_{i=1}^{n}\bar{\omega}_{n,K_{1,2},h_n,i}^{(1)}(x,\theta,u)\,\frac{\delta_i\,\psi(Z_i)}{1-G(Z_i)}.
$$
However, since $G(\cdot)$ is generally unknown, it needs to be estimated. Denote by $G_n^{*}(\cdot)$ the Kaplan–Meier estimator of $G(\cdot)$ [145]. With the conventions $\prod_{\emptyset}=1$ and $0^0=1$, and setting $N_n(u)=\sum_{i=1}^{n}\mathbb{1}\{Z_i\ge u\}$, we have
$$
G_n^{*}(u)=1-\prod_{i:\,Z_i\le u}\Big(\frac{N_n(Z_i)-1}{N_n(Z_i)}\Big)^{(1-\delta_i)},\quad \text{for } u\in\mathbb{R}.
$$
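A direct implementation of $G_n^{*}(\cdot)$ can be sketched as follows; the exponential lifetime and censoring models are assumptions used only to check the estimator against a known $G$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
Y = rng.exponential(1.0, size=n)            # latent lifetimes
C = rng.exponential(1.5, size=n)            # censoring times
Z = np.minimum(Y, C)
delta = (Y <= C).astype(int)                # 1 = uncensored

def G_star(u, Z, delta):
    """Kaplan-Meier estimate of the censoring d.f. G at u; factors with
    delta_i = 1 get exponent 0, matching the formula above."""
    N = lambda t: np.sum(Z >= t)            # at-risk count N_n(t)
    prod = 1.0
    for zi, di in zip(Z, delta):
        if zi <= u:
            prod *= ((N(zi) - 1.0) / N(zi)) ** (1 - di)
    return 1.0 - prod

u = 1.0
true_G = 1.0 - np.exp(-u / 1.5)             # known G in this toy model
print(f"G_n*({u}) = {G_star(u, Z, delta):.4f}   (true G({u}) = {true_G:.4f})")
```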
Given this notation, we consider the following estimator of r ( 1 ) ( ψ , · )
$$
\breve{r}_n^{(1)*}(\psi,x,\theta,u;h_n)=\sum_{i=1}^{n}\bar{\omega}_{n,K_{1,2},h_n,i}^{(1)}(x,\theta,u)\,\frac{\delta_i\,\psi(Z_i)}{1-G_n^{*}(Z_i)},
$$
as discussed in [143,146]. Adopting the convention $0/0=0$, this quantity is well defined. Indeed, $G_n^{*}(Z_i)=1$ holds if and only if $Z_i=Z_{(n)}$ and $\delta_{(n)}=0$, where $Z_{(k)}$ is the $k$th order statistic of the sample $(Z_1,\ldots,Z_n)$, $k=1,\ldots,n$, and $\delta_{(k)}$ is the $\delta_j$ corresponding to $Z_{(k)}=Z_j$. Given that the variable of interest is right-censored, functional estimation of the (conditional) law cannot, in general, be achieved on the complete support. To establish our results, we work under the following assumptions.
(A.1) 
$\mathscr{F}=\big\{\psi:=\psi_1\,\mathbb{1}_{\{(-\infty,\tau)^m\}},\ \psi_1\in\mathscr{F}_1\big\}$, where $\tau<T_H$, and $\mathscr{F}_1$ is a pointwise measurable class of real measurable functions defined on $\mathbb{R}$ of VC type.
(A.2) 
The class of functions F has a measurable and uniformly bounded envelope function Υ such that
$$
\Upsilon(y_1,\ldots,y_k)\ \ge\ \sup_{\psi\in\mathscr{F}}\big|\psi(y_1,\ldots,y_k)\big|,\quad y_i\le T_H.
$$
Combining the results of Proposition 9.6 and Lemma 9.7 of [147] with Theorem 1, we have, in probability,
$$
\sup_{x,\theta,u}\Big|\breve{r}_n^{(1)*}(\psi,x,\theta,u;h_n)-\widehat{\mathbb{E}}\big(\breve{r}_n^{(1)*}(\psi,x,\theta,u;h_n)\big)\Big|\to 0.
$$
A right-censored version of an unconditional U-statistic with a kernel of degree $m\ge 1$ was introduced via the principle of a mean-preserving reweighting scheme in [148]. Stute and Wang [149] proved the almost sure convergence of multi-sample U-statistics under random censorship and provided applications to the consistency of a new class of tests designed for testing equality in distribution. To overcome potential biases arising from right-censoring of the outcomes and the presence of confounding covariates, Chen and Datta [150] proposed adjustments to the classical U-statistics. Yuan et al. [151] proposed a different estimation procedure for the U-statistic, using a substitution estimator of the conditional kernel given the observed data. To the best of our knowledge, the problem of the estimation of conditional U-statistics has remained open until now, and it serves as the main motivation for this section. A natural extension of the function defined in (45) is given by
$$
\Phi_{\psi}(y_1,\ldots,y_m,c_1,\ldots,c_m)=\frac{\displaystyle\prod_{i=1}^{m}\mathbb{1}\{y_i\le c_i\}\ \psi(y_1\wedge c_1,\ldots,y_m\wedge c_m)}{\displaystyle\prod_{i=1}^{m}\big\{1-G(y_i\wedge c_i)\big\}}.
$$
From this, we obtain a relation analogous to (46):
$$
\mathbb{E}\big(\Phi_\psi(Y_1,\ldots,Y_m,C_1,\ldots,C_m)\mid \langle(X_1,\ldots,X_m),\theta\rangle=\langle t,\theta\rangle\big)
=\mathbb{E}\bigg(\frac{\prod_{i=1}^{m}\mathbb{1}\{Y_i\le C_i\}\,\psi(Y_1\wedge C_1,\ldots,Y_m\wedge C_m)}{\prod_{i=1}^{m}\{1-G(Y_i\wedge C_i)\}}\ \bigg|\ \langle(X_1,\ldots,X_m),\theta\rangle=\langle t,\theta\rangle\bigg)
$$
$$
=\mathbb{E}\bigg(\frac{\psi(Y_1,\ldots,Y_m)}{\prod_{i=1}^{m}\{1-G(Y_i)\}}\,\mathbb{E}\Big(\prod_{i=1}^{m}\mathbb{1}\{Y_i\le C_i\}\ \Big|\ (Y_1,X_1),\ldots,(Y_m,X_m)\Big)\ \bigg|\ \langle(X_1,\ldots,X_m),\theta\rangle=\langle t,\theta\rangle\bigg)
=\mathbb{E}\big(\psi(Y_1,\ldots,Y_m)\mid \langle(X_1,\ldots,X_m),\theta\rangle=\langle t,\theta\rangle\big)=m_{\psi}(t,\theta).
$$
An estimator analogous to (12) in the censored case is given by
$$
\breve{r}_n^{(m)}(\psi,t,\theta,u;h_n)=\sum_{(i_1,\ldots,i_m)\in I(m,n)}\frac{\delta_{i_1}\cdots\delta_{i_m}\,\psi(Z_{i_1},\ldots,Z_{i_m})}{\big(1-G(Z_{i_1})\big)\cdots\big(1-G(Z_{i_m})\big)}\,\bar{\omega}_{n,K_{1,2},h_n,\mathbf{i}}^{(m)}(t,\theta,u),
$$
where, for i = ( i 1 , , i m ) I n m ,
$$
\bar{\omega}_{n,K_{1,2},h_n,\mathbf{i}}^{(m)}(x,\theta,u):=\frac{\displaystyle\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}{\displaystyle\sum_{\mathbf{i}\in I_n^m}\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}.
$$
The estimator that we will investigate is given by
$$
\breve{r}_n^{(m)*}(\psi,t,\theta,u;h_n)=\sum_{(i_1,\ldots,i_m)\in I(m,n)}\frac{\delta_{i_1}\cdots\delta_{i_m}\,\psi(Z_{i_1},\ldots,Z_{i_m})}{\big(1-G_n^{*}(Z_{i_1})\big)\cdots\big(1-G_n^{*}(Z_{i_m})\big)}\,\bar{\omega}_{n,K_{1,2},h_n,\mathbf{i}}^{(m)}(t,\theta,u).
$$
Corollary 1.
Under the Assumptions  (A.1)–(A.2)   and the conditions of Theorem 1, we have
$$
\sup_{\theta\in\Theta^m}\ \sup_{x\in S_c\subset\mathcal{H}^m}\ \sup_{u\in I_h}\Big|\breve{r}_n^{(m)*}(\psi,t,\theta,u;h_n)-\mathbb{E}\big[\breve{r}_n^{(m)*}(\psi,t,\theta,u;h_n)\big]\Big|=O_{\mathbb{P}_{\cdot\mid S}}\left(\sqrt{\frac{\log n}{n h^{md}\phi(h)}}+\frac{1}{A_n^{dp}\,\phi^{1/m}(h)}\right).
$$
In the last corollary, we use the law of the iterated logarithm for $G_n^{*}(\cdot)$ established in [152], ensuring that
$$
\sup_{t\le \tau}\big|G_n^{*}(t)-G(t)\big|=O\left(\sqrt{\frac{\log\log n}{n}}\right),\quad \text{almost surely as } n\to\infty.
$$

6.1. Conditional U-Statistics for Left Truncated and Right Censored Data

Keeping in mind the notation of the preceding section, we now introduce a truncation variable, denoted $L$, and assume that $(L,C)$ is independent of $Y$. We consider the situation where we observe the random vectors $(Z_i,\epsilon_i,\Delta_i)$, with $\epsilon_i=\mathbb{1}(L_i\le Z_i)$. In this section, we aim to define conditional U-statistics for data that are left truncated and right censored (LTRC), following ideas from [153] in the unconditional setting. To achieve this, we propose an extension of the function (51) to LTRC data as follows:
$$
\widetilde{\Phi}_{\psi}(y_1,\ldots,y_k,l_1,\ldots,l_k,c_1,\ldots,c_k)=\frac{\displaystyle\psi(y_1\wedge c_1,\ldots,y_k\wedge c_k)\prod_{i=1}^{k}\mathbb{1}\{y_i\le c_i\}\,\mathbb{1}\{l_i\le z_i\}}{\displaystyle\prod_{i=1}^{k}\mathbb{P}\big(L_i<Z_i<C_i\big)}.
$$
According to (52), we obtain
$$
\mathbb{E}\big(\widetilde{\Phi}_\psi(Y_1,\ldots,Y_k,L_1,\ldots,L_k,C_1,\ldots,C_k)\mid (X_1,\ldots,X_k)=t\big)=\mathbb{E}\big(\psi(Y_1,\ldots,Y_k)\mid (X_1,\ldots,X_k)=t\big).
$$
An estimator analogous to (12) for LTRC data can be expressed as follows:
$$
\breve{\breve{r}}_n^{(k)}(\psi,\theta,u,x,h_n)=\sum_{(i_1,\ldots,i_k)\in I_n^k}\frac{\Delta_{i_1}\cdots\Delta_{i_k}\,\epsilon_{i_1}\cdots\epsilon_{i_k}\,\psi(Z_{i_1},\ldots,Z_{i_k})}{\mathbb{P}\big(L_{i_1}<Z_{i_1}<C_{i_1}\big)\cdots\mathbb{P}\big(L_{i_k}<Z_{i_k}<C_{i_k}\big)}\,\bar{\omega}_{n,\mathbf{i}}^{(k)}(\theta,u,x),
$$
where $\bar{\omega}_{n,\mathbf{i}}^{(k)}(\theta,u,x)$ is defined as in (54). As $\mathbb{P}(L_i<Z_i<C_i)$ is not known, we need to estimate it. We introduce $N_i(t)=\mathbb{1}(L_i<Z_i\le t,\ \Delta_i=1)$ and $N_i^c(t)=\mathbb{1}(L_i<Z_i\le t,\ \Delta_i=0)$ as the counting processes corresponding to the variable of interest and to the censoring variable, respectively. Furthermore, let
N ( t ) = i = 1 n N i ( t )
and
N c ( t ) = i = 1 n N i c ( t ) .
We introduce the at-risk indicators $R_i(t)=\mathbb{1}(L_i\le t\le Z_i)$ and
R ( t ) = i = 1 n R i ( t ) .
It is important to note that the risk set $R(t)$ at time $t$ comprises the subjects who entered the study before $t$ and are still under study at $t$. Moreover, $N_i^c(t)$ is a local sub-martingale with respect to an appropriate filtration $\mathcal{F}_t$. The martingale associated with the censoring counting process, with filtration $\mathcal{F}_t$, is given by
$$
M_i^c(t)=N_i^c(t)-\int_0^t R_i(u)\,\lambda_c(u)\,du,\quad i=1,2,\ldots,n.
$$
Here, λ c ( · ) represents the hazard function associated with the censoring variable C under left truncation. The cumulative hazard function for the censoring variable C is defined as
Λ c ( t ) = 0 t λ c ( u ) d u .
Denote
M c ( t ) = i = 1 n M i c ( t ) .
Now, we define the sub-distribution function of T 1 corresponding to Δ 1 = 1 and ϵ 1 = 1 as
$$
S(x)=\mathbb{P}\big(T_1\le x,\ \Delta_1\epsilon_1=1\big).
$$
Let
$$
w(t)=\int_0^{\infty}\frac{h_1(x)}{\mathbb{P}\big(L_1\le x\le C_1\big)}\,\mathbb{1}(x>t)\,dS(x),
$$
where
$$
h_1(x)=\mathbb{E}\big[\psi\big((T_1,\Delta_1),\ldots,(T_k,\Delta_k)\big)\ \big|\ (T_1,\Delta_1)=(x,\Delta_1)\big].
$$
Also, denote
$$
\widetilde{z}(t)=\mathbb{P}\big(L_1\le t\le T_1\big).
$$
Then, an estimator of the survival function of the censoring variable $C$ under left truncation, denoted $\widehat{K}_c(\cdot)$ (see [154]), can be formulated as follows:
$$
\widehat{K}_c(\tau)=\prod_{t\le \tau}\Big(1-\frac{dN^c(t)}{\widetilde{Z}(t)}\Big).
$$
Similar to the Nelson–Aalen estimator, for instance, see [155], the estimator for the cumulative hazard function of the censoring variable C under left truncation is represented as
$$
\widehat{\Lambda}_c(\tau)=\int_0^{\tau}\frac{dN^c(t)}{\widetilde{Z}(t)}.
$$
In both definitions, (58) and (59), we assume that $\widetilde{Z}(t)$ is non-zero with probability one. The interrelation between $\widehat{K}_c(\tau)$ and $\widehat{\Lambda}_c(\tau)$ can be expressed as
$$
\widehat{K}_c(\tau)=\exp\big(-\widehat{\Lambda}_c(\tau)\big).
$$
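The following sketch computes $\widetilde{Z}(t)$, $\widehat{\Lambda}_c$, and the product-limit $\widehat{K}_c$ from simulated LTRC data, and checks that $\widehat{K}_c$ and $\exp(-\widehat{\Lambda}_c)$ nearly coincide when the risk sets are large; the uniform truncation and exponential lifetime/censoring models are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
L = rng.uniform(0.0, 0.5, size=n)           # truncation times
Y = rng.exponential(1.0, size=n)            # lifetimes
C = rng.exponential(2.0, size=n)            # censoring times
Z = np.minimum(Y, C)
delta = (Y <= C).astype(int)
keep = L <= Z                                # only left-untruncated subjects
L, Z, delta = L[keep], Z[keep], delta[keep]

def Z_tilde(t):                              # risk-set size at t
    return np.sum((L <= t) & (t <= Z))

# Nelson-Aalen increments of Lambda_c occur at the censored observations.
cens_times = np.sort(Z[delta == 0])
Lambda_c = lambda tau: sum(1.0 / Z_tilde(t) for t in cens_times if t <= tau)
K_c_hat = lambda tau: np.prod([1.0 - 1.0 / Z_tilde(t)
                               for t in cens_times if t <= tau])

tau = 1.0
print(f"product-limit K_c({tau}) = {K_c_hat(tau):.4f}")
print(f"exp(-Lambda_c({tau}))    = {np.exp(-Lambda_c(tau)):.4f}")
```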
Let $a_K=\inf\{t: K(t)>0\}$ and $b_K=\sup\{t: K(t)<1\}$ denote the left and right endpoints of the support of a distribution function $K$. For LTRC data, as in [156], $F(\cdot)$ is identifiable if $a_G\le a_W$ and $b_G\le b_W$. By Corollary 2.2 of [156], for $b<b_W$, we readily infer that
$$
\sup_{a_W<\tau<b}\big|\widehat{K}_c(\tau)-K_c(\tau)\big|=O\Big(\sqrt{n^{-1}\log\log n}\Big),\quad \text{almost surely}.
$$
From the above, the estimator (57) can be rewritten directly as follows:
$$
\breve{\breve{r}}_n^{(k)*}(\psi,\theta,u,x,h_n)=\sum_{(i_1,\ldots,i_k)\in I_n^k}\frac{\Delta_{i_1}\cdots\Delta_{i_k}\,\epsilon_{i_1}\cdots\epsilon_{i_k}\,\psi(Z_{i_1},\ldots,Z_{i_k})}{\widehat{K}_c(Z_{i_1})\cdots\widehat{K}_c(Z_{i_k})}\,\bar{\omega}_{n,\mathbf{i}}^{(k)}(\theta,u,x).
$$
The last estimator is the conditional version of the one studied in [153]. Following the same reasoning as in Corollary 1, one can infer that, as $n\to\infty$,
$$
\sup_{\theta\in\Theta^m}\ \sup_{x\in S_c\subset\mathcal{H}^m}\ \sup_{u\in I_h}\Big|\breve{\breve{r}}_n^{(k)*}(\psi,\theta,u,x,h_n)-\mathbb{E}\big[\breve{\breve{r}}_n^{(k)*}(\psi,\theta,u,x,h_n)\big]\Big|=O_{\mathbb{P}_{\cdot\mid S}}\left(\sqrt{\frac{\log n}{n h^{md}\phi(h)}}+\frac{1}{A_n^{dp}\,\phi^{1/m}(h)}\right).
$$

7. The Bandwidth Selection Criterion

Various methodologies have been developed to derive asymptotically optimal bandwidth selection rules for nonparametric kernel estimators, particularly for the Nadaraya–Watson regression estimator. Prominent works in this field include contributions from [157,158,159]. The appropriate choice of this parameter is crucial, whether in the standard finite-dimensional case or within the infinite-dimensional framework, to ensure optimal practical performance. However, to the best of our knowledge, investigations addressing the selection of such a general functional conditional U-statistic are currently lacking. Nevertheless, an extension of the leave-one-out cross-validation procedure enables the definition of, for any fixed j = ( j 1 , , j m ) I n m ,
$$
\widehat{r}_{n,\mathbf{j}}^{(m)}(x,\theta,u;h_n)=\sum_{\mathbf{i}\in I_n^m(\mathbf{j})}\varphi\big(Y_{s_{i_1},A_n},\ldots,Y_{s_{i_m},A_n}\big)\,\widehat{W}_{n,\mathbf{i}}^{(m)}(x,\theta,u;h_n),
$$
where
$$
\widehat{W}_{n,\mathbf{i}}^{(m)}(x,\theta,u;h_n)=\frac{\displaystyle\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)}{\displaystyle\sum_{\mathbf{i}\in I_n^m}\prod_{j=1}^{m}\prod_{\ell=1}^{d}K_1\Big(\frac{u_{j,\ell}-s_{i_j,\ell}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)},
$$
and
$$
I_n^m(\mathbf{j}):=\big\{\mathbf{i}\in I_n^m\ \text{and}\ \mathbf{i}\neq\mathbf{j}\big\}=I_n^m\setminus\{\mathbf{j}\}.
$$
Equation (62) serves as the leave-out-$(\mathbf{X_j},\mathbf{Y_j})$ estimator for functional regression and can also be regarded as a predictor of $\varphi(Y_{s_{j_1},A_n},\ldots,Y_{s_{j_m},A_n}):=\varphi(\mathbf{Y_j})$. To minimize the quadratic loss function, we introduce the following criterion, where $\widetilde{W}(\cdot)$ is a non-negative weight function known in advance:
$$
CV(\varphi,h_n):=\frac{(n-m)!}{n!}\sum_{\mathbf{j}\in I_n^m}\Big[\varphi\big(\mathbf{Y}_{s_{\mathbf{j}},A_n}\big)-\widehat{r}_{n,\mathbf{j}}^{(m)}\big(\mathbf{X}_{s_{\mathbf{j}},A_n},\theta,u;h_n\big)\Big]^2\,\widetilde{W}\big(\mathbf{X}_{s_{\mathbf{j}},A_n}\big),
$$
where $\mathbf{X_j}:=\mathbf{X}_{s_{\mathbf{j}},A_n}=(X_{s_{j_1},A_n},\ldots,X_{s_{j_m},A_n})$. Following the ideas developed in [159], a natural way to choose the bandwidth is to minimize the preceding criterion: we choose $\widehat{h}_n\in[a_n,b_n]$ minimizing $CV(\varphi,h_n)$ over $h\in[a_n,b_n]$.
Consistent with [160], where the bandwidths are locally determined using a data-driven approach involving the minimization of a functional version of a cross-validated criterion, we can substitute (63) with
$$
CV(\varphi,h_n):=\frac{(n-m)!}{n!}\sum_{\mathbf{j}\in I_n^m}\Big[\varphi\big(\mathbf{Y_j}\big)-\widehat{r}_{n,\mathbf{j}}^{(m)}\big(\mathbf{X_j},\theta,u;h_n\big)\Big]^2\,\widehat{W}\big(\mathbf{X_j},x,\theta\big),
$$
where
$$
\widehat{W}(s,t,\theta):=\prod_{i=1}^{m}\widehat{W}(s_i,t_i,\theta_i).
$$
In practice, one takes, for $\mathbf{i}\in I_n^m$, the uniform global weights $\widetilde{W}(\mathbf{X_i})=1$, and the local weights
$$
\widehat{W}(\mathbf{X_i},t,\theta)=\begin{cases}1 & \text{if } d_{\mathcal{H}^m,\theta}(\mathbf{X_i},t)\le h,\\ 0 & \text{otherwise}.\end{cases}
$$
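For the simplest case $m=1$ with a scalar covariate and uniform global weights, the leave-one-out criterion can be minimized over a bandwidth grid as in the following sketch; the Gaussian kernel, the regression model, and the grid are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
X = rng.uniform(0, 1, size=n)
Y = np.sin(2 * np.pi * X) + 0.3 * rng.normal(size=n)

def loo_cv(h):
    """Leave-one-out CV score for a Nadaraya-Watson fit at bandwidth h."""
    score = 0.0
    for j in range(n):
        w = np.exp(-0.5 * ((X - X[j]) / h) ** 2)   # Gaussian kernel weights
        w[j] = 0.0                                 # leave the j-th pair out
        score += (Y[j] - np.dot(w, Y) / np.sum(w)) ** 2
    return score / n

grid = np.linspace(0.01, 0.3, 30)                  # illustrative [a_n, b_n]
scores = [loo_cv(h) for h in grid]
h_hat = grid[int(np.argmin(scores))]
print(f"cross-validated bandwidth h_hat = {h_hat:.3f}")
```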
To be concise, we have focused on the widely adopted approach, namely, the cross-validated selected bandwidth. However, this can be extended to any alternative bandwidth selector, such as the one rooted in Bayesian principles [161].
Remark 10.
It is worth noting that an alternative bandwidth criterion, proposed by [1], is the rule of thumb. Strictly speaking, since the cross-validated bandwidth is stochastic, justifying its asymptotic theory requires a specific stochastic equicontinuity argument. Cross-validation was applied in [162] to assess the equality of unconditional and conditional functions in scenarios involving mixed categorical and continuous data. However, while optimal for estimation, this approach loses optimality when extended to nonparametric kernel testing. For testing a parametric model for the conditional mean function against a nonparametric alternative, Horowitz and Spokoiny [163] introduced an adaptive-rate-optimal rule. Another bandwidth selection method is presented in [164]: based on an Edgeworth expansion of the asymptotic distribution of the test, the authors propose selecting the bandwidth so as to maximize the power function of the test while controlling its size. Future investigations will delve into these three approaches.

8. Concluding Remarks

In this paper, we explored kernel-type estimation for single-index conditional U-statistics, encompassing the single-index Nadaraya–Watson estimator, within a functional framework involving random fields. Our results hinge on assumptions necessitating some regularity in conditional U-statistics and conditional moments, specific decay rates for the probability of variables within shrinking open balls, and appropriate decreasing rates for mixing coefficients. Notably, the conditional moment assumption facilitates the consideration of unbounded function classes. The proof of weak convergence adheres to a typical approach: finite-dimensional convergence and equicontinuity of conditional U-processes. We underscore the significance of adopting absolutely regular conditions or β -mixing conditions, which are independent of the entropy dimension of the class, a departure from other mixing conditions. In contrast to strong mixing, β -mixing offers greater flexibility, allowing decoupling and accommodation of diverse examples. Ref. [94] provided a comprehensive characterization of stationary Gaussian processes satisfying the β -mixing condition. Additionally, β -mixing aligns with the L 2 ( P ) -norm, playing a crucial role. Unlike strong mixing, which demands a polynomial rate of decay for strong mixing coefficients contingent on the entropy dimension of the function class, β -mixing involves the L 1 -norm and the metric entropy function.
Both the uniform rate of convergence and the weak convergence results are underpinned by a general Blocking technique tailored for irregularly spaced sampling sites, where careful attention is paid to the impact of non-equidistant sampling sites. We intricately streamline the process to the independent setting to address this concern. In particular, for spatial points, where introducing an order is not as straightforward as in time series, Lemma 5 (in the spirit of Corollary 2.7 in [165]) constructs exactly independent blocks of observations, allowing results for independent data to be applied directly to these blocks. Yu [165] noted that the uniform convergence result requires the $\beta$-mixing condition to link the original sequence with the sequence of independent blocks. This connection persists under $\phi$-mixing conditions but is not necessary under $\alpha$-mixing conditions. Thus, we opt for the $\beta$-mixing sequence, aiming to derive weak convergence for processes indexed by classes of functions.
In his work [73], Kurisu presented a potential extension of the sampling region inspired by [89]. This extension can be elucidated as follows. We can generalize the definition of the sampling region $R_n$ to include nonstandard forms using the sampling-region concept from [89]. First, define $R_0^{*}$ as an open connected subset of $(-2,2]^d$ containing $[-1,1]^d$, and let $R_0$ be a Borel set such that $R_0^{*}\subset R_0\subset \bar{R}_0^{*}$, where $\bar{S}$ signifies the closure of a set $S\subset\mathbb{R}^d$. Let $\{A_n\}_{n\ge1}$ be a sequence of positive numbers with $A_n\to\infty$ as $n\to\infty$, and define $R_n=A_n R_0$ as the sampling region. Moreover, for any sequence of positive numbers $\{a_n\}_{n\ge1}$ with $a_n\to 0$ as $n\to\infty$, require that the number of cubes of the form $a_n(\mathbf{i}+[0,1)^d)$, $\mathbf{i}\in\mathbb{Z}^d$, with their lower-left corner $a_n\mathbf{i}$ on the lattice $a_n\mathbb{Z}^d$, that intersect both $R_0$ and $R_0^c$ is $O(a_n^{-d+1})$ as $n\to\infty$ (see Condition B in [89], Chapter 12, Section 12.2). This condition prevents pathological situations and must be assumed on the region $R_n$. It is satisfied by the majority of regions of practical significance; in the plane ($d=2$), for instance, it holds if the boundary $\partial R_0$ of $R_0$ is delineated by a simple rectifiable curve of finite length, or when the sampling sites are defined on the integer grid $\mathbb{Z}^d$. Additionally, let $f(\cdot)$ be a continuous, everywhere-positive probability density function on $R_0$, and let $\{S_{0,i}\}_{i\ge1}$ be a sequence of i.i.d. random vectors with density $f(\cdot)$. Assume that $\{S_{0,i}\}_{i\ge1}$ and $\{X_{s,A_n}\}$ are independent. When replacing our setting in Section 2.4 with this new one, our results remain valid, and it is possible to demonstrate uniform convergence and weak convergence under the same assumptions with identical proofs. For future investigations, it would be interesting to relax the mixing conditions to weak dependence (or to the ergodic framework [38,166,167]). This generalization is nontrivial owing to the need for maximal moment inequalities in our asymptotic results, which are not available in that setting. Another interesting direction is to consider incomplete data (missing at random, or censored under different schemes) for locally stationary spatial–functional data. A natural question is how to adapt our results to wavelet-based estimators [38,166], delta-sequence estimators [168], kNN estimators [37], and local linear estimators [169]. The bootstrap method is frequently employed in statistical problems to approximate the limiting distribution. It has proven to be a valuable tool in addressing common practical issues, including the determination of confidence intervals and critical values for composite hypotheses, as mentioned in [170,171]. Investigating the bootstrap in the context of the present paper would be of interest. In forthcoming investigations, we believe it is necessary to focus on constructing locally stationary functional random fields. A significant effort should be dedicated to generating tangible examples, as in Section 5.5, that can be demonstrated using finite samples derived from either simulated or real data. Infinite-degree U-statistics (or infinite-order U-statistics) are a useful tool for constructing simultaneous prediction intervals that quantify the uncertainty of several methods, such as sub-bagging and random forests. In a future investigation, we will consider the uniform limit theory for U-statistics of increasing degree, also called infinite-degree U-statistics, based on locally stationary functional random fields.

9. Mathematical Developments

The proofs of our results are detailed in this section. We will continue to use the notation introduced earlier. The proof techniques extend those of [13] to the single-index setting. In addition, some intricate steps of [77] are used in our proofs, as in [49,78]. To avoid repeating the Blocking technique and the associated notation, the following subsection introduces all the notation required for this decomposition.
A.1.
Preliminaries
This approach necessitates an extension of Bernstein's Blocking technique to spatial processes, as discussed in [73]. To facilitate our discussion, we introduce some notation pertinent to this technique. Recall that $A_{1,n}$ and $A_{2,n}$ denote sequences of positive numbers such that
$$
A_{1,n}/A_n + A_{2,n}/A_{1,n} \to 0 \quad \text{as } n\to\infty.
$$
Let
A 3 , n = A 1 , n + A 2 , n .
We consider a partition of $\mathbb{R}^d$ by hypercubes of the form $\Gamma_n(\boldsymbol{\ell};0)=A_{3,n}\big(\boldsymbol{\ell}+(0,1]^d\big)$, $\boldsymbol{\ell}=(\ell_1,\ldots,\ell_d)\in\mathbb{Z}^d$, and divide each $\Gamma_n(\boldsymbol{\ell};0)$ into $2^d$ hypercubes as follows:
$$
\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})=\prod_{j=1}^{d}I_j(\epsilon_j),\quad \boldsymbol{\epsilon}=(\epsilon_1,\ldots,\epsilon_d)\in\{1,2\}^d,
$$
where, for $j=1,\ldots,d$,
$$
I_j(\epsilon_j)=\begin{cases}\big(\ell_j A_{3,n},\ \ell_j A_{3,n}+A_{1,n}\big] & \text{if } \epsilon_j=1,\\[2pt] \big(\ell_j A_{3,n}+A_{1,n},\ (\ell_j+1)A_{3,n}\big] & \text{if } \epsilon_j=2.\end{cases}
$$
We note that
$$
\big|\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\big|=A_{1,n}^{q(\boldsymbol{\epsilon})}\,A_{2,n}^{d-q(\boldsymbol{\epsilon})}
$$
for any Z d and ϵ { 1 , 2 } d , where
$$
q(\boldsymbol{\epsilon})=\#\big\{1\le j\le d:\ \epsilon_j=1\big\}.
$$
Let $\boldsymbol{\epsilon}_0=(1,\ldots,1)$. The blocks $\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)$ correspond to "large blocks", and the blocks $\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})$ for $\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0$ correspond to "small blocks". Let
$$
L_{1,n}=\big\{\boldsymbol{\ell}\in\mathbb{Z}^d:\ \Gamma_n(\boldsymbol{\ell},0)\subset R_n\big\}
$$
be the index set of all hypercubes Γ n ( , 0 ) that are contained in R n , and let
$$
L_{2,n}=\big\{\boldsymbol{\ell}\in\mathbb{Z}^d:\ \Gamma_n(\boldsymbol{\ell},0)\cap R_n\neq\emptyset,\ \Gamma_n(\boldsymbol{\ell},0)\cap R_n^c\neq\emptyset\big\}
$$
denote the index set of the boundary hypercubes. Define $L_n=L_{1,n}\cup L_{2,n}$.
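To fix ideas, the following sketch enumerates the index sets $L_{1,n}$ and $L_{2,n}$ and extracts the large/small sub-blocks for $d=2$ on the rectangular region $R_n=[0,A_n]^2$; the numerical values of $A_n$, $A_{1,n}$, $A_{2,n}$ and the rectangular region are illustrative assumptions.

```python
import itertools
import numpy as np

# Illustrative sizes for d = 2; sampling region R_n = [0, A_n]^2.
d, A_n, A1, A2 = 2, 100.0, 10.0, 2.0
A3 = A1 + A2

n_cells = int(np.ceil(A_n / A3))
L1, L2 = [], []                              # interior / boundary hypercubes
for ell in itertools.product(range(-1, n_cells + 1), repeat=d):
    lo = A3 * np.asarray(ell)                # Gamma_n(ell; 0) = A3*(ell+(0,1]^d)
    hi = lo + A3
    inside = np.all(lo >= 0) and np.all(hi <= A_n)
    overlaps = np.all(hi > 0) and np.all(lo < A_n)
    if inside:
        L1.append(ell)
    elif overlaps:
        L2.append(ell)

# Within each hypercube, epsilon in {1, 2}^d selects the sub-block; the
# 'large block' is epsilon_0 = (1, ..., 1), of side A1 in every coordinate.
def sub_block(ell, eps):
    lo = np.array([l * A3 + (A1 if e == 2 else 0.0) for l, e in zip(ell, eps)])
    side = np.array([A1 if e == 1 else A2 for e in eps])
    return lo, lo + side

print(f"|L_1n| = {len(L1)} interior, |L_2n| = {len(L2)} boundary hypercubes")
lo0, hi0 = sub_block(L1[0], (1, 1))          # large block of the first cell
print(f"first large block: {lo0} to {hi0}")
```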
Proof of Proposition 1.
As mentioned earlier, our statistic is a weighted U-statistic that can be expressed as a sum of U-statistics through the Hoeffding decomposition. To achieve the desired result, let us delve into the details of this decomposition in Section 3. In that particular section, it has been demonstrated that
$$
\widehat{\psi}(x,\theta,u)-\mathbb{E}\big[\widehat{\psi}(x,\theta,u)\big]=\widehat{\psi}_1(x,\theta,u)+\widehat{\psi}_2(x,\theta,u),
$$
where the linear term $\widehat{\psi}_1(x,\theta,u)$ and the remainder term $\widehat{\psi}_2(x,\theta,u)$ are defined in (27) and (29), respectively. Our goal is to prove that the linear term yields the rate of convergence of this statistic, while the remainder term converges to zero almost surely as $n\to\infty$. Let us focus first on the linear term of the decomposition. Consider $B=[0,1]^d$, $\alpha_n=\sqrt{\log n/(n h^{md}\phi(h))}$, and $\tau_n=\rho_n n^{1/\zeta}$, where $\zeta$ is the positive constant given in Assumption 6, part (i), and $\rho_n=(\log n)^{\zeta_0}$ for some $\zeta_0>0$. Define
$$
\widetilde{H}_1^{(\ell)}(z):=\widetilde{H}^{(\ell)}(z)\,\mathbb{1}\big\{|W_{s_i,A_n}|\le \tau_n\big\},
$$
$$
\widetilde{H}_2^{(\ell)}(z):=\widetilde{H}^{(\ell)}(z)\,\mathbb{1}\big\{|W_{s_i,A_n}|>\tau_n\big\},
$$
and
$$
\widehat{\psi}_1^{(1)}(x,\theta,u)=\frac{1}{n}\sum_{i=1}^{n}\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\,\widetilde{H}_1^{(\ell)}(z),
$$
$$
\widehat{\psi}_1^{(2)}(x,\theta,u)=\frac{1}{n}\sum_{i=1}^{n}\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\,\widetilde{H}_2^{(\ell)}(z).
$$
Evidently, we observe that
$$
\widehat{\psi}_1(x,\theta,u)-\mathbb{E}\big[\widehat{\psi}_1(x,\theta,u)\big]=\Big(\widehat{\psi}_1^{(1)}(x,\theta,u)-\mathbb{E}\big[\widehat{\psi}_1^{(1)}(x,\theta,u)\big]\Big)+\Big(\widehat{\psi}_1^{(2)}(x,\theta,u)-\mathbb{E}\big[\widehat{\psi}_1^{(2)}(x,\theta,u)\big]\Big).
$$
To start, it is evident that
$$
\mathbb{P}_{\cdot\mid S}\Big(\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\big|\widehat{\psi}_1^{(2)}(x,\theta,u)\big|>\alpha_n\Big)
=\mathbb{P}_{\cdot\mid S}\Big(\Big\{\sup\big|\widehat{\psi}_1^{(2)}\big|>\alpha_n\Big\}\cap\Big\{\max_{1\le i\le n}\big|W_{s_i,A_n}\big|>\tau_n\Big\}\Big)+\mathbb{P}_{\cdot\mid S}\Big(\Big\{\sup\big|\widehat{\psi}_1^{(2)}\big|>\alpha_n\Big\}\cap\Big\{\max_{1\le i\le n}\big|W_{s_i,A_n}\big|>\tau_n\Big\}^{c}\Big)
$$
$$
\le \mathbb{P}_{\cdot\mid S}\Big(\big|W_{s_i,A_n}\big|>\tau_n\ \text{for some } i=1,\ldots,n\Big)+\mathbb{P}_{\cdot\mid S}(\emptyset)
\le \tau_n^{-\zeta}\sum_{i=1}^{n}\mathbb{E}_{\cdot\mid S}\big|W_{s_i,A_n}\big|^{\zeta}\le n\,\tau_n^{-\zeta}=\rho_n^{-\zeta}.
$$
We infer that
$$
\mathbb{E}_{\cdot\mid S}\big|\widehat{\psi}_1^{(2)}(x,\theta,u)\big|\le \frac{1}{n}\sum_{i=1}^{n}\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\,\mathbb{E}_{\cdot\mid S}\big|\widetilde{H}_2^{(\ell)}(z)\big|,
$$
where
$$
\mathbb{E}_{\cdot\mid S}\big|\widetilde{H}_2^{(\ell)}(z)\big|
=\mathbb{E}_{\cdot\mid S}\bigg[\Big|\frac{1}{\phi(h)}K_2\Big(\frac{d_{\theta_i}(x_i,X_{s_i,A_n})}{h}\Big)W_{s_i,A_n}\int W_{s(1,\ldots,\ell-1,\ell,\ldots,m-1),A_n}\prod_{\substack{j=1\\ j\neq i}}^{m-1}\frac{1}{\phi(h)}K_2\Big(\frac{d_{\theta_j}(x_j,\nu_{s_j,A_n})}{h}\Big)\,\mathbb{P}(d\nu_1,\ldots,d\nu_{\ell-1},d\nu_{\ell},\ldots,d\nu_{m-1})\Big|\,\mathbb{1}\big\{|W_{s_i,A_n}|>\tau_n\big\}\bigg]
$$
$$
\lesssim \frac{1}{\phi(h)}\,\mathbb{E}_{\cdot\mid S}\bigg[\Big(\Big|K_2\Big(\frac{d_{\theta_i}(x_i,X_{s_i,A_n})}{h}\Big)-K_2\Big(\frac{d_{\theta_i}(x_i,X_{u_i}(s_i))}{h}\Big)\Big|+K_2\Big(\frac{d_{\theta_i}(x_i,X_{u_i}(s_i))}{h}\Big)\Big)\big|W_{s_i,A_n}\big|\,\mathbb{1}\big\{|W_{s_i,A_n}|>\tau_n\big\}\bigg]
$$
$$
\le \frac{\tau_n^{-(\zeta-1)}}{\phi(h)}\,\mathbb{E}_{\cdot\mid S}\Big[h^{-1}\big|d_{\theta_i}(x_i,X_{s_i,A_n})-d_{\theta_i}(x_i,X_{u_i}(s_i))\big|\,\big|W_{s_i,A_n}\big|^{\zeta}\Big]+\frac{\tau_n^{-(\zeta-1)}}{\phi(h)}\,\mathbb{E}_{\cdot\mid S}\Big[K_2\Big(\frac{d_{\theta_i}(x_i,X_{u_i}(s_i))}{h}\Big)\big|W_{s_i,A_n}\big|^{\zeta}\Big]
$$
$$
\lesssim \frac{\tau_n^{-(\zeta-1)}}{\phi(h)}\Big(\frac{1}{nh}+\phi(h)\Big)
\lesssim \frac{\tau_n^{-(\zeta-1)}}{nh\,\phi(h)}+\tau_n^{-(\zeta-1)}.
$$
Hence, we have
$$
\mathbb{E}_{\cdot\mid S}\big|\widehat{\psi}_1^{(2)}(x,\theta,u)\big|\lesssim \frac{1}{n}\sum_{i=1}^{n}\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\,\tau_n^{-(\zeta-1)}
\lesssim \tau_n^{-(\zeta-1)}\,\frac{1}{n^m}\sum_{\mathbf{i}\in I_n^m}\prod_{j=1}^{m}\frac{1}{h^d}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)
$$
$$
=C\,\tau_n^{-(\zeta-1)}\Big(f_S(u)+O\Big(\sqrt{\frac{\log n}{n h^{md}}}+h^2\Big)\Big)\quad (\text{using Lemma 2})
\le C\,\tau_n^{-(\zeta-1)}=C\,\rho_n^{-(\zeta-1)}\,n^{-(\zeta-1)/\zeta}\le C\,\alpha_n,\quad \mathbb{P}_S\text{-a.s.}
$$
Consequently, we derive that
$$
\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\Big|\widehat{\psi}_1^{(2)}(x,\theta,u)-\mathbb{E}_{\cdot\mid S}\big[\widehat{\psi}_1^{(2)}(x,\theta,u)\big]\Big|=O_{\mathbb{P}_{\cdot\mid S}}(\alpha_n).
$$
Next, let us address
$$
\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\Big|\widehat{\psi}_1^{(1)}(x,\theta,u)-\mathbb{E}\big[\widehat{\psi}_1^{(1)}(x,\theta,u)\big]\Big|.
$$
Remember the large blocks and small blocks as well as the notation provided in Section 9, and define
$$
S_{s,A_n}(x,\theta,u):=\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_i\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\,\widetilde{H}_1^{(\ell)}(z),\qquad
S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})=\sum_{i:\,s_i\in\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\cap R_n}S_{s,A_n}(x,\theta,u)=\Big(S_n^{(1)}(\boldsymbol{\ell};\boldsymbol{\epsilon}),\ldots,S_n^{(p)}(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big).
$$
Then we have, for $m=1$,
$$
S_n=\big(S_n^{(1)},\ldots,S_n^{(m)}\big)=\sum_{i=1}^{n}S_{s,A_n}(x,\theta,u)=\sum_{\boldsymbol{\ell}\in L_n}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\underbrace{\sum_{\boldsymbol{\ell}\in L_{1,n}}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})}_{=:S_{2,n}(\boldsymbol{\epsilon})}+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\underbrace{\sum_{\boldsymbol{\ell}\in L_{2,n}}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})}_{=:S_{3,n}(\boldsymbol{\epsilon})}
=:S_{1,n}+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}S_{2,n}(\boldsymbol{\epsilon})+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}S_{3,n}(\boldsymbol{\epsilon}).
$$
Here, we give the notation only for $m=1$, which will be used in treating the linear term of the Hoeffding decomposition. The case $m=2$ will be stated in the proof of Lemma 1.
  • To attain our result, we will proceed through the following two steps.
  • Step 1 (Reduction to independence). Recall
$$
S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})=\sum_{i:\,s_i\in\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\cap R_n}S_{s,A_n}(x,\theta,u).
$$
For each $\boldsymbol{\epsilon}\in\{1,2\}^d$, let $\{\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}):\boldsymbol{\ell}\in L_n\}$ be a sequence of independent random variables in $\mathbb{R}$ under $\mathbb{P}_{\cdot\mid S}$ such that
$$
\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\overset{d}{=}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}),\quad \text{under } \mathbb{P}_{\cdot\mid S},\ \boldsymbol{\ell}\in L_n.
$$
    Define
$$
\breve{S}_{1,n}=\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)=\big(\breve{S}_{1,n}^{(1)},\ldots,\breve{S}_{1,n}^{(m)}\big)
$$
and, for $\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0$, define
$$
\breve{S}_{2,n}(\boldsymbol{\epsilon})=\sum_{\boldsymbol{\ell}\in L_{1,n}}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})
$$
and
$$
\breve{S}_{3,n}(\boldsymbol{\epsilon})=\sum_{\boldsymbol{\ell}\in L_{2,n}}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}).
$$
    We begin by verifying the following results:
$$
\sup_{t>0}\Big|\mathbb{P}_{\cdot\mid S}\big(\|S_{1,n}\|>t\big)-\mathbb{P}_{\cdot\mid S}\big(\|\breve{S}_{1,n}\|>t\big)\Big|\le C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^d\big),
$$
$$
\sup_{t>0}\Big|\mathbb{P}_{\cdot\mid S}\big(\|S_{2,n}(\boldsymbol{\epsilon})\|>t\big)-\mathbb{P}_{\cdot\mid S}\big(\|\breve{S}_{2,n}(\boldsymbol{\epsilon})\|>t\big)\Big|\le C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^d\big),
$$
$$
\sup_{t>0}\Big|\mathbb{P}_{\cdot\mid S}\big(\|S_{3,n}(\boldsymbol{\epsilon})\|>t\big)-\mathbb{P}_{\cdot\mid S}\big(\|\breve{S}_{3,n}(\boldsymbol{\epsilon})\|>t\big)\Big|\le C\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^d\big).
$$
    Keep in mind that
$$
|L_n|=O\big((A_n/A_{3,n})^d\big)=O\big((A_n/A_{1,n})^d\big).
$$
For $\boldsymbol{\epsilon}\in\{1,2\}^d$ and $\boldsymbol{\ell}_1,\boldsymbol{\ell}_2\in L_n$ with $\boldsymbol{\ell}_1\neq\boldsymbol{\ell}_2$, let
$$
J_1(\boldsymbol{\epsilon})=\big\{1\le i_1\le n:\ s_{i_1}\in\Gamma_n(\boldsymbol{\ell}_1,\boldsymbol{\epsilon})\big\},\qquad J_2(\boldsymbol{\epsilon})=\big\{1\le i_2\le n:\ s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_2;\boldsymbol{\epsilon})\big\}.
$$
For any $s_{i_k}=(s_{1,i_k},\ldots,s_{d,i_k})$, $k=1,2$, such that $i_1\in J_1(\boldsymbol{\epsilon})$ and $i_2\in J_2(\boldsymbol{\epsilon})$, we obtain $\max_{1\le u\le d}|s_{u,i_1}-s_{u,i_2}|\ge A_{2,n}$ from the definition of $\Gamma_n(\boldsymbol{\ell};\boldsymbol{\epsilon})$. This gives
$$
\|s_{i_1}-s_{i_2}\|_{\infty}\ge A_{2,n}.
$$
For any $\boldsymbol{\epsilon}\in\{1,2\}^d$, let $S_n(1;\boldsymbol{\epsilon}),\ldots,S_n(|L_n|;\boldsymbol{\epsilon})$ be an arrangement of $\{S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}):\boldsymbol{\ell}\in L_n\}$. Let $\mathbb{P}_{\cdot\mid S}^{(a)}$ be the marginal distribution of $S_n(a;\boldsymbol{\epsilon})$, and let $\mathbb{P}_{\cdot\mid S}^{(a:b)}$ be the joint distribution of
$$
\big\{S_n(k;\boldsymbol{\epsilon}):\ a\le k\le b\big\}.
$$
The $\beta$-mixing property of $X$ gives, for $1\le k\le |L_n|-1$,
$$
\Big\|\mathbb{P}_{\cdot\mid S}-\mathbb{P}_{\cdot\mid S}^{(1:k)}\times\mathbb{P}_{\cdot\mid S}^{(k+1:|L_n|)}\Big\|_{\mathrm{TV}}\le \beta\big(A_{2,n};A_n^d\big),
$$
where $\|\cdot\|_{\mathrm{TV}}$ denotes the total variation norm. The inequality is independent of the arrangement of $\{S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}):\boldsymbol{\ell}\in L_n\}$. Thus, the assumption in (125) of Lemma 5 is satisfied for $\{S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}):\boldsymbol{\ell}\in L_n\}$ with $\tau=\beta(A_{2,n};A_n^d)$ and $m=(A_n/A_{1,n})^d$. By combining the boundary condition on $R_n$ with Lemma 5, we obtain (75)–(77).
Remark 11.
Since
$$
\#\big\{\boldsymbol{\epsilon}\in\{1,2\}^d:\ \boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0\big\}=2^d-1,\qquad |L_{1,n}|\lesssim (A_n/A_{3,n})^d\lesssim (A_n/A_{1,n})^d,
$$
and
$$
|L_{2,n}|\lesssim (A_n/A_{3,n})^{d-1}\lesssim (A_n/A_{1,n})^{d-1}\lesssim |L_{1,n}|,
$$
for sufficiently large $n$, Lemma 6 and Equation (67) imply that the individual terms in the sums $S_{2,n}$ and $S_{3,n}$ are at most
$$
O\Big(A_{1,n}^{d-1}A_{2,n}\,\frac{n}{A_n^{d}}\,\big(A_n/A_{1,n}\big)^{d}\Big)=O\Big(\frac{A_{2,n}}{A_{1,n}}\,n\Big)
$$
and
$$
O\Big(A_{1,n}^{d-1}A_{2,n}\,\frac{n}{A_n^{d}}\,\big(A_n/A_{1,n}\big)^{d-1}\Big)=O\Big(\frac{A_{2,n}}{A_n}\,n\Big),
$$
respectively.
  • Step 2: Keep in mind our intention to address
$$
\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\Big|\widehat{\psi}_1^{(1)}(x,\theta,u)-\mathbb{E}_{\cdot\mid S}\big[\widehat{\psi}_1^{(1)}(x,\theta,u)\big]\Big|.
$$
To achieve the aimed result, we will cover the region $B^m=[0,1]^{dm}$ by
$$
\bigcup_{k_1,\ldots,k_m=1}^{N(u)}\ \prod_{j=1}^{m}B(u_{k_j},r),
$$
for some radius $r$. Hence, for each $u=(u_1,\ldots,u_m)\in[0,1]^{dm}$, there exists $l(u)=(l(u_1),\ldots,l(u_m))$, where, for $1\le i\le m$, $1\le l(u_i)\le N(u)$, such that
$$
u\in\prod_{i=1}^{m}B\big(u_{l(u_i)},r\big)\quad \text{and}\quad \big|u_i-u_{l(u_i)}\big|\le r,\quad \text{for } 1\le i\le m;
$$
then, for each $u\in[0,1]^{dm}$, the closest center is $u_{l(u)}$, and the ball with the closest center is defined by
$$
B\big(u,l(u),r\big):=\prod_{j=1}^{m}B\big(u_{l(u_j)},r\big).
$$
In the same way, $\Theta^m\times\mathcal{H}^m$ is covered by
$$
\bigcup_{\tilde{k}_1,\ldots,\tilde{k}_m=1}^{N(\theta)}\ \bigcup_{k_1,\ldots,k_m=1}^{N(x)}\ \prod_{j=1}^{m}B_{\theta_{\tilde{k}_j}}\big(x_{k_j},r\big),
$$
for some radius $r$. Hence, for each $x=(x_1,\ldots,x_m)\in\mathcal{H}^m$, there exists $l(x)=(l(x_1),\ldots,l(x_m))$, where, for $1\le i\le m$, $1\le l(x_i)\le N(x)$, such that
$$
x\in\prod_{i=1}^{m}B_{\theta_i}\big(x_{l(x_i)},r\big)\quad \text{and}\quad d_{\theta_i}\big(x_i,x_{l(x_i)}\big)\le r,\quad \text{for } 1\le i\le m;
$$
then, for each $x\in\mathcal{H}^m$, the closest center is $x_{l(x)}$, and the ball with the closest center is defined by
$$
B_{\theta}\big(x,l(x),r\big):=\prod_{i=1}^{m}B_{\theta_i}\big(x_{l(x_i)},r\big).
$$
    We define
$$
K^{*}(\omega,v):=C_0\prod_{j=1}^{m}\prod_{\ell=1}^{d}\mathbb{1}\big(|\omega_{j,\ell}|\le 2C_1\big)\prod_{j=1}^{m}K_2(v_j)\quad \text{for all } (\omega,v).
$$
We can show that, for $(u,x)\in B_{j,n}$ and large enough $n$,
$$
\Big|\bar{K}\Big(\frac{u-s/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_i}(x_i,X_{s_i,A_n})}{h}\Big)-\bar{K}\Big(\frac{u_n-s/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_i}(x_{n,i},X_{s_i,A_n})}{h}\Big)\Big|\le \alpha_n\,K^{*}\Big(u_n-s/A_n,\ \frac{d_{\theta_i}(x_{n,i},X_{s_i,A_n})}{h_n}\Big).
$$
    Then, we have
$$
\widehat{\psi}_1^{(1)}(x,\theta,u)=\frac{1}{n}\sum_{i=1}^{n}\xi_i\,\frac{1}{\phi^{1/m}(h)}K_2\Big(\frac{d_{\theta_i}(x_i,X_{s_i,A_n})}{h}\Big)W_{s_i,A_n}\,\mathbb{1}\big\{|W_{s_i,A_n}|\le\tau_n\big\}
\times\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\int W_{s(1,\ldots,\ell-1,\ell,\ldots,m-1),A_n}\prod_{\substack{j=1\\ j\neq i}}^{m-1}\frac{1}{\phi^{1/m}(h)}K_2\Big(\frac{d_{\theta_j}(x_j,\nu_{s_j,A_n})}{h}\Big)\,\mathbb{P}(d\nu_1,\ldots,d\nu_{\ell-1},d\nu_{\ell},\ldots,d\nu_{m-1}).
$$
    Let us define
$$
\bar{\psi}_1^{(1)}(x,\theta,u)=\frac{1}{n h^{d}\phi^{1/m}(h)}\sum_{i=1}^{n}K^{*}\Big(u_n-s_i/A_n,\ \frac{d_{\theta_i}(x_{n,i},X_{s_i,A_n})}{h_n}\Big)W_{s_i,A_n}\,\mathbb{1}\big\{|W_{s_i,A_n}|\le\tau_n\big\}
\times\frac{(n-m)!}{(n-1)!}\sum_{I_{n-1}^{m-1}(i)}\sum_{\ell=1}^{m}\xi_{i_1}\cdots\xi_{i_{\ell-1}}\,\xi_{i_\ell}\cdots\xi_{i_{m-1}}\int W_{s(1,\ldots,\ell-1,\ell,\ldots,m-1),A_n}\prod_{\substack{j=1\\ j\neq i}}^{m-1}\frac{1}{\phi^{1/m}(h)}K_2\Big(\frac{d_{\theta_j}(x_j,\nu_{s_j,A_n})}{h}\Big)\,\mathbb{P}(d\nu_1,\ldots,d\nu_{\ell-1},d\nu_{\ell},\ldots,d\nu_{m-1})
=:\frac{1}{n h^{d}\phi^{1/m}(h)}\sum_{i=1}^{n}S_{s,A_n}(x,\theta,u).
$$
    We note that
$$
\mathbb{E}_{\cdot\mid S}\big[\bar{\psi}_1^{(1)}(x,\theta,u)\big]\le M<\infty,
$$
for some sufficiently large $M$. Let $N=N_{\mathscr{F}^m\mathscr{K}^m}\,N(\theta)^m\,N(x)^m\,N(u)$ denote the covering number associated with the class of functions $\mathscr{F}^m\mathscr{K}^m$, the balls covering $[0,1]^m$, the balls covering $\mathcal{H}^m$, and the balls covering $\Theta^m$. Consequently, we obtain
$$
\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\Big|\widehat{\psi}_1^{(1)}(x,\theta,u)-\mathbb{E}_{\cdot\mid S}\big[\widehat{\psi}_1^{(1)}(x,\theta,u)\big]\Big|
\le N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\Big|\widehat{\psi}_1^{(1)}(x,\theta,u_n)-\mathbb{E}_{\cdot\mid S}\big[\widehat{\psi}_1^{(1)}(x,\theta,u_n)\big]\Big|
+N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\alpha_n\Big(\big|\bar{\psi}_1^{(1)}(x,\theta,u_n)\big|+\mathbb{E}_{\cdot\mid S}\big[\bar{\psi}_1^{(1)}(x,\theta,u_n)\big]\Big)
$$
$$
\le N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\Big|\widehat{\psi}_1^{(1)}(x,\theta,u_n)-\mathbb{E}_{\cdot\mid S}\big[\widehat{\psi}_1^{(1)}(x,\theta,u_n)\big]\Big|
+N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\Big|\bar{\psi}_1^{(1)}(x,\theta,u_n)-\mathbb{E}_{\cdot\mid S}\big[\bar{\psi}_1^{(1)}(x,\theta,u_n)\big]\Big|+2MF(y)\,\alpha_n
$$
$$
\le N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\bigg(\Big|\sum_{\boldsymbol{\ell}\in L_n}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)\Big|+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\Big|\sum_{\boldsymbol{\ell}\in L_{1,n}}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\Big|\sum_{\boldsymbol{\ell}\in L_{2,n}}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|\bigg)
$$
$$
\quad+N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\bigg(\Big|\sum_{\boldsymbol{\ell}\in L_n}\bar{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)\Big|+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\Big|\sum_{\boldsymbol{\ell}\in L_{1,n}}\bar{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|+\sum_{\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}\Big|\sum_{\boldsymbol{\ell}\in L_{2,n}}\bar{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|\bigg)+2MF(y)\,\alpha_n,
$$
where $\bar{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})$ denotes the block sums built from $\bar{\psi}_1^{(1)}$.
Furthermore, for each $\boldsymbol{\epsilon}\in\{1,2\}^d$, let $\{\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}):\boldsymbol{\ell}\in L_n\}$ represent a sequence of independent random vectors in $\mathbb{R}^m$ under $\mathbb{P}_{\cdot\mid S}$ such that
$$
\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\overset{d}{=}S_n(\boldsymbol{\ell};\boldsymbol{\epsilon}),\quad \text{under } \mathbb{P}_{\cdot\mid S},\ \boldsymbol{\ell}\in L_n.
$$
We show that
$$
\mathbb{P}_{\cdot\mid S}\Big(\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\big|\widehat{\psi}_1^{(1)}(x,\theta,u)-\mathbb{E}_{\cdot\mid S}[\widehat{\psi}_1^{(1)}(x,\theta,u)]\big|>2^{md+1}M a_n\Big)
\le N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\mathbb{P}_{\cdot\mid S}\Big(\sup_{(u,x)\in B_k}\big|\widehat{\psi}_1(x,\theta,u)-\mathbb{E}_{\cdot\mid S}[\widehat{\psi}_1(x,\theta,u)]\big|>2^{md+1}M a_n\Big)
$$
$$
\le \sum_{\boldsymbol{\epsilon}\in\{1,2\}^d}\widehat{Q}_n(\boldsymbol{\epsilon})+\sum_{\boldsymbol{\epsilon}\in\{1,2\}^d}\bar{Q}_n(\boldsymbol{\epsilon})+2^{md+1}N\Big(\frac{A_n}{A_{1,n}}\Big)^{d}\beta\big(A_{2,n};A_n^d\big),
$$
where
$$
\widehat{Q}_n(\boldsymbol{\epsilon}_0)=N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\mathbb{P}_{\cdot\mid S}\Big(\Big|\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)\Big|>M a_n\sqrt{n h^{md}\phi(h)}\Big),
$$
$$
\bar{Q}_n(\boldsymbol{\epsilon}_0)=N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\mathbb{P}_{\cdot\mid S}\Big(\Big|\sum_{\boldsymbol{\ell}\in L_n}\breve{\bar{S}}_n(\boldsymbol{\ell};\boldsymbol{\epsilon}_0)\Big|>M a_n\sqrt{n h^{md}\phi(h)}\Big),
$$
and, for $\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0$,
$$
\widehat{Q}_n(\boldsymbol{\epsilon})=N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\mathbb{P}_{\cdot\mid S}\Big(\Big|\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|>M a_n\sqrt{n h^{md}\phi(h)}\Big),
$$
$$
\bar{Q}_n(\boldsymbol{\epsilon})=N\max_{l(x)}\sup_{B_{\theta}(x,l(x),r)}\max_{l(u)}\sup_{B(u,l(u),r)}\mathbb{P}_{\cdot\mid S}\Big(\Big|\sum_{\boldsymbol{\ell}\in L_n}\breve{\bar{S}}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|>M a_n\sqrt{n h^{md}\phi(h)}\Big),
$$
with $\breve{\bar{S}}_n$ denoting the independent copies of the block sums of $\bar{\psi}_1^{(1)}$.
Because of the similarity between the two cases, $\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0$ and $\boldsymbol{\epsilon}=\boldsymbol{\epsilon}_0$, we will analyze only $\widehat{Q}_n$ for $\boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0$. By employing Lemma 6, and considering that the $\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})$ are zero-mean random variables, we deduce that
$$
\mathbb{P}_{\cdot\mid S}\Big(\Big|\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\Big|>M a_n\sqrt{n h^{md}\phi(h)}\Big)\le 2\,\mathbb{P}_{\cdot\mid S}\Big(\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})>M a_n\sqrt{n h^{md}\phi(h)}\Big)
$$
    and
$$
\big\|\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})\big\|\le C A_{1,n}^{d-1}A_{2,n}(\log n)\,\tau_n,\quad \mathbb{P}_S\text{-a.s.}\ (\text{from Lemma 6}),
$$
$$
\mathbb{E}_{\cdot\mid S}\big[\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})^2\big]\le C h^{md}\phi(h)\,A_{1,n}^{d-1}A_{2,n}(\log n),\quad \mathbb{P}_S\text{-a.s.}\ (\text{by Lemma 7}).
$$
    Applying Bernstein’s inequality as stated in Lemma 8, we obtain
$$
\mathbb{P}_{\cdot\mid S}\Big(\sum_{\boldsymbol{\ell}\in L_n}\breve{S}_n(\boldsymbol{\ell};\boldsymbol{\epsilon})>M a_n\sqrt{n h^{md}\phi(h)}\Big)
\le \exp\left(-\frac{\tfrac{1}{2}M\,n h^{md}\phi(h)\,\dfrac{\log n}{(A_n/A_{1,n})^{d}}}{A_{1,n}^{d-1}A_{2,n}\,h^{md}\phi(h)(\log n)+\tfrac{1}{3}M^{1/2}n^{1/2}h^{md/2}\phi(h)^{1/2}(\log n)^{1/2}A_{1,n}^{d-1}A_{2,n}\,\tau_n}\right).
$$
    Observe that
$$
\frac{n h^{md}\,\dfrac{\log n}{(A_n/A_{1,n})^{d}}}{A_{1,n}^{d-1}A_{2,n}\,h^{md}(\log n)}=\frac{n}{A_n^{d}}\,\frac{A_{1,n}}{A_{2,n}}\gtrsim \frac{A_{1,n}}{A_{2,n}}\gtrsim n^{\eta},
\qquad
\frac{n h^{md}\phi(h)\,\dfrac{\log n}{(A_n/A_{1,n})^{d}}}{n^{1/2}h^{md/2}\phi(h)^{1/2}(\log n)^{1/2}A_{1,n}^{d-1}A_{2,n}\,\tau_n}=\frac{n^{1/2}h^{md/2}\phi(h)^{1/2}(\log n)^{1/2}}{A_{1,n}^{md}A_{2,n}}\cdot\frac{A_{1,n}}{\rho_n n^{1/\zeta}}\ge C_0\,n^{\eta/2}.
$$
Selecting $M>0$ sufficiently large, and for $N\le C\,h^{-md}\phi(h)^{-1}\alpha_n^{-m}$, this establishes the desired result. Now, we proceed to the nonlinear component of the Hoeffding decomposition. The objective is to demonstrate that
$$
\mathbb{P}_{\cdot\mid S}\Big(\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\big|\widehat{\psi}_2(x,\theta,u)\big|>\lambda\Big)\to 0\quad \text{as } n\to\infty.
$$
Below, we present a lemma that serves as a technical ingredient in the proof of our proposition; it helps us attain our objective in Expression (82). The proof of this lemma uses the Blocking technique introduced earlier, applied now to the U-statistic, which makes the treatment of the blocks more intricate.
Lemma 1.
Let $\mathscr{F}^m\mathscr{K}^m$ be a uniformly bounded class of measurable canonical functions, $m\ge 2$. Assume there exist finite constants $a$ and $b$ such that the covering number of $\mathscr{F}^m\mathscr{K}^m$ satisfies:
$$
N\big(\epsilon,\mathscr{F}^m\mathscr{K}^m,\|\cdot\|_{L_2(Q)}\big)\le a\,\epsilon^{-b},
$$
for all $\epsilon>0$ and all probability measures $Q$. If the mixing coefficients $\beta$ of the locally stationary sequence $\{Z_i=(X_{s_i,A_n},W_{s_i,A_n})\}_{i\in\mathbb{N}}$ satisfy Condition (E2) in Assumption 6, then, for some $r>1$, we have
$$
\sup_{\mathscr{F}^m\mathscr{K}^m}\sup_{\theta\in\Theta^m}\sup_{x\in\mathcal{H}^m}\sup_{u\in B^m}\ \frac{1}{h^{md/2}\phi^{m/2}(h)\,n^{m-1/2}}\Big|\sum_{\mathbf{i}\in I_n^m}\xi_{i_1}\cdots\xi_{i_m}H\big(Z_{i_1},\ldots,Z_{i_m}\big)\Big|\overset{\mathbb{P}}{\longrightarrow}0.
$$
Remark 12.
As mentioned before, $W_{s_i,A_n}$ will be equal to $1$ or to $\epsilon_{s_i,A_n}=\sigma\big(\tfrac{s_i}{A_n},X_{s_i,A_n}\big)\epsilon_{s_i}$. In the proof of the previous lemma, $W_{s_i,A_n}$ will be equal to $\varepsilon_{i,n}=\sigma\big(\tfrac{i}{n},X_{i,n}\big)\epsilon_i$, and we will use the notation $W_{s_i,A_n}(u)$ to denote $\sigma(u,x)\,\epsilon_i$.
Proof of Lemma 1.
The proof of this lemma relies on the Blocking technique introduced by [77], known as Bernstein’s method [172]. This technique allows us to apply symmetrization and various other methods typically used for i.i.d. random variables. We extend this technique to spatial processes in the context of U-statistics, following the approach in [89]. In addition to the notation in Section 9, we define
$$
L_n:=L_{1,n}\cup L_{2,n},
$$
$$
\Delta_1=\big\{\boldsymbol{\ell}_2:\ \min_{1\le i\le d}|\ell_{1i}-\ell_{2i}|\le 1\big\},
$$
$$
\Delta_2=\big\{\boldsymbol{\ell}_2:\ \min_{1\le i\le d}|\ell_{1i}-\ell_{2i}|\ge 2\big\}.
$$
With the introduced notation, it is straightforward to demonstrate that, for $m=2$,
$$
\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\mathbf{i}\in I_n^{2}}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
=\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\substack{\boldsymbol{\ell}_1\neq\boldsymbol{\ell}_2\\ \boldsymbol{\ell}_1,\boldsymbol{\ell}_2\in L_n}}\ \sum_{i_1:\,s_{i_1}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon}_0)\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_2;\boldsymbol{\epsilon}_0)\cap R_n}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
\quad+\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\boldsymbol{\ell}_1\in L_n}\ \sum_{\substack{i_1\neq i_2:\\ s_{i_1},s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon}_0)\cap R_n}}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
\quad+2\,\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\boldsymbol{\ell}_1\in L_n}\ \sum_{i_1:\,s_{i_1}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon}_0)\cap R_n}\ \sum_{\substack{\boldsymbol{\ell}_2\in\Delta_2\cap(L_{1,n}\cup L_{2,n})\\ \boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_2;\boldsymbol{\epsilon})\cap R_n}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
\quad+2\,\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\boldsymbol{\ell}_1\in L_n}\ \sum_{i_1:\,s_{i_1}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon}_0)\cap R_n}\ \sum_{\substack{\boldsymbol{\ell}_2\in\Delta_1\cap(L_{1,n}\cup L_{2,n})\\ \boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_2;\boldsymbol{\epsilon})\cap R_n}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
\quad+\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\substack{\boldsymbol{\ell}_1\neq\boldsymbol{\ell}_2\in L_{1,n}\cup L_{2,n}\\ \boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}}\ \sum_{i_1:\,s_{i_1}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon})\cap R_n}\ \sum_{i_2:\,s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_2;\boldsymbol{\epsilon})\cap R_n}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
\quad+\frac{1}{h^{2d}\phi^{2}(h)}\sum_{\substack{\boldsymbol{\ell}_1\in L_{1,n}\cup L_{2,n}\\ \boldsymbol{\epsilon}\neq\boldsymbol{\epsilon}_0}}\ \sum_{\substack{i_1<i_2:\\ s_{i_1},s_{i_2}\in\Gamma_n(\boldsymbol{\ell}_1;\boldsymbol{\epsilon})\cap R_n}}\prod_{j=1}^{2}\bar{K}\Big(\frac{u_j-s_{i_j}/A_n}{h_n}\Big)K_2\Big(\frac{d_{\theta_j}(x_j,X_{s_{i_j},A_n})}{h_n}\Big)W_{s_{\mathbf{i}},A_n}
$$
$$
:=\mathrm{I}+\mathrm{II}+\mathrm{III}+\mathrm{IV}+\mathrm{V}+\mathrm{VI}.
$$
(I):
Blocks of the same type, but not the same block:
Let { η i } i N * be a sequence of independent blocks. An application of Lemma 5 shows that
P sup F m K m sup θ Θ m sup x H m sup u B m n 3 / 2 1 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n W s i , A n > δ         P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n           j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n > δ )           + P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n           j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n W s i , A n ( u ) > δ )     +     P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n ( u ) > δ )         P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) > δ           + C A n A 1 , n d β A 2 , n ; A n d + o ( 1 ) ,
Because
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ j x j , X u j ( s i j ) h W s i , A n     =     1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n E . S j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ j x j , X u j ( s i j ) h W s i , A n     =     1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n E . S j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ j x j , X u j ( s i j ) h j = 1 m ϵ s i j , A n     =     1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n           E . S j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ j x j , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n ϵ s i j     =     1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ j x j , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n j = 1 m σ ( x j , θ j , u j ) + j = 1 m σ ( x j , θ j , u j )         1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j             E . S C j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n K 2 d θ j x j , X s i j A n ( s i j ) h p j = 1 m σ ( x j , θ j , u j ) + o P ( 1 )           ( using a telescoping argument , and the boundedness of K 2 ( · ) for p = min ( ρ , 1 ) and C < )         1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S ϕ m 1 ( h ) C A n d U s i j , A n s i j A n p j = 1 m σ ( x j , θ j , u j ) + o P ( 1 )         o P ( 1 ) ,
and
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n W s i , A n ( u )     =     1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j E . S j = 1 2 K 2 d θ i x i , X u j ( s i j ) h j = 1 m σ s i j A n , X s i j , A n j = 1 m σ ( x j , θ j , u j )         1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j × ( o P ( 1 ) ) 0 h k = 1 m K 2 y k h d F i k / n ( y k , x k )         1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 m E . S ϵ s i j × ( o P ( 1 ) ) ( ϕ 2 ( h ) )         o P ( 1 ) .
Under the lemma’s assumptions, we have
$\beta(a; b) \leq \beta_1(a)\, g_1(b),$
where β 1 ( a ) → 0 as a → ∞ . Therefore, our focus will be on the first term of the sum. For the second part of the inequality, we will leverage the work of [173] in the non-fixed kernel setting. Specifically, we define
$f_{i_1, \ldots, i_m} = \prod_{k=1}^{m} \xi_{i_k} \times H,$
and F i 1 , , i m as the collection of kernels and the class of functions related to this kernel. We will then use (Theorem 3.1.1, [21] and Remarks 3.5.4 part 2) for decoupling and randomization. As mentioned earlier, we assume that m = 2 . Consequently, we can observe that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W i , φ , n ( u ) F 2 K 2     =     E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2         c 2 E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2         c 2 E . S 0 D n h ( U 1 ) N t , F i 1 , i 2 , d ˜ n h , 2 ( 1 ) d t , ( By Lemma 10 and Proposition 2 . )
where D n h ( U 1 ) is the diameter of F i 1 , i 2 according to the distance d ˜ n h , 2 ( 1 ) , respectively defined as
D n h ( U 1 ) : = E ϵ 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n f i 1 , i 2 ( u , η ) F i 1 , i 2 = E ϵ 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
and
d ˜ n h , 2 ( 1 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     : =     E ϵ 1 n 3 / 2 h d ϕ 2 ( h ) 1 2 L n ϵ p ϵ q i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ξ 1 i 1 ξ 1 i 2             k = 1 2 K 1 , 2 d θ k ( x k , η i k ) h W s i , A n ( u ) p q L n ϵ p ϵ q i H p ( U ) j H q ( U ) ξ 2 i 1 ξ 2 i 2 k = 1 2 K 2 , 2 d θ k ( x k , η i k ) h W s i , A n ( u ) .
Let us consider another semi-norm d ˜ n h , 2 ( 2 ) :
d ˜ n h , 2 ( 2 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     =     1 n h d ϕ 2 ( h ) 1 2 L n ξ 1 i 1 ξ 1 i 2 k = 1 2 K 1 , 2 d θ k ( x k , η i k ) h W s i , A n ( u )             p q υ n ϵ p ϵ q i H p ( U ) j H q ( U ) ξ 2 i 1 ξ 2 i 2 k = 1 2 K 2 , 2 d θ k ( x k , η i k ) h W s i , A n ( u ) 2 ] 1 / 2 .
One can see that
d ˜ n h , 2 ( 1 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )           A 1 , n n 1 / 2 h d ϕ ( h ) d ˜ n h , 2 ( 2 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u ) .
We readily infer that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W i , φ , n ( u ) F 2 K 2         c 2 E . S 0 D n h ( U 1 ) N t A 1 , n d n 1 / 2 , F i , j , d ˜ n h , 2 ( 2 ) d t         c 2 A 1 , n d n 1 / 2 P D n h ( U 1 ) A 1 , n d n 1 / 2 λ n + c m A 1 , n d n 1 / 2 0 λ n log t 1 d t ,
where λ n 0 . We have
$\int_0^{\lambda_n} \log t^{-1} \, dt \lesssim \lambda_n \log \lambda_n^{-1} \to 0,$
where λ n must be chosen in such a way that
$A_{1,n}^{d} \, n^{1/2} \, \lambda_n \log \lambda_n^{-1} \to 0.$
By employing the triangle inequality along with Hoeffding’s trick, we can readily deduce that
A 1 , n d n 1 / 2 P D n h ( U 1 ) λ n A 1 , n d n 1 / 2 λ n 2 A 1 , n d n 5 / 2 h ϕ x , θ 1 ( h n ) E . S 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 c 2 [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ x , θ 1 ( h n ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 ,
where ( η i ′ ) i ∈ N * are independent copies of ( η i ) i ∈ N * . By imposing
$\lambda_n^{-2} \, A_{1,n}^{dr} \, n^{-1/2} \to 0,$
we readily infer that
[ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ x , θ 1 ( h n ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 k = 1 2 K 2 d θ k ( x k , η i k ) h W s i , A n ( u ) 2 F 2 K 2 O λ n 2 A 1 , n d r n 1 / 2 .
Symmetrizing the last inequality in (91) and then applying Proposition 2, yields
[ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 h ϕ x , θ 1 ( h n ) E . S 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ϵ p ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 c 2 E . S 0 D n h ( U 2 ) log N ( u , F i , j , d ˜ n h , 2 ) 1 / 2 ,
where
D n h ( U 2 ) = E ϵ | [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ x , θ 1 ( h n ) 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 F 2 K 2 .
and for ξ 1 . K 2 , 1 W , ξ 2 . K 2 , 2 W F i j :
d ˜ n h , 2 ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     : =     E ϵ [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ x , θ 1 ( h n ) 1 L n ϵ p i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ 1 i 1 ξ 1 i 2 K 2 , 1 d θ 1 ( x 1 , η i 1 ) h × K 2 , 1 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 i 1 , i 2 H p ( U ) ξ 2 i ξ 2 j K 2 , 2 d θ 1 ( x 1 , η i 1 ) h K 2 , 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 .
By the fact that
E ϵ [ [ L n ] ] λ n 2 A 1 , n d n 5 / 2 ϕ x , θ 1 ( h n ) 1 L n ϵ p i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2         A 1 , n 3 d / 2 λ n 2 n 1 [ [ L n ] ] 1 A 1 , n 2 d ϕ 2 ( h n ) 1 L n i 1 , i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ 0 ) R n ξ i 1 ξ i 2 K 2 d θ i ( x i , η i 1 ) h K 2 d ( x 2 , η j ) h W s i , A n ( u ) 4 1 / 2 .
We readily infer that, provided
$A_{1,n}^{3d/2} \, \lambda_n^{-2} \, n^{-1} \to 0,$
the quantity in (93) converges to zero. Recall that
L n = O A n / A 3 , n d A n / A 1 , n d .
(II):
The same block:
P ( sup F m K m sup θ Θ m sup x H m sup u B m | 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 × j = 1 2 K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n W s i , A n > δ         P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2           j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , X s i j , A n ) h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n > δ )           + P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n W s i , A n ( u ) > δ )           + P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ i x i , X u j ( s i j ) h W s i , A n ( u ) > δ )         P ( sup F m K m sup θ Θ m sup x H m sup u B m 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) > δ           + C A n A 1 , n d β A 2 , n ; A n d + o ( 1 ) .
Similarly to I , we can demonstrate that both the first and the second terms in the previous inequality are of order o ( 1 ) . Therefore, as in the previous proof, it is sufficient to establish that
E . S ( 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 ) 0 .
Note that when dealing with uniformly bounded classes of functions, we obtain uniformity in B m × F 2 K 2
E . S i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) = O ( a n ) .
Hence, we need to establish that, for u B m ,
E E . S ( 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n             j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u )             E . S j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 0
As for empirical processes, to prove (95), it is enough to symmetrize and show that
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ϵ p j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 0 .
Similarly to (88), we have
E . S 1 n 3 / 2 h d + 1 ϕ ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ϵ p j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 E 0 D n h ( U 3 ) log N u , F i 1 , i 2 , d ˜ n h , 2 ( 3 ) 1 / 2 d u ,
where
D n h ( U 3 ) = E ϵ | 1 n 3 / 2 h d ϕ ( h ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
and the semi-metric d ˜ n h , 2 ( 3 ) is defined by
d ˜ n h , 2 ( 3 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     =     E ϵ | 1 n 3 / 2 h d ϕ ( h ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ 1 i ξ 1 j K 2 , 1 d θ 1 ( x 1 , η i 1 ) h K 2 , 1 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i ξ 2 j K 2 , 2 d θ 1 ( x 1 , η i 1 ) h K 2 , 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) .
Given our consideration of uniformly bounded classes of functions, we obtain
E ϵ | n 3 / 2 h ϕ x , θ 1 ( h n ) 1 L n ϵ p i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) | A 1 , n 3 d / 2 n 1 h ϕ x , θ 1 ( h n ) [ 1 [ [ L n ] ] A 1 , n 2 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 ] 1 / 2 O A 1 , n 3 d / 2 n 1 ϕ x , θ 1 ( h n ) .
Using the fact that A 1 , n 3 d / 2 n 1 ϕ x , θ 1 ( h n ) 0 , D n h ( U 3 ) 0 , we obtain II 0 as n .
(III):
Different types of blocks:
To avoid redundancy, we can directly observe that
P ( sup F m K Θ m sup θ Θ m sup x H m sup u B m | 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n W s i , A n > δ         P ( sup F m K m sup θ Θ m sup x H m sup u B m | 1 n 3 / 2 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0             i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) > δ             + C A n A 1 , n d β A 2 , n ; A n d + o ( 1 ) .
For p = 1 and p = ν n :
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2     =     E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 .
For 2 p υ n 1 , we obtain
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2     =     E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 4 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2         E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 ,
Therefore, it suffices to show that the following quantity converges to zero:
E . S 1 n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 .
Applying arguments similar to those in [77], we employ the standard symmetrization
E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2         2 E . S L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2     =     2 E . S { L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 1 I D n h ( U 4 ) γ n             + 2 E . S { L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) F 2 K 2 1 I D n h ( U 4 ) > γ n     =     2 III 1 + 2 III 2 ,
where
D n h ( U 4 ) = L n n 3 / 2 h 2 d ϕ 2 ( h ) [ 2 : min 1 i d 2 i = 3 ϵ ϵ 0 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j h W s i , A n ( u ) 2 1 / 2 F 2 K 2 .
In a similar way as in (88), we infer that
III 1 c 2 0 γ n log N t , F i 1 , i 2 , d ˜ n h , 2 ( 4 ) 1 / 2 d t ,
where
d ˜ n h , 2 ( 4 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     : =     E ϵ | L n n 3 / 2 h ϕ x , θ 1 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q ξ 1 i 1 ξ 1 i 2 K 2 , 1 d θ 1 ( x 1 , η i 1 ) h K 2 , 1 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i 1 ξ 2 i 2 K 2 , 2 d θ 1 ( x 1 , η i 1 ) h K 2 , 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) | .
Since we have
E ϵ | L n n 3 / 2 h 2 d ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u )         A 1 , n d / 2 A 2 , n d h d + 1 ϕ ( h ) ( 1 A 1 , n d A 2 , n d L n h d 1 ϕ 4 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3             2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ξ i 1 ξ i 2 K 2 d θ 1 ( x 1 , η i 1 ) h K 2 d θ 2 ( x 2 , η i 2 ) h W s i , A n ( u ) 2 ) 1 / 2 ,
and considering the semi-metric
d ˜ n h , 2 ( 5 ) ξ 1 . K 2 , 1 W ( u ) , ξ 2 . K 2 , 2 W ( u )     : =     ( 1 A 1 , n d A 2 , n d L n h d 1 ϕ 4 ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ξ 1 i 1 ξ 1 i 2             K 2 , 1 d ( x 1 , η i 1 ) h K 2 , 1 d ( x 2 , η i 2 ) h W s i , A n ( u ) ξ 2 i 1 ξ 2 i 2 K 2 , 2 d ( x 1 , η i 1 ) h K 2 , 2 d ( x 2 , η i 2 ) h W s i , A n ( u ) 2 1 / 2 .
We demonstrate that the statement in (101) is bounded as follows
L n 1 / 2 A 2 , n d n 1 / 2 h 2 ϕ ( h ) 0 L n 1 / 2 A 2 , n d n 1 / 2 h 2 d γ n log N t , F i 1 , i 2 , d ˜ n h , 2 ( 5 ) 1 / 2 d t ,
by choosing γ n = n − α for some α > ( 17 r − 26 ) / ( 60 r ) , we obtain the convergence of the preceding quantity to zero. To bound the second term on the right-hand side of (99), observe that
III 2 = E { L n n 3 / 2 h ϕ 2 ( h ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϵ q j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j ) h W s i , A n ( u ) F 2 K 2 1 I D n h ( U 4 ) > γ n A 1 , n 1 A 2 , n n 1 / 2 h d ϕ x , θ 1 ( h n ) P { L n 2 n 3 h 2 ϕ 2 ( h n ) 2 : min 1 i d 2 i = 3 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j ) h W s i , A n ( u ) 2 F 2 K 2 γ n 2 } .
We will apply the square root method to the last expression conditionally on Γ n ( 1 ; ϵ 0 ) ∩ R n . Let E ϵ ≠ ϵ 0 denote the expectation with respect to σ ( η i 2 , ℓ , ϵ ≠ ϵ 0 ) . We now allow the class of functions F m to be unbounded, with an envelope function satisfying, for some ζ > 2 ,
$\theta_\zeta := \sup_{x \in S_{\mathcal{H}}^m} \mathbb{E}\big( F^{\zeta}(Y) \mid X = x \big) < \infty,$
for 2 r / ( r − 1 ) < s < ∞ (in the notation of Lemma 5.2 in [174]).
M n = L n 1 / 2 E ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n j = 1 2 K ¯ u j s i j / A n h n j = 1 2 K 2 d θ j ( x j , η i j ) h W s i , A n ( u ) 2 ,
where
x = γ n 2 A 1 , n 5 d / 2 n 1 / 2 h m d / 2 ϕ m / 2 ( h ) , ρ = λ = 2 4 γ n A 1 , n 5 d / 4 n 1 / 4 h m d / 4 ϕ m / 4 ( h ) , m = exp γ n 2 n h 2 d ϕ 2 ( h n ) A 2 , n 2 d .
Nevertheless, as we require t > 8 M n and m , employing arguments akin to those in (page 69, [77]), we achieve the convergence of (101) and (102) to zero.
(IV):
Blocks of different types:
We have to prove that
P ( sup F m K Θ m sup θ Θ m sup x H m sup u B m | 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n × j = 1 2 K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n W s i , A n | > δ ) 0 .
  • We have
    n 3 / 2 1 h 2 d ϕ 2 ( h ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n . × j = 1 2 K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n W s i , A n F 2 K 2 c 2 L n A 1 , n d A 2 , n d n 3 / 2 h d ϕ x , θ 1 ( h n ) 0 .
    Therefore, the proof of the lemma is concluded. □
    The final stage in the proof of Proposition 1 involves employing Lemma 1 to establish the convergence of the nonlinear term to zero. □
Proof of Theorem 1.
We have
r ^ n ( m ) ( x , θ , u ; h n ) r ( m ) ( x , θ , u )     =     1 r ˜ 1 ( x , θ , u ) g ^ 1 ( x , θ , u ) + g ^ 2 ( x , θ , u ) r ( m ) ( φ , x , u ) r ˜ 1 ( x , θ , u ) ,
where
r ( m ) ( x , θ , u ) = r ( m ) φ , x 1 , θ 1 , , x 1 , θ m , u 1 A n , , u m A n , r ˜ 1 ( x , θ , u ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n , g ^ 1 ( x , θ , u ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n j = 1 m ϵ s i j , A n , g ^ 2 ( x , θ , u ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n × r ( m ) φ , x , θ , s i A n .
The proof of this theorem is intricate and divided into the following four steps. In each step, our objective is to demonstrate that
Step 1.
sup F m K Θ m sup θ Θ m sup x H m sup u B m | g ^ 1 ( x , θ , u ) | = O P log n / n h m d ϕ m ( h ) .
Step 2.
sup F m K Θ m sup θ Θ m sup x H m sup u B m | g ^ 2 ( x , θ , u ) r ( m ) ( x , θ , u ) r ˜ 1 ( φ , u , x ; h n )       E . S ( g ^ 2 ( x , θ , u ) r ( m ) ( x , θ , u ) r ˜ 1 ( φ , u , x ; h n ) ) | = O P log n / n h m d ϕ m ( h ) .
Step 3.
Let $\kappa_2 = \int_{\mathbb{R}} x^2 K(x) \, dx$.
sup F m K Θ m sup θ Θ m sup x H m sup u B m E . S g ^ 2 ( x , θ , u ) r ( m ) ( x , θ , u ) r ˜ 1 ( φ , u , x ; h n ) = O 1 A n d p ϕ ( h ) + o h 2 , P P . S a . s .
Step 4.
sup F m K Θ m sup θ Θ m sup x H m sup u B m r ˜ 1 ( x , θ , u ) E . S r ˜ 1 ( x , θ , u ) = o P . S ( 1 ) .
  • Step 1. is evidently a direct consequence of Proposition 1 when considering W s i , A n = ∏ j = 1 m ϵ s i j , A n . The same holds for Step 2., if we instead replace W s i , A n with g ^ 2 ( x , θ , u ) − r ( m ) ( x , θ , u ) r ˜ 1 ( φ , u , x ; h n ) and then apply Proposition 1. Now, we proceed to Step 4. Note that for W s i , A n ≡ 1 , the aforementioned proposition has already demonstrated that
    sup F m K Θ m sup θ Θ m sup x H m sup u B m r ˜ 1 ( x , θ , u ) E . S r ˜ 1 ( x , θ , u ) = o P . S ( 1 ) .
    Step 3. will be treated in what follows:
Let K 0 : [ 0 , 1 ] → R be a Lipschitz continuous function, compactly supported on [ − q C 1 , q C 1 ] for some q > 1 , and such that K 0 ( x ) = 1 for x ∈ [ − C 1 , C 1 ] . Observe that
E . S g ^ 2 ( x , θ , u ) r ( m ) ( x , θ , u ) r ˜ 1 ( φ , u , x ; h n ) ) = i = 1 4 Q i ( x , θ , u ) ,
where Q i can be defined as follows
Q i ( x , θ , u ) = ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n q i ( x , θ , u ) ,
such that
q 1 ( x , θ , u ) = E . S j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h × r ( m ) X s i , A n , θ , s i A n r ( m ) ( x , θ , u ) , q 2 ( x , θ , u ) = E . S j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n K 2 d θ i x i , X s i j A n ( s i j ) h r ( m ) X s i , A n , θ , s i A n r ( m ) X s i , A n , θ , s i / A n ( s i ) , q 3 ( x , θ , u ) = E . S j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n j = 1 m K 0 d θ i x i , X s i j A n ( s i j ) h j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h × r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u ) , q 4 ( x , θ , u ) = E . S j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u ) .
Observe that
Q 1 ( x , θ , u ) ( n m ) ! n ! h m d ϕ m ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n E . S j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h × r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u ) ,
Using the properties of r ( m ) ( x , θ , u ) , we can show that
j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u ) C h m ,
and
Q 1 ( x , θ , u )         ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n               × E . S C h m j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n K 2 d θ i x i , X s i j A n ( s i j ) h p               ( using the telescoping argument , and the boundedness of K 2 ( · )               for p = min ( ρ , 1 ) and C < )         ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n E . S C h m j = 1 m C A n d h U s i j , A n s i j A n p         C A n p d ϕ ( h ) h p m uniformly in u .
In a similar way, and for
E j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h C ϕ ( m 1 ) / m ( h ) ,
and since r ( m ) ( · ) is Lipschitz and
d θ X s i j , A n , X s i j A n s j C A n d U s i j , A n s i j A n ,
and the variable U s i j , A n ( s i j / A n ) has a finite p-th moment, we can see that
Q 2 ( x , θ , u )     =     ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n E . S [ j = 1 m K 0 d θ j ( x j , X s i j , A n ) h n K 2 d θ i x i , X s i j A n ( s i j ) h r ( m ) φ , s i A n , X s i , A n r ( m ) φ , s i A n , X s i / A n ( s i ) ]         ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n E . S ϕ ( m 1 ) / m ( h ) C A n d U s i j , A n s i j A n p         C A n p d ϕ 1 / m ( h ) ,
and
sup F m K Θ m sup θ Θ m sup x H m sup u I h m Q 3 ( x , θ , u ) 1 A n p d ϕ ( h ) h p m .
For the last term, we have
Q 4 ( x , θ , u )       =       ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n                   E . S j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u ) .
Using Lemma 2 and inequality (20) and under Assumption 1, it follows that
Q 4 ( x , θ , u )             ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n                 E . S j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h r ( m ) X s i , A n , θ , s i / A n ( s i ) r ( m ) ( x , θ , u )             ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n                 E . S j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h d H m X s i / A n ( s i ) , x + u s i A n α             ( n m ) ! n ! h m d ϕ ( h ) i I n m j = 1 m K ¯ u j s i j / A n h n                 0 1 0 1 1 h m j = 1 m K ¯ ( u j v j ) h d v j E . S j = 1 m K 2 d θ i x i , X s i j A n ( s i j ) h × h α                 + ( n m ) ! n ! h m d ϕ ( h ) i I n m 0 1 0 1 1 h m d j = 1 m K ¯ u j v j h d v j × E . S ϕ m 1 ( h ) h α             O P . S h 2 α .
Combining the results obtained for Q i , 1 i 4 , from Step 3, we can deduce the convergence rate for the estimator. □
Proof of Theorem 2.
Recall that
r ^ n ( m ) ( x , θ , u ; h n )             = i I n m φ ( Y s i 1 , A n , , Y s i m , A n ) j = 1 m K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n i I n m j = 1 m K ¯ u j s i j / A n h n K 2 d θ j ( x j , X s i j , A n ) h n .
For x H m , y Y m , θ Θ , define
G φ , θ , i ( x , y ) : = j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n φ ( Y s i 1 , A n , , Y s i m , A n ) E j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n , G : = G φ , θ , i ( · , · ) : φ F m , i = ( i 1 , , i m ) , G ( k ) : = π k , m G φ , θ , i ( · , · ) : φ F m , U n ( φ ) = U n ( m ) ( G φ , θ , i ) : = ( n m ) ! n ! i I n m j = 1 m ξ i j G φ , θ , i ( X i , Y i ) ,
and the U-empirical process is defined to be
μ n ( φ ) : = n h m ϕ x , θ 1 / m ( h ) U n ( φ ) E ( U n ( φ ) ) .
Then, we have
r ˜ n ( m ) ( x , θ , u ; h n ) = U n ( φ ) U n ( 1 ) .
To prove the weak convergence of our estimator, it is essential to first establish it for μ n ( φ ) . Given that we are working with unbounded classes of functions, we need to truncate the function G φ , θ , i ( x , y ) . Let λ n = n 1 / ζ with ζ > 2 as in Assumption 7 (C.2), and define
G φ , θ , i ( x , y ) = G φ , θ , i ( x , y ) 1 I F ( y ) λ n + G φ , θ , i ( x , y ) 1 I F ( y ) > λ n : = G φ , θ , i ( T ) ( x , y ) + G φ , θ , i ( R ) ( x , y ) .
We can write the U-statistic as follows:
μ n ( φ ) = n h m ϕ x , θ 1 / m ( h ) U n ( m ) G φ , θ , i ( T ) E U n ( m ) G φ , θ , i ( T ) + n h m ϕ x , θ 1 / m ( h ) U n ( m ) G φ , θ , i ( R ) E U n ( m ) G φ , θ , i ( R ) : = n h m ϕ x , θ 1 / m ( h ) U n ( T ) ( φ ) E U n ( T ) ( φ ) + n h m ϕ x , θ 1 / m ( h ) U n ( R ) ( φ ) E U n ( R ) ( φ ) : = μ n ( T ) ( φ ) + μ n ( R ) ( φ ) .
The first term represents the truncated part, while the second term corresponds to the remaining part. Our goal is to demonstrate that
1.
μ n ( T ) ( φ ) converges to a Gaussian process.
2.
The remainder part is asymptotically negligible, in the sense that
n h m ϕ x , θ 1 / m ( h n ) U n ( R ) ( φ ) E U n ( R ) ( φ ) F m K m P 0 .
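As a minimal numerical illustration of this truncation step, the sketch below splits kernel values at the level λ n = n 1 / ζ ; the heavy-tailed data, the toy kernel, and the choice ζ = 4 are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Minimal sketch of the truncation G = G^(T) + G^(R) at level lam_n = n**(1/zeta),
# with the envelope F playing the role of F(y) in the display above.
def truncate(G_vals, F_vals, n, zeta=4.0):
    lam_n = n ** (1.0 / zeta)
    G_T = np.where(F_vals <= lam_n, G_vals, 0.0)   # bounded (truncated) part
    G_R = np.where(F_vals > lam_n, G_vals, 0.0)    # unbounded remainder part
    return G_T, G_R

rng = np.random.default_rng(0)
y = rng.standard_t(df=3, size=1000)                # heavy-tailed responses (toy)
G_T, G_R = truncate(y, np.abs(y), n=1000)          # toy kernel value = y, F = |y|
print(np.abs(G_T).max(), np.mean(G_R != 0))        # bound on G_T, tail fraction
```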
For the initial step, we will employ the Hoeffding decomposition, which is akin to the previous decomposition in Section 3.1, with the distinction that we replace W i , n with φ ( Y i , n )
U n ( T ) ( φ ) E U n ( T ) ( φ ) : = U 1 , n ( φ ) + U 2 , n ( φ ) ,
where
U 1 , n ( φ ) : = 1 n i = 1 n H ^ 1 , i ( u , x , φ ) ,
U 2 , n ( φ ) : = ( n m ) ! ( n ) ! i I n m ξ i 1 ξ i m H 2 , i ( z ) .
The convergence of U 2 , n ( φ ) to zero in probability has been established by Lemma 1. Therefore, our focus now is to demonstrate that U 1 , n ( φ ) converges weakly to a Gaussian process denoted as G ( φ ) . To achieve this, we will proceed with finite-dimensional convergence and equicontinuity. The finite-dimensional convergence requires that every finite set of functions f 1 , , f q in L 2 , with U ˜ being the centered form of U , satisfies
n h m ϕ x , θ 1 / m ( h n ) U ˜ 1 , n ( f 1 ) , , n h m ϕ x , θ 1 / m ( h n ) U ˜ 1 , n ( f q )
converges to the corresponding finite-dimensional distributions of the process G ( φ ) . It is sufficient to show that for every fixed collection ( a 1 , , a q ) R q , we have
j = 1 q a j U ˜ 1 , n ( f j ) N 0 , v 2 ,
where
v 2 = j = 1 q a j 2 Var U ˜ 1 , n ( f j ) + s r a s a r Cov U ˜ 1 , n ( f s ) , U ˜ 1 , n ( f r ) .
Take
Ψ ( · ) = j = 1 q a j f j ( · ) .
By the linearity of Ψ ( · ) , it suffices to show that
U ˜ 1 , n ( Ψ , i ) G ( Ψ ) .
Let
N = E j = 1 m K 2 d θ j ( x j , X s i j , A n ) h n .
We have
U ˜ 1 , n ( h n ) = N 1 × 1 n i = 1 n ( n m ) ! ( n 1 ) ! I n 1 m 1 ( i ) = 1 m ξ i 1 ξ i 1 ξ i ξ i ξ i m 1 1 ϕ x , θ ( h n ) K 2 d θ i ( x i , X s i , A n ) h n × h ( y 1 , , y 1 , Y i , y , , y m 1 ) j = 1 j i m 1 1 ϕ x , θ ( h n ) K 2 d θ j ( x j , X s i j , A n ) h n P ( d ( ν 1 , y 1 ) , , d ( ν 1 , y 1 ) , d ( ν , y ) , , d ( ν m 1 , y m 1 ) ) , : = N 1 1 n i = 1 n ξ i 1 ϕ x , θ ( h n ) K 2 d θ i ( x i , X s i , A n ) h n h ˜ ( Y i ) .
The subsequent step involves extending the Blocking techniques of Bernstein to the spatial process, with all notions defined in Section 9. Recall that L n = L 1 , n L 2 , n , and define
Z s , A n ( x , θ , u ) : = ξ i 1 ϕ x , θ 1 / m ( h n ) K 2 d θ i ( x i , X s i , A n ) h n h ˜ ( Y i ) ,
and
Z n ( ; ϵ ) = i : s i Γ n ( ; ϵ ) R n Z s , A n ( x , θ , u ) = Z n ( 1 ) ( ; ϵ ) , , Z n ( p ) ( ; ϵ ) .
Then, we have
U ˜ 1 , n ( h n ) = i = 1 n Z s , A n ( x , θ , u ) = L n Z n ; ϵ 0 + ϵ ϵ 0 L 1 , n Z n ( ; ϵ ) = : Z 2 , n ( ϵ ) + ϵ ϵ 0 L 2 , n Z n ( ; ϵ ) = : Z 3 , n ( ϵ ) = : Z 1 , n + ϵ ϵ 0 Z 2 , n ( ϵ ) + ϵ ϵ 0 Z 3 , n ( ϵ ) .
Lemma 9 establishes that Z 2 , n and Z 3 , n , for ϵ ϵ 0 , are asymptotically negligible. Addressing the variance of Z 1 , n is straightforward. Initially, mixing conditions are employed to replace large blocks with independent random variables. Subsequently, Lyapunov’s condition for the central limit theorem is applied to the sum of independent random variables. Similar to the proof of Proposition 1 using Lemma 5, as in Equation (75), observe that
sup t > 0 P · S Z 1 , n > t P · S Z ˘ 1 , n > t C A n A 1 , n d β A 2 , n ; A n d ,
where Z ˘ n ( ; ϵ ) : L n denotes a sequence of independent random vectors in R p under P · S such that
Z ˘ n ( ; ϵ ) = d Z n ( ; ϵ ) , under P S , L n .
Applying Lyapunov’s condition for the central limit theorem to the sum of independent random variables then yields the finite-dimensional convergence. Next, we proceed to prove that
lim δ 0 lim n P n h m ϕ x , θ 1 / m ( h n ) U ˜ 1 , n ( h n , i ) FK ( δ , · p ) > ϵ = 0 ,
where
FK ( δ , · p ) : = U ˜ 1 , n ( h n ) U ˜ 1 , n ( h n ) : U ˜ 1 , n ( h n ) U ˜ 1 , n ( h n ) < δ , U ˜ 1 , n ( h n ) , U ˜ 1 , n ( h n ) FK ,
for
U ˜ 1 , n ( h n ) = N 1 1 n i = 1 n ξ i 1 ϕ x , θ 1 / m ( h n ) K 2 , 1 d θ i ( x i , X s i , A n ) h n h ˜ 1 ( Y i ) E U 1 , n ( h n ) , U ˜ 1 , n ( h n ) = N 1 1 n i = 1 n ξ i 1 ϕ x , θ 1 / m ( h n ) K 2 , 2 d θ i ( x i , X s i , A n ) h n h ˜ 2 ( Y i ) E U 1 , n ( h n ) .
Now, we will adapt the chaining technique found in [77], applying it to the conditional setting with the locally stationary process as in [175], but for random fields, similar to Lemma 1. We will employ the same strategy as in Lemma 1 to transition from the sequence of locally stationary random variables to the stationary one. Let ζ i = ( η i , ς i ) denote the independent block sequences:
P ( n ϕ x , θ 1 / m ( h n ) ) 1 / 2 h m / 2 N 1 i = 1 n ξ i K 2 d θ i ( x i , X i ) h h ˜ ( Y i ) E U 1 , n ( h n ) FK ( b , · p ) > ϵ 2 P ( n ϕ x , θ 1 / m ( h n ) ) 1 / 2 h m / 2 N 1 L n i : s i Γ n ( ; ϵ 0 ) R n ξ i K 2 d θ i ( x i , η i ) h h ˜ ( ς i ) E U 1 , n ( h n ) 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n ) FK ( b , · p ) > ϵ } + C A n A 1 , n d β A 2 , n ; A n d + o ( 1 ) .
Exploiting Condition (E2) in Assumption 6, we deduce that β ( A 2 , n ; A n d ) → 0 as n → ∞ . Consequently, it suffices to handle the first term on the right-hand side of (121). Since the blocks are independent, we symmetrize using a sequence { ϵ j } j ∈ N * of i.i.d. Rademacher variables, i.e., random variables with
P ( ϵ j = 1 ) = P ( ϵ j = 1 ) = 1 / 2 .
It is crucial to note that the sequence { ϵ j } j N * is independent of the sequence ξ i = ( ς i , ζ i ) i N * . Therefore, it remains to be established that, for all ϵ > 0 and δ 0 ,
lim δ 0 lim n P ( n ϕ x , θ 1 / m ( h n ) ) 1 / 2 h m / 2 N 1 L n i : s i Γ n ( ; ϵ 0 ) R n ξ i K 2 d θ i ( x i , η i ) h h ˜ ( ς i ) E U 1 , n ( h n , i ) 1 n ϕ x , θ 1 / m ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n ) FK ( b , · p ) > ϵ } < δ .
Define the semi-norm
d ˜ n ϕ , 2 : = ( ( n ϕ x , θ 1 / m ( h n ) ) 1 / 2 h m / 2 N 1 L n i : s i Γ n ( ; ϵ 0 ) R n ξ i K 2 , 1 d θ i ( x i , η i ) h h ˜ 1 ( ς i ) E U 1 , n ( h n , i ) ξ i K 2 , 2 d θ i ( x i , η i ) h h ˜ 2 ( ς i ) E U 1 , n ( h n , i ) 2 1 / 2 ,
and the covering number defined for any class of functions E by
N ˜ n ϕ , 2 ( u , E ) : = N n ϕ , 2 ( u , E , d ˜ n ϕ , 2 ) .
Given the aforementioned independence, we can bound (122), with more details provided in [78]. Following a similar approach as in [78] and previously in [77], due to the independence between the blocks and Assumption 7 (C3), and by applying ([174] Lemma 5.2), we achieve equicontinuity, leading to weak convergence. Now, our objective is to demonstrate that for all ϵ > 0 and δ 0 ,
P μ n ( R ) ( φ , t ) F m K Θ m > λ 0 a s n .
For clarity, let us focus our discussion on the case where m = 2 . Using the same notation as in Lemma 1, we can decompose it as follows:
μ n ( R ) ( φ , i ) = n h m + d ϕ x , θ 1 / m ( h n ) U n ( R ) ( φ , i ) E U n ( R ) ( φ , i ) = n h m + d ϕ x , θ 1 / m ( h n ) n ( n 1 ) i 1 i 2 n ξ i 1 ξ i 2 G φ , t ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) E G φ , θ , i ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) 1 n h m + d ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) X i , X j ) , ( Y i , Y j E G φ , θ , i ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) + 1 n h m + d ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 1 ; ϵ 0 ) R n i 1 i 2 ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) X i , X j ) , ( Y i , Y j E G φ , θ , i ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) + 2 1 n h m + d ϕ x , θ ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) X i , X j ) , ( Y i , Y j E G φ , θ , i ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) + 2 1 n h m + d ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , t ( R ) X i , X j ) , ( Y i , Y j E G φ , t ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) + 1 n h m + d ϕ x , θ 1 / m ( h n ) 1 2 L 1 , n L 2 , n ϵ ϵ 0 i 1 : s i 1 Γ n ( 1 ; ϵ ) R n i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , t ( R ) X i , X j ) , ( Y i , Y j E G φ , t ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) + 1 n h m + d ϕ x , θ 1 / m ( h n ) 1 L 1 , n L 2 , n ϵ ϵ 0 i 1 < i 2 : s i 1 , s i 2 Γ n ( 1 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , t ( R ) X i , X j ) , ( Y i , Y j E G φ , t ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) = : I + II + III + IV + V + VI .
We will employ blocking arguments to analyze the resulting terms. Let us begin by examining the first term, I . We obtain
P 1 n ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) X i 1 , X i 2 ) , ( Y i 1 , Y i 2 E G φ , t ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) F 2 K 2 > δ } P 1 n ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 E G φ , θ , i ( R ) ( ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 ) F 2 K 2 > δ } + C A n A 1 , n d β A 2 , n ; A n d .
Recall that, for all φ ∈ F m and
x H 2 , y Y 2 : 1 I d θ i x , X i , n h F ( y ) φ ( y ) K 2 d θ i ( x i , X s i , A n ) h n .
Hence, by the symmetry of F ( · ) ,
1 n ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 E G φ , t ( R ) ( ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 ) F 2 K 2 1 n ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 F ( ζ i , ζ j ) 1 I F > λ n E F ( ζ i , ζ j ) 1 I F > λ n 1 n ϕ x , θ ( h n ) p q υ n i H p ( U ) j H q ( U ) .
We employ Chebyshev’s inequality and Hoeffding’s trick, respectively, to obtain
P 1 n ϕ x , θ 1 / m ( h n ) 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 F ( ζ i , ζ j ) 1 I F > λ n           E F ( ζ i , ζ j ) 1 I F > λ n 1 n ϕ x , θ ( h n ) p q υ n i H p ( U ) j H q ( U ) | > δ }           δ 2 n 1 ϕ x , θ 1 / m ( h n ) V a r 1 2 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 F ( ζ i , ζ j ) 1 I F > λ n           c 2 L n δ 2 n 1 ϕ x , θ 1 / m ( h n ) V a r i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 F ( ζ i , ζ j ) 1 I F > λ n           2 c 2 L n δ 2 n 2 ϕ x , θ 1 / m ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 E F ( ζ 1 , ζ 2 ) 2 1 I F > λ n .
Under Assumption 7 (iii), we have for each λ > 0
c 2 L n δ 2 n 2 ϕ x , θ 1 / m ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 E F ( ζ 1 , ζ 2 ) 2 1 I F > λ n = c 2 L n δ 2 n 2 ϕ x , θ 1 / m ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 × 0 P F ( ζ 1 , ζ 2 ) 2 1 I F > λ n t d t = c 2 L n δ 2 n 2 ϕ x , θ 1 / m ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 0 λ n P F > λ n d t + c 2 L n δ 2 n 2 ϕ x , θ 1 / m ( h n ) i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n i 2 : s i 2 Γ n ( 2 ; ϵ 0 ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 λ n P F 2 > t d t ,
converging to 0 as n → ∞ . Terms II , V , and VI will be handled similarly to the last term. However, II and VI differ in that the variables { ζ i , ζ j } ϵ = ϵ 0 (or { ζ i , ζ j } ϵ ≠ ϵ 0 for VI ) belong to the same blocks. Term IV can be deduced from the study of terms I and III . Considering the term III , we have
P 1 n ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) X i , X j ) , ( Y i , Y j E G φ , θ , i ( R ) ( X i 1 , X i 2 ) , ( Y i 1 , Y i 2 ) 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n F 2 K 2 > δ P 1 n ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 E G φ , θ , i ( R ) ( ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 ) 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n F 2 K 2 > δ + L n A 1 , n d A 2 , n d β A 2 , n ; A n d n ϕ x , θ 1 / m ( h n ) .
We have also
P 1 n ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 E G φ , θ , i ( R ) ( ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 ) 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n F 2 K 2 > δ P 1 n ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 G φ , θ , i ( R ) ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 E G φ , θ , i ( R ) ( ς i 1 , ς i 2 ) , ( ζ i 1 , ζ i 2 ) 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n F 2 K 2 > δ .
Keeping in mind (123), the problem can be reduced to
P 1 n ϕ x , θ 1 / m ( h n ) 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 F ( ζ i , ζ j ) 1 I F > λ n E F ( ζ i , ζ j ) 1 I F > λ n 1 n ϕ x , θ ( h n ) p q υ n i H p ( U ) j H q ( U ) > δ δ 2 n 1 ϕ ( h n ) V a r ( 1 L n i 1 : s i 1 Γ n ( 1 ; ϵ 0 ) R n Δ 2 2 L 1 , n L 2 , n ϵ ϵ 0 i 2 : s i 2 Γ n ( 2 ; ϵ ) R n ϕ x , θ ( h n ) ξ i 1 ξ i 2 × F ( ζ i , ζ j ) 1 I F > λ n 1 n ϕ x , θ ( h n ) p = 1 υ n i H p ( U ) q : | q p | 2 υ n ) .
We follow the same technique as in (124). The remainder has thus been shown to be asymptotically negligible by taking λ n sufficiently large. Finally, with
r ^ ( m ) ( x , θ , u ) E U n ( φ , i ) ,
and
U n ( 1 , i ) P 1 ,
the weak convergence of our estimator is accomplished. □

10. Technical Results

Assumption 8.
(KD1) 
(KB2) in Assumption 2 holds.
(KD2) 
For any α ∈ Z d with | α | = 1 , 2 , the derivative ∂ α f S ( s ) exists and is continuous on ( 0 , 1 ) d .
Define
f ^ S ( u ) = 1 n h d j = 1 n K ¯ h u S 0 , j .
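A minimal sketch of this estimator for d = 2, where a product Epanechnikov kernel stands in for K ¯ and a Beta-distributed design plays the role of the sampling sites S 0 , j ; both choices are illustrative assumptions.

```python
import numpy as np

# Sketch of f_S_hat(u) = (n h^d)^{-1} sum_j K_bar((u - S_{0,j}) / h), d = 2.
def K_bar(v):
    # product Epanechnikov kernel on R^d (an illustrative choice)
    return np.prod(0.75 * np.clip(1.0 - v**2, 0.0, None), axis=-1)

def f_S_hat(u, S, h):
    n, d = S.shape
    return K_bar((u - S) / h).sum() / (n * h**d)

rng = np.random.default_rng(1)
S = rng.beta(2, 2, size=(2000, 2))                   # non-uniform design density
print(f_S_hat(np.array([0.5, 0.5]), S, h=0.1))       # ~ 2.25 = (1.5)^2 at center
```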
Lemma 2
([176], Theorem 2). Under Assumption 8 and h 0 such that n h d / ( log n ) ∞ as n , we have that
sup u [ 0 , 1 ] d f ^ S ( u ) f S ( u ) = O log n n h d + h 2 P S a . s .
Lemma 3.
Let $I_h = [C_1 h, \, 1 - C_1 h]$. Suppose that the kernel K 1 ( · ) satisfies Assumption 8 part (i). Then, for q = 0 , 1 , 2 and m > 1 ,
sup u I h 1 n m h m d i I n m j = 1 m K ¯ u j S 0 , i j h n u j S 0 , i j h q R m d 1 h m d j = 1 m K ¯ u j ω j h n u j ω j h q f S ( ω j ) j = 1 m d ω j = O log n n h d m P S a . s .
Lemma 4.
Suppose that kernel K ¯ satisfies Assumption 8. Let g : [ 0 , 1 ] m d × H m R , ( u , x ) g ( u , x ) be a function continuously partially differentiable with respect to u j . For k = 1 , 2 , we have
sup u I h 1 n m h m d i I n m j = 1 m K ¯ u j S 0 , j h n k g S 0 , j , x j j = 1 m κ k f S ( u j ) g ( u j , x j ) = O log n n h m d + o ( h ) , P S a . s .
where
κ k = R d K ¯ k ( x ) d x .
For any probability measure Q on a product measure space Ω 1 × Ω 2 , Σ 1 × Σ 2 , we may define the β -mixing coefficients as follows:
Definition 5
(Definition 2.5, [165]). Let Q 1 and Q 2 be the marginal probability measures of Q on Ω 1 , Σ 1 and Ω 2 , Σ 2 , respectively. We set
$\beta\big( \Sigma_1, \Sigma_2, Q \big) = \mathbb{E} \sup \big\{ \, | Q( B \mid \Sigma_1 ) - Q_2(B) | \, : \, B \in \Sigma_2 \, \big\}.$
The following lemma holds for every finite n and is essential for the generation of independent blocks for β -mixing sequences.
Lemma 5
(Corollary 2.7, [165]). Let m ∈ N and let Q denote a probability measure on a product space ( ∏ i = 1 m Ω i , ∏ i = 1 m Σ i ) with the associated marginal measures Q i on ( Ω i , Σ i ) . Assume that h is a bounded measurable function on the product probability space such that | h | ≤ M h < ∞ . For 1 ≤ a ≤ b ≤ m , let Q a b be the marginal measure on ( ∏ i = a b Ω i , ∏ i = a b Σ i ) . For a given τ > 0 , suppose that, for all 1 ≤ k ≤ m − 1 ,
$\big\| Q - Q_1^k \times Q_{k+1}^m \big\|_{TV} \leq 2 \tau,$
where Q 1 k × Q k + 1 m is the product measure and · TV is the total variation. Then
$| Q h - P h | \leq 2 M_h (m-1) \tau,$
where
$P = \prod_{i=1}^{m} Q_i, \qquad Q h = \int h \, dQ \quad \text{and} \quad P h = \int h \, dP.$
Lemma 6.
Let
I n = i Z d : i + ( 0 , 1 ] d R n .
Then we have
P S j = 1 n 1 I A n S 0 , j i + ( 0 , 1 ] d R n > 2 log n + n A n d for some i I n , i . o . = 0 ,
and
P S j = 1 n 1 I A n S 0 , j Γ n ( ; ϵ ) > C A 1 , n q ( ϵ ) A 2 , n d q ( ϵ ) n A n d for some L 1 , n , i . o . = 0 ,
for any ϵ { 1 , 2 } d , where “i.o.” stands for infinitely often.
Proof. 
See the proof in (Lemma A.1, [89]) for each statement. □
Remark 13.
Lemma 6 implies that each Γ n ( ; ϵ ) contains at most C A 1 , n q ( ϵ ) A 2 , n d q ( ϵ ) n A n d samples P S -almost surely.
Lemma 7.
[13,175] Under Assumptions 2 and 3, Condition (B1) in Assumptions 4–6 and 8, we have
E . S S ¯ n ; ϵ 2 C A 1 , n d 1 A 2 , n ( n A n d + log n ) h m d ϕ ( h ) .
Lemma 8
(Bernstein’s inequality). Let X 1 , , X n be zero-mean independent random variables. Assume that
$\max_{1 \leq i \leq n} | X_i | \leq M < \infty, \quad \text{a.s.}$
For all t > 0 , we have
$P\Big( \sum_{i=1}^{n} X_i \geq t \Big) \leq \exp\Bigg( - \frac{t^2}{2 \big( \sum_{i=1}^{n} \mathbb{E} X_i^2 + M t / 3 \big)} \Bigg).$
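The snippet below illustrates Lemma 8 numerically for bounded, centered i.i.d. variables: the empirical tail probability of the sum is compared with the Bernstein bound. The uniform distribution and all constants are arbitrary illustrative choices.

```python
import numpy as np

# Empirical check of Bernstein's inequality for X_i ~ Uniform(-M, M), which
# are mean zero with E X_i^2 = M^2 / 3 and |X_i| <= M almost surely.
rng = np.random.default_rng(2)
n, M, reps, t = 200, 1.0, 20000, 15.0
S = rng.uniform(-M, M, size=(reps, n)).sum(axis=1)
var_sum = n * M**2 / 3.0                        # sum of E X_i^2
bound = np.exp(-t**2 / (2.0 * (var_sum + M * t / 3.0)))
print("empirical P(S >= t):", (S >= t).mean())  # noticeably below the bound
print("Bernstein bound    :", bound)
```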
Lemma 9.
[13,175] Under Assumptions 2–4 and 6, we have
1 n h m d ϕ ( h ) Var · S L 1 , n Z n ( ; ϵ ) = o ( 1 ) , P S a . s . ,
1 n h m d ϕ ( h ) Var · S L 2 , n Z n ( ; ϵ ) = o ( 1 ) , P S a . s .
Remark 14.
To establish the asymptotic negligibility of the summation over the small block, we can employ the method introduced in [80]. This method involves transitioning from the dependence structure of variables to independence as the first step. Subsequently, the convergence of the second-order expectation to zero is proven using a maximal inequality. This approach circumvents the need to explicitly handle covariance and relies on the use of a maximal inequality.
Proposition 2
(Proposition 2.6, [173]). If a process X t : t T satisfies
$\big( \mathbb{E} | X_t - X_s |^p \big)^{1/p} \leq \Big( \frac{p-1}{q-1} \Big)^{m/2} \big( \mathbb{E} | X_t - X_s |^q \big)^{1/q}, \qquad 1 < q < p < \infty,$
for some m ≥ 1 , and if
$\rho(s, t) = \big( \mathbb{E} | X_t - X_s |^2 \big)^{1/2},$
there is a constant K = K ( m ) < ∞ such that
$\mathbb{E} \sup_{s, t \in T} | X_t - X_s | \leq K \int_0^{D} \big[ \log N(T, \rho, \varepsilon) \big]^{m/2} \, d\varepsilon,$
D being the ρ-diameter of T. Moreover, if T is finite, that is, T = { 1 , … , N } , and N ≥ 2 , then
$\mathbb{E} \max_{i \leq N} | X_i | \leq K ( \log N )^{m/2} \max_{i \leq N} \big( \mathbb{E} | X_i |^2 \big)^{1/2}.$
Lemma 10.
Ref. [177] Let X 1 , , X n be a sequence of independent random elements taking values in a Banach space ( B , · ) with E X i = 0 for all i . Let ε i be a sequence of independent Bernoulli random variables independent of X i . Then, for any convex increasing function Φ,
$\mathbb{E} \, \Phi\Big( \tfrac{1}{2} \Big\| \sum_{i=1}^{n} X_i \varepsilon_i \Big\| \Big) \leq \mathbb{E} \, \Phi\Big( \Big\| \sum_{i=1}^{n} X_i \Big\| \Big) \leq \mathbb{E} \, \Phi\Big( 2 \Big\| \sum_{i=1}^{n} X_i \varepsilon_i \Big\| \Big).$
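A quick numerical check of Lemma 10 with Φ the identity and the Euclidean norm; the centered Gaussian vectors are an arbitrary symmetric choice, for which the two expectations essentially coincide, so both inequalities hold with room to spare.

```python
import numpy as np

# Compare E||sum X_i|| with E||sum eps_i X_i|| (Euclidean norm, Phi = identity):
# Lemma 10 gives (1/2) E||sum eps X|| <= E||sum X|| <= 2 E||sum eps X||.
rng = np.random.default_rng(3)
reps, n, d = 5000, 50, 3
X = rng.normal(size=(reps, n, d))                    # E X_i = 0
eps = rng.choice([-1.0, 1.0], size=(reps, n, 1))     # Rademacher signs
plain = np.linalg.norm(X.sum(axis=1), axis=1).mean()
sym = np.linalg.norm((eps * X).sum(axis=1), axis=1).mean()
print(0.5 * sym, "<=", plain, "<=", 2.0 * sym)
```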

10.1. Examples of Classes of Functions

Example 3.
The set F of all indicator functions 1 I { ( , t ] } of cells in R satisfies
$N\big( \epsilon, \mathcal{F}, d_P^{(2)} \big) \leq \frac{2}{\epsilon^2},$
for any probability measure P and ϵ ≤ 1 . Notice that
$\int_0^1 \sqrt{ \log \frac{1}{\epsilon} } \, d\epsilon \leq \int_0^{\infty} u^{1/2} \exp(-u) \, du \leq 1.$
For more details and discussion on this example refer to Example 2.5.4 of [108] and ([109], p. 157). The covering numbers of the class of cells ( , t ] in higher dimension satisfy a similar bound, but with a higher power of ( 1 / ϵ ) , see Theorem 9.19 of [109].
Example 4
(Classes of functions that are Lipschitz in a parameter, Section 2.7.4 in [108]). Let F be the class of functions x ↦ φ ( t , x ) that are Lipschitz in the index parameter t ∈ T . Suppose that
$| \varphi( t_1, x ) - \varphi( t_2, x ) | \leq d( t_1, t_2 ) \, \kappa(x),$
for some metric d on the index set T, where the function κ ( · ) is defined on the sample space X . According to Theorem 2.7.11 of [108] and Lemma 9.18 of [109], it follows, for any norm ∥ · ∥ F on F , that
$N\big( \epsilon \| F \|_{\mathcal{F}}, \mathcal{F}, \| \cdot \|_{\mathcal{F}} \big) \leq N( \epsilon / 2, T, d ).$
Hence, if ( T , d ) satisfies
$J( \infty, T, d ) = \int_0^{\infty} \sqrt{ \log N( \epsilon, T, d ) } \, d\epsilon < \infty,$
then the conclusions hold for F .
Example 5.
Let us consider as an example the classes of functions that are smooth up to order α, defined as follows; see Section 2.7.1 of [108] and Section 2 of [108]. For 0 < α < ∞ , let ⌊ α ⌋ be the greatest integer strictly smaller than α. For any vector k = ( k 1 , … , k d ) of d integers, define the differential operator
$D^{k.} := \frac{ \partial^{k.} }{ \partial x_1^{k_1} \cdots \partial x_d^{k_d} }, \qquad \text{where } k. := \sum_{i=1}^{d} k_i.$
Then, for a function f : X R , let
$\| f \|_{\alpha} := \max_{k. \leq \lfloor \alpha \rfloor} \sup_{x} \big| D^{k.} f(x) \big| + \max_{k. = \lfloor \alpha \rfloor} \sup_{x \neq y} \frac{ \big| D^{k.} f(x) - D^{k.} f(y) \big| }{ \| x - y \|^{\alpha - \lfloor \alpha \rfloor} },$
where the suprema are taken over all x , y in the interior of X with x y . Let C M α ( X ) be the set of all continuous functions f : X R with
$\| f \|_{\alpha} \leq M.$
Note that for α 1 , this class consists of bounded functions f that satisfy a Lipschitz condition. Ref. [104] computed the entropy of the classes of C M α ( X ) for the uniform norm. As a consequence of their results in [108], we know that there exists a constant K depending only on α , d and the diameter of X such that for every measure γ and every ϵ > 0 ,
$\log N_{[\,]}\big( \epsilon M \gamma(\mathcal{X}), C_M^{\alpha}(\mathcal{X}), L_2(\gamma) \big) \leq K \Big( \frac{1}{\epsilon} \Big)^{d/\alpha},$
where $N_{[\,]}$ denotes the bracketing number; refer to Definition 2.1.6 of [108], and we refer to Theorem 2.7.1 of [108] for a variant of the last inequality. By Lemma 9.18 of [109], we have
$\log N\big( \epsilon M \gamma(\mathcal{X}), C_M^{\alpha}(\mathcal{X}), L_2(\gamma) \big) \leq K \Big( \frac{1}{2\epsilon} \Big)^{d/\alpha}.$

10.2. Examples of U-kernels

In this section, we present some classical U-kernels.
Example 6.
Ref. [125] introduced the parameter
$\Delta = \int D^2( y_1, y_2 ) \, dF( y_1, y_2 ),$
where
D ( y 1 , y 2 ) = F ( y 1 , y 2 ) F ( y 1 , ) F ( , y 2 ) ,
and F ( · , · ) is the joint distribution function of ( Y 1 , Y 2 ) . The parameter Δ has the property that Δ = 0 if and only if Y 1 and Y 2 are independent. From [178], an alternative expression for Δ can be developed by introducing the functions
$\psi( y_1, y_2, y_3 ) = \begin{cases} 1 & \text{if } y_2 \leq y_1 < y_3, \\ 0 & \text{if } y_1 < y_2, y_3 \text{ or } y_1 \geq y_2, y_3, \\ -1 & \text{if } y_3 \leq y_1 < y_2, \end{cases}$
and
$\varphi\big( ( y_{1,1}, y_{1,2} ), \ldots, ( y_{5,1}, y_{5,2} ) \big) = \frac{1}{4} \, \psi( y_{1,1}, y_{2,1}, y_{3,1} ) \, \psi( y_{1,1}, y_{4,1}, y_{5,1} ) \times \psi( y_{1,2}, y_{2,2}, y_{3,2} ) \, \psi( y_{1,2}, y_{4,2}, y_{5,2} ).$
We have
$\Delta = \int \cdots \int \varphi\big( ( y_{1,1}, y_{1,2} ), \ldots, ( y_{5,1}, y_{5,2} ) \big) \, dF( y_{1,1}, y_{1,2} ) \cdots dF( y_{5,1}, y_{5,2} ).$
Example 7 (Hoeffding’s D).
From the symmetric kernel,
$\varphi_D( z_1, \ldots, z_5 ) := \frac{1}{16} \sum_{( i_1, \ldots, i_5 ) \in P_5} \big[ \mathbb{1}( z_{i_1,1} \leq z_{i_5,1} ) - \mathbb{1}( z_{i_2,1} \leq z_{i_5,1} ) \big] \big[ \mathbb{1}( z_{i_3,1} \leq z_{i_5,1} ) - \mathbb{1}( z_{i_4,1} \leq z_{i_5,1} ) \big] \times \big[ \mathbb{1}( z_{i_1,2} \leq z_{i_5,2} ) - \mathbb{1}( z_{i_2,2} \leq z_{i_5,2} ) \big] \big[ \mathbb{1}( z_{i_3,2} \leq z_{i_5,2} ) - \mathbb{1}( z_{i_4,2} \leq z_{i_5,2} ) \big],$
we recover Hoeffding’s D statistic, a rank-based U-statistic of order 5, which gives rise to Hoeffding’s D correlation measure $\mathbb{E}\, h_D$.
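For very small samples, the order-5 U-statistic built on this kernel can be evaluated by brute force. The sketch below follows the displayed kernel verbatim, including its 1/16 normalization (conventions for scaling D vary across references); under independence the result should be close to zero.

```python
import numpy as np
from itertools import combinations, permutations

# Brute-force evaluation of the U-statistic with the Hoeffding-D kernel above.
def phi_D(z):
    """z: (5, 2) array holding five bivariate observations."""
    total = 0.0
    for i1, i2, i3, i4, i5 in permutations(range(5)):
        a = (int(z[i1, 0] <= z[i5, 0]) - int(z[i2, 0] <= z[i5, 0])) \
          * (int(z[i3, 0] <= z[i5, 0]) - int(z[i4, 0] <= z[i5, 0]))
        b = (int(z[i1, 1] <= z[i5, 1]) - int(z[i2, 1] <= z[i5, 1])) \
          * (int(z[i3, 1] <= z[i5, 1]) - int(z[i4, 1] <= z[i5, 1]))
        total += a * b
    return total / 16.0

rng = np.random.default_rng(4)
z = rng.normal(size=(10, 2))                      # tiny independent sample
D_n = np.mean([phi_D(z[list(c)]) for c in combinations(range(10), 5)])
print(D_n)                                        # near 0 under independence
```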
Example 8
(Blum–Kiefer–Rosenblatt’s R). The symmetric kernel
$\varphi_R( z_1, \ldots, z_6 ) := \frac{1}{32} \sum_{( i_1, \ldots, i_6 ) \in P_6} \big[ \mathbb{1}( z_{i_1,1} \leq z_{i_5,1} ) - \mathbb{1}( z_{i_2,1} \leq z_{i_5,1} ) \big] \big[ \mathbb{1}( z_{i_3,1} \leq z_{i_5,1} ) - \mathbb{1}( z_{i_4,1} \leq z_{i_5,1} ) \big] \times \big[ \mathbb{1}( z_{i_1,2} \leq z_{i_6,2} ) - \mathbb{1}( z_{i_2,2} \leq z_{i_6,2} ) \big] \big[ \mathbb{1}( z_{i_3,2} \leq z_{i_6,2} ) - \mathbb{1}( z_{i_4,2} \leq z_{i_6,2} ) \big],$
yields Blum–Kiefer–Rosenblatt’s R statistic [179].
Example 9
(Bergsma–Dassios–Yanagimoto’s τ * ). Ref. [180] introduced a rank correlation statistic as a U-statistic of order 4 with the symmetric kernel
$\varphi_{\tau^*}( z_1, \ldots, z_4 ) := \frac{1}{16} \sum_{( i_1, \ldots, i_4 ) \in P_4} \Big[ \mathbb{1}( z_{i_1,1}, z_{i_3,1} < z_{i_2,1}, z_{i_4,1} ) + \mathbb{1}( z_{i_2,1}, z_{i_4,1} < z_{i_1,1}, z_{i_3,1} ) - \mathbb{1}( z_{i_1,1}, z_{i_4,1} < z_{i_2,1}, z_{i_3,1} ) - \mathbb{1}( z_{i_2,1}, z_{i_3,1} < z_{i_1,1}, z_{i_4,1} ) \Big] \times \Big[ \mathbb{1}( z_{i_1,2}, z_{i_3,2} < z_{i_2,2}, z_{i_4,2} ) + \mathbb{1}( z_{i_2,2}, z_{i_4,2} < z_{i_1,2}, z_{i_3,2} ) - \mathbb{1}( z_{i_1,2}, z_{i_4,2} < z_{i_2,2}, z_{i_3,2} ) - \mathbb{1}( z_{i_2,2}, z_{i_3,2} < z_{i_1,2}, z_{i_4,2} ) \Big].$
Here,
$\mathbb{1}( y_1, y_2 < y_3, y_4 ) := \mathbb{1}( y_1 < y_3 ) \, \mathbb{1}( y_1 < y_4 ) \, \mathbb{1}( y_2 < y_3 ) \, \mathbb{1}( y_2 < y_4 ).$
Example 10.
The Wilcoxon Statistic. Suppose that E R is symmetric around zero. As an estimate of the quantity
$\int_{E^2} \big( 2 \, \mathbb{1}\{ x + y > 0 \} - 1 \big) \, dF(x) \, dF(y),$
it is pertinent to consider the statistic
$W_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} \big( 2 \cdot \mathbb{1}\{ X_i + X_j > 0 \} - 1 \big),$
which is relevant for testing whether or not μ is located at zero.
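A direct implementation of W n , with Gaussian data as an illustrative assumption: a sample symmetric about zero gives a value near 0, a shifted sample a clearly positive one.

```python
import numpy as np
from itertools import combinations

# One-sample Wilcoxon-type U-statistic W_n from the display above.
def wilcoxon_U(x):
    n = len(x)
    s = sum(2.0 * float(x[i] + x[j] > 0) - 1.0
            for i, j in combinations(range(n), 2))
    return 2.0 * s / (n * (n - 1))

rng = np.random.default_rng(5)
print(wilcoxon_U(rng.normal(0.0, 1.0, 200)))   # ~ 0: symmetric about zero
print(wilcoxon_U(rng.normal(0.5, 1.0, 200)))   # > 0: shifted location
```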
Example 11.
The Takens estimator. Denote by · the usual Euclidean norm on R d . In [181], the following estimate of the correlation integral,
$C_F(r) = \int\!\!\int \mathbb{1}\{ \| x - x' \| \leq r \} \, dF(x) \, dF(x'), \qquad r > 0,$
is considered:
$C_n(r) = \frac{1}{n(n-1)} \sum_{1 \leq i \neq j \leq n} \mathbb{1}\{ \| X_i - X_j \| \leq r \}.$
In the case where a scaling law holds for the correlation integral, i.e., when there exists $(\alpha, r_0, c) \in (\mathbb{R}_+^*)^3$ such that $C_F(r) = c \cdot r^{\alpha}$ for $0 < r \leq r_0$, the U-statistic
$T_n = \frac{1}{n(n-1)} \sum_{1 \leq i \neq j \leq n} \log \frac{ \| X_i - X_j \| }{ r_0 },$
is used in order to build the Takens estimator α ^ n = T n 1 of the correlation dimension α.
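The sketch below evaluates the pairwise log-distances within the cutoff r 0 , as the scaling law requires, and returns the estimator with the sign chosen so that α ^ n > 0 (sign conventions for T n differ across references). The uniform planar design and the value of r 0 are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

# Takens-type estimator of the correlation dimension alpha.
def takens_alpha(X, r0):
    logs = [np.log(d / r0)
            for i, j in combinations(range(len(X)), 2)
            if 0.0 < (d := np.linalg.norm(X[i] - X[j])) <= r0]
    return -1.0 / np.mean(logs)          # logs are negative, so alpha_hat > 0

rng = np.random.default_rng(6)
X = rng.uniform(size=(500, 2))           # points filling the unit square
print(takens_alpha(X, r0=0.3))           # roughly 2 (up to boundary effects)
```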
Example 12.
Let $\widehat{Y_1 Y_2}$ denote the oriented angle between $Y_1, Y_2 \in \mathbb{T}$, where $\mathbb{T}$ is the circle of radius 1 and center 0 in $\mathbb{R}^2$. Let
$\varphi_t( Y_1, Y_2 ) = \mathbb{1}\{ \widehat{Y_1 Y_2} \leq t \} - t / \pi, \qquad \text{for } t \in [0, \pi).$
Ref. [182] has used this kernel in order to propose a U-process to test uniformity on the circle.
Example 13.
For m = 3 , let
$\varphi( Y_1, Y_2, Y_3 ) = \mathbb{1}\{ Y_1 - Y_2 - Y_3 > 0 \}.$
We have
$r^{(3)}( \varphi, t, t, t ) = P\big( Y_1 > Y_2 + Y_3 \mid X_1 = X_2 = X_3 = t \big),$
and the corresponding conditional U-statistic can be considered a conditional analog of the Hollander–Proschan test statistic [183]. It may be used to test the hypothesis that the conditional distribution of Y 1 given X 1 = t is exponential, against the alternative that it is of the New Better than Used type.
Example 14.
The Gini mean difference [184]. The Gini index provides another popular measure of dispersion. It corresponds to the case where E ⊆ R and h ( x , y ) = | x − y | :
$G_n = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} | X_i - X_j |.$
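A vectorized implementation of G n under an illustrative Gaussian assumption; the printed reference value uses the fact that E | X − X ′ | = 2 / √π for independent standard normals.

```python
import numpy as np

# Gini mean difference G_n via the full matrix of pairwise absolute
# differences; the diagonal is zero, so summing it is harmless.
def gini_mean_difference(x):
    n = len(x)
    return np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))

rng = np.random.default_rng(7)
x = rng.normal(size=2000)
print(gini_mean_difference(x), 2.0 / np.sqrt(np.pi))  # estimate vs. 2/sqrt(pi)
```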
Example 15
([20]). Let the central moments of any order m = 2 , 3 , … be given by
$\theta_m(F) = \mathbb{E}\big( X_1 - \mathbb{E} X_1 \big)^m = \int ( x - \mathbb{E} X_1 )^m \, dF(x).$
In this case, the U-statistic has a symmetric kernel
$\varphi( x_1, \ldots, x_m ) = \frac{1}{m!} \sum \Big[ x_{i_1}^m - \binom{m}{1} x_{i_1}^{m-1} x_{i_2} + \binom{m}{2} x_{i_1}^{m-2} x_{i_2} x_{i_3} - \cdots + (-1)^{m-1} (m-1) \, x_{i_1} x_{i_2} \cdots x_{i_m} \Big],$
where summation is carried out over all permutations i 1 , , i m of the numbers ( 1 , , m ) . In particular, if m = 3 , then
$\varphi( x_1, x_2, x_3 ) = \frac{1}{3} \big( x_1^3 + x_2^3 + x_3^3 \big) - \frac{1}{2} \big( x_1^2 x_2 + x_2^2 x_1 + x_1^2 x_3 + x_3^2 x_1 + x_2^2 x_3 + x_3^2 x_2 \big) + 2 x_1 x_2 x_3.$
In the case of m = 2 ,
$\theta_2(F) = \mathbb{E}\big( X_1 - \mathbb{E} X_1 \big)^2 = \int ( x - \mathbb{E} X_1 )^2 \, dF(x).$
For the kernel
$\varphi( x_1, x_2 ) = \frac{ x_1^2 + x_2^2 - 2 x_1 x_2 }{2} = \frac{1}{2} ( x_1 - x_2 )^2,$
the corresponding U-statistic is the sample variance
$U_n(h) = \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} h( X_i, X_j ) = \frac{1}{n-1} \sum_{i=1}^{n} X_i^2 - \frac{n}{n-1} \Big( \frac{1}{n} \sum_{i=1}^{n} X_i \Big)^2 = \frac{1}{n-1} \Big( \sum_{i=1}^{n} X_i^2 - n \bar{X}_n^2 \Big),$
we refer also to [19].
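The identity above is straightforward to verify numerically: the U-statistic with kernel ½ ( x 1 − x 2 ) 2 coincides with the unbiased sample variance.

```python
import numpy as np
from itertools import combinations

# Check that the pairwise U-statistic reproduces the sample variance.
rng = np.random.default_rng(8)
x = rng.normal(size=50)
n = len(x)
U = 2.0 / (n * (n - 1)) * sum(0.5 * (x[i] - x[j])**2
                              for i, j in combinations(range(n), 2))
print(U, np.var(x, ddof=1))   # equal up to floating-point rounding
```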
Example 16
([40]). Assume $Y_i = (Y_{i1}, Y_{i2})^t$, and define $\varphi$ by
\[
\varphi\big( (y_{11}, y_{12}), (y_{21}, y_{22}) \big) = \frac{1}{2} \big( y_{11} y_{12} + y_{21} y_{22} - y_{11} y_{22} - y_{12} y_{21} \big),
\]
that is, m = 2 , and
\[
r^{(2)}(x_1, x_2) = \frac{1}{2} \Big\{ \mathbb{E}\big[ Y_{11} Y_{12} \mid X_1 = x_1 \big] + \mathbb{E}\big[ Y_{21} Y_{22} \mid X_2 = x_2 \big] - \mathbb{E}\big[ Y_{11} Y_{22} \mid X_1 = x_1, X_2 = x_2 \big] - \mathbb{E}\big[ Y_{12} Y_{21} \mid X_1 = x_1, X_2 = x_2 \big] \Big\}.
\]
In particular,
\[
r^{(2)}(x_1, x_1) = \mathbb{E}\big[ Y_{11} Y_{12} \mid X_1 = x_1 \big] - \mathbb{E}\big[ Y_{11} \mid X_1 = x_1 \big]\, \mathbb{E}\big[ Y_{12} \mid X_1 = x_1 \big],
\]
the conditional covariance of Y 1 given X 1 = x 1 .
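A kernel-weighted sketch (ours; the Gaussian weights, the bandwidth, and the data-generating model are illustrative assumptions) estimates this conditional covariance by the order-2 conditional U-statistic:

```python
# Sketch: Nadaraya-Watson-weighted conditional U-statistic for the
# conditional covariance of (Y_{i1}, Y_{i2}) given X_i = x.
import numpy as np

def cond_cov(X, Y, x, h):
    K = np.exp(-0.5 * ((x - X) / h) ** 2)       # Gaussian kernel weights
    n = len(X)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                         # ordered pairs i != j
            w = K[i] * K[j]
            phi = 0.5 * (Y[i, 0] * Y[i, 1] + Y[j, 0] * Y[j, 1]
                         - Y[i, 0] * Y[j, 1] - Y[i, 1] * Y[j, 0])
            num += phi * w
            den += w
    return num / den

rng = np.random.default_rng(7)
X = rng.uniform(size=400)
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=400)
Y = np.column_stack([X, X]) + eps       # conditional covariance is 0.6 for all x
print(cond_cov(X, Y, x=0.5, h=0.1))     # approximately 0.6
```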
Example 17
([185]). The accelerated failure time (AFT) model is a prevalent tool in survival analysis, assuming a linear relationship between the logarithm of the failure time $T_i$ and the associated covariates $X_i$:
\[
\log T_i = X_i^{\top} \theta + \zeta_i.
\]
Here, $\theta$ is a $p \times 1$ vector of unknown regression parameters, and $\{\zeta_i, i = 1, \ldots, N\}$ are independent and identically distributed (i.i.d.) random errors with an unspecified distribution function, independent of $\{X_i, i = 1, \ldots, N\}$. Usually, $T_i$ is censored at $C_i$, leading to observed data $\{Z_i = (\widetilde{T}_i, \delta_i, X_i), i = 1, \ldots, N\}$, where $\widetilde{T}_i = \min(T_i, C_i)$ is the observed failure time and $\delta_i = \mathbb{1}\{T_i \le C_i\}$ is the censoring indicator. Estimating AFT models often involves the Gehan estimator [186], derived by minimizing an objective function formulated as a U-statistic:
\[
F_N(\theta) = \frac{1}{\binom{N}{2}} \sum_{i \ne j}^{N} \delta_i \big( e_j(\theta) - e_i(\theta) \big)\, \mathbb{1}\big\{ e_i(\theta) \le e_j(\theta) \big\}.
\]
Here, $e_i(\theta) = \log \widetilde{T}_i - X_i^{\top} \theta$. However, the non-smooth nature of this objective poses computational challenges for estimating the coefficients and their standard errors. To address this challenge, ref. [187] introduced an induced smoothing method, replacing the non-smooth objective with a smooth approximation and minimizing the empirical risk
\[
\frac{1}{\binom{N}{2}} \sum_{i \ne j}^{N} \delta_i \left\{ \big( e_j(\theta) - e_i(\theta) \big)\, \Phi\!\left( \frac{e_j(\theta) - e_i(\theta)}{r_{ij}} \right) + r_{ij}\, \phi\!\left( \frac{e_j(\theta) - e_i(\theta)}{r_{ij}} \right) \right\}.
\]
Here, $\Phi$ and $\phi$ denote the standard normal distribution function and density, $r_{ij}^2 = \frac{1}{N} (X_i - X_j)^{\top} \Sigma (X_i - X_j)$, and $\Sigma$ is a symmetric, positive definite matrix satisfying $\Sigma^{1/2} = O(1)$.
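To make the two objectives concrete, the sketch below (ours; it assumes $\Sigma$ is the identity, uses synthetic censored data, and is not the distributed algorithm of [185]) evaluates the Gehan loss and its induced-smoothing approximation:

```python
# Sketch: Gehan objective F_N(theta) and its induced-smoothing version.
import numpy as np
from scipy.stats import norm

def gehan_loss(theta, logT_tilde, delta, X, smooth=True):
    N = len(delta)
    e = logT_tilde - X @ theta                 # residuals e_i(theta)
    de = e[None, :] - e[:, None]               # de[i, j] = e_j - e_i
    if smooth:
        diff = X[:, None, :] - X[None, :, :]   # X_i - X_j
        r = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff) / N)  # r_ij, Sigma = I
        r[r == 0.0] = 1.0                      # diagonal placeholder, zeroed below
        s = de * norm.cdf(de / r) + r * norm.pdf(de / r)
    else:
        s = de * (de >= 0.0)                   # (e_j - e_i) 1I{e_i <= e_j}
    s = delta[:, None] * s                     # censoring indicators delta_i
    np.fill_diagonal(s, 0.0)                   # exclude i = j
    return s.sum() / (N * (N - 1) / 2.0)       # divide by binom(N, 2)

rng = np.random.default_rng(8)
N, theta0 = 200, np.array([1.0, -0.5])
X = rng.normal(size=(N, 2))
logT = X @ theta0 + rng.normal(scale=0.5, size=N)   # AFT model
logC = rng.normal(loc=1.0, size=N)                  # random right censoring
delta = (logT <= logC).astype(float)
logT_tilde = np.minimum(logT, logC)
print(gehan_loss(theta0, logT_tilde, delta, X, smooth=False))
print(gehan_loss(theta0, logT_tilde, delta, X, smooth=True))  # close to the above
```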

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The author expresses gratitude to the Editor-in-Chief, the two anonymous Associate Editors, and the four anonymous referees for their invaluable comments, which significantly enhanced the quality and clarity of this work. Additionally, the author dedicates this paper to his brothers and sisters, whose inspiration has been a profound source of motivation.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability Series; Chapman & Hall: London, UK, 1986; p. x+175p. [Google Scholar]
  2. Nadaraya, E.A. Nonparametric Estimation of Probability Densities and Regression Curves; Mathematics and its Applications Series; Kotz, S., Translator; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1989; Volume 20, p. x+213. [Google Scholar] [CrossRef]
  3. Wand, M.P.; Jones, M.C. Kernel Smoothing; Monographs on Statistics and Applied Probability Series; Chapman & Hall: London, UK, 1995; Volume 60, p. xii+212. [Google Scholar]
  4. Eggermont, P.P.B.; LaRiccia, V.N. Maximum Penalized Likelihood Estimation; Springer Series in Statistics; Springer: New York, NY, USA, 2001; Volume I, p. xviii+510. [Google Scholar]
  5. Ripley, B.D. Spatial statistics: Developments 1980–1983. Internat. Statist. Rev. 1984, 52, 141–150. [Google Scholar] [CrossRef]
  6. Rosenblatt, M. Stationary Sequences and Random Fields; Birkhäuser Boston, Inc.: Boston, MA, USA, 1985; p. 258. [Google Scholar] [CrossRef]
  7. Guyon, X. Random Fields on a Network. Modeling, Statistics, and Applications; Probability and its Applications Series; Ludeña, C., Translator; Springer: New York, NY, USA, 1995; p. xii+255. [Google Scholar]
  8. Cressie, N.A.C. Statistics for Spatial Data, revised ed.; Wiley Classics Library, John Wiley & Sons, Inc.: New York, NY, USA, 2015; p. xx+900. [Google Scholar]
  9. Tran, L.T. Kernel density estimation on random fields. J. Multivar. Anal. 1990, 34, 37–53. [Google Scholar] [CrossRef]
  10. Tran, L.T.; Yakowitz, S. Nearest neighbor estimators for random fields. J. Multivar. Anal. 1993, 44, 23–46. [Google Scholar] [CrossRef]
  11. Biau, G.; Cadre, B. Nonparametric spatial prediction. Stat. Inference Stoch. Process. 2004, 7, 327–349. [Google Scholar] [CrossRef]
  12. Ndiaye, M.; Dabo-Niang, S.; Ngom, P. Nonparametric prediction for spatial dependent functional data under fixed sampling design. Rev. Colomb. Estadíst. 2022, 45, 391–428. [Google Scholar] [CrossRef]
  13. Soukarieh, I.; Bouzebda, S. Non-Parametric Conditional U-Processes for Locally Stationary Functional Random Fields under Stochastic Sampling Design. Mathematics. 2023, 11, 3745. [Google Scholar] [CrossRef]
  14. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. The Local Linear Functional kNN Estimator of the Conditional Expectile: Uniform Consistency in Number of Neighbors. Metrika 2024, 34, 1–29. [Google Scholar] [CrossRef]
  15. Stute, W. Almost sure representations of the product-limit estimator for truncated data. Ann. Statist. 1993, 21, 146–156. [Google Scholar] [CrossRef]
  16. Arcones, M.A.; Wang, Y. Some new tests for normality based on U-processes. Statist. Probab. Lett. 2006, 76, 69–82. [Google Scholar] [CrossRef]
  17. Schick, A.; Wang, Y.; Wefelmeyer, W. Tests for normality based on density estimators of convolutions. Statist. Probab. Lett. 2011, 81, 337–343. [Google Scholar] [CrossRef]
  18. Joly, E.; Lugosi, G. Robust estimation of U-statistics. Stoch. Process. Appl. 2016, 126, 3760–3773. [Google Scholar] [CrossRef]
  19. Serfling, R.J. Approximation Theorems of Mathematical Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1980; p. xiv+371. [Google Scholar]
  20. Koroljuk, V.S.; Borovskich, Y.V. Theory of U-Statistics; Mathematics and its Applications Series; Malyshev, P.V.; Malyshev, D.V., Translators; Kluwer Academic Publishers Group: Dordrecht, The Netherlands, 1994; Volume 273, p. x+552. [Google Scholar]
  21. de la Peña, V.H.; Giné, E. Randomly stopped processes. U-statistics and processes. Martingales and beyond. In Decoupling. From Dependence to Independence; Probability and its Applications Series; Springer: New York, NY, USA, 1999; p. xvi+392. [Google Scholar] [CrossRef]
  22. Frees, E.W. Infinite order U-statistics. Scand. J. Statist. 1989, 16, 29–45. [Google Scholar]
  23. Heilig, C.; Nolan, D. Limit theorems for the infinite-degree U-process. Statist. Sin. 2001, 11, 289–302. [Google Scholar]
  24. Song, Y.; Chen, X.; Kato, K. Approximating high-dimensional infinite-order U-statistics: Statistical and computational guarantees. Electron. J. Stat. 2019, 13, 4794–4848. [Google Scholar] [CrossRef]
  25. Faivishevsky, L.; Goldberger, J. ICA based on a Smooth Estimation of the Differential Entropy. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008; Koller, D., Schuurmans, D., Bengio, Y., Bottou, L., Eds.; Curran Associates, Inc.: New York, NY, USA, 2008; Volume 21. [Google Scholar]
  26. Liu, Q.; Lee, J.; Jordan, M. A Kernelized Stein Discrepancy for Goodness-of-fit Tests. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; Volume 48, pp. 276–284. [Google Scholar]
  27. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis: Methods and Case Studies; Springer Series in Statistics; Springer: New York, NY, USA, 2002; p. x+190. [Google Scholar] [CrossRef]
  28. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis: Theory and Practice; Springer Series in Statistics; Springer: New York, NY, USA, 2006; p. xx+258. [Google Scholar]
  29. Araujo, A.; Giné, E. The Central Limit Theorem for Real and Banach Valued Random Variables; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons: New York, NY, USA; Chichester, UK; Brisbane, Australia, 1980; p. xiv+233. [Google Scholar]
  30. Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691. [Google Scholar] [CrossRef]
  31. Bosq, D. Linear Processes in Function Spaces: Theory and Applications; Lecture Notes in Statistics Series; Springer: New York, NY, USA, 2000; Volume 149, p. xiv+283. [Google Scholar] [CrossRef]
  32. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; p. xiv+422. [Google Scholar] [CrossRef]
  33. Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949. [Google Scholar] [CrossRef]
  34. Ferraty, F.; Laksaci, A.; Tadj, A.; Vieu, P. Rate of uniform consistency for nonparametric estimates with functional variables. J. Statist. Plann. Inference 2010, 140, 335–352. [Google Scholar] [CrossRef]
  35. Attouch, M.; Laksaci, A.; Rafaa, F. On the local linear estimate for functional regression: Uniform in bandwidth consistency. Comm. Statist. Theory Methods 2019, 48, 1836–1853. [Google Scholar] [CrossRef]
  36. Ling, N.; Meng, S.; Vieu, P. Uniform consistency rate of kNN regression estimation for functional time series data. J. Nonparametr. Stat. 2019, 31, 451–468. [Google Scholar] [CrossRef]
  37. Bouzebda, S.; Nezzal, A. Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data. AIMS Math. 2024, 9, 4427–4550. [Google Scholar] [CrossRef]
  38. Didi, S.; Al Harby, A.; Bouzebda, S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics 2022, 10, 3433. [Google Scholar] [CrossRef]
  39. Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat. 2022, 34, 250–281. [Google Scholar] [CrossRef]
  40. Stute, W. Conditional U-statistics. Ann. Probab. 1991, 19, 812–825. [Google Scholar] [CrossRef]
  41. Bauer, H. Probability Theory and Elements of Measure Theory, 2nd ed.; Probability and Mathematical Statistics Series; Academic Press: London, UK; New York, NY, USA, 1981; p. xiii+460. [Google Scholar]
  42. Sen, A. Uniform strong consistency rates for conditional U-statistics. Sankhyā Ser. A 1994, 56, 179–194. [Google Scholar]
  43. Prakasa Rao, B.L.S.; Sen, A. Limit distributions of conditional U-statistics. J. Theoret. Probab. 1995, 8, 261–301. [Google Scholar] [CrossRef]
  44. Harel, M.; Puri, M.L. Conditional U-statistics for dependent random variables. J. Multivar. Anal. 1996, 57, 84–100. [Google Scholar] [CrossRef]
  45. Stute, W. Symmetrized NN-conditional U-statistics. In Research Developments in Probability and Statistics; VSP: Utrecht, The Netherlands, 1996; pp. 231–237. [Google Scholar]
  46. Fu, K.A. An application of U-statistics to nonparametric functional data analysis. Comm. Statist. Theory Methods 2012, 41, 1532–1542. [Google Scholar] [CrossRef]
  47. Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
  48. Goia, A.; Vieu, P. An introduction to recent advances in high/infinite dimensional statistics. J. Multivar. Anal. 2016, 146, 1–6. [Google Scholar] [CrossRef]
  49. Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2023, 26, 33–88. [Google Scholar] [CrossRef]
  50. Bhattacharjee, S.; Müller, H.G. Single index Fréchet regression. Ann. Statist. 2023, 51, 1770–1798. [Google Scholar] [CrossRef]
  51. Stute, W.; Zhu, L.X. Nonparametric checks for single-index models. Ann. Statist. 2005, 33, 1048–1083. [Google Scholar] [CrossRef]
  52. Gu, L.; Yang, L. Oracally efficient estimation for single-index link function with simultaneous confidence band. Electron. J. Stat. 2015, 9, 1540–1561. [Google Scholar] [CrossRef]
  53. Ferraty, F.; Peuch, A.; Vieu, P. Modèle à indice fonctionnel simple. Comptes Rendus Math. Acad. Sci. Paris 2003, 336, 1025–1028. [Google Scholar] [CrossRef]
  54. Ait-Saïdi, A.; Ferraty, F.; Kassa, R.; Vieu, P. Cross-validated estimations in the single-functional index model. Statistics 2008, 42, 475–494. [Google Scholar] [CrossRef]
  55. Jiang, Z.; Huang, Z.; Zhang, J. Functional single-index composite quantile regression. Metrika 2023, 86, 595–603. [Google Scholar] [CrossRef]
  56. Nie, Y.; Wang, L.; Cao, J. Estimating functional single index models with compact support. Environmetrics 2023, 34, e2784. [Google Scholar] [CrossRef]
  57. Zhu, H.; Zhang, R.; Liu, Y.; Ding, H. Robust estimation for a general functional single index model via quantile regression. J. Korean Statist. Soc. 2022, 51, 1041–1070. [Google Scholar] [CrossRef]
  58. Tang, Q.; Kong, L.; Rupper, D.; Karunamuni, R.J. Partial functional partially linear single-index models. Statist. Sin. 2021, 31, 107–133. [Google Scholar] [CrossRef]
  59. Ling, N.; Cheng, L.; Vieu, P.; Ding, H. Missing responses at random in functional single index model for time series data. Statist. Pap. 2022, 63, 665–692. [Google Scholar] [CrossRef]
  60. Ling, N.; Cheng, L.; Vieu, P. Single functional index model under responses MAR and dependent observations. In Functional and High-Dimensional Statistics and Related Fields; Contributions to Statistics Series; Springer: Cham, Switzerland, 2020; pp. 161–168. [Google Scholar] [CrossRef]
  61. Feng, S.; Tian, P.; Hu, Y.; Li, G. Estimation in functional single-index varying coefficient model. J. Statist. Plann. Inference 2021, 214, 62–75. [Google Scholar] [CrossRef]
  62. Novo, S.; Aneiros, G.; Vieu, P. Automatic and location-adaptive estimation in functional single-index regression. J. Nonparametr. Stat. 2019, 31, 364–392. [Google Scholar] [CrossRef]
  63. Li, J.; Huang, C.; Zhu, H. A functional varying-coefficient single-index model for functional response data. J. Amer. Statist. Assoc. 2017, 112, 1169–1181. [Google Scholar] [CrossRef] [PubMed]
  64. Attaoui, S.; Ling, N. Asymptotic results of a nonparametric conditional cumulative distribution estimator in the single functional index modeling for time series data with applications. Metrika 2016, 79, 485–511. [Google Scholar] [CrossRef]
  65. Chen, D.; Hall, P.; Müller, H.G. Single and multiple index functional regression models with nonparametric link. Ann. Statist. 2011, 39, 1720–1747. [Google Scholar] [CrossRef]
  66. Fuglstad, G.A.; Simpson, D.; Lindgren, F.; Rue, H.v. Does non-stationary spatial data always require non-stationary random fields? Spat. Stat. 2015, 14, 505–531. [Google Scholar] [CrossRef]
  67. Steel, M.F.J.; Fuentes, M. Non-Gaussian and nonparametric models for continuous spatial data. In Handbook of Spatial Statistics; CRC Press: Boca Raton, FL, USA, 2010; pp. 149–167. [Google Scholar] [CrossRef]
  68. Chu, T.; Liu, J.; Wang, H.; Zhu, J. Spatio-temporal expanding distance asymptotic framework for locally stationary processes. Sankhya A 2022, 84, 689–713. [Google Scholar] [CrossRef]
  69. Matsuda, Y.; Yajima, Y. Locally stationary spatio-temporal processes. Jpn. J. Stat. Data Sci. 2018, 1, 41–57. [Google Scholar] [CrossRef]
  70. Bitter, A.; Stelzer, R.; Ströh, B. Continuous-time locally stationary time series models. Adv. Appl. Probab. 2023, 55, 965–998. [Google Scholar] [CrossRef]
  71. Kurisu, D. Nonparametric regression for locally stationary functional time series. Electron. J. Stat. 2022, 16, 3973–3995. [Google Scholar] [CrossRef]
  72. Pezo, D. Local Stationarity for Spatial Data. Ph.D. Thesis, Technische Universität Kaiserslautern, Kaiserslautern, Germany, 2018. [Google Scholar]
  73. Kurisu, D. Nonparametric regression for locally stationary random fields under stochastic sampling design. Bernoulli 2022, 28, 1250–1275. [Google Scholar] [CrossRef]
  74. Dahlhaus, R. Fitting time series models to nonstationary processes. Ann. Statist. 1997, 25, 1–37. [Google Scholar] [CrossRef]
  75. Dahlhaus, R.; Richter, S. Adaptation for nonparametric estimators of locally stationary processes. Econom. Theory 2023, 39, 1123–1153. [Google Scholar] [CrossRef]
  76. Dahlhaus, R.; Richter, S.; Wu, W.B. Towards a general theory for nonlinear locally stationary processes. Bernoulli 2019, 25, 1013–1044. [Google Scholar] [CrossRef]
  77. Arcones, M.A.; Yu, B. Central limit theorems for empirical and U-processes of stationary mixing sequences. J. Theoret. Probab. 1994, 7, 47–71. [Google Scholar] [CrossRef]
  78. Bouzebda, S.; Nemouchi, B. Central Limit Theorems for Conditional Empirical and Conditional U-Processes of Stationary Mixing Sequences. Math. Methods Statist. 2019, 28, 169–207. [Google Scholar] [CrossRef]
  79. Masry, E. Nonparametric regression estimation for dependent functional data: Asymptotic normality. Stoch. Process. Appl. 2005, 115, 155–177. [Google Scholar] [CrossRef]
  80. Kurisu, D.; Kato, K.; Shao, X. Gaussian Approximation and Spatially Dependent Wild Bootstrap for High-Dimensional Spatial Data. J. Amer. Statist. Assoc. 2023, 1–13. [Google Scholar] [CrossRef]
  81. Elmezouar, Z.C.; Alshahrani, F.; Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Strong consistency rate in functional single index expectile model for spatial data. AIMS Math. 2024, 9, 5550–5581. [Google Scholar] [CrossRef]
  82. Kurisu, D.; Fukami, R.; Koike, Y. Adaptive deep learning for nonlinear time series models. arXiv 2023, arXiv:2207.02546. [Google Scholar]
  83. Vogt, M. Nonparametric regression for locally stationary time series. Ann. Statist. 2012, 40, 2601–2633. [Google Scholar] [CrossRef]
  84. Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
  85. Dahlhaus, R.; Subba Rao, S. Statistical inference for time-varying ARCH processes. Ann. Statist. 2006, 34, 1075–1114. [Google Scholar] [CrossRef]
  86. van Delft, A.; Eichler, M. Locally stationary functional time series. Electron. J. Stat. 2018, 12, 107–170. [Google Scholar] [CrossRef]
  87. Hall, P.; Patil, P. Properties of nonparametric estimators of autocovariance for stationary random fields. Probab. Theory Relat. Fields 1994, 99, 399–424. [Google Scholar] [CrossRef]
  88. Matsuda, Y.; Yajima, Y. Fourier analysis of irregularly spaced data on $\mathbb{R}^d$. J. R. Stat. Soc. Ser. B Stat. Methodol. 2009, 71, 191–217. [Google Scholar] [CrossRef]
  89. Lahiri, S.N. Central limit theorems for weighted sums of a spatial process under a class of stochastic and fixed designs. Sankhyā 2003, 65, 356–388. [Google Scholar]
  90. Chen, M.; Chen, W.; Yang, R. Double moving extremes ranked set sampling design. Acta Math. Appl. Sin. Engl. Ser. 2024, 40, 75–90. [Google Scholar] [CrossRef]
  91. Lahiri, S.N. Resampling Methods for Dependent Data; Springer Series in Statistics; Springer: New York, NY, USA, 2003; p. xiv+374. [Google Scholar] [CrossRef]
  92. Volkonskiĭ, V.A.; Rozanov, Y.A. Some limit theorems for random functions. I. Theor. Probab. Appl. 1959, 4, 178–197. [Google Scholar] [CrossRef]
  93. Rosenblatt, M. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 1956, 27, 832–837. [Google Scholar] [CrossRef]
  94. Ibragimov, I.A.; Solev, V.N. A condition for the regularity of a Gaussian stationary process. Dokl. Akad. Nauk SSSR 1969, 185, 509–512. [Google Scholar]
  95. Bradley, R.C. A caution on mixing conditions for random fields. Statist. Probab. Lett. 1989, 8, 489–491. [Google Scholar] [CrossRef]
  96. Bradley, R.C. Some examples of mixing random fields. Rocky Mt. J. Math. 1993, 23, 495–519. [Google Scholar] [CrossRef]
  97. Doukhan, P. Mixing: Properties and Examples; Lecture Notes in Statistics Series; Springer: New York, NY, USA, 1994; Volume 85, p. xii+142. [Google Scholar] [CrossRef]
  98. Dedecker, J.; Doukhan, P.; Lang, G.; León R., J.R.; Louhichi, S.; Prieur, C. Weak Dependence: With Examples and Applications; Lecture Notes in Statistics Series; Springer: New York, NY, USA, 2007; Volume 190, p. xiv+318. [Google Scholar]
  99. Lahiri, S.N.; Zhu, J. Resampling methods for spatial regression models under a class of stochastic designs. Ann. Statist. 2006, 34, 1774–1813. [Google Scholar] [CrossRef]
  100. Bandyopadhyay, S.; Lahiri, S.N.; Nordman, D.J. A frequency domain empirical likelihood method for irregularly spaced spatial data. Ann. Statist. 2015, 43, 519–545. [Google Scholar] [CrossRef]
  101. Brockwell, P.J.; Matsuda, Y. Continuous auto-regressive moving average random fields on $\mathbb{R}^n$. J. R. Stat. Soc. Ser. B. Stat. Methodol. 2017, 79, 833–857. [Google Scholar] [CrossRef]
  102. Berger, D. Lévy driven CARMA generalized processes and stochastic partial differential equations. Stoch. Process. Appl. 2020, 130, 5865–5887. [Google Scholar] [CrossRef]
  103. Bouzebda, S. Weak Convergence of the Conditional Single Index U-statistics for Locally Stationary Functional Time Series. AIMS Math. 2024, 9, 14807–14898. [Google Scholar] [CrossRef]
  104. Kolmogorov, A.N.; Tihomirov, V.M. ε-entropy and ε-capacity of sets in functional space. Amer. Math. Soc. Transl. 1961, 17, 277–364. [Google Scholar]
  105. Dudley, R.M. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1967, 1, 290–330. [Google Scholar] [CrossRef]
  106. Nolan, D.; Pollard, D. U-processes: Rates of convergence. Ann. Statist. 1987, 15, 780–799. [Google Scholar] [CrossRef]
  107. Dudley, R.M. Uniform Central Limit Theorems, 2nd ed.; Cambridge Studies in Advanced Mathematics Series; Cambridge University Press: New York, NY, USA, 2014; Volume 142, p. xii+472. [Google Scholar]
  108. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes: With Applications to Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1996; p. xvi+508. [Google Scholar] [CrossRef]
  109. Kosorok, M.R. Introduction to Empirical Processes and Semiparametric Inference; Springer Series in Statistics; Springer: New York, NY, USA, 2008; p. xiv+483. [Google Scholar] [CrossRef]
  110. Deheuvels, P. One bootstrap suffices to generate sharp uniform bounds in functional estimation. Kybernetika 2011, 47, 855–865. [Google Scholar]
  111. Pollard, D. Convergence of Stochastic Processes; Springer Series in Statistics; Springer: New York, NY, USA, 1984; p. xiv+215. [Google Scholar] [CrossRef]
  112. Bouzebda, S. On the strong approximation of bootstrapped empirical copula processes with applications. Math. Methods Statist. 2012, 21, 153–188. [Google Scholar] [CrossRef]
  113. Bouzebda, S. General tests of conditional independence based on empirical processes indexed by functions. Jpn. J. Stat. Data Sci. 2023, 6, 115–177. [Google Scholar] [CrossRef]
  114. Einmahl, U.; Mason, D.M. Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 2005, 33, 1380–1403. [Google Scholar] [CrossRef]
  115. Hardy, G.H. On double Fourier series and especially those which represent the double zeta-function with real and incommensurable parameters. Quart. J. Math 1905, 37, 53–79. [Google Scholar]
  116. Krause, M. Über Mittelwertsätze im Gebiete der Doppelsummen und Doppelintegrale. Leipz. Ber. 1903, 55, 239–263. [Google Scholar]
  117. Vitali, G. Sui gruppi di punti e sulle funzioni di variabili reali. Torino Atti 1908, 43, 229–246. [Google Scholar]
  118. Clarkson, J.A.; Adams, C.R. On definitions of bounded variation for functions of two variables. Trans. Amer. Math. Soc. 1933, 35, 824–854. [Google Scholar] [CrossRef]
  119. Vituškin, A.G. O Mnogomernyh Variaciyah; Gosudarstv. Izdat. Tehn.-Teor. Lit.: Moscow, Russia, 1955; p. 220. [Google Scholar]
  120. Hobson, E.W. The Theory of Functions of a Real Variable and the Theory of Fourier’s Series; Dover Publications: New York, NY, USA, 1958; Volume II, p. x+780. [Google Scholar]
  121. Niederreiter, H. Random Number Generation and Quasi-Monte Carlo Methods; CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992; Volume 63, p. vi+241. [Google Scholar] [CrossRef]
  122. Giné, E.; Koltchinskii, V.; Zinn, J. Weighted uniform consistency of kernel density estimators. Ann. Probab. 2004, 32, 2570–2605. [Google Scholar] [CrossRef]
  123. Bouzebda, S.; Taachouche, N. On the variable bandwidth kernel estimation of conditional U-statistics at optimal rates in sup-norm. Phys. A 2023, 625, 129000. [Google Scholar] [CrossRef]
  124. Han, F.; Qian, T. On inference validity of weighted U-statistics under data heterogeneity. Electron. J. Stat. 2018, 12, 2637–2708. [Google Scholar] [CrossRef]
  125. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]
  126. Mason, D.M. Proving consistency of non-standard kernel estimators. Stat. Inference Stoch. Process. 2012, 15, 151–176. [Google Scholar] [CrossRef]
  127. Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348. [Google Scholar] [CrossRef]
  128. Bellet, A.; Habrard, A.; Sebban, M. A Survey on Metric Learning for Feature Vectors and Structured Data. arXiv 2013, arXiv:1306.6709. [Google Scholar]
  129. Clémençon, S.; Colin, I.; Bellet, A. Scaling-up empirical risk minimization: Optimization of incomplete U-statistics. J. Mach. Learn. Res. 2016, 17, 76. [Google Scholar]
  130. Jin, R.; Wang, S.; Zhou, Y. Regularized Distance Metric Learning: Theory and Algorithm. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C., Culotta, A., Eds.; Curran Associates, Inc.: New York, NY, USA, 2009; Volume 22. [Google Scholar]
  131. Bellet, A.; Habrard, A. Robustness and generalization for metric learning. Neurocomputing 2015, 151, 259–267. [Google Scholar] [CrossRef]
  132. Cao, Q.; Guo, Z.C.; Ying, Y. Generalization bounds for metric and similarity learning. Mach. Learn. 2016, 102, 115–132. [Google Scholar] [CrossRef]
  133. Clémençon, S.; Lugosi, G.; Vayatis, N. Ranking and empirical minimization of U-statistics. Ann. Statist. 2008, 36, 844–874. [Google Scholar] [CrossRef]
  134. Rejchel, W. On ranking and generalization bounds. J. Mach. Learn. Res. 2012, 13, 1373–1392. [Google Scholar]
  135. Clémençon, S.; Robbiano, S.; Vayatis, N. Ranking data with ordinal labels: Optimality and pairwise aggregation. Mach. Learn. 2013, 91, 67–104. [Google Scholar] [CrossRef]
  136. Stute, W. Universally consistent conditional U-statistics. Ann. Statist. 1994, 22, 460–473. [Google Scholar] [CrossRef]
  137. Stute, W. Lp-convergence of conditional U-statistics. J. Multivar. Anal. 1994, 51, 71–82. [Google Scholar] [CrossRef]
  138. Kendall, M.G. A New Measure of Rank Correlation. Biometrika 1938, 30, 81–93. [Google Scholar] [CrossRef]
  139. Dudley, R.M. A course on empirical processes. In École d’été de probabilités de Saint-Flour, XII—1982; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1984; Volume 1097, pp. 1–142. [Google Scholar] [CrossRef]
  140. Polonik, W.; Yao, Q. Set-indexed conditional empirical and quantile processes based on dependent data. J. Multivar. Anal. 2002, 80, 234–255. [Google Scholar] [CrossRef]
  141. Lehmann, E.L. A general concept of unbiasedness. Ann. Math. Stat. 1951, 22, 587–592. [Google Scholar] [CrossRef]
  142. Dwass, M. The large-sample power of rank order tests in the two-sample problem. Ann. Math. Statist. 1956, 27, 352–374. [Google Scholar] [CrossRef]
  143. Kohler, M.; Máthé, K.; Pintér, M. Prediction from randomly right censored data. J. Multivar. Anal. 2002, 80, 73–100. [Google Scholar] [CrossRef]
  144. Carbonez, A.; Györfi, L.; van der Meulen, E.C. Partitioning-estimates of a regression function under random censoring. Statist. Decis. 1995, 13, 21–37. [Google Scholar] [CrossRef]
  145. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Amer. Statist. Assoc. 1958, 53, 457–481. [Google Scholar] [CrossRef]
  146. Maillot, B.; Viallon, V. Uniform limit laws of the logarithm for nonparametric estimators of the regression function in presence of censored data. Math. Methods Statist. 2009, 18, 159–184. [Google Scholar] [CrossRef]
  147. Bouzebda, S.; El-hadjali, T. Uniform convergence rate of the kernel regression estimator adaptive to intrinsic dimension in presence of censored data. J. Nonparametr. Stat. 2020, 32, 864–914. [Google Scholar] [CrossRef]
  148. Datta, S.; Bandyopadhyay, D.; Satten, G.A. Inverse probability of censoring weighted U-statistics for right-censored data with an application to testing hypotheses. Scand. J. Stat. 2010, 37, 680–700. [Google Scholar] [CrossRef]
  149. Stute, W.; Wang, J.L. Multi-sample U-statistics for censored data. Scand. J. Statist. 1993, 20, 369–374. [Google Scholar]
  150. Chen, Y.; Datta, S. Adjustments of multi-sample U-statistics to right censored data and confounding covariates. Comput. Statist. Data Anal. 2019, 135, 1–14. [Google Scholar] [CrossRef]
  151. Yuan, A.; Giurcanu, M.; Luta, G.; Tan, M.T. U-statistics with conditional kernels for incomplete data models. Ann. Inst. Statist. Math. 2017, 69, 271–302. [Google Scholar] [CrossRef]
  152. Földes, A.; Rejtő, L. A LIL type result for the product limit estimator. Z. Wahrsch. Verw. Geb. 1981, 56, 75–86. [Google Scholar] [CrossRef]
  153. Sudheesh, K.K.; Anjana, S.; Xie, M. U-statistics for left truncated and right censored data. Statistics 2023, 57, 900–917. [Google Scholar] [CrossRef]
  154. Tsai, W.Y.; Jewell, N.P.; Wang, M.C. A note on the product-limit estimator under right censoring and left truncation. Biometrika 1987, 74, 883–886. [Google Scholar] [CrossRef]
  155. Andersen, P.K.; Borgan, O.R.; Gill, R.D.; Keiding, N. Statistical Models Based on Counting Processes; Springer Series in Statistics; Springer: New York, NY, USA, 1993; p. xii+767. [Google Scholar] [CrossRef]
  156. Zhou, Y.; Yip, P.S.F. A strong representation of the product-limit estimator for left truncated and right censored data. J. Multivar. Anal. 1999, 69, 261–280. [Google Scholar] [CrossRef]
  157. Hall, P. Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Geb. 1984, 67, 175–196. [Google Scholar] [CrossRef]
  158. Härdle, W.; Marron, J.S. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist. 1985, 13, 1465–1481. [Google Scholar] [CrossRef]
  159. Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Statist. Plann. Inference 2007, 137, 2784–2801. [Google Scholar] [CrossRef]
  160. Benhenni, K.; Ferraty, F.; Rachdi, M.; Vieu, P. Local smoothing regression with functional data. Comput. Statist. 2007, 22, 353–369. [Google Scholar] [CrossRef]
  161. Shang, H.L. Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametr. Stat. 2014, 26, 599–615. [Google Scholar] [CrossRef]
  162. Li, Q.; Maasoumi, E.; Racine, J.S. A nonparametric test for equality of distributions with mixed categorical and continuous data. J. Econom. 2009, 148, 186–200. [Google Scholar] [CrossRef]
  163. Horowitz, J.L.; Spokoiny, V.G. An adaptive, rate-optimal test of a parametric mean-regression model against a nonparametric alternative. Econometrica 2001, 69, 599–631. [Google Scholar] [CrossRef]
  164. Gao, J.; Gijbels, I. Bandwidth selection in nonparametric kernel testing. J. Amer. Statist. Assoc. 2008, 103, 1584–1594. [Google Scholar] [CrossRef]
  165. Yu, B. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 1994, 22, 94–116. [Google Scholar] [CrossRef]
  166. Didi, S.; Bouzebda, S. Wavelet Density and Regression Estimators for Continuous Time Functional Stationary and Ergodic Processes. Mathematics 2022, 10, 4356. [Google Scholar] [CrossRef]
  167. Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
  168. Bouzebda, S.; Nezzal, A. Asymptotic properties of conditional U-statistics using delta sequences. Comm. Statist. Theory Methods 2024, 53, 4602–4657. [Google Scholar] [CrossRef]
  169. Cheng, M.Y.; Wu, H.T. Local linear regression on manifolds and its geometric interpretation. J. Amer. Statist. Assoc. 2013, 108, 1421–1434. [Google Scholar] [CrossRef]
  170. Bouzebda, S.; Cherfi, M. General bootstrap for dual ϕ-divergence estimates. J. Probab. Stat. 2012, 2012, 834107. [Google Scholar] [CrossRef]
  171. Bouzebda, S.; Limnios, N. On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J. Multivar. Anal. 2013, 116, 52–62. [Google Scholar] [CrossRef]
  172. Bernstein, S. Sur l’extension du théoréme limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 1927, 97, 1–59. [Google Scholar] [CrossRef]
  173. Arcones, M.A.; Giné, E. Limit theorems for U-processes. Ann. Probab. 1993, 21, 1494–1542. [Google Scholar] [CrossRef]
  174. Giné, E.; Zinn, J. Some limit theorems for empirical processes. Ann. Probab. 1984, 12, 929–998, with discussion. [Google Scholar] [CrossRef]
  175. Soukarieh, I.; Bouzebda, S. Weak Convergence of the Conditional U-statistics for Locally Stationary Functional Time Series. Stat. Inference Stoch. Process. 2024, 27, 227–304. [Google Scholar] [CrossRef]
  176. Masry, E. Multivariate local polynomial regression for time series: Uniform strong consistency and rates. J. Time Ser. Anal. 1996, 17, 571–599. [Google Scholar] [CrossRef]
  177. de la Peña, V.H. Decoupling and Khintchine’s inequalities for U-statistics. Ann. Probab. 1992, 20, 1877–1892. [Google Scholar] [CrossRef]
  178. Lee, A.J. U-Statistics: Theory and Practice; Statistics: Textbooks and Monographs Series; Marcel Dekker Inc.: New York, NY, USA, 1990; Volume 110, p. xii+302. [Google Scholar]
  179. Blum, J.R.; Kiefer, J.; Rosenblatt, M. Distribution free tests of independence based on the sample distribution function. Ann. Math. Statist. 1961, 32, 485–498. [Google Scholar] [CrossRef]
  180. Bergsma, W.; Dassios, A. A consistent test of independence based on a sign covariance related to Kendall’s tau. Bernoulli 2014, 20, 1006–1028. [Google Scholar] [CrossRef]
  181. Borovkova, S.; Burton, R.; Dehling, H. Consistency of the Takens estimator for the correlation dimension. Ann. Appl. Probab. 1999, 9, 376–390. [Google Scholar] [CrossRef]
  182. Silverman, B.W. Distances on circles, toruses and spheres. J. Appl. Probab. 1978, 15, 136–143. [Google Scholar] [CrossRef]
  183. Hollander, M.; Proschan, F. Testing whether new is better than used. Ann. Math. Statist. 1972, 43, 1136–1146. [Google Scholar] [CrossRef]
  184. Gini, C. Measurement of Inequality of Incomes. Econ. J. 1921, 31, 124–126. [Google Scholar] [CrossRef]
  185. Chen, L.; Wan, A.T.K.; Zhang, S.; Zhou, Y. Distributed algorithms for U-statistics-based empirical risk minimization. J. Mach. Learn. Res. 2023, 24, 1–43. [Google Scholar]
  186. Fygenson, M.; Ritov, Y. Monotone estimating equations for censored data. Ann. Statist. 1994, 22, 732–746. [Google Scholar] [CrossRef]
  187. Brown, B.M.; Wang, Y.G. Induced smoothing for rank regression with censored survival times. Stat. Med. 2007, 26, 828–836. [Google Scholar] [CrossRef] [PubMed]