Weak Convergence of the Conditional Set-Indexed Empirical Process for Missing at Random Functional Ergodic Data

Bouzebda, Salim; Souddi, Youssouf; Madani, Fethi

doi:10.3390/math12030448



by Salim Bouzebda ¹,*, Youssouf Souddi ² and Fethi Madani ²

¹ Laboratoire de Mathématiques Appliquées de Compiègne (L.M.A.C.), Université de Technologie de Compiègne, 60200 Compiègne, France

² Laboratory of Stochastic Models, Statistics and Applications, University of Saida-Dr. Moulay Tahar, P.O. Box 138 EN-NASR, Saïda 20000, Algeria

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(3), 448; https://doi.org/10.3390/math12030448

Submission received: 18 January 2024 / Revised: 25 January 2024 / Accepted: 29 January 2024 / Published: 30 January 2024


Abstract

This work examines the asymptotic properties of a conditional set-indexed empirical process built from functional ergodic data whose responses are missing at random (MAR). The findings extend earlier advances in functional data analysis through the use of empirical process methods. The results are established under specific structural hypotheses on entropy and under mild conditions on the model. The regression operator’s asymptotic

(1 - α)

-confidence interval is provided for

0 < α < 1

as an application. Additionally, we offer a classification example to demonstrate the practical importance of the methodology.

Keywords:

conditional distribution; small ball probability; missing at random; empirical process; ergodic functional data; semi-metric space; covering number

MSC:

62G20; 62G05; 62G32; 62G08; 62G35; 62G07; 62E20

1. Introduction

Empirical process techniques are among the most powerful tools for solving problems in statistics. Historically, many limit theorems for the empirical process have been established in finite-dimensional frameworks (see, e.g., Refs. [1,2,3] for exhaustive, self-contained texts with a variety of statistical applications), both under mixing conditions and in the independent identically distributed framework. In the setting of independent variables, Ref. [4] characterized, modulo measurability, the classes

C

of sets for which the Glivenko–Cantelli theorem holds; we may also cite Refs. [5,6,7,8,9,10,11,12,13,14,15]. Under various mixing conditions, empirical processes based on dependent data have been investigated; for instance, the authors of Ref. [16] established asymptotic normality for sequences that are

ϕ

-mixing. For other forms of mixing, we refer to Refs. [17,18,19,20]. The author of Ref. [21] identified a bracketing condition that can be used under strong mixing. The function-indexed empirical process for beta-mixing sequences was investigated in Ref. [22]. Uniform convergence and asymptotic normality of a set-indexed conditional empirical process within a strictly stationary and strong mixing framework were established in Ref. [23]. Over the past few decades, there has been growing interest in the statistical literature in functional random variables, that is, variables taking values in an infinite-dimensional space. The proliferation of data collected on increasingly fine temporal and spatial grids, for example in meteorology, medicine, satellite imagery, and numerous other scientific disciplines, has driven the development of this research topic. The statistical modeling of these data, viewed as random functions, has raised numerous challenging theoretical and numerical questions. The monographs of Refs. [24,25] provide comprehensive surveys of functional data analysis (FDA), encompassing both theoretical and practical aspects. These monographs discuss linear models for random variables taking values in a Hilbert space, scalar-on-function and function-on-function linear models, parametric discriminant analysis, and functional principal component analysis. For the most recent findings on FDA and related subjects, we may consult the bibliographic reviews in Refs. [26,27,28,29,30,31], among others. For scalar-on-function nonlinear regression models, the authors of Ref. [32] emphasized nonparametric techniques, particularly kernel-type estimation. Such tools were subsequently extended to discrimination and classification analysis. 
Several statistical concepts extended to the functional data framework were examined in Ref. [33], including the portmanteau test, change detection, and goodness-of-fit tests. Good overviews of this literature can be found in Refs. [20,34,35,36,37,38,39,40,41], and, more recently, Ref. [42] gave the first results on the conditional set-indexed empirical process for functional data. Considerable effort has been devoted to developing a convergence theory for empirical processes involving functional random variables, a topic well beyond the scope of Ref. [23]. A theoretical framework of this nature is essential for contemporary statistical analysis. Functional data analysis has been present in the statistical literature for over six decades and has since become the focus of numerous works. Nevertheless, results on empirical processes in functional frameworks remain very limited. For recent references, see Refs. [43,44,45,46,47], which obtained numerous valuable results on set-indexed conditional empirical processes for functional data in the ergodic framework. In numerous practical applications, including sampling surveys, pharmaceutical tracing tests, and reliability tests, some pairs of observations may be incomplete. Such instances are commonly referred to as “missing data”, a common issue in data science and analytics. MAR (missing at random) indicates that, while there may be systematic differences between the missing and observed values, these differences can be fully accounted for by other observed variables. 
The situation changes significantly when predictors are present; Refs. [48,49,50,51,52,53,54,55,56,57,58] provide examples in the finite-dimensional setting, and see also the recent Refs. [59,60]. In a recent study, the authors of Ref. [61] examined the linear quantile regression model when responses are missing at random, using the inverse probability weighting method. They derived an estimating equation for the unknown parameters based on quantile regression and introduced a standard quantile regression estimator. They also formulated the empirical likelihood (EL) ratio function for the unknown parameter and established a maximum EL estimator. Few works examine the statistical properties of functional nonparametric models with missing data. The kernel estimator of the conditional quantile was introduced in Ref. [62] under the assumptions of ergodicity and random censorship. The authors also demonstrated strong consistency (with rate), derived the asymptotic distribution of the estimator, and applied it to forecast the peak electricity demand interval using smart meter data. In Ref. [63], the authors developed an estimator of the regression operator for functional stationary ergodic data with missing at random (MAR) responses and established its asymptotic properties, including its convergence rate in probability and asymptotic normality. For further references, we suggest consulting Refs. [64,65].

Our findings extend a prior study [44] by establishing more precise limit results under less stringent conditions. This offers a new perspective on empirical process theory for random variables with general dependence. This work addresses a problem that has not been thoroughly examined thus far. The framework of ergodic functional data was introduced by Ref. [66], who established consistency with rates along with the asymptotic normality of the regression function estimate and provided some examples. For recent papers on the subject, we refer to Ref. [43], where the authors extended Ref. [66] to a more general framework. Some motivations for considering an ergodic dependence structure in the data rather than a mixing one are discussed in Refs. [67,68].

The objective of this study is to enhance the development of a practical methodology for addressing MAR samples in functional nonparametric situations. We want to examine the estimation of conditional set-indexed empirical processes in the presence of both missing at random (MAR) data and ergodicity.

The structure of this paper is outlined as follows. In Section 2, we introduce the notation and definitions, along with the conditional empirical process. Our main results are presented in Section 3. Section 3.1 is dedicated to discussing the procedure for selecting the bandwidth. In Section 4, we apply our main result to classification. Concluding remarks and potential future developments are discussed in Section 5. To maintain a smooth presentation flow, all proofs are consolidated in Appendix A.

2. The Set Indexed Conditional Empirical Process

To enhance clarity, let us delve into the definition of the ergodic property for processes. Consider a measurable space

(S, J)

, and denote by

S^{N}

the space of all functions

s : N \to S

. If

s_{j}

represents the value of the function s at

j \in N

, define

H_{j}

as the j-th coordinate map, i.e.,

H_{j} (s) = s_{j}

. Now, let J^{N} denote the σ-field on S^{N} generated by the sets

H_{j}^{- 1} (J)

for

j \in N

; a random process

Z = \{Z_{j} : j \in N\}

can be viewed as a random variable defined on the probability space

(Ω, A, P)

, taking values in

(S^{N}, J^{N})

. An event B is termed invariant if there exists a set

A \in J^{N}

such that

B = \{(Z_{n}, Z_{n + 1}, \dots) \in A\}

holds for every

n \geq 1

. The process Z is then considered ergodic when, for any invariant set B, we have

P (B) = 0

or

P (Ω ∖ B) = 0

. As per the ergodic theorem, it is well-known that for a stationary ergodic process Z, the following convergence holds almost surely:

lim_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} Z_{i} = E (Z_{1}), almost surely .

(1)
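The convergence in (1) can be illustrated numerically. The sketch below is not taken from the paper: it assumes a Gaussian AR(1) sequence, which is stationary and ergodic when the autoregressive coefficient is smaller than one in absolute value, and tracks the running average of the path. The names `ar1_path`, `mean`, and `rho` are illustrative choices.

```python
import numpy as np

def ar1_path(n, mean=2.0, rho=0.7, seed=0):
    """Simulate Z_t = mean + rho * (Z_{t-1} - mean) + eps_t with N(0, 1) noise."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = np.empty(n)
    # Draw the first value from the stationary law N(mean, 1 / (1 - rho^2)).
    z[0] = mean + eps[0] / np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        z[t] = mean + rho * (z[t - 1] - mean) + eps[t]
    return z

z = ar1_path(200_000)
# Running averages (1/n) * sum_{i <= n} Z_i, which should approach E(Z_1) = 2.
running_mean = np.cumsum(z) / np.arange(1, z.size + 1)
for n in (100, 10_000, 200_000):
    print(n, running_mean[n - 1])
```

The observations are dependent, yet the time average still settles near the stationary mean, which is the only property of the data that statement (1) requires.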

Therefore, the ergodic property in our setting is formulated based on the statement (1). We consider a sample of random elements

(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})

, each drawn from the joint distribution of

(X, Y)

, where X takes values in a space

E

and Y in

R^{d}

. The functional space

E

is endowed with a semi-metric

d_{E} (\cdot, \cdot)

. Our goal is to investigate the relationships between X and Y by estimating functional operators associated with the conditional distribution of Y given X. One such operator is the regression operator for a measurable set C in a class of sets

C

:

μ (C ∣ x) = E (1_{{Y \in C}} ∣ X = x) .

To address this, we employ a Nadaraya–Watson-type conditional empirical distribution, as proposed by Refs. [42,44,69,70]. We now describe the missing at random (MAR) mechanism for the response variable. In an available incomplete sample of size n from

(X, Y, δ)

, denoted as

(X_{i}, Y_{i}, δ_{i}), 1 \leq i \leq n

,

X_{i}

is fully observed,

δ_{i} = 1

if

Y_{i}

is observed, and

δ_{i} = 0

otherwise. The Bernoulli random variable

δ

satisfies:

P (δ = 1 ∣ X = x, Y = y) = P (δ = 1 ∣ X = x) = P (x),

where

P (x)

is a functional operator, termed the conditional probability of observing the response given the predictor, which is often unknown. This mechanism implies that

δ

and Y are conditionally independent given X, akin to the finite-dimensional case in Ref. [48].

The Nadaraya–Watson-type conditional empirical distribution function is given by:

μ_{n} (C, x) = \frac{\sum_{i = 1}^{n} δ_{i} 1_{{Y_{i} \in C}} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}})},

(2)

where

K (\cdot)

is a real-valued kernel function from

[0, \infty)

into

[0, \infty)

,

h_{n}

is a smoothing parameter satisfying

h_{n} \to 0

as

n \to \infty

, C is a measurable set, and

x \in E

. When choosing

C = (- \infty, z]

, where

z \in R^{d}

, it reduces to the conditional empirical distribution function

F_{n} (z | x) = μ_{n} ((- \infty, z], x)

, as referenced in Refs. [71,72,73]. In this case, the corresponding class

C

is defined as

\{(- \infty, z], z \in R^{d}\}

. Regarding the semi-metric topology on

E

, we introduce the notation

B (x, t) = {x_{1} \in E : d_{E} (x_{1}, x) \leq t},

which denotes the ball in

E

with center x and radius t. The probability that X falls in this ball, P (X \in B (x, t)), is commonly referred to as the small ball probability function in the literature, especially when t tends to zero. This notion is important both theoretically and practically, as the ball is intricately connected with the semi-metric

d (\cdot, \cdot)

. The selection of this semi-metric becomes pivotal when dealing with data in infinite-dimensional spaces.
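To make estimator (2) concrete, the following sketch evaluates it on simulated curves. Everything beyond formula (2) is an assumption made for illustration: the kernel K(u) = 2 - u on [0, 1], the Riemann-sum L2 semi-metric on a common grid, the toy model X_i(t) = a_i sin(2πt), and the names `kernel`, `l2_semimetric`, and `mu_n`. The sample is i.i.d., a special case of a stationary ergodic sequence.

```python
import numpy as np

def kernel(u):
    """Illustrative kernel on [0, 1]: K(u) = 2 - u, so K(1) > 0 and K' < 0."""
    u = np.asarray(u, dtype=float)
    return np.where((u >= 0.0) & (u <= 1.0), 2.0 - u, 0.0)

def l2_semimetric(x, X, dt):
    """Riemann-sum L2 distance between the curve x (p,) and each row of X (n, p)."""
    return np.sqrt(np.sum((X - x) ** 2, axis=1) * dt)

def mu_n(in_C, delta, dist, h):
    """Estimator (2): kernel-weighted share of observed responses lying in C."""
    w = delta * kernel(dist / h)          # missing responses receive weight 0
    total = w.sum()
    if total == 0.0:
        raise ValueError("no observed response within distance h of x")
    return float(np.sum(w * in_C) / total)

# Toy data (assumption): X_i(t) = a_i sin(2 pi t) and Y_i = a_i + noise.
rng = np.random.default_rng(1)
n, p = 2000, 101
t = np.linspace(0.0, 1.0, p)
a = rng.uniform(0.0, 2.0, n)
X = a[:, None] * np.sin(2 * np.pi * t)
Y = a + 0.5 * rng.standard_normal(n)
# MAR mechanism: the observation probability depends on X only, here through a_i.
delta = (rng.uniform(size=n) < 0.9 - 0.15 * a).astype(float)

x = np.sin(2 * np.pi * t)                 # target curve, corresponding to a = 1
dist = l2_semimetric(x, X, dt=t[1] - t[0])
estimate = mu_n((Y <= 1.0).astype(float), delta, dist, h=0.2)
print(estimate)                           # estimates P(Y <= 1 | X = x), which is 0.5 here
```

Only the distances d_E(x, X_i) enter the estimator, so replacing `l2_semimetric` by another semi-metric (for instance one based on derivatives or on a projection) changes the neighbourhoods without changing `mu_n`.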

In many cases, the small ball probability can be approximated by the product of two functions, one depending only on x and the other only on h. This is illustrated by several examples found in Proposition 1 of [74]:

$ϕ (h_{n}) = C h_{n}^{υ}$ for some $υ > 0$, with $τ_{0} (s) = s^{υ}$;
$ϕ (h_{n}) = C h_{n}^{υ} \exp (- C h_{n}^{- p})$ for some $υ > 0$ and $p > 0$, with $τ_{0} (s)$ the Dirac function;
$ϕ (h_{n}) = C {|\ln (h_{n})|}^{- 1}$, with $τ_{0} (s) = 1_{] 0, 1]} (s)$ the indicator function of $] 0, 1]$.

For

i \geq 1

, let

F_{i}

be the

σ

-field generated by

((X_{1}, Y_{1}, δ_{1}), \dots, (X_{i}, Y_{i}, δ_{i}))

and

G_{i}

the

σ

-field generated by

((X_{1}, Y_{1}, δ_{1}), \dots, (X_{i}, Y_{i}, δ_{i}), X_{i + 1})

. Let

B (x, u)

be a ball centered at

x \in E

with radius u. Let

D_{i} = d (x, X_{i})

so that

D_{i}

is a nonnegative real-valued random variable. Operating within the probability space

(Ω, A, P)

, consider

F_{x} (u) = P (D_{i} \leq u) = P (X_{i} \in B (x, u)),

and

F_{x}^{F_{i - 1}} (u) = P (D_{i} \leq u ∣ F_{i - 1}) = P (X_{i} \in B (x, u) ∣ F_{i - 1})

to be the distribution function and the conditional distribution function, respectively, given the

σ

-field

F_{i - 1}

of

{(D_{i})}_{i \geq 1}

. Let

o_{a . s} (u)

represent a real random function

l (\cdot)

such that

l (u) / u

converges to zero almost surely as

u \to 0

. In a similar vein, define

O_{a . s} (u)

as a real random function

l (\cdot)

such that

l (u) / u

is almost surely bounded. In what follows, we implicitly assume the ergodicity of the sequence of random elements

(X_{i}, Y_{i}), i = 1, \dots, n

.

2.1. Assumptions and Notation

In this paper, x is a fixed element of the functional space

E

. We present the metric entropy with inclusion as a means to quantify the richness or complexity of the set class

C

. For any given

ε > 0

, the covering number is defined as:

N (ε, C, μ (\cdot ∣ x)) = \inf \{n \in N : \exists C_{1}, \dots, C_{n} \in C \text{ such that } \forall C \in C, \exists 1 \leq i, j \leq n \text{ with } C_{i} \subset C \subset C_{j} \text{ and } μ (C_{j} ∖ C_{i} ∣ x) < ε\} .

The term

log (N (ε, C, μ (\cdot ∣ x)))

is referred to as the metric entropy with inclusion of

C

with respect to

μ (\cdot ∣ x)

. For numerous classes, estimates for these covering numbers are well-documented; refer, for instance, to Ref. [75]. Below, we frequently make the assumption that either

log N (ε, C, μ (\cdot ∣ x))

or

N (ε, C, μ (\cdot ∣ x))

exhibit behaviors reminiscent of powers of

ε^{- 1}

. We say that condition (

R_{γ}

) is satisfied when

log N (ε, C, μ (\cdot ∣ x)) \leq H_{γ} (ε), for all ε > 0,

(3)

where

H_{γ} (ε) = \begin{cases} \log (A ε^{- r}) & \text{if } γ = 0, \\ A ε^{- γ} & \text{if } γ > 0, \end{cases}

for some constants

A, r > 0

. As emphasized in Ref. [23], it is notable that the condition (3), where

γ = 0

, is fulfilled by intervals, rectangles, balls, ellipsoids, and by classes derived from these through finite set operations of union, intersection, and complement. The class of convex sets in

R^{d}

(

d \geq 2

) satisfies the condition (3) with

γ = (d - 1) / 2

. Various other sets that satisfy (3) with

γ > 0

are elaborated upon in Ref. [75]. We give now further notation. For

j \geq 1

, set

M_{j} = K^{j} (1) - \int_{0}^{1} {(K^{j})}^{'} (u) τ_{0} (u) d u .
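As a worked illustration of the constants M_j, the sketch below evaluates them numerically for one specific choice that is assumed here and not taken from the paper: the kernel K(u) = 2 - u on [0, 1] and the limit function τ_0(s) = s^υ with υ = 1. In that case the integrals have the closed forms M_1 = 1 + 1/(υ + 1) and M_2 = 1 + 4/(υ + 1) - 2/(υ + 2), which the midpoint rule should reproduce. The function names are illustrative.

```python
import numpy as np

def K(u):
    """Illustrative kernel on [0, 1]."""
    return 2.0 - u

def dK(u):
    """Derivative of the illustrative kernel."""
    return -np.ones_like(u)

def M(j, tau0, m=20_000):
    """M_j = K^j(1) - int_0^1 (K^j)'(u) tau0(u) du, computed by the midpoint rule."""
    u = (np.arange(m) + 0.5) / m               # midpoints of a uniform grid on (0, 1)
    dKj = j * K(u) ** (j - 1) * dK(u)          # chain rule: (K^j)' = j K^(j-1) K'
    return K(1.0) ** j - np.mean(dKj * tau0(u))

upsilon = 1.0

def tau0(s):
    """Limit function of the fractal-type case phi(h) = C h^upsilon."""
    return s ** upsilon

M1, M2 = M(1, tau0), M(2, tau0)
print(M1, 1 + 1 / (upsilon + 1))                        # numerical value vs closed form
print(M2, 1 + 4 / (upsilon + 1) - 2 / (upsilon + 2))    # numerical value vs closed form
```

Since K' < 0 and τ_0 is nonnegative, the integral term is subtracted with a negative sign, so both constants exceed K^j(1); this is the positivity discussed in the comments on the assumptions.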

In this section, we establish the weak convergence of the process

\{ν_{n} (C, x) : C \in C\}

as defined by

ν_{n} (C, x) : = \sqrt{n ϕ (h_{n})} (μ_{n} (C, x) - E μ_{n} (C, x)) .

(4)

In the course of our analysis, we will rely on the following assumptions.

(H1)

For every $x \in E$, there exist a sequence of nonnegative bounded random functionals ${(f_{i, 1})}_{i \geq 1}$, a sequence of random functions ${(g_{i, x})}_{i \geq 1}$, a deterministic nonnegative bounded functional $f_{1}$, and a nonnegative real function $ϕ$ with $ϕ (h) \to 0$ as $h \to 0$, such that

(i): $F_{x} (u) = ϕ (u) f_{1} (x) + o (ϕ (u))$ as $u \to 0$.
(ii): For any $i \in N$, $F_{x}^{F_{i - 1}} (u) = ϕ (u) f_{i, 1} (x) + g_{i, x} (u)$ with $g_{i, x} (u) = o_{a . s} (ϕ (u))$ as $u \to 0$, $g_{i, x} (u) / ϕ (u)$ almost surely bounded, and $n^{- 1} \sum_{i = 1}^{n} g_{i, x}^{j} (u) = o_{a . s} (ϕ^{j} (u))$ as $n \to \infty$, for $j = 1, 2$.
(iii): $n^{- 1} \sum_{i = 1}^{n} f_{i, 1}^{j} (x) \to f_{1}^{j} (x)$ almost surely as $n \to \infty$, for $j = 1, 2$.
(iv): There exists a nondecreasing bounded function $τ_{0}$ such that, uniformly in $u \in (0, 1)$,

$\frac{ϕ (r u)}{ϕ (r)} = τ_{0} (u) + o (1)$

as $r ↓ 0$, and, for $1 \leq j \leq 2 + δ$ with $δ > 0$, $\int_{0}^{1} {(K^{j} (u))}^{'} τ_{0} (u) d u < \infty$.

(H2)

There exist positive constants

β > 0

and

η_{1} > 0

such that for all

x_{1}, x_{2} \in N_{x}

, a neighborhood of x, the following holds

| μ (C ∣ x_{1}) - μ (C ∣ x_{2}) | \leq η_{1} d_{E}^{β} (x_{1}, x_{2}) .

(H3)

(i): The conditional mean of $1_{\{Y_{i} \in C\}}$ given the $σ$-field $G_{i - 1}$ depends only on $X_{i}$, i.e., for any $i \geq 1$, $E (1_{\{Y_{i} \in C\}} ∣ G_{i - 1}) = μ (C ∣ X_{i})$ almost surely. The conditional variance of $1_{\{Y_{i} \in C\}}$ given the $σ$-field $G_{i - 1}$ also depends only on $X_{i}$, i.e., for any $i \geq 1$,

$E ({(1_{\{Y_{i} \in C\}} - μ (C ∣ X_{i}))}^{2} ∣ G_{i - 1}) = W (X_{i}),$

almost surely.
(ii): Furthermore, the functions $W (\cdot)$ and $P (\cdot)$ are continuous in a neighborhood of x, namely,

$\sup_{\{u : d (x, u) \leq h\}} |W (u) - W (x)| = o (1)$ as $h \to 0$,

$\sup_{\{u : d (x, u) \leq h\}} |P (u) - P (x)| = o (1)$ as $h \to 0$.
(iii): There exists $δ > 0$ such that the function

${\bar{W}}_{2 + δ} (u) = E (| 1_{\{Y_{1} \in C\}} - μ (C ∣ x) |^{2 + δ} ∣ X_{1} = u)$, $u \in E$,

is continuous in a neighborhood of x.

(H4)

There exist positive constants

b_{3} > 0

and

η_{4} > 0

such that, for any

(y_{1}, y_{2}) \in R^{2 d}

, the following holds for the conditional density

f (\cdot)

of Y given

X = x

:

|f (y_{1}) - f (y_{2})| \leq η_{4} {∥y_{1} - y_{2}∥}^{b_{3}} .

(H5)

The kernel function

K (\cdot)

has support within the interval

(0, 1)

and possesses a continuous first derivative on

(0, 1)

. It satisfies the condition

K^{'} (t) < 0

for all

t \in (0, 1)

. Moreover,

|\int_{0}^{1} {(K^{j})}^{'} (u) d u| < \infty, \text{ for } j = 1, 2 .

(H6)

Suppose that the set class

C

adheres to condition (3);

(H7)

The smoothing parameter (

h_{n}

) fulfills the following criterion:

h_{n} \to 0

and

n ϕ (h_{n}) \to \infty

as

n \to \infty

.

2.2. Comments on the Assumptions

The significance of condition (H1) extends to both the ergodic and functional aspects addressed in this paper. The condition utilized here shares similarities with that employed in Ref. [66]. The functions

f_{i, 1} (\cdot)

and

f_{1} (\cdot)

play roles analogous to the conditional and unconditional densities in the finite-dimensional scenario. In the meantime,

ϕ (u)

describes the influence of the radius u on the small ball probability as u tends to zero, as illustrated in Ref. [66]. Condition (H2) is standard in nonparametric regression estimation. (H3)(i) is essential for establishing consistency and reflects the Markovian nature of the functional stationary ergodic data. This condition aligns with that used in Ref. [63]. (H3)(ii,iii) are local continuity conditions needed for the main results. Condition (H4) on the density

f (\cdot)

conforms to a classical Lipschitz-type nonparametric functional model. Assumption (H5) relates to the choice of the kernel

K (\cdot)

, a common practice in nonparametric functional estimation. It is worth noting that a symmetric Parzen kernel is unsuitable in this context because the random variable

d (x, X)

is nonnegative. Hence, we consider

K (\cdot)

with support

[0, 1]

, a natural generalization of the assumption usually made in the multivariate case, where

K (\cdot)

is expected to be a spherically symmetric density function. The conditions

K (1) > 0

and

K^{'} (\cdot) < 0

ensure that

M_{1} > 0

for all limit functions

τ_{0}

. The condition

K (1) > 0

is necessary for defining the moments

M_{2}

, which, in this case, are determined by the value

K (1)

. (H7) provides a condition on the bandwidths, acknowledging that consistency cannot be guaranteed without it.

3. Main Results

Below, we write

Z \overset{D}{=} N (μ, σ^{2})

when the random variable Z is distributed according to a normal distribution with mean

μ

and variance

σ^{2}

. The symbol

\overset{D}{\to}

represents convergence in distribution, while

\overset{P}{\to}

indicates convergence in probability.

Theorem 1

(Uniform Consistency). Assume that the conditions (H1)–(H7) are satisfied. Consider a class of measurable sets

C

for which

N (ε, C, μ (\cdot ∣ x)) < \infty,

for any

ε > 0

. Moreover, assume that for every

C \in C

| μ (C, y) f (y) - μ (C, x) f (x) | ⟶ 0, \text{ as } y \to x .

If

n ϕ (h_{n}) \to \infty

and

h_{n} \to 0

as

n \to \infty

, then

sup_{C \in C} |μ_{n} (C, x) - E (μ_{n} (C, x))| \overset{P}{⟶} 0 .

Note that the proof of Theorem 1 follows directly from the decomposition

\begin{aligned} μ_{n} (C, x) - E (μ_{n} (C, x)) & = \frac{1}{E (\hat{f_{n}} (x))} [\hat{φ_{n}} (C, x) - E (\hat{φ_{n}} (C, x))] - \frac{μ_{n} (C, x)}{E (\hat{f_{n}} (x))} [\hat{f_{n}} (x) - E (\hat{f_{n}} (x))] \\ & = \frac{Q_{n} (x)}{E (\hat{f_{n}} (x))}, \end{aligned}

where

Q_{n} (x) = [\hat{φ_{n}} (C, x) - E (\hat{φ_{n}} (C, x))] - μ_{n} (C, x) [\hat{f_{n}} (x) - E (\hat{f_{n}} (x))] .

and

\begin{matrix} \hat{φ_{n}} (C, x) & = & \frac{1}{n ϕ (h_{n})} \sum_{i = 1}^{n} δ_{i} 1_{{Y_{i} \in C}} K (\frac{d_{E} (x, X_{i})}{h_{n}}), \\ \hat{f_{n}} (x) & = & \frac{1}{n ϕ (h_{n})} \sum_{i = 1}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}}) . \end{matrix}

Let

Δ_{i} (x) = K (\frac{d_{E} (x, X_{i})}{h_{n}})

. We have

\begin{matrix} \hat{φ_{n}} (C, x) & = & \frac{1}{n ϕ (h_{n})} \sum_{i = 1}^{n} 1_{{Y_{i} \in C}} δ_{i} Δ_{i} (x), \\ \hat{f_{n}} (x) & = & \frac{1}{n ϕ (h_{n})} \sum_{i = 1}^{n} δ_{i} Δ_{i} (x) . \end{matrix}

Henceforth, for

x \in E

, let us denote

E (\hat{φ_{n}} (C, x)) = \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} E (δ_{i} 1_{{Y_{i} \in C}} Δ_{i} (x) ∣ F_{i - 1}),

and

E (\hat{f_{n}} (x)) = \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} E (δ_{i} Δ_{i} (x) ∣ F_{i - 1}),

here,

E (X ∣ F)

represents the conditional expectation of the random variable X given the

σ

-field

F

.

To establish asymptotic normality, define the “bias” term as

\begin{matrix} B_{n} (x) & = & \frac{E (\hat{f_{n}} (x)) - μ_{n} (C, x) E (\hat{φ_{n}} (C, x))}{E (\hat{φ_{n}} (C, x))} \end{matrix}

The subsequent result presents the weak convergence. It is important to note that

f_{1} (x)

is specified in (H1).

Theorem 2

(Asymptotic normality). Assuming (H1)–(H7), as

n \to \infty

, for

m \geq 1

and

C_{1}, \dots, C_{m} \in C

, we have

{(ν_{n} (C_{i}, x))}_{i = 1, \dots, m} \overset{D}{⟶} N (0, Σ),

where

Σ = {(σ_{i j} (x))}_{i, j = 1, \dots, m}

and

σ_{i j} (x) = \frac{M_{2}}{P (x) M_{1}^{2} f_{1} (x)} (E (1_{{Y \in C_{i} \cap C_{j}}} ∣ X = x) - E (1_{{Y \in C_{i}}} ∣ X = x) E (1_{{Y \in C_{j}}} ∣ X = x)),

whenever

f_{1} (x) > 0

and

M_{1} = K (1) - \int_{0}^{1} K^{'} (u) τ_{0} (u) d u, M_{2} = K^{2} (1) - \int_{0}^{1} {(K^{2})}^{'} (u) τ_{0} (u) d u .

To obtain the tightness of the process, it is essential to introduce the following function, which provides insights into the asymptotic behavior of the modulus of continuity:

Λ_{γ} (σ^{2}, n) = \begin{cases} \sqrt{σ^{2} \log \frac{1}{σ^{2}}}, & \text{if } γ = 0; \\ \max ({(σ^{2})}^{(1 - γ) / 2}, {(n ϕ (h_{n}))}^{(3 γ - 1) / (2 (3 γ + 1))}), & \text{if } γ > 0 . \end{cases}

Theorem 3.

Assume that (H1)–(H7) are satisfied. For every

σ^{2} > 0

, consider

C_{σ} \subset C

as a class of measurable sets with

\sum_{t = 1}^{n} sup_{C \in C_{σ}} μ (C, x) \leq σ^{2} \leq 1,

and suppose that

C

fulfils (3) with

γ \geq 0

. Additionally, we assume that

ϕ (h_{n}) \to 0

and

n ϕ (h_{n}) \to + \infty

as

n \to + \infty

, such that

n ϕ (h_{n}) \leq {(Λ_{γ} (σ^{2}, n))}^{2},

and as

n \to + \infty,

we have

\frac{n ϕ (h_{n}) {(σ^{2} \log (\frac{1}{σ^{2}}))}^{1 + γ}}{\log (n)} \to \infty .

Furthermore, we assume that

σ^{2} \geq h^{2}

. For

γ > 0

and

d = 1, 2

, the latter has to be replaced by

σ^{2} \geq ϕ (h_{n}) log (\frac{1}{ϕ (h_{n})})

. Under the conditions of Theorem 2, the process converges in law to a Gaussian process

\{ν (C, x) : C \in C\}

, which possesses a version with uniformly bounded and uniformly continuous paths with respect to the

{∥ \cdot ∥}_{2} -

norm. The covariance is given by

σ_{i j} (x)

as specified in Theorem 2.

Remark 1.

The distance of two measures

μ_{1}

,

μ_{2}

in the Prokhorov metric is defined as (see, e.g., Refs. [76,77,78,79])

ρ_{P} (μ_{1}, μ_{2}) : = inf \{ε > 0 ∣ μ_{1} (B) \leq μ_{2} (B^{ε}) + ε, \forall Borel sets B \subset Ω\}

Here

B^{ε} = {x ∣ d (x, B) < ε}

, where

d (x, B)

is the distance of x to B, i.e.,

d (x, B) = {inf}_{z \in B} ∥ x - z ∥

. The distance of two random variables

ξ_{1}

,

ξ_{2}

in the Ky Fan metric is defined as [80]

ρ_{K} (ξ_{1}, ξ_{2}) : = inf \{ε > 0 ∣ μ \{ω \in Ω ∣ d (ξ_{1} (ω), ξ_{2} (ω)) > ε\} < ε\} .

It is worthwhile to establish an adequate link of our findings to these distances in the conditional setting.

Remark 2.

Central limit theorems are frequently utilized to establish confidence intervals for the target being estimated. In the realm of non-parametric estimation, the asymptotic variance

Σ (x) : = σ_{i, j} (x)

in the central limit depends on certain unknown functions. Consequently, in practical scenarios, only approximate confidence intervals can be derived, even when

Σ (x)

is functionally specified. Notably, according to Theorem 2, the limiting variance incorporates the unknown function

f_{1} (\cdot)

and the normalization is contingent on the function

ϕ (\cdot)

, which is not explicitly identifiable in practice. Furthermore, the quantities

W (\cdot)

and

τ_{0}

need to be estimated. The corollary below, a slight modification of Theorem 2, permits a practical form of the results to be used, as typically the conditional variance

W (x)

is estimated similarly to what is obtained by Ref. [63].

Let

\begin{aligned} W_{n} (x) & = \frac{\sum_{i = 1}^{n} δ_{i} {(1_{{Y_{i} \in C}} - μ_{n} (C, x))}^{2} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}})} \\ & = \frac{\sum_{i = 1}^{n} δ_{i} 1_{{Y_{i} \in C}} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}})} - {(μ_{n} (C, x))}^{2} \\ & = \hat{g_{n}} (x) - {(μ_{n} (C, x))}^{2} . \end{aligned}

Let us introduce the following estimation

F_{x, n} (t) = \frac{1}{n} \sum_{i = 1}^{n} 1_{\{d (x, X_{i}) \leq t\}} .

Using the decomposition of

F_{x} (\cdot)

in (H1)(i) together with (H1)(iv), one can estimate

τ_{0} (\cdot)

as

τ_{n} (t) = \frac{F_{x, n} (t h)}{F_{x, n} (h)} .

Subsequently, for a given kernel

K (\cdot)

, the quantities

M_{1}

and

M_{2}

can be estimated as follows

M_{1, n} = K (1) - \int_{0}^{1} K^{'} (s) τ_{n} (s) d s, M_{2, n} = K^{2} (1) - \int_{0}^{1} {(K^{2})}^{'} (s) τ_{n} (s) d s .

Finally, the estimator of

P (x)

is denoted by

P_{n} (x) = \frac{\sum_{i = 1}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1}^{n} K (\frac{d_{E} (x, X_{i})}{h_{n}})} .
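The plug-in quantities of Remark 2 can be computed directly from the distances d_E(x, X_i) and the indicators δ_i. The sketch below is a minimal illustration under assumptions that are not in the paper: the kernel K(u) = 2 - u on [0, 1] (so K' = -1 and (K^2)' = -2(2 - u)), a midpoint rule for the integrals, and simulated distances that are uniform on [0, 1], for which τ_0(s) = s. The function name `plug_in_quantities` is hypothetical.

```python
import numpy as np

def plug_in_quantities(dist, delta, h, m=2000):
    """Return F_{x,n}(h), M_{1,n}, M_{2,n} and P_n(x) for the kernel K(u) = 2 - u."""
    dist = np.asarray(dist, dtype=float)
    delta = np.asarray(delta, dtype=float)
    F_h = np.mean(dist <= h)                       # F_{x,n}(h)
    if F_h == 0.0:
        raise ValueError("no covariate within distance h of x")
    s = (np.arange(m) + 0.5) / m                   # midpoint grid on (0, 1)
    sorted_d = np.sort(dist)
    # F_{x,n}(s h) for every s at once: count the distances not exceeding s h.
    F_sh = np.searchsorted(sorted_d, s * h, side="right") / dist.size
    tau_n = F_sh / F_h                             # tau_n(s) = F_{x,n}(s h) / F_{x,n}(h)
    M1_n = 1.0 + np.mean(tau_n)                    # K(1) - int K'(s) tau_n(s) ds
    M2_n = 1.0 + np.mean(2.0 * (2.0 - s) * tau_n)  # K^2(1) - int (K^2)'(s) tau_n(s) ds
    w = np.where(dist <= h, 2.0 - dist / h, 0.0)   # K(d_E(x, X_i) / h)
    P_n = np.sum(delta * w) / np.sum(w)
    return F_h, M1_n, M2_n, P_n

rng = np.random.default_rng(2)
dist = rng.uniform(0.0, 1.0, 5000)                 # simulated distances (assumption)
delta = (rng.uniform(size=5000) < 0.8).astype(float)
F_h, M1_n, M2_n, P_n = plug_in_quantities(dist, delta, h=0.3)
print(F_h, M1_n, M2_n, P_n)
```

In this toy setting the targets are F_x(h) = 0.3, M_1 = 3/2, M_2 = 7/3 and P(x) = 0.8, so the output gives a rough check that the empirical ratio τ_n recovers τ_0 without knowing ϕ.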

Corollary 1.

Suppose that conditions (H1)–(H7) are satisfied, where

K^{'}

and

{(K^{2})}^{'}

are integrable functions. Additionally, assume that

n F_{x} (h) ⟶ \infty

and

h^{β} {(n F_{x} (h))}^{1 / 2} ⟶ 0

as

n \to \infty

. Then, for any

x \in E

such that

f_{1} (x) > 0

, we have

\frac{M_{1, n}}{\sqrt{M_{2, n}}} \sqrt{\frac{n F_{x, n} (h_{n}) P_{n} (x)}{W_{n} (x)}} (μ_{n} (C, x) - μ (C, x)) \overset{D}{⟶} N (0, 1) .

Using Corollary 1, the asymptotic

100 (1 - α) \%

confidence interval for μ (C, x) is given by

[μ_{n} (C, x) - c_{α} \frac{\sqrt{M_{2, n}}}{M_{1, n}} \sqrt{\frac{W_{n} (x)}{n F_{x, n} (h_{n}) P_{n} (x)}}, μ_{n} (C, x) + c_{α} \frac{\sqrt{M_{2, n}}}{M_{1, n}} \sqrt{\frac{W_{n} (x)}{n F_{x, n} (h_{n}) P_{n} (x)}}] ,

where

c_{α}

is the upper

\frac{α}{2}

quantile of the Normal distribution

N (0, 1)
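A short sketch of the interval computation follows. It inverts the pivot of Corollary 1: since (M_{1,n}/√M_{2,n}) √(n F_{x,n}(h_n) P_n(x)/W_n(x)) (μ_n(C, x) − μ(C, x)) is asymptotically standard normal, the half-width is c_α (√M_{2,n}/M_{1,n}) √(W_n(x)/(n F_{x,n}(h_n) P_n(x))). The function name and the numerical inputs are illustrative assumptions; in practice the inputs would be the plug-in estimates of Remark 2.

```python
import math
from statistics import NormalDist

def confidence_interval(mu_hat, W_n, M1_n, M2_n, n, F_h, P_n, alpha=0.05):
    """Asymptotic 100(1 - alpha)% interval for mu(C, x), from the pivot of Corollary 1."""
    c_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # upper alpha/2 quantile of N(0, 1)
    half_width = (c_alpha * math.sqrt(M2_n) / M1_n
                  * math.sqrt(W_n / (n * F_h * P_n)))
    return mu_hat - half_width, mu_hat + half_width

# Illustrative inputs (assumptions, not results from the paper).
lower, upper = confidence_interval(
    mu_hat=0.5, W_n=0.25, M1_n=1.5, M2_n=7.0 / 3.0, n=2000, F_h=0.3, P_n=0.8
)
print(lower, upper)
```

The interval is not clipped to [0, 1]; for sets C whose conditional probability is close to 0 or 1 a user may prefer to truncate it, which is a design choice outside the corollary.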

3.1. The Bandwidth Selection Criterion

Several approaches have been devised and refined to formulate asymptotically optimal bandwidth selection rules for nonparametric kernel estimators, particularly for the Nadaraya–Watson regression estimator. Some noteworthy contributions include [81,82,83,84,85,86,87]. Choosing this parameter appropriately is essential, whether in the conventional finite-dimensional case or within the infinite-dimensional framework, to guarantee favorable practical performance. Let us define the leave-out-

(X_{j}, Y_{j}, δ_{j})

estimator for the regression function

μ_{n, j} (C, x) = \frac{\sum_{i = 1, i \neq j}^{n} δ_{i} 1_{{Y_{i} \in C}} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1, i \neq j}^{n} δ_{i} K (\frac{d_{E} (x, X_{i})}{h_{n}})} .

(5)

To minimize the quadratic loss function, we introduce the following criterion, where we have a (known) nonnegative weight function

W (\cdot) :

C V (C, h) : = \frac{1}{n} \sum_{j = 1}^{n} {(δ_{j} 1_{{Y_{j} \in C}} - μ_{n, j} (C, X_{j}))}^{2} W (X_{j}) .

(6)

Building upon the concepts developed by Ref. [83], a natural approach for selecting the bandwidth is to minimize the preceding criterion. Thus, let us choose

{\hat{h}}_{n}

as the minimizer over h of

sup_{C \in C} C V (C, h) .

One can replace (6) by

C V (C, h_{n}) : = \frac{1}{n} \sum_{j = 1}^{n} {(δ_{j} 1_{{Y_{j} \in C}} - μ_{n, j} (C, X_{j}))}^{2} \hat{W} (X_{j}, x) .

(7)

In practice, one takes, for

j = 1, \dots, n

, the uniform global weights

W (X_{j}) = 1

, and the local weights

\hat{W} (X_{j}, x) = \begin{cases} 1 & \text{if } d (X_{j}, x) \leq h_{n}, \\ 0 & \text{otherwise} . \end{cases}

For brevity, we have concentrated on the most popular method, namely, the cross-validated selected bandwidth. This approach can be extended to any other bandwidth selector, such as the bandwidth based on Bayesian ideas [88].
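The cross-validation rule can be sketched as a grid search. The code below follows criterion (6) literally with the global weights W(X_j) = 1 and takes the supremum over a finite family of sets C = (−∞, z]. Several elements are assumptions made for illustration only: the kernel K(u) = 2 - u, the finite bandwidth grid, the scalar surrogate used to build the distance matrix, and the convention of rejecting any bandwidth that leaves some X_j without an observed neighbour. The function names are hypothetical.

```python
import numpy as np

def kernel(u):
    """Illustrative kernel K(u) = 2 - u on [0, 1]; u is assumed nonnegative."""
    return np.where(u <= 1.0, 2.0 - u, 0.0)

def cv_score(D, in_C, delta, h):
    """Supremum over the sets of criterion (6) with W = 1.

    D: (n, n) pairwise semi-metric distances; in_C: (n, m) indicators of
    Y_i in C_k for m sets; delta: (n,) observation indicators.
    """
    Wt = delta[None, :] * kernel(D / h)    # row j, column i: weight of observation i at X_j
    np.fill_diagonal(Wt, 0.0)              # leave observation j out
    denom = Wt.sum(axis=1)
    if np.any(denom == 0.0):
        return np.inf                      # assumption: reject h if some X_j is isolated
    pred = (Wt @ in_C) / denom[:, None]    # leave-one-out estimates mu_{n,j}(C_k, X_j)
    resid = delta[:, None] * in_C - pred
    return float(np.max(np.mean(resid ** 2, axis=0)))

def select_bandwidth(D, in_C, delta, grid):
    """Return the bandwidth of the grid minimising the criterion, with its score."""
    scores = [cv_score(D, in_C, delta, h) for h in grid]
    best = int(np.argmin(scores))
    return grid[best], scores[best]

rng = np.random.default_rng(3)
n = 300
a = rng.uniform(0.0, 2.0, n)                       # scalar surrogate for the curves
D = np.abs(a[:, None] - a[None, :])
Y = a + 0.5 * rng.standard_normal(n)
delta = (rng.uniform(size=n) < 0.8).astype(float)
in_C = (Y[:, None] <= np.array([0.5, 1.0, 1.5])).astype(float)
grid = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
h_hat, score = select_bandwidth(D, in_C, delta, grid)
print(h_hat, score)
```

Precomputing the distance matrix once keeps each evaluation of the criterion to a matrix product, which matters because the semi-metric between curves is usually the most expensive step.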

4. Applications to Classification with Partially Labeled Data

In this section, we apply the results developed in the previous sections to the problem of statistical classification. We consider a sample of random elements

(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})

drawn from the joint distribution of

(X, Y)

, where X takes values in a space

E

and Y in

R^{d}

. In classification, the objective is to predict the integer-valued label Y based on the covariate vector X. More formally, we aim to find a function (classifier)

θ : E ⟶ R^{d}

for which the probability of misclassification error (incorrect prediction), i.e.,

P (θ (X) \neq Y)

, is minimized. Let

P_{k} (x) = P (Y = k ∣ X = x), x \in E, 1 \leq k \leq n .

It can be shown that the optimal classifier, i.e., the one with the minimum probability of error, is given by

θ_{B} (x) = arg max_{1 \leq k \leq n} P_{k} (x),

i.e., the best classifier

θ_{B}

satisfies

max_{1 \leq k \leq n} P_{k} (x) = P_{θ_{B} (x)} (x) .
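As a small worked example of this rule, the sketch below picks the label with the largest posterior probability at a fixed x and reports the resulting conditional error, which equals one minus the largest posterior. The function name and the numerical values are illustrative assumptions.

```python
def bayes_classifier(posteriors):
    """Return (label, conditional error) for posteriors P_1(x), ..., P_K(x).

    Labels are numbered from 1, matching the indexing of P_k(x).
    """
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best + 1, 1.0 - posteriors[best]


label, error = bayes_classifier([0.2, 0.5, 0.3])
```

Here the second class is selected, and no other rule can achieve a smaller error at this x than 0.5.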

As

θ_{B}

is unknown, the data is utilized to construct estimates of

θ_{B}

. Specifically, let

D_{n} = \{(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})\}

represent a random sample from the distribution of

(X, Y)

, where each

(X_{i}, Y_{i})

is fully observable. Let

{\hat{θ}}_{n}

be any sample-based classifier. In other words,

{\hat{θ}}_{n} (X)

is the predicted value of Y, based on

D_{n}

and X. Let

L_{n} ({\hat{θ}}_{n}) = P ({\hat{θ}}_{n} (X) \neq Y ∣ D_{n}),

be the conditional probability of error of the sample-based classifier

{\hat{θ}}_{n}

. Then

{\hat{θ}}_{n}

is said to be consistent if

L_{n} ({\hat{θ}}_{n}) ⟶ L_{n} (θ_{B}) = P (θ_{B} (X) \neq Y)

as

n \to \infty

. For

k = 1, \dots, n

, let

{\hat{P}}_{k} (x)

be any sample-based estimator of

P_{k} (x) = P (Y = k ∣ X = x)

and define the classification rule

{\hat{θ}}_{n}

by

{\hat{θ}}_{n} (x) = arg max_{1 \leq k \leq n} {\hat{P}}_{k} (x) .

In other words,

{\hat{θ}}_{n}

satisfies

max_{1 \leq k \leq n} {\hat{P}}_{k} (x) = {\hat{P}}_{{\hat{θ}}_{n} (x)} (x) .

To show that

L_{n} ({\hat{θ}}_{n}) - L_{n} (θ_{B}) ⟶ 0

it is sufficient to show that

{\hat{P}}_{k} (x) - P_{k} (x) ⟶ 0

. Setting

δ_{i} = {\hat{P}}_{k} (x)

, we have

\begin{matrix} μ_{n} (C, x) = \frac{\sum_{i = 1}^{n} {\hat{P}}_{k} (x) 1_{{Y_{i} \in C}} K (\frac{d_{E} (x, X_{i})}{h_{n}})}{\sum_{i = 1}^{n} {\hat{P}}_{k} (x) K (\frac{d_{E} (x, X_{i})}{h_{n}})} . \end{matrix}

(8)

Theorem 4.

Under the conditions of Theorem 3, we have the convergence

L_{n} ({\hat{θ}}_{n}) - L_{n} (θ_{B}) ⟶ 0 .
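The plug-in rule of this section can be sketched as follows for partially labeled curves. This is a simplified illustration under stated assumptions: the posterior probabilities are estimated by a kernel average over the labeled pairs only, the curves are stored as values on a common grid, the semi-metric is a discretised L2 distance, and the kernel is uniform. All names are hypothetical and the sketch does not reproduce the construction in (8).

```python
import math


def kernel(u):
    """Hypothetical kernel: uniform on [0, 1]."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def l2_distance(f, g):
    """Discretised L2 distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))


def posterior_estimates(x, X, Y, delta, labels, h, d):
    """Kernel estimates of P_k(x) using only pairs whose label is observed."""
    weights = [kernel(d(x, X[i]) / h) if delta[i] == 1 else 0.0
               for i in range(len(X))]
    total = sum(weights)
    if total == 0.0:
        return {k: 1.0 / len(labels) for k in labels}   # no labeled neighbour: uninformative
    return {k: sum(w for w, y in zip(weights, Y) if y == k) / total
            for k in labels}


def classify(x, X, Y, delta, labels, h, d):
    """Plug-in rule: the label maximising the estimated posterior."""
    p_hat = posterior_estimates(x, X, Y, delta, labels, h, d)
    return max(labels, key=lambda k: p_hat[k])


# Toy curves t -> a * t on a three-point grid; None marks a missing label.
grid = [0.0, 0.5, 1.0]
slopes = [0.1, 0.2, 0.3, 1.8, 1.9, 2.0]
X = [[a * t for t in grid] for a in slopes]
Y = [1, 1, None, 2, None, 2]
delta = [1, 1, 0, 1, 0, 1]
x_new = [0.25 * t for t in grid]
label = classify(x_new, X, Y, delta, [1, 2], h=0.5, d=l2_distance)
```

Only the two labeled curves with small slopes fall within distance h of the new curve, so the rule assigns it to the first class.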

5. Concluding Remarks

In this investigation, we have examined the asymptotic properties of the conditional set-indexed empirical process involving ergodic functional data that are missing at random (MAR). Our findings are obtained under assumptions pertaining to the richness of the index class

C

of sets in terms of metric entropy with bracketing. Our contribution is two-fold: first, we have developed a functional methodology for addressing MAR samples in non-parametric problems, and second, we have extended our non-parametric conditional methodology by incorporating the ergodicity concepts introduced in Ref. [44]. Several challenging open questions remain in this context, including potential extensions to other types of non-parametric predictors such as functional local linear predictors, functional kNN predictors, and others. Furthermore, exploring extensions to problems beyond prediction, such as the estimation of variance error, is an interesting avenue for future research. Another direction for future exploration is the consideration of reducing the predictor’s dimensionality by employing a Single Functional Index Model (SFIM) to estimate the regression, as discussed in Refs. [89,90]. SFIM has shown its effectiveness in improving the consistency of the regression operator estimator.

Author Contributions

Conceptualization, S.B.; methodology, S.B.; validation, S.B., Y.S. and F.M.; formal analysis, S.B. and Y.S.; investigation, S.B. and Y.S.; original draft preparation, S.B. and Y.S.; writing—review and editing, S.B. and Y.S.; supervision, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors would like to thank the Editor-in-Chief, an Associate-Editor, and the three referees for their extremely helpful remarks, which resulted in a substantial improvement of the original form of the work and a presentation that was more sharply focused.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The proofs of our results are presented in this section. The notation introduced earlier is also utilized in the subsequent sections.

Lemma A1.

Assume that conditions (H1(i))–(H1(ii))–(H1(iv))–(H5) hold true for any real numbers

1 \leq j \leq 2 + δ

and

1 \leq k \leq 2 + δ

with

δ > 0

. As

n \to \infty

, we have:

(i): $\frac{1}{ϕ (h)} E (Δ_{i}^{j} (x) ∣ F_{i - 1}) = M_{j} f_{i, 1} (x) + O_{a.s.} (\frac{g_{i, x} (h)}{ϕ (h)})$ ;
(ii): $\frac{1}{ϕ (h)} E (Δ_{1}^{j} (x)) = M_{j} f_{1} (x) + o (1)$ ;
(iii): $\frac{1}{ϕ^{k} (h)} {(E (Δ_{1} (x)))}^{k} = M_{1}^{k} f_{1}^{k} (x) + o (1)$ .

Proof of Lemma A1.

For the proof of Lemma A1, the reader is directed to Ref. [66]. □

Lemma A2.

Assume that the hypotheses (H1) and (H5), along with condition (H7), are satisfied. As

n \to \infty

, for every fixed neighborhood

N_{E}

of x in the functional space

E

, we have:

lim_{n \to \infty} sup_{x \in N_{E}} \hat{f_{n}} (x) = lim_{n \to \infty} E (\hat{f_{n}} (x)) = P (x) .

Proof of Lemma A2.

We shall prove that

\hat{f_{n}} (x) \overset{P}{⟶} P (x) .

(A1)

We follow the proof given in Ref. [63]. Observe that

\hat{f_{n}} (x) = R_{1, n} (x) + R_{2, n} (x),

where

\begin{matrix} R_{1, n} (x) & = & \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} (δ_{i} Δ_{i} (x) - E (δ_{i} Δ_{i} (x) ∣ F_{i - 1})), \\ R_{2, n} (x) & = & \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} E [δ_{i} Δ_{i} (x) ∣ F_{i - 1}] . \end{matrix}

First, we establish that, under assumptions (H1)(i–iii) and (H3)(i), and provided that

n ϕ (h) \to \infty

as

n \to \infty

, we have

\begin{matrix} R_{2, n} (x) \overset{P}{⟶} P (x) . \end{matrix}

Using the properties of conditional expectation and the missing at random (MAR) mechanism, and combining assumptions (H1)(ii,iii) and (H3)(i) with the continuity property of

P (x)

along with Lemma A1, we derive:

\begin{matrix} R_{2, n} (x) & = & \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} E \{E [δ_{i} Δ_{i} (x) ∣ G_{i - 1}] ∣ F_{i - 1}\} \\ & = & \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} E [(P (x) + o (1)) Δ_{i} (x) ∣ F_{i - 1}] \\ & = & (P (x) + o (1)) \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} (ϕ (h) M_{1} f_{i, 1} (x) + O_{a.s.} (g_{i, x} (h))) \\ & = & (P (x) + o (1)) \frac{ϕ (h)}{E (Δ_{1} (x))} (\frac{1}{n} \sum_{i = 1}^{n} M_{1} f_{i, 1} (x) + \frac{1}{n} \sum_{i = 1}^{n} O_{a.s.} (\frac{g_{i, x} (h)}{ϕ (h)})) \\ & = & (P (x) + o (1)) \frac{1}{M_{1} f_{1} (x) + o (1)} (M_{1} (f_{1} (x) + o (1)) + o_{a.s.} (1)) \\ & \to & P (x) . \end{matrix}

Second, we will prove that as

n \to \infty,

\begin{matrix} R_{1, n} (x) \overset{P}{⟶} 0 . \end{matrix}

To this end, we define

η_{n, i} (x) = δ_{i} Δ_{i} (x) - E (δ_{i} Δ_{i} (x) ∣ F_{i - 1})

for

i = 1, \dots, n

. Thus,

η_{n, i}, 1 \leq i \leq n

forms a triangular array of martingale differences with respect to the

σ

-field

F_{i - 1}

and

R_{1, n} (x) = \frac{1}{n E (Δ_{1} (x))} \sum_{i = 1}^{n} η_{n, i} (x) .

By combining Burkholder’s inequality [91] and Jensen’s inequality, we establish that for any

ϵ > 0

, there exists a constant

C_{0}

such that:

\begin{matrix} P (| R_{1, n} (x) | > ϵ) & = & P (| \sum_{i = 1}^{n} η_{n, i} (x) | > ϵ n E (Δ_{1} (x))) \\ & \leq & C_{0} \frac{E (η_{n, i}^{2} (x))}{ϵ^{2} n {(E (Δ_{1} (x)))}^{2}} \leq C_{0} \frac{E (δ_{1} Δ_{1}^{2} (x))}{ϵ^{2} n {(E (Δ_{1} (x)))}^{2}} \to 0, \end{matrix}

as

n \to \infty,

where we have used Lemma A1 and the fact that

n ϕ (h) \to \infty

as

n \to \infty

. We then conclude that

R_{1, n} (x) = o_{P} (1) .

Thus, the proof is complete. □
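The convergence in (A1) can be checked numerically in a simple special case. The sketch below is an illustration under strong simplifying assumptions that are not those of the paper: the covariate is a scalar uniform variable on [0, 1] rather than a curve, the observations are independent (a particular case of stationary ergodic data), the kernel is uniform, and the observation probability P(X) = 0.3 + 0.4 X is a hypothetical choice. With x = 0.5, the target value is P(x) = 0.5.

```python
import random


def simulate_f_hat(n, x=0.5, h=0.1, seed=0):
    """Monte Carlo value of f_hat_n(x) = sum_i delta_i Delta_i(x) / (n E(Delta_1(x)))."""
    rng = random.Random(seed)
    expected_delta1 = 2.0 * h              # E(Delta_1(x)) = P(|X - x| <= h) for X ~ U(0, 1)
    total = 0.0
    for _ in range(n):
        covariate = rng.random()
        p = 0.3 + 0.4 * covariate          # hypothetical P(delta = 1 | X)
        observed = 1.0 if rng.random() < p else 0.0
        kernel_value = 1.0 if abs(covariate - x) <= h else 0.0   # Delta_i(x)
        total += observed * kernel_value
    return total / (n * expected_delta1)


estimate = simulate_f_hat(200_000)
```

The formula used for `expected_delta1` requires the interval [x - h, x + h] to lie inside [0, 1], which holds for the default values.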

We will utilize arguments akin to those employed in the work of Ref. [63] to establish the asymptotic normality of the process

Q_{n} (x)

defined as:

Q_{n} (x) = [\hat{φ_{n}} (C, x) - E (\hat{φ_{n}} (C, x))] - μ_{n} (C, x) [\hat{f_{n}} (x) - E (\hat{f_{n}} (x))] .

Lemma A3.

Assuming that the hypotheses (H1)–(H7) are fulfilled, we can state that for any

x \in E

such that

f_{1} (x) > 0

, we have:

\sqrt{n ϕ (h_{n})} Q_{n} (x) \overset{D}{⟶} N (0, σ_{0}^{2} (x)), \text{ as } n \to \infty .

(A2)

where

σ_{0}^{2} (x) = \frac{M_{2} W (x) P (x)}{M_{1}^{2} f_{1} (x)},

whenever

f_{1} (x) > 0

Proof of Lemma A3.

Let us introduce some notation. We put

η_{n i} = {(\frac{ϕ (h)}{n})}^{1 / 2} δ_{i} (1_{{Y_{i} \in C}} - μ (x)) \frac{Δ_{i} (x)}{E (Δ_{1} (x))},

(A3)

and define

ξ_{n i} = η_{n i} - E (η_{n i} ∣ F_{i - 1})

. It is easily seen that

{(n ϕ (h))}^{1 / 2} Q_{n} (x) = \sum_{i = 1}^{n} ξ_{n i} .

(A4)

Here, for any fixed

x \in E

, the terms in (A4) form a triangular array of stationary martingale differences with respect to the

σ

-field

F_{i - 1}

. This allows us to apply the central limit theorem for discrete-time arrays of real-valued martingales (refer to Ref. [92], page 23) to establish the asymptotic normality of

Q_{n} (x)

. This can be accomplished by verifying the following statements:

(a): $\sum_{i = 1}^{n} E (ξ_{n i}^{2} ∣ F_{i - 1}) ⟶ σ_{0}^{2} (x),$
(b): $n E (ξ_{n i}^{2} 1_{| ξ_{n i} | > ϵ}) = o (1)$ holds for any $ϵ > 0$ (Lindeberg condition).

Proof of Part (a).

Observe first that

|\sum_{i = 1}^{n} E (η_{n i}^{2} ∣ F_{i - 1}) - \sum_{i = 1}^{n} E (ξ_{n i}^{2} ∣ F_{i - 1})| \leq \sum_{i = 1}^{n} {(E (η_{n i} ∣ F_{i - 1}))}^{2} .

Making use of the condition (H2) and Lemma A1, one has

\begin{array}{l} | E (η_{n i} ∣ F_{i - 1}) | \\ = \frac{1}{E (Δ_{1} (x))} {(\frac{ϕ (h)}{n})}^{1 / 2} | E ((μ (X_{i}) - μ (x)) Δ_{i} (x) P (X_{i}) ∣ F_{i - 1}) | \\ \leq \frac{1}{E (Δ_{1} (x))} {(\frac{ϕ (h)}{n})}^{1 / 2} sup_{u \in B (x, h)} | μ (u) - μ (x) | E (Δ_{i} (x) ∣ F_{i - 1}) (o (1) + P (x)) \\ \leq O (h^{β}) {(\frac{ϕ (h)}{n})}^{1 / 2} (\frac{f_{i, 1} (x)}{f_{1} (x)} + O_{a.s.} (\frac{g_{i, x} (h)}{ϕ (h)})) (o (1) + P (x)) . \end{array}

Thus, by (H1)(ii,iii), we have

\begin{matrix} \sum_{i = 1}^{n} {(E (η_{n i} ∣ F_{i - 1}))}^{2} & = & O_{a.s.} (h^{2 β} ϕ (h)) (\frac{1}{f_{1}^{2} (x)} \frac{1}{n} \sum_{i = 1}^{n} f_{i, 1}^{2} (x) + o_{a.s.} (1)) {(o (1) + P (x))}^{2} \\ & = & O_{a.s.} (ϕ (h) h^{2 β}) . \end{matrix}

(A5)

The statement (a) follows then if we show that

lim_{n \to \infty} \sum_{i = 1}^{n} E (η_{n i}^{2} ∣ F_{i - 1}) = σ_{0}^{2} (x) .

(A6)

To prove (A6), observe that

\begin{matrix} \sum_{i = 1}^{n} E (η_{n i}^{2} ∣ F_{i - 1}) & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E [{(1_{{Y_{i} \in C}} - μ (x))}^{2} δ_{i} Δ_{i}^{2} (x) ∣ F_{i - 1}] \\ & = & J_{1 n} + J_{2 n}, \end{matrix}

where

\begin{matrix} J_{1 n} & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E [{(1_{{Y_{i} \in C}} - μ (X_{i}))}^{2} δ_{i} Δ_{i}^{2} (x) ∣ F_{i - 1}], \end{matrix}

and

J_{2 n} = \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E [{(μ (X_{i}) - μ (x))}^{2} δ_{i} Δ_{i}^{2} (x) ∣ F_{i - 1}] .

Hence, leveraging the properties of conditional expectation, we derive:

\begin{matrix} J_{1 n} & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E \{E [{(1_{{Y_{i} \in C}} - μ (X_{i}))}^{2} δ_{i} Δ_{i}^{2} (x) ∣ B_{i - 1}] ∣ F_{i - 1}\} \\ & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E \{Δ_{i}^{2} (x) E [{(1_{{Y_{i} \in C}} - μ (X_{i}))}^{2} δ_{i} ∣ X_{i}] ∣ F_{i - 1}\} \\ & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E [W (X_{i}) P (X_{i}) Δ_{i}^{2} (x) ∣ F_{i - 1}] . \end{matrix}

Likewise, with the assumptions (H2)(ii,iii) and (H4)(i), along with the aid of Lemma A1 once more, it follows that, as

n \to \infty

:

\begin{matrix} J_{1 n} & = & \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E [(o (1) + W (x)) (o (1) + P (x)) Δ_{i}^{2} (x) ∣ F_{i - 1}] \\ & = & \frac{ϕ^{2} (h)}{{(E (Δ_{1} (x)))}^{2}} \frac{1}{n ϕ (h)} \sum_{i = 1}^{n} (o (1) + W (x)) (o (1) + P (x)) (M_{2} ϕ (h) f_{i, 1} (x) + O_{a.s.} (g_{i, x} (h))) \\ & \to & \frac{M_{2} W (x) P (x)}{M_{1}^{2} f_{1} (x)} = σ_{0}^{2} (x) . \end{matrix}

Again, combining Lemma A1 with conditions (H1)(ii), and (H3)(ii,iii), it is evident that:

lim_{n \to \infty} J_{1 n} = \frac{M_{2} W (x) P (x)}{M_{1}^{2} f_{1} (x)},

almost surely, whenever

f_{1} (x) > 0

. Consider now the term

J_{2 n}

. Utilizing conditions (H1)(ii,iii) and (H2)(i) alongside Lemma A1, we can express, as

n \to \infty

:

\begin{matrix} | J_{2 n} | & = & O (h^{2 β}) \frac{ϕ (h)}{n {(E (Δ_{1} (x)))}^{2}} \sum_{i = 1}^{n} E (δ_{i} Δ_{i}^{2} (x) ∣ F_{i - 1}) \\ & = & O (h^{2 β}) (\frac{M_{2}}{M_{1}^{2}} \frac{1}{f_{1} (x)} + o_{a.s.} (1)) \to 0, \text{ almost surely}, \end{matrix}

(A7)

whenever

f_{1} (x) > 0

. This completes the proof of Part (a).

Proof of Part (b).

The Lindeberg condition results from Corollary 9.5.2 in Ref. [93], which implies that

n E (ξ_{n i}^{2} 1 (| ξ_{n i} | > ε)) \leq 4 n E (η_{n i}^{2} 1 (| η_{n i} | > ε / 2)) .

Let

a > 1

and

b > 1

such that

\frac{1}{a} + \frac{1}{b} = 1

. Applying Hölder and Markov inequalities, one can express, for all

ε > 0

:

E (η_{n i}^{2} 1 (| η_{n i} | > ε / 2)) \leq \frac{E | η_{n i} |^{2 a}}{{(ε / 2)}^{2 a / b}},

where

C_{0}

is a positive constant and

2 a = 2 + δ

. Utilizing

δ

from the condition (H3)(iii) of conditional moments, we obtain:

\begin{array}{l} 4 n E (η_{n i}^{2} 1 (| η_{n i} | > ε / 2)) \\ \leq C_{0} {(\frac{ϕ (h)}{n})}^{(2 + δ) / 2} \frac{n}{{(E (Δ_{1} (x)))}^{2 + δ}} E ({[| 1_{{Y_{i} \in C}} - μ (x) | δ_{i} Δ_{i} (x)]}^{2 + δ}) \\ \leq C_{0} {(\frac{ϕ (h)}{n})}^{(2 + δ) / 2} \frac{n}{{(E (Δ_{1} (x)))}^{2 + δ}} E (E ({| 1_{{Y_{i} \in C}} - μ (x) |}^{2 + δ} δ_{i} {(Δ_{i} (x))}^{2 + δ} ∣ X_{i})) \\ \leq C_{0} {(\frac{ϕ (h)}{n})}^{(2 + δ) / 2} \frac{n}{{(E (Δ_{1} (x)))}^{2 + δ}} E ({(Δ_{i} (x))}^{2 + δ} P (X_{i}) {\bar{W}}_{2 + δ} (X_{i})) \\ \leq C_{0} {(\frac{ϕ (h)}{n})}^{(2 + δ) / 2} \frac{n}{{(E (Δ_{1} (x)))}^{2 + δ}} E [{(Δ_{1} (x))}^{2 + δ} (P (x) + o (1)) ({\bar{W}}_{2 + δ} (x) + o (1))] \\ \leq C_{0} {(n ϕ (h))}^{- δ / 2} \frac{M_{2 + δ} f_{1} (x) + o (1)}{M_{1}^{2 + δ} f_{1}^{2 + δ} (x) + o (1)} (P (x) {\bar{W}}_{2 + δ} (x) + o (1)) \\ = O ({(n ϕ (h))}^{- δ / 2}), \end{array}

where the last equality follows from Lemma A1. This concludes the proof of part (b) as

n ϕ (h) \to \infty

when

n \to \infty

. Thus, the proof is complete. □
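For the reader's convenience, the two computations used in Part (b) can be spelled out as follows. This is a sketch of the standard argument written in plain LaTeX notation (with \eta_{ni}, \phi, \varepsilon and \delta standing for the corresponding symbols above). The second display uses only the orders of magnitude E(Δ_1^{2+δ}(x)) = O(ϕ(h)) and E(Δ_1(x)) of exact order ϕ(h), as given by Lemma A1, with all constants suppressed.

```latex
% Step 1: Hoelder's inequality with conjugate exponents (a, b),
% followed by Markov's inequality at level \varepsilon/2.
E\big(\eta_{ni}^{2}\, 1(|\eta_{ni}| > \varepsilon/2)\big)
  \leq \big(E|\eta_{ni}|^{2a}\big)^{1/a}\, P\big(|\eta_{ni}| > \varepsilon/2\big)^{1/b}
  \leq \big(E|\eta_{ni}|^{2a}\big)^{1/a}
       \Big(\frac{E|\eta_{ni}|^{2a}}{(\varepsilon/2)^{2a}}\Big)^{1/b}
  = \frac{E|\eta_{ni}|^{2a}}{(\varepsilon/2)^{2a/b}},
  \qquad \text{since } \frac{1}{a} + \frac{1}{b} = 1 .

% Step 2: bookkeeping of the exponents with 2a = 2 + \delta.
\Big(\frac{\phi(h)}{n}\Big)^{(2+\delta)/2} \cdot n \cdot
  \frac{\phi(h)}{\phi(h)^{2+\delta}}
  = n^{\,1-(2+\delta)/2}\, \phi(h)^{(2+\delta)/2 + 1 - (2+\delta)}
  = \big(n\,\phi(h)\big)^{-\delta/2} .
```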

Proof of Theorem 1.

By Lemma A3 it follows that

\sqrt{n ϕ (h_{n})} Q_{n} (x) = O_{P} (1) .

Combining this with Lemma A2 yields the result. □

Proof of Theorem 2.

The result follows from (A1), (A2), and Slutsky’s theorem. □

Proof of Theorem 3.

Let us recall some facts. Let

f (\cdot) = δ_{i} 1_{{\cdot \in C_{1}}}

and

g (\cdot) = δ_{i} 1_{{\cdot \in C_{2}}}

. Given random measures

λ

on

(X, X)

, we define

d_{λ}^{(2)} (f, g) : = {[λ {(f - g)}^{2}]}^{1 / 2} .

Say that a class of functions

F

has uniformly integrable entropy with respect to

L_{2}

-norm if

\int_{0}^{\infty} sup_{γ \in M (X, F)} {[ln N (ϵ {[γ (F^{2})]}^{1 / 2}, F, d_{γ}^{(2)})]}^{1 / 2} d ϵ < \infty,

where

d_{γ}^{(2)} (f, g) : = {[\int_{X} {(f - g)}^{2} d γ]}^{1 / 2} .

If the class

F

possesses uniformly integrable entropy,

(F, d_{γ}^{(2)})

is totally bounded for any measure

γ

. Let

κ

be an envelope of

F

, i.e.,

κ

is a measurable function mapping

R

to

[0, \infty)

such that:

sup_{f \in F} | f (t) | \leq κ (t), for all t \in R .

Let

M (R, κ)

be the set of all measures

γ

on

(R, F)

with

γ (κ^{2}) : = \int_{R} κ^{2} d γ < \infty,

(A8)

and

d_{γ}^{(r)} (f, g) : = {[\int_{R} {(f - g)}^{r} d γ]}^{1 / r} .

Given random measures

λ

on

(R, F)

, we define

d_{λ}^{(2)} (f, g) : = {[λ {(f - g)}^{2}]}^{1 / 2} .

Let us introduce the uniform entropy integral

J (δ, F, d_{γ}^{(2)}) = \int_{0}^{δ} sup_{γ \in M (R, κ)} {[log (N (ϵ {[γ (κ^{2})]}^{1 / 2}, F, d_{γ}^{(2)}))]}^{1 / 2} d ϵ .

We say that

F

has uniformly integrable entropy with respect to

L_{2}

-norm if

J (\infty, F, d_{γ}^{(2)}) < \infty .

(A9)
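To see why condition (A9) is a mild requirement, one can evaluate the entropy integral numerically for a class whose covering numbers grow polynomially. The sketch below assumes the hypothetical bound N(ε) = max(1, (A/ε)^V), typical of Vapnik–Červonenkis-type classes; this bound, the constants A and V, and the midpoint rule are illustrative choices, not conditions taken from the paper. For A = V = 1 the integral equals Γ(3/2) = √π / 2 ≈ 0.886, whatever the upper limit beyond 1.

```python
import math


def entropy_integral(upper, A=1.0, V=1.0, steps=100_000):
    """Midpoint-rule value of the integral of sqrt(log N(eps)) over (0, upper),
    for the assumed covering bound N(eps) = max(1, (A / eps) ** V)."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        eps = (k + 0.5) * width
        log_cover = max(0.0, V * math.log(A / eps))   # log N(eps); zero once eps >= A
        total += math.sqrt(log_cover) * width
    return total


value = entropy_integral(upper=5.0)
```

The integrand is unbounded near zero but integrable, and it vanishes for ε ≥ A, which is why extending the upper limit to infinity does not change the value.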

If the class

F

possesses uniformly integrable entropy,

(F, d_{γ}^{(2)})

is totally bounded for any measure

γ

. Let

B (φ) : φ \in F

be a Gaussian process whose sample paths are contained in

U_{b} (F, d_{γ}^{(2)}) : = \{f \in ℓ^{\infty} (F) : f \text{ is uniformly continuous with respect to } d_{γ}^{(2)}\} .

Let

L (•)

denote the law of •. Notice that obtaining a uniform CLT essentially means that we show the following convergence

\{L (A_{n, φ}) : φ \in F\} \to \{L (B (φ)) : φ \in F\},

where the processes are indexed by

F

and considered as random elements of the bounded real-valued functions on

F

defined by

ℓ^{\infty} (F) : = \{f : F \to R : {∥ f ∥}_{F} : = sup_{φ \in F} | f (φ) | < \infty\},

(A10)

which is a Banach space equipped with the sup norm. In the following, we employ the weak convergence in the sense of Ref. [94], which we recap in the following definition. Throughout the paper,

E^{*}

denotes the upper expectation with respect to the outer probability

P^{*}

; for further details and discussion, refer to Ref. [1] (p. 6) and Ref. [95] (§6.2, p. 88). □

Definition A1.

A sequence of

ℓ^{\infty} (F)

-valued random functions

\{T_{n} : n \geq 1\}

converges in law to a

ℓ^{\infty} (F)

-valued Borel measurable random function T whose law concentrates on a separable subset of

ℓ^{\infty} (F)

, denoted

T_{n} ⇝ T

, if,

E g (T) = lim_{n \to \infty} E^{*} g (T_{n}), \forall g \in C (ℓ^{\infty} (F), {∥ \cdot ∥}_{F}),

where

C (ℓ^{\infty} (F), ∥ \cdot ∥_{F})

is the set of all bounded

{∥ \cdot ∥}_{F}

-continuous functions from

(ℓ^{\infty} (F), ∥ \cdot ∥_{F})

into

R

.

We set

η_{n; i} (f, x) : = η_{n; i} (C_{1}, x) : = {(\frac{ϕ (h)}{n})}^{1 / 2} (δ_{i} 1_{{Y_{i} \in C_{1}}} - μ (C_{1}, x)) \frac{Δ_{i} (x)}{E (Δ_{1} (x))},

with

Δ_{i} (x) = K (h^{- 1} d (x, X_{i}))

, and define

η_{n; i} (g, x)

in a similar way. Let

ξ_{n; i} (f, x) : = η_{n; i} (f, x) - E (η_{n; i} (f, x) ∣ F_{i - 1}) .

Let us define

σ_{n}^{2} (f, g) = \sum_{i = 1}^{n} {(ξ_{n; i} (f, x) - ξ_{n; i} (g, x))}^{2} .

To establish Theorem 3, we can rely on Theorem 2 of [96] (see also Refs. [10,13,15]). It is sufficient to demonstrate that, for all constant

L > 0

, as n tends to infinity:

P^{*} \{sup_{f, g \in F} \frac{σ_{n}^{2} (f, g)}{{(d_{μ_{n}}^{(2)} (f, g))}^{2}} > L\} \to 0,

(A11)

which is implied by the following,

E^{*} sup_{d^{(2)} (f, g) \leq δ_{n}} \sum_{i = 1}^{n} \frac{E ({(ξ_{n; i} (f, x) - ξ_{n; i} (g, x))}^{2} ∣ F_{i - 1})}{{(d^{(2)} (f, g))}^{2}} \to 0, \text{ as } δ_{n} \to 0,

where we recall

d^{(2)} (f, g) : = {[\int_{R} {(f - g)}^{2} d P]}^{1 / 2} .
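For indicator functions, this distance has a simple interpretation: since the squared difference of the indicators of C_1 and C_2 is the indicator of their symmetric difference, the squared distance is the probability that Y falls in that symmetric difference. The following minimal sketch computes the empirical version on a toy sample; the intervals, the sample, and the function names are illustrative assumptions.

```python
import math


def indicator(low, high):
    """Indicator function of the closed interval [low, high]."""
    return lambda y: 1.0 if low <= y <= high else 0.0


def empirical_l2_distance(f, g, sample):
    """Empirical counterpart of d^(2)(f, g): root mean squared difference."""
    return math.sqrt(sum((f(y) - g(y)) ** 2 for y in sample) / len(sample))


sample = list(range(10))                  # toy responses 0, 1, ..., 9
f = indicator(0, 5)
g = indicator(3, 8)
distance = empirical_l2_distance(f, g, sample)
```

Six of the ten sample points (0, 1, 2 and 6, 7, 8) lie in exactly one of the two intervals, so the empirical distance is the square root of 0.6.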

In the rest of the proof, denote by

β_{n} (x) = \frac{\sqrt{ϕ (h)}}{E [Δ_{1} (x)]},

and

ζ (f, x) = ζ (C_{1}, x) : = (δ_{i} 1_{{Y_{i} \in C_{1}}} - μ (C_{1}, x)) Δ_{i} (x) .

Therefore, we have the following

\begin{array}{l} \sum_{i = 1}^{n} \frac{E ({(ξ_{n; i} (f, x) - ξ_{n; i} (g, x))}^{2} ∣ F_{i - 1})}{d^{(2)} (f, g)} \\ = \frac{β_{n}^{2} (x)}{n d^{(2)} (f, g)} \sum_{i = 1}^{n} E [{((ζ (f, x) - ζ (g, x)) - E [ζ (f, x) - ζ (g, x) ∣ F_{i - 1}])}^{2} ∣ F_{i - 1}] \\ \leq \frac{β_{n}^{2} (x)}{n d^{(2)} (f, g)} \sum_{i = 1}^{n} (2 E [{(ζ (f, x) - ζ (g, x))}^{2} ∣ F_{i - 1}] - 2 E \{{[E [(ζ (f, x) - ζ (g, x)) ∣ F_{i - 1}]]}^{2}\}) \\ : = T_{1, n} + T_{2, n} . \end{array}

We first evaluate

T_{1, n}

. We have

\begin{matrix} T_{1, n} & \leq & \frac{2 β_{n}^{2} (x)}{n d^{(2)} (f, g)} \sum_{i = 1}^{n} (2 E [Δ_{i}^{2} (x) {(δ_{i} f (Y_{i}) - δ_{i} g (Y_{i}))}^{2} ∣ F_{i - 1}] + 2 E [δ_{i} Δ_{i}^{2} (x) {(μ (C_{1}, x) - μ (C_{2}, x))}^{2} ∣ F_{i - 1}]) \\ & : = & T_{1, n, 1} + T_{1, n, 2} . \end{matrix}

Using the fact that

E (Δ_{1}^{2} (x)) = O (ϕ (h))

(as indicated in Lemma A1), and taking into account that the class of functions

F

has a constant envelope and

K (\cdot)

is both bounded and bounded away from zero, one can obtain the following upper bound for the last equation, where C is a positive constant:

\begin{matrix} T_{1, n, 1} & \leq & \frac{C \sqrt{ϕ (h)}}{d^{(2)} (f, g)} E [Δ_{1} (x) (f (Y_{1}) - g (Y_{1}))] \\ & \leq & \frac{C \sqrt{ϕ (h)}}{d^{(2)} (f, g)} {(E [Δ_{1}^{2} (x)])}^{1 / 2} {(E [{(f (Y_{1}) - g (Y_{1}))}^{2}])}^{1 / 2} \\ & = & \frac{C \sqrt{ϕ (h)}}{{\bar{G}}^{2} (ζ)} {(E [Δ_{1}^{2} (x)])}^{1 / 2} \\ & = & O (ϕ (h)) = o (1) . \end{matrix}

Making use of similar arguments, we infer that

\begin{matrix} T_{1, n, 2} & = & \frac{C ϕ {(h)}^{3 / 2}}{d^{(2)} (f, g)} {(E [δ (f (Y) - g (Y)) ∣ X = x])}^{2} = O (ϕ {(h)}^{3 / 2}) = o (1) . \end{matrix}

We readily obtain that,

T_{1, n} = o (1) .

By employing arguments akin to those utilized in the proof of the previous statement, we can establish that

T_{2, n} = o (1) .

Using the Lindeberg conditions from the preceding proof and (A11), along with Theorem 1 of [96], we deduce that for a given

ε > 0

and

γ > 0

, there exists

η > 0

, such that:

\underset{n \to \infty}{lim sup} P^{*} \{sup_{d (C_{1}, C_{2}) \leq η} | ν_{n} (C_{1}, x) - ν_{n} (C_{2}, x) | \geq 5 γ\} \leq 3 ε .

(A12)

Now, the proof of the theorem is completed by combining this last equation with Theorem 3.

References

1. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer Series in Statistics; With applications to statistics; Springer: New York, NY, USA, 1996; pp. xvi+508.
2. Shorack, G.R.; Wellner, J.A. Empirical Processes with Applications to Statistics; Classics in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2009; Volume 59, pp. xli+956.
3. Dudley, R.M. Uniform Central Limit Theorems; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1999; Volume 63, pp. xiv+436.
4. Vapnik, V.N.; Červonenkis, A.J. The uniform convergence of frequencies of the appearance of events to their probabilities. Teor. Verojatnost. i Primenen. 1971, 16, 264–279.
5. Dudley, R.M. Central limit theorems for empirical measures. Ann. Probab. 1978, 6, 899–929.
6. Giné, E.; Zinn, J. Some limit theorems for empirical processes. Ann. Probab. 1984, 12, 929–998.
7. Le Cam, L. A remark on empirical measures. In A Festschrift for Erich Lehmann in Honor of His Sixty-Fifth Birthday; Wadsworth Statist./Probab. Ser.; UC Berkeley Statistics: Wadsworth, OH, USA; Belmont, CA, USA, 1983; pp. 305–327.
8. Pollard, D. A central limit theorem for empirical processes. J. Aust. Math. Soc. Ser. A 1982, 33, 235–248.
9. Bass, R.F.; Pyke, R. A strong law of large numbers for partial-sum processes indexed by sets. Ann. Probab. 1984, 12, 268–271.
10. Bouzebda, S.; Soukarieh, I. Renewal type bootstrap for U-process Markov chains. Markov Process. Relat. Fields 2022, 28, 673–735.
11. Alvarez-Andrade, S.; Bouzebda, S.; Lachal, A. Strong approximations for the p-fold integrated empirical process with applications to statistical tests. Test 2018, 27, 826–849.
12. Bouzebda, S. Some applications of the strong approximation of the integrated empirical copula processes. Math. Methods Stat. 2016, 25, 281–303.
13. Soukarieh, I.; Bouzebda, S. Renewal type bootstrap for increasing degree U-process of a Markov chain. J. Multivar. Anal. 2023, 195, 105143.
14. Bouzebda, S.; Soukarieh, I. Limit theorems for a class of processes generalizing the U-empirical process. Stochastics 2024, 1–36.
15. Soukarieh, I.; Bouzebda, S. Exchangeably Weighted Bootstraps of General Markov U-Process. Mathematics 2022, 10, 3745.
16. Yoshihara, K.I. Conditional empirical processes defined by ϕ-mixing sequences. Comput. Math. Appl. 1990, 19, 149–158.
17. Eberlein, E. Weak convergence of partial sums of absolutely regular sequences. Stat. Probab. Lett. 1984, 2, 291–293.
18. Nobel, A.; Dembo, A. A note on uniform laws of averages for dependent processes. Stat. Probab. Lett. 1993, 17, 169–172.
19. Yu, B. Rates of convergence for empirical processes of stationary mixing sequences. Ann. Probab. 1994, 22, 94–116.
20. Bouzebda, S.; Nemouchi, B. Central limit theorems for conditional empirical and conditional U-processes of stationary mixing sequences. Math. Methods Stat. 2019, 28, 169–207.
21. Andrews, D.W.K.; Pollard, D. An Introduction to Functional Central Limit Theorems for Dependent Stochastic Processes. Int. Stat. Rev. Rev. Int. Stat. 1994, 62, 119–132.
22. Doukhan, P.; Massart, P.; Rio, E. Invariance principles for absolutely regular empirical processes. Ann. Inst. H. Poincaré Probab. Stat. 1995, 31, 393–427.
23. Polonik, W.; Yao, Q. Set-indexed conditional empirical and quantile processes based on dependent data. J. Multivar. Anal. 2002, 80, 234–255.
24. Bosq, D. Linear Processes in Function Spaces; Lecture Notes in Statistics; Theory and Applications; Springer: New York, NY, USA, 2000; Volume 149, pp. xiv+283.
25. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005; pp. xx+426.
26. Cuevas, A. A partial overview of the theory of statistics with functional data. J. Stat. Plan. Inference 2014, 147, 1–23.
27. Goia, A.; Vieu, P. An introduction to recent advances in high/infinite dimensional statistics [Editorial]. J. Multivar. Anal. 2016, 146, 1–6.
28. Aneiros, G.; Cao, R.; Fraiman, R.; Genest, C.; Vieu, P. Recent advances in functional data analysis and high-dimensional statistics. J. Multivar. Anal. 2019, 170, 3–9.
29. Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949.
30. Chowdhury, J.; Chaudhuri, P. Multi-sample comparison using spatial signs for infinite dimensional data. Electron. J. Stat. 2022, 16, 4636–4678.
31. Chowdhury, J.; Chaudhuri, P. Convergence rates for kernel regression in infinite-dimensional spaces. Ann. Inst. Stat. Math. 2020, 72, 471–509.
32. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer Series in Statistics; Theory and Practice; Springer: New York, NY, USA, 2006; pp. xx+258.
33. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012; pp. xiv+422.
34. Bosq, D.; Blanke, D. Inference and Prediction in Large Dimensions; Wiley Series in Probability and Statistics; John Wiley & Sons, Ltd.: Chichester, UK; Dunod, Scotland; Paris, France, 2007; pp. x+316.
35. Shi, J.Q.; Choi, T. Gaussian Process Regression Analysis for Functional Data; CRC Press: Boca Raton, FL, USA, 2011; pp. xx+196.
36. Zhang, J.T. Analysis of Variance for Functional Data; Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA, 2014; Volume 127, pp. xxiv+386.
37. Bongiorno, E.G.; Goia, A.; Salinelli, E.; Vieu, P. An overview of IWFOS’2014. In Contributions in Infinite-Dimensional Statistics and Related Topics; Esculapio: Bologna, Italy, 2014; pp. 1–5.
38. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Wiley Series in Probability and Statistics; John Wiley & Sons, Ltd.: Chichester, UK, 2015; pp. xiv+334.
39. Aneiros, G., Bongiorno, E.G., Cao, R., Vieu, P., Eds.; Functional statistics and related fields. In Proceedings of the 4th International Workshop on Functional and Operational Statistics, IWFOS, Corunna, Spain, 15–17 June 2017; Springer: Cham, Switzerland, 2017; pp. xxiv+288.
40. Berrahou, N.; Bouzebda, S.; Douge, L. Functional uniform-in-bandwidth moderate deviation principle for the local empirical processes involving functional data. Math. Methods Stat. 2024, 33, 1–43.
41. Poryvaĭ, D.V. An invariance principle for conditional empirical processes formed by dependent random variables. Izv. Ross. Akad. Nauk Ser. Mat. 2005, 69, 129–148.
42. Bouzebda, S.; Madani, F.; Souddi, Y. Some Asymptotic Properties of the Conditional Set-Indexed Empirical Process Based on Dependent Functional Data. Int. J. Math. Stat. 2022, 22, 77–105.
43. Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional Z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872.
44. Souddi, Y.; Madani, F.; Bouzebda, S. Some characteristics of the conditional set-indexed empirical process involving functional ergodic data. Bull. Inst. Math. Acad. Sin. (New Ser.) 2021, 16, 367–399.
45. Bouzebda, S.; Soukarieh, I. Nonparametric conditional U-processes for locally stationary functional random fields under stochastic sampling design. Mathematics 2022, 10, 16.
46. Soukarieh, I.; Bouzebda, S. Weak Convergence of the Conditional U-statistics for Locally Stationary Functional Time Series. Stat. Inference Stoch. Process 2024, 16, 1–78.
47. Bouzebda, S.; Nezzal, A. Uniform in number of neighbors consistency and weak convergence of kNN empirical conditional processes and kNN conditional U-processes involving functional mixing data. AIMS Math. 2024, 9, 4427–4550.
48. Cheng, P.E. Nonparametric estimation of mean functionals with data missing at random. J. Am. Stat. Assoc. 1994, 89, 81–87.
49. Cheng, P.E.; Chu, C.K. Kernel estimation of distribution functions and quantiles with missing data. Stat. Sin. 1996, 6, 63–78.
50. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1987; pp. xvi+278.
51. Nittner, T. Missing at random (MAR) in nonparametric regression - A simulation experiment. Stat. Methods Appl. 2003, 12, 195–210.
52. Tsiatis, A.A. Semiparametric Theory and Missing Data; Springer Series in Statistics; Springer: New York, NY, USA, 2006; pp. xvi+383.
53. Wang, Q.; Sun, Z. Estimation in partially linear models with missing responses at random. J. Multivar. Anal. 2007, 98, 1470–1493.
54. Wang, Q. Probability density estimation with data missing at random when covariables are present. J. Stat. Plan. Inference 2008, 138, 568–587.
55. Liang, H.; Wang, S.; Carroll, R.J. Partially linear models with missing response variables and error-prone covariates. Biometrika 2007, 94, 185–198.
56. Efromovich, S. Nonparametric regression with responses missing at random. J. Stat. Plan. Inference 2011, 141, 3744–3752.
57. Efromovich, S. Nonparametric regression with predictors missing at random. J. Am. Stat. Assoc. 2011, 106, 306–319.
58. Tang, N.; Zhao, P.; Zhu, H. Empirical likelihood for estimating equations with nonignorably missing data. Stat. Sin. 2014, 24, 723–747.
59. Müller, U.U.; Schick, A. Efficiency transfer for regression models with responses missing at random. Bernoulli 2017, 23, 2693–2719.
60. Müller, U.U.; Schick, A. Efficiency for heteroscedastic regression with responses missing at random. J. Stat. Plan. Inference 2018, 196, 132–143.
61. Shen, Y.; Liang, H.Y. Quantile regression and its empirical likelihood with missing response at random. Stat. Pap. 2018, 59, 685–707.
62. Ferraty, F.; Sued, M.; Vieu, P. Mean estimation with data missing at random for functional covariables. Statistics 2013, 47, 688–706.
63. Ling, N.; Liang, L.; Vieu, P. Nonparametric regression estimation for functional stationary ergodic data with missing at random. J. Stat. Plan. Inference 2015, 162, 75–87.
64. Ling, N.; Liu, Y.; Vieu, P. Conditional mode estimation for functional stationary ergodic data with responses missing at random. Statistics 2016, 50, 991–1013.
65. Wang, L.; Cao, R.; Du, J.; Zhang, Z. A nonparametric inverse probability weighted estimation for functional data with missing response data at random. J. Korean Stat. Soc. 2019, 48, 537–546.
66. Laib, N.; Louani, D. Nonparametric kernel regression estimation for functional stationary ergodic data: Asymptotic properties. J. Multivar. Anal. 2010, 101, 2266–2281.
67. Didi, S.; Bouzebda, S. Wavelet Density and Regression Estimators for Continuous Time Functional Stationary and Ergodic Processes. Mathematics 2022, 10, 4356.
68. Didi, S.; Al Harby, A.; Bouzebda, S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics 2022, 10, 3433.
69. Nadaraja, E.A. On a regression estimate. Teor. Verojatnost. i Primen. 1964, 9, 157–159.
70. Watson, G.S. Smooth regression analysis. Sankhyā Ser. A 1964, 26, 359–372.
Stute, W. Conditional empirical processes. Ann. Stat. 1986, 14, 638–647. [Google Scholar] [CrossRef]
Stute, W. On almost sure convergence of conditional empirical distribution functions. Ann. Probab. 1986, 14, 891–901. [Google Scholar] [CrossRef]
Horváth, L.; Yandell, B.S. Asymptotics of conditional empirical processes. J. Multivar. Anal. 1988, 26, 184–206. [Google Scholar] [CrossRef]
Ferraty, F.; Mas, A.; Vieu, P. Nonparametric regression on functional data: Inference and practical aspects. Aust. N. Z. J. Stat. 2007, 49, 267–286. [Google Scholar] [CrossRef]
Dudley, R.M. A course on empirical processes. In École d’été de Probabilités de Saint-Flour, XII—1982; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1984; Volume 1097, pp. 1–142. [Google Scholar] [CrossRef]
Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley Series in Probability and Statistics: Probability and Statistics; A Wiley-Interscience Publication; John Wiley & Sons, Inc.: New York, NY, USA, 1999; pp. x+277. [Google Scholar] [CrossRef]
Huber, P.J. Robust Statistics; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1981; pp. ix+308. [Google Scholar]
Parthasarathy, K.R. Probability Measures on Metric Spaces; Reprint of the 1967 original; AMS Chelsea Publishing: Providence, RI, USA, 2005; pp. xii+276. [Google Scholar] [CrossRef]
Hofinger, A. The metrics of Prokhorov and Ky Fan for assessing uncertainty in inverse problems. Österreich. Akad. Wiss. Math.-Natur. Kl. Sitzungsber. II 2006, 215, 107–125. [Google Scholar] [CrossRef]
Fan, K. Entfernung zweier zufälligen Grössen und die Konvergenz nach Wahrscheinlichkeit. Math. Z. 1944, 49, 681–683. [Google Scholar] [CrossRef]
Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
Hall, P. Asymptotic properties of integrated square error and cross-validation for kernel estimation of a regression function. Z. Wahrsch. Verw. Geb. 1984, 67, 175–196. [Google Scholar] [CrossRef]
Rachdi, M.; Vieu, P. Nonparametric regression for functional data: Automatic smoothing parameter selection. J. Stat. Plan. Inference 2007, 137, 2784–2801. [Google Scholar] [CrossRef]
Dony, J.; Mason, D.M. Uniform in bandwidth consistency of conditional U-statistics. Bernoulli 2008, 14, 1108–1133. [Google Scholar] [CrossRef]
Bouzebda, S. On the weak convergence and the uniform-in-bandwidth consistency of the general conditional U-processes based on the copula representation: Multivariate setting. Hacet. J. Math. Stat. 2023, 52, 1303–1348. [Google Scholar] [CrossRef]
Bouzebda, S.; Taachouche, N. On the variable bandwidth kernel estimation of conditional U-statistics at optimal rates in sup-norm. Phys. A 2023, 625, 129000. [Google Scholar] [CrossRef]
Bouzebda, S. General tests of conditional independence based on empirical processes indexed by functions. Jpn. J. Stat. Data Sci. 2023, 6, 115–177. [Google Scholar] [CrossRef]
Shang, H.L. Bayesian bandwidth estimation for a functional nonparametric regression model with mixed types of regressors and unknown error density. J. Nonparametr. Stat. 2014, 26, 599–615. [Google Scholar] [CrossRef]
Bouzebda, S.; Laksaci, A.; Mohammedi, M. The k-nearest neighbors method in single index regression model for functional quasi-associated time series data. Rev. Mat. Complut. 2023, 36, 361–391. [Google Scholar] [CrossRef]
Bouzebda, S.; Laksaci, A.; Mohammedi, M. Single index regression model for functional quasi-associated time series data. Revstat 2022, 20, 605–631. [Google Scholar]
Hall, P.; Heyde, C.C. Martingale Limit Theory and Its Application; Probability and Mathematical Statistics; Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers]: New York, NY, USA; London, UK, 1980; pp. xii+308. [Google Scholar]
Györfi, L.; Morvai, G.; Yakowitz, S.J. Limits to consistent on-line forecasting for ergodic time series. IEEE Trans. Inf. Theory 1998, 44, 886–892. [Google Scholar] [CrossRef]
Chow, Y.S.; Teicher, H. Probability Theory, 3rd ed.; Springer Texts in Statistics; Independence, interchangeability, martingales; Springer: New York, NY, USA, 1997; pp. xxii+488. [Google Scholar] [CrossRef]
Hoffmann-Jørgensen, J. Stochastic Processes on Polish Spaces; Various Publications Series (Aarhus); Aarhus Universitet, Matematisk Institut: Aarhus, Denmark, 1991; Volume 39, pp. ii+278. [Google Scholar]
Kosorok, M.R. Introduction to Empirical Processes and Semiparametric Inference; Springer Series in Statistics; Springer: New York, NY, USA, 2008; pp. xiv+483. [Google Scholar] [CrossRef]
Bae, J.; Jun, D.; Levental, S. The uniform CLT for martingale difference arrays under the uniformly integrable entropy. Bull. Korean Math. Soc. 2010, 47, 39–51. [Google Scholar] [CrossRef][Green Version]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

