Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time

DIDI, Sultana; AL HARBY, Ahoud; BOUZEBDA, Salim

doi:10.3390/math10193433

Open AccessArticle

Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time

by

Sultana DIDI

^1,†

,

Ahoud AL HARBY

^2,† and

Salim BOUZEBDA

^3,*,†

¹

Department of Statistics, College of Sciences, Qassim University, P.O. Box 6688, Buraydah 51452, Saudi Arabia

²

Department of Mathematics, College of Sciences, Qassim University, P.O. Box 6688, Buraydah 51452, Saudi Arabia

³

LMAC (Laboratory of Applied Mathematics of Compiègne), Université de Technologie de Compiègne, 60200 Compiègne, France

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Mathematics 2022, 10(19), 3433; https://doi.org/10.3390/math10193433

Submission received: 6 August 2022 / Revised: 14 September 2022 / Accepted: 17 September 2022 / Published: 21 September 2022

(This article belongs to the Special Issue Functional Data Analysis: Theory and Applications to Different Scenarios)

Download Versions Notes

Abstract

The nonparametric estimation of density and regression function based on functional stationary processes using wavelet bases for Hilbert spaces of functions is investigated in this paper. The mean integrated square error over adapted decomposition spaces is given. To obtain the asymptotic properties of wavelet density and regression estimators, the Martingale method is used. These results are obtained under some mild conditions on the model; aside from ergodicity, no other assumptions are imposed on the data. This paper extends the scope of some previous results for wavelet density and regression estimators by relaxing the independence or the mixing condition to the ergodicity. Potential applications include the conditional distribution, curve discrimination, and time series prediction from a continuous set of past values.

Keywords:

multivariate regression estimation; multivariate density estimation; stationarity; ergodicity; rates of strong convergence; wavelet-based estimators; martingale differences; conditional distribution; curve discrimination

MSC:

62G07; 62G08; 62G05; 62G20; 62H05; 60G42; 60G46

1. Introduction

The statistical literature has recently become more interested in statistical issues concerning studying functional random variables or variables with values in an infinite-dimensional space. The availability of data measured on ever-finer temporal/spatial grids, such as in meteorology, medicine, satellite images, and many other research disciplines, is driving the growth of this research issue, statistically modeling these data as random functions revealed many complex theoretical and numerical research challenges. The reader may consult the monographs for a summary of the theoretical and practical aspects of functional data analysis. The work in Bosq [1] concerns linear models for random variables with values in a Hilbert space, Ramsay and Silverman [2] discussed scalar-on-function and function-on-function linear models, functional principal component analysis, and parametric discriminant analysis. The work in [3], on the other hand, concentrates on nonparametric methods, particularly kernel-type estimation for scalar-on-function nonlinear regression models. Such tools were extended to classification and discrimination analysis. Horváth and Kokoszka [4] discussed the application of several interesting statistical concepts to the functional data framework, including goodness-of-fit tests, portmanteau tests, and change point problems. The work in [5] focuses on analyzing variance for functional data, whereas that in [6] is more concerned with regression analysis for Gaussian processes. Recent studies and surveys on functional data modeling and analysis can be found in [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22].

Motivated by diverse applications and their helpful role in statistical inference, the problem of estimating conditional models has been subjected to a wide range of statistical literature, employing many types of estimation approaches, the most common of which are the traditional kernel methods. Such methods, however, may have some limitations when estimating compactly supported or discontinuous curves at boundary points. Alternative wavelet methods are prominent due to their adaptability to discontinuities in the curve to be estimated. In practice, the wavelet procedure provides a simple estimation algorithm to implement and compute. For more information on wavelets theory, we refer to [23,24,25,26] and others. The work in [27] discusses wavelet approximation properties in detail, and surveys the use of wavelets in various curve estimation problems. Some wavelet theory applications are discussed in [28] by considering the estimation of the integrated squared derivative density function in the independent unidimensional case. The results in [28] were then extended by [29] to estimate the derivatives of a density for negatively and positively associated sequences, respectively. Rao [30] proposed wavelet estimators for the partial derivatives of a multivariate probability density function, where the rates of almost certain convergence for the independence case are obtained. We cite [31] on estimating partial derivatives of a multivariate probability density function in the presence of additive noise. At this point, we refer to [32]. In the i.i.d. framework, Ref. [33] investigated the density and regression estimation problems for functional data. The authors [33] developed a new adaptive procedure based on the term-by-term selection of wavelet coefficient estimators using wavelet bases for Hilbert spaces of functions. The primary goal of this paper is to extend the previous reference to stationary ergodic processes. To our knowledge, the consideration of the general dependence framework for wavelet analysis is unexplored, which motivates this study. Allow

\{X_{n}, n \in Z\}

to be a stationary sequence. Consider the backward field

A_{n} = σ (X_{k} : k \leq n)

and the forward field

B_{n} = σ (X_{k} : k \geq n)

. The sequence is strongly mixing if

sup_{A \in A_{0}, B \in B_{n}} | P (A \cap B) - P (A) P (B) | = α (n) \to 0 as n \to \infty .

The sequence is ergodic if

lim_{n \to \infty} \frac{1}{n} \sum_{k = 0}^{n - 1} |P (A \cap τ^{- k} B) - P (A) P (B)| = 0,

where

τ

is the time-evolution or shift transformation. The naming of strong mixing in the above definition is more stringent than what is ordinarily referred to (when using the vocabulary of measure-preserving dynamical systems) as strong mixing, namely to that

lim_{n \to \infty} P (A \cap τ^{- n} B) = P (A) P (B)

for any two measurable sets

A, B

, see, for instance, Ref. [34]. As a result, strong mixing implies ergodicity, whereas the converse is not always true (see, for example, Remark 2.6 on page 50 concerning Proposition 2.8 on page 51 in [35]). Some reasons for considering ergodic dependence structure in data rather than a mixing structure are discussed in [36,37,38,39,40,41,42,43,44], where details on the definition of ergodic property of processes are given, as well as illustrative examples of such processes. One of the arguments used in [45] to justify the ergodic setting is that for certain classes of processes, proving ergodic properties rather than the mixing condition can be much easier. As a result, the ergodicity hypothesis appears to be the best fit and offers a better framework for studying data series generated by noisy chaos. The work in [45] provided an example of an ergodic but non-mixing process in their discussion, which can be summarized as follows: Let

(T_{i}, λ_{i}) : i \in Z

be a strictly stationary process such that

T_{i} ∣ T_{i - 1}

is a Poisson process with parameter

λ i

, where

T_{i}

is the

s i g m a

-field generated by

(T_{i}, λ_{i}, T_{i - 1}, \dots)

. Assume that

λ_{i} = f (λ_{i - 1}, T_{i - 1}),

and

f : [0, \infty] \times N \to (0, \infty)

is a given function. This process is not mixing in general (see Remark 3 of [46]). It is known that any sequence

{(ε_{i})}_{i \in Z}

of i.i.d. random variables is ergodic. Hence, according to Proposition 2.10 in [35], it is easy to see that

{(Y_{i})}_{i \in Z}

with

Y_{i} = ϑ ((\dots, ε_{i - 1}, ε_{i}), (ε_{i + 1}, ε_{i + 2}, \dots)),

for some Borel-measurable function

ϑ (\cdot)

.

The primary goal of this paper is to provide the first complete theoretical justification for wavelet-based functional density and regression function estimation for stationary processes. To our knowledge, the mean integrated square error over adapted decomposition spaces using wavelet estimators in functional ergodic data frameworks has not yet been considered in the literature and thus remains a fundamentally unsolved open problem. By combining several Martingale theory techniques used in the mathematical development of the proofs, we hope to fill this gap in the literature. These tools are not the same as those used in regression estimation under strong mixing or in an independent setting. However, as we will see later, combining existing ideas and results is not enough to solve the problem. Dealing with wavelet estimators in an ergodic setting will involve detailed mathematical derivations.

The following is how the paper is structured. The multiresolution analysis is introduced in Section 2. The main results for density estimation are presented in Section 3. The main results for the regression estimation are presented in Section 4. Some potential applications are listed in Section 5. Section 6 contains some concluding remarks. Section 7 contains a collection of all proofs.

2. Multiresolution Analysis

We will now introduce some basic notation to define wavelet bases for Hilbert spaces of functions following [33,47] with some changes necessary for our setting. In our work, we consider nonlinear, thresholded, wavelet-based estimators. Firstly, we initiate our study by describing elements of the basic theory of wavelet methods and introducing nonlinear wavelet-based estimators; the interested reader may refer to [23,24], see also [48,49] and the references therein, although the wavelet bases on a separable Hilbert space

H

of real or complex-valued functions on a complete separable metric space was introduced later by [47], which we briefly recall here for the sake of reader’s convenience. Let

H

be a separable Hilbert space of real-valued functions defined on a complete separable metric space

S

. Since the space

H

is separable, it has an orthonormal basis

E = \{e_{j} : j \in Δ\},

where

Δ

is a countable index set. The space

H

is equipped with an inner product

〈 \cdot, \cdot 〉

and a norm

‖ \cdot ‖

.

Consider the sequence of subsets

\{I_{k}; k \geq 0\}

an increasing sequence of finite subsets of

Δ

such that

⋃_{k \geq 0} I_{k} = Δ .

The subset

J_{k}

denotes the orthogonal complement of

I_{k}

in

I_{k + 1}

, i.e.,

J_{k} = I_{k + 1} I_{k} .

Choose, for any

k \geq 0

,

ζ_{k, ℓ} \in S

,

ℓ \in I_{k}

and

η_{k, ℓ} \in S

,

ℓ \in J_{k}

, such that the following matrices

A_{k} = {(e_{j} (ζ_{k, ℓ}))}_{(j, ℓ) \in I_{k} \times I_{k}}, B_{k} = {(e_{j} (η_{k, ℓ}))}_{(j, ℓ) \in J_{k} \times J_{k}},

(1)

satisfy one of the two following conditions, for instance, see [33,47] and the references therein.

(A.1): $A_{k}^{*} A_{k} = diag {(a_{k, ℓ})}_{ℓ \in I_{k}}$ and $B_{k}^{*} B_{k} = diag {(b_{k, ℓ})}_{ℓ^{'} \in J_{k}}$ where $a_{k, ℓ}$ and $b_{k, ℓ}$ for $ℓ \in I_{k}$ and $ℓ^{'} \in J_{k}$ are positive constants.
(A.2): $A_{k} A_{k}^{*} = diag {(c_{k, ℓ})}_{ℓ \in I_{k}}$ and $B_{k} B_{k}^{*} = diag {(d_{k, ℓ})}_{ℓ^{'} \in J_{k}}$ where $c_{k, ℓ}$ and $d_{k, ℓ}$ for $ℓ \in I_{k}$ and $ℓ^{'} \in J_{k}$ are positive constants.

The condition (A.1) implies that

a_{k, ℓ} = \sum_{j \in I_{k}} | e_{j} (ζ_{k, ℓ}) |^{2}, ℓ \in I_{k}, and b_{k, ℓ} = \sum_{j \in J_{k}} {| e_{j} (η_{k, ℓ}) |}^{2}, ℓ \in I_{k},

(2)

which means that all the columns of

A_{k}

and

B_{k}

are not the zero vector. As for (A.2), it gives

c_{k, ℓ} = \sum_{ℓ \in I_{k}} | e_{j} (ζ_{k, ℓ}) |^{2}, j \in I_{k}, and d_{k, ℓ} = \sum_{ℓ \in J_{k}} {| e_{j} (η_{k, ℓ}) |}^{2}, j \in I_{k},

(3)

indicating that all the rows of

A_{k}

and

B_{k}

are not the zero vector. For any

x \in S

, we set

\{\begin{matrix} ϕ_{k} (\cdot; ζ_{k, ℓ}) = \sum_{j \in I_{k}} \frac{1}{\sqrt{g_{j, k, ℓ}}} \bar{e_{j} (ζ_{k, ℓ})} e_{j} (\cdot), \\ ψ_{k} (\cdot; η_{k, ℓ}) = \sum_{j \in J_{k}} \frac{1}{\sqrt{h_{j, k, ℓ}}} \bar{e_{j} (η_{k, ℓ})} e_{j} (\cdot), \end{matrix}

(4)

where

g_{j, k, ℓ} = \{\begin{matrix} a_{k, ℓ} if (A . 1), \\ c_{k, ℓ} if (A . 2), \end{matrix} h_{j, k, ℓ} = \{\begin{matrix} b_{k, ℓ} if (A . 1), \\ d_{k, ℓ} if (A . 2), \end{matrix}

(5)

The following collection form an orthonormal basis of

H

(see Theorem 2 of [47]):

B = \{ϕ_{0} (x, ζ_{0, ℓ}), ℓ \in I_{0}; ψ_{k} (x, η_{k, ℓ}), k \geq 0, ℓ \in J_{k}\} .

(6)

For more details, see [33,47,50]. Hence, we conclude that for any

f \in H

, we have

f (x) = \sum_{ℓ \in I_{0}} α_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k \geq 0} \sum_{ℓ \in J_{k}} β_{k, ℓ} ψ_{k} (x; η_{k; ℓ}),

(7)

where

α_{0, ℓ} = 〈 f, ϕ_{0} (\cdot; ζ_{0, ℓ}) 〉, β_{k, ℓ} = 〈 f, ψ_{k} (\cdot; η_{k; ℓ}) 〉 .

(8)

In the following, we add two assumptions on the orthonormal basis

E

:

(E.1)

There exists a constant

C_{1} > 0

such that, for any integer

k \geq 0

, one has

(i): $\sum_{j \in I_{k}} \frac{1}{g_{j, k, ℓ}} {|e_{j} (ζ_{k, ℓ})|}^{2} \leq C_{1},$
(ii): $\sum_{j \in I_{k}} \frac{1}{h_{j, k, ℓ}} {|e_{j} (η_{k, ℓ})|}^{2} \leq C_{1} .$

(E.2)

There exists a constant

C_{2} > 0

such that, for any integer

k \geq 0

, one has

sup_{x \in S} \sum_{j \in J_{k}} {|e_{j} (x)|}^{2} \leq C_{2} | J_{k} | .

Remark 1.

Clearly, one can see that assumption (E.1)is satisfied under assumption (A.1)when taking

C_{1} = 1

, we may also refer to [47], Section 4, Example 2 and its applications for more details. [47,50] have presented three examples verifying the assumption (E.2)taking

sup_{x \in S} \sum_{j \in J_{k}} {|e_{j} (x)|}^{2} \leq 1,

see also [50] Theorem 3.2. Moreover, Ref. [33] have used both assumptions in the case of i.i.d and functional data.

Besov Space

Over the years, many researchers have addressed, from a statistical point of view, the following question: given an estimation method and a prescribed estimation rate for a given loss function, what is the maximal space over which this rate is achieved, for instance, see [27,51] and the references therein. We are interested in the estimation methods based on wavelet bases’ thresholding procedures in a natural setting. It is well known that wavelet bases provide characterizations of smoothness spaces such as the Hölder spaces

C^{s}

, Sobolev spaces

W^{s} (L_{p})

, and Besov spaces

B_{q}^{s} (L_{p})

for a range of indices s that depend both on the smoothness properties of

ψ

and its dual function

\tilde{ψ}

, for instance, we refer to [51] for more detail and examples, at this point, we may refer to [52]. From a statistical point of view, the following definition is used in approximation theory for the study of nonlinear procedures such as thresholding and greedy algorithms, for instance we refer to [27,49,51,53].

Definition 1

(Besov space). Let

s > 0

. We say that the function

f \in H

, defined by (7), belongs to the Besov space

B_{\infty}^{s} (H)

if and only if:

sup_{m \geq 0} | J_{m} |^{2 s} \sum_{k \geq m} \sum_{ℓ \in J_{k}} {| β_{k, ℓ} |}^{2} < \infty .

(9)

Definition 2

(Weak Besov space). Let

r > 0

. We say that the function

f \in H

, defined by (7), belongs to the weak Besov space

W^{r} (H)

if and only if:

sup_{λ \geq 0} λ^{r} \sum_{k \geq 0} \sum_{ℓ \in J_{k}} 𝟙_{\{| β_{k, ℓ} | \geq λ\}} < \infty .

(10)

3. Problem Definition of the Density Estimation

Let

{X_{i}, Y_{i}}_{i \geq 1}

be a sequence strictly stationary ergodic pairs of random elements where

Y_{i}

is a real or complex-valued variable and

X_{i}

takes values in a complete separable metric space of Hilbert space

S

associated with the corresponding Borel

σ

-algebra

B

. Let

P_{X}

be the probability measure induced by

X_{1}

on

(S, B)

. Suppose that there exists

σ

-finite measure

ν

on the measurable space

(S, B)

such that

P_{X}

is dominated by

ν

. The Radon–Nikodym theorem ensures the existence of a non-negative measurable function

f (\cdot)

such that

P_{X} (B) = \int_{B} f (x) ν (d x), B \in B .

(11)

In this context, we aim to estimate

f (\cdot)

based on n observed functional data

X_{1}, \dots, X_{n}

– Examples of such random elements

X_{1}

are stochastic processes with continuous sample paths on a finite interval

[a, b]

with

S = C [a, b]

associated with supremum norm and processes with square integrable sample paths on the real line when

S = L_{2} (R)

. We suppose that

f \in H

, where

H

is a separable Hilbert space of real or complex-valued functions defined on

S

and square-integrable with respect to the

σ

-finite measure

ν

. In this paper, we are particularly interested in the wavelet estimation procedures developed in the 1990s; see Meyer’s work for the functional data of a Hilbert space, more precisely, the nonlinear estimators. The majority of the approaches carried out in this model consist in introducing kernel estimators techniques to estimate the model’s functional part, refer to [54]. Let

f (\cdot)

be the common density function of the sample

X_{1}, \dots, X_{n}

, which is assumed to be

(F.1): $\exists C_{f} > 0$ a known constant such that

$sup_{x \in S} f (x) \leq C_{f} .$

(12)

3.1. Density Function Estimator

From now, we assume that the density function

f (\cdot) \in H

, a separable Hilbert space. Then,

f (\cdot)

fulfills the wavelet representation (7). Suppose that we observe a sequence

{\{(X_{i}, Y_{i})\}}_{i = 1}^{n}

of copies of

(X, Y)

that is assumed to be functional stationary and ergodic with

X

admitting the density function

f (\cdot)

. We study density estimation through wavelet bases for Hilbert spaces of functions developed by [47]. We consider the estimates of the coefficients

{α_{k, ℓ}}

and

{β_{k, ℓ}}

given, respectively, by (14) and (15). For any

j_{0} \leq m

. Here, the resolution level

m = m (n) \to \infty

at a rate specified below, since we assume that

ϕ (\cdot)

and

ψ_{i} (\cdot)

have a compact support so that the summations in (7) are finite for each fixed

x

(note that in this case the support of

ϕ (\cdot)

and

ψ_{i} (\cdot)

is a monotonically increasing function of their degree of differentiability [24]). We focus our attention on the nonlinear estimators (13) which will be studied in the mean integrated squared error over adapted decomposition spaces, in a similar way as in [33] in the setting of the i.i.d functional processes. The density wavelet hard thresholding estimator

\hat{f} (\cdot)

is defined, for all

x \in S

, by

{\hat{f}}_{n} (x) = \sum_{ℓ \in I_{0}} {\hat{α}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\hat{β}}_{k, ℓ} 𝟙_{\{| {\hat{β}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

(13)

where

\begin{matrix} {\hat{α}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(14)

\begin{matrix} {\hat{β}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

(15)

Here,

κ

is a large enough constant and

m_{n}

is the integer satisfying

\frac{1}{2} \frac{n}{ln n} \leq | J_{m_{n}} | \leq \frac{n}{ln n} .

(16)

3.2. Estimation Procedure Steps

Our estimation method is divided into three steps:

1.: Estimation of the wavelet coefficients $α_{k, ℓ}$ and $β_{k, ℓ}$ , see (8), by the estimators ${\hat{α}}_{k, ℓ}$ and ${\hat{β}}_{k, ℓ}$ defined by Equations (14) and (15);
2.: Applying the hard thresholding to select the greatest ${\hat{β}}_{k, ℓ}$ ;
3.: Reconstructing the selected elements of the initial wavelet basis.

It is important to note that our choice is the universal threshold

κ {(\frac{ln n}{n})}^{1 / 2}

and the definition of

m_{n}

is based on theoretical considerations. The considered estimator does not depend on the smoothness of

f (\cdot)

; we may refer the reader to [50] for more details in the case of the linear wavelet estimator of

f (\cdot)

. Furthermore, for more details on the case of

H = L ([a, b])

and more standard nonparametric models, see [27,55]. To state the results, we need some notation. Throughout the paper, we will denote by

F_{i}

the

σ -

field generated by

{X_{j} : 0 \leq j \leq i}

and

G_{i}

the

σ -

field generated by

{(X_{j}, Y_{j}) : X_{i + 1}, 0 \leq j \leq i}

. Let

B \in B,

be an open set of the Borel

σ -

algebra

B

. For any

i = 1, \dots, n

define

f_{X_{i}}^{F_{i - 1}} (\cdot)

as the conditional density of

X_{i}

given the

σ -

field

F_{i - 1}

. Define

F_{X_{i}} = P (X_{i} \in B) = P_{X} (B), see (11),

and

F_{X_{i}}^{F_{i - 1}} = P (X_{i} \in B | F_{i - 1})

as the distribution function and the conditional distribution function, given the

σ -

field

F_{i - 1}

, respectively. The following assumptions will be needed throughout the paper.

(C.0): There is a non-negative measurable function $f^{F_{i - 1}}$ such that

$P_{X}^{F_{i - 1}} (B) = \int_{B} f^{F_{i - 1}} (x) ν (d x), B \in B .$

(17)
(C.1): For any $x \in S$

$lim_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x) = f (x), in the a . s . and L^{2} sense .$

At this point, we may refer to [56] for further details.

Theorem 1.

Under the conditions (C.0), (C.1), (F.1), (E.1)and (E.2), and (16) for any

θ \in (0, 1)

,

f \in B_{\infty}^{θ / 2} (H) \cap W^{2 (1 - θ)} (H),

there exists a constant

C_{1} > 0

in such a way that

E ({∥\hat{f} (x) - f (x)∥}^{2}) \leq C_{1} {(\frac{ln n}{n})}^{θ},

(18)

for large enough n.

A direct consequence is the following upper bound result: for

s > 0

, if

f \in B_{\infty}^{s / (2 s + 1)} (H) \cap W^{2 / (2 s + 1)} (H),

then there exists a constant

C_{2} > 0

such that

E (∥ \hat{f} {- f ∥}^{2}) \leq C_{2} {(\frac{ln n}{n})}^{2 s / (2 s + 1)} .

This rate of convergence corresponds to the near optimal one in the “standard” minimax setting (see, e.g., [27]). Moreover, applying [[49], Theorem 3.2], one can see that

B_{\infty}^{θ / 2} (H) \cap

W^{2 (1 - θ)} (H)

is the “maxiset” associated to

\hat{f} (\cdot)

at the rate of convergence

{(ln n / n)}^{θ}

, i.e.,

lim_{n \to \infty} {(\frac{n}{ln n})}^{θ} E (∥ \hat{f} {- f ∥}^{2}) < \infty \Leftrightarrow f \in B_{\infty}^{θ / 2} (H) \cap W^{2 (1 - θ)} (H) .

4. Problem Definition of the Regression Estimation

For a measurable function

ρ : R^{q} \to R

, we define the regression function

m (\cdot, ρ)

by

ρ (Y) = m (X, ρ) + ϵ,

(19)

where

ϵ

is a random variable independent of

X

with

N (0, 1)

. We suppose that

m (\cdot, ρ) \in H

, where

H

is a separable Hilbert space of real or complex-valued functions defined on

S

and square integrable with respect to the

σ

-finite measure

ν

. We shall suppose that there exist a known constant and

C_{m} > 0

such that

sup_{x \in S} m (x, ρ) \leq C_{m} .

(20)

In this context, we redefine the probability measure

P_{X}

in (11) and suppose that

f (\cdot)

is a non-negative measurable known function.

(M.1): We shall suppose that there exist two known constant $C_{m} > 0$ such that

$sup_{x \in S} m (x; ρ) \leq C_{m} .$
(M.2): We shall suppose that there exist two known constant $c_{f} > 0$ such that

$inf_{x \in S} f (x) \geq c_{f} .$

Regression Function Estimator

In this context, we aim to estimate

m (\cdot, ρ)

based on n observed functional data

(X_{1}, Y_{1}), \dots, (X_{n}, Y_{n})

. The kernel estimator of the regression function for functional data was proposed by [57]

{\hat{m}}_{n; h_{n}} (x, ρ) : = \frac{\sum_{i = 1}^{n} ρ (Y_{i}) K (d (x, X_{i}) / h_{n})}{\sum_{i = 1}^{n} K (d (x, X_{i}) / h_{n})} .

As combined with the work of [55], we define the wavelet hard thresholding estimator

\hat{m} (\cdot, ρ)

, for all

x \in S

, by

\hat{m} (x, ρ) = \sum_{ℓ \in I_{0}} {\hat{η}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\hat{θ}}_{k, ℓ} 𝟙_{\{| {\hat{θ}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

(21)

where

\begin{matrix} {\hat{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(22)

\begin{matrix} {\hat{θ}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

(23)

where

κ

is large enough constant and

m_{n}

is the integer satisfying

\frac{1}{2} \frac{n}{{(ln n)}^{2}} \leq | J_{m_{n}} | \leq \frac{n}{{(ln n)}^{2}} .

(24)

Theorem 2.

Under the conditions (E.1), (E.2) (M.1)–(M.2), (C.0)and (C.1), combined with the assumption (24), for any

θ \in (0, 1)

,

m (\cdot, ρ) \in B_{\infty}^{θ / 2} (H) \cap W^{2 (1 - θ)} (H)

, there exists a constant

C_{3} > 0

in such a way that

E ({∥\hat{m} (\cdot, ρ) - m (\cdot; ρ)∥}^{2}) \leq C {(\frac{ln n}{n})}^{θ},

(25)

for large enough n.

Suppose that

m (\cdot, ρ)

and f satisfy (M.1) and, for any

θ \in (0, 1), m (\cdot, ρ) \in B_{\infty}^{θ / 2} (H) \cap W^{2 (1 - θ)} (H)

, where

B_{\infty}^{θ / 2} (H)

is in (Definition 1) with

s = θ / 2

and

W^{2 (1 - θ)} (H)

is in (Definition 2) with

r = 2 (1 - θ)

. Then, there exists a constant

C_{3} > 0

such that

E ({∥\hat{m} (\cdot, ρ) - m (\cdot; ρ)∥}^{2}) \leq C_{3} {(\frac{{(ln n)}^{2}}{n})}^{θ}

for n large enough. Again, note that, for

s > 0

, if

m (\cdot, ρ) \in B_{\infty}^{s / (2 s + 1)} (H) \cap W^{2 / (2 s + 1)} (H),

then there exists a constant

C_{4} > 0

such that

E ({∥\hat{m} (\cdot, ρ) - m (\cdot; ρ)∥}^{2}) \leq C_{4} {(\frac{{(ln n)}^{2}}{n})}^{2 s / (2 s + 1)} .

Up to an additional logarithmic term, this rate of convergence corresponds to the near-optimal one in the “standard” minimax setting (see, for example, [27]). Theorem 2 is the first to investigate an adaptive wavelet-based estimator for functional data in the context of nonparametric regression for ergodic processes.

Since the coefficients defined by (22) and (23) depend on the unknown function

f (\cdot)

, one can use

\begin{matrix} {\tilde{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{\hat{f} (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(26)

\begin{matrix} {\tilde{θ}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{\hat{f} (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

(27)

We define the wavelet hard thresholding estimator

\tilde{m} (\cdot, ρ)

, for all

x \in S

, by

\tilde{m} (x, ρ) = \sum_{ℓ \in I_{0}} {\tilde{η}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\tilde{θ}}_{k, ℓ} 𝟙_{\{| {\tilde{θ}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}) .

(28)

Recall the following elementary observation

\frac{1}{\hat{f} (\cdot)} = \frac{1}{f (\cdot)} + \frac{(f (\cdot) - \hat{f} (\cdot))}{f (\cdot) \hat{f} (\cdot)} .

From the last equation, we can infer the following:

\begin{matrix} {\tilde{η}}_{k, ℓ} & = & {\hat{η}}_{k, ℓ} + \frac{1}{n} \sum_{i = 1}^{n} \frac{(f (X_{i}) - \hat{f} (X_{i}))}{f (X_{i}) \hat{f} (X_{i})} ρ (Y_{i}) ϕ_{k} (X_{i}; ζ_{k, ℓ}), \\ {\tilde{θ}}_{k, ℓ} & = & {\hat{θ}}_{k, ℓ} + \frac{1}{n} \sum_{i = 1}^{n} \frac{(f (X_{i}) - \hat{f} (X_{i}))}{f (X_{i}) \hat{f} (X_{i})} ρ (Y_{i}) ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

By combining Theorem 1 with Theorem 2, we obtain the following corollary.

Corollary 1.

Under the conditions of Theorems 1 and 2, there exists a constant

C_{5} > 0

such that

E ({∥\tilde{m} (x; ρ) - m (x; ρ)∥}^{2}) \leq C_{5} {(\frac{ln n}{n})}^{θ},

(29)

for large enough n.

Remark 2.

In our previous paper [40], we were concerned with the nonparametric estimation of the density and the regression function in a finite-dimensional setting using orthonormal wavelet bases. Our findings differ significantly from those presented in the present paper. In [40], we provided the strong uniform consistency properties with rates of these estimators, over compact subsets of

R^{d}

, under a general ergodic condition on the underlying processes. We also establish the asymptotic normality of wavelet-based estimators. We used the Burkholder–Rosenthal inequality as the main ingredient in this paper, which is a more complicated tool than the exponential inequality used in the previous paper. More importantly, in the present paper, we look into the mean integrated square error over compact subsets, which is entirely different from the results of the previous paper.

5. Applications

5.1. The Conditional Distribution

Our result can be used to investigate the conditional distribution

F (y ∣ x)

for

y \in R^{d}

. To be more precise, let

ρ (y) = 𝟙 {y \leq t}

. We define the wavelet hard thresholding estimator

\overset{˘}{m} (\cdot, ρ)

, for all

x \in S

, by

\hat{F} (y ∣ x) = \sum_{ℓ \in I_{0}} {\overset{˘}{η}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\overset{˘}{θ}}_{k, ℓ} 𝟙_{\{| {\overset{˘}{θ}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

(30)

where

\begin{matrix} {\overset{˘}{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{𝟙 {Y_{i} \leq t}}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(31)

\begin{matrix} {\overset{˘}{θ}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{𝟙 {Y_{i} \leq t}}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

(32)

A direct consequence of Theorem 2 is

E ({∥\hat{F} (y ∣ x) - F (y ∣ x)∥}^{2}) \leq C {(\frac{ln n}{n})}^{θ} .

(33)

5.2. The Curve Discrimination

We can state the curve discrimination problem in the following way. Let

{\{X_{i}\}}_{i = 1, \dots, n}

be a sample of curves, and each of them is known to belong to one among

G

groups

ι = 1, \dots, G

. Let us denote by

T_{i}

the group of the curve

X_{i}

. Assume that each pair of variables

(X_{i}, T_{i})

has the same distribution as the pair

(X, T)

. Given a new curve

x

, the question is to know its class membership, and for that, we will estimate, for any

ι = 1, \dots, G

, the conditional probability:

p_{ι} (x) = P (T = ι ∣ X = x) .

Following the idea proposed in [58,59] permitting the estimation of these probabilities by

{\hat{p}}_{ι} (x) = \sum_{ℓ \in I_{0}} {\overset{˘}{\overset{˘}{η}}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\overset{˘}{\overset{˘}{θ}}}_{k, ℓ} 𝟙_{\{| {\overset{˘}{\overset{˘}{θ}}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

(34)

where

\begin{matrix} {\overset{˘}{\overset{˘}{η}}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{𝟙 {T_{i} =_{ι}}}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(35)

\begin{matrix} {\overset{˘}{\overset{˘}{θ}}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{𝟙 {T_{i} =_{ι}}}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

(36)

A direct consequence of Theorem 2 is

E ({∥\hat{F} (y ∣ x) - F (y ∣ x)∥}^{2}) \leq C {(\frac{ln n}{n})}^{θ} .

(37)

As remarked by [58,59], for each

ι

, we make use of the notation

Y = \{\begin{matrix} 1 & if T = ι \\ 0 & otherwise, \end{matrix}

then, we can write

p_{ι} (x) = E (Y ∣ X = x) .

An application of Theorem 2 is

E ({∥{\hat{p}}_{ι} (x) - p_{ι} (x∥}^{2}) \leq C {(\frac{ln n}{n})}^{θ} .

5.3. Time Series Prediction from Continuous Set of Past Values

A direct consequence of our results is the prediction of future values of some real time series, which we will follow from [59]. One of the purposes of the proposed functional method is to predict the future from a continuous set of past values of the process. Let

{Z (t)}_{t \in R}

denote a real-valued process and s denote a fixed non-negative real number. We are interested in the prediction’s problem of a future value

Z (τ)

given some past values

Z (t)

for

τ - T \leq t < τ

, at some time

τ > 0 .

Our goal can be seen as the estimation of the operator r:

Z (τ + s) = r (Z (t) for τ - T \leq t < τ) + ε,

whenever it exists. Let us describe the model. Suppose that the process has been observed from

t = 0

until

t = t_{max}

and without loss of generality, assume that

t_{max} = n T + s < τ .

The methodology consists of splitting the observed process into n pieces of fixed length. Each piece of the process is denoted by

X_{i} = {Z (t), (i - 1) T \leq t < i T} .

Let us denote the response value

Y_{i} = Z (i T + s)

. This can be formulated by a regression problem in the following way:

Y_{i} = m (X_{i}, I d) + ε_{i} for i = 1, \dots, n,

where

I d

denotes the identity function. To fit the theoretical setting of the present paper, we assume that such a function

m (\cdot, I d)

does not depend on i; this is the case for the stationary processes. Hence, at time

τ

, we can use for predicting the value at time

τ + s

the following predictor, which is directly derived from (21):

\hat{m} (x, I d) = \sum_{ℓ \in I_{0}} {\hat{η}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\hat{θ}}_{k, ℓ} 𝟙_{\{| {\hat{θ}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

(38)

where

\begin{matrix} {\hat{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{Z (i T + s)}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), \end{matrix}

(39)

\begin{matrix} {\hat{θ}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{Z (i T + s)}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}), \end{matrix}

(40)

where

x = {Z (t)

, for

τ - T \leq t < τ}

. Our Theorem 2 gives mathematical support to this nonparametric functional predictor, and provides a different way of solving the prediction problem investigated in [3,59].

6. Concluding Remarks

In this work, we have investigated the nonparametric estimation of the density and the regression function based on the functional stationary processes, using wavelet bases for Hilbert spaces of functions. We have characterized the mean integrated square error over compact subsets. The asymptotic properties of these estimators are obtained employing the Martingale approach, which is completely different from the mixing and the independent setting. The assumption on the dependence of the process is ergodicity. To motivate the present paper, we have presented the conditional distribution, the curve discrimination and the time series prediction from a continuous set of past values. Extending the nonparametric functional ideas to the local stationary process is a somewhat underdeveloped field. It would be interesting to extend our work to the case of the functional local stationary process, which requires nontrivial mathematics; this would go well beyond the scope of the present paper.

7. Proofs

7.1. Proof of Theorem 1

In this paper, we need an upper bound inequality for partial sums of unbounded martingale differences that we use to derive the asymptotic results for the density and the regression functions estimates built upon functional strictly stationary and ergodic data. Here and in the sequel, we denote by “C” a positive constant that may be different from line to line. This inequality is given in the following lemmas. This lemma is stated following Notation 1 in [60].

Lemma 1

(Burkholder-Rosenthal inequality). Let

{(X_{i})}_{i \geq 1}

be a stationary Martingale adapted to the filtration

{(F_{i})}_{i \geq 1}

, define

{(d_{i})}_{i \geq 1}

is the sequence of Martingale differences adapted to

{(F_{i})}_{i \geq 1}

and

S_{n} = \sum_{i = 1}^{n} d_{i} .

Then, for any positive integer n,

∥ max_{1 \leq j \leq n} | S_{j} {|) ∥}_{p} ≪ n^{1 / p} {∥ d_{1} ∥}_{p} + {∥\sum_{k = 1}^{n} E (d_{k}^{2} / F_{k - 1})∥}_{p / 2}^{1 / 2}, f o r a n y p \geq 2;

(41)

where, as usual, the norm

{∥ \cdot ∥}_{p} = {(E [| \cdot |^{p}])}^{1 / p}

.

Lemma 2

([61]). Let

{Z_{i}, i \geq 1}

be a sequence of Martingale differences such that

| Z_{i} | \leq B, a . s .,

then, for all

ϵ > 0

and all sufficiently large n, we have

P \{|\sum_{i = 1}^{n} Z_{i}| > ϵ\} \leq 2 exp \{- \frac{ϵ^{2}}{2 n B^{2}}\} .

The following lemmas describe the asymptotic behavior of the estimators

{\hat{α}}_{k, ℓ}

and

{\hat{β}}_{k, ℓ}

.

Lemma 3.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in I_{k}

, under assumptions (C.0), (C.1), (F.1)and (E.1)(i), there exists a constant

C > 0

such that

E ({|{\hat{α}}_{k, ℓ} - α_{k, ℓ}|}^{2}) \leq C (\frac{ln n}{n}) .

(42)

Lemma 4.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in J_{k}

, and under assumptions (C.0), (C.1), (F.1), (E.1)and (E.2), and condition (16), there exists a constant

C > 0

such that

E ({|{\hat{β}}_{k, ℓ} - β_{k, ℓ}|}^{4}) = C {(\frac{ln n}{n})}^{2}, a . s .

(43)

Lemma 5.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in J_{k}

, for

κ > 0

large enough and under assumptions (C.0), (C.1), (F.1), (E.1)and (E.2), and condition (16), there exists a constant

C > 0

such that

P (|{\hat{β}}_{k, ℓ} - β_{k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) \leq C {(\frac{ln n}{n})}^{2} .

(44)

7.1.1. Proof of Theorem 1

Observe that the proof of Theorem 1 is a direct application of ([49], Theorem 3.1) with

c (n) = {(ln n / n)}^{1 / 2}

,

σ_{i} = 1

,

r = 2

and the Lemmas 3–5. We adapted and extended the method of demonstration of [33], Theorem 3.1 to the stationary ergodic process. □

7.1.2. Proof of Lemmas

Proof of Lemma 3

Consider the following decomposition

\begin{matrix} {\hat{α}}_{k, ℓ} - α_{k, ℓ} & = & {\hat{α}}_{k, ℓ} - {\tilde{α}}_{k, ℓ} + {\tilde{α}}_{k, ℓ} - α_{k, ℓ} \\ = & A_{k, ℓ, 1} + A_{k, ℓ, 2}, \end{matrix}

(45)

where

{\tilde{α}}_{k, ℓ} = \frac{1}{n} \sum_{i = 1}^{n} E [ϕ_{k} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}] .

Under the assumptions (C.0) and (C.2), we have

\begin{matrix} {\tilde{α}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \int_{S} ϕ_{k} (x; ζ_{k, ℓ}) f^{F_{i - 1}} (x) ν (d x) \\ = & \int_{S} ϕ_{k} (x; ζ_{k, ℓ}) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)) ν (d x) \\ = & \int_{S} ϕ_{k} (x; ζ_{k, ℓ}) (f (x) + o (1)) ν (d x) \\ = & \int_{S} ϕ_{k} (x; ζ_{k, ℓ}) f (x) ν (d x) + o (1) \\ = & α_{k, ℓ} + o (1) . \end{matrix}

We readily obtain

{\tilde{α}}_{k, ℓ} = α_{k, ℓ}, as, n \to \infty,

(46)

implying that

A_{k, ℓ, 2} = o (1), a . s .

(47)

Therefore, we infer that

\begin{matrix} {\hat{α}}_{k, ℓ} - α_{k, ℓ} & = & A_{k, ℓ, 1} + o (1), a . s . \end{matrix}

Let us now consider the term

A_{k, ℓ, 1}

. We have

\begin{matrix} A_{k, ℓ, 1} & = & {\hat{α}}_{k, ℓ} - {\tilde{α}}_{k, ℓ} \\ = & \frac{1}{n} \sum_{i = 1}^{n} (ϕ_{k} (x_{i}; ζ_{k, ℓ}) - E [ϕ_{k} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}]) \\ = & \frac{1}{n} \sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ}) . \end{matrix}

Notice that

{(Φ_{k} (x_{i}; ζ_{k, ℓ}))}_{0 \leq k \leq m_{n}}

is a sequence of Martingale differences with respect to the sequence of

σ -

fields

{(F_{i})}_{0 \leq k \leq m_{n}}

. It is obvious, by Lemma 1 inequality, to see that

\begin{matrix} E [| A_{k, ℓ, 1} |^{2}] & = & \frac{1}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2}] . \end{matrix}

Applying the Burkholder–Rosenthal inequality (1), for

p = 2

, we obtain

\begin{array}{l} {(E [({|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2})])}^{\frac{1}{2}} \\ \leq n^{1 / 2} ‖ Φ_{k} (x_{1}; ζ_{k, ℓ}) ‖_{2} + {‖ \sum_{i = 1}^{n} E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}] ‖}_{1}^{1 / 2} \\ = Φ_{(1)} + Φ_{(2)} \end{array}

(48)

On one hand, using a very famous decomposition combined with the fact that

F_{0}

is the trivial

σ -

field, we obtain

\begin{matrix} \frac{1}{n} Φ_{(1)}^{2} & = & ‖ Φ_{k} (x_{1}; ζ_{k, ℓ}) ‖_{2}^{2} \\ = & E [{|ϕ_{k} ((x_{1}; ζ_{k, ℓ})) - E [ϕ_{k} (x_{1}; ζ_{k, ℓ}) | F_{0}]|}^{2}] \\ \leq & E [\sum_{j = 0}^{2} ∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣^{j} {(E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣])}^{2 - j}] \\ = & \sum_{j = 0}^{2} C_{2}^{j} E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣^{j}] . {(E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣])}^{2 - j} \\ = & C_{2}^{2} E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣^{2}] + C_{2}^{1} {(E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣])}^{2} + C_{2}^{0} E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣^{2}], \end{matrix}

(49)

By the Cauchy–Schwarz inequality, together with assumptions (E.1)(i), (E.2) and the condition (16), we obtain

\begin{matrix} sup_{x \in S} |ϕ_{k} (x_{i}; ζ_{k, ℓ})| & \leq & sup_{x \in S} \sum_{j \in J_{k}} \frac{1}{\sqrt{g_{j, k, ℓ}}} | e_{j} (ζ_{k, ℓ}) | | e_{j} (x) | \\ \leq & {(\sum_{j \in J_{k}} \frac{1}{g_{j, k, ℓ}} {| e_{j} (ζ_{k, ℓ}) |}^{2})}^{1 / 2} {(sup_{x \in S} \sum_{j \in J_{k}} {| e_{j} (x) |}^{2})}^{1 / 2} \\ \leq & C_{1}^{1 / 2} C_{2}^{1 / 2} \sqrt{| J_{k} |} \\ \leq & C_{3} \sqrt{| J_{m_{n}} |} \\ \leq & C_{3} \sqrt{\frac{n}{ln n}} . \end{matrix}

(50)

Observe that, under the assumptions (F.1) and (E.1)(i) and the fact that

E

is an orthonormal basis of H, we have

\begin{matrix} E [{|ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{2}] & = & \int_{S} {|ϕ_{k} (x; ζ_{k, ℓ})|}^{2} f (x) ν (d x) \\ \leq & C_{f} \int_{S} {|ϕ_{k} (x; ζ_{k, ℓ})|}^{2} ν (d x) \\ = & C_{f} \int_{S} {|\sum_{j \in I_{k}} \frac{1}{\sqrt{g_{j, k, ℓ}}} e_{j} (ζ_{k, ℓ}) e_{j} (x)|}^{2} ν (d x) \\ = & C_{f} \int_{S} \sum_{j \in I_{k}} \frac{1}{g_{j, k, ℓ}} {|e_{j} (ζ_{k, ℓ})|}^{2} ν (d x) \\ \leq & C_{f} C_{1} . \end{matrix}

(51)

where

C_{1}

is a positive constant,

\begin{matrix} E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣] & = & O (\sqrt{\frac{n}{ln n}}), \end{matrix}

(52)

\begin{matrix} E [∣ ϕ_{k} (x_{1}; ζ_{k, ℓ}) ∣^{2}] & = & O (1) \end{matrix}

(53)

therefore,

\begin{matrix} Φ_{(1)} = O (n^{1 / 2}) . \end{matrix}

(54)

On the other hand, we consider the second term of decomposition (48), observe that

\begin{matrix} Φ_{2} & = & {(E (\sum_{i = 1}^{n} E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}]))}^{1 / 2} \\ = & {(\sum_{i = 1}^{n} E (E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}]))}^{1 / 2} \\ = & {(\sum_{i = 1}^{n} E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ})])}^{1 / 2} \end{matrix}

using the notable identity, we obtain

\begin{matrix} E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ})] \\ = & E [{(|ϕ_{k} (x_{i}; ζ_{k, ℓ}) - E [ϕ_{k} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}]|)}^{2}] \\ \leq & E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣^{2} + 2 ∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣ E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣ | F_{i - 1}] \\ + E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣^{2} | F_{i - 1}]] \\ \leq & 2 E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣^{2}] + 2 E [E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣^{2} | F_{i - 1}]] \\ \leq & 4 E [∣ ϕ_{k} (x_{i}; ζ_{k, ℓ}) ∣^{2}] \end{matrix}

observe that, using (50) and (51), we obtain

\begin{matrix} Φ_{2} = O (n^{1 / 2}) . \end{matrix}

(55)

therefore, we combine (54) and (55) to obtain

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2}])}^{1 / 2} = O (n^{1 / 2}) . \end{matrix}

Hence,

\begin{matrix} E [| A_{k, ℓ, 1} |^{2}] & = & \frac{1}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2}] \\ = & \frac{1}{n^{2}} O (n) \\ \leq & C (\frac{ln n}{n}) \end{matrix}

Therefore, there exists a constant

C = C_{f} C_{1} > 0

, such that

E ({|{\hat{α}}_{k, ℓ} - α_{k, ℓ}|}^{2}) \leq \frac{4 C}{n} \leq 4 C (\frac{ln n}{n}) .

(56)

Hence, the proof is complete. □

Proof of Lemma 4

Consider the following decomposition

\begin{matrix} {\hat{β}}_{k, ℓ} - β_{k, ℓ} & = & {\hat{β}}_{k, ℓ} - {\tilde{β}}_{k, ℓ} + {\tilde{β}}_{k, ℓ} - β_{k, ℓ} \\ = & B_{k, ℓ, 1} + B_{k, ℓ, 2}, \end{matrix}

(57)

where

{\tilde{β}}_{k, ℓ} = \frac{1}{n} \sum_{i = 1}^{n} E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}] .

Observe that, under the assumptions (F.1) and (E.1)(i) and the fact that

E

is an orthonormal basis of H, and proceeding in a similar way as in (46), we show that

{\tilde{β}}_{k, ℓ} = β_{k, ℓ}, as, n \to \infty .

(58)

This, in turn, implies that

B_{k, ℓ, 2} = o (1), a . s .

(59)

Therefore, we obtain

\begin{matrix} {\hat{β}}_{k, ℓ} - β_{k, ℓ} & = & B_{k, ℓ, 1} + o (1), a . s . \end{matrix}

Hence, we readily infer

\begin{matrix} E ({|{\hat{β}}_{k, ℓ} - β_{k, ℓ}|}^{4}) & = & \frac{1}{n^{4}} E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}), \end{matrix}

(60)

where

Ψ_{i, k, ℓ} = ψ_{k} (x_{i}; η_{k, ℓ}) - E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}] .

Notice that

{(Ψ_{i, k, ℓ})}_{0 \leq k \leq n}

is a sequence of Martingale differences with respect to the sequence of

σ -

fields

{(F_{i})}_{0 \leq k \leq n}

, applying the Burkholder–Rosenthal inequality for

p = 4

(see Lemma 1), we obtain

\begin{matrix} {(E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}))}^{1 / 4} & \leq & {∥max_{1 \leq j \leq n} |\sum_{i = 1}^{j} Ψ_{i, k, ℓ}|∥}_{4} \\ ≪ & n^{1 / 4} {∥ Ψ_{1, k, ℓ} ∥}_{4} + {∥\sum_{i = 1}^{n} E (Ψ_{i, k, ℓ}^{2} | F_{i - 1})∥}_{4 / 2}^{1 / 2} \\ = & Ψ_{k, ℓ}^{(1)} + Ψ_{k, ℓ}^{(2)} . \end{matrix}

(61)

Consider the first term of Equation (61). We have

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & = & ∥ Ψ_{1, k, ℓ} ∥_{4}^{4} \\ = & E ({|ψ_{k} (x_{1}; η_{k, ℓ}) - E [ψ_{k} (x_{1}; η_{k, ℓ}) | F_{0}]|}^{4}) \\ \leq & E [{(|ψ_{k} (x_{1}; η_{k, ℓ})| + E [|ψ_{k} (x_{1}; η_{k, ℓ})|])}^{4}] . \end{matrix}

Using the classical identity

{(a + b)}^{n} = \sum_{k = 0}^{n} C_{n}^{k} a^{k} b^{n - k}

in connection with the Jensen inequality and taking

n = 4

, we obtain

\begin{matrix} {(|ψ_{k} (x_{1}; η_{k, ℓ})| + E [|ψ_{k} (x_{1}; η_{k, ℓ})|])}^{4} \\ = \sum_{k = 0}^{4} C_{4}^{k} {|ψ_{k} (x_{1}; η_{k, ℓ})|}^{k} {(E [|ψ_{k} (x_{1}; η_{k, ℓ})|])}^{4 - k} \\ \leq \sum_{k = 0}^{4} C_{4}^{k} {|ψ_{k} (x_{1}; η_{k, ℓ})|}^{k} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{4 - k}] . \end{matrix}

This gives that

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & \leq & E [\sum_{k = 0}^{4} C_{4}^{k} {|ψ_{k} (x_{1}; η_{k, ℓ})|}^{k} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{4 - k}]] \\ = & \sum_{k = 0}^{4} C_{4}^{k} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{k}] E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{4 - k}] . \end{matrix}

(62)

By proceeding in a similar way as in (51) and making use of the assumptions (F.1) and (E.1)(i), we infer that

E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{2}] \leq C,

(63)

where C is a positive constant. Moreover, by the Cauchy–Schwarz inequality together with assumptions (E.1)(ii), (E.2) and the condition (16), we obtain

\begin{matrix} sup_{x \in S} |ψ_{k} (x; η_{k, ℓ})| & \leq & sup_{x \in S} \sum_{j \in J_{k}} \frac{1}{\sqrt{h_{j, k, ℓ}}} | e_{j} (η_{k, ℓ}) | | e_{j} (x) | \\ \leq & {(\sum_{j \in J_{k}} \frac{1}{h_{j, k, ℓ}} {| e_{j} (η_{k, ℓ}) |}^{2})}^{1 / 2} {(sup_{x \in S} \sum_{j \in J_{k}} {| e_{j} (x) |}^{2})}^{1 / 2} \\ \leq & C_{1}^{1 / 2} C_{2}^{1 / 2} \sqrt{| J_{k} |} \\ \leq & C_{4} \sqrt{| J_{m_{n}} |} \\ \leq & C_{4} \sqrt{\frac{n}{ln n}} . \end{matrix}

(64)

We then obtain

\begin{matrix} E [|ψ_{k} (x_{1}; η_{k, ℓ})|] & = & O (\sqrt{\frac{n}{ln n}}), \end{matrix}

(65)

\begin{matrix} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{2}] & = & O (1), \end{matrix}

(66)

\begin{matrix} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{3}] & = & O (\sqrt{\frac{n}{ln n}}), \end{matrix}

(67)

\begin{matrix} E [{|ψ_{k} (x_{1}; η_{k, ℓ})|}^{4}] & = & O (\frac{n}{ln n}) . \end{matrix}

(68)

Observe that the largest term is (68); now, using (68) in Equation (62), we deduce that

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & = & O (\frac{n}{ln n}) . \end{matrix}

This implies that

Ψ_{k, ℓ}^{(1)} = O (\frac{n^{1 / 2}}{{(ln n)}^{1 / 4}}) .

(69)

Let us now investigate the upper bound of

Ψ_{k, ℓ}^{(2)}

in (61). Observe that

\begin{matrix} Ψ_{k, ℓ}^{(2)} & = & {∥\sum_{i = 1}^{n} E (Ψ_{i, k, ℓ}^{2} / F_{i - 1})∥}_{2}^{1 / 2} \\ = & {(E [{(\sum_{i = 1}^{n} E [Ψ_{i, k, ℓ}^{2} | F_{i - 1}])}^{2}])}^{1 / 4}, \end{matrix}

for all

i = 1, \dots, n

. Making use of the Jensen inequality with the fact that

{(a - b)}^{2} = a^{2} - 2 a b + b^{2}

, it follows that

\begin{matrix} \sum_{i = 1}^{n} E [Ψ_{i, k, ℓ}^{2} | F_{i - 1}] & = & \sum_{i = 1}^{n} (E [{(ψ_{k} (x_{i}; η_{k, ℓ}) - E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}])}^{2} | F_{i - 1}]) \\ \leq & 4 \sum_{i = 1}^{n} E [{(ψ_{k} (x_{i}; η_{k, ℓ}))}^{2} | F_{i - 1}] . \end{matrix}

Observe, under the assumptions (F.1), (E.1)(i), (C.0) and (C.1) and (63):

\begin{matrix} \sum_{i = 1}^{n} E [{(ψ_{k} (x_{i}; η_{k, ℓ}))}^{2} | F_{i - 1}] & = & n \int_{S} {|ψ_{k} (x; η_{k, ℓ})|}^{2} (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)) ν (d x) \\ = & n \int_{S} {|ψ_{k} (x; η_{k, ℓ})|}^{2} (f (x) + o (1)) ν (d x) \\ \leq & n (C_{f} + o (1)) \int_{S} {|ψ_{k} (x; ζ_{k, ℓ})|}^{2} ν (d x) \\ \leq & n C_{f} C_{1} . \end{matrix}

(70)

It follows that

\begin{matrix} Ψ_{k, ℓ}^{(2)} & = & O (n^{1 / 2}) . \end{matrix}

(71)

Combining (61), (69) and (71), we obtain

\begin{matrix} E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}) & = & O (\frac{n^{2}}{ln n}) + O (n^{2}) . \end{matrix}

We conclude that

\begin{matrix} E ({|{\hat{β}}_{k, ℓ} - β_{k, ℓ}|}^{4}) & = & O (\frac{1}{n^{2} ln n}) + O (\frac{1}{n^{2}}) . \end{matrix}

(72)

This implies that there exists a constant

C > 0

, such that

\begin{matrix} E ({|{\hat{β}}_{k, ℓ} - β_{k, ℓ}|}^{4}) & \leq & C {(\frac{ln n}{n})}^{2} . \end{matrix}

(73)

The proof is achieved. □

Proof of Lemma 5

Consider the previous decomposition in Lemma 4, to write that

\begin{matrix} {\hat{β}}_{k, ℓ} - β_{k, ℓ} & = & ({\hat{β}}_{k, ℓ} - {\tilde{β}}_{k, ℓ}) + ({\tilde{β}}_{k, ℓ} - β_{k, ℓ}) \\ = & B_{k, ℓ, 1} + B_{k, ℓ, 2}, \end{matrix}

where

\begin{matrix} B_{k, ℓ, 1} & = & \frac{1}{n} \sum_{i = 1}^{n} Ψ_{i, k, ℓ} = \frac{1}{n} \sum_{i = 1}^{n} (ψ_{k} (x_{i}; η_{k, ℓ}) - E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}]), \\ B_{k, ℓ, 2} & = & \frac{1}{n} \sum_{i = 1}^{n} E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}] - β_{k, ℓ} . \end{matrix}

By using (59), we obtain the desired result for the term

B_{k, ℓ, 2}

\begin{matrix} {\hat{β}}_{k, ℓ} - β_{k, ℓ} & = & B_{k, ℓ, 1} + o (1) . \end{matrix}

Now, observe that

\begin{matrix} P (|\frac{1}{n} \sum_{i = 1}^{n} Ψ_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) & = & P (|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}) . \end{matrix}

An application of Lemma 2 implies that

\begin{matrix} |Ψ_{i, k, ℓ}| & = & |ψ_{k} (x_{i}; η_{k, ℓ}) - E [ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}]| \\ \leq & 2 sup_{x \in S} |ψ_{k} (x; η_{k, ℓ})| \\ \leq & C_{3} \sqrt{\frac{n}{ln n}} \\ \leq & C_{3} \sqrt{n} . \end{matrix}

(74)

Let

B = C_{3} \sqrt{n}

. Then, for all

ϵ_{n} = \frac{κ}{2} \sqrt{n ln n}

where n is sufficiently large, we have

\begin{matrix} P (|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}) & \leq & 2 exp \{- \frac{ϵ_{n}^{2}}{2 n B^{2}}\} \\ = & 2 exp \{- \frac{{(\frac{κ}{2} \sqrt{n ln n})}^{2}}{2 n {(C_{3} \sqrt{n})}^{2}}\} \\ = & 2 exp \{- \frac{κ^{2} ln n}{8 C_{3}^{2} n}\} \\ = & 2 exp \{ln n^{- \frac{κ^{2}}{8 C_{3}^{2} n}}\} \\ = & 2 n^{- w (κ, n)}, \end{matrix}

(75)

where

w (κ, n) = \frac{κ^{2}}{8 C_{3}^{2} n} .

By choosing

κ

such that

w (n, κ) = 2,

we have

\begin{matrix} P (|{\hat{β}}_{k, ℓ} - β_{k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) & \leq & C \frac{1}{n^{2}} + o (1) \\ \leq & C {(\frac{ln n}{n})}^{2} . \end{matrix}

(76)

The proof of (44) is achieved. □

7.1.3. Proof of Theorem 2

Recall that

\hat{m} (x, ρ) = \sum_{ℓ \in I_{0}} {\hat{η}}_{0, ℓ} ϕ_{0} (x; ζ_{0, ℓ}) + \sum_{k = 0}^{m_{n}} \sum_{ℓ \in J_{k}} {\hat{θ}}_{k, ℓ} 𝟙_{\{| {\hat{θ}}_{k, ℓ} | \geq κ \sqrt{\frac{ln n}{n}}\}} ψ_{k} (x; η_{k; ℓ}),

where

\begin{matrix} {\hat{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}), {\hat{θ}}_{k, ℓ} = \frac{1}{n} \sum_{i = 1}^{n} \frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) . \end{matrix}

Lemma 6.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in I_{k}

and under the assumptions (E.1)(i), (M.1)- (M.2), (C.0)and (C.1), there exists a constant

C > 0

such that

E ({|{\hat{η}}_{k, ℓ} - η_{k, ℓ}|}^{2}) \leq C (\frac{ln n}{n}) .

(77)

Lemma 7.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in J_{k}

, and under the assumptions (E.1), (E.2)(M.1)-(M.2), (C.0)and (C.1), combined with the condition (24), there exists a constant

C > 0

such that

E ({|{\hat{θ}}_{k, ℓ} - θ_{k, ℓ}|}^{4}) = C {(\frac{ln n}{n})}^{2}, a . s .

(78)

Lemma 8.

For any

k \in {0, \dots, m_{n}}

and any

ℓ \in J_{k}

, for

κ > 0

large enough, (E.1), (E.2) (M.1)–(M.2), (C.0)and (C.1), combined with the condition (24), there exists a constant

C > 0

such that

P (|{\hat{θ}}_{k, ℓ} - θ_{k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) \leq C {(\frac{ln n}{n})}^{2} .

(79)

7.1.4. Proof of Theorem 2

Observe that the proof of Theorem 2 is a direct application of ([49], Theorem 3.1) with

c (n) = {(ln n / n)}^{1 / 2}

,

σ_{i} = 1

,

r = 2

and the following Lemmas 6–8. We extended the method of the proof in [33], Theorem 4.1. □

7.1.5. Proof of Lemmas

Proof of Lemma 6

Consider the following decomposition

\begin{matrix} {\hat{η}}_{k, ℓ} - η_{k, ℓ} & = & {\hat{η}}_{k, ℓ} - {\tilde{η}}_{k, ℓ} + {\tilde{η}}_{k, ℓ} - η_{k, ℓ} \\ = & A_{k, ℓ, 1} + A_{k, ℓ, 2}, \end{matrix}

(80)

where

\begin{matrix} {\tilde{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{(m (X_{i}, ρ) + ϵ_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{m (X_{i}, ρ)}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ + \frac{1}{n} \sum_{i = 1}^{n} E [\frac{ϵ_{i}}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] . \end{matrix}

From the independence between

ϵ_{i}

and

X_{i}

, we have

\begin{matrix} E [ϵ_{i} | G_{i - 1}] & = & E [ϵ_{i} | X_{i}] \\ = & E [ϵ_{i}] \\ = & 0 . \end{matrix}

(81)

Observe that

\begin{matrix} E [\frac{ϵ}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] & = & E [\frac{E [ϵ | G_{i - 1}]}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & E [\frac{E [ϵ | X_{i}]}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & E [\frac{E [ϵ]}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & 0 . \end{matrix}

This implies that

\begin{matrix} {\tilde{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] \\ = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{m (X_{i}, ρ)}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}] . \end{matrix}

Making use of the assumptions (M.1)–(M.2), (C.0) and (C.1), we have

\begin{matrix} {\tilde{η}}_{k, ℓ} & = & \frac{1}{n} \sum_{i = 1}^{n} \int_{S} \frac{m (x, ρ)}{f (x)} ϕ_{k} (x; ζ_{k, ℓ}) f^{F_{i - 1}} (x) ν (d x) \\ = & \int_{S} \frac{m (x, ρ)}{f (x)} ϕ_{k} (x; ζ_{k, ℓ}) (\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)) ν (d x) \\ = & \int_{S} \frac{m (x, ρ)}{f (x)} ϕ_{k} (x; ζ_{k, ℓ}) (f (x) + o (1)) ν (d x) \\ = & \int_{S} \frac{m (x, ρ)}{f (x)} ϕ_{k} (x; ζ_{k, ℓ}) f (x) ν (d x) + o (1) \\ = & η_{k, ℓ} + o (1) . \end{matrix}

We readily obtain that

{\tilde{η}}_{k, ℓ} = η_{k, ℓ}, as, n \to \infty .

(82)

implying that

A_{k, ℓ, 2} = o (1), a . s .

(83)

Therefore,

\begin{matrix} {\hat{η}}_{k, ℓ} - η_{k, ℓ} & = & A_{k, ℓ, 1} + o (1), a . s . \end{matrix}

Let us now turn our attention to the term

A_{k, ℓ, 1}

in (80), we have

\begin{matrix} A_{k, ℓ, 1} & = & {\hat{η}}_{k, ℓ} - {\tilde{η}}_{k, ℓ} \\ = & \frac{1}{n} \sum_{i = 1}^{n} (\frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) - E [\frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} (X_{i}; ζ_{k, ℓ}) | F_{i - 1}]) \\ = & \frac{1}{n} \sum_{i = 1}^{n} Φ_{k} (X_{i}; ζ_{k, ℓ}) . \end{matrix}

Notice that

{(Φ_{k} (x_{i}; ζ_{k, ℓ}))}_{0 \leq k \leq m_{n}}

is a sequence of Martingale differences with respect to the sequence of

σ -

fields

{(F_{i})}_{0 \leq k \leq m_{n}}

. It is obvious, proceeding as the proof of (56), that

\begin{matrix} E [| A_{k, ℓ, 1} |^{2}] & = & \frac{1}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{k} (X_{i}; ζ_{k, ℓ})|}^{2}] \end{matrix}

where

\begin{matrix} {(E [({|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2})])}^{\frac{1}{2}} \\ \leq & n^{1 / 2} ‖ Φ_{k} (x_{1}; ζ_{k, ℓ}) ‖_{2} + {‖ \sum_{i = 1}^{n} E [Φ_{k}^{2} (x_{i}; ζ_{k, ℓ}) | F_{i - 1}] ‖}_{1}^{1 / 2} \\ = & Φ_{(1)} + Φ_{(2)} \end{matrix}

(84)

On the one hand, using a very famous decomposition combined with the fact that

F_{0}

is the trivial

σ -

field, we obtain

\begin{matrix} \frac{1}{n} Φ_{(1)}^{2} & = & ‖ Φ_{k} (x_{1}; ζ_{k, ℓ}) ‖_{2}^{2} \\ = & E [{|\frac{ρ (Y_{i})}{f (X_{i})} ϕ_{k} ((x_{1}; ζ_{k, ℓ})) - E [\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ}) | F_{0}]|}^{2}] \\ \leq & E [\sum_{j = 0}^{2} {|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{j} {(E [|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|])}^{2 - j}] \\ = & \sum_{j = 0}^{2} C_{2}^{j} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{j}] . {(E [|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|])}^{2 - j} \\ = & C_{2}^{2} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{2}] + C_{2}^{1} {(E [|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|])}^{2} \\ + C_{2}^{0} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{2}], \end{matrix}

It follows, from assumptions (M.1) and (M.2), that

| ρ (Y_{i}) | \leq C_{m} + | ϵ_{i} |,

(85)

combined with the independence between

X_{1}

and

ϵ_{1}

,

E [ϵ_{1}^{2}] = 1

. Observe that, under assumption (E.1)(i) and the fact that

E

is an orthonormal basis of H, we have

\begin{matrix} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (X_{1}; ζ_{k, ℓ})|}^{2}] & \leq & \frac{(C_{m}^{2} + 1)}{c_{f}} E [|\frac{1}{f (X_{1})}| {|ϕ_{k} (x_{1}; ζ_{k, ℓ})|}^{2}] \\ \leq & \frac{(C_{m}^{2} + 1)}{c_{f}} \int_{S} \frac{1}{f (x)} {|ϕ_{k} (x; ζ_{k, ℓ})|}^{2} f (x) ν (d x) \\ = & \frac{(C_{m}^{2} + 1)}{c_{f}} \int_{S} {|\sum_{j \in I_{k}} \frac{1}{\sqrt{g_{j, k, ℓ}}} e_{j} (ζ_{k, ℓ}) e_{j} (x)|}^{2} ν (d x) \\ = & \frac{(C_{m}^{2} + 1)}{c_{f}} \int_{S} \sum_{j \in I_{k}} \frac{1}{g_{j, k, ℓ}} {|e_{j} (ζ_{k, ℓ})|}^{2} ν (d x) \\ \leq & \frac{(C_{m}^{2} + 1)}{c_{f}} C_{1} = O (1) . \end{matrix}

(86)

Moreover, from Assumptions (M.1) and (M.2) and using (52), (85), combined with the independence between

X_{1}

and

ϵ_{1}

,

E [ϵ_{1}] = 0

. we have

\begin{matrix} E [|\frac{ρ (Y_{1})}{f (X_{1})} ϕ_{k} (X_{1}; ζ_{k, ℓ})|] & = & O (\sqrt{\frac{n}{ln n}}) . \end{matrix}

(87)

Therefore,

\begin{matrix} Φ_{(1)} = O (n^{1 / 2}) . \end{matrix}

(88)

On the other hand, we consider the second term of decomposition (84), and proceeding as in the proof of (55) and considering (86) and (87)

\begin{matrix} Φ_{2} = O (n^{1 / 2}) . \end{matrix}

(89)

therefore, combining (88) and (89) to obtain

\begin{matrix} {(E [{|\sum_{i = 1}^{n} Φ_{k} (X_{i}; ζ_{k, ℓ})|}^{2}])}^{1 / 2} = O (n^{1 / 2}) . \end{matrix}

Hence,

\begin{matrix} E [| A_{k, ℓ, 1} |^{2}] & = & \frac{1}{n^{2}} E [{|\sum_{i = 1}^{n} Φ_{k} (x_{i}; ζ_{k, ℓ})|}^{2}] \\ = & \frac{1}{n^{2}} O (n) \\ \leq & C (\frac{ln n}{n}) \end{matrix}

(90)

Therefore, combining (83) and (90), there exists a constant C such as

E ({|{\hat{α}}_{k, ℓ} - α_{k, ℓ}|}^{2}) \leq \frac{C}{n} \leq C (\frac{ln n}{n}) .

(91)

Hence, the proof is complete. □

Proof of Lemma 7

Consider the following decomposition

\begin{matrix} {\hat{θ}}_{k, ℓ} - θ_{k, ℓ} & = & {\hat{θ}}_{k, ℓ} - {\overset{˘}{θ}}_{k, ℓ} + {\overset{˘}{θ}}_{k, ℓ} - θ_{k, ℓ} \\ = & B_{k, ℓ, 1} + B_{k, ℓ, 2}, \end{matrix}

(92)

where

{\overset{˘}{θ}}_{k, ℓ} = \frac{1}{n} \sum_{i = 1}^{n} E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}] .

Observe that, under the assumptions (M.1)–(M.2) and (C.1), the Equation (81) and the fact that

E

is an orthonormal basis of H, by proceeding as in (82), we show that

{\overset{˘}{θ}}_{k, ℓ} = θ_{k, ℓ}, as, n \to \infty,

(93)

implying that

B_{k, ℓ, 2} = o (1), a . s .

(94)

Therefore,

\begin{matrix} {\hat{θ}}_{k, ℓ} - θ_{k, ℓ} & = & B_{k, ℓ, 1} + o (1), a . s . \end{matrix}

Hence, we obtain

\begin{matrix} E ({|{\hat{θ}}_{k, ℓ} - θ_{k, ℓ}|}^{4}) & = & \frac{1}{n^{4}} E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}), \end{matrix}

(95)

where

Ψ_{i, k, ℓ} = \frac{Y_{i}}{f (x_{i})} ψ_{k} (x_{i}; η_{k, ℓ}) - E [\frac{Y_{i}}{f (x_{i})} ψ_{k} (x_{i}; η_{k, ℓ}) | F_{i - 1}] .

Notice that

{(Ψ_{i, k, ℓ})}_{0 \leq k \leq n}

is a sequence of Martingale differences with respect to the sequence of

σ -

fields

{(F_{i})}_{0 \leq k \leq n}

, applying the Burkholder–Rosenthal inequality for

p = 4

(see Lemma 1), we obtain

\begin{matrix} {(E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}))}^{1 / 4} & \leq & {∥max_{1 \leq j \leq n} |\sum_{i = 1}^{j} Ψ_{i, k, ℓ}|∥}_{4} \\ ≪ & n^{1 / 4} {∥ Ψ_{1, k, ℓ} ∥}_{4} + {∥\sum_{i = 1}^{n} E (Ψ_{i, k, ℓ}^{2} | F_{i - 1})∥}_{4 / 2}^{1 / 2} \\ = & Ψ_{k, ℓ}^{(1)} + Ψ_{k, ℓ}^{(2)} . \end{matrix}

(96)

Consider the first term of Equation (96),

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & = & ∥ Ψ_{1, k, ℓ} ∥_{4}^{4} \\ = & E ({|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ}) - E [\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ}) | F_{0}]|}^{4}) \\ \leq & E [{(|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})| + E [|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|])}^{4}] . \end{matrix}

By combining the identity

{(a + b)}^{n} = \sum_{k = 0}^{n} C_{n}^{k} a^{k} b^{n - k}

and the Jensen inequality and taking

n = 4

, we obtain

\begin{matrix} {(|\frac{ρ (Y_{1})}{f (x_{1})} ψ_{k} (X_{1}; η_{k, ℓ})| + E [|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|])}^{4} \\ = \sum_{k = 0}^{4} C_{4}^{k} {|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{k} {(E [|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|])}^{4 - k} \\ \leq \sum_{k = 0}^{4} C_{4}^{k} {|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{k} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{4 - k}] . \end{matrix}

Then, we have

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & \leq & E [\sum_{k = 0}^{4} C_{4}^{k} {|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{k} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{4 - k}]] \\ = & \sum_{k = 0}^{4} C_{4}^{k} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{k}] E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{4 - k}] . \end{matrix}

(97)

By following the same reasoning as in (86) and under the same assumptions (M.1)-(M.2) and (E.1)(i), we have

E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{2}] \leq C,

(98)

where C is a positive constant. Moreover, by the Cauchy–Schwarz inequality together with assumptions (E.1)(ii), (E.2) and condition (24), we obtain

\begin{matrix} sup_{x \in S} |ψ_{k} (x; η_{k, ℓ})| & \leq & sup_{x \in S} \sum_{j \in J_{k}} \frac{1}{\sqrt{h_{j, k, ℓ}}} | e_{j} (η_{k, ℓ}) | | e_{j} (x) | \\ \leq & {(\sum_{j \in J_{k}} \frac{1}{h_{j, k, ℓ}} {| e_{j} (η_{k, ℓ}) |}^{2})}^{1 / 2} {(sup_{x \in S} \sum_{j \in J_{k}} {| e_{j} (x) |}^{2})}^{1 / 2} \\ \leq & C_{1}^{1 / 2} C_{2}^{1 / 2} \sqrt{| J_{k} |} \\ \leq & C_{3} \sqrt{| J_{m_{n}} |} \\ \leq & C_{3} \sqrt{\frac{n}{{(ln n)}^{2}}} . \end{matrix}

(99)

Hence, we infer that

\begin{matrix} E [|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|] & = & O (\sqrt{\frac{n}{{(ln n)}^{2}}}), \end{matrix}

(100)

\begin{matrix} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{2}] & = & O (1), \end{matrix}

(101)

\begin{matrix} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{3}] & = & O (\sqrt{\frac{n}{{(ln n)}^{2}}}), \end{matrix}

(102)

\begin{matrix} E [{|\frac{ρ (Y_{1})}{f (X_{1})} ψ_{k} (X_{1}; η_{k, ℓ})|}^{4}] & = & O (\frac{n}{{(ln n)}^{2}}) . \end{matrix}

(103)

Observe that the largest term is (103), now, using that same statement (103) in Equation (62), we deduce that

\begin{matrix} \frac{1}{n} {(Ψ_{k, ℓ}^{(1)})}^{4} & \leq & C_{4} E [{|ψ_{k} (X_{1}; η_{k, ℓ})|}^{4}] \\ = & O (\frac{n}{{(ln n)}^{2}}) . \end{matrix}

It follows that

Ψ_{k, ℓ}^{(1)} = O {(\frac{n}{ln n})}^{1 / 2} .

(104)

Let us now investigate the upper bound of

Ψ_{k, ℓ}^{(2)}

of (61). Observe that

\begin{matrix} Ψ_{k, ℓ}^{(2)} & = & {∥\sum_{i = 1}^{n} E (Ψ_{i, k, ℓ}^{2} / F_{i - 1})∥}_{2}^{1 / 2} \\ = & {(E [{(\sum_{i = 1}^{n} E [Ψ_{i, k, ℓ}^{2} | F_{i - 1}])}^{2}])}^{1 / 4}, \end{matrix}

for all

i = 1, \dots, n

, using the Jensen inequality and the fact that

{(a - b)}^{2} = a^{2} - 2 a b + b^{2},

it follows

\begin{matrix} \sum_{i = 1}^{n} E [{|Ψ_{i, k, ℓ}|}^{2} | F_{i - 1}] \\ = \sum_{i = 1}^{n} (E [{|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (x_{i}; η_{k, ℓ}) - E [\frac{ρ (Y_{i})}{f (x_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) | F_{i - 1}]|}^{2} | F_{i - 1}]) \\ \leq 4 \sum_{i = 1}^{n} E [{|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ})|}^{2} | F_{i - 1}] . \end{matrix}

From the independence between

ϵ_{i}

and

X_{i}

, we have

\begin{matrix} E [ϵ^{2} | G_{i - 1}] & = & E [ϵ^{2} | X_{i}] \\ = & E [ϵ^{2}] = 1 . \end{matrix}

Under assumptions (M.1)–(M.2) and (E.1)(i) and (C.1), (63) and (105), we have

\begin{matrix} \sum_{i = 1}^{n} E [{|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ})|}^{2} | F_{i - 1}] \\ = \sum_{i = 1}^{n} E [E [{|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ})|}^{2} | G_{i - 1}] | F_{i - 1}] \\ \leq \frac{n (C_{m}^{2} + 1)}{c_{f}} \int_{S} {|ψ_{k} (x; η_{k, ℓ})|}^{2} |\frac{\frac{1}{n} \sum_{i = 1}^{n} f^{F_{i - 1}} (x)}{f (x)}| ν (d x) \\ = \frac{n (C_{m}^{2} + 1)}{c_{f}} (1 + o (1)) \int_{S} {|ψ_{k} (x; η_{k, ℓ})|}^{2} ν (d x) \\ \leq n C, \end{matrix}

(105)

where

C = \frac{(C_{m}^{2} + 1)}{c_{f}} (1 + o (1)) .

It follows

\begin{matrix} Ψ_{k, ℓ}^{(2)} & = & C n^{1 / 2} . \end{matrix}

(106)

Combining (61), (69) and (71), we obtain

\begin{matrix} E ({|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}|}^{4}) & = & O ({(\frac{n}{ln n})}^{2}) + O (n^{2}), \end{matrix}

combining this with (95), we conclude

\begin{matrix} E ({|{\hat{θ}}_{k, ℓ} - θ_{k, ℓ}|}^{4}) & = & O ((\frac{1}{n^{4}}) {(\frac{n}{ln n})}^{2}) + O ((\frac{1}{n^{4}}) n^{2}) . \end{matrix}

(107)

Hence, there exists a constant

C > 0

, such that

\begin{matrix} E ({|{\hat{θ}}_{k, ℓ} - θ_{k, ℓ}|}^{4}) & \leq & C {(\frac{1}{n})}^{2} \leq C {(\frac{ln n}{n})}^{2} . \end{matrix}

(108)

The proof is achieved. □

Proof of Lemma 8

Considering the previous decomposition (92) in Lemma 7, we have

\begin{matrix} {\hat{θ}}_{k, ℓ} - θ_{k, ℓ} & = & ({\hat{θ}}_{k, ℓ} - {\overset{˘}{θ}}_{k, ℓ}) + ({\overset{˘}{θ}}_{k, ℓ} - θ_{k, ℓ}) \\ = & B_{k, ℓ, 1} + B_{k, ℓ, 2}, \end{matrix}

where

\begin{matrix} B_{k, ℓ, 1} & = & \frac{1}{n} \sum_{i = 1}^{n} Ψ_{i, k, ℓ} \\ = & \frac{1}{n} \sum_{i = 1}^{n} (\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) - E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) | F_{i - 1}]), \\ B_{k, ℓ, 2} & = & \frac{1}{n} \sum_{i = 1}^{n} E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) | F_{i - 1}] - θ_{k, ℓ} . \end{matrix}

Statement (94) achieves the desired result for the term

B_{k, ℓ, 2}

\begin{matrix} {\hat{θ}}_{k, ℓ} - θ_{k, ℓ} & = & B_{k, ℓ, 1} + o (1) . \end{matrix}

We consider the next decomposition

\begin{matrix} Ψ_{i, k, ℓ} & = & V_{i, k, ℓ} + W_{i, k, ℓ}, \end{matrix}

(109)

where

V_{i, k, ℓ} = (\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (x_{i}; η_{k, ℓ}) 𝟙_{A_{i}} - E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}} | F_{i - 1}]),

W_{i, k, ℓ} = (\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}}^{c} - E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}}^{c} | F_{i - 1}]),

and

𝟙_{A_{i}} = \{| ϵ_{i} | \geq c_{*} \sqrt{ln n}\},

and

c_{*}

denotes a constant which will be chosen later. Now, observe that

\begin{matrix} P (|{\hat{θ}}_{k, ℓ} - {\overset{˘}{θ}}_{k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) & \leq & P (|B_{k, ℓ, 1}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) + o (1) \\ = & P (|\sum_{i = 1}^{n} Ψ_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}) + o (1) \\ = & I_{1} + I_{2} + o (1), \end{matrix}

(110)

where

\begin{matrix} I_{1} & = & P (|\sum_{i = 1}^{n} V_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}), \\ I_{2} & = & P (|\sum_{i = 1}^{n} W_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}) . \end{matrix}

First, we aim to bound the term

I_{1}

of Equation (110). The Markov inequality and the Cauchy–Schwarz inequality yield

\begin{matrix} I_{1} & \leq & \frac{2}{κ \sqrt{n ln n}} E (|\sum_{i = 1}^{n} V_{i, k, ℓ}|) \\ \leq & \frac{2}{κ \sqrt{n ln n}} \sum_{i = 1}^{n} E (|V_{i, k, ℓ}|) . \end{matrix}

(111)

Observe that

\begin{matrix} E (|V_{i, k, ℓ}|) \\ = & E (|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}} - E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}} | F_{i - 1}]|) \end{matrix}

\begin{matrix} \leq & E (|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}}|) + E (E [|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}}| | F_{i - 1}]) \\ \leq & 2 E (|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}}|) \end{matrix}

(112)

\begin{matrix} \leq & 2 {(E ({|\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ})|}^{2}))}^{1 / 2} {(P (A_{i}))}^{1 / 2} . \end{matrix}

(113)

We use (98) combined with an elementary Gaussian inequality and take

c_{*}

to have

\frac{c_{*}^{2}}{4} - 1 / 2 = 2 .

We obtain

\begin{matrix} I_{1} & \leq & \frac{2 C}{κ} \sqrt{\frac{n}{ln n}} exp \{- \frac{c_{*}^{2} ln n}{4}\} \\ \leq & \frac{2 C}{κ} \frac{n^{- (\frac{c_{*}^{2}}{4} - 1 / 2)}}{\sqrt{ln n}} \\ \leq & C_{κ} \frac{1}{n^{2}}, \end{matrix}

(114)

where

C_{κ} = \frac{2 C}{κ} .

We now intend to investigate an upper bound for

I_{2}

of decomposition (110). We start by verifying the condition of Lemma 2. Suppose that assumptions (M.1) and (M.2) are satisfied combined with (99), we obtain

\begin{matrix} | Y_{i} 𝟙_{A_{i}^{c}} | & \leq & C_{m} + c_{*} \sqrt{ln n} \\ \leq & C \sqrt{ln n}, \end{matrix}

(115)

which implies

\begin{matrix} |W_{i, k, ℓ}| & \leq & |\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}^{c}} - E [\frac{ρ (Y_{i})}{f (X_{i})} ψ_{k} (X_{i}; η_{k, ℓ}) 𝟙_{A_{i}^{c}} 𝟙_{A_{i}^{c}} | F_{i - 1}]| \\ \leq & \frac{2 C \sqrt{ln n}}{c_{f}} sup_{x \in S} |ψ_{k} (x; η_{k, ℓ})| \\ \leq & \frac{2 C}{c_{f}} \sqrt{ln n} \sqrt{\frac{n}{{(ln n)}^{2}}} \\ \leq & C_{3} \sqrt{\frac{n}{ln n}} \\ \leq & C_{3} \sqrt{n}, \end{matrix}

(116)

where

C_{3} = \frac{2 C}{C_{f}}

, let

B = C_{3} \sqrt{n}

, then, for all

ϵ_{n} = \frac{κ}{2} \sqrt{n ln n}

sufficiently large n, we have

\begin{matrix} I_{2} = P (|\sum_{i = 1}^{n} W_{i, k, ℓ}| \geq \frac{κ}{2} \sqrt{n ln n}) & \leq & 2 exp \{- \frac{ϵ_{n}^{2}}{2 n B^{2}}\} \\ = & 2 exp \{- \frac{{(\frac{κ}{2} \sqrt{n ln n})}^{2}}{2 n {(C_{3} \sqrt{n})}^{2}}\} \\ = & 2 exp \{- \frac{κ^{2} ln n}{4 C_{3}^{2} n}\} \\ = & 2 exp \{ln n^{- \frac{κ^{2}}{4 C_{3}^{2} n}}\} \\ = & 2 n^{- w (κ, n)}, \end{matrix}

(117)

where

w (κ, n) = \frac{κ^{2}}{4 C_{3}^{2} n} .

Taking

κ

such that

w (n, κ) = 2,

we have

\begin{matrix} I_{2} & \leq & C \frac{1}{n^{2}} . \end{matrix}

(118)

It follows from (110), (114) and (118) that

\begin{matrix} P (|{\hat{θ}}_{k, ℓ} - {\tilde{θ}}_{k, ℓ}| \geq \frac{κ}{2} \sqrt{\frac{ln n}{n}}) & \leq & C \frac{1}{n^{2}} + o (1) \end{matrix}

(119)

\begin{matrix} \leq & C {(\frac{ln n}{n})}^{2} . \end{matrix}

(120)

The proof of (79) is achieved. □

Author Contributions

S.D., A.A.H. and S.B.: conceptualization, methodology, investigation, writing-original draft, writing—review and editing. All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the Special Issue Editor of the Special Issue on “Functional Data Analysis: Theory and Applications to Different Scenarios”, Mustapha Rachdi for the invitation. The authors are indebted to the Editor-in-Chief and the two referees for their very generous comments and suggestions on the first version of our article, which helped us to improve content, presentation and layout of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Bosq, D. Linear Processes in Function Spaces; Volume 149 of Lecture Notes in Statistics; Theory and applications; Springer: New York, NY, USA, 2000. [Google Scholar]
Ramsay, J.; Silverman, B. Functional Data Analysis, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2005. [Google Scholar]
Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer Series in Statistics; Springer: New York, NY, USA, 2006. [Google Scholar]
Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer Series in Statistics; Springer: New York, NY, USA, 2012. [Google Scholar]
Zhang, J. Analysis of Variance for Functional Data; Volume 127 of Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
Qing, S.; Choi, T. Gaussian Process Regression Analysis for Functional Data; CRC Press: Boca Raton, FL, USA, 2011. [Google Scholar]
Geenens, G. Curse of dimensionality and related issues in nonparametric functional regression. Stat. Surv. 2011, 5, 30–43. [Google Scholar] [CrossRef]
Cuevas, A. A partial overview of the theory of statistics with functional data. J. Statist. Plann. Inference 2014, 147, 1–23. [Google Scholar] [CrossRef]
Shang, H. A survey of functional principal component analysis. AStA Adv. Stat. Anal. 2014, 98, 121–142. [Google Scholar] [CrossRef]
Horváth, L.; Rice, G. An introduction to functional data analysis and a principal component approach for testing the equality of mean curves. Rev. Mat. Complut. 2015, 28, 505–548. [Google Scholar] [CrossRef]
Müller, H. Peter hall, functional data analysis and random objects. Ann. Statist. 2016, 44, 1867–1887. [Google Scholar] [CrossRef]
Nagy, S. An overview of consistency results for depth functionals. In Functional Statistics and Related Fields; Contrib. Stat.; Springer: Cham, Switzerland, 2017; pp. 189–196. [Google Scholar]
Vieu, P. On dimension reduction models for functional data. Statist. Probab. Lett. 2018, 136, 134–138. [Google Scholar] [CrossRef]
Aneiros-Pérez, G.; Cao, R.; Fraiman, R.; Genest, C.; Vieu, P. Recent advances in functional data analysis and high-dimensional statistics. J. Multivar. Anal. 2018, 170, 3–9. [Google Scholar] [CrossRef]
Ling, N.; Vieu, P. Nonparametric modelling for functional data: Selected survey and tracks for future. Statistics 2018, 52, 934–949. [Google Scholar] [CrossRef]
Goia, A.; Vieu, P. An introduction to recent advances in high/infinite dimensional statistics. J. Multivar. Anal. 2016, 146, 1–6. [Google Scholar] [CrossRef]
Almanjahie, I.M.; Bouzebda, S.; ChikrElmezouar, Z.; Laksaci, A. The functional kNN estimator of the conditional expectile: Uniform consistency in number of neighbors. Stat. Risk Model. 2022, 38, 47–63. [Google Scholar] [CrossRef]
Almanjahie, I.M.; Bouzebda, S.; Kaid, Z.; Laksaci, A. Nonparametric estimation of expectile regression in functional dependent data. J. Nonparametr. Stat. 2022, 34, 250–281. [Google Scholar] [CrossRef]
Bouzebda, S.; Nezzal, A. Uniform consistency and uniform in number of neighbors consistency for nonparametric regression estimates and conditional U-statistics involving functional data. Jpn. J. Stat. Data Sci. 2022, 1–103. [Google Scholar] [CrossRef]
Bouzebda, S.; Nemouchi, B. Weak-convergence of empirical conditional processes and conditional U-processes involving functional mixing data. Stat. Inference Stoch. Process. 2022, 1–56. [Google Scholar] [CrossRef]
Bouzebda, S.; Nemouchi, B. Uniform consistency and uniform in bandwidth consistency for nonparametric regression estimates and conditional U-statistics involving functional data. J. Nonparametr. Stat. 2020, 32, 452–509. [Google Scholar] [CrossRef]
Almanjahie, I.M.; Kaid, Z.; Laksaci, A.; Rachdi, M. Estimating the conditional density in scalar-on-function regression structure: k-NN local linear approach. Mathematics 2022, 10, 902. [Google Scholar] [CrossRef]
Meyer, Y. Wavelets and operators. In Different Perspectives on Wavelets (San Antonio, TX, 1993); Volume 47 of Proc. Sympos. Appl. Math.; Amer. Math. Soc.: Providence, RI, USA, 1993; pp. 35–58. [Google Scholar]
Daubechies, I. Ten Lectures on Wavelets; Volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1992. [Google Scholar]
Mallat, S. A Wavelet Tour of Signal Processing, 3rd ed.; Elsevier/Academic Press: Amsterdam, The Netherlands, 2009. [Google Scholar]
Vidakovic, B. Statistical Modeling by Wavelets; Wiley Series in Probability and Statistics: Applied Probability and Statistics; John Wiley & Sons Inc.: New York, NY, USA, 1999. [Google Scholar]
Härdle, W.; Kerkyacharian, G.; Picard, D.; Tsybakov, A. Wavelets, Approximation, and Statistical Applications; Volume 129 of Lecture Notes in Statistics; Springer: New York, NY, USA, 1998. [Google Scholar]
Rao, B.L.S.P. Nonparametric estimation of the derivatives of a density by the method of wavelets. Bull. Inform. Cybernet. 1996, 28, 91–100. [Google Scholar] [CrossRef]
Chaubey, Y.P.; Doosti, H.; Rao, B.P. Wavelet based estimation of the derivatives of a density for a negatively associated process. J. Stat. Theory Pract. 2008, 2, 453–463. [Google Scholar] [CrossRef]
Rao, B.L.S.P. Nonparametric Estimation of Partial Derivatives of a Multivariate Probability Density by the Method of Wavelets; De Gruyter: Berlin, Germany, 2018; pp. 321–330. [Google Scholar]
Prakasa Rao, B.L.S. Wavelet estimation for derivative of a density in the presence of additive noise. Braz. J. Probab. Stat. 2018, 32, 834–850. [Google Scholar] [CrossRef]
Allaoui, S.; Bouzebda, S.; Chesneau, C.; Liu, J. Uniform almost sure convergence and asymptotic distribution of the wavelet-based estimators of partial derivatives of multivariate density function under weak dependence. J. Nonparametr. Stat. 2021, 33, 170–196. [Google Scholar] [CrossRef]
Chesneau, C.; Kachour, M.; Maillot, B. Nonparametric estimation for functional data by wavelet thresholding. REVSTAT 2013, 11, 211–230. [Google Scholar]
Rosenblatt, M. Uniform ergodicity and strong mixing. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 1972, 24, 79–84. [Google Scholar] [CrossRef]
Bradley, R. Introduction to Strong Mixing Conditions; Kendrick Press: Heber City, UT, USA, 2007; Volume 1. [Google Scholar]
Laïb, N.; Louani, D. Nonparametric kernel regression estimation for functional stationary ergodic data: Asymptotic properties. J. Multivariate Anal. 2010, 101, 2266–2281. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S.; ElHajj, L. Multivariate wavelet density and regression estimators for stationary and ergodic continuous time processes: Asymptotic results. Math. Methods Statist. 2015, 24, 163–199. [Google Scholar] [CrossRef]
Delecroix, M.; Rosa, A.C. Nonparametric estimation of a regression function and its derivatives under an ergodic hypothesis. J. Nonparametr. Statist. 1996, 6, 367–382. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Additive regression model for stationary and ergodic continuous time processes. Comm. Statist. Theory Methods 2017, 46, 2454–2493. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Multivariate wavelet density and regression estimators for stationary and ergodic discrete time processes: Asymptotic results. Comm. Statist. Theory Methods 2017, 46, 1367–1406. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Some asymptotic properties of kernel regression estimators of the mode for stationary and ergodic continuous time processes. Rev. Mat. Complut. 2021, 34, 811–852. [Google Scholar] [CrossRef]
Bouzebda, S.; Didi, S. Some results about kernel estimators for function derivatives based on stationary and ergodic continuous time processes with applications. Comm. Statist. Theory Methods 2022, 51, 3886–3933. [Google Scholar] [CrossRef]
Bouzebda, S.; Chaouch, M. Uniform limit theorems for a class of conditional z-estimators when covariates are functions. J. Multivar. Anal. 2022, 189, 104872. [Google Scholar] [CrossRef]
Bouzebda, S.; Chaouch, M.; Didi, S. Asymptotics for function derivatives estimators based on stationary and ergodic discrete time processes. Ann. Inst. Statist. Math. 2022, 74, 1–35. [Google Scholar] [CrossRef]
Leucht, A.; Neumann, M. Degenerate U- and V-statistics under ergodicity: Asymptotics, bootstrap and applications in statistics. Ann. Inst. Statist. Math. 2013, 65, 349–386. [Google Scholar] [CrossRef]
Neumann, M.H. Absolute regularity and ergodicity of Poisson count processes. Bernoulli 2011, 17, 1268–1284. [Google Scholar] [CrossRef]
Goh, S.S. Wavelet bases for Hilbert spaces of functions. Complex Var. Elliptic Equ. 2007, 52, 245–260. [Google Scholar] [CrossRef]
Kerkyacharian, G.; Picard, D. Density estimation in Besov spaces. Statist. Probab. Lett. 1992, 13, 15–24. [Google Scholar] [CrossRef]
Kerkyacharian, G.; Picard, D. Thresholding algorithms, maxisets and well-concentrated bases. Test 2000, 9, 283–344. [Google Scholar] [CrossRef]
Prakasa, Rao, B. L.S. Nonparametric density estimation for functional data via wavelets. Commun. Stat. Theory Methods 2010, 39, 1608–1618. [Google Scholar] [CrossRef]
Cohen, A.; DeVore, R.; Kerkyacharian, G.; Picard, D. Maximal spaces with given rate of convergence for thresholding algorithms. Appl. Comput. Harmon. Anal. 2001, 11, 167–191. [Google Scholar] [CrossRef]
DeVore, R.A. Nonlinear approximation. In Acta Numerica, 1998; Volume 7 of Acta Numer; Cambridge Univ. Press: Cambridge, UK, 1998; pp. 51–150. [Google Scholar]
Autin, F. Point de vue Maxiset en Estimation Non Paramétrique. Ph.D. Thesis, Université Paris-Diderot-Paris VII, Paris, France, 2004. [Google Scholar]
Gasser, T.; Hall, P.; Presnell, B. Nonparametric estimation of the mode of a distribution of random curves. J. R. Stat. Soc. Ser. B Stat. Methodol. 1998, 60, 681–691. [Google Scholar] [CrossRef]
Donoho, D.L.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Density estimation by wavelet thresholding. Ann. Statist. 1996, 24, 508–539. [Google Scholar] [CrossRef]
Peškir, G. The uniform mean-square ergodic theorem for wide sense stationary processes. Stoch. Anal. Appl. 1998, 16, 697–720. [Google Scholar] [CrossRef]
Ferraty, F.; Vieu, P. Nonparametric models for functional data, with application in regression, time-series prediction and curve discrimination. Nonparametric Stat. 2004, 16, 111–125. [Google Scholar] [CrossRef]
Ferraty, F.; Vieu, P. The functional nonparametric model and application to spectrometric data. Comput. Statist. 2002, 17, 545–564. [Google Scholar] [CrossRef]
Ouassou, I.; Rachdi, M. Regression operator estimation by delta-sequences method for functional data and its applications. AStA Adv. Stat. Anal. 2012, 96, 451–465. [Google Scholar] [CrossRef]
Burkholder, D.L. Distribution function inequalities for martingales. Ann. Probab. 1973, 1, 19–42. [Google Scholar] [CrossRef]
de la Peña, V.H.; Giné, E. Decoupling; Probability and Its Applications (New York); From dependence to independence, Randomly stopped processes. U-statistics and processes. Martingales and beyond; Springer: New York, NY, USA, 1999. [Google Scholar]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

DIDI, S.; AL HARBY, A.; BOUZEBDA, S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics 2022, 10, 3433. https://doi.org/10.3390/math10193433

AMA Style

DIDI S, AL HARBY A, BOUZEBDA S. Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics. 2022; 10(19):3433. https://doi.org/10.3390/math10193433

Chicago/Turabian Style

DIDI, Sultana, Ahoud AL HARBY, and Salim BOUZEBDA. 2022. "Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time" Mathematics 10, no. 19: 3433. https://doi.org/10.3390/math10193433

APA Style

DIDI, S., AL HARBY, A., & BOUZEBDA, S. (2022). Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time. Mathematics, 10(19), 3433. https://doi.org/10.3390/math10193433

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Wavelet Density and Regression Estimators for Functional Stationary and Ergodic Data: Discrete Time

Abstract

1. Introduction

2. Multiresolution Analysis

Besov Space

3. Problem Definition of the Density Estimation

3.1. Density Function Estimator

3.2. Estimation Procedure Steps

4. Problem Definition of the Regression Estimation

Regression Function Estimator

5. Applications

5.1. The Conditional Distribution

5.2. The Curve Discrimination

5.3. Time Series Prediction from Continuous Set of Past Values

6. Concluding Remarks

7. Proofs

7.1. Proof of Theorem 1

7.1.1. Proof of Theorem 1

7.1.2. Proof of Lemmas

Proof of Lemma 3

Proof of Lemma 4

Proof of Lemma 5

7.1.3. Proof of Theorem 2

7.1.4. Proof of Theorem 2

7.1.5. Proof of Lemmas

Proof of Lemma 6

Proof of Lemma 7

Proof of Lemma 8

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI