1. Introduction
This paper considers a regression model with errors in the variables. Suppose the observations $(W_1, Y_1), \ldots, (W_n, Y_n)$ are i.i.d. (independent and identically distributed) random variables generated by the model
$$Y_i = m(X_i) + \varepsilon_i, \qquad W_i = X_i + \delta_i, \qquad i = 1, \ldots, n. \quad (1)$$
The i.i.d. random variables $\varepsilon_i$ are independent of $X_i$ and $\delta_i$. The i.i.d. covariate errors $\delta_i$ are independent of $X_i$, $\varepsilon_i$, and $Y_i$. The functions $f_\delta$ (known) and $p$ (unknown) stand for the densities of $\delta$ and $X$, respectively. The goal is to estimate the regression function $m(x)$ from the observations $(W_i, Y_i)_{i=1}^{n}$
. Errors-in-variables regression problems have been extensively studied in the literature; see, for example, [1,2,3,4,5,6,7]. Regression models with errors in the variables play an important role in many areas of science and social science ([8,9,10]).
Nadaraya and Watson ([11,12]) propose a kernel regression estimator for the classical regression model, in which the covariate $X_i$ is observed directly. Since the Fourier transform turns a convolution into an ordinary product, it is a common tool for dealing with deconvolution problems. Fan and Truong [4] generalize the Nadaraya–Watson regression estimator from the classical regression model to the regression model (1) via the Fourier transform. They study the convergence rate by assuming integer-order derivatives of $p$ and $m$ to be bounded. Compared to integer-order derivatives, the Hölder condition describes smoothness more precisely. Meister [6] shows the convergence rate under the local Hölder condition.
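To make the role of the Fourier transform concrete, recall the standard deconvolution identity behind these approaches (a generic sketch, not the paper's display; $\phi$ denotes a characteristic function and $f_W$ the density of $W$):
$$f_W = p * f_\delta \quad \Longrightarrow \quad \phi_W(t) = \phi_X(t)\,\phi_\delta(t), \qquad \text{so that} \quad \phi_X(t) = \frac{\phi_W(t)}{\phi_\delta(t)} \ \ \text{whenever } \phi_\delta(t) \neq 0.$$
Division by $\phi_\delta$ is exactly where zeros of the error characteristic function become problematic.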
The above references on model (1) assume that the characteristic function of the covariate error $\delta$ does not have zeros on the real line. This assumption is rather strong. For example, if $\delta$ has the uniform density on $[-1, 1]$, its characteristic function vanishes at the points $k\pi$ ($k \in \mathbb{Z}\setminus\{0\}$) in the Fourier domain.
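Indeed, this can be seen directly (a standard computation, included here only for illustration):
$$\phi_\delta(t) = \frac{1}{2}\int_{-1}^{1} e^{itx}\,dx = \frac{\sin t}{t},$$
which vanishes exactly at $t = k\pi$ for every nonzero integer $k$.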
Delaigle and Meister [1] consider the regression model (1) with Fourier-oscillating noise, which means that the Fourier transform of the error density vanishes periodically. They show that if $p$ and $m$ are compactly supported, then they can be estimated at the standard rate, as in the case where the error characteristic function does not vanish in the Fourier domain. Guo and Liu ([13,14,15]) extend the work of Delaigle and Meister [1] to multivariate cases.
Compact support is the price paid for eliminating the effect of the zeros in the Fourier domain. Belomestny and Goldenshluger [16] apply the Laplace transform to construct a deconvolution density estimator without assuming the density to be compactly supported. They provide sufficient conditions under which the zeros of the corresponding characteristic function have no effect on the estimation accuracy. Goldenshluger and Kim [17] also construct a deconvolution density estimator via the Laplace transform; they study how the multiplicity of the zeros affects the estimation accuracy. Motivated by the above work, we apply the Laplace transform to study the regression model (1) with errors following a convolution of uniform distributions.
The organization of the paper is as follows. In Section 2, we present preliminaries on the covariate error distribution and the functional classes. Section 3 introduces the kernel regression estimator via the Laplace transform. The consistency and the convergence rate of our estimator are discussed in Section 4 and Section 5, respectively.
2. Preparation
This section introduces the covariate error distribution and the functional classes.
For an integrable function $f$, the bilateral Laplace transform [18] is defined by
$$\mathcal{L}[f](z) = \int_{-\infty}^{\infty} f(t)\, e^{-zt}\, dt, \qquad z \in \mathbb{C}.$$
The Laplace transform $\mathcal{L}[f]$ is an analytic function in its convergence region, which is a vertical strip:
$$\{\, z \in \mathbb{C} : \sigma_- < \operatorname{Re}(z) < \sigma_+ \,\}.$$
The inverse Laplace transform is given by the formula
$$f(t) = \frac{1}{2\pi i} \int_{s - i\infty}^{s + i\infty} \mathcal{L}[f](z)\, e^{zt}\, dz, \qquad \sigma_- < s < \sigma_+.$$
Let the covariate error distribution be a
-fold convolution of the uniform distribution on
. This means
where
are i.i.d. and
with density
. Hence,
Here, the Laplace transform of the error density is the product of two functions: the first has zeros only on the imaginary axis, while the second does not vanish, by the analyticity of the Laplace transform in its convergence region. Consequently, the zeros of the error transform all lie on the imaginary axis.
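To illustrate this factorization (under the assumption, made only for concreteness, that there are $q$ summands, each uniformly distributed on $[0,\theta]$; the paper's exact parameterization is not reproduced here), a standard computation gives
$$\mathcal{L}[f_\delta](z) = \left(\frac{1 - e^{-\theta z}}{\theta z}\right)^{q},$$
whose zeros $z = 2\pi i k/\theta$, $k \in \mathbb{Z}\setminus\{0\}$, all lie on the imaginary axis; off the imaginary axis the transform does not vanish.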
Now, we introduce some functional classes.
Definition 1. For given positive constants and a given point, a function f is said to satisfy the local Hölder condition with smoothness parameter β if f is k times continuously differentiable and the inequality (3) holds in a neighbourhood of the point, with the constants specified there. All such functions form the local Hölder class. If (3) holds for every point, f satisfies the (global) Hölder condition with smoothness parameter β; all such functions form the Hölder class.
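For orientation, an inequality of the kind referred to in (3) typically takes the following form (a sketch under assumed notation: $L > 0$ is a Hölder constant and $r > 0$ the radius of the neighbourhood; the paper's exact display is not reproduced):
$$\bigl| f^{(k)}(x + t) - f^{(k)}(x) \bigr| \le L\, |t|^{\beta - k} \qquad \text{for all } |t| \le r,$$
with $k$ as in Definition 1.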
Clearly, k in Definition 1 equals the largest integer that is strictly smaller than β; this convention is kept in the later discussions.
Example 1. Then, and .
It is easy to see that the Hölder class must be contained in the local Hölder class at each point. However, the reverse inclusion is not necessarily true.
Example 2 ([
19]).
Consider the function built from indicator functions of intervals indexed by a non-negative integer l. This function belongs to the local Hölder class at each point; however, it does not belong to the global Hölder class. Note that (3) is a local Hölder condition around a given point. When we consider pointwise estimation, it is natural to assume that the unknown function satisfies a local smoothness condition.
Definition 2. Let and be real numbers. We say that a function f belongs to the functional class if We denote .
3. Kernel Estimator
This section constructs the kernel regression estimator. Two kernels, K and a deconvolution kernel defined below via the Laplace transform, will be used.
Assume that the kernel satisfies the following conditions:
(i) , and supp ;
(ii) There exists a fixed positive integer
such that
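Conditions of this kind are standard for higher-order kernels; a generic version (stated here only for orientation, with the vanishing-moment order written as $S$; the paper's exact constants and support interval are not reproduced) reads
$$\int K(x)\,dx = 1, \qquad \operatorname{supp} K \subset [-1, 1], \qquad \int x^{j} K(x)\,dx = 0 \ \ \text{for } j = 1, \ldots, S.$$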
Example 3 ([
20]).
Consider the function constructed in [20], with the constants chosen there. Then, this kernel satisfies conditions (i) and (ii).
Motivated by Belomestny and Goldenshluger [
16], we will construct the regression estimator via the Laplace transform. Note that
does not have zeros off the imaginary axis. Then, the kernel
is defined by the inverse Laplace transform
where
,
and
is the Laplace transform of kernel
K with the convergence region
. There is a complex-valued improper integral in (
4). One can use the properties of the Laplace transform to compute it; see [
18].
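To indicate the idea behind (4) (a sketch only, under assumed sign and normalization conventions; the paper's exact display is not reproduced), the deconvolution kernel can be thought of as the inverse Laplace transform of a ratio of the two transforms, taken along a vertical line that avoids the imaginary axis:
$$L(x) = \frac{1}{2\pi i}\int_{s - i\infty}^{s + i\infty} \frac{\mathcal{L}[K](z)}{\mathcal{L}[f_\delta](-z)}\, e^{zx}\, dz, \qquad s \neq 0,$$
so that, formally, $\mathcal{L}[L](z)\,\mathcal{L}[f_\delta](-z) = \mathcal{L}[K](z)$ on the line $\operatorname{Re}(z) = s$. Since the zeros of $\mathcal{L}[f_\delta]$ lie only on the imaginary axis, choosing $s \neq 0$ keeps the integrand well defined.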
The following lemma provides an infinite series representation of the kernel
. It is a specific form of Lemma 2 in [
16]. In order to explain the construction of the estimator, we give the details of the proof.
Lemma 1. Let (2) hold and . Proof. This ends the proof. □
Truncation is used to deal with the infinite series. Select the parameter N so that
. The cut-off kernels are defined by
Motivated by the Nadaraya–Watson regression estimator, we define the regression estimator of
as
where
In what follows, we will write
and
for the estimator (
7) associated with
and
, respectively. Finally, our regression estimator is denoted by
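For orientation, an estimator of this Nadaraya–Watson type has the ratio form sketched below (a sketch under assumed notation: $L_{h}(\cdot) = h^{-1} L(\cdot/h)$ denotes a rescaled deconvolution kernel and $h$ the bandwidth; the paper's cut-off kernels and exact normalizations are not reproduced):
$$\hat m(x) = \frac{\hat g(x)}{\hat p(x)}, \qquad \hat g(x) = \frac{1}{n}\sum_{i=1}^{n} Y_i\, L_{h}(x - W_i), \qquad \hat p(x) = \frac{1}{n}\sum_{i=1}^{n} L_{h}(x - W_i).$$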
4. Strong Consistency
In this section, we investigate the consistency of the regression estimator (
9). Roughly speaking, consistency means that the estimator
converges to
as the sample size tends to infinity.
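In symbols, strong (almost sure) consistency at a point $x$ means (a standard formulation, with $\hat m$ denoting the estimator (9)):
$$\hat m(x) \longrightarrow m(x) \quad \text{almost surely as } n \to \infty.$$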
Theorem 1 (Strong consistency).
Consider the model (1) with (2). Suppose , and the kernel function K satisfies condition (i). If x is a Lebesgue point of both and p, then satisfies
with and . Proof. (1) We consider the estimator for .
Note that , and . Then, it is sufficient to prove and .
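For reference, the next step invokes Markov's inequality; its standard form (the particular exponent in the paper's display is not reproduced here) is: for a random variable $T$ and any $\epsilon > 0$, $r > 0$,
$$P\bigl(|T| \ge \epsilon\bigr) \le \frac{E|T|^{r}}{\epsilon^{r}}.$$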
Now, we prove
. For any
,
By Markov’s inequality, we obtain
for
. This motivates us to derive an upper bound on
. Combining (
5) with (
8), we have
and
We obtain
where
.
Let
denote the number of elements contained in the set
A. If
, at least one of
is independent of all other
,
. Hence,
On the other hand, if
for
, by Jensen’s inequality, we obtain
where
. Let
. Then,
Since
for all
k, we obtain that
for
. This, with
, leads to
Inserting this into (
10), we obtain
Note that
are identically distributed. Then, it follows from (
11) and
that
where
Since
and considering the boundedness of
K,
holds for an
h that is small enough. It follows from
,
and
that
Note that the kernel function
K satisfies condition (i) and
, then
holds for each Lebesgue point
x of
p. Hence, for an
n that is sufficiently large, the term
vanishes. This, with (
14), shows
for an
n that is large enough. Since
, we have
For any
, it follows from the Borel–Cantelli lemma that
When putting
almost surely, we have
(2) We consider the estimator
for
. Inserting (
6) into (
8), we obtain
and
We obtain
where
. Similar to (
12) and (
13), we obtain
By
and (
2), we have that
. So,
Similar to (
17), we obtain
where
Since
and considering the boundedness of
K,
holds for an
h that is small enough.
Similar to
, we get
This completes the proof. □
Remark 1. Theorem 1 shows the strong consistency of the kernel estimator . It differs from the work of Meister [6] in that the density function of our covariate error δ contains zeros in the Fourier domain. Our covariate error belongs to the Fourier-oscillating noise considered by Delaigle and Meister [1]. Compared to their work, we construct a regression estimator via the Laplace transform without assuming and m to be compactly supported.
5. Convergence Rate
In this section, we focus on the convergence rate in the weak sense. Meister [
6] introduces the weak convergence rate by modifying the concept of weak consistency. A regression estimator
is said to attain the weak convergence rate
if
The set is the collection of all pairs that satisfy certain conditions. The order of limits is first $n \to \infty$ and then $C \to \infty$. Here, C is independent of n.
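Written out, a definition of this type (a sketch following Meister's formulation, with $\mathcal{F}$ denoting the class of admissible pairs and $\psi_n$ the rate; the paper's exact class is not reproduced) reads
$$\lim_{C \to \infty} \limsup_{n \to \infty} \sup_{(p,\, m) \in \mathcal{F}} P\bigl( |\hat m(x) - m(x)| \ge C\,\psi_n \bigr) = 0.$$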
Define the set
where
.
The following lemma is used to prove the theorem in this section.
Lemma 2 ([
6]).
If , , and , then for a small enough ,
with two positive constants and .
Theorem 2. Consider the model (1) with (2). Assume that with if , and if . Suppose the kernel K satisfies conditions (i) and (ii) with . Let , . Then,
where . Proof. (1) We assume that
and consider the estimator
. Applying Lemma 2 and Markov’s inequality, we obtain
where
is the larger of
and
, and
appear in Lemma 2. Then,
and
First, we estimate
and
. By (
18), we have
By the Taylor expansion of
p with degree
, there exists
such that
Since kernel
K satisfies condition (ii) and
, we have
By
, we find that
holds for an
h that is small enough. Equations (
19) and (
28) imply the following upper bound:
Now, we estimate the term
. By (
8) and (
5),
Note that
. Then, similar arguments to (
15)–(
17) show
Similar to (
27)–(
29), we have
Now, we estimate
and
. By (
8), we have
Note that
. It follows from
and
that
. Then,
Let
where
is the number of weak compositions of
l into
parts [
21]. Note that
By supp
, we have supp
. Denote
. If
, the intervals
and
are disjoint for
. For an
h that is small enough, we obtain
Denote
. By supp
and
,
This, with (
37), leads to
When
, we obtain
by
and similar arguments to [
16]. Similarly,
and
When
, we have that
holds for
. Similar to (
41), for
, we have
Similar to the estimate of
, we have
Since
and
,
Note that
. Then,
This leads to the result of Theorem 2 for .
(2) We consider the estimator
for
. By (
22), (
23) and (
28), we have
Similar arguments to (
30)–(
32) show
Similar to (
33),
and from (
6),
Similar arguments to (
34)–(
37) show
holds for an
h that is small enough, where
. Denote
. Similar to (
38),
By similar arguments to (
39)–(
42), we have
and
This leads to the result of Theorem 2 for .
This completes the proof. □
Remark 2. Our convergence rate is the same as that in the ordinary smoothness case of Meister [6], where the density function of the covariate error does not vanish in the Fourier domain. Compared to Delaigle and Meister [1], we do not assume and m to be compactly supported.
Remark 3. Belomestny and Goldenshluger [16] consider the density deconvolution problem with non-standard error distributions. They assume that the density function to be estimated satisfies the Hölder condition. It is natural to assume a local smoothness condition in pointwise estimation. Hence, and are assumed to satisfy the local Hölder condition in our discussion.
Remark 4. Theorem 1 shows the strong consistency of the regression estimator without the smoothness assumption. The main tool used is the Borel–Cantelli lemma, which requires a convergent series. It is easy to see from (13) and (20) that the choice of h is not unique. Theorem 2 gives a weak convergence rate, which is defined by modifying weak consistency. It is natural to assume the smoothness condition when discussing the convergence rate. In Theorem 2, the choice of h is related to the smoothness index β. It follows from our proof (44) that the choice of h is unique up to a constant.
Remark 5. In our discussion, . Substituting this into the proof of Theorem 3.5 in [6], one can obtain the optimality of the convergence rate in our Theorem 2. This means that there does not exist an estimator of the regression function based on i.i.d. data generated by model (1) with (2) which satisfies
It would be interesting to study a numerical illustration of our estimator. We shall investigate this in the future.
Author Contributions
Writing—original draft preparation, H.G. and Q.B.; Writing—review and editing, H.G. All authors have read and agreed to the published version of the manuscript.
Funding
This paper is supported by the National Natural Science Foundation of China (No. 12001132), the Guangxi Colleges and Universities Key Laboratory of Data Analysis and Computation, and the Center for Applied Mathematics of Guangxi (GUET).
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank the editor and reviewers for their important comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Delaigle, A.; Meister, A. Nonparametric function estimation under Fourier-oscillating noise. Stat. Sin. 2011, 21, 1065–1092.
- Dong, H.; Otsu, T.; Taylor, L. Bandwidth selection for nonparametric regression with errors-in-variables. Econom. Rev. 2023, 42, 393–419.
- Di Marzio, M.; Fensore, S.; Taylor, C.C. Kernel regression for errors-in-variables problems in the circular domain. Stat. Methods Appl. 2023.
- Fan, J.Q.; Truong, Y.K. Nonparametric regression with errors in variables. Ann. Stat. 1993, 21, 1900–1925.
- Hu, Z.R.; Ke, Z.T.; Liu, J.S. Measurement error models: From nonparametric methods to deep neural networks. Stat. Sci. 2022, 37, 473–493.
- Meister, A. Deconvolution Problems in Nonparametric Statistics; Springer: Berlin, Germany, 2009.
- Song, W.X.; Ayub, K.; Shi, J.H. Extrapolation estimation for nonparametric regression with measurement error. Scand. J. Stat. 2023.
- Carroll, R.J.; Delaigle, A.; Hall, P. Non-parametric regression estimation from data contaminated by a mixture of Berkson and classical errors. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 859–878.
- Zhou, S.; Pati, D.; Wang, T.Y.; Yang, Y.; Carroll, R.J. Gaussian processes with errors in variables: Theory and computation. J. Mach. Learn. Res. 2023, 24, 1–53.
- Delaigle, A.; Hall, P.; Jamshidi, F. Confidence bands in non-parametric errors-in-variables regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 2015, 77, 149–169.
- Nadaraya, E.A. On estimating regression. Theory Probab. Its Appl. 1964, 9, 141–142.
- Watson, G.S. Smooth regression analysis. Sankhyā Indian J. Stat. 1964, 26, 359–372.
- Guo, H.J.; Liu, Y.M. Strong consistency of wavelet estimators for errors-in-variables regression model. Ann. Inst. Stat. Math. 2017, 69, 121–144.
- Guo, H.J.; Liu, Y.M. Convergence rates of multivariate regression estimators with errors-in-variables. Numer. Funct. Anal. Optim. 2017, 38, 1564–1588.
- Guo, H.J.; Liu, Y.M. Regression estimation under strong mixing data. Ann. Inst. Stat. Math. 2019, 71, 553–576.
- Belomestny, D.; Goldenshluger, A. Density deconvolution under general assumptions on the distribution of measurement errors. Ann. Stat. 2021, 49, 615–649.
- Goldenshluger, A.; Kim, T. Density deconvolution with non-standard error distributions: Rates of convergence and adaptive estimation. Electron. J. Stat. 2021, 15, 3394–3427.
- Oppenheim, A.V.; Willsky, A.S.; Nawab, H.S. Signals & Systems, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1996.
- Liu, Y.M.; Wu, C. Point-wise estimation for anisotropic densities. J. Multivar. Anal. 2019, 171, 112–125.
- Stein, E.M.; Shakarchi, R. Real Analysis: Measure Theory, Integration, and Hilbert Spaces; Princeton University Press: Princeton, NJ, USA, 2005.
- Stanley, R.P. Enumerative Combinatorics; Cambridge University Press: Cambridge, UK, 1997; Volume 1.