An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions
Round 1
Reviewer 1 Report
The Authors propose a new non-asymptotic upper bound of the bias of the Nadaraya-Watson regression estimator using weak Lipschitz assumptions and Gaussian kernels. The obtained results are interesting and important, and have many applications, especially in multidimensional data analysis and artificial intelligence, e.g., self-driving cars, where we want to know the prediction error. The Authors' bound requires less restrictive assumptions than the previous results.
The presented results seem to be correct. The results are supported by numerical simulations.
I have some minor remarks:
1. Page 2, Theorem 1, $m(s_i)$ and $s_i$ are undefined,
2. Page 2, Theorem 1, is $f_X$ the density of $y_i$ or of $(x_i,y_i)$?
3. Page 2, Theorem 1, what does $\sigma(\epsilon_i)$ mean? Is it a function of the errors, or a multiplication?
4. Page 4, in Theorem 2 and later, you use $\hat{f}_n(x)$, but on the previous pages you use $\hat{m}_n(x)$. This is an inconsistency in the notation.
5. Page 4, in Theorem 2, what is the exact definition of the function $\phi(x,y,z)$? Is it the scaled standard normal density?
6. Page 5, line 113. In what sense is the convergence of $L_m$ meant? With respect to $n$?
7. Page 6, In Numerical Simulation, what distribution has error term $\epsilon$ in simulations?
8. Page 11, Proposition 1 holds under (A2).
9. Page 11, in Proposition 4, $l$ should be without bold.
10. Page 12, first line of the proof: replace $h_i$ by $h$.
I recommend this paper for publication under minor revision.
Comments for author File: Comments.pdf
Author Response
Dear reviewer,
thanks for your time! Your suggestions were very helpful in improving the quality of our submission.
In the following, we answer each of your concerns.
- You are right. We corrected it.
- It is the density of $x_i$. We specified it in the paper.
- This was a leftover from a previous notation. We corrected it.
- You are right. We replaced them with $\hat{m}_n(x)$.
- Yes, it was. Now, this quantity has been replaced with the integral of the kernel function.
- We do not consider the number of samples in this work; it is always assumed to be infinite. The limit $L_m \to 0$ is meant to show that the bias goes to $0$ when $L_m$ goes to $0$. We replaced $L_m \to 0$ with $L_m = 0$. Instead, we kept $h \to 0$, since the theorem requires $h > 0$ and, for some particular choices of kernel, the bound can be undefined at $h = 0$.
- The noise used in the "uni-dimensional" and "multidimensional" analysis was normal with mean 0 and standard deviation $0.05$. In the new paragraph "Realistic Scenario" we used no noise. We updated this information in the new submission.
- Yes, you are right. We inserted this information in the new submission.
- Yes, we corrected it.
Thanks again!
Best regards,
The authors.
Reviewer 2 Report
Report on "An Upper Bound of the Bias of Nadaraya–Watson Kernel Regression under Lipschitz Assumptions" with # stats-1002323 submitted to Stats by Tosatto et al.
Major Contributions:
In this paper, the authors consider the Nadaraya–Watson kernel estimator. Its asymptotic bias has been studied by Rosenblatt. This paper proposes an upper bound of the bias that holds for finite bandwidths, using Lipschitz assumptions and Gaussian kernels. The authors conducted simulation studies to show that the proposed approach works well.
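For concreteness, the estimator under discussion can be sketched as follows (a minimal illustration with a Gaussian kernel; the function and variable names are our own, not the authors' code):

```python
import numpy as np

def nadaraya_watson(x_query, X, Y, h):
    """Nadaraya-Watson regression estimate at x_query.

    Uses an (unnormalized) Gaussian kernel of bandwidth h; the
    normalizing constant cancels in the weighted average.
    """
    w = np.exp(-0.5 * ((x_query - X) / h) ** 2)  # kernel weights
    return np.sum(w * Y) / np.sum(w)             # locally weighted average of Y

# Example: estimate m(0.3) from noiseless samples of a linear function
X = np.linspace(-1.0, 1.0, 101)
Y = 2.0 * X
m_hat = nadaraya_watson(0.3, X, Y, h=0.2)
```

The bias studied in the paper is the gap between $\mathbb{E}[\hat{m}_n(x)]$ and the true $m(x)$; even in this noiseless linear example the estimate is slightly shrunk near the boundary, which is the finite-bandwidth effect the bound quantifies.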
Main Comments:
- On p. 3, the paper displays the Lipschitz condition. It is worthwhile to explain these conditions. In particular, how can they be checked in the simulation and real data analysis? It would also be interesting to weaken this Lipschitz condition in the main result of the paper.
- The Gaussian kernel assumption in the paper is strong. It is worthwhile to consider other kernels when verifying the upper bound of the bias.
- It is of interest to add a real example, which is used to illustrate the proposed methods.
- It is interesting to compare the computational cost between the proposed method and several competing alternatives in the simulation study.
- There are numerous typos and grammatical errors. Please also improve the poor organization.
Minor Comments:
- p. 1, line 20, -> "e.g.,".
- p. 3, line 76, add "," before "we".
- p. 5, line 113, add "." at the end of the equation.
- p. 10, line 248, add "vol number".
- p. 10, line 255, add "place".
- p. 10, line 256, add "vol number".
Author Response
Dear reviewer,
thanks a lot for your feedback.
We built a new derivation that works for a broader family of kernels. All we require is that a few integrals have finite values ($\int_{-\infty}^{\infty} k(x)\,\mathrm{d}x$, $\int_{-\infty}^{\infty} k(x)e^{-xL}\,\mathrm{d}x$, and $\int_{-\infty}^{\infty} k(x)e^{-xL}x\,\mathrm{d}x$, where $L$ is a non-negative constant and $k(x)$ is the kernel function).
In the main paper, we present some numerical analyses with the Gaussian kernel, Box kernel, and Triangular kernel. In the Appendix, we show more numerical simulations, and we compute all integrals necessary for computing the bound for the different kernels discussed in the main paper.
We will now answer your specific concerns.
- We added a small explanation in lines 55-61 and 91-93 of the new submission. Lipschitz continuity is not very restrictive, and it is very common in fields like optimization and statistical machine learning. We think this assumption is very reasonable for many different regression functions. In our particular case, we give the possibility of selecting the Lipschitz constant on a finite interval of the regression function, which further widens the class of admissible functions. Furthermore, it allows functions like $f(x) = |x|$, which are not admissible in Rosenblatt's analysis (since it requires a finite $m''$). Besides that, we agree that the Lipschitz constant might be unknown. In those cases, it can be estimated from the data. We agree that it would be very interesting to weaken this condition, but it is far from trivial.
- We agree with you, and we have been able to relax this assumption. We ran numerical simulations with three different kernels in total.
- We agree. We added the regression of a dynamical system (we have chosen an inverted pendulum). We estimated the Lipschitz constant from the data, and we plotted the bias and our upper bound.
- When the mentioned integrals are known in closed form, our bound requires negligible computation (i.e., the evaluation of the formulas in Theorems 2 and 3). When the integrals are not known, one needs to use numerical integration. Note that the number of integrals to solve is still limited and grows linearly with the number of dimensions. We included this information in the new submission.
Thanks again for your time,
Best regards,
The authors.
Round 2
Reviewer 2 Report
The new version shows a significant improvement. The paper has addressed my concerns.