1. Introduction
In this paper, we consider a nonparametric regression model, where bivariate observations
satisfy the following equations:
where
, is an unknown random function (process) which is almost surely continuous, the design
consists of a set of observable random variables with possibly unknown distributions lying in
, and the design points are not necessarily independent or identically distributed. We will consider the design as a triangular array, i.e., the random variables
may depend on
n. In particular, this scheme includes regression models with fixed design. The random regression function
is not supposed to be design-independent. Below, we impose some fairly standard regression-analysis conditions on the random errors . In particular, they are assumed to be centered but not necessarily independent or identically distributed.
The paper is devoted to constructing uniformly consistent estimators for the regression function under minimal assumptions on the correlation of design points.
The most popular kernel estimation procedures in the classical case of a nonrandom regression function are apparently the Nadaraya–Watson, Priestley–Chao, and Gasser–Müller estimators, local polynomial estimators, and their modifications (e.g., see [1,2,3,4,5]). We are primarily interested in the dependence conditions imposed on the design elements . In this regard, the huge number of publications in the field of nonparametric regression can be conditionally divided into two groups: the first contains papers with a random design, and the second contains papers with a fixed design.
In the papers dealing with a random design, the observations are either independent and identically distributed or, as a rule, form stationary sequences satisfying one or another known form of dependence. In particular, various types of mixing conditions, schemes of moving averages, associated random variables, Markov or martingale properties, and so on have been used. In this regard, we note, for example, the papers [3,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. In the recent papers [23,24,25,26], nonstationary sequences of design elements with one or another special type of dependence are considered (Markov chains, autoregression, partial sums of moving averages, etc.). In the case of a fixed design, the overwhelming majority of works assume certain regularity conditions on the design (e.g., see [9,10,27,28,29,30,31,32,33]). So, the nonrandom design points
are most often given by the formula
with some function
g of bounded variation, where the error
is uniform in all
. If
g is linear then we obtain a so-called
equidistant design. Another version of the regularity condition is the relation
(here it is assumed that the design elements are arranged in increasing order).
The problem of uniform approximation of a regression function has been studied by many authors (e.g., see [7,9,10,14,15,17,20,22,26,30,34,35,36] and the references therein).
In connection with studying the random regression function , we note, for example, the papers [37,38,39,40,41,42,43,44,45,46], where the mean and covariance functions of the random regression function f are estimated in the case when, for N independent copies of the function f, noisy values of each of these trajectories are observed for some collection of design elements (the design can be either common to all trajectories or different from series to series). Estimation of the mean and covariance functions is an actively developing area of nonparametric estimation, especially in the last couple of decades; it is both of independent interest and plays an important role in subsequent analysis of the random process f (e.g., see [39,40,45,47,48,49]). We consider one variant of this problem as an application of the main result.
The purpose of this article is to construct estimators that are uniformly consistent (in the sense of convergence in probability) not only in the cases of dependence reviewed above, but also under significantly different dependence structures of the observations, when neither ergodicity or stationarity nor the classical mixing conditions and other well-known dependence restrictions are satisfied. Note that the proposed estimators belong to the class of local linear kernel estimators, but with somewhat different weights than in the classical version. Namely, instead of the original observations, we consider their concomitants associated with the variational series based on the design observations, and the spacings of this variational series are taken as additional weights in the corresponding weighted least-squares method generating the above-mentioned new estimators. It is important to emphasize that these estimators are universal with respect to the nature of dependence of the observations: the design can be either fixed and not necessarily regular, or random, while not necessarily satisfying the traditional dependence conditions. In particular, the only condition on the design points that guarantees the uniform consistency of the new estimators is the dense filling of the domain of definition of the regression function. In our opinion, this condition is very natural and, in fact, necessary for reconstructing the function on the domain covered by the design elements. Previously, similar ideas were implemented in [50] for slightly different estimators (for details, see Section 4). Similar conditions on the design elements were also used in [51,52] in nonparametric regression, and in [53,54,55] in nonlinear regression.
The paper has the following structure. Section 2 contains the main results. Section 3 discusses the problem of estimating the mean function of a stochastic process. A comparison of the universal local linear estimators with some known ones is given in Section 4. Section 5 contains some results of computer simulation. In Section 6, we compare the results of using the new universal local linear estimators with the most common approaches of data analysis based on the epidemiological research ESSE-RF. In Section 7, we briefly summarize the results of the study. The proofs of the results from Section 2, Section 3, and Section 4 are deferred to Section 8.
2. Main Results
We need a number of assumptions.
The observations are represented in the form (1), where the unknown random regression function , is almost surely continuous. The design points are a set of observable random variables with values in , having, generally speaking, unknown distributions and not necessarily independent or identically distributed. Moreover, the random variables may depend on n, i.e., they can be considered as an array of design observations. The random function may be design-dependent. For all , the unobservable random errors satisfy with probability 1 the following conditions for all and :
where the constant may be unknown and does not depend on n, and the symbol stands for the conditional expectation given the σ-field generated both by the paths of the random process and by the random variables .
A kernel , , is equal to zero outside the interval and is the density of a symmetric distribution with the support in , i.e., , for all , and . We assume that the function satisfies the Lipschitz condition with constant and .
In what follows, we denote by
,
, the absolute
jth moment of the distribution with density
, i.e.,
. Put
. It is clear that
is a probability density with support lying in
. We need also the notation
Remark 1. We emphasize that assumption includes the fixed-design situation. We consider the segment as the domain of the design solely for the sake of simplicity of exposition of the approach. In the general case, instead of the segment , one can consider an arbitrary Jordan measurable subset of .
Further, we denote by
the order statistics constructed by the sample
. Put
For every
i, the response variable and the random error from (
1) associated with the order statistic
will be denoted by
and
, respectively. It is easy to see that the new errors
satisfy condition
as well. Next, by
we denote a random variable
such that, for all
, one has
where
and
are positive (possibly random) variables and the function
that may depend on the kernel
K and
. We agree that, throughout what follows, all limits, unless otherwise stated, are taken for
.
Let us introduce one more constraint, which is the crucial condition of the paper (in particular, the only condition on design points that guarantees the existence of a uniformly consistent estimator; see also the comments at the end of the section).
The following limit relation holds: .
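The exact formulation of this condition involves the maximum spacing of the ordered design points tending to zero (cf. Remark 8 below and the discussion in the Conclusions). As a rough numerical illustration of this dense-filling behavior, the following R sketch (an illustrative check under that assumption, not part of the estimation procedure) computes the maximum spacing, including the boundary gaps, for an i.i.d. uniform design:

```r
# Illustrative check of the dense-filling condition: the maximum spacing of
# the ordered design points on [0, 1] (boundary gaps included) should vanish
# as n grows. The i.i.d. uniform design here is for illustration only.
max_spacing <- function(x) {
  xs <- sort(x)
  max(diff(c(0, xs, 1)))
}

set.seed(1)
sapply(c(100, 1000, 10000), function(n) max_spacing(runif(n)))
```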
Finally, for any
, we introduce into consideration the following class of estimators for the regression function
f:
where
is the indicator function,
hereinafter, we use the notation
Remark 2. It is easy to see that the difference is the variance of a non-degenerate distribution; thus, it is strictly positive.
Remark 3. It is easy to verify that kernel estimator (3), without the indicator factor, is the first coordinate of the two-dimensional estimate of the weighted least-squares method, i.e., of the two-dimensional point at which the following minimum is attained: Thus, the proposed class of estimators is, in a certain sense (in fact, by construction), close to the classical local linear kernel estimators, but in the weighted least-squares method (5) we use slightly different weights.
Remark 4. In the case when there are multiple design points, some spacings vanish, and we lose part of the sample information in the estimator (3). In this case, it is proposed, before using the estimator (3), to slightly reduce the sample by replacing the observations at coinciding design points with their sample mean and keeping only one of the coinciding design points in the new sample. The averaged observations have less noise, so, despite the smaller size of the new sample, we do not lose the information contained in the original sample.
Let us further agree to denote by , , absolute positive constants, and by , positive constants depending only on the kernel K.
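For concreteness, here is a minimal R sketch of such a spacing-weighted least-squares fit at a single point t. It is illustrative only, not the authors' reference implementation: the weight of each ordered observation is assumed to be the kernel value times the spacing of the variational series (as described in the Introduction), and the convention for the first spacing is an assumption of this sketch.

```r
# Sketch of the weighted least-squares construction behind (3) and (5).
# Assumption: weight = kernel value times spacing of the ordered design;
# the first spacing is taken from the left endpoint 0 (illustrative choice).
ull_at_point <- function(t, x, y, h, kern) {
  ord <- order(x)
  xs  <- x[ord]
  ys  <- y[ord]
  dx  <- c(xs[1], diff(xs))              # spacings of the ordered design
  w   <- kern((xs - t) / h) * dx         # kernel weight times spacing
  keep <- w > 0
  if (sum(keep) < 2) return(NA_real_)
  # weighted LS in (a, b): minimize sum of w * (y - a - b * (x - t))^2
  fit <- lm.wfit(cbind(1, xs[keep] - t), ys[keep], w[keep])
  unname(fit$coefficients[1])            # the intercept estimates f(t)
}
```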
The main result of this section is as follows.
Theorem 1. Let conditions , , and be satisfied. Then, for any fixed , with probability 1 the following bound is satisfied:
where and the random variable meets the relation
with the constant from (4).
Remark 5. As follows from the proof of Theorem 1, the constants and have the following structure:
Remark 6. Since , under condition the limit relation holds. Therefore, taking into account Theorem 1, we can assert that . Thus, the bandwidth h can be determined, for example, by the relation
It is easy to see that, when is satisfied, the limit relations and hold. In fact, the value of equalizes in h the order of smallness in probability of both terms on the right-hand side of the relation (6). Note also that, for nonrandom f, one can choose as a solution to the equation
It is clear that this solution tends to zero as n grows.
The relations (8) and (9) allow us to obtain the order of smallness of the optimal bandwidth h, but not the optimal value of h. In practice, h can be chosen, for example, by cross-validation. From Theorem 1 and Remark 6 it is easy to obtain the following corollary.
Corollary 1. Let the conditions , , , and be satisfied, the regression function be nonrandom, and be an arbitrary subset of equicontinuous functions in (for example, a precompact set). Then
where is defined by equation (9), in which the modulus of continuity is replaced with the universal modulus . Moreover, the asymptotic relation holds.
Remark 7. It is easy to see that, for a nonrandom , the modulus of continuity in (9) can be replaced by one or another upper bound for , obtaining the corresponding upper bound for . Consider the case . If consists of functions satisfying the Hölder condition with exponent and a universal constant then and . In particular, if the functions from satisfy the Lipschitz condition () with a universal constant then . From Theorem 1 and Remark 6 we obtain the following corollary.
Corollary 2. Let the conditions , , , and be satisfied and let the modulus of continuity of the random regression function with probability 1
admit the upper bound , where is a random variable and is a positive continuous nonrandom function such that as . Then
where the value is defined in (9) after the replacement .
Let us discuss in more detail condition
. Obviously, condition
is satisfied for any nonrandom regular design (this is the case of nonidentically distributed
depending on
n). If
are independent and identically distributed and the interval
is the support of distribution of
, then condition
is also satisfied. In particular, if the distribution density of
is separated from zero on
, then
holds (see details in [
50]). If
is a stationary sequence with a marginal distribution with the support
, satisfying an
-mixing condition, then condition
is also satisfied (see Remark 8 below). Note that the dependence of the random variables
satisfying condition
can be much stronger, which is illustrated in the following example.
Example 1. Let the sequence of random variables be defined by the relation
where and are independent and uniformly distributed on and , respectively, the sequence does not depend on , and consists of Bernoulli random variables with success probability , i.e., the distribution of random variables is an equilibrium mixture of two uniform distributions on the corresponding intervals. The dependence between the random variables for any natural number i is defined by the equalities and . In this case, the random variables in (11) form a stationary sequence of random variables uniformly distributed on the segment , satisfying condition . On the other hand, for all natural numbers m and n,
Thus, all the known conditions for the weak dependence of random variables (in particular, the mixing conditions) are not satisfied here. According to the scheme of this example, it is possible to construct various sequences of dependent random variables uniformly distributed on by choosing sequences of Bernoulli switches with the conditions and for infinite numbers of indices and . In this case, condition will also be satisfied, but the corresponding sequence (not necessarily stationary) may not even satisfy the strong law of large numbers. For example, this is the case when for , and for , where (i.e., we randomly choose one of the two segments and , into which we randomly throw the first point, and then alternate the selection of one of the two segments by the following numbers of elements of the sequence: 1, 2, , , etc.). Indeed, we can introduce the notation , , and note that, for all elementary events from the event , one has
where and are the sets of indices for which the observations lie in the intervals or , respectively. It is easy to see that and . Hence, almost surely as due to the strong law of large numbers for the sequences and . On the other hand, as , for all elementary events from one has
where and are the sets of indices for which the observations lie in the intervals or , respectively. Proving the convergence in (12), we took into account that and , i.e., . Similar arguments are valid for all elementary events from .
Remark 8. In the case of i.i.d. random variables , condition will be fulfilled if, for all ,
where the supremum is taken over all intervals of length δ. Indeed, for any natural , we divide the interval into N subintervals , , of length . Then one has
since the event implies the existence of an interval of length that does not contain any points from the collection . Thereby, condition (13) implies the limit relation , which is equivalent to convergence with probability 1 due to the monotonicity of the sequence . In particular, if are independent then and , i.e., as , the finite collection with probability 1 forms a refining partition of the segment . It is easy to show that if is a stationary sequence satisfying an α-mixing condition and having a marginal distribution with the support , then (13) will be valid.
4. Comparison with Some Known Approaches
In [
50], under the conditions of the present paper, the following estimators were studied:
It is interesting to compare the new estimators
with the estimators
from [
50] as well as with other estimators (for example, the Nadaraya–Watson estimators
and classical local linear estimators
). Throughout this section, we assume that conditions
,
, and
are satisfied and the regression function
is nonrandom. Moreover, we need the following constraint.
The regression function in Model (1) is twice continuously differentiable, the errors are independent, identically distributed, centered, and independent of the design , whose elements are independent and identically distributed. In addition, the distribution function of the random variable has a strictly positive density continuously differentiable on . Such severe restrictions on the parameters of the regression model are explained both by difficulties in calculating the asymptotic representation for the variances of the estimators and , and by the properties of the Nadaraya–Watson estimators, which are very sensitive to the nature of the dependence of the design elements.
For any statistical estimator
of the regression function
, we will use the notation
for its bias, i.e.,
Put
and for
, introduce the notation
The following asymptotic representation for the bias and variance of the estimator
was obtained in [
50].
Proposition 1. Let condition be fulfilled and . If and so that , , and then, for any , the following asymptotic relations are valid: Note that the first statement concerning the asymptotic behavior of the bias in Proposition 1 was actually proved for arbitrarily dependent design elements when condition is met. The following two propositions and corollaries are also obtained without any assumptions about correlation of design elements, only conditional centering and conditional orthogonality of the errors from condition are used.
Proposition 2. Let . Then, for any fixed ,
where
Proposition 3. Let the regression function be twice continuously differentiable. Then, for any fixed ,
where Moreover,
besides, the error terms and in (22) and (24) are uniform in t.
Corollary 3. Let the regression function be twice continuously differentiable, , and . Then, for each fixed such that , the following asymptotic relations are valid:
Corollary 4. Suppose that, under the conditions of the previous corollary, f has nonzero first and second derivatives in a neighborhood of zero. Then, for any fixed positive such that , the following asymptotic relations hold:
where
Note that, due to the Cauchy–Bunyakovsky inequality and the properties of the density , the strict inequality holds for any .
Remark 11. Similar relations take place in a neighborhood of the right boundary of the segment , when for any . In this case, in the above asymptotics, one simply needs to replace the right-hand derivatives at zero by the analogous (nonzero) left-hand derivatives at the point 1, and the quantities must be replaced by . In this case, the coefficient will not change, and the corresponding coefficient on the right-hand side of the second asymptotics will only change its sign.
Thus, the qualitative difference between the estimators
and
is observed only in neighborhoods of the boundary points 0 and 1: for the estimator
, in the
h-neighborhoods of the indicated points, the order of smallness of the bias is
h, and for
this order is
. Such a connection between the estimators (
3) and (
19) seems to be quite natural in view of the relations (
5) and (
20), and the known relationship at the boundary points between Nadaraya–Watson estimators
and locally linear estimators
.
Remark 12. If condition is satisfied, then, for the bias and variance of the estimators and , the following asymptotic representations are well known (see, for example, [1]), which are valid for any under broad conditions on the parameters of the model under consideration:
The above asymptotic representations show that, if the assumptions are valid, then the variance of the Nadaraya–Watson estimator and of the locally linear estimator under broad conditions is asymptotically half the variance of the estimators and , respectively. However, the mean-square error of any estimator is equal to the sum of the variance and the squared bias, which for the compared estimators is asymptotically determined by the quantities or , respectively. In other words, if the standard deviation σ of the errors is not very large and
then the estimator or may be more accurate than . The indicated effect for the estimator is confirmed by the results of computer simulations in [50]. Note also that, in order to choose the optimal (in a certain sense) bandwidth h, the orders of smallness of the bias and the standard deviation of the estimator are usually equated. In other words, if the assumptions are fulfilled, for all four types of estimators considered here, we need to solve the equation . Thus, the optimal bandwidth has the standard order .
Remark 13. Estimators of the form and given in (3) and (19) can be defined a little differently, depending on the choice of one or another partition with highlighted points of the domain of the regression function underlying these estimators. For example, using the Voronoi partition of the segment , an estimator of the form (19) can be given by the equality
where , , for . Looking through the proofs in [50], it is easy to see that in this case all properties of the estimator are preserved, except for the asymptotic representation of the variance. Repeating (with obvious changes) the arguments in the proof of Proposition 1 in [50], we have
Thus, in the case of independent and identically distributed design points, the asymptotic variance of the estimator can be somewhat reduced by choosing one or another partition.
Similarly, in the definition (3) of the estimators , the quantities can be replaced by the Voronoi tiling . It is also worth noting that the indicator factor involved in the definition (3) of the estimator does not affect the asymptotic properties of the estimator given in Theorem 1; we only needed it to calculate the exact asymptotic behavior of the estimator bias.
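As an illustration of this alternative weighting, the cell lengths of the Voronoi partition of [0, 1] generated by the ordered design points can be computed as follows (a sketch under the assumption that boundary cells are cut at 0 and 1; the exact convention used in the paper is given in the displayed formulas above):

```r
# Voronoi cell lengths on [0, 1] for the ordered design points (illustrative).
voronoi_lengths <- function(x) {
  xs  <- sort(x)
  mid <- (xs[-1] + xs[-length(xs)]) / 2   # midpoints between neighbours
  diff(c(0, mid, 1))                      # one cell length per design point
}
```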
5. Simulations
In the following computer simulations, instead of estimator (
3), we used the equivalent estimator
of the weighted least-squares method defined by the relation
where the quantities
are defined in (13) above. Estimator (
27) differs from estimator (
3) by excluding the indicator factor and replacing
with
, which is not essential (see Remark 13). If we had several observations at one design point, then the observations were replaced by one observation presenting their arithmetic mean (see Remark 4 above). Although the notation
in (
27) is somewhat different from the same notation in (
3), we retained the notation
, which will not lead to ambiguity.
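A minimal R sketch of this pre-processing step (see Remark 4) might look as follows; the function name and interface are illustrative:

```r
# Observations sharing a design point are averaged, and only one copy of the
# design point is kept (illustrative implementation of the step above).
collapse_duplicates <- function(x, y) {
  ym <- tapply(y, x, mean)
  list(x = as.numeric(names(ym)), y = as.numeric(ym))
}

# Example: the three observations at x = 0.5 are replaced by their mean.
collapse_duplicates(c(0.2, 0.5, 0.5, 0.5), c(1, 2, 4, 6))
```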
In the simulations below, we will also consider the local constant estimator
from (
26), which can be defined by the equality
Here we also replace the observations corresponding to one design point by their arithmetic mean.
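A sketch of such a local constant fit in R, under the same spacing-weighting assumption as before, is given below (illustrative only):

```r
# Sketch of the universal local constant estimator (28): a locally constant
# fit with spacing-weighted kernel weights (illustrative implementation).
ulc_at_point <- function(t, x, y, h, kern) {
  ord <- order(x)
  xs  <- x[ord]
  ys  <- y[ord]
  dx  <- c(xs[1], diff(xs))              # spacings of the ordered design
  w   <- kern((xs - t) / h) * dx
  if (sum(w) == 0) return(NA_real_)
  sum(w * ys) / sum(w)                   # weighted average of the responses
}
```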
Recall that the Nadaraya–Watson estimator differs from (
28) by the absence of the factors
in the weighting coefficients:
The Nadaraya–Watson estimators are also weighted least-squares estimators:
In the following examples, estimators (
27) and (
28), which will be called
universal local linear (ULL) and
universal local constant (ULC), respectively, will be compared with the estimator of linear regression (LR), the Nadaraya–Watson (NW) estimator, LOESS of order 1, as well as with estimators of generalized additive models (GAM) and of random forest (RF). For LOESS estimators, the R
loess() function was used. Calculating the ULL estimator with the custom script was on average 3.2 times slower than the LOESS estimator calculated by the R
loess() function. That may be explained by the fact that the ULL estimator was implemented in the R language (in contrast to R's
loess(), whose body is implemented in C and Fortran) and was not optimized for performance.
It is worth noting that, in the examples below, the best results were obtained by the new estimators ULL (
27) and ULC (
28), LOESS estimator of order 1, and the Nadaraya–Watson estimator.
With regard to the simulation examples, the main difference between the ULL (
27) and ULC (
28) estimators, and the Nadaraya–Watson and LOESS ones is that ULL (
27) and ULC (
28) are “more local”. This means that if a function
is evaluated on a design interval
A with a “small” number of observations adjacent to a design interval
B with a “large” number of observations, the Nadaraya–Watson and LOESS estimators will primarily seek to adjust to the “large” cluster of observations on the interval
B. At the same time, ULL (
27) and ULC (
28) will equally consider observations on intervals of equal lengths, regardless of the distribution of design points on the intervals.
In the examples below, for all of the kernel estimators that are the Nadaraya–Watson ones, LOESS, ULL (
27), and ULC (
28), we used the tricubic kernel
We chose the tricubic kernel because that kernel is employed in the R function loess(), which was used in the simulations.
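In R, this kernel can be written as follows; the normalizing constant shown here makes it a probability density (as condition on the kernel requires) and is immaterial for the weighted fits:

```r
# Tricube kernel on [-1, 1]; 70/81 normalizes it to a density.
tricube <- function(u) ifelse(abs(u) < 1, (70 / 81) * (1 - abs(u)^3)^3, 0)
```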
The accuracy of the models was estimated with respect to the maximum error and the mean squared error. In all the examples below, except Example 3, the maximum error was estimated on the uniform grid of 1001 points on the segment
by the formula
where
are the grid points of segment
,
,
,
are the values of the constructed estimator at the points of the partition grid, and
are the true values of the estimated function. In Example 3, a grid of 1001 points was taken on the interval from the minimum to the maximum point of the design. That was done in order to avoid assessing the quality of extrapolation since, in that example, the minimum design point could fall far from 0.
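A small sketch of this maximum-error computation, assuming the fitted estimator and the true function are available as vectorized R functions:

```r
# Maximum absolute error on a uniform grid of 1001 points (illustrative).
max_error <- function(fhat, f_true, a = 0, b = 1, n_grid = 1001) {
  grid <- seq(a, b, length.out = n_grid)
  max(abs(fhat(grid) - f_true(grid)))
}
```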
The mean squared error was calculated for one random splitting of the whole sample into training and validation samples in a proportion of
to
, according to the formula
where
m is the validation sample size,
are the validation sample design points,
are the noisy observations of the predicted function in the validation sample,
is the estimate calculated from the training sample. The splittings into training and validation samples were identical for all models.
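A sketch of this hold-out evaluation in R; the argument train_frac is a placeholder for the split proportion stated above, and fit_fun is assumed to return a prediction function:

```r
# Fit on a random training part, compute MSE on the validation part.
holdout_mse <- function(x, y, fit_fun, train_frac) {
  n    <- length(x)
  idx  <- sample.int(n, size = floor(train_frac * n))
  fhat <- fit_fun(x[idx], y[idx])        # fit on the training sample
  mean((y[-idx] - fhat(x[-idx]))^2)      # MSE on the validation sample
}
```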
For each of the kernel estimators, the parameter h of the kernel was determined using cross-validation, minimizing the mean squared error, where the set of observations was partitioned into 10 folds randomly. The same partitions were taken for all the kernel estimators.
When calculating the mean squared error, the cross-validation for choosing
h was carried out on the training set. To calculate the maximum error, the cross-validation was performed on the whole sample. For the Nadaraya–Watson models as well as for ULL (
27) and ULC (
28), the parameter
h was selected from 20 values located on the logarithmic grid from
to 0.9. For LOESS, the parameter
span was chosen in the same way from 20 values located on the logarithmic grid from 0.0001 to 0.9.
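A sketch of this bandwidth selection in R; predict_at is assumed to take (new design points, training design, training responses, h) and return predictions, and the grid endpoints below are placeholders (the lower endpoint used for the Nadaraya–Watson, ULL, and ULC grids is given in the text):

```r
# 10-fold cross-validation over a logarithmic bandwidth grid (illustrative).
cv_bandwidth <- function(x, y, predict_at, h_grid, k = 10) {
  folds <- sample(rep_len(seq_len(k), length(x)))
  err <- sapply(h_grid, function(h) {
    mean(sapply(seq_len(k), function(j) {
      tr   <- folds != j
      pred <- predict_at(x[!tr], x[tr], y[tr], h)
      mean((y[!tr] - pred)^2, na.rm = TRUE)
    }))
  })
  h_grid[which.min(err)]
}

h_grid <- exp(seq(log(0.001), log(0.9), length.out = 20))  # placeholder grid
```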
The simulations also included testing basic statistical learning algorithms: linear regression without regularization, generalized additive model, and random forest [
59]. The training of the generalized additive model was carried out using the R library
mgcv.
Thin-plate splines were used, the optimal form of which was selected using generalized cross-validation. Random forest training was done using the R library randomForest. The number of trees was chosen to be 1000 based on the out-of-bag error plot for a random forest with five observations per leaf. The optimal number of observations per leaf was chosen using 10-fold cross-validation on a logarithmic grid of 20 values from 5 to 2000.
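For orientation, the reference models might be fitted as in the following sketch; the toy data and tuning values here are illustrative, while the settings actually used are those described above:

```r
library(mgcv)           # generalized additive models
library(randomForest)   # random forests

set.seed(3)
dat   <- data.frame(x = runif(500))
dat$y <- sin(2 * pi * dat$x) + rnorm(500, sd = 0.3)   # toy data

gam_fit <- gam(y ~ s(x, bs = "tp"), data = dat)       # thin-plate spline, GCV smoothing
rf_fit  <- randomForest(y ~ x, data = dat, ntree = 1000, nodesize = 5)
```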
In each example, 1000 realizations of different training and validation sets were performed, for each of which the errors were calculated. In each realization of the training and validation sets, 5000 observations were generated. The results of the calculations are presented below in boxplots, where every box represents the median and the 1st and 3rd quartiles. The plots do not show the results of linear regression since, in the examples, its results appeared to be significantly worse than those of the other models. The mean squared and maximum errors of ULL (27) were compared with the errors of the LOESS estimator by the paired Wilcoxon test. The summaries of the errors on the 1000 realizations of different training and validation sets are reported as median (1st quartile, 3rd quartile).
The examples of this section were constructed so that the distribution of design points is “highly nonuniform”. Potentially, this could demonstrate the advantage of the new ULL estimator (
27) over known estimation approaches.
Example 2. Let us set the target function
and let the noise be centered Gaussian with standard deviation (Figure 1). In each realization, we draw 4500 independent design points uniformly distributed on the segment , and 500 independent design points uniformly distributed on the segment . The results are presented in Figure 2. For the maximum error, the advantage of the estimators of order 1 (LOESS and ULL (27)) over the estimators of order 0 (the Nadaraya–Watson and ULC (28)) is noticeable, while ULL (27) turns out to be the best of all considered estimators; in particular, ULL (27) performs better than LOESS: 0.6357 (0.4993, 0.8224) vs. 0.6582 (0.5205, 0.8508), p = 0.019. For the mean squared error, all models, except random forest and linear regression, show similar results. Moreover, ULL (27) turns out to be the best of the considered ones, although the difference between ULL (27) and LOESS is not statistically significant: 4.017 (3.896, 4.139) vs. 4.030 (3.906, 4.154), p = 0.11.
Example 3. The piecewise linear target function is shown in Figure 3. For the sake of simplicity of presentation, we do not present the formula defining this function. Here, the centered Gaussian noise has the standard deviation . The design points are independent and identically distributed with density proportional to the function , . The results are presented in Figure 4. The Nadaraya–Watson estimator appears to be the best model both for the maximum error and for the mean squared error. For both errors, ULL (27) is better than LOESS (p < 0.0001 for the maximum error, p = 0.0030 for the mean squared error).
Example 4. In this example, the design points are strongly dependent. We will define them as follows: , , where A is a positive number such that is irrational (we chose in this example), and are independent random variables uniformly distributed on and independent of the noise. It was shown in [50] that the random sequence is asymptotically everywhere dense on with probability 1. The target function is
shown in Figure 5. The results are presented in Figure 6. For the maximum error, ULL (27) turns out to be the best of all the considered estimators. In particular, ULL (27) is better than LOESS: 1.757 (1.491, 2.053) vs. 2.538 (2.216, 2.886), p < 0.0001. The median mean squared error for ULL (27) also turns out to be the smallest of those considered. In that sense, ULL (27) is better than LOESS, but the difference is not significant: 4.166 (4.025, 4.751) vs. 4.219 (4.096, 4.338), p = 0.92.
Example 5. In this example, the target function was the same as in Example 4. The difference from the previous example is that 50,000 design points were generated by the same technique, and then 5000 of these 50,000 points were selected. This allowed us to fill the domain of f with design elements “more uniformly” than in the previous example, while preserving the clusters of design points.
The results are presented in Figure 7. For the maximum error, ULL (27) turns out to be the best of all the considered estimators. In particular, ULL (27) is better than LOESS: 2.872 (2.369, 3.488) vs. 9.435 (5.719, 10.9), . For the mean squared error, the best estimator is LOESS. ULL (27) is worse than LOESS: 5.108 (4.535, 6.597) vs. 4.378 (4.229, 4.541), , but it is better than the other estimators considered.
6. Real Data Application
In this section, we consider an application of the models considered in the previous section to the data collected in the multicenter study “Epidemiology of cardiovascular diseases in the regions of the Russian Federation”. In that study, representative samples of unorganized male and female populations aged 25–64 years from 13 regions of the Russian Federation were studied. The study was approved by the Ethics Committees of the three federal centers: State Research Center for Preventive Medicine, Russian Cardiology Research and Production Complex, Almazov Federal Medical Research Center. Each participant provided written informed consent for the study. The study was described in detail in [
60].
One of the urgent problems of modern medicine is to study the relationship between heart rate (HR) and systolic arterial blood pressure (SBP), especially for low observation values. Therefore, we chose SBP as the outcome and HR as the predictor. The association between these variables was previously estimated to be nonlinear [
61]. The general analysis included 6597 participants from four regions of the Russian Federation. The levels of SBP and HR were statistically significantly pairwise different between the selected regions. Thus, the hypothesis of the independence of design points was violated.
In this section, the maximum error cannot be calculated because the exact form of the relationship is unknown, so only the mean squared error is reported. The mean squared error was calculated for 1000 random partitions of the entire set of observations into training () and validation () samples.
The results are presented in
Figure 8. Here, the GAM estimator and the kernel estimators showed similar results, which were better than the results of both the linear regression and random forest.
The best estimator turned out to be ULC (
28), although its difference from the Nadaraya–Watson estimator was not statistically significant: 220.2 (215.4, 225.9) vs. 220.4 (215.4, 225.8),
. The difference between ULL (
27) and LOESS was not significant either: 220.4 (215.4, 225.9) vs. 220.6 (215.6, 226.1),
.
7. Conclusions
In this paper, for a wide class of nonparametric regression models with a random design, universal uniformly consistent kernel estimators are proposed for an unknown random regression function of a scalar argument. These estimators belong to the class of local linear estimators. However, in contrast to the vast majority of previously known results, traditional conditions of dependence of design elements are not needed for the consistency of the new estimators. The design can be either fixed and not necessarily regular, or random and not necessarily consisting of independent or weakly dependent random variables. With regard to design elements, the only condition that is required is the dense filling of the regression function domain with the design points.
Explicit upper bounds are found for the rate of uniform convergence in probability of the new estimators to an unknown random regression function. The only characteristic explicitly included in these estimators is the maximum spacing statistic of the variational series of design elements, which requires only the convergence to zero in probability of the maximum spacing as the sample size tends to infinity. The advantage of this condition over the classical ones is that it is insensitive to the forms of dependence of the design observations. Note that this condition is, in fact, necessary, since only when the design densely fills the regression function domain is it possible to reconstruct the regression function with some accuracy. As a corollary of the main result, we obtain consistent estimators for the mean function of continuous random processes.
In the simulation examples of
Section 5, the new estimators were compared with known kernel estimators. In some of the examples, the new estimators proved to be the most accurate. In the application to real medical data considered in
Section 6, the accuracy of new estimators was also comparable with that of the best-known kernel estimators.