1. Introduction
Modern biomedical technology has made it possible to measure multiple patient features continuously over a time interval or intermittently at several discrete time points in order to study underlying biological mechanisms. Functional data also arise in genetic studies, where a massive amount of gene expression data is recorded for each subject and can be treated as a functional curve [1]. Functional data analysis reveals distinct features related to the dynamics of cellular responses, activity, and other biological processes, for which existing methods, such as projection, dimension reduction, and functional linear regression analysis, are not well adapted. Overviews can be found in the book by Horváth and Kokoszka [2] and in recently published papers such as Yuan et al. [3] and Lai et al. [4].
Ramsay and Silverman [5], Clarkson et al. [6], and Ferraty and Vieu [7] introduced basic tools and widely accepted methods for functional data analysis; Horváth and Kokoszka [2] established fundamental methods for estimation and hypothesis testing of mean functions and covariance operators of functional data. The topics are broad, and the results are treated in depth. Conventionally, each data curve is assumed to be observed over a dense set of points, often thousands of them; smoothing techniques are then used to produce continuous curves, and these curves are treated as completely observed functional data for statistical inference. In contrast, we consider the more practical setting in which the data curves are observed only at some (not dense) time points, and the analyzed curves are actually interpolations through those observed points. Of course, a relatively large sample size is needed when observations are sparse. The effects of both the number of observation points and the sample size are considered in our analysis.
For analyzing longitudinal data, Zeger and Diggle [8] considered a semiparametric regression model for longitudinal observations $\{(t_{ij}, X_i(t_{ij}), Y_i(t_{ij})): j = 1, \dots, m_i;\ i = 1, \dots, n\}$ of the form
$$ Y_i(t) = X_i^{\top}(t)\,\beta + \theta(t) + \varepsilon_i(t), \qquad t \in T, \tag{1} $$
where $Y_i(t)$ is the response variable, $X_i(t)$ is the covariate vector at time $t$, $\beta$ is a constant vector of unknown regression coefficients, $\theta(t)$ is an unspecified baseline function, $\varepsilon_i(t)$ is a zero-mean stochastic process, and $T$ represents the observation interval. Under this model, Lin and Ying [9] estimated the regression parameters via a weighted least squares estimator based on the theory of counting processes; Fan and Li [10] further studied this model using a weighted difference-based estimator and a weighted local linear estimator followed by statistical inference, as discussed in Xue and Zhu [11].
For functional data analysis, the data are often represented by $\{(t_{ij}, X_i(t_{ij}), Y_i(t_{ij})): j = 1, \dots, m_i;\ i = 1, \dots, n\}$, and the model is [12,13,14]
$$ Y_i = \alpha + \int_T X_i(t)\,\beta(t)\,dt + \varepsilon_i. \tag{2} $$
Some researchers considered the following model [2,5,15]:
$$ Y_i(t) = \alpha(t) + \int_T \beta(s,t)\,X_i(s)\,ds + \varepsilon_i(t). \tag{3} $$
To estimate $\beta$, one assumes there are bases that span the spaces of the covariate curves and of $\beta$, respectively; the estimate of $\beta$ is expressed as a linear combination of these basis functions, and the coefficients are estimated by minimizing the residual sum of squares. Although the resulting estimator is useful, a representation theorem for such an estimator is hard to obtain, and hence the asymptotic distribution of this approach is not clear. Yao et al. [15] investigated a functional principal component method for estimation of model (3) and obtained consistent results. Müller and Yao [16] studied a variation of the above model in the conditional expectation format.
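To make the basis-expansion idea described above concrete, the following Python sketch fits a scalar-on-function linear model of the type in (2) by expanding the coefficient function in a small basis and minimizing the residual sum of squares. The cosine basis, the simulated covariate curves, and the "true" coefficient function are hypothetical choices for illustration only, not the constructions used in the cited papers.

```python
import numpy as np
from scipy.integrate import trapezoid

# Basis-expansion sketch for Y_i = \int_T X_i(t) beta(t) dt + e_i on T = [0, 1].
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 201)      # dense grid on T
n, K = 100, 6                          # sample size, number of basis functions

def phi(t, k):
    # cosine basis: phi_0 = 1, phi_k(t) = sqrt(2) cos(k pi t)
    return np.ones_like(t) if k == 0 else np.sqrt(2.0) * np.cos(k * np.pi * t)

beta_true = lambda t: np.sin(2.0 * np.pi * t) + t              # assumed truth
X = 0.1 * np.cumsum(rng.normal(size=(n, grid.size)), axis=1)   # rough covariate curves
Y = trapezoid(X * beta_true(grid), grid, axis=1) + 0.1 * rng.normal(size=n)

# Z[i, k] = \int X_i(t) phi_k(t) dt, so Y ~ Z c; c is found by ordinary least squares.
Z = np.column_stack([trapezoid(X * phi(grid, k), grid, axis=1) for k in range(K)])
c_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)

beta_hat = sum(c_hat[k] * phi(grid, k) for k in range(K))
l2_err = np.sqrt(trapezoid((beta_hat - beta_true(grid)) ** 2, grid))
print("L2 error of the basis-expansion estimate:", round(float(l2_err), 3))
```

The fit quality of such an estimator depends heavily on the choice and number of basis functions, which is one motivation for the RKHS approach developed in this paper.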
The smoothing spline method is popular for curve estimation: the functional curves can be estimated at any point, followed by the computation of coefficients. However, the asymptotic properties of estimators based on the spline method are difficult to handle. For natural polynomial splines, the number of knots equals the number of untied observations, which is sometimes redundant and undesirable. B-splines only require a few (the degree of the polynomial plus two) basis functions and are easy to implement [17,18,19]. Another method is the local linear fit [20,21,22], but the difficulty lies in choosing the bandwidth, especially when the observation points are uneven. Therefore, in this paper we employ the reproducing kernel Hilbert space (RKHS), a special form of the spline method that turns curve estimation into point estimation. Yuan and Cai [12] explored its application to the functional linear regression problem, and Lei and Zhang [23] extended it to RKHS-based partially functional linear models. In general, one needs to choose a set of (orthogonal) basis functions and the number of basis functions for functional estimation, whereas with RKHS one only needs to determine the kernel(s) of the RKHS. Furthermore, the Riesz representation theorem shows that any bounded linear functional can be reproduced, in closed form, by a representer based on the RKHS kernel.
However, existing RKHS methods often meet obstacles in choosing different norms and the corresponding optimization procedures. Although using a carefully selected norm in the optimization criterion has the advantage of interpretability, the resulting regression estimator generally requires the inversion of a large matrix (of dimension equal to the sample size). Moreover, most existing methods, including the aforementioned RKHS methods, are designed for the case where the observed data are sampled at a dense rate, and they are limited to models in which either the response or the predictors are functions. New methods are needed for estimation and hypothesis testing of regression parameters in the more general case where both the response and the predictors are functions observed sparsely. To address these problems, we propose a new RKHS method with a unified norm that characterizes both the RKHS and the optimization criterion for function-on-function regression. Although the statistical interpretation of this optimization criterion is not fully clear, it yields a simple closed form for the estimated regression functions under a general function-on-function regression model, and the optimization is computationally reliable and applicable without the need to compute the inverse of a massive matrix. By establishing a representation theorem and a functional central limit theorem for the proposed model, we obtain the asymptotic distribution of the estimators. Hypothesis testing for the underlying curves is proposed accordingly.
The remainder of this paper is organized as follows. Section 2 describes the proposed method for the estimation and hypothesis testing of regression parameters for functional data via the reproducing kernel Hilbert space and establishes some theoretical properties. Simulation studies and a real-data example demonstrating the effectiveness of the proposed method are given in Section 3 and Section 4, respectively. Section 5 gives some concluding remarks, and all technical proofs are left to Appendix A.
2. The Proposed Method
We consider the observed data $\{(t_{ij}, X_i(t_{ij}), Y_i(t_{ij})): j = 1, \dots, m_i;\ i = 1, \dots, n\}$. The underlying data curves are iid copies of $(X(\cdot), Y(\cdot))$, where $X(\cdot)$ and $Y(\cdot)$ are random curves on some region $T$. The observation times $t_{ij}$ are generally assumed to be different for each subject $i$. We assume that the numbers of time points $m_i$ ($i = 1, \dots, n$) are iid copies of some integer-valued random variable $m$ and that, given $m_i$, the time points $t_{ij}$ ($j = 1, \dots, m_i$) are iid copies of a positive random variable $G$ with support on $T$. For each individual, the observed data can be interpolated as curves on $T$. We assume the following model for the observed data:
$$ Y_i(t) = \sum_{j=1}^{d} \beta_j(t)\,X_{ij}(t) + \varepsilon_i(t), \qquad t \in T,\ i = 1, \dots, n, \tag{4} $$
where the $\beta_j(\cdot)$ are the true regression coefficient functions for the covariates $X_{ij}$'s, and the $\varepsilon_i$'s are random errors. In general, $\varepsilon_i(s)$ and $\varepsilon_i(t)$ are non-independent for $s \neq t$, e.g., $\varepsilon_i(\cdot)$ being a zero-mean Gaussian process with some covariance function, known or unknown. Note that model (4) is more general than (2) and is more straightforward than model (3) in describing the relationship between the response $Y_i(\cdot)$ and the covariates $X_{ij}(\cdot)$. Typically, we set $X_{i1}(t) \equiv 1$, so that $\beta_1(\cdot)$ is the baseline function. Since the observation times $t_{ij}$ may be different across subjects even for the same $j$, there may be no observation or just a few observations at each time point $t$.
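A minimal Python sketch of generating sparsely observed data of the kind described above, under model (4) with $d = 2$ and $X_{i1}(t) \equiv 1$, is given below. The coefficient functions, the distributions of $m_i$ and of the observation times, and the error covariance are hypothetical choices for illustration, not the settings used later in the paper.

```python
import numpy as np

# Generate sparse observations from a concurrent model of the form (4), d = 2.
rng = np.random.default_rng(123)
n = 50
beta1 = lambda t: 1.0 + 0.5 * t                 # baseline coefficient (assumed)
beta2 = lambda t: np.sin(np.pi * t)             # second coefficient (assumed)

data = []
for i in range(n):
    m_i = rng.integers(5, 16)                   # number of observation points for subject i
    t_i = np.sort(rng.uniform(0.0, 1.0, m_i))   # observation times drawn from G (assumed uniform)
    x2 = rng.normal(size=m_i) + t_i             # a time-varying covariate
    eps = rng.multivariate_normal(               # within-curve correlated errors
        np.zeros(m_i),
        0.1 * np.exp(-5.0 * np.abs(np.subtract.outer(t_i, t_i))))
    y = beta1(t_i) * 1.0 + beta2(t_i) * x2 + eps
    data.append((t_i, np.column_stack([np.ones(m_i), x2]), y))

t_1, X_1, y_1 = data[0]
print("subject 1 has", len(t_1), "observations; covariate matrix shape:", X_1.shape)
```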
To estimate the regression coefficient functions $\beta_j(\cdot)$, the simplest way is the point-wise least squares estimate or any other non-smoothing (i.e., without roughness penalty) functional estimate. However, such estimates have some undesirable properties, often with wiggly shapes and large variances in regions with sparse observations. An established performance measure for functional estimation is the mean squared error (MSE),
$$ \mathrm{MSE}\{\hat\beta_j(t)\} = E\{\hat\beta_j(t) - \beta_j(t)\}^2 = \mathrm{Bias}^2\{\hat\beta_j(t)\} + \mathrm{Var}\{\hat\beta_j(t)\}. $$
Non-smoothed estimates often have small bias but large sampling variance, while smoothed estimates are the other way around: they have a much smoother shape, borrowing strength from neighboring data, but a larger bias. To better balance this trade-off and optimize the MSE, a regularized smooth estimate is preferred, in which a smoothing parameter controls the degree of penalization.
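The bias–variance decomposition above can be made concrete with a small Monte Carlo experiment. In the sketch below, the target function, noise level, and moving-average smoother are hypothetical stand-ins for the estimators discussed in this paper.

```python
import numpy as np

# Compare an unsmoothed and a smoothed estimate of f(t0) over Monte Carlo replicates.
rng = np.random.default_rng(7)
grid = np.linspace(0.0, 1.0, 101)
f = lambda t: np.sin(3.0 * np.pi * t)
t0_idx, half_window, sigma, n_rep = 40, 8, 0.3, 5000
truth = f(grid[t0_idx])

raw, smooth = [], []
for _ in range(n_rep):
    y = f(grid) + sigma * rng.normal(size=grid.size)
    raw.append(y[t0_idx])                               # unsmoothed: raw observation at t0
    lo, hi = t0_idx - half_window, t0_idx + half_window + 1
    smooth.append(y[lo:hi].mean())                      # smoothed: moving average around t0

for name, est in [("unsmoothed", np.array(raw)), ("smoothed", np.array(smooth))]:
    bias2 = (est.mean() - truth) ** 2
    var = est.var()
    print(f"{name:>10}:  bias^2 = {bias2:.4f}, variance = {var:.4f}, MSE = {bias2 + var:.4f}")
```

The unsmoothed estimate is essentially unbiased but has large variance, while the smoothed estimate trades a small bias for a much smaller variance, typically reducing the MSE.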
Existing smoothing methods all suffer from different weaknesses. Functional principal component analysis [15] is computationally intensive. General spline and kernel smoothing methods [24] do not fit the problem under study because of their constant choice of bandwidth. It is known that for non-smoothing methods the computational complexity is often of order $O(n)$, where $n$ is the sample size, while for smoothing methods the amount of computation may substantially exceed $O(n)$ and may even become computationally prohibitive. Thus, for smoothing methods, it is important to find a method with an $O(n)$ computational load. To achieve this with spline methods, the basis functions should have only local support (i.e., be nonzero only locally). A popular recent method in functional estimation is the reproducing kernel Hilbert space (RKHS). The RKHS approach is a special spline method that has this property and can achieve $O(n)$ computation for many functional estimation problems [5,12].
For functional estimation with RKHS, we define two norms (inner products) on the same RKHS $\mathcal{H}$: one defines the objective optimization criterion, and the other defines the RKHS itself. Different from a general Hilbert space, in an RKHS $\mathcal{H}$ of functions on $T$, the point evaluation functional $f \mapsto f(t)$ is a continuous linear map, so that by the Riesz representation theorem there is a bi-variate function $K(\cdot, \cdot)$ on $T \times T$ such that
$$ f(t) = \langle f, K(t, \cdot) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H},\ t \in T. $$
Taking $f = K(s, \cdot)$, we also get
$$ K(s, t) = \langle K(s, \cdot), K(t, \cdot) \rangle_{\mathcal{H}}. $$
The above two properties yield the name RKHS.
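The reproducing property can be checked numerically. The sketch below assumes the first-order Sobolev space on $[0,1]$ with inner product $\langle f, g \rangle = f(0)g(0) + \int_0^1 f'(u)g'(u)\,du$ and kernel $K(s,t) = 1 + \min(s,t)$; this is an illustrative choice, not the kernel constructed later in this section.

```python
import numpy as np
from scipy.integrate import trapezoid

# Verify numerically that <f, K(s, .)> = f(s) for the kernel K(s, t) = 1 + min(s, t)
# under the inner product <f, g> = f(0) g(0) + \int_0^1 f'(u) g'(u) du.
grid = np.linspace(0.0, 1.0, 20001)
f = lambda t: np.sin(2.0 * t) + t**2              # a test function in the space
fp = lambda t: 2.0 * np.cos(2.0 * t) + 2.0 * t    # its derivative

def K(s, t):
    return 1.0 + np.minimum(s, t)

for s in (0.2, 0.55, 0.9):
    dK = (grid < s).astype(float)                 # d/dt K(s, t) = 1{t < s}
    inner = f(0.0) * K(s, 0.0) + trapezoid(fp(grid) * dK, grid)
    print(f"s = {s:.2f}:  <f, K(s, .)> = {inner:.5f},  f(s) = {f(s):.5f}")
```

Up to quadrature error, the two printed values agree at each point, which is exactly the reproducing property.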
Note that for a given Hilbert space $\mathcal{H}$, a collection of functions on some domain $T$ with a given inner product, its reproducing kernel $K$ may not be unique. In fact, a reproducing kernel for $\mathcal{H}$ can be generated from a suitable mapping on $T$, any reproducing kernel of $\mathcal{H}$ can be expressed in such a form (Berlinet and Thomas-Agnan, 2004), and it has a one-to-one correspondence with a covariance function on $T \times T$. The choice of a kernel is mainly a matter of convenience. However, a reproducing kernel under one inner product may not be a reproducing kernel under another inner product on the same space $\mathcal{H}$. Assume the $\beta_j$ belong to $\mathcal{H}$, with $\mathcal{H}$ being some RKHS with a known kernel $K$, both to be specified later. Let $\langle \cdot, \cdot \rangle_{*}$, with norm $\| \cdot \|_{*}$, be another inner product on $\mathcal{H}$, which defines the optimization criterion. With the observed curves, ideally an optimization procedure for estimating the $\beta_j$ in (4) will be of the form
$$ (\hat\beta_1, \dots, \hat\beta_d) = \arg\min_{\beta_1, \dots, \beta_d \in \mathcal{H}} \Big\{ \sum_{i=1}^{n} \Big\| Y_i - \sum_{j=1}^{d} \beta_j X_{ij} \Big\|_{*}^{2} + \lambda\, J(\beta_1, \dots, \beta_d) \Big\}, $$
where $J(\cdot)$ is a penalty functional and $\lambda \geq 0$ is the smoothing parameter. The penalty term can be significantly simplified via the RKHS, as shown in the proof of Theorem 1 below. If $\lambda = 0$, the above procedure gives the unsmoothed estimate, which has some undesirable properties such as overfitting and large variance.
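To illustrate the structure of this penalized criterion, the following sketch solves a discretized, ridge-type version for the concurrent model, with a second-difference penalty standing in for the roughness functional $J$. It is a rough approximation for illustration only, not the RKHS estimator developed below; the grid, data, and penalty are assumptions.

```python
import numpy as np

# Discretized penalized least squares: curves on a common grid, beta_j represented
# by its grid values, roughness penalized by squared second differences.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 41)
G, n, d, lam = grid.size, 80, 2, 1e-2

beta_true = np.vstack([1.0 + 0.5 * grid, np.sin(np.pi * grid)])             # (d, G), assumed
X = np.stack([np.ones((n, G)), rng.normal(size=(n, G)) + grid], axis=1)      # (n, d, G)
Y = np.einsum("jg,ijg->ig", beta_true, X) + 0.2 * rng.normal(size=(n, G))

D2 = np.diff(np.eye(G), n=2, axis=0)          # second-difference operator
P = np.kron(np.eye(d), D2.T @ D2)             # penalty: sum_j ||D2 beta_j||^2
A = np.zeros((d * G, d * G))
b = np.zeros(d * G)
for j in range(d):
    for k in range(d):
        # residual term couples beta_j(t_g) and beta_k(t_g) through X_{ij}(t_g) X_{ik}(t_g)
        A[j*G:(j+1)*G, k*G:(k+1)*G] = np.diag((X[:, j, :] * X[:, k, :]).sum(axis=0))
    b[j*G:(j+1)*G] = (X[:, j, :] * Y).sum(axis=0)

coef = np.linalg.solve(A + lam * P, b)        # minimizer of residual + lambda * penalty
beta_hat = coef.reshape(d, G)
print("max abs error of the discretized estimate:", float(np.abs(beta_hat - beta_true).max()))
```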
For model (2) with one covariate variable, Yuan and Cai [12] considered a penalized estimate of the regression function. The corresponding estimator has a closed form that is linear in the observed responses, but its computation involves the inverse of an $n \times n$ matrix. For model (1) with $d$ covariates, we first consider an estimator that is linear in the observed responses. It turns out that the estimator has a closed form but also involves the inverse of a larger matrix, which is computationally infeasible in general.
Consider an estimator of
in the form of a linear combination of
. For any
, denote
, and for any
, denote
and similarly for
. For
matrix
and
matrix
, let
be the
d rows of
,
be the
d columns of
, and define
a
d-column vector. Since
, and
,
has a basis
, we consider estimate
of
with the form
, where
is a
matrix,
is a
matrix, and
is
. With
, for fixed
an RKHS estimator of
is of the form
where
For the penalty, let
be a pre-specified
symmetric positive definite constant matrix; we define
and
as the null space for the penalty, and
is its orthogonal complement (with respect to the inner product
). Then,
. That is,
; it has the decomposition
, with
and
. Here,
is also an RKHS with some reproducing kernel
on
. With RKHS,
for all
, which implies that
. Further,
for all
, and
. Thus
Typically, this matrix is chosen to be an identity matrix. The choices of the kernels and the inner products will be addressed later.
For a function
and a vector of functions
, denote
; for a matrix
, denote
, and similarly for the notations
and
. The following representation theorem shows that the estimator given in (
5) is computationally feasible for many applications.
Theorem 1. Assume , for . Then, for the given penalty functional and fixed λ, there are constant matrices and such that the estimator given in (5) has the following representation, where , and in vector form of , where the matrices and the vectors involved are given in the proof.

For the ordinary regression model , with and , the least squares method yields the estimate of as . Since is of order (a.s.), it can be viewed approximately as a linear form . Let and . We now consider an estimate of with the linear form . Since , and , we only need to consider an estimate of the form , where is a parameter matrix, is a parameter matrix, and is a d-vector. This allows us to express the estimate via the basis of the RKHS with a greater degree of flexibility than a linear combination of . Another advantage of using estimates of this form is the convenience of hypothesis testing. Since typically , testing the hypothesis of linearity of is equivalent to testing .
For any function
, we set
, and for fixed
,
where
Let be the vector representation of ; be that of , with , with , , and its vector form , and its vector form ; be all the eigenvalues of D, and be its normalized eigenvectors, and .
Theorem 2. Assume , for . Then, for the given penalty functional and fixed λ, there are constant matrices and such that the estimator given in (6) has the following representation, and in vector form of , when the following inverse exists,

Below we study the asymptotic behavior of the estimator given in (6). Denote by the true value of , and let denote the determinant of a square matrix. Lai et al. [25] proved strong consistency of the least squares estimate under general conditions, while Eicker [26] studied its asymptotic normality. The proposed estimators in this paper have some similarity to the least squares estimate, but they also have some different features and require different conditions.
- (C1).
.
- (C2).
.
- (C3).
for all bounded , where .
- (C4).
(a.s.).
- (C5).
.
Theorem 3. Assume conditions (C1)–(C5) hold; then, as ,

To emphasize the dependence on n, we denote . Let be the space of bounded functions on T equipped with the supremum norm, and let denote weak convergence in this space. With the following condition (C6), we obtain the asymptotic normality of
- (C6).
.
Theorem 4. Assume conditions (C1)–(C4) and (C6) hold. Then, as , where is the zero-mean Gaussian process on T with covariance function given in the proof, , and is given in the proof.

Testing linearity. It is of interest to test the hypothesis that is linear in t, where J is a d-dimensional vector with entries 0 or 1, with 1 corresponding to the elements of to be tested for linearity. The hypothesis is equivalent to testing that the corresponding coefficients in are zero. Let , , , , . Let and be the vector representations of and , and . Denote , . By Theorem 4, we have
Corollary 1. Assume the conditions of Theorem 4 hold; under , we have where is the sub-matrix of that corresponds to the covariance of , , and Γ is given in the proof of Theorem 4.

The nonzero bias term in Theorem 4 and Corollary 1 is typical in functional estimation, and often such a bias term is zero for the corresponding Euclidean parameter estimation.
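Corollary 1 supports a chi-square (Wald-type) test of linearity. Since the exact statistic involves quantities defined in the proofs, the sketch below only shows the generic recipe such a result yields, with hypothetical inputs for the estimated coefficients and their estimated covariance.

```python
import numpy as np
from scipy.stats import chi2

# Generic Wald-type test: under H0 the selected coefficients are zero, and the
# quadratic form in their estimates, scaled by the inverse of their estimated
# covariance, is compared with a chi-square quantile.
def wald_test(theta_hat, cov_hat, alpha=0.05):
    """theta_hat: coefficients tested to be zero; cov_hat: their estimated covariance."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    stat = float(theta_hat @ np.linalg.solve(cov_hat, theta_hat))
    p_value = float(chi2.sf(stat, df=theta_hat.size))
    return stat, p_value, p_value < alpha

# hypothetical example: three tested coefficients with a given estimated covariance
theta_hat = np.array([0.30, -0.10, 0.25])
cov_hat = 0.02 * np.eye(3)
stat, p, reject = wald_test(theta_hat, cov_hat)
print(f"chi-square statistic = {stat:.2f}, p-value = {p:.4f}, reject H0: {reject}")
```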
Choice of the smoothing parameter. In nonparametric penalized regression, the most commonly used method for choosing the smoothing parameter $\lambda$ is cross-validation (CV), based on the ideas of Allen (1974) and Stone (1974). This method chooses $\lambda$ by minimizing
$$ \mathrm{CV}(\lambda) = \sum_{i=1}^{n} \Big\| Y_i - \sum_{j=1}^{d} \hat\beta_j^{(-i)} X_{ij} \Big\|^2, $$
where $\hat\beta_j^{(-i)}$ denotes the estimated regression function computed without using the observations of the $i$th individual. This method is usually computationally intensive even when the sample size is moderate. An improved version of the method is $K$-fold cross-validation. This method first randomly partitions the original sample into $K$ equal-sized subsamples, and the cross-validation process is then conducted $K$ times. In each replicate, $K-1$ subsamples are used as the training data to construct the model, while the remaining one is used as the validation data. The results from the $K$ folds are averaged to obtain a single estimate. In notation, letting $n_1, \dots, n_K$ be the sample sizes of the $K$ folds, the $K$-fold cross-validation method chooses the $\lambda$ that minimizes
$$ \mathrm{CV}_K(\lambda) = \sum_{J=1}^{K} \sum_{i \in \text{fold } J} \Big\| Y_i - \sum_{j=1}^{d} \hat\beta_j^{(-J)} X_{ij} \Big\|^2, $$
where $\hat\beta_j^{(-J)}$ is the estimated regression function computed without using the data in the $J$th fold. In this paper, we set $K$ to the value that is also the default setting in much software.
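The following sketch shows $K$-fold cross-validation for choosing $\lambda$. A simple ridge regression stands in for the penalized functional estimator, and the data, the candidate grid of $\lambda$ values, and $K$ are hypothetical choices for illustration.

```python
import numpy as np

# K-fold cross-validation over a grid of smoothing parameters, illustrated with ridge regression.
rng = np.random.default_rng(2024)
n, p = 120, 10
X = rng.normal(size=(n, p))
beta = np.linspace(1.0, 0.1, p)
y = X @ beta + 0.5 * rng.normal(size=n)

def fit_ridge(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def kfold_cv(X, y, lam, K=5, seed=0):
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), K)
    err = 0.0
    for J in range(K):
        test = folds[J]
        train = np.concatenate([folds[k] for k in range(K) if k != J])
        b = fit_ridge(X[train], y[train], lam)            # fit without the J-th fold
        err += float(np.sum((y[test] - X[test] @ b) ** 2))  # validate on the J-th fold
    return err

lambdas = 10.0 ** np.arange(-3, 3)
scores = {lam: kfold_cv(X, y, lam) for lam in lambdas}
best = min(scores, key=scores.get)
print("CV errors:", {f"{k:g}": round(v, 1) for k, v in scores.items()})
print("selected lambda:", best)
```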
Choices of $K_0$, $K_1$, and the inner products. For notational simplicity, we consider $T = [0, 1]$ without loss of generality. Recall that a function $f$ on $[0, 1]$ with $m-1$ continuous derivatives and $f^{(m)} \in L_2[0, 1]$ has the following Taylor expansion [27]:
$$ f(t) = \sum_{\nu=0}^{m-1} \frac{t^{\nu}}{\nu!} f^{(\nu)}(0) + \int_0^1 \frac{(t-u)_+^{m-1}}{(m-1)!} f^{(m)}(u)\,du, $$
where $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$ otherwise.

To construct an RKHS $\mathcal{H}$ on $[0, 1]$, a common choice for the inner product on the polynomial component $\mathcal{H}_0$ is $\langle f, g \rangle_0 = \sum_{\nu=0}^{m-1} f^{(\nu)}(0)\,g^{(\nu)}(0)$, and the orthogonal complement of $\mathcal{H}_0$ is $\mathcal{H}_1 = \{ f : f^{(\nu)}(0) = 0,\ \nu = 0, \dots, m-1,\ f^{(m)} \in L_2[0, 1] \}$, with inner product $\langle f, g \rangle_1 = \int_0^1 f^{(m)}(u)\,g^{(m)}(u)\,du$. The inner product on $\mathcal{H}$ is $\langle f, g \rangle_{\mathcal{H}} = \langle f, g \rangle_0 + \langle f, g \rangle_1$. Kernels for the RKHS with more general choices for $\mathcal{H}_0$ and $\mathcal{H}_1$ under these inner products can be found in [28], and more general constructions of the kernels $K_0$ and $K_1$ can be found in Ramsay and Silverman [5]. For our case,
$$ K_0(s,t) = \sum_{\nu=0}^{m-1} \frac{s^{\nu} t^{\nu}}{(\nu!)^2}, \qquad K_1(s,t) = \int_0^1 \frac{(s-u)_+^{m-1}\,(t-u)_+^{m-1}}{\{(m-1)!\}^2}\,du. $$
With the above inner product, $K_0$ is a reproducing kernel of $\mathcal{H}_0$ and $K_1$ is a reproducing kernel of $\mathcal{H}_1$; let $K = K_0 + K_1$, then $K$ is a reproducing kernel of $\mathcal{H}$, and $\mathcal{H}_0$ and $\mathcal{H}_1$ are orthogonal to each other with respect to $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, but these statements are not true if $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ is replaced by a different inner product on $\mathcal{H}$.
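As a concrete illustration of the kernels above, the sketch below evaluates $K_0$ and $K_1$ for $m = 2$ on $[0, 1]$, comparing a numerical evaluation of the defining integral with its closed form; these computations assume the reconstruction of the kernels given above.

```python
import math
from scipy.integrate import quad

# Sobolev-type kernels for m = 2 on [0, 1]:
#   K0(s, t) = sum_{v < m} s^v t^v / (v!)^2
#   K1(s, t) = \int_0^1 (s-u)_+^{m-1} (t-u)_+^{m-1} / ((m-1)!)^2 du
M = 2

def K0(s, t, m=M):
    return sum((s ** v) * (t ** v) / math.factorial(v) ** 2 for v in range(m))

def K1_numeric(s, t, m=M):
    integrand = lambda u: (max(s - u, 0.0) ** (m - 1)) * (max(t - u, 0.0) ** (m - 1)) \
                          / math.factorial(m - 1) ** 2
    value, _ = quad(integrand, 0.0, 1.0)
    return value

def K1_closed(s, t):
    # for m = 2: \int_0^{min(s,t)} (s - u)(t - u) du
    a = min(s, t)
    return s * t * a - (s + t) * a ** 2 / 2.0 + a ** 3 / 3.0

s, t = 0.3, 0.7
print("K1 numeric vs closed form:", round(K1_numeric(s, t), 6), round(K1_closed(s, t), 6))
print("K(s, t) = K0 + K1 =", round(K0(s, t) + K1_closed(s, t), 6))
```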
3. Simulation Studies
In this section, we conduct two simulation studies to investigate the finite-sample performance of the proposed RKHS method. The first simulation study is designed to compare the RKHS estimator with the conventional smoothing spline and local polynomial methods in terms of curve fitting; for more details on the implementation of the smoothing spline and local polynomial methods, please refer to the book by Fang, Li, and Sudjianto [24]. The second simulation study examines the performance of Corollary 1 for testing the linearity of the regression functions. It turns out that, with moderate sample sizes, the proposed RKHS estimator compares very favorably with the competitors, and the type I errors and powers of the tests are satisfactory.
Simulation 1. Assume that the underlying individual curve $i$ at time point $t_{ij}$ is generated from model (4), where the error process is a stationary Gaussian process with zero mean, unit variance, and a constant covariance between any two distinct time points. For each subject $i$, the number of observation time points $m_i$ is generated from a discrete uniform distribution, and the observation time points $t_{ij}$, $j = 1, \dots, m_i$, are independently generated from an exponential distribution. The density function of the observation-time distribution is displayed in the left panel of Figure 1, from which it is easy to see that the density value decreases as $t$ increases. Then, we use cubic interpolation to interpolate the observed responses and covariates on $T$ to obtain the corresponding curves.
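The interpolation step can be carried out, for example, with a standard cubic spline interpolator. The sketch below uses hypothetical sparse observations and SciPy's CubicSpline, which may differ from the exact routine used in our implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Turn one subject's sparse observations into a curve on a fine grid.
rng = np.random.default_rng(11)
t_obs = np.sort(rng.uniform(0.0, 1.0, 8))            # sparse observation times
y_obs = np.sin(2.0 * np.pi * t_obs) + 0.1 * rng.normal(size=t_obs.size)

curve = CubicSpline(t_obs, y_obs)                    # cubic interpolant through the observations
grid = np.linspace(t_obs.min(), t_obs.max(), 201)    # evaluate only inside the observed range
y_interp = curve(grid)
print("interpolated values at a few grid points:", np.round(y_interp[::50], 3))
```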
Based on the functions described above, we use the RKHS method introduced in Section 2 to estimate the regression functions and compare its performance with the spline smoother and the local polynomial model. Typical comparisons (with the random seed set by “set.seed(1)” in R) are given in Figure 2, Figure 3 and Figure 4 for sample sizes of 50, 100, and 200, respectively. The simulation shows that the proposed RKHS method estimates the regression functions well and compares very favorably with the other two methods. Broadly speaking, the RKHS estimator has relatively stable performance and stays close to the “true” curve; its confidence bands are narrower in densely sampled regions and become wider in sparsely sampled regions. In contrast, the spline smoother and the local polynomial model appear to fit well in densely sampled regions, but they have large bias where the data become sparse.
In order to make a thorough comparison for this simulation, we use the root integrated mean squared prediction error (RIMSPE) to measure the accuracy of the estimates [24]. The RIMSPE for an estimate $\hat\beta_j$ of $\beta_j$ is given by
$$ \mathrm{RIMSPE}(\hat\beta_j) = \Big\{ \frac{1}{|T|} \int_T \big( \hat\beta_j(t) - \beta_j(t) \big)^2 \, dt \Big\}^{1/2}, $$
and the simulation is repeated 1000 times. Using the R software, the CPU time for implementing this simulation is about 84.5 s on a PC with a 1.80 GHz dual-core Intel i5-8265U CPU and 8 GB of memory. The boxplots of the RIMSPE values are presented in Figure 5, from which it is clear that RKHS performs much better than the other two methods, as it has much smaller RIMSPE values.
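Assuming the RIMSPE is the root of the integrated squared error as written above, it can be computed by numerical integration as in the following sketch; the curves shown are hypothetical.

```python
import numpy as np
from scipy.integrate import trapezoid

# Root integrated mean squared prediction error of an estimate on a grid over T.
def rimspe(beta_hat, beta_true, grid):
    diff2 = (np.asarray(beta_hat) - np.asarray(beta_true)) ** 2
    return float(np.sqrt(trapezoid(diff2, grid) / (grid[-1] - grid[0])))

grid = np.linspace(0.0, 1.0, 401)
beta_true = np.sin(np.pi * grid)
beta_hat = np.sin(np.pi * grid) + 0.05 * np.cos(3.0 * np.pi * grid)   # a toy estimate
print("RIMSPE of the toy estimate:", round(rimspe(beta_hat, beta_true, grid), 4))
```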
Simulation 2. In this simulation study, we examine the performance of Corollary 1 for testing the hypothesis that a regression function is linear in $t$. According to the setting described in Simulation 1, one of the regression functions is linear in $t$, whereas the other is apparently not linear in $t$. Therefore, we check the type I error when testing the linear one and the power when testing the nonlinear one. With the significance level fixed and the simulation repeated 1000 times, we use Corollary 1 to derive the test statistics and list the type I errors and powers in Table 1 for various sample sizes. The results in Table 1 suggest that the type I error of the test is close to the nominal level and that the power of the test is not small even with a sample size of 50.