Article

Estimation and Hypothesis Test for Mean Curve with Functional Data by Reproducing Kernel Hilbert Space Methods, with Applications in Biostatistics

1 School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
2 Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
3 National Heart, Lung and Blood Institute, Office of Biostatistics Research, Bethesda, MD 20892, USA
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4549; https://doi.org/10.3390/math10234549
Submission received: 2 November 2022 / Revised: 22 November 2022 / Accepted: 23 November 2022 / Published: 1 December 2022
(This article belongs to the Special Issue Recent Development in Biostatistics and Health Science)

Abstract

Functional data analysis has important applications in biomedical and health studies, among other areas. In this paper, we develop a general framework for mean curve estimation for functional data using a reproducing kernel Hilbert space (RKHS) and derive its asymptotic distribution theory. We also propose two statistics, one for testing the equality of mean curves from two populations and one for testing whether a mean curve belongs to some subspace. Simulation studies are conducted to evaluate the proposed method against the major existing methods and show that the proposed method performs better than the existing ones. The method is then illustrated with an analysis of the growth data from the National Growth and Health Study (NGHS) project sponsored by the NIH.

1. Introduction

Functional data analysis, with the objectives of estimating and testing mean curves over time, has been used extensively in biomedical and health sciences, among other areas of study. Functional data are random elements in a Banach/Hilbert space, and there are no density functions or parametric models for such data. Thus, estimates and hypothesis tests for the mean curves are mostly based on nonparametric methods, without relying on potentially unrealistic parametric model assumptions. The commonly used methods for functional data analysis and reviews of the existing work can be found in [1,2,3,4,5]. A popular postulate for nonparametric inference with functional data is that the mean curves belong to some “structured space” [6], which can be approximated by expansions of a set of known basis functions, so that the estimation and testing procedures can be constructed through the unknown coefficients of the basis expansions. Existing results based on various basis approximation methods can be found in [7,8,9]. Ref. [10] proposed the functional principal components method via basis expansions, and Ref. [11] studied a likelihood ratio test for longitudinal and functional data. Ref. [12] gave a comprehensive review of the developments in this area. Ref. [13] studied the general properties of mean curve estimation, under common and independent observation points, and obtained the optimal minimax convergence rates for both cases. Ref. [14] considered a multivariate functional principal component method. Ref. [15] constructed a control chart for functional data. Ref. [16] proposed a cross-component registration method. Ref. [17] considered a random projection method. Ref. [18] studied a functional linear mixed model.
In practice, the observed functional data are sometimes rather “irregular” in that the observation time points are unbalanced: dense in some time intervals and sparse in others. Such data often arise in medical studies; for example, patients may be observed on a regular schedule early in their treatment, and their subsequent hospital visits become less frequent and gradually thin out as their conditions improve or prove incurable.
To illustrate the general structure of such data, Figure 1 depicts the growth data from the National Growth and Health Study (NGHS), sponsored by the National Heart, Lung, and Blood Institute, from 1985 to 2000. In these data, the observed time points are relatively dense at the beginning of the study and then become sparse, gradually thinning out near the study end.
Apparently, curve estimates obtained by the aforementioned common smoothing techniques cannot be regarded as the “true” underlying curves due to the unbalanced observations. In fact, our simulation studies in Section 4 show that those methods may yield biased estimates or estimators with relatively large variances when the observation time points are unbalanced. Hence, effective methods for estimating the mean curves of such functional data should be developed.
In this paper, we take interpolated curves at the longitudinal time points of each individual as the estimated observed curves; the goal is to estimate the underlying true mean curve and to test hypotheses about it. The simplest method for mean curve estimation is to take the empirical mean of the observed interpolated curves, or any other non-smoothing (i.e., without a roughness penalty) functional estimate. However, as the observation time points are generally sparse, the interpolated curves and their mean are not sufficiently smooth, often with a wiggly shape, large variances and poor performance in the intervals with sparse observations. To overcome these issues, there are several commonly used methods, such as kernel smoothing, splines and the method of the reproducing kernel Hilbert space (RKHS). The RKHS method has the following advantages. Instead of specifying a set of (orthogonal) basis functions and the number of bases, one only needs to choose the kernel(s) of the RKHS. Moreover, with this method, any bounded linear functional can be written as a representer (the Riesz representer with respect to the inner product of the RKHS) in closed form in terms of the kernel of the RKHS, so the estimator can often be formulated as a linear functional of the data in closed form. Another consideration is computation. It is known that for non-smoothing methods, the computation is often of order $O(n)$, where $n$ is the sample size of the data, while for smoothing methods, the amount of computation may substantially exceed $O(n)$ and become computationally intensive. Thus, for smoothing methods, it is important to find one with an $O(n)$ computational load. To achieve this for spline methods, the basis should have local support only, i.e., be nonzero only locally. The RKHS method is a special case of splines with this property and can achieve $O(n)$ computation for many functional estimation problems, which is called the optimal basis theorem in ([1], p. 363) and the representer theorem in [19]. More specifically, the RKHS $\mathcal{H}$ is a Hilbert space of functions equipped with an inner product $\langle\cdot,\cdot\rangle$. On $\mathcal{H}$, there is a kernel $K(\cdot,\cdot)$, a bivariate function, such that $\langle K(t,\cdot), h(\cdot)\rangle = h(t)$ for all $h \in \mathcal{H}$, hence the name reproducing kernel. To apply this method to functional estimation, as with other smoothing methods, a penalty term is specified along with the objective functional. The space $\mathcal{H}$ can be decomposed into a null space $\mathcal{H}_0$, with a kernel $K_0$, corresponding to the penalty term, and its orthogonal complement $\mathcal{H}_1$, with a kernel $K_1$. $\mathcal{H}_0$ is a finite-dimensional space spanned by some basis $g_1, \dots, g_d$, and an estimator $\hat{\mu}(\cdot)$ of the mean curve of data $Y_1(\cdot), \dots, Y_n(\cdot)$ has the form, for some constants $a_1, \dots, a_d$ and $b_1, \dots, b_n$,
$$\hat{\mu}(\cdot) = \sum_{j=1}^{d} a_j g_j(\cdot) + \sum_{i=1}^{n} b_i (K_1 Y_i)(\cdot).$$
Given the above reasons, we adopt the RKHS method for its ease of use, computational efficiency and other well-known properties.
Recently, the RKHS method has been studied by many researchers. Ref. [20] used the method in spline models, Ref. [21] studied quantile regression using this method and Ref. [19] used it for functional linear regression. However, to the best of our knowledge, no asymptotic distribution theory exists in the literature for mean estimation with functional data using the RKHS, as developed here. The simulation studies indicate an apparent advantage of the proposed method compared to some commonly used methods for this type of data. The rest of this paper is organized as follows. In Section 2, we describe the general RKHS method for mean curve estimation, derive the theoretical results and investigate the asymptotic distributions of the mean curve estimators with a special construction of the RKHS. Two statistics for testing mean curves are proposed in Section 3. Section 4 provides the results of the simulation studies and the application of the proposed methods to real functional data from the National Growth and Health Study (NGHS) of the NIH. We conclude with a discussion in Section 5; the proofs of the main results are given in Appendix A.

2. Mean Curve Estimation via RKHS

We consider stochastic processes $Y(t)$ indexed by time points $t \in (0, T]$ for some $0 < T < \infty$. At any given $t \in (0, T]$, $Y(t) \in \mathbb{R}$ is the real-valued outcome variable. Assume that there are $n$ independent subjects and each subject is observed at randomly selected distinct time points. Let $Y_{ij} = Y_i(t_{ij})$ be the observation of subject $i$ at time $t_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, m_i$. Denoting the mean function of $Y(t)$ by $\mu_0(t)$, the model is assumed to be
$$Y_i(t_{ij}) = \mu_0(t_{ij}) + \epsilon_i(t_{ij}), \quad j = 1, \dots, m_i, \; i = 1, \dots, n,$$
where the $\epsilon_i(\cdot)$ are i.i.d. measurement errors with mean zero and variance function $\sigma^2(t)$. Furthermore, we assume that $\mu_0(\cdot)$ is a continuous function on $(0, T]$. Note that $Y_i(\cdot)$ is observed only at the times $t_{ij}$. In much of the functional data analysis literature, the observed data are interpolated and then treated as observed curves, which is not realistic. Here, we deal with the data in a more realistic way. A twice-differentiable interpolated curve $\hat{Y}_i(\cdot)$, such as the cubic spline interpolant [22], is used for each subject $i$, so that $\hat{Y}_i(t_{ij}) = Y_i(t_{ij})$ at all times $t_{ij}$. The twice-differentiable interpolation is needed in the asymptotic study. We assume the following model for the $\hat{Y}_i(\cdot)$:
$$\hat{Y}_i(t) = \mu_0(t) + \hat{\epsilon}_i(t), \quad E[\hat{\epsilon}_i(t)] = 0, \quad E[\hat{\epsilon}_i^2(t)] = \sigma^2(t), \quad i = 1, \dots, n.$$
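To make the interpolation step concrete, here is a minimal sketch (our own illustration with NumPy/SciPy; the variable names and toy values are not from the paper) of constructing a twice-differentiable interpolated curve $\hat{Y}_i(\cdot)$ from one subject's irregular observations with a natural cubic spline:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# One subject's irregular observation times and responses (toy values).
t_obs = np.array([0.3, 0.9, 1.4, 2.8, 4.0, 6.5, 9.2])
y_obs = np.array([0.1, 0.8, 1.1, 0.4, -0.9, 1.7, 2.3])

# Natural cubic spline interpolant: twice differentiable, and it
# reproduces the data, Y_hat(t_ij) = Y(t_ij), at the observed times.
Y_hat = CubicSpline(t_obs, y_obs, bc_type="natural")

grid = np.linspace(t_obs[0], t_obs[-1], 200)  # evaluation grid
curve = Y_hat(grid)                           # interpolated curve values
dcurve0 = Y_hat(t_obs[0], 1)                  # first derivative at the left endpoint
```

The derivative at the left endpoint is kept because the operators $K_0$ and $K_1$ introduced below involve $\hat{Y}_i(0)$ and $\hat{Y}_i^{(1)}(0)$.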
To estimate the true mean function $\mu_0(\cdot)$, the RKHS approach is employed. Let $\mathcal{H}$ be a Hilbert space consisting of square-integrable functions $L^2([0, T])$, with a given inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$. For any mapping $G: [0, T] \to L^2([0, T])$, $K(s,t) := \langle G(s,\cdot), G(t,\cdot)\rangle_{\mathcal H}$ is a reproducing kernel for $\mathcal{H}$, and any reproducing kernel of $\mathcal{H}$ can be expressed in this form ([23], Theorem 4, p. 22). Thus, for a given Hilbert space, the reproducing kernel $K$ is non-unique, and the choice of an adequate kernel for a specific statistical inference is important (for details, see Section 3 below). However, a reproducing kernel under one inner product may not be a reproducing kernel under another inner product on the same space $\mathcal{H}$. Assume $\mu_0 \in \mathcal{H}$ and that there is some RKHS with a known kernel $K(\cdot,\cdot)$. For any $h \in \mathcal{H}$, define $(Kh)(t) = \langle K(t,\cdot), h\rangle_{\mathcal H}$. Let $\langle\cdot,\cdot\rangle$ be another inner product on $\mathcal{H}$, with norm $\|h\|^2 = \langle h, h\rangle$ for all $h \in \mathcal{H}$ (typically $\|h\|^2 = \int_T h^2(t)\,dt$). We estimate $\mu_0(\cdot)$ by
$$\hat{\mu}_{n,\lambda}(\cdot) = \arg\inf_{\mu\in\mathcal{H}} \frac{1}{n}\sum_{i=1}^n \|\hat{Y}_i - \mu\|^2 + \lambda J(\mu),$$
where $\lambda$ is a smoothing parameter and $J(\mu) = \|K\mu\|_{\mathcal H}^2$ is a penalty functional for some kernel $K$ to be addressed. In the spline method, the penalty is of the form $\|\mu^{(r)}\|^2$ for some order-$r$ derivative of $\mu$, which is a special case of the RKHS methods (see below).
Let $\mathcal{H}_0 = \{h \in \mathcal{H}: J(h) = 0\}$ be the null space of the penalty functional, and let $\mathcal{H}_1$ be its orthogonal complement (with respect to the inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$). Then $\mathcal{H} = \mathcal{H}_0 \oplus \mathcal{H}_1$; i.e., every $h \in \mathcal{H}$ has the decomposition $h = h_0 + h_1$ with $h_0 \in \mathcal{H}_0$ and $h_1 \in \mathcal{H}_1$, and there are two kernel functions $K_0$ and $K_1$ such that $K = K_0 + K_1$, with $(K_0h)(\cdot) := \langle K_0(\cdot,*), h\rangle_{\mathcal H} \in \mathcal{H}_0$ and $(K_1h)(\cdot) := \langle K_1(\cdot,*), h\rangle_{\mathcal H} \in \mathcal{H}_1$ for all $h \in \mathcal{H}$. Here, $\mathcal{H}_1$ is also an RKHS with the reproducing kernel $K_1(\cdot,\cdot)$, and $(K_1h) = h$ for all $h \in \mathcal{H}_1$. Because $K_0\mu \in \mathcal{H}_0$ and $K_1\mu \in \mathcal{H}_1$, we have $\|K_0\mu\|_{\mathcal H}^2 = 0$, $\langle K_0\mu, K_1\mu\rangle_{\mathcal H} = 0$, and
$$J(\mu) = \|K\mu\|_{\mathcal H}^2 = \|K_0\mu + K_1\mu\|_{\mathcal H}^2 = \|K_0\mu\|_{\mathcal H}^2 + 2\langle K_0\mu, K_1\mu\rangle_{\mathcal H} + \|K_1\mu\|_{\mathcal H}^2 = \|K_1\mu\|_{\mathcal H}^2 = \langle K_1\mu, K_1\mu\rangle_{\mathcal H}.$$
Note that the inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$ of the RKHS is often not the inner product $\langle\cdot,\cdot\rangle$ used in the optimization objective; the latter is often chosen as the $L^2$ inner product. Thus, the expression of $J(\mu)$ in (4) does not hold under the inner product $\langle\cdot,\cdot\rangle$. Often the norm from $\langle\cdot,\cdot\rangle$ in (3) is more suitable for statistical interpretation, while the norm from $\langle\cdot,\cdot\rangle_{\mathcal H}$ is chosen for convenience in computing the penalty term $J(\mu)$.
The RKHS estimator often has a closed-form solution given by the representer theorem; such results have been known for decades, and Ref. [13] presented them in their case. Here, we present the result in our case. Let $d = \dim(\mathcal{H}_0)$ and $g_1, \dots, g_d$ be an orthonormal basis of $\mathcal{H}_0$.
Theorem 1. 
Assume $\mu_0(\cdot), \hat{Y}_i(\cdot) \in \mathcal{H}$ for $i = 1, \dots, n$. Then, for the given penalty functional $J(\mu) = \|K_1\mu\|_{\mathcal H}^2$ and fixed $\lambda$, there are constants $\mathbf{a} = (a_1, \dots, a_d)'$ and $\mathbf{b} = (b_1, \dots, b_n)'$ such that $\hat{\mu}_{n,\lambda}$ given in (3) has the following representation:
$$\hat{\mu}_{n,\lambda}(t) = \sum_{j=1}^d a_j g_j(t) + \sum_{i=1}^n b_i (K_1\hat{Y}_i)(t), \quad t \in (0, T].$$
Thus, instead of searching over a function space to minimize (3), only the two parameter vectors $\mathbf{a}$ and $\mathbf{b}$ need to be estimated, based on this representer theorem, which is called the optimal basis theorem in ([1], p. 363).
The $\lambda$ in (3) is a smoothing parameter with $0 < \lambda < \infty$. Unlike functional regression estimation, when $\lambda \to 0$, the estimation of the function $\mu$ does not break down: $\hat{\mu}_{n,0}$ is the sample mean of the $\hat{Y}_i$, which does not have the desired smoothness. When $\lambda \to \infty$, the above procedure is equivalent to minimizing $J(\mu)$, and in the case $\dim(\mathcal{H}_0) = 2$, the estimator $\hat{\mu}_{n,\infty}$ is linear in $t$. The most commonly used method for choosing the smoothing parameter is cross-validation (CV), in which $\lambda$ minimizes
$$n^{-1}\sum_{i=1}^n \int_T \big( \hat{Y}_i(t) - \hat{\mu}_{n,\lambda,-i}(t) \big)^2\,dt,$$
where $\hat{\mu}_{n,\lambda,-i}(\cdot)$ is the estimator in (3) computed without the $i$-th observation $\hat{Y}_i$. However, this method is computationally intensive. An improved version is the generalized cross-validation (GCV) proposed by [24,25]. For a given linear operator $A$ on $\mathcal{H}$, let $\{\eta_i: i = 1, 2, \dots\}$ be the eigenvalues of $A$ with $|\eta_1| \ge |\eta_2| \ge \cdots$. For integer $m$, define
$$\|A\|_m = \sum_{j=1}^m \eta_j, \qquad MSE(\lambda) = \frac{1}{n}\sum_{i=1}^n \int_T \big( \hat{Y}_i(t) - \hat{\mu}_{n,\lambda}(t) \big)^2\,dt.$$
By Theorem 1, the estimator of $\mu_0(\cdot)$ can be written in the form $\hat{\mu}_{n,\lambda}(t) = (K_\lambda\hat{Y}_n)(t)$, where $K_\lambda$ is a linear combination of $K_0$ and $K_1$. Let $I$ be the identity operator on $\mathcal{H}$; the smoothing parameter $\lambda$ is chosen by minimizing the following $GCV(\cdot)$:
$$GCV(\lambda) = \lim_{m\to\infty} \frac{MSE(\lambda)}{\big[ m^{-1}\|I - K_\lambda\|_m \big]^2}.$$
Obviously, the smoothing parameter $\lambda$ above depends on the sample size $n$, with $\lambda(n) \to 0$ as $n \to \infty$. For simplicity, we write $\lambda$ instead of $\lambda(n)$ throughout this paper.
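To illustrate the plain CV criterion above, here is a minimal grid-search sketch (our own naming; fit_fn stands for any routine returning the fitted mean curve, and numerical integration replaces the integral over T):

```python
import numpy as np
from scipy.integrate import trapezoid

def cv_score(curves, grid, lam, fit_fn):
    """Leave-one-out CV score for a candidate smoothing parameter lam.

    curves: (n, m) array of interpolated curves on a common grid;
    fit_fn(curves, grid, lam): returns the fitted mean curve on the grid.
    """
    n = curves.shape[0]
    total = 0.0
    for i in range(n):
        mu_loo = fit_fn(np.delete(curves, i, axis=0), grid, lam)  # leave curve i out
        total += trapezoid((curves[i] - mu_loo) ** 2, grid)       # integral over T
    return total / n

def select_lambda(curves, grid, lam_grid, fit_fn):
    # Grid search: pick the lambda with the smallest CV score.
    scores = [cv_score(curves, grid, lam, fit_fn) for lam in lam_grid]
    return lam_grid[int(np.argmin(scores))]
```

The GCV criterion avoids the n refits of this loop, which is why it is preferred in practice.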
To obtain the asymptotic distribution of the proposed estimator in (3), a specific kernel function K ( · , · ) has to be chosen.
Recall $d = \dim(\mathcal{H}_0)$. A common choice is $d = 2$, where $\mathcal{H}_0 = \{h: h^{(2)} \equiv 0\}$ is spanned by $g_1(t) \equiv 1$ and $g_2(t) = t$, and $K_1$ for $\mathcal{H}_1$ is chosen as
$$K_1(s,t) = \frac{1}{(2!)^2} B_2(\{s\}) B_2(\{t\}) - \frac{1}{4!} B_4(\{s - t\}),$$
where $B_r(\cdot)$ is the $r$-th Bernoulli polynomial, $\{t\} = t - [t]$ is the fractional part of $t$ and $[t]$ is the integer part of $t$ [19,20]. However, with this kernel function, the penalty term in (3) is $J(\mu) = \mathbf{b}'\Omega\mathbf{b}$, where $\Omega = (\omega_{ij})$ is an $n \times n$ matrix with $\omega_{ij} = \int_T\int_T \hat{Y}_i(s) K_1(s,t) \hat{Y}_j(t)\,ds\,dt$ (see the proof of Theorem 1 in Appendix A). The computation of $\hat{\mu}_{n,\lambda}(\cdot)$ in (3) then requires inverting an $n \times n$ matrix, which is a hurdle for large $n$ and makes it difficult to obtain the asymptotic distribution of the estimator.
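For reference, this Bernoulli-polynomial kernel is easy to evaluate numerically; the sketch below (our own illustration) uses the closed forms $B_2(x) = x^2 - x + 1/6$ and $B_4(x) = x^4 - 2x^3 + x^2 - 1/30$:

```python
import numpy as np

def B2(x):
    # Bernoulli polynomial of degree 2.
    return x**2 - x + 1.0 / 6.0

def B4(x):
    # Bernoulli polynomial of degree 4.
    return x**4 - 2.0 * x**3 + x**2 - 1.0 / 30.0

def K1_bernoulli(s, t):
    # K_1(s,t) = B2({s}) B2({t}) / (2!)^2 - B4({s - t}) / 4!,
    # where {x} = x - [x] denotes the fractional part of x.
    return B2(np.mod(s, 1.0)) * B2(np.mod(t, 1.0)) / 4.0 \
        - B4(np.mod(s - t, 1.0)) / 24.0
```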
To construct an adequate RKHS $\mathcal{H}$ on $L^2[0,T]$, we consider $\mathcal{H}_0 = \{h: h^{(2)}(\cdot) \equiv 0\}$ with inner product $\langle f, g\rangle_{\mathcal H,0}$, and its orthogonal complement $\mathcal{H}_1 = \{h: h^{(j)}(0) = 0,\ j = 0, 1;\ \int_0^T [h^{(2)}(t)]^2\,dt < \infty\}$ with inner product $\langle f, g\rangle_{\mathcal H,1}$, where
$$\langle f, g\rangle_{\mathcal H,0} = \sum_{j=0}^1 f^{(j)}(0)\,g^{(j)}(0), \qquad \langle f, g\rangle_{\mathcal H,1} = \int_0^T f^{(2)}(t)\,g^{(2)}(t)\,dt.$$
The inner product on $\mathcal{H}$ is defined as $\langle\cdot,\cdot\rangle_{\mathcal H} = \langle\cdot,\cdot\rangle_{\mathcal H,0} + \langle\cdot,\cdot\rangle_{\mathcal H,1}$. Kernels for the RKHS, with more general $K_0$ for $\mathcal{H}_0$ and $K_1$ for $\mathcal{H}_1$ under these inner products, can be found in ([26], pp. 33–34). More general methods for the construction of the kernels $K_0$ and $K_1$ can be found in ([1], Section 20.3). In particular, we propose
$$K_0(s,t) = 1 + st, \qquad K_1(s,t) = \int_0^T (s-u)_+(t-u)_+\,du = \frac{(s\wedge t)^2\,\big[3(s\vee t) - (s\wedge t)\big]}{6}.$$
With the inner product given in (5), let $K = K_0 + K_1$; then for all $h \in \mathcal{H}$, $h(t) = \langle K(t,\cdot), h(\cdot)\rangle_{\mathcal H}$, and $\mathcal{H}_0$ and $\mathcal{H}_1$ are orthogonal to each other with respect to $\langle\cdot,\cdot\rangle_{\mathcal H}$.
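Both proposed kernels have simple closed forms; a minimal sketch of their evaluation (our own illustration, vectorized over NumPy arrays):

```python
import numpy as np

def K0(s, t):
    # Kernel of the null space H_0 spanned by {1, t}.
    return 1.0 + s * t

def K1(s, t):
    # K_1(s,t) = int_0^T (s-u)_+ (t-u)_+ du
    #          = (min(s,t))^2 * [3 max(s,t) - min(s,t)] / 6.
    a, b = np.minimum(s, t), np.maximum(s, t)
    return a**2 * (3.0 * b - a) / 6.0

def K(s, t):
    # Full reproducing kernel of H = H_0 + H_1 under the inner product (5).
    return K0(s, t) + K1(s, t)
```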
Let $g(t) = (1, t)'$, $(K_0h)(t) = \langle K_0(t,\cdot), h(\cdot)\rangle_{\mathcal H,0}$ and $(K_1h)(t) = \langle K_1(t,\cdot), h(\cdot)\rangle_{\mathcal H,1}$. Define $X_n(t) = (K_1\hat{Y})(t) = \big( (K_1\hat{Y}_1)(t), \dots, (K_1\hat{Y}_n)(t) \big)'$, $\bar{Y}_n(t) = \frac{1}{n}\sum_{i=1}^n \hat{Y}_i(t)$, $R = \langle g(\cdot), g'(\cdot)\rangle_{\mathcal H,0}$, $U_n = \langle \bar{Y}_n(\cdot), g(\cdot)\rangle_{\mathcal H,0}$, $V_n = \langle g(\cdot), X_n'(\cdot)\rangle_{\mathcal H}$, $S_n = \langle \bar{Y}_n(\cdot), X_n(\cdot)\rangle_{\mathcal H}$, $W_n = \langle X_n(\cdot), X_n'(\cdot)\rangle_{\mathcal H} = \langle X_n(\cdot), X_n'(\cdot)\rangle_{\mathcal H,1}$ and
$$\Omega = (\omega_{ij})_{n\times n}, \quad \text{with}\ \omega_{ij} = \langle \hat{Y}_i(\cdot), (K_1\hat{Y}_j)(\cdot)\rangle_{\mathcal H} = \langle \hat{Y}_i(\cdot), (K_1\hat{Y}_j)(\cdot)\rangle_{\mathcal H,1}.$$
Denote $K_1^{(2)}(t,s) = \partial^2 K_1(t,s)/\partial s^2$; then $K_1^{(2)}(t,s) = t - s$ if $s \le t$ and $0$ if $s > t$, and
$$(K_1\bar{Y}_n)(t) = \langle K_1(t,\cdot), \bar{Y}_n(\cdot)\rangle_{\mathcal H,1} = \int_0^t (t-s)\,\bar{Y}_n^{(2)}(s)\,ds = \bar{Y}_n(t) - \bar{Y}_n(0) - t\,\bar{Y}_n^{(1)}(0).$$
By Theorem 1, $\hat{\mu}_{n,\lambda}$ has the expression
$$\hat{\mu}_{n,\lambda}(t) = \mathbf{a}'g(t) + \mathbf{b}'X_n(t),$$
and the coefficients $\mathbf{a}$ and $\mathbf{b}$ satisfy
$$\begin{aligned} 0 &= -U_n + R\,\mathbf{a} + V_n\,\mathbf{b}, \\ 0 &= -S_n + V_n'\,\mathbf{a} + (\lambda\Omega + W_n)\,\mathbf{b}, \end{aligned}$$
because $\hat{\mu}_{n,\lambda}$ in (7) minimizes (4). The solution of (8) is given by
$$\begin{pmatrix} \hat{\mathbf{a}}_n \\ \hat{\mathbf{b}}_n \end{pmatrix} = \begin{pmatrix} R & V_n \\ V_n' & \lambda\Omega + W_n \end{pmatrix}^{-1} \begin{pmatrix} U_n \\ S_n \end{pmatrix}$$
and the estimate $\hat{\mu}_{n,\lambda}$ in (7) for fixed $\lambda$ becomes
$$\hat{\mu}_{n,\lambda}(t) = \sum_{j=1}^2 \hat{a}_j g_j(t) + \sum_{i=1}^n \hat{b}_i (K_1\hat{Y}_i)(t) = (g'(t), X_n'(t)) \begin{pmatrix} R & V_n \\ V_n' & \lambda\Omega + W_n \end{pmatrix}^{-1} \begin{pmatrix} U_n \\ S_n \end{pmatrix}.$$
Because each component of $X_n(t)$ is an element of $\mathcal{H}_1$, each component of $g(t)$ is in $\mathcal{H}_0$, and $\mathcal{H}_0$ and $\mathcal{H}_1$ are orthogonal with respect to the inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$, we have $V_n = \langle g, X_n'\rangle_{\mathcal H} = 0$, $W_n = \Omega$ and $S_n = \frac{1}{n}\langle X_n(\cdot), \hat{Y}'(\cdot)\,\mathbf{1}_n\rangle_{\mathcal H} = \frac{1}{n}\langle X_n(\cdot), \hat{Y}'(\cdot)\rangle_{\mathcal H}\,\mathbf{1}_n = \frac{1}{n}\Omega\,\mathbf{1}_n$, where $\mathbf{1}_n$ is the $n$-dimensional vector of ones. Thus, we have
$$\hat{\mu}_{n,\lambda}(t) = g'(t)\,U_n + (1+\lambda)^{-1}\,\frac{1}{n}\,X_n'(t)\,\mathbf{1}_n.$$
By the definitions of $X_n(t)$ and $U_n$, we have $\frac{1}{n}X_n'(t)\,\mathbf{1}_n = (K_1\bar{Y}_n)(t)$ and $g'(t)\,U_n = g'(t)\,\langle \bar{Y}_n, g\rangle_{\mathcal H,0} = (1, t)\,(\bar{Y}_n(0), \bar{Y}_n^{(1)}(0))' = \bar{Y}_n(0) + \bar{Y}_n^{(1)}(0)\,t = (K_0\bar{Y}_n)(t)$, and (7) becomes
$$\hat{\mu}_{n,\lambda}(t) = (K_0\bar{Y}_n)(t) + (1+\lambda)^{-1}(K_1\bar{Y}_n)(t) = (K_0\bar{Y}_n)(t) + (K_1\bar{Y}_n)(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t) = \bar{Y}_n(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t),$$
because $K = K_0 + K_1$ is a reproducing kernel of $\mathcal{H}$, so that $[(K_0 + K_1)\bar{Y}_n](\cdot) = \bar{Y}_n(\cdot)$.
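The closed form (9) makes the estimator essentially free to compute: only the averaged interpolated curve and its value and first derivative at $t = 0$ are needed, since $(K_1\bar{Y}_n)(t) = \bar{Y}_n(t) - \bar{Y}_n(0) - t\,\bar{Y}_n^{(1)}(0)$. A minimal sketch (our own naming; the grid is assumed to start at $t = 0$):

```python
import numpy as np

def rkhs_mean_estimate(curves, dcurves0, grid, lam):
    """Closed-form RKHS mean estimate mu_hat_{n,lambda} on a grid.

    curves:   (n, m) array of interpolated curves Y_hat_i on the grid;
    dcurves0: (n,) array of derivatives Y_hat_i'(0) at the left endpoint;
    lam:      smoothing parameter lambda.
    """
    Ybar = curves.mean(axis=0)                        # Ybar_n(t), pointwise sample mean
    K1Ybar = Ybar - Ybar[0] - grid * dcurves0.mean()  # (K_1 Ybar_n)(t)
    return Ybar - lam / (1.0 + lam) * K1Ybar
```

In particular, no n x n matrix inversion is needed, in contrast to the Bernoulli-polynomial kernel discussed above.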
As a curve-smoothing estimator, $\hat{\mu}_{n,\lambda}(\cdot)$ is a biased estimator of $\mu_0(\cdot)$, with bias $-\frac{\lambda}{1+\lambda}(K_1\mu_0)(\cdot)$ (unless $\lambda = 0$, i.e., no smoothing regularization). Below, we consider the asymptotic normality of $\hat{\mu}_{n,\lambda}$. Denote by $\stackrel{D}{\to}$ convergence in distribution. Let $l^\infty([0,T])$ be the space of bounded functions on $[0, T]$ equipped with the supremum norm; weak convergence in the space $l^\infty([0,T])$ is also denoted by $\stackrel{D}{\to}$.
(C1). $E\int_T \big( \hat{Y}(t) - E[\hat{Y}(t)] \big)^2\,dt < \infty$.
(C2). $\int_T \mu_0^2(t)\,dt < \infty$.
(C3). $\delta_n := \max_{j=1,\dots,m_i-1;\ i=1,\dots,n}\, (t_{i,j+1} - t_{ij}) \to 0$ (a.s.) as $n \to \infty$.
Theorem 2. 
(i) Assume (C1)–(C3) and that $\Omega$ defined above is invertible for all large $n$. Then, as $n \to \infty$ (and $\lambda \to 0$),
$$\|\hat{\mu}_{n,\lambda} - \mu_0\| \to 0 \quad (a.s.)$$
(ii) Assume, in addition, that $\mu_0(\cdot)$ is twice differentiable with second-order derivative $\ddot{\mu}_0(\cdot)$. Then
$$n^{1/2}\big( \hat{\mu}_{n,\lambda}(t) - \mu_0(t) - b_n(t) \big) \stackrel{D}{\to} N(0, \sigma^2(t)),$$
where $\sigma^2(t) = Var[Y(t)]$ and, with $t_j \in S$,
$$b_n(t) = \ddot{\mu}_0(t_j)(t_{j+1} - t)(t - t_j) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t) + o\big( (t_{j+1} - t_j)^2 \big), \quad \text{for}\ t \in [t_j, t_{j+1}).$$
(iii) If we assume further that $\hat{Y}_i(\cdot), \mu_0(\cdot) \in H(\alpha)$ for all $i$ and some $\alpha > 0$, then
$$G_n(\cdot) := n^{1/2}\big( \hat{\mu}_{n,\lambda}(\cdot) - \mu_0(\cdot) - b_n(\cdot) \big) \stackrel{D}{\to} G(\cdot),$$
where $G$ is the zero-mean Gaussian process on $[0, T]$ with covariance function $R(s,t) = Cov[Y(s), Y(t)]$.

3. Hypothesis Tests for Mean Curves

In this section, two types of tests for mean curves of functional data are considered: one tests the hypothesis of equal mean curves from two populations, and the other tests the hypothesis that the mean function $\mu(\cdot)$ belongs to some subspace $\mathcal{H}_0$ of $\mathcal{H}$.

3.1. Test the Equality of Two Mean Curves

Suppose the two observed samples are $\{\hat{Y}_{1,i}: i = 1, \dots, n_1\}$, i.i.d. from $Y_1$, and $\{\hat{Y}_{2,i}: i = 1, \dots, n_2\}$, i.i.d. from $Y_2$, with mean curves $\mu_1(\cdot)$ and $\mu_2(\cdot)$, respectively. The two samples are assumed to be independent. For the RKHS $\mathcal{H}$ on $L^2[0,T]$ with inner product (5) and kernel (6), their mean curve estimates are given by (9) as
$$\hat{\mu}_{j,\lambda_j}(t) = \bar{Y}_{n_j,j}(t) - \frac{\lambda_j}{1+\lambda_j}(K_1\bar{Y}_{n_j,j})(t), \quad j = 1, 2.$$
Let $|T|$ be the Lebesgue measure of the set $[0, T]$. We are to test the null hypothesis
$$H_0: \mu_1(\cdot) = \mu_2(\cdot)\ (a.e.) \quad vs. \quad H_1: \mu_1(\cdot) \ne \mu_2(\cdot).$$
In the above, (a.e.) means almost everywhere, and $\mu_1(\cdot) \ne \mu_2(\cdot)$ means that the two curves differ on some set of nonzero Lebesgue measure. For this, we propose the test statistic
$$T_n = \frac{1}{|T|}\Big\| \sqrt{\frac{n_1 n_2}{n_1+n_2}}\,\big( \hat{\mu}_{1,\lambda_1} - \hat{\mu}_{2,\lambda_2} \big) \Big\|^2 = \frac{1}{|T|}\,\frac{n_1 n_2}{n_1+n_2} \int_0^T \big( \hat{\mu}_{1,\lambda_1}(t) - \hat{\mu}_{2,\lambda_2}(t) \big)^2\,dt,$$
where $\lambda_1$ and $\lambda_2$ are determined by (6).
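Computing $T_n$ amounts to one numerical integral of the squared difference of the two estimated mean curves on a common grid; a minimal sketch (our own illustration):

```python
import numpy as np
from scipy.integrate import trapezoid

def two_sample_stat(mu1, mu2, grid, n1, n2):
    # Test statistic T_n for estimated mean curves mu1, mu2 on a common grid.
    T_len = grid[-1] - grid[0]                   # |T|, Lebesgue measure of [0, T]
    integral = trapezoid((mu1 - mu2) ** 2, grid)
    return (n1 * n2 / (n1 + n2)) * integral / T_len
```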
Theorem 3. 
Assume the conditions of Theorem 2 (iii) hold for the two samples, $0 < \lim n_1/(n_1+n_2) < 1$, and $(\lambda_1 + \lambda_2)\,n_1 n_2/(n_1+n_2) \to 0$. Then, under $H_0$,
$$T_n \stackrel{D}{\to} \frac{1}{|T|}\sum_{j=1}^\infty \gamma_j Z_j^2 := W,$$
where the $Z_j$ are i.i.d. $N(0,1)$ random variables, and the $\gamma_j$ are the eigenvalues of
$$R(s,t) = \sum_{j=1}^2 \alpha_j^2\, Cov[\hat{Y}_j(s), \hat{Y}_j(t)], \quad s, t \in [0, T],$$
and $\alpha_j = \lim \sqrt{n_j/(n_1+n_2)}$ $(j = 1, 2)$.
Theorem 3 can be viewed as a generalization of the Mahalanobis statistic for finite-dimensional data; it is analogous to the result in ([27], p. 66). In fact, suppose we observe $Y_{1i} = (Y_{1i,1}, \dots, Y_{1i,k})$ $(i = 1, \dots, n_1)$ and $Y_{2i} = (Y_{2i,1}, \dots, Y_{2i,k})$ $(i = 1, \dots, n_2)$ at $k$ fixed time points, with corresponding mean values $(\mu_{1,1}, \dots, \mu_{1,k})$ and $(\mu_{2,1}, \dots, \mu_{2,k})$. If we take $\|\cdot\|_{\mathcal H}$ as the $L^2$-norm and use no penalty, i.e., $\lambda_1 = \lambda_2 = 0$, then $\hat{\mu}_{1,j}$ and $\hat{\mu}_{2,j}$ are just the corresponding sample means $(j = 1, 2)$, and Theorem 3 reduces to
$$T_n = \frac{1}{k}\sum_{j=1}^k \frac{n_1 n_2}{n_1+n_2}\,\big( \hat{\mu}_{1,j} - \hat{\mu}_{2,j} \big)^2 \stackrel{D}{\to} \sum_{j=1}^k \gamma_j Z_j^2,$$
with $\gamma_j$ being the eigenvalues of $R = (r_{ij})_{1\le i,j\le k}$, $r_{ij} = \sum_{a,b=1}^2 \alpha_a \alpha_b E[Y_{a,i} Y_{b,j}]$.
In practice, the eigenvalues of $R(s,t)$ above cannot be computed exactly. As an approximation, we compute the eigenvalues $\hat{\gamma}_j$ $(j = 1, \dots, m)$ of the matrix $R_n = (r_{ij})_{1\le i,j\le m}$ for some specified large integer $m$, where $r_{ij} = \sum_{a,b=1}^2 \alpha_a \alpha_b \bar{Y}_a(t_i)\bar{Y}_b(t_j)$, $\bar{Y}_1(t_i) = n_1^{-1}\sum_{j=1}^{n_1} Y_{1j}(t_i)$ and $\bar{Y}_2(t_i) = n_2^{-1}\sum_{j=1}^{n_2} Y_{2j}(t_i)$.
If $k$ is relatively large, only the $p$ largest eigenvalues are needed for a good approximation, for some chosen $p\,(<k)$. Let $\lambda_1, \dots, \lambda_p$ be the $p$ largest eigenvalues, and $\hat{\lambda}_j$ be their estimates. Then, by Theorem 2.7 in ([27], p. 31), $E(\hat{\lambda}_j - \lambda_j)^2 = O(n^{-1})$ for all $1 \le j \le p$; i.e., the estimates are accurate up to order $O(n^{-1})$.
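Once the matrix $R_n$ is formed on a grid of m points, the null distribution of $W$ can be approximated by Monte Carlo; a sketch under our own assumptions (uniform grid; the eigenvalues of the integral operator are approximated by those of $R_n$ scaled by the grid spacing):

```python
import numpy as np

def null_pvalue(T_obs, R_n, grid, n_mc=100_000, seed=0):
    """Approximate p-value of T_n under H_0 via Theorem 3."""
    dt = grid[1] - grid[0]                   # grid spacing (uniform grid assumed)
    T_len = grid[-1] - grid[0]               # |T|
    gammas = np.linalg.eigvalsh(R_n) * dt    # eigenvalue approximation
    gammas = gammas[gammas > 0]
    rng = np.random.default_rng(seed)
    chi2 = rng.standard_normal((n_mc, gammas.size)) ** 2  # Z_j^2 draws
    W = chi2 @ gammas / T_len                # Monte Carlo draws of W
    return float(np.mean(W >= T_obs))
```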

3.2. Test Mean Curve in Some Subspace

This type of test has been systematically studied since the late 1980s; Ref. [28] developed such a test for regression functions, and many related references can be found therein. Without loss of generality, we consider only the case in which the subspace $\mathcal{H}_0$ consists of polynomials of degree no more than three, and the hypothesis
$$H_0: \mu(\cdot) \in \mathcal{H}_0 \quad \text{versus} \quad H_1: \mu(\cdot) \notin \mathcal{H}_0.$$
To test $H_0$, we need the penalized estimate $\tilde{\mu}_{n,\lambda}$ of $\mu$ in $\mathcal{H}_0$:
$$\tilde{\mu}_{n,\lambda} = \arg\min_{\mu\in\mathcal{H}_0} \frac{1}{n}\sum_{i=1}^n \|\hat{Y}_i - \mu\|^2 + \lambda J(\mu).$$
Let $\mathcal{H}_{12}$ be the subspace spanned by $\{t^4\}$, $\mathcal{H}_{11} = \mathcal{H}_1 \setminus \mathcal{H}_{12}$, and $\mathcal{H}_1 = \mathcal{H}_{11} \oplus \mathcal{H}_{12}$. Let $K_{11}$ and $K_{12}$ be the reproducing kernels for $\mathcal{H}_{11}$ and $\mathcal{H}_{12}$. Let $\tilde{\Omega} = (\tilde{\omega}_{ij})$ be the $n \times n$ matrix with $\tilde{\omega}_{ij} = \langle \hat{Y}_i, (K_{11}\hat{Y}_j)\rangle_{\mathcal H}$. Denote $\tilde{X}_n(t) = \big( (K_{11}\hat{Y}_1)(t), \dots, (K_{11}\hat{Y}_n)(t) \big)'$, $\tilde{V}_n = \langle g, \tilde{X}_n'\rangle_{\mathcal H}$, $\tilde{S}_n = \langle \bar{Y}_n, \tilde{X}_n\rangle_{\mathcal H}$, $\tilde{W}_n = \langle \tilde{X}_n, \tilde{X}_n'\rangle_{\mathcal H}$, and $U_n$, $\bar{Y}_n(t)$ and $R$ as before. For $h \in \mathcal{H}$, let $(K_{11}h)(t) = \langle K_{11}(\cdot,t), h(\cdot)\rangle_{\mathcal H_1}$ and $(K_{12}h)(t) = \langle K_{12}(\cdot,t), h(\cdot)\rangle_{\mathcal H_1}$. Then,
$$\tilde{\mu}_{n,\lambda}(t) = \sum_{j=1}^d a_j g_j(t) + \sum_{i=1}^n b_i (K_{11}\hat{Y}_i)(t), \quad t \in [0, T],$$
the coefficients $(\mathbf{a}, \mathbf{b})$ minimizing (18) are
$$\begin{pmatrix} \tilde{\mathbf{a}}_n \\ \tilde{\mathbf{b}}_n \end{pmatrix} = \begin{pmatrix} R & \tilde{V}_n \\ \tilde{V}_n' & \lambda\tilde{\Omega} + \tilde{W}_n \end{pmatrix}^{-1} \begin{pmatrix} U_n \\ \tilde{S}_n \end{pmatrix}$$
and the estimate $\tilde{\mu}_{n,\lambda}$, for fixed $\lambda$, is
$$\tilde{\mu}_{n,\lambda}(t) = \sum_{j=1}^2 \tilde{a}_j g_j(t) + \sum_{i=1}^n \tilde{b}_i (K_{11}\hat{Y}_i)(t) = (g'(t), \tilde{X}_n'(t)) \begin{pmatrix} R & \tilde{V}_n \\ \tilde{V}_n' & \lambda\tilde{\Omega} + \tilde{W}_n \end{pmatrix}^{-1} \begin{pmatrix} U_n \\ \tilde{S}_n \end{pmatrix}, \quad t \in [0, T].$$
When $H_0$ is true, $\hat{\mu}_{n,\lambda}(\cdot)$ and $\tilde{\mu}_{n,\lambda}(\cdot)$ are expected to be close, so a large observed absolute value of any monotone functional of their difference is evidence against $H_0$; a schematic implementation is sketched below, followed by the formal result, which characterizes this difference and can be used to test $H_0$.
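A rough sketch only, not the paper's exact procedure: the least-squares cubic fit on the grid stands in for the penalized estimate $\tilde{\mu}_{n,\lambda}$.

```python
import numpy as np
from scipy.integrate import trapezoid

def subspace_discrepancy(mu_hat, grid, n, degree=3):
    # Integrated squared distance between mu_hat and its projection onto
    # polynomials of degree <= 3, i.e., an empirical version of ||D_n||^2.
    coef = np.polyfit(grid, mu_hat, degree)   # least-squares cubic fit
    mu_tilde = np.polyval(coef, grid)
    return n * trapezoid((mu_hat - mu_tilde) ** 2, grid)
```

A large observed value of this discrepancy, compared with quantiles obtained from the Gaussian process $D(\cdot)$ of Theorem 4 below, is evidence against $H_0$.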
Theorem 4. 
Under $H_0$, we have the following. (i) Assume the conditions of Theorem 2 (i). Then
$$n^{1/2}\big( \hat{\mu}_{n,\lambda}(t) - \tilde{\mu}_{n,\lambda}(t) \big) \stackrel{D}{\to} N(0, \tau^2(t)),$$
where $\tau^2(t) = E[D^2(t)]$ and $D(t) = (K_{12}[Y - \mu_0])(t)$.
(ii) Assume the conditions of Theorem 2 (iii). Then
$$D_n(\cdot) := n^{1/2}\big( \hat{\mu}_{n,\lambda}(\cdot) - \tilde{\mu}_{n,\lambda}(\cdot) \big) \stackrel{D}{\to} D(\cdot),$$
where $D(\cdot)$ is the mean-zero Gaussian process on $T$ with covariance function $Q(s,t) = Cov[D(s), D(t)]$.
In application, $\tau^2(\cdot)$ is estimated by $\hat{\tau}^2(t) = (n-1)^{-1}\sum_{i=1}^n \big\{ (K_{12}[Y_i - \tilde{\mu}_{n,\lambda}])(t) \big\}^2$, and $Q(s,t)$ is estimated by $\hat{Q}(s,t) = n^{-1}\sum_{i=1}^n (K_{12}[Y_i - \tilde{\mu}_{n,\lambda}])(s)\,(K_{12}[Y_i - \tilde{\mu}_{n,\lambda}])(t)$.

4. Numerical Analysis

To investigate the finite sample properties of the proposed procedures, we perform two simulation studies. The first study compares the proposed RKHS estimator of mean curves with the conventional local linear smoother and spline methods. The second study examines the performance of the statistic $T_n$ for testing the equality of two mean curves. A real data analysis then illustrates the performance of the proposed procedures.
Simulation 1. 
To compare with the commonly used local linear fit (R function lowess) and spline smoother (R function smooth.spline), we consider the estimator $\hat{\mu}_{n,\lambda}(\cdot)$ with the special choices of the kernel and inner products given in (22). We assume that the underlying individual curve $i$ at time points $t \in T = [0, 10]$ is generated from $y_i(t) \sim N(\mu(t), \sigma^2(t))$, where $\mu(t) = t\sin(5 + 3t)$ and $\sigma^2(t) = t^2$. For each subject $i$, the number of observation time points $m_i$ is generated from the uniform distribution on $\{5, 6, \dots, 20\}$, and the observation time points $t_i = (t_{i1}, \dots, t_{i,m_i})$ are generated from $Exp(\lambda)$ with $\lambda = 0.6$. The $y(t_{ij})$ are then interpolated on $T$ to obtain $\hat{y}_i(\cdot)$.
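The data-generating step can be sketched as follows (our reading of the setup; clipping the exponential times to [0, 10] is our assumption):

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)
T_END = 10.0

def simulate_subject():
    m_i = rng.integers(5, 21)                      # m_i ~ Uniform{5, ..., 20}
    t = np.sort(rng.exponential(1.0 / 0.6, m_i))   # times from Exp(0.6)
    t = np.unique(np.clip(t, 0.0, T_END))          # keep times inside T = [0, 10]
    mu = t * np.sin(5.0 + 3.0 * t)                 # mean curve mu(t)
    y = mu + rng.normal(0.0, t)                    # sigma(t) = t, so sigma^2(t) = t^2
    return CubicSpline(t, y, bc_type="natural")    # interpolated curve y_hat_i

curves = [simulate_subject() for _ in range(50)]   # sample of n = 50 subjects
```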
The fitted results are presented in Figure 2 with sample sizes of 50, 100 and 200, respectively.
The simulation shows that the proposed method (fitted curves (a1)–(a3)) performs better than the other two methods. The RKHS estimator performs relatively stably; its confidence bands are narrower in the relatively dense region and become wider in the sparse region. For the local linear smoother ((b1)–(b3)), the width of the confidence bands shows no apparent difference as the observation points change from relatively dense to sparse, and the estimated curves are biased due to the sparse observation time points. The spline fits very well, with good confidence bands, when the observation points are relatively dense, but its estimated curves have a large bias when the data become sparse, as seen in (c1)–(c3) near the right end of the x-axis, and the spline estimates are not stable, with some moderate to large fluctuations.
Simulation 2. 
In this simulation study, we examine the performance of the statistic $T_n$ for testing the hypothesis $H_0$ of the equality of the mean curves of two stochastic processes, against the alternative hypothesis that the two mean curves are not equal. We assume that the samples are generated as $x_i(t) \sim N(\mu(t), \sigma^2(t))$, where $\mu(t) = t\sin(5 + 3t)$ and $\sigma^2(t) = t^2$ $(i = 1, \dots, 50)$, and $y_i(t) \sim N(\eta(t), \sigma^2(t))$, where $\eta(t) = \mu(t) + C\cos(2 + 2t)$ $(i = 1, \dots, 30)$, respectively, and $C$ is a tuning parameter measuring the amount of difference between the two samples. We take the parameter $C$ to be 0, 0.5, 0.7, 0.8, 0.9 and 1.0, with corresponding meaningful differences $\Delta = |T|^{-1}\int_T \big( \mu(t) - \eta(t) \big)^2\,dt$ of 0, 0.179, 0.352, 0.460, 0.582 and 0.718. For each subject $i$, the number of observation time points $m_i$ and the observation time points $t_i = (t_{i1}, \dots, t_{i,m_i})$ are generated as in Simulation 1 above. The simulation results, based on 10,000 replications, are presented in Table 1. Table 1 shows that the computed type I error is slightly less than the nominal level of 0.05 (first row, with $C = 0$), and the power increases with $C$ for $C \ge 0.5$.

Real Data Analysis

With the proposed method, we analyze the growth data from the National Growth and Health Study (NGHS) project (https://biolincc.nhlbi.nih.gov/studies/nghs/, accessed on 1 March 2016), sponsored by the National Heart, Lung, and Blood Institute, from 1985 to 2000. The main purpose of the study is to investigate the differences between Caucasian and African-American girls in the development of obesity during adolescence due to psychosocial, socioeconomic and other environmental factors.
The NGHS is an epidemiological study of the cardiovascular risk factors in 1166 Caucasian and 1213 African-American girls during adolescence. In this longitudinal study, starting from age 9, each subject had a baseline examination and annual examinations. The study was renewed twice to continue the longitudinal investigation until the subjects reached the age of 19 to 20. We deleted those individuals (about 30 for each race) with only 1 or 2 observations, as our method is for longitudinal data; the total number of subjects is $n = n_1 + n_2 = 1136 + 1183 = 2319$. The number of follow-up visits for each subject varies from 3 to 10. The ages vary between 9 and 19 years, and we use the age range $T = [9, 19)$. The body mass index (BMI) is used as the response $y(t)$, as a function of age $t$. The mean curves are estimated using (22) for the two groups separately. The results are presented in Figure 3, which suggests that girls' BMI increases with age and that African-American girls tend to have higher BMIs than Caucasian girls.
Then, we test the null hypothesis of the equality of the two mean curves for the two samples (Caucasian and African-American girls). The value of the statistic $T_n$ is 2576.302, the 95% upper quantile of $W$ in Theorem 3 is 938.439 and the corresponding p-value is 0.0015. Thus, the difference between Caucasian and African-American girls is statistically significant based on the observed data. The conclusion is consistent with other studies of these data. For example, Ref. [29] analyzed the same data using a varying coefficient model for the regression relationship between systolic blood pressure (SBP), age and race. Their conclusion was that African-American girls tend to have higher probabilities of “SBP > 100 mmHg” than Caucasian girls, so that race is a factor affecting SBP. It is known that SBP is strongly related to BMI.

5. Concluding Remarks

We have proposed and studied the reproducing kernel Hilbert space method for the analysis of functional data, motivated by a practical problem in which the observation points are relatively dense in some time intervals and sparse in others. The unbalanced observation time points result in biased estimates or large variances for the existing methods. The simulation studies indicate an apparent advantage of the proposed method compared to some commonly used methods for this type of data. We derived extensive theoretical results for the RKHS estimation, including convergence rates of the estimates under two commonly used norms.
To use the RKHS method for functional data analysis, the key is to choose an adequate kernel; different kernels may result in very different computational efficiency. In this paper, we proposed a special kernel for which the corresponding estimator of the mean curve for functional data has a very simple expression; the asymptotic distribution of the estimator is also given. Furthermore, we proposed two statistics for testing the hypothesis of equal mean curves from two populations and the hypothesis that the mean function belongs to some subspace. The finite sample performance of the proposed methods is evaluated by the simulation studies, and the methods provide new insights into the analysis of the functional growth data in the NIH NGHS study.
As future work, we can extend the RKHS method to case-control studies with observational functional data. With observational data, treatment assignment is often not randomized as in the ideal case, and it is known that the naive estimate of a treatment-effect curve is biased, so a causal inference method is needed. A popular such method is the doubly robust estimator commonly used with ordinary data. To construct such an estimator, a propensity score model and an outcome model are specified, and as long as one of the models is correctly specified, the resulting estimator is unbiased. Extending this estimator to functional data is non-trivial and will be our future work. Another possible extension is to consider missing responses in the longitudinal data, in particular the case of missing not at random (MNAR), which is a topic of general interest.

Author Contributions

Conceptualization, A.Y. and H.-B.F.; methodology, A.Y.; software, M.X.; validation, A.Y., M.X. and H.-B.F.; formal analysis, M.X.; investigation, M.X., A.Y., H.-B.F.; resources, A.Y., H.-B.F., C.O.W. and M.T.T.; data curation, C.O.W., H.-B.F. and M.X.; writing—original draft preparation, A.Y.; writing—review and editing, A.Y. and H.-B.F.; visualization, M.X., A.Y. and H.-B.F.; supervision, M.T.T.; project administration, M.T.T.; funding acquisition, M.T.T. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Xiong, Yuan, Fang and Tan is partially supported by National Cancer Institute (NCI) grant R01CA164717.

Informed Consent Statement

Patient consent was waived as the data are open to the public.

Data Availability Statement

The growth data from the National Growth and Health Study (NGHS) project are available at https://biolincc.nhlbi.nih.gov/studies/nghs/, accessed on 1 March 2016.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Theorem 1. 
Let $\mathcal{H}^*$ be the closed linear subspace spanned by $\{g_j(\cdot), (K_1\hat{Y}_i)(\cdot): j = 1, \dots, d;\ i = 1, \dots, n\}$ with respect to the norm from $\langle\cdot,\cdot\rangle$. Denote $\mathbf{a} = (a_1, \dots, a_d)'$, $g(t) = (g_1(t), \dots, g_d(t))'$, $\mathbf{b} = (b_1, \dots, b_n)'$ and $(K_1\hat{Y})(t) = \big( (K_1\hat{Y}_1)(t), \dots, (K_1\hat{Y}_n)(t) \big)'$. Let $r(\cdot) \in \mathcal{H}\setminus\mathcal{H}^*$ be orthogonal to $g(\cdot)$ and $(K_1\hat{Y})(\cdot)$ with respect to the norm from $\langle\cdot,\cdot\rangle$. We rewrite $\hat{\mu}_{n,\lambda}(\cdot)$ as
$$\hat{\mu}_{n,\lambda}(t) = \mathbf{a}'g(t) + \mathbf{b}'(K_1\hat{Y})(t) + r(t).$$
We need only show that $r \equiv 0$ if $\hat{\mu}_{n,\lambda}(\cdot)$ in (A1) minimizes the right-hand side of (4).
Because $K$ is the kernel function on $\mathcal{H}$, we have $K = K_0 + K_1$, and for all $h \in \mathcal{H}$, $h = (Kh) = (K_0h) + (K_1h) = h_0 + h_1$, with $h_j = K_jh \in \mathcal{H}_j$ $(j = 0, 1)$. As $\hat{Y}_i \in \mathcal{H}$, $K_0\hat{Y}_i \in \mathcal{H}_0$, and by the definition of $\mathcal{H}_0$, $\|K_0\hat{Y}_i\|_{\mathcal H}^2 = 0$, so $K_0\hat{Y}_i = 0$. Thus, $\hat{Y}_i = K\hat{Y}_i = K_0\hat{Y}_i + K_1\hat{Y}_i = K_1\hat{Y}_i \in \mathcal{H}^*$ $(i = 1, \dots, n)$. Then, $r(\cdot)$ is also orthogonal to $\hat{Y}_i(\cdot)$ $(i = 1, \dots, n)$ with respect to the norm from $\langle\cdot,\cdot\rangle$, because $r \in \mathcal{H}\setminus\mathcal{H}^*$, and
$$\begin{aligned} \frac{1}{n}\sum_{i=1}^n \|\hat{Y}_i - \hat{\mu}_{n,\lambda}\|^2 &= \frac{1}{n}\sum_{i=1}^n \Big\| \hat{Y}_i - \sum_{j=1}^d a_j g_j - \sum_{j=1}^n b_j (K_1\hat{Y}_j) - r \Big\|^2 \\ &= \frac{1}{n}\sum_{i=1}^n \Big( \Big\| \hat{Y}_i - \sum_{j=1}^d a_j g_j - \sum_{j=1}^n b_j (K_1\hat{Y}_j) \Big\|^2 - 2\langle \hat{Y}_i, r\rangle - 2\sum_{j=1}^d a_j\langle g_j, r\rangle - 2\sum_{j=1}^n b_j\langle K_1\hat{Y}_j, r\rangle + \|r\|^2 \Big) \\ &= \frac{1}{n}\sum_{i=1}^n \Big\| \hat{Y}_i - \sum_{j=1}^d a_j g_j - \sum_{j=1}^n b_j (K_1\hat{Y}_j) \Big\|^2 + \|r\|^2. \end{aligned}$$
Since $(K_1g_j) \equiv 0$, we have $(K_1\hat{\mu}_{n,\lambda}) = \mathbf{b}'(K_1\hat{Y}) + (K_1r)$ and
$$J(\hat{\mu}_{n,\lambda}) = \langle K_1\hat{\mu}_{n,\lambda}, K_1\hat{\mu}_{n,\lambda}\rangle_{\mathcal H} = \mathbf{b}'\langle K_1\hat{Y}, K_1\hat{Y}'\rangle_{\mathcal H}\,\mathbf{b} + 2\mathbf{b}'\langle K_1\hat{Y}, K_1r\rangle_{\mathcal H} + \langle K_1r, K_1r\rangle_{\mathcal H}.$$
Furthermore, $K_1$ is a linear operator $\{\mathcal{H}, \langle\cdot,\cdot\rangle\} \to \{K_1\mathcal{H}, \langle\cdot,\cdot\rangle_{\mathcal H}\}$. Let $K_1^*: \{K_1\mathcal{H}, \langle\cdot,\cdot\rangle_{\mathcal H}\} \to \{\mathcal{H}, \langle\cdot,\cdot\rangle\}$ be its adjoint operator. Because $K_1$ and $K_1^*$ are continuous bounded linear operators, $\mathcal{H}^*$ is closed with respect to both norms $\|\cdot\|$ and $\|\cdot\|_{\mathcal H}$. Thus, $K_1^*(K_1\hat{Y}) \in K_1^*\mathcal{H}^* \subset \mathcal{H}^*$ (with respect to $\langle\cdot,\cdot\rangle$), and we obtain $\langle K_1\hat{Y}, K_1r\rangle_{\mathcal H} = \langle K_1^*(K_1\hat{Y}), r\rangle = 0$ by the orthogonality of $r(\cdot)$ to $\mathcal{H}^*$, i.e., $\langle h, r\rangle = 0$ for all $h \in \mathcal{H}^*$. Then, (A3) becomes
$$J(\hat{\mu}_{n,\lambda}) = \mathbf{b}'\langle K_1\hat{Y}, K_1\hat{Y}'\rangle_{\mathcal H}\,\mathbf{b} + \langle K_1r, K_1r\rangle_{\mathcal H} = \mathbf{b}'\Omega\,\mathbf{b} + J(r),$$
where $\Omega = (\omega_{ij})_{n\times n}$ with $\omega_{ij} = \langle K_1\hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H}$. From (A2) and (A4), we obtain
$$\frac{1}{n}\sum_{i=1}^n \|\hat{Y}_i - \hat{\mu}_{n,\lambda}\|^2 + \lambda J(\hat{\mu}_{n,\lambda}) = \frac{1}{n}\sum_{i=1}^n \Big\| \hat{Y}_i - \sum_{j=1}^d a_j g_j - \sum_{j=1}^n b_j (K_1\hat{Y}_j) \Big\|^2 + \|r\|^2 + \lambda\big( \mathbf{b}'\Omega\,\mathbf{b} + J(r) \big).$$
For any $(\mathbf{a}, \mathbf{b})$, (A5) is minimized when $\|r\|^2 + \lambda J(r) = 0$. Since $J(r) \ge 0$ and $\lambda > 0$, this implies $r = 0$, and Theorem 1 is proven. □
Proof of Theorem 2. 
(i) Because $\mathcal{H}_0$ and $\mathcal{H}_1$ are orthogonal with respect to the inner product $\langle\cdot,\cdot\rangle_{\mathcal H}$, each component of $X_n(t)$ is an element of $\mathcal{H}_1$ and each component of $g(t)$ is in $\mathcal{H}_0$, we have $V_n = \langle g, X_n'\rangle_{\mathcal H} = 0$. Denote $W_n = (w_{ij})_{n\times n}$, $w_{ij} = \langle K_1\hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H}$. Because $(K_1\hat{Y}_i)(t) = \hat{Y}_i(t) - (K_0\hat{Y}_i)(t)$, $(K_1\hat{Y}_i)(t) \in \mathcal{H}_1$ and $(K_0\hat{Y}_i)(t) \in \mathcal{H}_0$,
$$w_{ij} = \langle \hat{Y}_i - K_0\hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H} = \langle \hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H} - \langle K_0\hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H} = \langle \hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H} = \langle \hat{Y}_i, K_1\hat{Y}_j\rangle_{\mathcal H,1} = \omega_{ij},$$
i.e., $W_n = \Omega$. Now, we have
$$\hat{\mu}_{n,\lambda}(t) = (g'(t), X_n'(t)) \begin{pmatrix} R & 0 \\ 0 & (1+\lambda)\Omega \end{pmatrix}^{-1} \begin{pmatrix} U_n \\ S_n \end{pmatrix} = g'(t)\,R^{-1}U_n + (1+\lambda)^{-1} X_n'(t)\,\Omega^{-1}S_n.$$
Let $\hat{Y}(t) = (\hat{Y}_1(t), \dots, \hat{Y}_n(t))'$, $I_2$ be the $2\times 2$ identity matrix, and $\mathbf{1}_n$ be the $n$-dimensional vector of ones. Note that $R = I_2$, $\bar{Y}_n(t) = n^{-1}\hat{Y}'(t)\,\mathbf{1}_n$, and
$$S_n = \frac{1}{n}\langle X_n(\cdot), \hat{Y}'(\cdot)\,\mathbf{1}_n\rangle_{\mathcal H} = \frac{1}{n}\langle X_n(\cdot), \hat{Y}'(\cdot)\rangle_{\mathcal H}\,\mathbf{1}_n = \frac{1}{n}\Omega\,\mathbf{1}_n.$$
Then, (A6) becomes
$$\hat{\mu}_{n,\lambda}(t) = g'(t)\,U_n + (1+\lambda)^{-1}\,\frac{1}{n}\,X_n'(t)\,\mathbf{1}_n.$$
By the definition of $X_n(t)$, we have $\frac{1}{n}X_n'(t)\,\mathbf{1}_n = (K_1\bar{Y}_n)(t)$. Recalling the definition of $U_n$, $g'(t)\,U_n = g'(t)\,\langle \bar{Y}_n, g\rangle_{\mathcal H,0} = (1, t)\,(\bar{Y}_n(0), \bar{Y}_n^{(1)}(0))' = \bar{Y}_n(0) + \bar{Y}_n^{(1)}(0)\,t = (K_0\bar{Y}_n)(t)$. We have that
$$\hat{\mu}_{n,\lambda}(t) = (K_0\bar{Y}_n)(t) + (1+\lambda)^{-1}(K_1\bar{Y}_n)(t) = (K_0\bar{Y}_n)(t) + (K_1\bar{Y}_n)(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t) = \bar{Y}_n(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t).$$
Thus, $\hat{\mu}_{n,\lambda}(\cdot)$ is a smoothing of $\bar{Y}_n(\cdot)$ by the two operators $K_0$ and $K_1$. Part (i) of the theorem follows from the facts that $\lambda = \lambda(n) \to 0$ and $E(\bar{Y}_n(\cdot)) \to E(Y(\cdot)) = \mu_0$ under conditions (C1)–(C3).
(ii) Under (C1)–(C3), the $\hat{Y}_i$'s are i.i.d. but depend on $n$, as the $t_{ij}$ do. Let $S_n = \{T_i, i = 1, 2, \dots, n\}$ and $\mu_n(\cdot) = E(\bar{Y}_n(\cdot)\,|\,S_n)$, the conditional expectation given the observed time points of the $n$ subjects. We have
$$n^{1/2}\big( \hat{\mu}_{n,\lambda}(t) - \mu_0(t) \big) = n^{1/2}\Big( \bar{Y}_n(t) - \mu_n(t) + \mu_n(t) - \mu_0(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t) \Big) = n^{1/2}\big( \bar{Y}_n(t) - \mu_n(t) \big) + n^{1/2}\,b_n(t),$$
where $b_n(t) = \mu_n(t) - \mu_0(t) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t)$. Note that
$$n^{1/2}\big( \bar{Y}_n(t) - \mu_n(t) \big) = n^{-1/2}\sum_{i=1}^n \big( \hat{Y}_i(t) - E[\hat{Y}_i(t)] \big).$$
The $\hat{Y}_i(t) - E[\hat{Y}_i(t)]$ are i.i.d. with mean 0 and variance $\sigma_n^2(t) = var(\hat{Y}_i(t)) \to var(Y(t)) = \sigma^2(t)$ as $n \to \infty$. Then, by the central limit theorem,
$$n^{-1/2}\sum_{i=1}^n \big( \hat{Y}_i(t) - E[\hat{Y}_i(t)] \big) \stackrel{D}{\to} N(0, \sigma^2(t)).$$
Similarly to Theorem 1 in [30], by the assumption $\mu_0 \in H(\alpha)$, we obtain $\mu_n(t) - \mu_0(t) = O(\delta_n^\alpha)$, and for $t \in [t_j, t_{j+1})$ with $t_j \in S$,
$$\mu_n(t) - \mu_0(t) = \ddot{\mu}_0(t_j)(t_{j+1} - t)(t - t_j) + o\big( (t_{j+1} - t_j)^2 \big).$$
Thus,
$$b_n(t) = \ddot{\mu}_0(t_j)(t_{j+1} - t)(t - t_j) - \frac{\lambda}{1+\lambda}(K_1\bar{Y}_n)(t) + o\big( (t_{j+1} - t_j)^2 \big).$$
(iii) Let $\rho(s,t) = |Y(s) - Y(t)|$ and $\rho_n(s,t) = \big( n^{-1}\sum_{i=1}^n (\hat{Y}_i(s) - \hat{Y}_i(t))^2 \big)^{1/2}$. By the condition $\hat{Y}_i(\cdot), \mu_0(\cdot) \in H(\alpha)$ for all $i$ and some $\alpha > 0$, and Corollary 2.7.2 in [31], we have
$$\log N_{[\,]}(\epsilon, H(\alpha), L_2) \le C\,\epsilon^{-1/(2\alpha)} < \infty,$$
for any $\epsilon > 0$ and some constant $C$. Then, $N_{[\,]}(\epsilon, H(\alpha), L_1) < \infty$ for every $\epsilon > 0$. By Theorem 2.4.1 in [31], $H(\alpha)$ is a Glivenko–Cantelli class, which implies $\sup_{s,t\in T} |\rho_n(s,t) - \rho(s,t)| \to 0$ (a.s.). Because $H(\alpha)$ is bounded, there is an envelope $M < \infty$ such that $\sup_{t\in T}\max_i \big( \hat{Y}_i(t) - E[\hat{Y}_i(t)] \big)^2 \le M$. Moreover, $N(\epsilon, H(\alpha), L_2) \le N_{[\,]}(\epsilon/2, H(\alpha), L_2)$ and (A7) imply the uniform entropy condition
$$\int_0^\infty \sup_Q \sqrt{\log N\big( \epsilon M, H(\alpha), L_2(Q) \big)}\,d\epsilon < \infty,$$
where $\sup_Q$ is taken over all probability measures $Q$. Under suitable measurability conditions, by Theorem 2.8.9 in [31], $H(\alpha)$ is then a Donsker class, i.e., $G_n \stackrel{D}{\to} G$ in $l^\infty(T)$, where $G$ is a Gaussian process with mean 0 and covariance function $R(s,t) = Cov[Y(s), Y(t)]$. □
Proof of Theorem 3. 
Let $G_{j,n_j}(\cdot) = n_j^{1/2}\big( \bar{Y}_{j,n_j}(\cdot) - \mu_j(\cdot) \big)$. In the proof of Theorem 2, we saw that $\hat{\mu}_{j,\lambda_j}(t) = \bar{Y}_{j,n_j}(t) - [\lambda_j/(1+\lambda_j)](K_1\bar{Y}_{j,n_j})(t)$, where $\bar{Y}_{j,n_j}(\cdot)$ is the corresponding sample mean. Under $H_0$, $\mu_1(\cdot) = \mu_2(\cdot)$, and
$$T_n = \frac{1}{|T|}\Big\| \sqrt{\tfrac{n_2}{n_1+n_2}}\,G_{1,n_1}(\cdot) + \sqrt{\tfrac{n_1}{n_1+n_2}}\,G_{2,n_2}(\cdot) - \sum_{j=1}^2 \frac{\lambda_j}{1+\lambda_j}\,\frac{n_1^{1/2} n_2^{1/2}}{(n_1+n_2)^{1/2}}\,(K_1\bar{Y}_{j,n_j})(\cdot) \Big\|^2 = \frac{1}{|T|}\big\| \alpha_1 G_{1,n_1}(\cdot) + \alpha_2 G_{2,n_2}(\cdot) \big\|^2 + o_p(1).$$
By Theorem 2 (iii),
$$\alpha_1 G_{1,n_1}(\cdot) + \alpha_2 G_{2,n_2}(\cdot) \stackrel{D}{\to} G(\cdot),$$
where $G(\cdot) = \alpha_1 G_1(\cdot) + \alpha_2 G_2(\cdot)$, $G_1(\cdot)$ and $G_2(\cdot)$ are independent, and $G_j(\cdot)$ is a mean-zero Gaussian process on $T$ with covariance function $R_j(s,t) = Cov[Y_j(s), Y_j(t)]$ $(s, t \in T)$ $(j = 1, 2)$. Thus, $G(\cdot)$ is a mean-zero Gaussian process on $T$ with covariance function
$$R(s,t) = \sum_{j=1}^2 \alpha_j^2\,R_j(s,t), \quad s, t \in T.$$
Hence, we have
$$T_n \stackrel{D}{\to} \frac{1}{|T|}\|G(\cdot)\|^2.$$
Because $R(\cdot,\cdot)$ is a.e. continuous and $T$ is bounded, $R^2(\cdot,\cdot)$ is integrable, i.e., $\int_T\int_T R^2(s,t)\,ds\,dt < \infty$. By Mercer's Theorem (see Theorem 5.2.1 in [23], p. 208), we have
$$R(s,t) = \sum_{j=1}^\infty \gamma_j h_j(s) h_j(t),$$
where $\gamma_j \ge 0$ $(j = 1, 2, \dots)$ are the eigenvalues of $R(\cdot,\cdot)$, and $h_j(\cdot)$ $(j = 1, 2, \dots)$ are the corresponding orthonormal eigenfunctions ($\int_T h_i(t)h_j(t)\,dt = 0$ for $i \ne j$, and $\int_T h_i^2(t)\,dt = 1$ for all $i$). Let $Z_1, \dots, Z_m, \dots$ be i.i.d. random variables with $Z_m \sim N(0,1)$; then $Z(t) = \sum_{j=1}^\infty \sqrt{\gamma_j}\,Z_j h_j(t)$ is a Gaussian process on $T$ with mean zero and covariance function $R(s,t)$. Thus, the two stochastic processes $G(\cdot)$ and $Z(\cdot)$ have the same distribution on $T$, i.e.,
$$G(\cdot) \stackrel{d}{=} Z(\cdot) = \sum_{j=1}^\infty \sqrt{\gamma_j}\,Z_j h_j(\cdot)$$
and, for $\|\cdot\|$ being the $L^2$-norm,
$$\frac{1}{|T|}\|G(\cdot)\|^2 = \frac{1}{|T|}\int_T G^2(t)\,dt \stackrel{d}{=} \frac{1}{|T|}\int_T \Big( \sum_{j=1}^\infty \sqrt{\gamma_j}\,Z_j h_j(t) \Big)^2\,dt = \frac{1}{|T|}\sum_{j=1}^\infty \gamma_j Z_j^2. \qquad \square$$
Proof of Theorem 4. 
(i) As in the proof of Theorem 2 (i), $\tilde{\mu}_{n,\lambda}(t) = (K_0\bar{Y}_n)(t) + (1+\lambda)^{-1}(K_{11}\bar{Y}_n)(t)$, and we have
$$\hat{\mu}_{n,\lambda}(t) - \tilde{\mu}_{n,\lambda}(t) = (1+\lambda)^{-1}\big( [K_1 - K_{11}]\bar{Y}_n \big)(t) = (1-\lambda)\big( [K_1 - K_{11}]\bar{Y}_n \big)(t) + O(\lambda^2) = (1-\lambda)(K_{12}\bar{Y}_n)(t) + O(\lambda^2).$$
Note that under $H_0$, $E\{(K_{12}\bar{Y}_n)(\cdot)\} = ([K_1 - K_{11}]\mu_0)(\cdot) = 0$. So, under $H_0$,
$$n^{1/2}\big( \hat{\mu}_{n,\lambda}(t) - \tilde{\mu}_{n,\lambda}(t) \big) = n^{1/2}(1-\lambda)\big( K_{12}[\bar{Y}_n - \mu_0] \big)(t) + O(\lambda^2) = (1-\lambda)\,n^{1/2}\big( K_{12}[\bar{Y}_n - \mu_0] \big)(t) + o(1) \stackrel{D}{\to} N(0, \tau^2(t)),$$
where $\tau^2(t) = E\big\{ \big( K_{12}[Y - \mu_0] \big)(t) \big\}^2$.
(ii) The proof is similar to that of Theorem 2 (ii). □

References

  1. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005.
  2. Clarkson, D.B.; Fraley, C.; Gu, C.; Ramsay, J.O. S+ Functional Data Analysis; Springer: New York, NY, USA, 2005.
  3. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: New York, NY, USA, 2006.
  4. Zhang, C.; Peng, H.; Zhang, J.-T. Two samples tests for functional data. Commun. Stat. Theory Methods 2010, 39, 559–578.
  5. Degras, D. Simultaneous confidence bands for the mean of functional data. WIREs Comput. Stat. 2017, 9, e1397.
  6. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009.
  7. Shi, M.; Weiss, R.E.; Taylor, J.M.G. An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. Appl. Stat. 1996, 45, 151–163.
  8. Rice, J.A.; Wu, C.O. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 2001, 57, 253–259.
  9. Huang, J.Z.; Wu, C.O.; Zhou, L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 2002, 89, 111–128.
  10. Yao, F.; Müller, H.-G.; Wang, J.-L. Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 2005, 100, 577–590.
  11. Staicu, A.-M.; Li, Y.; Ruppert, D.; Crainiceanu, C.M. Likelihood ratio tests for dependent data with applications to longitudinal and functional data analysis. Scand. J. Stat. 2014, 41, 932–949.
  12. Wang, J.-L.; Chiou, J.-M.; Müller, H.-G. Functional data analysis. Annu. Rev. Stat. Appl. 2016, 3, 257–295.
  13. Cai, T.; Yuan, M. Optimal estimation of the mean functions based on discretely sampled functional data: Phase transition. Ann. Stat. 2011, 39, 2330–2355.
  14. Happ, C.; Greven, S. Multivariate functional principal component analysis for data observed on different (dimensional) domains. J. Am. Stat. Assoc. 2018, 113, 649–659.
  15. Flores, M.; Naya, S.; Fernández-Casal, R.; Zaragoza, S.; Rana, P.; Tarrío-Saavedra, J. Constructing a control chart using functional data. Mathematics 2020, 8, 58.
  16. Carroll, C.; Müller, H.G.; Kneip, A. Cross-component registration for multivariate functional data, with application to growth curves. Biometrics 2021, 77, 839–851.
  17. Meléndez, R.; Giraldo, R.; Leiva, V. Sign, Wilcoxon and Mann–Whitney tests for functional data: An approach based on random projections. Mathematics 2021, 9, 44.
  18. Ran, M.; Yang, Y. Optimal estimation of large functional and longitudinal data by using functional linear mixed model. Mathematics 2022, 10, 4322.
  19. Yuan, M.; Cai, T. A reproducing kernel Hilbert space approach to functional linear regression. Ann. Stat. 2010, 38, 3412–3444.
  20. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
  21. Li, Y.; Liu, Y.; Zhu, J. Quantile regression in reproducing kernel Hilbert spaces. J. Am. Stat. Assoc. 2007, 102, 255–268.
  22. Hazewinkel, M. Spline interpolation. In Encyclopedia of Mathematics; Springer: New York, NY, USA, 2001.
  23. Berlinet, A.; Thomas-Agnan, C. Reproducing Kernel Hilbert Spaces in Probability and Statistics; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2004.
  24. Wahba, G. A survey of some smoothing problems and the method of generalized cross-validation for solving them. In Applications of Statistics; Krishnaiah, P.R., Ed.; North-Holland: Amsterdam, The Netherlands, 1977; pp. 507–523.
  25. Craven, P.; Wahba, G. Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 1979, 31, 377–403.
  26. Gu, C. Smoothing Spline ANOVA Models; Springer: New York, NY, USA, 2002.
  27. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012.
  28. Stute, W. Nonparametric model checks for regression. Ann. Stat. 1997, 25, 613–641.
  29. Wu, C.O.; Tian, X. Nonparametric estimation of conditional distributions and rank-tracking probabilities with time-varying transformation models in longitudinal studies. J. Am. Stat. Assoc. 2013, 108, 971–982.
  30. Yuan, A.; Fang, H.-B.; Wu, C.O.; Tan, M.T. Hypothesis testing for multiple mean and correlation curves with functional data. Stat. Sin. 2019, in press.
  31. van der Vaart, A.; Wellner, J. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996.
Figure 1. Raw data in the NGHS study.
Figure 2. Solid line: true curve; dotted middle blue line: estimated curve; dotted lower and upper blue lines: 95% confidence bands. The first, second and third rows are for sample sizes of 50, 100 and 200, respectively. In each row, the left panel is the proposed method, the middle panel is the local linear smoother and the right panel is the spline estimate.
Figure 3. The estimated mean curves of girls’ BMI.
Table 1. Power/type I error of T_n based on 10,000 simulated replications.

Tuning Parameter (C)   Δ       T_n      Power
0                      0       0.070    0.044
0.5                    0.179   7.111    0.120
0.7                    0.352   14.690   0.293
0.8                    0.460   18.446   0.504
0.9                    0.582   23.548   0.872
1.0                    0.718   30.616   0.999