Article

Representation Theorem and Functional CLT for RKHS-Based Function-on-Function Regressions

1
College of Mathematics and Statistics, Guangxi Normal University, Guilin 541004, China
2
Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, DC 20057, USA
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(14), 2507; https://doi.org/10.3390/math10142507
Submission received: 20 June 2022 / Revised: 14 July 2022 / Accepted: 15 July 2022 / Published: 19 July 2022
(This article belongs to the Special Issue New Advances in High-Dimensional and Non-asymptotic Statistics)

Abstract

We investigate a nonparametric, varying-coefficient regression approach for modeling and estimating the regression effects between two functionally correlated datasets. Modern biomedical technology makes it possible to measure multiple patient features over a time interval, or intermittently at several discrete time points, to reveal underlying biological mechanisms; statistical models that do not properly incorporate interventions and their dynamic responses may therefore lead to biased estimates of the intervention effects. We propose a shared parameter change point function-on-function regression model to evaluate the pre- and post-intervention time trends and develop a likelihood-based method for estimating the intervention effects and other parameters. We also propose new methods for estimation and hypothesis testing of regression parameters for functional data via the reproducing kernel Hilbert space. The estimators of the regression parameters have a closed form that does not require the inverse of a large matrix, and hence are less computationally demanding and more widely applicable. By establishing a representation theorem and a functional central limit theorem, the asymptotic properties of the proposed estimators are obtained, and the corresponding hypothesis tests are proposed. The application and statistical properties of our method are demonstrated through an immunotherapy clinical trial of advanced myeloma and simulation studies.

1. Introduction

Modern biomedical technology has made it possible to measure multiple patient features during a time interval or intermittently at several discrete time points to reveal underlying biological mechanisms. Functional data also arise in genetic studies: a massive amount of gene expression data is recorded for each subject and can be treated as a functional curve [1]. Functional data analysis provides distinct features related to the dynamics of cellular responses and activity and other biological processes. Existing methods, such as projection, dimension reduction, and functional linear regression analysis, are not well adapted to such data. Overviews can be found in the book by Horváth and Kokoszka [2] and in some recently published papers such as Yuan et al. [3] and Lai et al. [4].
Ramsay and Silverman [5], Clarkson et al. [6], and Ferraty and Vieu [7] introduced some basic tools and widely accepted methods for functional data analysis; Horváth and Kokoszka [2] established some fundamental methods for estimation and hypothesis testing on mean functions and covariance operators of functional data. The topics are broad and the results are in depth. Conventionally, each data curve is assumed to be observed over a dense set of points, often over thousands of points, then smoothing techniques are used to produce continuous curves, and these curves are treated as completely observed functional data for statistical inference. In contrast with those assumptions, we consider the more practical issues in which the data curves are only observed at some (not dense) time points, and the observed data curves are actually interpolations at those observed points. Of course, a relatively large sample size is needed for sparse observations. The effects of both number of observation points and sample size are also considered in our analysis.
For analyzing longitudinal data, Zeger and Diggle [8] considered a semiparametric regression model for longitudinal observations of the form
$$Y(t) = X(t)^{\top}\beta + \theta(t) + \epsilon(t), \quad t \in T, \tag{1}$$
where $Y(t)$ is the response variable, $X(t)$ is the $p \times 1$ covariate vector at time $t$, $\beta$ is a $p \times 1$ constant vector of unknown regression coefficients, $\theta(t)$ is an unspecified baseline function, $\epsilon(t)$ is a zero-mean stochastic process, and $T$ represents the observation interval. Under this model, Lin and Ying [9] estimated $\beta$ via a weighted least squares estimator based on the theory of counting processes; Fan and Li [10] further studied this model using a weighted difference-based estimator and a weighted local linear estimator followed by statistical inference, as discussed in Xue and Zhu [11].
For functional data analysis, the data are often represented by $(y_i, x_i(\cdot))$ $(i = 1, \ldots, n)$, and the model is [12,13,14]
$$y_i = \int_T \beta(t)\, x_i(t)\, dt + \epsilon_i. \tag{2}$$
Some researchers considered the following model [2,5,15]
$$y_i(t) = \int_T \beta(s,t)\, x_i(s)\, ds + \epsilon_i(t). \tag{3}$$
To estimate $\beta(\cdot,\cdot)$, assume there are bases $\{\xi_k\}$ and $\{\eta_k\}$ that span the spaces of the $\{x_i(\cdot)\}$ and $\{y_i(\cdot)\}$, respectively. The estimate of $\beta(\cdot,\cdot)$ is then of the form
$$\hat\beta(s,t) = \sum_{i=1}^{k}\sum_{j=1}^{r} b_{ij}\, \xi_i(s)\, \eta_j(t),$$
and $b_{ij}$ is estimated by minimizing the residual sum of squares $\sum_{i=1}^{n} \| y_i - \int \hat\beta(s,t)\, x_i(s)\, ds \|^2$. Although the resulting estimator is useful, a representation theorem for such an estimator is hard to obtain, and hence the asymptotic distribution of this approach is not clear. Yao et al. [15] investigated a functional principal component method for estimation of model (3) and obtained consistency results. Müller and Yao [16] studied a variation of the above model in the conditional expectation format.
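As a small illustration of the tensor-product basis expansion above, the following Python sketch evaluates $\hat\beta(s,t)$ for given coefficients. The polynomial bases and the coefficient values are hypothetical and chosen only so that the result can be checked by hand; in practice the $b_{ij}$ would come from the least squares fit.

```python
def beta_hat(s, t, b, xi, eta):
    """Evaluate sum_i sum_j b[i][j] * xi[i](s) * eta[j](t)."""
    return sum(
        b[i][j] * xi[i](s) * eta[j](t)
        for i in range(len(xi))
        for j in range(len(eta))
    )

# Hypothetical polynomial bases, used only for illustration.
xi = [lambda s: 1.0, lambda s: s]
eta = [lambda t: 1.0, lambda t: t]

# Hypothetical coefficients b_ij (not estimated from data here).
b = [[1.0, 0.0],
     [0.0, 2.0]]

# With these choices beta_hat(s, t) = 1 + 2 * s * t.
value = beta_hat(0.5, 0.25, b, xi, eta)
```

With these choices the surface reduces to $1 + 2st$, which makes it easy to confirm that the double sum is evaluated as intended.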
The smoothing spline method is popular for curve estimation. The function curves can be estimated at any point, followed by the computation of coefficients. However, the asymptotic properties of estimators based on the spline method are difficult to handle. For natural polynomial splines, the number of knots is the number of untied observations, which is sometimes redundant and undesirable. B-splines only require a few (the degree of the polynomial plus two) basis functions and are easy to implement [17,18,19]. Another method is the local linear fit [20,21,22], but the difficulty lies in choosing the bandwidth, especially when the observation points are uneven. Therefore, in this paper we employ the reproducing kernel Hilbert space (RKHS), a special form of spline method that turns curve estimation into point estimation. Yuan and Cai [12] explored its application to the functional linear regression problem, and Lei and Zhang [23] extended it to RKHS-based partially functional linear models. In general, one needs to choose a set of (orthogonal) basis functions and the number of basis functions for functional estimation, while with RKHS one only needs to determine the kernel(s) of the RKHS. Furthermore, the Riesz representation theorem shows that any bounded linear functional can be reproduced as a representer based on the RKHS kernel with a closed form.
However, existing RKHS methods often meet obstacles when choosing different norms and the corresponding optimization procedures. Although using a carefully selected norm in the optimization criterion has the advantage of interpretability, the resulting regression estimator generally requires the inversion of a large matrix (of the same order as the sample size). Moreover, most of the existing methods, including the aforementioned RKHS methods, are designed for the case where the observed data are sampled at a dense rate and are limited to models in which either the response or the predictors are functions. New methods for estimation and hypothesis testing of regression parameters are needed for the more general case where both the response and the predictors are functions with sparsely observed data. To address these problems, we propose a new RKHS method with a unified norm to characterize the RKHS and the optimization criterion for function-on-function regression. Although the statistical interpretation of this optimization criterion is not fully clear, the estimated regressors have a simple closed form under a general function-on-function regression model, so the optimization is computationally reliable and applicable without computing the inverse of a massive matrix. By establishing a representation theorem and a functional central limit theorem based on the proposed model, we obtain the asymptotic distribution of the estimators. Hypothesis testing of the underlying curves is proposed accordingly.
The remainder of this paper is organized as follows. Section 2 describes the proposed method for the estimation and hypothesis testing of regression parameters for functional data via the reproducing kernel Hilbert space and establishes some theoretical properties. Simulation studies and a real-data example demonstrating the effectiveness of the proposed method are given in Section 3 and Section 4, respectively. Section 5 gives some concluding remarks, and all technical proofs are given in Appendix A.

2. The Proposed Method

We consider the observed data $(y_i(t_{ij}), x_i(t_{ij}))$, $j = 1, \ldots, m_i$; $i = 1, \ldots, n$. The underlying data curves $(y_i(\cdot), x_i(\cdot))$ are iid copies of $(y(\cdot), x(\cdot))$, where $y(\cdot)$ and $x(\cdot) = (x_1(\cdot), \ldots, x_d(\cdot))^{\top}$ are random curves on some region $T$. The observation times $t_{ij} \in (0, T]$ are generally assumed to be different for each subject $i$, for some $0 < T < \infty$. We assume that the numbers of time points $m_i$ $(i = 1, \ldots, n)$ are iid copies of some integer-valued random variable $m$ and that, given $m_i$, the time points $t_{ij}$ $(j = 1, \ldots, m_i)$ are iid copies of a positive random variable $G$ with support on $(0, T]$. For each individual, the observed data $(y_i, x_i)$ can be interpolated as curves $(\hat y_i, \hat x_i)$ on $T$. We assume the following model for the observed data
$$y_i(t) = \beta(t)^{\top} x_i(t) + \epsilon_i(t), \quad E[\epsilon_i(t)] = 0, \quad (i = 1, \ldots, n), \tag{4}$$
where $\beta(\cdot) = (\beta_1(\cdot), \ldots, \beta_d(\cdot))^{\top}$ are the true regression coefficient functions for the covariates $x_i(\cdot)$, and the $\epsilon_i(\cdot)$ are random errors. In general, $\epsilon_i(s)$ and $\epsilon_i(t)$ are not independent for $s \neq t$; e.g., $\epsilon_i(\cdot)$ may be a zero-mean Gaussian process with some covariance function $\Gamma(s,t)$, known or unknown. Note that model (4) is more general than (2) and is more straightforward than model (3) in describing the relationship between the responses $y_i(\cdot)$ and the covariates $x_i(\cdot)$. Typically, we set $x_1(\cdot) \equiv 1$, so that $\beta_1(\cdot)$ is the baseline function. Since $t_{ij}$ and $t_{kj}$ may be different even for the same $j$, there may be no observation or just a few observations at each time point $t$.
To estimate the regression coefficient function β ( · ) , the simplest way is the point-wise least squares estimate or any other non-smoothing (i.e., without roughness penalty) functional estimates. However, those estimates have some undesirable properties, often with wiggly shape and large variances in the area with sparse observations. An established performance measure for functional estimation is the mean square error (MSE),
$$\mathrm{MSE} = \mathrm{Bias}^2 + \text{Sampling variance}.$$
Non-smoothed estimates often have small bias but large sampling variance, while smoothed estimates are the other way around, with much smoother shape by adjusting the shape from neighboring data, but with larger bias. To better balance the trade-off between bias and sampling variance and optimize the MSE, a regularized smooth estimate is preferred, in which a smoothing parameter could control the degree of penalty.
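The bias-variance trade-off described above can be illustrated with a toy Monte Carlo experiment. The sketch below is not the functional estimator of this paper; it uses a hypothetical scalar example (a sample mean shrunk toward zero) only to show that the empirical MSE splits into squared bias plus sampling variance, and that shrinkage trades extra bias for lower variance.

```python
import random
import statistics

random.seed(0)
theta = 2.0          # true value being estimated (hypothetical)
n, reps = 10, 5000   # sample size and number of Monte Carlo replicates

def shrunken_mean(sample, c):
    """Sample mean shrunk toward zero by a factor c in (0, 1]."""
    return c * statistics.fmean(sample)

def mse_parts(c):
    """Return (bias, variance, MSE) of the shrunken mean over the replicates."""
    est = [
        shrunken_mean([random.gauss(theta, 1.0) for _ in range(n)], c)
        for _ in range(reps)
    ]
    bias = statistics.fmean(est) - theta
    var = statistics.pvariance(est)
    mse = statistics.fmean([(e - theta) ** 2 for e in est])
    return bias, var, mse

bias_raw, var_raw, mse_raw = mse_parts(1.0)  # no shrinkage: small bias
bias_shr, var_shr, mse_shr = mse_parts(0.8)  # shrinkage: more bias, less variance
```

The decomposition holds exactly for the empirical moments computed here; which estimator has the smaller MSE depends on the true value and the noise level.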
Existing smoothing methods each have weaknesses. Functional principal component analysis [15] is computationally intensive. General spline and kernel smoothing methods [24] do not fit the problem under study because they use a constant bandwidth. It is known that for non-smoothing methods the computational complexity is often of the order $O(n)$, where $n$ is the sample size, while for smoothing methods the amount of computation may substantially exceed $O(n)$ and even become prohibitive. Thus, for smoothing methods, it is important to find a method with an $O(n)$ computational load. To achieve this with spline methods, the basis should have only local support (i.e., be nonzero only locally). A method that has recently become popular in functional estimation uses the reproducing kernel Hilbert space (RKHS). RKHS is a special spline method that has this property and can achieve $O(n)$ computation for many functional estimation problems [5,12].
For functional estimation with RKHS, we define two norms (inner products) on the same RKHS $\mathcal{H}$: one, denoted by $\langle \cdot, \cdot \rangle$, defines the objective optimization criterion, and the other, denoted by $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, is for the RKHS $\mathcal{H}$. Different from a general Hilbert space, in an RKHS $\mathcal{H}$ of functions on $T$, the point evaluation functional $\rho_t(h) = h(t)$ $(h \in \mathcal{H})$ is a continuous linear map, so that by the Riesz representation theorem, there is a bivariate function $k(\cdot,\cdot)$ on $T$ such that
$$\rho_t(h) = h(t) = \langle h(\cdot), k(\cdot,t) \rangle_{\mathcal{H}}, \quad \forall\, h \in \mathcal{H}.$$
Taking $h(\cdot) = k(\cdot, s)$, we also get
$$k(t,s) = \langle k(\cdot,s), k(\cdot,t) \rangle_{\mathcal{H}}.$$
The above two properties yield the name RKHS.
Note that for a given Hilbert space $\mathcal{H}$, a collection of functions on some domain $T$ with a given inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, its reproducing kernel $K$ may not be unique. In fact, for any mapping $G : T^2 \to \ell^2(T^2)$, $K(s,t) = \langle G(s,\cdot), G(t,\cdot) \rangle_{\mathcal{H}}$ is a reproducing kernel for $\mathcal{H}$, and any reproducing kernel of $\mathcal{H}$ can be expressed in this form (Berlinet and Thomas-Agnan, 2004), and it has a one-to-one correspondence with a covariance function on $\mathcal{H}^2$. The choice of a kernel is mainly for convenience. However, a reproducing kernel under one inner product may not be a reproducing kernel under another inner product on the same space $\mathcal{H}$. Assume $\beta \in \mathcal{H}^d$, with $\mathcal{H}$ being some RKHS with a known kernel $K(\cdot,\cdot)$, both to be specified later. Let $\langle \cdot, \cdot \rangle$ be another inner product on $\mathcal{H}$ (typically $\langle f, g \rangle = \int_T f(t) g(t)\, dt$ and $\|h\|^2 = \langle h, h \rangle$ for all $h \in \mathcal{H}$). With the observed curves $(\hat y_i, \hat x_i)$ $(i = 1, \ldots, n)$, ideally an optimization procedure for estimating $\beta(\cdot)$ in (4) will be of the form
$$\hat\beta_{n,\lambda}(\cdot) = \arg\inf_{\beta \in \mathcal{H}^d} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \| \hat y_i - \beta^{\top} \hat x_i \|^2 + \lambda J(\beta) \Big\},$$
where $J(\cdot)$ is a penalty functional, and $\lambda > 0$ is the smoothing parameter. The penalty term $J(\cdot)$ can be significantly simplified via the RKHS, as shown in the proof of Theorem 1 below. If $\lambda = 0$, the above procedure gives the unsmoothed estimate with some undesirable properties such as overfitting and large variance.
For model (2) with one covariate, Yuan and Cai [12] considered a penalized estimate $\hat\beta$ of $\beta(\cdot)$. The corresponding estimator $\hat\beta(\cdot)$ has a closed form that is linear in the $x_i(\cdot)$, but its computation involves the inverse of a matrix of order $n + 2$. For model (1) with $d$ covariates, we first consider an estimator of $\beta(\cdot)$ that is linear in the $x_i(\cdot)$. It turns out that this estimator has a closed form but also involves the inverse of a matrix of order $d(n + 2)$, which is computationally infeasible in general.
Consider an estimator of $\beta$ in the form of a linear combination of $(\hat x_1(\cdot), \ldots, \hat x_n(\cdot))$. For any $f \in \mathcal{H}$, denote $(K_0 f)(t) = \langle K_0(t,\cdot), f(\cdot) \rangle_{\mathcal{H}}$, and for any $f = (f_1, \ldots, f_d)^{\top} \in \mathcal{H}^d$, denote $(K_0 f)(t) = ((K_0 f_1)(t), \ldots, (K_0 f_d)(t))^{\top}$, and similarly for $K_1 f$. For a $d \times n$ matrix $B$ and an $n \times d$ matrix $Z$, let $b_1, \ldots, b_d$ be the $d$ rows of $B$ and $z_1, \ldots, z_d$ be the $d$ columns of $Z$, and define $B Z = (b_1 z_1, \ldots, b_d z_d)^{\top}$, a $d$-dimensional column vector. Since $\hat x_i = K_0 \hat x_i + K_1 \hat x_i$, $K_0 \hat x_i \in \mathcal{H}_0^d$, and $\mathcal{H}_0$ has a basis $g = (g_1(\cdot), \ldots, g_k(\cdot))^{\top}$, we consider estimates $\hat\beta(\cdot)$ of $\beta_0(\cdot)$ of the form $A g + B Z_n$, where $A$ is a $d \times k$ matrix, $B$ is a $d \times n$ matrix, and $Z_n(\cdot) = (K_1 \hat x_1(\cdot), \ldots, K_1 \hat x_n(\cdot))^{\top}$ is $n \times d$. With $\|h\|^2 = \int_T h^2(t)\, dt$, for fixed $\lambda$ an RKHS estimator of $\beta_0(\cdot)$ is of the form
$$\hat\beta_{n,\lambda}(\cdot) = \hat A g + \hat B Z_n(\cdot),$$
where
$$(\hat A, \hat B) = \arg\inf_{(A,B)} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \| \hat y_i - \hat x_i^{\top} (A g + B Z_n) \|^2 + \lambda J(A g + B Z_n) \Big\}.$$
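For the penalty, let $D$ be a pre-specified $d \times d$ symmetric positive definite constant matrix; we define
$$J(h) = \langle h^{\top} (D^{1/2})^{\top}, D^{1/2} h \rangle_{\mathcal{H}} = \langle h^{\top} D, h \rangle_{\mathcal{H}} := \|h\|_{\mathcal{H}}^2, \quad h \in \mathcal{H}^d,$$
and
$$\mathcal{H}_0^d = \{ h \in \mathcal{H}^d : J(h) = 0 \} = \{ h \in \mathcal{H}^d : \|h\|_{\mathcal{H}}^2 = 0 \} \subset \mathcal{H}^d$$
as the null space for the penalty, and $\mathcal{H}_1^d$ is its orthogonal complement (with respect to the inner product $\langle \cdot^{\top} D, \cdot \rangle_{\mathcal{H}}$). Then, $\mathcal{H}^d = \mathcal{H}_0^d \oplus \mathcal{H}_1^d$. That is, every $h \in \mathcal{H}^d$ has the decomposition $h = h_0 + h_1$, with $h_0 \in \mathcal{H}_0^d$ and $h_1 \in \mathcal{H}_1^d$. Here, $\mathcal{H}_1$ is also an RKHS with some reproducing kernel $K_1(\cdot,\cdot)$ on $\mathcal{H}_1$. With RKHS, $K_0 h \in \mathcal{H}_0^d$ for all $h \in \mathcal{H}^d$, which implies that $\langle (K_0 h)^{\top} D, K_0 h \rangle_{\mathcal{H}} = 0$. Further, $K_1 h \in \mathcal{H}_1^d$ for all $h \in \mathcal{H}^d$, and $\langle (K_0 h)^{\top} D, K_1 h \rangle_{\mathcal{H}} = 0$. Thus
$$\begin{aligned} J(h) &= \langle h^{\top} D, h \rangle_{\mathcal{H}} = \langle (K_0 h + K_1 h)^{\top} D, K_0 h + K_1 h \rangle_{\mathcal{H}} \\ &= \langle (K_0 h)^{\top} D, K_0 h \rangle_{\mathcal{H}} + 2 \langle (K_0 h)^{\top} D, K_1 h \rangle_{\mathcal{H}} + \langle (K_1 h)^{\top} D, K_1 h \rangle_{\mathcal{H}} \\ &= \langle (K_1 h)^{\top} D, K_1 h \rangle_{\mathcal{H}}. \end{aligned}$$
Typically, $D$ is chosen to be the $d \times d$ identity matrix. The choices of $K_0$, $K_1$, and the inner product $\langle \cdot^{\top} D, \cdot \rangle_{\mathcal{H}}$ will be addressed later.
For a function $a(\cdot)$ and a vector of functions $b(\cdot) = (b_1(\cdot), \ldots, b_k(\cdot))^{\top}$, denote $\langle a, b \rangle = (\langle a, b_1 \rangle, \ldots, \langle a, b_k \rangle)^{\top}$; for a matrix $B(\cdot) = (b_{ij}(\cdot))_{d \times k}$, denote $\langle a, B \rangle = (\langle a, b_{ij} \rangle)_{d \times k}$, and similarly for the notations $\langle a, b \rangle_{\mathcal{H}}$ and $\langle a, B \rangle_{\mathcal{H}}$. The following representation theorem shows that the estimator given in (5) is computationally feasible for many applications.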
Theorem 1.
Assume $\beta_0(\cdot) \in \mathcal{H}^d$ and $(\hat y_i(\cdot), \hat x_i(\cdot)) \in \mathcal{H}^{d+1}$ for $i = 1, \ldots, n$. Then for the given penalty functional $J(\beta) = \|K_1(\beta)\|_{\mathcal{H}}^2$ and fixed $\lambda$, there are constant matrices $\hat A = (a_{ij})_{d \times k}$ and $\hat B = (b_{ij})_{d \times n}$ such that $\hat\beta_{n,\lambda}$ given in (5) has the following representation
$$\hat\beta_{n,\lambda}(t) = \hat A g(t) + \hat B (K_1 \hat x)_n(t), \quad t \in (0, T],$$
where $(K_1 \hat x)_n(\cdot) = (K_1 \hat x_1(\cdot), \ldots, K_1 \hat x_n(\cdot))^{\top}$, and in vector form $(\hat a, \hat b)$ of $(\hat A, \hat B)$,
$$\begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} O & R \\ R^{\top} & S + \lambda W \end{pmatrix}^{-1} \begin{pmatrix} u \\ v \end{pmatrix},$$
where the matrices $R$ ($dk \times dn$), $O$ ($dk \times dk$), $S$ ($dn \times dn$), and $W$ ($dn \times dn$), and the vectors $u$ and $v$ are given in the proof.
For the ordinary regression model $y = \beta^{\top} x + \epsilon$, with $X_n = (x_1, \ldots, x_n)^{\top}$ and $y_n = (y_1, \ldots, y_n)^{\top}$, the least squares method yields the estimate of $\beta$ as $\hat\beta = (X_n^{\top} X_n)^{-1} X_n^{\top} y_n$. Since $(X_n^{\top} X_n)^{-1}$ is of order $n^{-1}$ (a.s.), $\hat\beta$ can be viewed as approximately a linear form $n^{-1} X_n^{\top} y_n$. Let $\hat X_n(\cdot) = (\hat x_1(\cdot), \ldots, \hat x_n(\cdot))^{\top}$ and $\hat y_n(\cdot) = (\hat y_1(\cdot), \ldots, \hat y_n(\cdot))^{\top}$. Now we consider estimates $\hat\beta(\cdot)$ of $\beta_0(\cdot)$ with the linear form $n^{-1} \hat X_n^{\top} \hat y_n$. Since $n^{-1} \hat X_n^{\top} \hat y_n = K_0(n^{-1} \hat X_n^{\top} \hat y_n) + K_1(n^{-1} \hat X_n^{\top} \hat y_n)$, and $K_0(n^{-1} \hat X_n^{\top} \hat y_n) \in \mathcal{H}_0^d$, we only need to consider an estimate of the form $A g + B \hat z_n$, where $A$ is a $d \times k$ parameter matrix, $B$ is a $d \times d$ parameter matrix, and $\hat z_n(\cdot) = n^{-1} [K_1(\hat X_n^{\top} \hat y_n)](\cdot)$ is a $d$-vector. This allows us to express the estimate via the basis of the RKHS and with a greater degree of flexibility than the linear combination $n^{-1} \hat X_n^{\top} \hat y_n$. Another advantage of using estimates of the form $A g + B \hat z_n$ is the convenience of hypothesis testing: since typically $g = (1, t)^{\top}$, testing the hypothesis of linearity of $\beta(\cdot)$ is equivalent to testing $B = 0$.
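For any function $h(\cdot)$, we set $\|h\|^2 = \int_T h^2(t)\, dt$, and for fixed $\lambda$,
$$\hat\beta_{n,\lambda}(\cdot) = \hat A g(\cdot) + \hat B \hat z_n(\cdot),$$
where
$$(\hat A, \hat B) = \arg\inf_{(A,B)} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \| \hat y_i - (A g + B \hat z_n)^{\top} \hat x_i \|^2 + \lambda J(A g + B \hat z_n) \Big\}.$$
Let $a = (a_{11}, \ldots, a_{1k}, \ldots, a_{d1}, \ldots, a_{dk})^{\top}$ be the vector representation of $A$; $b = (b_{11}, \ldots, b_{1d}, \ldots, b_{d1}, \ldots, b_{dd})^{\top}$ be that of $B$; $O = O_{dk \times dk} = n^{-1} \sum_{i=1}^{n} \langle s_i, s_i \rangle$ with $s_i = (\hat x_{i1} g_1, \ldots, \hat x_{i1} g_k, \ldots, \hat x_{id} g_1, \ldots, \hat x_{id} g_k)^{\top}$; $R^{\top} = P = P_{d^2 \times dk} = n^{-1} \sum_{i=1}^{n} \langle t_i, s_i \rangle$ with $t_i = (\hat x_{i1} \hat z_1, \ldots, \hat x_{i1} \hat z_d, \ldots, \hat x_{id} \hat z_1, \ldots, \hat x_{id} \hat z_d)^{\top}$; $S = S_{d^2 \times d^2} = n^{-1} \sum_{i=1}^{n} \langle t_i, t_i \rangle$; $U = n^{-1} \sum_{i=1}^{n} \langle \hat y_i, \hat x_i g^{\top} \rangle := (u_{ij})_{d \times k}$ with vector form $u = (u_{11}, \ldots, u_{1k}, \ldots, u_{d1}, \ldots, u_{dk})^{\top}$; $V = n^{-1} \sum_{i=1}^{n} \langle \hat y_i, \hat x_i \hat z_n^{\top} \rangle := (v_{ij})_{d \times d}$ with vector form $v = (v_{11}, \ldots, v_{1d}, \ldots, v_{d1}, \ldots, v_{dd})^{\top}$; $\lambda_1 \ge \cdots \ge \lambda_d > 0$ be all the eigenvalues of $D$, and $q_1, \ldots, q_d$ be its normalized eigenvectors; $W = W_{d^2 \times d^2} = n^{-1} \sum_{j=1}^{n} \langle c_j, c_j \rangle_{\mathcal{H}}$ and $c_j = \sqrt{\lambda_j}\, (q_{j1} \hat z_1, \ldots, q_{j1} \hat z_d, \ldots, q_{jd} \hat z_1, \ldots, q_{jd} \hat z_d)^{\top}$.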
Theorem 2.
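Assume $\beta(\cdot) \in \mathcal{H}^d$ and $(\hat y_i(\cdot), \hat x_i(\cdot)) \in \mathcal{H}^{d+1}$ for $i = 1, \ldots, n$. Then for the given penalty functional $J(\beta) = \|K_1(\beta)\|_{\mathcal{H}}^2$ and fixed $\lambda$, there are constant matrices $\hat A = (a_{ij})_{d \times k}$ and $\hat B = (b_{ij})_{d \times d}$ such that $\hat\beta_{n,\lambda}(\cdot)$ given in (6) has the following representation
$$\hat\beta_{n,\lambda}(t) = \hat A g(t) + \hat B \big( K_1 [n^{-1} \hat X_n^{\top} \hat y_n] \big)(t), \quad t \in (0, T],$$
and in vector form $(\hat a, \hat b)$ of $(\hat A, \hat B)$, when the following inverse exists,
$$\begin{pmatrix} \hat a \\ \hat b \end{pmatrix} = \begin{pmatrix} O & R \\ R^{\top} & S + \lambda W \end{pmatrix}^{-1} \begin{pmatrix} u \\ v \end{pmatrix}.$$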
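To make the closed-form solution concrete, the following Python sketch assembles a block matrix of the form $\begin{pmatrix} O & R \\ R^{\top} & S + \lambda W \end{pmatrix}$ and solves the linear system for the stacked coefficient vector. All numerical values are hypothetical placeholders (the actual blocks are inner products of the interpolated curves, as defined in the text), and the small Gaussian elimination routine merely stands in for any standard linear solver.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, M[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def block_system(O, R, S, W, lam):
    """Assemble [[O, R], [R^T, S + lam * W]] from its blocks."""
    p, q = len(O), len(S)
    top = [O[i] + R[i] for i in range(p)]
    bottom = [
        [R[i][j] for i in range(p)] + [S[j][l] + lam * W[j][l] for l in range(q)]
        for j in range(q)
    ]
    return top + bottom

# Hypothetical 2 x 2 blocks and right-hand sides, for illustration only.
O = [[2.0, 0.5], [0.5, 1.0]]
R = [[0.3, 0.1], [0.2, 0.4]]
S = [[1.5, 0.2], [0.2, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
u, v = [1.0, 0.5], [0.8, 0.3]

def estimate(lam):
    """Return the (a, b) parts of the solution for smoothing parameter lam."""
    sol = solve(block_system(O, R, S, W, lam), u + v)
    return sol[:len(u)], sol[len(u):]

a_small, b_small = estimate(0.01)
a_large, b_large = estimate(100.0)
```

In this toy example, a large $\lambda$ shrinks the $b$ part of the solution toward zero, which mirrors the role of the penalty: only the component outside the null space $\mathcal{H}_0$ is penalized. The system being solved has dimension $dk + d^2$, which does not grow with the sample size.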
Below we study the asymptotic behavior of $\hat\beta_{n,\lambda}(\cdot)$ given in (6). Denote by $\beta_0(\cdot)$ the true value of $\beta(\cdot)$, and by $|M|$ the determinant of a square matrix $M$. Lai et al. [25] proved strong consistency of the least squares estimate under general conditions, while Eicker [26] studied its asymptotic normality. The proposed estimators in this paper have some similarity to the least squares estimate, but they also have some different features and require different conditions.
(C1).
$\beta_0 \in \mathrm{Span}(E[xy])$.
(C2).
$\inf_{t \in T} |E[x(t) x^{\top}(t)]| > 0$.
(C3).
$E \| y - (A g + B Z)^{\top} x \|^2 < \infty$ for all bounded $(A, B)$, where $Z = E[K_1(xy)]$.
(C4).
$\lim_{n \to \infty} \max_{1 \le i \le n} \| (\hat y_i, \hat x_i) - (y_i, x_i) \| = 0$ (a.s.).
(C5).
$\lambda = \lambda_n \to 0$.
Theorem 3.
Assume conditions (C1)–(C5) hold. Then as $n \to \infty$,
$$\| \hat\beta_{n,\lambda} - \beta_0 \| \to 0 \quad (a.s.).$$
To emphasize the dependence on $n$, we denote $\lambda = \lambda_n$. Let $\ell^{\infty}(T)$ be the space of bounded functions on $T$ equipped with the supremum norm, and let $\xrightarrow{D}$ stand for weak convergence in the space $\ell^{\infty}(T)$. With the following condition (C6), we obtain the asymptotic normality of $\hat\beta_{n,\lambda}(\cdot)$.
(C6).
$\sqrt{n}\, \lambda_n \to 0$.
Theorem 4.
Assume conditions (C1)–(C4) and (C6) hold. Then as $n \to \infty$,
$$W_n := \sqrt{n}\, \big( \hat\beta_{n,\lambda} - \beta_0 - o_p(1) \big) \xrightarrow{D} W \quad \text{on } \ell^{\infty}(T),$$
where $W(\cdot)$ is the zero-mean Gaussian process on $T$ with covariance function $\sigma(s,t) = E[W(s) W(t)]$ given in the proof, $s, t \in T$, and $o_p(1)$ is given in the proof.
Test of linearity of $\beta_0$.
It is of interest to test the hypothesis $H_0(J)$: $J \beta_0(t)$ is linear in $t$, where $J$ is a $d$-dimensional vector with entries 0 or 1, with 1 corresponding to the elements of $\beta_0$ to be tested for linearity. The hypothesis $H_0(J)$ is equivalent to testing that the corresponding coefficients $J \hat B$ in $\hat B$ are zero. Let $O_0 = E \langle s_1, s_1 \rangle$, $P_0 = E \langle t_1, s_1 \rangle$, $S_0 = E \langle t_1, t_1 \rangle$, $U_0 = E \langle y_1, x_1 g^{\top} \rangle$, and $V_0 = E \langle y_1, x_1 z_0^{\top} \rangle$. Let $u_0$ and $v_0$ be the vector representations of $U_0$ and $V_0$, and $w_0 = (u_0^{\top}, v_0^{\top})^{\top}$. Denote $T = \begin{pmatrix} O & R \\ P & S \end{pmatrix}$ and $T_0 = \begin{pmatrix} O_0 & R_0 \\ P_0 & S_0 \end{pmatrix}$. By Theorem 4, we have
Corollary 1.
Assume the conditions of Theorem 4 hold. Then under $H_0(J)$, we have
$$\sqrt{n}\, \big( J \hat B - o_p(1) \big) \xrightarrow{D} N(0, \Omega(J)),$$
where $\Omega(J)$ is the sub-matrix of $T_0^{-1} \Gamma T_0^{-1}$ that corresponds to the covariance of $J \hat B$, $o_p(1) = (T - T_0) w_0$, and $\Gamma$ is given in the proof of Theorem 4.
The nonzero bias term o p ( 1 ) in Theorem 4 and Corollary 1 is typical in functional estimation, and often such a bias term is zero for the corresponding Euclidean parameter estimation.
Choice of the smoothing parameter. In nonparametric penalized regression for the model $y(t) = \langle \beta, x \rangle(t) + \epsilon(t)$, the most commonly used method for choosing the smoothing parameter is cross-validation (CV), based on the ideas of Allen (1974) and Stone (1974). This method chooses $\lambda$ by minimizing
$$\frac{1}{n} \sum_{i=1}^{n} \frac{1}{m_i} \sum_{j=1}^{m_i} \big[ y_i(t_{ij}) - \langle \hat\beta_{n,\lambda,-i}, \hat x_i \rangle(t_{ij}) \big]^2,$$
where $\hat\beta_{n,\lambda,-i}(\cdot)$ is the estimated regression function obtained without using the observations of the $i$th individual. This method is usually computationally intensive even when the sample size is moderate. An improved version of the method is $K$-fold cross-validation. This method first randomly partitions the original sample into $K$ subsamples of equal size, and then the cross-validation process is conducted $K$ times. At each replicate, $K - 1$ subsamples are used as the training data to construct the model, while the remaining one is used as the validation data. The results from the $K$ folds are averaged to obtain a single estimate. In notation, let $n_1, \ldots, n_K$ be the sample sizes of the $K$ folds; then the $K$-fold cross-validation method chooses the $\lambda$ that minimizes
$$\frac{1}{K} \sum_{J=1}^{K} \frac{1}{n_J} \sum_{i=1}^{n_J} \frac{1}{m_i} \sum_{j=1}^{m_i} \big[ y_i(t_{ij}) - \langle \hat\beta_{n,\lambda,-J}, \hat x_i \rangle(t_{ij}) \big]^2,$$
where $\hat\beta_{n,\lambda,-J}(\cdot)$ is the estimated regression function obtained without using the data in the $J$th fold. In this paper, we set $K = 5$, which is also the default setting in many software packages.
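The $K$-fold procedure can be sketched in a few lines of Python. To keep the example self-contained, the functional estimator is replaced by a hypothetical scalar stand-in (a ridge-type slope estimate that minimizes the mean squared error plus $\lambda b^2$); the fold construction and the averaging of validation errors follow the description above, while the data, the grid of $\lambda$ values, and all function names are illustrative assumptions.

```python
import random

def ridge_slope(xs, ys, lam):
    """Minimizer of mean (y - b x)^2 + lam * b^2 over the scalar slope b."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    sxx = sum(x * x for x in xs) / n
    return sxy / (sxx + lam)

def k_fold_indices(n, k, rng):
    """Randomly partition 0..n-1 into k folds of (nearly) equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cv_score(xs, ys, lam, folds):
    """Average validation error over the folds for a given lam."""
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in held_out) / len(held_out)
    return total / len(folds)

rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [1.5 * x + rng.gauss(0, 0.5) for x in xs]   # hypothetical data

folds = k_fold_indices(len(xs), 5, rng)          # K = 5 as in the text
grid = [0.0, 0.01, 0.1, 1.0, 10.0]               # hypothetical candidate values
scores = {lam: cv_score(xs, ys, lam, folds) for lam in grid}
best_lam = min(scores, key=scores.get)
```

The same folds are reused for every candidate $\lambda$, so that the scores are compared on identical training and validation splits.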
Choices of $K_0$, $K_1$, and $\langle \cdot, \cdot \rangle_{\mathcal{H}}$. For notational simplicity, we consider $T = [0, 1]$ without loss of generality. Recall that a function $f$ on $[0, 1]$ with $m - 1$ continuous derivatives and $f^{(m)}(\cdot) \in L^2[0, 1]$ has the following Taylor expansion [27]
$$f(t) = \sum_{j=0}^{m-1} \frac{f^{(j)}(0)}{j!}\, t^j + \int_0^1 \frac{f^{(m)}(s)}{(m-1)!}\, (t - s)_+^{m-1}\, ds,$$
where $(x)_+ = x$ if $x > 0$ and $(x)_+ = 0$ otherwise.
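The Taylor expansion with integral remainder can be checked numerically. The sketch below does this for $m = 2$ and the test function $f(t) = e^t$ (chosen for illustration because all of its derivatives are known), approximating the integral with a midpoint rule; the function name and grid size are arbitrary choices.

```python
import math

def taylor_with_remainder(f0_derivs, f_m, m, t, n_grid=4000):
    """Right-hand side of the expansion for m >= 2.

    f0_derivs[j] holds f^(j)(0) for j = 0, ..., m - 1, and f_m is the
    m-th derivative of f as a function on [0, 1].
    """
    poly = sum(f0_derivs[j] / math.factorial(j) * t ** j for j in range(m))
    h = 1.0 / n_grid
    integral = 0.0
    for i in range(n_grid):
        s = (i + 0.5) * h                      # midpoint of the i-th cell
        integral += f_m(s) * max(t - s, 0.0) ** (m - 1) * h
    return poly + integral / math.factorial(m - 1)

t = 0.7
# For f = exp: f(0) = f'(0) = 1 and f'' = exp.
approx = taylor_with_remainder([1.0, 1.0], math.exp, 2, t)
exact = math.exp(t)
```

For $m = 2$ the polynomial part $f(0) + f'(0)\,t$ lies in the null space $\mathcal{H}_0$ of linear functions, and the integral term is the part that the penalty acts on.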
To construct an RKHS $\mathcal{H}$ on $L^2[0, 1]$, a common choice for the inner product on $\mathcal{H}_0 = \{ h : h^{(2)}(\cdot) \equiv 0 \}$ is $\langle f, g \rangle_{\mathcal{H},0}$, and the orthogonal complement of $\mathcal{H}_0$ is $\mathcal{H}_1 = \{ h : h^{(j)}(0) = 0,\ j = 0, 1;\ \int_0^1 [h^{(2)}(t)]^2\, dt < \infty \}$, with inner product $\langle f, g \rangle_{\mathcal{H},1}$, where
$$\langle f, g \rangle_{\mathcal{H},0} = \sum_{j=0}^{1} f^{(j)}(0)\, g^{(j)}(0), \qquad \langle f, g \rangle_{\mathcal{H},1} = \int_0^1 f^{(2)}(t)\, g^{(2)}(t)\, dt.$$
The inner product on $\mathcal{H}$ is $\langle \cdot, \cdot \rangle_{\mathcal{H}} = \langle \cdot, \cdot \rangle_{\mathcal{H},0} + \langle \cdot, \cdot \rangle_{\mathcal{H},1}$. Kernels for the RKHS with more general $K_0$ for $\mathcal{H}_0$ and $K_1$ for $\mathcal{H}_1$ with these inner products can be found in [28]. More general constructions of the kernels $K_0$ and $K_1$ can be found in Ramsay and Silverman [5]. For our case,
$$K_0(s,t) = 1 + st, \qquad K_1(s,t) = \int_0^1 (s - u)_+ (t - u)_+\, du = (s \wedge t)^2\, [3 (s \vee t) - (s \wedge t)] / 6.$$
With the above inner product, $K_0$, and $K_1$, let $K = K_0 + K_1$; then for all $h \in \mathcal{H}$, $h(t) = \langle K(t, \cdot), h(\cdot) \rangle_{\mathcal{H}}$, and $\mathcal{H}_0$ and $\mathcal{H}_1$ are orthogonal to each other with respect to $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, but these properties do not hold if $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ is replaced by a different inner product $\langle \cdot, \cdot \rangle$ on $[0, 1]$.
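The closed form of $K_1$ and the reproducing property can be verified numerically. The Python sketch below compares the closed form with a midpoint-rule approximation of the defining integral, and then checks $h(t) = \langle K(t,\cdot), h(\cdot) \rangle_{\mathcal{H}}$ for the test function $h(u) = u^3$, which is an illustrative choice with $h(0) = h'(0) = 0$. The function names and grid sizes are assumptions made for this example.

```python
def k0(s, t):
    """Kernel of the null space H_0 (constant and linear functions)."""
    return 1.0 + s * t

def k1(s, t):
    """Closed form of K_1(s, t) = integral of (s - u)_+ (t - u)_+ over [0, 1]."""
    lo, hi = min(s, t), max(s, t)
    return lo * lo * (3.0 * hi - lo) / 6.0

def k1_numeric(s, t, n_grid=20000):
    """Midpoint-rule approximation of the integral that defines K_1."""
    h = 1.0 / n_grid
    return sum(
        max(s - (i + 0.5) * h, 0.0) * max(t - (i + 0.5) * h, 0.0)
        for i in range(n_grid)
    ) * h

def reproduce_cubic(t, n_grid=20000):
    """<K(t, .), h>_H for h(u) = u^3.

    Since h(0) = h'(0) = 0, the H_0 part of the inner product vanishes.
    The second derivative of K_1(t, u) in u is (t - u)_+, and K_0(t, u) is
    linear in u, so only the integral of (t - u)_+ * h''(u) remains,
    with h''(u) = 6u.
    """
    h = 1.0 / n_grid
    return sum(
        max(t - (i + 0.5) * h, 0.0) * 6.0 * (i + 0.5) * h
        for i in range(n_grid)
    ) * h

closed = k1(0.3, 0.8)
numeric = k1_numeric(0.3, 0.8)
reproduced = reproduce_cubic(0.6)   # should be close to 0.6 ** 3
```

Both checks rely only on the definitions given above; the second one is the case $m = 2$ of the Taylor expansion applied to $h$.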

3. Simulation Studies

In this section, we conduct two simulation studies to investigate the finite sample performance of the proposed RKHS method. The first simulation study is designed to compare the RKHS estimator with the conventional smoothing spline and local polynomial model methods in terms of curve fitting. For more details on the implementations of the smoothing spline and local polynomial model methods, please refer to the book by Fang, Li, and Sudjianto [24]. The second simulation study examines the performance of Corollary 1 for testing the linearity of the regression functions. It turns out that with moderate sample sizes, the proposed RKHS estimator compares very favorably with its competitors, and the type I errors and powers of the test are satisfactory.
Simulation 1. Assume that the underlying individual curve $i$ at time point $t \in T = [0, 1]$ is generated from
$$y_i(t) = \beta_0(t) + \beta_1(t)\, x_{i1}(t) + \beta_2(t)\, x_{i2}(t) + \epsilon_i(t),$$
where $\beta_0(t) \equiv 10$, $\beta_1(t) = 1 + t$, $\beta_2(t) = (1 - t) \sin(2 \pi t)$, $x_{i1}(t) = \sin(100 \pi t)$, $x_{i2}(t) = \cos(100 \pi t)$, and $\epsilon_i(\cdot)$ is a stationary Gaussian process with zero mean, unit variance, and a constant covariance of 0.5 between any two distinct time points. For each subject $i$, the number of observation time points $m_i$ is generated from the discrete uniform distribution on $\{5, 6, \ldots, 30\}$, and the observation time points $t_{ij}$, $j = 1, \ldots, m_i$, are independently generated from the exponential distribution $E(0, 1)$. The density function of $E(0, 1)$ is displayed in the left panel of Figure 1, from which it is easy to see that the density value decreases as $t$ increases.
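A minimal Python sketch of this data-generating mechanism is given below. Two points are assumptions rather than details stated in the text: $E(0,1)$ is interpreted as a rate-one exponential distribution restricted to $(0, 1]$ by rejection sampling, and the equicorrelated Gaussian errors are built from a shared component plus an independent component, which is one standard way to obtain unit variance and covariance 0.5. The original simulations were run in R, so this is an illustration and not the authors' code.

```python
import math
import random

def truncated_exponential(rng, rate=1.0, upper=1.0):
    """Draw from an exponential distribution conditioned to lie in (0, upper]."""
    while True:
        t = rng.expovariate(rate)
        if 0.0 < t <= upper:
            return t

def simulate_subject(rng):
    """Return a list of (t, x1, x2, y) tuples for one subject."""
    m = rng.randint(5, 30)                      # discrete uniform on {5, ..., 30}
    times = sorted(truncated_exponential(rng) for _ in range(m))
    shared = rng.gauss(0.0, 1.0)                # component common to all time points
    rows = []
    for t in times:
        # Variance 0.5 + 0.5 = 1; covariance between time points = 0.5.
        eps = math.sqrt(0.5) * shared + math.sqrt(0.5) * rng.gauss(0.0, 1.0)
        x1 = math.sin(100 * math.pi * t)
        x2 = math.cos(100 * math.pi * t)
        y = 10.0 + (1.0 + t) * x1 + (1.0 - t) * math.sin(2 * math.pi * t) * x2 + eps
        rows.append((t, x1, x2, y))
    return rows

rng = random.Random(1)
data = [simulate_subject(rng) for _ in range(50)]   # sample size n = 50
```

Because the exponential density decreases in $t$, the simulated observation times are denser near 0 and sparser near 1.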
Then, we use cubic interpolation to interpolate the y i ( t i j ) , x i 1 ( t i j ) , and x i 2 ( t i j ) on T to obtain y ^ i ( · ) , x ^ i 1 ( · ) , and x ^ i 2 ( · ) , respectively.
Based on the functions y ^ i ( · ) , x ^ i 1 ( · ) , and x ^ i 2 ( · ) described above, we use the RKHS introduced in Section 2 to estimate the regression functions β 0 ( t ) , β 1 ( t ) , and β 2 ( t ) , and compare its performance with the spline smoother and local polynomial models. Typical comparisons (the random seed is set to be “set.seed(1)” in R) are given in Figure 2, Figure 3 and Figure 4 with sample sizes of 50, 100, and 200, respectively. The simulation shows that the proposed RKHS method estimates the regression functions well and compares very favorably with the other two methods. Broadly speaking, the RKHS estimator has relatively stable performance and is close to the “true” curve; it has narrower confidence bands at dense sampling regions, and they become wider at sparse sampling regions. On the contrary, the spline smoother and local polynomial model appear to have good fit at dense sampling regions, but they have large bias when the data become sparse.
In order to make a thorough comparison for this simulation, we use the root integrated mean squared prediction error (RIMSPE) to measure the accuracy of the estimates [24]. The RIMSPE for estimate β ^ of β is given by
$$\mathrm{RIMSPE}(\hat\beta) = \sqrt{ \int_0^1 [\beta(t) - \hat\beta(t)]^2\, dt },$$
and the simulation is repeated 1000 times. By using the R software, the CPU time of implementing this simulation is about 84.5 s on a PC with a 1.80 GHz dual-core Intel i5-8265U CPU and 8 GB memory. The boxplots of the RIMSPE values are presented in Figure 5, from which it is clear that RKHS performs much better than the other two methods, because it has much smaller RIMSPE values.
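The RIMSPE of a single estimated curve can be approximated numerically, for instance with the trapezoidal rule as in the following Python sketch. It assumes that "root" refers to the square root of the integrated squared error; the grid size and the example curves are illustrative choices.

```python
import math

def rimspe(beta, beta_hat, n_grid=1000):
    """Root integrated squared error on [0, 1] via the trapezoidal rule."""
    h = 1.0 / n_grid
    sq = [(beta(i * h) - beta_hat(i * h)) ** 2 for i in range(n_grid + 1)]
    integral = h * (0.5 * sq[0] + sum(sq[1:-1]) + 0.5 * sq[-1])
    return math.sqrt(integral)

# Hypothetical example: the estimate is off by a constant 0.1 everywhere,
# so the RIMSPE equals 0.1.
value = rimspe(lambda t: 1.0 + t, lambda t: 1.1 + t)
```

In a simulation study, this quantity would be computed once per replicate and per coefficient function, and the resulting values summarized, for example in boxplots.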
Simulation 2. In this simulation study, we examine the performance of Corollary 1 for testing the hypothesis
$$H_0 : \beta_i(t) \text{ is linear in } t \quad \text{vs.} \quad H_1 : \beta_i(t) \text{ is not linear in } t, \quad \text{for } i = 1, 2.$$
According to the setting described in Simulation 1, $\beta_1(t)$ is linear in $t$, whereas $\beta_2(t)$ is not linear in $t$. Therefore, we check the type I error for testing $\beta_1(t)$ and the power for testing $\beta_2(t)$. Setting the significance level to 0.05 and repeating the simulation 1000 times, we use Corollary 1 to derive $\chi^2$ test statistics and list the resulting type I errors and powers in Table 1 for various sample sizes. The results in Table 1 suggest that the type I error of the test is close to the nominal level 0.05, and that the power of the test is not small even with a sample size of 50.
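The text does not spell out the form of the $\chi^2$ statistic, so the following Python sketch is only one plausible reading of Corollary 1 in the simplest case: a single coefficient whose scaled estimate is asymptotically $N(0, \omega)$ under $H_0$, giving a Wald-type statistic $n \hat b^2 / \omega$ with one degree of freedom. The function name, the inputs, and the treatment of the bias term as negligible are all assumptions made for illustration.

```python
import math

def wald_chi2_df1(estimate, variance, n):
    """Wald-type chi-square test (1 degree of freedom) of H0: coefficient = 0.

    estimate : estimated coefficient (hypothetical input)
    variance : asymptotic variance of sqrt(n) * estimate under H0
    n        : sample size
    """
    stat = n * estimate ** 2 / variance
    # For Z ~ N(0, 1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

stat, p_value = wald_chi2_df1(estimate=0.2, variance=1.0, n=100)
reject = p_value < 0.05
```

When several coefficients are tested jointly, the statistic becomes a quadratic form in the estimated coefficients with the inverse of the estimated $\Omega(J)$, and the reference distribution has correspondingly more degrees of freedom.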

4. Real Data Analysis

In this section, the proposed method is applied to characterize the relationships in patient immune response in a clinical trial of combination immunotherapy for advanced myeloma. The objective of the original trial was to study whether introducing vaccine-primed T cells early leads to cellular immune responses to the putative tumor antigen hTERT. In this study, 54 patients were recruited and assigned to two treatment arms based on their leukocyte response to human leukocyte antigen A2. Various immune cell parameters (CD3, CD4, CD8), T-cell levels, cytokines (IL7, IL-15), and immunoglobulins (IgA, IgG, IgM) were measured repeatedly to investigate the treatment effect on immune recovery and function. The measurements were taken at nine time points: 0, 2, 7, 14, 40, 60, 90, 100, and 180 days [29]. Moreover, as a subtype of white blood cells in the human immune system, absolute lymphocyte cell (ALC) count was recorded over time during or after patients’ hospitalization up to day 180. Figure 6 shows the trajectories of two individuals, namely “MD001” and “MD002”, in the dataset, with the observation interval scaled to [ 0 , 1 ] . The trajectories of all 54 individuals can be found in the paper by Fang et al. [30]. Previous research has shown that the patient’s survival time is associated with the trajectory of the patient’s ALC counts.
In the human immune system, the relationships among various biological features are complicated and have so far been described only topologically. To illustrate the performance of the proposed methods with a limited sample size, in this section we investigate only how the levels of a patient’s immunoglobulin IgG and immune cell CD8 dynamically affect the trajectory of the patient’s ALC counts. For simplicity, the observation time points are scaled to the interval $[0, 1]$. Let $x_1(t)$, $x_2(t)$ and $y(t)$ be the trajectories of the patient’s IgG, CD8, and ALC counts, respectively. Their relationship can then be described as follows:
$$y(t) = \beta_0(t) + \beta_1(t)\,x_1(t) + \beta_2(t)\,x_2(t) + \epsilon(t), \qquad E[\epsilon(t)] = 0,$$
where $\beta_0(t)$, $\beta_1(t)$ and $\beta_2(t)$ are the regression coefficient functions, and $\epsilon(t)$ is the random error function. The purpose of this study is to estimate the regression coefficient functions and to test whether $\beta_1(t)$ and $\beta_2(t)$ are linear functions of $t$.
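As a rough illustration of what this varying coefficient model says, the sketch below simulates trajectories on a common dense grid and recovers the coefficient functions by ordinary least squares at each grid point. This is deliberately not the RKHS estimator proposed in the paper, and it assumes densely and simultaneously observed curves, which the real data do not provide; the coefficient functions, sample size, grid, and noise level are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 21                       # hypothetical number of subjects and grid points
t = np.linspace(0.0, 1.0, m)

# Hypothetical coefficient functions, chosen only for illustration.
beta0 = 1.0 + t
beta1 = np.sin(2.0 * np.pi * t)
beta2 = t ** 2
beta_true = np.stack([beta0, beta1, beta2])          # shape (3, m)

# Simulated predictor trajectories and errors with E[eps(t)] = 0.
x1 = rng.normal(size=(n, m))
x2 = rng.normal(size=(n, m))
eps = 0.1 * rng.normal(size=(n, m))
y = beta0 + beta1 * x1 + beta2 * x2 + eps            # y(t) = b0 + b1*x1 + b2*x2 + eps

# Pointwise least squares: one small regression across subjects per grid point.
beta_hat = np.empty((3, m))
for j in range(m):
    design = np.column_stack([np.ones(n), x1[:, j], x2[:, j]])
    beta_hat[:, j] = np.linalg.lstsq(design, y[:, j], rcond=None)[0]

max_abs_err = np.max(np.abs(beta_hat - beta_true))
print(f"largest pointwise estimation error: {max_abs_err:.4f}")
```

The pointwise fit treats each time point separately and borrows no strength across $t$, which is one reason a smoothness-penalized estimator is attractive when observation times are sparse.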
In the data used, the observation times generally become sparser as $t$ increases. The right panel of Figure 1 visualizes the kernel density estimate of the observation times of individual “MD001”, and the distribution of observed time points reveals this trend. The proposed RKHS method is used to estimate the regression coefficient functions and to test their linearity. Using the R software, the CPU time for the estimation procedure is only about 1.5 s on a PC with a 1.80 GHz dual-core Intel i5-8265U CPU and 8 GB memory. Figure 7 visualizes the estimated curves and their 95% confidence bands. It is observed that $\beta_1(t)$ and $\beta_2(t)$ are clearly nonlinear in $t$. This observation is confirmed by the $\chi^2$ statistic derived from Corollary 1, which yields p-values less than 0.001 for both $\beta_1(t)$ and $\beta_2(t)$. It is worth noting that $\beta_0(t)$ is monotone in $t$, but $\beta_1(t)$ and $\beta_2(t)$ are not. The results suggest that, with the immunotherapy of tumor antigen vaccination, a patient’s immunoglobulin IgG enhances the ALC counts. When increasing CD8 immune cells result in a high ALC count, immunoglobulin IgG inhibits the patient’s ALC counts so that the ALC level returns to the normal interval $(1000, 4500)$, and this immunotherapy can potentially improve patient survival time.

5. Concluding Remarks

Existing work on functional data analysis has focused primarily on the case where the observed data are sampled at a dense rate, and has been limited to models in which either the response or the predictors are functions. In this paper, we consider the more practical situation in which the data are observed only at some (not dense) time points, and we propose a general regression model in which both the response and the predictors are functions. This function-on-function regression model, given by Equation (4), can be viewed as a generalization of multivariate multiple linear regression that allows the response, the predictors, and even the regression coefficients to all be functions of $t$. To estimate the underlying regression curves and conduct hypothesis tests on them, we use a reproducing kernel Hilbert space (RKHS) approach, which requires only a choice of the kernel(s) of the RKHS and yields a closed-form solution for the regression coefficients in terms of the kernel. To the best of our knowledge, this is the first representation of functional regression coefficients with sparsely observed data. Furthermore, the RKHS-based estimator provides a foundation for hypothesis testing, and its asymptotic distribution is obtained. Simulation studies show that the RKHS estimator has relatively stable performance. The application and statistical properties of our method are further demonstrated through an immunotherapy clinical trial of advanced myeloma. Using the proposed function-on-function regression model and the related theorems established in this paper, this real application showed that, with the immunotherapy of tumor antigen vaccination, patient immunoglobulin IgG enhances ALC counts, and hence this immunotherapy can potentially improve patient survival time. Future work may consider experimental design for the time points to be observed. If the time points can be controlled by the experimenter, their careful selection would improve the efficiency of the estimator (e.g., reduce the bias or MSE). Further, we hope to study function-on-function generalized linear regressions with sparse estimation of the coefficient functions by the penalized method of Zhang and Jia [31].

Author Contributions

Conceptualization, H.-B.F.; methodology, H.H.; validation, G.M.; formal analysis, H.L.; writing—original draft preparation, H.H.; writing—H.-B.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Cancer Institute (NCI) grant P30CA 051008 and the Key Laboratory of Mathematical and Statistical Models (Guangxi Normal University), Education Department of Guangxi Zhuang Autonomous Region.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data included in this study are available upon request by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

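Proof of Theorem 1 (one-dimensional case).
In this case, $d = 1$, $A = a = (a_1, \ldots, a_k)$, $B = b = (b_1, \ldots, b_n)$, $Z_n(\cdot) = (K_1\hat{x}_1(\cdot), \ldots, K_1\hat{x}_n(\cdot))'$, $BZ_n = bZ_n = \sum_{i=1}^n b_i K_1\hat{x}_i$, $D = 1$, and
$$J(ag + bZ_n) = J(K_1(bZ_n)) = \langle K_1(bZ_n), K_1(bZ_n)\rangle_H = \langle bK_1Z_n, bK_1Z_n\rangle_H = \langle bZ_n, bZ_n\rangle_H.$$
Below we evaluate $\partial\langle bZ_n, bZ_n\rangle_H/\partial b$. As
$$(bZ_n)(bZ_n) = \sum_{i=1}^n b_i^2 (K_1\hat{x}_i)^2 + \sum_{i\neq j} b_i b_j (K_1\hat{x}_i)(K_1\hat{x}_j),$$
thus
$$\frac{\partial (bZ_n)(bZ_n)}{\partial b_i} = 2b_i (K_1\hat{x}_i)^2 + 2\sum_{j\neq i} b_j (K_1\hat{x}_i)(K_1\hat{x}_j) = 2(K_1\hat{x}_i)\sum_{j=1}^n b_j (K_1\hat{x}_j).$$
From this we get
$$\frac{\partial\langle bZ_n, bZ_n\rangle_H}{\partial b} = \left(\frac{\partial\langle bZ_n, bZ_n\rangle_H}{\partial b_1}, \ldots, \frac{\partial\langle bZ_n, bZ_n\rangle_H}{\partial b_n}\right) = q, \qquad q = (q_1, \ldots, q_n),$$
where $q_i = 2\sum_{j=1}^n \langle K_1\hat{x}_i, b_j K_1\hat{x}_j\rangle_H = 2\langle bZ_n, z_i\rangle_H$, with $z_i = K_1\hat{x}_i$ the $i$-th component of $Z_n$. Note $\partial\|y - x(ag + bZ_n)\|^2/\partial a_i = -2\langle y - x(ag + bZ_n), xg_i\rangle$, or
$$\frac{\partial\|y - x(ag + bZ_n)\|^2}{\partial a} = -2\langle y - x(ag + bZ_n), xg\rangle.$$
Further, $\partial x(bZ_n)/\partial b_i = xz_i$, or
$$\frac{\partial\|y - x(ag + bZ_n)\|^2}{\partial b} = -2\langle y - x(ag + bZ_n), xZ_n\rangle,$$
where, by convention, $xZ_n = (xz_1, \ldots, xz_n)$, an $n$-dimensional row vector.
Rewrite (2) as
$$(\hat{a}, \hat{b}) = \arg\inf_{(a, b)} G(a, b),$$
where $G(a, b) = \frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - \hat{x}_i(ag + bZ_n)\|^2 + \lambda\langle bZ_n, bZ_n\rangle_H$. $(\hat{a}, \hat{b})$ must satisfy
$$\begin{aligned}
0_{1\times k} &= \frac{\partial G(a, b)}{\partial a} = -\frac{2}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(ag + bZ_n), \hat{x}_i g\rangle,\\
0_{1\times n} &= \frac{\partial G(a, b)}{\partial b} = -2\left[\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(ag + bZ_n), \hat{x}_i Z_n\rangle - \frac{\lambda}{2}\, q\right],
\end{aligned}$$
or
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, g\rangle &= \frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, g\rangle + \frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2\, bZ_n, g\rangle,\\
\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, Z_n\rangle &= \frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, Z_n\rangle + \frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2\, bZ_n, Z_n\rangle + \lambda\langle bZ_n, Z_n\rangle_H.
\end{aligned}$$
It is easy to check that $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, g\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2, gg'\rangle a := Oa$ ($O$ is $k\times k$); $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, bZ_n, g\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2, gZ_n'\rangle b := Rb$ ($R$ is $k\times n$); $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, Z_n\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2, Z_ng'\rangle a = R'a$; $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, bZ_n, Z_n\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2, Z_nZ_n'\rangle b := Sb$ ($S$ is $n\times n$); and $\langle bZ_n, Z_n\rangle_H = \langle Z_n, Z_n'\rangle_H\, b := Wb$ ($W$ is $n\times n$). Denote $\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, g\rangle := u$ ($u$ is $k\times 1$) and $\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, Z_n\rangle := v$ ($v$ is $n\times 1$); then the above system of equations can be rewritten as
$$\begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix},$$
or, when the following inverse exists,
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$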
 □
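The closed form above is a single symmetric block linear system in $(a, b)$, with blocks $O$, $R$, $S + \lambda W$ and right-hand side $(u, v)$. As a minimal numerical sketch, assuming the blocks have already been computed, the system can be solved directly without forming the inverse explicitly. The function name `solve_block_system` and all matrices below are hypothetical stand-ins generated at random; they are not computed from kernels or data.

```python
import numpy as np

def solve_block_system(O, R, S, W, u, v, lam):
    """Solve [[O, R], [R', S + lam * W]] [a; b] = [u; v] for (a, b)."""
    k = O.shape[0]
    lhs = np.block([[O, R], [R.T, S + lam * W]])
    rhs = np.concatenate([u, v])
    sol = np.linalg.solve(lhs, rhs)   # linear solve, no explicit matrix inverse
    return sol[:k], sol[k:]

# Hypothetical stand-in blocks: a random positive definite matrix partitioned
# into O (k x k), R (k x n), S (n x n), plus a positive semidefinite W.
rng = np.random.default_rng(0)
k, n, lam = 2, 4, 0.1
F = rng.normal(size=(50, k + n))
T = F.T @ F / 50
O, R, S = T[:k, :k], T[:k, k:], T[k:, k:]
G = rng.normal(size=(n, n))
W = G @ G.T

# Build a right-hand side from known coefficients and check they are recovered.
a_true = rng.normal(size=k)
b_true = rng.normal(size=n)
u = O @ a_true + R @ b_true
v = R.T @ a_true + (S + lam * W) @ b_true

a_hat, b_hat = solve_block_system(O, R, S, W, u, v, lam)
print(np.allclose(a_hat, a_true), np.allclose(b_hat, b_true))
```

Using `np.linalg.solve` rather than `np.linalg.inv` is a standard numerical choice; the formula with the inverse and the linear solve give the same $(\hat{a}, \hat{b})$ whenever the block matrix is invertible.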
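Proof of Theorem 2 (one-dimensional case).
In this case, $\hat{X}_n = (\hat{x}_1, \ldots, \hat{x}_n)$, $\hat{z}_n(\cdot) = n^{-1}\sum_{i=1}^n K_1(\hat{x}_i y_i)(\cdot)$, $\boldsymbol{a} = (a_1, \ldots, a_k)$, $\boldsymbol{b} = b$, $\hat{\boldsymbol{\beta}}_{n,\lambda}(\cdot) = \hat{\beta}_{n,\lambda}(\cdot) = \hat{a}g(\cdot) + \hat{b}\hat{z}_n(\cdot)$, and
$$(\hat{a}, \hat{b}) = \arg\inf_{(a, b)}\left\{\frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - (ag + b\hat{z}_n)\hat{x}_i\|^2 + \lambda J(ag + b\hat{z}_n)\right\} = \arg\inf_{(a, b)}\left\{\frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - (ag + b\hat{z}_n)\hat{x}_i\|^2 + \lambda b^2\|\hat{z}_n\|_H^2\right\} := \arg\inf_{(a, b)} G(a, b).$$
As in the proof of Theorem 1 (one-dimensional case), $(\hat{a}, \hat{b})$ must satisfy
$$\begin{aligned}
0_{1\times k} &= \frac{\partial G(a, b)}{\partial a} = -\frac{2}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(ag + b\hat{z}_n), \hat{x}_i g\rangle,\\
0 &= \frac{\partial G(a, b)}{\partial b} = -2\left[\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(ag + b\hat{z}_n), \hat{x}_i\hat{z}_n\rangle - \lambda b\|\hat{z}_n\|_H^2\right],
\end{aligned}$$
or
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, g\rangle &= \frac{1}{n}\sum_{i=1}^n \left[\langle \hat{x}_i^2\, ag, g\rangle + b\langle \hat{x}_i^2\hat{z}_n, g\rangle\right],\\
\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, \hat{z}_n\rangle &= \frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, \hat{z}_n\rangle + b\left[\frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i^2, \hat{z}_n^2\rangle + \lambda\|\hat{z}_n\|_H^2\right].
\end{aligned}$$
It is easy to check that $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, g\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2, gg'\rangle a := Oa$ ($O$ is $k\times k$); $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, b\hat{z}_n, g\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\hat{z}_n, g\rangle b := Rb$ ($R$ is $k\times 1$); $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, ag, \hat{z}_n\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\hat{z}_n, g'\rangle a = R'a$; $n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\, b\hat{z}_n, \hat{z}_n\rangle = n^{-1}\sum_{i=1}^n \langle \hat{x}_i^2\hat{z}_n, \hat{z}_n\rangle b := Sb$; and $\langle b\hat{z}_n, \hat{z}_n\rangle_H = \langle \hat{z}_n, \hat{z}_n\rangle_H\, b := Wb$. Denoting $\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, g\rangle := u$ ($u$ is $k\times 1$) and $\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i\hat{x}_i, \hat{z}_n\rangle := v$, the above system of equations is rewritten as
$$\begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix},$$
or, when the following inverse exists,
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$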
 □
Proof of Theorem 1.
We first simplify the penalty term $J(Ag + BZ_n)$. By the reproducing property of the RKHS, $K(s, t) = \langle K(s, \cdot), K(t, \cdot)\rangle_H$; thus, $\forall h \in H$, $(K_1h)(\cdot) := \langle K_1(\cdot, \cdot), h\rangle_H \in H_1$, and $\forall h \in H_1$, $(K_1h) = h$. Thus
$$J(Ag + BZ_n) = J(K_1(BZ_n)) = \langle (K_1(BZ_n))'D, K_1(BZ_n)\rangle_H = \langle (BK_1Z_n)'D, BK_1Z_n\rangle_H = \langle (BZ_n)'D, BZ_n\rangle_H.$$
Note that the inner product $\langle\cdot, \cdot\rangle_H$ of the RKHS is often not the inner product $\langle\cdot, \cdot\rangle$ used in the optimization objective, such as the one corresponding to the $L^2$ norm. Thus, the above expression of $J(Ag + BZ_n)$ does not hold under the inner product $\langle\cdot, \cdot\rangle$.
Below we need to evaluate $\partial\langle (BZ_n)'D, BZ_n\rangle_H/\partial B$. For this, write $b_i = (b_{i1}, \ldots, b_{in})$ for the $i$-th row of $B$ ($i = 1, \ldots, d$), and $z_i = (z_{1i}, \ldots, z_{ni})'$ for the $i$-th column of $Z_n$. Then
$$(BZ_n)'D(BZ_n) = \sum_{i,r=1}^d d_{ir}(b_iz_i)(b_rz_r) = \sum_{i=1}^d d_{ii}(b_iz_i)^2 + \sum_{i=1}^d\sum_{r\neq i}^d d_{ir}(b_iz_i)(b_rz_r)$$
and we get, since $d_{ir} = d_{ri}$ and $b_iz_i = \sum_{j=1}^n b_{ij}z_{ji}$,
$$\frac{\partial (BZ_n)'D(BZ_n)}{\partial b_{ij}} = 2d_{ii}z_{ji}(b_iz_i) + \sum_{r\neq i}^d d_{ir}z_{ji}(b_rz_r) = d_{ii}z_{ji}(b_iz_i) + \sum_{r=1}^d d_{ir}z_{ji}(b_rz_r).$$
From this we get
$$\frac{\partial\langle (BZ_n)'D, BZ_n\rangle_H}{\partial B} = Q, \qquad Q = (q_{ij})_{d\times n},$$
where $q_{ij} = d_{ii}\langle z_{ji}, b_iz_i\rangle_H + \sum_{r=1}^d d_{ir}\langle z_{ji}, b_rz_r\rangle_H$. Note $\partial\|y - x(Ag + BZ_n)\|^2/\partial a_{ij} = -2\langle y - x(Ag + BZ_n), x_ig_j\rangle$, or
$$\frac{\partial\|y - x(Ag + BZ_n)\|^2}{\partial A} = -2\langle y - x(Ag + BZ_n), xg\rangle.$$
Further, $x(BZ_n) = \sum_{i=1}^d x_i(b_iz_i)$, and $\partial x(BZ_n)/\partial b_{ij} = x_iz_{ji}$, or
$$\frac{\partial\|y - x(Ag + BZ_n)\|^2}{\partial B} = -2\langle y - x(Ag + BZ_n), xZ_n\rangle,$$
where, by convention, $xZ_n$ is the $d\times n$ matrix with $(i, j)$-th entry $x_iz_{ji}$.
Rewrite (2) as
$$(\hat{A}, \hat{B}) = \arg\inf_{(A, B)} G(A, B),$$
where $G(A, B) = \frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - \hat{x}_i(Ag + BZ_n)\|^2 + \lambda\langle (BZ_n)'D, BZ_n\rangle_H$. $(\hat{A}, \hat{B})$ must satisfy
$$\begin{aligned}
0_{d\times k} &= \frac{\partial G(A, B)}{\partial A} = -\frac{2}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(Ag + BZ_n), \hat{x}_ig\rangle,\\
0_{d\times n} &= \frac{\partial G(A, B)}{\partial B} = -2\left[\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i - \hat{x}_i(Ag + BZ_n), \hat{x}_iZ_n\rangle - \frac{\lambda}{2}\, Q\right].
\end{aligned}\tag{A3}$$
To solve the linear system (A3), we need to rewrite it in terms of the vector forms $a$ and $b$ of $A$ and $B$. For this, let $a = (a_{11}, \ldots, a_{1k}, \ldots, a_{d1}, \ldots, a_{dk})$ be the vector representation of $A$, and let $b = (b_{11}, \ldots, b_{1n}, \ldots, b_{d1}, \ldots, b_{dn})$ be that of $B$. For $x = (x_1, \ldots, x_d)$, $\langle xAg, xg\rangle$ is a $d\times k$ matrix with $(i, j)$-th entry $\langle xAg, x_ig_j\rangle = \sum_{r=1}^d\sum_{s=1}^k a_{rs}\langle x_rg_s, x_ig_j\rangle$. Similarly, $n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_mg\rangle$ is a $d\times k$ matrix with $(i, j)$-th entry $n^{-1}\sum_{r=1}^d\sum_{s=1}^k a_{rs}\sum_{m=1}^n \langle \hat{x}_{mr}g_s, \hat{x}_{mi}g_j\rangle$; $n^{-1}\sum_{m=1}^n \langle \hat{x}_m(BZ_n), \hat{x}_mg\rangle$ is a $d\times k$ matrix with $(i, j)$-th entry $n^{-1}\sum_{r=1}^d\sum_{s=1}^n b_{rs}\sum_{m=1}^n \langle \hat{x}_{mr}z_{sr}, \hat{x}_{mi}g_j\rangle$; and $n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_mg\rangle$ is a $d\times k$ matrix with $(i, j)$-th entry $n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_{mi}g_j\rangle$.
Likewise, $n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_mZ_n\rangle$ is a $d\times n$ matrix with $(i, j)$-th entry $n^{-1}\sum_{r=1}^d\sum_{s=1}^k a_{rs}\sum_{m=1}^n \langle \hat{x}_{mr}g_s, \hat{x}_{mi}z_{ji}\rangle$; $n^{-1}\sum_{m=1}^n \langle \hat{x}_m(BZ_n), \hat{x}_mZ_n\rangle$ is a $d\times n$ matrix with $(i, j)$-th entry $n^{-1}\sum_{l=1}^d\sum_{r=1}^n b_{lr}\sum_{m=1}^n \langle \hat{x}_{ml}z_{rl}, \hat{x}_{mi}z_{ji}\rangle$; and $n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_mZ_n\rangle$ is a $d\times n$ matrix with $(i, j)$-th entry $n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_{mi}z_{ji}\rangle$.
Let the notation $\langle xAg, xg\rangle \sim Oa$ mean that the elements of the $d\times k$ matrix $\langle xAg, xg\rangle$ are rearranged as a $dk$-vector in dictionary order and expressed in terms of the $dk$-vector form $a$. Thus,
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_mg\rangle \sim Oa, \qquad O_{dk\times dk} = n^{-1}\sum_{i=1}^n \langle s_i, s_i'\rangle,$$
where $s_i = (\hat{x}_{i1}g_1, \ldots, \hat{x}_{i1}g_k, \ldots, \hat{x}_{id}g_1, \ldots, \hat{x}_{id}g_k)'$. Similarly,
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_m(BZ_n), \hat{x}_mg\rangle \sim Rb, \qquad R_{dk\times dn} = n^{-1}\sum_{i=1}^n \langle s_i, t_i'\rangle,$$
where $t_i = (\hat{x}_{i1}\hat{z}_{11}, \ldots, \hat{x}_{i1}\hat{z}_{n1}, \ldots, \hat{x}_{id}\hat{z}_{1d}, \ldots, \hat{x}_{id}\hat{z}_{nd})'$; and
$$n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_mg\rangle \sim u, \qquad u = (u_{11}, \ldots, u_{1k}, \ldots, u_{d1}, \ldots, u_{dk}),$$
where $u_{ij} = n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_{mi}g_j\rangle$.
Likewise,
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_m\hat{Z}_n\rangle \sim Pa, \qquad P_{dn\times dk} = n^{-1}\sum_{i=1}^n \langle t_i, s_i'\rangle = R';$$
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_m(B\hat{Z}_n), \hat{x}_m\hat{Z}_n\rangle \sim Sb, \qquad S_{dn\times dn} = n^{-1}\sum_{i=1}^n \langle t_i, t_i'\rangle;$$
and
$$n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_m\hat{Z}_n\rangle \sim v, \qquad v = (v_{11}, \ldots, v_{1n}, \ldots, v_{d1}, \ldots, v_{dn}),$$
where $v_{ij} = n^{-1}\sum_{m=1}^n \langle \hat{y}_m, \hat{x}_{mi}z_{ji}\rangle$.
Rewrite $q_{ij}$ as
$$q_{ij} = \sum_{s=1}^n b_{is}d_{ii}\langle \hat{z}_{ji}, \hat{z}_{si}\rangle_H + \sum_{r=1}^d\sum_{s=1}^n b_{rs}d_{ir}\langle \hat{z}_{ji}, \hat{z}_{sr}\rangle_H, \qquad (1 \le i \le d;\ 1 \le j \le n).$$
Let $z = (z_{11}, \ldots, z_{n1}, \ldots, z_{1d}, \ldots, z_{nd})'$, let $\mathbf{1}$ be the $n\times n$ matrix of 1’s, $D_0 = \mathrm{diag}\{d_{11}\mathbf{1}, \ldots, d_{dd}\mathbf{1}\}$, and
$$D_1 = \begin{pmatrix} d_{11}\mathbf{1} & \cdots & d_{1d}\mathbf{1} \\ \vdots & & \vdots \\ d_{d1}\mathbf{1} & \cdots & d_{dd}\mathbf{1} \end{pmatrix}.$$
For any two matrices $A = (a_{ij})$ and $B = (b_{ij})$ of the same dimension, denote $A\circ B = (a_{ij}b_{ij})$. Let $W_{dn\times dn} = (D_0 + D_1)\circ\langle z, z'\rangle_H$. It is not difficult to check that
$$Q = Wb.$$
Then (A3) is rewritten as
$$\begin{pmatrix} O & R \\ R' & S + \frac{\lambda}{2}W \end{pmatrix}\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix},$$
or, when the following inverse exists,
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} O & R \\ R' & S + \frac{\lambda}{2}W \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$
 □
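Proof of Theorem 2.
In this case, $\hat{z}_n = (\hat{z}_1, \ldots, \hat{z}_d)'$ is a $d$-vector and, similar to the proof of Theorem 1, we have $J(Ag + B\hat{z}_n) = \langle \hat{z}_n'B'D, B\hat{z}_n\rangle_H$. To evaluate $\partial\langle \hat{z}_n'B'D, B\hat{z}_n\rangle_H/\partial B$, write $B = (b_1, \ldots, b_d)$, where $b_j = (b_{1j}, \ldots, b_{dj})'$ is the $j$-th column of $B$. Then $B\hat{z}_n = \sum_{j=1}^d \hat{z}_jb_j$, and
$$\hat{z}_n'B'DB\hat{z}_n = \sum_{j=1}^d \hat{z}_j^2\, b_j'Db_j + 2\sum_{l\neq j}^d \hat{z}_j\, b_j'Db_l\, \hat{z}_l = \sum_{j=1}^d \hat{z}_j^2\left(\sum_{i=1}^d b_{ij}^2d_{ii} + 2\sum_{k\neq i}^d b_{ij}d_{ik}b_{kj} + \sum_{k\neq i}^d\sum_{l\neq i}^d b_{kj}d_{kl}b_{lj}\right) + 2\sum_{l\neq j}^d \hat{z}_j\sum_{r,s=1}^d b_{rj}d_{rs}b_{sl}\hat{z}_l,$$
we get, since $d_{ij} = d_{ji}$,
$$\frac{\partial\, \hat{z}_n'B'DB\hat{z}_n}{\partial b_{ij}} = 2\hat{z}_j\left(d_{ii}b_{ij}\hat{z}_j + \sum_{k\neq i}^d d_{ik}b_{kj}\hat{z}_j + \sum_{l\neq j}^d\sum_{s=1}^d d_{is}b_{sl}\hat{z}_l\right) = 2\hat{z}_j\sum_{l=1}^d\sum_{s=1}^d d_{is}b_{sl}\hat{z}_l = 2\hat{z}_j\, d_iB\hat{z}_n,$$
where $d_i = (d_{i1}, \ldots, d_{id})$ is the $i$-th row of $D$. From this we get
$$\frac{\partial\langle \hat{z}_n'B'D, B\hat{z}_n\rangle_H}{\partial B} = 2Q, \qquad Q = (q_{ij})_{d\times d}, \qquad q_{ij} = \langle d_iB\hat{z}_n, \hat{z}_j\rangle_H = d_iB\langle \hat{z}_n, \hat{z}_j\rangle_H,$$
or
$$\frac{\partial\langle \hat{z}_n'B'DB, \hat{z}_n\rangle_H}{\partial B} = 2DB\langle \hat{z}_n, \hat{z}_n'\rangle_H.$$
Now (3) is rewritten as
$$(\hat{A}, \hat{B}) = \arg\inf_{(A, B)} G(A, B),$$
where $G(A, B) = \frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - (Ag + B\hat{z}_n)\hat{x}_i\|^2 + \lambda\langle \hat{z}_n'B'DB, \hat{z}_n\rangle_H$, and $(\hat{A}, \hat{B})$ must satisfy
$$\begin{aligned}
0_{d\times k} &= \frac{\partial G(A, B)}{\partial A} = -\frac{2}{n}\sum_{i=1}^n \langle \hat{y}_i - (Ag + B\hat{z}_n)\hat{x}_i, \hat{x}_ig\rangle,\\
0_{d\times d} &= \frac{\partial G(A, B)}{\partial B} = -2\left[\frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i - (Ag + B\hat{z}_n)\hat{x}_i, \hat{x}_i\hat{z}_n\rangle - \lambda DB\langle \hat{z}_n, \hat{z}_n'\rangle_H\right],
\end{aligned}$$
or
$$\begin{aligned}
\frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i(Ag + B\hat{z}_n), \hat{x}_ig\rangle &= \frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i, \hat{x}_ig\rangle,\\
\frac{1}{n}\sum_{i=1}^n \langle \hat{x}_i(Ag + B\hat{z}_n), \hat{x}_i\hat{z}_n\rangle + \lambda DB\langle \hat{z}_n, \hat{z}_n'\rangle_H &= \frac{1}{n}\sum_{i=1}^n \langle \hat{y}_i, \hat{x}_i\hat{z}_n\rangle.
\end{aligned}\tag{A5}$$
Let $(\hat{A}, \hat{B})$ be the solution of (A5).
To solve the linear system (A5), we need to rewrite it in terms of the vector forms $a$ and $b$ of $A$ and $B$. For this, let $a = (a_{11}, \ldots, a_{1k}, \ldots, a_{d1}, \ldots, a_{dk})$ be the vector representation of $A$; let $b = (b_{11}, \ldots, b_{1d}, \ldots, b_{d1}, \ldots, b_{dd})$ be that of $B$.
Let the notation $\langle xAg, xg\rangle \sim Oa$ mean rearranging the elements of the matrix $\langle xAg, xg\rangle$ in terms of its vector form $a$. As in the proof of Theorem 1,
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_mg\rangle \sim Oa, \qquad O_{dk\times dk} = n^{-1}\sum_{i=1}^n \langle s_i, s_i'\rangle,$$
where $s_i = (\hat{x}_{i1}g_1, \ldots, \hat{x}_{i1}g_k, \ldots, \hat{x}_{id}g_1, \ldots, \hat{x}_{id}g_k)'$.
Similarly,
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mAg, \hat{x}_m\hat{z}_n\rangle \sim Pa, \qquad P_{d^2\times dk} = n^{-1}\sum_{i=1}^n \langle t_i, s_i'\rangle,$$
where $t_i = (\hat{x}_{i1}\hat{z}_1, \ldots, \hat{x}_{i1}\hat{z}_d, \ldots, \hat{x}_{id}\hat{z}_1, \ldots, \hat{x}_{id}\hat{z}_d)'$;
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mB\hat{z}_n, \hat{x}_mg\rangle \sim Rb, \qquad R_{dk\times d^2} = n^{-1}\sum_{i=1}^n \langle s_i, t_i'\rangle = P';$$
and
$$n^{-1}\sum_{m=1}^n \langle \hat{x}_mB\hat{z}_n, \hat{x}_m\hat{z}_n\rangle \sim Sb, \qquad S_{d^2\times d^2} = n^{-1}\sum_{i=1}^n \langle t_i, t_i'\rangle.$$
Denote $U = n^{-1}\sum_{i=1}^n \langle \hat{y}_i, \hat{x}_ig\rangle := (u_{ij})_{d\times k}$ and its vector form $u = (u_{11}, \ldots, u_{1k}, \ldots, u_{d1}, \ldots, u_{dk})$; let $V = n^{-1}\sum_{i=1}^n \langle \hat{y}_i, \hat{x}_i\hat{z}_n\rangle := (v_{ij})_{d\times d}$ and its vector form $v = (v_{11}, \ldots, v_{1d}, \ldots, v_{d1}, \ldots, v_{dd})$. Since $D$ is positive semidefinite, let $\lambda_1 \ge \cdots \ge \lambda_d \ge 0$ be its eigenvalues, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$, and $q_1, \ldots, q_d$ its normalized eigenvectors, $Q = (q_1, \ldots, q_d)$; then $D = Q\Lambda Q' = \sum_{j=1}^d \lambda_jq_jq_j'$. Rearranging the elements of $DB\langle \hat{z}_n, \hat{z}_n'\rangle_H$ in vector form as before,
$$DB\langle \hat{z}_n, \hat{z}_n'\rangle_H = \sum_{j=1}^d \lambda_j\langle q_j'B\hat{z}_n, q_j\hat{z}_n'\rangle_H \sim Wb, \qquad W_{d^2\times d^2} = \sum_{j=1}^d \langle c_j, c_j'\rangle_H,$$
where $c_j = \sqrt{\lambda_j}\,(q_{j1}\hat{z}_1, \ldots, q_{j1}\hat{z}_d, \ldots, q_{jd}\hat{z}_1, \ldots, q_{jd}\hat{z}_d)'$.
Then (A5) is rewritten as
$$\begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix},$$
or, when the following inverse exists,
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} O & R \\ R' & S + \lambda W \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix}.$$
 □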
Proof of Theorem 3.
Note that
$$\hat{z}_n(\cdot) = n^{-1}\sum_{i=1}^n [K_1(\hat{x}_i\hat{y}_i)](\cdot) = n^{-1}\sum_{i=1}^n [K_1(x_iy_i)](\cdot) + n^{-1}\sum_{i=1}^n [K_1(\hat{x}_i\hat{y}_i - x_iy_i)](\cdot) := z_n(\cdot) + r_n(\cdot).$$
Note that (C3) implies $E\|xy\| < \infty$ and $E\|K_1(xy)\| < \infty$, so by Theorem 7.9 (or Corollary 7.10) in Ledoux and Talagrand [32], $\|z_n - z_0\| \to 0$ (a.s.), where $z_0(\cdot) = E[K_1(xy)](\cdot)$. By (C4), $\|r_n\| \to 0$ (a.s.). Thus, $\|\hat{z}_n - z_0\| \to 0$ (a.s.).
Let $C = (A, B)$, $\hat{C} = (\hat{A}, \hat{B})$, $m(C) = \|y - (Ag + Bz_0)x\|^2$, $Pm(C) = E[m(C)]$, and let $P_n$ be the empirical distribution based on $n$ iid samples from $m(C)$. Let
$$M_n(C) = \frac{1}{n}\sum_{i=1}^n \|\hat{y}_i - (Ag + B\hat{z}_n)\hat{x}_i\|^2 + \lambda J(Ag + B\hat{z}_n).$$
By (C5) and (C4) and the fact that $\|\hat{z}_n - z_0\| \to 0$ (a.s.),
$$M_n(C) = \frac{1}{n}\sum_{i=1}^n \|y_i - (Ag + Bz_0)x_i\|^2 + \lambda J(Ag + Bz_0) + o(1) = \frac{1}{n}\sum_{i=1}^n \|y_i - (Ag + Bz_0)x_i\|^2 + o(1) := P_nm(C) + o(1) = Pm(C) + o(1), \quad (a.s.). \tag{A7}$$
In the above we used Theorem 7.9 (or Corollary 7.10) in Ledoux and Talagrand [32] again to get $P_nm(C) = Pm(C) + o(1)$ (a.s.).
Note that $E\|xy\| < \infty$ implies $E(\|xy\| \mid x) < \infty$; this together with (C3) implies that $\inf_C Pm(C) = E\inf_C E[m(C) \mid x]$ has a unique (and finite) minimizer $C_0 = (A_0, B_0)$. We first prove $\|\hat{C} - C_0\| \to 0$ (a.s.).
By definition of $\hat{C}$, $M_n(\hat{C}) \le M_n(C_0) = Pm(C_0) + o(1)$ (a.s.), and by (A7), $Pm(\hat{C}) \le P_nm(C_0) + o(1)$ (a.s.). Thus,
$$Pm(\hat{C}) - Pm(C_0) \le P_nm(C_0) - Pm(C_0) + o(1) \le \sup_{C\in\mathcal{C}} |P_nm(C) - Pm(C)| + o(1) \to 0 \quad (a.s.), \tag{A8}$$
where $\mathcal{C}$ is some bounded set of $C$’s, and we used the fact that $\{P_nm(C) : C \in \mathcal{C}\}$ is a Glivenko–Cantelli class on any bounded $\mathcal{C}$. Thus $\sup_{C\in\mathcal{C}} |P_nm(C) - Pm(C)| \to 0$ (a.s.).
On the other hand, since $C_0$ is the unique minimizer of $Pm(C)$, for every $\delta > 0$, there is $\eta > 0$, such that
$$\inf_{C:\ \|C - C_0\| \ge \delta} Pm(C) > Pm(C_0) + \eta.$$
Thus, by (A8) we must have that for all large $n$, $\|\hat{C} - C_0\| < \delta$ (a.s.) for every $\delta > 0$. This gives $\|\hat{C} - C_0\| \to 0$ (a.s.).
Note that $E_{\beta_0}(y \mid x) = \beta_0x$, which is the minimizer of the conditional expectation $E_{\beta_0}(\|y - \beta_0x\|^2 \mid x)$, and $\beta_0$ is also the pointwise least squares “estimate” of itself under the objective functional $E_{\beta_0}\{\|y - \beta_0x\|^2\} = E\{E_{\beta_0}(\|y - \beta_0x\|^2 \mid x)\}$, so by (C1), $\beta_0 = [E(xx')]^{-1}E(xy) \in \mathrm{Span}(E(xy)) = \mathrm{Span}(E[K_0(xy)], E[K_1(xy)]) \subseteq \mathrm{Span}(g, z_0)$. (C2) implies that $E[x(\cdot)x'(\cdot)]$ is invertible, and so $\theta_0$ can be written in the form $((Ag), (Bz_0))$. Since $C_0 = (A_0, B_0)$ also minimizes $Pm(C)$ (over a larger space than that to which $\theta_0$ belongs), we must have $((A_0g), (B_0z_0)) = \beta_0$, and $\hat{C} = (\hat{A}, \hat{B}) \to (A_0, B_0)$ (a.s.) gives $\hat{\beta}_{n,\lambda} = ((\hat{A}g), (\hat{B}\hat{z}_n)) \to ((A_0g), (B_0z_0)) = \beta_0$ (a.s.). □
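Proof of Theorem 4.
Recall the blockwise inversion formula
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1} \\ -(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1} \end{pmatrix}$$
and, for $\lambda \to 0$, $(A + \lambda W)^{-1} = A^{-1} - \lambda A^{-1}WA^{-1} + O(\lambda^2) = A^{-1} - O(\lambda)$.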
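Both identities recalled above are standard and can be checked numerically. The sketch below does so on small random matrices; the dimensions, seed, and matrices are arbitrary illustrative choices, and the matrices are constructed to be positive definite so that $A$ and the Schur complement $D - CA^{-1}B$ are invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2

# Random symmetric positive definite matrix, partitioned into blocks A, B, C, D.
F = rng.normal(size=(20, p + q))
M = F.T @ F / 20 + np.eye(p + q)
A, B = M[:p, :p], M[:p, p:]
C, D = M[p:, :p], M[p:, p:]

# Blockwise inversion via the Schur complement D - C A^{-1} B.
A_inv = np.linalg.inv(A)
schur_inv = np.linalg.inv(D - C @ A_inv @ B)
block_inv = np.block([
    [A_inv + A_inv @ B @ schur_inv @ C @ A_inv, -A_inv @ B @ schur_inv],
    [-schur_inv @ C @ A_inv, schur_inv],
])
blockwise_ok = np.allclose(block_inv, np.linalg.inv(M))

# First-order expansion (A + lam W)^{-1} = A^{-1} - lam A^{-1} W A^{-1} + O(lam^2).
G = rng.normal(size=(p, p))
W = G @ G.T / p

def expansion_error(lam):
    """Norm of the remainder left after the first-order approximation."""
    exact = np.linalg.inv(A + lam * W)
    approx = A_inv - lam * A_inv @ W @ A_inv
    return np.linalg.norm(exact - approx)

err_small = expansion_error(1e-3)
err_large = expansion_error(1e-2)
ratio = err_large / err_small   # close to 100 if the remainder is O(lam^2)
print(blockwise_ok, ratio)
```

Multiplying $\lambda$ by 10 multiplies the remainder by roughly 100, which is the numerical signature of an $O(\lambda^2)$ term.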
By (C2) and (C3), for all large $n$, $O^{-1}$, $P^{-1}$, $R^{-1}$, $S^{-1}$ and $W^{-1}$ all exist (a.s.). Using the above blockwise inversion formulae, by Theorem 2, we get
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} = \begin{pmatrix} O & R \\ P & S \end{pmatrix}^{-1}\begin{pmatrix} u \\ v \end{pmatrix} - O(\lambda)\begin{pmatrix} u \\ v \end{pmatrix}.$$
In the proof of Theorem 3, we showed $\|\hat{C} - C_0\| \to 0$ (a.s.), i.e., $(\hat{a}, \hat{b}) \to (a_0, b_0)$ (a.s.). Further, similar to the proof of Theorem 3, we can get
$$O \xrightarrow{a.s.} O_0 = E\langle s_1, s_1'\rangle, \qquad P = R' \xrightarrow{a.s.} P_0 = E\langle t_1, s_1'\rangle, \qquad S \xrightarrow{a.s.} S_0 = E\langle t_1, t_1'\rangle,$$
$$U \xrightarrow{a.s.} U_0 = E\langle y_1, x_1g\rangle, \qquad V \xrightarrow{a.s.} V_0 = E\langle y_1, x_1z_0\rangle.$$
Let $u_0$ and $v_0$ be the vector representations of $U_0$ and $V_0$; then we have
$$\begin{pmatrix} \hat{a} \\ \hat{b} \end{pmatrix} \xrightarrow{a.s.} \begin{pmatrix} O_0 & R_0 \\ P_0 & S_0 \end{pmatrix}^{-1}\begin{pmatrix} u_0 \\ v_0 \end{pmatrix} := \begin{pmatrix} a_0 \\ b_0 \end{pmatrix}.$$
Denote $\hat{c} = (\hat{a}, \hat{b})$ and $c_0 = (a_0, b_0)$; we first find the asymptotic distribution of $\hat{c}$. Denote $T = \mathrm{matrix}(O, R; P, S)$, $T_0 = \mathrm{matrix}(O_0, R_0; P_0, S_0)$, $w = (u, v)$ and $w_0 = (u_0, v_0)$; then $c_0 = T_0^{-1}w_0$ and $\hat{c} = T^{-1}w$. By (C6),
$$\sqrt{n}\,(\hat{c} - c_0) = \sqrt{n}\,\left(T_0^{-1} + o_p(1)\right)(w - w_0) + o(1).$$
It can be shown that the sequences $\{\langle \hat{y}_i, \hat{x}_ig\rangle\}$ and $\{\langle \hat{y}_i, \hat{x}_i\hat{z}_n\rangle\}$ are Donsker classes, and so
$$\sqrt{n}\,(w - w_0) \xrightarrow{D} N(0, \Gamma), \qquad \Gamma = (\gamma_{ij})_{d(d+k)\times d(d+k)}, \qquad \gamma_{ij} = \mathrm{Cov}(\tilde{w}_i, \tilde{w}_j),$$
where $\tilde{w}(\cdot) = (\tilde{u}, \tilde{v})$, $\tilde{u}$ is the vector form of $\tilde{U} = \langle y_1, x_1g\rangle$ and $\tilde{v}$ is the vector form of $\tilde{V} = \langle y_1, x_1z_0\rangle$. From the above we get, as $T_0$ is symmetric,
$$\sqrt{n}\,(\hat{c} - c_0 - o_p(1)) \xrightarrow{D} N(0, T_0^{-1}\Gamma T_0^{-1}). \tag{A9}$$
Now, rewrite $\hat{\beta}_{n,\lambda}(\cdot) = F_n(\cdot)\hat{c}$, with $F_n = (g, J_n)$ and $F_0 = (g, J_0)$, where $J_n = (\hat{z}_n, \ldots, \hat{z}_n)$ and $J_0 = (Z_0, \ldots, Z_0)$. Then $F_n(t) = F_0(t) + o_p(r_n^{1/2}(t))$, and by (A9) we get
$$W_n = \sqrt{n}\,(\hat{\beta}_{n,\lambda}(\cdot) - \beta_0(\cdot) - o_p(1)) = \sqrt{n}\,F_0(\cdot)(\hat{c} - c_0 - o_p(1)) \xrightarrow{D} W, \quad \text{in } [l^{\infty}(\mathcal{T})]^d,$$
where $W$ is a mean zero Gaussian process on $\mathcal{T}$ with covariance function $\sigma(s, t) = E(W(s), W(t)) = F_0(s)T_0^{-1}\Gamma T_0^{-1}F_0'(t)$. □
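The covariance function $\sigma(s, t) = F_0(s)T_0^{-1}\Gamma T_0^{-1}F_0'(t)$ has the familiar sandwich form, and it can be evaluated on a grid once $F_0$, $T_0$ and $\Gamma$ (or estimates of them) are available. The sketch below does this for $d = 1$ with made-up inputs: the basis $(1, t, \sin \pi t)$ standing in for $(g, z_0)$, the random matrices standing in for $T_0$ and $\Gamma$, and the coefficient vector `c_hat` are all hypothetical. The pointwise 95% intervals at the end are one common way to use such a covariance function; the paper does not state how its confidence bands were constructed, so this is an assumption and not the authors' procedure.

```python
import numpy as np

def basis(t):
    """Hypothetical F_0(t) for d = 1: columns (1, t, sin(pi t))."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, np.sin(np.pi * t)])

rng = np.random.default_rng(2)
G1 = rng.normal(size=(3, 3))
G2 = rng.normal(size=(3, 3))
T0 = G1 @ G1.T + np.eye(3)    # symmetric positive definite stand-in for T_0
Gamma = G2 @ G2.T             # positive semidefinite stand-in for Gamma

T0_inv = np.linalg.inv(T0)
middle = T0_inv @ Gamma @ T0_inv          # T_0^{-1} Gamma T_0^{-1}

grid = np.linspace(0.0, 1.0, 11)
F = basis(grid)                            # shape (11, 3)
sigma = F @ middle @ F.T                   # sigma(s, t) on grid x grid

# Pointwise 95% intervals beta_hat(t) +/- 1.96 * sqrt(sigma(t, t) / n).
n = 54                                     # sample size of the trial in Section 4
c_hat = np.array([1.0, -0.5, 2.0])         # made-up coefficient estimate
beta_hat = F @ c_hat
variances = np.maximum(np.diag(sigma), 0.0)
half_width = 1.96 * np.sqrt(variances / n)
lower, upper = beta_hat - half_width, beta_hat + half_width
print(np.round(half_width, 3))
```

Pointwise intervals of this kind cover $\beta_0(t)$ at each fixed $t$ separately; a simultaneous band over all $t$ would need a critical value derived from the limiting Gaussian process $W$.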

References

  1. Ullah, S.; Finch, C.F. Applications of functional data analysis: A systematic review. BMC Med. Res. Methodol. 2013, 13, 43.
  2. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012.
  3. Yuan, A.; Fang, H.B.; Li, H.; Wu, C.O.; Tan, M. Hypothesis Testing for Multiple Mean and Correlation Curves with Functional Data. Stat. Sin. 2020, 30, 1095–1116.
  4. Lai, T.Y.; Zhang, Z.Z.; Wang, Y.F. Testing Independence and Goodness-of-Fit Jointly for Functional Linear Models. J. Korean Stat. Soc. 2021, 50, 380–402.
  5. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005.
  6. Clarkson, D.B.; Fraley, C.; Gu, C.; Ramsay, J.O. S+ Functional Data Analysis; Springer: New York, NY, USA, 2005.
  7. Ferraty, F.; Vieu, P. Nonparametric Functional Data Analysis; Springer: New York, NY, USA, 2006.
  8. Zeger, S.L.; Diggle, P.J. Semiparametric Models for Longitudinal Data with Application to CD4 Cell Numbers in HIV Seroconverters. Biometrics 1994, 50, 689–699.
  9. Lin, D.Y.; Ying, Z. Semiparametric and Nonparametric Regression Analysis of Longitudinal Data. J. Am. Stat. Assoc. 2001, 96, 103–126.
  10. Fan, J.; Li, R. New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis. J. Am. Stat. Assoc. 2004, 99, 710–723.
  11. Xue, L.; Zhu, L. Empirical Likelihood Semiparametric Regression Analysis for Longitudinal Data. Biometrika 2007, 94, 921–937.
  12. Yuan, M.; Cai, T. A Reproducing Kernel Hilbert Space Approach to Functional Linear Regression. Ann. Stat. 2010, 38, 3412–3444.
  13. Reiss, P.T.; Goldsmith, J.; Shang, H.L.; Ogden, R.T. Methods for Scalar-on-Function Regression. Int. Stat. Rev. 2017, 85, 228–249.
  14. Chen, C.; Guo, S.J.; Qian, X.H. Functional Linear Regression: Dependence and Error Contamination. J. Bus. Econ. Stat. 2022, 40, 444–457.
  15. Yao, F.; Müller, H.G.; Wang, J.L. Functional Linear Regression Analysis for Longitudinal Data. Ann. Stat. 2005, 33, 2873–2903.
  16. Müller, H.G.; Yao, F. Functional Additive Models. J. Am. Stat. Assoc. 2008, 103, 1534–1544.
  17. Kramer, N.; Boulesteix, A.L.; Tutz, G. Penalized Partial Least Squares with Applications to B-spline Transformations and Functional Data. Chemom. Intell. Lab. Syst. 2008, 94, 60–69.
  18. Hayashi, K.; Hayashi, M.; Reich, B.; Lee, S.P.; Sachdeva, A.U.C.; Mizoguchi, I. Functional Data Analysis of Mandibular Movement Using Third-degree B-Spline Basis Functions and Self-modeling Regression. Orthod. Waves 2012, 71, 17–25.
  19. Aguilera, A.M.; Aguilera-Morillo, M.C. Penalized PCA Approaches for B-spline Expansions of Smooth Functional Data. Appl. Math. Comput. 2013, 219, 7805–7819.
  20. Berlinet, A.; Elamine, A.; Mas, A. Local Linear Regression for Functional Data. Ann. Inst. Stat. Math. 2011, 63, 1047–1075.
  21. Abeidallah, M.; Mechab, B.; Merouan, T. Local Linear Estimate of the Point at High Risk: Spatial Functional Data Case. Commun. Stat. Theory Methods 2020, 49, 2561–2584.
  22. Sara, L. Nonparametric Local Linear Regression Estimation for Censored Data and Functional Regressors. J. Korean Stat. Soc. 2020, 51, 1–22.
  23. Lei, X.; Zhang, H. Non-asymptotic Optimal Prediction Error for RKHS-based Partially Functional Linear Models. arXiv 2020, arXiv:2009.04729.
  24. Fang, K.T.; Li, R.; Sudjianto, A. Design and Modeling for Computer Experiments; Chapman & Hall/CRC: New York, NY, USA, 2006.
  25. Lai, T.L.; Robbins, H.; Wei, C.Z. Strong Consistency of Least Squares Estimates in Multiple Regression. Proc. Natl. Acad. Sci. USA 1978, 75, 3034–3036.
  26. Eicker, F. Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions. Ann. Math. Stat. 1963, 34, 447–456.
  27. Wahba, G. Spline Models for Observational Data; SIAM: Philadelphia, PA, USA, 1990.
  28. Gu, C. Smoothing Spline ANOVA Models; Springer: New York, NY, USA, 2002.
  29. Rapoport, A.P.; Aqui, N.A.; Stadtmauer, E.A.; Vogl, D.T.; Fang, H.B.; Cai, L.; Janofsky, S.; Chew, A.; Storek, J.; Gorgun, A.; et al. Combination immunotherapy using adoptive T-cell transfer and tumor antigen vaccination based on hTERT and survivin after ASCT for myeloma. Blood 2011, 117, 788–797.
  30. Fang, H.B.; Wu, T.T.; Rapoport, A.P.; Tan, M. Survival Analysis with Functional Covariates Based on Partial Follow-up Studies. Stat. Methods Med. Res. 2016, 25, 2405–2419.
  31. Zhang, H.; Jia, J. Elastic-net Regularized High-dimensional Negative Binomial Regression: Consistency and Weak Signals Detection. Stat. Sin. 2022, 32, 181–207.
  32. Ledoux, M.; Talagrand, M. Probability in Banach Spaces; Springer: New York, NY, USA, 1991.
Figure 1. Left panel visualizes the density function of E ( 0 , 1 ) ; right panel visualizes the kernel density estimation of the number of observation time of MD001.
Figure 1. Left panel visualizes the density function of E ( 0 , 1 ) ; right panel visualizes the kernel density estimation of the number of observation time of MD001.
Mathematics 10 02507 g001
Figure 2. Performance of curve estimation when the sample size is 50 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Figure 2. Performance of curve estimation when the sample size is 50 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Mathematics 10 02507 g002
Figure 3. Performance of curve estimation when the sample size is 100 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Figure 3. Performance of curve estimation when the sample size is 100 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Mathematics 10 02507 g003aMathematics 10 02507 g003b
Figure 4. Performance of curve estimation when the sample size is 200 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Figure 4. Performance of curve estimation when the sample size is 200 and the random seed is “set.seed(1)” in R. First row: curve estimation performance of the spline smoother; Second row: curve estimation performance of the local polynomial model; Third row: curve estimation performance of the proposed RKHS method. Solid red line: true curve; Solid blue line: estimated curve; Dotted lower and upper green lines: 95 % confidence bands.
Mathematics 10 02507 g004
Figure 5. Boxplots of the RIMSPE values. The first row corresponds to sample size 50, the second row to sample size 100, and the third row to sample size 200. In each row, the left panel is for estimating $\beta_0(t)$, the middle panel is for estimating $\beta_1(t)$, and the right panel is for estimating $\beta_2(t)$.
Figure 6. Left panel: trajectory of individual “MD001”; right panel: trajectory of individual “MD002”. The observation interval has been scaled to $[0, 1]$.
Figure 7. The regression coefficient functions estimated by the proposed RKHS method. Solid blue line: estimated curve; dotted lower and upper green lines: 95% confidence bands. The time $t$ has been scaled to the interval $[0, 1]$.
Table 1. Summary of simulation results for linearity testing.
Sample Size | Type I Error (for testing $\beta_0(t)$) | Power (for testing $\beta_1(t)$)
50 | 0.059 | 0.756
100 | 0.052 | 0.865
200 | 0.051 | 0.923
Note: The simulation is based on 1000 repetitions.
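As a rough guide to the precision of the entries in Table 1, the following sketch computes a Monte Carlo standard error for each reported rate. It assumes that the 1000 repetitions are independent and that each rejection rate can be treated as a binomial proportion; the function name `monte_carlo_se` and the dictionary layout are illustrative choices and are not taken from the article.

```python
import math

# Rejection rates copied from Table 1; keys are the sample sizes.
REPETITIONS = 1000
table1 = {
    50: {"type_i_error": 0.059, "power": 0.756},
    100: {"type_i_error": 0.052, "power": 0.865},
    200: {"type_i_error": 0.051, "power": 0.923},
}


def monte_carlo_se(rate, repetitions):
    """Binomial standard error of an estimated rejection rate (assumed model)."""
    return math.sqrt(rate * (1.0 - rate) / repetitions)


# Standard error for every entry of the table.
se_table = {
    n: {name: monte_carlo_se(rate, REPETITIONS) for name, rate in row.items()}
    for n, row in table1.items()
}

for n, row in se_table.items():
    print(
        f"n={n}: SE(type I error)={row['type_i_error']:.4f}, "
        f"SE(power)={row['power']:.4f}"
    )
```

Under these assumptions, the type I error entries carry a standard error of roughly 0.007, which is useful to keep in mind when comparing the three rows.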
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


MDPI and ACS Style

Huang, H.; Mo, G.; Li, H.; Fang, H.-B. Representation Theorem and Functional CLT for RKHS-Based Function-on-Function Regressions. Mathematics 2022, 10, 2507. https://doi.org/10.3390/math10142507
