1. Introduction
Functional data can be found in various fields, such as biology, economics, engineering, geology, medicine and psychology. Recently, statistical methods and theories for functional data have been widely studied ([1,2,3,4,5,6]). Functional data sometimes have more complicated structures. For example, the motivating data of this paper, stroke rehabilitation data, come from a study that utilized a collection of 3D video games known as Circus Challenge to enhance upper limb function in stroke patients ([7,8,9]). The patients were scheduled to play the movement game over three months at specified times. At each visit time
t, the level of impairment of stroke subject
i was measured using the CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as
, and movements of upper limbs of patients, such as forward circle movement and sawing movement, were also recorded. The movement data at time
t and frequency
s from the
ith patient were denoted as
. Determining a way to model the relationship of
and functional
is key to studying whether the movements are helpful to the rehabilitation level of the stroke patient or not. Furthermore, there is a question of whether there are interaction effects among the movements on the stroke patient’s rehabilitation. Zhai et al. [
9] developed a nonparametric concurrent regression model to study the relationship between functional movements and the CAHAI score. However, they did not consider the interaction effects of functional movements on the CAHAI score. In this paper, we aim to examine the interaction effects of movements on CAHAI scores and to predict the rehabilitation level of stroke patients.
We apply the following nonparametric concurrent regression model (NCRM) to model the stroke rehabilitation data:
where
f is a bivariate function to be estimated nonparametrically, response
is a function of
t, covariate
is a vector of functions of length
q, and
is a random error. To explore the interaction effects among components of covariates
, we use the smoothing spline analysis of variance (SS ANOVA) method [10,11] to decompose the regression function f.
A multivariate function can be decomposed into main effects and interaction effects via the SS ANOVA method ([10,11]). When the dimension of covariates
q is large, the decomposed model contains a large number of interaction effects. Even if only the main effects and second-order interaction terms are investigated, the order of the number of decomposition terms is
, which leads to a highly complicated model. To model stroke rehabilitation data with
, there are 22 terms, including the main effects and interaction effects. This challenges the estimation method for the NCRM model. To avoid this shortcoming, Zhai et al. [
9] took all functional covariates as a whole and did not consider interaction effects among covariates. Following [
12], this paper conducts a model selection method for the NCRM model with all main effects and interaction effects. In this method, the regression function is estimated and significant components of the decomposition are selected simultaneously.
Model selection is a crucial step in building statistical models that accurately capture the relationships between variables ([13,14]). It chooses the most suitable model from a set of candidate models based on criteria such as goodness of fit, predictive performance, and interpretability. Within the SS ANOVA approach, model selection is crucial for determining the contribution of each component of the decomposition to the overall variance of the response variable. Several methods have been proposed for model selection with SS ANOVA, including forward selection, backward elimination, and stepwise regression ([15,16,17,18,19,20]). However, these methods are limited in their ability to handle high-dimensional data and identify complex interactions among variables. Hence, regularization methods such as the
penalty have gained popularity in recent years ([12,21,22,23,24,25]), which allow for the selection of sparse and robust models. For example, Zhang et al. [
23] developed a nonparametric penalized likelihood method with the likelihood basis pursuit and used it for variable selection and model construction. Lin and Zhang [
22] proposed a component selection and smoothing method for multivariate nonparametric regression models by penalizing the sum of component norms of SS ANOVA. Furthermore, Wang et al. [
12] developed a unified framework for estimation and model selection methods in nonparametric function-on-function regression, which performs well when using
penalty methods for model selection. Dong and Wang [
24] proposed a nonparametric method for learning conditional dependence structure in graph models by applying
regularization to detect the neighborhoods of edges, where the SS ANOVA decomposition is used to depict interaction effects of edges in the graph model. In this paper, we borrow the
regularization idea to build model selection by penalizing the sum of norms of the ANOVA decomposition components for the NCRM model. In addition, Bayesian analysis methods can also be used to study interaction effects; for example, Ren et al. [
26] proposed a novel semiparametric Bayesian variable selection model for investigating linear and nonlinear gene–environment interactions simultaneously, allowing for structural identification.
This paper proposes an estimation and model selection approach for the NCRM model (
1). Following [
12,
22], the SS ANOVA decomposition for the tensor product space of the reproducing kernel Hilbert spaces (RKHS) is constructed, and the
penalty approach for the components of the decomposition is implemented. We use estimation procedures under either an
or a joint
and
penalty to fit the NCRM model. We study the interaction effects of the covariate
in model (
1) via ANOVA decomposition of the regression function, where the tensor product RKHS is built based on Gaussian kernels. The decomposition is different from that of Zhai et al. [
9], where they took the covariate as a whole variable and did not consider their interaction effects. Based on the decomposition, model selection with the tensor product RKHS is conducted using the
penalty method. With regard to the covariate
, the models from Wang et al. [
12] are not suitable for analyzing the stroke data. In this paper, we apply the proposed method to stroke rehabilitation data and study the relationship between the movements and the patient’s CAHAI score. Besides the main effects, an interaction effect of the movements is also detected.
The remainder of the article is organized as follows. In
Section 2, we present the tensor product RKHS with the Gaussian kernel and the SS ANOVA decomposition of the regression function. In
Section 3, we show model selection and estimation procedures. The simulation study and application of stroke rehabilitation data are presented in
Section 4 and
Section 5. We conclude in
Section 6.
2. Nonparametric Concurrent Regression Model
For the NCRM model (
1), we consider
, where
for any fixed time
is a function of
s within a space denoted by
,
. Generally,
t and
s can be transformed into
. For simplicity, we let
and
and let
,
, which are independent of
t. Furthermore, we assume that
and
for
are independently and identically distributed in
with mean zero and
. It is shown that the regression function
f is a function of functions with an independent covariate
. To provide a nonparametric estimation of
f, the SS ANOVA decomposition method is used to construct a tensor product space of RKHS to which
f belongs.
When
f is treated as a function with respect to the first argument
, we consider the Sobolev space [
10],
where
can be rewritten as
where
is a constant space,
is a linear function space with
t as an independent variable, and
is a smooth function space orthogonal to the constant space and the linear function space. Reproducing kernels (RK) for these three subspaces are
,
, and
, where
,
and
are defined as
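The kernel formulas referenced above are elided in this version of the text. In the classical smoothing spline ANOVA construction on [0, 1], the three reproducing kernels are built from scaled Bernoulli polynomials; a minimal sketch, assuming that standard cubic-spline construction (not necessarily the exact form used here):

```python
import numpy as np

def k1(t):
    # first scaled Bernoulli polynomial
    return t - 0.5

def k2(t):
    return (k1(t) ** 2 - 1.0 / 12.0) / 2.0

def k4(t):
    return (k1(t) ** 4 - k1(t) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

def rk_constant(s, t):
    """RK of the constant subspace."""
    return 1.0

def rk_linear(s, t):
    """RK of the linear subspace spanned by k1."""
    return k1(s) * k1(t)

def rk_smooth(s, t):
    """RK of the smooth (cubic-spline) subspace orthogonal to the other two."""
    return k2(s) * k2(t) - k4(abs(s - t))
```

The three kernels reproduce, respectively, the constant space, the linear space, and the smooth space orthogonal to both.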
For functional arguments
, RK and its corresponding RKHS for
f as a function of functions in
are constructed as follows. For any
, we construct a Gaussian kernel as
where
. We can show that when the space
is a complete space,
is symmetric and strictly positive definite. The unique RKHS
derived from
is separable and does not contain any non-zero constants. To construct an SS ANOVA decomposition, we let
. Then, the tensor product space in this paper is
with the following decomposition:
Decomposition (
4) is different from that of Zhai et al. [
9] where
is decomposed into the constant space
and another RKHS that does not consider interactions among
.
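In practice, the Gaussian kernel between functional covariates can be approximated from curves observed on a common frequency grid. A sketch under that assumption, with the squared L2 distance approximated by a Riemann sum (the grid spacing `ds` and bandwidth `sigma` are illustrative assumptions):

```python
import numpy as np

def gaussian_kernel_functional(X, Y, sigma=1.0, ds=None):
    """Gaussian kernel between functional observations.

    X, Y: arrays of shape (n, m) and (p, m); each row is a function
    sampled on a common grid of m frequency points. ds is the grid
    spacing used to approximate the squared L2 distance.
    """
    if ds is None:
        ds = 1.0 / X.shape[1]
    # squared L2 distances ||x - y||^2, approximated by a Riemann sum
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2) * ds
    return np.exp(-sq / sigma ** 2)
```

The resulting Gram matrix is symmetric with unit diagonal and is positive definite for distinct inputs, consistent with the properties stated above.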
Next, we consider the tensor product space
which has the following decomposition:
Here, the null space
stands for the main effect of the parametric form of
t,
is the main effect of the non-parametric form of
t,
is the main effect of the non-parametric form of
,
is the linear nonparametric interaction between
t and
,
is the nonparametric-nonparametric interaction between
t and
, and so on,
is the nonparametric-nonparametric interaction between
t and
u, where
. We denote
and
as the basis functions of
. For example, with
, the RKs corresponding to the above sub-RKHS are
for
and
, where the left and right parts stand for the tensor product spaces and their corresponding RKs, respectively.
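The construction above relies on the product rule: the RK of a tensor product of two RKHSs is the product of the component RKs. A minimal sketch, where `k_lin` and `k_gauss` are hypothetical stand-ins for a time-space kernel and a covariate-space Gaussian kernel:

```python
import numpy as np

def tensor_product_rk(k_time, k_cov):
    """RK of a tensor product space: the product of the two component RKs."""
    return lambda a, b: k_time(a[0], b[0]) * k_cov(a[1], b[1])

# hypothetical component kernels, for illustration only
k_lin = lambda s, t: (s - 0.5) * (t - 0.5)            # linear subspace of time
k_gauss = lambda u, v: np.exp(-np.sum((u - v) ** 2))  # Gaussian kernel on covariate

# RK of the (linear in t) x (nonparametric in u) interaction subspace
k_interaction = tensor_product_rk(k_lin, k_gauss)
```

Each interaction sub-RKHS listed above gets its RK this way, by multiplying the RKs of the factor spaces.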
3. Model Selection and Estimation
We let the projection of
f onto
be
,
, and
be the sub-RKHS generated by the tensor product method in
Section 2, where
Q is the number of sub-RKHSs.
penalties are applied to coefficients
for the space
and components of the decomposition of
f (projections of
f onto
). We estimate
f by minimizing the following penalized least squares:
where
,
is the projection operator onto
,
is a norm induced from
,
and
are tuning parameters, and
are pre-specified weights. We may set
when
to avoid penalizing the constant function.
Since the response function is a stochastic process in the
space, there exists a set of orthogonal basis functions
in
, where
is an empirical functional principal component (EFPC) of
([
27]). We let
and
for
and
. We assume that
are bounded linear functionals. With EFPC, functional data can be transformed to scalar data such that modeling and analysis can be conducted using traditional statistical methods. It can be shown that the PLS (
5) based on functional data
reduces to the following PLS based on scalar data
:
By Lemma 3.1 in Wang et al. [
12], minimizing the PLS (
6) is equivalent to minimizing the following PLS:
subject to
for
, where
,
,
are tuning parameters.
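The EFPC transformation described above, from functional responses to scalar projection scores, might be sketched as follows; the eigen-decomposition of a discretized empirical covariance is a standard way to obtain EFPCs, not necessarily the paper's exact implementation:

```python
import numpy as np

def efpc_scores(Y, n_components, ds=None):
    """Empirical functional principal components of response curves.

    Y: (n, m) array, each row one response curve on a common grid
    with spacing ds. Returns (components, scores): components is
    (n_components, m) and scores is (n, n_components).
    """
    n, m = Y.shape
    if ds is None:
        ds = 1.0 / m
    Yc = Y - Y.mean(axis=0)
    cov = (Yc.T @ Yc) * ds / n            # discretized covariance operator
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_components]
    phi = vecs[:, order] / np.sqrt(ds)    # L2-normalized eigenfunctions
    scores = Yc @ phi * ds                # projections onto the EFPCs
    return phi.T, scores
```

The scores are the scalar data on which the reduced PLS operates.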
We let
. To provide an RK with linear combination of its subspaces RK for the space of
, we define a new inner product in
,
where
is the inner product in
. Under the new inner product, the RK of
is
where coefficient
can measure the contribution of the components of the decomposition to the model. Next, we use the reproducing property of the kernel function to transform the infinite-dimensional optimization problem (
7) into a finite-dimensional optimization problem. We let
, which is a subspace of
. Then, any
can be decomposed as
where
,
, and
. We denote
as the evaluation function of point
, and
. Then, we can rewrite the PLS (
7) as
where
. The first equality uses the reproducing property, and the third equality uses the fact that
is orthogonal to
. Minimizing (
9) must have
, and we obtain the following representer theorem:
Theorem 1 (Representer Theorem).
The solution to PLS (9) is
where , , and . From this representer theorem, the PLS (
9) reduces to
where
,
,
. We let
, where the
th element of
is
. We let
,
,
,
,
,
be an
matrix with
as the
element, and
T be a
matrix with
as the
element. Then, the PLS (
11) reduces to
subject to
.
The backfitting algorithm in Wang et al. [
12] is applied to solve the PLS (
12) as follows (Algorithm 1):
Algorithm 1 Model Selection Algorithm
Set initial values , .
repeat
    Update c by minimizing
    Calculate , where R is an matrix with the v-th column being
    Update d by minimizing
    Select tuning parameter M by k-fold cross-validation or the BIC method
    Update by minimizing subject to for and
until c, d and converge
return c, d and
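A highly simplified sketch of the alternating scheme in Algorithm 1, assuming a squared-error loss, proximal-gradient updates for the L1-penalized steps, and illustrative inputs (a null-space design matrix `S` and per-component Gram matrices `Sigma_list`); tuning-parameter selection by cross-validation or BIC is omitted:

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def backfit(y, S, Sigma_list, lam_c=0.01, lam_theta=0.01, n_iter=30):
    """Alternate updates of d (null-space coefficients), c (kernel
    coefficients) and theta (nonnegative component weights)."""
    n = len(y)
    d = np.zeros(S.shape[1])
    c = np.zeros(n)
    theta = np.ones(len(Sigma_list))
    for _ in range(n_iter):
        Sigma = sum(th * Sig for th, Sig in zip(theta, Sigma_list))
        # d-step: least squares on the residual
        d = np.linalg.lstsq(S, y - Sigma @ c, rcond=None)[0]
        r = y - S @ d
        # c-step: proximal gradient for the L1-penalized kernel coefficients
        L = np.linalg.norm(Sigma, 2) ** 2 + 1e-12
        for _ in range(20):
            grad = Sigma @ (Sigma @ c - r)
            c = soft_threshold(c - grad / L, lam_c / L)
        # theta-step: nonnegative, L1-penalized component weights
        G = np.column_stack([Sig @ c for Sig in Sigma_list])
        Lg = np.linalg.norm(G, 2) ** 2 + 1e-12
        for _ in range(20):
            grad = G.T @ (G @ theta - r) + lam_theta
            theta = np.maximum(theta - grad / Lg, 0.0)
    return c, d, theta
```

Components whose weight in `theta` shrinks to zero are dropped from the model, which is how selection happens inside the fit.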
4. Simulation
In this section, numerical experiments are conducted to evaluate the performance of the proposed model selection approach. Functional covariates take
, where
, and
follows a Gaussian process with mean function
. The kernel function for the Gaussian process is the RBF kernel
for
and the rational quadratic kernel
for
. Three functions for
are presented as follows: for
[0, 1],
We see that
has the main effect of
t,
consists of three main effects and the nonparametric-nonparametric interaction of
and
,
consists of the main effect of
t, and the nonparametric-nonparametric interaction of
and
. Random error
follows
and
. All simulations are repeated 200 times.
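The Gaussian-process covariates described above might be generated as follows; the length-scale and `alpha` parameters are illustrative, not the values used in the paper's simulations:

```python
import numpy as np

def rbf_kernel(s, t, length=0.2):
    return np.exp(-(s[:, None] - t[None, :]) ** 2 / (2 * length ** 2))

def rq_kernel(s, t, length=0.2, alpha=1.0):
    return (1 + (s[:, None] - t[None, :]) ** 2 / (2 * alpha * length ** 2)) ** (-alpha)

def sample_gp(grid, kernel, mean=None, n_samples=1, jitter=1e-6, rng=None):
    """Draw sample paths from a Gaussian process on a grid via Cholesky."""
    if rng is None:
        rng = np.random.default_rng()
    K = kernel(grid, grid) + jitter * np.eye(len(grid))  # jitter for stability
    L = np.linalg.cholesky(K)
    paths = rng.normal(size=(n_samples, len(grid))) @ L.T
    if mean is not None:
        paths += mean(grid)
    return paths
```

Swapping `rbf_kernel` for `rq_kernel` switches between the two covariance choices mentioned above.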
We generate
n samples
as training data, and
samples
as test data. For comparison, we evaluate the performance using the following root mean square error (RMSE) on the test data:
where
is the norm of
.
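One common form of such an RMSE, assuming predicted and true response curves evaluated on a common grid with the L2 norm approximated by a Riemann sum (the exact elided formula may differ):

```python
import numpy as np

def rmse_functional(y_true, y_pred, ds=None):
    """Root mean squared L2 error over predicted curves.

    y_true, y_pred: (n_test, m) arrays on a common grid with spacing ds.
    """
    if ds is None:
        ds = 1.0 / y_true.shape[1]
    # squared L2 norm of each residual curve, then averaged over curves
    sq_l2 = ((y_true - y_pred) ** 2).sum(axis=1) * ds
    return np.sqrt(sq_l2.mean())
```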
The proposed model selection method is used to train the model and predict the test data, denoted by
. Without model selection, we use the
penalty method to estimate the NCRM model denoted by
, which minimizes the following objective function,
where
is the tuning parameter. After model selection, the selected model is estimated with the
penalty method, which is denoted by
.
Table 1 shows the average RMSE and standard deviation in parentheses for these three estimation methods,
,
and
. We see that under models M1 and M3,
has the smallest RMSE among the three estimation methods. Under model M2,
has better performance than
and comparable results with those of
. In addition, for the three different methods, prediction performance improves as the
decreases and the training sample size increases.
To evaluate the performance of model selection by the
penalty method, we take three measurement indices in Wang et al. [
12], specificity (SPE), sensitivity (SEN) and
scores,
where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The non-zero components of the decomposition of the regression function are considered as positive samples. For
, an estimated value larger than 0 is counted as a true positive.
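These three indices can be computed directly from the selection pattern; a minimal sketch over boolean indicators of which components are non-zero:

```python
def selection_metrics(true_nonzero, est_nonzero):
    """Sensitivity, specificity and F1 for component selection.

    true_nonzero, est_nonzero: boolean sequences over decomposition
    components; True marks a non-zero (positive) component.
    """
    tp = sum(t and e for t, e in zip(true_nonzero, est_nonzero))
    tn = sum((not t) and (not e) for t, e in zip(true_nonzero, est_nonzero))
    fp = sum((not t) and e for t, e in zip(true_nonzero, est_nonzero))
    fn = sum(t and (not e) for t, e in zip(true_nonzero, est_nonzero))
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return sen, spe, f1
```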
Table 2 shows the sensitivities, specificities, and F1 scores. Overall, the
penalty method performs well in different simulation settings. In addition, model selection becomes better with decreasing
and increasing training sample size.
5. Application
The proposed model selection approach is applied to analyze stroke rehabilitation data with 70 stroke survivors ([
7]).
The data consist of 34 acute patients whose stroke occurred less than a month earlier and 36 chronic patients whose stroke occurred more than six months earlier. To improve upper limb functions for stroke patients, a convenient home-based rehabilitation system based on action video games with 3D-position movement behaviors has been developed [
7,
8]. The patients played the movement game at scheduled times. For each visit time
t, the impairment level of subject
i was assessed using a measure called CAHAI (Chedoke Arm and Hand Activity Inventory) score, denoted as
, and movements such as forward circular movement and sawing movement were recorded. In this paper, three movements, forward circular movement of the paretic limb along the
x-axis (
), sawing movement of the paretic limb along the
y-axis (
), and the movement of the non-paretic limb along the
x-axis (
) are taken as functional covariates. For the purpose of illustrating the proposed method, we use the data from acute patients. During the three-month study period, each acute patient received up to seven assessments, which resulted in 173 observations. CAHAI scores were normalized before analysis.
In this paper, we focus on the interaction effect upon the order of two, and take the following decomposition:
for
and
. Readers can also choose various kinds of SS ANOVA decomposition by merging kernel functions according to their own needs. From
Section 3, we have
where coefficient
for kernel function
provides levels of contribution of
to the overall model.
The penalty method with
regularization for model selection is applied to stroke rehabilitation data. Parameters
are computed, and values larger than 0 are
,
,
,
and
. This shows that the main effects of
,
and
, the linear nonparametric interaction of
t and
and the nonparametric-nonparametric interaction of
and
have nonzero contributions to the CAHAI score. Thus, the three movements, forward circular movements of the paretic limb, sawing movements of the paretic limb and movements of the non-paretic limb, may be helpful to the recovery of stroke patients. In addition, the interaction of sawing movements of the paretic limb and movements of the non-paretic limb may contribute to the level of daily life dependence or upper limb function impairment.
Figure 1 plots the estimates of nonparametric regression functions for four stroke patients. We can see that the regression function in the NCRM model has the same trend as the CAHAI scores; on the whole, both show an increasing trend with fluctuations, which suggests that movements may improve upper limb function for stroke patients.
Prediction performance of the proposed method is evaluated using a tenfold cross-validation,
where
is the predicted value of
based on the fitted selected model with the
penalty and the data excluding the
jth fold. The RPE for the stroke data is 1.0690, which is smaller than the 1.1700 obtained by the method of Zhai et al. [
9].
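A generic k-fold cross-validation loop of the kind described above might look like this; since the exact RPE formula is given by the elided equation, this sketch reports a plain mean squared held-out error with user-supplied `fit` and `predict` callbacks:

```python
import numpy as np

def kfold_prediction_error(y, X, fit, predict, k=10, rng=None):
    """Generic k-fold cross-validated squared prediction error.

    fit(X_train, y_train) -> model; predict(model, X_test) -> y_hat.
    Returns the mean squared held-out prediction error over all folds.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(y)
    folds = np.array_split(rng.permutation(n), k)
    err = 0.0
    for j in range(k):
        test = folds[j]
        train = np.concatenate([folds[i] for i in range(k) if i != j])
        model = fit(X[train], y[train])
        err += np.sum((y[test] - predict(model, X[test])) ** 2)
    return err / n
```

Each observation is held out exactly once, so the accumulated squared errors divided by n give an honest out-of-sample error estimate.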