Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model

Ran, Mengfei; Yang, Yihe

doi:10.3390/math10224322

Open AccessArticle

Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model

by

Mengfei Ran

¹

and

Yihe Yang

^2,*

¹

Graduate School of Engineering Science, Osaka University, Osaka 560-0043, Japan

²

Department of Population and Quantitative Health Science, Case Western Reserve University, Cleveland, OH 44106, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(22), 4322; https://doi.org/10.3390/math10224322

Submission received: 16 October 2022 / Revised: 7 November 2022 / Accepted: 12 November 2022 / Published: 17 November 2022

(This article belongs to the Special Issue Advances of Functional and High-Dimensional Data Analysis)

Download

Browse Figures

Versions Notes

Abstract

:

The estimation of large functional and longitudinal data, which refers to the estimation of mean function, estimation of covariance function, and prediction of individual trajectory, is one of the most challenging problems in the field of high-dimensional statistics. Functional Principal Components Analysis (FPCA) and Functional Linear Mixed Model (FLMM) are two major statistical tools used to address the estimation of large functional and longitudinal data; however, the former suffers from a dramatically increasing computational burden while the latter does not have clear asymptotic properties. In this paper, we propose a computationally effective estimator of large functional and longitudinal data within the framework of FLMM, in which all the parameters can be automatically estimated. Under certain regularity assumptions, we prove that the mean function estimation and individual trajectory prediction reach the minimax lower bounds of all nonparametric estimations. Through numerous simulations and real data analysis, we show that our new estimator outperforms the traditional FPCA in terms of mean function estimation, individual trajectory prediction, variance estimation, covariance function estimation, and computational effectiveness.

Keywords:

functional data analysis; functional linear mixed model; functional principal components analysis; longitudinal data analysis

MSC:

62G20; 62J10; 62R10

1. Introduction

Functional Data Analysis (FDA) is a field of statistics that examines data to provide information about curves, surfaces, or anything else that varies along a continuum [1]. The concrete continuum of these functions can be time, spatial location, wavelength, and probability, with applications to biomechanics, biomedicine, ecology, epidemiology, and neurology, among many others [2,3]. Functional data can be classified into common design data and independent design data, where the sampling locations of all individuals are the same under the former setting, while the sampling locations are individual-wise under the latter setting [4]. Longitudinal data can be regarded as a specific type of functional data where the underlying continuum is just time [5]. In the field of longitudinal data analysis, the counterparts of common design and independent design are called balanced longitudinal data and unbalanced longitudinal data, respectively.

The coronavirus 2019 (COVID-19) data is typical balanced longitudinal data, which records the daily infections (transformed in logarithm) of the COVID-19 pandemic from 51 states of the United States (US), where the date is from 16 March 2020 to 14 August 2020 (Johns Hopkins COVID-19 Case Tracker: https://www.kaggle.com/datasets/thecansin/johns-hopkins-covid19-case-tracker?resource=download). Figure 1 demonstrates the profiles of daily increasing infections and the related trajectories predictions separately yielded with the R package mgcv [6] of six states in the US. From this figure, it is easy to see some basic features of large longitudinal data: (1) functionality, the daily infections of states are likely some smooth curves of time; (2) heterogeneity, different individuals usually have different trends over time; and (3) high-dimensionality, the number of observations (

T = 152

) much larger than the number of individuals (

N = 51

). How to deal with this kind of large functional and longitudinal data is one of the most challenging problems in the field of high-dimensional statistics.

Functional Principal Components Analysis (FPCA) and Functional Linear Mixed Model (FLMM) are two common approaches that estimate the dominant modes of variation of a sample of random trajectories around an overall mean function. FPCA and FLMM share the same principle to estimate the mean function, but apply totally different strategies to estimate the covariance function and predict the individual trajectory. Specifically, FPCA first estimates the two-dimensional covariance function, then yields the eigenfunctions through PCA, and finally predicts the individual trajectory with the estimated eigenfunctions. In contrast, FLMM parameterizes the individual trajectories through basis functions, in particular, it specifies a Gaussian prior distribution on the coefficients of basis functions, then estimates the covariance matrix of these random coefficients using REstricted Maximum Likelihood (REML) [7], and eventually predicts the individual trajectories through the Best Linear Unbiased Prediction (BLUP) [8] and yields the covariance function of random curves using a weighted sum of inner products of basis functions. Commonly-employed nonparametric techniques for FPCA include the kernel method and local polynomial modeling ([9,10,11] and smoothing splines ([12,13,14,15,16]), while frequently-used basis expansion functions include B-splines ([17,18,19,20]), wavelets ([21,22,23]), a combination of linear mixed-effects modeling and local polynomial smoothing ([24,25]), etc. See more aspects of functional data in the following monographs ([1,26]) and review papers ([3,27,28]).

One of the greatest advantages of FPCA may be its clear asymptotic properties. Under mild regularity conditions, Yao et al. [11] showed the convergence rates of estimates of mean function covariance function, variance, eigenfunctions, and eigenvalues. Li and Hsing [10] further derived the strong uniform convergence rates of these estimates. Cai and Yuan [4] investigated the minimax risks of the mean function and covariance function estimates and Cai and Yuan [29] showed the minimax risk of mean function estimate in functional linear regression. However, because FPCA needs to estimate a two-dimensional covariance surface, its computational burden will dramatically increase as the number of observations T rises. In addition, since most FPCA methods employ local polynomial modeling [30] as the estimator, it is very difficult to select the optimal bandwidth in the two-dimensional kernel function in practice. On the other hand, FLMM suffers from more problems compared with FPCA, preventing its practical application in the field of FDA. First, FLMM requires estimating the covariance matrix of the basis expansion coefficients (which are treated as normal variables in LMM), but the estimation of this covariance matrix is usually unreliable if the number of employed bases is large. James et al. [17] provided a dimension reduction to estimate this covariance matrix, which resulted in a more complex estimator. Besides, the asymptotic properties of FLMM are difficult to study. Proper regularity conditions and proof techniques are lacking for FLMM.

In this paper, we propose a computationally effective estimator of large functional and longitudinal data by using FLMM, in which all the parameters can be automatically estimated. We also investigate the large sample property of the proposed automatic and flexible estimation of large functional data. Under mild regularity conditions, we prove that the estimation error of the mean curve estimator is bounded by

O (N^{- 1} + T^{- 2 s})

and the prediction error of the individual trajectory estimator is bounded by

O (T^{- 2 s / (2 s + 1)})

, where

s > 1

is a constant governing the smoothness of tge mean function and individual trajectory. In particular, both these two convergence rates reach the minimax lower boundaries of the mean function estimation and individual trajectory prediction derived by Cai and Yuan [4], meaning that our estimation enjoys the minimax efficiency. To the best of our knowledge, our work is the first to investigate the large sample property of FLMM estimation when both T and N are diverging.

The rest of the paper is organized as follows. In Section 2, we present the primal representation of FLMM to analyze the functional data. In Section 3, the new estimator of large balanced longitudinal data and its large sample property are illustrated. In Section 4, we exhibit the simulation studies. In Section 5, we apply our method to study the COVID-19 data. The concluding remarks are summarized in Section 6 and all the technical proofs of the theorems are relegated to the Appendix.

2. Preliminary

In this section, we introduce the functional data and the top two statistical methods, i.e., FPCA and FLMM, to deal with the functional data briefly.

2.1. Settings

For a vector

a = {(a_{j})}_{p \times 1}

,

{| | a | |}_{q} = (\sum_{j = 1}^{p} | a_{j} {|^{q})}^{1 / q}

with

q \in [0, \infty]

. For a symmetric matrix

A = {(A_{i j})}_{p \times p}

,

λ_{max} (A)

and

λ_{min} (A)

denote the maximum and minimum eigenvalues of matrix

A

, and

{| | A | |}_{q} = {max {| | A a | |}_{q} {, | | a | |}_{q} = 1}

. Besides,

a_{n} ≍ b_{n}

if there are positive constants c and C such that

c \leq a_{n} / b_{n} \leq C

. In addition,

O (\cdot)

and

o (\cdot)

are the infinitely large and small quantities, respectively, while

O_{P} (\cdot)

and

o_{P} (\cdot)

mean that such relationships hold with a probability tending to 1.

2.2. Functional Data Model

Let

X (t)

be a second-order stochastic process defined in the compact interval

T

, which is usually set as

T = [0, 1]

for convenience. The covariance function of

X (t)

is given by

\begin{matrix} C (s, t) = E [{X (s) - E (X (s))} {X (t) - E (X (t))}], (s, t) \in T \times T . \end{matrix}

(1)

Besides, let

X_{1} (t), \dots, X_{N} (t)

be N independent realizations of

X (t)

,

{t_{1}, \dots, t_{T}}

be T sampling locations of the realizations, and the observations of

X_{i} (t_{j})

be

\begin{matrix} Y_{i j} = X_{i} (t_{j}) + ε_{i j}, (i, j) \in {1, \dots, N} \times {1, \dots, T}, \end{matrix}

(2)

where

ε_{i j}

are independent and identically distributed (IID) random errors with mean 0 and variance

σ^{2}

. In addition, the sampling location is considered as fixed numbers in

T

, and

{X_{i} (t_{j})}

and

{e_{i j}}

are two mutually independent groups. In a more general case, the sampling locations can vary from individual, i.e.,

\begin{matrix} Y_{i j} = X_{i} (t_{i j}) + ε_{i j}, (i, j) \in {1, \dots, N} \times {1, \dots, T_{i}}, \end{matrix}

(3)

where

t_{i j}

is a random sampling location and

T_{i}

is the individual-wise sampling locations. In the literature, the setting of functional data subject to (2) is called common design, while the setting subject to (3) is termed independent design. In longitudinal data analysis where t is just time, the data subject to (2) is named as balanced longitudinal data or panel data, while the data subject to (3) is named as the unbalanced longitudinal data.

2.3. Methodologies of FPCA and FLMM

FPCA is almost the most frequently used technique to deal with the functional data in which the Karhunen–Loève representation of the stochastic process plays a central role. Specifically, let covariance function

C (s, t)

satisfy the definition of a Mercer kernel, i.e.,

\begin{matrix} \sum_{i = 1}^{n} \sum_{j = 1}^{n} C (s_{i}, t_{j}) c_{i} c_{j} \geq 0 \end{matrix}

(4)

for all finite sequence of points

(s_{i}, t_{j}) \in T \times T

and all choices of real numbers

c_{1}, \dots, c_{n}

. Then, according to Mercer’s theorem, there consequently exists a series of eigenfunctions

{ϕ_{k} (t)}

and a series of non-increasing eigenvalues

{λ_{k}}

satisfying

\sum_{k = 1}^{\infty} λ_{k} < \infty

such that

\begin{matrix} C (s, t) = \sum_{k = 1}^{\infty} λ_{k} ϕ_{k} (t) ϕ_{k} (s), (s, t) \in T \times T . \end{matrix}

(5)

In particular, under the condition of Mercer kernel, the Karhunen–Loève representation theory is able to guarantee that the ith realization of

X (t)

can be expressed as

\begin{matrix} X_{i} (t) = μ (t) + \sum_{k = 1}^{\infty} ξ_{i k} ϕ_{k} (t), t \in T, \end{matrix}

(6)

where

μ (t)

is termed the mean function and

ξ_{i 1}, \dots, ξ_{i k}, \dots

are uncorrelated random coordinates with mean 0 and variance E

(ξ_{i k}^{2}) = λ_{k}

.

FLMM deals with the functional data in a more direct way than FPCA. Specifically, FLMM expresses each random function by some known basis functions:

\begin{matrix} X_{i} (t) = \sum_{k = 1}^{p} Ψ_{k} (t) b_{i k} + δ_{i} (t), \end{matrix}

(7)

where

{Ψ_{k} (t)}

are a set of basis functions,

{b_{i k}}

are the corresponding random coordinates, and

δ_{i} (t)

is the approximation bias. In practice, the basis functions can be the Tikhonov basis (as defined in

T = [0, 1]

):

\begin{matrix} Ψ_{k} (t) = \sqrt{2} cos ((k - 1) π t) . \end{matrix}

(8)

Alternatively, B-splines [31] are common bases that are generated in an iterative procedure. If

δ_{i} (t)

is negligible, the uncertainty of random function

X_{i} (t)

is almost described by the random coordinates

{b_{i k}}

. In the literature, it is routine to assume

\begin{matrix} (\begin{matrix} b_{i 1} \\ b_{i 2} \\ ⋮ \\ b_{i p} \end{matrix}) \sim N (\begin{matrix} (\begin{matrix} β_{1} \\ β_{2} \\ ⋮ \\ β_{p} \end{matrix}), (\begin{matrix} Γ_{11} & Γ_{12} & \dots & Γ_{1 p} \\ Γ_{21} & Γ_{22} & \dots & Γ_{2 p} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ Γ_{p 1} & Γ_{p 2} & \dots & Γ_{p p} \end{matrix}) \end{matrix}), \end{matrix}

(9)

where

β_{1}, \dots, β_{p}

are the mean parameters and

Γ = {(Γ_{i j})}_{p \times p}

is a covariance matrix. As a result,

\begin{matrix} μ (t) \approx \sum_{k = 1}^{p} Ψ_{k} (t) β_{k}, C (s, t) \approx \sum_{k = 1}^{p} \sum_{h = 1}^{p} Γ_{k h} Ψ_{k} (s) Ψ_{h} (t) . \end{matrix}

(10)

In addition, if

{Ψ_{k} (t)}

are chosen as the eigenfunctions of the covariance function

C (s, t)

, the covariance matrix

Γ

will reduce into a diagonal matrix with the eigenvalues

{λ_{k}}

. On the other hand, FPCA can reduce to an FLMM if the eigenfunctions are rotated for a better practical interpretation, which is subject to a similar idea of factor analysis [32]. Generally, since

C (s, t)

is never known in advance, FLMM will employ certain commonly-used basis functions and estimate

Γ

empirically.

2.4. Comparison between FPCA and FLMM

FLMM is more efficient than FPCA in terms of both estimation and prediction. Due to the space limitation, we directly give the conclusions and show the estimation procedures of FPCA and FLMM in Appendix B.

We illustrate the features of FPCA from two aspects: estimation and prediction. For estimation, FLMM does not require estimating a two-dimensional covariance surface

C (s, t)

, which is the most time-consuming component for FPCA. Besides, all terms in FLMM can be automatically estimated using many mature and effective algorithms, such as the Newton–Raphson algorithm. Whereas the main approaches for FPCA to estimate the mean

μ (t)

and covariance

C (s, t)

are local linear modeling [10], which is based on the Kernel function, it requires determining three different bandwidths

h_{u}, h_{C}, h_{v}

(see details in Appendix B), which is supposedly very complex in practice. For prediction, FPCA will use all the observations in addition to the local linear coefficients

{\hat{a}}_{0}, \dots, {\hat{c}}_{1}

, which are generated in the estimation process, to predict the values of

μ (t)

,

X_{i^{*}} (t)

,

C (s, t)

at any new locations

t_{i^{*} j}

(and

s_{i^{*} j}

), which is extremely costly in implementation. In contrast, predicting these functions is trivial for FLMM: it only needs to generate a new vector

Ψ_{i^{*}} = {(Ψ (t_{i^{*} 1}), \dots, Ψ (t_{i^{*} T_{i^{*}}}))}^{⊤}

and then predict the values by using

\hat{β}

,

{\hat{b}}_{i}

,

\hat{Γ}

and

{\hat{σ}}^{2}

.

However, the most serious problem of FLMM is the lack of statistical theory in comparison with the numerous theoretical investigations of FPCA. One theoretical burden of FLMM is inherited from the LMM and generalized linear model (GLM, [33]): the statistical properties of LMM and GLM are different to analyze. As far as we are concerned, the tool to investigate the asymptotic properties of GLM is first given by Vonesh et al. [34], who pointed out that the convergence rate of the mean parameter is of

O (T_{min}^{- 1 / 2})

, where

T_{min}

is the minimum number of observations among all individuals. However, this convergence rate is not ideal because

T_{min}

can be less than 10 in many longitudinal data. Many works aim to improve the GLM estimation by removing the high-order bias of Laplace approximation, such as [35,36,37]; however, these bias-corrections make the asymptotic properties more difficult to analyze.

Another theoretical burden of FLMM is inherited from the basis expansion method, approximating a smooth function through a series of basis functions. As Ruppert et al. [38] once commented: “The literature on inference in smoothing is large and varied. Much of it is in the local polynomial or kernel smoothing context, where theoretical properties are more tractable. Inference for spline-based smoothing is less studied.” Indeed, it is unclear under what conditions the FLMM using the basis expansion method is consistent. In the next section, we improve the traditional FLMM in terms of these two burdens.

3. Optimal Estimation for Large Balanced Longitudinal Data

In this paper, we propose a novel FLMM estimation for large balanced longitudinal data, which is effective in terms of computation and optimal in terms of statistical theory. Our new FLMM estimation for large balanced longitudinal data is motivated by the COVID-19 data, in which the number of sampling locations (i.e., the dates from 16 March 2020 to 14 August 2020) is much larger than the number of individuals (i.e., the 50 states plus the District of Columbia). The traditional estimation of FLMM may encounter numerical instability and becomes extremely time-consuming because the complexity of the REML estimator dramatically increases with the dimension of the covariance matrix

Σ_{y}

. In contrast, our new estimator for the variance components remains efficient and stable even when the dimension of

Σ_{y}

is very high. Note that under the setting of balanced longitudinal data, the sampling locations are the same for all individuals, i.e.,

t = {(t_{1}, \dots, t_{T})}^{⊤}

, the covariance matrix

Σ_{y_{i}}

will become a universal one

Σ_{y}

and the basis function matrix

Ψ_{i}

will reduce to a universal matrix

Ψ

for all individuals.

3.1. Novel Estimating Procedure

The biggest difference between our new estimation and the traditional one is that we will use different numbers of basis functions to estimate the mean function and the individual trajectory. In order to guarantee both the mean function estimation and individual trajectory prediction meet the optimal convergence rates, we approximate the mean function and individual trajectory as follows. Specifically:

\begin{matrix} μ (t) \approx \sum_{k = 1}^{q} Ψ_{k} (t) β_{k}, X_{i} (t) \approx μ (t) + \sum_{k = 1}^{p} Ψ_{k} (t) c_{i k}, \end{matrix}

(11)

where

1 \leq p \leq q \leq T

. Regarding the matrix form, we have

\begin{matrix} μ \approx Ψ_{q} β, X_{i} \approx μ + Ψ_{p} c_{i}, \end{matrix}

(12)

where

Ψ_{q} = {(Ψ_{k} (t_{j}))}_{T \times q}

,

Ψ_{p} = {(Ψ_{k} (t_{j}))}_{T \times p}

,

β = {(β_{1}, \dots, β_{q})}^{⊤}

,

c_{i} = {(c_{i 1}, \dots, c_{i p})}^{⊤}

. Here, the vector of random coordinates

c_{i}

is considered to follow a normal distribution

\begin{matrix} c_{i} \sim N (0, Γ), \end{matrix}

(13)

where

Γ = (Γ_{j k})

is a

(p \times p)

positive definite symmetric matrix. It is used to describe the local variability of individual trajectory

X_{i} (t)

beyond the mean function

μ (t) .

The first step of our estimator is estimating the mean coefficient

β

:

\begin{matrix} \hat{β} & = arg min_{β} \{\sum_{i = 1}^{N} {(Y_{i} - Ψ_{q} β)}^{⊤} {\hat{Σ}}_{y}^{- 1} (Y_{i} - Ψ_{q} β) + γ β^{⊤} Q β\} \\ = {(N Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q} + γ Q)}^{- 1} Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} (\sum_{i = 1}^{N} Y_{i}), \end{matrix}

(14)

where

{\hat{Σ}}_{y} = {\hat{σ}}^{2} I_{T} + Ψ_{p} \hat{Γ} Ψ_{p}^{⊤}

,

γ

is a smoothing parameter, and

Q

is a known semi-positive definite symmetric matrix ensuring

\begin{matrix} \int_{t \in T} {[μ^{^{″}} (t)]}^{2} d t \approx β^{⊤} Q β . \end{matrix}

(15)

Compared with traditional FLMM, which estimates

μ (t)

by (A51), we add a ridge penalty

β^{⊤} Q β

to control the smoothness of

μ (t)

. If

Ψ (t)

is chosen as the Tikhonov bases, then

Q

is a diagonal matrix with the jth entry

P_{j j} = 2 {((j - 1) π)}^{4}

. If

Ψ (t)

is chosen as the Bsplines, then

Q = D_{2}^{⊤} D_{2}

where the expression of

D_{2}

can be found in Eilers and Marx [31]. In addition, the covariance matrix of

\hat{β}

is given by

\begin{matrix} cov (\hat{β}) = {(N Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q} + γ Q)}^{- 1} . \end{matrix}

(16)

When T is large, we recommend the Sherman–Morrison–Woodbury formula to yield

{\hat{Σ}}_{y}^{- 1}

:

\begin{matrix} {\hat{Σ}}_{y}^{- 1} = \frac{1}{{\hat{σ}}^{2}} (I_{T} - Ψ_{p} {(Ψ_{p}^{⊤} Ψ_{p} + {\hat{σ}}^{2} {\hat{Γ}}^{- 1})}^{- 1} Ψ_{p}^{⊤}) . \end{matrix}

(17)

The mean function is then given by

\hat{μ} = Ψ_{q} \hat{β}

and its asymptotic covariance matrix is

\begin{matrix} cov (\hat{μ}) = Ψ_{q} {(N Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q} + γ Q)}^{- 1} Ψ_{q}^{⊤} . \end{matrix}

(18)

The confidence band of

\hat{μ}

is constructed by using the diagonal elements of

cov (\hat{μ})

.

The second step is to predict

c_{i}

by using BLUP:

\begin{matrix} \hat{c_{i}} & = arg min_{c_{i}} {\frac{1}{{\hat{σ}}^{2}} | | Y_{i} - \hat{μ} - Ψ_{p} c_{i} {| |}_{2}^{2} + c_{i}^{⊤} {\hat{Γ}}^{- 1} c_{i}} \\ = (Ψ_{p}^{⊤} Ψ_{p} + {\hat{σ}}^{2} {\hat{Γ}}^{- 1}) Ψ_{p}^{⊤} (Y_{i} - \hat{u}) . \end{matrix}

(19)

This estimator is called BLUP because

\hat{c_{i}}

can be regarded as the mean of the posterior distribution of

c_{i}

. The prediction of

X_{i}

is

{\hat{X}}_{i} = \hat{μ} + Ψ {\hat{c}}_{i}

. In addition, the asymptotic covariance matrix of

{\hat{c}}_{i}

is

\begin{matrix} cov ({\hat{c}}_{i}) = {\hat{σ}}^{2} {(Ψ_{p}^{⊤} Ψ_{p} + {\hat{σ}}^{2} {\hat{Γ}}^{- 1})}^{- 1}, \end{matrix}

(20)

and the asymptotic covariance matrix of

{\hat{X}}_{i}

is given by

\begin{matrix} cov ({\hat{X}}_{i}) = Ψ_{q} {(N Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q} + γ Q)}^{- 1} Ψ_{q}^{⊤} + {\hat{σ}}^{2} Ψ_{p} {(Ψ_{p}^{⊤} Ψ_{p} + {\hat{σ}}^{2} {\hat{Γ}}^{- 1})}^{- 1} Ψ_{p}^{⊤}, \end{matrix}

(21)

which is used to constructed the confidence band of

{\hat{X}}_{i}

.

The novelty of our new estimating procedure is the estimator of

Γ

and

σ^{2}

. Specifically, our method employs the Laplace Approximation Marginal Likelihood (LAML) [39] to estimate

Γ

and

σ^{2}

, which has a much lower computational cost than REML. The objective function of

Γ

and

σ^{2}

based on LAML is given by

\begin{matrix} L (σ^{2}, Γ) = & \sum_{i = 1}^{N} \{\frac{1}{{\hat{σ}}^{2}} | | Y_{i} - \hat{μ} - Ψ_{p} {\hat{c}}_{i} {| |}_{2}^{2} + {\hat{c}}_{j}^{⊤} Γ^{- 1} {\hat{c}}_{j} + log det Γ^{- 1} + log (σ^{2} I_{T})\} \\ + log det (\frac{Ψ_{p}^{⊤} Ψ_{p}}{σ^{2}} + Γ^{- 1}) . \end{matrix}

(22)

In this objective function, the first four components come from the joint log-likelihood function of

{Y_{i}}

and

{c_{i}}

, while the fifth term is resulted by the Laplace approximation of integral

\int f (Y_{i} | c_{i}) f (c_{i}) d c_{i}

, where

f (Y_{i} | c_{i})

is the density function of

Y_{i}

conditioned on

c_{i}

and

f (c_{i})

is the density function of

c_{i}

. The derivative of

L (σ^{2}, Γ)

with respective to

σ^{- 2}

is

\begin{matrix} \frac{\partial L (σ^{2}, Γ)}{\partial σ^{- 2}} = 0 = - N T σ^{2} + \sum_{i = 1}^{N} (| | Y_{i} - \hat{μ} - Ψ_{p} {\hat{c}}_{i} {| |}_{2}^{2}) + trace (Δ (σ^{2}, Γ) Ψ_{p}^{⊤} Ψ_{p}), \end{matrix}

(23)

where

Δ (σ^{2}, Γ) = {(Ψ_{p}^{⊤} Ψ_{p} / σ^{2} + Γ^{- 1})}^{- 1}

. As a result, the fixed-point iteration for

σ^{2}

is

\begin{matrix} σ^{2 [t + 1]} = \frac{1}{N T} \{\sum_{i = 1}^{N} (| | Y_{i} - \hat{μ} - Ψ_{p} {\hat{c}}_{i} {| |}_{2}^{2}) + trace (Δ (σ^{2 [t]}, Γ^{[t]}) Ψ_{p}^{⊤} Ψ_{p})\}, \end{matrix}

(24)

where

σ^{2 [t]}

and

Γ^{[t]}

are the current estimates. On the other hand, the derivative of

L (σ^{2}, Γ)

with respective to

Γ^{- 1}

is

\begin{matrix} \frac{\partial L (σ^{2}, Γ)}{\partial Γ^{- 1}} = - N Γ + \sum_{i = 1}^{N} ({\hat{c}}_{i} {\hat{c}}_{i}^{⊤}) + Δ (σ^{2}, Γ) . \end{matrix}

(25)

Hence, the fixed-point iteration for

Γ

is

\begin{matrix} Γ^{[t + 1]} = \frac{1}{N} \{\sum_{i = 1}^{N} ({\hat{c}}_{i} {\hat{c}}_{i}^{⊤}) + Δ (σ^{2 [t]}, Γ^{[t]})\} . \end{matrix}

(26)

We implement these two fixed-point iterations iteratively and regard the stable solutions as

{\hat{σ}}^{2}

and

\hat{Γ}

. Likewise, we also implement the estimator of

β

, the predictors of

{c_{i}}

, and the estimator of

Γ

and

σ^{2}

iteratively, and regard the stable solutions as the outputting estimates of

β

,

{c_{i}}

,

Γ

, and

σ^{2}

.

The reason why LAML is more efficient than REML is illustrated as follows. The objective function of REML with respect to balanced longitudinal data is

\begin{matrix} R (ϑ) = \sum_{i = 1}^{N} ({(Y_{i} - Ψ_{q} \hat{β})}^{⊤} Σ_{y}^{- 1} (Y_{i} - Ψ_{q} \hat{β}) + log det (Σ_{y})) + log det (Ψ_{q}^{⊤} Σ_{y}^{- 1} Ψ_{q}), \end{matrix}

(27)

where

ϑ = (σ^{2}, vec (Γ))

and

Σ_{y} \approx σ^{2} I_{T} + Ψ_{p} Γ Ψ_{p}^{⊤}

. Because

σ^{2}

and

Γ

are involved in

Σ_{y}

, it is impossible to obtain fixed-point iterations for

σ^{2}

and

Γ

separately as LAML. In other words, it seems that the only way to jointly estimate

σ^{2}

and vec(

Γ

) is by using the Fisher-scoring algorithm. The first-order derivative of

R (ϑ)

with respect to

ϑ_{k}

is

\begin{matrix} \frac{\partial R (ϑ)}{\partial ϑ_{k}} = 0 = - \sum_{i = 1}^{N} {(Y_{i} - Ψ_{q} \hat{β})}^{⊤} Σ_{y}^{- 1} \frac{\partial Σ_{y}}{\partial ϑ_{k}} Σ_{y}^{- 1} (Y_{i} - Ψ_{q} \hat{β}) + trace (P \frac{\partial Σ_{y}}{\partial ϑ_{k}}), \end{matrix}

(28)

while the expectation of second-order derivative of

R (ϑ)

with respect to

ϑ_{k}

and

ϑ_{h}

, i.e., the (

k, h

)th element of the Hessian matrix of

R (ϑ)

, is given by

\begin{matrix} E (\frac{\partial^{2} R (ϑ)}{\partial ϑ_{k} \partial ϑ_{h}}) = trace (P \frac{\partial Σ_{y}}{\partial ϑ_{k}} P \frac{\partial Σ_{y}}{\partial ϑ_{h}}), \end{matrix}

(29)

where

\begin{matrix} P = Σ_{y}^{- 1} - Σ_{y}^{- 1} Ψ_{q} {(Ψ_{q}^{⊤} Σ_{y}^{- 1} Ψ_{q})}^{- 1} Ψ_{q}^{⊤} Σ_{y}^{- 1} . \end{matrix}

(30)

Thus,

ϑ

is updated by

\begin{matrix} ϑ^{[t + 1]} = ϑ^{[t]} - H {(ϑ^{[t]})}^{- 1} g (ϑ^{[t]}), \end{matrix}

(31)

where the kth element of

g (ϑ^{[t]})

is calculated according to (28) and the (

k, h

)th element of

H (ϑ^{[t]})

is yielded by (29).

It is worth pointing out that there is no matrix formula to directly yield

g (ϑ^{[t]})

and

H (ϑ^{[t]})

, i.e., it is the sole way to calculate their elements one-by-one according to (28) and (29). This drawback makes the minimization of REML extremely costly because the computation complexity of matrix inverse

Σ_{y}^{- 1}

and matrix product

P \partial Σ_{y} / \partial ϑ_{k}

are of

O (T^{3})

. In particular, as the number of employed basis function p increases, the number of elements in

H (ϑ^{[t]})

is

{(p^{2} + 1)}^{2}

, which will be more than

10, 000

if p is just 10. As a result, the total computation complexity of the Hessian matrix calculation is

O (T^{3} p^{4})

, which is unacceptable for large balanced longitudinal data. In contrast, our separate fixed-point iterations (24) and (26) have only

O (T p^{2})

computation complexity, and are able to guarantee

{\hat{σ}}^{2}

positive and

\hat{Γ}

positive definite as long as the initial estimates

σ^{2 [0]}

is positive and

Γ^{[0]}

is positive definite. Therefore, our novel estimator is more attractive than the REML-based one for FLMM in high dimensionality. Note that what we contribute is the fixed-point iterations based on LAML. In the paper of LAML [39], the Newton–Raphson algorithm was employed to minimize the corresponding LAML objective function, in which will still face the same numerical problems as REML if T or p is large.

3.2. Regularity Assumptions

Now we turn to investigate the large sample properties of FLMM using our novel estimator. We first give three essential concepts that play central roles in our asymptotic investigation.

Definition 1

(Sub-Gaussianity). A random error

ε_{i}

with mean

E (ε_{i}) = 0

and variance

var (ε_{i}) = σ^{2}

is termed a sub-Gaussian variable if for all

r \in R

,

E (exp (r ε_{i})) \leq exp (\frac{1}{2} τ^{2} r^{2}),

where

τ > 0

is called the sub-Gaussian parameter.

Definition 2

(Well-conditioned covariance matrix). An

(p \times p)

symmetric matrix Σ is termed the well-conditioned covariance matrix if there is a constant

c_{0}

independent of p such that

0 < c_{0}^{- 1} \leq λ_{min} (Σ) \leq λ_{max} (Σ) \leq c_{0} < \infty .

Definition 3

(Sobolev–Hilbert space). A functional space

F_{s}

is called s-order Sobolev–Hilbert space if there is a constant

c_{0}

such that

\forall f \in F_{s}

,

E (\int_{t \in T} {[f^{s} (t)]}^{2} d t) \leq c_{0} < \infty .

Sub-Gaussianity is an important concept in high-dimensional statistical analysis, which generalizes the Gaussian distribution to include common continuous variables and all bounded discrete variables [40]. Besides, the concept of well-conditioned covariance is proposed by Bickel and Levina [41], which guarantees that all of its eigenvalues are bounded away from zero and infinity whatever its dimension is. In addition, Sobolev–Hilbert space is the most commonly-used functional space in nonparametric statistical analysis [42]. The minimax lower bound of function estimate in this space is

\begin{matrix} inf_{\hat{f}} sup_{f \in F_{s}} E (| | \hat{f} - {f | |}_{2}^{2}) \geq d_{0} N^{- \frac{2 s}{2 s + 1}}, \end{matrix}

(32)

where

d_{0} > 0

is a certain constant and N describes the sample size.

Now we give the following assumptions that facilitate the proofs of theorems.

Assumption 1

(Sub-Gaussian random error). For all

(i, j) \in {1, \dots, N} \times {1, \dots, T}

, the random error

ε_{i j}

is independently and identically distributively generated from a sub-Gaussian distribution with mean 0, variance

σ^{2}

, and sub-Gaussian parameter

τ_{ϵ}

.

Assumption 2

(Quasi uniform locations). The sampling locations

t_{1}, \dots, t_{T}

form a quasi uniform sequence defined in

T

.

Assumption 3

(Basis functions). For a series of basis functions

{Ψ_{k} (t)}

,

\int_{t \in T} Ψ_{k} (t) d_{t} = 0, \int_{t \in T} {[Ψ_{k} (t)]}^{2} d t = 1, \int_{t \in T} {[Ψ_{k} (t)]}^{4} d t \leq c_{0} < \infty,

for all

k \in {1, \dots, p}

, where

c_{0} > 0

is a constant. Besides, let

Ψ_{p} (t) = {(Ψ_{1} (t), \dots, Ψ_{p} (t))}^{⊤}

and

Σ_{Ψ_{p}} = \int_{t \in T} Ψ_{p} (t) Ψ_{p} {(t)}^{⊤} d t .

Then

Σ_{Ψ_{p}}

is a well-conditioned covariance matrix.

Assumption 4

(Approximation error of mean function). For a mean function

μ (t) \in F_{s}

and a series of basis functions

{Ψ_{k} (t)}

satisfying Assumption 3, there exists a unique series of coefficients

{β_{k}}

satisfying

{lim}_{q \to \infty} \sum_{k = 1}^{q} β_{k}^{2} \leq c_{0} < \infty

such that

\int_{t \in T} {(μ (t) - \sum_{k = 1}^{q} Ψ_{k} (t) β_{k})}^{2} d t \leq c_{0} q^{- 2 s},

where

c_{0} > 0

is certain constant.

Assumption 5

(Approximation error of individual function). For any individual trajectory

X_{i} (t) \in F_{s}

and a series of basis functions

{Ψ_{k} (t)}

satisfying Assumption 3, there exists a unique series of random coefficients

{c_{i k}}

satisfying

{lim}_{p \to \infty} \sum_{k = 1}^{p} E (c_{i k}^{2}) \leq c_{0} < \infty

such that

E (\int_{t \in T} {[X_{i} (t) - μ (t) - \sum_{k = 1}^{p} Ψ_{k} (t) c_{i k}]}^{2} d t) \leq c_{0} p^{- 2 s}, i \in {1, \dots, N},

where

c_{0} > 0

is certain constant.

Assumption 6

(Distribution of random coefficients). Random coefficient

c_{i k}

is sub-Gaussian with E

(c_{i k}) = 0

, var(

c_{i k}) = Γ_{k k}

, sub-Gaussian parameter

τ_{k}

, and

c_{i k}

is independent of

c_{j k}

for all

i \neq j

. Furthermore,

T Γ = {(T Γ_{j k})}_{p \times p}

is a well-conditioned covariance matrix.

Assumption 1 considers the random error to be independent and identically distributed (IID) sub-Gaussian variable for all i and j. Assumption 2 sets the sampling locations and

t_{1}, \dots, t_{T}

can be regarded as a uniform sequence. As a result, the related integral of a certain function of t can be approximated by averaging the values of this function at these discrete locations. Assumption 3 summarizes some basic properties of the basis functions

{Ψ_{k} (t)}

. Assumptions 4 and 5 describe how the accuracy of a function can be approximated by using p bases. Tsybakov [43] showed that, if the bases are the Tikhonov bases, Assumptions 4 and 5 hold as long as

μ (t)

and

X_{i} (t)

belong to the Sobolev–Hilbert space

F_{s}

. However, for a series of general basis functions, it remains unclear if Assumptions 4 and 5 still hold. In other words, we assume that we have picked the basis functions that have similar properties as the Tikhonov bases such that these two hold. Assumption 6 lists some conditions of the random coefficients. In particular, since

\begin{matrix} λ_{max} (Ψ Γ Ψ^{⊤}) = O_{P} (T) \times λ_{max} (Σ_{Ψ}) \times λ_{max} (Γ), \end{matrix}

(33)

λ_{max} (Γ)

must be

O (T^{- 1})

otherwise the covariance matrix of

Y_{i}

becomes divergent and violates some basic assumption of FDA; see, e.g., assumption (A4) in Yao et al. [11]. Hence, we assume that the covariance matrix of

\sqrt{T} c_{i}

, i.e.,

T Γ

, is well-conditioned.

3.3. Large Sample Property

We provide four theorems describing the asymptotic properties of estimators obtained by our new FLMM.

Theorem 1.

Suppose Assumptions 1–6 hold. If

\hat{β}

is yielded by (14),

λ_{max} (γ Q) = O (1)

, and

{\hat{Σ}}_{y}

is chosen such that

λ_{max} ({\hat{Σ}}_{y}^{- 1} Σ_{y}) \leq c_{0} < \infty

for certain constant

c_{0} > 0

, then

| | \hat{β} - β {| |}_{2}^{2} = O_{P} (\frac{q}{N T} + \frac{1}{q^{2 s}}) .

If

q = O (T)

, then the optimal convergence rate of

\hat{β}

is

| | \hat{β} - β {| |}_{2}^{2} = O_{P} (\frac{1}{N} + \frac{1}{T^{2 s}}) .

Theorem 1 indicates the convergence rate of the mean parameter estimate

\hat{β}

, which is a trade-off between variance term

O_{P} (q / N T)

and bias term

O (p^{- 2 s})

. As a corollary,

\begin{matrix} \int_{t \in T} {(μ (t) - \hat{μ} (t))}^{2} d t & \leq \int_{t \in T} {(μ (t) - Ψ_{q} {(t)}^{⊤} β)}^{2} d t + \frac{1}{T} \sum_{j = 1}^{T} {(Ψ_{q} {(t_{j})}^{⊤} {β - \hat{β}})}^{2} \\ = O_{P} (\frac{q}{N T} + \frac{1}{q^{2 s}}) . \end{matrix}

(34)

which means that the optimal convergence rate

\hat{μ} (t)

is

O_{P} (N^{- 1} + T^{- 2 s})

by letting

q = O (T)

. In particular, this convergence rate reaches the minimax lower bound of all nonparametric estimates of

μ (t)

[29,43], indicating that our estimate is optimal in terms of minimax efficiency. Besides, our theory does not require

λ

to vanish asymptotically. Indeed, any

γ

such that

λ_{max} (γ Q) = O (1)

is able to guarantee

\hat{β}

to reach the minimax lower bound.

Theorem 2.

Suppose Assumptions 1–6 hold. If

{\hat{c}}_{i}

is yielded by (19), then for

i = 1, \dots, N

,

| | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} = O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}}) .

If

p = O (T^{\frac{1}{2 s + 1}})

, then the optimal convergence rate of

{\hat{c}}_{i}

is

| | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} = O_{P} (T^{- \frac{2 s}{2 s + 1}}) .

Theorem 2 shows the convergence rate of the parameter estimate in individual trajectory

{\hat{c}}_{i}

, which is a trade-off between variance term

O_{P} (p / T)

and bias term

O (p^{- 2 s})

. Similarly, the optimal convergence rate of

{\hat{X}}_{i} (t)

is

\begin{matrix} \int_{t \in T} {(X_{i} (t) - {\hat{X}}_{i} (t))}^{2} d t & \leq \int_{t \in T} {(X_{i} (t) - μ (t) - Ψ_{p} {(t)}^{⊤} c_{i})}^{2} d t + \int_{t \in T} {(μ (t) - \hat{μ} (t))}^{2} d t \\ + \frac{1}{T} \sum_{j = 1}^{T} {(Ψ_{p} {(t_{j})}^{⊤} {c_{i} - {\hat{c}}_{i}})}^{2} = O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}}), \end{matrix}

(35)

which reaches the minimax lower bound of all nonparametric estimates of

X_{i} (t)

[29,43] by letting

p = O (T^{1 / (2 s + 1)})

. In addition, it is easy to see that the optimal q and the optimal p have obviously different orders of magnitude: the former should diverge at the same rate as T while the latter is much slower than T. Hence, to reach the optimal convergence rates, we should use an adaptive number of bases when estimating

μ (t)

and predicting

X_{i} (t)

. As far as we are concerned, we are the first ones to indicate this principle. Traditional FLMM methods, such as James et al. [17] and Shi et al. [20], did not require different numbers of bases when estimating the mean function and individual function, which may result in less efficient estimates of mean function and individual function in practice.

Theorem 3.

Suppose Assumptions 1–6 hold. If

{\hat{σ}}^{2}

is yielded by (24),

{({\hat{σ}}^{2} - σ^{2})}^{2} = O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}}) .

If

p = O (T^{\frac{1}{2 s + 1}})

, the optimal convergence rate is

{({\hat{σ}}^{2} - σ^{2})}^{2} = O_{P} (T^{- \frac{2 s}{2 s + 1}}) .

Theorem 4.

Suppose Assumptions 1–6 hold. If

\hat{Γ}

is yielded by (26),

| | Γ^{- \frac{1}{2}} \hat{Γ} Γ^{- \frac{1}{2}} - I_{p} {| |}_{2}^{2} = O_{P} (\frac{p}{N} + \frac{p}{T} + \frac{1}{p^{2 s}}) .

If

p = O (T^{\frac{1}{2 s + 1}})

, the optimal convergence rate is

| | Γ^{- \frac{1}{2}} \hat{Γ} Γ^{- \frac{1}{2}} - I_{p} {| |}_{2}^{2} = O_{P} \{max (N^{- 1} T^{\frac{1}{2 s + 1}}, T^{- \frac{2 s}{2 s + 1}})\} .

Theorems 3 and 4 guarantee that the variance components

σ^{2}

and

Γ

can be estimated consistently by using the LAML criterion. The convergence rates of

σ^{2}

and

Γ

will be influenced by the estimation errors of

\hat{β}

and

{\hat{c}}_{i}

. As a corollary,

\begin{matrix} | | Σ_{y}^{*} - {\hat{Σ}}_{y} {| |}_{2}^{2} = O_{P} (N^{- 1} T^{\frac{1}{2 s + 1}} + T^{- \frac{2 s}{2 s + 1}}), \end{matrix}

(36)

where

Σ_{y}^{*} = σ^{2} I_{T} + Ψ_{p} Γ Ψ_{p}^{⊤}

and

{\hat{Σ}}_{y} = {\hat{σ}}^{2} I_{T} + Ψ_{p} \hat{Γ} Ψ_{p}^{⊤}

. However, the true covariance matrix is

Σ_{y} = σ^{2} I_{T} + cov (X_{i})

. The difference between

Σ_{y}

and

Σ_{y}^{*}

is unknown without additional assumption. Hence, in Theorem 1, we give the condition

λ_{max} ({\hat{Σ}}_{y}^{- 1} Σ_{y}) < \infty

to ensure the correctness of the corresponding proof.

3.4. Tuning Parameter Selection

In the new estimator of FLMM, three tuning parameters including p, q, and

γ

need to be determined artificially. Here, we should emphasize that our FLMM estimator is insensitive to the choices of p, q as long as they are moderately large (e.g.,

p = 10

and

q = 20

, or any other reasonable numbers), though the theoretical investigation requires them to be bounded by

O (T^{1 / (2 s + 1)})

and

O (T)

. In terms of regulating the wigglinesses of the mean function estimate and individual trajectory prediction, the smoothing parameter

γ

and covariance matrix

Γ

are more essential than the numbers of basis functions p and q. A large number of empirical analyses have confirmed this principle. For example, Wood [44] showed it can accurately fit a common function by setting

p = 10

and set this number as the default one in R package mgcv. We follow Wood’s measure and recommend choosing p and q as any reasonable numbers in this paper.

As for

γ

, we propose to automatically estimate the optimal one by using the LAML:

\begin{matrix} \hat{γ} = arg min_{γ} \{γ {\hat{β}}^{⊤} Q \hat{β} - log det (γ I_{rank (Q)}) + log det (n Ψ_{q}^{⊤} Ψ_{q} + γ Q)\} . \end{matrix}

(37)

Similar to the estimation of

Γ

and

σ^{2}

, the score function of

γ

is

\begin{matrix} 0 = {\hat{β}}^{⊤} Q \hat{β} - rank (Q) γ^{- 1} + trace ({(n Ψ_{q}^{⊤} Ψ_{q} + γ Q)}^{- 1} Q) . \end{matrix}

(38)

As a result, the fixed-point iteration of

γ

is

\begin{matrix} γ^{[t + 1]} = \frac{rank (Q)}{{\hat{β}}^{⊤} Q \hat{β} + trace ({(n Ψ_{q}^{⊤} Ψ_{q} + γ^{[t]} Q)}^{- 1} Q)} . \end{matrix}

(39)

We update

γ^{[t]}

along with the estimation of

β

. The stable estimate is regarded as

\hat{γ}

.

Since

\hat{γ}

can be estimated automatically, there is indeed no tuning parameter to be determined. In contrast, the estimating procedure of FPCA needs to select three bandwidths and the number of employed PCs, which is supposedly very time-consuming. In particular, FPCA is sensitive to the number of employed PCS, while FLMM is insensitive to numbers of the employed bases p and q. As a result, our new estimator of FLMM outperforms FPCA absolutely in terms of tuning-parameter robustness.

4. Simulation Study

In this section, we conduct a simulation study to assess the performance of our new FLMM estimator in comparison with FPCA. Here, we choose the well-known Principal Analysis by Conditional expectation (PACE) estimator (Matlab codes of PACE: https://www.stat.ucdavis.edu/PACE/, from 16 March 2020 to 14 August 2020) to estimate the FPCA.

4.1. Simulation Settings

The settings of simulations are as follows. Throughout the study, we set the

t = {(1 / T, 2 / T, \dots, 1)}^{⊤}

to keep the

t \in (0, 1) .

The mean function is

\begin{matrix} μ (t) = t^{11} {(10 (1 - t))}^{6} + 10 {(10 t)}^{3} (1 - t^{11}) - 1.396, \end{matrix}

(40)

which is constructed by Wood [44]. Next, we generate the covariance function

C (s, t)

by

\begin{matrix} C (s, t) = \sum_{k = 1}^{5} λ_{k} ϕ_{k} (s) ϕ_{k} (t) . \end{matrix}

(41)

Here, the eigenvalues

(λ_{1}, λ_{2}, λ_{3}, λ_{4}, λ_{5}) = (5, 2.5, 30, 0.5, 0.25) / 4

, and the eigenfunction vector

ϕ (t) = {(ϕ_{1} (t), \dots, ϕ_{5} (t))}^{⊤}

is constructed by rotating the Tikhonov bases through a (5

\times 5

) matrix

G

:

\begin{matrix} ϕ (t) = T (t) G, \end{matrix}

(42)

where

T (t) = {(1, \sqrt{2} cos (π t), \dots, \sqrt{2} cos (4 π t))}^{⊤}

,

A_{5} (0.5)

is a

(5 \times 5)

1-order autoregressive (AR(1)) structure correlation matrix with correlation coefficient

ρ = 0.5

, and

G

is the Cholesky factor of

A_{5} {(0.5)}^{- 1}

, i.e.,

G^{⊤} G = A_{5} {(0.5)}^{- 1}

. Furthermore, the individual trajectory is generated by

\begin{matrix} X_{i} (t_{j}) = μ (t_{j}) + \sum_{k = 1}^{5} ϕ_{k} (t_{j}) ξ_{i k}, (i, j) \in {1, \dots, N} \times {1, \dots, T}, \end{matrix}

(43)

where

ξ_{i k} \sim N (0, λ_{k})

. The observation

\begin{matrix} y_{i} (t_{j}) = X_{i} (t_{j}) + ε_{i j}, \end{matrix}

(44)

where

ε_{i j} \sim N (0, 2)

. Figure 2 visualizes the aforementioned settings. Specifically, the left-top panel shows the first three Tikhonov bases and the left-bottom panel demonstrates the first three eigenfunctions rotated by the Tikhonov bases. The middle-top and middle-bottom panels show the first and second individual trajectories and the related observations with random errors. It is observed that the individual trajectory is similar to the mean function but has its adaptive trend. The right-top and right-bottom panels visualize the covariance surfaces.

4.2. Simulation Results

For better comparison, we notated FLMM with Bsplines and Tikhonov bases as FLMM-Bsplines and FLMM-Tikhonov, respectively. The numbers of employed bases for Bsplines and Tikhonov bases are 30 and 15 when estimating the mean function trajectory and individual trajectory, respectively. Meanwhile, the FPCA implemented by PACE is noted as FPCA-PACE, where all settings follow the default ones in the package. We compare the proposed FLMM with FPCA and generate the data using the above settings. In order the get the whole evaluation of the three approaches, we take two cases in the simulations: (1) fix the N as 150, and set

T = 50, 100, 150, 200, 250, 300

; (2) fix

T = 150

, set

N = 50, 100, 150, 200, 250, 300

. On the other hand, the criteria to evaluate the results are as follows. For mean function estimation, we adopt the root mean squared error (RMSE) as the criterion:

\begin{matrix} \sqrt{\frac{1}{T} \sum_{i = 1}^{T} {(\hat{μ} (t_{i}) - μ (t_{i}))}^{2}} \end{matrix}

(45)

for individual trajectory prediction, we consider the maximum RMSE as the criterion:

\begin{matrix} max_{i \in {1, \dots, N}} \sqrt{\frac{1}{T} \sum_{i = 1}^{T} {({\hat{X}}_{i} (t_{i}) - X_{i} (t_{i}))}^{2}} \end{matrix}

(46)

the absolute value

| {\hat{σ}}^{2} - σ |

is used to evaluate the variance estimation; the Frobenius-norm of the matrix is adopted as a measure to compare the covariance surface estimation:

\begin{matrix} \sqrt{\frac{1}{T} \sum_{i = 1}^{T} \sum_{j = 1}^{T} {(\hat{C} (t_{i}, t_{j}) - C (t_{i}, t_{j}))}^{2}} \end{matrix}

(47)

and the computing time in seconds is recorded to compare the computational efficiency. The system and software to implement the simulations are Linux 4.18 (standard High Performance Computing (HPC) machine, icosa192gb feature nodes, memory 50 GB and 12 cpu-cores) and MATLAB (R2021). The replication runs are 500.

Figure 3 shows the results of the simulation. Specifically, the five subplots on the left side of Figure 3 show the results of FLMM-Bsplines, FLMM-Tikhonov, and FPCA-PACE based on mean function estimation, individual estimation prediction, variance estimation, covariance surface estimation, and computation time as the sample size N increases. Likewise, the five subplots on the right side of Figure 3 show the counterparts of the left subplots as the number of observations T increases. Based on these results, we make the following comments.

Mean function estimation. When sample size N is very small, FLMM-Bsplines is worse than FLMM-Tikhonov and FPCA-PACE. However, as N increases, the performances of FLMM-Bsplines and FLMM-Tikhonov boost significantly faster than FPCA-PACE. On the other hand, if N is fixed to 150, FPCA-PACE is extremely unreliable when T is very small and is always less accurate than them no matter whether T is large or small. These results show that our novel FLMM estimator is better than FPCA in terms of mean function estimation. In addition, no matter whether N or T rises, the estimation error of the mean function will decrease, which is consistent with Theorem 1.

Individual trajectory prediction. In terms of individual trajectory prediction, FPCA-PACE is much worse than FLMM-Bsplines and FLMM-Tikhonov. We checked the codes of the PACE package and found that it estimates the eigenvalues eigenfunctions by applying PCA on a

(51 \times 51)

covariance surface predictor, which may result in less accurate eigenvalues eigenfunctions estimates. (Nevertheless, we did not change the default settings of the PACE package in the simulations.) On the other hand, we find that our estimates become more accurate as T increases, while slowly becoming worse when T remains and N rises (the maximum RMSE (46) is affected by N). This is consistent with Theorem 2: the prediction of individual trajectory is only related to the number of observations T.

Variance estimation. Regarding variance estimation, FLMM-Bsplines and FLMM-Tikhonov enjoy the same accuracy in all cases. Since the variance estimate yielded by FLMM becomes more accurate no matter whether T or N increases, Theorem 3 is supposedly established correctly. However, it seems that FPCA-PACE cannot benefit from either the increase of T or the rise of N.

Covariance surface estimation. Regarding covariance surface estimation, all three estimators have the same degree of accuracy. In particular, as N is fixed and T increases, all the covariance surface estimators become worse in terms of the Frobenius-norm. This phenomenon indicates that the averaged Frobenius-norm of

| | cov (X_{i}) - Ψ_{p} Γ Ψ_{p}^{⊤} {| |}_{F} / \sqrt{T}

should diverge to infinity as T increases.

Covariance surface approximation. To verify this hypothesis, we conduct an additional simulation. Specifically, because

c_{i}

is unknown in simulation (we generate the random coordinates of eigenfunctions

ξ_{i}

directly and there should exist a correspondence between

c_{i}

and

ξ_{i}

), we approximate

b_{i}

by

\begin{matrix} {\tilde{c}}_{i} = {(Ψ_{p}^{⊤} Ψ_{p})}^{- 1} Ψ_{p}^{⊤} (X_{i} - μ), \end{matrix}

(48)

and the covariance matrix

Γ

is approximated by

\begin{matrix} \tilde{Γ} = \frac{1}{N - 1} ({\tilde{c}}_{i} - \bar{c}) {({\tilde{c}}_{i} - \bar{c})}^{⊤}, \end{matrix}

(49)

where

\bar{c}

is the sample mean. We implement the above simulations 300 times, record the empirical covariance matrix

\tilde{Γ}

and the approximation error

| | cov (X_{i}) - Ψ_{p} \tilde{Γ} Ψ_{p}^{⊤} {| |}_{F} / \sqrt{T}

. The left and middle panels in Figure 4 show the simulated covariance matrices of

c_{i}

with respect to Bsplines and Tikhonov bases. Because the basis functions of Bsplines have no natural order while the Tikhonov basis functions have one, the covariance matrix of the Tikhonov basis functions has sharply decreasing diagonal elements with an increasing order. The approximation error does increase with T and so our hypothesis is confirmed. In addition, there is no difference between Bsplines and Tikhonov bases in approximating the covariance surface although the latter have a natural order. In summary, the decreasing performance of covariance surface estimation is due to the approximation error.

Computational efficiency. In terms of computational efficiency, the computing time of our two methods is almost independent of T and N and is much less than that of FPCA-PACE. In particular, the computing time of FPCA-PACE increases sharply as T increases, confirming our comments on the drawbacks of FPCA given in Section 2. Thus, for the estimation of large balanced longitudinal data, our proposed method has a very clear advantage.

5. Real Data Analysis

We demonstrate the effectiveness of the proposed method via the analysis of the COVID-19 data in this section. This data is updated daily on the number of COVID-19 infectious cases and deaths in the 50 states and DC in the US by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. It contains

N = 51

subjects and

T = 152

observations. The realization is the number of COVID-19 cases after the logarithm, which was recorded every day from 16 March 2020 to 14 August 2020. Partial observations are negative because of certain unknown reasons. For such an observation, we replace it with the average of the three observations around it. We employ 15 and 10 numbers of Bsplines bases to approximate the mean function and individual trajectories, respectively. It should be pointed out that we have not changed the default settings of the PACE package, and in our FLMM method, the number of employed bases p is the only tuning parameter to select. Therefore, the comparison between FPCA and FLMM is fair.

The results of this data analysis are shown in Figure 5, Figure 6 and Figure 7. Figure 5 displays the two estimated mean functions obtained by FPCA and our novel FLMM, respectively. The smaller panel within tries to draw attention to the stability of the two mean curves. The results further reveal the reason as to why FPCA performs worse than FLMM in terms of mean function estimation: it may overfit the mean function. Besides, due to the

N = 51

sample size, we chose six states with the top populations among all states to represent individual trajectories, and the consequence is shown in Figure 6. Likewise, FPCA tends to overfit the individual trajectories while FLMM with our new estimator can fit them accurately. In addition, Figure 7 demonstrated the estimated covariance surface yielded by the two approaches. Generally, the two covariance surfaces are similar; but the one generated by our FLMM is smoother than FPCA. In addition, in terms of computing time, our FLMM estimator only takes 0.088 seconds, while PACE takes 44.08 seconds. Such a substantial gap confirms that our FLMM estimator is much more efficient than FPCA in terms of computation. In conclusion, FPCA implemented by PACE is very likely to overfit the mean function estimation, individual trajectory prediction, and covariance function estimation.

6. Discussion

In this paper, we propose a novel estimator of large functional data using the FLMM technique. In comparison with the FPCA, this novel estimator is much more efficient because all parameters can be automatically estimated. In comparison with the traditional estimator of FLMM, i.e., the REML criterion, our novel estimator adopts the LAML criterion, which enjoys a significantly lower computational complexity when the number of observations T or the number of employed bases p is large. In the simulation, our novel estimator of FLMM outperforms or performs equally as FPCA in all five criteria. In real data analysis, it is able to provide more reliable estimates than PFCA in terms of avoiding overfitting. Note that we only compare the novel estimator with the FPCA implemented by the PACE package, because the computing time of the traditional estimator of FLMM with REML is extremely large, and it is very easy to encounter numerical problems such as a degenerate Hessian matrix.

Another contribution of this paper is the asymptotic theory of FLMM. As far as we are concerned, our work is the first one to point out the convergence rates of mean function estimate, individual trajectory prediction, variance estimate, and covariance surface estimate. Theorems 1 and 2 show that the mean function estimate and individual trajectory prediction can reach the minimax lower bounds if the numbers of employed bases are chosen optimally. In particular, we point out that the number of basis functions should be adaptively chosen when estimating the mean function and individual trajectory, providing a novel guide about how to perform FLMM in practice. However, Theorems 3 and 4 illustrate that the convergence rates of variance estimate and covariance surface estimate cannot reach the minimax lower bounds because of the estimation errors of mean function estimate and individual trajectory prediction. These two findings indicate that the variance components cannot be precisely estimated if the mean function estimate and individual trajectory prediction are not consistent.

It should be pointed out that the proposed estimator can be simply extended to analyze unbalanced longitudinal data or the so-called sparse functional data [11]. Indeed, the FLMM estimator of unbalanced longitudinal data resembles the combination of penalized quasi-likelihood (PQL) and REML/LAML; a similar estimating procedure can be found in Breslow and Clayton [33]. Since in unbalanced longitudinal data, the number of observations for each individual is usually small, the FLMM estimator will not suffer from the theoretical and computational difficulties caused by the “dimensional curse”. As mentioned before, Vonesh et al. [34] pointed out that the convergence rate of the mean parameter in GLMM is

O (T_{min}^{- 1 / 2})

. However, as far as we are concerned, the convergence rates of covariance matrix estimate and individual prediction are still unknown. As for FLMM, since the approximation bias of basis expansions is further involved, it is even unclear if the four estimates are consistent. Hence, it is worth future studying under what conditions the FLMM estimation is consistent for unbalanced longitudinal data or sparse functional data.

Discrete functional data analysis is an area that has received much less attention than continuous functional data analysis. However, discrete data are more common in longitudinal data analysis than continuous data, and hence the technical tools for analyzing discrete functional data are more urgently needed. Specifically, the model of discrete functional data is

\begin{matrix} E (Y_{i} (t_{j})) = g^{- 1} (X_{i} (t_{j}) + W_{i j}^{⊤} γ), \end{matrix}

(50)

where

Y_{i} (t_{j})

is a random variable following the exponential family distribution [45],

g (\cdot)

is the so-called link function, and

W_{i j}

is certain vector of covariates. We conjecture that the convergence rates of mean function estimate, individual trajectory prediction, variance estimate, and covariance surface estimate in this new model are the same as the ones indicated by Theorems 1–4 if the sampling locations

{t_{j}}

are balanced for all individuals. On the other hand, such a problem has been studied by Hall et al. [46] using the FPCA technique. It is also worth comparing the performances of FLMM and FPCA when dealing with such discrete functional data.

Author Contributions

Conceptualization, M.R.; methodology, Y.Y.; writing—original draft preparation, M.R.; writing—review and editing, Y.Y.; resources, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://www.kaggle.com/datasets/thecansin/johns-hopkins-covid19-case-tracker?resource=download, from 16 March 2020 to 14 August 2020. Additionally, Matlab codes of PACE: https://www.stat.ucdavis.edu/PACE/, from 16 March 2020 to 14 August 2020.

Acknowledgments

The authors are appreciative of the numerous valuable comments from the editor, the associate editor, and referees during the preparation of the article. The author Ran is sincerely grateful to Kano Yutaka and Morikawa Kosuke of the graduate school of engineering science at Osaka University for their generous personalities and countless times of assistance to Ran in challenging times.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Theorems

Appendix A.1. Lemmas

We present some basic lemmas that facilitate the proofs of Theorems 1–4.

Lemma A1.

Suppose

X_{1}, \dots, X_{n}

are n independent sub-Gaussian variables with mean-zero and variance

σ_{1}^{2}, \dots, σ_{n}^{2}

. Then

lim_{n \to \infty} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} X_{i} \overset{D}{⟶} N (0, σ_{x}^{2}),

where

σ_{x}^{2} = lim_{n \to \infty} \frac{1}{n} \sum_{i = 1}^{n} σ_{i}^{2} .

Proof of Lemma A1.

By using the second item of Proposition 2.5.2 of Vershynin [40],

\begin{matrix} [E (| X_{i} |^{p} {)]}^{\frac{1}{p}} = O (\sqrt{p}) \end{matrix}

(A1)

for all

p \geq 1

. Hence, for all fixed

δ > 0

,

\begin{matrix} lim_{n \to \infty} \frac{1}{n^{1 + δ}} \sum_{i = 1}^{n} E (| X_{i} |^{2 + 2 δ}) \leq \frac{{(K_{0} \sqrt{2 + 2 δ})}^{2 + 2 δ}}{n^{δ}} \to 0, \end{matrix}

(A2)

where

K_{0} > 0

is a fixed number, which verifies Lyapunov’s condition. □

Lemma A2.

Let

X_{1}, \dots, X_{n}

be n

(p \times 1)

independent identically distributed random vector with entries

x_{i 1}, \dots, x_{i p}

are sub-Gaussian with zero-mean. Besides, define the covariance matrix of

X_{i}

as

Σ = E (X_{i} X_{i}^{⊤})

and the related sample covariance matrix

\hat{Σ} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} X_{i}^{⊤} .

Then for every positive integer n,

E (| | \hat{Σ} - {Σ | |}_{2}) \leq C (\frac{p}{n} + \sqrt{\frac{p}{n}}) {| | Σ | |}_{2},

where C is a certain positive constant.

This lemma is provided by Vershynin [40] (Theorem 4.7.1).

Lemma A3.

Weyl’s lemma: Let $S, E$ be two Hermitian $(p \times p)$ matrices. Then, for each $1 \leq i \leq p$ ,

$λ_{i} (S) + λ_{max} (E) \leq λ_{i} (S + E) \leq λ_{i} (S) + λ_{min} (E),$
Ostrowski’s lemma: Let $A$ be a Hermitian $(n \times p)$ matrix, and $S$ be an $(p \times p)$ matrix. Then, for each $1 \leq i \leq p$ , there exists an nonnegative real $θ_{i}$ with $λ_{min} (A A^{⊤}) \leq θ_{i} \leq λ_{max} (A A^{⊤})$ such that

$λ_{i} (A S A^{⊤}) = θ_{i} λ_{i} (S),$

where $λ_{i} (S)$ means the ith eigenvalues of $S$ .

This lemma can be found in Horn and Johnson [47] (Theorem 4.3.1 and Theorem 4.5.9).

Lemma A4.

LetΓbe an (

p \times p

) symmetric matrix satisfying

p Γ

, a well-conditioned covariance matrix. Then

T C_{0}^{- 1} \leq λ_{min} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \leq λ_{max} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \leq C_{0} T,

where

V = σ^{2} I_{T} + Ψ_{p} Γ Ψ_{p}^{⊤}

,

C_{0} > 0

is a constant,

σ^{2} > 0

is an alternative constant,

Ψ_{q}

is an

(T \times q)

matrix satisfying

Ψ_{q}^{⊤} Ψ_{q} / T \to Σ_{Ψ_{q}}

,

Ψ_{p}

is an

(T \times p)

matrix satisfying

Ψ_{p}^{⊤} Ψ_{p} / T \to Σ_{Ψ_{p}}

, and

Σ_{Ψ_{q}}

,

Σ_{Ψ_{p}}

are two well-conditioned covariance matrices.

Proof of Lemma A4.

By the Ostrowski’s lemma,

\begin{matrix} 0 \leq λ_{min} (Ψ_{p} Γ Ψ_{p}^{⊤}), λ_{max} (Ψ_{p} Γ Ψ_{p}^{⊤}) \leq λ_{max} (T Σ_{Ψ_{p}}) λ_{max} (Γ) \leq \frac{c_{0} T}{T} = c_{0}, \end{matrix}

(A3)

for certain constant

c_{0}

, because the minimum eigenvalue of

Ψ_{p} Ψ_{p}^{⊤}

is 0. By using Weyl’s lemma,

\begin{matrix} σ^{2} \leq λ_{min} (V) \leq λ_{min} (V) \leq σ^{2} + c_{0} . \end{matrix}

(A4)

Hence,

\begin{matrix} \frac{T σ^{- 2} λ_{max} (Σ_{Ψ_{q}})}{σ^{2} + c_{0}} \leq λ_{min} (T Σ_{Ψ_{q}}) λ_{min} (V^{- 1}) \leq λ_{min} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \\ λ_{max} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \leq λ_{max} (T Σ_{Ψ_{q}}) λ_{max} (V^{- 1}) = T σ^{- 2} λ_{max} (Σ_{Ψ_{q}}) . \end{matrix}

(A5)

That is,

\begin{matrix} T C_{0}^{- 1} \leq λ_{min} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \leq λ_{max} (Ψ_{q}^{⊤} V^{- 1} Ψ_{q}) \leq C_{0} T, \end{matrix}

(A6)

for certain positive constant

C_{0}

. □

Appendix A.2. Proofs

Proof of Theorem 1.

The score function of

\hat{β}

is

\begin{matrix} 0 & = \frac{\partial \frac{1}{N} \{\sum_{i = 1}^{N} {(Y_{i} - Ψ_{q} β)}^{⊤} {\hat{Σ}}_{y}^{- 1} (Y_{i} - Ψ_{q} β) + γ β^{⊤} Q β\}}{\partial β} |_{β = \hat{β}} \\ = - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} (\frac{1}{N} \sum_{i = 1}^{N} Y_{i} - Ψ_{q} \hat{β}) \\ = - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} (\frac{1}{N} \sum_{i = 1}^{N} Y_{i} - μ) - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} (μ - Ψ_{q} β) \\ - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q} (β - \hat{β}) + \frac{γ}{N} Q β + \frac{γ}{N} Q (\hat{β} - β) \end{matrix}

(A7)

\begin{matrix} = I_{1} + I_{2} + I_{3} + I_{4} + I_{5} . \end{matrix}

(A8)

As for

I_{1}

,

\begin{matrix} I_{1} = & - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} (\frac{1}{N} \sum_{i = 1}^{N} Y_{i} - μ) = - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Σ_{y}^{\frac{1}{2}} Σ_{y}^{- \frac{1}{2}} (\frac{1}{N} \sum_{i = 1}^{N} (X_{i} + ε_{i})) \\ = - Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Σ_{y}^{\frac{1}{2}} (\frac{1}{N} \sum_{i = 1}^{N} e_{i}), \end{matrix}

(A9)

where

e = {(e_{i 1}, \dots, e_{i T})}^{⊤}

with E

(e_{i j}) = 0

, var(

e_{i j}

)=1, and E

(e_{i j} e_{i k}) = 0

for

j \neq k

. Here,

\begin{matrix} E (I_{1}) = 0, var (I_{1}) = \frac{1}{N} \times trace (Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Σ_{y}^{\frac{1}{2}} I_{T} Σ_{y}^{\frac{1}{2}} {\hat{Σ}}_{y}^{- 1} Ψ_{q}) \end{matrix}

(A10)

which guarantees

\begin{matrix} E (| | I_{1} | |_{2}^{2}) & \leq \frac{1}{N} \times trace (Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q}) \times λ_{max} ({\hat{Σ}}_{y}^{- 1} Σ_{y}) \\ = O (\frac{T q}{N}) . \end{matrix}

(A11)

As for

I_{2}

, by Assumption 4,

\begin{matrix} | | I_{2} {| |}_{2}^{2} & \leq λ_{max} (Ψ_{q}^{⊤} {\hat{Σ}}_{y}^{- 1} Ψ_{q}) \times λ_{min}^{- 1} ({\hat{Σ}}_{y}) \times | | μ - Ψ_{q} β {| |}_{2}^{2} \\ = O (T) \times O (1) \times O (T q^{- 2 s}) = O (T^{2} q^{- 2 s}) . \end{matrix}

(A12)

As for

I_{3} + I_{5}

,

\begin{matrix} I_{3} + I_{5} \geq λ_{min} (Ψ_{q}^{⊤} {\hat{Σ}}_{y} Ψ_{q} + \frac{γ}{N} Q) \times | | \hat{β} - {β | |}_{2} \geq O (T) \times | | \hat{β} - β {| |}_{2} . \end{matrix}

As for

I_{4}

,

\begin{matrix} | | I_{4} {| |}_{2}^{2} \leq \frac{1}{N^{2}} \times λ_{max}^{2} (γ Q) \times {| | β | |}_{2}^{2} = O (\frac{1}{N^{2}}) \end{matrix}

(A13)

As

q \leq T

when estimating the mean function,

\begin{matrix} | | \hat{β} - β {| |}_{2}^{2} \leq O_{P} (\frac{1}{T^{2}} (| | I_{1} {| |}_{2}^{2} + | | I_{3} {| |}_{2}^{2})) = O_{P} (\frac{q}{N T} + \frac{1}{q^{2 s}}) . \end{matrix}

(A14)

Hence, by letting

q = O (T)

,

\begin{matrix} | | \hat{β} - β {| |}_{2}^{2} = O_{P} (\frac{1}{N} + T^{- 2 s}), \end{matrix}

(A15)

which is consistent with the minimax lower bounder provided by Cai and Yuan [29]. Therefore, Theorem 1 is proved. □

Proof of Theorem 2.

The score function of

{\hat{c}}_{i}

is

\begin{matrix} 0 & = \frac{\partial {| | Y_{i} - \hat{μ} - Ψ_{p} c_{i} | |_{2}^{2} + c_{i}^{⊤} {\hat{Λ}}^{- 1} c_{i}}}{\partial c_{i}} |_{c_{i} = {\hat{c}}_{i}} = - Ψ_{p}^{⊤} (Y_{i} - \hat{μ} - Ψ_{p} {\hat{c}}_{i}) + \hat{Λ} {\hat{c}}_{i} \\ = - Ψ_{p}^{⊤} (Y_{i} - X_{i}) - Ψ_{p}^{⊤} (X_{i} - Ψ_{p} b_{i}) \\ - Ψ_{q}^{⊤} Ψ_{q} (\hat{β} - β) - Ψ_{p}^{⊤} Ψ_{p} ({\hat{c}}_{i} - c_{i}) + \hat{Λ} c_{i} + \hat{Λ} ({\hat{c}}_{i} - c_{i}) \\ = J_{1} + J_{2} + J_{3} + J_{4} + J_{5} + J_{6}, \end{matrix}

(A16)

where

\hat{Λ} = {\hat{σ}}^{2} {\hat{Γ}}^{- 1}

. As for

J_{1}

, it is easy to see

\begin{matrix} E (J_{1}) = 0, var (J_{1}) = σ^{2} trace (Ψ_{p}^{⊤} I_{T} Ψ_{p}) = O (p T), \end{matrix}

(A17)

which guarantees

\begin{matrix} E (| | J_{1} | |_{2}^{2}) = O (p T) . \end{matrix}

(A18)

As for

J_{2}

, by Assumption 4,

\begin{matrix} E (| | J_{2} {| |}_{2}^{2}) \leq λ_{max} (Ψ_{p}^{⊤} Ψ_{p}) E (| | X_{i} - Ψ_{p} b_{i} {| |}_{2}^{2}) = O_{P} (T^{2} p^{- 2 s}) . \end{matrix}

(A19)

As for

J_{3}

,

\begin{matrix} | | J_{3} {| |}_{2}^{2} \leq λ_{max} (Ψ_{q}^{⊤} Ψ_{q}) | | \hat{β} - β {| |}_{2}^{2} = O_{P} (\frac{T}{N} + T^{1 - 2 s}) . \end{matrix}

(A20)

As for

J_{5}

,

\begin{matrix} | | J_{5} {| |}_{2}^{2} \leq λ_{max} (\hat{Λ}) c_{i}^{⊤} \hat{Λ} c_{i} . \end{matrix}

(A21)

Using the consequence that

\hat{Λ} \to σ^{2} Γ^{- 1}

,

\begin{matrix} c_{i}^{⊤} \hat{Λ} c_{i} \sim σ^{2} χ_{p}^{2} . \end{matrix}

(A22)

Hence,

\begin{matrix} | | J_{5} {| |}_{2}^{2} = O_{P} (T) \times O_{P} (p) = O_{P} (T p) . \end{matrix}

(A23)

As for

J_{4} + J_{6}

,

\begin{matrix} | | J_{4} + J_{6} {| |}_{2}^{2} & \geq {(λ_{min} (Ψ_{p}^{⊤} Ψ_{p}) + λ_{max}^{- 1} (\hat{Λ}))}^{2} | | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} \geq O_{P} (T^{2}) | | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} \\ \geq O_{P} (T^{2}) | | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} . \end{matrix}

(A24)

As a result,

\begin{matrix} | | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} \leq O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}} + \frac{1}{N T} + \frac{1}{T^{2 s + 1}} + \frac{p}{T}) \end{matrix}

(A25)

By letting

p = O (T^{1 / (2 s + 1)})

,

\begin{matrix} | | {\hat{c}}_{i} - c_{i} {| |}_{2}^{2} \leq O_{P} (T^{- \frac{2 s}{2 s + 1}}) . \end{matrix}

(A26)

Thus, Theorem 2 is proved. □

Proof of Theorem 3.

The score function of

{\hat{σ}}^{2}

is

\begin{matrix} 0 & = \frac{\partial L (σ^{2}, Γ)}{N T \partial σ^{- 2}} |_{σ^{2} = {\hat{σ}}^{2}} = - ({\hat{σ}}^{2} - σ^{2}) + (\frac{1}{N T} \sum_{i = 1}^{N} | | Y_{i} - μ - Ψ_{p} c_{i} {| |}_{2}^{2} - σ^{2}) \\ + \frac{1}{N T} \sum_{i = 1}^{N} (| | Y_{i} - μ - Ψ_{p} c_{i} {| |}_{2}^{2} - | | Y_{i} - \hat{μ} - Ψ_{p} c_{i} {| |}_{2}^{2}) \\ + \frac{1}{N T} \sum_{i = 1}^{N} (| | Y_{i} - \hat{μ} - Ψ_{p} {\hat{c}}_{i} {| |}_{2}^{2} - | | Y_{i} - \hat{μ} - Ψ_{p} c_{i} {| |}_{2}^{2}) + \frac{trace (Δ (σ^{2}, Γ) Ψ_{p}^{⊤} Ψ_{p})}{N T} \\ = K_{1} + K_{2} + K_{3} + K_{4} + K_{5} . \end{matrix}

(A27)

As for

K_{2}

, it is easy to see

\begin{matrix} K_{2} = \frac{1}{N T} \sum_{i = 1}^{N} | | Y_{i} - μ - Ψ_{p} c_{i} {| |}_{2}^{2} - σ^{2} = ({\hat{σ}}_{oracle}^{2} - σ^{2}), \end{matrix}

(A28)

where

\begin{matrix} {\hat{σ}}_{oracle}^{2} = \frac{1}{N T} \sum_{i = 1}^{N} | | Y_{i} - μ - Ψ_{p} c_{i} {| |}_{2}^{2} . \end{matrix}

(A29)

It is well known that

| | {\hat{σ}}_{oracle}^{2} - σ^{2} {| |}^{2} = O_{P} ({(N T)}^{- 1})

hence

K_{2}^{2} = O_{P} ({(N T)}^{- 1})

. As for

K_{3}

,

\begin{matrix} K_{3} = \frac{1}{N T} \sum_{i = 1}^{N} {(μ - \hat{μ})}^{⊤} (2 Y_{i} - \hat{μ} - μ - 2 Ψ_{p} c_{i}) . \end{matrix}

(A30)

Therefore,

\begin{matrix} K_{3}^{2} & \leq (\frac{1}{N T} \sum_{i = 1}^{N} | | μ - \hat{μ} {| |}_{2}^{2}) (\frac{1}{N T} \sum_{i = 1}^{N} | | 2 Y_{i} - {\hat{μ}}_{i} - μ_{i} - 2 Ψ_{p} c_{i} {| |}_{2}^{2}) \\ = O_{P} (\frac{1}{N} + \frac{1}{T^{2 s}}) . \end{matrix}

(A31)

As for

K_{4}

,

\begin{matrix} K_{4} = \frac{1}{N T} \sum_{i = 1}^{N} {(c_{i} - {\hat{c}}_{i})}^{⊤} Ψ_{p}^{⊤} (2 Y_{i} - 2 \hat{μ} - Ψ_{p} c_{i} - Ψ_{p} \hat{c_{i}}) . \end{matrix}

(A32)

Therefore,

\begin{matrix} K_{4}^{2} & \leq \frac{1}{N^{2} T^{2}} (\sum_{i = 1}^{N} | | 2 Y_{i} - 2 \hat{μ} - Ψ_{p} c_{i} - Ψ_{p} \hat{c_{i}} {| |}_{2}^{2}) (\sum_{i = 1}^{N} λ_{max} (T Σ_{Ψ_{p}}) | | c_{i} - {\hat{c}}_{i} {| |}_{2}^{2}) \\ = O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}}) . \end{matrix}

(A33)

As for

K_{5}

,

\begin{matrix} K_{5}^{2} \leq \frac{1}{N^{2} T^{2}} \times λ_{max} (Ψ_{p}^{⊤} Ψ_{p}) \times λ_{min}^{- 1} (Ψ_{p}^{⊤} Ψ_{p}) = O_{P} (\frac{1}{N^{2} T^{2}}) . \end{matrix}

(A34)

As a result,

\begin{matrix} {({\hat{σ}}^{2} - σ^{2})}^{2} = O (K_{4}) = O_{P} (\frac{p}{T} + \frac{1}{p^{2 s}}) . \end{matrix}

(A35)

Hence, letting

p = O (T^{1 / (2 s + 1)})

,

\begin{matrix} {({\hat{σ}}^{2} - σ^{2})}^{2} = O_{P} (T^{- \frac{2 s}{2 s + 1}}) . \end{matrix}

(A36)

Thus, Theorem 3 is proved. □

Proof of Theorem 4.

The score function of

\hat{Γ}

is

\begin{matrix} 0 & = \frac{\frac{1}{N} \partial L (σ^{2}, Γ)}{\partial Γ^{- 1}} |_{Γ^{- 1} = {\hat{Γ}}^{- 1}} = - \hat{Γ} + \frac{1}{N} \sum_{i = 1}^{N} {\hat{c}}_{i} {\hat{c}}_{i}^{⊤} + \frac{1}{N} Δ (σ^{2}, Γ) \\ = - (\hat{Γ} - Γ) + (\frac{1}{N} \sum_{i = 1}^{N} c_{i} c_{i}^{⊤} - Γ) + (\frac{1}{N} \sum_{i = 1}^{N} {\hat{c}}_{i} {\hat{c}}_{i}^{⊤} - \frac{1}{N} \sum_{i = 1}^{N} c_{i} c_{i}^{⊤}) \\ + \frac{1}{N} Δ (σ^{2}, Γ) = L_{1} + L_{2} + L_{3} + L_{4} . \end{matrix}

(A37)

As for

L_{2}

, by using Vershynin [40] (Theorem 4.7.1),

\begin{matrix} | | L_{2} {| |}_{2}^{2} = ∥ \frac{1}{N} \sum_{i = 1}^{N} c_{i} c_{i}^{⊤} - Γ ∥_{2}^{2} = O_{P} (\frac{p}{N}) \times {| | Γ | |}_{2}^{2} = O_{P} (\frac{p}{N T}) . \end{matrix}

(A38)

As for

L_{3}

,

\begin{matrix} L_{3} & = \frac{1}{N} \sum_{i = 1}^{N} {\hat{c}}_{i} {({\hat{c}}_{i} - c_{i})}^{⊤} + \frac{1}{N} \sum_{i = 1}^{N} ({\hat{c}}_{i} - c_{i}) c_{i}^{⊤} = L_{31} + L_{32} . \end{matrix}

(A39)

For

L_{31}

,

\begin{matrix} | | L_{31} {| |}_{2}^{2} \leq λ_{max} (\frac{1}{N} \sum_{i = 1}^{N} {\hat{c}}_{i} {\hat{c}}_{i}^{⊤}) (\frac{1}{N} \sum_{i = 1}^{N} | | {\hat{c}}_{i} - c_{i} {| |}^{2}) \leq O_{P} (\frac{p}{T^{2}} + \frac{1}{p^{2 s} T}) . \end{matrix}

(A40)

As for

L_{32}

,

\begin{matrix} | | L_{32} {| |}_{2}^{2} \leq λ_{max} (\frac{1}{N} \sum_{i = 1}^{N} c_{i} c_{i}^{⊤}) (\frac{1}{N} \sum_{i = 1}^{N} | | {\hat{c}}_{i} - c_{i} {| |}^{2}) \leq O_{P} (\frac{p}{T^{2}} + \frac{1}{p^{2 s} T}) . \end{matrix}

(A41)

As for

L_{4}

,

\begin{matrix} | | L_{4} {| |}_{2}^{2} \leq N^{- 2} λ_{min}^{- 1} (Ψ_{p}^{⊤} Ψ_{p}) = O_{P} (T^{- 1} N^{- 2}) . \end{matrix}

(A42)

Note that

L_{2}

dominates

L_{3}

and

L_{4}

. Therefore,

\begin{matrix} | | \hat{Γ} - Γ {| |}_{2}^{2} = O_{P} (\frac{p}{N T} + \frac{p}{T^{2}} + \frac{1}{p^{2 s} T}) . \end{matrix}

(A43)

Hence, letting

p = O (T^{1 / (2 s + 1)})

,

\begin{matrix} | | Γ^{- \frac{1}{2}} \hat{Γ} Γ^{- \frac{1}{2}} - I_{p} {| |}_{2}^{2} = O_{P} (N^{- 1} T^{\frac{1}{2 s + 1}} + T^{- \frac{2 s}{2 s + 1}}) \end{matrix}

(A44)

Thus, Theorem 4 is proved. □

Appendix B. Estimators of FPCA and FLMM

Appendix B.1. Estimator of FPCA

In implementation, FPCA will (1) estimate the mean function

μ (t)

; (2) estimate the two-dimensional covariance function

C (s, t)

; (3) yield the first K eigenfunctions and eigenvalues

{ϕ_{k} (t)}

and

{λ_{k}}

through PCA; and (4) predict the corresponding coordinates

{ξ_{i k}}

by using Gaussian conditional expectation. Without loss of generality, we illustrate these four steps with the procedure employed by Li and Hsing [10]. Specifically, Li and Hsing employed the local linear modeling [30] to estimate

μ (t)

:

\begin{matrix} ({\hat{a}}_{0}, {\hat{a}}_{1}) = arg min_{a_{0}, a_{1}} \{\frac{1}{N} \sum_{i = 1}^{N} \frac{1}{T_{i}} \sum_{j = 1}^{T_{i}} {[Y_{i j} - a_{0} - a_{1} (t_{i j} - t)]}^{2} K_{h_{u}} (t_{i j} - t)\}, \end{matrix}

(A45)

where

K_{h_{u}} (t)

is a kernel function with bandwidth

h_{u}

. The estimate

\hat{u} (t)

is just

{\hat{a}}_{0}

. Subsequently, they estimate

C (s, t)

by

\begin{matrix} ({\hat{b}}_{0}, {\hat{b}}_{1}, {\hat{b}}_{2}) = arg min_{b_{0}, b_{1}, b_{2}} {\frac{1}{N} \sum_{i = 1}^{N} \frac{1}{T_{i}} \sum_{j = 1}^{T_{i}} \sum_{k = 1}^{T_{i}} & {[Y_{i j} Y_{i k} - b_{0} - b_{1} (t_{i j} - s) - b_{2} (t_{i k} - t)]}^{2} \\ \times K_{h_{C}} (t_{i j} - s) K_{h_{C}} (t_{i k} - t)}, \end{matrix}

(A46)

where

K_{h_{C}} (t)

is a kernel function with bandwidth

h_{C}

. The estimate

\hat{C} (s, t)

is given by

{\hat{b}}_{0} - \hat{μ} (t) \hat{μ} (s)

. Next, the variance of random error is estimated using a two-step produce. In the former step, Li and Hsing will estimate the variance function

\begin{matrix} ({\hat{c}}_{0}, {\hat{c}}_{1}) = arg min_{c_{0}, c_{1}} \{\frac{1}{N} \sum_{i = 1}^{N} \frac{1}{T_{i}} \sum_{j = 1}^{T_{i}} {[Y_{i j}^{2} - c_{0} - c_{1} (t_{i j} - t)]}^{2} K_{h_{v}} (t_{i j} - t)\}, \end{matrix}

(A47)

where

K_{h_{v}} (t)

is a kernel function with bandwidth

h_{v}

. The estimate

\hat{V} (t)

is given by

{\hat{c}}_{0}

. In the latter step,

\hat{σ}

is yielded by

\begin{matrix} {\hat{σ}}^{2} = \int_{0}^{1} (\hat{V} (t) - \hat{C} (t, t) - \hat{μ} {(t)}^{2}) d t . \end{matrix}

(A48)

Furthermore,

{{\hat{ϕ}}_{k} (t)}

and

{{\hat{λ}}_{k}}

are yielded by

\begin{matrix} \hat{C} (s, t) = \sum_{k = 1}^{K} {\hat{λ}}_{k} {\hat{ϕ}}_{k} (s) {\hat{ϕ}}_{k} {(t)}^{⊤}, \end{matrix}

(A49)

where K is a pre-given cutoff that can be selected by information criterion such as Bayesian information criterion (BIC) [48]. Finally, the random coordinate

{\hat{ξ}}_{i k}

is predicted by

\begin{matrix} {\hat{ξ}}_{i k} = \hat{E} (ξ_{i k} | Y_{i}) = {\hat{λ}}_{k} {\hat{ϕ}}_{i k}^{⊤} {\hat{Σ}}_{y_{i}}^{- 1} (Y_{i} - {\hat{μ}}_{i}), \end{matrix}

(A50)

where

Y_{i} = {(y_{i 1}, \dots, y_{i T_{i}})}^{⊤}

,

{\hat{ϕ}}_{i k} = ({\hat{ϕ}}_{k} {(t_{i 1}, \dots, {\hat{ϕ}}_{k} (t_{i T_{i}}))}^{⊤}

,

{\hat{μ}}_{i} = (\hat{μ} {(t_{i 1}, \dots, \hat{μ} (t_{i T_{i}}))}^{⊤}

,

{\hat{Σ}}_{i} = {\hat{C}}_{i} + {\hat{σ}}^{2} I_{T_{i}}

, and the

(k, h)

th element of

{\hat{C}}_{i}

is

\hat{C} (t_{i k}, t_{i h})

.

Appendix B.2. Estimator of FLMM

In the implementation, FLMM will (1) estimate the mean function

μ (t)

; (2) estimate the covariance matrix

Γ

by REML, and (3) horizontally yield the covariance function

C (s, t)

and predict the random coordinates

{b_{j k}}

by using the estimate of

Γ

. Without loss of generality, we illustrate these three steps with the procedure employed by Shi et al. [20]. Specifically, Shi et al. first estimated

μ (t)

by

\begin{matrix} \hat{β} = arg min_{β} \{\frac{1}{N} \sum_{i = 1}^{N} {(Y_{i} - Ψ_{i} β)}^{⊤} Σ_{y_{i}}^{- 1} (Y_{i} - Ψ_{i} β)\}, \end{matrix}

(A51)

where

β = {(β_{1}, \dots, β_{p})}^{⊤}

,

Ψ_{i} = (Ψ_{1} (t_{i}), \dots, Ψ_{p} (t_{i}))

is an

(T_{i} \times p)

matrix,

Ψ_{k} (t_{i}) = {(Ψ_{k} (t_{i 1}), \dots, Ψ_{k} (t_{i T_{i}}))}^{⊤}

, cov

(Y_{i}) = Σ_{y_{i}}

is approximated by

\begin{matrix} Σ_{y_{i}} \approx σ^{2} I_{T_{i}} + Ψ_{i} Γ Ψ_{i}^{⊤}, \end{matrix}

(A52)

and

{\hat{Σ}}_{y_{i}}

is an estimate of

Σ_{y_{i}}

. Then

\hat{μ} (t) = \sum_{k = 1}^{p} Ψ_{k} (t) {\hat{β}}_{k}

. Next, Shi et al. employed the technique of LMM [8] such as REML [7] to estimate

σ^{2}

and

Γ

:

\begin{matrix} ({\hat{σ}}^{2}, \hat{Γ}) = arg min_{σ^{2}, Γ} {\sum_{i = 1}^{N} ( & {(Y_{i} - Ψ_{i} \hat{β})}^{⊤} Σ_{y_{i}}^{- 1} (Y_{i} - Ψ_{i} \hat{β}) \\ + log det (Σ_{y_{i}}) + log det (Ψ_{i}^{⊤} Σ_{y_{i}}^{- 1} Ψ_{i}))} . \end{matrix}

(A53)

FLMM will iteratively implement the above two steps until the stable estimates of

{\hat{β}}_{1}, \dots, {\hat{β}}_{p}

,

{\hat{σ}}^{2}

, and

\hat{Γ}

is obtained. Here, the estimate of covariance function

C (s, t)

is

\begin{matrix} \hat{C} (s, t) = \sum_{k = 1}^{p} \sum_{h = 1}^{p} {\hat{Γ}}_{k h} Ψ_{k} (s) Ψ_{h} (t) . \end{matrix}

(A54)

As for the random coordinate vector

b_{i} = {(b_{j 1}, \dots, b_{j p})}^{⊤}

, its prediction is given by

\begin{matrix} {\hat{b}}_{i} = \hat{β} + \hat{Γ} Ψ_{i}^{⊤} {\hat{Σ}}_{y_{i}}^{- 1} (Y_{i} - {\hat{μ}}_{i}) . \end{matrix}

(A55)

The individual trajectory is predicted by

{\hat{X}}_{i} (t) = \sum_{k = 1}^{p} Ψ_{k} (t) {\hat{b}}_{i k}

.

References

Ramsay, J.; Silverman, B. Functional Data Analysis; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
Králík, M.; Klíma, O.; Ŏuta, M.; Malina, R.M.; Kozieł, S.; Polcerová, L.; Škultétyová, A.; Španěl, M.; Kukla, L.; Zemčík, P. Estimating Growth in Height from Limited Longitudinal Growth Data Using Full-Curves Training Dataset: A Comparison of Two Procedures of Curve Optimization—Functional Principal Component Analysis and SITAR. Children 2021, 8, 934. [Google Scholar] [CrossRef] [PubMed]
Ullah, S.; Finch, C. Applications of functional data analysis: A systematic review. BMC Med Res. Methodol. 2013, 13, 43. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Cai, T.; Yuan, M. Nonparametric Covariance Function Estimation for Functional and Longitudinal Data; University of Pennsylvania and Georgia Inistitute of Technology: Philadelphia, PA, USA, 2010. [Google Scholar]
Diggle, P.; Diggle, P.; Heagerty, P.; Liang, K.; Zeger, S. Analysis of Longitudinal Data; Oxford University Press: Oxford, UK, 2002. [Google Scholar]
Wood, S.; Wood, M. Package ‘mgcv’. R Package Version 2015, 1, 729. [Google Scholar]
Patterson, H.; Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 1971, 58, 545–554. [Google Scholar] [CrossRef]
Laird, N.; Ware, J. Random-effects models for longitudinal data. Biometrics 1982, 1, 963–974. [Google Scholar] [CrossRef]
Hall, P.; Müller, H.; Wang, J. Properties of principal component methods for functional and longitudinal data analysis. Ann. Stat. 2006, 34, 1493–1517. [Google Scholar] [CrossRef] [Green Version]
Li, Y.; Hsing, T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann. Stat. 2010, 38, 3321–3351. [Google Scholar] [CrossRef]
Yao, F.; Müller, H.; Wang, J. Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 2005, 100, 577–590. [Google Scholar] [CrossRef]
Paul, D.; Peng, J. Consistency of restricted maximum likelihood estimators of principal components. Ann. Stat. 2009, 37, 1229–1271. [Google Scholar] [CrossRef]
Peng, J.; Paul, D. A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data. J. Comput. Graph. Stat. 2009, 18, 995–1015. [Google Scholar] [CrossRef] [Green Version]
Bunea, F.; Ivanescu, A.; Wegkamp, M. Adaptive inference for the mean of a Gaussian process in functional data. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2011, 73, 531–558. [Google Scholar] [CrossRef]
Rice, J.; Silverman, B. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1991, 53, 233–243. [Google Scholar] [CrossRef]
Yu, F.; Liu, L.; Yu, N.; Ji, L.; Qiu, D. A method of L1-norm principal component analysis for functional data. Symmetry 2020, 12, 182. [Google Scholar] [CrossRef] [Green Version]
James, G.; Hastie, T.; Sugar, C. Principal component models for sparse functional data. Biometrika 2000, 87, 587–602. [Google Scholar] [CrossRef] [Green Version]
James, G.; Sugar, C. Clustering for sparsely sampled functional data. J. Am. Stat. Assoc. 2003, 98, 397–408. [Google Scholar] [CrossRef]
Rice, J.; Wu, C. Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 2001, 57, 253–259. [Google Scholar] [CrossRef] [Green Version]
Shi, M.; Weiss, R.; Taylor, J. An analysis of paediatric CD4 counts for acquired immune deficiency syndrome using flexible random curves. J. R. Stat. Soc. Ser. C Appl. Stat. 1996, 45, 151–163. [Google Scholar] [CrossRef]
Antoniadis, A.; Sapatinas, T. Estimation and inference in functional mixed-effects models. Comput. Stat. Data Anal. 2007, 51, 4793–4813. [Google Scholar] [CrossRef]
Morris, J.; Carroll, R. Wavelet-based functional mixed models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 179–199. [Google Scholar] [CrossRef] [Green Version]
Zhu, H.; Brown, P.; Morris, J. Robust, adaptive functional regression in functional mixed model framework. J. Am. Stat. Assoc. 2011, 106, 1167–1179. [Google Scholar] [CrossRef] [Green Version]
Qiu, P.; Zou, C.; Wang, Z. Nonparametric profile monitoring by mixed effects modeling. Technometrics 2010, 52, 265–277. [Google Scholar] [CrossRef]
Wu, H.; Zhang, J. Local polynomial mixed-effects models for longitudinal data. J. Am. Stat. Assoc. 2002, 97, 883–897. [Google Scholar] [CrossRef]
Wu, H.; Zhang, J. Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
Rice, J. Functional and longitudinal data analysis: Perspectives on smoothing. Stat. Sinica 2004, 1, 631–647. [Google Scholar]
Wang, J.; Chiou, J.; Müller, H. Functional data analysis. Annu. Rev. Stat. Appl. 2016, 3, 257–295. [Google Scholar] [CrossRef] [Green Version]
Cai, T.; Yuan, M. Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. Ann. Stat. 2011, 39, 2330–2355. [Google Scholar] [CrossRef] [Green Version]
Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications; Routledge: London, UK, 2018. [Google Scholar]
Eilers, P.; Marx, B. Flexible smoothing with B-splines and penalties. Stat. Sci. 1996, 11, 89–102. [Google Scholar] [CrossRef]
Acal, C.; Aguilera, A.; Escabias, M. New modeling approaches based on varimax rotation of functional principal components. Mathematics 2020, 8, 2085. [Google Scholar] [CrossRef]
Breslow, N.; Clayton, D. Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 1993, 88, 9–25. [Google Scholar]
Vonesh, E.; Wang, H.; Nie, L.; Majumdar, D. Conditional second-order generalized estimating equations for generalized linear and nonlinear mixed-effects models. J. Am. Stat. Assoc. 2002, 97, 271–283. [Google Scholar] [CrossRef]
Breslow, N.; Lin, X. Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika 1995, 82, 81–91. [Google Scholar] [CrossRef]
Lee, Y.; Nelder, J. Hierarchical generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1996, 58, 619–656. [Google Scholar] [CrossRef]
Lin, X.; Breslow, N. Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 1996, 91, 1007–1016. [Google Scholar] [CrossRef]
Ruppert, D.; Wand, M.; Carroll, R. Semiparametric Regression; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
Wood, S.; Pya, N.; Säfken, B. Smoothing parameter and model selection for general smooth models. J. Am. Stat. Assoc. 2016, 111, 1548–1563. [Google Scholar] [CrossRef]
Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science; Cambridge University Press: Cambridge, UK, 2018. [Google Scholar]
Bickel, P.; Levina, E. Regularized estimation of large covariance matrices. Ann. Stat. 2008, 36, 199–227. [Google Scholar] [CrossRef]
Wainwright, M. High-Dimensional Statistics: A Non-asymptotic Viewpoint; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar]
Tsybakov, A.B. Introduction to Nonparametric Estimation; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
Wood, S. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2011, 73, 3–36. [Google Scholar] [CrossRef] [Green Version]
Nelder, J.; Wedderburn, R. Generalized linear models. J. R. Stat. Soc. Ser. A (Gen.) 1972, 135, 370–384. [Google Scholar] [CrossRef]
Hall, P.; Müller, H.; Yao, F. Modelling sparse generalized longitudinal observations with latent Gaussian processes. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2008, 70, 703–723. [Google Scholar] [CrossRef]
Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 1, 461–464. [Google Scholar] [CrossRef]

Figure 1. Illustration of the COVID-19 data. The X-axis records the dates from 16 March 2020 to 14 August 2020 and the Y-axis shows the numbers of the logarithm of the daily infections.

Figure 2. The eigenfunctions, mean function, individual trajectory, and covariance function used in the simulation.

Figure 3. Estimation errors of mean function, individual trajectory, variance, covariance surface, and running time in the three methods.

Figure 4. Investigation of the covariance surface approximation.

Figure 5. Fitted mean functions of daily infection via FPCA and proposed FLMM.

Figure 6. Predictions of the trajectories of six states via FPCA and proposed FLMM.

Figure 7. Fitted covariance surfaces via FPCA and proposed FLMM.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ran, M.; Yang, Y. Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model. Mathematics 2022, 10, 4322. https://doi.org/10.3390/math10224322

AMA Style

Ran M, Yang Y. Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model. Mathematics. 2022; 10(22):4322. https://doi.org/10.3390/math10224322

Chicago/Turabian Style

Ran, Mengfei, and Yihe Yang. 2022. "Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model" Mathematics 10, no. 22: 4322. https://doi.org/10.3390/math10224322

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Optimal Estimation of Large Functional and Longitudinal Data by Using Functional Linear Mixed Model

Abstract

1. Introduction

2. Preliminary

2.1. Settings

2.2. Functional Data Model

2.3. Methodologies of FPCA and FLMM

2.4. Comparison between FPCA and FLMM

3. Optimal Estimation for Large Balanced Longitudinal Data

3.1. Novel Estimating Procedure

3.2. Regularity Assumptions

3.3. Large Sample Property

3.4. Tuning Parameter Selection

4. Simulation Study

4.1. Simulation Settings

4.2. Simulation Results

5. Real Data Analysis

6. Discussion

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Proofs of Theorems

Appendix A.1. Lemmas

Appendix A.2. Proofs

Appendix B. Estimators of FPCA and FLMM

Appendix B.1. Estimator of FPCA

Appendix B.2. Estimator of FLMM

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI