Generalized Partially Functional Linear Model with Interaction between Functional Predictors

Weiwei Xiao; Kejing Mao; Haiyan Liu

doi:10.3390/axioms13090583

¹ School of Science, North China University of Technology, Beijing 100144, China

² Department of Statistics, University of Leeds, Leeds LS2 9JT, UK

* Author to whom correspondence should be addressed.

Axioms 2024, 13(9), 583; https://doi.org/10.3390/axioms13090583

This article belongs to the Special Issue Advances in Functional and Topological Data Analysis

Abstract

This paper proposes a generalized partially functional linear model with interaction terms. It is suitable for cases where the response variable is scalar, and the predictor variables include a mix of functional and scalar types, while considering the correlations among functional predictor variables. The model uses principal component analysis for dimensionality reduction, employs maximum likelihood estimation to obtain parameter values, proves the asymptotic properties of the estimates, and validates the model’s accuracy through data simulation experiments. Finally, the proposed model was applied to investigate the influence of air quality, climate factors, and medical and social indicators, along with their interactions, on cancer incidence, which is a binary response.

Keywords:

functional data analysis; generalized functional linear model; interaction term; cancer incidence

MSC:

00A71; 62H25

1. Introduction

With the advent of the big data era, more and more functional data, providing information about objects varying over a continuum, are collected.

Currently, functional data analysis is being applied in various fields such as medicine, environmental science, and economics and is receiving increasing attention. For details on functional data analysis, see monographs by Ramsay and Silverman [1], Horváth and Kokoszka [2], and Hsing and Eubank [3].

Several variants of functional linear regression models have been proposed to investigate the influence of functional and/or scalar predictors on functional or scalar response and, therefore, to make predictions. Cardot [4], Tony [5], and others have utilized spline methods for estimation and prediction in functional linear regression models. In 2007, Cardot et al. [6] extended the population least squares method to functional linear models, proposing smooth spline estimates for model function coefficients and providing asymptotic results for this estimation. In 2012, Delaigle and Hall [7] utilized partial least squares to demonstrate the consistency and convergence of functional linear models. Tony and Ming [8] studied the estimation and prediction issues of functional linear regression models within the framework of reproducing kernel Hilbert spaces. Nevertheless, these models cannot deal with general responses such as binary and Poisson.

An important tool for functional data analysis is the functional linear regression model, and the generalized functional regression model extends it to handle more complex response variables. The extension builds on the generalized linear model, first introduced by Nelder and Wedderburn [9] in 1972, which relates continuous or discrete response variables to the predictor variables through a link function. In 2002, James [10] proposed generalized linear models with functional predictors and applied them to standard missing data problems. In 2005, Müller and Stadtmüller [11] proposed a generalized functional linear regression model where the response variable is a discrete scalar and the predictor is functional. In 2011, Goldsmith et al. [12] developed fast fitting methods for generalized functional linear models that can be applied to various functional data designs, including functions measured with and without error and sparsely or densely sampled. In 2021, Xiao et al. [13] proposed a generalized partially functional linear regression model where the response variable is general and the predictors are scalar and functional. However, none of these models incorporate the interaction of functional predictors.

To better address complex data that include both functional predictors and scalar predictors, scholars have improved the functional linear model and proposed a functional regression model with mixed predictors. In 2016, Kong et al. [14] explored the estimation and variable selection problems in cases where the parametric part is high-dimensional and the functional predictors are multidimensional. Yao [15] and Ma et al. [16] further built upon the work of Kong [14], conducting more in-depth research and proving the large sample properties of the estimators. In 2020, Xu et al. [17] studied the estimation and hypothesis testing issues for models with multiple functional predictors and demonstrated the corresponding large sample properties.

In many practical applications, interactions between variables must be considered; omitting an interaction term can lead to an omitted-variable problem in the model, producing inaccurate predictions and inappropriate interpretations. Introducing interaction terms reduces this inaccuracy and makes the model more reliable, thereby improving prediction and providing more reliable decision support. Indeed, functional linear regression models with interaction between functional predictors have been proposed recently; several examples follow. In 2016, Usset et al. [18] proposed a functional regression model with a scalar response and multiple functional predictors with two-way interactions in addition to their main effects. In 2019, Luo and Qin [19] proposed function-on-function regression models with interaction and quadratic effects, together with an efficient estimation method that has minimum prediction error. In 2013, Yang et al. [20] introduced a class of nonlinear multivariate time-frequency functional models that can identify important features in each signal as well as the interaction of signals. Some models considered the interaction of two different time points in the functional data. In 2020, Matsui [21] proposed a functional quadratic model that takes the interaction between two different time points of the functional data into consideration. In 2020, Sun and Wang [22] also considered a quadratic regression model where the predictor and the response are both functional; they estimated the coefficient functions, predicted unknown responses, and established asymptotic results. Nonetheless, these models cannot be applied to general scalar responses. As far as we know, only Fuchs et al. [23] in 2015 considered a general scalar response with functional predictors and linear functional interaction terms. However, one drawback of the method of Fuchs et al. [23] is that scalar predictors are not included, and a second drawback is that the asymptotic properties of the estimated regression coefficients were not established.

A practical motivation of this paper is the investigation of the influence of air quality, climate factors, medical and social indicators, and their interactions on cancer incidence, which is a binary response. Cancer is one of the leading causes of death in humans; therefore, it is crucial to analyze the factors related to cancer incidence. Studying cancer incidence can help improve public health and quality of life, reduce social medical costs, and promote human health and socio-economic development. In 2022, Qiu et al. [24] pointed out that cancer incidence in China is much higher than that in the United States and the United Kingdom because China faces problems such as a large population, uneven development across regions, and a relative lag in cancer control strategies. In 2014, Qin et al. [25] indicated that long-term exposure to air pollutants, or short-term exposure to high concentrations of air pollutants such as PM2.5, may be associated with increased incidence rates of overall cancer, especially prostate cancer and female breast cancer. In 2022, Wu et al. [26] found that areas with high green coverage have a lower risk of cancer. In 2023, Cao et al. [27] analyzed the relationship between per capita GDP and cancer incidence in 55 regions of China, showing that regions with high GDP have high cancer incidence. In 2017, Xu et al. [28] conducted a statistical analysis of PM2.5 levels in Changzhou, China, and examined the interaction between PM2.5 and relative humidity during the same period, finding a certain degree of interaction between the two. In 2022, Yang et al. [29] used the generalized linear model to study the effects of PM2.5 and relative humidity on visibility and found a significant interaction between PM2.5 and relative humidity.

Therefore, we collected data on average daily PM2.5 concentration (from 1 January 2015 to 31 December 2020), average daily humidity (from 1 January 2015 to 31 December 2020), per capita GDP, green coverage rate in built-up areas, the proportion of medical personnel (PMP) (the ratio of the number of licensed (assistant) doctors to the local population), and the binary cancer incidence in 49 cities in China from http://www.cnemc.cn/, http://www.stats.gov.cn/sj/ndsj/ and http://www.chinancpcn.org.cn/home. Our aim was to investigate the influence of PM2.5 concentration, air humidity, per capita GDP, green coverage, and PMP on cancer incidence, focusing not only on the main effects but also on the interaction between PM2.5 concentration and air humidity, and then to make predictions.

Existing models with interaction terms between functional predictors and general scalar responses cannot deal with multiple functional and scalar predictors, as required by our motivating dataset. Moreover, the asymptotic properties of the estimators have not been addressed in existing models. Therefore, in Section 2, we fully consider the combined influence of functional predictors, scalar predictors, and interactions between functional predictors on a general scalar response by proposing a generalized partially functional linear model with interaction terms. In Section 3, the asymptotic properties of our proposed estimators are established. Extensive simulation studies are given in Section 4. Section 5 is reserved for the real data analysis.

2. Model and Estimation

2.1. Model Introduction

Suppose we have n subjects, and the data we observe for the i-th subject are $\{(X_{i1}(t_1), t_1 \in T_1), (X_{i2}(t_2), t_2 \in T_2), Z_i, Y_i\}$, $i = 1, \dots, n$. For $j = 1, 2$, the functional predictor $X_{ij}(t_j)$ is a random curve observed for subject i, with $X_{ij}(t_j) \in L^2(T_j)$, where $T_j$ is a bounded interval of $\mathbb{R}$. For notational simplicity, we only consider the case with two functional predictors; the case with more functional predictors can be treated similarly. The scalar predictor vector $Z = (Z_1, Z_2, \dots, Z_q)^T$ is a q-dimensional random vector. The response $Y_i$ is a real-valued random variable that may be continuous or discrete (e.g., binary, count, etc.).

We assume that there is a known link function $g(\cdot)$ that is monotone, and thus invertible, and twice continuously differentiable with bounded derivatives.

We introduce the following generalized partially functional linear model with interaction between the functional predictors:

\begin{aligned} Y_i = {} & g\Big(α + \int_{T_1} X_{i1}(t_1) β_1(t_1) \, dt_1 + \int_{T_2} X_{i2}(t_2) β_2(t_2) \, dt_2 \\ & + \iint_{T_1 \times T_2} X_{i1}(t_1) X_{i2}(t_2) β(t_1, t_2) \, dt_1 \, dt_2 + Z_i^T γ\Big) + ε_i, \end{aligned}

(1)

where $α \in \mathbb{R}$ is the intercept; $β_1(t_1)$, $β_2(t_2)$, and $β(t_1, t_2)$ are the regression coefficient functions corresponding to the two functional predictors and the interaction term, respectively; $Z_i$ is the scalar predictor vector observed for the i-th subject; and $γ = (γ_1, γ_2, \dots, γ_q)^T$ is the regression coefficient vector corresponding to the scalar predictors Z. It is assumed that $ε_i$ has mean 0 and variance $σ^2$ and that $ε_i$ is independent of $ε_j$ if $i \neq j$.

Define the linear predictor ℓ:

\begin{aligned} ℓ = {} & α + \int_{T_1} X_1(t_1) β_1(t_1) \, dt_1 + \int_{T_2} X_2(t_2) β_2(t_2) \, dt_2 \\ & + \iint_{T_1 \times T_2} X_1(t_1) X_2(t_2) β(t_1, t_2) \, dt_1 \, dt_2 + Z^T γ . \end{aligned}

We specify

E(Y \mid X_1(\cdot), X_2(\cdot), Z) = η = g(ℓ), \quad \mathrm{Var}(Y \mid X_1(\cdot), X_2(\cdot), Z) = σ^2(η) .

For simplicity, we assume that the predictors $X_j(t_j)$ and Z are both centered, i.e., $E(X_j(t_j)) = 0$, $j = 1, 2$, and $E(Z_l) = 0$, $l = 1, \dots, q$. The Karhunen–Loève expansion is a mathematical technique for representing stochastic processes in terms of an orthonormal basis derived from the process’s covariance function. It decomposes a random process into a series of orthogonal functions, each weighted by uncorrelated random coefficients. The basis functions are the eigenfunctions of the covariance function, and the expansion efficiently captures the process’s variability, often using only a few terms. For a detailed introduction to the Karhunen–Loève expansion, please refer to Equation (2.8) in [2]. Based on the Karhunen–Loève expansion, $X_{ij}(t_j)$ can be expanded as

X_{i1}(t_1) = \sum_{k=1}^{\infty} χ_{i1k} φ_{1k}(t_1),

X_{i2}(t_2) = \sum_{l=1}^{\infty} χ_{i2l} φ_{2l}(t_2),

where $χ_{i1k}$, $χ_{i2l}$ are the functional principal component scores, $φ_{1k}(t_1)$, $φ_{2l}(t_2)$ are the functional principal component bases, and

\int_{T_1} φ_{1k}^2(t_1) \, dt_1 = 1, \quad \int_{T_2} φ_{2l}^2(t_2) \, dt_2 = 1 .
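In practice the eigenfunctions and scores are not known and have to be estimated from sampled curves. The following is a minimal sketch of one common way to do this, assuming every curve is observed without noise on a common, equally spaced grid; the function name `fpca_scores` and the Riemann-sum quadrature are illustrative choices and are not taken from the paper.

```python
import numpy as np

def fpca_scores(curves, grid, n_components):
    """Estimate eigenvalues, eigenfunctions and scores from discretised curves.

    curves: (n, m) array, row i holds X_i evaluated on the grid.
    grid:   (m,) equally spaced points on the domain T (assumption).
    """
    dt = grid[1] - grid[0]                        # Riemann-sum quadrature weight
    centered = curves - curves.mean(axis=0)       # enforce E[X(t)] = 0 empirically
    cov = centered.T @ centered / curves.shape[0] # sample covariance on the grid
    # Discretised eigenproblem of the covariance operator: (cov * dt) v = lambda v
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = eigvals[order]
    phi = eigvecs[:, order] / np.sqrt(dt)         # rescale so sum(phi^2) * dt = 1
    scores = centered @ phi * dt                  # chi_ik ~ integral of X_i * phi_k
    return eigvals, phi, scores
```

The columns of `phi` play the role of $φ_{jk}$ evaluated on the grid, and the columns of `scores` play the role of $χ_{ijk}$; with sparse or noisy observations a smoothing step would be needed first.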

Using the functional principal component bases, the regression coefficient functions $β_j(t_j)$, $β(t_1, t_2)$ are expanded as

β_1(t_1) = \sum_{k=1}^{\infty} b_{1k} φ_{1k}(t_1),

β_2(t_2) = \sum_{l=1}^{\infty} b_{2l} φ_{2l}(t_2),

β(t_1, t_2) = \sum_{k=1}^{\infty} \sum_{l=1}^{\infty} u_{kl} φ_{1k}(t_1) φ_{2l}(t_2) .

Plugging the above expansions into Model (1) and truncating them at $p_j$ terms, where $p_j$ increases to infinity as $n \to \infty$, we obtain the truncated Model (2):

Y_i = g\Big(α + \sum_{k=1}^{p_1} χ_{i1k} b_{1k} + \sum_{l=1}^{p_2} χ_{i2l} b_{2l} + \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} ρ_{ikl} u_{kl} + Z_i^T γ\Big) + ε_i,

(2)

where $ρ_{ikl} = χ_{i1k} \cdot χ_{i2l}$.

2.2. Parameter Estimation

Define the parameter vector

Ω = (b_{11}, \dots, b_{1p_1}, b_{21}, \dots, b_{2p_2}, u_{11}, \dots, u_{1p_2}, u_{21}, \dots, u_{2p_2}, \dots, u_{p_1 1}, \dots, u_{p_1 p_2}, γ_0, γ_1, \dots, γ_q)^T,

and define

ℓ_i = α + \sum_{k=1}^{p_1} χ_{i1k} b_{1k} + \sum_{l=1}^{p_2} χ_{i2l} b_{2l} + ρ_i^T u + Z_i^T γ,

η_i = g(ℓ_i),

ω_i = (χ_{i11}, \dots, χ_{i1p_1}, χ_{i21}, \dots, χ_{i2p_2}, ρ_{i11}, \dots, ρ_{i1p_2}, ρ_{i21}, \dots, ρ_{i2p_2}, \dots, ρ_{ip_1 1}, \dots, ρ_{ip_1 p_2}, z_{i0}, z_{i1}, \dots, z_{iq})^T,

where $b_j = (b_{j1}, \dots, b_{jp_j})^T$, $j = 1, 2$, $u = (u_{11}, \dots, u_{1p_2}, u_{21}, \dots, u_{2p_2}, \dots, u_{p_1 1}, \dots, u_{p_1 p_2})^T$, $γ = (γ_0, γ_1, \dots, γ_q)^T$, $ρ_i = (ρ_{i11}, \dots, ρ_{i1p_2}, ρ_{i21}, \dots, ρ_{i2p_2}, \dots, ρ_{ip_1 1}, \dots, ρ_{ip_1 p_2})^T$, $z_{i0} = 1$, and $γ_0 = α$.
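The vectors $ω_i$ defined above can be stacked into an n-row design matrix once the scores are available. The sketch below shows one way to do this with NumPy; the function name `design_matrix` and the array layout are illustrative assumptions, and the column order simply mirrors the order of the entries of Ω.

```python
import numpy as np

def design_matrix(scores1, scores2, Z):
    """Stack [chi_1, chi_2, rho, 1, Z] row-wise, matching the order of Omega.

    scores1: (n, p1) scores of the first functional predictor.
    scores2: (n, p2) scores of the second functional predictor.
    Z:       (n, q) scalar predictors.
    """
    n, p1 = scores1.shape
    p2 = scores2.shape[1]
    # rho_{ikl} = chi_{i1k} * chi_{i2l}, flattened with l varying fastest
    rho = (scores1[:, :, None] * scores2[:, None, :]).reshape(n, p1 * p2)
    intercept = np.ones((n, 1))                   # z_{i0} = 1
    return np.hstack([scores1, scores2, rho, intercept, Z])
```

Each row has $p_1 + p_2 + p_1 p_2 + 1 + q$ entries, so the number of interaction columns grows multiplicatively with the two truncation levels.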

The maximum likelihood estimate $\hat{Ω}$ of $Ω$ can be obtained by solving Equation (3):

U(Ω) = \sum_{i=1}^{n} \frac{(Y_i - g(ℓ_i)) \, g'(ℓ_i)}{σ^2(η_i)} ω_i = 0,

(3)

\hat{Ω} = (\hat{b}_{11}, \dots, \hat{b}_{1p_1}, \hat{b}_{21}, \dots, \hat{b}_{2p_2}, \hat{u}_{11}, \dots, \hat{u}_{1p_2}, \hat{u}_{21}, \dots, \hat{u}_{2p_2}, \dots, \hat{u}_{p_1 1}, \dots, \hat{u}_{p_1 p_2}, \hat{γ}_0, \hat{γ}_1, \dots, \hat{γ}_q)^T,

where $\hat{b}_j = (\hat{b}_{j1}, \dots, \hat{b}_{jp_j})^T$, $j = 1, 2$, $\hat{u} = (\hat{u}_{11}, \dots, \hat{u}_{1p_2}, \hat{u}_{21}, \dots, \hat{u}_{2p_2}, \dots, \hat{u}_{p_1 1}, \dots, \hat{u}_{p_1 p_2})^T$, $\hat{α} = \hat{γ}_0$, and $\hat{γ} = (\hat{γ}_0, \hat{γ}_1, \dots, \hat{γ}_q)^T$ are the estimates of $b_j$, $u$, $α$, $γ$, respectively.
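As a worked special case (an illustration only, not a choice stated in this section), suppose the response is binary and $g$ is taken to be the logistic function, so that the conditional variance is $σ^2(η) = η(1 - η)$. Then $g'(ℓ_i) = σ^2(η_i)$, and the weights in Equation (3) cancel:

```latex
\begin{aligned}
g(\ell) &= \frac{e^{\ell}}{1 + e^{\ell}}, \qquad
g'(\ell) = g(\ell)\,\bigl(1 - g(\ell)\bigr) = \eta\,(1 - \eta) = \sigma^{2}(\eta), \\
U(\Omega) &= \sum_{i=1}^{n} \frac{\bigl(Y_i - g(\ell_i)\bigr)\, g'(\ell_i)}{\sigma^{2}(\eta_i)}\, \omega_i
           = \sum_{i=1}^{n} \bigl(Y_i - \eta_i\bigr)\, \omega_i = 0 .
\end{aligned}
```

Under this assumption the estimating equation reduces to the usual logistic-regression score equation in the features $ω_i$.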

We introduce the following matrices:

V = \mathrm{diag}(σ^2(η_1), \dots, σ^2(η_n)),

W = \mathrm{diag}(g'(ℓ_1), g'(ℓ_2), \dots, g'(ℓ_n)),

A_0 = A_{n, q+1} = \Big(\frac{g'(ℓ_i) z_{im}}{σ(η_i)}\Big)_{1 \leq i \leq n, \, 0 \leq m \leq q},

A_j = A_{n, p_j} = \Big(\frac{g'(ℓ_i) χ_{ijr}}{σ(η_i)}\Big)_{1 \leq i \leq n, \, 1 \leq r \leq p_j}, \quad j \in \{1, 2\},

A_{12} = A_{n, p_1 p_2} = \Big(\frac{g'(ℓ_i) ρ_{it}}{σ(η_i)}\Big)_{1 \leq i \leq n, \, 1 \leq t \leq p_1 p_2},

A = A_{n, q+1+p_1+p_2+p_1 p_2} = (A_1, A_2, A_{12}, A_0),

where A is the column-wise concatenation of the four blocks, so that its i-th row is $g'(ℓ_i) ω_i^T / σ(η_i)$, and the vectors

Y = (Y_1, \dots, Y_n)^T,

η = (η_1, \dots, η_n)^T.

Then Equation (3) can be written as

A^T V^{-\frac{1}{2}} (Y - η) = 0 .

The estimate of $Ω$ is usually computed iteratively using a weighted least squares method. By a first-order Taylor expansion, we have

\begin{aligned} g^{-1}(Y) = {} & g^{-1}(η) + [g^{-1}(η)]' (Y - η) \\ = {} & ℓ + W^{-1} (Y - η); \end{aligned}

thus,

A^T H (g^{-1}(Y) - ℓ) = 0,

where $H = V^{-\frac{1}{2}} W$.

Simplifying gives the estimates of $b_j$, $γ$, $u$:

\tilde{b}_j = (A_j^T A_j)^{-1} A_j^T H g^{-1}(Y),

\tilde{γ} = (A_0^T A_0)^{-1} A_0^T H g^{-1}(Y),

\tilde{u} = (A_{12}^T A_{12})^{-1} A_{12}^T H g^{-1}(Y) .

Repeating the above process until convergence yields the estimate of $Ω$:

\hat{Ω} = (\hat{b}_{11}, \dots, \hat{b}_{1p_1}, \hat{b}_{21}, \dots, \hat{b}_{2p_2}, \hat{u}_{11}, \dots, \hat{u}_{1p_2}, \hat{u}_{21}, \dots, \hat{u}_{2p_2}, \dots, \hat{u}_{p_1 1}, \dots, \hat{u}_{p_1 p_2}, \hat{γ}_0, \hat{γ}_1, \dots, \hat{γ}_q)^T .
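The iteration above is an instance of iteratively reweighted least squares. A minimal sketch for the binary case follows, assuming a logistic mean function (so that $g'(ℓ) = σ^2(η) = η(1 - η)$) and updating all of Ω jointly rather than block by block; the function name `fit_logistic_irls`, the stopping rule, and the clipping constant are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_logistic_irls(D, y, n_iter=50, tol=1e-8):
    """Fisher scoring for a Bernoulli response with mean g(l) = 1 / (1 + exp(-l)).

    D: (n, d) design matrix whose rows are omega_i^T; y: (n,) array of 0/1.
    """
    omega = np.zeros(D.shape[1])
    for _ in range(n_iter):
        ell = D @ omega
        eta = 1.0 / (1.0 + np.exp(-ell))               # eta_i = g(l_i)
        w = np.clip(eta * (1.0 - eta), 1e-10, None)    # g'(l_i) = sigma^2(eta_i)
        z = ell + (y - eta) / w                        # working response l + W^{-1}(Y - eta)
        # Weighted least squares step: (D^T diag(w) D) omega = D^T diag(w) z
        new = np.linalg.solve(D.T @ (D * w[:, None]), D.T @ (w * z))
        if np.max(np.abs(new - omega)) < tol:
            return new
        omega = new
    return omega
```

In the notation of this section, $D^T \mathrm{diag}(w) D$ corresponds to $A^T A$ under the logistic assumption. With many interaction columns and few subjects the linear system can become ill-conditioned, in which case a penalised or lower-dimensional fit may be preferable.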

3. Asymptotic Properties

Considering the truncated Model (2), we have the metric

d_G^2(\hat{β}_j, β_j) = (\hat{b}_j - b_j)^T \tilde{Γ}_j (\hat{b}_j - b_j) + \sum_{k_1, k_2 = p_j + 1}^{\infty} λ_{j, k_1 k_2} \bar{b}_j^2, \quad j = 1, 2,

where $b_j = (b_{j1}, b_{j2}, \dots, b_{jp_j})^T$, $\tilde{Γ}_j = (λ_{j, k_1 k_2})_{1 \leq k_1, k_2 \leq p_j}$ is a symmetric positive definite matrix, and

λ_{j, k_1 k_2} = E\Big[\frac{g'(ℓ)^2}{σ^2(η)} χ_{jk_1} χ_{jk_2}\Big], \quad 1 \leq k_1, k_2 \leq p_j,

is determined by the generalized autocovariance operator $A_{G_j}$ with kernel

G_j(s, t) = E\Big[\frac{g'(ℓ)^2}{σ^2(η)} X_j(s) X_j(t)\Big].

We also write $\tilde{Γ}_j^{-1} = (ξ_{j, k_1 k_2})_{1 \leq k_1, k_2 \leq p_j}$ and $\bar{b}_j = (b_{j(p_j+1)}, b_{j(p_j+2)}, \dots)^T$.

Combined with Corollary 4.1 in Müller and Stadtmüller [11], we have

\sum_{k_1, k_2 = p_j + 1}^{\infty} λ_{j, k_1 k_2} \bar{b}_j^2 = o\Big(\frac{\sqrt{p_j}}{n}\Big) .

We specify

\|f\|^2 = \int_S f(s)^2 \, ds, \quad f \in L^2(S),

\|g\|^2 = \int_S \int_T g(s, t)^2 \, ds \, dt, \quad g \in L^2(S \times T),

and

(f \otimes g)(x, y) = f(x) g(y), \quad x \in X, \; y \in Y,

where $X$, $Y$ are the domains of $f$, $g$, respectively.

Define $C_{X_j}$ as the covariance function of the random function $X_j$, for $j = 1, 2$. By Mercer’s theorem,

C_{X_1}(t_{11}, t_{12}) = \sum_{k \geq 1} λ_k φ_{1k}(t_{11}) φ_{1k}(t_{12}),

C_{X_2}(t_{21}, t_{22}) = \sum_{l \geq 1} σ_l φ_{2l}(t_{21}) φ_{2l}(t_{22}),

where $t_{11}, t_{12} \in T_1$; $t_{21}, t_{22} \in T_2$; $λ_k$ and $φ_{1k}$, $k = 1, 2, \dots$, are the non-negative eigenvalues and the corresponding eigenfunctions of the covariance function $C_{X_1}(t_{11}, t_{12})$; and $σ_l$ and $φ_{2l}$, $l = 1, 2, \dots$, are the non-negative eigenvalues and the corresponding eigenfunctions of the covariance function $C_{X_2}(t_{21}, t_{22})$.

To derive the asymptotic properties of the estimated regression coefficients, we make the following assumptions in addition to the basic conditions in Section 2:

(i): The link function $g(\cdot)$ is monotone and invertible with bounded second-order derivative; the variance function $σ^{2}(\cdot)$ has a continuous, bounded derivative; and there exists a constant $Δ > 0$ such that $σ(\cdot) > Δ$;
(ii): The scalar predictor variable Z and the functional predictor variables $X_{j}(t_{j})$ are independent of each other;
(iii): As $n \to \infty$, $p_{j}$ satisfies $p_{j} \to \infty$ and $p_{j} n^{-\frac{1}{4}} \to 0$;
(iv): $E[\int_{T_{j}} X_{j}(t_{j})^{4} \, dt_{j}] < \infty$;
(v): Define $μ_{X_{1}, k} = \min_{1 \leq k \leq p_{1}} (λ_{k} - λ_{k+1})$, $μ_{X_{2}, l} = \min_{1 \leq l \leq p_{2}} (σ_{l} - σ_{l+1})$, and assume $μ_{X_{1}, k} > 0$, $μ_{X_{2}, l} > 0$;
(vi): Define $d_{n} = \|\hat{C}_{X} - C_{X}\|$, $\tilde{K}_{n} = \min\{k \geq 1 : λ_{k} \leq 2 d_{n}\} - 1$, $\tilde{L}_{n} = \min\{l \geq 1 : σ_{l} \leq 2 d_{n}\} - 1$; then $d_{n} \to 0$, $\tilde{K}_{n} \to \infty$, and $\tilde{L}_{n} \to \infty$ as $n \to \infty$.

Lemma 1.

If the above basic conditions and assumptions hold, and if $p_1 \leq \tilde{K}_n$, $p_2 \leq \tilde{L}_n$, and

\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} \frac{1}{n} \Big(\frac{1}{μ_{X_1, k}^2} + \frac{1}{μ_{X_2, l}^2}\Big) \to 0,

then we have

\|\hat{β} - β\|^2 = O_p\Big(\sum_{t=1}^{p_1 p_2} (u_t - \hat{u}_t)^2\Big) .

Proof.

\begin{aligned} \|β - \hat{β}\|^2 = {} & \Big\|\sum_{k=1}^{\infty} \sum_{l=1}^{\infty} u_{kl} φ_{1k} \otimes φ_{2l} - \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} \hat{u}_{kl} \hat{φ}_{1k} \otimes \hat{φ}_{2l}\Big\|^2 \\ = {} & \Big\|\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} u_{kl} (φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l}) + \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} (u_{kl} - \hat{u}_{kl}) \hat{φ}_{1k} \otimes \hat{φ}_{2l} \\ & + \sum_{k=1}^{p_1} \sum_{l > p_2} u_{kl} φ_{1k} \otimes φ_{2l} + \sum_{k > p_1} \sum_{l=1}^{\infty} u_{kl} φ_{1k} \otimes φ_{2l}\Big\|^2 \\ \leq {} & 4 \Big\|\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} u_{kl} (φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l})\Big\|^2 + 4 \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} (u_{kl} - \hat{u}_{kl})^2 \\ & + 4 \sum_{k=1}^{p_1} \sum_{l > p_2} u_{kl}^2 + 4 \sum_{k > p_1} \sum_{l=1}^{\infty} u_{kl}^2 \\ = {} & 4 I_1 + 4 \sum_{t=1}^{p_1 p_2} (u_t - \hat{u}_t)^2 + 4 R_β(p_1, p_2)^2, \end{aligned}

where

R_β(p_1, p_2) = \Big(\sum_{k=1}^{p_1} \sum_{l > p_2} u_{kl}^2 + \sum_{k > p_1} \sum_{l=1}^{\infty} u_{kl}^2\Big)^{\frac{1}{2}} \to 0 \quad (p_1, p_2 \to \infty),

because $\|β\|^2 < \infty$.

By the Cauchy–Schwarz inequality and Lemmas 1 and 2 of Sun and Wang [22], we obtain

\begin{aligned} I_1 = {} & \Big\|\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} u_{kl} (φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l})\Big\|^2 \\ = {} & \int_{T_1} \int_{T_2} \Big[\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} u_{kl} (φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l})\Big]^2 \, dt_2 \, dt_1 \\ \leq {} & \int_{T_1} \int_{T_2} \Big(\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} u_{kl}^2\Big) \Big[\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} (φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l})^2\Big] \, dt_2 \, dt_1 \\ \leq {} & \|β\|^2 \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} \|φ_{1k} \otimes φ_{2l} - \hat{φ}_{1k} \otimes \hat{φ}_{2l}\|^2 \\ = {} & \|β\|^2 \sum_{k=1}^{p_1} \sum_{l=1}^{p_2} \frac{C}{n} \Big(\frac{1}{μ_{X_1, k}^2} + \frac{1}{μ_{X_2, l}^2}\Big) . \end{aligned}

Therefore, we have

I_1 = O_p\Big(\sum_{k=1}^{p_1} \sum_{l=1}^{p_2} \frac{1}{n} \Big(\frac{1}{μ_{X_1, k}^2} + \frac{1}{μ_{X_2, l}^2}\Big)\Big) \to 0 .

Thus, we conclude that

\|β - \hat{β}\|^2 = O_p\Big(\sum_{t=1}^{p_1 p_2} (u_t - \hat{u}_t)^2\Big) .

Therefore, Lemma 1 is proven. □

Theorem 1.

If the above conditions and assumptions hold, then we have

\begin{pmatrix} \frac{n d_G^2(\hat{β}_1, β_1) - p_1}{\sqrt{2 p_1}} \\ \frac{n d_G^2(\hat{β}_2, β_2) - p_2}{\sqrt{2 p_2}} \\ \frac{n d^2(β, \hat{β}) - p_1 p_2 τ}{\sqrt{2 p_1 p_2 τ}} \\ \sqrt{n Θ_0} (γ_0 - \hat{γ}_0) \\ \sqrt{n Θ_1} (γ_1 - \hat{γ}_1) \\ \vdots \\ \sqrt{n Θ_q} (γ_q - \hat{γ}_q) \end{pmatrix} \to N(0, I),

where $Θ_m = E\big[\frac{g'(ℓ_i)^2}{σ^2(η_i)} z_{im}^2\big]$, $τ = E\big[\frac{g'(ℓ_i)^2}{σ^2(η_i)} ρ_{it}^2\big]$, and I is the identity matrix of dimension $(q + 1 + p_1 + p_2) \times (q + 1 + p_1 + p_2)$.

Proof.

A Taylor-expansion-based approach is used to prove the asymptotic normality of the estimates. The Hessian of the proposed likelihood is $J_Ω = \nabla_Ω U(Ω)$, and

A^T A = \sum_{i=1}^{n} \frac{g'(ℓ_i)^2}{σ^2(η_i)} ω_i ω_i^T .

Thus, we have

\begin{aligned} J_Ω = {} & \frac{\partial U(Ω)}{\partial Ω} = \frac{\partial U(Ω)}{\partial ℓ_i} \frac{\partial ℓ_i}{\partial Ω} \\ = {} & - \sum_{i=1}^{n} \frac{g'(ℓ_i)^2 \, ω_i ω_i^T}{σ^2(g(ℓ_i))} + \sum_{i=1}^{n} \Big(\frac{g''(ℓ_i)}{σ^2(η_i)} - \frac{g'(ℓ_i)^2 \, (σ^2)'(η_i)}{σ^4(η_i)}\Big) (Y_i - g(ℓ_i)) \, ω_i ω_i^T \\ = {} & - A^T A + R . \end{aligned}

The remainder term R can be ignored. By a Taylor expansion, there exists an $\tilde{Ω}$ between $Ω$ and $\hat{Ω}$ such that

U(Ω) - U(\hat{Ω}) = J_{\tilde{Ω}} (Ω - \hat{Ω});

therefore,

\sqrt{n} (Ω - \hat{Ω}) = [I + M + N]^{-1} \Big(\frac{A^T A}{n}\Big)^{-1} \frac{U(Ω)}{\sqrt{n}},

where $M = \big(\frac{A^T A}{n}\big)^{-1} \frac{J_{\tilde{Ω}} - J_Ω}{n}$ and $N = \big(\frac{A^T A}{n}\big)^{-1} \frac{J_Ω - A^T A}{n}$.

From Lemma 7.1 in Müller and Stadtmüller [11], it follows that

\sqrt{n} (Ω - \hat{Ω}) \sim \Big(\frac{A^T A}{n}\Big)^{-1} \frac{U(Ω)}{\sqrt{n}} .

We next examine the asymptotic behavior of $\big(\frac{A^T A}{n}\big)^{-1} \frac{U(Ω)}{\sqrt{n}}$. We have

\Big(\frac{A^T A}{n}\Big)^{-1} \frac{U(Ω)}{\sqrt{n}} = \Big(\frac{A^T A}{n}\Big)^{-1} \frac{A^T V^{-\frac{1}{2}} (Y - η)}{\sqrt{n}} = \Big(\frac{A^T A}{n}\Big)^{-1} \frac{A^T \bar{ε}}{\sqrt{n}},

where $\bar{ε} = \frac{ε}{σ(η)}$ is the standardized error, which follows a standard normal distribution.

Thus, we have

\sqrt{n} (Ω - \hat{Ω}) \sim \Big(\frac{A^T A}{n}\Big)^{-1} \frac{A^T \bar{ε}}{\sqrt{n}} .

Since $β_j(t_j)$, $β(t_1, t_2)$, and $γ$ are of different types, $\sqrt{n} (Ω - \hat{Ω})$ is divided into three terms, i.e.,

\sqrt{n} (b_j - \hat{b}_j) \sim \Big(\frac{A_j^T A_j}{n}\Big)^{-1} \frac{A_j^T \bar{ε}}{\sqrt{n}}, \quad j = 1, 2,

\sqrt{n} (u - \hat{u}) \sim \Big(\frac{A_{12}^T A_{12}}{n}\Big)^{-1} \frac{A_{12}^T \bar{ε}}{\sqrt{n}},

\sqrt{n} (γ - \hat{γ}) \sim \Big(\frac{A_0^T A_0}{n}\Big)^{-1} \frac{A_0^T \bar{ε}}{\sqrt{n}},

where the $A_j^T A_j$ are symmetric matrices and $A_0^T A_0$ is a diagonal matrix.

First, we prove

\sqrt{n} (b_j - \hat{b}_j) \sim \Big(\frac{A_j^T A_j}{n}\Big)^{-1} \frac{A_j^T \bar{ε}}{\sqrt{n}}, \quad j = 1, 2 .

Let

X_{nj} = \frac{\tilde{Γ}_j^{-\frac{1}{2}} A_j^T \bar{ε}}{\sqrt{n}}, \quad Z_{nj} = \Big(\frac{A_j^T A_j}{n}\Big)^{-1} \frac{A_j^T \bar{ε}}{\sqrt{n}},

Ψ_{nj} = \tilde{Γ}_j^{\frac{1}{2}} \Big(\frac{A_j^T A_j}{n}\Big)^{-1} \tilde{Γ}_j^{\frac{1}{2}} .

Thus, we have

\begin{aligned} n d_G^2(\hat{β}_j, β_j) = {} & Z_{nj}^T \tilde{Γ}_j Z_{nj} = X_{nj}^T Ψ_{nj}^2 X_{nj} \\ = {} & X_{nj}^T X_{nj} + 2 X_{nj}^T (Ψ_{nj} - I_{nj}) X_{nj} + X_{nj}^T (Ψ_{nj} - I_{nj}) (Ψ_{nj} - I_{nj}) X_{nj}, \end{aligned}

where $I_{nj}$ denotes the identity matrix. From Lemma 7.2 in Müller and Stadtmüller [11], it follows that, up to asymptotically negligible terms,

n d_G^2(\hat{β}_j, β_j) = X_{nj}^T X_{nj} .

Then,

\begin{aligned} X_{nj}^T X_{nj} = {} & \frac{1}{n} \sum_{k_1=1}^{p_j} \Big(\sum_{k_2=1}^{p_j} ζ_{j, k_1 k_2}^{\frac{1}{2}} \sum_{i=1}^{n} \frac{g'(ℓ_i) χ_{ijk_2}}{σ(η_i)} \bar{ε}_i\Big)^2 \\ = {} & E + F, \end{aligned}

where $ζ_{j, k_1 k_2}^{\frac{1}{2}}$ denotes the $(k_1, k_2)$ entry of $\tilde{Γ}_j^{-\frac{1}{2}}$ and

E = \frac{1}{n} \sum_{i=1}^{n} \bar{ε}_i^2 \sum_{k_2', k_2'' = 1}^{p_j} \frac{g'(ℓ_i)^2}{σ^2(η_i)} χ_{ijk_2'} χ_{ijk_2''} \sum_{k_1=1}^{p_j} ζ_{j, k_1 k_2'}^{\frac{1}{2}} ζ_{j, k_1 k_2''}^{\frac{1}{2}},

F = \frac{1}{n} \sum_{i_1 \neq i_2 = 1}^{n} \bar{ε}_{i_1} \bar{ε}_{i_2} \frac{g'(ℓ_{i_1}) g'(ℓ_{i_2})}{σ(η_{i_1}) σ(η_{i_2})} \sum_{k_2', k_2'' = 1}^{p_j} χ_{i_1 j k_2'} χ_{i_2 j k_2''} \sum_{k_1=1}^{p_j} ζ_{j, k_1 k_2'}^{\frac{1}{2}} ζ_{j, k_1 k_2''}^{\frac{1}{2}} .

Since $\bar{ε}$ follows a standard normal distribution, we have

E[X_{nj}^T X_{nj}] = p_j, \quad \mathrm{Var}[X_{nj}^T X_{nj}] = 2 p_j .

Therefore,

\frac{n d_G^2(\hat{β}_j, β_j) - p_j}{\sqrt{2 p_j}} \to N(0, 1), \quad j = 1, 2 .
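The centering by $p_j$ and scaling by $\sqrt{2 p_j}$ match the mean and variance of a sum of $p_j$ independent squared standard normal variables. The sketch below checks this normalization by Monte Carlo in the idealized case where the quadratic form is exactly chi-square distributed; the function name, seed, and sample sizes are illustrative assumptions.

```python
import numpy as np

def standardized_quadratic_form(p, n_rep, seed=0):
    """Draw n_rep copies of (X^T X - p) / sqrt(2 p) with X ~ N(0, I_p)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_rep, p))
    q = (x ** 2).sum(axis=1)          # X^T X is chi-square with p degrees of freedom
    return (q - p) / np.sqrt(2.0 * p)
```

For moderately large p, the draws have mean close to 0 and variance close to 1, and their histogram approaches the standard normal density as p grows, which is the behavior the limiting statement relies on.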

Next, we prove that

\sqrt{n} (u - \hat{u}) \sim \Big(\frac{A_{12}^T A_{12}}{n}\Big)^{-1} \frac{A_{12}^T \bar{ε}}{\sqrt{n}} .

For the coefficient function of the interaction term, we use the metric

d^2(β, \hat{β}) = \|β - \hat{β}\|^2,

so, according to Lemma 1, we have

d^2(β, \hat{β}) = \sum_{t=1}^{p_1 p_2} (u_t - \hat{u}_t)^2 = (u - \hat{u})^T (u - \hat{u}) .

Let

Q_{nt} = \Big(\frac{A_{12}^T A_{12}}{n}\Big)^{-1} \frac{A_{12}^T \bar{ε}}{\sqrt{n}},

A_{nt} = \frac{A_{12}^T \bar{ε}}{\sqrt{n}}, \quad F_{nt} = \Big(\frac{A_{12}^T A_{12}}{n}\Big)^{-1} .

Thus, we have

\begin{aligned} n d^2(β, \hat{β}) = {} & Q_{nt}^T Q_{nt} = A_{nt}^T F_{nt}^2 A_{nt} \\ = {} & A_{nt}^T A_{nt} + 2 A_{nt}^T (F_{nt} - I_{nt}) A_{nt} + A_{nt}^T (F_{nt} - I_{nt}) (F_{nt} - I_{nt}) A_{nt}, \end{aligned}

where $I_{nt}$ denotes the identity matrix. From Lemma 7.2 in Müller and Stadtmüller [11], it follows that, up to asymptotically negligible terms,

n d^2(β, \hat{β}) = A_{nt}^T A_{nt} .

Then,

\begin{aligned} A_{nt}^T A_{nt} = {} & \frac{1}{n} \sum_{t=1}^{p_1 p_2} \Big(\sum_{i=1}^{n} \frac{g'(ℓ_i) ρ_{it}}{σ(η_i)} \bar{ε}_i\Big)^2 \\ = {} & \frac{1}{n} \sum_{i=1}^{n} \bar{ε}_i^2 \sum_{t=1}^{p_1 p_2} \frac{g'(ℓ_i)^2}{σ^2(η_i)} ρ_{it}^2 \\ & + \frac{1}{n} \sum_{i_1 \neq i_2 = 1}^{n} \bar{ε}_{i_1} \bar{ε}_{i_2} \frac{g'(ℓ_{i_1})}{σ(η_{i_1})} \frac{g'(ℓ_{i_2})}{σ(η_{i_2})} \sum_{t=1}^{p_1 p_2} ρ_{i_1 t} ρ_{i_2 t} . \end{aligned}

We have

E[A_{nt}^T A_{nt}] = p_1 p_2 τ, \quad \mathrm{Var}[A_{nt}^T A_{nt}] = 2 p_1 p_2 τ .

Thus,

\frac{n d^2(β, \hat{β}) - p_1 p_2 τ}{\sqrt{2 p_1 p_2 τ}} \to N(0, 1) .

Next, prove that

\sqrt{n} (γ - \hat{γ}) \sim {(\frac{{A_{0}}^{T} A_{0}}{n})}^{- 1} \frac{{A_{0}}^{T} \bar{ε}}{\sqrt{n}} .

Let

Z_{0} = {(\frac{{A_{0}}^{T} A_{0}}{n})}^{- 1} \frac{{A_{0}}^{T} \bar{ε}}{\sqrt{n}} .

Then, its matrix form is

Z_{0} = \sqrt{n} {(\sum_{i = 1}^{n} \frac{g^{'} {(ℓ_{i})}^{2}}{σ^{2} (η_{i})} z_{i m}^{2})}^{- 1} \sum_{i = 1}^{n} \frac{g^{'} (ℓ_{i}) z_{i m}}{σ (η_{i})} {\bar{ε}}_{i} .

Therefore, we have

\sqrt{n} (γ_{m} - {\hat{γ}}_{m}) \sim \sqrt{n} {(\sum_{i = 1}^{n} \frac{g^{'} {(ℓ_{i})}^{2}}{σ^{2} (η_{i})} z_{i m}^{2})}^{- 1} \sum_{i = 1}^{n} \frac{g^{'} (ℓ_{i}) z_{i m}}{σ (η_{i})} {\bar{ε}}_{i} .

Since

E [\sqrt{n} (γ_{m} - {\hat{γ}}_{m})] = 0,

\mathrm{Var} [\sqrt{n} (γ_{m} - {\hat{γ}}_{m})] = {(E [\frac{g^{'} {(ℓ_{i})}^{2}}{σ^{2} (η_{i})} z_{i m}^{2}])}^{- 1} = Θ_{m}^{- 1},

it follows that

\sqrt{n Θ_{m}} (γ_{m} - {\hat{γ}}_{m}) \to N (0, 1) .

Therefore, Theorem 1 is proven. □

4. Simulation

In this simulation, we consider the case that has two functional predictors, three scalar predictors, an interaction term between the two functional predictors, and a binary response. In order to include the case in which the functional predictors do not have the same domain, we define the functional predictors

X_{i 1} (t_{1}), t_{1} \in [0, 1]

and

X_{i 2} (t_{2}), t_{2} \in [- 1, 1]

,

i = 1, \dots, n,

where n can be any positive integer. In what follows, n takes the values 50, 100, and 500, and, for each n, we run 100 simulations. First, we define two standard orthogonal bases

φ_{1 k} (t_{1}), t_{1} \in [0, 1]

and

φ_{2 l} (t_{2}), t_{2} \in [- 1, 1]

, satisfying

φ_{1 k} (t_{1}) = \sqrt{2} cos (2 k π t_{1}), k = 1, \dots, 4,

φ_{2 l} (t_{2}) = \sqrt{2} sin (2 l π t_{2}), l = 1, \dots, 5 .

Under the Gaussian assumption, we define the two randomly generated functional principal component scores

χ_{i 1 k}, χ_{i 2 l}

that satisfy

χ_{i 1 k} \sim N (0, λ_{1 k}), k = 1, \dots, 4,

χ_{i 2 l} \sim N (0, λ_{2 l}), l = 1, \dots, 5,

where

λ_{11} = 8, λ_{12} = 6, λ_{13} = 4, λ_{14} = 2,

λ_{21} = 4, λ_{22} = 2, λ_{23} = 1, λ_{24} = \frac{1}{2}, λ_{25} = \frac{1}{4}

. Notice that the first three functional principal components explain about 90% of the variation in each of the two predictors. So, we have

X_{i 1} (t_{1}) = \sum_{k = 1}^{4} χ_{i 1 k} φ_{1 k} (t_{1}),

X_{i 2} (t_{2}) = \sum_{l = 1}^{5} χ_{i 2 l} φ_{2 l} (t_{2}) .

Fifty sample curves of

X_{1} (t_{1})

and

X_{2} (t_{2})

are shown in Figure 1.

Figure 1. Functional predictors

X_{1} (t_{1})

and

X_{2} (t_{2})

.
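
The construction of the two functional predictors described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the grid sizes, the random seed, and all variable names are assumptions, and the second parameter of N (0, λ) is read as a variance.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

n = 50                                   # number of curves to draw
t1 = np.linspace(0.0, 1.0, 101)          # evaluation grid on [0, 1] (assumed size)
t2 = np.linspace(-1.0, 1.0, 201)         # evaluation grid on [-1, 1] (assumed size)

lam1 = np.array([8.0, 6.0, 4.0, 2.0])         # variances of the scores of X_1
lam2 = np.array([4.0, 2.0, 1.0, 0.5, 0.25])   # variances of the scores of X_2

k = np.arange(1, 5)[:, None]             # k = 1, ..., 4 as a column
l = np.arange(1, 6)[:, None]             # l = 1, ..., 5 as a column
phi1 = np.sqrt(2.0) * np.cos(2.0 * k * np.pi * t1)   # basis for X_1, shape (4, 101)
phi2 = np.sqrt(2.0) * np.sin(2.0 * l * np.pi * t2)   # basis for X_2, shape (5, 201)

# Scores chi ~ N(0, lambda): scale standard normal draws by sqrt(lambda).
chi1 = rng.standard_normal((n, 4)) * np.sqrt(lam1)
chi2 = rng.standard_normal((n, 5)) * np.sqrt(lam2)

# Each row is one curve: X_i(t) = sum_k chi_ik * phi_k(t).
X1 = chi1 @ phi1    # shape (n, 101)
X2 = chi2 @ phi2    # shape (n, 201)
```

Each row of `X1` and `X2` can then be plotted against `t1` and `t2`, respectively, to obtain a picture of the kind shown in Figure 1.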

For scalar predictors, we assume

Z_{1} \sim N (0, 2),

Z_{2} \sim N (0, 5)

, and

Z_{3} \sim N (0, 6) .

We assume that the theoretical values of the regression coefficients are

γ = {(4, 6, 8)}^{T},

β_{1} (t_{1}) = \sum_{k = 1}^{4} b_{1 k} φ_{1 k} (t_{1}),

β_{2} (t_{2}) = \sum_{l = 1}^{5} b_{2 l} φ_{2 l} (t_{2}),

where

b_{1 k} = \frac{k}{5}

,

b_{2 l} = \frac{l^{2}}{25} .

For the interaction term, its principal component score is denoted by

ρ_{i k l}

and satisfies

ρ_{i k l} = χ_{i 1 k} χ_{i 2 l},

ψ_{k l} (t_{1}, t_{2}) = φ_{1 k} (t_{1}) φ_{2 l} (t_{2}),

β (t_{1}, t_{2}) = \sum_{k = 1}^{4} \sum_{l = 1}^{5} u_{k l} φ_{1 k} (t_{1}) φ_{2 l} (t_{2}),

where

u_{k l} = {(\frac{k}{10})}^{2} .

The corresponding response variable is generated by

\begin{matrix} p (X_{i}, Z_{i}) = & g (\int_{T_{1}} X_{i 1} (t_{1}) β_{1} (t_{1}) d t_{1} + \int_{T_{2}} X_{i 2} (t_{2}) β_{2} (t_{2}) d t_{2} \\ + {\int \int}_{T_{1} \times T_{2}} X_{i 1} (t_{1}) X_{i 2} (t_{2}) β (t_{1}, t_{2}) d t_{1} d t_{2} + Z_{i}^{T} γ), \end{matrix}

where the link function

g (x) = \frac{exp (x)}{1 + exp (x)}

and

Y (X, Z) \sim B e r n o u l l i (p (X, Z))

is drawn as a pseudo-random number.
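
The data-generating mechanism above can be sketched numerically as follows. This is a hedged illustration, not the authors' implementation: the grids, the seed, the helper `trap_weights`, and the reading of N (0, 2), N (0, 5), N (0, 6) as variances are all assumptions. The integrals are evaluated with the trapezoidal rule on a grid, so no assumption about the normalization of the bases on their domains is needed.

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed
n = 100
t1 = np.linspace(0.0, 1.0, 101)
t2 = np.linspace(-1.0, 1.0, 201)

def trap_weights(x):
    """Trapezoidal quadrature weights for the grid x (hypothetical helper)."""
    w = np.zeros_like(x)
    dx = np.diff(x)
    w[:-1] += dx / 2.0
    w[1:] += dx / 2.0
    return w

w1, w2 = trap_weights(t1), trap_weights(t2)
k = np.arange(1, 5)
l = np.arange(1, 6)
phi1 = np.sqrt(2.0) * np.cos(2.0 * np.pi * k[:, None] * t1)   # (4, len(t1))
phi2 = np.sqrt(2.0) * np.sin(2.0 * np.pi * l[:, None] * t2)   # (5, len(t2))

chi1 = rng.standard_normal((n, 4)) * np.sqrt([8.0, 6.0, 4.0, 2.0])
chi2 = rng.standard_normal((n, 5)) * np.sqrt([4.0, 2.0, 1.0, 0.5, 0.25])
X1, X2 = chi1 @ phi1, chi2 @ phi2

b1 = k / 5.0                                      # b_{1k} = k / 5
b2 = l ** 2 / 25.0                                # b_{2l} = l^2 / 25
u = np.tile(((k / 10.0) ** 2)[:, None], (1, 5))   # u_{kl} = (k / 10)^2, shape (4, 5)
beta1 = b1 @ phi1                                 # beta_1 on the t1 grid
beta2 = b2 @ phi2                                 # beta_2 on the t2 grid
beta12 = phi1.T @ u @ phi2                        # beta(t1, t2), shape (len(t1), len(t2))

# Scalar predictors; the second argument of N(0, .) is treated as a variance.
Z = rng.standard_normal((n, 3)) * np.sqrt([2.0, 5.0, 6.0])
gamma = np.array([4.0, 6.0, 8.0])

main1 = (X1 * beta1) @ w1                         # integral of X_1 * beta_1
main2 = (X2 * beta2) @ w2                         # integral of X_2 * beta_2
# Double integral: integrate over t2 first, then over t1.
inter = np.sum((X1 * w1) * ((X2 * w2) @ beta12.T), axis=1)

eta = main1 + main2 + inter + Z @ gamma           # linear predictor
p = 1.0 / (1.0 + np.exp(-eta))                    # g(x) = exp(x) / (1 + exp(x))
Y = rng.binomial(1, p)                            # Bernoulli responses
```

With these particular bases, the numerical integral over [0, 1] agrees with the sum of products of scores and coefficients, while the integrals over [-1, 1] carry an extra factor of 2 because the sine functions are integrated over two unit-length periods.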

The principal component analysis was performed for

n = 50, 100, 500

, and the results showed that the number of principal components of

X_{1}

needed to reach

90 %

cumulative contribution was 3 for each of the three sample sizes, and the number of principal components of

X_{2}

needed to reach

90 %

cumulative contribution was 2 for each of the three sample sizes.
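
The selection rule based on cumulative contribution can be sketched as below. The function name `n_components` and the small tolerance are assumptions made for illustration. The example applies the rule to the population eigenvalues of the simulation design; counts obtained from sample covariance estimates need not coincide with these.

```python
import numpy as np

def n_components(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative
    share of total variance reaches the threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    share = np.cumsum(lam) / lam.sum()
    # A small tolerance guards against round-off exactly at the threshold.
    return int(np.argmax(share >= threshold - 1e-12) + 1)

# Population eigenvalues from the simulation design.
n_x1 = n_components([8, 6, 4, 2])             # cumulative shares 0.4, 0.7, 0.9, 1.0
n_x2 = n_components([4, 2, 1, 0.5, 0.25])
print(n_x1, n_x2)
```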

Table 1 shows how the standardized prediction error (SPE) varies with the sample size; the results show that the prediction error decreases as the sample size increases. Here, SPE is defined by

\sum_{i} | {\hat{Y}}_{i} - Y_{i} | / \sum_{i} | Y_{i} |

.
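
As a small worked example of the SPE formula, the following sketch evaluates it on made-up numbers. The function name `spe` and the toy data are hypothetical, and whether the predicted values are fitted probabilities or predicted classes is not specified in the text; fitted probabilities are assumed here.

```python
import numpy as np

def spe(y_true, y_pred):
    """Standardized prediction error: sum |y_pred - y_true| / sum |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum(np.abs(y_pred - y_true)) / np.sum(np.abs(y_true)))

y = np.array([1, 0, 1, 1, 0])                    # toy binary responses
p_hat = np.array([0.9, 0.2, 0.8, 0.7, 0.1])      # toy fitted probabilities
print(spe(y, p_hat))
```

Note that the denominator is the number of ones for a binary response, so the measure is undefined when all responses are zero.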

Table 1. Standardized prediction error for different sample sizes.

Figure 2 shows

{\hat{β}}_{1} (t_{1})

,

{\hat{β}}_{2} (t_{2})

and the corresponding

95 %

confidence interval bands for different sample sizes, where the red curves are the theoretical values of

β_{1} (t_{1})

and

β_{2} (t_{2})

and the black curves are the corresponding estimates

{\hat{β}}_{1} (t_{1})

and

{\hat{β}}_{2} (t_{2}) .

From Figure 2, it can be seen that, as the sample size increases, the estimates become closer to the theoretical values. Figure 3 shows a 3D plot with

\hat{β} (t_{1}, t_{2})

in the middle panel and the

95 %

confidence intervals for

\hat{β} (t_{1}, t_{2})

in the left and right panels.

Figure 2. Estimated regression coefficient functions

{\hat{β}}_{1} (t_{1})

,

{\hat{β}}_{2} (t_{2})

(black curves) and their

95 %

confidence bands (grey area) for different sample sizes, where the red curves are the theoretical regression coefficient functions

β_{1} (t_{1})

,

β_{2} (t_{2})

.

Figure 3.

β (t_{1}, t_{2})

and

\hat{β} (t_{1}, t_{2})

are visualized in 3D.

Table 2 shows the estimated values of

\hat{γ}

and their corresponding standard deviations for different sample sizes. It can be seen that, as n increases, the standard deviation becomes smaller and the estimated value of

γ

becomes closer to the theoretical value, where the theoretical values of

γ

are 4, 6, and 8, respectively. Table 3 shows the standard deviation and root mean square error for

{\hat{β}}_{1},

{\hat{β}}_{2}

and

\hat{β} (t_{1}, t_{2})

for different sample sizes. Here, we use the coefficients of the basis expansion of the regression coefficient function to calculate the root mean square error. For example, the root mean square error of

{\hat{β}}_{1}

is

\sqrt{\sum_{k = 1}^{4} {({\hat{b}}_{1 k} - b_{1 k})}^{2}}

. The results show that, as n increases, both the standard deviation and the root mean square error become smaller, indicating that the estimation becomes more accurate as the sample size increases.
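
The error measure on the basis coefficients defined above can be computed as in the following sketch. The function name `coef_rmse` and the perturbed estimates are hypothetical; only the true coefficients b_{1 k} = k / 5 come from the simulation design.

```python
import numpy as np

def coef_rmse(b_hat, b_true):
    """Square root of the summed squared differences between estimated
    and true basis coefficients, following the formula in the text."""
    diff = np.asarray(b_hat, dtype=float) - np.asarray(b_true, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

b1_true = np.arange(1, 5) / 5.0                           # b_{1k} = k / 5
b1_hat = b1_true + np.array([0.01, -0.02, 0.00, 0.02])    # made-up estimates
print(coef_rmse(b1_hat, b1_true))
```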

Table 2. Estimates of the regression coefficients and their standard deviations.

Table 3. Standard deviation and root mean square error of the estimated values of the regression coefficient function.

5. Application

To investigate the influence of air quality, climate factors, medical and social indicators, and their interactions on cancer incidence using the proposed model, we collected data on average daily PM2.5 concentration, average daily humidity, per capita GDP, green coverage rate in built-up areas, the proportion of medical personnel (PMP), and the incidence of cancer in 49 cities in China from http://www.cnemc.cn/, http://www.stats.gov.cn/sj/ndsj/, and http://www.chinancpcn.org.cn/home.

There are two functional predictors (average daily PM2.5 concentration and average daily humidity from 1 January 2015 to 31 December 2020), three scalar predictors (per capita GDP, greenery coverage, and PMP in 2020), and the response is the cancer incidence in 2020. The ratio of the number of new cancer cases to the total number of people in China in 2020 is

0.3156 %

. The response is coded as a binary variable taking the values 0 and 1, indicating a low or high cancer incidence rate, respectively. When the cancer incidence of a city was less than

0.3156 %

, the city was considered to have a low cancer incidence rate, denoted by 0; otherwise, the cancer incidence was high, denoted by 1. Figure 4 shows average daily PM2.5 concentration and daily relative humidity in 21 cities selected from the 49 cities.
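
The coding rule for the response described above can be sketched as follows. The function name `label_incidence` and the example rates are hypothetical; only the threshold of 0.3156% is taken from the text.

```python
import numpy as np

THRESHOLD = 0.3156 / 100.0   # national ratio quoted in the text, as a fraction

def label_incidence(rates):
    """Return 0 for rates below the threshold (low incidence), 1 otherwise."""
    return (np.asarray(rates, dtype=float) >= THRESHOLD).astype(int)

rates = np.array([0.0025, 0.0040, 0.0031])   # made-up city incidence rates
print(label_incidence(rates))
```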

Figure 4. PM2.5 concentrations and average daily humidity in 21 selected cities in 2020.

We chose

g (x) = \frac{exp (x)}{1 + exp (x)}

as the link function. Principal component analysis was first applied to the functional predictors, and the numbers of functional principal components for PM2.5 concentration and relative humidity were then determined from the cumulative contribution, giving

p_{P M 2.5} = 7

,

p_{Humidity} = 14

in order to explain 75% of the variation.

The prediction accuracy is measured by generalized cross-validation (GCV), which gives a value of 0.0038.

The results of the regression coefficients for the scalar predictor variable

\hat{γ}

are shown in Table 4, where we can see that the per capita GDP is positively correlated with the incidence of cancer, i.e., the higher the per capita GDP, the higher the incidence of cancer in that city, which is consistent with the findings of Cao et al. [27]. A possible reason is that the promotion of cancer screening, early diagnosis, and treatment in the more economically developed regions has, to some extent, facilitated the detection of the disease. The greenery coverage is negatively correlated with the cancer incidence, i.e., the higher the greenery coverage, the lower the cancer incidence, which is also consistent with the findings of Wu et al. [26]. A high green coverage rate may imply better air quality, which in turn may reduce the risk of cancer. Additionally, a high green coverage rate may provide more outdoor recreational spaces, promoting physical activity and exercise, contributing to good physical health, and, thus, reducing the risk of cancer. The PMP is positively correlated with the incidence of cancer. Cancer incidence is age-related, and older people are more susceptible to cancer. The higher the PMP, the better the medical conditions and the longer the average life expectancy, and, therefore, the higher the cancer incidence.

Table 4. Estimates of regression coefficients and their levels of significance.

The regression coefficient functions

{\hat{β}}_{1} (t_{1})

and

{\hat{β}}_{2} (t_{2})

for the functional predictors are shown in Figure 5. From Figure 5, we can see that PM2.5 concentration is generally positively correlated with cancer incidence, i.e., the higher the PM2.5 concentration, the higher the cancer incidence. This result is consistent with Qin et al. [25]. Humidity shows a more significant positive correlation with cancer incidence, i.e., the higher the humidity, the higher the cancer incidence. Several mechanisms may contribute to this. In high-humidity environments, there may be a higher presence of mold and fungi, and the spores and harmful substances released by these microorganisms may have negative effects on human health, increasing the risk of cancer. In addition, pollutants in the air are more likely to adhere to suspended particles in high-humidity environments, making them more easily inhalable by humans. These pollutants include PM2.5, organic compounds, and heavy metals, which are believed to be associated with the occurrence of cancer. Finally, high humidity increases the survival time of bacteria and viruses in the air, increasing the chances of infection. Certain viruses, such as hepatitis B virus and human papillomavirus (HPV), are believed to be associated with the occurrence of cancer.

Figure 5. Regression coefficient functions

{\hat{β}}_{1} (t_{1})

,

{\hat{β}}_{2} (t_{2})

and their

95 %

confidence bands.

The interaction surface estimate

\hat{β} (t_{1}, t_{2})

(middle), together with the surfaces at ± two estimated standard errors (left and right), is given in Figure 6. Figure 7 shows the contour map of

\hat{β} (t_{1}, t_{2})

, from which it can be seen that

\hat{β} (t_{1}, t_{2})

decreases and then increases with

t_{1}

when

t_{2} \in [0, 1100]

and increases and then decreases with

t_{1}

when

t_{2} \in [1100, 2192]

. Under higher humidity, PM2.5 particles may be more prone to settling, which reduces the suspended harmful particles in the air and potentially lowers the incidence of cancer. Conversely, under lower humidity, PM2.5 may be more likely to remain suspended in the air, increasing respiratory exposure and thereby raising the incidence of cancer. Additionally, PM2.5 concentration and humidity may not fluctuate synchronously over time. By introducing interaction terms, the model can capture these temporal complexities, making the estimation results more in line with real-world conditions.

Figure 6. Visualization of

\hat{β} (t_{1}, t_{2})

in 3D.

Figure 7. Contour map of

\hat{β} (t_{1}, t_{2})

.

To verify the necessity of considering the interaction term, i.e., to demonstrate the effectiveness of our proposed method, we compare mod1 proposed in this paper with mod2, which does not include the interaction term, i.e.,

\begin{matrix} \mathrm{mod1} : Y_{i} = & g (α + \int_{T_{1}} X_{i 1} (t_{1}) β_{1} (t_{1}) d t_{1} + \int_{T_{2}} X_{i 2} (t_{2}) β_{2} (t_{2}) d t_{2} \\ & + {\int \int}_{T_{1} \times T_{2}} X_{i 1} (t_{1}) X_{i 2} (t_{2}) β (t_{1}, t_{2}) d t_{1} d t_{2} + Z_{i}^{T} γ) + ε_{i}, \end{matrix}

\mathrm{mod2} : Y_{i} = g (α + \int_{T_{1}} X_{i 1} (t_{1}) β_{1} (t_{1}) d t_{1} + \int_{T_{2}} X_{i 2} (t_{2}) β_{2} (t_{2}) d t_{2} + Z_{i}^{T} γ) + ε_{i} .

Model performance is evaluated using the AIC (Akaike information criterion), residual, R-squared, RMSE (root mean square error), and MAE (mean absolute error). Smaller values of AIC, residual, RMSE, and MAE indicate a better fit and better generalization ability. R-squared takes a value between 0 and 1, and, the larger the value, the better the fit. According to Table 5, the AIC, residual, RMSE, and MAE values of mod1 are smaller than those of mod2, and its R-squared is much closer to 1, which indicates that mod1 performs better. Thus, including the interaction term between PM2.5 concentration and relative humidity improves the fit of the model.
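
For readers who wish to reproduce comparisons of this kind, the following sketch computes several of the criteria from fitted probabilities. It is an illustration under stated assumptions, not the authors' procedure: the function name `fit_metrics` is hypothetical, the AIC uses the Bernoulli log-likelihood with a user-supplied parameter count, R-squared is computed as one minus the ratio of residual to total sum of squares, and the "residual" criterion is omitted because its exact definition is not given in the text.

```python
import numpy as np

def fit_metrics(y, p_hat, n_params):
    """AIC, RMSE, MAE, and R-squared for a fitted binary-response model."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    p = np.clip(np.asarray(p_hat, dtype=float), 1e-12, 1.0 - 1e-12)
    resid = y - p
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return {
        "AIC": float(2.0 * n_params - 2.0 * loglik),
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "R2": float(1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)),
    }

# Toy example with made-up responses and fitted probabilities.
m = fit_metrics([1, 0, 1, 0], [0.8, 0.2, 0.8, 0.2], n_params=2)
print(m)
```

Two candidate models would be compared by calling the function once per model with that model's fitted probabilities and parameter count.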

Table 5. Results of model comparison.

6. Discussion

This paper proposes a generalized partially functional linear model with interaction terms. We first use principal component analysis to reduce the dimensionality of the functional data, followed by maximum likelihood estimation to obtain the estimates of the unknown parameters, then prove the asymptotic property of the estimators, and finally perform data simulations and apply our model to a real data example.

As the incidence and mortality of cancer in China are increasing year by year, it is necessary to study the influencing factors and formulate corresponding measures. We investigated the effects of PM2.5 concentration, average daily humidity, per capita GDP, the greenery coverage of built-up areas, and PMP on cancer incidence in 49 cities in China. The results show that PM2.5 concentration and relative humidity are generally positively correlated with cancer incidence, that greenery coverage in built-up areas is negatively correlated with it, and that per capita GDP and the proportion of medical personnel are positively correlated with it. The higher the economic level and the more developed the medical conditions, the longer the average life expectancy of people and, therefore, the higher the cancer incidence. Comparing this model with the model without the interaction term shows that considering the interaction term leads to more accurate and meaningful predictions.

Our research lays a foundation for further study of the generalized partially functional linear model with interaction terms and with an unknown link function or variance function.

Author Contributions

W.X.: methodology, validation, writing—review, supervision, funding acquisition. K.M.: methodology, software, data curation, writing—original draft. H.L.: writing—review, supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Yujie Talent Project of North China University of Technology (Grant No. 107051360024XN153).

Data Availability Statement

The original data supporting the results of this study can be obtained from the National Meteorological Science Data Sharing Service Platform, the National Environmental Monitoring Station, and local statistical bulletins.

Acknowledgments

The authors would like to thank the referees and the editor for their useful suggestions, which helped us to improve the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis, 2nd ed.; Springer: New York, NY, USA, 2005.
2. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: New York, NY, USA, 2012.
3. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Springer: New York, NY, USA, 2015.
4. Cardot, H.; Ferraty, F.; Sarda, P. Functional linear model. Stat. Probab. Lett. 1999, 45, 11–22.
5. Cai, T.T.; Hall, P. Prediction in functional linear regression. Ann. Stat. 2006, 34, 2159–2179.
6. Cardot, H.; Crambes, C.; Kneip, A.; Sarda, P. Smoothing Splines Estimators in Functional Linear Regression with Errors in Variables. Comput. Stat. Data Anal. 2007, 51, 4832–4848.
7. Delaigle, A.; Hall, P. Methodology and theory for partial least squares applied to functional data. Ann. Stat. 2012, 40, 322–352.
8. Cai, T.; Yuan, M. Minimax and adaptive prediction for functional linear regression. J. Am. Stat. Assoc. 2012, 107, 1201–1216.
9. Nelder, J.A.; Wedderburn, R.W.M. Generalized Linear Models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
10. James, G.M. Generalized linear models with functional predictors. J. R. Stat. Soc. Ser. B 2002, 64, 411–432.
11. Müller, H.G.; Stadtmüller, U. Generalized functional linear models. Ann. Stat. 2005, 33, 774–805.
12. Goldsmith, J.; Bobb, J.; Crainiceanu, C.M.; Caffo, B.; Reich, D. Penalized Functional Regression. J. Comput. Graph. Stat. 2011, 20, 830–851.
13. Xiao, W.W.; Wang, Y.X.; Liu, H.Y. Generalized partially functional linear model. Sci. Rep. 2021, 11, 23428.
14. Kong, D.; Xue, K.; Yao, F.; Zhang, H.H. Partially functional linear regression in high dimensions. Biometrika 2016, 103, 147–159.
15. Yao, F.; Sue, C.S.; Wang, F. Regularized partially functional quantile regression. J. Multivar. Anal. 2017, 156, 39–56.
16. Ma, H.; Li, T.; Zhu, H.; Zhu, Z. Quantile regression for functional partially linear model in ultra-high dimensions. Comput. Stat. Data Anal. 2019, 129, 135–147.
17. Xu, W.; Ding, H.; Zhang, R.; Liang, H. Estimation and inference in partially functional linear regression with multiple functional covariates. J. Stat. Plan. Inference 2020, 209, 44–61.
18. Usset, J.; Staicu, A.M.; Maity, A. Interaction models for functional regression. Comput. Stat. Data Anal. 2016, 94, 317–329.
19. Luo, R.; Qi, X. Interaction Model and Model Selection for Function-on-Function Regression. J. Comput. Graph. Stat. 2019, 28, 309–322.
20. Yang, W.H.; Wikle, C.K.; Holan, S.H.; Wildhaber, M.L. Ecological Prediction with Nonlinear Multivariate Time-Frequency Functional Data Models. J. Agric. Biol. Environ. Stat. 2013, 18, 450–474.
21. Matsui, H. Quadratic regression for functional response models. Econom. Stat. 2020, 13, 125–136.
22. Sun, Y.; Wang, Q. Function-on-function quadratic regression models. Comput. Stat. Data Anal. 2020, 142, 106814.
23. Fuchs, K.; Scheipl, F.; Greven, S. Penalized scalar-on-functions regression with interaction term. Comput. Stat. Data Anal. 2015, 81, 38–51.
24. Qiu, H.; Cao, S.; Xu, R. Cancer incidence, mortality, and burden in China: A time-trend analysis and comparison with the United States and United Kingdom based on the global epidemiological data released in 2020. Cancer Commun. 2021, 41, 1037–1048.
25. Qin, X.; Wan, F.; Zhang, H.; Dai, B.; Shi, G.; Zhu, Y.; Ye, D. Relationship between air pollution PM2.5 concentration and cancer. In Proceedings of the 8th Chinese Oncology Academic Conference and the 13th Cross-Strait Oncology Academic Conference, Jinan, China, 12–13 September 2014; p. 76.
26. Wu, X.; Feng, Y.; Chang, K.; Jia, X.; Xue, F. Analysis of the causal relationship between green coverage and the incidence of cancer. J. Shandong Univ. (Health Sci.) 2022, 60, 115–119.
27. Cao, W.; Li, F.; Liang, Y.; Yu, D. Analysis of the relationship between the level of economic development and cancer incidence and mortality in selected regions of China. Chin. J. Dis. Control Prev. 2023, 27, 209–215.
28. Xu, J.; Kuang, H.Y.; Wang, G.Q.; Chen, M.; Lu, L. Analysis of the relationship between PM2.5 and air relative humidity. Agric. Technol. 2017, 37, 148–149.
29. Yang, Z.; Wang, Y.K.; Xu, X.H.; Yang, J.; Ou, C.Q. Quantifying and characterizing the impacts of PM2.5 and humidity on atmospheric visibility in 182 Chinese cities: A nationwide time-series study. J. Clean. Prod. 2022, 368, 133182.

Table 1. Standardized prediction error for different sample sizes.

n	SPE
50	0.0156
100	0.0130
500	0.0106

Table 2. Estimates of the regression coefficients and their standard deviations.

n	${\hat{γ}}_{1}$	${\hat{γ}}_{2}$	${\hat{γ}}_{3}$
50	3.912 (0.193)	5.902 (0.068)	8.053 (0.045)
100	4.013 (0.062)	5.958 (0.026)	8.015 (0.018)
500	3.998 (0.022)	5.993 (0.006)	7.996 (0.007)

Table 3. Standard deviation and root mean square error of the estimated values of the regression coefficient function.

	n	Sd	RMSE
${\hat{β}}_{1}$	50	0.034	0.026
	100	0.018	0.006
	500	0.008	0.001
${\hat{β}}_{2}$	50	0.325	0.204
	100	0.126	0.044
	500	0.043	0.004
$\hat{β} (t_{1}, t_{2})$	50	0.137	0.094
	100	0.054	0.024
	500	0.020	0.004

Table 4. Estimates of regression coefficients and their levels of significance.

	Estimate	Std. Error	t Value	Pr(>|t|)
GDP	1.165 × 10⁻⁶	2.632 × 10⁻⁷	4.424	6.45 × 10⁻⁵ ***
Greenery coverage	−2.062 × 10⁻¹	1.920 × 10⁻¹	−2.654	0.0378 *
PMP	3.676 × 10⁻⁴	1.642 × 10⁻⁴	2.239	0.0491 *

PMP—Proportion of medical personnel. * indicates that the result is significant at the 5% significance level. *** indicates that the result is significant at the 0.1% significance level.

Table 5. Results of model comparison.

	AIC	R-Squared	Residual	RMSE	MAE
mod1	8.281	0.9287	0.7816	0.1263	0.1036
mod2	35.592	0.6465	3.2158	0.2562	0.1989


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
