Proof of Theorem 2. We verify Conditions C1–C3 in [
37] to derive the rate of convergence. Define
as the Euclidean norm of a vector
u,
is the supremum norm of a function
h, and
. Moreover, let
P denote a probability measure. For convenience in the proof that follows, we define
, where
 is the set of
, and
 is the set of
g.
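For concreteness, the Euclidean and supremum norms above are the standard ones: writing u = (u_1, \dots, u_d)^\top for a d-dimensional vector (the dimension d and the argument x are generic notation introduced here),
\[
\|u\| = \Big(\sum_{i=1}^{d} u_i^{2}\Big)^{1/2},
\qquad
\|h\|_{\infty} = \sup_{x} |h(x)|.
\]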
q and
r are as defined in Condition 4. It is noteworthy that
 is identical to
except for the notation. Similarly,
is the corresponding sieve space containing
and
. First, as a direct consequence of Condition 5, Condition C1 holds. That is,
for any
. Second, we verify Condition C2 in [
37]. Based on Conditions 1–4, one can easily find that for every
,
in which
, and this implies that for any
,
. Thus, Condition C2 from [
37] holds when the symbol
 in their paper is set equal to one. Finally, we verify Condition C3 of [
37]. Define the class of functions
and let
denote the
-bracketing number with respect to the
norm of
. Then, we have
 by arguments similar to those in Lemma A3 of [
13], where
, and
is the dimensionality of
b. Since the covering number is always bounded by the bracketing number, we have
. Therefore, Condition C3 in [
37] is satisfied under
, and
 in their notation. Hence,
 of Theorem 1 in [
37] on page 584 can be taken as
. Since the term after the minus sign is close to zero when
, one can set a
 slightly larger than
 so that
 holds for large
n. Let
 replace
, keeping the same notation as
; then, the new constant
.
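As a point of reference, entropy bounds obtained from such sieve arguments typically take the following shape, where we write q_n for the sieve dimension and \mathcal{F}_n for the sieve class (generic notation introduced here, not the paper's exact constants or norm):
\[
\log N_{[\,]}\big(\varepsilon, \mathcal{F}_n, \|\cdot\|\big) \lesssim q_n \log(1/\varepsilon),
\qquad
N\big(\varepsilon, \mathcal{F}_n, \|\cdot\|\big) \le N_{[\,]}\big(2\varepsilon, \mathcal{F}_n, \|\cdot\|\big),
\]
the second relation being the precise sense in which a covering number is controlled by the corresponding bracketing number.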
Note that, by Theorem 1.6.2 in [
38], there are Bernstein polynomials
 such that
,
. Similarly, there also exists a function
satisfying
. Then, the sieve approximation error
in [
37] is
. Therefore, applying the Taylor expansion to
 around
 and then plugging in
, the Kullback–Leibler pseudodistance between
 and
 satisfies
The first equality holds because the first derivative of
 at
 equals zero. The penultimate inequality holds because all first- and second-order derivatives of the log-likelihood are bounded. Furthermore, since
, and
, the last inequality follows, so that
. Hence, by Theorem 1 in [
37], the convergence rate of
is
The proof of Theorem 2 is complete. □
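As a remark on the Taylor-expansion step used above, the Kullback–Leibler bound has the following generic shape (a sketch in generic notation, with \ell the log-likelihood, \theta_0 the true parameter, and \mathbb{E}_0 expectation under the truth; it is not the paper's exact display):
\[
\mathbb{E}_{0}\big\{\ell(\theta_{0}) - \ell(\theta)\big\}
= -\mathbb{E}_{0}\big\{\dot{\ell}(\theta_{0})[\theta - \theta_{0}]\big\}
- \tfrac{1}{2}\,\mathbb{E}_{0}\big\{\ddot{\ell}(\tilde{\theta})[\theta - \theta_{0}, \theta - \theta_{0}]\big\}
\lesssim \|\theta - \theta_{0}\|^{2},
\]
where the first term vanishes because the score has mean zero at \theta_{0} and the second is controlled by the boundedness of the second-order derivatives.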
Proof of Theorem 3. We sketch the proof of Theorem 3 in the following five steps.
Step 1. We first calculate the derivatives with respect to , such that , and so on; for convenience, we omit in the formulas of Step 1.
To obtain the score functions of
, let
denote an arbitrary parametric submodel of
, in which
satisfies the Fréchet derivative
. Similarly, we can also define a submodel of
g, denoted by
. Moreover, denote
 and
, where
and
. The score function along
is
with
,
and
. Analogously, we have the derivatives with respect to
g as
with
, and
.
The second-order derivatives of
have the form
Similarly, we can derive , and as, respectively, the derivatives of , and with respect to b.
and are, respectively, the derivatives of and with respect to , .
, and are, respectively, the derivatives of and with respect to g.
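In generic form, the construction of Step 1 computes directional derivatives along the submodels: for a one-dimensional submodel \{g_{s}\} with \partial g_{s}/\partial s\,|_{s=0} = h, the score operator in the direction h is (a sketch in generic notation, with h an arbitrary admissible direction)
\[
\dot{\ell}_{g}[h] = \frac{\partial}{\partial s}\, \ell(b, g_{s})\Big|_{s=0},
\qquad
\dot{\ell}_{b} = \frac{\partial}{\partial b}\, \ell(b, g),
\]
and the second-order derivatives listed above are obtained by differentiating these expressions once more.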
Step 2. Consider the classes of functions
and
. We need to show these three function classes are Donsker for any
. We determine the bracketing number of
in order to demonstrate that it is Donsker. In accordance with [
37], we have
for
. This yields a finite bracketing integral, and hence, by Theorem 2.8.4 of [
36], the class
 is Donsker. Similar arguments show that
and
are also Donsker.
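The sufficient condition invoked here is the bracketing central limit theorem: a finite bracketing integral implies the Donsker property. In its standard form (a sketch; the exact statement in [36] may differ in details),
\[
J_{[\,]}\big(1, \mathcal{F}, L_{2}(P)\big)
= \int_{0}^{1} \sqrt{\log N_{[\,]}\big(\varepsilon, \mathcal{F}, L_{2}(P)\big)}\, d\varepsilon < \infty
\;\Longrightarrow\;
\mathcal{F} \ \text{is } P\text{-Donsker}.
\]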
Step 3. Following arguments similar to those in Lemma 2 of [
19] and the properties of the score statistic, there exist
and
satisfying
Let
 denote the estimators of the sieve log-likelihood and let
 be the projection of
onto
,
. We get
Following the discussion in the proof of Theorem
of [
34], we can show that term (I) is equal to
. In addition, (II) is also equal to
based on (
A3). We obtain (III) as
due to
 being Donsker. As for the fourth term (IV), by Theorem 2 and the first-order linear expansion of
around
, one finds that (IV) is
 as well. Summing the four terms, we have
. Likewise, the same property holds for
. Hence, we have
Step 4. Combining (
A3) and (
A5), we can easily show that
Furthermore, following the arguments in the proof of Theorem 3.2 in [
13], there exists a neighborhood of
as
, where
. Then, applying the Taylor expansion for
yields
where
. Likewise, the corresponding properties of
 and
 are easy to obtain. Note that the derivatives of the score statistics are bounded. Applying Taylor series expansions about
 to (
A6) and combining them with Equations (
A7), we have
Taking the first equality in (
A8) and subtracting the second and third equalities, we have
Step 5. Define
and
; then, we have
where
. Next, we need to verify
Q is nonsingular. If
Q is a nonsingular matrix, then we can conclude
from
. Moreover, it suffices to show that if
, then
. Thus, we have
where
and
is the likelihood function. Under our Condition 3, (
A10) is equal to zero only if
. As a consequence, we have verified
Q is nonsingular.
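In generic form, this type of nonsingularity argument runs as follows (a sketch under the assumption, which is our reading of the step above, that Q is an information-type matrix of the form \mathbb{E}\{\ell^{*}\ell^{*\top}\}): for any vector a,
\[
a^{\top} Q a = \mathbb{E}\big\{(a^{\top}\ell^{*})^{2}\big\} = 0
\;\Longrightarrow\;
a^{\top}\ell^{*} = 0 \ \text{a.s.}
\;\Longrightarrow\;
a = 0,
\]
where the last implication is what Condition 3 delivers, so that Q is positive definite and, in particular, nonsingular.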
Substituting
Q into (
A9), we get
Since , we obtain . Thus, , with and being the efficient score function of . This completes the proof of Theorem 3. □
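In generic notation, the conclusion of Step 5 has the standard semiparametric form (a sketch under the assumption that Q equals the efficient information matrix \mathbb{E}\{\ell^{*}\ell^{*\top}\} and with X_{i} denoting a generic observation; the paper's exact display is not reproduced here):
\[
\sqrt{n}\,(\hat{b}_{n} - b_{0})
= Q^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \ell^{*}(X_{i}) + o_{p}(1)
\;\xrightarrow{d}\;
N\big(0,\, Q^{-1}\big),
\]
so that the estimator of b is asymptotically normal with variance equal to the inverse efficient information, i.e., it is semiparametrically efficient.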