Article

Approximate Methods for Maximum Likelihood Estimation of Multivariate Nonlinear Mixed-Effects Models

Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung 40724, Taiwan
Entropy 2015, 17(8), 5353-5381; https://doi.org/10.3390/e17085353
Submission received: 21 April 2015 / Revised: 17 July 2015 / Accepted: 21 July 2015 / Published: 29 July 2015
(This article belongs to the Special Issue Inductive Statistical Methods)

Abstract
Multivariate nonlinear mixed-effects models (MNLMM) have received increasing attention owing to their flexibility for analyzing multi-outcome longitudinal data that follow possibly nonlinear profiles. This paper presents and compares five iterative algorithms for maximum likelihood estimation of the MNLMM: the penalized nonlinear least squares coupled to the multivariate linear mixed-effects (PNLS-MLME) procedure, Laplacian approximation, the pseudo-data expectation conditional maximization (ECM) algorithm, the Monte Carlo EM algorithm and the importance sampling EM algorithm. When fitting the MNLMM, the observed log-likelihood function is difficult to evaluate exactly in closed form because it involves complicated multiple integrals. To address this issue, the corresponding approximations of the observed log-likelihood function under the five algorithms are presented. An expected information matrix of the parameters is also provided for calculating the standard errors of the model parameters. The computational performance of the five methods is compared through simulation and a real data example from an AIDS clinical study.

1. Introduction

Analysis of multi-outcome longitudinal data with various features has attracted considerable interest in clinical trials, biological psychology, environmental science and medical research, to name a few. The methodology of multivariate linear mixed-effects models (MLMM) [1] and multivariate nonlinear mixed-effects models (MNLMM) [2] has been developed for related work. A comprehensive study of the MLMM along with its applications can be found in [3,4,5,6,7], among others. Nonlinear models for repeated-measures data rest on more complicated mathematical derivations and heavier computational requirements than linear models, but they can offer flexibility in capturing a broader range of data patterns. Several approaches to carrying out maximum likelihood (ML) estimation of nonlinear mixed-effects models (NLMM) for single-outcome longitudinal data have been studied; see, for example, [8,9,10,11,12]. Bayesian inference in NLMM via Markov chain Monte Carlo (MCMC) procedures can be found, for instance, in [13,14,15]. Although the use of the NLMM, as well as its extensions in other families of distributions have been pretty well established in the literature, to the best of our knowledge, exploration of the inference on MNLMM is relatively rare so far. Analyzing each response variable of the data by fitting the NLMM separately might be inappropriate and fail to take account of the between-variable association, as well as its evolution.
For the general NLMM, the linearization method [8,16], which exploits a first-order Taylor expansion to approximate the nonlinear function by a linear pseudo-data model, is by far the most widely-used approach due to its numerical simplicity. Despite its popularity, [17] argued that the linearization method may produce substantial bias in parameter estimation when the number of observations per subject is small and the variability of the random effects is large at the same time. Although computationally much simpler, the Laplace approximation method [10] can also lead to considerably-biased parameter estimates, depending on the quality of the mode. As an alternative to the pseudo-data and Laplace approximation approaches, the integral approximation methods that use Monte Carlo integration [18] or importance sampling [19] to approximate the observed likelihood may provide more accurate estimates than the linearization method. However, the numerical integration methods are generally inefficient to implement and become computationally prohibitive when the dimension of the random effects increases [20]. Over the past few decades, several estimation algorithms for the NLMM have been developed and implemented in different software. For example, the linearization methods using the first-order Taylor expansion [21] or the first-order conditional estimation (FOCE) [8,16] are embedded in the R function nlme, while the Laplace approximation method is implemented in NONMEM [22] and the SAS macro NLINMIX [23]. The SAS procedure NLMIXED, which incorporates adaptive Gaussian quadrature, has shown considerable improvement [24]. Another improved procedure, based on the stochastic approximation expectation maximization [25], was implemented in MONOLIX [26], NONMEM [27] and the R package saemix [28].
Multivariate nonlinear mixed-effects models can be fitted using ad hoc manipulation by expanding the design matrix with extra columns of dummy covariates flagging each element of the original multivariate responses.
Consider the multiple repeated measures $\{(Y_i, X_i),\ i = 1, \ldots, N\}$, where $Y_i$ is an $s_i \times r$ response matrix composed of $r$ response vectors $y_{ij} = (y_{ij,1}, \ldots, y_{ij,s_i})^T$, $j = 1, \ldots, r$, and $X_i$ is the covariate matrix for the $i$-th subject. Let $E_i = [e_{i1} : e_{i2} : \cdots : e_{ir}]$ be the $s_i \times r$ matrix of within-subject errors associated with $Y_i$, where $e_{ij} = (e_{ij,1}, \ldots, e_{ij,s_i})^T$. Let $y_i = \mathrm{vec}(Y_i)$ and $e_i = \mathrm{vec}(E_i)$ denote the stacked $s_i r \times 1$ vectors of all responses and within-subject errors, respectively.
In general, the MNLMM takes the form of:
$$y_i = \mu_i(\eta_i, X_i) + e_i, \quad i = 1, \ldots, N,$$
where $\mu_i = \mu_i(\eta_i, X_i)$ is a nonlinear differentiable function of a subject-specific parameter vector $\eta_i$ governing the within-profile behavior, and $e_i$ is a vector of normally-distributed error components. Moreover, the fixed effects $\beta$ and the random effects $b_i$ can be incorporated into the model by letting:
$$\eta_i = A_i \beta + B_i b_i,$$
where $A_i$ and $B_i$ are design matrices of size $s \times p$ and $s \times q$, respectively. We assume that $b_i$ follows a multivariate normal distribution with mean vector $0$ and $q \times q$ variance-covariance matrix $D$, denoted by $b_i \sim N_q(0, D)$, and independent of $e_i \sim N_{s_i r}(0, R_i)$. The joint distributions of $(b_i^T, e_i^T)^T$ for distinct subjects are independent. To reduce the number of parameters in $R_i$, we assume that the $k$-th row of $E_i$, say $e_{i \cdot k}$, follows $N_r(0, \Sigma)$, and the $j$-th column of $E_i$, say $e_{ij \cdot}$, follows $N_{s_i}(0, C_i)$, such that $R_i = \Sigma \otimes C_i$. This specification implies that within-subject errors for all responses measured at the same occasion have variance-covariance $\Sigma$. To capture the extra autocorrelation of a given response among irregularly-observed occasions, some parsimonious dependence structures can be imposed on $C_i$, such as compound symmetry, the $p$-order autoregressive model [29,30] and the damped exponential correlation [31]. For simplicity, we write $C_i = C_i(\phi)$, which depends on subject $i$ through its dimension $s_i$, with each entry being a function of a small set of parameters $\phi$ describing within-subject autocorrelation.
Let $\theta = (\beta, D, \Sigma, \phi)$ denote the entire set of model parameters. According to Model Equation (1) with Assumption Equation (2), the marginal density of $y_i$ is:
$$f(y_i \mid \theta) = \int \phi_{s_i r}(y_i \mid \mu_i, R_i)\, \phi_q(b_i \mid 0, D)\, d b_i,$$
where ϕ d ( · | μ , Ω ) denotes the probability density function (pdf) of a d-variate normal distribution with mean vector μ and variance-covariance matrix Ω. Typically, this integral cannot yield a closed-form expression when the vector-valued function μ i = μ i ( η i , X i ) is nonlinear in random effects b i . Thus, the log-likelihood function of θ for y = { y 1 , ... , y N } is given by:
$$\ell(\theta \mid y) = \sum_{i=1}^{N} \log \left\{ \int (2\pi)^{-(s_i r + q)/2} |\Sigma|^{-s_i/2} |C_i|^{-r/2} |D|^{-1/2} \exp\left[ -\tfrac{1}{2} \left( (y_i - \mu_i)^T R_i^{-1} (y_i - \mu_i) + b_i^T D^{-1} b_i \right) \right] d b_i \right\}.$$
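Because the integral above has no closed form, every method in this paper amounts to some approximation of it. As a point of reference, a plain Monte Carlo version of one subject's contribution can be sketched in a few lines of Python (numpy); the function names and the log-mean-exp stabilization are illustrative choices, not the authors' implementation:

```python
import numpy as np

def mc_marginal_loglik(y, mu_fn, Sigma, C, D, M=5000, seed=0):
    """Plain Monte Carlo approximation of one subject's marginal
    log-likelihood: average the conditional normal density
    phi(y | mu(b), R) over draws b ~ N_q(0, D), with R = Sigma (x) C.
    `mu_fn` maps a random-effects vector b to the stacked mean vector;
    all names here are illustrative."""
    rng = np.random.default_rng(seed)
    R = np.kron(Sigma, C)                    # R_i = Sigma (x) C_i
    Rinv = np.linalg.inv(R)
    _, logdetR = np.linalg.slogdet(R)
    n, q = R.shape[0], D.shape[0]
    draws = rng.multivariate_normal(np.zeros(q), D, size=M)
    resid = np.array([y - mu_fn(b) for b in draws])          # M x n residuals
    quad = np.einsum("mi,ij,mj->m", resid, Rinv, resid)
    logphi = -0.5 * (n * np.log(2 * np.pi) + logdetR + quad)
    m = logphi.max()                         # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(logphi - m)))
```

For a mean function that is linear in $b_i$ the marginal is exactly normal, which gives a convenient way to check the approximation.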
The purpose of this article is to consider five different methods for carrying out ML estimation of the MNLMM described in Equation (1) along with Equation (2) and for approximating the observed log-likelihood Function Equation (4). The methods include the penalized nonlinear least squares coupled to multivariate linear mixed effects (PNLS-MLME) approximation [8], Laplacian approximation [32], a pseudo-data version of the expectation conditional maximization (ECM) algorithm [33], the Monte Carlo EM (MCEM) algorithm [34] and the importance sampling EM (ISEM) algorithm [35]. The approximation to the observed log-likelihood is based on the standard Taylor expansion and is easy to calculate within the algorithms. A simple way of computing standard errors of parameters via the information-based method is provided.
The article is organized as follows. In Section 2, we describe the five computational procedures for ML estimation of the MNLMM together with the calculation of standard errors of parameters. In Section 3, the proposed methodology is illustrated with the analysis of HIV-AIDS data. Section 4 presents a comparison of the five approximation methods through simulation studies. We summarize and discuss implications in Section 5. The technical derivations are collected in the Appendix.

2. Five Approximate ML Procedures

From Model Equation (1), the j-th column (outcome) of Y i , say y i j = ( y i j , 1 , ... , y i j , s i ) T , can be formulated as:
y i j = μ i j ( η i , x i j ) + e i j ,
where μ i j ( η i , x i j ) = ( μ j ( η i , x i j , 1 ) , ... , μ j ( η i , x i j , s i ) ) T and e i j = ( e i j , 1 , ... , e i j , s i ) T . Analogously, the model for the k-th row (occasion) can be expressed as:
y i , k = μ i k ( η i , x i k ) + e i , k ,
where y i , k = ( y i 1 , k , ... , y i r , k ) T , μ i k ( η i , x i k ) = ( μ 1 ( η i , x i 1 , k ) , ... , μ r ( η i , x i r , k ) ) T and e i , k = ( e i 1 , k , ... , e i r , k ) T . We present five algorithms for employing ML estimation of Model Equation (1). The approximation to the observed log-likelihood Function Equation (4) and the calculation of standard errors of parameters are discussed, as well.

2.1. PNLS-MLME Procedure

Following the linear mixed-effects (LME) approximation method suggested by [8], the first procedure consists of two steps: a penalized nonlinear least squares (PNLS) step and a multivariate LME (MLME) step. The basic idea behind this procedure is that we estimate the unobservable random effects b i via the PNLS step and then update the ML estimates of parameters θ based on the formulation of MLMM for the pseudo-data. Specifically, the proposed PNLS-MLME procedure is sketched below.
In the PNLS step, first define:
$$g(y_i, b_i, \theta) = (y_i - \mu_i(\beta, b_i))^T (\Sigma \otimes C_i)^{-1} (y_i - \mu_i(\beta, b_i)) + b_i^T D^{-1} b_i,$$
where $\mu_i(\beta, b_i) = \mu_i(\eta_i, X_i)$, for $i = 1, 2, \ldots, N$, is a function of the fixed effects $\beta$ and random effects $b_i$. Fixing the current estimates of parameters $\hat{\theta}^{(h)} = (\hat{\beta}^{(h)}, \hat{D}^{(h)}, \hat{\Sigma}^{(h)}, \hat{\phi}^{(h)})$, the conditional modes of the random effects $b_i$ are obtained by minimizing a penalized nonlinear least-squares objective function:
$$\{\hat{b}_i^{(h)}\}_{i=1}^{N} = \arg\min_{b_1, \ldots, b_N} \sum_{i=1}^{N} g(y_i, b_i, \hat{\theta}^{(h)}).$$
The joint distributions ( b i T , e i T ) T for distinct subjects are independent, and thus, all y i are independent of each other. In practice, solving over b ^ i ( h ) for each subject can be implemented by minimizing g ( y i , b i , θ ^ ( h ) ) with respect to q-dimensional random effects of one subject at a time, rather than finding the solutions with respect to those of all subjects simultaneously.
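For a concrete sense of the PNLS step, the subject-by-subject minimization can be sketched as follows; `mu_fn`, its signature, and the use of the BFGS solver are illustrative assumptions rather than the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def pnls_step(y_list, mu_fn, beta, Sigma, C_list, D):
    """One PNLS step: since subjects are independent, the penalized
    objective g of Equation (5) is minimized over each subject's
    random effects b_i separately.  `mu_fn(beta, b, i)` maps (fixed
    effects, random effects, subject index) to the stacked mean
    vector; all names here are illustrative."""
    Dinv = np.linalg.inv(D)
    q = D.shape[0]
    b_hat = []
    for i, (y, C) in enumerate(zip(y_list, C_list)):
        Rinv = np.linalg.inv(np.kron(Sigma, C))
        def g(b, y=y, Rinv=Rinv, i=i):
            r = y - mu_fn(beta, b, i)
            return r @ Rinv @ r + b @ Dinv @ b   # penalized least squares
        b_hat.append(minimize(g, np.zeros(q), method="BFGS").x)
    return b_hat
```

When the mean is linear in $b_i$, this reduces to a ridge-type solution, which makes the sketch easy to verify.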
In the MLME step, which allows updating the parameter estimates, we utilize the first-order Taylor expansion of Model Equation (1) around the current estimates η ^ i ( h ) = A i β ^ ( h ) + B i b ^ i ( h ) , that is,
$$y_{ij,k} - \mu_j(\hat{\eta}_i^{(h)}, x_{ij,k}) + \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T \hat{\eta}_i^{(h)} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T \eta_i + e_{ij,k},$$
where $\dot{\mu}_j$, $j = 1, \ldots, r$, denotes the vector of first partial derivatives of $\mu_j$ with respect to $\eta_i$, and $\beta$ and $b_i$ are replaced by $\hat{\beta}^{(h)}$ and $\{\hat{b}_i^{(h)}\}_{i=1}^{N}$, respectively. Denote the pseudo-data by:
$$\tilde{y}_{ij,k} = y_{ij,k} - \mu_j(\hat{\eta}_i^{(h)}, x_{ij,k}) + \tilde{x}_{ijk} \hat{\beta}^{(h)} + \tilde{z}_{ijk} \hat{b}_i^{(h)},$$
where $\tilde{x}_{ijk} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T A_i$ and $\tilde{z}_{ijk} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T B_i$. Consequently, Model Equation (1) can be rewritten as:
$$\tilde{y}_{ij,k} = \tilde{x}_{ijk} \beta + \tilde{z}_{ijk} b_i + e_{ij,k}.$$
The model for the super vector of the pseudo-data for the i-th subject is:
$$\tilde{y}_i = \tilde{X}_i \beta + \tilde{Z}_i b_i + e_i,$$
where $\tilde{y}_i$ is an $s_i r \times 1$ vector composed of $r$ pseudo-response vectors $\tilde{y}_{ij} = (\tilde{y}_{ij,1}, \ldots, \tilde{y}_{ij,s_i})^T$, $\tilde{X}_i$ is an $s_i r \times p$ matrix whose rows are the vectors $\tilde{x}_{ijk}$, and $\tilde{Z}_i$ is an $s_i r \times q$ matrix whose rows are the vectors $\tilde{z}_{ijk}$. Obviously, Model Equation (8) for the pseudo-data has an LME representation, so the estimation procedure becomes much simpler.
Therefore, the log-likelihood function of θ according to Model Equation (8) can be approximated by:
$$\ell_{\mathrm{PD}}(\theta \mid y) \approx -\frac{1}{2} \sum_{i=1}^{N} \left\{ s_i r \log(2\pi) + \log \left| \tilde{Z}_i D \tilde{Z}_i^T + \Sigma \otimes C_i \right| + (\tilde{y}_i - \tilde{X}_i \beta)^T \left( \tilde{Z}_i D \tilde{Z}_i^T + \Sigma \otimes C_i \right)^{-1} (\tilde{y}_i - \tilde{X}_i \beta) \right\}.$$
In the MLME step, we update β ^ ( h ) by a generalized least-squares approach, which yields:
$$\hat{\beta}^{(h+1)} = \left[ \sum_{i=1}^{N} \tilde{X}_i^T \left( \tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{\Sigma}^{(h)} \otimes \hat{C}_i^{(h)} \right)^{-1} \tilde{X}_i \right]^{-1} \sum_{i=1}^{N} \tilde{X}_i^T \left( \tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{\Sigma}^{(h)} \otimes \hat{C}_i^{(h)} \right)^{-1} \tilde{y}_i.$$
Denote the half-vectorization operator by $\mathrm{vech}(\cdot)$, which represents a column vector obtained by vectorizing only the lower triangular entries of a symmetric matrix. Given the current estimate $\hat{\beta}^{(h+1)}$, we update $\hat{\alpha}^{(h)} = (\mathrm{vech}(\hat{D}^{(h)}), \mathrm{vech}(\hat{\Sigma}^{(h)}), \hat{\phi}^{(h)})$ by the Newton–Raphson method:
$$\hat{\alpha}^{(h+1)} = \hat{\alpha}^{(h)} - \left[ \hat{H}_{\alpha\alpha}^{(h+1/2)} \right]^{-1} \hat{S}_{\alpha}^{(h+1/2)},$$
where $\hat{S}_{\alpha}^{(h+1/2)}$ and $\hat{H}_{\alpha\alpha}^{(h+1/2)}$ are the score vector $S_{\alpha}$ and Hessian matrix $H_{\alpha\alpha}$ evaluated at $\beta = \hat{\beta}^{(h+1)}$ and $\alpha = \hat{\alpha}^{(h)}$. Explicit expressions for the elements of $S_{\alpha}$ and $H_{\alpha\alpha}$ are given in the Appendix.
Iterations of Equations (6), (10) and (11) continue until either the maximum number of iterations or the user-specified convergence tolerance has been achieved.
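The GLS update of Equation (10) translates almost verbatim into matrix code. A minimal sketch, assuming the pseudo-responses and pseudo-design matrices have already been formed:

```python
import numpy as np

def gls_beta_update(y_list, X_list, Z_list, D, Sigma, C_list):
    """Generalized least-squares update of the fixed effects in the MLME
    step: beta = (sum_i X_i' V_i^-1 X_i)^-1 sum_i X_i' V_i^-1 y_i with
    marginal covariance V_i = Z_i D Z_i' + Sigma (x) C_i, written for
    the pseudo-data.  Illustrative sketch, not the authors' code."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for y, X, Z, C in zip(y_list, X_list, Z_list, C_list):
        Vinv = np.linalg.inv(Z @ D @ Z.T + np.kron(Sigma, C))
        A += X.T @ Vinv @ X
        c += X.T @ Vinv @ y
    return np.linalg.solve(A, c)
```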

2.2. Laplacian Procedure

From Function Equation (3) and Definition Equation (5), we have the joint density of $(y_i, b_i)$, denoted by $f(y_i, b_i \mid \theta) = \phi_{s_i r}(y_i \mid \mu_i, R_i)\, \phi_q(b_i \mid 0, D)$, and the marginal density of $y_i$, given by:
$$f(y_i \mid \theta) = (2\pi)^{-(s_i r + q)/2} |R_i|^{-1/2} |D|^{-1/2} \int \exp\left\{ -\tfrac{1}{2} g(y_i, b_i, \theta) \right\} d b_i.$$
Laplacian approximation [32,36] is an alternative technique to estimate the marginal densities or posterior predictive densities, which involve integrating out all non-target variables. We next discuss how to adopt the Laplacian approximation to evaluate Equation (12) and develop the corresponding estimation algorithm.
Set an initial guess of random effects b i to be:
$$\hat{b}_i = \hat{b}_i(y_i, \theta) = \arg\max_{b_i} f(y_i, b_i \mid \theta) = \arg\min_{b_i} g(y_i, b_i, \theta).$$
Consider the second-order Taylor expansion of g ( y i , b i , θ ) around b ^ i . It yields:
$$g(y_i, b_i, \theta) \approx g(y_i, \hat{b}_i, \theta) + \dot{g}(y_i, \hat{b}_i, \theta)^T (b_i - \hat{b}_i) + \frac{1}{2} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i) = g(y_i, \hat{b}_i, \theta) + \frac{1}{2} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i),$$
because g ˙ ( y i , b ^ i , θ ) = 0 , where the first two partial derivatives of g ( y i , b i , θ ) with respect to b i are:
$$\dot{g}(y_i, b_i, \theta) = -2 \left\{ \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \right]^T R_i^{-1} \left( y_i - \mu_i(\beta, b_i) \right) - D^{-1} b_i \right\},$$
and:
$$\ddot{g}(y_i, b_i, \theta) = -2 \left\{ \frac{\partial^2 \mu_i}{\partial b_i \partial b_i^T} \left[ R_i^{-1} \left( y_i - \mu_i \right) \right] - \left[ \frac{\partial \mu_i}{\partial b_i^T} \right]^T R_i^{-1} \frac{\partial \mu_i}{\partial b_i^T} - D^{-1} \right\},$$
respectively. Notice that the contribution of the term involving the second derivative of μ i in g ¨ ( y i , b i , θ ) is usually negligible compared to that involving the product of the first derivative of μ i at b i = b ^ i [37]. We hereby define:
$$\ddot{g}(y_i, \hat{b}_i, \theta) \approx G(y_i, \theta) = 2 \left\{ \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \Big|_{b_i = \hat{b}_i} \right]^T R_i^{-1} \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \Big|_{b_i = \hat{b}_i} \right] + D^{-1} \right\}.$$
Consequently, the Laplacian approximation to log-likelihood Equation (4) is:
$$\ell_{LA}(\theta \mid y) \approx \log \left\{ \prod_{i=1}^{N} (2\pi)^{-\frac{s_i r + q}{2}} |R_i|^{-\frac{1}{2}} |D|^{-\frac{1}{2}} \exp\left[ -\tfrac{1}{2} g(y_i, \hat{b}_i, \theta) \right] \int \exp\left[ -\tfrac{1}{4} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i) \right] d b_i \right\}$$
$$= -\frac{1}{2} \sum_{i=1}^{N} \left\{ s_i r \log(2\pi) + \log |R_i| + \log |D| + \log \left| \tfrac{1}{2} G(y_i, \theta) \right| + \left( y_i - \mu_i(\beta, \hat{b}_i) \right)^T R_i^{-1} \left( y_i - \mu_i(\beta, \hat{b}_i) \right) + \hat{b}_i^T D^{-1} \hat{b}_i \right\}.$$
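Putting the pieces of this subsection together, one subject's Laplacian contribution can be sketched as below; the central-difference Jacobian and BFGS mode search are illustrative substitutes for whatever derivatives an implementation has available:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_loglik_i(y, mu_fn, Sigma, C, D):
    """Laplace approximation to one subject's marginal log-likelihood:
    minimize g over b_i, build the Gauss-Newton curvature G of Equation
    (13) from a numerical Jacobian of mu (second derivatives dropped),
    then assemble -0.5 * [n log 2pi + log|R| + log|D| + log|G/2| +
    g(b_hat)].  A sketch; solver and derivative choices are illustrative."""
    R = np.kron(Sigma, C)
    Rinv = np.linalg.inv(R)
    Dinv = np.linalg.inv(D)
    q, n = D.shape[0], R.shape[0]
    g = lambda b: (y - mu_fn(b)) @ Rinv @ (y - mu_fn(b)) + b @ Dinv @ b
    b_hat = minimize(g, np.zeros(q), method="BFGS").x
    eps = 1e-6                               # central-difference Jacobian of mu
    J = np.column_stack([(mu_fn(b_hat + eps * e) - mu_fn(b_hat - eps * e)) / (2 * eps)
                         for e in np.eye(q)])
    G = 2.0 * (J.T @ Rinv @ J + Dinv)        # Gauss-Newton curvature
    _, ldR = np.linalg.slogdet(R)
    _, ldD = np.linalg.slogdet(D)
    _, ldG2 = np.linalg.slogdet(G / 2.0)
    return -0.5 * (n * np.log(2 * np.pi) + ldR + ldD + ldG2 + g(b_hat))
```

For a mean linear in $b_i$ the Laplace approximation is exact, which provides a direct correctness check.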
With regard to ML estimation of $\theta$, we can treat it as an optimization problem based on $\ell_{LA}(\theta \mid y)$. Subsequently, we estimate $D$ by taking the first partial derivative of Equation (15) with respect to $D^{-1}$ and setting it to zero, yielding:
$$\hat{D} = N^{-1} \sum_{i=1}^{N} \hat{b}_i \hat{b}_i^T.$$
By maximizing Equation (15), the estimates of $\beta$, $\Sigma$ and $\phi$ are mutually dependent, and thus, we perform an iterative algorithm that proceeds as follows. Given $\hat{D}$ and the current estimates $\hat{\beta}^{(h)}$ and $\hat{\phi}^{(h)}$, we update the diagonal elements of $\hat{\Sigma}^{(h)}$ by:
$$\hat{\sigma}_{jj}^{(h+1)} = \left( \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right) \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right)^T \right],$$
and the off-diagonal elements by:
$$\hat{\sigma}_{jl}^{(h+1)} = \left( 2 \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left( C_i(\hat{\phi}^{(h)})^{-1} \left[ \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right) \left( y_{il} - \hat{\mu}_{il}^{(h+1)} \right)^T + \left( y_{il} - \hat{\mu}_{il}^{(h+1)} \right) \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right)^T \right] \right),$$
for $j, l = 1, \ldots, r$, where $\hat{\mu}_{ij}^{(h+1)}$ is the $s_i \times 1$ subvector consisting of the $((j-1)s_i + 1)$-th to the $(j s_i)$-th entries of $\hat{\mu}_i^{(h+1)} = \mu_i(\hat{\beta}^{(h+1)}, \hat{b}_i)$. Unfortunately, equating the first partial derivatives of Equation (15) with respect to $\beta$ and $\phi$, respectively, to zero does not yield the updated estimators in closed form. Therefore, we use the nlminb routine [38] to perform a numerical search for updating $\hat{\beta}^{(h)}$ and $\hat{\phi}^{(h)}$ sequentially. Specifically,
$$\hat{\beta}^{(h+1)} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \mu_i(\beta, \hat{b}_i) \right)^T \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\hat{\phi}^{(h)}) \right)^{-1} \left( y_i - \mu_i(\beta, \hat{b}_i) \right),$$
and:
$$\hat{\phi}^{(h+1)} = \arg\min_{\phi} \sum_{i=1}^{N} \left[ \log \left| \tfrac{1}{2} G\left( y_i, \hat{\theta}_{(-\phi)}^{(h+1)} \right) \right| + r \log |C_i(\phi)| + \left( y_i - \hat{\mu}_i^{(h+1)} \right)^T \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right)^{-1} \left( y_i - \hat{\mu}_i^{(h+1)} \right) \right].$$

2.3. Pseudo-ECM Algorithm

According to the pseudo-data model specified in Equation (8), treating the random effects { b i } i = 1 N as latent data, we establish a complete-data framework of the model:
$$\tilde{y}_i \mid b_i \sim N_{s_i r}(\tilde{X}_i \beta + \tilde{Z}_i b_i, R_i), \quad b_i \sim N_q(0, D), \quad i = 1, \ldots, N.$$
Given the pseudo-complete data $\tilde{y} = \{\tilde{y}_i\}_{i=1}^{N}$ and $b = \{b_i\}_{i=1}^{N}$, the complete-data log-likelihood function of $\theta$ is:
$$\ell_{CP}(\theta \mid \tilde{y}, b) = \sum_{i=1}^{N} \log \left( \phi_{s_i r}(\tilde{y}_i \mid \tilde{X}_i \beta + \tilde{Z}_i b_i, R_i)\, \phi_q(b_i \mid 0, D) \right).$$
To carry out ML estimation for the MNLMM, we develop an ECM algorithm [33], which is a variant of EM [39], replacing its M steps by several computationally-simpler conditional maximization (CM) steps. It has several appealing features, such as stability of monotone convergence and simplicity of implementation. Hereafter, the procedure is referred to as the pseudo-ECM algorithm, because it is developed under the pseudo-data defined in Equation (7). The proposed implementation approach is outlined below.
E step: 
Evaluate the expected complete-data log-likelihood Function Equation (16) conditioning on the current estimates θ ^ ( h ) and the pseudo-responses y ˜ = y ˜ ( β ^ ( h ) , b ^ i ( h - 1 ) ) , which linearize the regression function around the previous estimates of mixed effects ( β ^ ( h ) , b ^ i ( h - 1 ) ) and should be updated at each iteration. This gives rise to the so-called Q-function:
$$Q(\theta \mid \hat{\theta}^{(h)}) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ \log |\Sigma \otimes C_i| + \log |D| + \mathrm{tr}\left[ (\Sigma \otimes C_i)^{-1} \hat{\Omega}_i^{(h)} \right] + \mathrm{tr}\left( D^{-1} \hat{\Psi}_i^{(h)} \right) \right\},$$
where:
$$\hat{\Psi}_i^{(h)} = E[b_i b_i^T \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{b}_i^{(h)} \tilde{b}_i^{(h)T} + \left( \hat{D}^{(h)-1} + \tilde{Z}_i^T \hat{R}_i^{(h)-1} \tilde{Z}_i \right)^{-1}, \qquad \hat{\Omega}_i^{(h)} = E[\tilde{e}_i \tilde{e}_i^T \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{e}_i^{(h)} \tilde{e}_i^{(h)T} + \tilde{Z}_i \left( \hat{D}^{(h)-1} + \tilde{Z}_i^T \hat{R}_i^{(h)-1} \tilde{Z}_i \right)^{-1} \tilde{Z}_i^T,$$
with $\hat{R}_i^{(h)} = \hat{\Sigma}^{(h)} \otimes C_i(\hat{\phi}^{(h)})$, $\tilde{b}_i^{(h)} = E[b_i \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \hat{D}^{(h)} \tilde{Z}_i^T (\tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{R}_i^{(h)})^{-1} (\tilde{y}_i - \tilde{X}_i \hat{\beta}^{(h)})$ and $\tilde{e}_i^{(h)} = E[\tilde{e}_i \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{y}_i - \tilde{X}_i \hat{\beta}^{(h)} - \tilde{Z}_i \tilde{b}_i^{(h)}$, where $\tilde{y}_i = \tilde{y}_i(\hat{\beta}^{(h)}, \hat{b}_i^{(h-1)})$ represents the updated pseudo-responses.
CM step: 
Update the current estimates β ^ ( h ) , D ^ ( h ) , ^ ( h ) and ϕ ^ ( h ) by maximizing the Q-function
Equation (17). We obtain:
$$\hat{\beta}^{(h+1)} = \left( \sum_{i=1}^{N} \tilde{X}_i^T \hat{R}_i^{(h)-1} \tilde{X}_i \right)^{-1} \sum_{i=1}^{N} \tilde{X}_i^T \hat{R}_i^{(h)-1} (\tilde{y}_i - \tilde{Z}_i \tilde{b}_i^{(h)}), \qquad \hat{D}^{(h+1)} = N^{-1} \sum_{i=1}^{N} \hat{\Psi}_i^{(h)},$$
$$\hat{\sigma}_{jl}^{(h+1)} = \begin{cases} \left( \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \hat{\omega}_{ijl}^{(h+1/2)} \right], & \text{for } j = l, \\[4pt] \left( 2 \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \left( \hat{\omega}_{ijl}^{(h+1/2)} + \hat{\omega}_{ilj}^{(h+1/2)} \right) \right], & \text{for } j \neq l, \end{cases}$$
$$\hat{\phi}^{(h+1)} = \arg\min_{\phi} \left\{ r \sum_{i=1}^{N} \log |C_i(\phi)| + \sum_{i=1}^{N} \mathrm{tr}\left[ \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right)^{-1} \hat{\Omega}_i^{(h+1/2)} \right] \right\},$$
where $\hat{\omega}_{ijl}^{(h+1/2)}$ is the $s_i \times s_i$ block of $\hat{\Omega}_i^{(h)}$ formed by its $((j-1)s_i + 1)$-th to $(j s_i)$-th rows and $((l-1)s_i + 1)$-th to $(l s_i)$-th columns, in which $\beta$ and $D$ have been replaced by their updated estimates at the $(h+1)$-th iteration. Besides, $\hat{\Omega}_i^{(h+1/2)}$ in the above optimization function for $\hat{\phi}^{(h+1)}$ is $\hat{\Omega}_i^{(h)}$ evaluated at $\theta = \hat{\theta}^{(h+1)}$, except for $\phi$.
Given $\{\hat{b}_i^{(0)}\}_{i=1}^{N}$ and $\hat{\theta}^{(0)}$, we implement the pseudo-ECM algorithm until the user-specified convergence criterion is satisfied. Analogous to the PNLS-MLME method, this algorithm is established under the pseudo-data scenario. Hence, the resulting approximate log-likelihood value can be obtained by using Equation (9).
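The E-step moments of this subsection are ordinary multivariate normal conditioning formulas and can be sketched directly for one subject (names illustrative):

```python
import numpy as np

def ecm_e_step_i(y_t, X_t, Z_t, beta, D, R):
    """Conditional moments of the pseudo-data E step: under the linear
    pseudo-model y ~ N(X beta + Z b, R) with b ~ N(0, D), returns
    b_tilde = E[b|y], Psi = E[b b'|y] and Omega = E[e e'|y].
    Illustrative numpy translation of the Section 2.3 formulas."""
    V = Z_t @ D @ Z_t.T + R                       # marginal covariance of y
    b_t = D @ Z_t.T @ np.linalg.solve(V, y_t - X_t @ beta)
    M = np.linalg.inv(np.linalg.inv(D) + Z_t.T @ np.linalg.inv(R) @ Z_t)
    Psi = np.outer(b_t, b_t) + M                  # second moment of b given y
    e_t = y_t - X_t @ beta - Z_t @ b_t
    Omega = np.outer(e_t, e_t) + Z_t @ M @ Z_t.T  # second moment of e given y
    return b_t, Psi, Omega
```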

2.4. Monte Carlo EM Algorithm

We offer a Monte Carlo (MC) version of the EM algorithm [40] for ML estimation of Model Equation (1) and evaluate the observed log-likelihood Equation (4) via the MC integration. The MCEM is a modification of the EM algorithm in which the E step is computed numerically through a large number of simulated samples.
Given the complete data $(y, b)$, the log-likelihood function of $\theta$ for the MNLMM can be expressed as:
$$\ell_c(\theta \mid y, b) = \sum_{i=1}^{N} \log \left[ \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) \phi_q(b_i \mid 0, D) \right].$$
In the E step, we compute the expectation of complete data log-likelihood Function Equation (18) to yield the Q-function:
$$Q(\theta \mid \hat{\theta}^{(h)}) = \sum_{i=1}^{N} \int \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) P(b_i \mid y_i, \hat{\theta}^{(h)})\, d b_i + \sum_{i=1}^{N} \int \log \phi_q(b_i \mid 0, D)\, P(b_i \mid y_i, \hat{\theta}^{(h)})\, d b_i.$$
Obviously, Equation (19) cannot be written in closed form, since the conditional distribution of b i given y i :
$$P(b_i \mid y_i, \theta) \propto \exp\left\{ -\frac{1}{2} \left[ \left( y_i - \mu_i(\beta, b_i) \right)^T R_i^{-1} \left( y_i - \mu_i(\beta, b_i) \right) + b_i^T D^{-1} b_i \right] \right\}$$
has no standard form. To simulate random samples from Equation (20), we perform the Metropolis–Hastings (M-H) algorithm [41] with the proposal distribution:
$$b_i^{(m+1)} \sim N_q\left( b_i^{(m)}, G^{-1}(y_i, \hat{\theta}^{(h)}) \right),$$
where $G^{-1}(y_i, \hat{\theta}^{(h)})$ is the inverse of the matrix $G(y_i, \theta)$ given in Equation (13) and evaluated at $\theta = \hat{\theta}^{(h)}$. Note that the idea of considering such a proposal distribution comes from the integration of Equation (14) over $b_i$, which is, up to a multiplicative constant, approximately equal to an $N_q(\hat{b}_i, G^{-1}(y_i, \theta))$ density. We accept the new draw $b_i^{(m+1)}$ with probability $\min\{1, P(b_i^{(m+1)} \mid y_i, \hat{\theta}^{(h)}) / P(b_i^{(m)} \mid y_i, \hat{\theta}^{(h)})\}$ and otherwise set $b_i^{(m+1)} = b_i^{(m)}$. After obtaining a set of converged MC samples $\{b_i^{(m)}\}_{m=1}^{M}$, the random effects $b_i$, as well as any function $f(b_i)$ appearing in Equation (19), can be estimated by $\hat{b}_i^{(h)} = \sum_{m=1}^{M} b_i^{(m)} / M$ and $E[f(b_i) \mid y_i, \theta] \approx \sum_{m=1}^{M} f(b_i^{(m)}) / M$, respectively, at each iteration.
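A minimal random-walk Metropolis–Hastings sampler matching this proposal scheme might look as follows; `logpost` stands for the unnormalized log of the conditional density of $b_i$ and is an assumed callable:

```python
import numpy as np

def mh_sample_b(logpost, b0, Ginv, M=1000, seed=0):
    """Metropolis-Hastings draws from the conditional density of b_i
    using the symmetric random-walk proposal b' ~ N_q(b^(m), G^-1).
    Because the proposal is symmetric, the acceptance ratio reduces to
    the posterior ratio.  Names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Ginv)          # proposal covariance factor
    b = np.asarray(b0, dtype=float)
    lp = logpost(b)
    out = []
    for _ in range(M):
        prop = b + chol @ rng.standard_normal(b.size)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            b, lp = prop, lp_prop
        out.append(b.copy())
    return np.asarray(out)
```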
In the M step, we maximize the obtained Q-function Equation (19) by equating the following derivatives:
$$\frac{\partial Q(\theta \mid \hat{\theta}^{(h)})}{\partial D} = \sum_{i=1}^{N} \frac{\partial}{\partial D} E\left[ \log \phi_q(b_i \mid 0, D) \mid y_i, \hat{\theta}^{(h)} \right]$$
and:
$$\frac{\partial Q(\theta \mid \hat{\theta}^{(h)})}{\partial \alpha} = \sum_{i=1}^{N} \frac{\partial}{\partial \alpha} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) \mid y_i, \hat{\theta}^{(h)} \right]$$
to zero, where $\alpha = \{\beta, \Sigma, \phi\}$. By allowing differentiation under the integral sign for Equation (22), we update the estimate of $D$ by:
$$\hat{D}^{(h+1)} = \frac{1}{N} \sum_{i=1}^{N} E\left[ b_i b_i^T \mid y_i, \hat{\theta}^{(h)} \right] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{m=1}^{M} b_i^{(m)} b_i^{(m)T}.$$
Since solving Equation (23) is analytically intractable, we adopt a profile approximate Q-function approach, which updates $\hat{\beta}^{(h)}$, $\hat{\Sigma}^{(h)}$ and $\hat{\phi}^{(h)}$ by a sequential optimization procedure, as in the Laplacian method described in Section 2.2. It gives:
$$\hat{\beta}^{(h+1)} = \arg\max_{\beta} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), \hat{R}_i^{(h)} \right) \mid y_i, \hat{\theta}^{(h)} \right],$$
$$\hat{\Sigma}^{(h+1)} = \arg\max_{\Sigma} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\hat{\beta}^{(h+1)}, b_i), \Sigma \otimes \hat{C}_i^{(h)} \right) \mid y_i, \hat{\theta}^{(h)} \right],$$
and:
$$\hat{\phi}^{(h+1)} = \arg\max_{\phi} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\hat{\beta}^{(h+1)}, b_i), \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right) \mid y_i, \hat{\theta}^{(h)} \right].$$
Consequently, the marginal log-likelihood can be approximated as:
$$\ell_{\mathrm{MC}}(\theta \mid y) = -\frac{1}{2} \sum_{i=1}^{N} \left[ (s_i r + q) \log(2\pi) + s_i \log |\Sigma| + r \log |C_i| + \log |D| \right] - \frac{1}{2M} \sum_{i=1}^{N} \sum_{m=1}^{M} g(y_i, b_i^{(m)}, \theta).$$
According to an alternative hierarchy of the MNLMM,
$$y_i \mid \eta_i \sim N_{s_i r}\left( \mu_i(\eta_i, x_i), R_i \right), \quad \eta_i \sim N_s\left( A_i \beta, B_i D B_i^T \right), \quad \text{for } i = 1, \ldots, N,$$
the MCEM algorithm that deals with Monte Carlo integration directly on the individual parameters η i rather than subject-specific random effects b i can yield an explicit estimator for the fixed effects β. However, such an implementation may not be feasible in the framework of MNLMMs due to the possible singularity of B i D B i T .

2.5. Importance Sampling EM Algorithm

Importance sampling (IS) is an alternative way of performing MC integration. We provide an ISEM algorithm, which modifies the MC approximation of Equation (19) in the E step of the MCEM algorithm by using the IS method. To implement the ISEM algorithm, we first choose an appropriate envelope distribution from which the samples are simulated and the importance weights are calculated. As in the M-H algorithm, Equation (21) is a natural choice for the envelope distribution. As suggested by [35], the envelope distribution can be a mixture of two multivariate normal distributions with pdf:
$$\lambda(b_i) = P_0\, \phi_q(b_i \mid 0, \hat{D}^{(h)}) + (1 - P_0)\, \phi_q\left( b_i \mid \hat{b}_i^{(h)}, G^{-1}(y_i, \hat{\theta}^{(h)}) \right),$$
where the mixing proportion $0 \le P_0 \le 1$ is a pre-specified value.
Notably, ISEM can be performed to evaluate the expected values of any functions of unobservable { b i } i = 1 N , e.g., f ( b i ) = b i and f ( b i ) = b i b i T . It follows that:
$$E\left[ f(b_i) \mid y_i, \theta \right] = \int f(b_i)\, f(b_i \mid y_i, \theta)\, d b_i = \frac{\int f(b_i)\, f(y_i \mid b_i, \theta)\, f(b_i \mid D)\, d b_i}{\int f(y_i \mid b_i, \theta)\, f(b_i \mid D)\, d b_i}.$$
Having obtained a sufficient number of random-effects draws, denoted by $\{b_i^{(m)}\}_{m=1}^{M}$ for $i = 1, \ldots, N$, we adopt the ratio of two MC approximations using IS draws from Equation (27) to estimate Equation (28), given by:
$$E\left[ f(b_i) \mid y_i, \theta \right] \approx \frac{\sum_{m=1}^{M} f(b_i^{(m)})\, f(y_i \mid b_i^{(m)}, \theta)\, f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)})}{\sum_{m=1}^{M} f(y_i \mid b_i^{(m)}, \theta)\, f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)})}.$$
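The self-normalized ratio in Equation (29) is insensitive to normalizing constants, which the following sketch exploits by working with unnormalized log densities (all function arguments are illustrative stand-ins):

```python
import numpy as np

def is_posterior_mean(f, y_loglik, prior_logpdf, env_rvs, env_logpdf, M=2000, seed=0):
    """Self-normalized importance-sampling estimate of E[f(b)|y]:
    draws b_m from the envelope lambda, forms log weights
    log f(y|b_m) + log f(b_m|D) - log lambda(b_m), and returns the
    weighted average of f(b_m).  Constants cancel in the ratio, so the
    log densities may be unnormalized.  Illustrative sketch."""
    rng = np.random.default_rng(seed)
    draws = [env_rvs(rng) for _ in range(M)]
    logw = np.array([y_loglik(b) + prior_logpdf(b) - env_logpdf(b) for b in draws])
    w = np.exp(logw - logw.max())            # stabilized weights
    fs = np.array([f(b) for b in draws], dtype=float)
    return float(w @ fs / w.sum())
```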
In the E step, given the current estimates of parameters θ ^ ( h ) , we compute Equation (19) in which the required conditional moments of latent data b can be approximated based on Equation (29). In the M step, we update each entry of θ ^ ( h ) by maximizing the Q-function. Indeed, the ISEM procedure works conceptually similarly to that of MCEM: only D ^ ( h + 1 ) shows an explicit solution, while β ^ ( h + 1 ) , ^ ( h + 1 ) and ϕ ^ ( h + 1 ) are obtained through sequential optimization solutions via Equations (24)–(26). The IS approximation to the marginal log-likelihood is:
$$\ell_{IS}(\theta \mid y) \approx -\frac{1}{2} \sum_{i=1}^{N} \left[ s_i \log |\Sigma| + r \log |C_i| + \log |D| \right] + \sum_{i=1}^{N} \log \left[ \frac{1}{M} \sum_{m=1}^{M} \exp\left\{ -\tfrac{1}{2} g(y_i, b_i^{(m)}, \theta) \right\} f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)}) \right].$$

2.6. Expected Information Matrix

For Model Equation (8), writing $\theta = (\beta, \alpha)$ with $\alpha = (\mathrm{vech}(D), \mathrm{vech}(\Sigma), \phi)$, the expected information matrix of $\theta$, obtained by taking the expectation of the negative Hessian matrix, can be expressed as:
J θ θ = [ J β β J β α J β α T J α α ] ,
where $J_{\beta\beta} = \sum_{i=1}^{N} \tilde{X}_i^T \tilde{\Lambda}_i^{-1} \tilde{X}_i$, $J_{\beta\alpha} = 0$, and $J_{\alpha\alpha}$ is a $g \times g$ information matrix whose $(l, s)$-th entry is $[J_{\alpha\alpha}]_{ls} = 2^{-1} \sum_{i=1}^{N} \mathrm{tr}(\tilde{\Lambda}_i^{-1} \dot{\tilde{\Lambda}}_{il} \tilde{\Lambda}_i^{-1} \dot{\tilde{\Lambda}}_{is})$, for $l, s = 1, \ldots, g$, with $g = q(q+1)/2 + r(r+1)/2 + \dim(\phi)$ and $\dot{\tilde{\Lambda}}_{il}$ being $\dot{\tilde{\Lambda}}_{il}^{(h)}$ given in (A.1) with $\hat{\theta}^{(h)}$ replaced by $\theta$. Consequently, the asymptotic variance-covariance matrix of $\theta$ can be approximated by the inverse of the information Matrix Equation (30), denoted by $J_{\theta\theta}^{-1}$. The resulting standard errors of the parameters are the square roots of the diagonal entries of $J_{\theta\theta}^{-1}$ evaluated at $\theta = \hat{\theta}$.
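Since $J_{\beta\alpha} = 0$, the information matrix is block diagonal and the standard errors split into two independent blocks; a minimal sketch:

```python
import numpy as np

def standard_errors(J_bb, J_aa):
    """Standard errors from a block-diagonal expected information matrix:
    the asymptotic covariance splits into inv(J_bb) and inv(J_aa), and
    the standard errors are square roots of the diagonal entries.
    Illustrative sketch."""
    se_beta = np.sqrt(np.diag(np.linalg.inv(J_bb)))
    se_alpha = np.sqrt(np.diag(np.linalg.inv(J_aa)))
    return np.concatenate([se_beta, se_alpha])
```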

2.7. Initialization

When implementing iterative procedures, a common difficulty encountered in practice is that the algorithm is painfully slow or even non-convergent. Such a computational problem may occur in handling ML estimation of the MNLMM, especially when the data are too sparse or the dimension of random effects is over-specified. To overcome this potential problem, a default procedure of automatically creating a set of good initial values is summarized below.
(i)
A direct way of obtaining the initial value for β is to fit the NLMMs to each outcome variable separately by using the nlme R package [12].
(ii)
Using the fitting results of NLMMs for each outcome, we take the initial value D ^ ( 0 ) as a (block) diagonal form with the diagonal entry being the variances (covariances) of random effects under the fitted NLMMs.
(iii)
For the initial value of $\Sigma$, we use the sample variance-covariance matrix of the data. That is, take $\hat{\Sigma}^{(0)} = \sum_{i=1}^{N} \sum_{t=1}^{s_i} (y_{i \cdot t} - \bar{y})(y_{i \cdot t} - \bar{y})^T / (\sum_{i=1}^{N} s_i - 1)$, where $y_{i \cdot t} = (y_{i1t}, \ldots, y_{irt})^T$ and $\bar{y} = (\sum_{i=1}^{N} s_i)^{-1} (\sum_{i=1}^{N} \sum_{t=1}^{s_i} y_{i1t}, \ldots, \sum_{i=1}^{N} \sum_{t=1}^{s_i} y_{irt})^T$.
(iv)
The initial values for ϕ, depending on the structure, are simply chosen to give a condition of nearly uncorrelated errors.
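Step (iii) above is a pooled sample covariance over all occasion vectors; for instance:

```python
import numpy as np

def init_sigma(Y_list):
    """Initial Sigma: pooled sample variance-covariance matrix of the
    r-variate occasion vectors y_{i.t}, obtained by stacking all
    subjects' s_i x r outcome matrices.  Illustrative sketch."""
    rows = np.vstack(Y_list)                 # (sum_i s_i) x r
    dev = rows - rows.mean(axis=0)
    return dev.T @ dev / (rows.shape[0] - 1)
```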

3. Application: ACTG 315 Data

We present a comparison of the five algorithms via a real data example from the AIDS Clinical Trial Group protocol 315 (ACTG 315) study developed by the Immunology Research Agenda Committee of the U.S. National Institute of Allergy and Infectious Disease, the ACTG sponsor. The study design and recruitment of participants (patients) were conducted by University Hospitals of Cleveland, Rush-Presbyterian-St. Luke's Medical Center and University of Colorado Health Science Center. In the study, 53 human immunodeficiency virus type 1 (HIV-1)-infected patients were recruited, and their plasma HIV-1 RNA (viral load) copies and CD4+ T cell counts were repeatedly measured at Days 0, 2, 7, 10, 14, 28, 56, 84, 168 and 196 after the start of treatment. A more detailed description of the study can be found in [42,43].
HIV-1 infection is associated with progressive and profound loss of immune function that places infected persons at enhanced risk for opportunistic infections, and even death. A reaction in HIV-1-related immune deficiency can be characterized by decreases in the numbers of circulating CD4+ T helper lymphocytes. CD4+ T cells in blood decline to a lower level after HIV-1 infection and may recover to a high level after antiviral therapies suppress viral load. Generally, there is a negative correlation between the virologic marker (measured by HIV-1 RNA) and the immunologic marker (measured by CD4+ T cells) during antiviral treatments. As a consequence, a joint analysis of HIV-1 RNA and CD4+ counts is helpful to take the evolution of the correlation among responses over time into account. The data have been analyzed by [44,45,46,47] using different modeling approaches.
As a part of the clinical trial on 53 patients, a total of 48 patients were included in our analysis after excluding four early drop-out patients and one patient whose plasma HIV-1 RNA pattern suggested intermittent adherence to study therapy. To stabilize the variances and reduce the strong skewness of the two markers, a base-10 logarithmic transformation is applied to HIV-1 RNA and a square-root transformation to CD4+ T cell counts. Both transformations are widely used in HIV/AIDS clinical trials. Let $y_{i1,k}$ and $y_{i2,k}$ be the $\log_{10}$RNA and CD4$^{0.5}$ markers, respectively, at the $k$-th time point for patient $i$. We consider the following bivariate nonlinear mixed-effects model for $y_{i1,k}$ and $y_{i2,k}$:
$$y_{i1,k} = \log_{10}\big(\exp\{(\beta_1+b_{i1})+\beta_2 t_{ik}\} + \exp\{\beta_3\,\mathrm{rna}_i\}\big) + e_{i1,k},\qquad
y_{i2,k} = \frac{\beta_4+b_{i2}}{1+\exp\{(\beta_5-t_{ik})/\beta_6\}} + e_{i2,k},$$
where $t_{ik} = \mathrm{day}_{ik}/7$ is the $k$-th visited time point (in weeks) for patient $i$; $\mathrm{rna}_i$ is the baseline $\log_{10}$RNA level for patient $i$ at the start of the study; $(b_{i1}, b_{i2})$ are the bivariate normally-distributed random effects; and $(e_{i1}^T, e_{i2}^T) = (e_{i1,1},\ldots,e_{i1,s_i}, e_{i2,1},\ldots,e_{i2,s_i})$ are the within-subject errors following a multivariate normal distribution with zero mean and variance-covariance matrix $C_i$. Because the baseline RNA is a significant covariate in the ACTG 315 study [47], it should be incorporated into the analysis. To account for the extra autocorrelation caused by within-patient dependence among unequally-spaced occasions, we employ a continuous order-one autoregressive structure, i.e., $C_i = [\phi^{|t_{ik}-t_{ik'}|}]$, for the across-occasion covariance matrix of the within-subject errors.
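To make the two mean profiles concrete, the sketch below evaluates them over the study's visit schedule. The β values are set roughly to the ML estimates reported later in Table 1, and the baseline value `rna_i = 4.0` is an illustrative choice, not data:

```python
import numpy as np

# illustrative parameter values, roughly the ML estimates in Table 1
beta = np.array([12.05, -2.66, 1.30, 16.86, -1.73, 1.31])

def mu1(t, beta, b_i1, rna_i):
    """Mean of log10 RNA: log10( exp{(b1+bi1)+b2*t} + exp{b3*rna_i} )."""
    return np.log10(np.exp((beta[0] + b_i1) + beta[1] * t)
                    + np.exp(beta[2] * rna_i))

def mu2(t, beta, b_i2):
    """Mean of CD4^0.5: logistic curve (b4+bi2)/(1+exp{(b5-t)/b6})."""
    return (beta[3] + b_i2) / (1.0 + np.exp((beta[4] - t) / beta[5]))

# visit days of the ACTG 315 schedule, converted to weeks as in the text
weeks = np.array([0, 2, 7, 10, 14, 28, 56, 84, 168, 196]) / 7.0
rna_curve = mu1(weeks, beta, 0.0, rna_i=4.0)   # population curve (b_i = 0)
cd4_curve = mu2(weeks, beta, 0.0)
```

With these values the $\log_{10}$RNA curve decays from its baseline toward the plateau $\beta_3\,\mathrm{rna}_i/\log(10)$, while the CD4$^{0.5}$ curve rises toward the asymptote $\beta_4$, matching the qualitative behavior described in the text.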
According to the standard formulation in Equation (2), we specify:
$$A = \begin{bmatrix} I_2 & 0 & 0 \\ 0 & \mathrm{rna}_i & 0 \\ 0 & 0 & I_3 \end{bmatrix},\qquad
B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}^T,\qquad
\beta = (\beta_1,\beta_2,\beta_3,\beta_4,\beta_5,\beta_6)^T,$$
and $b_i = (b_{i1}, b_{i2})^T$, where $I_d$ is the identity matrix of order $d$. Define:
$$\xi_1 = \big(\exp\{\eta_1+\eta_2 t\}+\exp\{\eta_3\}\big)^{-1}/\log(10),\qquad
\xi_2 = \big(1+\exp\{(\eta_5-t)/\eta_6\}\big)^{-1},$$
where $\eta_1 = \beta_1 + b_{i1}$, $\eta_2 = \beta_2$, $\eta_3 = \beta_3\,\mathrm{rna}_i$, $\eta_4 = \beta_4 + b_{i2}$, $\eta_5 = \beta_5$ and $\eta_6 = \beta_6$. The first derivatives of $\mu_1$ and $\mu_2$ specified in Equation (31) with respect to η are:
$$\dot{\mu}_1 = \frac{\partial\mu_1}{\partial\eta} = \begin{bmatrix} \xi_1\exp\{\eta_1+\eta_2 t\} \\ \xi_1 t\exp\{\eta_1+\eta_2 t\} \\ \xi_1\exp\{\eta_3\} \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad\text{and}\quad
\dot{\mu}_2 = \frac{\partial\mu_2}{\partial\eta} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \xi_2 \\ -\mu_2\exp\{(\eta_5-t)/\eta_6\}\,\xi_2/\eta_6 \\ \mu_2\exp\{(\eta_5-t)/\eta_6\}(\eta_5-t)\,\xi_2/\eta_6^2 \end{bmatrix}.$$
The first derivative of the mean function $\mu_i(\beta, b_i) = (\mu_1, \mu_2)$ with respect to $b_i$ is:
$$\frac{\partial\mu_i(\beta,b_i)}{\partial b_i} = \begin{bmatrix} \xi_1\exp\{(\beta_1+b_{i1})+\beta_2 t_i\} & 0_{s_i} \\ 0_{s_i} & \xi_2 \end{bmatrix},$$
where $\xi_1$ and $\xi_2$ are $s_i\times 1$ vectors composed of the $\xi_1$ and $\xi_2$ given by Equation (32) with $t$ replaced by the $s_i\times 1$ occasion vector $t_i$ of the $i$-th patient, and the product in the first block is taken element-wise.
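Analytic derivatives such as $\dot\mu_1$ and $\dot\mu_2$ are easy to get wrong, so a finite-difference check is worthwhile. The sketch below codes $\mu_1$, $\mu_2$ and the two gradient vectors exactly as displayed above; the η values used in the check are illustrative:

```python
import numpy as np

LOG10 = np.log(10.0)

def mu_and_grad(eta, t):
    """mu1, mu2 in the eta-parametrization, with analytic gradients
    mu1_dot, mu2_dot as displayed in the text (illustrative sketch)."""
    e1, e2, e3, e4, e5, e6 = eta
    num = np.exp(e1 + e2 * t)            # exp{eta1 + eta2*t}
    den = num + np.exp(e3)
    xi1 = 1.0 / (den * LOG10)
    u = np.exp((e5 - t) / e6)
    xi2 = 1.0 / (1.0 + u)
    mu1 = np.log10(den)
    mu2 = e4 * xi2
    mu1_dot = np.array([xi1 * num, xi1 * t * num, xi1 * np.exp(e3),
                        0.0, 0.0, 0.0])
    mu2_dot = np.array([0.0, 0.0, 0.0, xi2,
                        -mu2 * u * xi2 / e6,
                        mu2 * u * (e5 - t) * xi2 / e6 ** 2])
    return mu1, mu2, mu1_dot, mu2_dot
```

A central-difference comparison at an arbitrary η confirms that every entry of both gradient vectors matches the numerical derivative.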
Table 1 presents the parameter estimates and their standard errors (in parentheses) from the five computational methods, namely PNLS-MLME, Laplacian, pseudo-ECM, MCEM with 500 Monte Carlo samples and ISEM with mixing proportion $P_0 = 0.5$. When employing the ISEM algorithm, several choices of the mixing proportion $P_0$, ranging from 0 to 1 with an increment of 0.1, were considered. To save space, we report only the result for $P_0 = 0.5$, as it yields the maximized log-likelihood value. The results indicate that the five methods give very similar estimates and significance conclusions for the model parameters. According to the estimates of $\Sigma = [\sigma_{jl}]$, the estimated correlation between $\log_{10}$RNA and CD4$^{0.5}$ ranges from approximately −0.13 to −0.18, confirming a negative relationship between the virologic and immunologic markers. The between-patient correlations of the two responses are not statistically significant based on the estimates of $D$. The estimate of the autoregressive parameter ϕ is significantly different from zero, revealing the existence of autocorrelation in the within-patient variability. Figure 1 displays the observations and estimated mean curves, in which the covariate is set to the average baseline RNA value over all patients, for the five computational methods. Judging from the figure, the logarithmic and logistic curves in Equation (31) are reasonable functions to describe the evolution of RNA on the $\log_{10}$ scale and CD4 on the square-root scale over time. The trend of $\log_{10}$RNA decreases at the beginning, accompanied by the rapid growth of CD4$^{0.5}$ in the early days of antiviral therapy. After nearly four weeks, the decline pattern of $\log_{10}$RNA and the growth pattern of CD4$^{0.5}$ become slow and smooth. As an illustration, the fitted values obtained by the five methods, together with the observations for seven randomly-selected patients, are displayed in Figure 2.
As anticipated, the fitted trajectories for each patient show slight differences among the five estimating procedures. Generally, they adapt to the trend of the observed repeated measures, but some configurations are not ideally captured. It is known that the viral load (RNA copies) and CD4 counts are highly variable immune system markers, making them difficult to fit.
Figure 1. The $\log_{10}$(RNA) and CD4$^{0.5}$ observations (∘) with the estimated mean curves against time (in days) from ML estimation using the five proposed procedures.
Figure 2. The fitted values obtained by the five proposed procedures together with the observations (•) of $\log_{10}$(RNA) and CD4$^{0.5}$ for seven randomly-selected patients.
Table 1. Estimation results for AIDS Clinical Trial Group protocol 315 (ACTG 315) data. PNLS, penalized nonlinear least squares; MLME, multivariate linear mixed-effects; ECM, expectation conditional maximization; MCEM, Monte Carlo EM; ISEM, importance sampling EM.
| Parameter | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- |
| $\beta_1$ | 12.0477 | 12.9800 | 12.0485 | 12.0784 | 12.114 |
|  | (0.2513) | (0.2858) | (0.2530) | (0.2626) | (0.2652) |
| $\beta_2$ | −2.6558 | −2.6476 | −2.6543 | −2.6198 | −2.6069 |
|  | (0.1781) | (0.1970) | (0.1777) | (0.1950) | (0.1992) |
| $\beta_3$ | 1.3039 | 1.3001 | 1.3039 | 1.3012 | 1.3000 |
|  | (0.0274) | (0.0248) | (0.0273) | (0.0253) | (0.0249) |
| $\beta_4$ | 16.8604 | 16.8577 | 16.8605 | 16.8875 | 16.9058 |
|  | (0.3911) | (0.3340) | (0.3914) | (0.3863) | (0.3829) |
| $\beta_5$ | −1.7324 | −1.7791 | −1.7312 | −1.7721 | −1.7643 |
|  | (0.4936) | (0.4590) | (0.4930) | (0.4632) | (0.4585) |
| $\beta_6$ | 1.3081 | 1.3514 | 1.3078 | 1.3604 | 1.3463 |
|  | (0.3262) | (0.2899) | (0.3259) | (0.2972) | (0.2896) |
| $d_{11}$ | 0.0000 | 0.7457 | 0.0583 | 0.1183 | 0.1398 |
|  | (0.4665) | (0.5763) | (0.4753) | (0.4673) | (0.4612) |
| $d_{21}$ | −0.0020 | −0.1400 | 0.0144 | −0.2386 | 0.0838 |
|  | (0.5414) | (0.5203) | (0.5479) | (0.5401) | (0.5295) |
| $d_{22}$ | 4.7425 | 3.8251 | 4.7585 | 5.4602 | 5.4894 |
|  | (1.3803) | (0.9953) | (1.3826) | (1.3561) | (1.3361) |
| $\sigma_{11}$ | 0.4655 | 0.4267 | 0.4622 | 0.4379 | 0.4329 |
|  | (0.0458) | (0.0411) | (0.0455) | (0.0420) | (0.0414) |
| $\sigma_{21}$ | −0.2232 | −0.1738 | −0.2164 | −0.2185 | −0.2225 |
|  | (0.0965) | (0.0747) | (0.0962) | (0.0786) | (0.0754) |
| $\sigma_{22}$ | 5.7063 | 3.5558 | 5.6929 | 3.8956 | 3.6033 |
|  | (0.5991) | (0.3520) | (0.5980) | (0.3874) | (0.3541) |
| $\phi$ | 0.6824 | 0.5447 | 0.6818 | 0.5674 | 0.5343 |
|  | (0.0311) | (0.0422) | (0.0312) | (0.0400) | (0.0425) |
Furthermore, the approximate values of the log-likelihood function for Model Equation (31), evaluated at the ML estimates $\hat\theta$ obtained by each of the five estimation procedures, are reported in Table 2. To assess the accuracy of the approximations of the log-likelihood function, we also evaluate the double integral in the log-likelihood Function Equation (4) by plugging the corresponding $\hat\theta$ into Equation (4) and using the integrate routine in R to obtain the exact log-likelihoods. The exact log-likelihood values, together with the absolute differences (AD) between the approximate and exact values, are also listed in Table 2. Roughly, the log-likelihood values under the five approximation methods are similar and close to their corresponding exact values. In this example, the pseudo-ECM yields the most precise evaluation, followed by the Laplacian, MCEM, ISEM and PNLS-MLME methods.
Table 2. Approximate and exact log-likelihood functions for the fitted Model Equation (31) under the five estimation methods. AD, absolute difference.
|  | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- |
| Approximate | −974.360 | −986.794 | −974.592 | −966.763 | −1010.370 |
| Exact | −1063.338 | −991.754 | −978.269 | −981.384 | −978.758 |
| AD | 88.978 | 4.96 | 3.677 | 14.621 | 31.612 |
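The "exact" values in Table 2 come from numerically integrating the bivariate random effects out of each subject's likelihood. The same idea can be sketched in a self-contained way; to make the result checkable, we use a toy model that is *linear* in the random effect $b_i$, so the marginal is also available in closed form. All names, dimensions and values below are illustrative, not the ACTG model:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# toy subject: y | b ~ N(Z b, s2*I_3),  b ~ N_2(0, D)   (illustrative)
Z = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5]])
D = np.array([[1.0, 0.3], [0.3, 0.5]])
s2 = 0.4
y = rng.normal(size=3)

def integrand(b2, b1):
    """p(y | b) * p(b): the function under the double integral."""
    b = np.array([b1, b2])
    return (multivariate_normal.pdf(y, mean=Z @ b, cov=s2 * np.eye(3))
            * multivariate_normal.pdf(b, mean=np.zeros(2), cov=D))

# marginal density of y by numerical double integration over (b1, b2)
marg, _ = dblquad(integrand, -8.0, 8.0, lambda b1: -8.0, lambda b1: 8.0)
loglik_numeric = np.log(marg)

# closed-form check: marginally y ~ N(0, Z D Z^T + s2*I_3)
loglik_exact = multivariate_normal.logpdf(
    y, mean=np.zeros(3), cov=Z @ D @ Z.T + s2 * np.eye(3))
```

The agreement between the two values illustrates why numerical double integration serves as a reliable "exact" benchmark when, as in the nonlinear case, no closed form exists.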
Although the proposed five algorithms provide quite similar estimates of the model parameters, as well as the fitted mean profiles shown in Figure 1 and Figure 2, the following remarks are in order. The PNLS-MLME and Laplacian methods involve solving for the fixed effects β and the modes of the random effects $\{b_i\}_{i=1}^N$ via iterative optimization procedures. Thus, the two methods are very sensitive to initial values and may suffer from slow convergence or even non-convergence due to singularity of the variance-covariance matrices, especially when unnecessary random effects are included in the model. The MCEM and ISEM methods spend more time generating an adequate number of samples of the random effects to evaluate the required conditional expectations. Overall, the pseudo-ECM algorithm is the best method in terms of computational efficiency in this study. However, all of the proposed methods may get trapped in one of many local maxima of the log-likelihood function. To assess the stability of the resulting estimates, a variety of initial values should be employed when implementing the algorithms. The global optimum is taken to be the solution with the largest log-likelihood value.

4. Simulation Study

In this section, two simulation studies, with data generated from models with linear and nonlinear profiles, respectively, are undertaken to compare the performance of the five algorithmic procedures for fitting the MNLMM. The performance comparison covers the convergence efficiency in terms of the number of iterations and consumed CPU time, the accuracy of the parameter estimates and the precision of the log-likelihood approximation. All computations were carried out with R version 2.13.1 in a 32-bit Windows environment on a desktop PC with a 3.40-GHz Intel Core(TM) i7-2600 CPU and 4.0 GB RAM.

4.1. Bivariate Linear Case

To keep the evaluation of the exact log-likelihood values tractable, in this simulation we restrict ourselves to generating datasets from the following bivariate LMM:
$$y_{i1k} = \beta_1 + b_{i1} + \beta_2 t_k + e_{i1k},\qquad y_{i2k} = \beta_3 + (\beta_4 + b_{i2})t_k + e_{i2k},$$
for $i = 1,\ldots,N$ and $k, t_k = 1,\ldots,7$. Following the standard notation for Model Equation (1) along with Assumption Equation (2), we set $A_i = I_4$, $\beta = (\beta_1,\beta_2,\beta_3,\beta_4)^T$,
$$B_i = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^T,$$
and $b_i = (b_{i1}, b_{i2})^T \sim N_2(0, D)$. The specific model parameters are:
$$\beta = (1, 2, -2, 4)^T,\quad D = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},\quad\text{and}\quad C_i = I_7,$$
where the values of ρ are chosen as 0, 0.5 and 0.9 to reflect zero, moderate and high correlations between the outcome variables, respectively. The sample sizes N are set to 25 and 50, and a total of 100 replications are run for each combination of between-outcome correlation ρ and sample size N. Each simulated dataset is fitted by the MNLMM using the five computational procedures, namely the PNLS-MLME, Laplacian, pseudo-ECM, MCEM and ISEM algorithms, described in Section 2. Initial values for the parameters are chosen as the true parameter values plus a random draw from the standard normal distribution. Note that the E step of the MCEM algorithm is undertaken by generating M = 1000 MC samples. When implementing the ISEM algorithm, the envelope distribution was a multivariate normal mixture with three different mixing proportions $P_0 = 0.1, 0.5$ and 0.9. Because all converged estimates are almost the same, we report only the result under $P_0 = 0.5$ for the sake of conciseness. The computational procedures achieve convergence when:
$$\max_{l=1,\ldots,m}\left|\frac{\hat\theta_l^{(h+1)}-\hat\theta_l^{(h)}}{\hat\theta_l^{(h)}}\right| < 0.01,$$
where m is the number of unknown parameters.
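This stopping rule transcribes directly into code. A minimal sketch (it assumes, as here, that no parameter iterate is exactly zero):

```python
import numpy as np

def converged(theta_new, theta_old, tol=0.01):
    """Relative-change stopping rule:
    max_l |(theta_new_l - theta_old_l) / theta_old_l| < tol.
    Assumes every component of theta_old is nonzero."""
    theta_new = np.asarray(theta_new, dtype=float)
    theta_old = np.asarray(theta_old, dtype=float)
    rel = np.abs((theta_new - theta_old) / theta_old)
    return bool(np.max(rel) < tol)
```

For example, an iterate whose worst relative change is 0.5% satisfies the rule, while a 20% change in any single component does not.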
Table 3 summarizes the averages of CPU time (Time), numbers of iterations (Iter), converged log-likelihood values ($\ell_{\max}$), relative bias (RB) of the log-likelihood and empirical sums of relative mean squared errors (RMSE) of the parameter estimates obtained by the five approximation methods over 100 replicates under all considered scenarios. The relative bias of the log-likelihood, calculated as $(\ell_{\max}-\ell_{\mathrm{true}})/|\ell_{\mathrm{true}}|$, is used to evaluate the accuracy of the estimation of the log-likelihood function, where $\ell_{\mathrm{true}}$ is the true value of the log-likelihood function and $\ell_{\max}$ is the converged maximized log-likelihood value. The empirical sum of RMSE for each case is calculated as $\sum_{l=1}^m(\hat\theta_l-\theta_l)^2/\theta_l^2$, where $\theta_l$ and $\hat\theta_l$ denote the $l$-th entries of the true parameter vector and its estimate, respectively.
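The two comparison criteria just defined are one-liners; the sketch below spells them out (function names are ours, not the paper's):

```python
import numpy as np

def relative_bias(l_max, l_true):
    """RB of the converged log-likelihood: (l_max - l_true) / |l_true|."""
    return (l_max - l_true) / abs(l_true)

def sum_relative_mse(theta_hat, theta_true):
    """Empirical sum of relative squared errors over all m parameters:
    sum_l (theta_hat_l - theta_true_l)^2 / theta_true_l^2."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    return float(np.sum((theta_hat - theta_true) ** 2 / theta_true ** 2))
```

For instance, a converged value of −99 against a true value of −100 gives RB = 0.01, and a single 10% error on one unit-valued parameter contributes 0.01 to the RMSE sum.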
Based on the results shown in Table 3, we first compare the convergence speed of the five estimation procedures. Apparently, the pseudo-ECM method takes the least CPU time, followed by the PNLS-MLME, Laplacian, ISEM and then the MCEM methods. The fewest iterations are required by the PNLS-MLME method, followed by the pseudo-ECM, Laplacian, ISEM and MCEM methods, although the last four methods show negligible differences, especially for a large sample size and a high between-outcome correlation. Not surprisingly, the MCEM and ISEM methods require a heavier computational cost, because they need to generate a great number of random samples of the random effects to perform the MC integration in each iteration. We also find that the consumed CPU time and the required number of iterations decrease when the between-outcome correlation ρ increases. We remark that the PNLS-MLME method converges quickly, but only when the initial values are good enough. When the chosen starting point is far from the optimum, the procedure may diverge, and another set of initial values must then be supplied.
Table 3. Simulation results for the computational performance of five approximation methods under each combination of correlations ρ and sample sizes N. Iter, iteration; RB, relative bias.
| N | ρ | Criterion | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 25 | 0 | Time | 4.077 | 25.954 | 1.970 | 8789.093 | 5862.499 |
| 25 | 0 | Iter | 2.150 | 12.140 | 9.800 | 138.440 | 58.390 |
| 25 | 0 | $\ell_{\max}$ | −576.769 | −610.274 | −577.121 | −556.914 | −642.139 |
| 25 | 0 | RB | 0.008 | −0.033 | 0.008 | 0.045 | −0.100 |
| 25 | 0 | RMSE | 2.229 | 2.441 | 2.169 | 2.176 | 2.177 |
| 25 | 0.5 | Time | 4.370 | 30.803 | 2.045 | 2403.145 | 1680.319 |
| 25 | 0.5 | Iter | 2.120 | 11.430 | 9.930 | 35.650 | 15.750 |
| 25 | 0.5 | $\ell_{\max}$ | −559.366 | −582.622 | −559.907 | −536.608 | −625.736 |
| 25 | 0.5 | RB | 0.009 | −0.022 | 0.008 | 0.052 | −0.103 |
| 25 | 0.5 | RMSE | 0.580 | 0.672 | 0.561 | 0.601 | 0.602 |
| 25 | 0.9 | Time | 3.646 | 25.006 | 1.749 | 1252.625 | 1158.028 |
| 25 | 0.9 | Iter | 2.000 | 8.940 | 8.570 | 18.330 | 10.760 |
| 25 | 0.9 | $\ell_{\max}$ | −468.270 | −474.786 | −468.909 | −423.555 | −535.591 |
| 25 | 0.9 | RB | 0.011 | −0.003 | 0.009 | 0.118 | −0.120 |
| 25 | 0.9 | RMSE | 0.470 | 0.484 | 0.450 | 0.486 | 0.477 |
| 50 | 0 | Time | 8.365 | 41.545 | 8.927 | 6825.341 | 3967.824 |
| 50 | 0 | Iter | 2.240 | 10.050 | 9.260 | 56.240 | 20.170 |
| 50 | 0 | $\ell_{\max}$ | −1159.337 | −1177.863 | −1159.675 | −1120.721 | −1292.848 |
| 50 | 0 | RB | 0.004 | −0.010 | 0.004 | 0.039 | −0.094 |
| 50 | 0 | RMSE | 1.688 | 1.747 | 1.685 | 1.692 | 1.689 |
| 50 | 0.5 | Time | 9.776 | 56.560 | 10.210 | 2112.857 | 1706.392 |
| 50 | 0.5 | Iter | 2.140 | 9.760 | 9.530 | 11.800 | 9.690 |
| 50 | 0.5 | $\ell_{\max}$ | −1124.354 | −1140.195 | −1124.911 | −1079.401 | −1258.644 |
| 50 | 0.5 | RB | 0.004 | −0.009 | 0.004 | 0.046 | −0.098 |
| 50 | 0.5 | RMSE | 0.277 | 0.324 | 0.270 | 0.313 | 0.315 |
| 50 | 0.9 | Time | 8.185 | 34.382 | 6.666 | 1512.85 | 1091.661 |
| 50 | 0.9 | Iter | 2.000 | 6.070 | 6.210 | 7.320 | 6.850 |
| 50 | 0.9 | $\ell_{\max}$ | −933.662 | −943.973 | −934.566 | −843.025 | −1069.55 |
| 50 | 0.9 | RB | 0.005 | −0.006 | 0.004 | 0.113 | −0.116 |
| 50 | 0.9 | RMSE | 0.226 | 0.229 | 0.226 | 0.237 | 0.234 |
When assessing the approximated log-likelihood functions, we find that all approximation methods produce relative biases in the log-likelihood within ±0.12, a fairly narrow range. Because the simulated datasets are generated from a linear scenario, i.e., the bivariate LMM specified in Equation (33), the pseudo-data model given in Equation (8) certainly satisfies the MLMM [1] framework. Therefore, the ML estimates of the model parameters, as well as the maximized log-likelihood value obtained by the pseudo-ECM algorithm, are exactly the same as those obtained by fitting the MLMM using the EM-based algorithm. Besides, the PNLS-MLME method uses the same approximation of the log-likelihood function, $\ell_{PD}(\hat\theta|y)$, as pseudo-ECM. Thus, the relative biases in the log-likelihood obtained by the PNLS-MLME and pseudo-ECM algorithms are quite similar, and both are very close to zero. Additionally, the Laplacian approximation gives near-zero but slightly under-estimated log-likelihoods, and the relative biases are negligible when the sample size and between-outcome correlation are large. The log-likelihood values can be slightly over-estimated by the MCEM method and slightly under-estimated by the ISEM method. As anticipated, the approximations of the log-likelihood function get closer to the exact value as the sample size increases.
Figure 3. Scatter plots of fixed-effects estimates for the PNLS-MLME, Laplacian, MCEM and ISEM methods against the pseudo-ECM method for the multivariate nonlinear mixed-effects model (MNLMM) under the case of $N = 25$, $\rho = 0.9$.
Figure 4. Scatter plots of variance-covariance component estimates for the PNLS-MLME, Laplacian, MCEM and ISEM methods against the pseudo-ECM method for the MNLMM under the case of $N = 25$, $\rho = 0.9$.
We now turn our attention to the estimation performance for the model parameters under the five computational methods. From the RMSE rows of Table 3, the five methods typically give comparable estimation accuracy, with negligible differences in RMSE scores. The RMSE decreases as the sample size increases, confirming the good asymptotic properties of the ML estimators, at least for the parameter settings used in this simulation. As mentioned above, the pseudo-ECM method implemented for linear models produces the same results as the EM-type algorithm for the MLMM. Judging from Table 3, the pseudo-ECM method has the smallest RMSE among the five computational methods. Furthermore, we compare the estimates of each parameter obtained by PNLS-MLME, Laplacian, MCEM and ISEM against those obtained by pseudo-ECM one by one in detail. Figure 3 and Figure 4 display the scatter plots of the estimates of the fixed effects (β) and the variance-covariance components ($D$ and $\Sigma$) separately for the pseudo-ECM method (on the X-axes) versus the other four procedures (on the Y-axes). The dashed lines indicate the true parameter values. To save space, we present only the case of $N = 25$ and $\rho = 0.9$, because the other five cases exhibit a similar pattern. It can be seen from the two figures that the estimates are all located in the neighborhood of the true values, indicating that all five computational procedures yield very precise estimates of the model parameters. In general, there is strong agreement among the estimates obtained by the five methods, because the point estimates fall close to the 45-degree line. However, for the estimate of $\beta_4$, PNLS-MLME appears to show slightly larger variability. For the estimates of $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{22}$, the other four methods tend to give smaller estimates than does the pseudo-ECM algorithm.

4.2. Bivariate Nonlinear Case

In this simulation, the data were generated from the MNLMM with the nonlinear mean curves of Equation (31). The presumed model parameters are:
$$\beta = (12, -2.7, 1.3, 16.9, -1.7, 1.3)^T,\quad D = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 4 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 0.5 & -0.2 \\ -0.2 & 5 \end{bmatrix},\quad C_i = I_{10}.$$
Each simulated dataset is fitted by the MNLMM using the five approximation methods described in Section 2. To investigate the effect of the MC sample size for MCEM and of the mixing proportion of the envelope distribution for ISEM, we consider MC sample sizes $M = 500, 1000, 2000$ and mixing proportions $P_0 = 0.1, 0.5, 0.9$. A total of 100 replications are run for each of the sample sizes $N = 25$ and 50 across the nine resulting computational procedures. The convergence rule is the same as in the previous simulation. Note that numerical double integration is performed to calculate the exact log-likelihood, so that the accuracy of the approximate log-likelihood can be assessed.
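The role of the mixing proportion $P_0$ in an ISEM-style E step can be illustrated with a one-dimensional self-normalized importance sampler whose proposal is a two-component normal mixture with weight $P_0$. The component means and scales below are illustrative choices, not the paper's specification:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def is_expectation(h, logtarget, m=4000, p0=0.5):
    """Self-normalized importance-sampling estimate of E_target[h(b)]
    using a two-component normal mixture envelope with weights p0, 1-p0
    (the N(0,1) and N(0,3) components are illustrative)."""
    comp = rng.random(m) < p0
    b = np.where(comp, rng.normal(0.0, 1.0, m), rng.normal(0.0, 3.0, m))
    log_env = np.log(p0 * norm.pdf(b, 0.0, 1.0)
                     + (1.0 - p0) * norm.pdf(b, 0.0, 3.0))
    w = np.exp(logtarget(b) - log_env)   # unnormalized importance weights
    w /= w.sum()                         # self-normalize
    return float(np.sum(w * h(b)))

# sanity check: target N(1, 0.5^2), so E[b] should be close to 1
est = is_expectation(lambda b: b, lambda b: norm.logpdf(b, 1.0, 0.5))
```

Shifting weight between a concentrated and a diffuse component trades off weight variance against robustness, which mirrors why the choice of $P_0$ affects both accuracy and speed in the simulations.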
In this simulation study, there are 18 (10) and 12 (7) non-convergence cases out of 100 trials for the PNLS-MLME and Laplacian methods, respectively, under sample size $N = 25$ (50). To ensure that we compare the estimates of the different methods on the same simulated data and initial values, a replacement dataset is generated whenever one of the methods fails to converge for a particular dataset. This can be done by using the R try() function to handle the error recovery. Table 4 reports the computing results, including the averages of CPU time (Time), numbers of iterations (Iter), converged log-likelihood values ($\ell_{\max}$), RB of the log-likelihood and empirical sums of the RMSE of the parameter estimates for each sample size and each algorithm. The results indicate that pseudo-ECM spent the least CPU time, followed by PNLS-MLME, Laplacian, ISEM with $P_0 = 0.1, 0.5$, MCEM with $M = 500, 1000, 2000$ and then ISEM with $P_0 = 0.9$. PNLS-MLME demands the fewest iterations, followed by Laplacian, pseudo-ECM, ISEM with $P_0 = 0.1, 0.5$, MCEM with $M = 2000, 1000, 500$ and then ISEM with $P_0 = 0.9$. The performance of the five methods under the bivariate nonlinear model is conceptually similar to that under the bivariate linear model in Section 4.1. It makes sense that the consumed CPU time increases with the MC sample size $M$ for MCEM, while the required number of iterations decreases with $M$. Moreover, for the ISEM method, when the proportion of importance samples of random effects drawn from the posterior of $b_i$ increases (i.e., as $P_0$ decreases), both the CPU time and the number of iterations decrease.
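The regenerate-on-failure bookkeeping, done with try() in R, can be mimicked as follows; `fit_or_fail` and its non-convergence trigger are stand-ins for the actual estimation routines, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_or_fail(data):
    """Stand-in for a fitting routine that may fail to converge; a real run
    would call one of the five estimation procedures here."""
    if np.std(data) < 0.5:              # illustrative non-convergence trigger
        raise RuntimeError("did not converge")
    return float(np.mean(data))

def fit_with_regeneration(generate, max_tries=50):
    """Mimic the R try() guard: if a fit fails on a simulated dataset,
    discard that dataset and regenerate a new one."""
    for _ in range(max_tries):
        data = generate()
        try:
            return fit_or_fail(data)
        except RuntimeError:
            continue                    # regenerate and refit
    raise RuntimeError("no convergent fit in max_tries attempts")

est = fit_with_regeneration(lambda: rng.normal(0.0, 1.0, 30))
```

This keeps all methods evaluated on the same set of convergent datasets, which is the point of the regeneration step in the text.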
It can be seen from the RB column of Table 4 that all methods except the three ISEM procedures provide comparable accuracy for the approximate observed log-likelihood values, while the ISEM method tends to incur a relatively large bias. Observing the empirical sums of RMSE, the PNLS-MLME and pseudo-ECM methods yield the most accurate parameter estimates for $N = 25$ and $N = 50$, respectively, while the others show minor differences in RMSE scores. The MCEM method generally offers better precision of the parameter estimates as the number of generated MC samples increases. Although MCEM spent much CPU time and required more iterations to achieve convergence, it can produce a relatively small bias in the approximation of the observed log-likelihood and a smaller RMSE for the parameter estimates, especially for the larger sample size $N = 50$ and MC sample size $M = 2000$. Additionally, among the three settings of $P_0$ for ISEM, the equal-weight case ($P_0 = 0.5$) gives smaller RB and RMSE scores. To obtain more accurate approximate log-likelihoods with the ISEM algorithm, a larger number of samples of random effects would probably be necessary, but this seems inefficient. As expected, when the sample size $N$ increases, the required CPU time and number of iterations increase, while the RB and RMSE decrease, confirming the large-sample properties of ML estimation. In addition, the RMSE ($\times 10^2$) for the estimates of each parameter under the nine considered estimating procedures are listed in Table 5. The estimators of $\beta_5$, $\beta_6$, $d_{11}$, $d_{21}$, $d_{22}$ and $\sigma_{21}$ appear somewhat less precise than those of the other parameters in this simulation setting. Observing the table, there are remarkable differences in the magnitudes of the RMSE values, as the precision of the parameter estimates depends heavily on the specification of the nonlinear mean functions.
Moreover, there are no consistent rankings of precision among the nine considered procedures for each parameter. Although this is a limited study, it demonstrates that all five approximation methods can give reasonable results for parameter estimation.
Table 4. Simulation results for nine estimating procedures under the bivariate nonlinear case.
| N | Method | Time | Iter | $\ell_{\max}$ | RB | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| 25 | PNLS-MLME | 5.071 | 3.533 | −847.968 | 0.009 | 1.671 |
| 25 | Laplacian | 21.199 | 7.133 | −860.383 | −0.012 | 2.000 |
| 25 | Pseudo-ECM | 2.709 | 12.000 | −847.994 | 0.009 | 1.967 |
| 25 | MCEM ($M=500$) | 9062.743 | 380.000 | −847.217 | 0.010 | 2.099 |
| 25 | MCEM ($M=1000$) | 9569.619 | 213.733 | −847.346 | 0.010 | 2.072 |
| 25 | MCEM ($M=2000$) | 11,375.297 | 131.400 | −847.896 | 0.009 | 2.029 |
| 25 | ISEM ($P_0=0.9$) | 17,008.449 | 333.733 | −887.996 | −0.028 | 1.999 |
| 25 | ISEM ($P_0=0.5$) | 4635.601 | 93.400 | −881.169 | −0.018 | 1.882 |
| 25 | ISEM ($P_0=0.1$) | 1086.651 | 22.200 | −862.842 | −0.020 | 2.077 |
| 50 | PNLS-MLME | 14.149 | 3.940 | −1710.123 | 0.007 | 1.119 |
| 50 | Laplacian | 53.066 | 7.690 | −1763.046 | −0.010 | 1.134 |
| 50 | Pseudo-ECM | 11.331 | 13.070 | −1710.216 | 0.007 | 1.110 |
| 50 | MCEM ($M=500$) | 15,860.866 | 392.595 | −1713.939 | 0.005 | 1.184 |
| 50 | MCEM ($M=1000$) | 24,077.335 | 238.470 | −1714.151 | 0.005 | 1.157 |
| 50 | MCEM ($M=2000$) | 26,328.930 | 134.750 | −1714.447 | 0.004 | 1.151 |
| 50 | ISEM ($P_0=0.9$) | 31,224.663 | 386.120 | −1789.168 | −0.021 | 1.255 |
| 50 | ISEM ($P_0=0.5$) | 7065.363 | 106.350 | −1780.396 | −0.015 | 1.138 |
| 50 | ISEM ($P_0=0.1$) | 2805.677 | 26.870 | −1779.298 | −0.018 | 1.153 |
Table 5. Relative mean squared errors ( × 10 2 ) for the estimates of model parameters under nine iterative procedures.
| N | Method | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\beta_5$ | $\beta_6$ | $d_{11}$ | $d_{21}$ | $d_{22}$ | $\sigma_{11}$ | $\sigma_{21}$ | $\sigma_{22}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 25 | PNLS-MLME | 0.046 | 0.960 | 0.046 | 0.041 | 18.505 | 11.559 | 21.698 | 87.774 | 4.565 | 0.998 | 20.003 | 0.909 |
| 25 | Laplacian | 0.045 | 0.965 | 0.046 | 0.033 | 18.600 | 11.766 | 20.875 | 120.340 | 4.609 | 1.412 | 20.013 | 1.325 |
| 25 | Pseudo-ECM | 0.043 | 0.964 | 0.046 | 0.026 | 18.501 | 11.570 | 20.066 | 118.786 | 4.759 | 0.988 | 20.010 | 0.909 |
| 25 | MCEM ($M=500$) | 0.046 | 0.956 | 0.045 | 0.039 | 18.736 | 11.781 | 20.926 | 130.477 | 4.549 | 1.389 | 19.682 | 1.299 |
| 25 | MCEM ($M=1000$) | 0.047 | 0.969 | 0.045 | 0.038 | 18.589 | 11.668 | 21.160 | 127.668 | 4.797 | 1.383 | 19.559 | 1.314 |
| 25 | MCEM ($M=2000$) | 0.047 | 0.970 | 0.046 | 0.036 | 18.602 | 11.669 | 20.402 | 123.817 | 4.606 | 1.404 | 19.935 | 1.315 |
| 25 | ISEM ($P_0=0.9$) | 0.046 | 0.960 | 0.046 | 0.028 | 18.590 | 11.666 | 20.510 | 120.740 | 4.609 | 1.400 | 20.013 | 1.315 |
| 25 | ISEM ($P_0=0.5$) | 0.047 | 0.969 | 0.046 | 0.040 | 18.420 | 11.476 | 19.930 | 110.919 | 4.000 | 1.463 | 19.619 | 1.271 |
| 25 | ISEM ($P_0=0.1$) | 0.043 | 0.993 | 0.045 | 0.021 | 18.785 | 11.899 | 26.451 | 122.587 | 4.407 | 1.631 | 19.474 | 1.377 |
| 50 | PNLS-MLME | 0.053 | 0.433 | 0.019 | 0.083 | 8.038 | 6.437 | 9.609 | 62.998 | 3.126 | 0.355 | 20.445 | 0.290 |
| 50 | Laplacian | 0.054 | 0.433 | 0.019 | 0.040 | 8.056 | 6.501 | 10.172 | 63.165 | 3.121 | 0.787 | 20.025 | 1.051 |
| 50 | Pseudo-ECM | 0.053 | 0.432 | 0.019 | 0.043 | 8.055 | 6.452 | 8.858 | 62.921 | 3.013 | 0.355 | 20.493 | 0.289 |
| 50 | MCEM ($M=500$) | 0.052 | 0.420 | 0.019 | 0.087 | 8.117 | 6.505 | 10.334 | 67.836 | 3.055 | 0.875 | 20.054 | 1.033 |
| 50 | MCEM ($M=1000$) | 0.054 | 0.420 | 0.019 | 0.085 | 8.149 | 6.508 | 10.099 | 65.263 | 3.113 | 0.881 | 20.003 | 1.063 |
| 50 | MCEM ($M=2000$) | 0.054 | 0.418 | 0.019 | 0.075 | 8.120 | 6.494 | 10.185 | 64.350 | 3.120 | 0.892 | 20.274 | 1.070 |
| 50 | ISEM ($P_0=0.9$) | 0.059 | 0.415 | 0.019 | 0.080 | 8.034 | 6.429 | 18.313 | 67.054 | 3.142 | 0.924 | 20.045 | 1.011 |
| 50 | ISEM ($P_0=0.5$) | 0.055 | 0.429 | 0.019 | 0.053 | 8.131 | 6.508 | 9.040 | 64.614 | 3.143 | 0.861 | 19.819 | 1.080 |
| 50 | ISEM ($P_0=0.1$) | 0.052 | 0.431 | 0.019 | 0.035 | 8.194 | 6.554 | 10.182 | 64.165 | 3.011 | 0.987 | 20.542 | 1.147 |

5. Discussion and Conclusions

In this article, we describe and compare five approximation methods for carrying out ML estimation of the MNLMM, as well as the evaluation of the observed log-likelihood function. The methods, namely the PNLS-MLME, Laplacian approximation, pseudo-ECM, MCEM and ISEM algorithms, rely on first- or second-order Taylor expansions. The PNLS-MLME and pseudo-ECM methods use a linearization of the nonlinear mean functions, while the other three methods rely on an approximation of the observed likelihood. Numerical results indicate that the five methods give comparable accuracy in the estimation of the model parameters, as well as in the approximation of the observed log-likelihood function of the MNLMM.
In summary, the five algorithmic schemes preserve flexibility and simplicity in carrying out ML estimation of the MNLMM. The pseudo-ECM method offers relatively better efficiency than the other four methods. For the PNLS-MLME and Laplacian methods, a poor initial guess of θ can result in poor estimates of $\{b_i\}_{i=1}^N$, and thereby the accuracy of the parameter estimates and the convergence behavior deteriorate. To overcome this weakness, we recommend trying different starting values for $\hat{D}^{(0)}$, specified as $c\hat{D}^{(0)}$, where $c$ is a random draw from the standard normal distribution and the original $\hat{D}^{(0)}$ is given in Section 2.7. The MCEM and ISEM methods appear to be less efficient, because both spend much time generating MC samples to evaluate the required conditional expectations in each iteration. For the implementation of the ISEM algorithm, the specification of the mixing proportion $P_0$ depends on the data at hand. We suggest trying a variety of settings and choosing the $P_0$ corresponding to the maximized approximate observed log-likelihood. An R package for fitting the MNLMM based on the proposed techniques will be released in the near future.
However, the multivariate normality assumption in the MNLMM might not provide robust inference if the data, even after transformation, exhibit fat tails and/or skewness [48,49,50]. To alleviate such limitations, it is natural to replace the multivariate normally-distributed random effects and within-subject errors of the MNLMM by a broader family, such as the multivariate skew-normal distribution [51], the multivariate skew-t distribution [52], the multivariate skew-elliptical distribution [53], or the multivariate skew-normal independent distribution [54,55]. The proposed methods are readily extendable to carry out ML estimation of the multivariate versions of skew-family nonlinear mixed models. This leads to valuable further research on developing multivariate skew-family nonlinear mixed models together with their ML inference.

Acknowledgments

The author would like to express her deepest gratitude to the Chief Editor, the Associate Editor and two anonymous reviewers for their insightful comments and suggestions that greatly improved this article. This work was partially supported by the Ministry of Science and Technology under Grant No. MOST 103-2118-M-035-001-MY2 of Taiwan.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

A. Score Vector and Hessian Matrix

The score vector $S_\alpha$, calculated as the first derivative of $\ell_{PD}(\theta|y)$ in Equation (9) with respect to each entry of α, can be expressed as:
$$[S_\alpha]_l = \frac{1}{2}\sum_{i=1}^N\left\{\big(\tilde{y}_i^{(h)}-\tilde{X}_i^{(h)}\beta\big)^T\tilde{\Lambda}_i^{(h)-1}\dot{\tilde{\Lambda}}_{il}^{(h)}\tilde{\Lambda}_i^{(h)-1}\big(\tilde{y}_i^{(h)}-\tilde{X}_i^{(h)}\beta\big) - \mathrm{tr}\big(\tilde{\Lambda}_i^{(h)-1}\dot{\tilde{\Lambda}}_{il}^{(h)}\big)\right\},$$
for $l = 1,\ldots,g$, with $g = q(q+1)/2 + r(r+1)/2 + \dim(\phi)$, where $\tilde{\Lambda}_i^{(h)} = \tilde{Z}_i^{(h)} D \tilde{Z}_i^{(h)T} + \Sigma \otimes C_i(\phi)$ and
$$\dot{\tilde{\Lambda}}_{il}^{(h)} = \frac{\partial\tilde{\Lambda}_i^{(h)}}{\partial w_l} = \begin{cases} \tilde{Z}_i^{(h)}\dfrac{\partial D}{\partial w_l}\tilde{Z}_i^{(h)T} & \text{if } w_l \in \mathrm{vech}(D),\\[4pt] \dfrac{\partial\Sigma}{\partial w_l}\otimes C_i(\phi) & \text{if } w_l \in \mathrm{vech}(\Sigma),\\[4pt] \Sigma\otimes\dfrac{\partial C_i(\phi)}{\partial w_l} & \text{if } w_l \in \phi. \end{cases}$$
Here, $\partial D/\partial w_l$ has ones in the $(j,l)$-th and $(l,j)$-th elements of $D$ when $w_l = d_{jl}$, a distinct element of $D$, and zeros elsewhere; $\partial\Sigma/\partial w_l$ is defined analogously when $w_l = \sigma_{jl}$. Besides, the Hessian matrix, calculated as the second derivative of $\ell_{PD}(\theta|y)$ with respect to each entry of α, is:
$$[\mathbf{H}_{\boldsymbol{\alpha}\boldsymbol{\alpha}}]_{lu}=\frac{1}{2}\sum_{i=1}^{N}\Big\{\operatorname{tr}\Big[\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\big(\dot{\tilde{\boldsymbol{\Lambda}}}_{iu}^{(h)}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}-\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}\big)\Big]+\operatorname{tr}\Big[\big(\tilde{\boldsymbol{y}}_i^{(h)}-\tilde{\boldsymbol{X}}_i^{(h)}\boldsymbol{\beta}\big)\big(\tilde{\boldsymbol{y}}_i^{(h)}-\tilde{\boldsymbol{X}}_i^{(h)}\boldsymbol{\beta}\big)^{\top}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\big(\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}-2\,\dot{\tilde{\boldsymbol{\Lambda}}}_{iu}^{(h)}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}\big)\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\Big]\Big\},$$
where:
$$\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}=\frac{\partial\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}}{\partial w_u}=\begin{cases}\dfrac{\partial\boldsymbol{\Sigma}}{\partial w_l}\otimes\dfrac{\partial\boldsymbol{C}_i(\boldsymbol{\phi})}{\partial w_u}&\text{if }w_l\in\operatorname{vech}(\boldsymbol{\Sigma})\text{ and }w_u\in\boldsymbol{\phi},\\[4pt]\boldsymbol{0}&\text{otherwise.}\end{cases}$$
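As a quick numerical sanity check on these expressions, the score and Hessian with respect to a single variance parameter $w$ can be compared against finite differences of the marginal Gaussian log-likelihood. The sketch below uses a simplified one-subject covariance $\Lambda(w)=\boldsymbol{Z}\boldsymbol{D}(w)\boldsymbol{Z}^{\top}+\boldsymbol{I}$ rather than the paper's full Kronecker error structure; all names and dimensions are illustrative, not taken from the paper.

```python
import numpy as np

# Toy setup: one subject with n observations, q random effects, and a
# single free variance parameter w (the (1,1) entry of D).
rng = np.random.default_rng(1)
n, q = 6, 2
Z = rng.standard_normal((n, q))
r = rng.standard_normal(n)          # residual y - X beta for one subject
Ddot = np.diag([1.0, 0.0])          # dD/dw: one in the varied entry, zeros elsewhere

def Lam(w):
    return Z @ np.diag([w, 0.5]) @ Z.T + np.eye(n)

def loglik(w):
    # Gaussian log-likelihood up to an additive constant
    _, logdet = np.linalg.slogdet(Lam(w))
    return -0.5 * (logdet + r @ np.linalg.solve(Lam(w), r))

def score(w):
    # [S]_l = (1/2) [ r' L^{-1} Ldot L^{-1} r - tr(L^{-1} Ldot) ]
    L, Ldot = Lam(w), Z @ Ddot @ Z.T
    Lr = np.linalg.solve(L, r)
    return 0.5 * (Lr @ Ldot @ Lr - np.trace(np.linalg.solve(L, Ldot)))

def hessian(w):
    # [H]_lu with l = u; the second-derivative term Lddot vanishes here
    # because Lambda is linear in w.
    L, Ldot = Lam(w), Z @ Ddot @ Z.T
    A = np.linalg.solve(L, Ldot)    # L^{-1} Ldot
    Lr = np.linalg.solve(L, r)
    return 0.5 * np.trace(A @ A) - Lr @ Ldot @ np.linalg.solve(L, Ldot @ Lr)

w0, eps = 1.3, 1e-6
fd_score = (loglik(w0 + eps) - loglik(w0 - eps)) / (2 * eps)
fd_hess = (score(w0 + eps) - score(w0 - eps)) / (2 * eps)
print(abs(score(w0) - fd_score), abs(hessian(w0) - fd_hess))
```

Both analytic quantities should agree with their central-difference counterparts to within finite-difference error, which provides an easy regression test when implementing the full multi-parameter versions.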

References

  1. Shah, A.; Laird, N.; Schoenfeld, D. A Random-Effects Model for Multiple Characteristics with Possibly Missing Data. J. Am. Stat. Assoc. 1997, 92, 775–779. [Google Scholar] [CrossRef]
  2. Marshall, G.; de la Cruz-Mesía, R.; Barón, A.E.; Rutledge, J.H.; Zerbe, G.O. Non-linear Random Effects Model for Multivariate Responses with Missing Data. Statist. Med. 2006, 25, 2817–2830. [Google Scholar] [CrossRef] [PubMed]
  3. Sammel, M.; Lin, X.; Ryan, L. Multivariate Linear Mixed Models for Multiple Outcomes. Statist. Med. 1999, 18, 2479–2492. [Google Scholar] [CrossRef]
  4. Song, X.; Davidian, M.; Tsiatis, A.A. An Estimator for the Proportional Hazards Model with Multiple Longitudinal Covariates Measured with Error. Biostatistics 2002, 3, 511–528. [Google Scholar] [CrossRef] [PubMed]
  5. Roy, J.; Lin, X. Analysis of Multivariate Longitudinal Outcomes with Nonignorable Dropouts and Missing Covariates: Changes in Methadone Treatment Practices. J. Am. Stat. Assoc. 2002, 97, 40–52. [Google Scholar] [CrossRef]
  6. Roy, A. Estimating Correlation Coefficient between Two Variables with Repeated Observations Using Mixed Effects Model. Biom. J. 2006, 48, 286–301. [Google Scholar] [CrossRef] [PubMed]
  7. Wang, W.L.; Fan, T.H. ECM-Based Maximum Likelihood Inference for Multivariate Linear Mixed Models with Autoregressive Errors. Comput. Stat. Data Anal. 2010, 54, 1328–1341. [Google Scholar] [CrossRef]
  8. Lindstrom, M.J.; Bates, D.M. Nonlinear Mixed Effects Models for Repeated Measures Data. Biometrics 1990, 46, 673–687. [Google Scholar] [CrossRef] [PubMed]
  9. Davidian, M.; Giltinan, D.M. Nonlinear Models for Repeated Measurements Data; Chapman & Hall: London, UK, 1995. [Google Scholar]
  10. Pinheiro, J.C.; Bates, D.M. Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model. J. Comput. Graph. Stat. 1995, 4, 12–35. [Google Scholar]
  11. Pinheiro, J.C.; Bates, D.M. Mixed-Effects Models in S and S-PLUS; Springer: Berlin, Germany, 2000. [Google Scholar]
  12. Pinheiro, J.; Bates, D.; DebRoy, S.; Sarkar, D.; R Core Team. nlme: Linear and Nonlinear Mixed Effects Models, R package version 3.1-104; Available online: http://CRAN.R-project.org/package=nlme (accessed on 24 July 2015).
  13. Dey, D.K.; Chen, M.H.; Chang, H. Bayesian Approach for Nonlinear Random Effects Models. Biometrics 1997, 53, 1239–1252. [Google Scholar] [CrossRef]
  14. Huang, Y.; Liu, D.; Wu, H. Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System. Biometrics 2006, 62, 413–423. [Google Scholar] [CrossRef] [PubMed]
  15. Lachos, V.H.; Castro, L.M.; Dey, D.K. Bayesian Inference in Nonlinear Mixed-Effects Models Using Normal Independent Distributions. Comput. Stat. Data Anal. 2013, 64, 237–252. [Google Scholar] [CrossRef]
  16. Wolfinger, R.D.; Lin, X. Two Taylor-Series Approximation Methods for Nonlinear Mixed Models. Comput. Stat. Data Anal. 1997, 25, 465–490. [Google Scholar] [CrossRef]
  17. Ge, Z.; Bickel, J.P.; Rice, A.J. An Approximate Likelihood Approach to Nonlinear Mixed Effects Models via Spline Approximation. Comput. Stat. Data Anal. 2004, 46, 747–776. [Google Scholar] [CrossRef]
  18. Walker, S.G. An EM Algorithm for Nonlinear Random Effects Models. Biometrics 1996, 52, 934–944. [Google Scholar] [CrossRef]
  19. Wang, J. EM Algorithms for Nonlinear Mixed Effects Models. Comput. Stat. Data Anal. 2007, 51, 3244–3256. [Google Scholar] [CrossRef]
  20. Vonesh, E.F.; Wang, H.; Nie, L.; Majumdar, D. Conditional Second-order Generalized Estimating Equations for Generalized Linear and Nonlinear Mixed-Effects Models. J. Am. Stat. Assoc. 2002, 97, 271–283. [Google Scholar] [CrossRef]
  21. Vonesh, E.F. Non-linear Models for the Analysis of Longitudinal Data. Stat. Med. 1992, 11, 1929–1954. [Google Scholar] [CrossRef] [PubMed]
  22. Beal, S.; Sheiner, L. The NONMEM System. Am. Stat. 1980, 34, 118–119. [Google Scholar] [CrossRef]
  23. Wolfinger, R.D. Comment: Experiences with the SAS Macro NLINMIX. Stat. Med. 1997, 16, 1258–1259. [Google Scholar]
  24. Wolfinger, R.D. Fitting Nonlinear Mixed Models with the New NLMIXED Procedure. In Proceedings of the 99 Joint Statistical Meetings, Miami Beach, FL, USA, 11–14 April 1999.
  25. Kuhn, E.; Lavielle, M. Maximum Likelihood Estimation in Nonlinear Mixed Effects Models. Comput. Stat. Data Anal. 2005, 49, 1020–1038. [Google Scholar] [CrossRef]
  26. Lavielle, M. MONOLIX (MOdelès NOn LInéaires à effets miXtes); MONOLIX Group: Orsay, France, 2008. [Google Scholar]
  27. Beal, S.; Sheiner, L.; Boeckmann, A.; Bauer, R. NONMEM User's Guides (1989–2009); Icon Development Solutions: Ellicott City, MD, USA, 2009. [Google Scholar]
  28. Comets, E.; Lavenu, A.; Lavielle, M. Saemix: Stochastic Approximation Expectation Maximization (SAEM) Algorithm. R package version 1. 2011. [Google Scholar]
  29. Wang, W.L.; Fan, T.H. Estimation in Multivariate t Linear Mixed Models for Multiple Longitudinal Data. Statist. Sinica 2011, 21, 1857–1880. [Google Scholar] [CrossRef]
  30. Wang, W.L.; Fan, T.H. Bayesian Analysis of Multivariate t Linear Mixed Models Using a Combination of IBF and Gibbs Samplers. J. Multivar. Anal. 2012, 105, 300–310. [Google Scholar] [CrossRef]
  31. Wang, W.L. Multivariate t Linear Mixed Models for Irregularly Observed Multiple Repeated Measures with Missing Outcomes. Biom. J. 2013, 55, 554–571. [Google Scholar] [CrossRef] [PubMed]
  32. Tierney, L.; Kadane, J.B. Accurate Approximations for Posterior Moments and Densities. J. Am. Stat. Assoc. 1986, 81, 82–86. [Google Scholar] [CrossRef]
  33. Meng, X.L.; Rubin, D.B. Maximum Likelihood Estimation via the ECM Algorithm: A General Framework. Biometrika 1993, 80, 267–278. [Google Scholar] [CrossRef]
  34. Booth, G.J.; Hobert, P.J. Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm. J. R. Stat. Soc. Ser. B 1999, 61, 265–285. [Google Scholar] [CrossRef]
  35. Lai, T.L.; Shih, M.C. A Hybrid Estimator in Nonlinear and Generalized Linear Mixed Effects Models. Biometrika 2006, 90, 791–795. [Google Scholar]
  36. Leonard, T.; Hsu, J.S.J.; Tsui, K.W. Bayesian Marginal Inference. J. Am. Stat. Assoc. 1989, 84, 1051–1058. [Google Scholar] [CrossRef]
  37. Bates, D.M.; Watts, D.G. Relative Curvature Measures of Nonlinearity. J. R. Stat. Soc. Ser. B 1980, 42, 1–25. [Google Scholar]
  38. R Development Core Team. R. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012. [Google Scholar]
  39. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm (with Discussion). J. R. Stat. Soc. Ser. B 1977, 39, 1–38. [Google Scholar]
  40. Wei, G.C.G.; Tanner, M.A. A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms. J. Am. Stat. Assoc. 1990, 85, 699–704. [Google Scholar] [CrossRef]
  41. Hastings, W.K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109. [Google Scholar] [CrossRef]
  42. Lederman, M.M.; Connick, E.; Landay, A.; Kuritzkes, D.R.; Spritzler, J.; Clair, M.S.; Kotzin, B.L.; Fox, L.; Chiozzi, M.H.; Leonard, J.M.; et al. Immunologic Responses Associated with 12 Weeks of Combination Antiretroviral Therapy Consisting of Zidovudine, Lamivudine, and Ritonavir: Results of AIDS Clinical Trials Group Protocol 315. J. Infect. Dis. 1998, 178, 70–79. [Google Scholar] [CrossRef] [PubMed]
  43. Connick, E.; Lederman, M.M.; Kotzin, B.L.; Spritzler, J.; Kuritzkes, D.R.; Clair, M.S.; Sevin, A.D.; Fox, L.; Chiozzi, M.H.; Leonard, J.M.; et al. Immune Reconstitution in the First Year of Potent Antiretroviral Therapy and Its Relationship to Virologic Response. J. Infect. Dis. 2000, 181, 358–363. [Google Scholar] [CrossRef] [PubMed]
  44. Wu, H.; Ding, A. Population HIV-1 Dynamics in Vivo: Applicable Models and Inferential Tools for Virological Data from AIDS Clinical Trials. Biometrics 1999, 55, 410–418. [Google Scholar] [CrossRef] [PubMed]
  45. Liang, H.; Wu, H.; Carroll, R.J. The Relationship between Virologic Responses in AIDS Clinical Research Using Mixed-Effects Varying-Coefficient Models with Measurement Error. Biostatistics 2003, 4, 297–312. [Google Scholar] [CrossRef] [PubMed]
  46. Wu, H.; Liang, H. Backfitting Random Varying-Coefficient Models with Time-dependent Smoothing Covariates. Scand. J. Stat. 2004, 31, 3–19. [Google Scholar] [CrossRef]
  47. Lin, T.I.; Wang, W.L. Multivariate Skew-Normal Linear Mixed Models for Multi-outcome Longitudinal Data. Stat. Model. 2013, 13, 199–221. [Google Scholar] [CrossRef]
  48. Lin, T.I.; Lee, J.C. A Robust Approach to t Linear Mixed Models Applied to Multiple Sclerosis Data. Statist. Med. 2006, 25, 1397–1412. [Google Scholar] [CrossRef] [PubMed]
  49. Lin, T.I.; Lee, J.C. Bayesian Analysis of Hierarchical Linear Mixed Modeling Using Multivariate t Distributions. J. Statist. Plan. Inf. 2007, 137, 484–495. [Google Scholar] [CrossRef]
  50. Lin, T.I.; Lee, J.C. Estimation and Prediction in Linear Mixed Models with Skew Normal Random Effects for Longitudinal Data. Statist. Med. 2008, 27, 1490–1507. [Google Scholar] [CrossRef] [PubMed]
  51. Arellano-Valle, R.B.; Genton, M. On Fundamental Skew Distributions. J. Multivar. Anal. 2005, 96, 93–116. [Google Scholar] [CrossRef]
  52. Azzalini, A.; Capitanio, A. Distributions Generated by Perturbation of Symmetry with Emphasis on a Multivariate Skew t-Distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389. [Google Scholar] [CrossRef]
  53. Branco, M.; Dey, D. A General Class of Multivariate Skew-Elliptical Distribution. J. Multivar. Anal. 2001, 79, 93–113. [Google Scholar] [CrossRef]
  54. Bandyopadhyay, D.; Lachos, V.H.; Abanto-Valle, C.A.; Ghosh, P. Linear Mixed Models for Skew-Normal/Independent Bivariate Responses with an Application to Periodontal Disease. Statist. Med. 2010, 29, 2643–2655. [Google Scholar] [CrossRef] [PubMed]
  55. Bandyopadhyay, D.; Castro, L.M.; Lachos, V.H.; Pinheiro, H.P. Robust Joint Non-linear Mixed-Effects Models and Diagnostics for Censored HIV Viral Loads with CD4 Measurement Error. J. Agr. Biol. Environ. Stat. 2015, 20, 121–139. [Google Scholar] [CrossRef]

Share and Cite

MDPI and ACS Style

Wang, W.-L. Approximate Methods for Maximum Likelihood Estimation of Multivariate Nonlinear Mixed-Effects Models. Entropy 2015, 17, 5353-5381. https://doi.org/10.3390/e17085353

