Article

Approximate Methods for Maximum Likelihood Estimation of Multivariate Nonlinear Mixed-Effects Models

Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung 40724, Taiwan
Entropy 2015, 17(8), 5353-5381; https://doi.org/10.3390/e17085353
Submission received: 21 April 2015 / Revised: 17 July 2015 / Accepted: 21 July 2015 / Published: 29 July 2015
(This article belongs to the Special Issue Inductive Statistical Methods)

Abstract
Multivariate nonlinear mixed-effects models (MNLMM) have received increasing attention owing to their flexibility for analyzing multi-outcome longitudinal data that follow possibly nonlinear profiles. This paper presents and compares five iterative algorithms for maximum likelihood estimation of the MNLMM: the penalized nonlinear least squares coupled to the multivariate linear mixed-effects (PNLS-MLME) procedure, Laplacian approximation, the pseudo-data expectation conditional maximization (ECM) algorithm, the Monte Carlo EM algorithm and the importance sampling EM algorithm. When fitting the MNLMM, the observed log-likelihood function is difficult to evaluate exactly in closed form because it involves complicated multiple integrals. To address this issue, the corresponding approximations of the observed log-likelihood function under the five algorithms are presented. An expected information matrix of the parameters is also provided for calculating the standard errors of the model parameters. The computational performance of the five methods is compared through simulation and a real data example from an AIDS clinical study.

1. Introduction

Analysis of multi-outcome longitudinal data with various features has attracted considerable interest in clinical trials, biological psychology, environmental science and medical research, to name a few. The methodology of multivariate linear mixed-effects models (MLMM) [1] and multivariate nonlinear mixed-effects models (MNLMM) [2] has been developed for related work. A comprehensive study of the MLMM along with its applications can be found in [3,4,5,6,7], among others. Nonlinear models for repeated-measures data rest on more complicated mathematical derivations and heavier computational requirements than linear models, but they can offer flexibility in capturing a broader range of data patterns. Several approaches to carrying out maximum likelihood (ML) estimation of nonlinear mixed-effects models (NLMM) for single-outcome longitudinal data have been studied; see, for example, [8,9,10,11,12]. Bayesian inference in NLMM via Markov chain Monte Carlo (MCMC) procedures can be found, for instance, in [13,14,15]. Although the use of the NLMM, as well as its extensions in other families of distributions have been pretty well established in the literature, to the best of our knowledge, exploration of the inference on MNLMM is relatively rare so far. Analyzing each response variable of the data by fitting the NLMM separately might be inappropriate and fail to take account of the between-variable association, as well as its evolution.
For the general NLMM, the linearization method [8,16], which exploits a first-order Taylor expansion to approximate the nonlinear function by a linear pseudo-data model, is by far the most widely-used approach due to its numerical simplicity. Despite its popularity, [17] argued that the linearization method may produce substantial bias in parameter estimation when the number of observations per subject is small and the variability of the random effects is large at the same time. Although computationally much simpler, the Laplace approximation method [10] can also lead to considerably-biased parameter estimates, depending on the quality of the mode. As an alternative to the pseudo-data and Laplace approximation approaches, the integral approximation methods that use Monte Carlo integration [18] or importance sampling [19] to approximate the observed likelihood may provide more accurate estimates than the linearization method. However, the numerical integration methods are generally inefficient to implement and become computationally prohibitive when the dimension of the random effects increases [20]. Over the past few decades, several estimation algorithms for the NLMM have been developed and implemented in different software. For example, the linearization methods using the first-order Taylor expansion [21] or the first-order conditional estimation (FOCE) [8,16] are embedded in the R function nlme, while the Laplace approximation method is implemented in NONMEM [22] and the SAS macro NLINMIX [23]. The SAS procedure NLMIXED, which incorporates adaptive Gaussian quadrature, has shown considerable improvement [24]. Another improved procedure, based on the stochastic approximation expectation maximization [25], was implemented in MONOLIX [26], NONMEM [27] and the R package saemix [28].
Multivariate nonlinear mixed-effects models can be fitted using ad hoc manipulation by expanding the design matrix with extra columns of dummy covariates flagging each element of the original multivariate responses.
Consider the multiple repeated measures $\{(Y_i, X_i),\ i = 1, \ldots, N\}$, where $Y_i$ is an $s_i \times r$ response matrix composed of $r$ response vectors $y_{ij} = (y_{ij,1}, \ldots, y_{ij,s_i})^T$, $j = 1, \ldots, r$, and $X_i$ is the covariate matrix for the $i$-th subject. Let $E_i = [e_{i1} : e_{i2} : \cdots : e_{ir}]$ be the $s_i \times r$ matrix of within-subject errors associated with $Y_i$, where $e_{ij} = (e_{ij,1}, \ldots, e_{ij,s_i})^T$. Let $y_i = \mathrm{vec}(Y_i)$ and $e_i = \mathrm{vec}(E_i)$ denote the stacked $s_i r \times 1$ vectors of all responses and within-subject errors, respectively.
In general, the MNLMM takes the form of:
$$y_i = \mu_i(\eta_i, X_i) + e_i, \quad i = 1, \ldots, N,$$
where $\mu_i = \mu_i(\eta_i, X_i)$ is a nonlinear differentiable function of a subject-specific parameter vector $\eta_i$ governing the within-profile behavior, and $e_i$ is a vector of normally-distributed error components. Moreover, the fixed effects $\beta$ and the random effects $b_i$ can be incorporated into the model by letting:
$$\eta_i = A_i \beta + B_i b_i,$$
where $A_i$ and $B_i$ are design matrices of size $s \times p$ and $s \times q$, respectively. We assume that $b_i$ follows a multivariate normal distribution with mean vector $0$ and $q \times q$ variance-covariance matrix $D$, denoted by $b_i \sim N_q(0, D)$, and independent of $e_i \sim N_{s_i r}(0, R_i)$. The joint distributions of $(b_i^T, e_i^T)^T$ for distinct subjects are independent. To reduce the number of parameters in $R_i$, we assume that the $k$-th row of $E_i$, say $e_{i \cdot k}$, follows $N_r(0, \Sigma)$, and the $j$-th column of $E_i$, say $e_{ij \cdot}$, follows $N_{s_i}(0, C_i)$, such that $R_i = \Sigma \otimes C_i$. This specification implies that within-subject errors for all responses measured at the same occasion have variance-covariance $\Sigma$. To capture the extra autocorrelation of a given response among irregularly-observed occasions, some parsimonious dependence structures can be imposed on $C_i$, such as compound symmetry, the $p$-order autoregressive model [29,30] and the damped exponential correlation [31]. For simplicity, we write $C_i = C_i(\phi)$, which depends on subject $i$ through its dimension $s_i$, with each entry being a function of a small set of parameters $\phi$ describing within-subject autocorrelation.
Let $\theta = (\beta, D, \Sigma, \phi)$ denote the entire set of model parameters. According to Model Equation (1) with Assumption Equation (2), the marginal density of $y_i$ is:
$$f(y_i \mid \theta) = \int \phi_{s_i r}(y_i \mid \mu_i, R_i)\, \phi_q(b_i \mid 0, D)\, d b_i,$$
where ϕ d ( · | μ , Ω ) denotes the probability density function (pdf) of a d-variate normal distribution with mean vector μ and variance-covariance matrix Ω. Typically, this integral cannot yield a closed-form expression when the vector-valued function μ i = μ i ( η i , X i ) is nonlinear in random effects b i . Thus, the log-likelihood function of θ for y = { y 1 , ... , y N } is given by:
$$\ell(\theta \mid y) = \sum_{i=1}^{N} \log \left\{ \int (2\pi)^{-(s_i r + q)/2} |\Sigma|^{-s_i/2} |C_i|^{-r/2} |D|^{-1/2} \exp\left[ -\tfrac{1}{2} \left( (y_i - \mu_i)^T R_i^{-1} (y_i - \mu_i) + b_i^T D^{-1} b_i \right) \right] d b_i \right\}.$$
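Because the integral above has no closed form, every method in this paper amounts to some approximation of it. As a point of reference, a plain Monte Carlo version of one subject's contribution can be sketched in a few lines of Python (numpy); the function names and the log-mean-exp stabilization are illustrative choices, not the authors' implementation:

```python
import numpy as np

def mc_marginal_loglik(y, mu_fn, Sigma, C, D, M=5000, seed=0):
    """Plain Monte Carlo approximation of one subject's marginal
    log-likelihood: average the conditional normal density
    phi(y | mu(b), R) over draws b ~ N_q(0, D), with R = Sigma (x) C.
    `mu_fn` maps a random-effects vector b to the stacked mean vector;
    all names here are illustrative."""
    rng = np.random.default_rng(seed)
    R = np.kron(Sigma, C)                    # R_i = Sigma (x) C_i
    Rinv = np.linalg.inv(R)
    _, logdetR = np.linalg.slogdet(R)
    n, q = R.shape[0], D.shape[0]
    draws = rng.multivariate_normal(np.zeros(q), D, size=M)
    resid = np.array([y - mu_fn(b) for b in draws])          # M x n residuals
    quad = np.einsum("mi,ij,mj->m", resid, Rinv, resid)
    logphi = -0.5 * (n * np.log(2 * np.pi) + logdetR + quad)
    m = logphi.max()                         # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(logphi - m)))
```

For a mean function that is linear in $b_i$ the marginal is exactly normal, which gives a convenient way to check the approximation.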
The purpose of this article is to consider five different methods for carrying out ML estimation of the MNLMM described in Equation (1) along with Equation (2) and for approximating the observed log-likelihood Function Equation (4). The methods include the penalized nonlinear least squares coupled to multivariate linear mixed effects (PNLS-MLME) approximation [8], Laplacian approximation [32], a pseudo-data version of the expectation conditional maximization (ECM) algorithm [33], the Monte Carlo EM (MCEM) algorithm [34] and the importance sampling EM (ISEM) algorithm [35]. The approximation to the observed log-likelihood is based on the standard Taylor expansion and is easy to calculate within the algorithms. A simple way of computing standard errors of parameters via the information-based method is provided.
The article is organized as follows. In Section 2, we describe the five computational procedures for ML estimation of the MNLMM together with the calculation of standard errors of parameters. In Section 3, the proposed methodology is illustrated with the analysis of HIV-AIDS data. Section 4 presents a comparison of the five approximation methods through simulation studies. We summarize and discuss implications in Section 5. The technical derivations are collected in the Appendix.

2. Five Approximate ML Procedures

From Model Equation (1), the j-th column (outcome) of Y i , say y i j = ( y i j , 1 , ... , y i j , s i ) T , can be formulated as:
y i j = μ i j ( η i , x i j ) + e i j ,
where μ i j ( η i , x i j ) = ( μ j ( η i , x i j , 1 ) , ... , μ j ( η i , x i j , s i ) ) T and e i j = ( e i j , 1 , ... , e i j , s i ) T . Analogously, the model for the k-th row (occasion) can be expressed as:
y i , k = μ i k ( η i , x i k ) + e i , k ,
where y i , k = ( y i 1 , k , ... , y i r , k ) T , μ i k ( η i , x i k ) = ( μ 1 ( η i , x i 1 , k ) , ... , μ r ( η i , x i r , k ) ) T and e i , k = ( e i 1 , k , ... , e i r , k ) T . We present five algorithms for employing ML estimation of Model Equation (1). The approximation to the observed log-likelihood Function Equation (4) and the calculation of standard errors of parameters are discussed, as well.

2.1. PNLS-MLME Procedure

Following the linear mixed-effects (LME) approximation method suggested by [8], the first procedure consists of two steps: a penalized nonlinear least squares (PNLS) step and a multivariate LME (MLME) step. The basic idea behind this procedure is that we estimate the unobservable random effects b i via the PNLS step and then update the ML estimates of parameters θ based on the formulation of MLMM for the pseudo-data. Specifically, the proposed PNLS-MLME procedure is sketched below.
In the PNLS step, first define:
$$g(y_i, b_i, \theta) = (y_i - \mu_i(\beta, b_i))^T (\Sigma \otimes C_i)^{-1} (y_i - \mu_i(\beta, b_i)) + b_i^T D^{-1} b_i,$$
where $\mu_i(\beta, b_i) = \mu_i(\eta_i, X_i)$, for $i = 1, 2, \ldots, N$, is a function of the fixed effects $\beta$ and random effects $b_i$. Fixing the current estimates of parameters $\hat{\theta}^{(h)} = (\hat{\beta}^{(h)}, \hat{D}^{(h)}, \hat{\Sigma}^{(h)}, \hat{\phi}^{(h)})$, the conditional modes of the random effects $b_i$ are obtained by minimizing a penalized nonlinear least-squares objective function:
$$\{\hat{b}_i^{(h)}\}_{i=1}^{N} = \arg\min_{b_1, \ldots, b_N} \sum_{i=1}^{N} g(y_i, b_i, \hat{\theta}^{(h)}).$$
The joint distributions ( b i T , e i T ) T for distinct subjects are independent, and thus, all y i are independent of each other. In practice, solving over b ^ i ( h ) for each subject can be implemented by minimizing g ( y i , b i , θ ^ ( h ) ) with respect to q-dimensional random effects of one subject at a time, rather than finding the solutions with respect to those of all subjects simultaneously.
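For a concrete sense of the PNLS step, the subject-by-subject minimization can be sketched as follows; `mu_fn`, its signature, and the use of the BFGS solver are illustrative assumptions rather than the authors' code:

```python
import numpy as np
from scipy.optimize import minimize

def pnls_step(y_list, mu_fn, beta, Sigma, C_list, D):
    """One PNLS step: since subjects are independent, the penalized
    objective g of Equation (5) is minimized over each subject's
    random effects b_i separately.  `mu_fn(beta, b, i)` maps (fixed
    effects, random effects, subject index) to the stacked mean
    vector; all names here are illustrative."""
    Dinv = np.linalg.inv(D)
    q = D.shape[0]
    b_hat = []
    for i, (y, C) in enumerate(zip(y_list, C_list)):
        Rinv = np.linalg.inv(np.kron(Sigma, C))
        def g(b, y=y, Rinv=Rinv, i=i):
            r = y - mu_fn(beta, b, i)
            return r @ Rinv @ r + b @ Dinv @ b   # penalized least squares
        b_hat.append(minimize(g, np.zeros(q), method="BFGS").x)
    return b_hat
```

When the mean is linear in $b_i$, this reduces to a ridge-type solution, which makes the sketch easy to verify.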
In the MLME step, which allows updating the parameter estimates, we utilize the first-order Taylor expansion of Model Equation (1) around the current estimates η ^ i ( h ) = A i β ^ ( h ) + B i b ^ i ( h ) , that is,
$$y_{ij,k} - \mu_j(\hat{\eta}_i^{(h)}, x_{ij,k}) + \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T \hat{\eta}_i^{(h)} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T \eta_i + e_{ij,k},$$
where $\dot{\mu}_j$, $j = 1, \ldots, r$, denotes the vector of first partial derivatives of $\mu_j$ with respect to $\eta_i$, and $\beta$ and $b_i$ are replaced by $\hat{\beta}^{(h)}$ and $\{\hat{b}_i^{(h)}\}_{i=1}^{N}$, respectively. Denote the pseudo-data by:
$$\tilde{y}_{ij,k} = y_{ij,k} - \mu_j(\hat{\eta}_i^{(h)}, x_{ij,k}) + \tilde{x}_{ijk} \hat{\beta}^{(h)} + \tilde{z}_{ijk} \hat{b}_i^{(h)},$$
where $\tilde{x}_{ijk} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T A_i$ and $\tilde{z}_{ijk} = \dot{\mu}_j(\hat{\eta}_i^{(h)}, x_{ij,k})^T B_i$. Consequently, Model Equation (1) can be rewritten as:
$$\tilde{y}_{ij,k} = \tilde{x}_{ijk} \beta + \tilde{z}_{ijk} b_i + e_{ij,k}.$$
The model for the super vector of the pseudo-data for the i-th subject is:
$$\tilde{y}_i = \tilde{X}_i \beta + \tilde{Z}_i b_i + e_i,$$
where $\tilde{y}_i$ is an $s_i r \times 1$ vector composed of $r$ pseudo-response vectors $\tilde{y}_{ij} = (\tilde{y}_{ij,1}, \ldots, \tilde{y}_{ij,s_i})^T$, $\tilde{X}_i$ is an $s_i r \times p$ matrix whose rows are the vectors $\tilde{x}_{ijk}$, and $\tilde{Z}_i$ is an $s_i r \times q$ matrix whose rows are the vectors $\tilde{z}_{ijk}$. Obviously, Model Equation (8) for the pseudo-data has an LME representation, so the estimation procedure becomes much simpler.
Therefore, the log-likelihood function of θ according to Model Equation (8) can be approximated by:
$$\ell_{\mathrm{PD}}(\theta \mid y) \approx -\frac{1}{2} \sum_{i=1}^{N} \left\{ s_i r \log(2\pi) + \log \left| \tilde{Z}_i D \tilde{Z}_i^T + \Sigma \otimes C_i \right| + (\tilde{y}_i - \tilde{X}_i \beta)^T \left( \tilde{Z}_i D \tilde{Z}_i^T + \Sigma \otimes C_i \right)^{-1} (\tilde{y}_i - \tilde{X}_i \beta) \right\}.$$
In the MLME step, we update β ^ ( h ) by a generalized least-squares approach, which yields:
$$\hat{\beta}^{(h+1)} = \left[ \sum_{i=1}^{N} \tilde{X}_i^T \left( \tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{\Sigma}^{(h)} \otimes \hat{C}_i^{(h)} \right)^{-1} \tilde{X}_i \right]^{-1} \sum_{i=1}^{N} \tilde{X}_i^T \left( \tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{\Sigma}^{(h)} \otimes \hat{C}_i^{(h)} \right)^{-1} \tilde{y}_i.$$
Denote the half-vectorization operator by $\mathrm{vech}(\cdot)$, which represents a column vector obtained by vectorizing only the lower triangular entries of a symmetric matrix. Given the current estimate $\hat{\beta}^{(h+1)}$, we update $\hat{\alpha}^{(h)} = (\mathrm{vech}(\hat{D}^{(h)}), \mathrm{vech}(\hat{\Sigma}^{(h)}), \hat{\phi}^{(h)})$ by the Newton–Raphson method:
$$\hat{\alpha}^{(h+1)} = \hat{\alpha}^{(h)} - \left[ \hat{H}_{\alpha\alpha}^{(h+1/2)} \right]^{-1} \hat{S}_{\alpha}^{(h+1/2)},$$
where $\hat{S}_{\alpha}^{(h+1/2)}$ and $\hat{H}_{\alpha\alpha}^{(h+1/2)}$ are the score vector $S_{\alpha}$ and Hessian matrix $H_{\alpha\alpha}$ evaluated at $\beta = \hat{\beta}^{(h+1)}$ and $\alpha = \hat{\alpha}^{(h)}$. Explicit expressions for the elements of $S_{\alpha}$ and $H_{\alpha\alpha}$ are given in the Appendix.
Iterations of Equations (6), (10) and (11) continue until either the maximum number of iterations or the user-specified convergence tolerance has been achieved.
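The GLS update of Equation (10) translates almost verbatim into matrix code. A minimal sketch, assuming the pseudo-responses and pseudo-design matrices have already been formed:

```python
import numpy as np

def gls_beta_update(y_list, X_list, Z_list, D, Sigma, C_list):
    """Generalized least-squares update of the fixed effects in the MLME
    step: beta = (sum_i X_i' V_i^-1 X_i)^-1 sum_i X_i' V_i^-1 y_i with
    marginal covariance V_i = Z_i D Z_i' + Sigma (x) C_i, written for
    the pseudo-data.  Illustrative sketch, not the authors' code."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    c = np.zeros(p)
    for y, X, Z, C in zip(y_list, X_list, Z_list, C_list):
        Vinv = np.linalg.inv(Z @ D @ Z.T + np.kron(Sigma, C))
        A += X.T @ Vinv @ X
        c += X.T @ Vinv @ y
    return np.linalg.solve(A, c)
```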

2.2. Laplacian Procedure

From Function Equation (3) and Definition Equation (5), we have the joint density of $(y_i, b_i)$, denoted by $f(y_i, b_i \mid \theta) = \phi_{s_i r}(y_i \mid \mu_i, R_i)\, \phi_q(b_i \mid 0, D)$, and the marginal density of $y_i$, given by:
$$f(y_i \mid \theta) = (2\pi)^{-(s_i r + q)/2} |R_i|^{-1/2} |D|^{-1/2} \int \exp\left\{ -\tfrac{1}{2} g(y_i, b_i, \theta) \right\} d b_i.$$
Laplacian approximation [32,36] is an alternative technique to estimate the marginal densities or posterior predictive densities, which involve integrating out all non-target variables. We next discuss how to adopt the Laplacian approximation to evaluate Equation (12) and develop the corresponding estimation algorithm.
Set an initial guess of random effects b i to be:
$$\hat{b}_i = \hat{b}_i(y_i, \theta) = \arg\max_{b_i} f(y_i, b_i \mid \theta) = \arg\min_{b_i} g(y_i, b_i, \theta).$$
Consider the second-order Taylor expansion of g ( y i , b i , θ ) around b ^ i . It yields:
$$g(y_i, b_i, \theta) \approx g(y_i, \hat{b}_i, \theta) + \dot{g}(y_i, \hat{b}_i, \theta)^T (b_i - \hat{b}_i) + \frac{1}{2} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i) = g(y_i, \hat{b}_i, \theta) + \frac{1}{2} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i),$$
because g ˙ ( y i , b ^ i , θ ) = 0 , where the first two partial derivatives of g ( y i , b i , θ ) with respect to b i are:
$$\dot{g}(y_i, b_i, \theta) = -2 \left\{ \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \right]^T R_i^{-1} \left( y_i - \mu_i(\beta, b_i) \right) - D^{-1} b_i \right\},$$
and:
$$\ddot{g}(y_i, b_i, \theta) = -2 \left\{ \frac{\partial^2 \mu_i}{\partial b_i \partial b_i^T} \left[ R_i^{-1} \left( y_i - \mu_i \right) \right] - \left[ \frac{\partial \mu_i}{\partial b_i^T} \right]^T R_i^{-1} \frac{\partial \mu_i}{\partial b_i^T} - D^{-1} \right\},$$
respectively. Notice that the contribution of the term involving the second derivative of μ i in g ¨ ( y i , b i , θ ) is usually negligible compared to that involving the product of the first derivative of μ i at b i = b ^ i [37]. We hereby define:
$$\ddot{g}(y_i, \hat{b}_i, \theta) \approx G(y_i, \theta) = 2 \left\{ \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \Big|_{b_i = \hat{b}_i} \right]^T R_i^{-1} \left[ \frac{\partial \mu_i(\beta, b_i)}{\partial b_i^T} \Big|_{b_i = \hat{b}_i} \right] + D^{-1} \right\}.$$
Consequently, the Laplacian approximation to log-likelihood Equation (4) is:
$$\ell_{LA}(\theta \mid y) \approx \log \left\{ \prod_{i=1}^{N} (2\pi)^{-\frac{s_i r + q}{2}} |R_i|^{-\frac{1}{2}} |D|^{-\frac{1}{2}} \exp\left[ -\tfrac{1}{2} g(y_i, \hat{b}_i, \theta) \right] \int \exp\left[ -\tfrac{1}{4} (b_i - \hat{b}_i)^T \ddot{g}(y_i, \hat{b}_i, \theta) (b_i - \hat{b}_i) \right] d b_i \right\}$$
$$= -\frac{1}{2} \sum_{i=1}^{N} \left\{ s_i r \log(2\pi) + \log |R_i| + \log |D| + \log \left| \tfrac{1}{2} G(y_i, \theta) \right| + \left( y_i - \mu_i(\beta, \hat{b}_i) \right)^T R_i^{-1} \left( y_i - \mu_i(\beta, \hat{b}_i) \right) + \hat{b}_i^T D^{-1} \hat{b}_i \right\}.$$
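Putting the pieces of this subsection together, one subject's Laplacian contribution can be sketched as below; the central-difference Jacobian and BFGS mode search are illustrative substitutes for whatever derivatives an implementation has available:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_loglik_i(y, mu_fn, Sigma, C, D):
    """Laplace approximation to one subject's marginal log-likelihood:
    minimize g over b_i, build the Gauss-Newton curvature G of Equation
    (13) from a numerical Jacobian of mu (second derivatives dropped),
    then assemble -0.5 * [n log 2pi + log|R| + log|D| + log|G/2| +
    g(b_hat)].  A sketch; solver and derivative choices are illustrative."""
    R = np.kron(Sigma, C)
    Rinv = np.linalg.inv(R)
    Dinv = np.linalg.inv(D)
    q, n = D.shape[0], R.shape[0]
    g = lambda b: (y - mu_fn(b)) @ Rinv @ (y - mu_fn(b)) + b @ Dinv @ b
    b_hat = minimize(g, np.zeros(q), method="BFGS").x
    eps = 1e-6                               # central-difference Jacobian of mu
    J = np.column_stack([(mu_fn(b_hat + eps * e) - mu_fn(b_hat - eps * e)) / (2 * eps)
                         for e in np.eye(q)])
    G = 2.0 * (J.T @ Rinv @ J + Dinv)        # Gauss-Newton curvature
    _, ldR = np.linalg.slogdet(R)
    _, ldD = np.linalg.slogdet(D)
    _, ldG2 = np.linalg.slogdet(G / 2.0)
    return -0.5 * (n * np.log(2 * np.pi) + ldR + ldD + ldG2 + g(b_hat))
```

For a mean linear in $b_i$ the Laplace approximation is exact, which provides a direct correctness check.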
With regard to ML estimation of $\theta$, we can treat it as an optimization problem based on $\ell_{LA}(\theta \mid y)$. Subsequently, we estimate $D$ by taking the first partial derivative of Equation (15) with respect to $D^{-1}$ and setting it to zero, yielding:
$$\hat{D} = N^{-1} \sum_{i=1}^{N} \hat{b}_i \hat{b}_i^T.$$
By maximizing Equation (15), the estimates of $\beta$, $\Sigma$ and $\phi$ are mutually dependent, and thus, we perform an iterative algorithm that proceeds as follows. Given $\hat{D}$ and the current estimates $\hat{\beta}^{(h)}$ and $\hat{\phi}^{(h)}$, we update the diagonal elements of $\hat{\Sigma}^{(h)}$ by:
$$\hat{\sigma}_{jj}^{(h+1)} = \left( \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right) \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right)^T \right],$$
and the off-diagonal elements by:
$$\hat{\sigma}_{jl}^{(h+1)} = \left( 2 \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left( C_i(\hat{\phi}^{(h)})^{-1} \left[ \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right) \left( y_{il} - \hat{\mu}_{il}^{(h+1)} \right)^T + \left( y_{il} - \hat{\mu}_{il}^{(h+1)} \right) \left( y_{ij} - \hat{\mu}_{ij}^{(h+1)} \right)^T \right] \right),$$
for $j, l = 1, \ldots, r$, where $\hat{\mu}_{ij}^{(h+1)}$ is the $s_i \times 1$ subvector consisting of the $((j-1)s_i + 1)$-th to the $(j s_i)$-th entries of $\hat{\mu}_i^{(h+1)} = \mu_i(\hat{\beta}^{(h+1)}, \hat{b}_i)$. Unfortunately, equating the first partial derivatives of Equation (15) with respect to $\beta$ and $\phi$, respectively, to zero does not yield the updated estimators in closed form. Therefore, we use the nlminb routine [38] to perform a numerical search for updating $\hat{\beta}^{(h)}$ and $\hat{\phi}^{(h)}$ sequentially. Specifically,
$$\hat{\beta}^{(h+1)} = \arg\min_{\beta} \sum_{i=1}^{N} \left( y_i - \mu_i(\beta, \hat{b}_i) \right)^T \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\hat{\phi}^{(h)}) \right)^{-1} \left( y_i - \mu_i(\beta, \hat{b}_i) \right),$$
and:
$$\hat{\phi}^{(h+1)} = \arg\min_{\phi} \sum_{i=1}^{N} \left[ \log \left| \tfrac{1}{2} G\left( y_i, \hat{\theta}_{(-\phi)}^{(h+1)} \right) \right| + r \log |C_i(\phi)| + \left( y_i - \hat{\mu}_i^{(h+1)} \right)^T \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right)^{-1} \left( y_i - \hat{\mu}_i^{(h+1)} \right) \right].$$

2.3. Pseudo-ECM Algorithm

According to the pseudo-data model specified in Equation (8), treating the random effects { b i } i = 1 N as latent data, we establish a complete-data framework of the model:
$$\tilde{y}_i \mid b_i \sim N_{s_i r}(\tilde{X}_i \beta + \tilde{Z}_i b_i, R_i), \quad b_i \sim N_q(0, D), \quad i = 1, \ldots, N.$$
Given the pseudo-complete data $\tilde{y} = \{\tilde{y}_i\}_{i=1}^{N}$ and $b = \{b_i\}_{i=1}^{N}$, the complete-data log-likelihood function of $\theta$ is:
$$\ell_{CP}(\theta \mid \tilde{y}, b) = \sum_{i=1}^{N} \log \left( \phi_{s_i r}(\tilde{y}_i \mid \tilde{X}_i \beta + \tilde{Z}_i b_i, R_i)\, \phi_q(b_i \mid 0, D) \right).$$
To carry out ML estimation for the MNLMM, we develop an ECM algorithm [33], which is a variant of EM [39], replacing its M steps by several computationally-simpler conditional maximization (CM) steps. It has several appealing features, such as stability of monotone convergence and simplicity of implementation. Hereafter, the procedure is referred to as the pseudo-ECM algorithm, because it is developed under the pseudo-data defined in Equation (7). The proposed implementation approach is outlined below.
E step: 
Evaluate the expected complete-data log-likelihood Function Equation (16) conditioning on the current estimates θ ^ ( h ) and the pseudo-responses y ˜ = y ˜ ( β ^ ( h ) , b ^ i ( h - 1 ) ) , which linearize the regression function around the previous estimates of mixed effects ( β ^ ( h ) , b ^ i ( h - 1 ) ) and should be updated at each iteration. This gives rise to the so-called Q-function:
$$Q(\theta \mid \hat{\theta}^{(h)}) = -\frac{1}{2} \sum_{i=1}^{N} \left\{ \log |\Sigma \otimes C_i| + \log |D| + \mathrm{tr}\left[ (\Sigma \otimes C_i)^{-1} \hat{\Omega}_i^{(h)} \right] + \mathrm{tr}\left( D^{-1} \hat{\Psi}_i^{(h)} \right) \right\},$$
where:
$$\hat{\Psi}_i^{(h)} = E[b_i b_i^T \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{b}_i^{(h)} \tilde{b}_i^{(h)T} + \left( \hat{D}^{(h)-1} + \tilde{Z}_i^T \hat{R}_i^{(h)-1} \tilde{Z}_i \right)^{-1}, \qquad \hat{\Omega}_i^{(h)} = E[\tilde{e}_i \tilde{e}_i^T \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{e}_i^{(h)} \tilde{e}_i^{(h)T} + \tilde{Z}_i \left( \hat{D}^{(h)-1} + \tilde{Z}_i^T \hat{R}_i^{(h)-1} \tilde{Z}_i \right)^{-1} \tilde{Z}_i^T,$$
with $\hat{R}_i^{(h)} = \hat{\Sigma}^{(h)} \otimes C_i(\hat{\phi}^{(h)})$, $\tilde{b}_i^{(h)} = E[b_i \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \hat{D}^{(h)} \tilde{Z}_i^T (\tilde{Z}_i \hat{D}^{(h)} \tilde{Z}_i^T + \hat{R}_i^{(h)})^{-1} (\tilde{y}_i - \tilde{X}_i \hat{\beta}^{(h)})$ and $\tilde{e}_i^{(h)} = E[\tilde{e}_i \mid \tilde{y}_i, \hat{\theta}^{(h)}] = \tilde{y}_i - \tilde{X}_i \hat{\beta}^{(h)} - \tilde{Z}_i \tilde{b}_i^{(h)}$, where $\tilde{y}_i = \tilde{y}_i(\hat{\beta}^{(h)}, \hat{b}_i^{(h-1)})$ represents the updated pseudo-responses.
CM step: 
Update the current estimates β ^ ( h ) , D ^ ( h ) , ^ ( h ) and ϕ ^ ( h ) by maximizing the Q-function
Equation (17). We obtain:
$$\hat{\beta}^{(h+1)} = \left( \sum_{i=1}^{N} \tilde{X}_i^T \hat{R}_i^{(h)-1} \tilde{X}_i \right)^{-1} \sum_{i=1}^{N} \tilde{X}_i^T \hat{R}_i^{(h)-1} (\tilde{y}_i - \tilde{Z}_i \tilde{b}_i^{(h)}), \qquad \hat{D}^{(h+1)} = N^{-1} \sum_{i=1}^{N} \hat{\Psi}_i^{(h)},$$
$$\hat{\sigma}_{jl}^{(h+1)} = \begin{cases} \left( \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \hat{\omega}_{ijl}^{(h+1/2)} \right], & \text{for } j = l, \\[4pt] \left( 2 \sum_{i=1}^{N} s_i \right)^{-1} \sum_{i=1}^{N} \mathrm{tr}\left[ C_i(\hat{\phi}^{(h)})^{-1} \left( \hat{\omega}_{ijl}^{(h+1/2)} + \hat{\omega}_{ilj}^{(h+1/2)} \right) \right], & \text{for } j \neq l, \end{cases}$$
$$\hat{\phi}^{(h+1)} = \arg\min_{\phi} \left\{ r \sum_{i=1}^{N} \log |C_i(\phi)| + \sum_{i=1}^{N} \mathrm{tr}\left[ \left( \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right)^{-1} \hat{\Omega}_i^{(h+1/2)} \right] \right\},$$
where $\hat{\omega}_{ijl}^{(h+1/2)}$ is the $s_i \times s_i$ block of $\hat{\Omega}_i^{(h)}$ formed by its $((j-1)s_i + 1)$-th to $(j s_i)$-th rows and $((l-1)s_i + 1)$-th to $(l s_i)$-th columns, in which $\beta$ and $D$ have been replaced by their updated estimates at the $(h+1)$-th iteration. Besides, $\hat{\Omega}_i^{(h+1/2)}$ in the above optimization function for $\hat{\phi}^{(h+1)}$ is $\hat{\Omega}_i^{(h)}$ evaluated at $\theta = \hat{\theta}^{(h+1)}$, except for $\phi$.
Given $\{\hat{b}_i^{(0)}\}_{i=1}^{N}$ and $\hat{\theta}^{(0)}$, we implement the pseudo-ECM algorithm until the user-specified convergence criterion is satisfied. Analogous to the PNLS-MLME method, this algorithm is established under the pseudo-data scenario. Hence, the resulting approximate log-likelihood value can be obtained by using Equation (9).
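The E-step moments of this subsection are ordinary multivariate normal conditioning formulas and can be sketched directly for one subject (names illustrative):

```python
import numpy as np

def ecm_e_step_i(y_t, X_t, Z_t, beta, D, R):
    """Conditional moments of the pseudo-data E step: under the linear
    pseudo-model y ~ N(X beta + Z b, R) with b ~ N(0, D), returns
    b_tilde = E[b|y], Psi = E[b b'|y] and Omega = E[e e'|y].
    Illustrative numpy translation of the Section 2.3 formulas."""
    V = Z_t @ D @ Z_t.T + R                       # marginal covariance of y
    b_t = D @ Z_t.T @ np.linalg.solve(V, y_t - X_t @ beta)
    M = np.linalg.inv(np.linalg.inv(D) + Z_t.T @ np.linalg.inv(R) @ Z_t)
    Psi = np.outer(b_t, b_t) + M                  # second moment of b given y
    e_t = y_t - X_t @ beta - Z_t @ b_t
    Omega = np.outer(e_t, e_t) + Z_t @ M @ Z_t.T  # second moment of e given y
    return b_t, Psi, Omega
```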

2.4. Monte Carlo EM Algorithm

We offer a Monte Carlo (MC) version of the EM algorithm [40] for ML estimation of Model Equation (1) and evaluate the observed log-likelihood Equation (4) via the MC integration. The MCEM is a modification of the EM algorithm in which the E step is computed numerically through a large number of simulated samples.
Given the complete data $(y, b)$, the log-likelihood function of $\theta$ for the MNLMM can be expressed as:
$$\ell_c(\theta \mid y, b) = \sum_{i=1}^{N} \log \left[ \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) \phi_q(b_i \mid 0, D) \right].$$
In the E step, we compute the expectation of complete data log-likelihood Function Equation (18) to yield the Q-function:
$$Q(\theta \mid \hat{\theta}^{(h)}) = \sum_{i=1}^{N} \int \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) P(b_i \mid y_i, \hat{\theta}^{(h)})\, d b_i + \sum_{i=1}^{N} \int \log \phi_q(b_i \mid 0, D)\, P(b_i \mid y_i, \hat{\theta}^{(h)})\, d b_i.$$
Obviously, Equation (19) cannot be written in closed form, since the conditional distribution of b i given y i :
$$P(b_i \mid y_i, \theta) \propto \exp\left\{ -\frac{1}{2} \left[ \left( y_i - \mu_i(\beta, b_i) \right)^T R_i^{-1} \left( y_i - \mu_i(\beta, b_i) \right) + b_i^T D^{-1} b_i \right] \right\}$$
has no standard form. To simulate random samples from Equation (20), we perform the Metropolis–Hastings (M-H) algorithm [41] with the proposal distribution:
$$b_i^{(m+1)} \sim N_q\left( b_i^{(m)}, G^{-1}(y_i, \hat{\theta}^{(h)}) \right),$$
where $G^{-1}(y_i, \hat{\theta}^{(h)})$ is the inverse of the matrix $G(y_i, \theta)$ given in Equation (13) and evaluated at $\theta = \hat{\theta}^{(h)}$. Note that the idea of considering such a proposal distribution comes from the integration of Equation (14) over $b_i$, which is, up to a multiplicative constant, approximately equal to an $N_q(\hat{b}_i, G^{-1}(y_i, \theta))$ density. We accept the new draw $b_i^{(m+1)}$ with probability $\min\{1, P(b_i^{(m+1)} \mid y_i, \hat{\theta}^{(h)}) / P(b_i^{(m)} \mid y_i, \hat{\theta}^{(h)})\}$ and otherwise set $b_i^{(m+1)} = b_i^{(m)}$. After obtaining a set of converged MC samples $\{b_i^{(m)}\}_{m=1}^{M}$, the random effects $b_i$, as well as any function $f(b_i)$ appearing in Equation (19), can be estimated by $\hat{b}_i^{(h)} = \sum_{m=1}^{M} b_i^{(m)} / M$ and $E[f(b_i) \mid y_i, \theta] \approx \sum_{m=1}^{M} f(b_i^{(m)}) / M$, respectively, at each iteration.
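A minimal random-walk Metropolis–Hastings sampler matching this proposal scheme might look as follows; `logpost` stands for the unnormalized log of the conditional density of $b_i$ and is an assumed callable:

```python
import numpy as np

def mh_sample_b(logpost, b0, Ginv, M=1000, seed=0):
    """Metropolis-Hastings draws from the conditional density of b_i
    using the symmetric random-walk proposal b' ~ N_q(b^(m), G^-1).
    Because the proposal is symmetric, the acceptance ratio reduces to
    the posterior ratio.  Names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(Ginv)          # proposal covariance factor
    b = np.asarray(b0, dtype=float)
    lp = logpost(b)
    out = []
    for _ in range(M):
        prop = b + chol @ rng.standard_normal(b.size)
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept w.p. min(1, ratio)
            b, lp = prop, lp_prop
        out.append(b.copy())
    return np.asarray(out)
```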
In the M step, we maximize the obtained Q-function Equation (19) by equating the following derivatives:
$$\frac{\partial Q(\theta \mid \hat{\theta}^{(h)})}{\partial D} = \sum_{i=1}^{N} \frac{\partial}{\partial D} E\left[ \log \phi_q(b_i \mid 0, D) \mid y_i, \hat{\theta}^{(h)} \right]$$
and:
$$\frac{\partial Q(\theta \mid \hat{\theta}^{(h)})}{\partial \alpha} = \sum_{i=1}^{N} \frac{\partial}{\partial \alpha} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), R_i \right) \mid y_i, \hat{\theta}^{(h)} \right]$$
to zero, where $\alpha = \{\beta, \Sigma, \phi\}$. By allowing differentiation under the integral sign for Equation (22), we update the estimate of $D$ by:
$$\hat{D}^{(h+1)} = \frac{1}{N} \sum_{i=1}^{N} E\left[ b_i b_i^T \mid y_i, \hat{\theta}^{(h)} \right] \approx \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{m=1}^{M} b_i^{(m)} b_i^{(m)T}.$$
Since solving Equation (23) is analytically intractable, we adopt a profile approximate Q-function approach, which updates $\hat{\beta}^{(h)}$, $\hat{\Sigma}^{(h)}$ and $\hat{\phi}^{(h)}$ by a sequential optimization procedure, as in the Laplacian method described in Section 2.2. It gives:
$$\hat{\beta}^{(h+1)} = \arg\max_{\beta} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\beta, b_i), \hat{R}_i^{(h)} \right) \mid y_i, \hat{\theta}^{(h)} \right],$$
$$\hat{\Sigma}^{(h+1)} = \arg\max_{\Sigma} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\hat{\beta}^{(h+1)}, b_i), \Sigma \otimes \hat{C}_i^{(h)} \right) \mid y_i, \hat{\theta}^{(h)} \right],$$
and:
$$\hat{\phi}^{(h+1)} = \arg\max_{\phi} \sum_{i=1}^{N} E\left[ \log \phi_{s_i r}\left( y_i \mid \mu_i(\hat{\beta}^{(h+1)}, b_i), \hat{\Sigma}^{(h+1)} \otimes C_i(\phi) \right) \mid y_i, \hat{\theta}^{(h)} \right].$$
Consequently, the marginal log-likelihood can be approximated as:
$$\ell_{\mathrm{MC}}(\theta \mid y) = -\frac{1}{2} \sum_{i=1}^{N} \left[ (s_i r + q) \log(2\pi) + s_i \log |\Sigma| + r \log |C_i| + \log |D| \right] - \frac{1}{2M} \sum_{i=1}^{N} \sum_{m=1}^{M} g(y_i, b_i^{(m)}, \theta).$$
According to an alternative hierarchy of the MNLMM,
$$y_i \mid \eta_i \sim N_{s_i r}\left( \mu_i(\eta_i, x_i), R_i \right), \quad \eta_i \sim N_s\left( A_i \beta, B_i D B_i^T \right), \quad \text{for } i = 1, \ldots, N,$$
the MCEM algorithm that deals with Monte Carlo integration directly on the individual parameters η i rather than subject-specific random effects b i can yield an explicit estimator for the fixed effects β. However, such an implementation may not be feasible in the framework of MNLMMs due to the possible singularity of B i D B i T .

2.5. Importance Sampling EM Algorithm

Importance sampling (IS) is an alternative way of performing MC integration. We provide an ISEM algorithm, which modifies the MC approximation of Equation (19) in the E step of the MCEM algorithm by using the IS method. To implement the ISEM algorithm, we first choose an appropriate envelope distribution from which the samples are simulated and the importance weights are calculated. As in the M-H algorithm, Equation (21) is a natural choice for the envelope distribution. As suggested by [35], the envelope distribution can be a mixture of two multivariate normal distributions with pdf:
$$\lambda(b_i) = P_0\, \phi_q(b_i \mid 0, \hat{D}^{(h)}) + (1 - P_0)\, \phi_q\left( b_i \mid \hat{b}_i^{(h)}, G^{-1}(y_i, \hat{\theta}^{(h)}) \right),$$
where the mixing proportion $0 \le P_0 \le 1$ is a pre-specified value.
Notably, ISEM can be performed to evaluate the expected values of any functions of unobservable { b i } i = 1 N , e.g., f ( b i ) = b i and f ( b i ) = b i b i T . It follows that:
$$E\left[ f(b_i) \mid y_i, \theta \right] = \int f(b_i)\, f(b_i \mid y_i, \theta)\, d b_i = \frac{\int f(b_i)\, f(y_i \mid b_i, \theta)\, f(b_i \mid D)\, d b_i}{\int f(y_i \mid b_i, \theta)\, f(b_i \mid D)\, d b_i}.$$
Having obtained a sufficient number of random-effects draws, denoted by $\{b_i^{(m)}\}_{m=1}^{M}$ for $i = 1, \ldots, N$, we adopt the ratio of two MC approximations using IS draws from Equation (27) to estimate Equation (28), given by:
$$E\left[ f(b_i) \mid y_i, \theta \right] \approx \frac{\sum_{m=1}^{M} f(b_i^{(m)})\, f(y_i \mid b_i^{(m)}, \theta)\, f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)})}{\sum_{m=1}^{M} f(y_i \mid b_i^{(m)}, \theta)\, f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)})}.$$
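The self-normalized ratio in Equation (29) is insensitive to normalizing constants, which the following sketch exploits by working with unnormalized log densities (all function arguments are illustrative stand-ins):

```python
import numpy as np

def is_posterior_mean(f, y_loglik, prior_logpdf, env_rvs, env_logpdf, M=2000, seed=0):
    """Self-normalized importance-sampling estimate of E[f(b)|y]:
    draws b_m from the envelope lambda, forms log weights
    log f(y|b_m) + log f(b_m|D) - log lambda(b_m), and returns the
    weighted average of f(b_m).  Constants cancel in the ratio, so the
    log densities may be unnormalized.  Illustrative sketch."""
    rng = np.random.default_rng(seed)
    draws = [env_rvs(rng) for _ in range(M)]
    logw = np.array([y_loglik(b) + prior_logpdf(b) - env_logpdf(b) for b in draws])
    w = np.exp(logw - logw.max())            # stabilized weights
    fs = np.array([f(b) for b in draws], dtype=float)
    return float(w @ fs / w.sum())
```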
In the E step, given the current estimates of parameters θ ^ ( h ) , we compute Equation (19) in which the required conditional moments of latent data b can be approximated based on Equation (29). In the M step, we update each entry of θ ^ ( h ) by maximizing the Q-function. Indeed, the ISEM procedure works conceptually similarly to that of MCEM: only D ^ ( h + 1 ) shows an explicit solution, while β ^ ( h + 1 ) , ^ ( h + 1 ) and ϕ ^ ( h + 1 ) are obtained through sequential optimization solutions via Equations (24)–(26). The IS approximation to the marginal log-likelihood is:
$$\ell_{IS}(\theta \mid y) \approx -\frac{1}{2} \sum_{i=1}^{N} \left[ s_i \log |\Sigma| + r \log |C_i| + \log |D| \right] + \sum_{i=1}^{N} \log \left[ \frac{1}{M} \sum_{m=1}^{M} \exp\left\{ -\tfrac{1}{2} g(y_i, b_i^{(m)}, \theta) \right\} f(b_i^{(m)} \mid D) / \lambda(b_i^{(m)}) \right].$$

2.6. Expected Information Matrix

For Model Equation (8), writing $\theta = (\beta, \alpha)$ with $\alpha = (\mathrm{vech}(D), \mathrm{vech}(\Sigma), \phi)$, the expected information matrix of $\theta$, obtained by taking the expectation of the negative Hessian matrix, can be expressed as:
J θ θ = [ J β β J β α J β α T J α α ] ,
where $J_{\beta\beta} = \sum_{i=1}^{N} \tilde{X}_i^T \tilde{\Lambda}_i^{-1} \tilde{X}_i$, $J_{\beta\alpha} = 0$, and $J_{\alpha\alpha}$ is a $g \times g$ information matrix whose $(l, s)$-th entry is $[J_{\alpha\alpha}]_{ls} = 2^{-1} \sum_{i=1}^{N} \mathrm{tr}(\tilde{\Lambda}_i^{-1} \dot{\tilde{\Lambda}}_{il} \tilde{\Lambda}_i^{-1} \dot{\tilde{\Lambda}}_{is})$, for $l, s = 1, \ldots, g$, with $g = q(q+1)/2 + r(r+1)/2 + \dim(\phi)$ and $\dot{\tilde{\Lambda}}_{il}$ being $\dot{\tilde{\Lambda}}_{il}^{(h)}$ given in (A.1) with $\hat{\theta}^{(h)}$ replaced by $\theta$. Consequently, the asymptotic variance-covariance matrix of $\theta$ can be approximated by the inverse of the information Matrix Equation (30), denoted by $J_{\theta\theta}^{-1}$. The resulting standard errors of the parameters are the square roots of the diagonal entries of $J_{\theta\theta}^{-1}$ evaluated at $\theta = \hat{\theta}$.
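Since $J_{\beta\alpha} = 0$, the information matrix is block diagonal and the standard errors split into two independent blocks; a minimal sketch:

```python
import numpy as np

def standard_errors(J_bb, J_aa):
    """Standard errors from a block-diagonal expected information matrix:
    the asymptotic covariance splits into inv(J_bb) and inv(J_aa), and
    the standard errors are square roots of the diagonal entries.
    Illustrative sketch."""
    se_beta = np.sqrt(np.diag(np.linalg.inv(J_bb)))
    se_alpha = np.sqrt(np.diag(np.linalg.inv(J_aa)))
    return np.concatenate([se_beta, se_alpha])
```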

2.7. Initialization

When implementing iterative procedures, a common difficulty encountered in practice is that the algorithm is painfully slow or even non-convergent. Such a computational problem may occur in handling ML estimation of the MNLMM, especially when the data are too sparse or the dimension of random effects is over-specified. To overcome this potential problem, a default procedure of automatically creating a set of good initial values is summarized below.
(i)
A direct way of obtaining the initial value for β is to fit the NLMMs to each outcome variable separately by using the nlme R package [12].
(ii)
Using the fitting results of NLMMs for each outcome, we take the initial value D ^ ( 0 ) as a (block) diagonal form with the diagonal entry being the variances (covariances) of random effects under the fitted NLMMs.
(iii)
For the initial value of $\Sigma$, we use the sample variance-covariance matrix of the data. That is, take $\hat{\Sigma}^{(0)} = \sum_{i=1}^{N} \sum_{t=1}^{s_i} (y_{i \cdot t} - \bar{y})(y_{i \cdot t} - \bar{y})^T / (\sum_{i=1}^{N} s_i - 1)$, where $y_{i \cdot t} = (y_{i1t}, \ldots, y_{irt})^T$ and $\bar{y} = (\sum_{i=1}^{N} s_i)^{-1} (\sum_{i=1}^{N} \sum_{t=1}^{s_i} y_{i1t}, \ldots, \sum_{i=1}^{N} \sum_{t=1}^{s_i} y_{irt})^T$.
(iv)
The initial values for ϕ, depending on the structure, are simply chosen to give a condition of nearly uncorrelated errors.
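Step (iii) above is a pooled sample covariance over all occasion vectors; for instance:

```python
import numpy as np

def init_sigma(Y_list):
    """Initial Sigma: pooled sample variance-covariance matrix of the
    r-variate occasion vectors y_{i.t}, obtained by stacking all
    subjects' s_i x r outcome matrices.  Illustrative sketch."""
    rows = np.vstack(Y_list)                 # (sum_i s_i) x r
    dev = rows - rows.mean(axis=0)
    return dev.T @ dev / (rows.shape[0] - 1)
```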

3. Application: ACTG 315 Data

We present a comparison of the five algorithms via a real data example from the AIDS Clinical Trial Group protocol 315 (ACTG 315) study developed by the Immunology Research Agenda Committee of the U.S. National Institute of Allergy and Infectious Disease, the ACTG sponsor. The study design and recruitment of participants (patients) were conducted by University Hospitals of Cleveland, Rush-Presbyterian-St. Luke's Medical Center and University of Colorado Health Science Center. In the study, 53 human immunodeficiency virus type 1 (HIV-1)-infected patients were recruited, and their plasma HIV-1 RNA (viral load) copies and CD4+ T cell counts were repeatedly measured at Days 0, 2, 7, 10, 14, 28, 56, 84, 168 and 196 after the start of treatment. A more detailed description of the study can be found in [42,43].
HIV-1 infection is associated with progressive and profound loss of immune function that places infected persons at enhanced risk for opportunistic infections, and even death. A reaction in HIV-1-related immune deficiency can be characterized by decreases in the numbers of circulating CD4+ T helper lymphocytes. CD4+ T cells in blood decline to a lower level after HIV-1 infection and may recover to a high level after antiviral therapies suppress viral load. Generally, there is a negative correlation between the virologic marker (measured by HIV-1 RNA) and the immunologic marker (measured by CD4+ T cells) during antiviral treatments. As a consequence, a joint analysis of HIV-1 RNA and CD4+ counts is helpful to take the evolution of the correlation among responses over time into account. The data have been analyzed by [44,45,46,47] using different modeling approaches.
As a part of the clinical trial on 53 patients, a total of 48 patients were included in our analysis after excluding four early drop-out patients and one patient whose plasma HIV-1 RNA pattern suggested intermittent adherence to study therapy. To stabilize the variances and reduce the strong skewness of the two markers, a base-10 logarithmic transformation is applied to HIV-1 RNA and a square-root transformation to CD4+ T cell counts. Both transformations are widely used in HIV/AIDS clinical trials. Let $y_{i1,k}$ and $y_{i2,k}$ be the $\log_{10}$RNA and CD4$^{0.5}$ markers, respectively, at the $k$-th time point for patient $i$. We consider the following bivariate nonlinear mixed-effects model for $y_{i1,k}$ and $y_{i2,k}$:
$$y_{i1,k} = \log_{10}\big(\exp\{(\beta_1+b_{i1})+\beta_2 t_{ik}\} + \exp\{\beta_3\,\mathrm{rna}_i\}\big) + e_{i1,k},\qquad
y_{i2,k} = \frac{\beta_4+b_{i2}}{1+\exp\{(\beta_5-t_{ik})/\beta_6\}} + e_{i2,k},$$
where $t_{ik} = \mathrm{day}_{ik}/7$ is the $k$-th visited time point (in weeks) for patient $i$; $\mathrm{rna}_i$ is the baseline $\log_{10}$RNA level for patient $i$ at the start of the study; $(b_{i1}, b_{i2})$ are the bivariate normally-distributed random effects; and $(e_{i1}^T, e_{i2}^T) = (e_{i1,1},\ldots,e_{i1,s_i}, e_{i2,1},\ldots,e_{i2,s_i})$ are the within-subject errors following a multivariate normal distribution with zero mean and variance-covariance matrix $C_i$. Because the baseline RNA is a significant covariate in the ACTG 315 study [47], it should be incorporated into the analysis. To account for the extra autocorrelation caused by within-patient dependence among unequally-spaced occasions, we employ a continuous order-one autoregressive structure, i.e., $C_i = [\phi^{|t_{ik}-t_{ik'}|}]$, for the across-occasion covariance matrix of the within-subject errors.
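To make the two mean profiles concrete, the sketch below evaluates them over the study's visit schedule. The β values are set roughly to the ML estimates reported later in Table 1, and the baseline value `rna_i = 4.0` is an illustrative choice, not data:

```python
import numpy as np

# illustrative parameter values, roughly the ML estimates in Table 1
beta = np.array([12.05, -2.66, 1.30, 16.86, -1.73, 1.31])

def mu1(t, beta, b_i1, rna_i):
    """Mean of log10 RNA: log10( exp{(b1+bi1)+b2*t} + exp{b3*rna_i} )."""
    return np.log10(np.exp((beta[0] + b_i1) + beta[1] * t)
                    + np.exp(beta[2] * rna_i))

def mu2(t, beta, b_i2):
    """Mean of CD4^0.5: logistic curve (b4+bi2)/(1+exp{(b5-t)/b6})."""
    return (beta[3] + b_i2) / (1.0 + np.exp((beta[4] - t) / beta[5]))

# visit days of the ACTG 315 schedule, converted to weeks as in the text
weeks = np.array([0, 2, 7, 10, 14, 28, 56, 84, 168, 196]) / 7.0
rna_curve = mu1(weeks, beta, 0.0, rna_i=4.0)   # population curve (b_i = 0)
cd4_curve = mu2(weeks, beta, 0.0)
```

With these values the $\log_{10}$RNA curve decays from its baseline toward the plateau $\beta_3\,\mathrm{rna}_i/\log(10)$, while the CD4$^{0.5}$ curve rises toward the asymptote $\beta_4$, matching the qualitative behavior described in the text.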
According to the standard formulation in Equation (2), we specify:
$$A = \begin{bmatrix} I_2 & 0 & 0 \\ 0 & \mathrm{rna}_i & 0 \\ 0 & 0 & I_3 \end{bmatrix},\qquad
B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}^T,\qquad
\beta = (\beta_1,\beta_2,\beta_3,\beta_4,\beta_5,\beta_6)^T,$$
and $b_i = (b_{i1}, b_{i2})^T$, where $I_d$ is the identity matrix of order $d$. Define:
$$\xi_1 = \big(\exp\{\eta_1+\eta_2 t\}+\exp\{\eta_3\}\big)^{-1}/\log(10),\qquad
\xi_2 = \big(1+\exp\{(\eta_5-t)/\eta_6\}\big)^{-1},$$
where $\eta_1 = \beta_1 + b_{i1}$, $\eta_2 = \beta_2$, $\eta_3 = \beta_3\,\mathrm{rna}_i$, $\eta_4 = \beta_4 + b_{i2}$, $\eta_5 = \beta_5$ and $\eta_6 = \beta_6$. The first derivatives of $\mu_1$ and $\mu_2$ specified in Equation (31) with respect to η are:
$$\dot{\mu}_1 = \frac{\partial\mu_1}{\partial\eta} = \begin{bmatrix} \xi_1\exp\{\eta_1+\eta_2 t\} \\ \xi_1 t\exp\{\eta_1+\eta_2 t\} \\ \xi_1\exp\{\eta_3\} \\ 0 \\ 0 \\ 0 \end{bmatrix},\quad\text{and}\quad
\dot{\mu}_2 = \frac{\partial\mu_2}{\partial\eta} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \xi_2 \\ -\mu_2\exp\{(\eta_5-t)/\eta_6\}\,\xi_2/\eta_6 \\ \mu_2\exp\{(\eta_5-t)/\eta_6\}(\eta_5-t)\,\xi_2/\eta_6^2 \end{bmatrix}.$$
The first derivative of the mean function $\mu_i(\beta, b_i) = (\mu_1, \mu_2)$ with respect to $b_i$ is:
$$\frac{\partial\mu_i(\beta,b_i)}{\partial b_i} = \begin{bmatrix} \xi_1\exp\{(\beta_1+b_{i1})+\beta_2 t_i\} & 0_{s_i} \\ 0_{s_i} & \xi_2 \end{bmatrix},$$
where $\xi_1$ and $\xi_2$ are $s_i\times 1$ vectors composed of the $\xi_1$ and $\xi_2$ given by Equation (32) with $t$ replaced by the $s_i\times 1$ occasion vector $t_i$ of the $i$-th patient, and the product in the first block is taken element-wise.
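Analytic derivatives such as $\dot\mu_1$ and $\dot\mu_2$ are easy to get wrong, so a finite-difference check is worthwhile. The sketch below codes $\mu_1$, $\mu_2$ and the two gradient vectors exactly as displayed above; the η values used in the check are illustrative:

```python
import numpy as np

LOG10 = np.log(10.0)

def mu_and_grad(eta, t):
    """mu1, mu2 in the eta-parametrization, with analytic gradients
    mu1_dot, mu2_dot as displayed in the text (illustrative sketch)."""
    e1, e2, e3, e4, e5, e6 = eta
    num = np.exp(e1 + e2 * t)            # exp{eta1 + eta2*t}
    den = num + np.exp(e3)
    xi1 = 1.0 / (den * LOG10)
    u = np.exp((e5 - t) / e6)
    xi2 = 1.0 / (1.0 + u)
    mu1 = np.log10(den)
    mu2 = e4 * xi2
    mu1_dot = np.array([xi1 * num, xi1 * t * num, xi1 * np.exp(e3),
                        0.0, 0.0, 0.0])
    mu2_dot = np.array([0.0, 0.0, 0.0, xi2,
                        -mu2 * u * xi2 / e6,
                        mu2 * u * (e5 - t) * xi2 / e6 ** 2])
    return mu1, mu2, mu1_dot, mu2_dot
```

A central-difference comparison at an arbitrary η confirms that every entry of both gradient vectors matches the numerical derivative.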
Table 1 presents the parameter estimates and their standard errors (in parentheses) from the five computational methods, namely PNLS-MLME, Laplacian, pseudo-ECM, MCEM with 500 Monte Carlo samples and ISEM with mixing proportion $P_0 = 0.5$. When employing the ISEM algorithm, several choices of the mixing proportion $P_0$, ranging from 0 to 1 with an increment of 0.1, were considered. To save space, we report only the result for $P_0 = 0.5$, as it yields the maximized log-likelihood value. The results indicate that the five methods give very similar estimates and significance conclusions for the model parameters. According to the estimates of $\Sigma = [\sigma_{jl}]$, the estimated correlation between $\log_{10}$RNA and CD4$^{0.5}$ ranges from approximately −0.13 to −0.18, confirming a negative relationship between the virologic and immunologic markers. The between-patient correlations of the two responses are not statistically significant based on the estimates of $D$. The estimate of the autoregressive parameter ϕ is significantly different from zero, revealing the existence of autocorrelation in the within-patient variability. Figure 1 displays the observations and estimated mean curves, in which the covariate is set to the average baseline RNA value over all patients, for the five computational methods. Judging from the figure, the logarithmic and logistic curves in Equation (31) are reasonable functions to describe the evolution of RNA on the $\log_{10}$ scale and CD4 on the square-root scale over time. The trend of $\log_{10}$RNA decreases at the beginning, accompanied by the rapid growth of CD4$^{0.5}$ in the early days of antiviral therapy. After nearly four weeks, the decline pattern of $\log_{10}$RNA and the growth pattern of CD4$^{0.5}$ become slow and smooth. As an illustration, the fitted values obtained by the five methods, together with the observations for seven randomly-selected patients, are displayed in Figure 2.
As anticipated, the fitted trajectories for each patient show slight differences among the five estimating procedures. Generally, they adapt to the trend of the observed repeated measures, but some configurations are not ideally captured. It is known that the viral load (RNA copies) and CD4 counts are highly variable immune system markers, making them difficult to fit.
Figure 1. The $\log_{10}$(RNA) and CD4$^{0.5}$ observations (∘) with the estimated mean curves against time (in days) from ML estimation using the five proposed procedures.
Figure 2. The fitted values obtained by the five proposed procedures together with the observations (•) of $\log_{10}$(RNA) and CD4$^{0.5}$ for seven randomly-selected patients.
Table 1. Estimation results for AIDS Clinical Trial Group protocol 315 (ACTG 315) data. PNLS, penalized nonlinear least squares; MLME, multivariate linear mixed-effects; ECM, expectation conditional maximization; MCEM, Monte Carlo EM; ISEM, importance sampling EM.
| Parameter | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- |
| $\beta_1$ | 12.0477 | 12.9800 | 12.0485 | 12.0784 | 12.114 |
|  | (0.2513) | (0.2858) | (0.2530) | (0.2626) | (0.2652) |
| $\beta_2$ | −2.6558 | −2.6476 | −2.6543 | −2.6198 | −2.6069 |
|  | (0.1781) | (0.1970) | (0.1777) | (0.1950) | (0.1992) |
| $\beta_3$ | 1.3039 | 1.3001 | 1.3039 | 1.3012 | 1.3000 |
|  | (0.0274) | (0.0248) | (0.0273) | (0.0253) | (0.0249) |
| $\beta_4$ | 16.8604 | 16.8577 | 16.8605 | 16.8875 | 16.9058 |
|  | (0.3911) | (0.3340) | (0.3914) | (0.3863) | (0.3829) |
| $\beta_5$ | −1.7324 | −1.7791 | −1.7312 | −1.7721 | −1.7643 |
|  | (0.4936) | (0.4590) | (0.4930) | (0.4632) | (0.4585) |
| $\beta_6$ | 1.3081 | 1.3514 | 1.3078 | 1.3604 | 1.3463 |
|  | (0.3262) | (0.2899) | (0.3259) | (0.2972) | (0.2896) |
| $d_{11}$ | 0.0000 | 0.7457 | 0.0583 | 0.1183 | 0.1398 |
|  | (0.4665) | (0.5763) | (0.4753) | (0.4673) | (0.4612) |
| $d_{21}$ | −0.0020 | −0.1400 | 0.0144 | −0.2386 | 0.0838 |
|  | (0.5414) | (0.5203) | (0.5479) | (0.5401) | (0.5295) |
| $d_{22}$ | 4.7425 | 3.8251 | 4.7585 | 5.4602 | 5.4894 |
|  | (1.3803) | (0.9953) | (1.3826) | (1.3561) | (1.3361) |
| $\sigma_{11}$ | 0.4655 | 0.4267 | 0.4622 | 0.4379 | 0.4329 |
|  | (0.0458) | (0.0411) | (0.0455) | (0.0420) | (0.0414) |
| $\sigma_{21}$ | −0.2232 | −0.1738 | −0.2164 | −0.2185 | −0.2225 |
|  | (0.0965) | (0.0747) | (0.0962) | (0.0786) | (0.0754) |
| $\sigma_{22}$ | 5.7063 | 3.5558 | 5.6929 | 3.8956 | 3.6033 |
|  | (0.5991) | (0.3520) | (0.5980) | (0.3874) | (0.3541) |
| $\phi$ | 0.6824 | 0.5447 | 0.6818 | 0.5674 | 0.5343 |
|  | (0.0311) | (0.0422) | (0.0312) | (0.0400) | (0.0425) |
Furthermore, the approximate values of the log-likelihood function for Model Equation (31), evaluated at the ML estimates $\hat\theta$ obtained by each of the five estimation procedures, are reported in Table 2. To assess the accuracy of the approximations of the log-likelihood function, we also evaluate the double integral in the log-likelihood Function Equation (4) by plugging the corresponding $\hat\theta$ into Equation (4) and using the integrate routine in R to obtain the exact log-likelihoods. The exact log-likelihood values, together with the absolute differences (AD) between the approximate and exact values, are also listed in Table 2. Roughly, the log-likelihood values under the five approximation methods are similar and close to their corresponding exact values. In this example, the pseudo-ECM yields the most precise evaluation, followed by the Laplacian, MCEM, ISEM and PNLS-MLME methods.
Table 2. Approximate and exact log-likelihood functions for the fitted Model Equation (31) under the five estimation methods. AD, absolute difference.
|  | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- |
| Approximate | −974.360 | −986.794 | −974.592 | −966.763 | −1010.370 |
| Exact | −1063.338 | −991.754 | −978.269 | −981.384 | −978.758 |
| AD | 88.978 | 4.96 | 3.677 | 14.621 | 31.612 |
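The "exact" values in Table 2 come from numerically integrating the bivariate random effects out of each subject's likelihood. The same idea can be sketched in a self-contained way; to make the result checkable, we use a toy model that is *linear* in the random effect $b_i$, so the marginal is also available in closed form. All names, dimensions and values below are illustrative, not the ACTG model:

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)

# toy subject: y | b ~ N(Z b, s2*I_3),  b ~ N_2(0, D)   (illustrative)
Z = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5]])
D = np.array([[1.0, 0.3], [0.3, 0.5]])
s2 = 0.4
y = rng.normal(size=3)

def integrand(b2, b1):
    """p(y | b) * p(b): the function under the double integral."""
    b = np.array([b1, b2])
    return (multivariate_normal.pdf(y, mean=Z @ b, cov=s2 * np.eye(3))
            * multivariate_normal.pdf(b, mean=np.zeros(2), cov=D))

# marginal density of y by numerical double integration over (b1, b2)
marg, _ = dblquad(integrand, -8.0, 8.0, lambda b1: -8.0, lambda b1: 8.0)
loglik_numeric = np.log(marg)

# closed-form check: marginally y ~ N(0, Z D Z^T + s2*I_3)
loglik_exact = multivariate_normal.logpdf(
    y, mean=np.zeros(3), cov=Z @ D @ Z.T + s2 * np.eye(3))
```

The agreement between the two values illustrates why numerical double integration serves as a reliable "exact" benchmark when, as in the nonlinear case, no closed form exists.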
Although the proposed five algorithms provide quite similar estimates of the model parameters, as well as the fitted mean profiles shown in Figure 1 and Figure 2, the following remarks are in order. The PNLS-MLME and Laplacian methods involve solving for the fixed effects β and the modes of the random effects $\{b_i\}_{i=1}^N$ via iterative optimization procedures. Thus, the two methods are very sensitive to initial values and may suffer from slow convergence or even non-convergence due to singularity of the variance-covariance matrices, especially when unnecessary random effects are included in the model. The MCEM and ISEM methods spend more time generating an adequate number of samples of the random effects to evaluate the required conditional expectations. Overall, the pseudo-ECM algorithm is the best method in terms of computational efficiency in this study. However, all of the proposed methods may get trapped in one of many local maxima of the log-likelihood function. To assess the stability of the resulting estimates, a variety of initial values should be employed when implementing the algorithms. The global optimum is taken to be the solution with the largest log-likelihood value.

4. Simulation Study

In this section, two simulation studies, with data generated from models with linear and nonlinear profiles, respectively, are undertaken to compare the performance of the five algorithmic procedures for fitting the MNLMM. The performance comparison covers the convergence efficiency in terms of the number of iterations and consumed CPU time, the accuracy of the parameter estimates and the precision of the log-likelihood approximation. All computations were carried out with R version 2.13.1 in a 32-bit Windows environment on a desktop PC with a 3.40-GHz Intel Core(TM) i7-2600 CPU and 4.0 GB RAM.

4.1. Bivariate Linear Case

To keep the evaluation of the exact log-likelihood values tractable, in this simulation we restrict ourselves to generating datasets from the following bivariate LMM:
$$y_{i1k} = \beta_1 + b_{i1} + \beta_2 t_k + e_{i1k},\qquad y_{i2k} = \beta_3 + (\beta_4 + b_{i2})t_k + e_{i2k},$$
for $i = 1,\ldots,N$ and $k, t_k = 1,\ldots,7$. Following the standard notation for Model Equation (1) along with Assumption Equation (2), we set $A_i = I_4$, $\beta = (\beta_1,\beta_2,\beta_3,\beta_4)^T$,
$$B_i = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}^T,$$
and $b_i = (b_{i1}, b_{i2})^T \sim N_2(0, D)$. The specific model parameters are:
$$\beta = (1, 2, -2, 4)^T,\quad D = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix},\quad\text{and}\quad C_i = I_7,$$
where the values of ρ are chosen as 0, 0.5 and 0.9 to reflect zero, moderate and high correlations between the outcome variables, respectively. The sample sizes N are set to 25 and 50, and a total of 100 replications are run for each combination of between-outcome correlation ρ and sample size N. Each simulated dataset is fitted by the MNLMM using the five computational procedures, namely the PNLS-MLME, Laplacian, pseudo-ECM, MCEM and ISEM algorithms, described in Section 2. Initial values for the parameters are chosen as the true parameter values plus a random draw from the standard normal distribution. Note that the E step of the MCEM algorithm is undertaken by generating M = 1000 MC samples. When implementing the ISEM algorithm, the envelope distribution was a multivariate normal mixture with three different mixing proportions $P_0 = 0.1, 0.5$ and 0.9. Because all converged estimates are almost the same, we report only the result under $P_0 = 0.5$ for the sake of conciseness. The computational procedures achieve convergence when:
$$\max_{l=1,\ldots,m}\left|\frac{\hat\theta_l^{(h+1)}-\hat\theta_l^{(h)}}{\hat\theta_l^{(h)}}\right| < 0.01,$$
where m is the number of unknown parameters.
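This stopping rule transcribes directly into code. A minimal sketch (it assumes, as here, that no parameter iterate is exactly zero):

```python
import numpy as np

def converged(theta_new, theta_old, tol=0.01):
    """Relative-change stopping rule:
    max_l |(theta_new_l - theta_old_l) / theta_old_l| < tol.
    Assumes every component of theta_old is nonzero."""
    theta_new = np.asarray(theta_new, dtype=float)
    theta_old = np.asarray(theta_old, dtype=float)
    rel = np.abs((theta_new - theta_old) / theta_old)
    return bool(np.max(rel) < tol)
```

For example, an iterate whose worst relative change is 0.5% satisfies the rule, while a 20% change in any single component does not.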
Table 3 summarizes the averages of CPU time (Time), numbers of iterations (Iter), converged log-likelihood values ($\ell_{\max}$), relative bias (RB) of the log-likelihood and empirical sums of relative mean squared errors (RMSE) of the parameter estimates obtained by the five approximation methods over 100 replicates under all considered scenarios. The relative bias of the log-likelihood, calculated as $(\ell_{\max}-\ell_{\mathrm{true}})/|\ell_{\mathrm{true}}|$, is used to evaluate the accuracy of the estimation of the log-likelihood function, where $\ell_{\mathrm{true}}$ is the true value of the log-likelihood function and $\ell_{\max}$ is the converged maximized log-likelihood value. The empirical sum of RMSE for each case is calculated as $\sum_{l=1}^m(\hat\theta_l-\theta_l)^2/\theta_l^2$, where $\theta_l$ and $\hat\theta_l$ denote the $l$-th entries of the true parameter vector and its estimate, respectively.
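The two comparison criteria just defined are one-liners; the sketch below spells them out (function names are ours, not the paper's):

```python
import numpy as np

def relative_bias(l_max, l_true):
    """RB of the converged log-likelihood: (l_max - l_true) / |l_true|."""
    return (l_max - l_true) / abs(l_true)

def sum_relative_mse(theta_hat, theta_true):
    """Empirical sum of relative squared errors over all m parameters:
    sum_l (theta_hat_l - theta_true_l)^2 / theta_true_l^2."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    return float(np.sum((theta_hat - theta_true) ** 2 / theta_true ** 2))
```

For instance, a converged value of −99 against a true value of −100 gives RB = 0.01, and a single 10% error on one unit-valued parameter contributes 0.01 to the RMSE sum.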
Based on the results shown in Table 3, we first compare the convergence speed of the five estimation procedures. Apparently, the pseudo-ECM method takes the least CPU time, followed by the PNLS-MLME, Laplacian, ISEM and then the MCEM methods. The fewest iterations are required by the PNLS-MLME method, followed by the pseudo-ECM, Laplacian, ISEM and MCEM methods, although the last four methods show negligible differences, especially for a large sample size and a high between-outcome correlation. Not surprisingly, the MCEM and ISEM methods require a heavier computational cost, because they need to generate a great number of random samples of the random effects to perform the MC integration in each iteration. We also find that the consumed CPU time and the required number of iterations decrease when the between-outcome correlation ρ increases. We remark that the PNLS-MLME method converges quickly, but only when the initial values are good enough. When the chosen starting point is far from the optimum, the procedure may diverge, and another set of initial values must then be supplied.
Table 3. Simulation results for the computational performance of five approximation methods under each combination of correlations ρ and sample sizes N. Iter, iteration; RB, relative bias.
| N | ρ | Criterion | PNLS-MLME | Laplacian | Pseudo-ECM | MCEM | ISEM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 25 | 0 | Time | 4.077 | 25.954 | 1.970 | 8789.093 | 5862.499 |
| 25 | 0 | Iter | 2.150 | 12.140 | 9.800 | 138.440 | 58.390 |
| 25 | 0 | $\ell_{\max}$ | −576.769 | −610.274 | −577.121 | −556.914 | −642.139 |
| 25 | 0 | RB | 0.008 | −0.033 | 0.008 | 0.045 | −0.100 |
| 25 | 0 | RMSE | 2.229 | 2.441 | 2.169 | 2.176 | 2.177 |
| 25 | 0.5 | Time | 4.370 | 30.803 | 2.045 | 2403.145 | 1680.319 |
| 25 | 0.5 | Iter | 2.120 | 11.430 | 9.930 | 35.650 | 15.750 |
| 25 | 0.5 | $\ell_{\max}$ | −559.366 | −582.622 | −559.907 | −536.608 | −625.736 |
| 25 | 0.5 | RB | 0.009 | −0.022 | 0.008 | 0.052 | −0.103 |
| 25 | 0.5 | RMSE | 0.580 | 0.672 | 0.561 | 0.601 | 0.602 |
| 25 | 0.9 | Time | 3.646 | 25.006 | 1.749 | 1252.625 | 1158.028 |
| 25 | 0.9 | Iter | 2.000 | 8.940 | 8.570 | 18.330 | 10.760 |
| 25 | 0.9 | $\ell_{\max}$ | −468.270 | −474.786 | −468.909 | −423.555 | −535.591 |
| 25 | 0.9 | RB | 0.011 | −0.003 | 0.009 | 0.118 | −0.120 |
| 25 | 0.9 | RMSE | 0.470 | 0.484 | 0.450 | 0.486 | 0.477 |
| 50 | 0 | Time | 8.365 | 41.545 | 8.927 | 6825.341 | 3967.824 |
| 50 | 0 | Iter | 2.240 | 10.050 | 9.260 | 56.240 | 20.170 |
| 50 | 0 | $\ell_{\max}$ | −1159.337 | −1177.863 | −1159.675 | −1120.721 | −1292.848 |
| 50 | 0 | RB | 0.004 | −0.010 | 0.004 | 0.039 | −0.094 |
| 50 | 0 | RMSE | 1.688 | 1.747 | 1.685 | 1.692 | 1.689 |
| 50 | 0.5 | Time | 9.776 | 56.560 | 10.210 | 2112.857 | 1706.392 |
| 50 | 0.5 | Iter | 2.140 | 9.760 | 9.530 | 11.800 | 9.690 |
| 50 | 0.5 | $\ell_{\max}$ | −1124.354 | −1140.195 | −1124.911 | −1079.401 | −1258.644 |
| 50 | 0.5 | RB | 0.004 | −0.009 | 0.004 | 0.046 | −0.098 |
| 50 | 0.5 | RMSE | 0.277 | 0.324 | 0.270 | 0.313 | 0.315 |
| 50 | 0.9 | Time | 8.185 | 34.382 | 6.666 | 1512.85 | 1091.661 |
| 50 | 0.9 | Iter | 2.000 | 6.070 | 6.210 | 7.320 | 6.850 |
| 50 | 0.9 | $\ell_{\max}$ | −933.662 | −943.973 | −934.566 | −843.025 | −1069.55 |
| 50 | 0.9 | RB | 0.005 | −0.006 | 0.004 | 0.113 | −0.116 |
| 50 | 0.9 | RMSE | 0.226 | 0.229 | 0.226 | 0.237 | 0.234 |
When assessing the approximated log-likelihood functions, we find that all approximation methods produce relative biases in the log-likelihood within ±0.12, a fairly narrow range. Because the simulated datasets are generated from a linear scenario, i.e., the bivariate LMM specified in Equation (33), the pseudo-data model given in Equation (8) certainly satisfies the MLMM [1] framework. Therefore, the ML estimates of the model parameters, as well as the maximized log-likelihood value obtained by the pseudo-ECM algorithm, are exactly the same as those obtained by fitting the MLMM using the EM-based algorithm. Besides, the PNLS-MLME method uses the same approximation of the log-likelihood function, $\ell_{PD}(\hat\theta|y)$, as pseudo-ECM. Thus, the relative biases in the log-likelihood obtained by the PNLS-MLME and pseudo-ECM algorithms are quite similar, and both are very close to zero. Additionally, the Laplacian approximation gives near-zero but slightly under-estimated log-likelihoods, and the relative biases are negligible when the sample size and between-outcome correlation are large. The log-likelihood values can be slightly over-estimated by the MCEM method and slightly under-estimated by the ISEM method. As anticipated, the approximations of the log-likelihood function get closer to the exact value as the sample size increases.
Figure 3. Scatter plots of fixed-effects estimates for the PNLS-MLME, Laplacian, MCEM and ISEM methods against the pseudo-ECM method for the multivariate nonlinear mixed-effects model (MNLMM) under the case of $N = 25$, $\rho = 0.9$.
Figure 4. Scatter plots of variance-covariance component estimates for the PNLS-MLME, Laplacian, MCEM and ISEM methods against the pseudo-ECM method for the MNLMM under the case of $N = 25$, $\rho = 0.9$.
We now turn our attention to the estimation performance for the model parameters under the five computational methods. From the RMSE rows of Table 3, the five methods typically give comparable estimation accuracy, with negligible differences in RMSE scores. The RMSE decreases as the sample size increases, confirming the good asymptotic properties of the ML estimators, at least for the parameter settings used in this simulation. As mentioned above, the pseudo-ECM method implemented for linear models produces the same results as the EM-type algorithm for the MLMM. Judging from Table 3, the pseudo-ECM method has the smallest RMSE among the five computational methods. Furthermore, we compare the estimates of each parameter obtained by PNLS-MLME, Laplacian, MCEM and ISEM against those obtained by pseudo-ECM one by one in detail. Figure 3 and Figure 4 display the scatter plots of the estimates of the fixed effects (β) and the variance-covariance components ($D$ and $\Sigma$) separately for the pseudo-ECM method (on the X-axes) versus the other four procedures (on the Y-axes). The dashed lines indicate the true parameter values. To save space, we present only the case of $N = 25$ and $\rho = 0.9$, because the other five cases exhibit a similar pattern. It can be seen from the two figures that the estimates are all located in the neighborhood of the true values, indicating that all five computational procedures yield very precise estimates of the model parameters. In general, there is strong agreement among the estimates obtained by the five methods, because the point estimates fall close to the 45-degree line. However, for the estimate of $\beta_4$, PNLS-MLME appears to show slightly larger variability. For the estimates of $\sigma_{11}$, $\sigma_{12}$ and $\sigma_{22}$, the other four methods tend to give smaller estimates than does the pseudo-ECM algorithm.

4.2. Bivariate Nonlinear Case

In this simulation, the data were generated from the MNLMM with the nonlinear mean curves of Equation (31). The presumed model parameters are:
$$\beta = (12, -2.7, 1.3, 16.9, -1.7, 1.3)^T,\quad D = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 4 \end{bmatrix},\quad \Sigma = \begin{bmatrix} 0.5 & -0.2 \\ -0.2 & 5 \end{bmatrix},\quad C_i = I_{10}.$$
Each simulated dataset is fitted by the MNLMM using the five approximation methods described in Section 2. To investigate the effect of the MC sample size for MCEM and of the mixing proportion of the envelope distribution for ISEM, we consider MC sample sizes $M = 500, 1000, 2000$ and mixing proportions $P_0 = 0.1, 0.5, 0.9$. A total of 100 replications are run for each of the sample sizes $N = 25$ and 50 across the nine resulting computational procedures. The convergence rule is the same as in the previous simulation. Note that numerical double integration is performed to calculate the exact log-likelihood, so that the accuracy of the approximate log-likelihood can be assessed.
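The role of the mixing proportion $P_0$ in an ISEM-style E step can be illustrated with a one-dimensional self-normalized importance sampler whose proposal is a two-component normal mixture with weight $P_0$. The component means and scales below are illustrative choices, not the paper's specification:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def is_expectation(h, logtarget, m=4000, p0=0.5):
    """Self-normalized importance-sampling estimate of E_target[h(b)]
    using a two-component normal mixture envelope with weights p0, 1-p0
    (the N(0,1) and N(0,3) components are illustrative)."""
    comp = rng.random(m) < p0
    b = np.where(comp, rng.normal(0.0, 1.0, m), rng.normal(0.0, 3.0, m))
    log_env = np.log(p0 * norm.pdf(b, 0.0, 1.0)
                     + (1.0 - p0) * norm.pdf(b, 0.0, 3.0))
    w = np.exp(logtarget(b) - log_env)   # unnormalized importance weights
    w /= w.sum()                         # self-normalize
    return float(np.sum(w * h(b)))

# sanity check: target N(1, 0.5^2), so E[b] should be close to 1
est = is_expectation(lambda b: b, lambda b: norm.logpdf(b, 1.0, 0.5))
```

Shifting weight between a concentrated and a diffuse component trades off weight variance against robustness, which mirrors why the choice of $P_0$ affects both accuracy and speed in the simulations.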
In this simulation study, there are 18 (10) and 12 (7) non-convergence cases out of 100 trials for the PNLS-MLME and Laplacian methods, respectively, under sample size $N = 25$ (50). To ensure that we compare the estimates of the different methods on the same simulated data and initial values, a replacement dataset is generated whenever one of the methods fails to converge for a particular dataset. This can be done by using the R try() function to handle the error recovery. Table 4 reports the computing results, including the averages of CPU time (Time), numbers of iterations (Iter), converged log-likelihood values ($\ell_{\max}$), RB of the log-likelihood and empirical sums of the RMSE of the parameter estimates for each sample size and each algorithm. The results indicate that pseudo-ECM spent the least CPU time, followed by PNLS-MLME, Laplacian, ISEM with $P_0 = 0.1, 0.5$, MCEM with $M = 500, 1000, 2000$ and then ISEM with $P_0 = 0.9$. PNLS-MLME demands the fewest iterations, followed by Laplacian, pseudo-ECM, ISEM with $P_0 = 0.1, 0.5$, MCEM with $M = 2000, 1000, 500$ and then ISEM with $P_0 = 0.9$. The performance of the five methods under the bivariate nonlinear model is conceptually similar to that under the bivariate linear model in Section 4.1. It makes sense that the consumed CPU time increases with the MC sample size $M$ for MCEM, while the required number of iterations decreases with $M$. Moreover, for the ISEM method, when the proportion of importance samples of random effects drawn from the posterior of $b_i$ increases (i.e., as $P_0$ decreases), both the CPU time and the number of iterations decrease.
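The regenerate-on-failure bookkeeping, done with try() in R, can be mimicked as follows; `fit_or_fail` and its non-convergence trigger are stand-ins for the actual estimation routines, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_or_fail(data):
    """Stand-in for a fitting routine that may fail to converge; a real run
    would call one of the five estimation procedures here."""
    if np.std(data) < 0.5:              # illustrative non-convergence trigger
        raise RuntimeError("did not converge")
    return float(np.mean(data))

def fit_with_regeneration(generate, max_tries=50):
    """Mimic the R try() guard: if a fit fails on a simulated dataset,
    discard that dataset and regenerate a new one."""
    for _ in range(max_tries):
        data = generate()
        try:
            return fit_or_fail(data)
        except RuntimeError:
            continue                    # regenerate and refit
    raise RuntimeError("no convergent fit in max_tries attempts")

est = fit_with_regeneration(lambda: rng.normal(0.0, 1.0, 30))
```

This keeps all methods evaluated on the same set of convergent datasets, which is the point of the regeneration step in the text.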
It can be seen from the RB column of Table 4 that all methods except the three ISEM procedures provide comparable accuracy for the approximate observed log-likelihood values, while the ISEM method tends to incur a relatively large bias. Observing the empirical sums of RMSE, the PNLS-MLME and pseudo-ECM methods yield the most accurate parameter estimates for $N = 25$ and $N = 50$, respectively, while the others show minor differences in RMSE scores. The MCEM method generally offers better precision of the parameter estimates as the number of generated MC samples increases. Although MCEM spent much CPU time and required more iterations to achieve convergence, it can produce a relatively small bias in the approximation of the observed log-likelihood and a smaller RMSE for the parameter estimates, especially for the larger sample size $N = 50$ and MC sample size $M = 2000$. Additionally, among the three settings of $P_0$ for ISEM, the equal-weight case ($P_0 = 0.5$) gives smaller RB and RMSE scores. To obtain more accurate approximate log-likelihoods with the ISEM algorithm, a larger number of samples of random effects would probably be necessary, but this seems inefficient. As expected, when the sample size $N$ increases, the required CPU time and number of iterations increase, while the RB and RMSE decrease, confirming the large-sample properties of ML estimation. In addition, the RMSE ($\times 10^2$) for the estimates of each parameter under the nine considered estimating procedures are listed in Table 5. The estimators of $\beta_5$, $\beta_6$, $d_{11}$, $d_{21}$, $d_{22}$ and $\sigma_{21}$ appear somewhat less precise than those of the other parameters in this simulation setting. Observing the table, there are remarkable differences in the magnitudes of the RMSE values, as the precision of the parameter estimates depends heavily on the specification of the nonlinear mean functions.
Moreover, there are no consistent rankings of precision among the nine considered procedures for each parameter. Although this is a limited study, it demonstrates that all five approximation methods can give reasonable results for parameter estimation.
Table 4. Simulation results for nine estimating procedures under the bivariate nonlinear case.
| N | Method | Time | Iter | $\ell_{\max}$ | RB | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| 25 | PNLS-MLME | 5.071 | 3.533 | −847.968 | 0.009 | 1.671 |
| 25 | Laplacian | 21.199 | 7.133 | −860.383 | −0.012 | 2.000 |
| 25 | Pseudo-ECM | 2.709 | 12.000 | −847.994 | 0.009 | 1.967 |
| 25 | MCEM ($M=500$) | 9062.743 | 380.000 | −847.217 | 0.010 | 2.099 |
| 25 | MCEM ($M=1000$) | 9569.619 | 213.733 | −847.346 | 0.010 | 2.072 |
| 25 | MCEM ($M=2000$) | 11,375.297 | 131.400 | −847.896 | 0.009 | 2.029 |
| 25 | ISEM ($P_0=0.9$) | 17,008.449 | 333.733 | −887.996 | −0.028 | 1.999 |
| 25 | ISEM ($P_0=0.5$) | 4635.601 | 93.400 | −881.169 | −0.018 | 1.882 |
| 25 | ISEM ($P_0=0.1$) | 1086.651 | 22.200 | −862.842 | −0.020 | 2.077 |
| 50 | PNLS-MLME | 14.149 | 3.940 | −1710.123 | 0.007 | 1.119 |
| 50 | Laplacian | 53.066 | 7.690 | −1763.046 | −0.010 | 1.134 |
| 50 | Pseudo-ECM | 11.331 | 13.070 | −1710.216 | 0.007 | 1.110 |
| 50 | MCEM ($M=500$) | 15,860.866 | 392.595 | −1713.939 | 0.005 | 1.184 |
| 50 | MCEM ($M=1000$) | 24,077.335 | 238.470 | −1714.151 | 0.005 | 1.157 |
| 50 | MCEM ($M=2000$) | 26,328.930 | 134.750 | −1714.447 | 0.004 | 1.151 |
| 50 | ISEM ($P_0=0.9$) | 31,224.663 | 386.120 | −1789.168 | −0.021 | 1.255 |
| 50 | ISEM ($P_0=0.5$) | 7065.363 | 106.350 | −1780.396 | −0.015 | 1.138 |
| 50 | ISEM ($P_0=0.1$) | 2805.677 | 26.870 | −1779.298 | −0.018 | 1.153 |
Table 5. Relative mean squared errors ( × 10 2 ) for the estimates of model parameters under nine iterative procedures.
| N | Method | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | $\beta_5$ | $\beta_6$ | $d_{11}$ | $d_{21}$ | $d_{22}$ | $\sigma_{11}$ | $\sigma_{21}$ | $\sigma_{22}$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 25 | PNLS-MLME | 0.046 | 0.960 | 0.046 | 0.041 | 18.505 | 11.559 | 21.698 | 87.774 | 4.565 | 0.998 | 20.003 | 0.909 |
| 25 | Laplacian | 0.045 | 0.965 | 0.046 | 0.033 | 18.600 | 11.766 | 20.875 | 120.340 | 4.609 | 1.412 | 20.013 | 1.325 |
| 25 | Pseudo-ECM | 0.043 | 0.964 | 0.046 | 0.026 | 18.501 | 11.570 | 20.066 | 118.786 | 4.759 | 0.988 | 20.010 | 0.909 |
| 25 | MCEM ($M=500$) | 0.046 | 0.956 | 0.045 | 0.039 | 18.736 | 11.781 | 20.926 | 130.477 | 4.549 | 1.389 | 19.682 | 1.299 |
| 25 | MCEM ($M=1000$) | 0.047 | 0.969 | 0.045 | 0.038 | 18.589 | 11.668 | 21.160 | 127.668 | 4.797 | 1.383 | 19.559 | 1.314 |
| 25 | MCEM ($M=2000$) | 0.047 | 0.970 | 0.046 | 0.036 | 18.602 | 11.669 | 20.402 | 123.817 | 4.606 | 1.404 | 19.935 | 1.315 |
| 25 | ISEM ($P_0=0.9$) | 0.046 | 0.960 | 0.046 | 0.028 | 18.590 | 11.666 | 20.510 | 120.740 | 4.609 | 1.400 | 20.013 | 1.315 |
| 25 | ISEM ($P_0=0.5$) | 0.047 | 0.969 | 0.046 | 0.040 | 18.420 | 11.476 | 19.930 | 110.919 | 4.000 | 1.463 | 19.619 | 1.271 |
| 25 | ISEM ($P_0=0.1$) | 0.043 | 0.993 | 0.045 | 0.021 | 18.785 | 11.899 | 26.451 | 122.587 | 4.407 | 1.631 | 19.474 | 1.377 |
| 50 | PNLS-MLME | 0.053 | 0.433 | 0.019 | 0.083 | 8.038 | 6.437 | 9.609 | 62.998 | 3.126 | 0.355 | 20.445 | 0.290 |
| 50 | Laplacian | 0.054 | 0.433 | 0.019 | 0.040 | 8.056 | 6.501 | 10.172 | 63.165 | 3.121 | 0.787 | 20.025 | 1.051 |
| 50 | Pseudo-ECM | 0.053 | 0.432 | 0.019 | 0.043 | 8.055 | 6.452 | 8.858 | 62.921 | 3.013 | 0.355 | 20.493 | 0.289 |
| 50 | MCEM ($M=500$) | 0.052 | 0.420 | 0.019 | 0.087 | 8.117 | 6.505 | 10.334 | 67.836 | 3.055 | 0.875 | 20.054 | 1.033 |
| 50 | MCEM ($M=1000$) | 0.054 | 0.420 | 0.019 | 0.085 | 8.149 | 6.508 | 10.099 | 65.263 | 3.113 | 0.881 | 20.003 | 1.063 |
| 50 | MCEM ($M=2000$) | 0.054 | 0.418 | 0.019 | 0.075 | 8.120 | 6.494 | 10.185 | 64.350 | 3.120 | 0.892 | 20.274 | 1.070 |
| 50 | ISEM ($P_0=0.9$) | 0.059 | 0.415 | 0.019 | 0.080 | 8.034 | 6.429 | 18.313 | 67.054 | 3.142 | 0.924 | 20.045 | 1.011 |
| 50 | ISEM ($P_0=0.5$) | 0.055 | 0.429 | 0.019 | 0.053 | 8.131 | 6.508 | 9.040 | 64.614 | 3.143 | 0.861 | 19.819 | 1.080 |
| 50 | ISEM ($P_0=0.1$) | 0.052 | 0.431 | 0.019 | 0.035 | 8.194 | 6.554 | 10.182 | 64.165 | 3.011 | 0.987 | 20.542 | 1.147 |

5. Discussion and Conclusions

In this article, we describe and compare five approximation methods for carrying out ML estimation of the MNLMM, as well as the evaluation of the observed log-likelihood function. The methods, namely the PNLS-MLME, Laplacian approximation, pseudo-ECM, MCEM and ISEM algorithms, rely on first- or second-order Taylor expansions. The PNLS-MLME and pseudo-ECM methods use a linearization of the nonlinear mean functions, while the other three methods rely on an approximation of the observed likelihood. Numerical results indicate that the five methods give comparable accuracy in the estimation of the model parameters, as well as in the approximation of the observed log-likelihood function of the MNLMM.
In summary, the five algorithmic schemes preserve flexibility and simplicity in carrying out ML estimation of the MNLMM. The pseudo-ECM method offers relatively better efficiency than the other four methods. For the PNLS-MLME and Laplacian methods, a poor initial guess of θ can result in poor estimates of $\{b_i\}_{i=1}^N$, and thereby the accuracy of the parameter estimates and the convergence behavior deteriorate. To overcome this weakness, we recommend trying different starting values for $\hat{D}^{(0)}$, specified as $c\hat{D}^{(0)}$, where $c$ is a random draw from the standard normal distribution and the original $\hat{D}^{(0)}$ is given in Section 2.7. The MCEM and ISEM methods appear to be less efficient, because both spend much time generating MC samples to evaluate the required conditional expectations in each iteration. For the implementation of the ISEM algorithm, the specification of the mixing proportion $P_0$ depends on the data at hand. We suggest trying a variety of settings and choosing the $P_0$ corresponding to the maximized approximate observed log-likelihood. An R package for fitting the MNLMM based on the proposed techniques will be released in the near future.
However, the multivariate normality assumption in the MNLMM might not provide robust inference if the data, even after transformation, exhibit fat tails and/or skewness [48,49,50]. To alleviate such limitations, it is natural to replace the multivariate normally-distributed random effects and within-subject errors of the MNLMM by a broader family, such as the multivariate skew-normal distribution [51], the multivariate skew-t distribution [52], the multivariate skew-elliptical distribution [53], or the multivariate skew-normal independent distribution [54,55]. The proposed methods are readily extendable to carry out ML estimation of the multivariate versions of skew-family nonlinear mixed models. This leads to valuable further research on developing multivariate skew-family nonlinear mixed models together with their ML inference.

Acknowledgments

The author would like to express her deepest gratitude to the Chief Editor, the Associate Editor and two anonymous reviewers for their insightful comments and suggestions that greatly improved this article. This work was partially supported by the Ministry of Science and Technology under Grant No. MOST 103-2118-M-035-001-MY2 of Taiwan.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

A. Score Vector and Hessian Matrix

The score vector $S_\alpha$, calculated as the first derivative of $\ell_{PD}(\theta|y)$ in Equation (9) with respect to each entry of α, can be expressed as:
$$[S_\alpha]_l = \frac{1}{2}\sum_{i=1}^N\left\{\big(\tilde{y}_i^{(h)}-\tilde{X}_i^{(h)}\beta\big)^T\tilde{\Lambda}_i^{(h)-1}\dot{\tilde{\Lambda}}_{il}^{(h)}\tilde{\Lambda}_i^{(h)-1}\big(\tilde{y}_i^{(h)}-\tilde{X}_i^{(h)}\beta\big) - \mathrm{tr}\big(\tilde{\Lambda}_i^{(h)-1}\dot{\tilde{\Lambda}}_{il}^{(h)}\big)\right\},$$
for $l = 1,\ldots,g$, with $g = q(q+1)/2 + r(r+1)/2 + \dim(\phi)$, where $\tilde{\Lambda}_i^{(h)} = \tilde{Z}_i^{(h)} D \tilde{Z}_i^{(h)T} + \Sigma \otimes C_i(\phi)$ and
$$\dot{\tilde{\Lambda}}_{il}^{(h)} = \frac{\partial\tilde{\Lambda}_i^{(h)}}{\partial w_l} = \begin{cases} \tilde{Z}_i^{(h)}\dfrac{\partial D}{\partial w_l}\tilde{Z}_i^{(h)T} & \text{if } w_l \in \mathrm{vech}(D),\\[4pt] \dfrac{\partial\Sigma}{\partial w_l}\otimes C_i(\phi) & \text{if } w_l \in \mathrm{vech}(\Sigma),\\[4pt] \Sigma\otimes\dfrac{\partial C_i(\phi)}{\partial w_l} & \text{if } w_l \in \phi. \end{cases}$$
Here, $\partial D/\partial w_l$ has ones in the $(j,l)$-th and $(l,j)$-th elements of $D$ when $w_l = d_{jl}$, a distinct element of $D$, and zeros elsewhere; $\partial\Sigma/\partial w_l$ is defined analogously when $w_l = \sigma_{jl}$. Besides, the Hessian matrix, calculated as the second derivative of $\ell_{PD}(\theta|y)$ with respect to each entry of α, is:
$$[\mathbf{H}_{\boldsymbol{\alpha}\boldsymbol{\alpha}}]_{lu}=\frac{1}{2}\sum_{i=1}^{N}\Big\{\operatorname{tr}\Big[\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\big(\dot{\tilde{\boldsymbol{\Lambda}}}_{iu}^{(h)}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}-\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}\big)\Big]+\operatorname{tr}\Big[\big(\tilde{\boldsymbol{y}}_i^{(h)}-\tilde{\boldsymbol{X}}_i^{(h)}\boldsymbol{\beta}\big)\big(\tilde{\boldsymbol{y}}_i^{(h)}-\tilde{\boldsymbol{X}}_i^{(h)}\boldsymbol{\beta}\big)^{\top}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\big(\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}-2\,\dot{\tilde{\boldsymbol{\Lambda}}}_{iu}^{(h)}\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}\big)\tilde{\boldsymbol{\Lambda}}_i^{(h)-1}\Big]\Big\},$$
where:
$$\ddot{\tilde{\boldsymbol{\Lambda}}}_{ilu}^{(h)}=\frac{\partial\dot{\tilde{\boldsymbol{\Lambda}}}_{il}^{(h)}}{\partial w_u}=\begin{cases}\dfrac{\partial\boldsymbol{\Sigma}}{\partial w_l}\otimes\dfrac{\partial\boldsymbol{C}_i(\boldsymbol{\phi})}{\partial w_u}&\text{if }w_l\in\operatorname{vech}(\boldsymbol{\Sigma})\text{ and }w_u\in\boldsymbol{\phi},\\[4pt]\boldsymbol{0}&\text{otherwise.}\end{cases}$$
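As a quick numerical sanity check on these expressions, the score and Hessian with respect to a single variance parameter $w$ can be compared against finite differences of the marginal Gaussian log-likelihood. The sketch below uses a simplified one-subject covariance $\Lambda(w)=\boldsymbol{Z}\boldsymbol{D}(w)\boldsymbol{Z}^{\top}+\boldsymbol{I}$ rather than the paper's full Kronecker error structure; all names and dimensions are illustrative, not taken from the paper.

```python
import numpy as np

# Toy setup: one subject with n observations, q random effects, and a
# single free variance parameter w (the (1,1) entry of D).
rng = np.random.default_rng(1)
n, q = 6, 2
Z = rng.standard_normal((n, q))
r = rng.standard_normal(n)          # residual y - X beta for one subject
Ddot = np.diag([1.0, 0.0])          # dD/dw: one in the varied entry, zeros elsewhere

def Lam(w):
    return Z @ np.diag([w, 0.5]) @ Z.T + np.eye(n)

def loglik(w):
    # Gaussian log-likelihood up to an additive constant
    _, logdet = np.linalg.slogdet(Lam(w))
    return -0.5 * (logdet + r @ np.linalg.solve(Lam(w), r))

def score(w):
    # [S]_l = (1/2) [ r' L^{-1} Ldot L^{-1} r - tr(L^{-1} Ldot) ]
    L, Ldot = Lam(w), Z @ Ddot @ Z.T
    Lr = np.linalg.solve(L, r)
    return 0.5 * (Lr @ Ldot @ Lr - np.trace(np.linalg.solve(L, Ldot)))

def hessian(w):
    # [H]_lu with l = u; the second-derivative term Lddot vanishes here
    # because Lambda is linear in w.
    L, Ldot = Lam(w), Z @ Ddot @ Z.T
    A = np.linalg.solve(L, Ldot)    # L^{-1} Ldot
    Lr = np.linalg.solve(L, r)
    return 0.5 * np.trace(A @ A) - Lr @ Ldot @ np.linalg.solve(L, Ldot @ Lr)

w0, eps = 1.3, 1e-6
fd_score = (loglik(w0 + eps) - loglik(w0 - eps)) / (2 * eps)
fd_hess = (score(w0 + eps) - score(w0 - eps)) / (2 * eps)
print(abs(score(w0) - fd_score), abs(hessian(w0) - fd_hess))
```

Both analytic quantities should agree with their central-difference counterparts to within finite-difference error, which provides an easy regression test when implementing the full multi-parameter versions.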

References

  1. Shah, A.; Laird, N.; Schoenfeld, D. A Random-Effects Model for Multiple Characteristics with Possibly Missing Data. J. Am. Stat. Assoc. 1997, 92, 775–779. [Google Scholar] [CrossRef]
  2. Marshall, G.; de la Cruz-Mesía, R.; Barón, A.E.; Rutledge, J.H.; Zerbe, G.O. Non-linear Random Effects Model for Multivariate Responses with Missing Data. Statist. Med. 2006, 25, 2817–2830. [Google Scholar] [CrossRef] [PubMed]
  3. Sammel, M.; Lin, X.; Ryan, L. Multivariate Linear Mixed Models for Multiple Outcomes. Statist. Med. 1999, 18, 2479–2492. [Google Scholar] [CrossRef]
  4. Song, X.; Davidian, M.; Tsiatis, A.A. An Estimator for the Proportional Hazards Model with Multiple Longitudinal Covariates Measured with Error. Biostatistics 2002, 3, 511–528. [Google Scholar] [CrossRef] [PubMed]
  5. Roy, J.; Lin, X. Analysis of Multivariate Longitudinal Outcomes with Nonignorable Dropouts and Missing Covariates: Changes in Methadone Treatment Practices. J. Am. Stat. Assoc. 2002, 97, 40–52. [Google Scholar] [CrossRef]
  6. Roy, A. Estimating Correlation Coefficient between Two Variables with Repeated Observations Using Mixed Effects Model. Biom. J. 2006, 48, 286–301. [Google Scholar] [CrossRef] [PubMed]
  7. Wang, W.L.; Fan, T.H. ECM-Based Maximum Likelihood Inference for Multivariate Linear Mixed Models with Autoregressive Errors. Comput. Stat. Data Anal. 2010, 54, 1328–1341. [Google Scholar] [CrossRef]
  8. Lindstrom, M.J.; Bates, D.M. Nonlinear Mixed Effects Models for Repeated Measures Data. Biometrics 1990, 46, 673–687. [Google Scholar] [CrossRef] [PubMed]
  9. Davidian, M.; Giltinan, D.M. Nonlinear Models for Repeated Measurements Data; Chapman & Hall: London, UK, 1995. [Google Scholar]
  10. Pinheiro, J.C.; Bates, D.M. Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Effects Model. J. Comput. Graph. Stat. 1995, 4, 12–35. [Google Scholar]
  11. Pinheiro, J.C.; Bates, D.M. Mixed-Effects Models in S and S-PLUS; Springer: Berlin, Germany, 2000. [Google Scholar]
  12. Pinheiro, J.; Bates, D.; DebRoy, S.; Sarkar, D.; R Core Team. nlme: Linear and Nonlinear Mixed Effects Models, R package version 3.1-104; Available online: http://CRAN.R-project.org/package=nlme (accessed on 24 July 2015).
  13. Dey, D.K.; Chen, M.H.; Chang, H. Bayesian Approach for Nonlinear Random Effects Models. Biometrics 1997, 53, 1239–1252. [Google Scholar] [CrossRef]
  14. Huang, Y.; Liu, D.; Wu, H. Hierarchical Bayesian Methods for Estimation of Parameters in a Longitudinal HIV Dynamic System. Biometrics 2006, 62, 413–423. [Google Scholar] [CrossRef] [PubMed]
  15. Lachos, V.H.; Castro, L.M.; Dey, D.K. Bayesian Inference in Nonlinear Mixed-Effects Models Using Normal Independent Distributions. Comput. Stat. Data Anal. 2013, 64, 237–252. [Google Scholar] [CrossRef]
  16. Wolfinger, R.D.; Lin, X. Two Taylor-Series Approximation Methods for Nonlinear Mixed Models. Comput. Stat. Data Anal. 1997, 25, 465–490. [Google Scholar] [CrossRef]
  17. Ge, Z.; Bickel, J.P.; Rice, A.J. An Approximate Likelihood Approach to Nonlinear Mixed Effects Models via Spline Approximation. Comput. Stat. Data Anal. 2004, 46, 747–776. [Google Scholar] [CrossRef]
  18. Walker, S.G. An EM Algorithm for Nonlinear Random Effects Models. Biometrics 1996, 52, 934–944. [Google Scholar] [CrossRef]
  19. Wang, J. EM Algorithms for Nonlinear Mixed Effects Models. Comput. Stat. Data Anal. 2007, 51, 3244–3256. [Google Scholar] [CrossRef]
  20. Vonesh, E.F.; Wang, H.; Nie, L.; Majumdar, D. Conditional Second-order Generalized Estimating Equations for Generalized Linear and Nonlinear Mixed-Effects Models. J. Am. Stat. Assoc. 2002, 97, 271–283. [Google Scholar] [CrossRef]
  21. Vonesh, E.F. Non-linear Models for the Analysis of Longitudinal Data. Stat. Med. 1992, 11, 1929–1954. [Google Scholar] [CrossRef] [PubMed]
  22. Beal, S.; Sheiner, L. The NONMEM System. Am. Stat. 1980, 34, 118–119. [Google Scholar] [CrossRef]
  23. Wolfinger, R.D. Comment: Experiences with the SAS Macro NLINMIX. Stat. Med. 1997, 16, 1258–1259. [Google Scholar]
  24. Wolfinger, R.D. Fitting Nonlinear Mixed Models with the New NLMIXED Procedure. In Proceedings of the 99 Joint Statistical Meetings, Miami Beach, FL, USA, 11–14 April 1999.
  25. Kuhn, E.; Lavielle, M. Maximum Likelihood Estimation in Nonlinear Mixed Effects Models. Comput. Stat. Data Anal. 2005, 49, 1020–1038. [Google Scholar] [CrossRef]
  26. Lavielle, M. MONOLIX (MOdelès NOn LInéaires à effets miXtes); MONOLIX Group: Orsay, France, 2008. [Google Scholar]
  27. Beal, S.; Sheiner, L.; Boeckmann, A.; Bauer, R. NONMEM User's Guides (1989–2009); Icon Development Solutions: Ellicott City, MD, USA, 2009. [Google Scholar]
  28. Comets, E.; Lavenu, A.; Lavielle, M. Saemix: Stochastic Approximation Expectation Maximization (SAEM) Algorithm. R package version 1. 2011. [Google Scholar]
  29. Wang, W.L.; Fan, T.H. Estimation in Multivariate t Linear Mixed Models for Multiple Longitudinal Data. Statist. Sinica 2011, 21, 1857–1880. [Google Scholar] [CrossRef]
  30. Wang, W.L.; Fan, T.H. Bayesian Analysis of Multivariate t Linear Mixed Models Using a Combination of IBF and Gibbs Samplers. J. Multivar. Anal. 2012, 105, 300–310. [Google Scholar] [CrossRef]
  31. Wang, W.L. Multivariate t Linear Mixed Models for Irregularly Observed Multiple Repeated Measures with Missing Outcomes. Biom. J. 2013, 55, 554–571. [Google Scholar] [CrossRef] [PubMed]
  32. Tierney, L.; Kadane, J.B. Accurate Approximations for Posterior Moments and Densities. J. Am. Stat. Assoc. 1986, 81, 82–86. [Google Scholar] [CrossRef]
  33. Meng, X.L.; Rubin, D.B. Maximum Likelihood Estimation via the ECM Algorithm: A General Framework. Biometrika 1993, 80, 267–278. [Google Scholar] [CrossRef]
  34. Booth, G.J.; Hobert, P.J. Maximizing Generalized Linear Mixed Model Likelihoods with an Automated Monte Carlo EM Algorithm. J. R. Stat. Soc. Ser. B 1999, 61, 265–285. [Google Scholar] [CrossRef]
  35. Lai, T.L.; Shih, M.C. A Hybrid Estimator in Nonlinear and Generalized Linear Mixed Effects Models. Biometrika 2006, 90, 791–795. [Google Scholar]
  36. Leonard, T.; Hsu, J.S.J.; Tsui, K.W. Bayesian Marginal Inference. J. Am. Stat. Assoc. 1989, 84, 1051–1058. [Google Scholar] [CrossRef]
  37. Bates, D.M.; Watts, D.G. Relative Curvature Measures of Nonlinearity. J. R. Stat. Soc. Ser. B 1980, 42, 1–25. [Google Scholar]
  38. R Development Core Team. R. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2012. [Google Scholar]
  39. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm (with Discussion). J. R. Stat. Soc. Ser. B 1977, 39, 1–38. [Google Scholar]
  40. Wei, G.C.G.; Tanner, M.A. A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms. J. Am. Stat. Assoc. 1990, 85, 699–704. [Google Scholar] [CrossRef]
  41. Hastings, W.K. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 1970, 57, 97–109. [Google Scholar] [CrossRef]
  42. Lederman, M.M.; Connick, E.; Landay, A.; Kuritzkes, D.R.; Spritzler, J.; Clair, M.S.; Kotzin, B.L.; Fox, L.; Chiozzi, M.H.; Leonard, J.M.; et al. Immunologic Responses Associated with 12 Weeks of Combination Antiretroviral Therapy Consisting of Zidovudine, Lamivudine, and Ritonavir: Results of AIDS Clinical Trials Group Protocol 315. J. Infect. Dis. 1998, 178, 70–79. [Google Scholar] [CrossRef] [PubMed]
  43. Connick, E.; Lederman, M.M.; Kotzin, B.L.; Spritzler, J.; Kuritzkes, D.R.; Clair, M.S.; Sevin, A.D.; Fox, L.; Chiozzi, M.H.; Leonard, J.M.; et al. Immune Reconstitution in the First Year of Potent Antiretroviral Therapy and Its Relationship to Virologic Response. J. Infect. Dis. 2000, 181, 358–363. [Google Scholar] [CrossRef] [PubMed]
  44. Wu, H.; Ding, A. Population HIV-1 Dynamics in Vivo: Applicable Models and Inferential Tools for Virological Data from AIDS Clinical Trials. Biometrics 1999, 55, 410–418. [Google Scholar] [CrossRef] [PubMed]
  45. Liang, H.; Wu, H.; Carroll, R.J. The Relationship between Virologic Responses in AIDS Clinical Research Using Mixed-Effects Varying-Coefficient Models with Measurement Error. Biostatistics 2003, 4, 297–312. [Google Scholar] [CrossRef] [PubMed]
  46. Wu, H.; Liang, H. Backfitting Random Varying-Coefficient Models with Time-dependent Smoothing Covariates. Scand. J. Stat. 2004, 31, 3–19. [Google Scholar] [CrossRef]
  47. Lin, T.I.; Wang, W.L. Multivariate Skew-Normal Linear Mixed Models for Multi-outcome Longitudinal Data. Stat. Model. 2013, 13, 199–221. [Google Scholar] [CrossRef]
  48. Lin, T.I.; Lee, J.C. A Robust Approach to t Linear Mixed Models Applied to Multiple Sclerosis Data. Statist. Med. 2006, 25, 1397–1412. [Google Scholar] [CrossRef] [PubMed]
  49. Lin, T.I.; Lee, J.C. Bayesian Analysis of Hierarchical Linear Mixed Modeling Using Multivariate t Distributions. J. Statist. Plan. Inf. 2007, 137, 484–495. [Google Scholar] [CrossRef]
  50. Lin, T.I.; Lee, J.C. Estimation and Prediction in Linear Mixed Models with Skew Normal Random Effects for Longitudinal Data. Statist. Med. 2008, 27, 1490–1507. [Google Scholar] [CrossRef] [PubMed]
  51. Arellano-Valle, R.B.; Genton, M. On Fundamental Skew Distributions. J. Multivar. Anal. 2005, 96, 93–116. [Google Scholar] [CrossRef]
  52. Azzalini, A.; Capitanio, A. Distributions Generated by Perturbation of Symmetry with Emphasis on a Multivariate Skew t-Distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389. [Google Scholar] [CrossRef]
  53. Branco, M.; Dey, D. A General Class of Multivariate Skew-Elliptical Distribution. J. Multivar. Anal. 2001, 79, 93–113. [Google Scholar] [CrossRef]
  54. Bandyopadhyay, D.; Lachos, V.H.; Abanto-Valle, C.A.; Ghosh, P. Linear Mixed Models for Skew-Normal/Independent Bivariate Responses with an Application to Periodontal Disease. Statist. Med. 2010, 29, 2643–2655. [Google Scholar] [CrossRef] [PubMed]
  55. Bandyopadhyay, D.; Castro, L.M.; Lachos, V.H.; Pinheiro, H.P. Robust Joint Non-linear Mixed-Effects Models and Diagnostics for Censored HIV Viral Loads with CD4 Measurement Error. J. Agr. Biol. Environ. Stat. 2015, 20, 121–139. [Google Scholar] [CrossRef]

Share and Cite

MDPI and ACS Style

Wang, W.-L. Approximate Methods for Maximum Likelihood Estimation of Multivariate Nonlinear Mixed-Effects Models. Entropy 2015, 17, 5353-5381. https://doi.org/10.3390/e17085353

