Article

A Two-Step Method of Estimation for Non-Linear Mixed-Effects Models

1 School of Mathematics, Shandong University, Jinan 250100, China
2 Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Qingdao 266237, China
3 Department of Statistics, University of California, Davis, CA 95616, USA
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4547; https://doi.org/10.3390/math10234547
Submission received: 11 October 2022 / Revised: 23 November 2022 / Accepted: 28 November 2022 / Published: 1 December 2022

Abstract

The main goal of this paper is to propose a two-step method for the estimation of parameters in non-linear mixed-effects models. A first-step estimate $\tilde\theta$ of the parameter vector $\theta$ is obtained by solving the estimating equations with the identity matrix as the working covariance matrix. It is shown that $\tilde\theta$ is consistent. If, furthermore, an estimated covariance matrix $\hat V$ is available based on $\tilde\theta$, a second-step estimator $\hat\theta$ can be obtained by solving the optimal estimating equations. It is shown that $\hat\theta$ maintains asymptotic optimality. We establish the consistency and asymptotic normality of the proposed estimators. Simulation results show the improvement of $\hat\theta$ over $\tilde\theta$. Furthermore, we provide a method-of-moments estimator of the variance $\sigma^2$ and assess its empirical performance. Finally, three real-data examples are considered.

1. Introduction

Non-linear mixed-effects models (NLMEMs) have been described in the literature, and have been used particularly in pharmacokinetics to identify sources of variability in drug concentration in the patient population [1,2]. For example, in [3] (Section 20.3), the authors discussed a toxicokinetic model, involving 15 parameters for each of the six persons in a pharmacokinetics experiment. Some methods for the estimation of fixed effects and variance components in NLMEMs have been described. The marginal density of the response variable does not have a closed-form expression, so some approximation methods have also been proposed, for example, taking a first-order Taylor expansion of the non-linear function for the conditional modes of the random effects model [4], Laplacian approximation [5], importance sampling [5,6], and Gaussian quadrature approximation [7].
Iterative estimation equations (IEEs) [8,9] have been investigated in the context of a semi-parametric regression model for longitudinal data with an unspecified covariance matrix; consistency and asymptotic efficiency have also been demonstrated [10]. However, achieving convergence with the iterative method can be time-consuming, or can fail, when the sample size is large. Here, we improve and extend this method to the non-linear mixed-effects model.
This paper is structured as follows: In Section 2, we discuss a two-step method for estimating the parameters of non-linear mixed effects models. In Section 3, we study the asymptotic properties of the estimators. Section 4 contains details of the simulation results. In Section 5, we propose a method to estimate the variance σ 2 . The analysis of real data is considered in Section 6. All of the technical results can be found in the Appendix A, Appendix B, Appendix C and Appendix D.

2. Estimation in a Non-linear Mixed-Effects Model

2.1. Non-linear Mixed-Effects Model

A non-linear mixed-effects (NLME) model can be expressed as follows:
$$y_{ij} = f(x_j, \beta, \alpha_i) + \epsilon_{ij},$$
where $i = 1, \ldots, N$; $j = 1, \ldots, N_i$; $y_{ij}$ is the response for the $j$th observation of the $i$th individual; $f$ is a known non-linear function; $x_j$ is a vector of covariates; $\beta$ is a population parameter vector; $\alpha_i$ is a vector of unobserved latent variables that vary randomly across subjects; and $\epsilon_{ij}$ is the error, independent of $\alpha_i$ [11]. We assume that $\alpha_i \sim N(0, \tau^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$, and that they are independent of each other [2,4,11].
Next, we describe a method for estimating the parameter θ = ( β , τ ) .
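The structure of the model can be illustrated by simulating from it. The sketch below uses $f(x, \beta, \alpha) = e^{\beta x + \alpha}$, the special case studied in Example 1 later in the paper; the sample sizes and variable names are our own illustration.

```python
import numpy as np

# Simulate data from the NLME model y_ij = f(x_j, beta, alpha_i) + eps_ij,
# with f(x, beta, a) = exp(beta * x + a) as in Example 1 of the paper.
rng = np.random.default_rng(0)
m, n = 200, 5                      # subjects and observations per subject
beta, tau, sigma = 1.0, 1.0, 1.0   # true parameter values (illustrative)
x = rng.normal(size=n)             # covariates x_j, shared across subjects
alpha = rng.normal(scale=tau, size=m)        # random effects alpha_i ~ N(0, tau^2)
eps = rng.normal(scale=sigma, size=(m, n))   # errors eps_ij ~ N(0, sigma^2)
y = np.exp(beta * x[None, :] + alpha[:, None]) + eps   # m x n response matrix
```

Each row of `y` shares one draw of the random effect, which is what induces the within-subject correlation that the estimation method must account for.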

2.2. Parameter Estimation

Let the responses be $y_1, y_2, \ldots, y_N$, where $N$ is the sample size, and let $\Theta$ be the parameter space, with $\theta \in \Theta$ and $\theta_0$ denoting the true parameter vector.
As in the iterative estimation equations, we now use the estimating equation to estimate the parameter, as follows:
$$F_N(\theta) = C_N^{-1} U_N^T(\theta)\, B_N \left(y - \mu_N(\theta)\right), \quad (1)$$
where $y = (y_i)_{1\le i\le N}$, $U_N(\theta) = \partial\mu_N(\theta)/\partial\theta$, $\mu_N(\theta) = E(y) = (\mu_{N,1},\ldots,\mu_{N,N})$, $\mu_{N,i} = E(y_i)$, $1\le i\le N$, $B_N = \mathrm{diag}(B_{N,1}, B_{N,2},\ldots,B_{N,N})$, and $C_N = \mathrm{diag}(c_{N,1},\ldots,c_{N,r})$, where the $c_{N,k}$ are positive constants, $1\le k\le r$, and $r$ is the dimension of the parameter $\theta$. The notation $U_N(\theta)$, $B_N$, $\mu_N(\theta)$ indicates dependence on the sample size, $N$.
For longitudinal data, Equation (1) can be expressed as
$$F_N(\theta) = \sum_{i=1}^N C_{N,i}^{-1} U_{N,i}^T(\theta)\, B_{N,i}\left(y_i - \mu_{N,i}(\theta)\right), \quad (2)$$
where $y_i = (y_{ij})_{1\le j\le N_i}$, $\mu_{N,i} = E(y_i)$, $B_{N,i} = (\mathrm{Var}(y_i))^{-1}$, and $U_{N,i}(\theta) = \partial\mu_{N,i}(\theta)/\partial\theta$.
Note that the generalized estimating equation (GEE) [8] corresponds to $B_{N,i} = V_i^{-1}$, where $V_i = \mathrm{Var}(y_i)$ is the true covariance matrix, which is usually unknown. Hence, we propose a two-step estimation method.
Let $B_{N,i} = I_{N_i}$; then, the first-step estimator $\tilde\theta = \tilde\theta_N$ is the solution to the equation
$$\sum_{i=1}^N C_{N,i}^{-1} U_{N,i}^T(\theta)\left(y_i - \mu_{N,i}(\theta)\right) = 0. \quad (3)$$
For the second-step estimator, we derive an estimate $\hat V_i$ of $V_i$ using the first-step estimator $\tilde\theta_N$. Then, letting $B_{N,i} = \hat V_i^{-1}$, we use the equation $F_N(\theta) = 0$ to obtain the second-step estimator $\hat\theta_N$.
If we iterate until convergence, the iterative equation estimator (IEE) is obtained. However, it will sometimes be difficult to obtain convergence.
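To make the two-step recipe concrete, here is a minimal numerical sketch for the exponential model of Example 1 below, whose marginal mean is $E(y_{ij}) = e^{\beta x_j + \tau^2/2}$. The damped Gauss–Newton solver, the clipping safeguard, sample sizes, and all names are our own illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 4
x = rng.normal(size=n)
y = np.exp(1.0 * x[None, :] + 1.0 * rng.normal(size=(m, 1))) \
    + rng.normal(size=(m, n))      # true beta = tau = 1

def mu_and_U(theta):
    # Marginal mean mu_j = E exp(beta*x_j + tau*xi) = exp(beta*x_j + tau^2/2)
    # and its Jacobian U = d mu / d(beta, tau).
    b, t = theta
    mu = np.exp(b * x + 0.5 * t * t)
    U = np.column_stack([x * mu, t * mu])
    return mu, U

def solve_ee(theta, W, n_iter=200):
    # Damped Gauss-Newton for the estimating equation U^T W (ybar - mu) = 0;
    # all subjects share x here, so the sum over i reduces to column means.
    ybar = y.mean(axis=0)
    for _ in range(n_iter):
        mu, U = mu_and_U(theta)
        step = np.linalg.solve(U.T @ W @ U, U.T @ W @ (ybar - mu))
        theta = theta + 0.5 * np.clip(step, -1.0, 1.0)   # crude safeguard
    return theta

# Step 1: identity working covariance.
theta1 = solve_ee(np.array([0.5, 0.5]), np.eye(n))
# Estimate V0 by the method of moments, then Step 2: weight by V0^{-1}.
mu1, _ = mu_and_U(theta1)
V0 = (y - mu1).T @ (y - mu1) / m
theta2 = solve_ee(theta1, np.linalg.inv(V0))
```

Note that only the marginal mean is needed; the second step reuses the first-step fit solely to build the weight matrix.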
It is shown in the next section that, under suitable conditions, $\tilde\theta_N$ and $\hat\theta_N$ are consistent. Our simulations show that $\hat\theta_N$ outperforms $\tilde\theta_N$ in terms of efficiency, and that the efficiency of the second-step estimator differs little from that of the GEE method.
A challenging task during computation is solving the estimation equation, which, under such a model, typically does not have an analytic expression. We tried to solve equations with the most popular methods, such as the Newton–Raphson iterative algorithm [12], but failed. Finally, we solved the estimation equation using the non-linear Gauss–Seidel algorithm, whose convergence has been established [13].
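The non-linear Gauss–Seidel idea can be sketched generically: cycle through the components of $\theta$, solving the $j$th scalar equation for $\theta_j$ with the other components held at their latest values. The toy system and the scalar bisection solver below are ours, chosen only to illustrate the iteration, not the paper's implementation.

```python
import numpy as np

def bisect(g, lo, hi, tol=1e-12):
    # Scalar root of g on [lo, hi], assuming a sign change on the interval.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def F(th):
    # A toy 2-dimensional system F(theta) = 0 with a root near (1.213, 1.213).
    return np.array([th[0] ** 3 + th[1] - 3.0, th[0] + th[1] ** 3 - 3.0])

th = np.array([0.0, 0.0])
for _ in range(50):   # non-linear Gauss-Seidel sweeps
    th[0] = bisect(lambda s: F(np.array([s, th[1]]))[0], -5.0, 5.0)
    th[1] = bisect(lambda s: F(np.array([th[0], s]))[1], -5.0, 5.0)
```

Each sweep solves one coordinate exactly given the others; under the contraction conditions of [13] the sweeps converge to a root of the full system.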
Remark 1.
The method can only estimate the parameters involved in E ( y i j ) . In the case of the linear mixed-effects model, the method can only estimate the fixed effects. For example, for the linear mixed-effects model y = X β + Z α + ϵ , we assume the random effect α N ( 0 , R ) and independent error ϵ, ϵ N ( 0 , D ) . As can be seen, in order to use our method, we need to know E ( y ) = X β ; then, we can estimate parameter β but cannot estimate the variance of random effects, R. However, under a non-linear mixed-effects model, the method can be used to estimate both the fixed effects and the variance of the random effects, as is the case for our simulation.
Remark 2.
The matrix V can occasionally be singular. In this case, we suggest using the Moore–Penrose generalized inverse of V in place of V 1 .

3. Asymptotic Properties of the Estimator

In this section, we study the consistency and asymptotic normality of first- and second-step estimators.
We assume that the first-step ($B_N = I_N$) estimator $\tilde\theta = \tilde\theta_N$ is the solution to Equation (3). Let
$$\tilde\theta_N = \begin{cases} \text{the solution to (3)}, & \text{if a solution to (3) exists}, \\ \text{any } \theta \text{ in the parameter space}, & \text{if no solution to (3) exists}. \end{cases}$$
Consider $F_N(\cdot)$ as a map from $\Theta$ to a subset of $R^r$, and let $F_N(\Theta)$ be the image of $\Theta$ under $F_N(\cdot)$.
For $x \in R^r$ and $A \subset R^r$, define $d(x, A) = \inf_{y\in A}|x-y|$; $A^c$ denotes the complement of $A$.
Let $\xi_n$ be a sequence of non-negative random variables. We say that $\liminf \xi_n > 0$ with probability tending to one if, for any $\epsilon > 0$, there is $\delta > 0$ such that $P(\xi_n > \delta) \ge 1-\epsilon$ for large $n$. Note that this is equivalent to $\xi_n^{-1} = O_p(1)$ [14].
Theorem 1.
(i) Suppose that
$$F_N(\theta_0) \to 0 \quad (4)$$
in probability, as $N\to\infty$.
(ii) Suppose that
$$\liminf_{N\to\infty} d\{F_N(\theta_0), F_N^c(\Theta)\} > 0 \quad (5)$$
with probability tending to one.
Then, with probability tending to one, the solution to (3) exists and is in $\Theta$.
Theorem 2.
(i) Suppose that
$$F_N(\theta_0) \to 0 \quad (4)$$
in probability, as $N\to\infty$.
(ii) Suppose that, for any $\epsilon > 0$, there are $\Theta_0 \subset \Theta$, $\delta_1 > 0$, and $N_{\delta_1} > 0$ such that, for large $N$,
$$P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta) - F_N(\theta_0)| > \delta_1\right) > 1 - \epsilon. \quad (6)$$
Furthermore, suppose there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that, for large $N$,
$$P\left(\inf_{\theta\in\Theta_0,\,\theta\ne\theta_0}\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_2\right) > 1 - \epsilon. \quad (7)$$
Then, any solution to (3) is consistent.
Let $V_N$ be the covariance matrix of $y$. Write
$$(H_{N,j,1})_{kl} = \left(c_j^{-1}\frac{\partial^3\mu_N}{\partial\theta_j\partial\theta_k\partial\theta_l}\right)^T B_N\left(y - \mu_N(\theta_j^*)\right), \quad 1\le j,k,l\le r,$$
for the $(k,l)$ element of $H_{N,j,1}$, where $\theta_j^*$ lies between $\theta_0$ and $\tilde\theta_N$ ($1\le j\le r$);
$$(H_{N,j,2})_{kl} = \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_k}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_l}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_N}{\partial\theta_j}\right)^T B_N\frac{\partial^2\mu_N}{\partial\theta_k\partial\theta_l},$$
$1\le j,k,l\le r$; and $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|\le\epsilon} H_{N,j,2}$, $1\le j\le r$.
Theorem 3.
Suppose that:
(i) The components of $\mu_N(\theta)$ are three times continuously differentiable;
(ii) $\tilde\theta_N$ satisfies (3) with probability tending to one and is consistent;
(iii) There exists $\epsilon > 0$ such that
$$|\tilde\theta_N - \theta_0|\,(\lambda_{N,1}\lambda_{N,2})^{-1/2}\max_j H_{N,j,2,\epsilon} \to 0$$
in probability, where
$$\lambda_{N,1} = \lambda_{\min}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right),$$
$$\lambda_{N,2} = \lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\left(U_{N0}^T B_N V_N B_N^T U_{N0}\right)^{-1} U_{N0}^T B_N U_{N0}\right),$$
$$U_{N0} = \partial\mu_N(\theta)/\partial\theta\,\big|_{\theta=\theta_0};$$
(iv)
$$\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} F_N(\theta_0) \to N(0, I_r)$$
in distribution;
(v) $\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} A_{N,1}$ and $\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} H_{N,j,1}(\theta_j^*)$, $1\le j\le r$, are bounded in probability, where $(A_{N,1})_{ij} = \left(c_i^{-1}\partial^2\mu_N/\partial\theta_i\partial\theta_j\right)^T B_N\left(y - \mu_N(\theta_0)\right)$ is the $(i,j)$ element of $A_{N,1}$, $1\le i,j\le r$.
Then, $\tilde\theta_N$ is asymptotically normal with mean $\theta_0$ and asymptotic covariance matrix
$$\left(U_{N0}^T B_N^T U_{N0}\right)^{-1}\left(U_{N0}^T B_N V_N B_N^T U_{N0}\right)\left(U_{N0}^T B_N U_{N0}\right)^{-1}.$$
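The asymptotic covariance in Theorem 3 has a sandwich form. A quick numerical check, with toy matrices of our own choosing, shows that the optimal weight $B_N = V_N^{-1}$ collapses the sandwich to $(U^T V^{-1} U)^{-1}$, and that the identity working covariance of the first step can only inflate the covariance in the Loewner order.

```python
import numpy as np

# Sandwich covariance S(B) = (U^T B U)^{-1} (U^T B V B U) (U^T B U)^{-1}.
rng = np.random.default_rng(2)
N, r = 30, 3
U = rng.normal(size=(N, r))
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)          # a positive definite "true" covariance

def sandwich(B):
    M = np.linalg.inv(U.T @ B @ U)
    return M @ (U.T @ B @ V @ B @ U) @ M

S_opt = sandwich(np.linalg.inv(V))   # optimal weighting B = V^{-1}
S_id = sandwich(np.eye(N))           # identity working covariance (first step)
```

The second-step estimator aims at `S_opt` by plugging an estimate of $V$ into the weight.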
The proofs and further details are given in Appendix A, Appendix B, Appendix C and Appendix D.

4. Simulation

Example 1.
Consider a simple case of a non-linear mixed-effects model,
$$y_{ij} = e^{\beta x_j + \alpha_i} + \epsilon_{ij},$$
$i = 1,2,\ldots,m$; $j = 1,2,\ldots,n_i$, where $\alpha_i$ and $\epsilon_{ij}$ are independent, with $\alpha_i \sim N(0,\tau^2)$ and $\epsilon_{ij} \sim N(0,\sigma^2)$. We treat $\sigma^2$ as a nuisance parameter whose estimation is not considered here, and set $\sigma^2 = 1$; we consider the estimation of the unknown parameters $\beta$ and $\tau$.
In this model, for subjects with a common number of observations $n$, it is easy to see that the $y_i$ have the same (joint) distribution; hence, $V_i = \mathrm{Var}(y_i) = V_0$ for an unspecified $n\times n$ covariance matrix, $1\le i\le m$. For the second-step estimate, $V_0$ is estimated by the method of moments (MoM) as follows:
$$\hat V_0 = \frac{1}{m}\sum_{i=1}^m (y_i - \hat\mu_i)(y_i - \hat\mu_i)^T,$$
where $\hat\mu_i = \mu_i(\tilde\theta)$.
Consider a set of unbalanced data. Table 1 shows the results of a simulation with $m = 500$, $n_i = 2$ for $1\le i\le 250$ and $n_i = 6$ for $251\le i\le m$; the true parameters are $\beta = 1$ and $\tau = 1$, and the $x_j$ are generated from $N(0,1)$. The results are based on 500 simulation runs. We find a 13.21% improvement of the second-step estimator over the first-step estimator in terms of the total mean squared error, and the second-step estimator is very close to the GEE estimator in efficiency.
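With unbalanced data, the covariance entries can be estimated component-wise, using for entry $(j,k)$ only the subjects that observe both components $j$ and $k$ (the approach of [10]). The sketch below simulates first-step residuals directly as correlated noise; the mask construction and all names are our own illustration.

```python
import numpy as np

# Component-wise MoM covariance estimation for unbalanced longitudinal data.
rng = np.random.default_rng(3)
m, n = 500, 6
# Stand-in for the residuals y_i - mu_i(theta_tilde): correlated noise with
# true covariance 0.5*I + 0.5 (diagonal 1.0, off-diagonal 0.5).
resid = rng.multivariate_normal(np.zeros(n), 0.5 * np.eye(n) + 0.5, size=m)
obs = np.ones((m, n), dtype=bool)
obs[: m // 2, 2:] = False       # first half of subjects: n_i = 2; rest: n_i = 6

V_hat = np.empty((n, n))
for j in range(n):
    for k in range(n):
        both = obs[:, j] & obs[:, k]      # subjects observing both j and k
        V_hat[j, k] = np.mean(resid[both, j] * resid[both, k])
```

Entries involving the later components are averaged over fewer subjects, so they are noisier, but every entry of $V_0$ remains estimable.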
Example 2.
Consider the following non-linear mixed-effects model, an exponential model, which may be used to describe changes in drug concentration. Let
$$y_{ij} = \beta_1\exp\left(-(\beta_2+\alpha_i)t_j\right) + \epsilon_{ij}, \quad i = 1,\ldots,m;\; j = 1,\ldots,n,$$
where $\alpha_i$ is the random effect, distributed as $N(0,\tau^2)$, and $\epsilon_{ij}$ is the error, independent of $\alpha_i$ and distributed as $N(0,\sigma^2)$; $\beta_1$ and $\beta_2$ are fixed parameters, and $t_j$ is the time of observation. Assuming that $\sigma^2 > 0$ is known, we estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$.
Let $m = 500$ and $n = 11$, with the true parameters $\beta_1 = 2$, $\beta_2 = 1$, $\tau = 1$, $\sigma^2 = 0.5$, and $t_j = j/n$, $j = 1,\ldots,n$. The results, based on 500 simulation runs, are presented in Table 2. We see an improvement of approximately 12.5% of the second-step estimator over the first-step estimator in terms of the total mean squared error. Furthermore, the second-step estimate is comparable to the GEE method.

5. Estimate of Variance σ 2

We have so far not discussed how to estimate the variance σ 2 . Now, we propose a method to estimate this parameter and study its empirical performance.
For a non-linear mixed-effects model
$$y_{ij} = f(x_j, \beta, \alpha_i) + \epsilon_{ij}, \quad i = 1,\ldots,m;\; j = 1,\ldots,n_i,$$
we assume the same conditions as in Section 2.1. Writing $\alpha_i = \tau\xi_i$ with $\xi_i \sim N(0,1)$, we have
$$E\left\{y_{ij} - f(x_j,\beta,\tau\xi_i)\right\}^2 = E(\epsilon_{ij}^2) = \sigma^2. \quad (10)$$
Summing both sides of (10) over $i = 1,\ldots,m$; $j = 1,\ldots,n_i$ leads to
$$E\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\beta,\tau\xi_i)\right\}^2 = N\sigma^2, \quad (11)$$
where $N = \sum_{i=1}^m n_i$. If $\xi_i$, $i = 1,\ldots,m$ were observable, then, by removing the expectation sign on the left side of (11) and replacing $\beta$ and $\tau$ by their available estimators, $\hat\beta$ and $\hat\tau$, respectively, an empirical method of moments (EMM) estimator of $\sigma^2$ would be obtained, namely
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\xi_i)\right\}^2. \quad (12)$$
The difficulty is, of course, that $\xi_i$, $i = 1,\ldots,m$ are unobserved. To handle this, we replace $\xi_i$, $i = 1,\ldots,m$ on the right side of (12) with their conditional expectations given $y$, denoted $\hat\xi_i$, $i = 1,\ldots,m$, that is,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\hat\xi_i)\right\}^2. \quad (13)$$
To compute the conditional expectations, we need to know the parameters $\theta = (\beta,\tau)$ and $\sigma^2$. The parameter $\theta$ is replaced by the current estimator, $\hat\theta$. As for $\sigma^2$, we use an idea similar to the EM algorithm. Let $\sigma_c^2$ be the current estimator of $\sigma^2$. The conditional expectations are then computed under $\hat\theta$ and $\sigma_c^2$, and denoted by $\hat\xi_{i,\theta,\sigma_c^2}$, $1\le i\le m$. We then use
$$\sigma_u^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\hat\xi_{i,\theta,\sigma_c^2})\right\}^2 \quad (14)$$
to update $\sigma^2$, from $\sigma_c^2$ to $\sigma_u^2$. We continue until convergence, that is, until $|\sigma_u^2 - \sigma_c^2| \le \delta$ (e.g., $\delta = 0.001$). The final $\sigma_u^2$ is denoted by $\hat\sigma^2$. The initial estimator, $\sigma_0^2$, is obtained from the right side of (13) with $\hat\xi_i = 0$, $1\le i\le m$. We now consider an example.
Example 3.
(Example 1 continued). In this model, we obtain
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - e^{\beta x_j + \tau\hat\xi_i}\right\}^2 \quad (15)$$
and
$$\hat\xi_i = E(\xi_i\,|\,y_i) = \frac{E\left[\xi_i\prod_{j=1}^{n_i} f(y_{ij}\,|\,\xi_i)\right]}{E\left[\prod_{j=1}^{n_i} f(y_{ij}\,|\,\xi_i)\right]}, \quad (16)$$
where $f(y_{ij}\,|\,\xi_i)$ is the conditional probability density function, the expectations are with respect to $\xi_i \sim N(0,1)$, and, clearly, $y_{ij}\,|\,\xi_i \sim N(e^{\beta x_j+\tau\xi_i}, \sigma^2)$.
The parameter $\theta = (\beta,\tau)$ is replaced by the estimator $\hat\theta$, either the first-step or the second-step estimate. Then, using the same simulation design as in Example 1, we iterate (15) and (16) to obtain the estimator $\hat\sigma^2$. Over 500 simulation runs, some solutions did not converge, so the results, shown in Table 3, are based on the converged solutions. They show little difference between the two choices of $\hat\theta$, and the estimates are accurate; either the first-step or the second-step estimate can be used in this step.
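The iteration of (15) and (16) can be sketched as follows for Example 1's model. We approximate the conditional expectation (16) by Monte Carlo over $\xi\sim N(0,1)$ with likelihood weights (the normalizing constant of the normal density cancels in the ratio); $\hat\beta$, $\hat\tau$ are taken as given, and all implementation details are our own.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 5
beta_hat, tau_hat = 1.0, 1.0                 # plug-in estimates of (beta, tau)
x = rng.normal(size=n)
y = np.exp(beta_hat * x[None, :] + tau_hat * rng.normal(size=(m, 1))) \
    + rng.normal(size=(m, n))                # simulated data, true sigma^2 = 1

xi_grid = rng.normal(size=2000)              # Monte Carlo draws of xi ~ N(0, 1)
sigma2 = np.mean((y - np.exp(beta_hat * x)[None, :]) ** 2)   # init: xi_hat = 0

for _ in range(50):
    # Unnormalized log-likelihood of each subject's data at every draw of xi
    # (shape m x 2000); the (2*pi*sigma^2)^(-n/2) factor cancels in (16).
    mu = np.exp(beta_hat * x[None, None, :] + tau_hat * xi_grid[None, :, None])
    loglik = -0.5 * ((y[:, None, :] - mu) ** 2).sum(axis=2) / sigma2
    w = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    xi_hat = (w * xi_grid[None, :]).sum(axis=1) / w.sum(axis=1)  # E(xi_i | y_i)
    new = np.mean((y - np.exp(beta_hat * x[None, :]
                              + tau_hat * xi_hat[:, None])) ** 2)
    if abs(new - sigma2) <= 1e-3:            # convergence criterion of Section 5
        sigma2 = new
        break
    sigma2 = new
```

The initial value (with $\hat\xi_i = 0$) absorbs the between-subject variation and is far too large; the iteration then pulls $\sigma^2$ down toward the error variance.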
Remark 3.
The results in Table 3 are based on the converged solutions only. An implication is that, if the solution converges, one may expect a good estimate from this procedure. In our real-data analyses, the solution converged in all cases. A topic for future work is to improve the convergence properties of the estimator.

6. Real Data

6.1. Height of Girls

The data are from the Longitudinal Studies of Child Health and Development project, initiated in 1929 at the Harvard School of Public Health (a full description of the project is given by Stuart and Reed [15]); they consist of the heights of 67 girls and 67 boys aged from 7 to 18, as described in Chapter 8 of Demidenko (2013) [16]. Here, we only consider the data for the girls (see Figure 1); the data for the boys are similar.
We use a non-linear mixed-effects model to describe the growth trend. For example, assuming that one parameter is subject-specific, the NLME model is
$$y_{ij} = \frac{\beta_1}{1+\exp(\beta_2+\alpha_i-\beta_3 t_{ij})} + \epsilon_{ij},$$
where $i = 1,\ldots,67$; $j = 1,\ldots,n_i$; and $\alpha_i$ is the random effect with distribution $N(0,\tau^2)$. Furthermore, we assume the random errors $\epsilon_{ij}$ are independent of $\alpha_i$ and distributed as $N(0,\sigma^2)$; $t_{ij}$ is the age of the $i$th girl at the $j$th time point. We first estimate the parameter $\theta = (\beta_1,\beta_2,\beta_3,\tau)$.
For simplicity of notation, let $h(x) = 1/(1+e^x)$. Under this model, we can obtain $\mu_{ij} = E(y_{ij}) = E\left[\beta_1 h(\beta_2+\alpha_i-\beta_3 t_{ij})\right]$. It is convenient to write $\alpha_i = \tau\xi_i$, where $\xi_i \sim N(0,1)$. Then, we have
$$\mu_{ij} = E(y_{ij}) = E\left[\beta_1 h(\beta_2+\tau\xi_i-\beta_3 t_{ij})\right].$$
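This expectation over $\xi\sim N(0,1)$ has no closed form, but it can be computed cheaply by Gauss–Hermite quadrature, $E\,g(\xi) \approx \pi^{-1/2}\sum_k w_k\, g(\sqrt{2}\,z_k)$. The sketch below is ours; the parameter values are illustrative only, not the fitted estimates.

```python
import numpy as np

# Gauss-Hermite nodes z and weights w for integrals against exp(-z^2).
z, w = np.polynomial.hermite.hermgauss(30)

def mu(beta1, beta2, beta3, tau, t):
    # mu(t) = E[ beta1 * h(beta2 + tau*xi - beta3*t) ], xi ~ N(0, 1),
    # with h(x) = 1 / (1 + e^x), via the change of variable xi = sqrt(2)*z.
    g = beta1 / (1.0 + np.exp(beta2 + tau * np.sqrt(2.0) * z - beta3 * t))
    return (w * g).sum() / np.sqrt(np.pi)

val = mu(170.0, 1.0, 0.3, 1.2, t=12.0)   # e.g. mean height at age 12
```

With the mean and its derivatives available this way, the estimating equations below can be evaluated at any candidate $\theta$.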
In order to estimate the parameter $\theta = (\beta_1,\beta_2,\beta_3,\tau)$, we use the first-step estimating equation
$$F(\theta) = \sum_{i=1}^m\left(\frac{\partial\mu_i}{\partial\theta}\right)^T(y_i-\mu_i) = 0.$$
Solving this equation with the Gauss–Seidel iteration method gives the parameter estimate $\hat\theta_1 = (169.835, 1.048, 0.310, 1.191)$.
The second-step estimating equation is
$$F(\theta) = \sum_{i=1}^m\left(\frac{\partial\mu_i}{\partial\theta}\right)^T\Sigma_i^{-1}(y_i-\mu_i) = 0,$$
where $\Sigma_i = \mathrm{Var}(y_i)$ is unknown and must be approximated by an estimate. Because the data are unbalanced, the estimators of the covariances are obtained component-wise [10]. The estimates are $\hat\theta_2 = (164.828, 1.428, 0.427, 1.933)$.
Next, we use our method to estimate the parameter $\sigma^2$. In this model, $f(x_j,\beta,\alpha_i) = \beta_1/\left(1+\exp(\beta_2+\alpha_i-\beta_3 t_{ij})\right)$; we obtain the first-step estimator $\sigma_1^2 = 96.352$ ($\sigma_1 = 9.816$) and the second-step estimator $\sigma_2^2 = 173.161$ ($\sigma_2 = 13.159$).

6.2. Indomethacin Concentration

Pinheiro and Bates (2000) [17] presented a dataset on the drug indomethacin for six patients. Each patient received an intravenous injection of indomethacin at the start of the study. The plasma concentration of indomethacin (mcg/mL) was then measured 11 times, at the following points (hr): t = (0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8). Let $y_{ij}$, $i = 1,\ldots,6$, $j = 1,\ldots,11$ denote the plasma concentration for the $i$th patient at the $j$th time point. The concentration changes are plotted in Figure 2.
From the plot, we can see that the initial decrease in the plasma drug concentration is dramatic, due to the movement of the drug from the circulatory system into the tissue, until an equilibrium is reached. We establish a non-linear mixed-effects model to describe the change: $y_{ij} = \beta_1\exp(-(\beta_2+\alpha_i)t_j) + \epsilon_{ij}$, where the $\alpha_i$, $i = 1,\ldots,6$ are random effects, independent and distributed as $N(0,\tau^2)$; the $\epsilon_{ij}$ are i.i.d. random errors distributed as $N(0,\sigma^2)$; and the $\alpha_i$ and $\epsilon_{ij}$ are independent of each other. We use our method to estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$. The first-step estimate is $\hat\theta_1 = (2.910, 1.539, 0.533)$ and the second-step estimate is $\hat\theta_2 = (2.804, 1.492, 0.522)$; the two are comparable.
Then, we estimate the parameter $\sigma^2$. In this model, $f(x_j,\beta,\alpha_i) = \beta_1\exp(-(\beta_2+\alpha_i)t_j)$, and we obtain the first-step estimator $\sigma_1^2 = 0.021$ and the second-step estimator $\sigma_2^2 = 0.021$. The two steps return the same estimate, so the simpler first-step estimates are the ones to be used.

6.3. Orange Trees

We consider the data on the growth of orange trees over time given in Draper and Smith ([18], Exercise 24.N, p. 559) and described in [4]. The data, presented in Figure 3, consist of the trunk circumferences (in millimeters) of five trees, each measured on seven occasions.
Each of the five trees was measured at 118, 484, 664, 1004, 1231, 1372, and 1582 days after December 31, 1968, when the study started. Let $y_{ij}$ be the trunk circumference (in millimeters) of the $i$th tree at the $j$th time. We consider a non-linear model as follows:
$$y_{ij} = \frac{\beta_1}{1+\exp\left\{-\left[t_j/365.25-(\beta_2+\alpha_i)\right]\right\}} + \epsilon_{ij}, \quad i = 1,\ldots,5;\; j = 1,\ldots,7,$$
where $t_j$ is the day of the $j$th measurement; $\alpha_i$, $i = 1,\ldots,5$ are independent random effects, identically distributed as $N(0,\tau^2)$; and $\epsilon_{ij}$, $i = 1,\ldots,5$; $j = 1,\ldots,7$ are random errors, assumed independent and distributed as $N(0,\sigma^2)$; $\alpha_i$ and $\epsilon_{ij}$ are independent of each other. We use our method to estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$. The first-step estimate is $\hat\theta_1 = (204.960, 2.158, 0.577)$, and the second-step estimate is $\hat\theta_2 = (204.960, 2.159, 0.591)$. If the iteration is continued to convergence, we obtain the estimate $\hat\theta = (204.961, 2.169, 0.673)$; the two-step estimate is similar to the converged estimate. We use the second-step estimate in the subsequent analysis. For $\sigma^2$, we obtain the first-step estimator $\sigma_1^2 = 204.686$ ($\sigma_1 = 14.306$) and the second-step estimator $\sigma_2^2 = 204.579$ ($\sigma_2 = 14.303$); the two estimates are very close.

7. Concluding Remarks

In this paper, we propose a two-step method to estimate the parameters, and we study the asymptotic properties of the estimators. The method is convenient because it depends only on $E(y_{ij})$; we do not need to know the specific distribution of $y_{ij}$. On the other hand, the method can only estimate the parameters that appear in $E(y_{ij})$, so other parameters require a separate method. Here, we provide a method to estimate the variance $\sigma^2$ using the method of moments.
We found that the second-step estimate sometimes offers little or no efficiency gain over the first-step estimate. In such cases, we choose the first-step estimate, which is simpler and computationally more attractive.
The numerical solution of the estimating equations is also an important issue; we will attempt to improve our numerical methods in future work; see [19,20,21].

Author Contributions

Conceptualization, J.W. and J.J.; methodology, J.W., Y.L. and J.J.; software, J.W.; validation, J.W., Y.L. and J.J.; formal analysis, J.J. and Y.L.; investigation, J.W., Y.L. and J.J.; resources, J.W., Y.L. and J.J.; data curation, J.W. and J.J.; writing—original draft preparation, J.W.; writing—review and editing, Y.L. and J.J.; visualization, J.W.; supervision, Y.L.and J.J.; project administration, J.W., Y.L. and J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China, Grant No. 2018YFA0703900, and the National Science Foundation of China, Grant No. 11971264.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data included in this study are available upon request from the corresponding author.

Acknowledgments

We are thankful to the reviewers for their constructive comments, which helped us to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Proof of Theorem 1. 
A solution to (3) exists and is in $\Theta$ if and only if $0 \in F_N(\Theta)$.
For any $\epsilon > 0$:
Inequality (5) implies that there are $\delta > 0$ and $N_1 > 0$ such that $P\left(d\{F_N(\theta_0), F_N^c(\Theta)\} > \delta\right) > 1-\epsilon$ for $N \ge N_1$.
Equation (4) implies that, for any $0 < \epsilon_1 < \delta$, there is $N_\epsilon > 0$ such that $P(|F_N(\theta_0)| > \epsilon_1) < \epsilon$ for $N \ge N_\epsilon$. Thus, for $N \ge N_1 \vee N_\epsilon$,
$$P\left(0\notin F_N(\Theta)\right) \le P\left(0\notin F_N(\Theta),\ d\{F_N(\theta_0),F_N^c(\Theta)\} > \delta\right) + P\left(d\{F_N(\theta_0),F_N^c(\Theta)\} \le \delta\right) \le P\left(|F_N(\theta_0)| > \epsilon_1\right) + \epsilon < 2\epsilon,$$
since, on the event $d\{F_N(\theta_0),F_N^c(\Theta)\} > \delta$, $0\notin F_N(\Theta)$ forces $|F_N(\theta_0)| > \delta > \epsilon_1$. Because $\epsilon$ is arbitrary, $\lim_{N\to\infty} P(0\notin F_N(\Theta)) = 0$. Therefore, a solution to (3) exists with probability tending to one. □

Appendix B

Proof of Theorem 2. 
For any $\epsilon > 0$:
By (6), there are $\Theta_0 \subset \Theta$, $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| > \delta_1\right) > 1-\epsilon$ for $N \ge N_{\delta_1}$;
by (4), for any $0 < \epsilon_1 < \delta_1$, there is $N_{\epsilon_1} > 0$ such that $P(|F_N(\theta_0)| > \epsilon_1) < \epsilon$ for $N \ge N_{\epsilon_1}$;
by Theorem 1, there is $N_\epsilon > 0$ such that, for $N \ge N_\epsilon$, $P(\text{a solution to (3) exists}) > 1-\epsilon$.
Then, for $N \ge N_1 = \max\{N_{\epsilon_1}, N_{\delta_1}, N_\epsilon\}$,
$$P(\tilde\theta_N\notin\Theta_0) \le P\left(\tilde\theta_N\notin\Theta_0,\ \text{a solution to (3) exists}\right) + P\left(\text{no solution to (3) exists}\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0,\ \inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| > \delta_1,\ \text{a solution to (3) exists}\right) + P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| \le \delta_1\right) + P\left(\text{no solution to (3) exists}\right)$$
$$\le P\left(|F_N(\tilde\theta_N)-F_N(\theta_0)| > \delta_1,\ \text{a solution to (3) exists}\right) + P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| \le \delta_1\right) + P\left(\text{no solution to (3) exists}\right) \le 3\epsilon.$$
On the other hand, by (7), there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that
$$P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta_2\right) > 1-\epsilon, \quad N \ge N_{\delta_2}.$$
Then, for any $\epsilon_2 > \epsilon_1/\delta_2$ and $N \ge N_2 = \max\{N_1, N_{\delta_2}, N_\epsilon\}$,
$$P(|\tilde\theta_N-\theta_0| \ge \epsilon_2) \le P\left(|\tilde\theta_N-\theta_0| \ge \epsilon_2,\ \tilde\theta_N\in\Theta_0\right) + P\left(\tilde\theta_N\notin\Theta_0\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0\right) + P\left(|\tilde\theta_N-\theta_0| \ge \epsilon_2,\ \tilde\theta_N\in\Theta_0,\ \inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta_2\right) + P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} \le \delta_2\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0\right) + P\left(|F_N(\tilde\theta_N)-F_N(\theta_0)| \ge \delta_2\epsilon_2\right) + P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} \le \delta_2\right) \le 6\epsilon.$$
The result follows because $F_N(\tilde\theta_N) = 0$ with probability tending to one and by the above argument. □
The following lemmas provide sufficient conditions for (4)–(7). Recall that $V_N$ is the covariance matrix of $y$, $U_{N0} = U_N(\theta_0)$, and $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|\le\epsilon} H_{N,j,2}$, $1\le j\le r$.

Appendix C

Lemma A1.
We find that (4) holds provided that, as $N\to\infty$,
$$\mathrm{tr}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right) \to 0, \quad (L1)$$
where $V_N = \mathrm{Var}(y) = \mathrm{diag}(V_{N,1},\ldots,V_{N,N})$, $V_{N,i} = \mathrm{Var}(y_i)$.
Proof.
By Chebyshev's inequality, we know that
$$P\left(|F_N(\theta_0)| > \epsilon\right) \le \frac{E|F_N(\theta_0)|^2}{\epsilon^2} = \frac{E\left|C_N^{-1}U_{N0}^T B_N\left(y-\mu_N(\theta_0)\right)\right|^2}{\epsilon^2} = \frac{\mathrm{tr}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right)}{\epsilon^2}.$$
Then, by (L1), we obtain $P(|F_N(\theta_0)| > \epsilon) \to 0$ as $N\to\infty$, and (4) follows. □
Lemma A2.
Suppose
(1) $\liminf_{N\to\infty} d\{0, F_N^c(\Theta)\} > 0$ with probability tending to one;
(2) $F_N(\theta_0) \to 0$ in probability as $N\to\infty$.
Then, (5) holds.
Proof.
For any $\epsilon > 0$:
By condition (1), there are $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P\left(d\{0, F_N^c(\Theta)\} > \delta_1\right) > 1-\epsilon/2$, $N \ge N_{\delta_1}$;
by condition (2), for any $\epsilon_1 > 0$ ($\epsilon_1 \le \delta_1/2$), there is $N_{\epsilon_1} > 0$ such that
$$P\left(|F_N(\theta_0)| < \epsilon_1\right) > 1-\epsilon/2, \quad N \ge N_{\epsilon_1}.$$
Then, choose $\delta$ with $\epsilon_1 \le \delta \le \delta_1/2$ and let $N_\delta = \max\{N_{\epsilon_1}, N_{\delta_1}\}$. For $N \ge N_\delta$, by the triangle inequality $d\{F_N(\theta_0), F_N^c(\Theta)\} \ge d\{0, F_N^c(\Theta)\} - d\{0, F_N(\theta_0)\}$, we have
$$P\left(d\{F_N(\theta_0), F_N^c(\Theta)\} > \delta\right) \ge P\left(d\{0,F_N^c(\Theta)\} - d\{0,F_N(\theta_0)\} > \delta\right) \ge P\left(\{d\{0,F_N^c(\Theta)\} > 2\delta\}\cap\{d\{0,F_N(\theta_0)\} < \delta\}\right)$$
$$\ge P\left(d\{0,F_N^c(\Theta)\} > 2\delta\right) + P\left(d\{0,F_N(\theta_0)\} < \delta\right) - 1 \ge 1-\frac{\epsilon}{2} + 1-\frac{\epsilon}{2} - 1 = 1-\epsilon, \quad N \ge N_\delta.$$
Then, (5) holds. □
Lemma A3.
Suppose that there are continuous functions $f_j(\cdot)$, $g_j(\cdot)$ ($1\le j\le r$) such that:
(1) $\liminf_{N\to\infty}\min\left[|f_j(F_N(\theta_0))|, |g_j(F_N(\theta_0))|\right] > 0$ ($1\le j\le r$) with probability tending to one;
(2) for any $\epsilon_1 > 0$, $P\left(|f_j(F_N(\theta))| > \epsilon_1\right) \to 0$ as $\theta_j\to-\infty$ and $P\left(|g_j(F_N(\theta))| > \epsilon_1\right) \to 0$ as $\theta_j\to+\infty$ ($1\le j\le r$), uniformly in $N$ and in $\theta\in\Theta$;
(3) $F_N(\theta_0)$ is bounded in probability as $N\to\infty$.
Then, there is a compact subset $\Theta_0 \subset \Theta$ such that (6) holds with this $\Theta_0$.
Proof. 
For any $\epsilon > 0$:
By condition (1), there are $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P(|f_j(F_N(\theta_0))| > \delta_1) > 1-\epsilon/4$ and $P(|g_j(F_N(\theta_0))| > \delta_1) > 1-\epsilon/4$ for $N \ge N_{\delta_1}$.
By condition (2), for any $\epsilon_1 > 0$ ($\epsilon_1 < \delta_1/2$), there is $\gamma_1 > 0$ such that $P(|f_j(F_N(\theta))| > \epsilon_1) < \epsilon/4$ if $\theta\in\Theta$ and $\theta_j < -\gamma_1$, uniformly in $N$; and there is $\gamma_2 > 0$ such that $P(|g_j(F_N(\theta))| > \epsilon_1) < \epsilon/4$ if $\theta\in\Theta$ and $\theta_j > \gamma_2$, uniformly in $N$.
Then, for any $\epsilon_2 > 0$ ($\epsilon_1 \le \epsilon_2 \le \delta_1/2$), there is $N_{\epsilon_2} \ge N_{\delta_1}$ such that, if $\theta\in\Theta$ and $\theta_j \le -\gamma_1$,
$$P\left(|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| > \epsilon_2\right) \ge P\left(|f_j(F_N(\theta_0))| > 2\epsilon_2,\ |f_j(F_N(\theta))| < \epsilon_2\right)$$
$$\ge P\left(|f_j(F_N(\theta_0))| > 2\epsilon_2\right) + P\left(|f_j(F_N(\theta))| < \epsilon_2\right) - 1 \ge 1-\frac{\epsilon}{4} + 1-\frac{\epsilon}{4} - 1 = 1-\frac{\epsilon}{2}, \quad N \ge N_{\epsilon_2}.$$
Similarly, $P\left(|g_j(F_N(\theta_0)) - g_j(F_N(\theta))| > \epsilon_2\right) > 1-\epsilon/2$ for $N \ge N_{\epsilon_2}$ if $\theta\in\Theta$ and $\theta_j \ge \gamma_2$.
By condition (3), there is $M_1 > 0$ such that $P(|F_N(\theta_0)| \le M_1) > 1-\epsilon/2$ for all $N \ge 1$; let $M_2 = M_1 + 1$, so that $P(|F_N(\theta_0)| \le M_2) \ge P(|F_N(\theta_0)| \le M_1) > 1-\epsilon/2$.
For any $\epsilon_3 < \epsilon_2$, since $f_j(\cdot)$, $g_j(\cdot)$ ($1\le j\le r$) are continuous and hence uniformly continuous on the compact set $\{|x|\le M_2\}$, there is $\delta > 0$ ($\delta \le 1$) such that, if $|F_N(\theta_0) - F_N(\theta)| < \delta$ and $|F_N(\theta_0)| \le M_1$, then $|F_N(\theta)| \le |F_N(\theta_0)| + \delta \le M_2$, so that both arguments lie in $\{|x|\le M_2\}$, and therefore $|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| < \epsilon_3$.
Then, there is $N_\delta > N_{\epsilon_2}$ such that, if $N \ge N_\delta$ and $\theta_j \le -\gamma_1$,
$$P\left(|F_N(\theta_0) - F_N(\theta)| < \delta\right) \le P\left(|F_N(\theta_0)-F_N(\theta)| < \delta,\ |F_N(\theta_0)| \le M_1\right) + P\left(|F_N(\theta_0)| > M_1\right)$$
$$\le P\left(|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| < \epsilon_3\right) + P\left(|F_N(\theta_0)| > M_1\right) \le \epsilon.$$
Similarly, if $\theta_j \ge \gamma_2$, then $P(|F_N(\theta_0) - F_N(\theta)| < \delta) \le \epsilon$.
So, for any $\epsilon > 0$, let the compact subset be $\Theta_0 = \Theta\cap[-\gamma_1, \gamma_2]^r$. Then, for $\theta\notin\Theta_0$, there are $\delta > 0$ and $N_\delta > 0$ such that $P(|F_N(\theta_0) - F_N(\theta)| \ge \delta) > 1-\epsilon$, $N \ge N_\delta$. Then, (6) holds. □
Lemma A4.
Suppose that $F_N(\theta)$ is continuously differentiable and that:
(1) $\liminf_{N\to\infty}\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right) > 0$, where $\lambda_{\min}$ denotes the smallest eigenvalue;
(2) for any $\epsilon > 0$, $\limsup_{N\to\infty}\dfrac{\max_j\left(H_{N,j,2,\epsilon}\right)^2}{\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right)} < \infty$ ($1\le j\le r$);
(3) $\dfrac{\left\|R_{N,1} + A_{N,1}\right\|^2}{\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right)} = o_p(1)$ as $N\to\infty$,
where $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|<\epsilon} H_{N,j,2}(\theta)$, $H_{N,j,2}(\theta)$ is $r\times r$ with
$$(H_{N,j,2})_{kl} = \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_k}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_l}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_N}{\partial\theta_j}\right)^T B_N\frac{\partial^2\mu_N}{\partial\theta_k\partial\theta_l}$$
$$= \sum_{i=1}^N\left[\left(c_j^{-1}\frac{\partial^2\mu_{N,i}}{\partial\theta_j\partial\theta_k}\right)^T B_{N,i}\frac{\partial\mu_{N,i}}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_{N,i}}{\partial\theta_j\partial\theta_l}\right)^T B_{N,i}\frac{\partial\mu_{N,i}}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_{N,i}}{\partial\theta_j}\right)^T B_{N,i}\frac{\partial^2\mu_{N,i}}{\partial\theta_k\partial\theta_l}\right], \quad 1\le j\le r;$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\theta-\theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta-\theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \quad (H_{N,j,1})_{kl} = \left(c_j^{-1}\frac{\partial^3\mu_N}{\partial\theta_j\partial\theta_k\partial\theta_l}\right)^T B_N\left(y-\mu_N(\theta_j^*)\right), \quad 1\le j\le r,$$
with $\theta_j^*$ lying between $\theta$ and $\theta_0$; and $A_{N,1}$ is $r\times r$ with $(k,l)$ element $\left(c_k^{-1}\partial^2\mu_N/\partial\theta_k\partial\theta_l\right)^T B_N\left(y-\mu_N(\theta_0)\right)$;
(4) there is a compact set $\Theta_1 \subset \Theta$ with $d\{\theta_0, \Theta_1\} > 0$, and there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that $P\left(\inf_{\theta\in\Theta_1}|F_N(\theta)-F_N(\theta_0)| > \delta_2\right) > 1-\epsilon$ for $N \ge N_{\delta_2}$.
Then, there are $\delta > 0$ and $N_\delta > 0$ such that
$$P\left(\inf_{\theta\in\Theta_0,\,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta\right) > 1-\epsilon$$
for $N \ge N_\delta$, where $\Theta_0$ is any compact subset of $\Theta$ that contains $\theta_0$ as an interior point.
Proof. 
$$F_N(\theta) = C_N^{-1} U_N^T(\theta) B_N \big(y - \mu_N(\theta)\big) = \begin{pmatrix} \sum_{i=1}^N c_1^{-1} \big(\frac{\partial \mu_{N,i}}{\partial\theta_1}\big)^T B_{N,i} (y_i - \mu_{N,i}) \\ \vdots \\ \sum_{i=1}^N c_r^{-1} \big(\frac{\partial \mu_{N,i}}{\partial\theta_r}\big)^T B_{N,i} (y_i - \mu_{N,i}) \end{pmatrix}.$$
By a Taylor expansion of $F_N(\theta)$ around $\theta_0$,
$$F_N(\theta) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0} (\theta - \theta_0) + \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix} (\theta - \theta_0),$$
where the $(k,l)$ element of $H_{N,j}(\theta_j^*)$ is
$$\Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big) - \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} - \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} - \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}$$
and $\theta_j^*$ lies between $\theta_0$ and $\theta$ ($1 \le j \le r$).
Then, $F_N(\theta) - F_N(\theta_0) = (A_{N,1} - A_{N,2})(\theta - \theta_0) + (R_{N,1} - R_{N,2})(\theta - \theta_0) = -\big(A_{N,2} + (R_{N,2} - R_{N,1} - A_{N,1})\big)(\theta - \theta_0)$, where
$$(A_{N,1})_{kl} = \Big(c_k^{-1}\frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_0)\big)\ (A_{N,1} \text{ is } r \times r), \qquad A_{N,2} = C_N^{-1} U_{N0}^T B_N U_{N0},$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,1})_{kl} = \Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big), \quad 1 \le j \le r,$$
$$R_{N,2} = \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1,2}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r,2}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,2})_{kl} = \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} + \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} + \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}, \quad 1 \le j \le r.$$
So,
$$|F_N(\theta) - F_N(\theta_0)|^2 = \big|\big(A_{N,2} + (R_{N,2} - R_{N,1} - A_{N,1})\big)(\theta - \theta_0)\big|^2 = (\theta - \theta_0)^T A_{N,2}^T A_{N,2}(\theta - \theta_0) + 2(\theta - \theta_0)^T A_{N,2}^T (R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0) + \big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 \ge |A_{N,2}(\theta - \theta_0)|^2 - 2|A_{N,2}(\theta - \theta_0)|\,\big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big| + \big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 = \big(|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)|\big)^2,$$
so $|F_N(\theta) - F_N(\theta_0)| \ge \big|\,|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)|\,\big|$.
Let $L_N = A_{N,2}^T A_{N,2} = U_{N0}^T B_N^T U_{N0} C_N^{-2} U_{N0}^T B_N U_{N0}$. We have
$$|A_{N,2}(\theta - \theta_0)|^2 \ge \lambda_{\min}(A_{N,2}^T A_{N,2})\,|\theta - \theta_0|^2 = \lambda_{\min}(L_N)\,|\theta - \theta_0|^2,$$
$$\big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 \le \|R_{N,2} - R_{N,1} - A_{N,1}\|^2\,|\theta - \theta_0|^2.$$
By $\|A - B\| \le \|A\| + \|B\|$, we can obtain
$$|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)| \ge \lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| - \|R_{N,2} - R_{N,1} - A_{N,1}\|\,|\theta - \theta_0| = \Big(1 - \frac{\|R_{N,2} - R_{N,1} - A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| \ge \Big(1 - \frac{\|R_{N,2}\| + \|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| = \Big(1 - \Big(\frac{\|R_{N,2}\|}{\lambda_{\min}^{1/2}(L_N)} + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0|.$$
Moreover,
$$\|R_{N,2}\|^2 \le \sum_{j=1}^r \Big|\frac{1}{2}(\theta - \theta_0)^T H_{N,j,2}(\theta_j^*)\Big|^2 \le \frac{1}{4}\sum_{j=1}^r \big(|\theta - \theta_0|\,\|H_{N,j,2}(\theta_j^*)\|\big)^2 \le \frac{1}{4}|\theta - \theta_0|^2 \sum_{j=1}^r \|H_{N,j,2,\epsilon}\|^2 \le \frac{r}{4}|\theta - \theta_0|^2 \max_j \|H_{N,j,2,\epsilon}\|^2.$$
By (2), there are $M_1 > 0$ and $\delta > 0$ (e.g., $\delta = 1/\sqrt{rM_1}$) such that, for $|\theta - \theta_0| < \delta$,
$$\frac{\|R_{N,2}\|^2}{\lambda_{\min}(L_N)} \le \frac{r}{4}|\theta - \theta_0|^2\,\frac{\max_j \|H_{N,j,2,\epsilon}\|^2}{\lambda_{\min}(L_N)} \le \frac{r}{4}\delta^2 M_1;$$
let $\epsilon_1 = \big(\frac{r}{4}\delta^2 M_1\big)^{1/2}$, so that $\|R_{N,2}\|/\lambda_{\min}^{1/2}(L_N) \le \epsilon_1$. Therefore,
$$|F_N(\theta) - F_N(\theta_0)| \ge \Big(1 - \Big(\frac{\|R_{N,2}\|}{\lambda_{\min}^{1/2}(L_N)} + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| \ge \Big(1 - \Big(\epsilon_1 + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0|.$$
For any $\epsilon > 0$, by condition (3), for any $\epsilon_2 > 0$ ($\epsilon_2 < \frac{1}{4}$) there is $N_1 > 0$ such that $P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < \epsilon_2\Big) > 1 - \epsilon$ for $N > N_1$.
So, there are $0 < \delta_1 < (1 - (\epsilon_1 + \epsilon_2))\lambda_{\min}^{1/2}(L_N)$ and $N_{\delta_1} \ge N_1$ such that, for $N \ge N_{\delta_1}$,
$$P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_1\Big) \ge P\Big(\Big(1 - \Big(\epsilon_1 + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N) > \delta_1\Big) = P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < 1 - \epsilon_1 - \frac{\delta_1}{\lambda_{\min}^{1/2}(L_N)}\Big) \ge P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < \epsilon_2\Big) > 1 - \epsilon.$$
So, for any $\epsilon > 0$, there are $\delta > 0$, $\delta_1 > 0$, and $N_{\delta_1} > 0$ such that, for $N \ge N_{\delta_1}$, $P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_1\Big) > 1 - \epsilon$ whenever $|\theta - \theta_0| < \delta$.
Suppose $\Theta_0$ is a compact subset that includes $\theta_0$ as an interior point; say, there is $D > 0$ such that $\Theta_0 = \{\theta : |\theta - \theta_0| \le D\}$.
Let $\Theta_1 = \{\theta : \delta \le |\theta - \theta_0| \le D\}$. Then $d\{\theta_0, \Theta_1\} > 0$, so by condition (4) there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that, for $N \ge N_{\delta_2}$, $P\big(\inf_{\theta \in \Theta_1} |F_N(\theta) - F_N(\theta_0)| > \delta_2\big) > 1 - \epsilon$.
Let $\delta_3 < \delta_2 / D$. There is $N_{\delta_3} \ge N_{\delta_2}$ such that, for $N \ge N_{\delta_3}$ and $\theta \in \Theta_1$,
$$P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_3\Big) \ge P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{D} > \delta_3\Big) \ge P\big(|F_N(\theta) - F_N(\theta_0)| > \delta_2\big) \ge P\Big(\inf_{\theta \in \Theta_1} |F_N(\theta) - F_N(\theta_0)| > \delta_2\Big) > 1 - \epsilon.$$
Then, there are $\delta > 0$ and $N_\delta > 0$ such that
$$P\Big(\inf_{\theta \in \Theta_0,\, \theta \neq \theta_0} \frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta\Big) > 1 - \epsilon$$
for $N \ge N_\delta$, where $\Theta_0$ is any compact subset of $\Theta$ that includes $\theta_0$ as an interior point. □
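The quantitative heart of the argument above is the bound $|A_{N,2}(\theta - \theta_0)|^2 \ge \lambda_{\min}(A_{N,2}^T A_{N,2})\,|\theta - \theta_0|^2$. As a quick numerical sanity check of this inequality (a sketch only: an arbitrary random matrix stands in for $A_{N,2}$, not a matrix derived from the paper's model):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))  # stand-in for A_{N,2}
lam_min = np.linalg.eigvalsh(A.T @ A).min()  # smallest eigenvalue of AᵀA

for _ in range(100):
    x = rng.normal(size=4)  # stand-in for θ − θ0
    # |Ax|² ≥ λ_min(AᵀA) |x|² (small slack for floating point)
    assert np.linalg.norm(A @ x) ** 2 >= lam_min * (x @ x) - 1e-9

print("eigenvalue lower bound verified on 100 random directions")
```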

Appendix D

Proof of Theorem 3. 
By the Taylor expansion, it is easy to show that
$$F_N(\theta) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0}(\theta - \theta_0) + \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix}(\theta - \theta_0).$$
Substituting $\tilde\theta_N$ into the equation,
$$0 = F_N(\tilde\theta_N) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0}(\tilde\theta_N - \theta_0) + \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix}(\tilde\theta_N - \theta_0).$$
Now, we have
$$0 = F_N(\theta_0) + (A_{N,1} - A_{N,2})(\tilde\theta_N - \theta_0) + (R_{N,1} - R_{N,2})(\tilde\theta_N - \theta_0),$$
where
$$(A_{N,1})_{kl} = \Big(c_k^{-1}\frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_0)\big)\ (A_{N,1} \text{ is } r \times r), \qquad A_{N,2} = C_N^{-1} U_{N0}^T B_N U_{N0},$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,1})_{kl} = \Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big), \quad 1 \le j \le r,$$
$$R_{N,2} = \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1,2}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r,2}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,2})_{kl} = \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} + \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} + \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}, \quad 1 \le j \le r.$$
Rearranging,
$$F_N(\theta_0) + A_{N,1}(\tilde\theta_N - \theta_0) + R_{N,1}(\tilde\theta_N - \theta_0) = A_{N,2}(\tilde\theta_N - \theta_0) + R_{N,2}(\tilde\theta_N - \theta_0).$$
Write $W_N = C_N^{-1} U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}$. Then,
$$W_N^{-1/2}\big(F_N(\theta_0) + A_{N,1}(\tilde\theta_N - \theta_0) + R_{N,1}(\tilde\theta_N - \theta_0)\big) = W_N^{-1/2}\big(A_{N,2}(\tilde\theta_N - \theta_0) + R_{N,2}(\tilde\theta_N - \theta_0)\big).$$
By conditions (iv), (v), and (vi),
$$W_N^{-1/2} F_N(\theta_0) \to N(0, I_r) \text{ in distribution}, \qquad W_N^{-1/2} A_{N,1}(\tilde\theta_N - \theta_0) = o_p(1), \qquad W_N^{-1/2} R_{N,1}(\tilde\theta_N - \theta_0) = o_p(1),$$
so
$$W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$$
in distribution. Furthermore, we have
$$W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big) = (I_r + K_N)\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big),$$
where
$$K_N = W_N^{-1/2} R_{N,2} \big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1},$$
$$\|W_N^{-1/2}\| = \big(\lambda_{\min}(W_N)\big)^{-1/2} = \big(\lambda_{\min}(C_N^{-1} U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1})\big)^{-1/2},$$
$$\big\|\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1}\big\| = \Big(\lambda_{\min}\big(U_{N0}^T B_N^T U_{N0} (U_{N0}^T B_N V_N B_N^T U_{N0})^{-1} U_{N0}^T B_N U_{N0}\big)\Big)^{-1/2}.$$
Write $\lambda_{N,1} = \lambda_{\min}(W_N)$ and $\lambda_{N,2} = \lambda_{\min}\big(U_{N0}^T B_N^T U_{N0} (U_{N0}^T B_N V_N B_N^T U_{N0})^{-1} U_{N0}^T B_N U_{N0}\big)$. So
$$\|K_N\| \le \|W_N^{-1/2}\|\,\|R_{N,2}\|\,\big\|\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1}\big\| = (\lambda_{N,1}\lambda_{N,2})^{-1/2}\|R_{N,2}\|,$$
and
$$\|R_{N,2}\|^2 = \sum_{j=1}^r \Big\|\frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,j,2}(\theta_j^*)\Big\|^2 \le \frac{1}{4}\sum_{j=1}^r \big(|\tilde\theta_N - \theta_0|\,\|H_{N,j,2}(\theta_j^*)\|\big)^2 \le \frac{1}{4}|\tilde\theta_N - \theta_0|^2 \sum_{j=1}^r \|H_{N,j,2,\epsilon}\|^2 \le \frac{r}{4}|\tilde\theta_N - \theta_0|^2 \max_j \|H_{N,j,2,\epsilon}\|^2.$$
Then,
$$\|K_N\| \le \frac{\sqrt{r}}{2}\,\frac{|\tilde\theta_N - \theta_0|}{(\lambda_{N,1}\lambda_{N,2})^{1/2}}\,\max_j \|H_{N,j,2,\epsilon}\| \to 0$$
in probability. Since
$$(I_r + K_N)^{-1}\Big(W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)\Big)(\tilde\theta_N - \theta_0) = \big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)(\tilde\theta_N - \theta_0),$$
$(I_r + K_N)^{-1} \to I_r$ in probability, and $W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$ in distribution, it follows that
$$\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$$
in distribution. Then, $\tilde\theta_N$ is asymptotically normal with mean $\theta_0$ and asymptotic covariance matrix
$$\big(U_{N0}^T B_N^T U_{N0}\big)^{-1}\big(U_{N0}^T B_N V_N B_N^T U_{N0}\big)\big(U_{N0}^T B_N U_{N0}\big)^{-1}. \qquad \square$$

References

  1. FDA US. Guidance for Industry: Population Pharmacokinetics; FDA: Rockville, MD, USA, 1999.
  2. Jiang, J.; Ge, Z. Mixed models: An overview. In Frontiers of Statistics in Honor of Professor Peter J. Bickel's 65th Birthday; Fan, J., Koul, H., Eds.; Imperial College Press: London, UK, 2006; pp. 445–466.
  3. Gelman, A.; Carlin, J.; Stern, H.; Rubin, D. Bayesian Data Analysis, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2004.
  4. Lindstrom, M.; Bates, D. Nonlinear mixed effects models for repeated measures data. Biometrics 1990, 46, 673–687.
  5. Pinheiro, J.; Bates, D. Approximations to the log-likelihood function in nonlinear mixed-effects models. J. Comput. Graph. Stat. 1995, 4, 12–35.
  6. Geweke, J. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 1989, 57, 1317–1339.
  7. Davidian, M.; Gallant, A.R. Smooth nonparametric maximum likelihood estimation for population pharmacokinetics, with application to quinidine. J. Pharmacokinet. Biopharm. 1992, 20, 529–556.
  8. Jiang, J. Linear and Generalized Linear Mixed Models and Their Applications; Springer: New York, NY, USA, 2007.
  9. Jiang, J.; Nguyen, T. Linear and Generalized Linear Mixed Models and Their Applications, 2nd ed.; Springer: New York, NY, USA, 2021.
  10. Jiang, J.; Luan, Y.; Wang, Y.G. Iterative estimating equations: Linear convergence and asymptotic properties. Ann. Stat. 2007, 35, 2233–2260.
  11. Jiang, J. Asymptotic Analysis of Mixed Effects Models: Theory, Applications, and Open Problems; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.
  12. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: New York, NY, USA, 1989.
  13. Jiang, J. A nonlinear Gauss–Seidel algorithm for inference about GLMM. Comput. Stat. 2000, 15, 229–241.
  14. Jiang, J.; Zhang, W. Robust estimation in generalised linear mixed models. Biometrika 2001, 88, 753–765.
  15. Stuart, H.C.; Reed, R.B. Longitudinal studies of child health and development, Harvard School of Public Health, Series II, No. 1, Description of project. Pediatrics 1959, 24, 875–885.
  16. Demidenko, E. Mixed Models: Theory and Applications with R; John Wiley & Sons: New York, NY, USA, 2013.
  17. Pinheiro, J.; Bates, D. Mixed-Effects Models in S and S-PLUS; Statistics and Computing Series; Springer: New York, NY, USA, 2000.
  18. Draper, N.R.; Smith, H. Applied Regression Analysis, 3rd ed.; Wiley: New York, NY, USA, 1998.
  19. Qalandarov, A.A.; Khaldjigitov, A.A. Mathematical and numerical modeling of the coupled dynamic thermoelastic problems for isotropic bodies. TWMS J. Pure Appl. Math. 2020, 11, 119–126.
  20. Shokri, A.; Saadat, H. Trigonometrically fitted high-order predictor–corrector method with phase-lag of order infinity for the numerical solution of radial Schrödinger equation. J. Math. Chem. 2014, 52, 1870–1894.
  21. Shokri, A.; Saadat, H. P-stability, TF and VSDPL technique in Obrechkoff methods for the numerical solution of the Schrödinger equation. Bull. Iran. Math. Soc. 2016, 42, 687–706.
Figure 1. The height of girls aged from 7 to 18.
Figure 2. Indomethicin concentration (mcg/mL) of six individuals measured 11 times after injection.
Figure 3. Trunk circumference (in millimeters) of five orange trees.
Table 1. Simulation result: non-linear model.

Estimator   β = −1: Mean / Bias / SD        τ = 1: Mean / Bias / SD        Overall MSE
1st-step    −1.0006 / −0.0006 / 0.0321      0.9891 / −0.0109 / 0.0644      0.0053
2nd-step    −0.9993 /  0.0007 / 0.0184      0.9849 / −0.0151 / 0.0631      0.0046
GEE         −0.9992 /  0.0008 / 0.0182      0.9961 / −0.0039 / 0.0622      0.0042

SD, standard deviation. Overall MSE is the sum of the mean squared errors of the estimators of β and τ.
Table 2. Simulation estimation result.

Estimator   β1 = 2: Mean / Bias / SD       β2 = 1: Mean / Bias / SD       τ = 1: Mean / Bias / SD       Overall MSE
1st-step    1.9804 / −0.0196 / 0.0404      0.9463 / −0.0537 / 0.1015      0.9421 / −0.0579 / 0.0974     0.0280
2nd-step    1.9985 / −0.0015 / 0.0429      0.9914 / −0.0086 / 0.1080      0.9835 / −0.0165 / 0.1032     0.0245
GEE         1.9989 / −0.0011 / 0.0428      0.9933 / −0.0067 / 0.1082      0.9854 / −0.0146 / 0.1033     0.0245

SD, standard deviation. Overall MSE is the sum of the mean squared errors of the estimators of β1, β2, and τ.
Table 3. Estimation of σ² (true value σ² = 1).

Estimator   Mean     Var      MSE      % of Convergence
1st-step    1.1019   0.0277   0.0380   46.247
2nd-step    1.1106   0.0398   0.0520   42.857
GEE         1.1028   0.0300   0.0405   44.794
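The pattern in Tables 1 and 2, where the second-step and GEE estimators show a smaller spread than the first step, is the usual efficiency gain from an informed working covariance. A stripped-down linear analogue of the two-step idea (a random-intercept model with hypothetical sizes and variances, not the paper's actual simulation design) reproduces the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, reps = 200, 5, 200              # clusters, cluster size, Monte Carlo replications
tau2, sig2, beta0 = 1.0, 1.0, -1.0    # random-intercept variance, error variance, true slope

# within-cluster covariance V_i = τ²J + σ²I, and its inverse
V_i = tau2 * np.ones((n, n)) + sig2 * np.eye(n)
Vinv_i = np.linalg.inv(V_i)
x = rng.normal(size=(m, n))           # fixed covariates

step1, step2 = [], []
for _ in range(reps):
    b = rng.normal(scale=np.sqrt(tau2), size=m)        # random intercepts
    e = rng.normal(scale=np.sqrt(sig2), size=(m, n))   # errors
    y = beta0 * x + b[:, None] + e
    # first-step analogue: identity working covariance (ordinary least squares)
    step1.append((x * y).sum() / (x * x).sum())
    # second-step analogue: working covariance = V_i (generalized least squares)
    num = sum(x[i] @ Vinv_i @ y[i] for i in range(m))
    den = sum(x[i] @ Vinv_i @ x[i] for i in range(m))
    step2.append(num / den)

print("first-step SD:", np.std(step1), " second-step SD:", np.std(step2))
```

Both estimators are essentially unbiased here; with this seed, the second step shows the smaller Monte Carlo standard deviation, mirroring the Overall MSE columns above. In the paper's non-linear setting, $V_i$ is unknown and is replaced by an estimate built from the first-step fit.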
Wang, J.; Luan, Y.; Jiang, J. A Two-Step Method of Estimation for Non-Linear Mixed-Effects Models. Mathematics 2022, 10, 4547. https://doi.org/10.3390/math10234547