Article

A Two-Step Method of Estimation for Non-Linear Mixed-Effects Models

1 School of Mathematics, Shandong University, Jinan 250100, China
2 Research Center for Mathematics and Interdisciplinary Sciences, Frontiers Science Center for Nonlinear Expectations (Ministry of Education), Shandong University, Qingdao 266237, China
3 Department of Statistics, University of California, Davis, CA 95616, USA
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4547; https://doi.org/10.3390/math10234547
Submission received: 11 October 2022 / Revised: 23 November 2022 / Accepted: 28 November 2022 / Published: 1 December 2022

Abstract

The main goal of this paper is to propose a two-step method for the estimation of parameters in non-linear mixed-effects models. A first-step estimate $\tilde\theta$ of the parameter vector $\theta$ is obtained by solving the estimating equations with the identity matrix as the working covariance matrix. It is shown that $\tilde\theta$ is consistent. If, furthermore, an estimated covariance matrix $\hat V$ is available based on $\tilde\theta$, a second-step estimator $\hat\theta$ can be obtained by solving the optimal estimating equations. It is shown that $\hat\theta$ maintains asymptotic optimality. We establish the consistency and asymptotic normality of the proposed estimators. Simulation results show the improvement of $\hat\theta$ over $\tilde\theta$. Furthermore, we provide a method-of-moments estimator of the variance $\sigma^2$ and assess its empirical performance. Finally, three real-data examples are considered.

1. Introduction

Non-linear mixed-effects models (NLMEMs) have been described in the literature, and have been used particularly in pharmacokinetics to identify sources of variability in drug concentration in the patient population [1,2]. For example, in [3] (Section 20.3), the authors discussed a toxicokinetic model, involving 15 parameters for each of the six persons in a pharmacokinetics experiment. Some methods for the estimation of fixed effects and variance components in NLMEMs have been described. The marginal density of the response variable does not have a closed-form expression, so some approximation methods have also been proposed, for example, taking a first-order Taylor expansion of the non-linear function for the conditional modes of the random effects model [4], Laplacian approximation [5], importance sampling [5,6], and Gaussian quadrature approximation [7].
Iterative estimation equations (IEEs) [8,9] have been investigated in the context of a semi-parametric regression model for longitudinal data with an unspecified covariance matrix; consistency and asymptotic efficiency have also been demonstrated [10]. However, achieving convergence with the iterative method can be time-consuming, or can fail, when the sample size is large. Here, we improve and extend this method to the non-linear mixed-effects model.
This paper is structured as follows: In Section 2, we discuss a two-step method for estimating the parameters of non-linear mixed effects models. In Section 3, we study the asymptotic properties of the estimators. Section 4 contains details of the simulation results. In Section 5, we propose a method to estimate the variance σ 2 . The analysis of real data is considered in Section 6. All of the technical results can be found in the Appendix A, Appendix B, Appendix C and Appendix D.

2. Estimation in a Non-linear Mixed-Effects Model

2.1. Non-linear Mixed-Effects Model

A non-linear mixed-effects (NLME) model can be expressed as follows:
$$y_{ij} = f(x_j, \beta, \alpha_i) + \epsilon_{ij},$$
where $i = 1, \ldots, N$; $j = 1, \ldots, N_i$; $y_{ij}$ is the response for the $j$th observation of the $i$th individual; $f$ is a known non-linear function; $x_j$ is a vector of covariates; $\beta$ is a population parameter vector; $\alpha_i$ is a vector of unobserved latent variables that vary randomly across subjects; and $\epsilon_{ij}$ is the error, independent of $\alpha_i$ [11]. We assume that $\alpha_i \sim N(0, \tau^2)$ and $\epsilon_{ij} \sim N(0, \sigma^2)$, and that they are independent of each other [2,4,11].
Next, we describe a method for estimating the parameter θ = ( β , τ ) .
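The structure of the model can be illustrated by simulating from it. The sketch below uses $f(x, \beta, \alpha) = e^{\beta x + \alpha}$, the special case studied in Example 1 later in the paper; the sample sizes and variable names are our own illustration.

```python
import numpy as np

# Simulate data from the NLME model y_ij = f(x_j, beta, alpha_i) + eps_ij,
# with f(x, beta, a) = exp(beta * x + a) as in Example 1 of the paper.
rng = np.random.default_rng(0)
m, n = 200, 5                      # subjects and observations per subject
beta, tau, sigma = 1.0, 1.0, 1.0   # true parameter values (illustrative)
x = rng.normal(size=n)             # covariates x_j, shared across subjects
alpha = rng.normal(scale=tau, size=m)        # random effects alpha_i ~ N(0, tau^2)
eps = rng.normal(scale=sigma, size=(m, n))   # errors eps_ij ~ N(0, sigma^2)
y = np.exp(beta * x[None, :] + alpha[:, None]) + eps   # m x n response matrix
```

Each row of `y` shares one draw of the random effect, which is what induces the within-subject correlation that the estimation method must account for.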

2.2. Parameter Estimation

Let the responses be $y_1, y_2, \ldots, y_N$, where $N$ is the sample size, and let $\Theta$ be the parameter space, with $\theta \in \Theta$ and $\theta_0$ denoting the true parameter vector.
As in the iterative estimation equations, we now use the estimating equation to estimate the parameter, as follows:
$$F_N(\theta) = C_N^{-1} U_N^T(\theta)\, B_N \left(y - \mu_N(\theta)\right), \quad (1)$$
where $y = (y_i)_{1\le i\le N}$, $U_N(\theta) = \partial\mu_N(\theta)/\partial\theta$, $\mu_N(\theta) = E(y) = (\mu_{N,1},\ldots,\mu_{N,N})$, $\mu_{N,i} = E(y_i)$, $1\le i\le N$, $B_N = \mathrm{diag}(B_{N,1}, B_{N,2},\ldots,B_{N,N})$, and $C_N = \mathrm{diag}(c_{N,1},\ldots,c_{N,r})$, where the $c_{N,k}$ are positive constants, $1\le k\le r$, and $r$ is the dimension of the parameter $\theta$. The notation $U_N(\theta)$, $B_N$, $\mu_N(\theta)$ indicates dependence on the sample size, $N$.
For longitudinal data, Equation (1) can be expressed as
$$F_N(\theta) = \sum_{i=1}^N C_{N,i}^{-1} U_{N,i}^T(\theta)\, B_{N,i}\left(y_i - \mu_{N,i}(\theta)\right), \quad (2)$$
where $y_i = (y_{ij})_{1\le j\le N_i}$, $\mu_{N,i} = E(y_i)$, $B_{N,i} = (\mathrm{Var}(y_i))^{-1}$, and $U_{N,i}(\theta) = \partial\mu_{N,i}(\theta)/\partial\theta$.
Note that the generalized estimating equation (GEE) [8] corresponds to $B_{N,i} = V_i^{-1}$, where $V_i = \mathrm{Var}(y_i)$ is the true covariance matrix, which is usually unknown. Hence, we propose a two-step estimation method.
Let $B_{N,i} = I_{N_i}$; then, the first-step estimator $\tilde\theta = \tilde\theta_N$ is the solution to the equation
$$\sum_{i=1}^N C_{N,i}^{-1} U_{N,i}^T(\theta)\left(y_i - \mu_{N,i}(\theta)\right) = 0. \quad (3)$$
For the second-step estimator, we derive an estimate $\hat V_i$ of $V_i$ using the first-step estimator $\tilde\theta_N$. Then, letting $B_{N,i} = \hat V_i^{-1}$, we use the equation $F_N(\theta) = 0$ to obtain the second-step estimator $\hat\theta_N$.
If we iterate until convergence, the iterative equation estimator (IEE) is obtained. However, it will sometimes be difficult to obtain convergence.
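To make the two-step recipe concrete, here is a minimal numerical sketch for the exponential model of Example 1 below, whose marginal mean is $E(y_{ij}) = e^{\beta x_j + \tau^2/2}$. The damped Gauss–Newton solver, the clipping safeguard, sample sizes, and all names are our own illustration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 500, 4
x = rng.normal(size=n)
y = np.exp(1.0 * x[None, :] + 1.0 * rng.normal(size=(m, 1))) \
    + rng.normal(size=(m, n))      # true beta = tau = 1

def mu_and_U(theta):
    # Marginal mean mu_j = E exp(beta*x_j + tau*xi) = exp(beta*x_j + tau^2/2)
    # and its Jacobian U = d mu / d(beta, tau).
    b, t = theta
    mu = np.exp(b * x + 0.5 * t * t)
    U = np.column_stack([x * mu, t * mu])
    return mu, U

def solve_ee(theta, W, n_iter=200):
    # Damped Gauss-Newton for the estimating equation U^T W (ybar - mu) = 0;
    # all subjects share x here, so the sum over i reduces to column means.
    ybar = y.mean(axis=0)
    for _ in range(n_iter):
        mu, U = mu_and_U(theta)
        step = np.linalg.solve(U.T @ W @ U, U.T @ W @ (ybar - mu))
        theta = theta + 0.5 * np.clip(step, -1.0, 1.0)   # crude safeguard
    return theta

# Step 1: identity working covariance.
theta1 = solve_ee(np.array([0.5, 0.5]), np.eye(n))
# Estimate V0 by the method of moments, then Step 2: weight by V0^{-1}.
mu1, _ = mu_and_U(theta1)
V0 = (y - mu1).T @ (y - mu1) / m
theta2 = solve_ee(theta1, np.linalg.inv(V0))
```

Note that only the marginal mean is needed; the second step reuses the first-step fit solely to build the weight matrix.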
It is shown in the next section that, under suitable conditions, $\tilde\theta_N$ and $\hat\theta_N$ are consistent. Our simulations show that $\hat\theta_N$ outperforms $\tilde\theta_N$ in terms of efficiency, and that the efficiency of the second-step estimator differs little from that of the GEE method.
A challenging task during computation is solving the estimation equation, which, under such a model, typically does not have an analytic expression. We tried to solve equations with the most popular methods, such as the Newton–Raphson iterative algorithm [12], but failed. Finally, we solved the estimation equation using the non-linear Gauss–Seidel algorithm, whose convergence has been established [13].
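The non-linear Gauss–Seidel idea can be sketched generically: cycle through the components of $\theta$, solving the $j$th scalar equation for $\theta_j$ with the other components held at their latest values. The toy system and the scalar bisection solver below are ours, chosen only to illustrate the iteration, not the paper's implementation.

```python
import numpy as np

def bisect(g, lo, hi, tol=1e-12):
    # Scalar root of g on [lo, hi], assuming a sign change on the interval.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def F(th):
    # A toy 2-dimensional system F(theta) = 0 with a root near (1.213, 1.213).
    return np.array([th[0] ** 3 + th[1] - 3.0, th[0] + th[1] ** 3 - 3.0])

th = np.array([0.0, 0.0])
for _ in range(50):   # non-linear Gauss-Seidel sweeps
    th[0] = bisect(lambda s: F(np.array([s, th[1]]))[0], -5.0, 5.0)
    th[1] = bisect(lambda s: F(np.array([th[0], s]))[1], -5.0, 5.0)
```

Each sweep solves one coordinate exactly given the others; under the contraction conditions of [13] the sweeps converge to a root of the full system.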
Remark 1.
The method can only estimate the parameters involved in E ( y i j ) . In the case of the linear mixed-effects model, the method can only estimate the fixed effects. For example, for the linear mixed-effects model y = X β + Z α + ϵ , we assume the random effect α N ( 0 , R ) and independent error ϵ, ϵ N ( 0 , D ) . As can be seen, in order to use our method, we need to know E ( y ) = X β ; then, we can estimate parameter β but cannot estimate the variance of random effects, R. However, under a non-linear mixed-effects model, the method can be used to estimate both the fixed effects and the variance of the random effects, as is the case for our simulation.
Remark 2.
The matrix V can occasionally be singular. In this case, we suggest using the Moore–Penrose generalized inverse of V in place of V 1 .

3. Asymptotic Properties of the Estimator

In this section, we study the consistency and asymptotic normality of first- and second-step estimators.
We assume that the first-step ($B_N = I_N$) estimator $\tilde\theta = \tilde\theta_N$ is the solution to Equation (3). Let
$$\tilde\theta_N = \begin{cases} \text{the solution to (3)}, & \text{if a solution to (3) exists}, \\ \text{any } \theta \text{ in the parameter space}, & \text{if no solution to (3) exists}. \end{cases}$$
Consider $F_N(\cdot)$ as a map from $\Theta$ to a subset of $R^r$, and let $F_N(\Theta)$ be the image of $\Theta$ under $F_N(\cdot)$.
For $x \in R^r$ and $A \subset R^r$, define $d(x, A) = \inf_{y\in A}|x-y|$; $A^c$ denotes the complement of $A$.
Let $\xi_n$ be a sequence of non-negative random variables. We say that $\liminf \xi_n > 0$ with probability tending to one if, for any $\epsilon > 0$, there is $\delta > 0$ such that $P(\xi_n > \delta) \ge 1-\epsilon$ for large $n$. Note that this is equivalent to $\xi_n^{-1} = O_p(1)$ [14].
Theorem 1.
(i) Suppose that
$$F_N(\theta_0) \to 0 \quad (4)$$
in probability, as $N\to\infty$.
(ii) Suppose that
$$\liminf_{N\to\infty} d\{F_N(\theta_0), F_N^c(\Theta)\} > 0 \quad (5)$$
with probability tending to one.
Then, with probability tending to one, the solution to (3) exists and is in $\Theta$.
Theorem 2.
(i) Suppose that
$$F_N(\theta_0) \to 0 \quad (4)$$
in probability, as $N\to\infty$.
(ii) Suppose that, for any $\epsilon > 0$, there are $\Theta_0 \subset \Theta$, $\delta_1 > 0$, and $N_{\delta_1} > 0$ such that, for large $N$,
$$P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta) - F_N(\theta_0)| > \delta_1\right) > 1 - \epsilon. \quad (6)$$
Furthermore, suppose there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that, for large $N$,
$$P\left(\inf_{\theta\in\Theta_0,\,\theta\ne\theta_0}\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_2\right) > 1 - \epsilon. \quad (7)$$
Then, any solution to (3) is consistent.
Let $V_N$ be the covariance matrix of $y$. Write
$$(H_{N,j,1})_{kl} = \left(c_j^{-1}\frac{\partial^3\mu_N}{\partial\theta_j\partial\theta_k\partial\theta_l}\right)^T B_N\left(y - \mu_N(\theta_j^*)\right), \quad 1\le j,k,l\le r,$$
for the $(k,l)$ element of $H_{N,j,1}$, where $\theta_j^*$ lies between $\theta_0$ and $\tilde\theta_N$ ($1\le j\le r$);
$$(H_{N,j,2})_{kl} = \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_k}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_l}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_N}{\partial\theta_j}\right)^T B_N\frac{\partial^2\mu_N}{\partial\theta_k\partial\theta_l},$$
$1\le j,k,l\le r$; and $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|\le\epsilon} H_{N,j,2}$, $1\le j\le r$.
Theorem 3.
Suppose that:
(i) The components of $\mu_N(\theta)$ are three times continuously differentiable;
(ii) $\tilde\theta_N$ satisfies (3) with probability tending to one and is consistent;
(iii) There exists $\epsilon > 0$ such that
$$|\tilde\theta_N - \theta_0|\,(\lambda_{N,1}\lambda_{N,2})^{-1/2}\max_j H_{N,j,2,\epsilon} \to 0$$
in probability, where
$$\lambda_{N,1} = \lambda_{\min}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right),$$
$$\lambda_{N,2} = \lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\left(U_{N0}^T B_N V_N B_N^T U_{N0}\right)^{-1} U_{N0}^T B_N U_{N0}\right),$$
$$U_{N0} = \partial\mu_N(\theta)/\partial\theta\,\big|_{\theta=\theta_0};$$
(iv)
$$\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} F_N(\theta_0) \to N(0, I_r)$$
in distribution;
(v) $\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} A_{N,1}$ and $\left[C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right]^{-1/2} H_{N,j,1}(\theta_j^*)$, $1\le j\le r$, are bounded in probability, where $(A_{N,1})_{ij} = \left(c_i^{-1}\partial^2\mu_N/\partial\theta_i\partial\theta_j\right)^T B_N\left(y - \mu_N(\theta_0)\right)$ is the $(i,j)$ element of $A_{N,1}$, $1\le i,j\le r$.
Then, $\tilde\theta_N$ is asymptotically normal with mean $\theta_0$ and asymptotic covariance matrix
$$\left(U_{N0}^T B_N^T U_{N0}\right)^{-1}\left(U_{N0}^T B_N V_N B_N^T U_{N0}\right)\left(U_{N0}^T B_N U_{N0}\right)^{-1}.$$
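The asymptotic covariance in Theorem 3 has a sandwich form. A quick numerical check, with toy matrices of our own choosing, shows that the optimal weight $B_N = V_N^{-1}$ collapses the sandwich to $(U^T V^{-1} U)^{-1}$, and that the identity working covariance of the first step can only inflate the covariance in the Loewner order.

```python
import numpy as np

# Sandwich covariance S(B) = (U^T B U)^{-1} (U^T B V B U) (U^T B U)^{-1}.
rng = np.random.default_rng(2)
N, r = 30, 3
U = rng.normal(size=(N, r))
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)          # a positive definite "true" covariance

def sandwich(B):
    M = np.linalg.inv(U.T @ B @ U)
    return M @ (U.T @ B @ V @ B @ U) @ M

S_opt = sandwich(np.linalg.inv(V))   # optimal weighting B = V^{-1}
S_id = sandwich(np.eye(N))           # identity working covariance (first step)
```

The second-step estimator aims at `S_opt` by plugging an estimate of $V$ into the weight.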
The proofs and further details are given in Appendix A, Appendix B, Appendix C and Appendix D.

4. Simulation

Example 1.
Consider a simple case of a non-linear mixed-effects model,
$$y_{ij} = e^{\beta x_j + \alpha_i} + \epsilon_{ij},$$
$i = 1,2,\ldots,m$; $j = 1,2,\ldots,n_i$, where $\alpha_i$ and $\epsilon_{ij}$ are independent, with $\alpha_i \sim N(0,\tau^2)$ and $\epsilon_{ij} \sim N(0,\sigma^2)$. We treat $\sigma^2$ as a nuisance parameter whose estimation is not considered here, and set $\sigma^2 = 1$; we consider the estimation of the unknown parameters $\beta$ and $\tau$.
In this model, for subjects with a common number of observations $n$, it is easy to see that the $y_i$ have the same (joint) distribution; hence, $V_i = \mathrm{Var}(y_i) = V_0$ for an unspecified $n\times n$ covariance matrix, $1\le i\le m$. For the second-step estimate, $V_0$ is estimated by the method of moments (MoM) as follows:
$$\hat V_0 = \frac{1}{m}\sum_{i=1}^m (y_i - \hat\mu_i)(y_i - \hat\mu_i)^T,$$
where $\hat\mu_i = \mu_i(\tilde\theta)$.
Consider a set of unbalanced data. Table 1 shows the results of a simulation with $m = 500$, $n_i = 2$ for $1\le i\le 250$ and $n_i = 6$ for $251\le i\le m$; the true parameters are $\beta = 1$ and $\tau = 1$, and the $x_j$ are generated from $N(0,1)$. The results are based on 500 simulation runs. We find a 13.21% improvement of the second-step estimator over the first-step estimator in terms of the total mean squared error, and the second-step estimator is very close to the GEE estimator in efficiency.
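With unbalanced data, the covariance entries can be estimated component-wise, using for entry $(j,k)$ only the subjects that observe both components $j$ and $k$ (the approach of [10]). The sketch below simulates first-step residuals directly as correlated noise; the mask construction and all names are our own illustration.

```python
import numpy as np

# Component-wise MoM covariance estimation for unbalanced longitudinal data.
rng = np.random.default_rng(3)
m, n = 500, 6
# Stand-in for the residuals y_i - mu_i(theta_tilde): correlated noise with
# true covariance 0.5*I + 0.5 (diagonal 1.0, off-diagonal 0.5).
resid = rng.multivariate_normal(np.zeros(n), 0.5 * np.eye(n) + 0.5, size=m)
obs = np.ones((m, n), dtype=bool)
obs[: m // 2, 2:] = False       # first half of subjects: n_i = 2; rest: n_i = 6

V_hat = np.empty((n, n))
for j in range(n):
    for k in range(n):
        both = obs[:, j] & obs[:, k]      # subjects observing both j and k
        V_hat[j, k] = np.mean(resid[both, j] * resid[both, k])
```

Entries involving the later components are averaged over fewer subjects, so they are noisier, but every entry of $V_0$ remains estimable.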
Example 2.
Consider the following non-linear mixed-effects model, an exponential model, which may be used to describe changes in drug concentration. Let
$$y_{ij} = \beta_1\exp\left(-(\beta_2+\alpha_i)t_j\right) + \epsilon_{ij}, \quad i = 1,\ldots,m;\; j = 1,\ldots,n,$$
where $\alpha_i$ is the random effect, distributed as $N(0,\tau^2)$, and $\epsilon_{ij}$ is the error, independent of $\alpha_i$ and distributed as $N(0,\sigma^2)$; $\beta_1$ and $\beta_2$ are fixed parameters, and $t_j$ is the time of observation. Assuming that $\sigma^2 > 0$ is known, we estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$.
Let $m = 500$ and $n = 11$, with the true parameters $\beta_1 = 2$, $\beta_2 = 1$, $\tau = 1$, $\sigma^2 = 0.5$, and $t_j = j/n$, $j = 1,\ldots,n$. The results, based on 500 simulation runs, are presented in Table 2. We see an improvement of approximately 12.5% of the second-step estimator over the first-step estimator in terms of the total mean squared error. Furthermore, the second-step estimate is comparable to the GEE method.

5. Estimate of Variance σ 2

We have so far not discussed how to estimate the variance σ 2 . Now, we propose a method to estimate this parameter and study its empirical performance.
For a non-linear mixed-effects model
$$y_{ij} = f(x_j, \beta, \alpha_i) + \epsilon_{ij}, \quad i = 1,\ldots,m;\; j = 1,\ldots,n_i,$$
we assume the same conditions as in Section 2.1. Writing $\alpha_i = \tau\xi_i$ with $\xi_i \sim N(0,1)$, we have
$$E\left\{y_{ij} - f(x_j,\beta,\tau\xi_i)\right\}^2 = E(\epsilon_{ij}^2) = \sigma^2. \quad (10)$$
Summing both sides of (10) over $i = 1,\ldots,m$; $j = 1,\ldots,n_i$ leads to
$$E\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\beta,\tau\xi_i)\right\}^2 = N\sigma^2, \quad (11)$$
where $N = \sum_{i=1}^m n_i$. If $\xi_i$, $i = 1,\ldots,m$ were observable, then, by removing the expectation sign on the left side of (11) and replacing $\beta$ and $\tau$ by their available estimators, $\hat\beta$ and $\hat\tau$, respectively, an empirical method of moments (EMM) estimator of $\sigma^2$ would be obtained, namely
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\xi_i)\right\}^2. \quad (12)$$
The difficulty is, of course, that $\xi_i$, $i = 1,\ldots,m$ are unobserved. To handle this, we replace $\xi_i$, $i = 1,\ldots,m$ on the right side of (12) with their conditional expectations given $y$, denoted $\hat\xi_i$, $i = 1,\ldots,m$, that is,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\hat\xi_i)\right\}^2. \quad (13)$$
To compute the conditional expectations, we need to know the parameters $\theta = (\beta,\tau)$ and $\sigma^2$. The parameter $\theta$ is replaced by the current estimator, $\hat\theta$. As for $\sigma^2$, we use an idea similar to the EM algorithm. Let $\sigma_c^2$ be the current estimator of $\sigma^2$. The conditional expectations are then computed under $\hat\theta$ and $\sigma_c^2$, and denoted by $\hat\xi_{i,\theta,\sigma_c^2}$, $1\le i\le m$. We then use
$$\sigma_u^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - f(x_j,\hat\beta,\hat\tau\hat\xi_{i,\theta,\sigma_c^2})\right\}^2 \quad (14)$$
to update $\sigma^2$, from $\sigma_c^2$ to $\sigma_u^2$. We continue until convergence, that is, until $|\sigma_u^2 - \sigma_c^2| \le \delta$ (e.g., $\delta = 0.001$). The final $\sigma_u^2$ is denoted by $\hat\sigma^2$. The initial estimator, $\sigma_0^2$, is obtained from the right side of (13) with $\hat\xi_i = 0$, $1\le i\le m$. We now consider an example.
Example 3.
(Example 1 continued). In this model, we obtain
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^m\sum_{j=1}^{n_i}\left\{y_{ij} - e^{\beta x_j + \tau\hat\xi_i}\right\}^2 \quad (15)$$
and
$$\hat\xi_i = E(\xi_i\,|\,y_i) = \frac{E\left[\xi_i\prod_{j=1}^{n_i} f(y_{ij}\,|\,\xi_i)\right]}{E\left[\prod_{j=1}^{n_i} f(y_{ij}\,|\,\xi_i)\right]}, \quad (16)$$
where $f(y_{ij}\,|\,\xi_i)$ is the conditional probability density function, the expectations are with respect to $\xi_i \sim N(0,1)$, and, clearly, $y_{ij}\,|\,\xi_i \sim N(e^{\beta x_j+\tau\xi_i}, \sigma^2)$.
The parameter $\theta = (\beta,\tau)$ is replaced by the estimator $\hat\theta$, either the first-step or the second-step estimate. Then, using the same simulation design as in Example 1, we iterate (15) and (16) to obtain the estimator $\hat\sigma^2$. Over 500 simulation runs, some solutions did not converge, so the results, shown in Table 3, are based on the converged solutions. They show little difference between the two choices of $\hat\theta$, and the estimates are accurate; either the first-step or the second-step estimate can be used in this step.
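The iteration of (15) and (16) can be sketched as follows for Example 1's model. We approximate the conditional expectation (16) by Monte Carlo over $\xi\sim N(0,1)$ with likelihood weights (the normalizing constant of the normal density cancels in the ratio); $\hat\beta$, $\hat\tau$ are taken as given, and all implementation details are our own.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 5
beta_hat, tau_hat = 1.0, 1.0                 # plug-in estimates of (beta, tau)
x = rng.normal(size=n)
y = np.exp(beta_hat * x[None, :] + tau_hat * rng.normal(size=(m, 1))) \
    + rng.normal(size=(m, n))                # simulated data, true sigma^2 = 1

xi_grid = rng.normal(size=2000)              # Monte Carlo draws of xi ~ N(0, 1)
sigma2 = np.mean((y - np.exp(beta_hat * x)[None, :]) ** 2)   # init: xi_hat = 0

for _ in range(50):
    # Unnormalized log-likelihood of each subject's data at every draw of xi
    # (shape m x 2000); the (2*pi*sigma^2)^(-n/2) factor cancels in (16).
    mu = np.exp(beta_hat * x[None, None, :] + tau_hat * xi_grid[None, :, None])
    loglik = -0.5 * ((y[:, None, :] - mu) ** 2).sum(axis=2) / sigma2
    w = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    xi_hat = (w * xi_grid[None, :]).sum(axis=1) / w.sum(axis=1)  # E(xi_i | y_i)
    new = np.mean((y - np.exp(beta_hat * x[None, :]
                              + tau_hat * xi_hat[:, None])) ** 2)
    if abs(new - sigma2) <= 1e-3:            # convergence criterion of Section 5
        sigma2 = new
        break
    sigma2 = new
```

The initial value (with $\hat\xi_i = 0$) absorbs the between-subject variation and is far too large; the iteration then pulls $\sigma^2$ down toward the error variance.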
Remark 3.
The results in Table 3 are based on the converged solutions only. An implication is that, if the solution converges, one may expect a good estimate from this procedure. In our real-data analyses, the solution converged in all cases. A topic for future work is to improve the convergence properties of the estimator.

6. Real Data

6.1. Height of Girls

The data are from the Longitudinal Studies of Child Health and Development project, initiated in 1929 at the Harvard School of Public Health (a full description of the project is given by Stuart and Reed [15]); they consist of the heights of 67 girls and 67 boys aged from 7 to 18, as described in Chapter 8 of Demidenko (2013) [16]. Here, we only consider the data for the girls (see Figure 1); the data for the boys are similar.
We use a non-linear mixed-effects model to describe the growth trend. For example, assuming that one parameter is subject-specific, the NLME model is
$$y_{ij} = \frac{\beta_1}{1+\exp(\beta_2+\alpha_i-\beta_3 t_{ij})} + \epsilon_{ij},$$
where $i = 1,\ldots,67$; $j = 1,\ldots,n_i$; and $\alpha_i$ is the random effect with distribution $N(0,\tau^2)$. Furthermore, we assume the random errors $\epsilon_{ij}$ are independent of $\alpha_i$ and distributed as $N(0,\sigma^2)$; $t_{ij}$ is the age of the $i$th girl at the $j$th time point. We first estimate the parameter $\theta = (\beta_1,\beta_2,\beta_3,\tau)$.
For simplicity of notation, let $h(x) = 1/(1+e^x)$. Under this model, we can obtain $\mu_{ij} = E(y_{ij}) = E\left[\beta_1 h(\beta_2+\alpha_i-\beta_3 t_{ij})\right]$. It is convenient to write $\alpha_i = \tau\xi_i$, where $\xi_i \sim N(0,1)$. Then, we have
$$\mu_{ij} = E(y_{ij}) = E\left[\beta_1 h(\beta_2+\tau\xi_i-\beta_3 t_{ij})\right].$$
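This expectation over $\xi\sim N(0,1)$ has no closed form, but it can be computed cheaply by Gauss–Hermite quadrature, $E\,g(\xi) \approx \pi^{-1/2}\sum_k w_k\, g(\sqrt{2}\,z_k)$. The sketch below is ours; the parameter values are illustrative only, not the fitted estimates.

```python
import numpy as np

# Gauss-Hermite nodes z and weights w for integrals against exp(-z^2).
z, w = np.polynomial.hermite.hermgauss(30)

def mu(beta1, beta2, beta3, tau, t):
    # mu(t) = E[ beta1 * h(beta2 + tau*xi - beta3*t) ], xi ~ N(0, 1),
    # with h(x) = 1 / (1 + e^x), via the change of variable xi = sqrt(2)*z.
    g = beta1 / (1.0 + np.exp(beta2 + tau * np.sqrt(2.0) * z - beta3 * t))
    return (w * g).sum() / np.sqrt(np.pi)

val = mu(170.0, 1.0, 0.3, 1.2, t=12.0)   # e.g. mean height at age 12
```

With the mean and its derivatives available this way, the estimating equations below can be evaluated at any candidate $\theta$.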
In order to estimate the parameter $\theta = (\beta_1,\beta_2,\beta_3,\tau)$, we use the first-step estimating equation
$$F(\theta) = \sum_{i=1}^m\left(\frac{\partial\mu_i}{\partial\theta}\right)^T(y_i-\mu_i) = 0.$$
Solving this equation with the Gauss–Seidel iteration method gives the parameter estimate $\hat\theta_1 = (169.835, 1.048, 0.310, 1.191)$.
The second-step estimating equation is
$$F(\theta) = \sum_{i=1}^m\left(\frac{\partial\mu_i}{\partial\theta}\right)^T\Sigma_i^{-1}(y_i-\mu_i) = 0,$$
where $\Sigma_i = \mathrm{Var}(y_i)$ is unknown and must be approximated by an estimate. Because the data are unbalanced, the estimators of the covariances are obtained component-wise [10]. The estimates are $\hat\theta_2 = (164.828, 1.428, 0.427, 1.933)$.
Next, we use our method to estimate the parameter $\sigma^2$. In this model, $f(x_j,\beta,\alpha_i) = \beta_1/\left(1+\exp(\beta_2+\alpha_i-\beta_3 t_{ij})\right)$; we obtain the first-step estimator $\sigma_1^2 = 96.352$ ($\sigma_1 = 9.816$) and the second-step estimator $\sigma_2^2 = 173.161$ ($\sigma_2 = 13.159$).

6.2. Indomethacin Concentration

Pinheiro and Bates (2000) [17] presented a dataset on the drug indomethacin for six patients. Each patient received an intravenous injection of indomethacin at the start of the study. The plasma concentration of indomethacin (mcg/mL) was then measured 11 times, at the following points (hr): t = (0.25, 0.5, 0.75, 1, 1.25, 2, 3, 4, 5, 6, 8). Let $y_{ij}$, $i = 1,\ldots,6$, $j = 1,\ldots,11$ denote the plasma concentration for the $i$th patient at the $j$th time point. The concentration changes are plotted in Figure 2.
From the plot, we can see that the initial decrease in the plasma drug concentration is dramatic, due to the movement of the drug from the circulatory system into the tissue, until an equilibrium is reached. We establish a non-linear mixed-effects model to describe the change: $y_{ij} = \beta_1\exp(-(\beta_2+\alpha_i)t_j) + \epsilon_{ij}$, where the $\alpha_i$, $i = 1,\ldots,6$ are random effects, independent and distributed as $N(0,\tau^2)$; the $\epsilon_{ij}$ are i.i.d. random errors distributed as $N(0,\sigma^2)$; and the $\alpha_i$ and $\epsilon_{ij}$ are independent of each other. We use our method to estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$. The first-step estimate is $\hat\theta_1 = (2.910, 1.539, 0.533)$ and the second-step estimate is $\hat\theta_2 = (2.804, 1.492, 0.522)$; the two are comparable.
Then, we estimate the parameter $\sigma^2$. In this model, $f(x_j,\beta,\alpha_i) = \beta_1\exp(-(\beta_2+\alpha_i)t_j)$, and we obtain the first-step estimator $\sigma_1^2 = 0.021$ and the second-step estimator $\sigma_2^2 = 0.021$. The two steps return the same estimate, so the simpler first-step estimates are the ones to be used.

6.3. Orange Trees

We consider the data on the growth of orange trees over time given in Draper and Smith ([18], Exercise 24.N, p. 559) and described in [4]. The data, presented in Figure 3, consist of the trunk circumferences (in millimeters) of five trees, each measured on seven occasions.
Each of the five trees was measured at 118, 484, 664, 1004, 1231, 1372, and 1582 days after December 31, 1968, when the study started. Let $y_{ij}$ be the trunk circumference (in millimeters) of the $i$th tree at the $j$th time. We consider a non-linear model as follows:
$$y_{ij} = \frac{\beta_1}{1+\exp\left\{-\left[t_j/365.25-(\beta_2+\alpha_i)\right]\right\}} + \epsilon_{ij}, \quad i = 1,\ldots,5;\; j = 1,\ldots,7,$$
where $t_j$ is the day of the $j$th measurement; $\alpha_i$, $i = 1,\ldots,5$ are independent random effects, identically distributed as $N(0,\tau^2)$; and $\epsilon_{ij}$, $i = 1,\ldots,5$; $j = 1,\ldots,7$ are random errors, assumed independent and distributed as $N(0,\sigma^2)$; $\alpha_i$ and $\epsilon_{ij}$ are independent of each other. We use our method to estimate the parameter $\theta = (\beta_1,\beta_2,\tau)$. The first-step estimate is $\hat\theta_1 = (204.960, 2.158, 0.577)$, and the second-step estimate is $\hat\theta_2 = (204.960, 2.159, 0.591)$. If the iteration is continued to convergence, we obtain the estimate $\hat\theta = (204.961, 2.169, 0.673)$; the two-step estimate is similar to the converged estimate. We use the second-step estimate in the subsequent analysis. For $\sigma^2$, we obtain the first-step estimator $\sigma_1^2 = 204.686$ ($\sigma_1 = 14.306$) and the second-step estimator $\sigma_2^2 = 204.579$ ($\sigma_2 = 14.303$); the two estimates are very close.

7. Concluding Remarks

In this paper, we propose a two-step method to estimate the parameters, and we study the asymptotic properties of the estimators. The method is convenient because it depends only on $E(y_{ij})$; we do not need to know the specific distribution of $y_{ij}$. On the other hand, the method can only estimate the parameters that appear in $E(y_{ij})$, so other parameters require a separate method. Here, we provide a method to estimate the variance $\sigma^2$ using the method of moments.
We found that the second-step estimate sometimes offers little or no efficiency gain over the first-step estimate. In such cases, we choose the first-step estimate, which is simpler and computationally more attractive.
The numerical solution of the estimating equations is also an important issue; we will attempt to improve our numerical methods in future work; see [19,20,21].

Author Contributions

Conceptualization, J.W. and J.J.; methodology, J.W., Y.L. and J.J.; software, J.W.; validation, J.W., Y.L. and J.J.; formal analysis, J.J. and Y.L.; investigation, J.W., Y.L. and J.J.; resources, J.W., Y.L. and J.J.; data curation, J.W. and J.J.; writing—original draft preparation, J.W.; writing—review and editing, Y.L. and J.J.; visualization, J.W.; supervision, Y.L.and J.J.; project administration, J.W., Y.L. and J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China, Grant No. 2018YFA0703900, and the National Science Foundation of China, Grant No. 11971264.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data included in this study are available upon request from the corresponding author.

Acknowledgments

We are thankful to the reviewers for their constructive comments, which helped us to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Proof of Theorem 1. 
A solution to (3) exists and is in $\Theta$ if and only if $0 \in F_N(\Theta)$.
For any $\epsilon > 0$:
Inequality (5) implies that there are $\delta > 0$ and $N_1 > 0$ such that $P\left(d\{F_N(\theta_0), F_N^c(\Theta)\} > \delta\right) > 1-\epsilon$ for $N \ge N_1$.
Equation (4) implies that, for any $0 < \epsilon_1 < \delta$, there is $N_\epsilon > 0$ such that $P(|F_N(\theta_0)| > \epsilon_1) < \epsilon$ for $N \ge N_\epsilon$. Thus, for $N \ge N_1 \vee N_\epsilon$,
$$P\left(0\notin F_N(\Theta)\right) \le P\left(0\notin F_N(\Theta),\ d\{F_N(\theta_0),F_N^c(\Theta)\} > \delta\right) + P\left(d\{F_N(\theta_0),F_N^c(\Theta)\} \le \delta\right) \le P\left(|F_N(\theta_0)| > \epsilon_1\right) + \epsilon < 2\epsilon,$$
since, on the event $d\{F_N(\theta_0),F_N^c(\Theta)\} > \delta$, $0\notin F_N(\Theta)$ forces $|F_N(\theta_0)| > \delta > \epsilon_1$. Because $\epsilon$ is arbitrary, $\lim_{N\to\infty} P(0\notin F_N(\Theta)) = 0$. Therefore, a solution to (3) exists with probability tending to one. □

Appendix B

Proof of Theorem 2. 
For any $\epsilon > 0$:
By (6), there are $\Theta_0 \subset \Theta$, $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| > \delta_1\right) > 1-\epsilon$ for $N \ge N_{\delta_1}$;
by (4), for any $0 < \epsilon_1 < \delta_1$, there is $N_{\epsilon_1} > 0$ such that $P(|F_N(\theta_0)| > \epsilon_1) < \epsilon$ for $N \ge N_{\epsilon_1}$;
by Theorem 1, there is $N_\epsilon > 0$ such that, for $N \ge N_\epsilon$, $P(\text{a solution to (3) exists}) > 1-\epsilon$.
Then, for $N \ge N_1 = \max\{N_{\epsilon_1}, N_{\delta_1}, N_\epsilon\}$,
$$P(\tilde\theta_N\notin\Theta_0) \le P\left(\tilde\theta_N\notin\Theta_0,\ \text{a solution to (3) exists}\right) + P\left(\text{no solution to (3) exists}\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0,\ \inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| > \delta_1,\ \text{a solution to (3) exists}\right) + P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| \le \delta_1\right) + P\left(\text{no solution to (3) exists}\right)$$
$$\le P\left(|F_N(\tilde\theta_N)-F_N(\theta_0)| > \delta_1,\ \text{a solution to (3) exists}\right) + P\left(\inf_{\theta\notin\Theta_0}|F_N(\theta)-F_N(\theta_0)| \le \delta_1\right) + P\left(\text{no solution to (3) exists}\right) \le 3\epsilon.$$
On the other hand, by (7), there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that
$$P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta_2\right) > 1-\epsilon, \quad N \ge N_{\delta_2}.$$
Then, for any $\epsilon_2 > \epsilon_1/\delta_2$ and $N \ge N_2 = \max\{N_1, N_{\delta_2}, N_\epsilon\}$,
$$P(|\tilde\theta_N-\theta_0| \ge \epsilon_2) \le P\left(|\tilde\theta_N-\theta_0| \ge \epsilon_2,\ \tilde\theta_N\in\Theta_0\right) + P\left(\tilde\theta_N\notin\Theta_0\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0\right) + P\left(|\tilde\theta_N-\theta_0| \ge \epsilon_2,\ \tilde\theta_N\in\Theta_0,\ \inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta_2\right) + P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} \le \delta_2\right)$$
$$\le P\left(\tilde\theta_N\notin\Theta_0\right) + P\left(|F_N(\tilde\theta_N)-F_N(\theta_0)| \ge \delta_2\epsilon_2\right) + P\left(\inf_{\theta\in\Theta_0,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} \le \delta_2\right) \le 6\epsilon.$$
The result follows because $F_N(\tilde\theta_N) = 0$ with probability tending to one and by the above argument. □
The following lemmas provide sufficient conditions for (4)–(7). Recall that $V_N$ is the covariance matrix of $y$, $U_{N0} = U_N(\theta_0)$, and $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|\le\epsilon} H_{N,j,2}$, $1\le j\le r$.

Appendix C

Lemma A1.
We find that (4) holds provided that, as $N\to\infty$,
$$\mathrm{tr}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right) \to 0, \quad (L1)$$
where $V_N = \mathrm{Var}(y) = \mathrm{diag}(V_{N,1},\ldots,V_{N,N})$, $V_{N,i} = \mathrm{Var}(y_i)$.
Proof.
By Chebyshev's inequality, we know that
$$P\left(|F_N(\theta_0)| > \epsilon\right) \le \frac{E|F_N(\theta_0)|^2}{\epsilon^2} = \frac{E\left|C_N^{-1}U_{N0}^T B_N\left(y-\mu_N(\theta_0)\right)\right|^2}{\epsilon^2} = \frac{\mathrm{tr}\left(C_N^{-1}U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}\right)}{\epsilon^2}.$$
Then, by (L1), we obtain $P(|F_N(\theta_0)| > \epsilon) \to 0$ as $N\to\infty$, and (4) follows. □
Lemma A2.
Suppose
(1) $\liminf_{N\to\infty} d\{0, F_N^c(\Theta)\} > 0$ with probability tending to one;
(2) $F_N(\theta_0) \to 0$ in probability as $N\to\infty$.
Then, (5) holds.
Proof.
For any $\epsilon > 0$:
By condition (1), there are $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P\left(d\{0, F_N^c(\Theta)\} > \delta_1\right) > 1-\epsilon/2$, $N \ge N_{\delta_1}$;
by condition (2), for any $\epsilon_1 > 0$ ($\epsilon_1 \le \delta_1/2$), there is $N_{\epsilon_1} > 0$ such that
$$P\left(|F_N(\theta_0)| < \epsilon_1\right) > 1-\epsilon/2, \quad N \ge N_{\epsilon_1}.$$
Then, choose $\delta$ with $\epsilon_1 \le \delta \le \delta_1/2$ and let $N_\delta = \max\{N_{\epsilon_1}, N_{\delta_1}\}$. For $N \ge N_\delta$, by the triangle inequality $d\{F_N(\theta_0), F_N^c(\Theta)\} \ge d\{0, F_N^c(\Theta)\} - d\{0, F_N(\theta_0)\}$, we have
$$P\left(d\{F_N(\theta_0), F_N^c(\Theta)\} > \delta\right) \ge P\left(d\{0,F_N^c(\Theta)\} - d\{0,F_N(\theta_0)\} > \delta\right) \ge P\left(\{d\{0,F_N^c(\Theta)\} > 2\delta\}\cap\{d\{0,F_N(\theta_0)\} < \delta\}\right)$$
$$\ge P\left(d\{0,F_N^c(\Theta)\} > 2\delta\right) + P\left(d\{0,F_N(\theta_0)\} < \delta\right) - 1 \ge 1-\frac{\epsilon}{2} + 1-\frac{\epsilon}{2} - 1 = 1-\epsilon, \quad N \ge N_\delta.$$
Then, (5) holds. □
Lemma A3.
Suppose that there are continuous functions $f_j(\cdot)$, $g_j(\cdot)$ ($1\le j\le r$) such that:
(1) $\liminf_{N\to\infty}\min\left[|f_j(F_N(\theta_0))|, |g_j(F_N(\theta_0))|\right] > 0$ ($1\le j\le r$) with probability tending to one;
(2) for any $\epsilon_1 > 0$, $P\left(|f_j(F_N(\theta))| > \epsilon_1\right) \to 0$ as $\theta_j\to-\infty$ and $P\left(|g_j(F_N(\theta))| > \epsilon_1\right) \to 0$ as $\theta_j\to+\infty$ ($1\le j\le r$), uniformly in $N$ and in $\theta\in\Theta$;
(3) $F_N(\theta_0)$ is bounded in probability as $N\to\infty$.
Then, there is a compact subset $\Theta_0 \subset \Theta$ such that (6) holds with this $\Theta_0$.
Proof. 
For any $\epsilon > 0$:
By condition (1), there are $\delta_1 > 0$ and $N_{\delta_1} > 0$ such that $P(|f_j(F_N(\theta_0))| > \delta_1) > 1-\epsilon/4$ and $P(|g_j(F_N(\theta_0))| > \delta_1) > 1-\epsilon/4$ for $N \ge N_{\delta_1}$.
By condition (2), for any $\epsilon_1 > 0$ ($\epsilon_1 < \delta_1/2$), there is $\gamma_1 > 0$ such that $P(|f_j(F_N(\theta))| > \epsilon_1) < \epsilon/4$ if $\theta\in\Theta$ and $\theta_j < -\gamma_1$, uniformly in $N$; and there is $\gamma_2 > 0$ such that $P(|g_j(F_N(\theta))| > \epsilon_1) < \epsilon/4$ if $\theta\in\Theta$ and $\theta_j > \gamma_2$, uniformly in $N$.
Then, for any $\epsilon_2 > 0$ ($\epsilon_1 \le \epsilon_2 \le \delta_1/2$), there is $N_{\epsilon_2} \ge N_{\delta_1}$ such that, if $\theta\in\Theta$ and $\theta_j \le -\gamma_1$,
$$P\left(|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| > \epsilon_2\right) \ge P\left(|f_j(F_N(\theta_0))| > 2\epsilon_2,\ |f_j(F_N(\theta))| < \epsilon_2\right)$$
$$\ge P\left(|f_j(F_N(\theta_0))| > 2\epsilon_2\right) + P\left(|f_j(F_N(\theta))| < \epsilon_2\right) - 1 \ge 1-\frac{\epsilon}{4} + 1-\frac{\epsilon}{4} - 1 = 1-\frac{\epsilon}{2}, \quad N \ge N_{\epsilon_2}.$$
Similarly, $P\left(|g_j(F_N(\theta_0)) - g_j(F_N(\theta))| > \epsilon_2\right) > 1-\epsilon/2$ for $N \ge N_{\epsilon_2}$ if $\theta\in\Theta$ and $\theta_j \ge \gamma_2$.
By condition (3), there is $M_1 > 0$ such that $P(|F_N(\theta_0)| \le M_1) > 1-\epsilon/2$ for all $N \ge 1$; let $M_2 = M_1 + 1$, so that $P(|F_N(\theta_0)| \le M_2) \ge P(|F_N(\theta_0)| \le M_1) > 1-\epsilon/2$.
For any $\epsilon_3 < \epsilon_2$, since $f_j(\cdot)$, $g_j(\cdot)$ ($1\le j\le r$) are continuous and hence uniformly continuous on the compact set $\{|x|\le M_2\}$, there is $\delta > 0$ ($\delta \le 1$) such that, if $|F_N(\theta_0) - F_N(\theta)| < \delta$ and $|F_N(\theta_0)| \le M_1$, then $|F_N(\theta)| \le |F_N(\theta_0)| + \delta \le M_2$, so that both arguments lie in $\{|x|\le M_2\}$, and therefore $|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| < \epsilon_3$.
Then, there is $N_\delta > N_{\epsilon_2}$ such that, if $N \ge N_\delta$ and $\theta_j \le -\gamma_1$,
$$P\left(|F_N(\theta_0) - F_N(\theta)| < \delta\right) \le P\left(|F_N(\theta_0)-F_N(\theta)| < \delta,\ |F_N(\theta_0)| \le M_1\right) + P\left(|F_N(\theta_0)| > M_1\right)$$
$$\le P\left(|f_j(F_N(\theta_0)) - f_j(F_N(\theta))| < \epsilon_3\right) + P\left(|F_N(\theta_0)| > M_1\right) \le \epsilon.$$
Similarly, if $\theta_j \ge \gamma_2$, then $P(|F_N(\theta_0) - F_N(\theta)| < \delta) \le \epsilon$.
So, for any $\epsilon > 0$, let the compact subset be $\Theta_0 = \Theta\cap[-\gamma_1, \gamma_2]^r$. Then, for $\theta\notin\Theta_0$, there are $\delta > 0$ and $N_\delta > 0$ such that $P(|F_N(\theta_0) - F_N(\theta)| \ge \delta) > 1-\epsilon$, $N \ge N_\delta$. Then, (6) holds. □
Lemma A4.
Suppose that $F_N(\theta)$ is continuously differentiable and that:
(1) $\liminf_{N\to\infty}\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right) > 0$, where $\lambda_{\min}$ denotes the smallest eigenvalue;
(2) for any $\epsilon > 0$, $\limsup_{N\to\infty}\dfrac{\max_j\left(H_{N,j,2,\epsilon}\right)^2}{\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right)} < \infty$ ($1\le j\le r$);
(3) $\dfrac{\left\|R_{N,1} + A_{N,1}\right\|^2}{\lambda_{\min}\left(U_{N0}^T B_N^T U_{N0}\, C_N^{-2}\, U_{N0}^T B_N U_{N0}\right)} = o_p(1)$ as $N\to\infty$,
where $H_{N,j,2,\epsilon} = \sup_{|\theta-\theta_0|<\epsilon} H_{N,j,2}(\theta)$, $H_{N,j,2}(\theta)$ is $r\times r$ with
$$(H_{N,j,2})_{kl} = \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_k}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_N}{\partial\theta_j\partial\theta_l}\right)^T B_N\frac{\partial\mu_N}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_N}{\partial\theta_j}\right)^T B_N\frac{\partial^2\mu_N}{\partial\theta_k\partial\theta_l}$$
$$= \sum_{i=1}^N\left[\left(c_j^{-1}\frac{\partial^2\mu_{N,i}}{\partial\theta_j\partial\theta_k}\right)^T B_{N,i}\frac{\partial\mu_{N,i}}{\partial\theta_l} + \left(c_j^{-1}\frac{\partial^2\mu_{N,i}}{\partial\theta_j\partial\theta_l}\right)^T B_{N,i}\frac{\partial\mu_{N,i}}{\partial\theta_k} + \left(c_j^{-1}\frac{\partial\mu_{N,i}}{\partial\theta_j}\right)^T B_{N,i}\frac{\partial^2\mu_{N,i}}{\partial\theta_k\partial\theta_l}\right], \quad 1\le j\le r;$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\theta-\theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta-\theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \quad (H_{N,j,1})_{kl} = \left(c_j^{-1}\frac{\partial^3\mu_N}{\partial\theta_j\partial\theta_k\partial\theta_l}\right)^T B_N\left(y-\mu_N(\theta_j^*)\right), \quad 1\le j\le r,$$
with $\theta_j^*$ lying between $\theta$ and $\theta_0$; and $A_{N,1}$ is $r\times r$ with $(k,l)$ element $\left(c_k^{-1}\partial^2\mu_N/\partial\theta_k\partial\theta_l\right)^T B_N\left(y-\mu_N(\theta_0)\right)$;
(4) there is a compact set $\Theta_1 \subset \Theta$ with $d\{\theta_0, \Theta_1\} > 0$, and there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that $P\left(\inf_{\theta\in\Theta_1}|F_N(\theta)-F_N(\theta_0)| > \delta_2\right) > 1-\epsilon$ for $N \ge N_{\delta_2}$.
Then, there are $\delta > 0$ and $N_\delta > 0$ such that
$$P\left(\inf_{\theta\in\Theta_0,\,\theta\ne\theta_0}\frac{|F_N(\theta)-F_N(\theta_0)|}{|\theta-\theta_0|} > \delta\right) > 1-\epsilon$$
for $N \ge N_\delta$, where $\Theta_0$ is any compact subset of $\Theta$ that contains $\theta_0$ as an interior point.
Proof. 
$$F_N(\theta) = C_N^{-1} U_N^T(\theta) B_N \big(y - \mu_N(\theta)\big) = \begin{pmatrix} \sum_{i=1}^N c_1^{-1} \big(\frac{\partial \mu_{N,i}}{\partial\theta_1}\big)^T B_{N,i} (y_i - \mu_{N,i}) \\ \vdots \\ \sum_{i=1}^N c_r^{-1} \big(\frac{\partial \mu_{N,i}}{\partial\theta_r}\big)^T B_{N,i} (y_i - \mu_{N,i}) \end{pmatrix}.$$
By a Taylor expansion of $F_N(\theta)$ around $\theta_0$,
$$F_N(\theta) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0} (\theta - \theta_0) + \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix} (\theta - \theta_0),$$
where the $(k,l)$ element of $H_{N,j}(\theta_j^*)$ is
$$\Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big) - \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} - \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} - \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}$$
and $\theta_j^*$ lies between $\theta_0$ and $\theta$ ($1 \le j \le r$).
Then, $F_N(\theta) - F_N(\theta_0) = (A_{N,1} - A_{N,2})(\theta - \theta_0) + (R_{N,1} - R_{N,2})(\theta - \theta_0) = -\big(A_{N,2} + (R_{N,2} - R_{N,1} - A_{N,1})\big)(\theta - \theta_0)$, where
$$(A_{N,1})_{kl} = \Big(c_k^{-1}\frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_0)\big)\ (A_{N,1} \text{ is } r \times r), \qquad A_{N,2} = C_N^{-1} U_{N0}^T B_N U_{N0},$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,1})_{kl} = \Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big), \quad 1 \le j \le r,$$
$$R_{N,2} = \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1,2}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r,2}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,2})_{kl} = \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} + \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} + \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}, \quad 1 \le j \le r.$$
So,
$$|F_N(\theta) - F_N(\theta_0)|^2 = \big|\big(A_{N,2} + (R_{N,2} - R_{N,1} - A_{N,1})\big)(\theta - \theta_0)\big|^2 = (\theta - \theta_0)^T A_{N,2}^T A_{N,2}(\theta - \theta_0) + 2(\theta - \theta_0)^T A_{N,2}^T (R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0) + \big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 \ge |A_{N,2}(\theta - \theta_0)|^2 - 2|A_{N,2}(\theta - \theta_0)|\,\big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big| + \big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 = \big(|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)|\big)^2,$$
so $|F_N(\theta) - F_N(\theta_0)| \ge \big|\,|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)|\,\big|$.
Let $L_N = A_{N,2}^T A_{N,2} = U_{N0}^T B_N^T U_{N0} C_N^{-2} U_{N0}^T B_N U_{N0}$. We have
$$|A_{N,2}(\theta - \theta_0)|^2 \ge \lambda_{\min}(A_{N,2}^T A_{N,2})\,|\theta - \theta_0|^2 = \lambda_{\min}(L_N)\,|\theta - \theta_0|^2,$$
$$\big|(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)\big|^2 \le \|R_{N,2} - R_{N,1} - A_{N,1}\|^2\,|\theta - \theta_0|^2.$$
By $\|A - B\| \le \|A\| + \|B\|$, we can obtain
$$|A_{N,2}(\theta - \theta_0)| - |(R_{N,2} - R_{N,1} - A_{N,1})(\theta - \theta_0)| \ge \lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| - \|R_{N,2} - R_{N,1} - A_{N,1}\|\,|\theta - \theta_0| = \Big(1 - \frac{\|R_{N,2} - R_{N,1} - A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| \ge \Big(1 - \frac{\|R_{N,2}\| + \|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| = \Big(1 - \Big(\frac{\|R_{N,2}\|}{\lambda_{\min}^{1/2}(L_N)} + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0|.$$
Moreover,
$$\|R_{N,2}\|^2 \le \sum_{j=1}^r \Big|\frac{1}{2}(\theta - \theta_0)^T H_{N,j,2}(\theta_j^*)\Big|^2 \le \frac{1}{4}\sum_{j=1}^r \big(|\theta - \theta_0|\,\|H_{N,j,2}(\theta_j^*)\|\big)^2 \le \frac{1}{4}|\theta - \theta_0|^2 \sum_{j=1}^r \|H_{N,j,2,\epsilon}\|^2 \le \frac{r}{4}|\theta - \theta_0|^2 \max_j \|H_{N,j,2,\epsilon}\|^2.$$
By (2), there are $M_1 > 0$ and $\delta > 0$ (e.g., $\delta = 1/\sqrt{rM_1}$) such that, for $|\theta - \theta_0| < \delta$,
$$\frac{\|R_{N,2}\|^2}{\lambda_{\min}(L_N)} \le \frac{r}{4}|\theta - \theta_0|^2\,\frac{\max_j \|H_{N,j,2,\epsilon}\|^2}{\lambda_{\min}(L_N)} \le \frac{r}{4}\delta^2 M_1;$$
let $\epsilon_1 = \big(\frac{r}{4}\delta^2 M_1\big)^{1/2}$, so that $\|R_{N,2}\|/\lambda_{\min}^{1/2}(L_N) \le \epsilon_1$. Therefore,
$$|F_N(\theta) - F_N(\theta_0)| \ge \Big(1 - \Big(\frac{\|R_{N,2}\|}{\lambda_{\min}^{1/2}(L_N)} + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0| \ge \Big(1 - \Big(\epsilon_1 + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N)|\theta - \theta_0|.$$
For any $\epsilon > 0$, by condition (3), for any $\epsilon_2 > 0$ ($\epsilon_2 < \frac{1}{4}$) there is $N_1 > 0$ such that $P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < \epsilon_2\Big) > 1 - \epsilon$ for $N > N_1$.
So, there are $0 < \delta_1 < (1 - (\epsilon_1 + \epsilon_2))\lambda_{\min}^{1/2}(L_N)$ and $N_{\delta_1} \ge N_1$ such that, for $N \ge N_{\delta_1}$,
$$P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_1\Big) \ge P\Big(\Big(1 - \Big(\epsilon_1 + \frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)}\Big)\Big)\lambda_{\min}^{1/2}(L_N) > \delta_1\Big) = P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < 1 - \epsilon_1 - \frac{\delta_1}{\lambda_{\min}^{1/2}(L_N)}\Big) \ge P\Big(\frac{\|R_{N,1}\| + \|A_{N,1}\|}{\lambda_{\min}^{1/2}(L_N)} < \epsilon_2\Big) > 1 - \epsilon.$$
So, for any $\epsilon > 0$, there are $\delta > 0$, $\delta_1 > 0$, and $N_{\delta_1} > 0$ such that, for $N \ge N_{\delta_1}$, $P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_1\Big) > 1 - \epsilon$ whenever $|\theta - \theta_0| < \delta$.
Suppose $\Theta_0$ is a compact subset that includes $\theta_0$ as an interior point; say, there is $D > 0$ such that $\Theta_0 = \{\theta : |\theta - \theta_0| \le D\}$.
Let $\Theta_1 = \{\theta : \delta \le |\theta - \theta_0| \le D\}$. Then $d\{\theta_0, \Theta_1\} > 0$, so by condition (4) there are $\delta_2 > 0$ and $N_{\delta_2} > 0$ such that, for $N \ge N_{\delta_2}$, $P\big(\inf_{\theta \in \Theta_1} |F_N(\theta) - F_N(\theta_0)| > \delta_2\big) > 1 - \epsilon$.
Let $\delta_3 < \delta_2 / D$. There is $N_{\delta_3} \ge N_{\delta_2}$ such that, for $N \ge N_{\delta_3}$ and $\theta \in \Theta_1$,
$$P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta_3\Big) \ge P\Big(\frac{|F_N(\theta) - F_N(\theta_0)|}{D} > \delta_3\Big) \ge P\big(|F_N(\theta) - F_N(\theta_0)| > \delta_2\big) \ge P\Big(\inf_{\theta \in \Theta_1} |F_N(\theta) - F_N(\theta_0)| > \delta_2\Big) > 1 - \epsilon.$$
Then, there are $\delta > 0$ and $N_\delta > 0$ such that
$$P\Big(\inf_{\theta \in \Theta_0,\, \theta \neq \theta_0} \frac{|F_N(\theta) - F_N(\theta_0)|}{|\theta - \theta_0|} > \delta\Big) > 1 - \epsilon$$
for $N \ge N_\delta$, where $\Theta_0$ is any compact subset of $\Theta$ that includes $\theta_0$ as an interior point. □
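The quantitative heart of the argument above is the bound $|A_{N,2}(\theta - \theta_0)|^2 \ge \lambda_{\min}(A_{N,2}^T A_{N,2})\,|\theta - \theta_0|^2$. As a quick numerical sanity check of this inequality (a sketch only: an arbitrary random matrix stands in for $A_{N,2}$, not a matrix derived from the paper's model):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))  # stand-in for A_{N,2}
lam_min = np.linalg.eigvalsh(A.T @ A).min()  # smallest eigenvalue of AᵀA

for _ in range(100):
    x = rng.normal(size=4)  # stand-in for θ − θ0
    # |Ax|² ≥ λ_min(AᵀA) |x|² (small slack for floating point)
    assert np.linalg.norm(A @ x) ** 2 >= lam_min * (x @ x) - 1e-9

print("eigenvalue lower bound verified on 100 random directions")
```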

Appendix D

Proof of Theorem 3. 
By the Taylor expansion, it is easy to show that
$$F_N(\theta) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0}(\theta - \theta_0) + \begin{pmatrix} \frac{1}{2}(\theta - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\theta - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix}(\theta - \theta_0).$$
Substituting $\tilde\theta_N$ into the equation,
$$0 = F_N(\tilde\theta_N) = F_N(\theta_0) + \frac{\partial F_N}{\partial\theta}\Big|_{\theta = \theta_0}(\tilde\theta_N - \theta_0) + \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r}(\theta_r^*) \end{pmatrix}(\tilde\theta_N - \theta_0).$$
Now, we have
$$0 = F_N(\theta_0) + (A_{N,1} - A_{N,2})(\tilde\theta_N - \theta_0) + (R_{N,1} - R_{N,2})(\tilde\theta_N - \theta_0),$$
where
$$(A_{N,1})_{kl} = \Big(c_k^{-1}\frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_0)\big)\ (A_{N,1} \text{ is } r \times r), \qquad A_{N,2} = C_N^{-1} U_{N0}^T B_N U_{N0},$$
$$R_{N,1} = \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1,1}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r,1}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,1})_{kl} = \Big(c_j^{-1}\frac{\partial^3 \mu_N}{\partial\theta_j \partial\theta_k \partial\theta_l}\Big)^T B_N \big(y - \mu_N(\theta_j^*)\big), \quad 1 \le j \le r,$$
$$R_{N,2} = \begin{pmatrix} \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,1,2}(\theta_1^*) \\ \vdots \\ \frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,r,2}(\theta_r^*) \end{pmatrix}, \qquad (H_{N,j,2})_{kl} = \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_k}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_l} + \Big(c_j^{-1}\frac{\partial^2 \mu_N}{\partial\theta_j \partial\theta_l}\Big)^T B_N \frac{\partial \mu_N}{\partial\theta_k} + \Big(c_j^{-1}\frac{\partial \mu_N}{\partial\theta_j}\Big)^T B_N \frac{\partial^2 \mu_N}{\partial\theta_k \partial\theta_l}, \quad 1 \le j \le r.$$
Rearranging,
$$F_N(\theta_0) + A_{N,1}(\tilde\theta_N - \theta_0) + R_{N,1}(\tilde\theta_N - \theta_0) = A_{N,2}(\tilde\theta_N - \theta_0) + R_{N,2}(\tilde\theta_N - \theta_0).$$
Write $W_N = C_N^{-1} U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1}$. Then,
$$W_N^{-1/2}\big(F_N(\theta_0) + A_{N,1}(\tilde\theta_N - \theta_0) + R_{N,1}(\tilde\theta_N - \theta_0)\big) = W_N^{-1/2}\big(A_{N,2}(\tilde\theta_N - \theta_0) + R_{N,2}(\tilde\theta_N - \theta_0)\big).$$
By conditions (iv), (v), and (vi),
$$W_N^{-1/2} F_N(\theta_0) \to N(0, I_r) \text{ in distribution}, \qquad W_N^{-1/2} A_{N,1}(\tilde\theta_N - \theta_0) = o_p(1), \qquad W_N^{-1/2} R_{N,1}(\tilde\theta_N - \theta_0) = o_p(1),$$
so
$$W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$$
in distribution. Furthermore, we have
$$W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big) = (I_r + K_N)\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big),$$
where
$$K_N = W_N^{-1/2} R_{N,2} \big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1},$$
$$\|W_N^{-1/2}\| = \big(\lambda_{\min}(W_N)\big)^{-1/2} = \big(\lambda_{\min}(C_N^{-1} U_{N0}^T B_N V_N B_N^T U_{N0} C_N^{-1})\big)^{-1/2},$$
$$\big\|\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1}\big\| = \Big(\lambda_{\min}\big(U_{N0}^T B_N^T U_{N0} (U_{N0}^T B_N V_N B_N^T U_{N0})^{-1} U_{N0}^T B_N U_{N0}\big)\Big)^{-1/2}.$$
Write $\lambda_{N,1} = \lambda_{\min}(W_N)$ and $\lambda_{N,2} = \lambda_{\min}\big(U_{N0}^T B_N^T U_{N0} (U_{N0}^T B_N V_N B_N^T U_{N0})^{-1} U_{N0}^T B_N U_{N0}\big)$. So
$$\|K_N\| \le \|W_N^{-1/2}\|\,\|R_{N,2}\|\,\big\|\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)^{-1}\big\| = (\lambda_{N,1}\lambda_{N,2})^{-1/2}\|R_{N,2}\|,$$
and
$$\|R_{N,2}\|^2 = \sum_{j=1}^r \Big\|\frac{1}{2}(\tilde\theta_N - \theta_0)^T H_{N,j,2}(\theta_j^*)\Big\|^2 \le \frac{1}{4}\sum_{j=1}^r \big(|\tilde\theta_N - \theta_0|\,\|H_{N,j,2}(\theta_j^*)\|\big)^2 \le \frac{1}{4}|\tilde\theta_N - \theta_0|^2 \sum_{j=1}^r \|H_{N,j,2,\epsilon}\|^2 \le \frac{r}{4}|\tilde\theta_N - \theta_0|^2 \max_j \|H_{N,j,2,\epsilon}\|^2.$$
Then,
$$\|K_N\| \le \frac{\sqrt{r}}{2}\,\frac{|\tilde\theta_N - \theta_0|}{(\lambda_{N,1}\lambda_{N,2})^{1/2}}\,\max_j \|H_{N,j,2,\epsilon}\| \to 0$$
in probability. Since
$$(I_r + K_N)^{-1}\Big(W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)\Big)(\tilde\theta_N - \theta_0) = \big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)(\tilde\theta_N - \theta_0),$$
$(I_r + K_N)^{-1} \to I_r$ in probability, and $W_N^{-1/2}\big(C_N^{-1} U_{N0}^T B_N U_{N0} + R_{N,2}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$ in distribution, it follows that
$$\big(W_N^{-1/2} C_N^{-1} U_{N0}^T B_N U_{N0}\big)(\tilde\theta_N - \theta_0) \to N(0, I_r)$$
in distribution. Then, $\tilde\theta_N$ is asymptotically normal with mean $\theta_0$ and asymptotic covariance matrix
$$\big(U_{N0}^T B_N^T U_{N0}\big)^{-1}\big(U_{N0}^T B_N V_N B_N^T U_{N0}\big)\big(U_{N0}^T B_N U_{N0}\big)^{-1}. \qquad \square$$

References

  1. FDA US. Guidance for Industry: Population Pharmacokinetics; FDA: Rockville, MD, USA, 1999.
  2. Jiang, J.; Ge, Z. Mixed models: An overview. In Frontiers of Statistics in Honor of Professor Peter J. Bickel's 65th Birthday; Fan, J., Koul, H., Eds.; Imperial College Press: London, UK, 2006; pp. 445–466.
  3. Gelman, A.; Carlin, J.; Stern, H.; Rubin, D. Bayesian Data Analysis, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2004.
  4. Lindstrom, M.; Bates, D. Nonlinear mixed effects models for repeated measures data. Biometrics 1990, 46, 673–687.
  5. Pinheiro, J.; Bates, D. Approximations to the log-likelihood function in nonlinear mixed-effects models. J. Comput. Graph. Stat. 1995, 4, 12–35.
  6. Geweke, J. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 1989, 57, 1317–1339.
  7. Davidian, M.; Gallant, A.R. Smooth nonparametric maximum likelihood estimation for population pharmacokinetics, with application to quinidine. J. Pharmacokinet. Biopharm. 1992, 20, 529–556.
  8. Jiang, J. Linear and Generalized Linear Mixed Models and Their Applications; Springer: New York, NY, USA, 2007.
  9. Jiang, J.; Nguyen, T. Linear and Generalized Linear Mixed Models and Their Applications, 2nd ed.; Springer: New York, NY, USA, 2021.
  10. Jiang, J.; Luan, Y.; Wang, Y.G. Iterative estimating equations: Linear convergence and asymptotic properties. Ann. Stat. 2007, 35, 2233–2260.
  11. Jiang, J. Asymptotic Analysis of Mixed Effects Models: Theory, Applications, and Open Problems; Chapman & Hall/CRC: Boca Raton, FL, USA, 2017.
  12. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: New York, NY, USA, 1989.
  13. Jiang, J. A nonlinear Gauss–Seidel algorithm for inference about GLMM. Comput. Stat. 2000, 15, 229–241.
  14. Jiang, J.; Zhang, W. Robust estimation in generalised linear mixed models. Biometrika 2001, 88, 753–765.
  15. Stuart, H.C.; Reed, R.B. Longitudinal studies of child health and development, Harvard School of Public Health, Series II, No. 1, Description of project. Pediatrics 1959, 24, 875–885.
  16. Demidenko, E. Mixed Models: Theory and Applications with R; John Wiley & Sons: New York, NY, USA, 2013.
  17. Pinheiro, J.; Bates, D. Mixed-Effects Models in S and S-PLUS; Statistics and Computing Series; Springer: New York, NY, USA, 2000.
  18. Draper, N.R.; Smith, H. Applied Regression Analysis, 3rd ed.; Wiley: New York, NY, USA, 1998.
  19. Qalandarov, A.A.; Khaldjigitov, A.A. Mathematical and numerical modeling of the coupled dynamic thermoelastic problems for isotropic bodies. TWMS J. Pure Appl. Math. 2020, 11, 119–126.
  20. Shokri, A.; Saadat, H. Trigonometrically fitted high-order predictor–corrector method with phase-lag of order infinity for the numerical solution of radial Schrödinger equation. J. Math. Chem. 2014, 52, 1870–1894.
  21. Shokri, A.; Saadat, H. P-stability, TF and VSDPL technique in Obrechkoff methods for the numerical solution of the Schrödinger equation. Bull. Iran. Math. Soc. 2016, 42, 687–706.
Figure 1. The height of girls aged from 7 to 18.
Figure 2. Indomethicin concentration (mcg/mL) of six individuals measured 11 times after injection.
Figure 3. Trunk circumference (in millimeters) of five orange trees.
Table 1. Simulation result: non-linear model.

Estimator   β = −1: Mean / Bias / SD        τ = 1: Mean / Bias / SD        Overall MSE
1st-step    −1.0006 / −0.0006 / 0.0321      0.9891 / −0.0109 / 0.0644      0.0053
2nd-step    −0.9993 /  0.0007 / 0.0184      0.9849 / −0.0151 / 0.0631      0.0046
GEE         −0.9992 /  0.0008 / 0.0182      0.9961 / −0.0039 / 0.0622      0.0042

SD, standard deviation. Overall MSE is the sum of the mean squared errors of the estimators of β and τ.
Table 2. Simulation estimation result.

Estimator   β1 = 2: Mean / Bias / SD       β2 = 1: Mean / Bias / SD       τ = 1: Mean / Bias / SD       Overall MSE
1st-step    1.9804 / −0.0196 / 0.0404      0.9463 / −0.0537 / 0.1015      0.9421 / −0.0579 / 0.0974     0.0280
2nd-step    1.9985 / −0.0015 / 0.0429      0.9914 / −0.0086 / 0.1080      0.9835 / −0.0165 / 0.1032     0.0245
GEE         1.9989 / −0.0011 / 0.0428      0.9933 / −0.0067 / 0.1082      0.9854 / −0.0146 / 0.1033     0.0245

SD, standard deviation. Overall MSE is the sum of the mean squared errors of the estimators of β1, β2, and τ.
Table 3. Estimation of σ² (true value σ² = 1).

Estimator   Mean     Var      MSE      % of Convergence
1st-step    1.1019   0.0277   0.0380   46.247
2nd-step    1.1106   0.0398   0.0520   42.857
GEE         1.1028   0.0300   0.0405   44.794
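The pattern in Tables 1 and 2, where the second-step and GEE estimators show a smaller spread than the first step, is the usual efficiency gain from an informed working covariance. A stripped-down linear analogue of the two-step idea (a random-intercept model with hypothetical sizes and variances, not the paper's actual simulation design) reproduces the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, reps = 200, 5, 200              # clusters, cluster size, Monte Carlo replications
tau2, sig2, beta0 = 1.0, 1.0, -1.0    # random-intercept variance, error variance, true slope

# within-cluster covariance V_i = τ²J + σ²I, and its inverse
V_i = tau2 * np.ones((n, n)) + sig2 * np.eye(n)
Vinv_i = np.linalg.inv(V_i)
x = rng.normal(size=(m, n))           # fixed covariates

step1, step2 = [], []
for _ in range(reps):
    b = rng.normal(scale=np.sqrt(tau2), size=m)        # random intercepts
    e = rng.normal(scale=np.sqrt(sig2), size=(m, n))   # errors
    y = beta0 * x + b[:, None] + e
    # first-step analogue: identity working covariance (ordinary least squares)
    step1.append((x * y).sum() / (x * x).sum())
    # second-step analogue: working covariance = V_i (generalized least squares)
    num = sum(x[i] @ Vinv_i @ y[i] for i in range(m))
    den = sum(x[i] @ Vinv_i @ x[i] for i in range(m))
    step2.append(num / den)

print("first-step SD:", np.std(step1), " second-step SD:", np.std(step2))
```

Both estimators are essentially unbiased here; with this seed, the second step shows the smaller Monte Carlo standard deviation, mirroring the Overall MSE columns above. In the paper's non-linear setting, $V_i$ is unknown and is replaced by an estimate built from the first-step fit.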
Wang, J.; Luan, Y.; Jiang, J. A Two-Step Method of Estimation for Non-Linear Mixed-Effects Models. Mathematics 2022, 10, 4547. https://doi.org/10.3390/math10234547