1. Introduction
In previous works [1,2,3], we described the individual growth of animals subject to random fluctuations in the environment and studied estimation, prediction, and optimization problems with applications to cattle weight data. Considering $M$ animals, we used the following general SDE model:

$$dY_i(t) = \beta\,(\alpha - Y_i(t))\,dt + \sigma\,dW_i(t), \qquad Y_i(t_{i,0}) = y_{i,0}, \qquad i = 1, \dots, M, \tag{1}$$

where $Y_i(t) = h(X_i(t))$ is the weight modified by a transformation function $h$, a known monotonic continuously differentiable function of the real weight $X_i(t)$ of animal $i$ at age $t$, and $y_{i,0} = h(x_{i,0})$, where $x_{i,0}$ is the assumed known size of animal $i$ at an initial age of observation $t_{i,0}$. The parameter $\beta > 0$ is the growth coefficient, and $\alpha$ is the mean asymptotic modified size towards which the mean modified size converges as $t \to +\infty$; we denote the corresponding real asymptotic size by $A = h^{-1}(\alpha)$. The intensity of the effect of environmental random fluctuations on growth is measured by the parameter $\sigma > 0$, with $W_i(t)$ ($i = 1, \dots, M$) being independent realizations of the standard Wiener process. A transformation of the size leading to more general models was also suggested by Reference [4] with applications to tree growth. Adequate choices of the function $h$ lead to stochastic versions of well-known growth models. For instance, the monomolecular model corresponds to $h(x) = x$, the Bertalanffy-Richards model corresponds to $h(x) = x^c$ (with $c > 0$), the Gompertz model corresponds to $h(x) = \ln x$, and the logistic model corresponds to $h(x) = x^{-1}$, but the theoretical treatment is valid for any monotonic function $h$. Here, for comparison purposes, we will illustrate with $h(x) = \ln x$, corresponding to the stochastic Gompertz model.
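As a minimal sketch of how the choice of transformation fixes the growth model, the snippet below pairs each classical model with a transformation and its inverse, assuming the usual forms (identity, power, logarithm, reciprocal). The function names, the dictionary layout, and the default exponent `c = 1/3` are illustrative assumptions, not part of the original work.

```python
import math

def make_transformations(c=1.0 / 3.0):
    """Return {model name: (h, h_inverse)} for some classical growth models.

    The exponent c of the Bertalanffy-Richards transformation is a
    hypothetical default chosen only for illustration.
    """
    return {
        "monomolecular": (lambda x: x, lambda y: y),
        "bertalanffy_richards": (lambda x: x ** c, lambda y: y ** (1.0 / c)),
        "gompertz": (math.log, math.exp),
        "logistic": (lambda x: 1.0 / x, lambda y: 1.0 / y),
    }

def asymptotic_size(model, alpha, c=1.0 / 3.0):
    """Map the modified asymptotic size alpha back to the real scale."""
    _, h_inverse = make_transformations(c)[model]
    return h_inverse(alpha)
```

For the Gompertz case, `asymptotic_size("gompertz", alpha)` simply returns `exp(alpha)`.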
It is natural to think that the model parameters may vary from animal to animal, and so SDE models with fixed parameters, such as the one presented in (1), may not be suitable for these applications. Parameter estimation for models where the parameters are considered random, known as mixed models or mixed-effects models, is presented in References [1,5,6,7,8,9,10]. A review of asymptotic inference for SDE mixed models can be found in Reference [11].
For example, in References [1,7], the mixed model considers that different individuals may have different values of $A$ and, consequently, different values of $\alpha$, i.e., the case where the average asymptotic weight varies randomly from animal to animal has been considered. In this particular case, in Reference [1], it was considered that $\alpha$ was a random variable, independent of
, with a Gaussian distribution with mean
and variance
, and the maximum likelihood estimation method was applied to estimate the model parameters. In this case, the likelihood function can be explicitly obtained, but it is extremely difficult or impossible to obtain a closed-form expression for the likelihood function in other cases.
More recently, we considered either $\alpha$ or $\beta$ random, following a Gaussian distribution, and, for these cases, we solved the integral that appears in the likelihood function through approximation methodologies, such as the Laplace and the delta approximation methods (to appear in a forthcoming paper [12]). In References [8,10], for the general case where it is not possible to obtain a closed-form expression for the likelihood function, as is the case of random $\beta$, a numerical approximation based on a Hermite expansion was applied, whereas, in Reference [9], in addition to a Hermite expansion, a Gauss-Hermite quadrature was also applied, and the parameters of the SDE mixed model were estimated by the maximum likelihood method. In References [5,6], for mixed models with a linear drift term, when a closed-form expression for the likelihood function is not available, a different approximation technique is used, based on a discretized version of the continuous-time data likelihood function.
In this work, we consider that both parameters $\alpha$ and $\beta$ are random variables, and our main contribution is to extend our delta approximation method to this mixed model with two random parameters and compare its performance with other previously proposed estimation methods. The delta approximation method is inspired by the classical statistical delta method, which is adapted here to serve a quite different purpose. It allows us to obtain the parameter estimates through numerical maximization of an approximate likelihood function. This approximation methodology is, to the best of our knowledge, the first to derive simple closed-form expressions for the approximate likelihood function when both $\alpha$ and $\beta$ are random, which makes the method easy for anyone to use. Notice also that the existing methods for SDE mixed models [5,6,8], when it comes to their numerical implementations, assume that the age vector of the observations is the same for all trajectories and, in some cases, even require equidistant ages of observation. When using real data of cattle weights, these assumptions are not adequate, since the animals are not weighed at the same time instants (ages), nor even with the same elapsed times between weighings. Our proposed methodology allows parameter estimation in scenarios, quite common in applications, where such restrictions on the data structure are not satisfied.
In order to compare our method with existing ones, particularly with the one provided by the MsdeParEst R package (see References [5,6,13]), we worked with simulated cattle weight data with 50, 500, and 5000 animals with the same age vector and with consecutive weights taken at equidistant intervals. We also estimated the model parameters with our method using a real dataset of 16,029 animals with very heterogeneous ages of observations.
This paper is organized as follows. In Section 2, we present the SDE models with fixed parameters, their main properties, and the respective likelihood function, proceeding to the extension to mixed stochastic differential equation models. In Section 3, we develop the delta approximation applied to the likelihood function for the case where both parameters $\alpha$ and $\beta$ are independent Gaussian distributed random variables. In Section 4, we present the application to both simulated and real datasets and, whenever possible, compare the results with the existing methods. Based on those results, we present practical recommendations on how to deal with real datasets in Section 5 and end with the main conclusions in Section 6.
2. Stochastic Differential Equations Models
Considering data from $M$ individuals, we will denote the size (some measure of weight, volume, height, length, etc.) at age $t$ of the $i$th individual ($i = 1, \dots, M$) by $X_i(t)$. If the individual is growing in a randomly fluctuating environment, working with a modified size $Y_i(t) = h(X_i(t))$, where the transformation $h$ is a monotonic continuously differentiable function, we can describe the evolution of individual growth through an SDE of the form (1). This is a variant of the Ornstein–Uhlenbeck model, also called the Vasicek model in the context of interest rate dynamics [14]. The model solution $Y_i(t)$ is a homogeneous diffusion process with drift coefficient $\beta(\alpha - y)$ and diffusion coefficient $\sigma^2$, given by

$$Y_i(t) = \alpha + (y_{i,0} - \alpha)\, e^{-\beta (t - t_{i,0})} + \sigma e^{-\beta t} \int_{t_{i,0}}^{t} e^{\beta s}\, dW_i(s) \tag{2}$$

(see, for instance, Reference [15]).
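Let $t_{i,j}$ ($j = 0, 1, \dots, n_i$; $i = 1, \dots, M$) be the age of the $j$th observation of individual number $i$ and let $Y_{i,j} = Y_i(t_{i,j})$ be the corresponding modified weight according to model (1). For each individual $i$ ($i = 1, \dots, M$), denote its age vector of observations (which may differ from individual to individual) by $\mathbf{t}_i = (t_{i,0}, t_{i,1}, \dots, t_{i,n_i})$, the corresponding vector of modified sizes by $\mathbf{Y}_i = (Y_{i,0}, Y_{i,1}, \dots, Y_{i,n_i})$, and the observed value of $\mathbf{Y}_i$ by $\mathbf{y}_i = (y_{i,0}, y_{i,1}, \dots, y_{i,n_i})$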
. We assume
and make
. We see that, for
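$Y_{i,j}$ conditioned on $Y_{i,j-1} = y_{i,j-1}$, the transition distribution for animal $i$ is Gaussian:

$$Y_{i,j} \mid Y_{i,j-1} = y_{i,j-1} \;\sim\; \mathcal{N}\!\left(\alpha + (y_{i,j-1} - \alpha)\, e^{-\beta (t_{i,j} - t_{i,j-1})},\; \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta (t_{i,j} - t_{i,j-1})}\right)\right). \tag{3}$$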
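In Reference [1], we applied the maximum likelihood estimation method to estimate the parameter vector $(\alpha, \beta, \sigma)$. From (3), using the fact that $Y_i(t)$ is a Markov process, we know that, given $Y_{i,0} = y_{i,0}$ (assumed known), the joint probability density function for individual $i$ is given by the product of the transition densities between consecutive observation times of animal $i$; thus, it takes the form

$$f_i(y_{i,1}, \dots, y_{i,n_i} \mid y_{i,0}) = \prod_{j=1}^{n_i} f(y_{i,j} \mid y_{i,j-1}), \tag{4}$$

where $f(y_{i,j} \mid y_{i,j-1})$ is the Gaussian transition density given by (3) and $n_i$ is the index of the last observation of animal $i$. By independence among individuals, we obtain the likelihood function for the $M$ animals:

$$L(\alpha, \beta, \sigma) = \prod_{i=1}^{M} \prod_{j=1}^{n_i} f(y_{i,j} \mid y_{i,j-1}). \tag{5}$$

The maximum likelihood estimate of the parameter vector $(\alpha, \beta, \sigma)$ is obtained by maximization of (5) or of the log-likelihood function $\ln L(\alpha, \beta, \sigma)$.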
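The log-likelihood of the fixed-parameter model can be sketched directly from the Gaussian transitions of the Ornstein-Uhlenbeck-type model. The snippet below is a minimal illustration, assuming the standard transition mean and variance of that model and that data are stored as plain lists of ages and modified sizes per animal; the function names and data layout are assumptions made here for illustration. Note that each animal may have its own, non-equidistant age vector.

```python
import math

def transition_moments(y_prev, dt, alpha, beta, sigma):
    """Mean and variance of Y(t + dt) given Y(t) = y_prev."""
    decay = math.exp(-beta * dt)
    mean = alpha + (y_prev - alpha) * decay
    var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * beta)
    return mean, var

def log_likelihood(ages, sizes, alpha, beta, sigma):
    """Log-likelihood on the modified (Y) scale.

    ages[i] and sizes[i] are the age vector and the modified sizes of
    animal i; the first observation of each animal is conditioned on.
    """
    total = 0.0
    for t, y in zip(ages, sizes):
        for j in range(1, len(t)):
            mean, var = transition_moments(y[j - 1], t[j] - t[j - 1],
                                           alpha, beta, sigma)
            total += -0.5 * math.log(2.0 * math.pi * var)
            total -= (y[j] - mean) ** 2 / (2.0 * var)
    return total
```

In practice, this function would be passed (with its sign changed) to a numerical optimizer to obtain the maximum likelihood estimates.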
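The maximum likelihood estimators are asymptotically Gaussian with mean vector equal to the true parameter vector $\mathbf{p} = (p_1, p_2, p_3) = (\alpha, \beta, \sigma)$ and variance-covariance matrix $F^{-1}$, where $F$ is the Fisher information matrix with elements given by $F_{k,l} = -\mathbb{E}\!\left[\partial^2 \ln L / \partial p_k\, \partial p_l\right]$. We can estimate $F^{-1}$ by the inverse of the empirical information matrix $\hat{F}$, with elements $\hat{F}_{k,l} = -\partial^2 \ln L / \partial p_k\, \partial p_l$ evaluated at the maximum likelihood estimates. From these values, we can obtain approximate confidence intervals for the parameters.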
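The step from an empirical information matrix to approximate confidence intervals can be sketched as follows. This is a minimal illustration, assuming that the information matrix is approximated by a finite-difference Hessian of the negative log-likelihood at the estimates; the helper names, the step size, and the use of 1.96 for 95% intervals are choices made here, not taken from the original work.

```python
import math

def numerical_hessian(f, p, eps=1e-4):
    """Central finite-difference Hessian of f at the point p."""
    n = len(p)
    hess = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            def shifted(dk, dl):
                q = list(p)
                q[k] += dk * eps
                q[l] += dl * eps
                return f(q)
            hess[k][l] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)
    return hess

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def wald_intervals(neg_loglik, p_hat, z=1.96):
    """Approximate confidence intervals from the empirical information."""
    cov = invert(numerical_hessian(neg_loglik, p_hat))
    return [(p - z * math.sqrt(cov[k][k]), p + z * math.sqrt(cov[k][k]))
            for k, p in enumerate(p_hat)]
```

Here `neg_loglik` takes a list of parameter values and returns minus the log-likelihood, and `p_hat` holds the maximum likelihood estimates.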
For this type of model, in terms of estimation methods, we also developed and applied parametric and non-parametric bootstrap methods [1,3]. Since the asymptotic confidence intervals obtained from the empirical Fisher information matrix may be quite unreliable for small sample sizes, the bootstrap methods can be used in such cases.
In References [1,16], non-parametric estimation methods were developed in order to estimate the drift and diffusion coefficients of a stochastic differential equation model for the case of non-equidistant data. For our application to cattle weight data, we have been working with models with specific functional forms for the drift and the diffusion coefficients, for example, in the case of model (1), with a drift coefficient of linear form $\beta(\alpha - y)$ and a constant diffusion coefficient $\sigma^2$. These non-parametric methods are useful to assess whether our specific choice of functional forms is appropriate for our data or whether some alternative functional forms for these coefficients are suggested.
Recently, weighted maximum likelihood estimation methods were studied and adapted to overcome one very common limitation in the cattle weight data applications, related to the fact that animals are usually not weighed very frequently, and few weight observations exist for older ages. In the weighted maximum likelihood estimation method, the weights are built such that the times elapsed between consecutive observations are considered in the likelihood function [3].
We described the general SDE model (1) for the complete growth curve of the animals, where the model’s parameters $\alpha$, $\beta$, and $\sigma$ are assumed common to all individuals. Here, we consider a different generalization to account for the fact that different animals, due to their specific genetics and other characteristics, may have different values of the parameters. So, in this paper, we will consider the situation where different individuals may have different randomly assigned parameters.
In References [1,5,6,9,10,17], it has been shown that, when at least one of the two parameters of the drift term, $\alpha$ or $\beta$, is considered a random variable, the likelihood function can be obtained from the transition density function conditioned on the respective random parameter.
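We now briefly review how to do it in a more general setting. Let $\boldsymbol{\phi}$ be the $d$-dimensional vector of parameters that vary randomly among animals and assume that the distribution of $\boldsymbol{\phi}$ among animals has probability density function (p.d.f.) $g(\boldsymbol{\phi}; \boldsymbol{\eta})$, where $\boldsymbol{\eta}$ is the parameter vector that characterizes this distribution and needs to be estimated. Assuming independence among the animals, the $M$ parameter vectors $\boldsymbol{\phi}_i$ of the different animals $i$ ($i = 1, \dots, M$) are independent identically distributed random vectors with common p.d.f. $g(\boldsymbol{\phi}; \boldsymbol{\eta})$; we also assume that the $\boldsymbol{\phi}_i$ ($i = 1, \dots, M$) are independent of the Wiener processes that characterize the environmental conditions under which the animals are growing. Let $\boldsymbol{\psi}$ be the vector of the remaining model parameters (the ones not involved in $\boldsymbol{\phi}$), assumed to be common to all animals. The likelihood function for $M$ trajectories (animals) is given by

$$L(\boldsymbol{\eta}, \boldsymbol{\psi}) = \prod_{i=1}^{M} \int_{\mathbb{R}^d} \left[\prod_{j=1}^{n_i} f(y_{i,j} \mid y_{i,j-1}; \boldsymbol{\phi}, \boldsymbol{\psi})\right] g(\boldsymbol{\phi}; \boldsymbol{\eta})\, d\boldsymbol{\phi}, \tag{7}$$

where $f(y_{i,j} \mid y_{i,j-1}; \boldsymbol{\phi}, \boldsymbol{\psi})$ is the transition density (3) for the given values of the parameters.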
The case of a single random parameter
has already been studied for when
. When we have the special situation of a time vector of observations
,
common to all animals, we can see it in Reference [
8] (for the particular situation of
and uniform time spacing
) and in References [6,7]. The general situation, with no such restrictions, is treated in Reference [1]. In this case, the integral in the likelihood function can be computed explicitly, resulting in a closed-form expression for this function. This is shown in Reference [1], where the log-likelihood function for all animals
is given by
with
. However, despite the existence of a closed-form expression for the likelihood function in this particular case, we also applied approximation methods (the Laplace and delta approximations, to appear in a forthcoming paper [12]) and found that they provide very good results when compared with the exact method.
Unfortunately, the integral in (7) does not always have a closed-form expression, for example, when the random parameter is $\beta$
, corresponding to
, and, in such cases, the approximation methods are good alternatives. In the forthcoming paper [12], we used the Laplace and delta approximations to obtain closed-form approximate expressions of the likelihood function for this case, also with very good results.
In the following section, we present the study of the case where both $\alpha$ and $\beta$ are considered random variables and propose the application of the delta approximation method to the integral in (7).
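To make the role of the integral in the mixed-model likelihood concrete, the sketch below evaluates one animal's marginal density numerically when only the asymptote parameter is random and Gaussian. It uses plain trapezoidal integration, which is a generic numerical technique and not the delta approximation proposed in this paper. The names `mu` and `theta` for the mean and standard deviation of the random parameter, and all function names, are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_density(y, ages, alpha, beta, sigma):
    """Product of Gaussian transition densities for one animal, given alpha."""
    dens = 1.0
    for j in range(1, len(ages)):
        decay = math.exp(-beta * (ages[j] - ages[j - 1]))
        mean = alpha + (y[j - 1] - alpha) * decay
        var = sigma ** 2 * (1.0 - decay ** 2) / (2.0 * beta)
        dens *= normal_pdf(y[j], mean, var)
    return dens

def marginal_density_random_alpha(y, ages, mu, theta, beta, sigma,
                                  n_grid=2001, width=8.0):
    """Integrate the conditional density against a N(mu, theta^2) density
    for alpha, using the trapezoidal rule on mu +/- width * theta."""
    lo = mu - width * theta
    step = 2.0 * width * theta / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        a = lo + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * conditional_density(y, ages, a, beta, sigma) \
            * normal_pdf(a, mu, theta ** 2)
    return total * step
```

With a single transition, the result can be checked against the Gaussian marginal obtained analytically, which is the kind of closed form that is available for a random asymptote but not for a random growth coefficient.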
4. Results
The main interest with the methodology proposed in the previous sections is to have a reliable estimation method that can be applied to real animal weight data, where the animals’ weights are often not taken at the same age instants.
In our application, we worked with real cattle weight data provided by the Associação de Criadores de Bovinos Mertolengos (ACBM), which performs the growing and finishing phases of young Mertolengo males, and by associated breeders, from the Alentejo region in Portugal. The available dataset contains a total of 96,204 observations of the weight (in kg) of 16,029 Mertolengo cattle males, where each animal has several observations with a minimum of 3 and a maximum of 33 weights at ages varying from birth until a maximum age that ranges between 0.2 and 16 years old. This is a case where, indeed, each animal has its weight measurements taken at different (and even non-equidistant) age instants, varying from animal to animal.
We obtained the estimation results by implementing the proposed delta approximation (DA) method in R [18]. However, since it is a new method, we first compare the results obtained with the DA method with those of the estimation methods available in the R package MsdeParEst described in Reference [13], which we will call MPE methods. When closed-form likelihood expressions are not available, the package uses techniques based on a discretized version of the continuous-time data likelihood developed in References [5,6]. The main drawback of this package (and, to the best of our knowledge, of other similar available R packages [19,20]) is that all observations must be measured at the same age instants for all animals, and, in some cases, the age instants are even required to be equidistant. Therefore, in order to compare the performance of the proposed DA method with the methods available in the literature (in particular, the MPE methods used here), we need datasets where weights are observed at the same equidistant age instants for all the animals. Whenever possible, we will also compare the DA and MPE methods, which rely on approximations of the likelihood function, with existing estimation methods that use exact closed-form expressions for the likelihood function, in particular, when the likelihood function assumes both parameters fixed, i.e., non-mixed (which we will call the NMSDE method, presented in Reference [1]) or, if appropriate, when the likelihood function assumes just the parameter $\alpha$ as random (which we will call the Exact($\alpha$) method, presented in Reference [1]).
For this purpose, we worked with simulated datasets of equidistant monthly weights for $M$ animals, from birth until four years of age, totaling $49M$ observations.
The animals’ weights were simulated based on the stochastic Gompertz model ($h(x) = \ln x$). We simulated four datasets:
: Mixed SDE model with random independent and , where and with ( kg), , , , and with fixed parameter ;
: Mixed SDE model with random with ( kg) and , and with fixed parameters and ;
: Mixed SDE model with random with and , and with fixed parameters ( kg) and ;
: Non-mixed SDE model (
1) with fixed parameters
(
kg),
and
.
Each of the simulated datasets based on a mixed SDE model was obtained in the following way. First, the
M random parameter values
(
) of the
M animals were simulated based on the Gaussian distribution
; then, for each animal
i (
), the simulated values were incorporated in the transition density (3) as the true values of $\alpha$ and/or $\beta$ for that animal. Then, we simulated the weights of each animal based on the Markov property, using the transition densities between consecutive observation ages. For the non-mixed (fixed-parameter) model, the dataset is simulated without drawing the values of $\alpha$ and $\beta$, since they are fixed.
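The two-stage simulation described above can be sketched as follows: draw each animal's parameters, then generate its trajectory from the Gaussian transitions between consecutive ages. This is a minimal illustration assuming independent Gaussian random parameters; the argument names (`alpha_mean`, `alpha_sd`, `beta_mean`, `beta_sd`) are hypothetical, and a standard deviation of zero is used here to represent a fixed parameter.

```python
import math
import random

def simulate_animal(ages, y0, alpha, beta, sigma, rng):
    """Exact simulation of the modified sizes at the given ages."""
    y = [y0]
    for j in range(1, len(ages)):
        decay = math.exp(-beta * (ages[j] - ages[j - 1]))
        mean = alpha + (y[-1] - alpha) * decay
        sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * beta))
        y.append(rng.gauss(mean, sd))
    return y

def simulate_dataset(n_animals, ages, y0, alpha_mean, alpha_sd,
                     beta_mean, beta_sd, sigma, seed=0):
    """Simulate modified sizes for n_animals sharing one age vector.

    A standard deviation equal to zero keeps that parameter fixed.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_animals):
        alpha_i = rng.gauss(alpha_mean, alpha_sd) if alpha_sd > 0 else alpha_mean
        beta_i = rng.gauss(beta_mean, beta_sd) if beta_sd > 0 else beta_mean
        data.append(simulate_animal(ages, y0, alpha_i, beta_i, sigma, rng))
    return data

# Monthly ages from birth to four years: 49 observations per animal.
MONTHLY_AGES = [k / 12.0 for k in range(49)]
```

For the Gompertz model, the real weights are recovered by applying `math.exp` to each simulated modified size.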
The advantage of using simulated data, besides the possibility of comparing the different methods due to the common age vector of observations, is the knowledge of the true parameter values, which allows us to compare the performances of the different methods when they can be applied. We first consider simulated datasets for weights of $M = 5000$ animals to evaluate the behavior of the methods for a large number of trajectories, close to an asymptotic regime. However, in applications, we usually do not have datasets with so many animals available, and it is important to evaluate the performance of the different methods for smaller datasets. For this reason, we also present a comparison of the main methods, considering datasets with $M = 50$ and $M = 500$ animals. To distinguish them, we index the simulated datasets by the number of animals $M$.
Table 1, Table 2, Table 3 and Table 4 present the results for each of the four datasets simulated under the four stochastic Gompertz mixed and fixed models. In each table, we present the results obtained using the different DA methods, DA($\alpha, \beta$) (the one that assumes both parameters $\alpha$ and $\beta$ random and is described in Section 3), and DA($\alpha$) and DA($\beta$) (the ones that assume just $\alpha$ random or just $\beta$ random, as described in Reference [12]), as well as the estimates obtained by the different MPE methods, MPE($\alpha, \beta$), MPE($\alpha$), and MPE($\beta$) (again, when we consider random both or just one of the parameters). We also present, for comparison purposes, the results obtained when using the NMSDE estimation method (which assumes both parameters to be fixed).
In this way, we can assess what happens when an appropriate estimation method is applied (for instance, when applied to the dataset with both parameters random, the estimation methods MPE($\alpha, \beta$) and DA($\alpha, \beta$) are appropriate) and when an inappropriate estimation method is applied (for instance, when applied to a dataset with random $\beta$, the MPE($\alpha$) method is inappropriate since it cannot estimate the parameters involved in the random $\beta$); this may be important since, in real non-simulated data, we may not know which parameters are random. Of course, DA($\alpha, \beta$) and MPE($\alpha, \beta$) are appropriate for all datasets, but, when the data has one or both parameters non-random, they are overparametrized and, therefore, may not be as accurate as non-overparametrized methods.
In Table 1, Table 2, Table 3 and Table 4, we have also underlined the headings of the estimation methods that are appropriate to analyze the corresponding dataset, in the sense of being able to estimate all the parameters of the model used to simulate the dataset. Appropriate methods include the ones that use, in the likelihood function or its approximation, the same random parameters that were used in the dataset generation (these would be the most appropriate methods when we know beforehand which parameters are random) but include, as well, overparametrized methods that also use additional random parameters in the likelihood function or its approximation.
The maximum likelihood estimates and the approximate 95% confidence bands based on the inverse of the empirical Fisher information matrix are presented only for the DA methods and the NMSDE method but not for the MPE methods since the confidence intervals of the estimates are not provided by the R package MsdeParEst. Instead of presenting the results for the overall mean of the modified asymptotic weight (which is = when is a fixed parameter), we present them in terms of the corresponding actual weight value .
The tables also display the log-likelihood function values computed at the estimated parameter values. We display the log-likelihood values in terms of the actual weights $X$, which can be easily obtained by a simple conversion using the function $h$ from the corresponding log-likelihood values (expressed in terms of the modified weights $Y$ we have been working with so far in all computations). However, although the displayed log-likelihood values use the exact log-likelihood expressions for the NMSDE and the Exact($\alpha$) methods, for the DA methods, they use the approximate log-likelihood expressions of those methods, and the same possibly happens with the values delivered by the MsdeParEst package for the MPE methods. Since these approximations involve errors that are not comparable across methods, the displayed log-likelihood values should, with rare exceptions, not be used for comparisons in which an approximate method is involved.
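One way to write the conversion between the two scales, assuming it is the usual change-of-variables (Jacobian) correction and that the first observation of each animal is conditioned on, is shown below. Here $\ln L_X$ and $\ln L_Y$ denote the log-likelihoods on the real and modified scales, $x_{i,j}$ is the $j$th observed real weight of animal $i$, and $n_i$ is the index of its last observation; this notation is introduced here for illustration.

```latex
\ln L_X \;=\; \ln L_Y \;+\; \sum_{i=1}^{M} \sum_{j=1}^{n_i} \ln \left| h'(x_{i,j}) \right|,
\qquad \text{so that, for } h(x) = \ln x, \quad
\ln L_X \;=\; \ln L_Y \;-\; \sum_{i=1}^{M} \sum_{j=1}^{n_i} \ln x_{i,j}.
```

The correction term does not depend on the model parameters, so it shifts the log-likelihood value without changing the location of its maximum.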
When considering, in Table 1, the dataset simulated with both $\alpha$ and $\beta$ random, the appropriate estimation methods are the ones assuming $\alpha$ and $\beta$ random: the DA($\alpha, \beta$) method we have proposed and the MPE($\alpha, \beta$) method. We can observe that the DA($\alpha, \beta$) method provides a lower bias for all parameters than the MPE($\alpha, \beta$) method, noticing that, in the MPE($\alpha, \beta$) method, the estimate of
is extremely biased, close to zero. The use of an inappropriate method should obviously be avoided, but, when using real data, we often do not know beforehand exactly which are the random parameters and may make a wrong assumption, leading to the use of an inappropriate estimation method. These simulated datasets, in which we know exactly which parameters are random, give us the opportunity to assess the consequences of a wrong assumption on the random parameters when dealing with real data. Let us look at this issue, beyond the obvious fact that an inappropriate method cannot estimate some of the parameters. When comparing the appropriate DA(
$\alpha, \beta$) method with the inappropriate (for this dataset) NMSDE method, the former leads to less biased estimates. When comparing the appropriate DA($\alpha, \beta$) with the inappropriate (for this dataset) DA($\alpha$) and DA($\beta$) methods, the inappropriate methods estimate the standard deviation of their own random parameter (respectively, the standard deviation of the random $\alpha$ and the standard deviation of the random $\beta$) better than the appropriate method, which underestimates both standard deviations. On the estimation of the noise intensity parameter $\sigma$, the appropriate DA($\alpha, \beta$) method shows the better performance, and it also outperforms the inappropriate DA(
) method on the estimation of the means
and
, but these means are curiously better estimated by the inappropriate DA(
) method. As for the inappropriate MPE($\alpha$) and MPE($\beta$) methods, besides the obvious impossibility of estimating one parameter, their behavior is not too different from that of the appropriate MPE($\alpha, \beta$) method.
Table 2 shows the results for the dataset simulated with only $\alpha$ random. For this case, we included the Exact($\alpha$) method, where the maximum likelihood is obtained exactly from the closed-form expression (8), and which is certainly the best choice for this dataset. The methods DA($\alpha, \beta$) and DA($\alpha$) are both appropriate for this dataset, but the first is overparametrized, while the latter is the more appropriate of the two, and we expect it to be more accurate. Still, it is interesting to notice that these two methods provide exactly the same parameter estimates, except, naturally, for
(“parameter” obviously out of the DA(
$\alpha$) method). Comparing the MPE($\alpha, \beta$) and MPE($\alpha$) methods, they do not give exactly the same parameter estimates, but their estimates are very close. As expected, the most appropriate method for this dataset, the Exact($\alpha$) method, provides slightly better estimates than the DA($\alpha, \beta$) and DA($\alpha$) methods and than the MPE($\alpha, \beta$) and MPE($\alpha$) methods. The estimates for
and
are better when using the appropriate MPE methods than the ones obtained with the corresponding DA methods, but the reverse happens for the estimates of the mean
and the standard deviations
of parameter
$\alpha$. The inappropriate methods DA($\beta$) and MPE($\beta$) give worse estimates than the appropriate methods of the same type. The inappropriate (for this dataset) NMSDE method performs quite badly in the estimation of
. Notice that the dataset was simulated for fixed
, i.e,
, and this is better captured by the DA method.
Since the values for the approximate methods are only approximations, we cannot perform a likelihood ratio test for the randomness of the parameter (null hypothesis corresponding to a non-random ). However, if one knows (or assumes) that is fixed, one can perform a likelihood-ratio test on whether is fixed or random (null hypothesis corresponding to a fixed ) by using the log-likelihood values of the NMSDE method and the Exact() method, which are exact values; the result is the rejection of the null hypothesis at the usual significance levels (p-value ).
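The likelihood ratio test mentioned above can be sketched in a few lines when both log-likelihood values are exact. The snippet assumes the usual chi-square reference distribution with one degree of freedom for a single extra variance parameter; since the null value of a variance lies on the boundary of the parameter space, this reference distribution should be read as an approximation. The function name and signature are illustrative.

```python
import math

def likelihood_ratio_test(loglik_null, loglik_alt):
    """Return the test statistic and the chi-square(1) p-value.

    loglik_null: maximized log-likelihood under the null (e.g., fixed parameter).
    loglik_alt:  maximized log-likelihood under the alternative (random parameter).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Both log-likelihood values must be computed on the same scale (either the modified or the real weights) for the difference to be meaningful.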
For the dataset simulated with only $\beta$ random, Table 3 shows that, again, the appropriate (for this dataset) DA($\alpha, \beta$) (which is overparametrized) and DA($\beta$) methods provide very similar parameter estimates. Curiously, the inappropriate NMSDE method performs surprisingly well. In this case, the MPE methods, both appropriate and inappropriate, give very poor estimates. We also highlight that the estimates of the standard deviations
and
using the appropriate MPE methods fail severely, even suggesting that $\alpha$ is random and $\beta$ is not, when the reverse is the true situation, while the DA($\alpha, \beta$) method captures the non-random nature of parameter $\alpha$ very well, and both the DA($\alpha, \beta$) and DA($\beta$) methods capture the random nature of $\beta$, although somewhat underestimating its standard deviation
.
In Table 4, the estimates for the fixed-effects dataset, with both parameters $\alpha$ and $\beta$ fixed, i.e., with
and
, are presented. Now, all the methods are appropriate. However, with the exception of the NMSDE method, which, in this case, is the most appropriate one, they are overparametrized. It is quite interesting to note that the different DA methods give practically coincident parameter estimates, which also practically coincide with those of the NMSDE method. This reveals the robustness of the proposed DA method, which also captures the non-random character of $\alpha$ and $\beta$ very well. The MPE methods give reasonable results, but not as good.
In this case, not only do the NMSDE and DA methods give almost coincident parameter estimates, but the log-likelihood value of the NMSDE method (which is the exact maximum of the exact log-likelihood function) also almost coincides with the values for the DA methods; so, these DA values, which are approximations, are likely to be good approximations. If these values were exact instead of just approximations, a likelihood ratio test for the randomness of $\alpha$ and/or for the randomness of $\beta$ would yield non-significant results.
Since the typical situation when dealing with real data is not knowing beforehand whether $\alpha$ and/or $\beta$ are random or fixed, and since we have seen that, among the approximate DA methods, the DA($\alpha, \beta$) method performs as well or almost as well as other appropriate DA methods (as far as the estimation of the common parameters is concerned), even when it is overparametrized with respect to the dataset, we will consider from now on, among the three delta approximation methods, only the DA($\alpha, \beta$) method. For similar reasons, among the MPE methods, we will consider from now on just the MPE($\alpha, \beta$) method. For comparison purposes, we also consider the usual NMSDE method, which is the most appropriate method for the fixed-effects model considered in most applications, but it is, of course, inappropriate when our datasets have $\alpha$ and/or $\beta$ random.
With the purpose of analyzing the influence of smaller sample sizes on the estimates, Table 5, Table 6, Table 7 and Table 8 present the results for the four mixed- and fixed-effects stochastic Gompertz models for the datasets with smaller sample sizes, 50 and 500 animals. We will use the DA($\alpha, \beta$), the MPE($\alpha, \beta$), and the NMSDE estimation methods. For comparison purposes, for the two datasets with only $\alpha$ random, we also present the results of the Exact($\alpha$) method.
We can conclude that the main features observed in the estimates for the large datasets with 5000 animals still hold. The DA($\alpha, \beta$) method is able to accurately identify the fixed and the random effects, while the MPE($\alpha, \beta$) method usually fails (except, and not clearly, when both effects are fixed).
As expected, when comparing the confidence intervals of the parameter estimates for different numbers of animals in the datasets (see Table 5, Table 6, Table 7 and Table 8 for 50 and 500 animals, and Table 1, Table 2, Table 3 and Table 4 for 5000 animals), their widths decrease as the number of animals in the dataset increases. In general, the MPE($\alpha, \beta$) method gives worse estimates than the DA($\alpha, \beta$) method, both for small and large samples, except in estimating the standard deviation
of the parameter
when this parameter is random.
For the two datasets with only $\alpha$ random, the results obtained with the DA($\alpha, \beta$) method are not exactly the same as those obtained by the Exact($\alpha$) method, but they are quite close, and very close for the 500-animal dataset.
For the fixed-effects datasets with 50 and 500 animals, the estimates of the DA($\alpha, \beta$) method replicate the results of the NMSDE method, which, for these datasets, is the most appropriate method. The log-likelihood values of the NMSDE method are exact maximum log-likelihood values, and they are again practically equal to the approximate values of the DA($\alpha, \beta$) method, again suggesting that, for fixed-effects datasets, these are good approximations, in which case a likelihood ratio test of the null hypothesis that both parameters are fixed would not reject it.
Finally, in Table 9, we used the real cattle weight dataset of 16,029 Mertolengo cattle males (totaling 96,204 observations) and present the estimates obtained by the DA($\alpha, \beta$) method and by the NMSDE method. The true parameter values are not known, and we cannot use the R package MsdeParEst since the real weight data from the 16,029 animals have a different age vector of observations for each animal. Analyzing the results of Table 9, we can conclude that, despite the dataset being large and heterogeneous, the two estimation methods provide very similar estimates. In line with the findings above, it is interesting to note that the DA($\alpha, \beta$) method presents clear evidence of random effects on the
parameter and of strong random effects on the
parameter. Although we cannot perform a likelihood ratio test of the null hypothesis that both parameters are fixed, because the displayed log-likelihood value for the DA($\alpha, \beta$) method is an approximation, the difference in log-likelihood values between this method and the NMSDE method (which displays an exact log-likelihood value) is striking enough to leave little doubt about the random nature of $\alpha$ and $\beta$.
6. Conclusions
To describe the individual growth of animals in a randomly varying environment, a general stochastic differential equation (SDE) model was used. In order to take into account that the model parameters may vary from animal to animal, for instance due to different individual genetics and other characteristics of the animals, we considered SDE mixed models. Here, we studied the SDE mixed model where both parameters included in the drift term, the parameter $\alpha$ (asymptotic modified weight) and the growth parameter $\beta$, were assumed to be Gaussian distributed.
We applied the maximum likelihood estimation method to obtain the estimates for the parameters of the SDE mixed models. In most cases, for this type of model, it is difficult or impossible to obtain a closed-form expression for the likelihood function, and approximation methods are used to cope with this issue. In this paper, we propose to use the delta approximation method (DA method) with both $\alpha$ and $\beta$ random, in which we adapted the delta method, a classic in Statistics, to obtain approximate closed-form expressions for the likelihood function when both parameters $\alpha$ and $\beta$ are random. Of course, the DA method can also be applied when only one of the two parameters is assumed random.
To evaluate the performance of the proposed method on parameter estimation, we used simulated datasets from different mixed-effects models (with only $\alpha$ random, with only $\beta$ random, and with both parameters random) and from a fixed-effects model (both $\alpha$ and $\beta$ fixed), with different numbers of animals (5000, 500, and 50). In these datasets, all animals were weighed at the same ages so that we could use all the estimation methods considered here and compare them. Since, in real life, we usually do not know which parameters are random, wrong assumptions may be made on that issue. In order to evaluate the consequences of wrong assumptions about which parameters are random, we also included in the comparisons the methods designed under such assumptions. We compared the DA method (in its variants assuming both or just one parameter random) with an existing method specifically designed for mixed-effects models (also with the same variants), referred to here as the MPE method, which is provided by the R package MsdeParEst. We also included in the comparisons the estimation methods for which closed-form log-likelihood expressions are available, namely the NMSDE method and, in some cases, the Exact($\alpha$) method, which are designed, respectively, for the fixed-effects model and the mixed-effects model with only $\alpha$ random.
The results of these comparisons show a very good performance of the proposed DA method with both $\alpha$ and $\beta$ random, which was globally the best method across the simulated scenarios. This method, unlike the MPE methods, was able to correctly identify, in each of the settings, the fixed and the random parameters. It gives generally better parameter estimates than the MPE method, and the estimates are quite close to the true parameter values, with the exception of the standard deviations of random effects, which were somewhat underestimated. The performance of the proposed DA method was confirmed when using a simulated dataset with both parameters fixed, since it provided the same results as the ones obtained when using the exact NMSDE method.
For this type of SDE mixed models, it is usual to find in the literature estimation methods developed under the assumption of having a unique (often also evenly spaced) age vector of observations common to all individuals, and this is also required when using available R packages, such as the MsdeParEst package used in the MPE method. The delta approximation (DA) method has the advantage of not requiring such restrictions, so it can be used in real situations where such restrictions usually do not hold. In our real data application, we are precisely in this situation, and we applied the estimation methods to real weights of Mertolengo cattle males from a large and heterogeneous dataset, reaching the conclusion that the proposed DA method identifies both parameters and (more strikingly) as being random.
This approach proved to be a very interesting alternative to the available estimation methods for SDE mixed models.
As future work, we are studying the case where the genetic factors of the animals are incorporated into the model to explain part of the variation in the random parameters, and we intend to implement the current methods in an R package.