Article

Composite Likelihood Methods Based on Minimum Density Power Divergence Estimator

1 Department of Statistics and O.R. I, Complutense University of Madrid, 28040 Madrid, Spain
2 Department of Statistics and O.R. II, Complutense University of Madrid, 28003 Madrid, Spain
3 Department of Mathematics, University of Ioannina, 45110 Ioannina, Greece
* Author to whom correspondence should be addressed.
Entropy 2018, 20(1), 18; https://doi.org/10.3390/e20010018
Submission received: 6 November 2017 / Revised: 26 December 2017 / Accepted: 28 December 2017 / Published: 31 December 2017

Abstract

In this paper, a robust version of the Wald test statistic for composite likelihood is considered by using the composite minimum density power divergence estimator instead of the composite maximum likelihood estimator. This new family of test statistics will be called Wald-type test statistics. The problem of testing a simple and a composite null hypothesis is considered, and the robustness is studied on the basis of a simulation study. The composite minimum density power divergence estimator is also introduced, and its asymptotic properties are studied.

1. Introduction

It is well known that the likelihood function is one of the most important tools in classical inference, and the resulting estimator, the maximum likelihood estimator (MLE), has nice efficiency properties, although its robustness properties are not so good.
Tests based on the MLE (the likelihood ratio test, Wald test, Rao's test, etc.) usually have good efficiency properties, but in the presence of outliers their behavior is not so good. To handle such situations, many robust estimators have been introduced in the statistical literature, some of them based on distance or divergence measures. In particular, the density power divergence measures introduced in [1] have yielded good robust estimators: the minimum density power divergence estimators (MDPDE). Based on them, robust test statistics have been considered for testing simple and composite null hypotheses. Some of these tests are based on divergence measures (see [2,3]), and some others extend the classical Wald test; see [4,5,6] and the references therein.
The classical likelihood function requires exact specification of the probability density function, but in most applications the true distribution is unknown. In some cases, where the data distribution is available in analytic form, the likelihood function is still mathematically intractable due to the complexity of the probability density function. There are many alternatives to the classical likelihood function; in this paper, we focus on the composite likelihood. A composite likelihood is an inference function derived by multiplying a collection of component likelihoods, the particular collection used being determined by the context. The composite likelihood therefore reduces the computational complexity, so that it is possible to deal with large datasets and very complex models even when the use of standard likelihood methods is not feasible. Asymptotic normality of the composite maximum likelihood estimator (CMLE) still holds, with the Godambe information matrix replacing the expected information in the expression of the asymptotic variance-covariance matrix. This allows the construction of composite likelihood ratio test statistics, Wald-type test statistics, as well as score-type statistics. A review of composite likelihood methods is given in [7]. We have to mention at this point that the CMLE, as well as the respective test statistics, are seriously affected by the presence of outliers in the set of available data.
The main purpose of the paper is to introduce a new robust family of estimators, namely, composite minimum density power divergence estimators (CMDPDE), as well as a new family of Wald-type test statistics based on the CMDPDE in order to get broad classes of robust estimators and test statistics.
In Section 2, we introduce the CMDPDE, and we provide the associated estimating system of equations. The asymptotic distribution of the CMDPDE is obtained in Section 2.1. Section 2.2 is devoted to the definition of a family of Wald-type test statistics, based on the CMDPDE, for testing simple and composite null hypotheses. The asymptotic distribution of these Wald-type test statistics is obtained, as well as some asymptotic approximations to the power function. A numerical example, presented previously in [8], is studied in Section 3. A simulation study based on this example is also presented (Section 3), in order to study the robustness of the CMDPDE, as well as the performance of the Wald-type test statistics based on the CMDPDE. Proofs of the results are presented in Appendix A.

2. Composite Minimum Density Power Divergence Estimator

We adopt here the notation of [9] regarding the composite likelihood function and the respective CMLE. In this regard, let $\{f(\cdot;\theta),\ \theta \in \Theta \subseteq \mathbb{R}^p,\ p \geq 1\}$ be a parametric identifiable family of distributions for an observation $y$, a realization of a random $m$-vector $Y$. In this setting, the composite density based on $K$ different marginal or conditional distributions has the form:
$$\mathrm{CL}(\theta, y) = \prod_{k=1}^{K} f_{A_k}(y_j,\ j \in A_k;\ \theta)^{w_k}$$
and the corresponding composite log-density has the form:
$$c\ell(\theta, y) = \sum_{k=1}^{K} w_k\, \ell_{A_k}(\theta, y),$$
with:
$$\ell_{A_k}(\theta, y) = \log f_{A_k}(y_j,\ j \in A_k;\ \theta),$$
where $\{A_k\}_{k=1}^{K}$ is a family of random variables associated either with marginal or conditional distributions involving some $y_j$, $j \in \{1, \ldots, m\}$, and $w_k$, $k = 1, \ldots, K$, are non-negative and known weights. If the weights are all equal, then they can be ignored; in this case, all the statistical procedures produce equivalent results.
Let $y_1, \ldots, y_n$ also be independent and identically distributed replications of $y$. We denote by:
$$c\ell(\theta, y_1, \ldots, y_n) = \sum_{i=1}^{n} c\ell(\theta, y_i)$$
the composite log-likelihood function for the whole sample. In complete accordance with the classical MLE, the CMLE, $\hat{\theta}_c$, is defined by:
$$\hat{\theta}_c = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} c\ell(\theta, y_i) = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \sum_{k=1}^{K} w_k\, \ell_{A_k}(\theta, y_i). \qquad (1)$$
It can also be obtained by solving the system of equations:
$$u(\theta, y_1, \ldots, y_n) = 0_p, \qquad (2)$$
where:
$$u(\theta, y_1, \ldots, y_n) = \frac{\partial\, c\ell(\theta, y_1, \ldots, y_n)}{\partial \theta} = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k\, \frac{\partial\, \ell_{A_k}(\theta, y_i)}{\partial \theta}.$$
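To make the construction concrete, the following minimal sketch (ours, not from the paper) evaluates a pairwise composite log-likelihood for a hypothetical trivariate normal model with unit variances and a common correlation $\rho$, and computes the CMLE of $\rho$ by a grid search; the model, the choice of pairs and the sample size are illustrative assumptions.

```python
# Minimal sketch (ours, not the paper's code): a pairwise composite
# log-likelihood for a hypothetical trivariate normal model with unit
# variances and a common correlation rho; the grid maximizer is the CMLE.
import numpy as np
from scipy.stats import multivariate_normal

def composite_loglik(sample, rho, pairs=((0, 1), (0, 2), (1, 2))):
    """c-ell(theta, y_1..y_n): sum over observations of the component
    bivariate log-densities (all weights w_k = 1 here)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    mvn = multivariate_normal(np.zeros(2), cov)
    return sum(mvn.logpdf(sample[:, list(p)]).sum() for p in pairs)

rng = np.random.default_rng(0)
true_cov = np.full((3, 3), 0.3)          # true common correlation 0.3
np.fill_diagonal(true_cov, 1.0)
sample = rng.multivariate_normal(np.zeros(3), true_cov, size=200)

grid = np.linspace(-0.45, 0.9, 136)      # grid with step 0.01
rho_hat = grid[int(np.argmax([composite_loglik(sample, r) for r in grid]))]
```

With 200 simulated observations, the grid maximizer should land near the true correlation used to generate the data.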
We are going to see how it is possible to obtain the CMLE, $\hat{\theta}_c$, on the basis of the Kullback–Leibler divergence measure. We shall denote by $g$ the density generating the data, with the respective distribution function denoted by $G$. The Kullback–Leibler divergence between the density function $g$ and the composite density function $\mathrm{CL}(\theta, \cdot)$ is given by:
$$d_{KL}(g, \mathrm{CL}(\theta, \cdot)) = \int_{\mathbb{R}^m} g(y) \log \frac{g(y)}{\mathrm{CL}(\theta, y)}\, dy = \int_{\mathbb{R}^m} g(y) \log g(y)\, dy - \int_{\mathbb{R}^m} g(y) \log \mathrm{CL}(\theta, y)\, dy.$$
The term:
$$\int_{\mathbb{R}^m} g(y) \log g(y)\, dy$$
can be removed because it does not depend on $\theta$; hence, we can define the following estimator of $\theta$, based on the Kullback–Leibler divergence:
$$\hat{\theta}_{KL} = \arg\min_{\theta} d_{KL}(g, \mathrm{CL}(\theta, \cdot))$$
or equivalently:
$$\hat{\theta}_{KL} = \arg\min_{\theta} \left( -\int_{\mathbb{R}^m} g(y) \log \mathrm{CL}(\theta, y)\, dy \right) = \arg\min_{\theta} \left( -\int_{\mathbb{R}^m} \log \mathrm{CL}(\theta, y)\, dG(y) \right). \qquad (3)$$
If we replace in (3) the distribution function $G$ by the empirical distribution function $G_n$, we have:
$$\hat{\theta}_{KL} = \arg\min_{\theta} \left( -\int_{\mathbb{R}^m} \log \mathrm{CL}(\theta, y)\, dG_n(y) \right) = \arg\max_{\theta} \frac{1}{n} \sum_{i=1}^{n} c\ell(\theta, y_i),$$
and this expression is equivalent to Expression (1). Therefore, the estimator $\hat{\theta}_{KL}$ coincides with the CMLE. Based on the previous idea, we are going to introduce, in a natural way, the composite minimum density power divergence estimator (CMDPDE).
The CMLE, $\hat{\theta}_c$, obeys asymptotic normality (see [9]); in particular:
$$\sqrt{n}\,(\hat{\theta}_c - \theta) \xrightarrow[n \to \infty]{L} N\!\left(0,\ G_*(\theta)^{-1}\right),$$
where $G_*(\theta)$ denotes the Godambe information matrix, defined by:
$$G_*(\theta) = H(\theta)\, J(\theta)^{-1}\, H(\theta),$$
with $H(\theta)$ being the sensitivity or Hessian matrix and $J(\theta)$ being the variability matrix, defined, respectively, by:
$$H(\theta) = E_\theta\!\left[ -\frac{\partial u(\theta, Y)}{\partial \theta^T} \right], \qquad J(\theta) = \mathrm{Var}_\theta\left[ u(\theta, Y) \right] = E_\theta\!\left[ u(\theta, Y)\, u(\theta, Y)^T \right],$$
where the superscript $T$ denotes the transpose of a vector or a matrix.
The matrix $J(\theta)$ is non-negative definite by definition. In the following, we shall assume that the matrix $H(\theta)$ is of full rank. Since the component score functions can be correlated, we have $H(\theta) \neq J(\theta)$. If $c\ell(\theta, y)$ is a true log-likelihood function, then $H(\theta) = J(\theta) = I_F(\theta)$, with $I_F(\theta)$ the Fisher information matrix of the model. Using a multivariate version of the Cauchy–Schwarz inequality, the matrix $I_F(\theta) - G_*(\theta)$ is non-negative definite, i.e., the full likelihood function is more efficient than any other composite likelihood function (cf. [10], Lemma 4A).
We are now going to proceed to the definition of the CMDPDE, which is based on the density power divergence measure, defined as follows. For two densities $p$ and $q$ associated with two $m$-dimensional random variables, the density power divergence (DPD) between $p$ and $q$ was defined in [1] by:
$$d_\beta(p, q) = \int_{\mathbb{R}^m} \left\{ q(y)^{1+\beta} - \left( 1 + \frac{1}{\beta} \right) q(y)^{\beta}\, p(y) + \frac{1}{\beta}\, p(y)^{1+\beta} \right\} dy, \qquad (4)$$
for $\beta > 0$, while for $\beta = 0$, it is defined by:
$$\lim_{\beta \downarrow 0} d_\beta(p, q) = d_{KL}(p, q).$$
For $\beta = 1$, Expression (4) reduces to the $L_2$ distance:
$$L_2(p, q) = \int_{\mathbb{R}^m} \left( q(y) - p(y) \right)^2 dy.$$
It is also interesting to note that (4) is a special case of the so-called Bregman divergence, $\int \left[ T(p(y)) - T(q(y)) - \{ p(y) - q(y) \}\, T'(q(y)) \right] dy$. If we consider $T(l) = l^{1+\beta}$, we get $\beta$ times $d_\beta(p, q)$. The parameter $\beta$ controls the trade-off between robustness and asymptotic efficiency of the parameter estimates (see the simulation study in Section 3), which are the minimizers of this family of divergences. For more details about this family of divergence measures, we refer to [11].
In this paper, we are going to consider DPD measures between the density function $g$ and the composite density function $\mathrm{CL}(\theta, \cdot)$, i.e.,
$$d_\beta(g, \mathrm{CL}(\theta, \cdot)) = \int_{\mathbb{R}^m} \left\{ \mathrm{CL}(\theta, y)^{1+\beta} - \left( 1 + \frac{1}{\beta} \right) \mathrm{CL}(\theta, y)^{\beta}\, g(y) + \frac{1}{\beta}\, g(y)^{1+\beta} \right\} dy \qquad (5)$$
for $\beta > 0$, while for $\beta = 0$, we have:
$$\lim_{\beta \downarrow 0} d_\beta(g, \mathrm{CL}(\theta, \cdot)) = d_{KL}(g, \mathrm{CL}(\theta, \cdot)).$$
The CMDPDE, $\hat{\theta}_{c\beta}$, is defined by:
$$\hat{\theta}_{c\beta} = \arg\min_{\theta \in \Theta} d_\beta(g, \mathrm{CL}(\theta, \cdot)).$$
The term:
$$\int_{\mathbb{R}^m} g(y)^{1+\beta}\, dy$$
does not depend on $\theta$, and consequently, the minimization of (5) with respect to $\theta$ is equivalent to minimizing:
$$\int_{\mathbb{R}^m} \left\{ \mathrm{CL}(\theta, y)^{1+\beta} - \left( 1 + \frac{1}{\beta} \right) \mathrm{CL}(\theta, y)^{\beta}\, g(y) \right\} dy$$
or:
$$\int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{1+\beta}\, dy - \left( 1 + \frac{1}{\beta} \right) \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{\beta}\, dG(y).$$
Now, we replace the distribution function $G$ by the empirical distribution function $G_n$, and we get:
$$\int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{1+\beta}\, dy - \left( 1 + \frac{1}{\beta} \right) \frac{1}{n} \sum_{i=1}^{n} \mathrm{CL}(\theta, y_i)^{\beta}. \qquad (6)$$
As a consequence, for a fixed value of $\beta$, the CMDPDE of $\theta$ can be obtained by minimizing the expression given in (6), or equivalently, by maximizing the expression:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \mathrm{CL}(\theta, y_i)^{\beta} - \frac{1}{1+\beta} \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{1+\beta}\, dy. \qquad (7)$$
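The role of $\beta$ in (7) can be seen in the simplest possible case: a sketch (ours) with a single component $K = 1$ and a univariate $N(\mu, 1)$ density, for which the integral term has the closed form $(2\pi)^{-\beta/2}(1+\beta)^{-1/2}$. Everything here (the model, $\sigma = 1$ known, the contamination scheme) is an illustrative assumption.

```python
# Sketch (ours) of maximizing objective (7) when CL is a single univariate
# N(mu, 1) density, so int CL^(1+beta) dy = (2*pi)^(-beta/2) (1+beta)^(-1/2).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def cmdpde_mu(sample, beta):
    integral = (2 * np.pi) ** (-beta / 2) * (1 + beta) ** (-0.5)
    def neg_objective(mu):
        cl_beta = norm.pdf(sample, loc=mu, scale=1.0) ** beta
        return -(cl_beta.mean() / beta - integral / (1 + beta))
    return minimize_scalar(neg_objective, bounds=(-5, 5), method="bounded").x

rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(0.0, 1.0, 95),   # "pure" data, mu = 0
                         rng.normal(8.0, 1.0, 5)])   # 5% outliers
mu_mle = sample.mean()              # the beta -> 0 limit (here, the MLE)
mu_robust = cmdpde_mu(sample, 0.5)  # outliers enter with weight CL^beta ~ 0
```

The $\beta \to 0$ limit is the sample mean, which is dragged toward the outliers, while a moderate $\beta$ downweights them through the factor $\mathrm{CL}(\theta, y_i)^\beta$.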
Under the differentiability of the model, the maximization of the function in Equation (7) leads to an estimating system of equations of the form:
$$\frac{1}{n} \sum_{i=1}^{n} \mathrm{CL}(\theta, y_i)^{\beta}\, \frac{\partial\, c\ell(\theta, y_i)}{\partial \theta} - \int_{\mathbb{R}^m} \frac{\partial\, c\ell(\theta, y)}{\partial \theta}\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy = 0_p. \qquad (8)$$
The system of Equations (8) can be written as:
$$\frac{1}{n} \sum_{i=1}^{n} \mathrm{CL}(\theta, y_i)^{\beta}\, u(\theta, y_i) - \int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy = 0_p, \qquad (9)$$
and the CMDPDE, $\hat{\theta}_{c\beta}$, of $\theta$ is obtained as the solution of (9). For $\beta = 0$ in (9), we have:
$$\frac{1}{n} \sum_{i=1}^{n} u(\theta, y_i) - \int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)\, dy = 0_p,$$
but:
$$\int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)\, dy = \int_{\mathbb{R}^m} \frac{\partial\, \mathrm{CL}(\theta, y)}{\partial \theta}\, dy = 0_p,$$
and we recover the estimating equation for the CMLE, $\hat{\theta}_c$, presented in (2).

2.1. Asymptotic Distribution of the Composite Minimum Density Power Divergence Estimator

Equation (9) can be written as follows:
$$\frac{1}{n} \sum_{i=1}^{n} \Psi_\beta(y_i, \theta) = 0_p,$$
with:
$$\Psi_\beta(y_i, \theta) = \mathrm{CL}(\theta, y_i)^{\beta}\, u(\theta, y_i) - \int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy.$$
Therefore, the CMDPDE, $\hat{\theta}_{c\beta}$, is an M-estimator. In this case, it is well known (cf. [12]) that the asymptotic distribution of $\hat{\theta}_{c\beta}$ is given by:
$$\sqrt{n}\,(\hat{\theta}_{c\beta} - \theta) \xrightarrow[n \to \infty]{L} N\!\left(0,\ H_\beta(\theta)^{-1} J_\beta(\theta)\, H_\beta(\theta)^{-1}\right),$$
being:
$$H_\beta(\theta) = E_\theta\!\left[ -\frac{\partial \Psi_\beta(Y, \theta)}{\partial \theta^T} \right]$$
and:
$$J_\beta(\theta) = E_\theta\!\left[ \Psi_\beta(Y, \theta)\, \Psi_\beta(Y, \theta)^T \right].$$
We are going to establish the expressions of $H_\beta(\theta)$ and $J_\beta(\theta)$. In relation to $H_\beta(\theta)$, we have:
$$\frac{\partial \Psi_\beta(y, \theta)}{\partial \theta^T} = \beta\, \mathrm{CL}(\theta, y)^{\beta}\, u(\theta, y)\, u(\theta, y)^T + \mathrm{CL}(\theta, y)^{\beta}\, \frac{\partial u(\theta, y)}{\partial \theta^T} - \int_{\mathbb{R}^m} \frac{\partial u(\theta, y)}{\partial \theta^T}\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy - (1+\beta) \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{1+\beta}\, u(\theta, y)\, u(\theta, y)^T\, dy$$
and:
$$H_\beta(\theta) = E_\theta\!\left[ -\frac{\partial \Psi_\beta(Y, \theta)}{\partial \theta^T} \right] = \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{\beta+1}\, u(\theta, y)\, u(\theta, y)^T\, dy. \qquad (10)$$
In relation to $J_\beta(\theta)$, we have:
$$\Psi_\beta(y, \theta)\, \Psi_\beta(y, \theta)^T = \mathrm{CL}(\theta, y)^{2\beta}\, u(\theta, y)\, u(\theta, y)^T - \mathrm{CL}(\theta, y)^{\beta}\, u(\theta, y) \int_{\mathbb{R}^m} u(\theta, y)^T\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy - \left( \int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy \right) \mathrm{CL}(\theta, y)^{\beta}\, u(\theta, y)^T + \int_{\mathbb{R}^m} u(\theta, y)\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy \int_{\mathbb{R}^m} u(\theta, y)^T\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy.$$
Then,
$$J_\beta(\theta) = E_\theta\!\left[ \Psi_\beta(Y, \theta)\, \Psi_\beta(Y, \theta)^T \right] = \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{2\beta+1}\, u(\theta, y)\, u(\theta, y)^T\, dy - \int_{\mathbb{R}^m} \mathrm{CL}(\theta, y)^{\beta+1}\, u(\theta, y)\, dy \int_{\mathbb{R}^m} u(\theta, y)^T\, \mathrm{CL}(\theta, y)^{1+\beta}\, dy. \qquad (11)$$
Based on the previous results, we have the following theorem.
Theorem 1.
Under suitable regularity conditions, we have:
$$\sqrt{n}\,(\hat{\theta}_{c\beta} - \theta) \xrightarrow[n \to \infty]{L} N\!\left(0,\ H_\beta(\theta)^{-1} J_\beta(\theta)\, H_\beta(\theta)^{-1}\right),$$
where the matrices $H_\beta(\theta)$ and $J_\beta(\theta)$ were defined in (10) and (11), respectively.
Remark 1.
If we apply the previous theorem for $\beta = 0$, then we get the CMLE, and the asymptotic variance-covariance matrix coincides with the inverse of the Godambe information matrix because:
$$H_\beta(\theta) = H(\theta) \quad \text{and} \quad J_\beta(\theta) = J(\theta),$$
for $\beta = 0$.

2.2. Wald-Type Test Statistics Based on the Composite Minimum Density Power Divergence Estimator

Wald-type test statistics based on MDPDE have been considered with excellent results in relation to the robustness in different statistical problems; see for instance [4,5,6].
Motivated by those works, we focus in this section on the definition and the study of Wald-type test statistics, which are defined by means of the CMDPDE instead of the MDPDE. In this context, if we are interested in testing:
$$H_0: \theta = \theta_0 \quad \text{against} \quad H_1: \theta \neq \theta_0,$$
we can consider the family of Wald-type test statistics:
$$W_{n,\beta}^0 = n\, (\hat{\theta}_{c\beta} - \theta_0)^T \left[ H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \right]^{-1} (\hat{\theta}_{c\beta} - \theta_0). \qquad (14)$$
For $\beta = 0$, we get the classical Wald-type test statistic considered in the composite likelihood methods (see, for instance, [7]).
In the following theorem, we present the asymptotic null distribution of the family of Wald-type test statistics $W_{n,\beta}^0$.
Theorem 2.
The asymptotic distribution of the Wald-type test statistics given in (14) is a chi-square distribution with p degrees of freedom.
The proof of this Theorem 2 is given in Appendix A.1.
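For illustration, the statistic (14) and its chi-square p-value can be computed as follows; the numerical values of the estimate and of $H_\beta$, $J_\beta$ below are hypothetical placeholders, not quantities from the paper:

```python
# Sketch (ours) of the Wald-type statistic (14) for H0: theta = theta0,
# using hypothetical placeholder values for the estimate and the matrices.
import numpy as np
from scipy.stats import chi2

def wald_type_stat(n, theta_hat, theta0, H, J):
    """W = n (th - th0)^T [H^-1 J H^-1]^-1 (th - th0)."""
    d = theta_hat - theta0
    Hinv = np.linalg.inv(H)
    sandwich = Hinv @ J @ Hinv               # asymptotic covariance of theta_hat
    return float(n * d @ np.linalg.solve(sandwich, d))

n = 200
theta_hat = np.array([0.10, -0.05])          # hypothetical CMDPDE
theta0 = np.zeros(2)
H = np.array([[2.0, 0.3], [0.3, 1.5]])       # hypothetical H_beta(theta0)
J = np.array([[2.2, 0.4], [0.4, 1.6]])       # hypothetical J_beta(theta0)

W = wald_type_stat(n, theta_hat, theta0, H, J)
p_value = chi2.sf(W, df=len(theta0))         # Theorem 2: chi-square, p d.o.f.
```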
Theorem 3.
Let $\theta^*$ be the true value of the parameter $\theta$, with $\theta^* \neq \theta_0$. Then, it holds:
$$\sqrt{n}\left( l(\hat{\theta}_{c\beta}) - l(\theta^*) \right) \xrightarrow[n \to \infty]{L} N\!\left( 0,\ \sigma^2_{W_\beta^0}(\theta^*) \right),$$
being:
$$l(\theta) = (\theta - \theta_0)^T \left[ H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \right]^{-1} (\theta - \theta_0)$$
and:
$$\sigma^2_{W_\beta^0}(\theta^*) = 4\, (\theta^* - \theta_0)^T \left[ H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \right]^{-1} (\theta^* - \theta_0).$$
The proof of the Theorem is outlined in Appendix A.2.
Remark 2.
Based on the previous result, we can approximate the power, $\beta_{W_{n,\beta}^0}$, of the Wald-type test statistics at $\theta^*$ by:
$$\beta_{W_{n,\beta}^0}(\theta^*) = \Pr\left( W_{n,\beta}^0 > \chi^2_{p,\alpha} \mid \theta = \theta^* \right) = \Pr\left( l(\hat{\theta}_{c\beta}) - l(\theta^*) > \frac{\chi^2_{p,\alpha}}{n} - l(\theta^*) \,\middle|\, \theta = \theta^* \right) = \Pr\left( \frac{\sqrt{n}\left( l(\hat{\theta}_{c\beta}) - l(\theta^*) \right)}{\sigma_{W_\beta^0}(\theta^*)} > \frac{\sqrt{n}}{\sigma_{W_\beta^0}(\theta^*)} \left( \frac{\chi^2_{p,\alpha}}{n} - l(\theta^*) \right) \,\middle|\, \theta = \theta^* \right) = 1 - \Phi_n\left( \frac{\sqrt{n}}{\sigma_{W_\beta^0}(\theta^*)} \left( \frac{\chi^2_{p,\alpha}}{n} - l(\theta^*) \right) \right),$$
where $\Phi_n$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi(x)$.
It is clear that:
$$\lim_{n \to \infty} \beta_{W_{n,\beta}^0}(\theta^*) = 1$$
for all $\alpha \in (0, 1)$. Therefore, the Wald-type test statistics are consistent in the sense of Fraser.
In many practical hypothesis testing problems, the restricted parameter space $\Theta_0 \subset \Theta$ is defined by a set of $r$ restrictions of the form:
$$g(\theta) = 0_r \qquad (16)$$
on $\Theta$, where $g: \mathbb{R}^p \to \mathbb{R}^r$ is a vector-valued function such that the $p \times r$ matrix:
$$G(\theta) = \frac{\partial g(\theta)^T}{\partial \theta} \qquad (17)$$
exists and is continuous in $\theta$, with $\mathrm{rank}\, G(\theta) = r$; here $0_r$ denotes the null vector of dimension $r$.
Now, we are going to consider composite null hypotheses, $\Theta_0 \subset \Theta$, in the way considered in (16), and our interest is in testing:
$$H_0: \theta \in \Theta_0 \quad \text{against} \quad H_1: \theta \notin \Theta_0 \qquad (18)$$
on the basis of a random sample $X_1, \ldots, X_n$ of size $n$.
Definition 1.
The family of Wald-type test statistics for testing (18) is given by:
$$W_{n,\beta} = n\, g(\hat{\theta}_{c\beta})^T \left[ G(\hat{\theta}_{c\beta})^T\, H_\beta(\hat{\theta}_{c\beta})^{-1}\, J_\beta(\hat{\theta}_{c\beta})\, H_\beta(\hat{\theta}_{c\beta})^{-1}\, G(\hat{\theta}_{c\beta}) \right]^{-1} g(\hat{\theta}_{c\beta}), \qquad (19)$$
where the matrices $G(\theta)$, $H_\beta(\theta)$ and $J_\beta(\theta)$ were defined in (17), (10) and (11), respectively, and the function $g$ in (16).
If we consider $\beta = 0$, then $\hat{\theta}_{c\beta}$ coincides with the CMLE, $\hat{\theta}_c$, of $\theta$, and $H_\beta(\hat{\theta}_c)^{-1} J_\beta(\hat{\theta}_c)\, H_\beta(\hat{\theta}_c)^{-1}$ with the inverse of the Godambe information matrix; we then get the classical Wald test statistic considered in the composite likelihood methods.
In the next theorem, we present the asymptotic distribution of W n , β .
Theorem 4.
The asymptotic distribution of the Wald-type test statistics, given in (19), is a chi-square distribution with r degrees of freedom.
The proof of this Theorem is presented in Appendix A.3.
Consider the null hypothesis $H_0: \theta \in \Theta_0 \subset \Theta$. By Theorem 4, the null hypothesis should be rejected if $W_{n,\beta} > \chi^2_{r,\alpha}$. The following theorem can be used to approximate the power function. Assume that $\theta^* \notin \Theta_0$ is the true value of the parameter, so that $\hat{\theta}_{c\beta} \xrightarrow[n \to \infty]{a.s.} \theta^*$.
Theorem 5.
Let $\theta^*$ be the true value of the parameter, with $\theta^* \neq \theta_0$. Then, it holds:
$$\sqrt{n}\left( l^*(\hat{\theta}_{c\beta}) - l^*(\theta^*) \right) \xrightarrow[n \to \infty]{L} N\!\left( 0,\ \sigma^2_{W_\beta}(\theta^*) \right),$$
being:
$$l^*(\theta) = g(\theta)^T \left[ G(\theta_0)^T\, H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1}\, G(\theta_0) \right]^{-1} g(\theta)$$
and:
$$\sigma^2_{W_\beta}(\theta^*) = \left( \left. \frac{\partial l^*(\theta)}{\partial \theta} \right|_{\theta = \theta^*} \right)^T H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \left( \left. \frac{\partial l^*(\theta)}{\partial \theta} \right|_{\theta = \theta^*} \right).$$

3. Numerical Example

In this section, we shall consider an example, studied previously by [8], in order to study the robustness of CMLE. The aim of this section is to clarify the different issues that were discussed in the previous sections.
Consider the random vector $Y = (Y_1, Y_2, Y_3, Y_4)^T$, which follows a four-dimensional normal distribution with mean vector $\mu = (\mu_1, \mu_2, \mu_3, \mu_4)^T$ and variance-covariance matrix:
$$\Sigma = \begin{pmatrix} 1 & \rho & 2\rho & 2\rho \\ \rho & 1 & 2\rho & 2\rho \\ 2\rho & 2\rho & 1 & \rho \\ 2\rho & 2\rho & \rho & 1 \end{pmatrix},$$
i.e., we suppose that the correlation between $Y_1$ and $Y_2$ is the same as the correlation between $Y_3$ and $Y_4$. Taking into account that $\Sigma$ should be positive semidefinite, the following condition is imposed: $-\tfrac{1}{5} \leq \rho \leq \tfrac{1}{3}$. In order to avoid several problems regarding the consistency of the CMLE of the parameter $\rho$ (cf. [8]), we shall consider the composite likelihood function:
$$\mathrm{CL}(\theta, y) = f_{A_1}(\theta, y)\, f_{A_2}(\theta, y),$$
where:
$$f_{A_1}(\theta, y) = f_{12}(\mu_1, \mu_2, \rho, y_1, y_2), \qquad f_{A_2}(\theta, y) = f_{34}(\mu_3, \mu_4, \rho, y_3, y_4),$$
with $f_{12}$ and $f_{34}$ the densities of the bivariate marginals of $Y$, i.e., bivariate normal distributions with mean vectors $(\mu_1, \mu_2)^T$ and $(\mu_3, \mu_4)^T$, respectively, and common variance-covariance matrix:
$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$$
with densities given by:
$$f_{h,h+1}(\mu_h, \mu_{h+1}, \rho, y_h, y_{h+1}) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{1}{2(1-\rho^2)}\, Q(y_h, y_{h+1}) \right), \quad h \in \{1, 3\},$$
being:
$$Q(y_h, y_{h+1}) = (y_h - \mu_h)^2 - 2\rho\, (y_h - \mu_h)(y_{h+1} - \mu_{h+1}) + (y_{h+1} - \mu_{h+1})^2, \quad h \in \{1, 3\}.$$
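The constraint on $\rho$ can be verified numerically: the eigenvalues of $\Sigma$ are $1 + 5\rho$, $1 - \rho$ (twice) and $1 - 3\rho$ (our computation), so $\Sigma$ is positive semidefinite exactly when $-1/5 \leq \rho \leq 1/3$. A quick check:

```python
# Numerical check (ours) of the constraint -1/5 <= rho <= 1/3 for the
# 4 x 4 matrix Sigma of this example to be positive semidefinite.
import numpy as np

def sigma(rho):
    return np.array([[1, rho, 2 * rho, 2 * rho],
                     [rho, 1, 2 * rho, 2 * rho],
                     [2 * rho, 2 * rho, 1, rho],
                     [2 * rho, 2 * rho, rho, 1]], dtype=float)

def is_psd(rho, tol=1e-10):
    return np.linalg.eigvalsh(sigma(rho)).min() >= -tol

inside = all(is_psd(r) for r in np.linspace(-0.2, 1 / 3, 50))
outside = is_psd(-0.21) or is_psd(0.34)   # just beyond either endpoint
```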
By $\theta$ we denote the parameter vector of our model, i.e., $\theta = (\mu_1, \mu_2, \mu_3, \mu_4, \rho)^T$. The system of equations that must be solved in order to obtain the CMDPDE:
$$\hat{\theta}_{c\beta} = \left( \hat{\mu}_{1,c\beta},\ \hat{\mu}_{2,c\beta},\ \hat{\mu}_{3,c\beta},\ \hat{\mu}_{4,c\beta},\ \hat{\rho}_{c\beta} \right)^T,$$
is given (see Appendix A.4) by:
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{1i} - \mu_1) + 2\rho\,(y_{2i} - \mu_2) \right) = 0,$$
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{2i} - \mu_2) + 2\rho\,(y_{1i} - \mu_1) \right) = 0,$$
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{3i} - \mu_3) + 2\rho\,(y_{4i} - \mu_4) \right) = 0,$$
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{4i} - \mu_4) + 2\rho\,(y_{3i} - \mu_3) \right) = 0$$
and:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \rho} - \frac{\beta}{(2\pi)^{2\beta}(\beta+1)^{3}}\, \frac{2\rho}{(1-\rho^2)^{\beta+1}} = 0,$$
being:
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \rho} = \frac{\rho\beta}{1-\rho^2}\, f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \left[ 2 + \frac{1}{\rho} \left( (y_{1i}-\mu_1)(y_{2i}-\mu_2) + (y_{3i}-\mu_3)(y_{4i}-\mu_4) \right) - \frac{1}{1-\rho^2}\, Q(y_{1i}, y_{2i}) - \frac{1}{1-\rho^2}\, Q(y_{3i}, y_{4i}) \right].$$
After some heavy algebraic manipulations, detailed in Appendix A.5, the sensitivity and variability matrices are given by:
$$H_\beta(\theta) = \frac{C_\beta}{(\beta+1)(1-\rho^2)} \begin{pmatrix} 1 & -\rho & 0 & 0 & 0 \\ -\rho & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & -\rho & 0 \\ 0 & 0 & -\rho & 1 & 0 \\ 0 & 0 & 0 & 0 & \dfrac{2(\rho^2+1) + 2\rho^2\beta^2}{(1-\rho^2)(1+\beta)} \end{pmatrix}$$
and:
$$J_\beta(\theta) = H_{2\beta}(\theta) - \xi_\beta(\theta)\, \xi_\beta(\theta)^T,$$
where $C_\beta = \dfrac{1}{(\beta+1)^2} \left( \dfrac{1}{(2\pi)^2 (1-\rho^2)} \right)^{\beta}$ and $\xi_\beta(\theta) = \left( 0,\ 0,\ 0,\ 0,\ \dfrac{2\rho\beta\, C_\beta}{(\beta+1)(1-\rho^2)} \right)^T$.
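These closed forms can be assembled directly in code; as a sanity check (ours, under the reconstruction above), at $\beta = 0$ the $(\rho, \rho)$ entry of $H_0(\theta)$ must reduce to the Fisher information for $\rho$ carried by the two independent bivariate blocks, $2(1+\rho^2)/(1-\rho^2)^2$:

```python
# Sketch (ours) assembling H_beta(theta) from the closed form above and
# checking its beta = 0 limit against the Fisher information for rho.
import numpy as np

def H_beta(rho, beta):
    c_beta = (1 / (beta + 1) ** 2) * (1 / ((2 * np.pi) ** 2 * (1 - rho ** 2))) ** beta
    pre = c_beta / ((beta + 1) * (1 - rho ** 2))
    H = np.zeros((5, 5))
    block = np.array([[1.0, -rho], [-rho, 1.0]])
    H[:2, :2] = block
    H[2:4, 2:4] = block
    H[4, 4] = (2 * (rho ** 2 + 1) + 2 * rho ** 2 * beta ** 2) / ((1 - rho ** 2) * (1 + beta))
    return pre * H

rho = 0.2
fisher_rho = 2 * (1 + rho ** 2) / (1 - rho ** 2) ** 2   # two bivariate pairs
```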

Simulation Study

A simulation study, developed by using the R statistical programming environment, is presented in order to study the behavior of the CMDPDE, as well as the behavior of the Wald-type test statistics based on them. The theoretical model studied in the previous example is considered. The parameters in the model are:
$$\theta = (\mu_1, \mu_2, \mu_3, \mu_4, \rho)^T$$
and we are interested in studying the behavior of the CMDPDE:
$$\hat{\theta}_{c\beta} = \left( \hat{\mu}_{1,c\beta},\ \hat{\mu}_{2,c\beta},\ \hat{\mu}_{3,c\beta},\ \hat{\mu}_{4,c\beta},\ \hat{\rho}_{c\beta} \right)^T,$$
as well as the behavior of the Wald-type test statistics for testing:
$$H_0: \rho = \rho_0 \quad \text{against} \quad H_1: \rho \neq \rho_0. \qquad (29)$$
Through $R = 10{,}000$ replications of the simulation experiment, we compare, for different values of $\beta$, the corresponding CMDPDE through the root of the mean square error (RMSE), when the true value of the parameters is $\theta = (0, 0, 0, 0, \rho)^T$ with $\rho \in \{0.1, 0, 0.15\}$. We pay special attention to the problem of the existence of some outliers in the sample, generating $5\%$ of the samples with $\tilde{\theta} = (1, 3, 2, 1, \tilde{\rho})^T$ and $\tilde{\rho} \in \{0.15, 0.1, 0.2\}$, respectively. Notice that, although the case $\rho = 0$ has been considered, it is less important given the nature of the theoretical model under consideration: in the case of independent observations, composite likelihood theory is unnecessary. Results are presented in Table 1 and Table 2. Two points deserve our attention. The first one is that, as expected, RMSEs for contaminated data are always greater than RMSEs for pure data, and the RMSEs decrease when the sample size $n$ increases. The second is that, while for pure data the RMSEs are greater for large values of $\beta$, when working with contaminated data the CMDPDE with medium-low values of $\beta$ ($\beta \in \{0.1, 0.2, 0.3\}$) present the best behavior in terms of efficiency. These statements remain true for larger levels of contamination, noting that, when larger contamination percentages are considered, larger values of $\beta$ also become worthwhile in terms of efficiency (see Table 3, Table 4 and Table 5 for contamination equal to $10\%$, $15\%$ and $20\%$, respectively). Considering the mean absolute error (MAE) for the evaluation of the accuracy, we obtain similar results (Table 6).
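The design just described can be condensed into a small runnable sketch (ours, and much lighter than the paper's study: the means are held fixed at their true value 0, only $\rho$ is estimated by a grid search on objective (7) with the closed-form integral from Appendix A.4, and only a few dozen replications are used):

```python
# Condensed sketch (ours) of the 5%-contamination experiment: only rho is
# estimated (mu fixed at 0 for brevity), via grid search on objective (7).
import numpy as np
from scipy.stats import multivariate_normal

def sigma4(rho):
    s = np.full((4, 4), 2 * rho)
    np.fill_diagonal(s, 1.0)
    s[0, 1] = s[1, 0] = s[2, 3] = s[3, 2] = rho
    return s

def cl_values(y, rho):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    mvn = multivariate_normal(np.zeros(2), cov)
    return mvn.pdf(y[:, :2]) * mvn.pdf(y[:, 2:])          # CL = f12 * f34

def rho_cmdpde(y, beta, grid=np.linspace(-0.19, 0.33, 53)):
    def objective(r):
        integral = (1 - r ** 2) ** (-beta) / ((beta + 1) ** 2 * (2 * np.pi) ** (2 * beta))
        return (cl_values(y, r) ** beta).mean() / beta - integral / (1 + beta)
    return grid[int(np.argmax([objective(r) for r in grid]))]

rng = np.random.default_rng(7)
rho_true, n, reps = 0.15, 200, 40
est = {0.01: [], 0.3: []}                 # beta ~ 0 mimics the CMLE
for _ in range(reps):
    y = rng.multivariate_normal(np.zeros(4), sigma4(rho_true), n)
    m = rng.random(n) < 0.05              # ~5% outliers with shifted means
    y[m] = rng.multivariate_normal([1, 3, 2, 1], sigma4(0.2), m.sum())
    for b in est:
        est[b].append(rho_cmdpde(y, b))
bias = {b: abs(np.mean(v) - rho_true) for b, v in est.items()}
```

Even at this reduced scale, the $\beta = 0.3$ estimator is expected to track the true $\rho$ more closely than the near-CMLE fit under contamination.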
For a nominal size $\alpha = 0.05$, with the model under the null hypothesis given in (29), the estimated significance levels for the different Wald-type test statistics are given by:
$$\hat{\alpha}_n^{(\beta)}(\rho_0) = \widehat{\Pr}\left( W_{n,\beta} > \chi^2_{1,0.05} \mid H_0 \right) = \frac{1}{R} \sum_{i=1}^{R} I\left( W_{n,\beta}^{(i)} > \chi^2_{1,0.05} \mid \rho_0 \right),$$
with $I(S)$ being the indicator function (with a value of one if $S$ is true and zero otherwise). Empirical levels for the same parameter values as before are presented in Table 7 (pure data) and Table 8 ($5\%$ of outliers). While medium-high values of $\beta$ are not recommended at all, the CMLE is generally the best choice when working with pure data. However, the lack of robustness of the CMLE-based test is striking, as can be seen in Table 8. The effect of contamination for medium-low values of $\beta$ is much lighter, while for medium-high values of $\beta$ the contamination can even appear, deceptively, to be beneficial.
For finite sample sizes and nominal size $\alpha = 0.05$, the simulated powers are obtained under $H_1$ in (29), when $\rho \in \{-0.1, 0, 0.1\}$, $\tilde{\rho} = 0.2$ and $\rho_0 = 0.15$ (Table 9 and Table 10). The (simulated) power for the different composite Wald-type test statistics is obtained by:
$$\beta_n^{(\beta)}(\rho_0, \rho) = \Pr\left( W_{n,\beta} > \chi^2_{1,0.05} \mid H_1 \right) \quad \text{and} \quad \hat{\beta}_n^{(\beta)}(\rho_0, \rho) = \frac{1}{R} \sum_{i=1}^{R} I\left( W_{n,\beta}^{(i)} > \chi^2_{1,0.05} \mid \rho_0, \rho \right).$$
As expected, the power decreases as we approach the null hypothesis and as the sample size decreases. With pure data, the best behavior is obtained with low values of $\beta$, while under this level of contamination ($5\%$), the best results are obtained for medium values of $\beta$.

4. Conclusions

The likelihood function is the basis of the maximum likelihood method in estimation theory, and it also plays a key role in the development of log-likelihood ratio tests. However, in practice it is not so tractable in many cases. Maximum likelihood estimators are based on the likelihood function and can often be easily obtained; however, there are cases where they do not exist or cannot be computed. In such cases, composite likelihood methods constitute an appealing methodology in the area of estimation and testing of hypotheses. On the other hand, distance- or divergence-based methods of estimation and testing have increasingly become fundamental tools in the field of mathematical statistics. The work in [13] is the first, to the best of our knowledge, to link the notion of composite likelihood with divergence-based methods for testing statistical hypotheses.
In this paper, the CMDPDE are introduced, and they are exploited to develop Wald-type test statistics for testing simple or composite null hypotheses in a composite likelihood framework. The validity of the proposed procedures is investigated by means of simulations. The simulation results point out the robustness of the proposed information-theoretic procedures in estimation and testing, in the composite likelihood context. There are several areas where the notions of divergence and composite likelihood are crucial, including spatial statistics and time series analysis. These are areas of interest, and they will be explored elsewhere.

Acknowledgments

We would like to thank the referees for their helpful comments and suggestions. Their comments have improved the paper. This research is supported by Grant MTM2015-67057-P, from Ministerio de Economia y Competitividad (Spain).

Author Contributions

All authors conceived and designed the study, conducted the numerical simulation and wrote the paper. All authors read and approved the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLE: Maximum likelihood estimator
CMLE: Composite maximum likelihood estimator
DPD: Density power divergence
MDPDE: Minimum density power divergence estimator
CMDPDE: Composite minimum density power divergence estimator
RMSE: Root of mean square error
MAE: Mean absolute error

Appendix A. Proof of the Results

Appendix A.1. Proof of Theorem 2

The result follows in a straightforward manner from the asymptotic normality of $\hat{\theta}_{c\beta}$,
$$\sqrt{n}\,(\hat{\theta}_{c\beta} - \theta_0) \xrightarrow[n \to \infty]{L} N\!\left( 0,\ H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \right).$$

Appendix A.2. Proof of Theorem 3

A first-order Taylor expansion of $l(\theta)$ at $\hat{\theta}_{c\beta}$ around $\theta^*$ gives:
$$l(\hat{\theta}_{c\beta}) - l(\theta^*) = \left( \left. \frac{\partial l(\theta)}{\partial \theta} \right|_{\theta = \theta^*} \right)^T (\hat{\theta}_{c\beta} - \theta^*) + o_p\left( \left\| \hat{\theta}_{c\beta} - \theta^* \right\| \right).$$
Now, the result follows because the asymptotic distribution of $\sqrt{n}\left( l(\hat{\theta}_{c\beta}) - l(\theta^*) \right)$ coincides with the asymptotic distribution of $\sqrt{n} \left( \left. \partial l(\theta) / \partial \theta \right|_{\theta = \theta^*} \right)^T (\hat{\theta}_{c\beta} - \theta^*)$.

Appendix A.3. Proof of Theorem 4

We have:
$$g(\hat{\theta}_{c\beta}) = g(\theta_0) + G(\theta_0)^T (\hat{\theta}_{c\beta} - \theta_0) + o_p\left( \left\| \hat{\theta}_{c\beta} - \theta_0 \right\| \right) = G(\theta_0)^T (\hat{\theta}_{c\beta} - \theta_0) + o_p\left( \left\| \hat{\theta}_{c\beta} - \theta_0 \right\| \right),$$
because $g(\theta_0) = 0_r$.
Therefore:
$$\sqrt{n}\, g(\hat{\theta}_{c\beta}) \xrightarrow[n \to \infty]{L} N\!\left( 0,\ G(\theta_0)^T\, H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1}\, G(\theta_0) \right)$$
because:
$$\sqrt{n}\, (\hat{\theta}_{c\beta} - \theta_0) \xrightarrow[n \to \infty]{L} N\!\left( 0,\ H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1} \right).$$
Now:
$$W_{n,\beta} = n\, g(\hat{\theta}_{c\beta})^T \left[ G(\theta_0)^T\, H_\beta(\theta_0)^{-1}\, J_\beta(\theta_0)\, H_\beta(\theta_0)^{-1}\, G(\theta_0) \right]^{-1} g(\hat{\theta}_{c\beta}) \xrightarrow[n \to \infty]{L} \chi^2_r.$$

Appendix A.4. CMDPDE for the Numerical Example

The estimator $\hat{\theta}_{c\beta}$ is obtained by maximizing Expression (7) with respect to $\theta$. Firstly, we are going to compute:
$$\int_{\mathbb{R}^4} \frac{\partial\, \mathrm{CL}(\theta, y)^{1+\beta}}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \int_{\mathbb{R}^4} \mathrm{CL}(\theta, y)^{1+\beta}\, dy = \frac{\partial}{\partial \theta} \left( \int_{\mathbb{R}^2} f_{12}(\mu_1, \mu_2, \rho, y_1, y_2)^{\beta+1}\, dy_1 dy_2 \int_{\mathbb{R}^2} f_{34}(\mu_3, \mu_4, \rho, y_3, y_4)^{\beta+1}\, dy_3 dy_4 \right).$$
Based on [14] (p. 32):
$$\int_{\mathbb{R}^2} f_{12}(\mu_1, \mu_2, \rho, y_1, y_2)^{\beta+1}\, dy_1 dy_2 = \int_{\mathbb{R}^2} f_{34}(\mu_3, \mu_4, \rho, y_3, y_4)^{\beta+1}\, dy_3 dy_4 = \frac{(1-\rho^2)^{-\beta/2}}{(\beta+1)\,(2\pi)^{\beta}}.$$
Then:
$$\int_{\mathbb{R}^4} \frac{\partial\, \mathrm{CL}(\theta, y)^{1+\beta}}{\partial \theta}\, dy = \frac{\partial}{\partial \theta} \left( \frac{(1-\rho^2)^{-\beta}}{(\beta+1)^2\,(2\pi)^{2\beta}} \right)$$
and:
$$\frac{\partial}{\partial \mu_i} \left( \frac{(1-\rho^2)^{-\beta}}{(\beta+1)^2\,(2\pi)^{2\beta}} \right) = 0, \quad i = 1, 2, 3, 4,$$
while:
$$\frac{\partial}{\partial \rho} \left( \frac{(1-\rho^2)^{-\beta}}{(\beta+1)^2\,(2\pi)^{2\beta}} \right) = \frac{\beta}{(2\pi)^{2\beta}\,(\beta+1)^2}\, \frac{2\rho}{(1-\rho^2)^{\beta+1}}.$$
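The closed form for the bivariate integral is easy to spot-check numerically (our check, using a plain Riemann sum on a truncated grid, which is extremely accurate for Gaussian-type integrands):

```python
# Numerical spot-check (ours) of
# int f12^(1+beta) dy1 dy2 = (1 - rho^2)^(-beta/2) / ((beta+1) (2 pi)^beta).
import numpy as np
from scipy.stats import multivariate_normal

beta, rho = 0.4, 0.25
mvn = multivariate_normal(np.zeros(2), [[1.0, rho], [rho, 1.0]])

x = np.linspace(-8.0, 8.0, 321)        # tails beyond +-8 are negligible
h = x[1] - x[0]
X, Y = np.meshgrid(x, x)
vals = mvn.pdf(np.stack([X.ravel(), Y.ravel()], axis=1)) ** (1 + beta)
numeric = vals.sum() * h ** 2          # Riemann sum over the grid
closed = (1 - rho ** 2) ** (-beta / 2) / ((beta + 1) * (2 * np.pi) ** beta)
```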
Now, we are going to compute:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \theta}$$
in order to obtain the CMDPDE, $\hat{\theta}_{c\beta}$, by maximizing (7) with respect to $\theta$.
We have:
$$\mathrm{CL}(\theta, y)^{\beta} = f_{12}(\mu_1, \mu_2, \rho, y_1, y_2)^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_3, y_4)^{\beta}.$$
Therefore,
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_1} = \beta\, f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{1i} - \mu_1) + 2\rho\,(y_{2i} - \mu_2) \right) f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta}$$
and the equation:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_1} = 0$$
leads to the estimating equation for $\mu_1$, given by:
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{1i} - \mu_1) + 2\rho\,(y_{2i} - \mu_2) \right) = 0. \qquad (A1)$$
In a similar way:
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_2} = \beta\, f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{2i} - \mu_2) + 2\rho\,(y_{1i} - \mu_1) \right) f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta},$$
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_3} = \beta\, f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{3i} - \mu_3) + 2\rho\,(y_{4i} - \mu_4) \right) f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1}$$
and:
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_4} = \beta\, f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{4i} - \mu_4) + 2\rho\,(y_{3i} - \mu_3) \right) f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1}.$$
Therefore, the equations:
$$\frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_2} = 0, \qquad \frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_3} = 0 \quad \text{and} \quad \frac{1}{n\beta} \sum_{i=1}^{n} \frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \mu_4} = 0$$
lead to the estimating equations for $\mu_2$, $\mu_3$ and $\mu_4$, which read as follows:
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta-1}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{2i} - \mu_2) + 2\rho\,(y_{1i} - \mu_1) \right) = 0, \qquad (A2)$$
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{3i} - \mu_3) + 2\rho\,(y_{4i} - \mu_4) \right) = 0 \qquad (A3)$$
and:
$$\frac{1}{n} \sum_{i=1}^{n} f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta-1} \left( -\frac{1}{2(1-\rho^2)} \right) \left( -2(y_{4i} - \mu_4) + 2\rho\,(y_{3i} - \mu_3) \right) = 0. \qquad (A4)$$
Now, it is necessary to compute:
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \rho} = \frac{\partial}{\partial \rho} \left( f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i})^{\beta}\, f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i})^{\beta} \right) = \beta\, f_{12}(\cdot)^{\beta-1}\, f_{34}(\cdot)^{\beta}\, \frac{\partial f_{12}(\cdot)}{\partial \rho} + \beta\, f_{12}(\cdot)^{\beta}\, f_{34}(\cdot)^{\beta-1}\, \frac{\partial f_{34}(\cdot)}{\partial \rho}.$$
Here, $\partial f_{12}(\mu_1, \mu_2, \rho, y_{1i}, y_{2i}) / \partial \rho$ is given by:
$$\frac{\partial f_{12}(\cdot)}{\partial \rho} = \frac{1}{2\pi}\, \frac{\rho}{(1-\rho^2)^{3/2}}\, \exp\left( -\frac{Q(y_{1i}, y_{2i})}{2(1-\rho^2)} \right) + \frac{1}{2\pi\sqrt{1-\rho^2}}\, \exp\left( -\frac{Q(y_{1i}, y_{2i})}{2(1-\rho^2)} \right) \left( -\frac{\rho}{(1-\rho^2)^2}\, Q(y_{1i}, y_{2i}) + \frac{1}{1-\rho^2}\,(y_{1i}-\mu_1)(y_{2i}-\mu_2) \right) = f_{12}(\cdot)\, \frac{\rho}{1-\rho^2} \left[ 1 - \frac{1}{1-\rho^2}\, Q(y_{1i}, y_{2i}) + \frac{1}{\rho}\,(y_{1i}-\mu_1)(y_{2i}-\mu_2) \right].$$
In a similar way, $\partial f_{34}(\mu_3, \mu_4, \rho, y_{3i}, y_{4i}) / \partial \rho$ is given by:
$$f_{34}(\cdot)\, \frac{\rho}{1-\rho^2} \left[ 1 - \frac{1}{1-\rho^2}\, Q(y_{3i}, y_{4i}) + \frac{1}{\rho}\,(y_{3i}-\mu_3)(y_{4i}-\mu_4) \right].$$
Therefore,
$$\frac{\partial\, \mathrm{CL}(\theta, y_i)^{\beta}}{\partial \rho} = \frac{\rho\beta}{1-\rho^2}\, f_{12}(\cdot)^{\beta}\, f_{34}(\cdot)^{\beta} \left[ 2 + \frac{1}{\rho} \left( (y_{1i}-\mu_1)(y_{2i}-\mu_2) + (y_{3i}-\mu_3)(y_{4i}-\mu_4) \right) - \frac{1}{1-\rho^2}\, Q(y_{1i}, y_{2i}) - \frac{1}{1-\rho^2}\, Q(y_{3i}, y_{4i}) \right]. \qquad (A5)$$
Therefore, the equation in relation to ρ is given by:
1 n β i = 1 n CL ( θ , y i ) β ρ 1 β + 1 R m CL ( θ , y i ) β + 1 ρ dy = 0
being:
R m CL ( θ , y i ) β + 1 θ dy = β ( 2 π ) 2 β β + 1 2 2 ρ 1 ρ 2 β + 1
and:
CL ( θ , y i ) β ρ
was given in (A5).
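The $\rho$-derivative above collects several terms, so a finite-difference sanity check is again useful. The following sketch (our code, with hypothetical function names; not from the paper) verifies the closed form of $\partial \mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y}_i)^{\beta}/\partial\rho$ at an arbitrary point:

```python
import numpy as np

def f2(y1, y2, m1, m2, rho):
    # bivariate normal density with unit variances and correlation rho
    Q = (y1 - m1)**2 - 2*rho*(y1 - m1)*(y2 - m2) + (y2 - m2)**2
    return np.exp(-Q / (2*(1 - rho**2))) / (2*np.pi*np.sqrt(1 - rho**2))

def CL_pow(theta, y, beta):
    m1, m2, m3, m4, rho = theta
    return (f2(y[0], y[1], m1, m2, rho) * f2(y[2], y[3], m3, m4, rho))**beta

def dCL_pow_drho(theta, y, beta):
    # closed form: (rho*beta/(1-rho^2)) * CL^beta *
    #   [2 + (cross terms)/rho - (Q12 + Q34)/(1 - rho^2)]
    m1, m2, m3, m4, rho = theta
    Q12 = (y[0]-m1)**2 - 2*rho*(y[0]-m1)*(y[1]-m2) + (y[1]-m2)**2
    Q34 = (y[2]-m3)**2 - 2*rho*(y[2]-m3)*(y[3]-m4) + (y[3]-m4)**2
    cross = (y[0]-m1)*(y[1]-m2) + (y[2]-m3)*(y[3]-m4)
    bracket = 2 + cross/rho - (Q12 + Q34)/(1 - rho**2)
    return rho*beta/(1 - rho**2) * CL_pow(theta, y, beta) * bracket

theta = np.array([0.3, -0.2, 0.1, 0.5, 0.4])
y = np.array([0.7, -0.1, 0.2, 1.1])
beta, h = 0.5, 1e-6
tp = theta.copy(); tp[4] += h
tm = theta.copy(); tm[4] -= h
fd = (CL_pow(tp, y, beta) - CL_pow(tm, y, beta)) / (2*h)
print(np.isclose(fd, dCL_pow_drho(theta, y, beta), atol=1e-8))  # True
```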
Finally,

$$
\hat{\boldsymbol{\theta}}_c^{\beta}=\bigl(\hat{\mu}_{1,c}^{\beta},\hat{\mu}_{2,c}^{\beta},\hat{\mu}_{3,c}^{\beta},\hat{\mu}_{4,c}^{\beta},\hat{\rho}_c^{\beta}\bigr)^{T}
$$

will be obtained as the solution of the system of equations given by (A1)–(A6).

Appendix A.5. Computation of Sensitivity and Variability Matrices in the Numerical Example

We want to compute:

$$
\boldsymbol{H}_{\beta}(\boldsymbol{\theta})=\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y},
$$

$$
\boldsymbol{J}_{\beta}(\boldsymbol{\theta})=\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{2\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y}
-\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\,d\boldsymbol{y}
\int_{\mathbb{R}^m}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,d\boldsymbol{y}.
$$
First of all, we can see that:

$$
\begin{aligned}
\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}
&=\bigl(f_{A_1}(\boldsymbol{\theta},\boldsymbol{y})\,f_{A_2}(\boldsymbol{\theta},\boldsymbol{y})\bigr)^{\beta+1}\\
&=\left[\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{1}{2(1-\rho^2)}Q(y_1,y_2)\Bigr\}\cdot
\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\Bigl\{-\frac{1}{2(1-\rho^2)}Q(y_3,y_4)\Bigr\}\right]^{\beta+1}\\
&=\left[\frac{1}{(2\pi)^2(1-\rho^2)}\right]^{\beta+1}\exp\Bigl\{-\frac{\beta+1}{2(1-\rho^2)}\bigl(Q(y_1,y_2)+Q(y_3,y_4)\bigr)\Bigr\}\\
&=\frac{1}{(\beta+1)^2}\left[\frac{1}{(2\pi)^2(1-\rho^2)}\right]^{\beta}\cdot
\frac{(\beta+1)^2}{(2\pi)^2(1-\rho^2)}\exp\Bigl\{-\frac{\beta+1}{2(1-\rho^2)}\bigl(Q(y_1,y_2)+Q(y_3,y_4)\bigr)\Bigr\}\\
&=C_{\beta}\cdot \mathrm{CL}_{\beta}^{*},
\end{aligned}
$$

where $C_{\beta}=\frac{1}{(\beta+1)^2}\left[\frac{1}{(2\pi)^2(1-\rho^2)}\right]^{\beta}$ and $\mathrm{CL}_{\beta}^{*}=\mathrm{CL}_{\beta}^{*}(\boldsymbol{\theta},\boldsymbol{y})$ is the density of a $\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}^{*})$ distribution with $\boldsymbol{\Sigma}^{*}=\frac{1}{\beta+1}\boldsymbol{\Sigma}$.

While $\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})=\partial\log\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})/\partial\boldsymbol{\theta}$, we will denote $\boldsymbol{u}^{*}(\boldsymbol{\theta},\boldsymbol{y})=\partial\log\mathrm{CL}_{\beta}^{*}/\partial\boldsymbol{\theta}$. Then:

$$
\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})
=\frac{\partial\log \mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})}{\partial\boldsymbol{\theta}}
=\frac{1}{\beta+1}\frac{\partial\log \mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}}{\partial\boldsymbol{\theta}}
=\frac{1}{\beta+1}\frac{\partial\log\bigl(C_{\beta}\,\mathrm{CL}_{\beta}^{*}\bigr)}{\partial\boldsymbol{\theta}}
=\frac{1}{\beta+1}\left(\frac{\partial\log C_{\beta}}{\partial\boldsymbol{\theta}}+\boldsymbol{u}^{*}(\boldsymbol{\theta},\boldsymbol{y})\right).
$$
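Since $\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})$ factorizes over the two pairs, the normalizing integral $\int\mathrm{CL}^{\beta+1}d\boldsymbol{y}$ is the product of two bivariate integrals and should equal $C_{\beta}$. A numerical verification sketch (our code, using SciPy's `dblquad` over a truncated range, which is harmless given the Gaussian tails):

```python
import numpy as np
from scipy.integrate import dblquad

def f2(y1, y2, m1, m2, rho):
    # bivariate normal density with unit variances and correlation rho
    Q = (y1 - m1)**2 - 2*rho*(y1 - m1)*(y2 - m2) + (y2 - m2)**2
    return np.exp(-Q / (2*(1 - rho**2))) / (2*np.pi*np.sqrt(1 - rho**2))

beta, rho = 0.5, 0.4
m1, m2, m3, m4 = 0.3, -0.2, 0.1, 0.5

# integral of f12^{beta+1} over R^2 (first pair); the (y3, y4) factor is analogous
I12, _ = dblquad(lambda y2, y1: f2(y1, y2, m1, m2, rho)**(beta + 1),
                 -10, 10, -10, 10)
I34, _ = dblquad(lambda y4, y3: f2(y3, y4, m3, m4, rho)**(beta + 1),
                 -10, 10, -10, 10)

# closed form C_beta from the decomposition CL^{beta+1} = C_beta * CL*
C_beta = (1/(beta + 1)**2) * (1/((2*np.pi)**2*(1 - rho**2)))**beta
print(np.isclose(I12 * I34, C_beta, rtol=1e-6))  # True
```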
Further,

$$
\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y}
=\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta}\,\frac{\partial \mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})}{\partial\boldsymbol{\theta}}\,d\boldsymbol{y}
=\frac{1}{\beta+1}\frac{\partial}{\partial\boldsymbol{\theta}}\int_{\mathbb{R}^m}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,d\boldsymbol{y}
=\frac{1}{\beta+1}\frac{\partial C_{\beta}}{\partial\boldsymbol{\theta}}
=\Bigl(0,0,0,0,\frac{2\rho\beta C_{\beta}}{(\beta+1)(1-\rho^2)}\Bigr)^{T}
=\boldsymbol{\xi}_{\beta}(\boldsymbol{\theta}).
$$
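The only nonzero coordinate of $\boldsymbol{\xi}_{\beta}(\boldsymbol{\theta})$ comes from differentiating $C_{\beta}$ with respect to $\rho$; a short finite-difference check (ours, not from the paper) confirms the stated fifth component:

```python
import numpy as np

def C_beta(rho, beta):
    # normalizing constant from the decomposition CL^{beta+1} = C_beta * CL*
    return (1/(beta + 1)**2) * (1/((2*np.pi)**2*(1 - rho**2)))**beta

beta, rho, h = 0.5, 0.4, 1e-6
# stated fifth entry of xi_beta: 2*rho*beta*C_beta / ((beta+1)*(1-rho^2))
xi5 = 2*rho*beta*C_beta(rho, beta) / ((beta + 1)*(1 - rho**2))
# numerically: (1/(beta+1)) * dC_beta/drho
fd = (C_beta(rho + h, beta) - C_beta(rho - h, beta)) / (2*h) / (beta + 1)
print(np.isclose(xi5, fd, rtol=1e-8))  # True
```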
Now:

$$
\begin{aligned}
\int_{\mathbb{R}^4}\mathrm{CL}^{\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y}
&=\int_{\mathbb{R}^4} C_{\beta}\,\mathrm{CL}_{\beta}^{*}\,\frac{1}{(\beta+1)^2}
\Bigl(\frac{\partial\log C_{\beta}}{\partial\boldsymbol{\theta}}+\boldsymbol{u}^{*}\Bigr)^{T}
\Bigl(\frac{\partial\log C_{\beta}}{\partial\boldsymbol{\theta}}+\boldsymbol{u}^{*}\Bigr)\,d\boldsymbol{y}\\
&=\frac{C_{\beta}}{(\beta+1)^2}\left[\boldsymbol{K}^{T}\boldsymbol{K}\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,d\boldsymbol{y}
+\Bigl(\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,\boldsymbol{u}^{*}\,d\boldsymbol{y}\Bigr)^{T}\boldsymbol{K}
+\boldsymbol{K}^{T}\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,\boldsymbol{u}^{*}\,d\boldsymbol{y}
+\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,(\boldsymbol{u}^{*})^{T}\boldsymbol{u}^{*}\,d\boldsymbol{y}\right],
\end{aligned}
$$

where $\boldsymbol{K}=\frac{\partial\log C_{\beta}}{\partial\boldsymbol{\theta}}=\bigl(0,0,0,0,\frac{2\rho\beta}{1-\rho^2}\bigr)$. However:

$$
\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,\boldsymbol{u}^{*}\,d\boldsymbol{y}
=\int_{\mathbb{R}^4}\frac{1}{C_{\beta}}\,\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\bigl((\beta+1)\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})-\boldsymbol{K}\bigr)\,d\boldsymbol{y}
=\frac{\beta+1}{C_{\beta}}\int_{\mathbb{R}^4}\mathrm{CL}^{\beta+1}\boldsymbol{u}\,d\boldsymbol{y}
-\frac{\boldsymbol{K}}{C_{\beta}}\int_{\mathbb{R}^4}\mathrm{CL}^{\beta+1}\,d\boldsymbol{y}
=\frac{1}{C_{\beta}}\frac{\partial C_{\beta}}{\partial\boldsymbol{\theta}}-\boldsymbol{K}
=\boldsymbol{K}-\boldsymbol{K}=\boldsymbol{0},
$$

and thus, (A9) can be expressed as:

$$
\int_{\mathbb{R}^4}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})^{\beta+1}\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y}
=\frac{C_{\beta}}{(\beta+1)^2}\left[\boldsymbol{K}^{T}\boldsymbol{K}
+\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,(\boldsymbol{u}^{*})^{T}\boldsymbol{u}^{*}\,d\boldsymbol{y}\right].
$$
On the other hand, it is not difficult to prove that:

$$
\int_{\mathbb{R}^4}\mathrm{CL}_{\beta}^{*}\,(\boldsymbol{u}^{*})^{T}\boldsymbol{u}^{*}\,d\boldsymbol{y}
=\boldsymbol{C}\cdot\int_{\mathbb{R}^4}\mathrm{CL}(\boldsymbol{\theta},\boldsymbol{y})\,\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})^{T}\boldsymbol{u}(\boldsymbol{\theta},\boldsymbol{y})\,d\boldsymbol{y}
=\boldsymbol{C}\cdot \boldsymbol{H}_{0}(\boldsymbol{\theta}),
$$

where $\boldsymbol{C}=\mathrm{diag}(\beta+1,\beta+1,\beta+1,\beta+1,1)$ and ([13]):

$$
\boldsymbol{H}_{0}(\boldsymbol{\theta})=\begin{pmatrix}
\frac{1}{1-\rho^2} & -\frac{\rho}{1-\rho^2} & 0 & 0 & 0\\
-\frac{\rho}{1-\rho^2} & \frac{1}{1-\rho^2} & 0 & 0 & 0\\
0 & 0 & \frac{1}{1-\rho^2} & -\frac{\rho}{1-\rho^2} & 0\\
0 & 0 & -\frac{\rho}{1-\rho^2} & \frac{1}{1-\rho^2} & 0\\
0 & 0 & 0 & 0 & \frac{2(\rho^2+1)}{(1-\rho^2)^2}
\end{pmatrix}.
$$

Therefore,

$$
\boldsymbol{H}_{\beta}(\boldsymbol{\theta})=\frac{C_{\beta}}{(\beta+1)^2}\bigl(\boldsymbol{C}\cdot \boldsymbol{H}_{0}(\boldsymbol{\theta})+\boldsymbol{K}^{T}\boldsymbol{K}\bigr),
$$

that is:

$$
\boldsymbol{H}_{\beta}(\boldsymbol{\theta})=\frac{C_{\beta}}{(\beta+1)(1-\rho^2)}\begin{pmatrix}
1 & -\rho & 0 & 0 & 0\\
-\rho & 1 & 0 & 0 & 0\\
0 & 0 & 1 & -\rho & 0\\
0 & 0 & -\rho & 1 & 0\\
0 & 0 & 0 & 0 & \frac{2(\rho^2+1)+4\rho^2\beta^2}{(1-\rho^2)(1+\beta)}
\end{pmatrix}.
$$
Note that, for β = 0 , (A11) reduces to (A10).
On the other hand, the expression of the variability matrix $\boldsymbol{J}_{\beta}(\boldsymbol{\theta})$ can be obtained from Expressions (27) and (A8) as:

$$
\boldsymbol{J}_{\beta}(\boldsymbol{\theta})=\boldsymbol{H}_{2\beta}(\boldsymbol{\theta})-\boldsymbol{\xi}_{\beta}(\boldsymbol{\theta})\,\boldsymbol{\xi}_{\beta}(\boldsymbol{\theta})^{T}.
$$
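The sensitivity and variability matrices can be assembled directly from the closed forms above, which also makes the $\beta=0$ reduction easy to verify: $C_0=1$, $\boldsymbol{K}=\boldsymbol{0}$ and $\boldsymbol{\xi}_0=\boldsymbol{0}$, so $\boldsymbol{H}_0$ is recovered and $\boldsymbol{J}_0=\boldsymbol{H}_0$. A small sketch (our helper functions, not code from the paper):

```python
import numpy as np

def C_beta(rho, beta):
    return (1/(beta + 1)**2) * (1/((2*np.pi)**2*(1 - rho**2)))**beta

def H_beta(rho, beta):
    # closed-form sensitivity matrix; reduces to H_0 when beta = 0
    M = np.zeros((5, 5))
    block = np.array([[1.0, -rho], [-rho, 1.0]])
    M[:2, :2] = block
    M[2:4, 2:4] = block
    M[4, 4] = (2*(rho**2 + 1) + 4*rho**2*beta**2) / ((1 - rho**2)*(1 + beta))
    return C_beta(rho, beta) / ((beta + 1)*(1 - rho**2)) * M

def xi_beta(rho, beta):
    # only the rho-component is nonzero
    xi = np.zeros(5)
    xi[4] = 2*rho*beta*C_beta(rho, beta) / ((beta + 1)*(1 - rho**2))
    return xi

def J_beta(rho, beta):
    # variability matrix: J_beta = H_{2 beta} - xi_beta xi_beta^T
    xi = xi_beta(rho, beta)
    return H_beta(rho, 2*beta) - np.outer(xi, xi)

rho = 0.4
H0 = H_beta(rho, 0.0)
print(np.allclose(J_beta(rho, 0.0), H0))                          # True: J_0 = H_0
print(np.isclose(H0[4, 4], 2*(rho**2 + 1)/(1 - rho**2)**2))       # True
```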

References

  1. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimizing a density power divergence. Biometrika 1998, 85, 549–559. [Google Scholar] [CrossRef]
  2. Basu, A.; Mandal, A.; Martín, N.; Pardo, L. Testing statistical hypotheses based on the density power divergence. Ann. Inst. Stat. Math. 2013, 65, 319–348. [Google Scholar] [CrossRef]
  3. Basu, A.; Mandal, A.; Martín, N.; Pardo, L. Robust tests for the equality of two normal means based on the density power divergence. Metrika 2015, 78, 611–634. [Google Scholar] [CrossRef] [Green Version]
  4. Basu, A.; Mandal, A.; Martín, N.; Pardo, L. Generalized Wald-type tests based on minimum density power divergence estimators. Statistics 2016, 50, 1–26. [Google Scholar] [CrossRef]
  5. Basu, A.; Ghosh, A.; Mandal, A.; Martín, N.; Pardo, L. A Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimator. Electron. J. Stat. 2017, 11, 2741–2772. [Google Scholar] [CrossRef]
  6. Ghosh, A.; Mandal, A.; Martín, N.; Pardo, L. Influence analysis of robust Wald-type tests. J. Multivar. Anal. 2016, 147, 102–126. [Google Scholar] [CrossRef] [Green Version]
  7. Varin, C.; Reid, N.; Firth, D. An overview of composite likelihood methods. Stat. Sin. 2011, 21, 5–42. [Google Scholar]
  8. Xu, X.; Reid, N. On the robustness of maximum composite estimate. J. Stat. Plan. Inference 2011, 141, 3047–3054. [Google Scholar] [CrossRef]
  9. Joe, H.; Reid, N.; Song, P.X.; Firth, D.; Varin, C. Composite Likelihood Methods. Report on the Workshop on Composite Likelihood. 2012. Available online: http://www.birs.ca/events/2012/5-day-workshops/12w5046 (accessed on 28 December 2017).
  10. Lindsay, B.G. Composite likelihood methods. Contemp. Math. 1988, 80, 221–239. [Google Scholar]
  11. Basu, A.; Shioya, H.; Park, C. Statistical Inference: The Minimum Distance Approach; Chapman & Hall/CRC: Boca Raton, FL, USA, 2011. [Google Scholar]
  12. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Time Series, in Robust Statistics: Theory and Methods; John Wiley & Sons, Ltd.: Chichester, UK, 2006. [Google Scholar]
  13. Martín, N.; Pardo, L.; Zografos, K. On divergence tests for composite hypotheses under composite likelihood. In Statistical Papers; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
  14. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006. [Google Scholar]
Table 1. RMSEs for pure data.

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.0958   0.0950  0.0948      0.0683   0.0668  0.0666      0.0553   0.0552  0.0551
β = 0.1    0.0972   0.0961  0.0966      0.0693   0.0676  0.0677      0.0560   0.0559  0.0561
β = 0.2    0.1009   0.0991  0.1007      0.0718   0.0697  0.0704      0.0581   0.0575  0.0585
β = 0.3    0.1061   0.1034  0.1062      0.0754   0.0727  0.0742      0.0612   0.0599  0.0619
β = 0.4    0.1123   0.1087  0.1127      0.0797   0.0762  0.0787      0.0649   0.0628  0.0659
β = 0.5    0.1195   0.1147  0.1200      0.0845   0.0803  0.0837      0.0691   0.0661  0.0702
β = 0.6    0.1274   0.1215  0.1280      0.0898   0.0848  0.0892      0.0737   0.0697  0.0748
β = 0.7    0.1361   0.1291  0.1369      0.0955   0.0897  0.0952      0.0786   0.0736  0.0797
β = 0.8    0.1456   0.1374  0.1467      0.1015   0.0905  0.1016      0.0839   0.0778  0.0849
Table 2. RMSEs for contaminated data (5%).

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.1371   0.1336  0.1287      0.1210   0.1167  0.1113      0.1144   0.1098  0.1047
β = 0.1    0.1105   0.1104  0.1081      0.0875   0.0874  0.0843      0.0778   0.0786  0.0748
β = 0.2    0.1061   0.1053  0.1047      0.0783   0.0777  0.0759      0.0660   0.0669  0.0643
β = 0.3    0.1091   0.1072  0.1083      0.0783   0.0766  0.0761      0.0646   0.0645  0.0635
β = 0.4    0.1147   0.1118  0.1146      0.0814   0.0788  0.0798      0.0668   0.0657  0.0665
β = 0.5    0.1215   0.1176  0.1220      0.0858   0.0823  0.0848      0.0703   0.0683  0.0709
β = 0.6    0.1292   0.1242  0.1302      0.0907   0.0864  0.0905      0.0744   0.0716  0.0758
β = 0.7    0.1375   0.1315  0.1391      0.0961   0.0911  0.0966      0.0790   0.0753  0.0810
β = 0.8    0.1465   0.1396  0.1486      0.1018   0.0962  0.1031      0.0838   0.0794  0.0863
Table 3. RMSEs for contaminated data (10%).

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.2107   0.2052  0.2000      0.2003   0.1944  0.1884      0.1968   0.1911  0.1844
β = 0.1    0.1500   0.1472  0.1436      0.1324   0.1305  0.1264      0.1259   0.1250  0.1204
β = 0.2    0.1238   0.1229  0.1192      0.0991   0.0987  0.0951      0.0881   0.0898  0.0858
β = 0.3    0.1173   0.1170  0.1139      0.0882   0.0871  0.0846      0.0735   0.0754  0.0726
β = 0.4    0.1189   0.1187  0.1170      0.0872   0.0849  0.0845      0.0705   0.0714  0.0706
β = 0.5    0.1237   0.1234  0.1234      0.0901   0.0868  0.0884      0.0721   0.0718  0.0734
β = 0.6    0.1301   0.1296  0.1311      0.0944   0.0903  0.0938      0.0753   0.0742  0.0779
β = 0.7    0.1375   0.1367  0.1396      0.0995   0.0947  0.1000      0.0793   0.0776  0.0831
β = 0.8    0.1467   0.1446  0.1488      0.1050   0.0996  0.1064      0.0837   0.0814  0.0884
Table 4. RMSEs for contaminated data (15%).

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.2912   0.2854  0.2788      0.2835   0.2770  0.2713      0.2814   0.2757  0.2687
β = 0.1    0.2036   0.1994  0.1951      0.1909   0.1874  0.1828      0.1871   0.185   0.1785
β = 0.2    0.1530   0.1497  0.1453      0.1325   0.1306  0.1252      0.1252   0.1256  0.1181
β = 0.3    0.1329   0.1295  0.1257      0.1049   0.1031  0.0976      0.0932   0.0945  0.0872
β = 0.4    0.1287   0.1249  0.1229      0.0957   0.0931  0.0893      0.0805   0.0815  0.0763
β = 0.5    0.1312   0.1272  0.1272      0.0949   0.0915  0.0902      0.0774   0.0777  0.0755
β = 0.6    0.1367   0.1323  0.1343      0.0977   0.0936  0.0947      0.0784   0.0781  0.0788
β = 0.7    0.1436   0.1389  0.1425      0.1019   0.0974  0.1005      0.0811   0.0804  0.0836
β = 0.8    0.1514   0.1465  0.1514      0.1070   0.1020  0.1069      0.0847   0.0837  0.0888
Table 5. RMSEs for contaminated data (20%).

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.3725   0.3680  0.3612      0.3684   0.3618  0.3554      0.3661   0.3610  0.3534
β = 0.1    0.2691   0.2657  0.2591      0.2625   0.2566  0.2506      0.2577   0.2547  0.2473
β = 0.2    0.1949   0.1921  0.1831      0.1819   0.1766  0.1683      0.1742   0.1723  0.1624
β = 0.3    0.1562   0.1537  0.1441      0.1345   0.1299  0.1204      0.1235   0.1222  0.1109
β = 0.4    0.1419   0.1391  0.1316      0.1126   0.1082  0.1003      0.0987   0.0971  0.0876
β = 0.5    0.1397   0.1366  0.1323      0.1050   0.1005  0.0962      0.0890   0.0867  0.0812
β = 0.6    0.1430   0.1395  0.1383      0.1042   0.0996  0.0990      0.0866   0.0837  0.0828
β = 0.7    0.1488   0.1450  0.1463      0.1066   0.1018  0.1043      0.0877   0.0843  0.0873
β = 0.8    0.1560   0.1518  0.1552      0.1106   0.1056  0.1105      0.0905   0.0866  0.0927
Table 6. MAEs for pure and contaminated data (5%, 10%, 15% and 20%), n = 100.

             Pure data            5%                10%               15%               20%
           ρ = 0.1  ρ = 0.15  ρ = 0.1  ρ = 0.15  ρ = 0.1  ρ = 0.15  ρ = 0.1  ρ = 0.15  ρ = 0.1  ρ = 0.15
β = 0      0.076    0.076     0.190    0.179     0.371    0.342     0.626    0.574     0.954    0.877
β = 0.1    0.077    0.077     0.167    0.163     0.289    0.277     0.464    0.437     0.697    0.652
β = 0.2    0.081    0.080     0.165    0.163     0.263    0.257     0.388    0.372     0.551    0.520
β = 0.3    0.085    0.085     0.172    0.170     0.264    0.260     0.370    0.359     0.495    0.473
β = 0.4    0.090    0.090     0.181    0.180     0.275    0.272     0.377    0.370     0.489    0.474
β = 0.5    0.095    0.095     0.192    0.192     0.290    0.289     0.394    0.391     0.504    0.496
β = 0.6    0.101    0.102     0.204    0.204     0.308    0.308     0.416    0.416     0.528    0.527
β = 0.7    0.108    0.109     0.218    0.218     0.328    0.329     0.441    0.444     0.558    0.561
β = 0.8    0.115    0.116     0.232    0.233     0.349    0.351     0.468    0.474     0.590    0.599
Table 7. Levels for pure data.

                 n = 100                        n = 200                        n = 300
           ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15   ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15   ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15
β = 0      0.067     0.059   0.070       0.068     0.046   0.062       0.072     0.045   0.075
β = 0.1    0.067     0.060   0.072       0.062     0.046   0.070       0.085     0.045   0.079
β = 0.2    0.072     0.061   0.084       0.069     0.051   0.084       0.097     0.049   0.102
β = 0.3    0.081     0.062   0.093       0.084     0.053   0.100       0.112     0.051   0.121
β = 0.4    0.094     0.069   0.099       0.103     0.055   0.111       0.127     0.055   0.142
β = 0.5    0.105     0.071   0.111       0.118     0.056   0.122       0.149     0.051   0.155
β = 0.6    0.122     0.083   0.129       0.131     0.062   0.136       0.167     0.051   0.165
β = 0.7    0.135     0.088   0.141       0.139     0.063   0.146       0.181     0.055   0.177
β = 0.8    0.153     0.099   0.158       0.151     0.071   0.156       0.198     0.056   0.179
Table 8. Levels for contaminated data (5%).

                 n = 100                        n = 200                        n = 300
           ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15   ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15   ρ0 = 0.1  ρ0 = 0  ρ0 = 0.15
β = 0      0.357     0.223   0.081       0.638     0.429   0.155       0.788     0.623   0.240
β = 0.1    0.121     0.113   0.056       0.207     0.191   0.077       0.287     0.284   0.100
β = 0.2    0.065     0.074   0.048       0.066     0.099   0.049       0.086     0.129   0.059
β = 0.3    0.057     0.067   0.071       0.057     0.066   0.059       0.065     0.077   0.073
β = 0.4    0.075     0.066   0.087       0.067     0.058   0.081       0.079     0.060   0.095
β = 0.5    0.090     0.062   0.107       0.080     0.061   0.110       0.105     0.051   0.128
β = 0.6    0.096     0.063   0.126       0.095     0.063   0.131       0.117     0.049   0.151
β = 0.7    0.109     0.073   0.137       0.101     0.061   0.141       0.127     0.047   0.159
β = 0.8    0.125     0.083   0.147       0.109     0.061   0.149       0.141     0.049   0.171
Table 9. Powers for pure data, ρ0 = 0.15.

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.945    0.603   0.141       1        0.871   0.180       1        0.962   0.265
β = 0.1    0.954    0.588   0.157       1        0.863   0.207       1        0.96    0.299
β = 0.2    0.952    0.557   0.158       1        0.825   0.213       1        0.944   0.315
β = 0.3    0.941    0.510   0.153       0.999    0.783   0.213       1        0.913   0.313
β = 0.4    0.925    0.465   0.154       0.999    0.734   0.210       1        0.885   0.301
β = 0.5    0.904    0.424   0.159       0.996    0.677   0.202       1        0.845   0.289
β = 0.6    0.873    0.395   0.153       0.990    0.618   0.197       0.999    0.789   0.277
β = 0.7    0.830    0.361   0.153       0.985    0.555   0.183       0.999    0.733   0.261
β = 0.8    0.789    0.322   0.161       0.974    0.499   0.179       0.997    0.678   0.246
Table 10. Powers for contaminated data (5%), ρ0 = 0.15.

                 n = 100                       n = 200                       n = 300
           ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15    ρ = 0.1  ρ = 0   ρ = 0.15
β = 0      0.424    0.090   0.029       0.746    0.141   0.030       0.919    0.246   0.037
β = 0.1    0.716    0.222   0.041       0.954    0.397   0.029       0.994    0.569   0.037
β = 0.2    0.838    0.333   0.071       0.989    0.555   0.075       0.999    0.744   0.096
β = 0.3    0.881    0.383   0.105       0.993    0.633   0.121       0.999    0.803   0.161
β = 0.4    0.879    0.393   0.129       0.993    0.642   0.150       0.999    0.809   0.213
β = 0.5    0.865    0.381   0.135       0.992    0.621   0.168       0.999    0.797   0.241
β = 0.6    0.836    0.357   0.149       0.984    0.583   0.174       0.998    0.769   0.252
β = 0.7    0.808    0.332   0.146       0.980    0.531   0.173       0.997    0.713   0.256
β = 0.8    0.773    0.309   0.152       0.961    0.487   0.173       0.995    0.657   0.243

Castilla, E.; Martín, N.; Pardo, L.; Zografos, K. Composite Likelihood Methods Based on Minimum Density Power Divergence Estimator. Entropy 2018, 20, 18. https://doi.org/10.3390/e20010018
