Article

Minimum Penalized ϕ-Divergence Estimation under Model Misspecification

by M. Virtudes Alba-Fernández 1,*, M. Dolores Jiménez-Gamero 2 and F. Javier Ariza-López 3

1 Departamento de Estadística e Investigación Operativa, Universidad de Jaén, 23071, Jaén, Spain
2 Departamento de Estadística e Investigación Operativa, Universidad de Sevilla, 41012, Sevilla, Spain
3 Departamento de Ingeniería Cartográfica, Geodésica y Fotogrametría, Universidad de Jaén, 23071, Jaén, Spain
* Author to whom correspondence should be addressed.
Entropy 2018, 20(5), 329; https://doi.org/10.3390/e20050329
Submission received: 8 March 2018 / Revised: 22 April 2018 / Accepted: 27 April 2018 / Published: 30 April 2018

Abstract: This paper focuses on the consequences of assuming a wrong model for multinomial data when using minimum penalized ϕ-divergence estimators, also known as minimum penalized disparity estimators, to estimate the model parameters. These estimators are shown to converge to a well-defined limit. An application of the results obtained shows that a parametric bootstrap consistently estimates the null distribution of a certain class of test statistics for model misspecification detection. An illustrative application to the accuracy assessment of the thematic quality in a global land cover map is included.

1. Introduction

In many practical settings, individuals are classified into a finite number of unique nonoverlapping categories, and the experimenter collects the number of observations falling in each of these categories. In statistics, this sort of data is called multinomial data. Examples arise in many scientific disciplines: in economics, when dealing with the number of different types of industries observed in a geographical area; in biology, when counting the number of individuals belonging to one of k species (see, for example, Pardo [1], pp. 94–95); in sports, when considering the number of injured players in soccer matches (see, for example, Pardo [1], p. 146); and many others.
When dealing with multinomial data, one often finds zero cell frequencies, even for large samples. Although many examples can be given, we will center on the following one, since two related data sets will be analyzed in Section 4. Zero cell frequencies are usually observed when the quality of geographic information data is assessed, and specifically, when we pay attention to the thematic component of this quality. Roughly speaking, the thematic quality refers to the correctness of the qualitative aspect of an element (pixel, feature, etc.). To give an assessment of the thematic accuracy, a comparison is needed between the label considered to be true for a feature and the label assigned to the same feature after a classification (among a number of labels previously stated). This way, each element/feature, which really belongs to a particular category, can be classified as belonging to the same category (correct assignment), or as belonging to another one (incorrect assignment). Given a sample of n elements belonging to a particular category, after collecting the number of elements correctly classified, $X_1$, and the numbers of incorrect classifications in a set of $k-1$ possible categories, $X_i$, $i = 2, \ldots, k$, we obtain a multinomial vector $(X_1, X_2, \ldots, X_k)^t$, for which small or zero cell frequencies are often observed associated with the incorrect classifications, $X_i$, $i = 2, \ldots, k$.
Motivated by this example in the geographic information data context, as well as many others, throughout this paper it will be assumed that the available information can be summarized by means of a random vector $X = (X_1, \ldots, X_k)^t$ having a k-cell multinomial distribution with parameters n and $\pi = (\pi_1, \ldots, \pi_k)^t \in \Delta_0^k = \{(\pi_1, \ldots, \pi_k)^t : \pi_i \geq 0, \ 1 \leq i \leq k, \ \sum_{i=1}^k \pi_i = 1\}$, $X \sim \mathcal{M}_k(n; \pi)$ in short. Notice that, if $\pi \in \Delta_0^k$, then some components of $\pi$ may equal 0, implying that some cell frequencies can be equal to zero, even for large samples. In many instances, it is assumed that $\pi$ belongs to a parametric family, $\pi \in \mathcal{P} = \{P(\theta) = (p_1(\theta), \ldots, p_k(\theta))^t, \ \theta \in \Theta\} \subseteq \Delta^k = \{(\pi_1, \ldots, \pi_k)^t : \pi_i > 0, \ 1 \leq i \leq k, \ \sum_{i=1}^k \pi_i = 1\}$, where $\Theta \subseteq \mathbb{R}^s$, $k - s - 1 > 0$, and $p_1(\cdot)$, …, $p_k(\cdot)$ are known real functions.
When it is assumed that $\pi \in \mathcal{P}$, $\pi$ is usually estimated through $P(\hat{\theta}) = (p_1(\hat{\theta}), \ldots, p_k(\hat{\theta}))^t$ for some estimator $\hat{\theta}$ of $\theta$. A common choice for $\hat{\theta}$ is the maximum likelihood estimator (MLE), which is known to have good asymptotic properties. Basu and Sarkar [2] and Morales et al. [3] have shown that these properties are shared by a larger class of estimators: the minimum ϕ-divergence estimators (MϕEs). This class includes MLEs as a particular case. However, as illustrated in Mandal et al. [4], the finite sample performance of these estimators can be improved by modifying the weight that each ϕ-divergence assigns to the empty cells. The resulting estimator is called the minimum penalized ϕ-divergence estimator (MPϕE). Moreover, Mandal et al. [4] have shown that such estimators have the same asymptotic properties as the MϕEs. Specifically, they are strongly consistent and, conveniently normalized, asymptotically normal. To derive these asymptotic properties, it is assumed that the probability model is correctly specified, that is to say, that we are sure about $\pi \in \mathcal{P}$.
If the parametric model is not correctly specified, Jiménez-Gamero et al. [5] have shown that, under certain assumptions, the MϕEs still have a well-defined limit and, conveniently normalized, are asymptotically normal. For the MLE, these results were known from those in [6]. Because, as argued before, the use of penalized ϕ-divergences may lead to better performance of the resulting estimators, the aim of this piece of research is to investigate the asymptotic properties of the MPϕEs under model misspecification. If the model considered is true, we obtain as a particular case the results in [4].
The usefulness of the results obtained is illustrated by applying them to the problem of testing goodness-of-fit to the parametric family $\mathcal{P}$,
$$H_0: \pi \in \mathcal{P},$$
against the alternative
$$H_1: \pi \notin \mathcal{P},$$
using as a test statistic a penalized $\phi_1$-divergence between a nonparametric estimator of $\pi$, the relative frequencies, and a parametric estimator of $\pi$ obtained by assuming that the null hypothesis is true, $P(\hat{\theta})$, $\hat{\theta}$ being an MP$\phi_2$E. Here, $\phi_1$ and $\phi_2$ may differ. The convenience of using this type of test statistic is justified in Mandal et al. [7]. Although these authors show that, under $H_0$, such test statistics are asymptotically distribution free, the asymptotic approximation to the null distribution of the test statistics in this class is rather poor. Some numerical examples illustrate this unsatisfactory behavior of the asymptotic approximation. By using the fact that the MPϕE always converges to a well-defined limit, whether the model in $H_0$ is true or not, we prove that the bootstrap consistently estimates the null distribution of these test statistics. We then revisit the previously cited numerical examples to exemplify the usefulness of the bootstrap approximation which, despite demanding more computing time, is more accurate than that yielded by the asymptotic null distribution for small and moderate sample sizes.
The rest of the paper is organized as follows. Section 2 studies certain asymptotic properties of MPϕEs; specifically, conditions are given for their strong consistency and asymptotic normality. Section 3 uses these results to prove that a parametric bootstrap provides a consistent estimator of the null distribution of test statistics based on penalized ϕ-divergences for testing $H_0$. Section 4 displays an application of the results obtained in the context of a classification task for a land cover map.
Before ending this section we introduce some notation: all limits in this paper are taken as $n \to \infty$; $\stackrel{L}{\longrightarrow}$ denotes convergence in distribution; $\stackrel{P}{\longrightarrow}$ denotes convergence in probability; $\stackrel{a.s.}{\longrightarrow}$ denotes almost sure convergence; let $\{A_n\}$ be a sequence of random variables and let $\epsilon \in \mathbb{R}$, then $A_n = O_P(n^{-\epsilon})$ means that $n^{\epsilon} A_n$ is bounded in probability, $A_n = o_P(n^{-\epsilon})$ means that $n^{\epsilon} A_n \stackrel{P}{\longrightarrow} 0$, and $A_n = o(n^{-\epsilon})$ means that $n^{\epsilon} A_n \stackrel{a.s.}{\longrightarrow} 0$; $N_k(\mu, \Sigma)$ denotes the k-variate normal law with mean $\mu$ and variance matrix $\Sigma$; all vectors are column vectors; the superscript t denotes transpose; if $x \in \mathbb{R}^k$, with $x^t = (x_1, \ldots, x_k)$, then $\mathrm{Diag}(x)$ is the $k \times k$ diagonal matrix whose $(i,i)$ entry is $x_i$, $1 \leq i \leq k$, and
$$\Sigma_x = \mathrm{Diag}(x) - x x^t;$$
$I_k$ denotes the $k \times k$ identity matrix; to simplify notation, all 0s appearing in the paper represent vectors of the appropriate dimension.

2. Some Asymptotic Properties of MPϕEs

Let $X \sim \mathcal{M}_k(n; \pi)$, with $\pi \in \Delta_0^k$, and let $\hat{\pi} = (\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_k)^t$ be the vector of relative frequencies,
$$\hat{\pi}_i = \frac{X_i}{n}, \quad 1 \leq i \leq k.$$
Let $\mathcal{P}$ be a parametric model satisfying Assumption 1 below.
Assumption 1.
$\mathcal{P} = \{P(\theta) = (p_1(\theta), \ldots, p_k(\theta))^t, \ \theta \in \Theta\} \subseteq \Delta^k$, where $\Theta \subseteq \mathbb{R}^s$, $k - s - 1 > 0$, and $p_1(\cdot)$, …, $p_k(\cdot): \Theta \to \mathbb{R}$ are known functions, twice continuously differentiable in $\mathrm{int}\,\Theta$.
Let $\phi: [0, \infty) \to \mathbb{R} \cup \{\infty\}$ be a continuous convex function. For arbitrary $Q = (q_1, \ldots, q_k)^t \in \Delta_0^k$ and $P = (p_1, \ldots, p_k)^t \in \Delta^k$, the ϕ-divergence between Q and P is defined by (Csiszár [8])
$$D_{\phi}(Q, P) = \sum_{i=1}^{k} p_i\, \phi(q_i/p_i).$$
Note that
$$D_{\phi}(Q, P) = \sum_{i:\, q_i > 0} p_i\, \phi(q_i/p_i) + \phi(0) \sum_{i:\, q_i = 0} p_i.$$
The penalized ϕ-divergence with tuning parameter h between Q and P is defined from the above expression by replacing $\phi(0)$ with h as follows (see Mandal et al. [4]):
$$D_{\phi, h}(Q, P) = \sum_{i:\, q_i > 0} p_i\, \phi(q_i/p_i) + h \sum_{i:\, q_i = 0} p_i.$$
If
$$\hat{\theta}_{\phi,h} = \arg\min_{\theta} D_{\phi,h}(\hat{\pi}, P(\theta)),$$
then $\hat{\theta}_{\phi,h}$ is called the MPϕE of θ.
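In computational terms, the two displays above translate directly into a short program. The following R sketch (R being the language used for the computations in Section 3) evaluates $D_{\phi,h}(Q, P)$ and computes the MPϕE by numerical minimization; the helper names pen_div and mpphie and the restriction to a scalar parameter θ are our illustrative assumptions, not the implementation used in the paper.

```r
# Penalized phi-divergence D_{phi,h}(Q, P): cells with q_i = 0 contribute
# h * p_i instead of phi(0) * p_i (hypothetical helper, not from the paper).
pen_div <- function(q, p, phi, h) {
  pos <- q > 0
  sum(p[pos] * phi(q[pos] / p[pos])) + h * sum(p[!pos])
}

# MPphiE: minimize the penalized divergence between the relative
# frequencies and P(theta); written for a scalar theta, as in the
# examples of Section 3.
mpphie <- function(x, P, phi, h, lower, upper) {
  q <- x / sum(x)
  optimize(function(theta) pen_div(q, P(theta), phi, h),
           lower = lower, upper = upper)$minimum
}
```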
In order to study some of the properties of $\hat{\theta}_{\phi,h}$, we will assume that ϕ satisfies Assumption 2 below.
Assumption 2.
$\phi: [0, \infty) \to \mathbb{R}$ is a strictly convex function, twice continuously differentiable in $(0, \infty)$.
Assumption 2 is assumed when dealing with estimators based on minimum divergence, since it lets us take Taylor series expansions of $D_{\phi}(\hat{\pi}, P(\theta))$, which is useful to derive asymptotic properties of the MϕEs. For example, Section 3 of Lindsay [9] assumes that the function ϕ (he calls G what we call ϕ) is thrice differentiable (which is stronger than Assumption 2); Theorem 3 in Morales et al. [3] requires, among other conditions, ϕ to meet Assumption 2 to derive the consistency and asymptotic normality of MϕEs.
Assumption 2 is also assumed in Mandal et al. [4] (they call G what we call ϕ) to study the consistency and asymptotic normality of MPϕEs. Specifically, these authors show that, if $\pi \in \mathcal{P}$ and $\theta_0$ is the true parameter value, then, under suitable regularity conditions including Assumption 2, the MPϕE is consistent for $\theta_0$, and $\sqrt{n}(\hat{\theta}_{\phi,h} - \theta_0)$ is asymptotically normal with a mean of 0 and a variance matrix equal to the inverse of the information matrix.
Next we will only assume that $\pi \in \Delta_0^k$; that is, the assumption that $\pi \in \mathcal{P}$ is dropped. In this context, we prove that the MPϕE is consistent for $\theta_0$, where now $\theta_0$ is the parameter vector that minimizes $D_{\phi,h}(\pi, P(\theta))$, that is to say, $\theta_0 = \arg\min_{\theta} D_{\phi,h}(\pi, P(\theta))$. Note that $\theta_0$ also depends on ϕ and h, so to be rigorous we should denote it by $\theta_{0,\phi,h}$, but to simplify notation we will simply denote it as $\theta_0$. We also show that $\sqrt{n}(\hat{\theta}_{\phi,h} - \theta_0)$ is asymptotically normal with a mean of 0. With this aim, we will also assume the following.
Assumption 3.
$D_{\phi,h}(\pi, P(\theta))$ has a unique minimum at $\theta_0 \in \mathrm{int}\,\Theta$.
Assumption 3 is standard in papers on minimum divergence estimation. For example, it is Assumption A3(b) in [6], which states that it is the fundamental identification condition for quasi-maximum likelihood estimators to have a well-defined limit; it is contained in Assumptions 7 and 9 in [10], required for minimum chi-square estimators to have a well-defined limit; and it coincides with Assumption 30 in [9], imposed for the same reason.
Let $\theta_0$ be as defined in Assumption 3. Then $P(\theta_0)$ is the (ϕ, h)-projection of $\pi$ on $\mathcal{P}$. Section 3 in [11] shows that Assumption 3 holds for two-way tables when $\mathcal{P}$ is the uniform association model, so the (ϕ, h)-projection always exists for such a model. Nevertheless, this projection may not exist, or may not be uniquely defined. See Example 2 in [12] for an instance where there is no unique minimum (although Θ in that example is convex, the family $\{P(\theta), \ \theta \in \Theta\}$ is not convex, so the uniqueness of the projection is not guaranteed). Let $\Delta^k(\phi, \mathcal{P}, h) = \{\pi \in \Delta_0^k \ \text{such that Assumption 3 holds}\}$.
From now on, we will assume that the components of $\pi$ are sorted so that $\pi_1, \ldots, \pi_m > 0$ and $\pi_{m+1} = \cdots = \pi_k = 0$, for some $1 < m \leq k$, where, if $m = k$, it is understood that all components of $\pi$ are positive. We will write $\pi_+ = (\pi_1, \ldots, \pi_m)^t$ and $\hat{\pi}_+ = (\hat{\pi}_1, \ldots, \hat{\pi}_m)^t$. The next result shows the strong consistency and asymptotic normality of the MPϕE.
Theorem 1.
Let $\mathcal{P}$ be a parametric family satisfying Assumption 1. Let ϕ be a real function satisfying Assumption 2. Let $X \sim \mathcal{M}_k(n; \pi)$ with $\pi \in \Delta^k(\phi, \mathcal{P}, h)$. Then
(a) $\hat{\theta}_{\phi,h} \stackrel{a.s.}{\longrightarrow} \theta_0$.
(b) $\sqrt{n}\begin{pmatrix} \hat{\pi}_+ - \pi_+ \\ \hat{\theta}_{\phi,h} - \theta_0 \end{pmatrix} \stackrel{L}{\longrightarrow} N_{m+s}(0, A\,\Sigma_{\pi_+} A^t)$, where $A^t = (I_m, G^t)$ and G is defined in Equation (7). In particular,
$$\sqrt{n}\,(\hat{\theta}_{\phi,h} - \theta_0) \stackrel{L}{\longrightarrow} N_s(0, G\,\Sigma_{\pi_+} G^t).$$
(c) $\sqrt{n}\begin{pmatrix} \hat{\pi}_+ - \pi_+ \\ P(\hat{\theta}_{\phi,h}) - P(\theta_0) \end{pmatrix} \stackrel{L}{\longrightarrow} N_{2m}(0, B\,\Sigma_{\pi_+} B^t)$, where $B^t = (I_m, G^t D_1(P(\theta_0)))$, with $D_1(P(\theta))$ defined in Equation (8).
Remark 1.
Observe that, if $m = k$, then the penalization has no effect asymptotically; by contrast, if $m < k$, then the presence of the tuning parameter h influences the covariance matrix of the asymptotic law of $\sqrt{n}(\hat{\theta}_{\phi,h} - \theta_0)$ and $\sqrt{n}(P(\hat{\theta}_{\phi,h}) - P(\theta_0))$.
Remark 2.
If $\pi \in \mathcal{P}$, we obtain as a particular case the results in Mandal et al. [4]. Our conditions are weaker than those in [4]. The reason is that they allow an infinite number of categories, while we are assuming that this number is finite, k. Therefore, when the number of categories is finite, the assumptions in [4] for the consistency and asymptotic normality of the MPϕE can be weakened.
As a consequence of Theorem 1, the following corollary gives the asymptotic behavior of $D_{\phi_1,h_1}(\hat{\pi}, P(\hat{\theta}_{\phi_2,h_2}))$, for arbitrary $\phi_1$, $\phi_2$ and $h_1$, $h_2$ that may or may not coincide. Part (a) of Corollary 1, which assumes that the model $\mathcal{P}$ is correctly specified, has been previously proven in [7]. It is included here for the sake of completeness. Part (b), which describes the limit in law under alternatives, is, to the best of our knowledge, new.
Corollary 1.
Let $\mathcal{P}$ be a parametric family satisfying Assumption 1. Let $\phi_1$ and $\phi_2$ be two real functions satisfying Assumption 2. Let $X \sim \mathcal{M}_k(n; \pi)$ with $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2)$.
(a) For $\pi \in \mathcal{P}$,
$$T = \frac{2n}{\phi_1''(1)} \left\{ D_{\phi_1,h_1}(\hat{\pi}, P(\hat{\theta}_{\phi_2,h_2})) - \phi_1(1) \right\} \stackrel{L}{\longrightarrow} \chi^2_{k-s-1}.$$
(b) For $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2) \setminus \mathcal{P}$, let $\theta_0 = \arg\min_{\theta} D_{\phi_2,h_2}(\pi, P(\theta))$. Then
$$W = \sqrt{n} \left\{ D_{\phi_1,h_1}(\hat{\pi}, P(\hat{\theta}_{\phi_2,h_2})) - D_{\phi_1,h_1}(\pi, P(\theta_0)) \right\} \stackrel{L}{\longrightarrow} N(0, \varrho^2),$$
where $\varrho^2 = a^t B\, \Sigma_{\pi_+} B^t a$, with B as defined in Theorem 1 with $\phi = \phi_2$ and $h = h_2$,
$$a^t = \left( \phi_1'\!\left(\frac{\pi_1}{p_1(\theta_0)}\right), \ldots, \phi_1'\!\left(\frac{\pi_m}{p_m(\theta_0)}\right), v_1, \ldots, v_m, \underbrace{h_1, \ldots, h_1}_{k-m \text{ times}} \right),$$
and $v_i$, $1 \leq i \leq m$, are as defined in Equation (5) with $\phi = \phi_1$ and $h = h_1$.
Remark 3.
If $\pi \in \mathcal{P}$, the asymptotic behavior of the statistic T depends neither on $\phi_1$, $\phi_2$ nor on $h_1$, $h_2$. In fact, the asymptotic law of T is the same as if non-penalized divergences were used.
Remark 4.
When $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2) \setminus \mathcal{P}$, if $m = k$, then the asymptotic distribution of W does not depend on $h_1$, $h_2$; by contrast, if $m < k$, then the asymptotic distribution of W does depend on $h_1$ and $h_2$.
Remark 5.
(Properties of the asymptotic test) As a consequence of Corollary 1(a), we have that, for testing $H_0$ vs. $H_1$, the test that rejects the null hypothesis when $T \geq \chi^2_{k-s-1, 1-\alpha}$ is asymptotically correct, in the sense that $P_0(T \geq \chi^2_{k-s-1, 1-\alpha}) \to \alpha$, where $\chi^2_{k-s-1, 1-\alpha}$ stands for the $1-\alpha$ percentile of the $\chi^2_{k-s-1}$ distribution and $P_0$ stands for the probability when the null hypothesis is true. From Corollary 1(b), it follows that such a test is consistent against fixed alternatives $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2) \setminus \mathcal{P}$, in the sense that $P(T \geq \chi^2_{k-s-1, 1-\alpha}) \to 1$.

3. Application to Bootstrapping Goodness-Of-Fit Tests

As observed in Remark 5, the test that rejects $H_0$ when $T \geq \chi^2_{k-s-1, 1-\alpha}$ is asymptotically correct and consistent against fixed alternatives. Nevertheless, the $\chi^2$ approximation to the null distribution of the test statistic is rather poor. Next we illustrate this fact with three examples. The last one is motivated by a real data set application in Section 4. All computations have been performed using programs written in the R language [13].
Example 1.
Let $X \sim \mathcal{M}_3(n; \pi)$, with $\pi \in \mathcal{P}$ so that
$$p_1(\theta) = \frac{1}{3} - \theta, \quad p_2(\theta) = \frac{2}{3} - \theta, \quad p_3(\theta) = 2\theta, \quad 0 < \theta < 1/3.$$
The problem of testing goodness-of-fit to this family is dealt with by considering as test statistic a penalized $\phi_1$-divergence and an MP$\phi_2$E, with $\phi_1$ and $\phi_2$ two members of the power-divergence family, defined as follows:
$$PD_{\lambda}(x) = \frac{1}{\lambda(\lambda+1)} \left\{ x^{\lambda+1} - x - \lambda(x - 1) \right\}, \quad \lambda \neq 0, -1,$$
$PD_0(x) = x \log(x) - x + 1$ for $\lambda = 0$, and $PD_{-1}(x) = -\log(x) + x - 1$ for $\lambda = -1$. We thank an anonymous referee for pointing out that the power divergence family is also known as the α-divergence family (see, for example, Section 4 of Amari [14]).
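A minimal R sketch of the $PD_\lambda$ generator, with the limiting cases λ = 0 and λ = −1 handled explicitly, is given below; the function name pd is a hypothetical choice.

```r
# Power-divergence generator PD_lambda; lambda = 1 corresponds to Pearson's
# chi-square and lambda = -2 to the modified chi-square statistic.
pd <- function(lambda) {
  if (lambda == 0)  return(function(x) x * log(x) - x + 1)
  if (lambda == -1) return(function(x) -log(x) + x - 1)
  function(x) (x^(lambda + 1) - x - lambda * (x - 1)) / (lambda * (lambda + 1))
}
```

Since pd(lambda) returns a function, pd(-2) can be passed directly as the phi argument of the hypothetical helpers sketched in Section 2.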
In order to evaluate the performance of the $\chi^2$ approximation to the null distribution of T, we carried out an extensive simulation experiment. As a preliminary part of the simulation experiment, we evaluated the possible effect of the tuning parameter $h_2$ on the accuracy of the MP$\phi_2$E. For this goal, we generated 10,000 samples of size 200 from the parametric family with θ = 0.3333, and calculated the MP$\phi_2$E with $h_2 = 0.5, 1, 2, 5, 10$ and $\phi_2 = PD_{-2}$, which corresponds to the modified chi-square test statistic (see, for example, [1], p. 114). We calculated the root mean square deviation (RMSD) of the resulting estimations,
$$RMSD = \sqrt{ \frac{ \sum_{i=1}^{10{,}000} \left( \hat{\theta}^{(i)}_{-2, h_2} - \theta \right)^2 }{10{,}000} },$$
where $\hat{\theta}^{(i)}_{-2,h_2}$ denotes the estimate computed from the i-th sample, obtaining 0.00156, 0.00128, 0.00128, 0.00128, and 0.00128, respectively. According to these results, there are rather small differences in the performance of the MP$\phi_2$E for the values of $h_2$ considered. Because of this, we fixed $\phi_2 = PD_{-2}$ and $h_2 = 0.5, 1, 2$.
Next, to study the goodness of the asymptotic approximation, we generated 10,000 samples of size n = 100 from the parametric family with θ = 0.3333, and calculated the test statistic T with $h_1 = h_2 = 0.5$ and $\phi_1(x) = \phi_2(x) = PD_{-2}(x)$, as well as the associated p-values corresponding to the asymptotic null distribution. We then computed the fraction of these p-values that are less than or equal to the nominal values α = 0.05, 0.10 (top and bottom rows in the tables). This experiment was repeated for n = 150, 200, $h_1 = h_2 = 1, 2$, $\phi_1 = PD_1$ (which corresponds to the chi-square test statistic), and $\phi_1 = PD_2$. Table 1 shows the results obtained. We also considered the case $h_1 \neq h_2$, obtaining quite close outcomes. Table 2 displays the results obtained for n = 200 and $\phi_1 = \phi_2 = PD_{-2}$. Looking at these tables, we conclude that the asymptotic null distribution does not provide an accurate estimation of the null distribution of T, since the type I error probabilities are much greater than the nominal values, 0.05 and 0.10. Therefore, other approximations of the null distribution should be studied.
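To make the experiment concrete, the following R sketch estimates the type I error probability of the asymptotic test in Example 1, relying on the hypothetical helpers pen_div, mpphie, and pd introduced above; it uses 1000 replications rather than the paper's 10,000 to keep the run time modest.

```r
set.seed(1)
P1 <- function(theta) c(1/3 - theta, 2/3 - theta, 2 * theta)  # family of Example 1
n <- 100; theta0 <- 0.3333; h <- 0.5
phi <- pd(-2)                  # phi_1 = phi_2 = PD_{-2}; PD_lambda''(1) = 1
pvals <- replicate(1000, {     # the paper uses 10,000 samples
  x    <- as.vector(rmultinom(1, n, P1(theta0)))
  th   <- mpphie(x, P1, phi, h, lower = 1e-6, upper = 1/3 - 1e-6)
  stat <- 2 * n * (pen_div(x / n, P1(th), phi, h) - phi(1))
  pchisq(stat, df = 1, lower.tail = FALSE)   # df = k - s - 1 = 3 - 1 - 1 = 1
})
mean(pvals <= 0.05)  # estimated type I error; Table 1 reports about 0.996
```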
Example 2.
Let $X \sim \mathcal{M}_3(n; \pi)$, with $\pi \in \mathcal{P}$ so that
$$p_1(\theta) = 0.5 - 2\theta, \quad p_2(\theta) = 0.5 + \theta, \quad p_3(\theta) = \theta, \quad 0 < \theta < 1/4.$$
We repeated the simulation scheme described in Example 1 for this law with θ = 0.24. Table 3 and Table 4 report the results obtained. In contrast to the results for Example 1, where the asymptotic approximation gives a rather liberal test, in this case the resulting test is very conservative. Therefore, we again conclude that the asymptotic null distribution does not provide an accurate estimation of the null distribution of T.
Example 3.
Let $X \sim \mathcal{M}_4(n; \pi)$, with $\pi \in \mathcal{P}$ so that
$$p_1(\theta) = \theta^2, \quad p_2(\theta) = \theta(1-\theta), \quad p_3(\theta) = \theta(1-\theta), \quad p_4(\theta) = (1-\theta)^2, \quad 0 < \theta < 1. \qquad (3)$$
We repeated the simulation scheme described in Example 1 for this law with θ = 0.8. Table 5 and Table 6 report the results obtained. Looking at these tables, we see that the test based on the asymptotic approximation is liberal, and conclude, as in the previous examples, that other approximations of the null distribution should be considered.
The reason for the unsatisfactory results in the three examples is that the asymptotic approximation requires unaffordably large sample sizes when some cells have extremely small probabilities, which provokes the presence of zero cell frequencies. To appreciate this fact, notice that Example 1 requires n > 30,000 to obtain expected cell frequencies greater than 10.
Motivated by these examples, the aim of this section is to study another way of approximating the null distribution of T: the bootstrap. The null bootstrap distribution of T is the conditional distribution of
$$T^* = \frac{2n}{\phi_1''(1)} \left\{ D_{\phi_1,h_1}(\hat{\pi}^*, P(\hat{\theta}^*_{\phi_2,h_2})) - \phi_1(1) \right\},$$
given $(X_1, \ldots, X_k)$, where $\hat{\pi}^*$ is defined as $\hat{\pi}$ with $(X_1, \ldots, X_k)$ replaced by $(X_1^*, \ldots, X_k^*) \sim \mathcal{M}_k(n; P(\hat{\theta}_{\phi_2,h_2}))$, and $\hat{\theta}^*_{\phi_2,h_2} = \arg\min_{\theta} D_{\phi_2,h_2}(\hat{\pi}^*, P(\theta))$.
Let $P^*$ denote the bootstrap conditional probability law, given $(X_1, \ldots, X_k)$. The next theorem gives the weak limit of $T^*$.
Theorem 2.
Let $\mathcal{P}$ be a parametric family satisfying Assumption 1. Let $\phi_1$ and $\phi_2$ be two real functions satisfying Assumption 2. Let $X \sim \mathcal{M}_k(n; \pi)$ with $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2)$. Then
$$\sup_x \left| P^*(T^* \leq x) - P(Y \leq x) \right| \stackrel{P}{\longrightarrow} 0,$$
where $Y \sim \chi^2_{k-s-1}$.
Recall that, from Corollary 1(a), when $H_0$ is true, the test statistic T converges in law to a $\chi^2_{k-s-1}$ law. Thus, the result in Theorem 2 implies the consistency of the null bootstrap distribution of T as an estimator of the null distribution of T. It is important to remark that the result in Theorem 2 holds whether $H_0$ is true or not; that is, the bootstrap properly estimates the null distribution, even if the available data do not obey the law in the null hypothesis. This is due to the fact that, under the assumed conditions, the MPϕE always converges to a well-defined limit.
Remark 6.
(Properties of the bootstrap test) Similarly to Remark 5, as a consequence of Corollary 1(a) and Theorem 2, we have that, for testing $H_0$ vs. $H_1$, the test that rejects the null hypothesis when $T \geq T^*_{1-\alpha}$ is asymptotically correct, in the sense that $P_0(T \geq T^*_{1-\alpha}) \to \alpha$, where $T^*_{1-\alpha}$ stands for the $1-\alpha$ percentile of the bootstrap distribution of T. From Corollary 1(b) and Theorem 2, it follows that such a test is consistent against fixed alternatives $\pi \in \Delta^k(\phi_2, \mathcal{P}, h_2) \setminus \mathcal{P}$, in the sense that $P(T \geq T^*_{1-\alpha}) \to 1$.
In practice, the bootstrap p-value must be approximated by simulation as follows (see the R sketch after this list):
  • Calculate the observed value of the test statistic for the available data $(X_1, \ldots, X_k)$, $T_{obs}$.
  • Generate B bootstrap samples $(X_{1b}^*, \ldots, X_{kb}^*) \sim \mathcal{M}_k(n; P(\hat{\theta}_{\phi_2,h_2}))$, $b = 1, \ldots, B$, and calculate the test statistic for each bootstrap sample, obtaining $T^{*b}$, $b = 1, \ldots, B$.
  • Approximate the p-value by means of the expression
$$\hat{p}_{boot} = \frac{\mathrm{card}\{b : T^{*b} \geq T_{obs}\}}{B}.$$
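A compact R sketch of this procedure, again built on the hypothetical helpers pen_div, mpphie, and pd, could read as follows; the argument names and the scalar-parameter restriction are our own simplifications.

```r
# Parametric bootstrap p-value for T (B = 1000, as in the simulations).
boot_pvalue <- function(x, P, phi1, phi2, h1, h2, lower, upper, B = 1000) {
  n <- sum(x)
  Tstat <- function(y) {          # statistic T for a vector of counts y
    th <- mpphie(y, P, phi2, h2, lower, upper)
    2 * n * (pen_div(y / n, P(th), phi1, h1) - phi1(1))  # PD''(1) = 1
  }
  T_obs  <- Tstat(x)
  th_hat <- mpphie(x, P, phi2, h2, lower, upper)         # resampling model
  T_star <- replicate(B, Tstat(as.vector(rmultinom(1, n, P(th_hat)))))
  mean(T_star >= T_obs)           # card{b : T*b >= T_obs} / B
}
```

For instance, with the Globcover counts of Table 13 and the family of Example 3, boot_pvalue(c(165, 13, 7, 0), function(th) c(th^2, th * (1 - th), th * (1 - th), (1 - th)^2), pd(-2), pd(-2), 0.5, 0.5, 1e-6, 1 - 1e-6) approximates the kind of p-value reported in the first block of Table 14.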
For the numerical experiments previously described, whose results are displayed in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6, we also calculated the bootstrap p-values. This was done by generating B = 1000 bootstrap samples to approximate each p-value, and calculating the fraction of these p-values that are less than or equal to 0.05 and 0.10 (top and bottom rows in the tables). Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12 display the estimated type I error probabilities obtained by using the bootstrap approximation, as well as those obtained with the asymptotic approximation (bootstrap, B, and asymptotic, A, in the tables), taken from Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 in order to facilitate the comparison between them. Looking at Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12, we conclude that the bootstrap approximation is superior to the asymptotic one for small and moderate sample sizes, since in all cases the bootstrap type I error probabilities were closer to the nominal values than those obtained using the asymptotic null distribution. This superior performance of the bootstrap null distribution estimator has been noticed in other inferential problems where ϕ-divergences are used as test statistics (see, for example, [5,12,15,16]).

4. Application to the Evaluation of the Thematic Classification in Global Land Cover Maps

This section displays the results of an application of our proposal to two real data sets related to the thematic quality assessment of a global land cover (GLC) map. The data comprise the results of two thematic classifications of the land cover category "Evergreen Broadleaf Trees" (EBL) and summarize the number of sample units correctly classified in this class, and the number of confusions with other land cover classes: "Deciduous Broadleaf Trees" (DBL), "Evergreen Needleleaf Trees" (ENL), and "Urban/Built Up" (U). The results of these two classifications were collected from two different global land cover maps, the Globcover map and the LC-CCI map (see Tsendbazar et al. [17] for additional details), and they are displayed in Table 13.
Parametric specifications of the multinomial vector of probabilities are quite attractive since they describe the classification pattern in a concise way. Because of this, given the similarity between the two observed classifications in Table 13, we are interested in the search for a parametric model suitable to depict the thematic accuracy of this class in both GLC maps. For this purpose, we consider the parametric family in Equation (3) of Example 3. The presence of a zero cell frequency in each data set leads us to consider a penalized ϕ-divergence as a test statistic for testing goodness-of-fit to such a parametric family.
Table 14 displays the observed values of the test statistic T and the associated bootstrap p-values for the goodness-of-fit test with respect to the parametric family in Equation (3) for the two observed classifications of the EBL class in Table 13. Looking at this table, it can be concluded that the null hypothesis cannot be rejected in either case. Therefore, the parametric model in Equation (3) provides an adequate description of the thematic classification of the EBL class.

5. Proofs

Notice that
$$D_{\phi,h}(\pi, P(\theta)) = \sum_{i=1}^{m} p_i(\theta)\, \phi\!\left(\frac{\pi_i}{p_i(\theta)}\right) + h \sum_{i=m+1}^{k} p_i(\theta) = h\, I(m < k) + \sum_{i=1}^{m} p_i(\theta)\, \phi_h\!\left(\frac{\pi_i}{p_i(\theta)}\right),$$
where I stands for the indicator function, $\phi_h(x) = \phi(x) - h$ if $m < k$, and $\phi_h(x) = \phi(x)$ if $m = k$. Let
$$D^+_{\phi,h}(\pi, P(\theta)) = \sum_{i=1}^{m} p_i(\theta)\, \phi_h\!\left(\frac{\pi_i}{p_i(\theta)}\right).$$
Clearly,
$$\arg\min_{\theta} D_{\phi,h}(\hat{\pi}, P(\theta)) = \arg\min_{\theta} D^+_{\phi,h}(\hat{\pi}, P(\theta)).$$
Note that, if Assumptions 1 and 2 hold, then Assumption 3 implies that
$$\frac{\partial}{\partial \theta} D^+_{\phi,h}(\pi, P(\theta_0)) = \sum_{i=1}^{m} \frac{\partial}{\partial \theta} p_i(\theta_0)\, v_i = 0,$$
where
$$v_i = \phi\!\left(\frac{\pi_i}{p_i(\theta_0)}\right) - \frac{\pi_i}{p_i(\theta_0)}\, \phi'\!\left(\frac{\pi_i}{p_i(\theta_0)}\right) - h\, I(m < k), \qquad (5)$$
$1 \leq i \leq m$, and $\phi'(x) = \frac{\partial}{\partial x} \phi(x)$. The $s \times s$ matrix
$$D_2 = \frac{\partial^2}{\partial \theta\, \partial \theta^t} D^+_{\phi,h}(\pi, P(\theta_0)) = \sum_{i=1}^{m} \frac{\partial^2}{\partial \theta\, \partial \theta^t} p_i(\theta_0)\, v_i + \sum_{i=1}^{m} \frac{\partial}{\partial \theta} p_i(\theta_0)\, \frac{\partial}{\partial \theta} p_i(\theta_0)^t\, w_i$$
is positive definite, where
$$w_i = \frac{\pi_i^2}{p_i^3(\theta_0)}\, \phi''\!\left(\frac{\pi_i}{p_i(\theta_0)}\right),$$
$1 \leq i \leq m$, and $\phi''(x) = \frac{\partial^2}{\partial x^2} \phi(x)$. Therefore, by the Implicit Function Theorem (see, for example, Dieudonné [18], p. 272), there is an open neighborhood $U \subseteq (0,1)^m$ of $\pi_+$ and s unique functions $g_i: U \to \mathbb{R}$, $1 \leq i \leq s$, so that
(i) $\hat{\theta}_{\phi,h} = (g_1(\hat{\pi}_+), \ldots, g_s(\hat{\pi}_+))^t$, $\forall n \geq n_0$, for some $n_0 \in \mathbb{N}$;
(ii) $\theta_0 = (g_1(\pi_+), \ldots, g_s(\pi_+))^t$;
(iii) $g = (g_1, \ldots, g_s)^t$ is continuously differentiable in U, and the $s \times m$ Jacobian matrix of g at $(\pi_1, \ldots, \pi_m)$ is given by
$$G = D_2^{-1} D_1(P(\theta_0))\, \mathrm{Diag}(\varpi), \qquad (7)$$
where
$$D_1(P(\theta)) = \left( \frac{\partial}{\partial \theta} p_1(\theta), \ldots, \frac{\partial}{\partial \theta} p_m(\theta) \right), \qquad (8)$$
$\varpi = (\varpi_1, \ldots, \varpi_m)^t$, and
$$\varpi_i = \frac{\pi_i}{p_i^2(\theta_0)}\, \phi''\!\left(\frac{\pi_i}{p_i(\theta_0)}\right), \quad 1 \leq i \leq m.$$
Proof of Theorem 1.
Part (a) follows from (i) and (ii) above and the fact that $\hat{\pi}_+ \to \pi_+$ a.s. From (i)–(iii), and taking into account that $\sqrt{n}(\hat{\pi}_+ - \pi_+)$ is asymptotically normal, it follows that
$$\hat{\theta}_{\phi,h} = \theta_0 + G\, (\hat{\pi}_+ - \pi_+) + o_P(n^{-1/2}), \qquad (9)$$
with G as in Equation (7). Parts (b) and (c) follow from Equation (9) and the asymptotic normality of $\sqrt{n}(\hat{\pi}_+ - \pi_+)$. ☐
Proof of Corollary 1.
Part (a) was shown in Theorem 5.1 in [7]. To prove (b), we first demonstrate that
$$W = W_0 + r_n,$$
where
$$W_0 = \sqrt{n} \left\{ \sum_{j=1}^{m} p_j(\hat{\theta}_{\phi_2,h_2})\, \phi_1\!\left(\frac{\hat{\pi}_j}{p_j(\hat{\theta}_{\phi_2,h_2})}\right) + h_1 \sum_{j=m+1}^{k} p_j(\hat{\theta}_{\phi_2,h_2}) - D_{\phi_1,h_1}(\pi, P(\theta_0)) \right\}$$
and $r_n = o_P(1)$. Notice that
$$r_n = \sqrt{n}\, \{h_1 - \phi_1(0)\} \sum_{j:\, \hat{\pi}_j = 0,\ \pi_j > 0} p_j(\hat{\theta}_{\phi_2,h_2}) = \sqrt{n}\, \{h_1 - \phi_1(0)\} \sum_{j=1}^{m} p_j(\hat{\theta}_{\phi_2,h_2})\, I(\hat{\pi}_j = 0).$$
Therefore,
$$0 \leq E|r_n| \leq \sqrt{n}\, |h_1 - \phi_1(0)| \sum_{j=1}^{m} P(\hat{\pi}_j = 0) = \sqrt{n}\, |h_1 - \phi_1(0)| \sum_{j=1}^{m} (1 - \pi_j)^n \to 0,$$
which implies $r_n = o_P(1)$. From Theorem 1 and a Taylor expansion, it follows that $W_0 \stackrel{L}{\longrightarrow} N(0, \varrho^2)$; hence, the result in part (b) is proven. ☐
Proof of Theorem 2.
The proof of Theorem 2 is parallel to that of Theorem 2 in [5], so we omit it. ☐

Author Contributions

M.V. Alba-Fernández and M.D. Jiménez-Gamero conceived and designed the experiments; M.V. Alba-Fernández performed the experiments; M.V. Alba-Fernández and F.J. Ariza-López analyzed the data; F.J. Ariza-López contributed materials; M.V. Alba-Fernández and M.D. Jiménez-Gamero wrote the paper.

Acknowledgments

The authors thank the anonymous referees for their valuable time and careful comments, which improved the presentation of this paper. The research in this paper has been partially funded by grants CTM2015-68276-R of the Spanish Ministry of Economy and Competitiveness (M.V. Alba-Fernández and F.J. Ariza-López) and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, ERDF support included (M.D. Jiménez-Gamero).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLE    maximum likelihood estimator
MϕE    minimum ϕ-divergence estimator
MPϕE   minimum penalized ϕ-divergence estimator
RMSD   root mean square deviation
B      bootstrap
A      asymptotic
GLC    global land cover
EBL    Evergreen Broadleaf Trees
DBL    Deciduous Broadleaf Trees
ENL    Evergreen Needleleaf Trees
U      Urban/Built Up

References

  1. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall: London, UK; CRC Press: Boca Raton, FL, USA, 2006. [Google Scholar]
  2. Basu, A.; Sarkar, S. On disparity based goodness-of-fit tests for multinomial models. Stat. Probab. Lett. 1994, 19, 307–312. [Google Scholar] [CrossRef]
  3. Morales, D.; Pardo, L.; Vajda, I. Asymptotic divergence of estimates of discrete distributions. J. Stat. Plann. Inference 1995, 48, 347–369. [Google Scholar] [CrossRef]
  4. Mandal, A.; Basu, A.; Pardo, L. Minimum disparity inference and the empty cell penalty: Asymptotic results. Sankhya Ser. A 2010, 72, 376–406. [Google Scholar] [CrossRef]
  5. Jiménez-Gamero, M.D.; Pino-Mejías, R.; Alba-Fernández, M.V.; Moreno-Rebollo, J.L. Minimum ϕ-divergence estimation in misspecified multinomial models. Comput. Stat. Data Anal. 2011, 55, 3365–3378. [Google Scholar] [CrossRef]
  6. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25. [Google Scholar] [CrossRef]
  7. Mandal, A.; Basu, A. Minimum disparity inference and the empty cell penalty: Asymptotic results. Electron. J. Stat. 2011, 5, 1846–1875. [Google Scholar] [CrossRef]
  8. Csiszár, I. Information type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318. [Google Scholar]
  9. Lindsay, B.G. Efficiency versus robustness: The case for minimum Hellinger distance and related methods. Ann. Stat. 1994, 22, 1081–1114. [Google Scholar] [CrossRef]
  10. Vuong, Q.H.; Wang, W. Minimum χ-square estimation and tests for model selection. J. Econom. 1993, 56, 141–168. [Google Scholar] [CrossRef]
  11. Alba-Fernández, M.V.; Jiménez-Gamero, M.D.; Lagos-Álvarez, B. Divergence statistics for testing uniform association in cross-classifications. Inf. Sci. 2010, 180, 4557–4571. [Google Scholar] [CrossRef]
  12. Jiménez-Gamero, M.D.; Pino-Mejías, R.; Rufián-Lizana, A. Minimum Kϕ-divergence estimators for multinomial models and applications. Comput. Stat. 2014, 29, 363–401. [Google Scholar] [CrossRef]
  13. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017; Available online: https://www.R-project.org/ (accessed on 29 April 2018).
  14. Amari, S. Integration of stochastic models by minimizing α-divergence. Neural Comput. 2007, 19, 2780–2796. [Google Scholar] [CrossRef] [PubMed]
  15. Alba-Fernández, M.V.; Jiménez-Gamero, M.D. Bootstrapping divergence statistics for testing homogeneity in multinomial populations. Math. Comput. Simul. 2009, 79, 3375–3384. [Google Scholar] [CrossRef]
  16. Jiménez-Gamero, M.D.; Alba-Fernández, M.V.; Barranco-Chamorro, I.; Muñoz-García, J. Two classes of divergence statistics for testing uniform association. Statistics 2014, 48, 367–387. [Google Scholar] [CrossRef]
  17. Tsendbazar, N.E.; de Bruin, S.; Mora, B.; Schouten, L.; Herold, M. Comparative assessment of thematic accuracy of GLC maps for specific applications using existing reference data. Int. J. Appl. Earth Obs. Geoinf. 2016, 44, 124–135. [Google Scholar] [CrossRef]
  18. Dieudonné, J. Foundations of Modern Analysis; Academic Press: New York, NY, USA; London, UK, 1969. [Google Scholar]
Table 1. Type I error probabilities obtained using the asymptotic approximation for Example 1 with θ = 0.3333, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

              ϕ1 = PD_{-2}          ϕ1 = PD_1             ϕ1 = PD_2
              h1 = h2               h1 = h2               h1 = h2
  n     α     0.5    1      2       0.5    1      2       0.5    1      2
 100  0.05   0.996  0.996  0.998   0.995  0.997  0.996   0.995  0.997  0.997
      0.10   0.996  0.996  0.998   0.995  0.997  0.996   0.995  0.997  0.997
 150  0.05   0.995  0.995  0.996   0.994  0.995  0.996   0.994  0.994  0.995
      0.10   0.995  0.995  0.996   0.994  0.995  0.996   0.994  0.994  0.995
 200  0.05   0.992  0.993  0.994   0.992  0.994  0.991   0.993  0.993  0.994
      0.10   0.992  0.994  0.994   0.992  0.994  0.991   0.993  0.993  0.994
Table 2. Type I error probabilities obtained using the asymptotic approximation for Example 1 with n = 200, θ = 0.3333, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)  (1, 0.5)  (0.5, 2)  (2, 0.5)  (1, 2)  (2, 1)
 α = 0.05    0.989     0.997     0.998     0.998     0.994   0.998
 α = 0.10    0.999     0.997     0.998     0.998     0.994   0.999
Table 3. Type I error probabilities obtained using the asymptotic approximation for Example 2 with θ = 0.24, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

              ϕ1 = PD_{-2}          ϕ1 = PD_1             ϕ1 = PD_2
              h1 = h2               h1 = h2               h1 = h2
  n     α     0.5    1      2       0.5    1      2       0.5    1      2
 100  0.05   0.016  0.017  0.017   0.013  0.013  0.014   0.013  0.014  0.015
      0.10   0.034  0.036  0.036   0.031  0.030  0.031   0.030  0.033  0.033
 150  0.05   0.018  0.019  0.017   0.014  0.014  0.014   0.013  0.015  0.016
      0.10   0.035  0.039  0.037   0.031  0.033  0.032   0.035  0.033  0.032
 200  0.05   0.024  0.022  0.022   0.014  0.016  0.016   0.014  0.015  0.016
      0.10   0.043  0.042  0.040   0.032  0.034  0.032   0.032  0.035  0.033
Table 4. Type I error probabilities obtained using the asymptotic approximation for Example 2 with n = 200, θ = 0.24, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)  (1, 0.5)  (0.5, 2)  (2, 0.5)  (1, 2)  (2, 1)
 α = 0.05    0.017     0.017     0.018     0.019     0.018   0.016
 α = 0.10    0.035     0.033     0.035     0.040     0.036   0.034
Table 5. Type I error probabilities obtained using the asymptotic approximation for Example 3 with θ = 0.8, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

              ϕ1 = PD_{-2}          ϕ1 = PD_1             ϕ1 = PD_2
              h1 = h2               h1 = h2               h1 = h2
  n     α     0.5    1      2       0.5    1      2       0.5    1      2
 100  0.05   0.063  0.066  0.074   0.095  0.107  0.111   0.122  0.136  0.131
      0.10   0.122  0.120  0.125   0.157  0.165  0.161   0.181  0.190  0.182
 150  0.05   0.063  0.064  0.066   0.083  0.082  0.084   0.099  0.105  0.100
      0.10   0.114  0.118  0.113   0.137  0.134  0.136   0.153  0.159  0.152
 200  0.05   0.062  0.061  0.061   0.075  0.079  0.074   0.086  0.091  0.086
      0.10   0.111  0.111  0.115   0.129  0.137  0.123   0.145  0.148  0.144
Table 6. Type I error probabilities obtained using the asymptotic approximation for Example 3 with n = 200, θ = 0.8, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)  (1, 0.5)  (0.5, 2)  (2, 0.5)  (1, 2)  (2, 1)
 α = 0.05    0.060     0.062     0.063     0.062     0.063   0.058
 α = 0.10    0.108     0.114     0.113     0.112     0.113   0.109
Table 7. Asymptotic and bootstrap type I error probabilities for Example 1 with θ = 0.3333, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

  h1 = h2              0.5            1              2
  ϕ1        n    α     B      A      B      A      B      A
  PD_{-2}  100  0.05  0.051  0.996  0.048  0.996  0.048  0.998
                0.10  0.110  0.996  0.103  0.996  0.109  0.998
           150  0.05  0.055  0.995  0.050  0.995  0.056  0.996
                0.10  0.106  0.995  0.101  0.995  0.109  0.996
           200  0.05  0.053  0.992  0.053  0.993  0.056  0.994
                0.10  0.103  0.992  0.106  0.994  0.108  0.994
  PD_1     100  0.05  0.057  0.995  0.056  0.997  0.055  0.996
                0.10  0.110  0.995  0.110  0.997  0.107  0.996
           150  0.05  0.054  0.994  0.052  0.995  0.055  0.996
                0.10  0.110  0.994  0.104  0.995  0.114  0.996
           200  0.05  0.055  0.992  0.051  0.994  0.052  0.991
                0.10  0.106  0.992  0.103  0.994  0.106  0.991
  PD_2     100  0.05  0.055  0.995  0.056  0.997  0.054  0.997
                0.10  0.110  0.995  0.109  0.997  0.107  0.997
           150  0.05  0.054  0.994  0.055  0.994  0.056  0.995
                0.10  0.107  0.994  0.106  0.994  0.110  0.995
           200  0.05  0.054  0.993  0.053  0.993  0.055  0.994
                0.10  0.107  0.993  0.105  0.993  0.108  0.994
Table 8. Asymptotic and bootstrap type I error probabilities for Example 1 with n = 200, θ = 0.3333, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)      (1, 0.5)      (0.5, 2)      (2, 0.5)      (1, 2)        (2, 1)
    α       B      A      B      A      B      A      B      A      B      A      B      A
  0.05     0.061  0.989  0.050  0.997  0.059  0.996  0.042  0.998  0.044  0.994  0.063  0.998
  0.10     0.107  0.999  0.113  0.997  0.106  0.996  0.095  0.998  0.105  0.994  0.115  0.999
Table 9. Asymptotic and bootstrap type I error probabilities for Example 2 with θ = 0.24, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

  h1 = h2              0.5            1              2
  ϕ1        n    α     B      A      B      A      B      A
  PD_{-2}  100  0.05  0.057  0.016  0.055  0.017  0.051  0.017
                0.10  0.111  0.034  0.110  0.036  0.102  0.036
           150  0.05  0.049  0.018  0.048  0.019  0.051  0.017
                0.10  0.097  0.035  0.103  0.039  0.101  0.036
           200  0.05  0.051  0.024  0.055  0.022  0.051  0.022
                0.10  0.099  0.043  0.102  0.042  0.099  0.040
  PD_1     100  0.05  0.058  0.013  0.054  0.013  0.051  0.014
                0.10  0.114  0.031  0.113  0.030  0.106  0.031
           150  0.05  0.050  0.014  0.051  0.014  0.052  0.014
                0.10  0.098  0.031  0.103  0.031  0.100  0.032
           200  0.05  0.049  0.014  0.054  0.016  0.052  0.016
                0.10  0.099  0.032  0.104  0.034  0.099  0.032
  PD_2     100  0.05  0.055  0.013  0.053  0.014  0.050  0.015
                0.10  0.110  0.030  0.108  0.033  0.104  0.033
           150  0.05  0.050  0.013  0.052  0.015  0.051  0.016
                0.10  0.097  0.032  0.103  0.033  0.098  0.032
           200  0.05  0.049  0.014  0.051  0.015  0.051  0.016
                0.10  0.100  0.032  0.102  0.035  0.098  0.033
Table 10. Asymptotic and bootstrap type I error probabilities for Example 2 with n = 200, θ = 0.24, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)      (1, 0.5)      (0.5, 2)      (2, 0.5)      (1, 2)        (2, 1)
    α       B      A      B      A      B      A      B      A      B      A      B      A
  0.05     0.048  0.017  0.051  0.017  0.052  0.018  0.053  0.019  0.050  0.018  0.049  0.016
  0.10     0.101  0.035  0.099  0.033  0.100  0.035  0.105  0.040  0.103  0.036  0.101  0.034
Table 11. Asymptotic and bootstrap type I error probabilities for Example 3 with θ = 0.8, ϕ1 = PD_λ, λ ∈ {−2, 1, 2}, ϕ2 = PD_{-2}, and h1 = h2 ∈ {0.5, 1, 2}.

  h1 = h2              0.5            1              2
  ϕ1        n    α     B      A      B      A      B      A
  PD_{-2}  100  0.05  0.066  0.063  0.058  0.066  0.044  0.074
                0.10  0.119  0.122  0.101  0.120  0.086  0.125
           150  0.05  0.053  0.063  0.050  0.064  0.045  0.066
                0.10  0.098  0.114  0.095  0.118  0.093  0.113
           200  0.05  0.051  0.062  0.047  0.061  0.046  0.061
                0.10  0.099  0.111  0.096  0.111  0.100  0.115
  PD_1     100  0.05  0.049  0.095  0.049  0.107  0.041  0.111
                0.10  0.103  0.157  0.098  0.165  0.084  0.161
           150  0.05  0.050  0.083  0.040  0.082  0.040  0.084
                0.10  0.098  0.137  0.090  0.134  0.087  0.136
           200  0.05  0.046  0.075  0.048  0.079  0.044  0.074
                0.10  0.095  0.129  0.102  0.137  0.092  0.123
  PD_2     100  0.05  0.043  0.122  0.045  0.136  0.037  0.131
                0.10  0.099  0.181  0.046  0.190  0.077  0.182
           150  0.05  0.040  0.099  0.047  0.105  0.035  0.100
                0.10  0.041  0.153  0.093  0.159  0.081  0.152
           200  0.05  0.043  0.086  0.048  0.091  0.043  0.086
                0.10  0.092  0.145  0.097  0.148  0.090  0.144
Table 12. Asymptotic and bootstrap type I error probabilities for Example 3 with n = 200, θ = 0.8, ϕ1 = ϕ2 = PD_{-2}, h1 ≠ h2, and h1, h2 ∈ {0.5, 1, 2}.

 (h1, h2)   (0.5, 1)      (1, 0.5)      (0.5, 2)      (2, 0.5)      (1, 2)        (2, 1)
    α       B      A      B      A      B      A      B      A      B      A      B      A
  0.05     0.047  0.060  0.048  0.062  0.051  0.063  0.049  0.062  0.048  0.063  0.044  0.058
  0.10     0.095  0.108  0.099  0.114  0.099  0.113  0.097  0.112  0.099  0.113  0.092  0.109
Table 13. Thematic classification of the Evergreen Broadleaf Trees (EBL) class.

                          Globcover Map   LC-CCI Map
  Classified data  EBL         165            172
                   DBL          13              5
                   ENL           7              5
                   U             0              0
Table 14. Results of the goodness-of-fit test applied to the thematic classification of the EBL class.

                         Globcover Map                   LC-CCI Map
  ϕ1                PD_{-2}   PD_1     PD_2        PD_{-2}   PD_1     PD_2
  h2 = 0.5      θ̂_{-2,0.5} = 0.9490                θ̂_{-2,0.5} = 0.9721
  T_obs             2.3015    2.7618   3.0111      0.1432    0.1432   0.1433
  p̂_boot            0.1700    0.2253   0.2926      0.9283    0.9200   0.9148
  h2 = 1        θ̂_{-2,1} = 0.9503                  θ̂_{-2,1} = 0.9725
  T_obs             2.7686    3.3752   3.6962      0.2821    0.2823   0.2826
  p̂_boot            0.1801    0.2325   0.2671      0.8431    0.9162   0.9182
  h2 = 2        θ̂_{-2,2} = 0.9527                  θ̂_{-2,2} = 0.9732
  T_obs             3.6352    4.5400   5.0219      0.5492    0.5508   0.5514
  p̂_boot            0.1300    0.2492   0.2584      0.7526    0.8144   0.8291
