Article

Empirical Likelihood Ratio Tests for Homogeneity of Multiple Populations in the Presence of Auxiliary Information

1
Department of Mathematics and Statistics, Yulin Normal University, Yulin 537000, China
2
Key Laboratory of Complex System Optimization and Big Data Processing in Department of Guangxi Education, Yulin Normal University, Yulin 537000, China
3
Department of Statistics, Guangxi Normal University, Guilin 541004, China
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(13), 2341; https://doi.org/10.3390/math10132341
Submission received: 24 May 2022 / Revised: 1 July 2022 / Accepted: 2 July 2022 / Published: 4 July 2022

Abstract

The empirical likelihood ratio test (ELRT) statistic is constructed for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. It is shown—under some regularity conditions and under the null hypothesis that all distribution functions of the populations are equal—that the asymptotic distribution of the ELRT is a chi-squared distribution. The proposed ELRT could be more powerful than the Kruskal–Wallis test, as extra information can be efficiently employed by ELRT. The advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons, and ELRT is more suitable for the multi-population consistency test with a small sample size.

1. Introduction

Suppose that there are $k$ ($k \ge 2$) populations, and the distribution function of the $i$-th population is $F(x;\theta_i)$ ($1 \le i \le k$), where $\theta_i \in A \subset R^p$ ($1 \le i \le k$) are parameter vectors and $A$ is the parameter space. In other words, the $k$ populations share the same type of distribution but may have different structures as $\theta_i$ ($1 \le i \le k$) varies. Consider the hypothesis
$$H_0: \theta_1 = \theta_2 = \cdots = \theta_k.$$
This test for homogeneity arises, for example, in the comparison of a number of different treatments, processes, varieties, or locations, when one wishes to test whether these differences have any effect on an outcome X, where X can be a scalar or a vector.
If F is the normal distribution function, the standard analysis of variance (ANOVA) for testing the above hypothesis has been widely used by a number of investigators. For example, Dou [1] employed this method in a parametric study of a developed statistical model. However, the standard ANOVA is not suitable for other distributions.
Due to the complexity of the real world, the form of F may not be known in many applications. In this nonparametric setting, the Kruskal–Wallis test (KWT) provides tests of the null hypothesis that independent samples from two or more groups come from identical populations. Refer to Lehmann [2] for the theory and applications of KWT.
Here, we provide a brief definition of KWT and its limiting distribution. First, the data of all samples are pooled into a single series and arranged in ascending order, and a rank is assigned to each observation. In the case of repeated values (ties), each tied observation is assigned the average of the rank positions the tied values occupy. For example, if the sample number is even, the rank of the median is the average rank of the two numbers before and after it. The KWT statistic for the $k$ independent samples, the $i$-th of size $n_i$, is
$$T = \frac{1}{s^2}\left(\sum_{i=1}^{k}\frac{R_i^2}{n_i} - \frac{n(n+1)^2}{4}\right),$$
where $n = \sum_{i=1}^{k} n_i$, $R_i$ is the sum of the ranks (from all samples pooled) for the $i$-th sample, and
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i} R_{ij}^2 - \frac{n(n+1)^2}{4}\right),$$
where $R_{ij}$ is the rank (from all samples pooled) of the $j$-th observation in the $i$-th sample. The null hypothesis of this test is that all $k$ distribution functions are equal. It is shown, under the null hypothesis and some regularity conditions, that
$$T \xrightarrow{d} \chi^2_{k-1}.$$
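The KWT statistic defined above can be sketched in a few lines of Python. This is only an illustration of the formulas for $T$ and $s^2$ using the standard library; the function names `average_ranks` and `kruskal_wallis` are hypothetical and not taken from the paper.

```python
def average_ranks(values):
    """Ranks of the pooled data; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # positions are 0-based, ranks are 1-based
        for q in range(pos, end + 1):
            ranks[order[q]] = avg
        pos = end + 1
    return ranks


def kruskal_wallis(samples):
    """T = (sum_i R_i^2 / n_i - n(n+1)^2 / 4) / s^2, as defined in the text."""
    pooled = [x for sample in samples for x in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)
    correction = n * (n + 1) ** 2 / 4.0
    s2 = (sum(r * r for r in ranks) - correction) / (n - 1)
    total = 0.0
    start = 0
    for sample in samples:
        r_i = sum(ranks[start:start + len(sample)])  # rank sum of this sample
        total += r_i * r_i / len(sample)
        start += len(sample)
    return (total - correction) / s2


print(kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For three perfectly separated samples of size 3 the statistic is about 7.2, which exceeds the 0.95 quantile of $\chi^2_2$ (about 5.99). The sketch does not guard against the degenerate case in which all observations are equal, where $s^2 = 0$.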
As mentioned above, the form of $F$ may not be known in many applications, and KWT can then be used to test the homogeneity of multiple populations. As KWT constructs its statistic from sample ranks, its power is good when the sample size is large. However, when the sample size is small, a statistic based on sample ranks carries much less sample information, and the performance of KWT deteriorates accordingly. We therefore introduce the empirical likelihood method, which is briefly described below.
The empirical likelihood method, a nonparametric technique for statistical inference, was introduced by Owen [3,4] and has many advantages over other nonparametric methods such as the normal-approximation-based method and the bootstrap method, as put forward by Hall [5] and Hall and La Scala [6]. Wilks' theorem, Bartlett correction and the ability to use auxiliary information are three striking properties of the empirical likelihood method. Chen and Qin [7] proved that the empirical likelihood method can be seamlessly applied to finite population estimation problems, and that more accurate statistical inference can be obtained through the effective use of auxiliary information. Zhang [8] developed a new class of M-estimators and quantile estimators in the presence of some auxiliary information, using the empirical likelihood technique. A natural question is whether and how an empirical likelihood method can efficiently use the auxiliary information to decide whether several samples should be regarded as coming from the same population. In this paper, an empirical likelihood ratio test (ELRT) statistic is constructed for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. Since the auxiliary information is not employed in the KWT method, KWT may be less powerful than ELRT for testing the homogeneity of population distributions. A comprehensive comparison between ELRT and KWT was conducted and is presented in Section 3.
We note that there exist a few other approaches that allow auxiliary information to be incorporated in statistical testing. For example, the method of Tarima and Pavlov [9] (T&P (2006)), which uses auxiliary information in the form of vectors of unbiased estimates, may be used in the context of this article. The asymptotic properties of that work are analyzed by Albertus [10]. Tarima and Pavlov used additional information to construct parameter estimation statistics, and completed parameter estimation by adding data sources. Therefore, we compare ELRT and T&P (2006) separately in the numerical simulation part.
The form of $F$ is unknown in the present study. However, it is assumed that some auxiliary information about the distribution function $F(x;\theta)$ is available in the sense that there exist $r$ ($r > p$) known functions $g_1(x;\theta), g_2(x;\theta), \dots, g_r(x;\theta)$ such that
$$E\{g(X;\theta)\} = 0, \tag{2}$$
where $X \sim F(x;\theta)$ and $g(x;\theta) = (g_1(x;\theta), g_2(x;\theta), \dots, g_r(x;\theta))^{\tau}$ is an $r$-dimensional vector.
Equation (2) defines a group of estimating equations. Those equations are widely applicable and particularly powerful when the data model is not specified by a full parametric likelihood function, as elaborated by Hansen [11] and Godambe and Heyde [12] among many others. Qin and Lawless [13] showed that an empirical likelihood approach produces a semiparametric efficient parameter estimate. In this study, r > p is required. Excellent explanations related to this requirement are given by Qin and Lawless [13] and Zhang [8]. More related results of statistical inference using the estimating equations can be found in Wang and Chen [14] and Zhou et al. [15], among others.
Assumption (2) is natural in practice, as in most commonly used distribution families the distribution is determined by some of its moments, such as the mean, variance, skewness, kurtosis and so on. For example, if $X$ is the yield of a type of grain and one suspects that the amount of fertilizer could cause some differences in the yield among several populations, we may set $\theta^{(1)} = EX$ and $g_1(x;\theta^{(1)}) = x - \theta^{(1)}$. On the other hand, if one suspects that the use of the fertilizer may change not only the yield but also the variance of $X$, then we may set $\theta^{(1)} = EX$, $\theta^{(2)} = EX^2$, $g_1(x;\theta) = x - \theta^{(1)}$ and $g_2(x;\theta) = x^2 - \theta^{(2)}$, where $\theta = (\theta^{(1)}, \theta^{(2)})$. These choices could be initially assessed by comparing the histograms of the data sets of the populations under consideration. In addition to the above (partial) information, we may know some extra information. For example, we may know some moments of $X$ or may know that the distribution of $X$ is symmetric about some point.
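As a small illustration of the estimating functions in this example, the sketch below codes $g(x;\theta) = (x - \theta^{(1)}, x^2 - \theta^{(2)})$ and checks that its sample average vanishes at the sample moments. The data values and the function names `g` and `mean_g` are made up for illustration. Note that this example has $r = p = 2$, so it only shows how moment conditions are written down; the test in this paper additionally needs $r > p$.

```python
def g(x, theta):
    """Estimating functions for the mean and the second moment."""
    theta1, theta2 = theta
    return (x - theta1, x * x - theta2)


def mean_g(sample, theta):
    """Sample analogue of E{g(X; theta)}: the componentwise average of g."""
    n = len(sample)
    components = [g(x, theta) for x in sample]
    return tuple(sum(c[u] for c in components) / n for u in range(2))


# Hypothetical grain yields (illustrative numbers only).
yields = [2.0, 3.0, 5.0, 6.0]
theta_hat = (sum(yields) / len(yields), sum(x * x for x in yields) / len(yields))

# At the sample moments the averaged estimating functions are zero.
print(theta_hat, mean_g(yields, theta_hat))
```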
Based on (2), we will construct an empirical likelihood ratio test (ELRT) to test $H_0$. It is shown that the limiting distribution of the ELRT under $H_0$ is $\chi^2_{(r-p)(k-1)}$, and thus the testing method for $H_0$ is ready to use, where $\chi^2_{(r-p)(k-1)}$ is the chi-squared random variable with $(r-p)(k-1)$ degrees of freedom.
The rest of the paper is organized as follows. The main results of this study are presented in Section 2. Results of a simulation study on the finite sample performance of the ELRT are reported in Section 3. We conclude and give some remarks on our future work in Section 4. Finally, the proof of the main results is presented in Section 5.

2. Main Results

For $1 \le i \le k$, suppose that data $X_{ij}$ ($j = 1, 2, \dots, n_i$) are independently distributed as $F(x;\theta_i)$ (unknown) and that all $X_{ij}$ ($j = 1, 2, \dots, n_i$; $i = 1, 2, \dots, k$) are independent. Let $n = \sum_{i=1}^{k} n_i$ and $\theta_i \in \Theta \subset R^p$, where $\Theta$ is the parameter space of $\theta$ and an open set of $R^p$. Let
$$B = \Big\{p_{ij},\ j = 1, 2, \dots, n_i;\ i = 1, 2, \dots, k \ \Big|\ p_{ij} \ge 0,\ \sum_{i,j} p_{ij} = 1,\ \sum_{i,j} p_{ij}\, g(X_{ij};\theta) = 0\Big\},$$
$$B_i = \Big\{q_{ij},\ j = 1, 2, \dots, n_i \ \Big|\ q_{ij} \ge 0,\ \sum_{j} q_{ij} = 1,\ \sum_{j} q_{ij}\, g(X_{ij};\theta) = 0\Big\}, \quad 1 \le i \le k.$$
Here, $p_{ij}$ is the probability mass placed on the observation $X_{ij}$ in the pooled sample, that is, the probability that the random variable $g(X;\theta)$ takes the value $g(X_{ij};\theta)$; these masses are non-negative and sum to 1. Similarly, for $1 \le i \le k$, $q_{ij}$ is the probability mass placed on $X_{ij}$ within the $i$-th sample.
Applying the method proposed by Qin and Lawless [13], the ELRT for testing $H_0$ can be defined as
$$\lambda_n = \frac{\sup_{\theta \in \Theta}\ \sup_{\{p_{ij}\} \in B}\ \prod_{i=1}^{k}\prod_{j=1}^{n_i} (n p_{ij})}{\prod_{i}\ \sup_{\theta \in \Theta}\ \sup_{\{q_{ij}\} \in B_i}\ \prod_{j=1}^{n_i} (n_i q_{ij})}.$$
The ELRT rejects $H_0$ for large values of $2\log\lambda_n$.
For every $1 \le i \le k$, assume that 0 is in the convex hull of $\{g(X_{ij};\theta), 1 \le j \le n_i\}$. Then, according to the Lagrange multiplier method, one can obtain
$$\sup_{\{p_{ij}\} \in B}\ \prod_{i=1}^{k}\prod_{j=1}^{n_i} (n p_{ij}) = \prod_{i,j} \{1 + t^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1},$$
where $t(\theta)$ is the solution of the equation
$$\frac{1}{n}\sum_{i,j} \frac{g(X_{ij};\theta)}{1 + t^{\tau}(\theta)\, g(X_{ij};\theta)} = 0.$$
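The Lagrange multiplier step behind this result can be spelled out as follows. This is a sketch for a fixed $\theta$; the scalar multiplier $\gamma$ is a symbol introduced here only for the derivation, and $g_{ij}$ abbreviates $g(X_{ij};\theta)$.

```latex
\begin{align*}
\mathcal{L} &= \sum_{i,j} \log(n p_{ij})
  - \gamma \Big( \sum_{i,j} p_{ij} - 1 \Big)
  - n\, t^{\tau} \sum_{i,j} p_{ij}\, g_{ij}, \\
\frac{\partial \mathcal{L}}{\partial p_{ij}} &= \frac{1}{p_{ij}} - \gamma - n\, t^{\tau} g_{ij} = 0
  \quad \text{for all } i, j. \\
\intertext{Multiplying by $p_{ij}$, summing over $i, j$, and using both constraints gives}
n - \gamma &= 0, \quad \text{so} \quad
p_{ij} = \frac{1}{n \{ 1 + t^{\tau} g_{ij} \}}, \qquad
n p_{ij} = \{ 1 + t^{\tau} g_{ij} \}^{-1}. \\
\intertext{Substituting $p_{ij}$ into $\sum_{i,j} p_{ij}\, g_{ij} = 0$ yields the equation that determines $t$:}
\frac{1}{n} \sum_{i,j} \frac{g_{ij}}{1 + t^{\tau} g_{ij}} &= 0 .
\end{align*}
```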
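Similarly, for $1 \le i \le k$,
$$\sup_{\{q_{ij}\} \in B_i}\ \prod_{j=1}^{n_i} (n_i q_{ij}) = \prod_{j} \{1 + t_i^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1},$$
where $t_i(\theta)$ is the solution of the equation
$$\frac{1}{n_i}\sum_{j} \frac{g(X_{ij};\theta)}{1 + t_i^{\tau}(\theta)\, g(X_{ij};\theta)} = 0.$$
Hence,
$$\lambda_n = \frac{\sup_{\theta \in \Theta}\ \prod_{i,j} \{1 + t^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1}}{\prod_{i}\ \sup_{\theta \in \Theta}\ \prod_{j} \{1 + t_i^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1}}.$$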
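For a fixed $\theta$ and a scalar estimating function ($r = 1$), the multiplier equation above can be solved numerically. The following is a minimal sketch under those assumptions, not the authors' implementation; it uses bisection because the left-hand side is strictly decreasing in $t$ on the interval where all weights stay positive. The names `solve_multiplier` and `el_weights` are hypothetical.

```python
def solve_multiplier(g_values):
    """Solve sum_j g_j / (1 + t * g_j) = 0 for t by bisection (scalar g only)."""
    g_max, g_min = max(g_values), min(g_values)
    if not (g_min < 0.0 < g_max):
        raise ValueError("0 must lie inside the convex hull of the g values")
    # All 1 + t * g_j stay positive for t strictly between -1/g_max and -1/g_min.
    lo = -(1.0 - 1e-10) / g_max
    hi = -(1.0 - 1e-10) / g_min
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(v / (1.0 + mid * v) for v in g_values) > 0.0:
            lo = mid  # the function is decreasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_weights(g_values):
    """Probability masses 1 / (n * (1 + t * g_j)) at the solved multiplier."""
    t = solve_multiplier(g_values)
    n = len(g_values)
    return [1.0 / (n * (1.0 + t * v)) for v in g_values]


# Illustrative values of g(X_j; theta) = X_j - theta for one sample.
g_values = [-0.8, 0.4, -0.1, 0.7, -0.4]
weights = el_weights(g_values)
print(sum(weights), sum(w * v for w, v in zip(weights, g_values)))
```

The printed values should be close to 1 and 0, which are exactly the two constraints defining $B_i$. For vector-valued $g$ ($r > 1$), a multivariate root finder such as Newton's method would be needed instead of bisection.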
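The log-empirical likelihood function for the pooled data $X_{ij}$ ($j = 1, 2, \dots, n_i$; $i = 1, 2, \dots, k$) is therefore defined by
$$\ell_n(\theta) = \sum_{i=1}^{k}\sum_{j=1}^{n_i} \log\{1 + t^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1}.$$
Suppose that there is a $\hat\theta_n$ that maximizes $\ell_n(\theta)$; then $\hat\theta_n$ is called the maximum empirical likelihood estimator (MELE) of $\theta$. Suppose that, in addition, $\ell_n(\theta)$ is differentiable in $\theta$; then $\hat\theta_n$ will be a solution of the empirical likelihood equation
$$\frac{\partial}{\partial\theta}\ell_n(\theta) = \left(\frac{\partial}{\partial\theta_v}\ell_n(\theta)\right)_{p \times 1} = -\sum_{i,j} \frac{1}{1 + t^{\tau}(\theta)\, g(X_{ij};\theta)} \left(\frac{\partial g_u(X_{ij};\theta)}{\partial\theta_v}\right)^{\tau} t(\theta) = 0,$$
where $\partial g_u(X_{ij};\theta)/\partial\theta_v$ is the $(u,v)$-element of the $r \times p$ matrix $\left(\partial g_u(X_{ij};\theta)/\partial\theta_v\right)$.
Similarly, for $1 \le i \le k$, the log-empirical likelihood function for $X_{ij}$ ($j = 1, 2, \dots, n_i$) is defined by
$$\ell_{n_i}(\theta) = \sum_{j=1}^{n_i} \log\{1 + t_i^{\tau}(\theta)\, g(X_{ij};\theta)\}^{-1}.$$
The MELE of $\theta$ under the $i$-th sample is denoted as $\hat\theta_{n_i}$, which is a solution of the empirical likelihood equations
$$\frac{\partial}{\partial\theta}\ell_{n_i}(\theta) = -\sum_{j} \frac{1}{1 + t_i^{\tau}(\theta)\, g(X_{ij};\theta)} \left(\frac{\partial g_u(X_{ij};\theta)}{\partial\theta_v}\right)^{\tau} t_i(\theta) = 0.$$
We assume that all $\hat\theta_n$ and $\hat\theta_{n_i}$ are consistent estimators of $\theta$ as $\min_{1 \le i \le k} n_i \to \infty$. Then $\lambda_n$ can be rewritten as
$$\lambda_n = \frac{\prod_{i,j} \{1 + t^{\tau}(\hat\theta_n)\, g(X_{ij};\hat\theta_n)\}^{-1}}{\prod_{i}\prod_{j} \{1 + t_i^{\tau}(\hat\theta_{n_i})\, g(X_{ij};\hat\theta_{n_i})\}^{-1}}. \tag{11}$$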
Let $X$ be a population with a distribution $F(x;\theta)$, $\theta_0$ be the true value of $\theta$, and $\|M\|$ be the $L_2$-norm of a matrix $M$. To obtain the asymptotic distribution of $\lambda_n$, we need the following regularity conditions (Qin and Lawless [13], pp. 305–306):
(A) $E\{g(X;\theta_0)\, g^{\tau}(X;\theta_0)\}$ is positive definite, $\partial g(X;\theta)/\partial\theta$ is continuous in a neighborhood of $\theta_0$, $\|\partial g(x;\theta)/\partial\theta\|$ and $\|g(x;\theta)\|^3$ are bounded by a function $G(x)$ in this neighborhood, and the rank of $E\{\partial g(X;\theta_0)/\partial\theta\}$ is $p$, where $E\{G(X)\} < \infty$.
(B) $\partial^2 g(x;\theta)/\partial\theta\,\partial\theta^{\tau}$ is continuous in $\theta$ in a neighborhood of $\theta_0$ and $\|\partial^2 g(x;\theta)/\partial\theta\,\partial\theta^{\tau}\|$ is bounded by a function $G(x)$ in this neighborhood, where $E\{G(X)\} < \infty$.
The main results of this study are presented as follows.
Theorem 1.
Suppose assumptions (A) and (B) are satisfied. Then, under $H_0$, as $\min_{1 \le i \le k} n_i \to \infty$, we have
$$2\log\lambda_n \xrightarrow{d} \chi^2_{(r-p)(k-1)},$$
where $\chi^2_{(r-p)(k-1)}$ is the chi-squared random variable with $(r-p)(k-1)$ degrees of freedom.
Remark 1.
If we use $\lambda_n$ in Equation (11) instead of Equation (6) as the original definition, where $\hat\theta_n$ and $\hat\theta_{n_i}$ are the roots of the related likelihood equations, then Theorem 1 still holds true. This can be seen from the proof of Theorem 1. In other words, $\hat\theta_n$ and $\hat\theta_{n_i}$ do not need to be the MELEs for the result of Theorem 1 to hold.
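To make the ratio of products concrete, the sketch below evaluates $2\log\lambda_n$ in a deliberately simplified special case that is assumed here only for illustration: the auxiliary information is a known common mean $\mu_0$, so that $g(x) = x - \mu_0$ is scalar and there is no free $\theta$ to maximize over. The setting of this paper, with $r > p \ge 1$, additionally requires the outer maximization over $\theta$, which this sketch does not attempt. All function names are hypothetical.

```python
import math


def solve_multiplier(g_values):
    """Bisection solve of sum_j g_j / (1 + t * g_j) = 0 (scalar g only)."""
    g_max, g_min = max(g_values), min(g_values)
    if not (g_min < 0.0 < g_max):
        raise ValueError("0 must lie inside the convex hull of the g values")
    lo = -(1.0 - 1e-10) / g_max
    hi = -(1.0 - 1e-10) / g_min
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(v / (1.0 + mid * v) for v in g_values) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_el_ratio(g_values):
    """log of prod_j {1 + t * g_j}^{-1} at the solved multiplier t."""
    t = solve_multiplier(g_values)
    return -sum(math.log1p(t * v) for v in g_values)


def two_log_lambda(samples, mu0):
    """2 log(lambda_n): pooled log-ratio minus the sum of per-sample log-ratios."""
    pooled = [x - mu0 for sample in samples for x in sample]
    log_numerator = log_el_ratio(pooled)
    log_denominator = sum(log_el_ratio([x - mu0 for x in sample]) for sample in samples)
    return 2.0 * (log_numerator - log_denominator)


# Illustrative data: three samples sharing the assumed known mean mu0 = 1.0.
samples = [
    [0.2, 1.4, 0.9, 1.7, 0.6],
    [0.5, 1.9, 1.1, 0.8, 1.2],
    [1.6, 0.3, 0.7, 1.3, 1.0],
]
print(two_log_lambda(samples, mu0=1.0))
```

One sanity check follows directly from the definition: if every sample contains exactly the same values, the pooled multiplier coincides with each per-sample multiplier, the numerator equals the denominator, and the statistic is zero.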
To sum up, we constructed an ELRT statistic for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information when the population distribution is unknown, and proved that the asymptotic distribution of the ELRT is a chi-square distribution under some regularity conditions when the null hypothesis is true. In the next section, we report numerical simulations in which the rejection rates of ELRT are calculated and compared with those of the Kruskal–Wallis test under several alternatives.

3. Simulation Results

Several commonly used distribution families were used in our simulations. The distributions and the related parameter information are shown in Table 1.
In this study, only three populations were compared. In the simulations, it was supposed that we only know the means of the populations. On the one hand, for each combination of sample sizes, we generated three samples from the distribution with the true parameter value under the null hypothesis and calculated the value of $2\log\lambda_n$. The simulation was repeated 5000 times to obtain 5000 corresponding $2\log\lambda_n$ values. Then, the quantiles of the $2\log\lambda_n$ values obtained were compared with the quantiles of the chi-square distribution in Theorem 1. Finally, the Q-Q plots of ELRT were drawn, together with the Q-Q plots of KWT under the same conditions (Figures 1–6). Here the abscissa is the theoretical quantile, and the ordinate is the quantile of the simulated statistic. Figures 1–6 show that, when the null hypothesis is true, the Q-Q plots of ELRT and KWT are consistent with the chi-square asymptotic distribution of the test statistics given in this paper.
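The Q-Q comparison described above pairs each ordered simulated statistic with a theoretical chi-square quantile. The sketch below assumes 2 degrees of freedom, which is the KWT case $k - 1 = 2$ for three populations and admits the closed-form quantile $-2\log(1 - u)$; the degrees of freedom for ELRT depend on $r - p$ and are not fixed by this sketch. The plotting positions $(l + 0.5)/m$ and the function names are assumptions made for illustration.

```python
import math


def chi2_quantile_df2(prob):
    """Quantile of the chi-square distribution with 2 df: F(x) = 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - prob)


def qq_pairs(statistics):
    """(theoretical quantile, ordered sample value) pairs for a Q-Q plot."""
    ordered = sorted(statistics)
    m = len(ordered)
    return [(chi2_quantile_df2((idx + 0.5) / m), value)
            for idx, value in enumerate(ordered)]


# Illustrative values standing in for simulated test statistics.
for theoretical, observed in qq_pairs([3.0, 0.5, 1.2, 6.1]):
    print(round(theoretical, 4), observed)
```

Points lying close to the diagonal indicate that the simulated statistics follow the reference distribution.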
At the same time, the simulated rejection rates of ELRT and KWT under several alternatives were compared using 5000 Monte Carlo trials with various sample sizes. It should be noted here that the rejection rate is calculated as follows:
reject.rate.KWT = sum(reslt.KWT > quantl.KWT) / m
where reslt.KWT contains the KWT statistics calculated from the samples simulated under the alternatives, quantl.KWT is the critical value obtained from the chi-square distribution, and m is the number of simulated samples. The significance level was always set as 0.05 in the simulations. The results of these comparisons are reported in Table 2. In addition, we simulated the rejection rates of KWT and ELRT under the null hypothesis, and the results are shown in Table 3. From these results, it can be seen that the simulated powers are quite good for both tests even for moderate sample sizes, that the performance improves as the sample sizes increase, and that ELRT performs better than KWT.
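The rejection-rate formula above can be sketched as a small Monte Carlo loop. The version below is an assumption-laden illustration rather than the authors' code: it covers only KWT, draws exponential samples (which have no ties almost surely, so the tie-free form of the KWT statistic is used), and takes the critical value from the closed-form $\chi^2_2$ quantile that applies to three populations. All names are hypothetical.

```python
import math
import random


def chi2_critical_df2(alpha):
    """Upper-alpha critical value of the chi-square distribution with 2 df."""
    return -2.0 * math.log(alpha)


def kw_statistic_no_ties(samples):
    """KWT statistic without ties: 12 / (n(n+1)) * sum_i R_i^2 / n_i - 3(n+1)."""
    pooled = sorted((x, group) for group, sample in enumerate(samples) for x in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(samples)
    for rank, (_, group) in enumerate(pooled, start=1):
        rank_sums[group] += rank
    total = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


def rejection_rate(statistics, critical_value):
    """Fraction of simulated statistics that exceed the critical value."""
    return sum(1 for s in statistics if s > critical_value) / len(statistics)


def simulate_kwt_rejection(means, sizes, m, alpha=0.05, seed=0):
    """Monte Carlo rejection rate of KWT for exponential samples with given means."""
    rng = random.Random(seed)
    statistics = []
    for _ in range(m):
        samples = [[rng.expovariate(1.0 / mu) for _ in range(size)]
                   for mu, size in zip(means, sizes)]
        statistics.append(kw_statistic_no_ties(samples))
    return rejection_rate(statistics, chi2_critical_df2(alpha))


# One alternative and one sample-size combination from the simulation design.
print(simulate_kwt_rejection(means=(0.2, 0.3, 0.5), sizes=(30, 30, 40), m=200))
```

The number of trials is kept small here so that the sketch runs quickly; the study itself uses 5000 trials, so the printed rate will only roughly resemble the tabulated values.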
On the other hand, since T&P (2006) also studies parameter estimation based on additional information, we separately compare the ELRT proposed in this paper with T&P (2006). The results are shown in Table 4. Some interesting conclusions can be drawn from this comparison. For example, T&P (2006) depends more heavily on normality: when the samples being compared are close to normal, T&P (2006) is very effective, whereas when they deviate from normality, T&P (2006) shows poor test efficacy compared with ELRT. In other words, the advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons. At the same time, from the perspective of the sample size, ELRT is more suitable for the multi-population consistency test with a small sample size.

4. Conclusions

In this study, we discussed the consistency test of populations when the population distribution is unknown, and constructed an ELRT statistic for testing the homogeneity of several nonparametric populations in the presence of some auxiliary information. We established the asymptotic distribution of the ELRT theoretically and checked it numerically, and we calculated the rejection rates of ELRT and compared them with those of the Kruskal–Wallis test under several alternatives. In addition, the efficacy of ELRT and that of T&P (2006) were compared separately.
The results show that, firstly, the asymptotic distribution of the ELRT is a chi-square distribution under some regularity conditions when the null hypothesis is true. Secondly, the rejection rates of ELRT under the alternatives are larger than those of KWT, both when the sample is small and as the sample sizes increase. In other words, the proposed ELRT could be more powerful than the Kruskal–Wallis test, as extra information can be more efficiently employed by ELRT. Thirdly, the advantage of ELRT over T&P (2006) is that researchers do not need to select approximately normal statistics for inter-group comparisons. At the same time, compared with T&P (2006), ELRT is more suitable for the multi-population consistency test with a small sample size.
This discussion can be applied to the field of biological information. For example, when two samples are the data of an experimental group and a control group, the statistic we constructed can be used to test whether the experimental processing is effective. If the overall distributions of the two data sets are equal, the experimental processing is ineffective; otherwise, it is effective. Although some useful conclusions and simulation results were obtained in this paper, there are still many problems to be discussed in the future. On the one hand, the study presented in this paper is based on simple random samples, so more complex cases (such as mixed cases or dependent samples) should be considered. On the other hand, only simulations of one-parameter distributions were completed in this paper, while simulations of multi-parameter distributions still need to be carried out. Therefore, in the future, we will complete the simulation of population consistency for multi-parameter distributions by ELRT and construct new ELRT statistics for multi-population consistency under complex samples.

5. Proofs

We first state a lemma which will be used in the proof of Theorem 1.
Lemma 1.
Let $A_k = (a_{ij})$ be a $k \times k$ ($k \ge 2$) symmetric matrix, $r_i > 0$ for all $1 \le i \le k$ and $\sum_{i=1}^{k} r_i = 1$, where $a_{ii} = r_i^{-1} - 1$ for $1 \le i \le k$ and $a_{ij} = -1$ for $i \ne j$, $1 \le i, j \le k$. Let $B_k = (b_{ij})$ be a $k \times k$ diagonal matrix and $C_k = B_k A_k B_k$ with $b_{ii} = r_i^{1/2}$ for $1 \le i \le k$; then $C_k^{\tau} = C_k$ and $C_k$ is an idempotent matrix with $tr(C_k) = k - 1$.
Proof 
(Proof of Lemma 1). Let $R_k = B_k^2$ and $1_k = (1, 1, \dots, 1)^{\tau}$. Then $A_k = R_k^{-1} - 1_k 1_k^{\tau}$. It can be shown that $R_k^{1/2} 1_k 1_k^{\tau} R_k^{1/2} = ((r_i r_j)^{1/2})_{k \times k}$, where $(r_i r_j)^{1/2}$ is the $(i,j)$ element of the matrix. Combining with $\sum_{i=1}^{k} r_i = 1$, one can show that $((r_i r_j)^{1/2})_{k \times k}$ is an idempotent matrix. Notice that $C_k = I_k - R_k^{1/2} 1_k 1_k^{\tau} R_k^{1/2}$. It follows that $C_k$ is an idempotent matrix and $tr(C_k) = \sum_{i=1}^{k} (1 - r_i) = k - 1$, and it is clear that $C_k^{\tau} = C_k$. The proof of Lemma 1 is thus complete. □
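Lemma 1 is easy to check numerically for a particular choice of weights. The sketch below builds $A_k$, $B_k$ and $C_k = B_k A_k B_k$ with plain Python lists for $r = (0.3, 0.3, 0.4)$, which corresponds to the sample-size proportions of the $(30, 30, 40)$ design; the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def lemma1_c_matrix(r):
    """C_k = B_k A_k B_k with a_ii = 1/r_i - 1, a_ij = -1 and b_ii = sqrt(r_i)."""
    k = len(r)
    a = [[(1.0 / r[i] - 1.0) if i == j else -1.0 for j in range(k)] for i in range(k)]
    b = [[math.sqrt(r[i]) if i == j else 0.0 for j in range(k)] for i in range(k)]
    return matmul(matmul(b, a), b)


r = [0.3, 0.3, 0.4]
c = lemma1_c_matrix(r)
c_squared = matmul(c, c)

trace = sum(c[i][i] for i in range(len(r)))
max_idempotent_gap = max(abs(c_squared[i][j] - c[i][j])
                         for i in range(len(r)) for j in range(len(r)))
print(trace, max_idempotent_gap)
```

The trace should equal $k - 1 = 2$ and the gap between $C_k^2$ and $C_k$ should vanish up to rounding error.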
Proof of Theorem 1 
Let
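$$S_{11} = -E\{g(X;\theta_0)\, g^{\tau}(X;\theta_0)\}, \quad S_{12} = E\{\partial g(X;\theta)/\partial\theta\,|_{\theta = \theta_0}\},$$
$$S_{21} = S_{12}^{\tau}, \quad S_{22.1} = S_{21} (-S_{11})^{-1} S_{12}.$$
Throughout the proof, we assume that $H_0$ holds true and the true value of $\theta$ is $\theta_0$. Rewrite $\lambda_n$ as
$$\lambda_n = \lambda_{n1} \Big/ \prod_{i} \lambda_{n2.i},$$
where
$$\lambda_{n1} = \prod_{i,j} \{1 + t^{\tau}(\hat\theta_n)\, g(X_{ij};\hat\theta_n)\}^{-1}$$
and
$$\lambda_{n2.i} = \prod_{j} \{1 + t_i^{\tau}(\hat\theta_{n_i})\, g(X_{ij};\hat\theta_{n_i})\}^{-1}.$$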
Employing the result in the proof of Theorem 2 in Qin and Lawless [13], we have
$$-2\log\lambda_{n1} = \Big(n^{-1/2}\sum_{i=1}^{k} Y_i\Big)^{\tau} A\, \Big(n^{-1/2}\sum_{i=1}^{k} Y_i\Big) + o_p(1),$$
where
$$A = -S_{11}^{-1}\{I + S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1}\}, \quad Y_i = \sum_{j=1}^{n_i} g(X_{ij};\theta_0), \quad 1 \le i \le k,$$
with $I$ being the identity matrix. Similarly, for $1 \le i \le k$,
$$-2\log\lambda_{n2.i} = (n_i^{-1/2} Y_i)^{\tau} A\, (n_i^{-1/2} Y_i) + o_p(1).$$
Since $2\log\lambda_n = \sum_{i=1}^{k} (-2\log\lambda_{n2.i}) - (-2\log\lambda_{n1})$, it follows that
$$2\log\lambda_n = n^{-1/2}(Y_1^{\tau}, \dots, Y_k^{\tau})\, (M_k \otimes A)\, n^{-1/2}(Y_1^{\tau}, \dots, Y_k^{\tau})^{\tau} + o_p(1), \tag{14}$$
where $\otimes$ is the Kronecker product and $M_k = (m_{ij})$ is a $k \times k$ symmetric matrix with $m_{ii} = n/n_i - 1$ for $1 \le i \le k$ and $m_{ij} = -1$ for $i \ne j$, $1 \le i, j \le k$. Let $N_k = diag((n/n_1)^{1/2}, \dots, (n/n_k)^{1/2})_{k \times k}$ and
$$Z_n = \{N_k \otimes (-S_{11})^{-1/2}\}\, (n^{-1/2} Y_1^{\tau}, \dots, n^{-1/2} Y_k^{\tau})^{\tau}.$$
According to the central limit theorem,
$$Z_n \xrightarrow{d} N(0, I_{rk}). \tag{15}$$
From (14), we have
$$2\log\lambda_n = Z_n^{\tau}\, \{N_k \otimes (-S_{11})^{-1/2}\}^{-1} (M_k \otimes A) \{N_k \otimes (-S_{11})^{-1/2}\}^{-1} Z_n + o_p(1). \tag{16}$$
It can be shown, by the properties of the Kronecker product, that
$$\{N_k \otimes (-S_{11})^{-1/2}\}^{-1} (M_k \otimes A) \{N_k \otimes (-S_{11})^{-1/2}\}^{-1} = \{N_k^{-1} \otimes (-S_{11})^{1/2}\} (M_k \otimes A) \{N_k^{-1} \otimes (-S_{11})^{1/2}\} = S, \tag{17}$$
where $S = (N_k^{-1} M_k N_k^{-1}) \otimes \{(-S_{11})^{1/2} A\, (-S_{11})^{1/2}\}$. It is clear that $(-S_{11})^{1/2} \cdot A \cdot (-S_{11})^{1/2}$ is symmetric and idempotent with a trace equal to $r - p$. On the other hand, using Lemma 1 with $r_i = n_i/n$, we can see that $N_k^{-1} M_k N_k^{-1}$ is symmetric and idempotent with a trace equal to $k - 1$. It follows that $S$ must be symmetric and idempotent with a trace equal to $(r-p)(k-1)$. Theorem 1 is therefore proved by following Equations (14)–(17). □
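The last step of the proof relies on the fact that the Kronecker product of two symmetric idempotent matrices is again symmetric and idempotent, with a trace equal to the product of the two traces. The sketch below checks this numerically. The matrices `p_mat` and `q_mat` are stand-ins chosen for illustration: `p_mat` plays the role of $N_k^{-1} M_k N_k^{-1}$ for $k = 3$ (trace 2), and `q_mat` is a hypothetical $2 \times 2$ projection with trace 1, as would arise when $r - p = 1$.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[p][q]
             for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for p in range(len(b))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# p_mat = I - u u^T with u = (sqrt(r_1), sqrt(r_2), sqrt(r_3)), r = (0.3, 0.3, 0.4).
u = [math.sqrt(0.3), math.sqrt(0.3), math.sqrt(0.4)]
p_mat = [[(1.0 if i == j else 0.0) - u[i] * u[j] for j in range(3)] for i in range(3)]

# q_mat projects onto the direction (1, 1) / sqrt(2); it is idempotent with trace 1.
q_mat = [[0.5, 0.5], [0.5, 0.5]]

s_mat = kron(p_mat, q_mat)
s_squared = matmul(s_mat, s_mat)
gap = max(abs(s_squared[i][j] - s_mat[i][j]) for i in range(6) for j in range(6))
print(trace(s_mat), gap)
```

With these stand-ins the trace is $2 \times 1 = 2$, matching $(r-p)(k-1)$ for $k = 3$ and $r - p = 1$. A quadratic form in a standard normal vector with a symmetric idempotent matrix of trace $d$ is $\chi^2_d$ distributed, which is how the degrees of freedom in Theorem 1 arise.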

Author Contributions

Conceptualization, R.W. and Y.Q.; methodology, R.W. and Y.Q.; software, R.W.; validation, Y.Q.; formal analysis, Y.Q.; investigation, R.W. and Y.Q.; resources, R.W. and Y.Q.; data curation, R.W.; writing—original draft preparation, R.W.; writing—review and editing, R.W. and Y.Q.; visualization, R.W.; supervision, Y.Q.; project administration, Y.Q.; funding acquisition, R.W. and Y.Q. All authors have read and agreed to the published version of the manuscript.

Funding

The project was partially supported by the National Natural Science Foundation of China (12061017), the Natural Science Foundation of Guangxi (AD19245102, 2020GXNSFAA159155), Improving the Basic Scientific Research Ability of Young and Middle-aged Teachers in Guangxi Universities under Grant 2022KY0575, Yulin Normal University Research Grant (2018YJKY29).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable to this article, as no datasets were generated or analyzed during the current study. All data generated or analyzed in this study are generated by the corresponding probability distribution, and its parameters are presented in Table 2 of the numerical simulation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dou, Y.Q.; Ricks, T.M.; DuBien, J.L.; Lacy, T.E., Jr.; Liu, Y.-C. Response surface modeling to facilitate the parametric study of transversely impacted pressurized pipelines. Thin-Walled Struct. 2017, 119, 646–652.
  2. Lehmann, E.L. Nonparametrics: Statistical Methods Based on Ranks; McGraw-Hill International Book Company: New York, NY, USA, 1975.
  3. Owen, A.B. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 1988, 75, 237–249.
  4. Owen, A.B. Empirical likelihood ratio confidence regions. Ann. Stat. 1990, 18, 90–120.
  5. Hall, P. The Bootstrap and Edgeworth Expansion; Springer: New York, NY, USA, 1992.
  6. Hall, P.; La Scala, B. Methodology and algorithms of empirical likelihood. Int. Stat. Rev. 1990, 58, 109–127.
  7. Chen, J.; Qin, J. Empirical likelihood estimation for finite populations and the effective usage of auxiliary information. Biometrika 1993, 80, 107–116.
  8. Zhang, B. M-estimation and quantile estimation in the presence of auxiliary information. J. Stat. Plan. Inference 1995, 44, 77–94.
  9. Tarima, S.; Pavlov, D. Using auxiliary information in statistical function estimation. ESAIM Probab. Stat. 2006, 10, 11–23.
  10. Albertus, M. Asymptotic Z and chi-squared tests with auxiliary information. Metrika 2022.
  11. Hansen, L.P. Large sample properties of generalized method of moments estimators. Econometrica 1982, 50, 1029–1054.
  12. Godambe, V.P.; Heyde, C.C. Quasi-likelihood and optimal estimation. Int. Stat. Rev. 1987, 55, 231–244.
  13. Qin, J.; Lawless, J. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
  14. Wang, D.; Chen, S.X. Empirical likelihood for estimating equations with missing values. Ann. Stat. 2009, 37, 490–517.
  15. Zhou, Y.; Wan, A.T.K.; Wang, X. Estimating equations inference with missing data. J. Am. Stat. Assoc. 2008, 103, 1187–1199.
Figure 1. Q-Q plots of ELRT and KWT for the overall comparison of the 3 Bernoulli distributions, with $p_0 = 0.25$ and sample sizes (30, 30, 40).
Figure 2. Q-Q plots of ELRT and KWT for the overall comparison of the 3 Poisson distributions, with $\lambda_0 = 1$ and sample sizes (30, 30, 40).
Figure 3. Q-Q plots of ELRT and KWT for the overall comparison of the 3 exponential distributions, with $\theta_0 = 1$ and sample sizes (30, 30, 40).
Figure 4. Q-Q plots of ELRT and KWT for the overall comparison of the 3 Bernoulli distributions, with $p_0 = 0.25$ and sample sizes (110, 110, 120).
Figure 5. Q-Q plots of ELRT and KWT for the overall comparison of the 3 Poisson distributions, with $\lambda_0 = 1$ and sample sizes (110, 110, 120).
Figure 6. Q-Q plots of ELRT and KWT for the overall comparison of the 3 exponential distributions, with $\theta_0 = 1$ and sample sizes (110, 110, 120).
Table 1. Distribution families investigated in simulations.
| Density Function | Distribution Name | Notation | Parameter Space | Value |
| --- | --- | --- | --- | --- |
| $p^x (1-p)^{1-x}$ | Bernoulli($p$) | $b(p)$ | $(0, 1)$ | $x = 0, 1$ |
| $\frac{\lambda^x}{x!} e^{-\lambda}$ | Poisson($\lambda$) | $P(\lambda)$ | $(0, \infty)$ | $x = 0, 1, \dots$ |
| $\frac{1}{\theta} e^{-x/\theta}$ | Exponential($\theta$) | $E(\theta)$ | $(0, \infty)$ | $(0, \infty)$ |
Table 2. Rejection rates of ELRT and KWT under sample sizes (30, 30, 40) and (110, 110, 120) and different alternatives indicated in terms of parameters.
| Distribution | Alternative Hypothesis Combination | ELRT (30, 30, 40) | KWT (30, 30, 40) | ELRT (110, 110, 120) | KWT (110, 110, 120) |
| --- | --- | --- | --- | --- | --- |
| $b(p)$ | (0.1, 0.3, 0.5) | 0.8303 | 0.8154 | 0.9376 | 0.8952 |
| | (0.3, 0.3, 0.5) | 0.8210 | 0.8103 | 0.9223 | 0.9109 |
| $P(\lambda)$ | (0.1, 0.3, 0.4) | 0.7756 | 0.7301 | 0.9012 | 0.8950 |
| | (0.2, 0.2, 0.4) | 0.8020 | 0.7850 | 0.9051 | 0.8700 |
| $E(\theta)$ | (0.2, 0.3, 0.5) | 0.8425 | 0.7866 | 0.9133 | 0.8806 |
| | (0.2, 0.2, 0.3) | 0.8644 | 0.8157 | 0.9256 | 0.9102 |
Table 3. Rejection rates of ELRT and KWT under sample sizes (30, 30, 40) and (110, 110, 120) and the corresponding null hypothesis.
| Distribution | Null Hypothesis Combination | ELRT (30, 30, 40) | KWT (30, 30, 40) | ELRT (110, 110, 120) | KWT (110, 110, 120) |
| --- | --- | --- | --- | --- | --- |
| $b(p)$, $p_0 = 0.25$ | (0.25, 0.25, 0.25) | 0.0490 | 0.0511 | 0.0510 | 0.0505 |
| $P(\lambda)$, $\lambda_0 = 1$ | (1, 1, 1) | 0.0424 | 0.0455 | 0.0490 | 0.0510 |
| $E(\theta)$, $\theta_0 = 1$ | (1, 1, 1) | 0.0415 | 0.0422 | 0.0480 | 0.0498 |
Table 4. Rejection rates of ELRT and T&P (2006) under sample sizes (30, 30, 40) and (110, 110, 120) and different alternatives indicated in terms of parameters.
| Distribution | Alternative Hypothesis Combination | ELRT (30, 30, 40) | T&P (2006) (30, 30, 40) | ELRT (110, 110, 120) | T&P (2006) (110, 110, 120) |
| --- | --- | --- | --- | --- | --- |
| $P(\lambda)$ | (0.1, 0.3, 0.4) | 0.7756 | 0.0.640 | 0.9012 | 0.9930 |
| | (0.2, 0.2, 0.4) | 0.8020 | 0.3750 | 0.9051 | 0.7800 |
| | (2, 3, 4) | 0.9260 | 0.3633 | 0.9722 | 0.9933 |
| $E(\theta)$ | (0.2, 0.3, 0.5) | 0.8425 | 0.5230 | 0.9133 | 0.9820 |
| | (0.2, 0.2, 0.3) | 0.8644 | 0.1730 | 0.9256 | 0.6120 |
| | (2, 3, 4) | 0.6810 | 0.1100 | 0.8550 | 0.8610 |

Share and Cite

MDPI and ACS Style

Wu, R.; Qin, Y. Empirical Likelihood Ratio Tests for Homogeneity of Multiple Populations in the Presence of Auxiliary Information. Mathematics 2022, 10, 2341. https://doi.org/10.3390/math10132341


Chicago/Turabian Style

Wu, Ronghuo, and Yongsong Qin. 2022. "Empirical Likelihood Ratio Tests for Homogeneity of Multiple Populations in the Presence of Auxiliary Information" Mathematics 10, no. 13: 2341. https://doi.org/10.3390/math10132341
