Article

Homogeneity Test for Multiple Semicontinuous Data with the Density Ratio Model

1 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
2 School of Mathematical Science, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(17), 3789; https://doi.org/10.3390/math11173789
Submission received: 25 June 2023 / Revised: 5 August 2023 / Accepted: 2 September 2023 / Published: 4 September 2023
(This article belongs to the Special Issue Statistical Analysis: Theory, Methods and Applications)

Abstract

The density ratio model (DRM) has been widely used in many research fields. For testing the homogeneity of the model, the empirical likelihood ratio test (ELRT) has been shown to be valid. In this paper, we propose a parametric test procedure. We transform the hypothesis of homogeneity into one on the equality of the mean parameters of an exponential family of distributions. We then propose a modified Wald test and derive its asymptotic power. We further apply it to the semicontinuous case, in which the samples contain an excess of zeros. Simulation studies show that the new test controls the type-I error better than the ELRT while retaining competitive power. Because the test statistic has a simple closed form, its computational cost is small. We also use a real data example to illustrate the effectiveness of our test.

1. Introduction

The density ratio model (DRM) was first introduced by Anderson [1] and later popularized by Qin and Zhang [2], who established the connection between the two-sample DRM and the logistic regression model in case–control studies. The DRM describes the difference between two independent samples in a semi-parametric way. Assume that $X_{01}, X_{02}, \ldots, X_{0n_0}$ and $X_{11}, X_{12}, \ldots, X_{1n_1}$ are two samples drawn independently from the cumulative distribution functions $G_0$ and $G_1$, respectively. The DRM postulates that
$$dG_1(x) = \exp\{\alpha + \beta^\top q(x)\}\, dG_0(x), \tag{1}$$
where $q(x)$ is a pre-specified $d$-dimensional basis function, while $\alpha$ and $\beta$ are unknown parameters. The DRM generalizes to the $(m+1)$-sample case as follows:
$$X_{01}, \ldots, X_{0n_0} \sim G_0(x), \quad X_{11}, \ldots, X_{1n_1} \sim G_1(x), \quad \ldots, \quad X_{m1}, \ldots, X_{mn_m} \sim G_m(x), \tag{2}$$
where
$$dG_i(x) = \exp\{\alpha_i + \beta_i^\top q(x)\}\, dG_0(x)$$
for $i = 1, 2, \ldots, m$. Even though the form of the density $g_i(x)$ of $G_i$ is unspecified, many parametric distribution families belong to the DRM, including the normal, exponential, and gamma distributions, among others.
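As a quick numerical check (our own illustration, not part of the original analysis), the sketch below verifies that two of the families mentioned above satisfy the DRM: normal distributions with a common variance, with $q(x) = x$, and gamma distributions, with $q(x) = (x, \log x)$. The function names `normal_drm` and `gamma_drm` are hypothetical helpers that return the implied $(\alpha, \beta)$.

```python
import math

def normal_logpdf(x, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution with the given shape and scale at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def normal_drm(mu, sigma=1.0):
    """(alpha, beta) with q(x) = x for g1 = N(mu, sigma^2) against g0 = N(0, sigma^2)."""
    return -mu ** 2 / (2.0 * sigma ** 2), mu / sigma ** 2

def gamma_drm(a1, s1, a0, s0):
    """(alpha, (beta_1, beta_2)) with q(x) = (x, log x) for Gamma(a1, s1) against Gamma(a0, s0)."""
    alpha = (math.lgamma(a0) + a0 * math.log(s0)) - (math.lgamma(a1) + a1 * math.log(s1))
    return alpha, (1.0 / s0 - 1.0 / s1, a1 - a0)

# The log density ratio log g1(x) - log g0(x) should equal alpha + beta^T q(x).
alpha, beta = normal_drm(0.7)
x = 1.3
print(normal_logpdf(x, 0.7) - normal_logpdf(x, 0.0), alpha + beta * x)
```

Both printed numbers agree, because the quadratic terms in $x$ cancel when the variances are equal; with unequal variances, $q(x)$ would also need an $x^2$ component.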
Due to its flexibility and utility, the DRM has attracted increasing attention. Zhang [3] proposed a weighted Kolmogorov–Smirnov-type statistic to test the validity of the DRM based on case–control data. Qin [4] and Zou et al. [5] applied the DRM to the semi-parametric mixture model and developed test statistics based on the empirical likelihood function. Zhang [6] derived a quantile estimator under a two-sample semi-parametric model, and Chen and Liu [7] generalized the estimator to the $(m+1)$-sample case. Another problem of interest is testing the homogeneity of the DRM, that is, testing whether $G_0 = G_1 = \cdots = G_m$. Fokianos et al. [8] outlined a method based on the classical normal-based one-way analysis of variance. Cai et al. [9] studied the properties of dual empirical likelihood ratio tests for general hypotheses on the parameters. Moreover, let $G_0$ be the initial cumulative distribution function (cdf) of a population, and let $G_1$ be the cdf of the weighted distribution of $G_0$, so that their densities are related by
$$g_1(x) = \frac{w(x)}{E[w(X)]}\, g_0(x),$$
where $X$ is a random variable with density $g_0(x)$. In the context of the DRM, the weight function $w(x)$ plays the role of $e^{\alpha + \beta^\top q(x)}$. Thus, the DRM falls within the framework of weighted distributions, which have many applications in various fields. Detecting or estimating the weight function $w(x)$ is a problem of interest in this framework; see Patil and Rao [10], Rao [11,12] and Lele and Keim [13].
Recent research on the DRM has mainly relied on the empirical likelihood function, which we briefly introduce below. Setting $\alpha_0 = 0$ and $\beta_0 = 0$, the likelihood function of model (2) has the form
$$L = \prod_{i=0}^{m} \prod_{j=1}^{n_i} dG_i(x_{ij}) = \prod_{i=0}^{m} \prod_{j=1}^{n_i} \exp\{\alpha_i + \beta_i^\top q(x_{ij})\}\, dG_0(x_{ij}).$$
Suppose $G_0$ is restricted to a discrete distribution of the form
$$G_0(x) = \sum_{i=0}^{m} \sum_{j=1}^{n_i} p_{ij}\, I(x_{ij} \le x),$$
where the $p_{ij}$ are constrained by
$$p_{ij} > 0 \quad \text{and} \quad \sum_{i=0}^{m} \sum_{j=1}^{n_i} p_{ij} \exp\{\alpha_t + \beta_t^\top q(x_{ij})\} = 1$$
for $t = 0, 1, \ldots, m$. Then, the Lagrange multiplier method described in Qin and Lawless [14] is used to obtain the maximum empirical likelihood estimates of $(\alpha_i, \beta_i)$. However, the type-I error of the empirical likelihood ratio test cannot be well controlled in finite samples. To deal with this problem, Wang et al. [15] suggested a nonparametric bootstrap procedure. However, the computational cost of the bootstrap procedure is non-negligible, especially when $m$ is large.
There is also increasing interest in samples that contain zero values. This phenomenon arises in many research fields, such as meteorology, health, economics, and the life sciences; see Tu and Zhou [16], Muralidharan and Kale [17] and Kassahun-Yimer et al. [18]. For example, in meteorology, a group of zero observations may correspond to dry days on which no rainfall is recorded. Another example arises in dietary intake studies, where zero observations may occur for food components that are consumed episodically. In these examples, the samples consist of two parts: the zero observations and the positive observations. Such a distribution is called a semicontinuous distribution and has the form
$$F(x) = p\, I(x = 0) + (1 - p)\, I(x > 0)\, G(x), \quad x \ge 0,$$
where $p$ is the probability of drawing a zero observation and $G(x)$ is a continuous distribution on the positive half-line. We refer to the reviews of Neelon et al. [19,20] for more details. In this paper, we model the continuous parts $G(x)$ with the DRM, which brings the advantages introduced above. Thus, the model becomes
$$X_{01}, \ldots, X_{0n_0} \sim F_0(x), \quad X_{11}, \ldots, X_{1n_1} \sim F_1(x), \quad \ldots, \quad X_{m1}, \ldots, X_{mn_m} \sim F_m(x), \tag{3}$$
where
$$F_i(x) = p_i\, I(x = 0) + (1 - p_i)\, I(x > 0)\, G_i(x), \quad x \ge 0,$$
for $i = 0, 1, 2, \ldots, m$, where $I$ is the indicator function.
Testing the homogeneity of model (3) is a fundamental problem in real applications, and two-part tests have been proposed for this purpose. For example, differences in the distribution of precipitation in certain areas across years may influence agricultural irrigation strategies. Furthermore, in colorectal cancer clinical trials, it is important to compare the efficacy and safety of two or more treatment arms; see Lachenbruch [21], Su et al. [22], Smith et al. [23] and Wang and Tu [24]. A two-part test combines a test for the binomial (zero versus non-zero) part with a test for the continuous responses. For the two-sample case, Wang et al. [15] suggested a $\chi^2$ test for the former, and either a Wilcoxon–Mann–Whitney rank-sum test or a two-sample t-test for the latter. For the $(m+1)$-sample case, the latter can be replaced by a Kruskal–Wallis rank-sum test or an ANOVA F-test; see, for example, Wilcox [25], Hallstrom [26] and Pauly et al. [27]. However, in our view, these tests may perform poorly in heteroskedastic settings.
In this paper, we propose an efficient method based on the exponential family of distributions. First, the problem of testing homogeneity is transformed into testing the equality of the mean parameters. Second, a Wald test statistic is proposed to test these equalities. Since $g_0$ is unknown, we modify the Wald test statistic using the sample from $g_0$. The modified statistic has a simple closed form, and we show that it converges in distribution to a $\chi^2$ distribution under the null hypothesis. We also give its local asymptotic power. Third, since the Bernoulli distribution can be regarded as a DRM, we obtain a combined modified Wald test for the semicontinuous case. Finally, simulation studies illustrate that the computational cost of the modified Wald test is much lower than that of the bootstrap procedure, while it consistently controls the type-I error better than the empirical likelihood ratio test. Moreover, the power of the modified Wald test is competitive.
The rest of the paper is organized as follows. In Section 2, we propose the method for testing the homogeneity of the two-sample model for both continuous and semicontinuous distributions. In Section 3, we generalize the results to the multiple-sample case. In Section 4, we illustrate the performance of the modified Wald test through simulations and compare it with the empirical likelihood ratio tests. Finally, we analyze a real data example to show the practicability of our method and give the conclusions in the last section.

2. Two-Sample Case

2.1. Density Ratio Model

In this section, we assume that $X_{01}, X_{02}, \ldots, X_{0n_0}$ and $X_{11}, X_{12}, \ldots, X_{1n_1}$ are two independent samples drawn from $G_0(x)$ and $G_1(x)$, respectively. It is further assumed that, for a given $d$-dimensional function $q(x) = (q_1(x), q_2(x), \ldots, q_d(x))^\top$,
$$g_1(x) = e^{\alpha + \beta^\top q(x)}\, g_0(x),$$
where $g_1(x)$ and $g_0(x)$ are the densities of $G_1(x)$ and $G_0(x)$, respectively, with respect to a $\sigma$-finite measure $\nu$. The hypotheses for testing homogeneity are
$$H_0: g_0 = g_1 \quad \text{vs.} \quad H_1: g_0 \ne g_1. \tag{4}$$
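Since $g_1(x)$ is a density function, we have
$$\int e^{\alpha + \beta^\top q(x)}\, g_0(x)\, d\nu(x) = e^{\alpha} \int e^{\beta^\top q(x)}\, g_0(x)\, d\nu(x) = 1.$$
Hence, $\alpha$ is determined by $\beta$: there is a function $A(\beta)$ such that
$$e^{\alpha} = e^{-A(\beta)}, \qquad A(\beta) = \log \int e^{\beta^\top q(x)}\, g_0(x)\, d\nu(x).$$
Then,
$$g_1(x) = e^{\beta^\top q(x) - A(\beta)}\, g_0(x).$$
Construct the exponential family of distributions
$$\mathcal{P} = \left\{ e^{\beta^\top q(x) - A(\beta)}\, g_0(x) : \beta \in \Omega_0 \right\},$$
where
$$\Omega_0 = \left\{ \beta : \int e^{\beta^\top q(x)}\, g_0(x)\, d\nu(x) < \infty \right\}$$
is the natural parameter space. Within the family $\mathcal{P}$, the hypotheses (4) are equivalent to
$$H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \ne 0. \tag{6}$$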
We impose two simple assumptions on the family $\mathcal{P}$.
Assumption 1.
$\mathcal{P}$ is a full-rank exponential family of distributions.
Under Assumption 1, by the properties of the exponential family, the Fisher information matrix of $\mathcal{P}$ is continuous and positive definite, that is,
$$I(\beta) = \mathrm{cov}_\beta(q(X)) > 0$$
for any interior point $\beta$ of $\Omega_0$.
Assumption 2.
The origin $0$ is an interior point of $\Omega_0$.
Although $0 \in \Omega_0$ always holds because $g_0(x)$ is a density, it may not be an interior point. For example, if $d = 1$, $q(x) = x^4$, and $g_0(x) = \phi(x)$ is the density of the standard normal distribution, then $\Omega_0 = (-\infty, 0]$.
Hypotheses (6) are expressed in terms of the natural parameter $\beta$ of $\mathcal{P}$. We further want to express them in terms of the mean parameter of $\mathcal{P}$, which is defined as
$$m(\beta) = E_\beta(q(X)) = \int q(x)\, e^{\beta^\top q(x) - A(\beta)}\, g_0(x)\, d\nu(x).$$
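To make the mean parameter concrete, recall two standard exponential-family identities (general facts, not results of this paper): differentiating $A(\beta)$ under the integral sign yields $m(\beta)$ and $I(\beta)$. The Gaussian case in the second part is an assumed illustration only.

```latex
\nabla A(\beta)
  = \frac{\int q(x)\, e^{\beta^\top q(x)} g_0(x)\, d\nu(x)}{\int e^{\beta^\top q(x)} g_0(x)\, d\nu(x)}
  = m(\beta),
\qquad
\nabla^2 A(\beta) = \mathrm{cov}_\beta\bigl(q(X)\bigr) = I(\beta).

% Illustration: d = 1, q(x) = x, g_0 = \phi (standard normal density)
A(\beta) = \log \int e^{\beta x} \phi(x)\, dx = \frac{\beta^2}{2},
\qquad m(\beta) = \beta,
\qquad I(\beta) = 1.
```

In this example $m$ is strictly increasing, so $m(\beta) = m(0)$ only when $\beta = 0$, which is Lemma 1 in its simplest form. More generally, $I(\beta) > 0$ makes $A$ strictly convex, and the gradient of a strictly convex function is one-to-one.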
We need the following lemma.
Lemma 1.
Under Assumptions 1 and 2, $\beta = 0$ if and only if $m(\beta) = m(0)$.
The proof is given in Appendix A.
Lemma 1 shows that the hypotheses (6) are equivalent to
$$H_0: m(\beta) = m(0) \quad \text{vs.} \quad H_1: m(\beta) \ne m(0). \tag{7}$$
First, consider the case where $g_0$ is known. Based on the data $X_1 = (X_{11}, X_{12}, \ldots, X_{1n_1})$, the maximum likelihood estimator of $m(\beta)$ is
$$\bar{q}^{(1)} \equiv \frac{1}{n_1} \sum_{i=1}^{n_1} q(X_{1i}).$$
The Wald test statistic for hypotheses (7) is then
$$T(X_1) = n_1 \left(\bar{q}^{(1)} - m(0)\right)^\top \left(I(0)\right)^{-1} \left(\bar{q}^{(1)} - m(0)\right). \tag{8}$$
When $\beta = 0$, by the central limit theorem,
$$\sqrt{n_1}\left(\bar{q}^{(1)} - m(0)\right) \xrightarrow{d} N(0, I(0)),$$
where $\xrightarrow{d}$ denotes convergence in distribution. Hence, $T(X_1) \xrightarrow{d} \chi^2(d)$. The Wald test with significance level $\alpha$ is given by the critical region
$$\{x_1 : T(x_1) \ge \chi^2_{1-\alpha}(d)\}, \tag{9}$$
where $\chi^2_{1-\alpha}(d)$ denotes the $(1-\alpha)$-quantile of $\chi^2(d)$.
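A minimal sketch of the known-$g_0$ Wald test (8)–(9), assuming for illustration that $g_0$ is the standard normal density and that the basis is $q(x) = (x, x^2)$, so that $m(0) = (0, 1)$ and $I(0) = \mathrm{diag}(1, 2)$. The names `q_basis` and `wald_known_g0` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def q_basis(x):
    """Hypothetical basis q(x) = (x, x^2), returned as an (n, 2) array."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, x ** 2])

def wald_known_g0(x1, q, m0, info0):
    """Wald statistic (8): n1 (qbar1 - m(0))^T I(0)^{-1} (qbar1 - m(0))."""
    qx = q(x1)                          # shape (n1, d)
    diff = qx.mean(axis=0) - m0         # qbar^(1) - m(0)
    return qx.shape[0] * float(diff @ np.linalg.solve(info0, diff))

# Assumed known baseline g0 = N(0, 1): E[X] = 0, E[X^2] = 1,
# Var(X) = 1, Var(X^2) = 2, Cov(X, X^2) = E[X^3] = 0.
m0 = np.array([0.0, 1.0])
info0 = np.diag([1.0, 2.0])

rng = np.random.default_rng(1)
x1 = rng.normal(loc=0.3, scale=1.0, size=200)   # sample from a shifted g1
t_stat = wald_known_g0(x1, q_basis, m0, info0)
reject = t_stat >= chi2.ppf(0.95, df=2)         # critical region (9) with alpha = 0.05
```

Solving the linear system instead of forming $I(0)^{-1}$ explicitly is the usual numerically stable choice.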
However, test (9) is not applicable when $g_0(x)$ is unknown, because $m(0)$ and $I(0)$ in $T(X_1)$ are then unknown. Fortunately, the sample $X_0 = (X_{01}, \ldots, X_{0n_0})$ from $g_0(x)$ can be used to estimate $m(0)$ and $I(0)$ instead. The estimators are
$$\widehat{m(0)} = \bar{q}^{(0)} = \frac{1}{n_0} \sum_{i=1}^{n_0} q(X_{0i}), \qquad \widehat{I(0)} = S_0^2 = \frac{1}{n_0 - 1} \sum_{i=1}^{n_0} \left(q(X_{0i}) - \bar{q}^{(0)}\right)\left(q(X_{0i}) - \bar{q}^{(0)}\right)^\top.$$
Since $\bar{q}^{(1)} - \bar{q}^{(0)}$ has covariance matrix $(1/n_0 + 1/n_1)\, I(0)$ under the null hypothesis, the test statistic (8) can be modified to
$$T(X_0, X_1) = \frac{n_0 n_1}{n_0 + n_1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)^\top \left(S_0^2\right)^{-1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right).$$
We refer to this statistic as a modified Wald statistic.
Since the two populations are identical under the null hypothesis, we can also let
$$S_1^2 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left(q(X_{1i}) - \bar{q}^{(1)}\right)\left(q(X_{1i}) - \bar{q}^{(1)}\right)^\top$$
and use the pooled estimator
$$S^2 = \frac{1}{n_0 + n_1 - 2} \left[(n_0 - 1) S_0^2 + (n_1 - 1) S_1^2\right]$$
as an estimate of $I(0)$, which gives
$$T(X_0, X_1) = \frac{n_0 n_1}{n_0 + n_1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right). \tag{11}$$
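A sketch of the pooled statistic (11), assuming the basis values $q(X_{0i})$ and $q(X_{1i})$ have already been computed. A useful sanity check (our observation, which follows directly from the formula): for $d = 1$ the statistic equals the square of the pooled two-sample t statistic, linking it to the t-test mentioned in the Introduction. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind

def modified_wald_two_sample(q0, q1):
    """Pooled modified Wald statistic (11).

    q0, q1: arrays of shape (n0, d) and (n1, d) (or 1-D when d = 1)
    holding q(X_0i) and q(X_1i).
    """
    q0 = np.asarray(q0, dtype=float).reshape(len(q0), -1)
    q1 = np.asarray(q1, dtype=float).reshape(len(q1), -1)
    n0, n1 = q0.shape[0], q1.shape[0]
    diff = q1.mean(axis=0) - q0.mean(axis=0)
    # np.cov(..., rowvar=False) uses the 1/(n - 1) normalisation of S_0^2 and S_1^2
    s0 = np.atleast_2d(np.cov(q0, rowvar=False))
    s1 = np.atleast_2d(np.cov(q1, rowvar=False))
    s_pooled = ((n0 - 1) * s0 + (n1 - 1) * s1) / (n0 + n1 - 2)
    return n0 * n1 / (n0 + n1) * float(diff @ np.linalg.solve(s_pooled, diff))

# Usage with the hypothetical basis q(x) = (x, x^2):
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=40), rng.normal(0.5, 1.0, size=60)
stat = modified_wald_two_sample(np.column_stack([x0, x0 ** 2]),
                                np.column_stack([x1, x1 ** 2]))

# For d = 1 the statistic equals the squared pooled two-sample t statistic.
a, b = rng.normal(size=30), rng.normal(0.4, 1.0, size=35)
t_sq = ttest_ind(b, a, equal_var=True).statistic ** 2
```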
Assumption 3.
Let $n = n_0 + n_1$. As $n \to \infty$,
$$\frac{n_i}{n} \to r_i \in (0, 1), \quad i = 0, 1.$$
Theorem 1.
Suppose that Assumptions 1–3 hold. Then,
1. 
Under $H_0$ in (7),
$$T(X_0, X_1) \xrightarrow{d} \chi^2(d).$$
2. 
Take $\beta_n = h/\sqrt{n}$ with $h \in \mathbb{R}^d$. Under this alternative,
$$T(X_0, X_1) \xrightarrow{d} \chi^2(d, \delta),$$
where $\delta = r_0 r_1\, h^\top I(0)\, h$ is the non-central parameter.
The proof is given in Appendix A.
Now, the modified Wald test with level $\alpha$ is determined by the critical region
$$\{(x_0, x_1) : T(x_0, x_1) > \chi^2_{1-\alpha}(d)\}.$$
The local asymptotic power of the modified Wald test is given by
$$P\left(V > \chi^2_{1-\alpha}(d)\right),$$
where $V \sim \chi^2(d, \delta)$. Since $r_0 + r_1 = 1$, $\delta$ is maximized at $r_0 = r_1 = 1/2$, i.e., $n_0 = n_1$. Furthermore, since the power increases with $\delta$, it increases in $h^\top I(0)\, h$.
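The local asymptotic power above can be evaluated directly with SciPy's non-central $\chi^2$ distribution. The sketch below (a hypothetical helper; the value $h^\top I(0) h = 20$ is an arbitrary assumption) shows that the power is largest at the balanced allocation $r_0 = 1/2$ and symmetric in $r_0 \leftrightarrow 1 - r_0$.

```python
from scipy.stats import chi2, ncx2

def local_power(r0, h_info_h, d, alpha=0.05):
    """Local asymptotic power of Theorem 1 (2): P(V > chi2_{1-alpha}(d)),
    V ~ chi2(d, delta) with delta = r0 * r1 * h^T I(0) h and r1 = 1 - r0."""
    delta = r0 * (1.0 - r0) * h_info_h
    crit = chi2.ppf(1.0 - alpha, d)
    if delta == 0.0:
        return float(chi2.sf(crit, d))    # central case: the power equals alpha
    return float(ncx2.sf(crit, d, delta))

# Power for a few allocations r0 = n0 / n, with an assumed h^T I(0) h = 20 and d = 2
for r0 in (0.1, 0.3, 0.5, 0.7):
    print(r0, round(local_power(r0, 20.0, 2), 3))
```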
Remark 1. 
The distributions considered in the next subsection are semicontinuous, with one-dimensional, non-negative data. However, Theorem 1 holds for general $\mathcal{P}$, including distributions whose supports are multivariate or contain negative values.

2.2. Semicontinuous Data

In this subsection, we consider the case in which both populations are semicontinuous. Specifically, assume that the two independent samples $X_0 = (X_{01}, X_{02}, \ldots, X_{0n_0})$ and $X_1 = (X_{11}, X_{12}, \ldots, X_{1n_1})$ are drawn from $F_0(x)$ and $F_1(x)$, respectively, where
$$F_i(x) = p_i\, I(x = 0) + (1 - p_i)\, I(x > 0)\, G_i(x), \quad i = 0, 1.$$
The distributions $G_0$ and $G_1$ satisfy (1), and their supports are contained in $[0, \infty)$. Denote their densities by $g_0$ and $g_1$. The hypotheses for testing homogeneity are then
$$H_0: p_0 = p_1 \text{ and } g_0 = g_1 \quad \text{vs.} \quad H_1: p_0 \ne p_1 \text{ or } g_0 \ne g_1. \tag{14}$$
Let $n_{00}$ and $n_{10}$ be the numbers of zero observations, and let $n_{01}$ and $n_{11}$ be the numbers of non-zero observations in the two samples, respectively. Without loss of generality, assume that the first $n_{01}$ observations of $X_0$ and the first $n_{11}$ observations of $X_1$ are non-zero. Then, the estimates of $p_0$ and $p_1$ are
$$\hat{p}_0 = \frac{n_{00}}{n_0}, \qquad \hat{p}_1 = \frac{n_{10}}{n_1}. \tag{15}$$
A natural test statistic for $p_0 = p_1$ is
$$B^2 = \frac{(\hat{p}_0 - \hat{p}_1)^2}{\frac{1}{n_0}\hat{p}_0(1 - \hat{p}_0) + \frac{1}{n_1}\hat{p}_1(1 - \hat{p}_1)}. \tag{16}$$
The two-part test statistic then combines test statistics (16) and (11):
$$T_{\mathrm{semi}}(X_0, X_1) = B^2 + \frac{n_{01} n_{11}}{n_{01} + n_{11}} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right),$$
where
$$\bar{q}^{(0)} = \frac{1}{n_{01}} \sum_{i=1}^{n_{01}} q(X_{0i}), \quad \bar{q}^{(1)} = \frac{1}{n_{11}} \sum_{i=1}^{n_{11}} q(X_{1i}), \quad S^2 = \frac{1}{n_{01} + n_{11} - 2}\left[(n_{01} - 1) S_0^2 + (n_{11} - 1) S_1^2\right],$$
and
$$S_0^2 = \frac{1}{n_{01} - 1} \sum_{i=1}^{n_{01}} \left(q(X_{0i}) - \bar{q}^{(0)}\right)\left(q(X_{0i}) - \bar{q}^{(0)}\right)^\top, \quad S_1^2 = \frac{1}{n_{11} - 1} \sum_{i=1}^{n_{11}} \left(q(X_{1i}) - \bar{q}^{(1)}\right)\left(q(X_{1i}) - \bar{q}^{(1)}\right)^\top.$$
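A sketch of the two-sample statistic $T_{\mathrm{semi}}$: $B^2$ from the zero proportions plus the pooled modified Wald statistic computed on the non-zero observations only. The bases `q_identity` and `q_log` (the latter assumes log-normal continuous parts, for which $q(x) = (\log x, (\log x)^2)$) are hypothetical choices; the code assumes both samples contain zeros and non-zeros, as in Corollary 1.

```python
import numpy as np
from scipy.stats import ttest_ind

def t_semi_two_sample(x0, x1, q):
    """Two-part statistic T_semi = B^2 + modified Wald statistic on the non-zero data.

    x0, x1: 1-D samples that may contain zeros; q maps a 1-D array of positive
    values to an (n, d) array. Requires 0 < p_hat_0, p_hat_1 < 1 (cf. Corollary 1).
    """
    x0, x1 = np.asarray(x0, dtype=float), np.asarray(x1, dtype=float)
    n0, n1 = len(x0), len(x1)
    p0, p1 = np.mean(x0 == 0), np.mean(x1 == 0)                        # (15)
    b2 = (p0 - p1) ** 2 / (p0 * (1 - p0) / n0 + p1 * (1 - p1) / n1)    # (16)
    q0, q1 = q(x0[x0 > 0]), q(x1[x1 > 0])      # basis values of the non-zero data
    k0, k1 = q0.shape[0], q1.shape[0]          # n_01 and n_11
    diff = q1.mean(axis=0) - q0.mean(axis=0)
    s0 = np.atleast_2d(np.cov(q0, rowvar=False))
    s1 = np.atleast_2d(np.cov(q1, rowvar=False))
    s2 = ((k0 - 1) * s0 + (k1 - 1) * s1) / (k0 + k1 - 2)
    cont = k0 * k1 / (k0 + k1) * float(diff @ np.linalg.solve(s2, diff))
    return b2 + cont, b2, cont

def q_identity(x):
    """q(x) = x as an (n, 1) array."""
    return np.asarray(x, dtype=float)[:, None]

def q_log(x):
    """Hypothetical basis q(x) = (log x, (log x)^2) for log-normal continuous parts."""
    lx = np.log(x)
    return np.column_stack([lx, lx ** 2])

x0 = np.array([0, 0, 1, 2, 3, 4], dtype=float)
x1 = np.array([0, 1, 2, 5, 6, 7], dtype=float)
total, b2, cont = t_semi_two_sample(x0, x1, q_identity)
# With q(x) = x, the continuous part is the squared pooled t statistic of the non-zero values.
t_sq = ttest_ind([1, 2, 5, 6, 7], [1, 2, 3, 4]).statistic ** 2
```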
Corollary 1. 
Suppose that Assumptions 1–3 hold and $0 < p_0, p_1 < 1$. Then,
1. 
Under $H_0$ in (14),
$$T_{\mathrm{semi}}(X_0, X_1) \xrightarrow{d} \chi^2(d + 1).$$
2. 
Take $\beta_n = h/\sqrt{n}$ with $h \in \mathbb{R}^d$, and $p_{1n} = p_0 + k/\sqrt{n}$. Under this alternative,
$$T_{\mathrm{semi}}(X_0, X_1) \xrightarrow{d} \chi^2(d + 1, \delta),$$
where
$$\delta = r_0 r_1 \left[\frac{k^2}{p_0(1 - p_0)} + h^\top I(0)\, h\right]$$
is the non-central parameter.
The proof is given in Appendix A.
Now, the modified Wald test with level $\alpha$ is determined by the critical region
$$\{(x_0, x_1) : T_{\mathrm{semi}}(x_0, x_1) > \chi^2_{1-\alpha}(d + 1)\}.$$
The local asymptotic power of the modified Wald test is given by
$$P\left(V > \chi^2_{1-\alpha}(d + 1)\right),$$
where $V \sim \chi^2(d + 1, \delta)$. Interestingly, although the numbers of non-zero observations in the two samples are random, the non-central parameter contributed by the continuous part,
$$\delta_c = r_0 r_1\, h^\top I(0)\, h,$$
is the same as $\delta$ in Theorem 1 (2).

3. Multiple Sample Case

In this section, we generalize the conclusions of the previous section to the case of more than two populations. As before, we first study the case in which all the populations satisfy the DRM, and then move on to the semicontinuous case.

3.1. Density Ratio Model

Assume that $X_{ij}$, $j = 1, 2, \ldots, n_i$, are samples drawn independently from the distributions $G_i$, $i = 0, 1, 2, \ldots, m$. Let $g_0(x)$ be the density of $G_0$. Then, the density $g_i$ of $G_i$ satisfies
$$g_i(x) = e^{\alpha_i + \beta_i^\top q(x)}\, g_0(x),$$
where $i = 1, 2, \ldots, m$, $q(x) = (q_1(x), q_2(x), \ldots, q_d(x))^\top$ is known, and $-\infty < \alpha_i < \infty$ and $\beta_i = (\beta_{i1}, \beta_{i2}, \ldots, \beta_{id})^\top$ are unknown parameters. For convenience, we also define $\alpha_0 = 0$ and $\beta_0 = 0$. As in Section 2.1, there exists a function $A(\beta)$ such that
$$g_i(x) = e^{\beta_i^\top q(x) - A(\beta_i)}\, g_0(x) \in \mathcal{P}$$
for $i = 1, 2, \ldots, m$. Then, testing the homogeneity of the DRM is equivalent to testing
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_m = 0 \quad \text{vs.} \quad H_1: \beta_{i_0} \ne 0 \text{ for some } i_0 \in \{1, 2, \ldots, m\}.$$
By Lemma 1, this is equivalent to testing
$$H_0: m(\beta_i) = m(0),\ 1 \le i \le m \quad \text{vs.} \quad H_1: m(\beta_{i_0}) \ne m(0) \text{ for some } 1 \le i_0 \le m. \tag{21}$$
Based on the sample $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$, the MLE of the mean vector $m(\beta_i)$ is
$$\bar{q}^{(i)} = \frac{1}{n_i} \sum_{j=1}^{n_i} q(X_{ij}), \quad i = 1, 2, \ldots, m.$$
Then, under $H_0$, by the central limit theorem,
$$\sqrt{n_i}\left(\bar{q}^{(i)} - m(0)\right) \xrightarrow{d} N_d(0, I(0)), \quad i = 1, 2, \ldots, m.$$
We can construct the test statistic
$$T = \sum_{i=1}^{m} n_i \left(\bar{q}^{(i)} - m(0)\right)^\top \left(I(0)\right)^{-1} \left(\bar{q}^{(i)} - m(0)\right). \tag{22}$$
By the independence of the $\bar{q}^{(i)}$, this statistic converges in distribution to a $\chi^2$ distribution with $md$ degrees of freedom, that is,
$$T \xrightarrow{d} \chi^2(md).$$
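When $g_0(x)$ is unknown, $m(0)$ and $I(0)$ cannot be computed directly. Analogously, they can be estimated from the samples $X_0 = (X_{01}, X_{02}, \ldots, X_{0n_0})$ and $X_1, X_2, \ldots, X_m$ by
$$\widehat{m(0)} = \bar{q}^{(0)} = \frac{1}{n_0} \sum_{j=1}^{n_0} q(X_{0j}), \qquad \widehat{I(0)} = S^2 = \frac{1}{n - m - 1} \sum_{i=0}^{m} (n_i - 1) S_i^2,$$
where
$$S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left(q(X_{ij}) - \bar{q}^{(i)}\right)\left(q(X_{ij}) - \bar{q}^{(i)}\right)^\top$$
and $n = \sum_{i=0}^{m} n_i$. Then, the test statistic (22) is estimated by
$$\sum_{i=1}^{m} n_i \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right). \tag{23}$$
However, this statistic may not converge in distribution to $\chi^2(md)$, since $\bar{q}^{(0)}$ appears in every term of (23), so the terms are no longer independent. We therefore construct the modified test statistic
$$T(X) = \sum_{i=1}^{m} n_i \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right) - \frac{1}{n} \left[\sum_{i=1}^{m} n_i \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)\right]^\top \left(S^2\right)^{-1} \left[\sum_{i=1}^{m} n_i \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)\right], \tag{24}$$
where $X = (X_0, X_1, \ldots, X_m)$.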
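A sketch of the modified statistic (24). One can check algebraically (our observation, not a claim of the paper) that (24) equals the between-group quadratic form $\sum_{i=0}^{m} n_i (\bar{q}^{(i)} - \bar{q})^\top (S^2)^{-1} (\bar{q}^{(i)} - \bar{q})$, where $\bar{q}$ is the grand mean of all $n$ observations. Hence, for $d = 1$, $T(X) = m \cdot F$, with $F$ the one-way ANOVA statistic, which gives a convenient numerical check. The function name and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

def modified_wald_multi(groups):
    """Modified Wald statistic T(X) of (24).

    groups: list [Q_0, Q_1, ..., Q_m]; Q_i has shape (n_i, d) (or is 1-D when d = 1)
    and holds q(X_ij). Q_0 is the baseline sample from g_0.
    """
    qs = [np.asarray(g, dtype=float).reshape(len(g), -1) for g in groups]
    sizes = np.array([q.shape[0] for q in qs])
    n, m = int(sizes.sum()), len(qs) - 1
    means = [q.mean(axis=0) for q in qs]
    # Pooled S^2 = sum_i (n_i - 1) S_i^2 / (n - m - 1)
    s2 = sum((q - mu).T @ (q - mu) for q, mu in zip(qs, means)) / (n - m - 1)
    diffs = [sizes[i] * (means[i] - means[0]) for i in range(1, m + 1)]  # n_i (qbar_i - qbar_0)
    first = sum(float(v @ np.linalg.solve(s2, v)) / sizes[i + 1] for i, v in enumerate(diffs))
    total = sum(diffs)
    return first - float(total @ np.linalg.solve(s2, total)) / n

rng = np.random.default_rng(2)
samples = [rng.normal(mu, 1.0, size=k) for mu, k in [(0.0, 25), (0.3, 30), (0.1, 20), (-0.2, 35)]]
t_x = modified_wald_multi(samples)              # q(x) = x, so d = 1 and m = 3
f_stat = f_oneway(*samples).statistic           # one-way ANOVA F on the same data
t_two = modified_wald_multi(samples[:2])        # m = 1, the case of Remark 2
t_sq = ttest_ind(samples[0], samples[1]).statistic ** 2
```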
Assumption 4. 
As $n \to \infty$,
$$\frac{n_i}{n} \to r_i \in (0, 1), \quad i = 0, 1, \ldots, m.$$
Theorem 2. 
Suppose that Assumptions 1, 2, and 4 hold. Then,
1. 
Under $H_0$ in (21),
$$T(X) \xrightarrow{d} \chi^2(md).$$
2. 
Take $\beta_{in} = h_i/\sqrt{n}$ with $h_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, m$. Under this alternative,
$$T(X) \xrightarrow{d} \chi^2(md, \delta),$$
where
$$\delta = \sum_{i=1}^{m} r_i\, h_i^\top I(0)\, h_i - \left(\sum_{i=1}^{m} r_i h_i\right)^\top I(0) \left(\sum_{i=1}^{m} r_i h_i\right).$$
The proof is given in Appendix A.
Now, the modified Wald test with level $\alpha$ is determined by the critical region
$$\{x : T(x) > \chi^2_{1-\alpha}(md)\}.$$
The local asymptotic power of the modified Wald test is given by
$$P\left(V > \chi^2_{1-\alpha}(md)\right),$$
where $V \sim \chi^2(md, \delta)$.
Remark 2. 
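When $m = 1$, the statistic (24) has the form
$$T(X) = n_1 \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right) - \frac{1}{n} \left[n_1\left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)\right]^\top \left(S^2\right)^{-1} \left[n_1\left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)\right] = \frac{n_1 n_0}{n} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right)^\top \left(S^2\right)^{-1} \left(\bar{q}^{(1)} - \bar{q}^{(0)}\right).$$
This is the same as the statistic (11).
Remark 3. 
When $h_i = h$ for all $i$, the alternatives satisfy $g_1 = g_2 = \cdots = g_m$. In this case, since $\sum_{i=1}^{m} r_i = 1 - r_0$, $\delta$ becomes
$$\delta = \sum_{i=1}^{m} r_i\, h^\top I(0)\, h - \left(\sum_{i=1}^{m} r_i h\right)^\top I(0) \left(\sum_{i=1}^{m} r_i h\right) = (1 - r_0)\, r_0\, h^\top I(0)\, h.$$
This means that $\delta$ is maximized at $r_0 = 1/2$.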
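The non-central parameter of Theorem 2 (2) is easy to evaluate numerically. The sketch below (a hypothetical helper, with an assumed $h$ and $I(0)$) reproduces the $m = 1$ formula of Theorem 1 and the Remark 3 reduction, and confirms on a grid that $\delta$ peaks at $r_0 = 1/2$ when all $h_i$ are equal.

```python
import numpy as np

def noncentrality_multi(r, H, info0):
    """delta of Theorem 2 (2).

    r: fractions (r_1, ..., r_m) of the non-baseline samples (r_0 = 1 - sum(r)).
    H: (m, d) array whose rows are the local alternatives h_1, ..., h_m.
    info0: (d, d) Fisher information matrix I(0).
    """
    r = np.asarray(r, dtype=float)
    H = np.asarray(H, dtype=float)
    quad = sum(ri * (hi @ info0 @ hi) for ri, hi in zip(r, H))   # sum_i r_i h_i^T I(0) h_i
    s = r @ H                                                      # sum_i r_i h_i
    return float(quad - s @ info0 @ s)

# Remark 3 setting with an assumed h and I(0): equal h_i, m = 4, equal r_1 = ... = r_4
h = np.array([1.0, 0.5])
info0 = np.diag([1.0, 2.0])
grid = np.linspace(0.05, 0.95, 19)
deltas = [noncentrality_multi(np.full(4, (1 - r0) / 4), np.tile(h, (4, 1)), info0) for r0 in grid]
best_r0 = grid[int(np.argmax(deltas))]
```

For unequal $h_i$, the same function shows that $\delta$ also depends on how $r_1, \ldots, r_m$ are split, which motivates the allocation question below.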
Remark 3 naturally leads to the following question: when the total sample size $n$ is fixed, how should $(n_0, n_1, \ldots, n_m)$ be allocated to maximize the local power? To address this problem, we first let
$$H = (h_1, h_2, \ldots, h_m)$$
and
$$D = \left(h_1^\top I(0)\, h_1,\ h_2^\top I(0)\, h_2,\ \ldots,\ h_m^\top I(0)\, h_m\right).$$

3.2. Semicontinuous Data

Now, we consider model (3), in which the populations are semicontinuous. Assume that $X_i = (X_{i1}, X_{i2}, \ldots, X_{in_i})$ is drawn from
$$F_i(x) = p_i\, I(x = 0) + (1 - p_i)\, I(x > 0)\, G_i(x), \quad i = 0, 1, \ldots, m.$$
Let $n_{i0}$ and $n_{i1}$ be the numbers of zero and non-zero observations in $X_i$. Without loss of generality, assume that the first $n_{i1}$ observations of $X_i$ are non-zero. The densities of $G_0, G_1, \ldots, G_m$ are denoted by $g_0, g_1, \ldots, g_m$ and satisfy
$$g_i(x) = \exp\{\alpha_i + \beta_i^\top q(x)\}\, g_0(x), \quad i = 0, 1, \ldots, m,$$
where $\alpha_0 = 0$ and $\beta_0 = 0$. By the results for the continuous case in the previous subsection, the hypotheses for testing homogeneity are equivalent to
$$H_0: p_0 = p_1 = \cdots = p_m \text{ and } \beta_0 = \beta_1 = \cdots = \beta_m \quad \text{vs.} \quad H_1: p_{i_0} \ne p_0 \text{ or } \beta_{i_0} \ne 0 \text{ for some } i_0 \in \{1, 2, \ldots, m\}. \tag{27}$$
The test for homogeneity of the continuous part was considered in the previous subsection. The remaining task is to test the homogeneity of $(m+1)$ binomial distributions. The hypotheses are
$$H_0: p_0 = p_1 = \cdots = p_m \quad \text{vs.} \quad H_1: p_{i_0} \ne p_0 \text{ for some } i_0 \in \{1, 2, \ldots, m\}.$$
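As in the proof of Corollary 1, the Bernoulli distributions of the zero indicators $I(X_{ij} = 0)$ can be expressed as a DRM with
$$\alpha_i = \log \frac{1 - p_i}{1 - p_0}, \qquad \beta_i = \log \left(\frac{p_i}{p_0} \cdot \frac{1 - p_0}{1 - p_i}\right),$$
and $q(x) = x$. Then, the MLE of $p_i$ is
$$\hat{p}_i = \frac{n_{i0}}{n_i}, \quad i = 0, 1, \ldots, m.$$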
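A small check of the Bernoulli representation above (our illustration; `bernoulli_drm` is a hypothetical helper). Writing $y = 1$ for a zero observation, the sketch confirms that $P_i(y) = \exp(\alpha_i + \beta_i y)\, P_0(y)$ for $y \in \{0, 1\}$.

```python
import math

def bernoulli_drm(p_i, p_0):
    """(alpha_i, beta_i) so that P_i(y) = exp(alpha_i + beta_i * y) * P_0(y) for y in {0, 1}."""
    alpha = math.log((1 - p_i) / (1 - p_0))
    beta = math.log((p_i / p_0) * ((1 - p_0) / (1 - p_i)))
    return alpha, beta

def bernoulli_pmf(y, p):
    """P(Y = y) for Y ~ Bernoulli(p); here Y = 1 marks a zero observation."""
    return p if y == 1 else 1 - p

alpha, beta = bernoulli_drm(0.4, 0.3)
ratio_at_1 = bernoulli_pmf(1, 0.4) / bernoulli_pmf(1, 0.3)   # should equal exp(alpha + beta)
```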
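The Fisher information is estimated by
$$S_b^2 = \frac{1}{n - m - 1} \sum_{i=0}^{m} (n_i - 1) S_{bi}^2,$$
where
$$S_{bi}^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} \left(I(x_{ij} = 0) - \frac{n_{i0}}{n_i}\right)^2 = \frac{n_i}{n_i - 1}\, \hat{p}_i (1 - \hat{p}_i).$$
Then, we can construct the test statistic for the binomial part using Theorem 2:
$$T_b = \sum_{i=1}^{m} \frac{n_i \left(\hat{p}_i - \hat{p}_0\right)^2}{S_b^2} - \frac{1}{n} \frac{\left[\sum_{i=1}^{m} n_i \left(\hat{p}_i - \hat{p}_0\right)\right]^2}{S_b^2}.$$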
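Finally, we combine the two test statistics to obtain the test statistic for the semicontinuous case. Let
$$\bar{q}^{(i)} = \frac{1}{n_{i1}} \sum_{j=1}^{n_{i1}} q(x_{ij}), \quad i = 0, 1, \ldots, m,$$
and
$$S_c^2 = \frac{1}{\sum_{i=0}^{m} n_{i1} - m - 1} \sum_{i=0}^{m} (n_{i1} - 1) S_{ci}^2,$$
where
$$S_{ci}^2 = \frac{1}{n_{i1} - 1} \sum_{j=1}^{n_{i1}} \left(q(x_{ij}) - \bar{q}^{(i)}\right)\left(q(x_{ij}) - \bar{q}^{(i)}\right)^\top.$$
Then, the test statistic for the semicontinuous case is
$$T_{\mathrm{semi}} = \sum_{i=1}^{m} n_{i1} \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)^\top \left(S_c^2\right)^{-1} \left(\bar{q}^{(i)} - \bar{q}^{(0)}\right) - \frac{1}{\sum_{i=0}^{m} n_{i1}} \left[\sum_{i=1}^{m} n_{i1}\left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)\right]^\top \left(S_c^2\right)^{-1} \left[\sum_{i=1}^{m} n_{i1}\left(\bar{q}^{(i)} - \bar{q}^{(0)}\right)\right] + \sum_{i=1}^{m} \frac{n_i \left(\hat{p}_i - \hat{p}_0\right)^2}{S_b^2} - \frac{1}{n} \frac{\left[\sum_{i=1}^{m} n_i \left(\hat{p}_i - \hat{p}_0\right)\right]^2}{S_b^2}.$$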
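A sketch of the multi-sample $T_{\mathrm{semi}}$: the continuous part is statistic (24) applied to the non-zero observations, and the binomial part $T_b$ is the same statistic applied to the zero indicators with $q(y) = y$. The helper `_modified_wald` repeats the form of (24) so that the block is self-contained; all names and the simulated data are hypothetical. For $d = 1$, each part equals $m$ times a one-way ANOVA F statistic (our algebraic observation), which the sketch uses as a check.

```python
import numpy as np
from scipy.stats import f_oneway

def _modified_wald(groups):
    """Statistic of the form (24) for a list of (n_i, d) arrays; group 0 is the baseline."""
    qs = [np.asarray(g, dtype=float).reshape(len(g), -1) for g in groups]
    sizes = np.array([q.shape[0] for q in qs])
    n, m = int(sizes.sum()), len(qs) - 1
    means = [q.mean(axis=0) for q in qs]
    s2 = sum((q - mu).T @ (q - mu) for q, mu in zip(qs, means)) / (n - m - 1)
    diffs = [sizes[i] * (means[i] - means[0]) for i in range(1, m + 1)]
    first = sum(float(v @ np.linalg.solve(s2, v)) / sizes[i + 1] for i, v in enumerate(diffs))
    total = sum(diffs)
    return first - float(total @ np.linalg.solve(s2, total)) / n

def t_semi_multi(samples, q):
    """T_semi = continuous part (non-zero observations) + binomial part T_b.

    samples: list of 1-D arrays X_0, ..., X_m (zeros allowed);
    q: maps a 1-D array of positive values to an (n, d) array.
    """
    xs = [np.asarray(x, dtype=float) for x in samples]
    t_c = _modified_wald([q(x[x > 0]) for x in xs])
    t_b = _modified_wald([(x == 0).astype(float) for x in xs])  # zero indicators, q(y) = y
    return t_c + t_b, t_c, t_b

rng = np.random.default_rng(3)
samples = []
for p, mu, k in [(0.3, 0.0, 40), (0.4, 0.2, 50), (0.35, -0.1, 45)]:
    x = rng.lognormal(mean=mu, sigma=1.0, size=k)
    x[rng.random(k) < p] = 0.0                     # each value is zero with probability p
    samples.append(x)
total, t_c, t_b = t_semi_multi(samples, lambda v: np.log(v)[:, None])   # q(x) = log x, d = 1
f_b = f_oneway(*[(x == 0).astype(float) for x in samples]).statistic
f_c = f_oneway(*[np.log(x[x > 0]) for x in samples]).statistic
```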
Corollary 2. 
Suppose that Assumptions 1, 2, and 4 hold and $0 < p_0, p_1, \ldots, p_m < 1$. Then,
1. 
Under $H_0$ in (27),
$$T_{\mathrm{semi}}(X) \xrightarrow{d} \chi^2(m(d+1)).$$
2. 
Take $\beta_{in} = h_i/\sqrt{n}$ with $h_i \in \mathbb{R}^d$, and $p_{in} = p_0 + k_i/\sqrt{n}$, $i = 1, 2, \ldots, m$. Under this alternative,
$$T_{\mathrm{semi}}(X) \xrightarrow{d} \chi^2(m(d+1), \delta),$$
where
$$\delta = \frac{1}{p_0(1 - p_0)} \left[\sum_{i=1}^{m} r_i k_i^2 - \left(\sum_{i=1}^{m} r_i k_i\right)^2\right] + \sum_{i=1}^{m} r_i\, h_i^\top I(0)\, h_i - \left(\sum_{i=1}^{m} r_i h_i\right)^\top I(0) \left(\sum_{i=1}^{m} r_i h_i\right).$$
The proof is given in Appendix A.
Now, the modified Wald test with level $\alpha$ is determined by the critical region
$$\{x : T_{\mathrm{semi}}(x) > \chi^2_{1-\alpha}(m(d+1))\}.$$
The local asymptotic power of the modified Wald test is given by
$$P\left(V > \chi^2_{1-\alpha}(m(d+1))\right),$$
where $V \sim \chi^2(m(d+1), \delta)$.

4. Simulation Study

In our simulations, we compare three tests: the modified Wald test proposed here, denoted by “MWT”; the dual empirical likelihood ratio test proposed by Cai et al. [9], denoted by “DELRT”; and the empirical likelihood ratio test with the bootstrap procedure proposed by Wang et al. [15], denoted by “BELRT”. We aim to show that the modified Wald test is applicable in a variety of settings. In the first simulation study, we consider a large number of populations and compare the performance and computational cost of the three tests; MWT controls the type-I error better than DELRT while taking much less time than BELRT. In the second study, we consider three normal distributions with the same scale and study how the tests perform as the location parameter changes, so that the three populations range from identical to clearly different. Figure 1 shows how the three tests perform. In the third study, we verify Remark 3 in our setting, which reveals an interesting effect of the sample sizes on the power under certain alternative hypotheses. In the last study, we consider the semicontinuous case in which the continuous part follows either a log-normal or a gamma distribution, using the same parameter settings as Wang et al. [15]. Figure 2 and Figure 3 show that our method is competitive.

4.1. Scenario 1

We consider the DRM with $(m+1) = 2, 3, 5, 8$, and $11$ populations. Let $G_0$ be the standard normal distribution and let the remaining distributions be normal with scale 1 and location $\mu$, where $\mu = 0, 0.5, 0.75, 1$. We use the same sample size $n_0 = n_1 = \cdots = n_m = 30$ or $50$ for all populations and generate $M = 1000$ repetitions for each combination of $m$ and $\mu$. We then compute the type-I error of the three statistics when $\mu = 0$ and their power when $\mu \ne 0$ at the 5% significance level. The results are shown in Table 1 and Table 2, respectively.
The type-I error of DELRT is not as well controlled as that of the other two tests. The type-I error and power of MWT are similar to those of BELRT, but the computational cost of MWT is much smaller. For DELRT and the modified Wald test, completing $M = 1000$ repetitions when $(m+1) = 11$ takes no more than 40 s. In contrast, for the bootstrap procedure with $B = 999$, implemented with a “for” loop in the R programming language, a single run of $M = 1000$ repetitions takes nearly 4 h when $(m+1) = 5$ and 12 h when $(m+1) = 8$; when $(m+1) = 11$, it takes nearly a whole day. Parallel computation could certainly accelerate the procedure, but the running time remains a major challenge. The proposed modified Wald test is therefore a promising compromise, especially when the number of populations is large: it controls the type-I error better than DELRT while retaining a similar computational cost.
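To illustrate why the modified Wald test is cheap to simulate, here is a minimal Monte Carlo sketch of its rejection rate under $H_0$ in a Scenario 1-like setting. It is not the code behind Table 1: the basis $q(x) = (x, x^2)$, the seed, and the number of repetitions are our assumptions, and the resulting rate is only an estimate of the finite-sample size. The statistic (24) is repeated so the block is self-contained.

```python
import numpy as np
from scipy.stats import chi2

def modified_wald_multi(groups):
    """Modified Wald statistic (24); groups[0] holds q(X_0j) for the baseline sample."""
    qs = [np.asarray(g, dtype=float).reshape(len(g), -1) for g in groups]
    sizes = np.array([q.shape[0] for q in qs])
    n, m = int(sizes.sum()), len(qs) - 1
    means = [q.mean(axis=0) for q in qs]
    s2 = sum((q - mu).T @ (q - mu) for q, mu in zip(qs, means)) / (n - m - 1)
    diffs = [sizes[i] * (means[i] - means[0]) for i in range(1, m + 1)]
    first = sum(float(v @ np.linalg.solve(s2, v)) / sizes[i + 1] for i, v in enumerate(diffs))
    total = sum(diffs)
    return first - float(total @ np.linalg.solve(s2, total)) / n

def simulated_size(m_plus_1, n_each, reps=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rate when all m + 1 samples come from N(0, 1),
    using the assumed basis q(x) = (x, x^2), so d = 2."""
    rng = np.random.default_rng(seed)
    m, d = m_plus_1 - 1, 2
    crit = chi2.ppf(1 - alpha, m * d)
    rejections = 0
    for _ in range(reps):
        groups = []
        for _ in range(m_plus_1):
            x = rng.standard_normal(n_each)
            groups.append(np.column_stack([x, x ** 2]))
        rejections += int(modified_wald_multi(groups) > crit)
    return rejections / reps

size_5 = simulated_size(5, 30)   # five populations, n_i = 30
```

Each repetition needs only sample means, one pooled covariance, and two small linear solves, whereas the bootstrap test refits the empirical likelihood $B$ times per repetition.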

4.2. Scenario 2

In the second simulation study, we examine how the test statistics perform with three continuous populations. The three populations are normal distributions with scale 1 and locations $-\mu$, $0$, and $\mu$. We vary $\mu$ from 0.2 to 0.6 to see how the tests perform as the three distributions change from “similar” to “totally different”. We consider equal sample sizes $n_i = 20, 30$, and $50$, $i = 0, 1, 2$. For each sample size, we consider $\mu = 0, 0.3, 0.4, 0.5$, and $0.6$. We generate $M = 10{,}000$ repetitions for each case and compare the three statistics in Table 3 and Figure 1, where “MWT”, “DELRT”, and “BELRT” denote the modified Wald test, the dual empirical likelihood ratio test, and the bootstrap empirical likelihood ratio test, respectively.
Figure 1. Type-I error and power (%) of the three statistics in simulation two for different sample sizes.
The modified Wald test controls the type-I error well in this case, even when the sample size is small. Its power is always lower than that of DELRT, owing to its better control of the type-I error. However, the gap gradually vanishes as the sample size and the differences between the populations increase.

4.3. Scenario 3

In this simulation study, we verify the conclusion of Remark 3. The total sample size $n$ is fixed, and $m = 2$ and $m = 4$ are considered. For both cases, we choose different allocations $(n_0, n_1, \ldots, n_m)$ and compare the resulting power. We set $g_0$ to $N(0, 1)$, $LN(0, 1)$, or $GAM(1, 2)$. The remaining densities $g_1 = \cdots = g_m$ are taken from the same family as $g_0$, with $\mu = 0.3, 0.5$, and $0.7$ for the normal and log-normal cases and with the shape parameter equal to $1.2, 1.4$, and $1.6$ in the gamma case. For each allocation and each parameter value, we generate $M = 100{,}000$ repetitions and calculate the power. The details are given in Table 4 and Table 5. The symbols I to VIII in Table 5 denote the sample-size configurations listed in Table 6.
The conclusion of Remark 3 largely holds. Clearly, $n_0$ has the largest impact on the power, whereas the remaining sample sizes $n_1, \ldots, n_m$ have little influence. This is most apparent when comparing the first four sample sizes in the three-sample case, as well as cases I and II and cases V and VI in the five-sample case.

4.4. Scenario 4

In this simulation study, we consider the semicontinuous case and adopt the same parameter settings as Wang et al. [15]. Assume that the samples are generated from
$$F_i(x) = p_i\, I(x = 0) + (1 - p_i)\, I(x > 0)\, G_i(x)$$
for $i = 0, 1, 2$, where the $G_i$ are all log-normal or all gamma distributions. The parameters of $F_i$ are given in Table 7. Each of LN$_1$–LN$_{15}$ and GAM$_1$–GAM$_{15}$ in the first column denotes a mixture model whose continuous part follows a log-normal or gamma distribution, respectively. $p_i$ denotes the probability of drawing a zero observation from $F_i$. LN$(a_i, b_i)$ denotes a log-normal distribution whose associated normal distribution has mean $a_i$ and variance $b_i$. GAM$(a_i, b_i)$ denotes a gamma distribution with shape parameter $a_i$ and scale parameter $b_i$. We consider both equal sample sizes, $n_0 = n_1 = n_2 = 30, 50, 100$, and the unequal sample sizes $(n_0, n_1, n_2) = (50, 100, 150)$. For every parameter setting, we generate $M = 10{,}000$ repetitions. We calculate the type-I error of the homogeneity test at the 5% significance level for LN$_1$–LN$_3$ and GAM$_1$–GAM$_3$, and its power for the remaining parameter settings. The type-I errors of the three statistics are shown in Table 8, while the powers are shown in Table 9 and Table 10 for the log-normal and gamma cases, respectively. For easier comparison, the powers of the three statistics are also plotted in Figure 2 and Figure 3. These results show that the modified Wald test is competitive.
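A minimal sketch of how data from one component $F_i$ of this design can be generated, following the parameterizations stated above: LN$(a, b)$ has an underlying normal with mean $a$ and variance $b$ (so NumPy's `sigma` is $\sqrt{b}$), and GAM$(a, b)$ has shape $a$ and scale $b$. The function name and the parameter values in the usage lines are illustrative assumptions, not entries of Table 7.

```python
import numpy as np

def sample_semicontinuous(n, p, family, a, b, rng):
    """Draw n observations from F(x) = p I(x = 0) + (1 - p) I(x > 0) G(x).

    family "LN":  G = LN(a, b), log-normal whose underlying normal has mean a, variance b.
    family "GAM": G = GAM(a, b), gamma with shape a and scale b.
    """
    if family == "LN":
        x = rng.lognormal(mean=a, sigma=np.sqrt(b), size=n)
    elif family == "GAM":
        x = rng.gamma(shape=a, scale=b, size=n)
    else:
        raise ValueError("family must be 'LN' or 'GAM'")
    x[rng.random(n) < p] = 0.0      # each observation is zero with probability p
    return x

rng = np.random.default_rng(5)
x_ln = sample_semicontinuous(20000, 0.3, "LN", 0.0, 1.0, rng)
x_gam = sample_semicontinuous(20000, 0.2, "GAM", 2.0, 1.5, rng)
```

Drawing the continuous part first and then zeroing each value independently with probability $p$ yields exactly the two-part mixture $F$.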
Figure 2. Power (%) for testing $H_0$ at significance level 0.05 when data are generated from LN$_4$–LN$_{15}$ in Table 7.
Figure 3. Power (%) for testing $H_0$ at significance level 0.05 when data are generated from GAM$_4$–GAM$_{15}$ in Table 7.

5. Real Data Sample

In this section, we use the real data example suggested by Wang et al. [15], which is available from the University of Waterloo weather station data archive (http://weather.uwaterloo.ca/data.html, accessed on 1 June 2023). We focus on the daily precipitation measurements (in millimeters) recorded at the North Campus of the University of Waterloo, Canada, and investigate whether the precipitation distribution has changed over the past few years.
Benefiting from what Wang et al. [15] has previously reported, to reduce the time dependence among the observations, we take every fourth measurement into our analysis, i.e., only use the observations on days 1, 5, 9, …, 361, which gives a sample size of 91 for each sample. Then, we consider two cases, one is from 2003 to 2006 and the other from 2008 to 2012, we hope to obtain some information about the changing of the precipitation distribution in the last few years. Some summaries of the samples are given below
  • From 2003 to 2006, the estimates of the probability of dry days are (0.30, 0.40, 0.42, 0.42) while those of 2008 to 2012 are (0.45, 0.49, 0.43, 0.38, 0.40).
  • The sample means of 2003 to 2006 are (2.05, 3.54, 3.40, 3.50) while those of 2008 to 2012 are (3.42, 1.37, 2.29, 4.08, 3.09).
  • The sample variances are (17.52, 41.07, 76.10, 59.50) and (95.19, 13.53, 18.35, 73.83, 59.76), respectively.
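The thinning and summary steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: one year of daily precipitation is a list whose first entry is day 1, and the mean and variance are taken over all retained observations, dry days (zeros) included. The toy series is hypothetical and is not the Waterloo data.

```python
import statistics


def every_fourth_day(daily: list[float], last_day: int = 361) -> list[float]:
    """Keep days 1, 5, 9, ..., last_day (1-based), i.e. 0-based indices 0, 4, ..., last_day - 1."""
    return daily[0:last_day:4]


def summarize(sample: list[float]) -> tuple[float, float, float]:
    """Return (proportion of dry days, sample mean, sample variance with divisor n - 1)."""
    p_dry = sum(1 for x in sample if x == 0.0) / len(sample)
    return p_dry, statistics.mean(sample), statistics.variance(sample)


# Hypothetical year: 0 mm on every third day, 2.5 mm otherwise.
year = [0.0 if day % 3 == 0 else 2.5 for day in range(365)]
sample = every_fourth_day(year)
print(len(sample), summarize(sample))
```

The slice keeps 91 observations per year, which matches the sample size used above.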
For each null and alternative hypothesis, we fit both the log-normal and the gamma mixture models to the data by maximum likelihood under the density ratio model. The fitted parameters are given in Table 11 below. They differ slightly from those reported by Wang et al. [15]; the difference may be caused by a mistake made when summarizing the data for 2003. LN$_{16}$ and GAM$_{16}$ give the parameters under the null hypothesis for 2003 to 2006, and LN$_{18}$ and GAM$_{18}$ those for 2008 to 2012. The remaining rows give the parameters under the alternative hypotheses.
We apply the modified Wald test to the null hypotheses LN$_{16}$ and GAM$_{16}$. The test statistic is 21.65 for the log-normal mixture and 24.02 for the gamma mixture. Both statistics exceed the upper 5% quantile of $\chi^2_8$, which is 15.51, so the null hypothesis is rejected at the significance level 0.05. We then turn to the five years from 2008 to 2012, where the result is quite different. The test statistic is 11.70 for LN$_{18}$ and 9.95 for GAM$_{18}$; both are smaller than the upper 5% quantile of $\chi^2_{10}$, which is 18.31, so the null hypothesis cannot be rejected at the significance level 0.05. These two analyses indicate that the precipitation distribution of the area changed over 2003 to 2006 but may have remained unchanged over 2008 to 2012.
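The decisions above can be reproduced with a few lines of code. Because the degrees of freedom used here (8 and 10, taken as stated in the text) are even, the upper tail has the closed form $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$, so this sketch needs only the standard library; the helper names are our own.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with an even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def chi2_crit_even(alpha: float, df: int) -> float:
    """Upper-alpha critical value, found by bisection on the survival function."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid              # tail still too heavy: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2.0


# Statistics and degrees of freedom as reported in the text.
cases = {
    "LN16 (2003-2006)": (21.65, 8),
    "GAM16 (2003-2006)": (24.02, 8),
    "LN18 (2008-2012)": (11.70, 10),
    "GAM18 (2008-2012)": (9.95, 10),
}
for name, (stat, df) in cases.items():
    crit = chi2_crit_even(0.05, df)
    decision = "reject H0" if stat > crit else "do not reject H0"
    print(f"{name}: T = {stat:.2f}, critical value = {crit:.2f}, "
          f"p-value = {chi2_sf_even(stat, df):.4f}, {decision}")
```

The computed critical values (15.51 and 18.31) and the resulting decisions agree with those stated above.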

6. Conclusions

In this paper, we propose a modified Wald test for the homogeneity of the density ratio model. Because the density functions are unknown, recent work has focused mainly on the empirical likelihood ratio test, which is a nonparametric method. We transform the problem of testing homogeneity into testing the equality of the mean parameters of the exponential family of distributions, and then propose a modified Wald test, which is a parametric method. The simulations show that the type-I error of the modified Wald test is smaller than that of the empirical likelihood ratio test. Because the binary (zero versus nonzero) part and the continuous part yield independent statistics with $\chi^2$ limits, their sum is again asymptotically $\chi^2$, which extends the test to semicontinuous data. Note that for the DRM we test the hypothesis $\beta_1 = \beta_2 = \cdots = \beta_m = 0$; this can be generalized to testing $\beta_1 = \beta_{10}, \beta_2 = \beta_{20}, \ldots, \beta_m = \beta_{m0}$.

Author Contributions

Conceptualization, X.X.; methodology, X.X.; software, Y.W.; validation, Y.W. and X.X.; formal analysis, Y.W. and X.X.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W.; visualization, Y.W.; supervision, X.X.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under grant nos. 11471030 and 11471035.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Proof of Lemma 1. 
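We only need to prove that, for two parameters $\beta^{(1)}$ and $\beta^{(2)}$, the equation $m(\beta^{(1)}) = m(\beta^{(2)})$ holds only if $\beta^{(1)} = \beta^{(2)}$. Assume that $\beta^{(1)} \neq \beta^{(2)}$. Let
$$h(t) = \left(\beta^{(2)} - \beta^{(1)}\right)^\top m\left(\beta^{(1)} + t\left(\beta^{(2)} - \beta^{(1)}\right)\right).$$
The derivative of $h(t)$ is
$$h'(t) = \left(\beta^{(2)} - \beta^{(1)}\right)^\top I\left(\beta^{(1)} + t\left(\beta^{(2)} - \beta^{(1)}\right)\right)\left(\beta^{(2)} - \beta^{(1)}\right).$$
Since $I(0) > 0$ and exponential tilting does not change the support of $q(X)$, $I(\beta)$ is positive definite for every $\beta$; hence $h'(t) > 0$ and $h(t)$ is strictly increasing. However, when $m(\beta^{(1)}) = m(\beta^{(2)})$,
$$h(0) = \left(\beta^{(2)} - \beta^{(1)}\right)^\top m\left(\beta^{(1)}\right) = \left(\beta^{(2)} - \beta^{(1)}\right)^\top m\left(\beta^{(2)}\right) = h(1).$$
This is a contradiction. Hence, $m(\beta^{(1)}) \neq m(\beta^{(2)})$. The lemma then follows by letting $\beta^{(1)} = \beta$ and $\beta^{(2)} = 0$. □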
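A one-dimensional illustration may help. Assume, as in the proof, that $m(\beta)$ is the mean of $q(X)$ under the tilted density $e^{\alpha(\beta) + \beta^\top q(x)} g_0(x)$ and that $I(\beta)$ is its covariance. For a Bernoulli baseline $g_0$ with success probability $p_0 \in (0, 1)$ and $q(x) = x$ (our example, not one taken from the text), all quantities are explicit:

```latex
\alpha(\beta) = -\log\left(1 - p_0 + p_0 e^{\beta}\right), \qquad
m(\beta) = \frac{p_0 e^{\beta}}{1 - p_0 + p_0 e^{\beta}}, \qquad
I(\beta) = m'(\beta) = m(\beta)\{1 - m(\beta)\} > 0 .
```

Hence $m$ is strictly increasing, and $m(\beta) = m(0) = p_0$ only when $\beta = 0$, as Lemma 1 asserts in general. Inverting gives $\beta = \log\frac{m(\beta)}{1 - m(\beta)} - \log\frac{p_0}{1 - p_0}$, which is the expression for $\beta$ that appears in the proof of Corollary 1.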
Proof of Theorem 1. 
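  • As $n \to \infty$, by Assumption 3, $n_0 \to \infty$ and $n_1 \to \infty$. Hence, under $H_0$,
    $$\begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \end{pmatrix} \xrightarrow{d} N\left(0, \begin{pmatrix} I(0) & 0 \\ 0 & I(0) \end{pmatrix}\right).$$
    By Assumption 1, $I(0) > 0$. Thus,
    $$\left(-\sqrt{\frac{n}{n_0}}\, I^{-1/2}(0),\ \sqrt{\frac{n}{n_1}}\, I^{-1/2}(0)\right) \begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \end{pmatrix} = \sqrt{n}\, I^{-1/2}(0)\left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} N\left(0, \left(\frac{1}{r_0} + \frac{1}{r_1}\right) I_d\right).$$
    Then, since $r_0 + r_1 = 1$ gives $1/r_0 + 1/r_1 = 1/(r_0 r_1)$,
    $$r_0 r_1 n \left(\bar q^{(1)} - \bar q^{(0)}\right)^\top I^{-1}(0) \left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} \chi^2(d).$$
    Again by Assumption 3 and $S^2 \xrightarrow{P} I(0)$,
    $$T(X_0, X_1) \xrightarrow{d} \chi^2(d).$$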
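  • The Taylor expansion of $m(\beta_n)$ is
    $$m(\beta_n) = m\left(\frac{h}{\sqrt{n}}\right) = m(0) + I(0)\frac{h}{\sqrt{n}} + O\left(\frac{1}{n}\right).$$
    Then,
    $$\begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \end{pmatrix} \xrightarrow{d} N\left(\begin{pmatrix} 0 \\ \sqrt{r_1}\, I(0) h \end{pmatrix}, \begin{pmatrix} I(0) & 0 \\ 0 & I(0) \end{pmatrix}\right).$$
    By Assumption 1,
    $$\left(-\sqrt{\frac{n}{n_0}}\, I^{-1/2}(0),\ \sqrt{\frac{n}{n_1}}\, I^{-1/2}(0)\right) \begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \end{pmatrix} = \sqrt{n}\, I^{-1/2}(0)\left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} N\left(I^{1/2}(0) h, \left(\frac{1}{r_0} + \frac{1}{r_1}\right) I_d\right).$$
    This means that
    $$r_0 r_1 n \left(\bar q^{(1)} - \bar q^{(0)}\right)^\top I^{-1}(0) \left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} \chi^2\left(d, r_0 r_1 h^\top I(0) h\right).$$
    By Assumption 2, $S_1^2 \xrightarrow{P} I(0)$. Then,
    $$S^2 = \frac{n_0 - 1}{n - 2} S_0^2 + \frac{n_1 - 1}{n - 2} S_1^2 \xrightarrow{P} r_0 I(0) + r_1 I(0) = I(0).$$
    As in the proof of (1), we have
    $$T(X_0, X_1) \xrightarrow{d} \chi^2(d, \delta).$$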
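The statistic of Theorem 1 can be computed directly from the transformed observations $q(X_{ij})$. The sketch below is an illustration under our own reading of the proof: it takes $T(X_0, X_1) = \frac{n_0 n_1}{n}(\bar q^{(1)} - \bar q^{(0)})^\top (S^2)^{-1}(\bar q^{(1)} - \bar q^{(0)})$, i.e., $r_0 r_1 n$ with $r_i$ replaced by $n_i/n$, and uses the pooled covariance $S^2$ from the proof in place of $I(0)$. The choice of $q$ is left to the caller (for instance, $q(x) = (\log x, \log^2 x)$ for log-normal data), and all function names are hypothetical.

```python
def mean_vec(rows: list[list[float]]) -> list[float]:
    """Componentwise mean of a list of d-dimensional vectors."""
    n, d = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def cov_matrix(rows: list[list[float]]) -> list[list[float]]:
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, mu = len(rows), mean_vec(rows)
    d = len(mu)
    return [[sum((r[j] - mu[j]) * (r[k] - mu[k]) for r in rows) / (n - 1)
             for k in range(d)] for j in range(d)]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def two_sample_wald(q0: list[list[float]], q1: list[list[float]]) -> float:
    """(n0 n1 / n) * diff' (S^2)^{-1} diff with the pooled covariance S^2."""
    n0, n1 = len(q0), len(q1)
    n = n0 + n1
    s0, s1 = cov_matrix(q0), cov_matrix(q1)
    d = len(s0)
    pooled = [[((n0 - 1) * s0[j][k] + (n1 - 1) * s1[j][k]) / (n - 2)
               for k in range(d)] for j in range(d)]
    diff = [b - a for a, b in zip(mean_vec(q0), mean_vec(q1))]
    x = solve(pooled, diff)
    return n0 * n1 / n * sum(u * v for u, v in zip(diff, x))


# Toy example with d = 1: compare the result with a chi-square(1) quantile.
print(two_sample_wald([[0.0], [2.0]], [[1.0], [3.0]]))
```

Solving a linear system instead of inverting $S^2$ explicitly is a standard numerical choice; a library routine such as a Cholesky solve would serve equally well.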
Proof of Corollary 1. 
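  • First, we show that the Bernoulli distributions can be expressed as a DRM. Let
    $$g_0(x) = p_0^x (1 - p_0)^{1 - x}, \quad g_1(x) = p_1^x (1 - p_1)^{1 - x}, \quad x \in \{0, 1\}.$$
    Then,
    $$\frac{g_1(x)}{g_0(x)} = \left(\frac{p_1}{p_0} \cdot \frac{1 - p_0}{1 - p_1}\right)^x \frac{1 - p_1}{1 - p_0} = \exp\left\{\log\frac{1 - p_1}{1 - p_0} + x \log\left(\frac{p_1}{p_0} \cdot \frac{1 - p_0}{1 - p_1}\right)\right\}.$$
    Thus,
    $$g_1(x) = e^{\alpha + \beta q(x)} g_0(x),$$
    where
    $$\alpha = \log\frac{1 - p_1}{1 - p_0}, \quad \beta = \log\left(\frac{p_1}{p_0} \cdot \frac{1 - p_0}{1 - p_1}\right),$$
    and $q(x) = x$. Thus, by Theorem 1, the binomial test statistic converges in distribution to $\chi^2(1)$.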
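    For the continuous test, by Assumption 3 and $0 < p_0, p_1 < 1$,
    $$\lim_{n \to \infty} n_{01} = \infty, \quad \lim_{n \to \infty} n_{11} = \infty$$
    with probability tending to 1, where $n_{01}$ and $n_{11}$ are the numbers of nonzero observations in the two samples. Then, as in the proof of Theorem 1,
    $$\frac{n_{01} n_{11}}{n_{01} + n_{11}} \left(\bar q^{(1)} - \bar q^{(0)}\right)^\top I^{-1}(0) \left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} \chi^2(d).$$
    Then, by the independence of the two test statistics, we have
    $$T_{\mathrm{semi}}(X_0, X_1) \xrightarrow{d} \chi^2(d + 1).$$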
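  • Since $p_{1n} = p_0 + k/\sqrt{n}$, by Theorem 1, for the binomial part,
    $$B^2 \xrightarrow{d} \chi^2(1, \delta_b),$$
    where
    $$\delta_b = r_0 r_1 \frac{k^2}{p_0(1 - p_0)},$$
    because $\beta_n = \log\frac{p_{1n}}{1 - p_{1n}} - \log\frac{p_0}{1 - p_0} = \frac{k}{\sqrt{n}\, p_0(1 - p_0)} + o(n^{-1/2})$ and the information of $q(x) = x$ under $g_0$ is $p_0(1 - p_0)$.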
    Notice that, for a fixed $p_1$,
    $$\frac{n_{01}}{n_{01} + n_{11}} = \frac{n_{01}/n}{n_{01}/n + n_{11}/n} = \frac{(n_{01}/n_0)(n_0/n)}{(n_{01}/n_0)(n_0/n) + (n_{11}/n_1)(n_1/n)} \xrightarrow{P} \frac{(1 - p_0) r_0}{(1 - p_0) r_0 + (1 - p_1) r_1}.$$
    Since $p_1 = p_0 + k/\sqrt{n}$, $p_1 \to p_0$. Then,
    $$\frac{n_{01}}{n_{01} + n_{11}} \xrightarrow{P} \frac{(1 - p_0) r_0}{(1 - p_0) r_0 + (1 - p_0) r_1} = r_0.$$
    Similarly,
    $$\frac{n_{11}}{n_{01} + n_{11}} \xrightarrow{P} r_1.$$
    Thus, in the same way as in the proof of Theorem 1, we obtain
    $$\frac{n_{01} n_{11}}{n_{01} + n_{11}} \left(\bar q^{(1)} - \bar q^{(0)}\right)^\top I^{-1}(0) \left(\bar q^{(1)} - \bar q^{(0)}\right) \xrightarrow{d} \chi^2(d, \delta_c),$$
    where
    $$\delta_c = r_0 r_1 h^\top I(0) h.$$
    Then, by independence,
    $$T_{\mathrm{semi}}(X_0, X_1) \xrightarrow{d} \chi^2(d + 1, \delta).$$
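A minimal sketch of the two-sample semicontinuous statistic from Corollary 1 follows, under assumptions we state explicitly: the binomial part is Theorem 1's statistic applied to the zero indicators (with the pooled sample variance in place of $I(0)$), the continuous part is the same statistic applied to a scalar $q$ of the nonzero observations (here $q(x) = \log x$, our illustrative choice), and $T_{\mathrm{semi}}$ is their sum, as in the independence argument above. The function names are hypothetical.

```python
import math
import statistics


def two_sample_wald_scalar(v0: list[float], v1: list[float]) -> float:
    """(n0 n1 / n) * (mean1 - mean0)^2 / pooled variance, for a scalar q."""
    n0, n1 = len(v0), len(v1)
    pooled = ((n0 - 1) * statistics.variance(v0)
              + (n1 - 1) * statistics.variance(v1)) / (n0 + n1 - 2)
    if pooled == 0.0:
        raise ValueError("pooled variance is zero; the statistic is undefined")
    diff = statistics.mean(v1) - statistics.mean(v0)
    return n0 * n1 / (n0 + n1) * diff ** 2 / pooled


def semicontinuous_wald(x0, x1, q=math.log):
    """Return (binomial part, continuous part, their sum) for two samples with zeros."""
    # Binomial part: zero indicators, i.e. q(x) = x on the indicator scale.
    z0 = [1.0 if x == 0 else 0.0 for x in x0]
    z1 = [1.0 if x == 0 else 0.0 for x in x1]
    b2 = two_sample_wald_scalar(z0, z1)
    # Continuous part: q applied to the nonzero observations only.
    c0 = [q(x) for x in x0 if x > 0]
    c1 = [q(x) for x in x1 if x > 0]
    tc = two_sample_wald_scalar(c0, c1)
    return b2, tc, b2 + tc


# Hypothetical toy samples; with scalar q (d = 1) the sum is compared with chi-square(2).
x0 = [0.0, 0.0, 1.0, math.e]
x1 = [0.0, math.e, math.e ** 2, 1.0]
print(semicontinuous_wald(x0, x1))
```

Keeping the two parts separate in the return value makes it easy to inspect which component drives a rejection.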
Proof of Theorem 2. 
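  • Let
    $$a_n = \left(\sqrt{\frac{n_0 + n_1}{n_0}}, \sqrt{\frac{n_0 + n_2}{n_0}}, \ldots, \sqrt{\frac{n_0 + n_m}{n_0}}\right)^\top, \quad \Lambda_n = \operatorname{diag}\left(\sqrt{\frac{n_0 + n_1}{n_1}}, \sqrt{\frac{n_0 + n_2}{n_2}}, \ldots, \sqrt{\frac{n_0 + n_m}{n_m}}\right).$$
    Furthermore, we define
    $$Z_n = \begin{pmatrix} \sqrt{n_0 + n_1}\,\left(\bar q^{(1)} - \bar q^{(0)}\right) \\ \sqrt{n_0 + n_2}\,\left(\bar q^{(2)} - \bar q^{(0)}\right) \\ \vdots \\ \sqrt{n_0 + n_m}\,\left(\bar q^{(m)} - \bar q^{(0)}\right) \end{pmatrix}.$$
    When the null hypothesis is true, by the independence of $\bar q^{(i)}$ for $i = 0, 1, 2, \ldots, m$, we have
    $$\begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \\ \vdots \\ \sqrt{n_m}\,\left(\bar q^{(m)} - m(0)\right) \end{pmatrix} \xrightarrow{d} N_{(m+1)d}(0, W), \tag{A2}$$
    where $W = I_{m+1} \otimes I(0)$, $I_{m+1}$ is the identity matrix of order $m + 1$, and $\otimes$ is the Kronecker product.
    We further define
    $$L_n = (-a_n, \Lambda_n) \otimes I_d.$$
    To illustrate the computation, we left-multiply (A2) by the first $d$ rows of $L_n$. This gives
    $$\left(-\sqrt{\frac{n_0 + n_1}{n_0}}, \sqrt{\frac{n_0 + n_1}{n_1}}\right) \otimes I_d \begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \end{pmatrix} = -\sqrt{\frac{n_0 + n_1}{n_0}}\sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) + \sqrt{\frac{n_0 + n_1}{n_1}}\sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) = \sqrt{n_0 + n_1}\,\left(\bar q^{(1)} - \bar q^{(0)}\right).$$
    Then, left-multiplying (A2) by $L_n$, we obtain
    $$Z_n = \left[(-a_n, \Lambda_n) \otimes I_d\right] \begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \\ \vdots \\ \sqrt{n_m}\,\left(\bar q^{(m)} - m(0)\right) \end{pmatrix} = \begin{pmatrix} \sqrt{n_0 + n_1}\,\left(\bar q^{(1)} - \bar q^{(0)}\right) \\ \sqrt{n_0 + n_2}\,\left(\bar q^{(2)} - \bar q^{(0)}\right) \\ \vdots \\ \sqrt{n_0 + n_m}\,\left(\bar q^{(m)} - \bar q^{(0)}\right) \end{pmatrix}.$$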
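    By Assumption 4, as $n \to +\infty$, $a_n$ and $\Lambda_n$ converge to $a$ and $\Lambda$, respectively, that is,
    $$a_n \to a = \left(\sqrt{1 + \frac{r_1}{r_0}}, \sqrt{1 + \frac{r_2}{r_0}}, \ldots, \sqrt{1 + \frac{r_m}{r_0}}\right)^\top, \quad \Lambda_n \to \Lambda = \operatorname{diag}\left(\sqrt{1 + \frac{r_0}{r_1}}, \sqrt{1 + \frac{r_0}{r_2}}, \ldots, \sqrt{1 + \frac{r_0}{r_m}}\right).$$
    Letting
    $$L = (-a, \Lambda) \otimes I_d,$$
    we have
    $$Z_n \xrightarrow{d} N_{md}\left(0, L W L^\top\right) = N_{md}\left(0, (\Lambda^2 + a a^\top) \otimes I(0)\right).$$
    Then,
    $$Z_n^\top \left[(\Lambda^2 + a a^\top)^{-1} \otimes I^{-1}(0)\right] Z_n \xrightarrow{d} \chi^2(md).$$
    Since $a_n$ and $\Lambda_n$ converge to $a$ and $\Lambda$, respectively, as $n \to \infty$, the test statistic
    $$T = Z_n^\top \left[(\Lambda_n^2 + a_n a_n^\top)^{-1} \otimes I^{-1}(0)\right] Z_n \tag{A4}$$
    also converges in distribution to $\chi^2(md)$ as $n \to \infty$.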
    We next show that the test statistic (A4) equals (24). By the Sherman–Morrison formula,
    $$(\Lambda^2 + a a^\top)^{-1} = \Lambda^{-2} - \frac{1}{1 + a^\top \Lambda^{-2} a} \Lambda^{-2} a a^\top \Lambda^{-2}.$$
    Hence, the test statistic (A4) can be rewritten as
    $$\begin{aligned} T &= Z_n^\top \left[(\Lambda_n^2 + a_n a_n^\top)^{-1} \otimes I^{-1}(0)\right] Z_n = Z_n^\top \left[\left(\Lambda_n^{-2} - \frac{\Lambda_n^{-2} a_n a_n^\top \Lambda_n^{-2}}{1 + a_n^\top \Lambda_n^{-2} a_n}\right) \otimes I^{-1}(0)\right] Z_n \\ &= Z_n^\top \left[\Lambda_n^{-2} \otimes I^{-1}(0)\right] Z_n - \frac{1}{1 + a_n^\top \Lambda_n^{-2} a_n} Z_n^\top \left[\Lambda_n^{-2} a_n a_n^\top \Lambda_n^{-2} \otimes I^{-1}(0)\right] Z_n. \end{aligned}$$
    Substituting $Z_n$, $\Lambda_n$, and $a_n$, and using $\Lambda_n^{-2} = \operatorname{diag}\left(n_i/(n_0 + n_i)\right)$, $a_n^\top \Lambda_n^{-2} a_n = \sum_{i=1}^m n_i/n_0$, and the fact that the $i$th entry of $\Lambda_n^{-2} a_n$ is $n_i/\sqrt{n_0(n_0 + n_i)}$, we obtain
    $$\begin{aligned} T &= \sum_{i=1}^m \frac{n_i}{n_0 + n_i}(n_0 + n_i)\left(\bar q^{(i)} - \bar q^{(0)}\right)^\top I^{-1}(0)\left(\bar q^{(i)} - \bar q^{(0)}\right) - \frac{1}{1 + \sum_{i=1}^m n_i/n_0} \cdot \frac{1}{n_0}\left[\sum_{i=1}^m n_i\left(\bar q^{(i)} - \bar q^{(0)}\right)\right]^\top I^{-1}(0)\left[\sum_{i=1}^m n_i\left(\bar q^{(i)} - \bar q^{(0)}\right)\right] \\ &= \sum_{i=1}^m n_i\left(\bar q^{(i)} - \bar q^{(0)}\right)^\top I^{-1}(0)\left(\bar q^{(i)} - \bar q^{(0)}\right) - \frac{1}{\sum_{i=0}^m n_i}\left[\sum_{i=1}^m n_i\left(\bar q^{(i)} - \bar q^{(0)}\right)\right]^\top I^{-1}(0)\left[\sum_{i=1}^m n_i\left(\bar q^{(i)} - \bar q^{(0)}\right)\right]. \end{aligned}$$
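  • Under the alternative, by Theorem 1,
    $$\begin{pmatrix} \sqrt{n_0}\,\left(\bar q^{(0)} - m(0)\right) \\ \sqrt{n_1}\,\left(\bar q^{(1)} - m(0)\right) \\ \vdots \\ \sqrt{n_m}\,\left(\bar q^{(m)} - m(0)\right) \end{pmatrix} \xrightarrow{d} N_{(m+1)d}\left(\begin{pmatrix} 0 \\ \sqrt{r_1}\, I(0) h_1 \\ \vdots \\ \sqrt{r_m}\, I(0) h_m \end{pmatrix}, W\right). \tag{A5}$$
    Then, left-multiplying (A5) by $L_n$, we obtain
    $$Z_n = \begin{pmatrix} \sqrt{n_0 + n_1}\,\left(\bar q^{(1)} - \bar q^{(0)}\right) \\ \sqrt{n_0 + n_2}\,\left(\bar q^{(2)} - \bar q^{(0)}\right) \\ \vdots \\ \sqrt{n_0 + n_m}\,\left(\bar q^{(m)} - \bar q^{(0)}\right) \end{pmatrix} \xrightarrow{d} N_{md}\left(b, L W L^\top\right),$$
    where
    $$b = L \begin{pmatrix} 0 \\ \sqrt{r_1}\, I(0) h_1 \\ \vdots \\ \sqrt{r_m}\, I(0) h_m \end{pmatrix} = \begin{pmatrix} \sqrt{r_0 + r_1}\, I(0) h_1 \\ \sqrt{r_0 + r_2}\, I(0) h_2 \\ \vdots \\ \sqrt{r_0 + r_m}\, I(0) h_m \end{pmatrix}.$$
    Thus, $T \xrightarrow{d} \chi^2(md, \delta)$ with
    $$\delta = b^\top \left[(\Lambda^2 + a a^\top)^{-1} \otimes I^{-1}(0)\right] b.$$
    Proceeding as in the proof of (1), we obtain
    $$\delta = \sum_{i=1}^m r_i h_i^\top I(0) h_i - \left(\sum_{i=1}^m r_i h_i\right)^\top I(0) \left(\sum_{i=1}^m r_i h_i\right).$$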
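The closed form at the end of the first part of this proof is cheap to evaluate. Expanding the square also shows that it equals $\sum_{i=0}^m n_i(\bar q^{(i)} - \bar q)^\top I^{-1}(0)(\bar q^{(i)} - \bar q)$, where $\bar q = \sum_i n_i \bar q^{(i)} / \sum_i n_i$ is the pooled mean, so the value does not depend on which sample is labeled 0. The sketch below computes both forms; the caller supplies `s`, a $d \times d$ estimate of $I(0)$ (for example a pooled covariance of $q(X)$, which is our assumption here), and all names are hypothetical.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def quad_inv(u: list[float], s: list[list[float]]) -> float:
    """u' s^{-1} u, computed by solving s x = u."""
    x = solve(s, u)
    return sum(ui * xi for ui, xi in zip(u, x))


def multi_sample_wald(qbars, ns, s) -> float:
    """Closed form: sum n_i D_i' s^{-1} D_i - (sum n_i D_i)' s^{-1} (sum n_i D_i) / N, D_i = qbar_i - qbar_0."""
    d = len(qbars[0])
    diffs = [[qi[j] - qbars[0][j] for j in range(d)] for qi in qbars[1:]]
    term1 = sum(n * quad_inv(dv, s) for n, dv in zip(ns[1:], diffs))
    weighted = [sum(n * dv[j] for n, dv in zip(ns[1:], diffs)) for j in range(d)]
    return term1 - quad_inv(weighted, s) / sum(ns)


def between_group_form(qbars, ns, s) -> float:
    """Equivalent form: sum_i n_i (qbar_i - grand mean)' s^{-1} (qbar_i - grand mean)."""
    d, total_n = len(qbars[0]), sum(ns)
    grand = [sum(n * q[j] for n, q in zip(ns, qbars)) / total_n for j in range(d)]
    return sum(n * quad_inv([q[j] - grand[j] for j in range(d)], s)
               for n, q in zip(ns, qbars))


# Three hypothetical samples with scalar q; compare with a chi-square(m d) = chi-square(2) quantile.
qbars, ns, s = [[0.0], [1.0], [3.0]], [2, 3, 5], [[1.0]]
print(multi_sample_wald(qbars, ns, s), between_group_form(qbars, ns, s))
```

For $m = 1$ the closed form reduces to $\frac{n_0 n_1}{n_0 + n_1}(\bar q^{(1)} - \bar q^{(0)})^\top I^{-1}(0)(\bar q^{(1)} - \bar q^{(0)})$, the two-sample statistic of Theorem 1.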
Proof of Corollary 2. 
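  • From the construction of (28) and Theorem 2, it is easy to prove that $T_b \xrightarrow{d} \chi^2(m)$, and, as in Theorem 2, the continuous part converges in distribution to $\chi^2(md)$. Then, by the independence of the two test statistics,
    $$T_{\mathrm{semi}}(X) \xrightarrow{d} \chi^2(m(d + 1)).$$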
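  • Since $p_{in} = p_0 + k_i/\sqrt{n}$, by Theorem 2,
    $$T_b \xrightarrow{d} \chi^2(m, \delta_b),$$
    where
    $$\delta_b = \sum_{i=1}^m r_i k_i I_b(0) k_i - \left(\sum_{i=1}^m r_i k_i\right) I_b(0) \left(\sum_{i=1}^m r_i k_i\right) = I_b(0)\left[\sum_{i=1}^m r_i k_i^2 - \left(\sum_{i=1}^m r_i k_i\right)^2\right].$$
    Since the Fisher information of the Bernoulli distribution with respect to $p$ at $p_0$ is
    $$I_b(0) = \frac{1}{p_0(1 - p_0)},$$
    we have
    $$\delta_b = \frac{1}{p_0(1 - p_0)}\left[\sum_{i=1}^m r_i k_i^2 - \left(\sum_{i=1}^m r_i k_i\right)^2\right].$$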
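    For the test statistic of the continuous part, we can prove that
    $$\frac{n_{i1}}{\sum_{j=0}^m n_{j1}} = \frac{n_{i1}/n}{\sum_{j=0}^m n_{j1}/n} = \frac{(n_{i1}/n_i)(n_i/n)}{\sum_{j=0}^m (n_{j1}/n_j)(n_j/n)} \xrightarrow{P} \frac{(1 - p_i) r_i}{\sum_{j=0}^m (1 - p_j) r_j}.$$
    Since $p_{in} = p_0 + k_i/\sqrt{n}$, $p_{in} \to p_0$. Then,
    $$\frac{n_{i1}}{\sum_{j=0}^m n_{j1}} \xrightarrow{P} \frac{r_i}{\sum_{j=0}^m r_j} = r_i.$$
    Thus, in the same way as in the proof of Theorem 2, we obtain
    $$T_c(X) = \sum_{i=1}^m n_{i1}\left(\bar q^{(i)} - \bar q^{(0)}\right)^\top S_c^{-2}\left(\bar q^{(i)} - \bar q^{(0)}\right) - \frac{1}{\sum_{i=0}^m n_{i1}}\left[\sum_{i=1}^m n_{i1}\left(\bar q^{(i)} - \bar q^{(0)}\right)\right]^\top S_c^{-2}\left[\sum_{i=1}^m n_{i1}\left(\bar q^{(i)} - \bar q^{(0)}\right)\right] \xrightarrow{d} \chi^2(md, \delta_c),$$
    where
    $$\delta_c = \sum_{i=1}^m r_i h_i^\top I(0) h_i - \left(\sum_{i=1}^m r_i h_i\right)^\top I(0) \left(\sum_{i=1}^m r_i h_i\right).$$
    Thus, by independence,
    $$T_{\mathrm{semi}}(X) \xrightarrow{d} \chi^2(m(d + 1), \delta).$$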
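The noncentrality parameters above translate into approximate power through the noncentral $\chi^2$ distribution. The sketch below, with our own naming and assumptions, evaluates $\delta_c$ from given $r_i$, $h_i$ and $I(0)$, evaluates $\delta_b$ from $k_i$ and $p_0$, and approximates $P\{\chi^2(\nu, \delta) > \chi^2_{\nu, 1-\alpha}\}$ using the standard Poisson-mixture representation of the noncentral $\chi^2$ distribution. Because the two parts are independent, the noncentrality of their sum is $\delta_b + \delta_c$. The regularized incomplete gamma function is summed as a power series, which is adequate for moderate arguments but is not a production-grade implementation.

```python
import math


def reg_lower_gamma(a: float, x: float, tol: float = 1e-15) -> float:
    """Regularized lower incomplete gamma P(a, x), summed as a power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    k = 0
    while term > tol * total and k < 100_000:
        k += 1
        term *= x / (a + k)
        total += term
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def chi2_sf(x: float, df: float) -> float:
    """P(chi2(df) > x)."""
    return 1.0 - reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_crit(alpha: float, df: float) -> float:
    """Upper-alpha quantile of chi2(df), by bisection."""
    lo, hi = 0.0, df + 50.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_sf(x: float, df: float, delta: float, tol: float = 1e-12) -> float:
    """P(chi2(df, delta) > x) via the Poisson(delta / 2) mixture of central chi2(df + 2j)."""
    lam = delta / 2.0
    weight, used, total, j = math.exp(-lam), 0.0, 0.0, 0
    while True:
        total += weight * chi2_sf(x, df + 2 * j)
        used += weight
        if used > 1.0 - tol or j > lam + 50.0 * math.sqrt(lam + 1.0) + 50.0:
            return total
        j += 1
        weight *= lam / j


def asymptotic_power(delta: float, df: float, alpha: float = 0.05) -> float:
    """Approximate power P(chi2(df, delta) > upper-alpha quantile of chi2(df))."""
    return ncx2_sf(chi2_crit(alpha, df), df, delta)


def delta_continuous(r, h, info) -> float:
    """sum r_i h_i' I h_i - (sum r_i h_i)' I (sum r_i h_i); r = (r_0, ..., r_m), h = (h_1, ..., h_m)."""
    d = len(info)

    def quad(u):
        return sum(u[j] * info[j][k] * u[k] for j in range(d) for k in range(d))

    s = [sum(ri * hi[j] for ri, hi in zip(r[1:], h)) for j in range(d)]
    return sum(ri * quad(hi) for ri, hi in zip(r[1:], h)) - quad(s)


def delta_binomial(r, k, p0) -> float:
    """[sum r_i k_i^2 - (sum r_i k_i)^2] / (p0 (1 - p0)), sums over i = 1, ..., m."""
    s1 = sum(ri * ki * ki for ri, ki in zip(r[1:], k))
    s2 = sum(ri * ki for ri, ki in zip(r[1:], k))
    return (s1 - s2 ** 2) / (p0 * (1.0 - p0))


# Hypothetical setting: m = 2, equal allocation, scalar q (d = 1), so df = m (d + 1) = 4.
r = [1 / 3, 1 / 3, 1 / 3]
delta = delta_binomial(r, [1.0, 2.0], 0.3) + delta_continuous(r, [[0.5], [1.0]], [[1.0]])
print(asymptotic_power(delta, df=4))
```

For $m = 1$, `delta_continuous` and `delta_binomial` reduce to $r_0 r_1 h^\top I(0) h$ and $r_0 r_1 k^2/\{p_0(1 - p_0)\}$, the two-sample expressions in the proof of Corollary 1.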

References

  1. Anderson, J.A. Multivariate logistic compounds. Biometrika 1979, 66, 17–26.
  2. Qin, J.; Zhang, B. A goodness-of-fit test for logistic regression models based on case-control data. Biometrika 1997, 84, 609–618.
  3. Zhang, B. Assessing goodness-of-fit of generalized logit models based on case-control data. J. Multivar. Anal. 2002, 82, 17–38.
  4. Qin, J. Empirical likelihood ratio based confidence intervals for mixture proportions. Ann. Stat. 1999, 27, 1368–1384.
  5. Zou, F.; Fine, J.P.; Yandell, B.S. On empirical likelihood for a semiparametric mixture model. Biometrika 2002, 89, 61–75.
  6. Zhang, B. Quantile estimation under a two-sample semi-parametric model. Bernoulli 2000, 6, 491–511.
  7. Chen, J.; Liu, Y. Quantile and quantile-function estimations under density ratio model. Ann. Stat. 2013, 41, 1669–1692.
  8. Fokianos, K.; Kedem, B.; Qin, J.; Short, D.A. A semiparametric approach to the one-way layout. Technometrics 2001, 43, 56–65.
  9. Cai, S.; Chen, J.; Zidek, J.V. Hypothesis testing in the presence of multiple samples under density ratio models. Statist. Sin. 2017, 27, 761–783.
  10. Patil, G.P.; Rao, C.R. Weighted Distributions and Size-Biased Sampling with Applications to Wildlife Populations and Human Families. Biometrics 1978, 34, 179–189.
  11. Rao, C.R. Weighted Distributions Arising Out of Methods of Ascertainment: What Population Does a Sample Represent? In A Celebration of Statistics; Springer: New York, NY, USA, 1985; pp. 543–569.
  12. Rao, C.R. On Discrete Distributions Arising out of Methods of Ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324.
  13. Lele, S.R.; Keim, J.L. Weighted distributions and estimation of resource selection probability functions. Ecology 2006, 87, 3021–3028.
  14. Qin, J.; Lawless, J. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
  15. Wang, C.; Marriott, P.; Li, P. Testing homogeneity for multiple nonnegative distributions with excess zero observations. Comput. Stat. Data Anal. 2017, 114, 146–157.
  16. Tu, W.; Zhou, X.H. A Wald test comparing medical costs based on log-normal distributions with zero valued costs. Stat. Med. 1999, 18, 2749–2761.
  17. Muralidharan, K.; Kale, B.K. Modified Gamma distribution with singularity at zero. Commun. Stat.-Simul. Comput. 2002, 31, 143–158.
  18. Kassahun-Yimer, W.; Albert, P.S.; Lipsky, L.M.; Nansel, T.R.; Liu, A. A joint model for multivariate hierarchical semicontinuous data with replications. Stat. Methods Med. Res. 2019, 28, 858–870.
  19. Neelon, B.; O’Malley, A.J.; Smith, V.A. Modeling zero-modified count and semicontinuous data in health services research part 1: Background and overview. Stat. Med. 2016, 35, 5070–5093.
  20. Neelon, B.; O’Malley, A.J.; Smith, V.A. Modeling zero-modified count and semicontinuous data in health services research part 2: Case studies. Stat. Med. 2016, 35, 5094–5112.
  21. Lachenbruch, P.A. Analysis of data with excess zeros. Stat. Methods Med. Res. 2002, 11, 297–302.
  22. Su, L.; Tom, B.D.M.; Farewell, V.T. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics 2009, 10, 374–389.
  23. Smith, V.A.; Preisser, J.S.; Neelon, B.; Maciejewski, M.L. A marginalized two-part model for semicontinuous data. Stat. Med. 2014, 33, 4891–4903.
  24. Wang, C.; Tu, D. A bootstrap semiparametric homogeneity test for the distributions of multigroup proportional data, with applications to analysis of quality of life outcomes in clinical trials. Stat. Med. 2020, 39, 1715–1731.
  25. Wilcox, R.R. ANOVA: A Paradigm for Low Power and Misleading Measures of Effect Size? Rev. Educ. Res. 1995, 65, 51–77.
  26. Hallstrom, A.P. A modified Wilcoxon test for non-negative distributions with a clump of zeros. Stat. Med. 2009, 29, 391–400.
  27. Pauly, M.; Brunner, E.; Konietschke, F. Asymptotic permutation tests in general factorial designs. J. R. Stat. Soc. Ser. B. Stat. Methodol. 2015, 77, 461–473.
Table 1. Type-I error and power of the three test statistics for different $(m+1) = 2, 3, 5, 8, 11$ and $\mu = 0, 0.5, 0.75, 1$ when the sample size is 30.
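
| $m+1$ | $\mu$ | MWT | DELRT | BELRT |
|---|---|---|---|---|
| 2 | 0 | 5.6 | 6.7 | 4.8 |
|  | 0.5 | 39.7 | 40.8 | 34.1 |
|  | 0.75 | 73.7 | 74.3 | 69.5 |
|  | 1 | 93.8 | 93.8 | 92.6 |
| 3 | 0 | 4.9 | 6.0 | 4.7 |
|  | 0.5 | 35.6 | 43.1 | 38.1 |
|  | 0.75 | 71.8 | 79.1 | 76.1 |
|  | 1 | 94.0 | 96.8 | 95.0 |
| 5 | 0 | 5.7 | 8.0 | 5.4 |
|  | 0.5 | 32.1 | 39.2 | 32.7 |
|  | 0.75 | 68.3 | 75.3 | 70.8 |
|  | 1 | 93.2 | 95.4 | 93.9 |
| 8 | 0 | 6.0 | 8.8 | 5.9 |
|  | 0.5 | 30.0 | 37.3 | 30.3 |
|  | 0.75 | 65.3 | 72.5 | 66.2 |
|  | 1 | 93.3 | 94.5 | 93.0 |
| 11 | 0 | 5.6 | 7.4 | 4.6 |
|  | 0.5 | 26.5 | 31.5 | 25.4 |
|  | 0.75 | 59.7 | 65.3 | 58.2 |
|  | 1 | 89.2 | 91.8 | 88.3 |
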
Table 2. Type-I error and power of the three test statistics for different $(m+1) = 2, 3, 5, 8, 11$ and $\mu = 0, 0.5, 0.75, 1$ when the sample size is 50.
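
| $m+1$ | $\mu$ | MWT | DELRT | BELRT |
|---|---|---|---|---|
| 2 | 0 | 5.4 | 6.1 | 5.1 |
|  | 0.5 | 59.6 | 60.3 | 57.4 |
|  | 0.75 | 92.8 | 92.8 | 92.5 |
|  | 1 | 99.6 | 99.6 | 99.3 |
| 3 | 0 | 4.9 | 5.9 | 4.7 |
|  | 0.5 | 60.0 | 65.1 | 61.9 |
|  | 0.75 | 94.0 | 95.6 | 95.0 |
|  | 1 | 99.4 | 99.5 | 99.5 |
| 5 | 0 | 5.5 | 7.1 | 5.8 |
|  | 0.5 | 56.2 | 59.6 | 56.3 |
|  | 0.75 | 93.7 | 95.1 | 94.1 |
|  | 1 | 100.0 | 100.0 | 100.0 |
| 8 | 0 | 5.8 | 6.8 | 5.5 |
|  | 0.5 | 49.7 | 53.9 | 49.2 |
|  | 0.75 | 90.9 | 92.3 | 90.3 |
|  | 1 | 99.7 | 99.8 | 99.8 |
| 11 | 0 | 5.1 | 6.1 | 5.2 |
|  | 0.5 | 48.9 | 51.7 | 49.8 |
|  | 0.75 | 90.2 | 90.3 | 90.2 |
|  | 1 | 99.6 | 99.8 | 99.7 |
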
Table 3. Type-I error and the power of the three statistics in the case of three populations.
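
| $n_i$ | $\mu$ | MWT | DELRT | BELRT |
|---|---|---|---|---|
| 20 | 0 | 5.15 | 7.42 | 5.12 |
|  | 0.2 | 13.84 | 17.90 | 13.39 |
|  | 0.3 | 27.08 | 31.90 | 25.5 |
|  | 0.4 | 46.37 | 51.67 | 44.87 |
|  | 0.5 | 67.50 | 72.07 | 65.42 |
|  | 0.6 | 84.34 | 86.99 | 82.67 |
| 30 | 0 | 4.39 | 6.19 | 4.65 |
|  | 0.2 | 18.93 | 21.80 | 18.06 |
|  | 0.3 | 40.63 | 44.49 | 39.44 |
|  | 0.4 | 66.60 | 69.93 | 65.1 |
|  | 0.5 | 87.29 | 89.03 | 85.94 |
|  | 0.6 | 96.78 | 97.24 | 96.36 |
| 50 | 0 | 4.80 | 5.77 | 4.80 |
|  | 0.2 | 31.63 | 34.02 | 30.81 |
|  | 0.3 | 65.69 | 67.68 | 64.62 |
|  | 0.4 | 90.18 | 90.77 | 89.30 |
|  | 0.5 | 98.73 | 98.92 | 98.59 |
|  | 0.6 | 99.96 | 99.96 | 99.81 |
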
Table 4. The power of testing $H_0$ at significance level 0.05 for different sample sizes and $\mu$ when $m = 2$.
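
| Distribution | $\mu$ | (40, 40, 120) | (40, 80, 80) | (60, 40, 100) | (60, 70, 70) | (100, 50, 50) | (140, 30, 30) | (180, 10, 10) |
|---|---|---|---|---|---|---|---|---|
| Normal | 0.3 | 19.23 | 20.83 | 26.76 | 27.86 | 34.00 | 29.65 | 16.58 |
| Normal | 0.5 | 51.96 | 55.04 | 68.84 | 70.21 | 80.04 | 72.47 | 37.92 |
| Normal | 0.7 | 85.15 | 87.23 | 95.22 | 95.80 | 98.42 | 96.27 | 66.09 |
| Log-normal | 0.3 | 24.38 | 24.46 | 30.81 | 30.85 | 36.20 | 31.08 | 17.17 |
| Log-normal | 0.5 | 61.29 | 61.30 | 73.95 | 73.86 | 81.91 | 74.01 | 38.99 |
| Log-normal | 0.7 | 90.65 | 90.74 | 96.73 | 96.81 | 98.67 | 96.66 | 67.21 |
| Gamma | 1.2 | 18.53 | 18.67 | 21.52 | 21.55 | 46.09 | 36.02 | 16.05 |
| Gamma | 1.4 | 55.67 | 55.82 | 66.28 | 66.36 | 88.58 | 78.41 | 35.41 |
| Gamma | 1.6 | 87.12 | 87.24 | 94.19 | 94.19 | 99.26 | 97.12 | 59.99 |

Column headers give $(n_0, n_1, n_2)$.
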
Table 5. The power of testing $H_0$ at significance level 0.05 for different sample sizes and $\mu$ when $m = 4$.
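
| Distribution | $\mu$ | Case I | Case II | Case III | Case IV | Case V | Case VI | Case VII | Case VIII |
|---|---|---|---|---|---|---|---|---|---|
| Normal | 0.3 | 10.65 | 10.33 | 16.22 | 24.32 | 25.90 | 25.56 | 25.59 | 20.31 |
| Normal | 0.5 | 23.44 | 22.81 | 43.76 | 65.58 | 68.34 | 67.84 | 66.68 | 49.41 |
| Normal | 0.7 | 45.98 | 44.53 | 77.86 | 94.56 | 95.61 | 95.48 | 94.84 | 81.69 |
| Log-normal | 0.3 | 13.27 | 13.21 | 19.13 | 26.96 | 28.31 | 28.20 | 27.61 | 21.64 |
| Log-normal | 0.5 | 30.24 | 30.39 | 49.77 | 69.18 | 71.25 | 71.19 | 69.33 | 51.45 |
| Log-normal | 0.7 | 57.14 | 57.25 | 83.11 | 95.73 | 96.42 | 96.35 | 95.62 | 83.18 |
| Gamma | 1.2 | 11.87 | 11.72 | 14.46 | 17.12 | 16.56 | 16.65 | 15.57 | 12.31 |
| Gamma | 1.4 | 29.88 | 29.87 | 44.65 | 57.26 | 57.01 | 56.72 | 52.65 | 32.89 |
| Gamma | 1.6 | 56.63 | 56.52 | 78.95 | 91.10 | 91.22 | 91.28 | 88.34 | 64.47 |

The sample sizes $(n_0, n_1, n_2, n_3, n_4)$ for each case are listed in Table 6.
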
Table 6. The different settings of $(n_0, n_1, n_2, n_3, n_4)$ in Table 5.

| Case Label | Sample Size $(n_0, n_1, n_2, n_3, n_4)$ |
|---|---|
| I | (20, 30, 40, 50, 60) |
| II | (20, 45, 45, 45, 45) |
| III | (40, 40, 40, 40, 40) |
| IV | (80, 30, 30, 30, 30) |
| V | (100, 25, 25, 25, 25) |
| VI | (100, 40, 30, 20, 10) |
| VII | (120, 20, 20, 20, 20) |
| VIII | (160, 10, 10, 10, 10) |

Table 7. Parameter settings for simulation study 3.

| Model | $(p_0, p_1, p_2)$ | $(a_0, a_1, a_2)$ | $(b_0, b_1, b_2)$ | Mean | Variance |
|---|---|---|---|---|---|
| LN$_1$ | (0.2, 0.2, 0.2) | (0.0, 0.0, 0.0) | (1.0, 1.0, 1.0) | (1.32, 1.32, 1.32) | (4.17, 4.17, 4.17) |
| LN$_2$ | (0.4, 0.4, 0.4) | (0.0, 0.0, 0.0) | (1.0, 1.0, 1.0) | (0.99, 0.99, 0.99) | (3.45, 3.45, 3.45) |
| LN$_3$ | (0.7, 0.7, 0.7) | (0.0, 0.0, 0.0) | (1.0, 1.0, 1.0) | (0.49, 0.49, 0.49) | (1.97, 1.97, 1.97) |
| LN$_4$ | (0.2, 0.3, 0.4) | (0.0, 0.0, 0.0) | (1.0, 1.0, 1.0) | (1.32, 1.15, 0.99) | (4.17, 3.84, 3.45) |
| LN$_5$ | (0.4, 0.4, 0.4) | (0.0, 0.5, 1.0) | (2.0, 2.0, 2.0) | (1.63, 2.69, 4.43) | (30.10, 81.82, 222.40) |
| LN$_6$ | (0.6, 0.6, 0.6) | (0.0, 0.0, 0.0) | (1.0, 2.0, 3.0) | (0.66, 1.09, 1.79) | (2.52, 20.66, 158.16) |
| LN$_7$ | (0.5, 0.6, 0.7) | (0.0, 0.5, 1.0) | (3.0, 2.0, 1.0) | (2.24, 1.79, 1.34) | (196.69, 56.15, 14.57) |
| LN$_8$ | (0.6, 0.6, 0.6) | (0.0, 0.5, 1.0) | (3.0, 2.0, 1.0) | (1.79, 1.79, 1.79) | (158.16, 56.15, 18.63) |
| LN$_9$ | (0.3, 0.4, 0.5) | (0.0, 0.15, 0.34) | (2.0, 2.0, 2.0) | (1.90, 1.90, 1.90) | (34.60, 40.97, 49.89) |
| LN$_{10}$ | (0.4, 0.5, 0.6) | (0.0, 0.0, 0.0) | (2.0, 2.36, 2.81) | (1.63, 1.63, 1.63) | (30.10, 53.95, 107.90) |
| LN$_{11}$ | (0.4, 0.5, 0.6) | (0.0, 0.5, 1.0) | (2.69, 2.05, 1.5) | (2.30, 2.30, 2.30) | (124.67, 77.32, 54.07) |
| LN$_{12}$ | (0.5, 0.5, 0.5) | (0.0, 0.5, 1.0) | (2.46, 1.98, 1.5) | (1.71, 2.21, 2.88) | (65.93, 65.93, 65.93) |
| LN$_{13}$ | (0.3, 0.4, 0.5) | (0.0, 0.07, 0.15) | (2.0, 2.0, 2.0) | (1.90, 1.75, 1.58) | (34.60, 34.60, 34.60) |
| LN$_{14}$ | (0.3, 0.4, 0.5) | (0.0, 0.0, 0.0) | (2.0, 2.07, 2.15) | (1.90, 1.69, 1.46) | (34.60, 34.60, 34.60) |
| LN$_{15}$ | (0.4, 0.5, 0.6) | (0.0, 0.5, 1.0) | (2.28, 1.88, 1.5) | (1.88, 2.11, 2.30) | (54.07, 54.07, 54.07) |
| GAM$_1$ | (0.2, 0.2, 0.2) | (1.0, 1.0, 1.0) | (1.0, 1.0, 1.0) | (0.8, 0.8, 0.8) | (0.96, 0.96, 0.96) |
| GAM$_2$ | (0.4, 0.4, 0.4) | (1.0, 1.0, 1.0) | (1.0, 1.0, 1.0) | (0.6, 0.6, 0.6) | (0.84, 0.84, 0.84) |
| GAM$_3$ | (0.7, 0.7, 0.7) | (1.0, 1.0, 1.0) | (1.0, 1.0, 1.0) | (0.3, 0.3, 0.3) | (0.51, 0.51, 0.51) |
| GAM$_4$ | (0.2, 0.3, 0.4) | (1.0, 1.0, 1.0) | (2.0, 2.0, 2.0) | (1.6, 1.4, 1.2) | (3.84, 3.64, 3.36) |
| GAM$_5$ | (0.6, 0.6, 0.6) | (1.0, 1.5, 2.0) | (2.0, 2.0, 2.0) | (0.8, 1.2, 1.6) | (2.56, 4.56, 7.04) |
| GAM$_6$ | (0.6, 0.6, 0.6) | (1.0, 1.0, 1.0) | (1.0, 2.0, 3.0) | (0.4, 0.8, 1.2) | (0.64, 2.56, 5.76) |
| GAM$_7$ | (0.4, 0.5, 0.6) | (1.0, 1.5, 3.0) | (3.0, 2.0, 1.0) | (1.8, 1.5, 1.2) | (7.56, 5.25, 3.36) |
| GAM$_8$ | (0.5, 0.5, 0.5) | (1.0, 1.5, 3.0) | (3.0, 2.0, 1.0) | (1.5, 1.5, 1.5) | (6.75, 5.25, 3.75) |
| GAM$_9$ | (0.4, 0.5, 0.6) | (1.5, 1.8, 2.25) | (2.0, 2.0, 2.0) | (1.8, 1.8, 1.8) | (5.76, 6.84, 8.46) |
| GAM$_{10}$ | (0.4, 0.5, 0.6) | (1.0, 1.0, 1.0) | (2.0, 2.4, 3.0) | (1.2, 1.2, 1.2) | (3.36, 4.32, 5.76) |
| GAM$_{11}$ | (0.4, 0.5, 0.6) | (2.0, 3.0, 4.0) | (2.0, 1.6, 1.5) | (2.4, 2.4, 2.4) | (8.64, 9.60, 12.24) |
| GAM$_{12}$ | (0.4, 0.4, 0.4) | (1.0, 1.5, 3.0) | (2.0, 1.53, 0.92) | (1.20, 1.37, 1.66) | (3.36, 3.36, 3.36) |
| GAM$_{13}$ | (0.3, 0.4, 0.5) | (1.5, 1.56, 1.66) | (2.0, 2.0, 2.0) | (2.1, 1.87, 1.66) | (6.09, 6.09, 6.09) |
| GAM$_{14}$ | (0.3, 0.4, 0.5) | (1.0, 1.0, 1.0) | (2.0, 2.08, 2.20) | (1.4, 1.25, 1.1) | (3.64, 3.64, 3.64) |
| GAM$_{15}$ | (0.4, 0.5, 0.6) | (2.0, 3.0, 4.0) | (2.0, 1.52, 1.26) | (2.4, 2.28, 2.02) | (8.64, 8.64, 8.64) |

Table 8. Type I error rates (%) for testing $H_0$ at significance level 0.05 when data are generated from LN$_1$–LN$_3$ and GAM$_1$–GAM$_3$ in Table 7.
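
| Model | MWT (30) | DELRT (30) | BELRT (30) | MWT (50) | DELRT (50) | BELRT (50) | MWT (unequal) | DELRT (unequal) | BELRT (unequal) | MWT (100) | DELRT (100) | BELRT (100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LN$_1$ | 6.27 | 7.28 | 5.16 | 6.12 | 6.54 | 5.53 | 5.67 | 5.80 | 5.20 | 5.81 | 6.39 | 5.68 |
| LN$_2$ | 6.29 | 7.41 | 4.78 | 5.59 | 6.42 | 4.83 | 4.94 | 5.82 | 4.92 | 5.88 | 6.24 | 5.31 |
| LN$_3$ | 7.32 | 9.75 | 4.16 | 6.92 | 8.35 | 5.34 | 5.98 | 7.11 | 5.38 | 5.05 | 5.78 | 4.40 |
| GAM$_1$ | 5.95 | 7.44 | 5.08 | 4.98 | 5.97 | 4.72 | 4.96 | 5.03 | 3.91 | 5.58 | 5.77 | 4.83 |
| GAM$_2$ | 6.67 | 7.54 | 5.20 | 6.31 | 7.21 | 5.32 | 5.46 | 6.04 | 4.91 | 5.03 | 5.82 | 4.72 |
| GAM$_3$ | 8.72 | 11.04 | 5.97 | 7.30 | 9.23 | 5.61 | 5.56 | 6.86 | 4.99 | 5.20 | 6.57 | 4.85 |
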
Table 9. Power (%) for testing $H_0$ at significance level 0.05 when data are generated from LN$_4$–LN$_{15}$ in Table 7.
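
| Model | MWT (30) | DELRT (30) | BELRT (30) | MWT (50) | DELRT (50) | BELRT (50) | MWT (unequal) | DELRT (unequal) | BELRT (unequal) | MWT (100) | DELRT (100) | BELRT (100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LN$_4$ | 22.91 | 23.66 | 18.68 | 35.22 | 35.37 | 31.73 | 51.89 | 53.87 | 51.36 | 63.90 | 63.68 | 61.71 |
| LN$_5$ | 32.29 | 32.26 | 25.89 | 50.33 | 50.21 | 45.66 | 75.33 | 75.26 | 73.04 | 84.40 | 84.13 | 82.28 |
| LN$_6$ | 18.47 | 25.03 | 16.72 | 28.54 | 39.30 | 31.63 | 27.23 | 54.44 | 48.74 | 57.68 | 70.41 | 66.76 |
| LN$_7$ | 45.04 | 56.76 | 45.37 | 72.55 | 81.95 | 76.22 | 98.12 | 98.27 | 97.78 | 98.51 | 99.44 | 99.19 |
| LN$_8$ | 34.93 | 44.15 | 39.24 | 53.91 | 66.28 | 58.47 | 93.30 | 93.76 | 92.88 | 89.52 | 95.09 | 92.78 |
| LN$_9$ | 23.22 | 23.78 | 18.77 | 36.54 | 36.77 | 32.73 | 57.24 | 58.15 | 54.75 | 66.41 | 66.52 | 64.08 |
| LN$_{10}$ | 23.77 | 23.84 | 17.19 | 32.05 | 32.11 | 27.12 | 52.13 | 54.09 | 50.54 | 64.43 | 63.87 | 60.69 |
| LN$_{11}$ | 44.82 | 50.17 | 41.15 | 67.94 | 72.27 | 67.43 | 94.42 | 93.83 | 92.89 | 96.92 | 97.33 | 96.78 |
| LN$_{12}$ | 31.07 | 33.24 | 25.34 | 50.02 | 53.10 | 47.09 | 81.51 | 79.53 | 77.33 | 82.64 | 84.13 | 82.21 |
| LN$_{13}$ | 20.49 | 20.75 | 16.18 | 33.44 | 33.25 | 28.94 | 49.47 | 50.48 | 47.62 | 58.35 | 58.52 | 55.89 |
| LN$_{14}$ | 19.90 | 20.23 | 15.67 | 32.29 | 32.01 | 27.88 | 47.76 | 49.19 | 45.65 | 57.45 | 57.09 | 54.16 |
| LN$_{15}$ | 45.51 | 48.36 | 39.74 | 68.15 | 70.96 | 66.04 | 93.49 | 93.00 | 91.77 | 97.08 | 97.25 | 96.89 |
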
Table 10. Power (%) for testing $H_0$ at significance level 0.05 when data are generated from GAM$_4$–GAM$_{15}$ in Table 7.
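
| Model | MWT (30) | DELRT (30) | BELRT (30) | MWT (50) | DELRT (50) | BELRT (50) | MWT (unequal) | DELRT (unequal) | BELRT (unequal) | MWT (100) | DELRT (100) | BELRT (100) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GAM$_4$ | 23.17 | 23.32 | 18.02 | 35.76 | 35.79 | 31.80 | 51.19 | 52.87 | 49.98 | 64.18 | 64.34 | 61.47 |
| GAM$_5$ | 42.16 | 42.98 | 31.96 | 63.01 | 62.84 | 55.18 | 86.68 | 84.47 | 81.51 | 93.11 | 93.02 | 91.24 |
| GAM$_6$ | 41.39 | 47.65 | 35.12 | 64.66 | 70.92 | 63.75 | 78.57 | 89.12 | 86.83 | 93.95 | 96.40 | 95.42 |
| GAM$_7$ | 37.03 | 50.11 | 39.61 | 65.48 | 75.81 | 70.48 | 96.23 | 96.32 | 95.46 | 96.14 | 98.40 | 97.77 |
| GAM$_8$ | 25.23 | 35.12 | 26.26 | 41.51 | 55.00 | 48.53 | 87.33 | 86.97 | 84.52 | 80.90 | 88.72 | 86.75 |
| GAM$_9$ | 37.85 | 40.10 | 31.44 | 56.27 | 56.90 | 51.48 | 84.96 | 84.73 | 82.20 | 90.60 | 91.21 | 89.09 |
| GAM$_{10}$ | 28.49 | 28.87 | 21.69 | 43.46 | 43.07 | 37.32 | 66.08 | 68.49 | 64.31 | 75.19 | 75.05 | 72.36 |
| GAM$_{11}$ | 51.40 | 54.77 | 44.25 | 75.78 | 77.28 | 72.37 | 95.87 | 95.92 | 94.85 | 98.69 | 98.87 | 98.32 |
| GAM$_{12}$ | 40.43 | 50.95 | 41.38 | 65.29 | 75.65 | 70.52 | 97.69 | 97.68 | 96.80 | 95.33 | 97.57 | 97.10 |
| GAM$_{13}$ | 21.23 | 21.61 | 16.36 | 33.55 | 33.07 | 29.02 | 51.13 | 51.24 | 48.06 | 60.32 | 59.87 | 57.14 |
| GAM$_{14}$ | 22.63 | 23.39 | 17.25 | 30.91 | 31.20 | 26.68 | 48.67 | 50.40 | 46.13 | 60.22 | 60.24 | 57.32 |
| GAM$_{15}$ | 35.56 | 40.67 | 31.36 | 56.55 | 60.12 | 54.06 | 85.90 | 84.53 | 82.44 | 91.19 | 92.12 | 90.77 |
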
Table 11. The parameter settings under the null and alternative hypotheses for testing homogeneity.

| Model | $p$ | $a$ | $b$ | Mean | Variance |
|---|---|---|---|---|---|
| LN$_{16}$ | (0.38, 0.38, 0.38, 0.38) | (0.49, 0.49, 0.49, 0.49) | (2.58, 2.58, 2.58, 2.58) | (3.67, 3.67, 3.67, 3.67) | (274.43, 274.43, 274.43, 274.43) |
| LN$_{17}$ | (0.30, 0.40, 0.42, 0.42) | (0.10, 1.05, 0.36, 0.52) | (2.13, 1.70, 3.08, 3.00) | (2.25, 4.04, 3.91, 4.39) | (55.77, 131.52, 559.29, 645.92) |
| LN$_{18}$ | (0.43, 0.43, 0.43, 0.43, 0.43) | (0.43, 0.43, 0.43, 0.43, 0.43) | (2.66, 2.66, 2.66, 2.66, 2.66) | (3.29, 3.29, 3.29, 3.29, 3.29) | (262.59, 262.59, 262.59, 262.59, 262.59) |
| LN$_{19}$ | (0.45, 0.49, 0.43, 0.38, 0.40) | (0.66, 0.04, 0.32, 0.57, 0.48) | (2.37, 2.03, 2.78, 3.26, 2.54) | (3.48, 1.46, 3.15, 5.54, 3.47) | (222.47, 29.97, 271.27, 1266.19, 238.72) |
| GAM$_{16}$ | (0.38, 0.38, 0.38, 0.38) | (0.55, 0.55, 0.55, 0.55) | (9.11, 9.11, 9.11, 9.11) | (3.12, 3.12, 3.12, 3.12) | (34.44, 34.44, 34.44, 34.44) |
| GAM$_{17}$ | (0.30, 0.40, 0.42, 0.42) | (0.63, 0.82, 0.46, 0.50) | (4.61, 7.12, 12.70, 12.05) | (2.05, 3.54, 3.40, 3.50) | (11.21, 33.40, 51.42, 50.92) |
| GAM$_{18}$ | (0.43, 0.43, 0.43, 0.43, 0.43) | (0.53, 0.53, 0.53, 0.53, 0.53) | (9.33, 9.33, 9.33, 9.33, 9.33) | (2.83, 2.83, 2.83, 2.83, 2.83) | (32.44, 32.44, 32.44, 32.44, 32.44) |
| GAM$_{19}$ | (0.45, 0.49, 0.43, 0.38, 0.40) | (0.55, 0.64, 0.58, 0.48, 0.54) | (10.97, 4.24, 6.90, 13.74, 9.40) | (3.32, 1.37, 2.29, 4.08, 3.09) | (45.46, 7.65, 19.69, 66.45, 35.28) |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, Y.; Xu, X. Homogeneity Test for Multiple Semicontinuous Data with the Density Ratio Model. Mathematics 2023, 11, 3789. https://doi.org/10.3390/math11173789

AMA Style

Wang Y, Xu X. Homogeneity Test for Multiple Semicontinuous Data with the Density Ratio Model. Mathematics. 2023; 11(17):3789. https://doi.org/10.3390/math11173789

Chicago/Turabian Style

Wang, Yufan, and Xingzhong Xu. 2023. "Homogeneity Test for Multiple Semicontinuous Data with the Density Ratio Model" Mathematics 11, no. 17: 3789. https://doi.org/10.3390/math11173789

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.