Article

A Conway–Maxwell–Poisson Type Generalization of Hypergeometric Distribution

by Sudip Roy 1,*, Ram C. Tripathi 1,* and Narayanaswamy Balakrishnan 2
1 Department of Management Science and Statistics, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA
2 Department of Mathematics and Statistics, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4L8, Canada
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(3), 762; https://doi.org/10.3390/math11030762
Submission received: 20 December 2022 / Revised: 29 January 2023 / Accepted: 30 January 2023 / Published: 2 February 2023
(This article belongs to the Special Issue Distribution Theory and Application)

Abstract

The hypergeometric distribution is important in practice because it describes sampling without replacement from a finite population. It has been used to estimate the population size of rare species in ecology, discrete failure rates in reliability, the fraction defective in quality control, and the number of initial faults present in software code. Recently, Borges et al. considered a Conway–Maxwell–Poisson (COM) type generalization of the binomial distribution, called the COM–Poisson–Binomial (CMPB) distribution, and investigated many of its characteristics and some interesting applications. In the same spirit, we develop here a generalization of the hypergeometric distribution, called the COM–hypergeometric distribution. We discuss many of its characteristics, such as its limiting forms, over- and underdispersion, and the behavior of its failure rate. When the newly introduced shape parameter is a positive integer, we write its probability-generating function (pgf) in terms of a generalized hypergeometric function, which places the distribution in Kemp's family of distributions; in this form, closed-form expressions are derived for its mean and variance. Finally, we develop statistical inference procedures for the model parameters and illustrate the results by extensive Monte Carlo simulations.

1. Introduction

As is well known, the Poisson distribution is equidispersed, which limits its applicability in situations where overdispersion or underdispersion is present. To overcome this limitation, Conway and Maxwell introduced a generalization of the Poisson distribution that can exhibit both under- and overdispersion (see Shmueli et al. [1]). More recently, with a view to extending the scope of application of other distributions, such as the binomial (Borges et al. [2]) and the negative hypergeometric (Roy et al. [3]), there has been heightened interest in COM type generalizations of these distributions. For example, Borges et al. [2] proposed an extension of the binomial distribution, called the COMP–binomial (CMPB) distribution, and studied many of its properties. Subsequently, Chakraborty and Ong [4] developed a COM–Poisson type generalization of the negative binomial distribution with many applications. In this paper, we develop a COM type generalization of the hypergeometric distribution, called the COM–hypergeometric (COM-H) distribution, formulated in the spirit of the CMPB distribution of Borges et al. [2]. This three-parameter extension of the hypergeometric distribution can exhibit both over- and underdispersion, as often encountered in count data, in contrast to the hypergeometric distribution, which is always underdispersed. The COM-H distribution can be used in industrial quality control, in reliability analysis, and in capture–recapture sampling for estimating wildlife populations. We compare graphically the probability mass functions (pmf) of the hypergeometric and COM-H distributions for various choices of the parameters to display the flexibility this model provides over the usual hypergeometric distribution. We also investigate the log-concavity and log-convexity of its pmf and use these properties to characterize its failure rate. Finally, we develop statistical inference for its parameters.
In particular, we derive the maximum likelihood estimators of the parameters of the COM-H model and investigate, by Monte Carlo simulation, properties such as their bias, mean squared error (MSE), and the coverage probabilities of the associated confidence intervals. We also develop a likelihood ratio test for the shape parameter and investigate its power by simulation.
The paper is organized as follows. In Section 2, we introduce the COM-H model by adding a shape parameter to the hypergeometric distribution. In Section 3, we compare the shapes of the pmfs of the COM-H and hypergeometric distributions for various choices of the parameters. In Section 4, we express the probability-generating function (pgf) of the COM-H distribution in terms of the generalized hypergeometric function when the shape parameter is a positive integer and use it to obtain the mean and variance. In Section 5, we show that in that case the COM-H model is a member of the Kemp family of distributions and hence shares various properties of this general family, and in Section 6, we express it as a weighted hypergeometric distribution. In Section 7, we discuss its asymptotic approximations by the CMPB and COM–Poisson (see [1]) distributions. In Section 8, we characterize the behavior of its failure rate in terms of the log-concavity and log-convexity of its pmf. In Section 9, we develop the maximum likelihood estimators of its parameters and investigate their behavior by simulation; in that section, we also develop a likelihood ratio test of the hypergeometric model against the COM-H model and investigate its power by simulation.

2. Formulation of COM–Hypergeometric Distribution

Consider a finite population of $N$ items containing a fraction $p$, $0 < p < 1$, of defective items. Then the population has $Np$ defective items and $N - Np$ nondefective items. Suppose we take a random sample of size $n$ without replacement from the $N$ items. Then the number of defective items in the sample, denoted by the random variable $X$, has the hypergeometric distribution with pmf
$$P(X = x) = \frac{\binom{Np}{x}\binom{N-Np}{n-x}}{\binom{N}{n}} = \frac{\binom{n}{x}\binom{N-n}{Np-x}}{\binom{N}{Np}}, \quad \max(0,\, n-N+Np) \le x \le \min(n,\, Np). \tag{1}$$
The mean and variance of this distribution are, respectively,
$$E(X) = \frac{n\,Np}{N}, \qquad \operatorname{Var}(X) = \frac{n\,Np\,(N-n)(N-Np)}{N^2(N-1)}.$$
The hypergeometric distribution is underdispersed, since $\operatorname{Var}(X)/E(X) = (N-n)(N-Np)/\{N(N-1)\} < 1$. Here, we develop a more flexible version of the hypergeometric distribution that can accommodate both under- and overdispersion. The proposed model, called the COM–hypergeometric (COM-H) distribution, has a shape parameter $\gamma$ and pmf
$$P(X = x \mid n, Np, N, \gamma) = \frac{\binom{n}{x}^{\gamma}\binom{N-n}{Np-x}}{\sum_{j=0}^{n}\binom{n}{j}^{\gamma}\binom{N-n}{Np-j}} = \frac{\binom{n}{x}^{\gamma}\binom{N-n}{Np-x}}{H(\gamma, n, Np, N)}, \quad \max(0,\, n-N+Np) \le x \le \min(n,\, Np),\ \gamma \in \mathbb{R}, \tag{2}$$
where $H(\gamma, n, Np, N) = \sum_{j=0}^{n}\binom{n}{j}^{\gamma}\binom{N-n}{Np-j}$ (terms with $Np - j$ outside $[0, N-n]$ are zero).
We shall also denote this distribution by COM-H$(\gamma, n, Np, N)$ when the underlying parameters are of interest. For $\gamma = 1$, we recover the usual hypergeometric distribution. Values of $\gamma > 1$ correspond to underdispersion and values of $\gamma < 1$ to overdispersion, both relative to the hypergeometric distribution. As $\gamma \to -\infty$, the pmf concentrates at $x = 0$ and $x = n$ (when both values lie in the support). In Section 3 below, we present graphical comparisons of the pmf of the COM-H distribution for various choices of $\gamma$, keeping the other parameters fixed (see Figure 1 and Figure 2).
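As a minimal sketch, not the authors' code, the pmf in Equation (2) can be evaluated in Python in log space, so that large $|\gamma|$ does not overflow. The function names `log_comb`, `com_h_pmf`, and `mean_var` are hypothetical, and `K` stands for the number of defective items $Np$, assumed to be an integer.

```python
import math


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b), computed with lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_pmf(n, K, N, gamma):
    """COM-H pmf of Equation (2) as a dict {x: P(X = x)}.

    n is the sample size, K = Np the number of defectives, N the population size.
    """
    lo, hi = max(0, n - N + K), min(n, K)
    # log of the unnormalized terms C(n, x)^gamma * C(N - n, K - x)
    logw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x)
            for x in range(lo, hi + 1)}
    top = max(logw.values())
    # log H(gamma, n, Np, N) via the log-sum-exp trick
    log_h = top + math.log(sum(math.exp(v - top) for v in logw.values()))
    return {x: math.exp(v - log_h) for x, v in logw.items()}


def mean_var(pmf):
    """Mean and variance of a pmf given as {x: probability}."""
    mean = sum(x * q for x, q in pmf.items())
    return mean, sum((x - mean) ** 2 * q for x, q in pmf.items())


# gamma = 1 should reproduce the hypergeometric mean n*Np/N = 4
# and variance n*Np*(N - n)*(N - Np) / (N^2 (N - 1)) = 96/49
mean, var = mean_var(com_h_pmf(10, 20, 50, 1.0))
print(mean, var)
```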

3. Graphical Comparison of Hypergeometric and COM-H Distributions

In Figure 1 and Figure 2, we compare the pmfs of the hypergeometric and COM-H distributions for different values of $\gamma$: Figure 1 uses positive integer values of $\gamma$, and Figure 2 uses negative integer values. As the positive value of $\gamma$ increases, the variance decreases relative to that of the hypergeometric distribution, because the probabilities become more concentrated around the mean. Conversely, for large negative values of $\gamma$, the probabilities are spread farther away from the mean than under the hypergeometric distribution, so the dispersion of the model increases (Figure 2).
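The following sketch, under our own naming (`K` stands for $Np$), tabulates the mean and variance of the COM-H distribution for the parameter values used in Figures 1 and 2 ($n = 10$, $Np = 20$, $N = 50$). The grid of $\gamma$ values, including the extreme values $\pm 50$, is our own choice and only illustrates the direction of the effect.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_pmf(n, K, N, gamma):
    """COM-H pmf of Equation (2) as {x: P(X = x)}; K stands for Np."""
    lo, hi = max(0, n - N + K), min(n, K)
    logw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)}
    top = max(logw.values())
    log_h = top + math.log(sum(math.exp(v - top) for v in logw.values()))
    return {x: math.exp(v - log_h) for x, v in logw.items()}


def dispersion(n, K, N, gamma):
    """Return (mean, variance) of COM-H(gamma, n, Np = K, N)."""
    pmf = com_h_pmf(n, K, N, gamma)
    mean = sum(x * q for x, q in pmf.items())
    return mean, sum((x - mean) ** 2 * q for x, q in pmf.items())


# Parameters of Figures 1 and 2: n = 10, Np = 20, N = 50
table = {g: dispersion(10, 20, 50, g) for g in (-50, -2, -1, 0, 1, 2, 3, 50)}
for g, (m, v) in table.items():
    print(f"gamma={g:>4}: mean={m:.3f} variance={v:.3f} var/mean={v / m:.3f}")
```

With $\gamma = 50$ almost all of the mass sits at $x = 5$, while with $\gamma = -50$ it is pushed to the endpoints $x = 0$ and $x = 10$, where the variance exceeds the mean.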

4. Moments and Probability-Generating Function of COM-H (for Positive Integer γ)

Here, we express the probability-generating function (pgf) of the COM-H in the form of a generalized hypergeometric function for the case when γ is a positive integer. This facilitates the derivation of the factorial moments and hence other moments of the distribution. Let us first introduce some functions and notation which are used throughout this paper:
  • Pochhammer’s symbol or rising factorial:
    $(b)_0 = 1,\quad (b)_k = b(b+1)\cdots(b+k-1),\quad k \ge 1;$
  • Generalized hypergeometric function:
    ${}_pF_q\left(a_1, a_2, \ldots, a_p;\ b_1, b_2, \ldots, b_q;\ z\right) = \sum_{m=0}^{\infty}\frac{(a_1)_m(a_2)_m\cdots(a_p)_m}{(b_1)_m(b_2)_m\cdots(b_q)_m}\,\frac{z^m}{m!}.$
Then, the following theorem gives an expression of the pgf of the COM-H distribution in terms of the generalized hypergeometric function.
Theorem 1.
The pgf of the COM-H distribution is given by
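$$G(s) = E(s^X) = \sum_{x=0}^{n} s^x P(X = x \mid n, Np, N, \gamma) = \frac{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}s\right)}{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}\right)},$$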
where γ is a positive integer.
Proof. 
In the pmf of the COM-H distribution in (2), the denominator can be written in terms of the generalized hypergeometric function as follows:
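$$\sum_{x=0}^{n}\binom{n}{x}^{\gamma}\binom{N-n}{Np-x} = \frac{(N-n)!}{(Np)!\,(N-Np-n)!}\times{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}\right).$$
This holds because the first term of the sum is $\binom{N-n}{Np}$ and the ratio of consecutive terms,
$$\frac{(-n+x)^{\gamma}(-Np+x)}{(1+x)^{\gamma-1}(N-Np-n+1+x)}\cdot\frac{(-1)^{\gamma+1}}{1+x},$$
is exactly the term ratio of this series. Multiplying the $x$th term by $s^x$ replaces the argument $(-1)^{\gamma+1}$ by $(-1)^{\gamma+1}s$, so the pgf of $X$, $G(s) = E(s^X)$, can be written as
$$G(s) = \frac{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}s\right)}{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}\right)},$$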
as required. □
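To make Theorem 1 concrete, the sketch below (our own illustration, using exact rational arithmetic from Python's `fractions` module) evaluates $G(s)$ both as the ratio of terminating ${}_{\gamma+1}F_{\gamma}$ series and directly as $\sum_x s^x P(X = x)$. It assumes $\gamma$ is a positive integer and $N - Np - n \ge 0$, so that the support starts at $x = 0$ and the denominator parameter $N - Np - n + 1$ is positive; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def poch(b, k):
    """Pochhammer symbol (b)_k = b (b + 1) ... (b + k - 1), with (b)_0 = 1."""
    out = Fraction(1)
    for i in range(k):
        out *= b + i
    return out


def pfq(a, b, z):
    """Terminating generalized hypergeometric series pFq(a; b; z), in exact arithmetic.

    Summation stops at the first zero term, which occurs once a non-positive
    integer numerator parameter such as -n is exhausted.
    """
    total, m = Fraction(0), 0
    while True:
        term = Fraction(1)
        for ai in a:
            term *= poch(ai, m)
        if term == 0:
            return total
        for bi in b:
            term /= poch(bi, m)
        total += term * Fraction(z) ** m / poch(1, m)
        m += 1


def pgf_theorem1(s, n, K, N, gamma):
    """G(s) from Theorem 1: ratio of two (gamma+1)F(gamma) series, integer gamma >= 1."""
    a = [-n] * gamma + [-K]                  # gamma copies of -n, then -Np
    b = [1] * (gamma - 1) + [N - K - n + 1]  # gamma - 1 ones, then N - Np - n + 1
    z = Fraction((-1) ** (gamma + 1))
    return pfq(a, b, z * Fraction(s)) / pfq(a, b, z)


def pgf_direct(s, n, K, N, gamma):
    """G(s) = sum_x s^x P(X = x) computed directly from Equation (2)."""
    terms = [comb(n, x) ** gamma * comb(N - n, K - x) for x in range(min(n, K) + 1)]
    return sum(Fraction(s) ** x * t for x, t in enumerate(terms)) / sum(terms)


s = Fraction(1, 2)
print(pgf_theorem1(s, 10, 20, 50, 2) == pgf_direct(s, 10, 20, 50, 2))
```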
Remark 1.
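The $k$th-order derivative of $G(s)$ is given by (using formula 15.2.2 of Abramowitz and Stegun [5], applied termwise to the ${}_{\gamma+1}F_{\gamma}$ series, together with the chain rule in $s$):
$$G^{(k)}(s) = (-1)^{(\gamma+1)k}\,\frac{\left[(-n)_k\right]^{\gamma}(-Np)_k}{(k!)^{\gamma-1}\,(N-Np-n+1)_k}\times\frac{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n+k, \ldots, -n+k}_{\gamma},\, -Np+k;\ \underbrace{1+k, \ldots, 1+k}_{\gamma-1},\, N-Np-n+1+k;\ (-1)^{\gamma+1}s\right)}{{}_{\gamma+1}F_{\gamma}\left(\underbrace{-n, \ldots, -n}_{\gamma},\, -Np;\ \underbrace{1, \ldots, 1}_{\gamma-1},\, N-Np-n+1;\ (-1)^{\gamma+1}\right)}.$$
The first and second factorial moments of $X$ are then obtained by setting $k = 1, 2$ and $s = 1$. Thus, we obtain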
$$E(X) = \frac{n^{\gamma}\,Np}{N-Np-n+1}\times\frac{{}_{\gamma+1}F_{\gamma}\left(-n+1, \ldots, -n+1,\, -Np+1;\ 2, \ldots, 2,\, N-Np-n+2;\ (-1)^{\gamma+1}\right)}{{}_{\gamma+1}F_{\gamma}\left(-n, \ldots, -n,\, -Np;\ 1, \ldots, 1,\, N-Np-n+1;\ (-1)^{\gamma+1}\right)}$$
and
$$E\{X(X-1)\} = \frac{\left[n(n-1)\right]^{\gamma}Np\,(Np-1)}{(2!)^{\gamma-1}(N-Np-n+1)(N-Np-n+2)}\times\frac{{}_{\gamma+1}F_{\gamma}\left(-n+2, \ldots, -n+2,\, -Np+2;\ 3, \ldots, 3,\, N-Np-n+3;\ (-1)^{\gamma+1}\right)}{{}_{\gamma+1}F_{\gamma}\left(-n, \ldots, -n,\, -Np;\ 1, \ldots, 1,\, N-Np-n+1;\ (-1)^{\gamma+1}\right)}.$$
Since $\operatorname{Var}(X) = E\{X(X-1)\} + E(X) - [E(X)]^2$, the variance of $X$ can be obtained readily in terms of the hypergeometric series as $\operatorname{Var}(X) = G''(1) + G'(1) - [G'(1)]^2$.
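The factorial-moment formulas of Remark 1 can be checked in exact arithmetic against direct summation over the pmf (2). This is a hedged sketch with hypothetical helper names, assuming integer $\gamma \ge 1$ and $N - Np - n \ge 0$; `K` stands for $Np$.

```python
from fractions import Fraction
from math import comb


def poch(b, k):
    """Pochhammer symbol (b)_k."""
    out = Fraction(1)
    for i in range(k):
        out *= b + i
    return out


def pfq(a, b, z):
    """Terminating pFq(a; b; z) in exact arithmetic (stops at the first zero term)."""
    total, m = Fraction(0), 0
    while True:
        term = Fraction(1)
        for ai in a:
            term *= poch(ai, m)
        if term == 0:
            return total
        for bi in b:
            term /= poch(bi, m)
        total += term * Fraction(z) ** m / poch(1, m)
        m += 1


def factorial_moment(k, n, K, N, gamma):
    """E[X (X - 1) ... (X - k + 1)] = G^(k)(1), from the derivative formula of Remark 1."""
    a = [-n] * gamma + [-K]
    b = [1] * (gamma - 1) + [N - K - n + 1]
    z = Fraction((-1) ** (gamma + 1))
    coef = z ** k                      # chain-rule factor (-1)^((gamma + 1) k)
    for ai in a:
        coef *= poch(ai, k)
    for bi in b:
        coef /= poch(bi, k)
    shifted = pfq([ai + k for ai in a], [bi + k for bi in b], z)
    return coef * shifted / pfq(a, b, z)


def mean_var_series(n, K, N, gamma):
    """E(X) = G'(1) and Var(X) = G''(1) + G'(1) - G'(1)^2."""
    g1 = factorial_moment(1, n, K, N, gamma)
    g2 = factorial_moment(2, n, K, N, gamma)
    return g1, g2 + g1 - g1 ** 2


def mean_var_direct(n, K, N, gamma):
    """Exact mean and variance by summing over the pmf of Equation (2)."""
    w = {x: comb(n, x) ** gamma * comb(N - n, K - x) for x in range(min(n, K) + 1)}
    h = sum(w.values())
    m1 = Fraction(sum(x * t for x, t in w.items()), h)
    m2 = Fraction(sum(x * x * t for x, t in w.items()), h)
    return m1, m2 - m1 ** 2


# For gamma = 1 this should equal the hypergeometric values 4 and 96/49
print(mean_var_series(10, 20, 50, 1))
```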
When γ = 1 , the above results reduce, respectively, to the mean and variance of the hypergeometric distribution. These can be verified as follows:
$$E(X) = \frac{n\,Np}{N-Np-n+1}\,\frac{{}_2F_1\left(-n+1,\, -Np+1;\ N-Np-n+2;\ 1\right)}{{}_2F_1\left(-n,\, -Np;\ N-Np-n+1;\ 1\right)}. \tag{3}$$
Using Gauss's formula (see Abramowitz and Stegun [5])
$${}_2F_1(a, b;\ c;\ 1) = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}, \tag{4}$$
the ratio of hypergeometric functions in (3) can be written as
$$\frac{{}_2F_1\left(-n+1,\, -Np+1;\ N-Np-n+2;\ 1\right)}{{}_2F_1\left(-n,\, -Np;\ N-Np-n+1;\ 1\right)} = \frac{\Gamma(N-Np-n+2)\,\Gamma(N)}{\Gamma(N-Np+1)\,\Gamma(N-n+1)}\times\frac{\Gamma(N-Np+1)\,\Gamma(N-n+1)}{\Gamma(N-Np-n+1)\,\Gamma(N+1)} = \frac{N-Np-n+1}{N}.$$
Therefore, $E(X) = nNp/N$, which is the mean of the hypergeometric distribution.
Similarly, for $\gamma = 1$,
$$E\{X(X-1)\} = \frac{n(n-1)\,Np\,(Np-1)}{(N-Np-n+1)(N-Np-n+2)}\,\frac{{}_2F_1\left(-n+2,\, -Np+2;\ N-Np-n+3;\ 1\right)}{{}_2F_1\left(-n,\, -Np;\ N-Np-n+1;\ 1\right)},$$
and using formula (4) again, the variance simplifies to
$$\operatorname{Var}(X) = \frac{n\,Np\,(N-n)(N-Np)}{N^2(N-1)},$$
which is the variance of the hypergeometric distribution.
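The simplification invoked above (applying formula (4) again) can be spelled out as follows; this is our own worked check, using only Gauss's formula (4) and the expressions for $\gamma = 1$:

$$
\begin{aligned}
\frac{{}_2F_1(-n+2,\,-Np+2;\,N-Np-n+3;\,1)}{{}_2F_1(-n,\,-Np;\,N-Np-n+1;\,1)}
&= \frac{\Gamma(N-Np-n+3)\,\Gamma(N-1)}{\Gamma(N-Np+1)\,\Gamma(N-n+1)}\times\frac{\Gamma(N-Np+1)\,\Gamma(N-n+1)}{\Gamma(N-Np-n+1)\,\Gamma(N+1)}
= \frac{(N-Np-n+1)(N-Np-n+2)}{N(N-1)},\\
E\{X(X-1)\} &= \frac{n(n-1)\,Np\,(Np-1)}{N(N-1)},\\
\operatorname{Var}(X) &= \frac{n(n-1)\,Np\,(Np-1)}{N(N-1)} + \frac{nNp}{N} - \left(\frac{nNp}{N}\right)^2 = \frac{n\,Np\,(N-n)(N-Np)}{N^2(N-1)}.
\end{aligned}
$$

The last equality follows after placing all three terms over $N^2(N-1)$: the numerator divided by $nNp$ is $(n-1)(Np-1)N + N(N-1) - nNp(N-1) = N^2 - Nn - N\,Np + nNp = (N-n)(N-Np)$.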

5. The COM-H as a Member of Kemp’s Family of Distributions when γ Is an Integer

For the Kemp family of distributions (see Tripathi and Gurland [6], Johnson et al. [7], and Kemp and Kemp [8]), the ratio of successive probabilities is given by
$$\frac{p(x+1)}{p(x)} = \frac{(a_1+x)(a_2+x)\cdots(a_p+x)}{(b_1+x)(b_2+x)\cdots(b_q+x)}\cdot\frac{\theta}{1+x}.$$
Using this result, the following theorem gives the ratio of successive probabilities of the COM-H distribution.
Theorem 2.
The ratio of successive probabilities of the COM-H distribution can be written in the form of the ratio of successive probabilities of the Kemp Type 1A(i) family of distributions.
Proof. 
The ratio of successive probabilities of the COM-H distribution can be written in the Kemp Type 1A(i) form as follows:
$$\frac{p(x+1)}{p(x)} = \frac{\overbrace{(-n+x)\cdots(-n+x)}^{\gamma}\,(-Np+x)}{\underbrace{(1+x)\cdots(1+x)}_{\gamma-1}\,(N-Np-n+x+1)}\cdot\frac{(-1)^{\gamma+1}}{1+x},$$
or
$$\frac{p(x+1)}{p(x)} = \frac{(-n+x)^{\gamma}(-Np+x)}{(1+x)^{\gamma-1}(N-Np-n+x+1)}\cdot\frac{(-1)^{\gamma+1}}{1+x},$$
which is of the Kemp form with $\gamma + 1$ numerator factors $(-n+x), \ldots, (-n+x), (-Np+x)$, $\gamma$ denominator factors $(1+x), \ldots, (1+x), (N-Np-n+x+1)$, and $\theta = (-1)^{\gamma+1}$.
Hence, the theorem. □
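As a small exact-arithmetic check of Theorem 2 (our own sketch; `kemp_ratio` and `com_h_terms` are hypothetical helpers and `K` stands for $Np$), the Kemp-form right-hand side with numerator parameters $(-n, \ldots, -n, -Np)$, denominator parameters $(1, \ldots, 1, N-Np-n+1)$, and $\theta = (-1)^{\gamma+1}$ can be compared with $p(x+1)/p(x)$ computed from Equation (2):

```python
from fractions import Fraction
from math import comb


def com_h_terms(n, K, N, gamma):
    """Unnormalized COM-H terms C(n, x)^gamma C(N - n, K - x), x = 0..min(n, K)."""
    return [comb(n, x) ** gamma * comb(N - n, K - x) for x in range(min(n, K) + 1)]


def kemp_ratio(x, n, K, N, gamma):
    """Kemp form prod(a_i + x) / prod(b_j + x) * theta / (1 + x) for COM-H.

    a = (-n, ..., -n [gamma times], -K), b = (1, ..., 1 [gamma - 1 times], N - K - n + 1),
    theta = (-1)^(gamma + 1); K stands for Np.
    """
    num = (-n + x) ** gamma * (-K + x)
    den = (1 + x) ** (gamma - 1) * (N - K - n + 1 + x)
    theta = (-1) ** (gamma + 1)
    return Fraction(num, den) * theta / (1 + x)


n, K, N = 10, 20, 50
for gamma in (1, 2, 3, 4):
    t = com_h_terms(n, K, N, gamma)
    # the normalizing constant H cancels in p(x + 1) / p(x)
    assert all(Fraction(t[x + 1], t[x]) == kemp_ratio(x, n, K, N, gamma)
               for x in range(len(t) - 1))
print("Kemp-form ratios agree with Equation (2)")
```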

6. COM-H as a Weighted Version of Hypergeometric Distribution

The COM-H model can be expressed as a weighted version of the hypergeometric distribution, with weighted pmf of the form
$$p_w(x; \gamma) = \frac{w(x)\,p(x; 1)}{E[w(X)]}, \tag{5}$$
where $w(\cdot)$ is the weight function, $X$ is a standard hypergeometric random variable with pmf $p(x; 1) = \Pr(X = x)$ given by Equation (1) (the COM-H pmf with $\gamma = 1$), and $E(\cdot)$ denotes expectation with respect to this pmf.
This weighted distribution concept is similar to that of Kokonendji et al. [9] for the weighted Poisson distribution (WPD). These authors related the log-convexity (log-concavity) of the weight function to the overdispersion (underdispersion) of the WPD; for this purpose, they represented the COM–Poisson distribution as a weighted Poisson distribution. They also discussed the concept of pointwise duality between two weighted Poisson distributions and illustrated it with the COM–Poisson distribution and the WPD family considered earlier by Castillo and Pérez-Casany [10].
Definition 1
(Kokonendji et al. [9]). Let $w_1$ and $w_2$ be two positive Poisson weight functions. Then, the two corresponding WPDs are said to be pointwise dual (simply dual) if
$$w_1 \times w_2 = 1.$$
They show that if $w$ is a positive Poisson weight function that is log-concave in $x$, then the corresponding WPD is also log-concave. A dual pair of WPDs is said to be closed if one of the two distributions is overdispersed and the other is underdispersed.
In a similar vein, the COM-H distribution in Equation (2) can be written in the form of a weighted hypergeometric distribution as
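$$P(X = x) = \frac{\binom{n}{x}^{\gamma-1}\binom{n}{x}\binom{N-n}{Np-x}}{\sum_{j=0}^{n}\binom{n}{j}^{\gamma}\binom{N-n}{Np-j}} = \frac{w(x)\binom{n}{x}\binom{N-n}{Np-x}}{\sum_{j=0}^{n} w(j)\binom{n}{j}\binom{N-n}{Np-j}},$$
where the hypergeometric weight function (HWF) is given by $w(x) = \binom{n}{x}^{\gamma-1}$.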
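The weighted representation can be checked exactly with a short sketch in our own notation (`K` stands for $Np$, the helper names are hypothetical, and $\gamma$ is restricted to integers so that `Fraction` powers stay exact): building $w(x)\,p(x; 1)/E[w(X)]$ from the hypergeometric pmf (1) reproduces Equation (2).

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, n, K, N):
    """Hypergeometric pmf of Equation (1), first form, with K = Np."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


def support(n, K, N):
    """Support max(0, n - N + K) <= x <= min(n, K)."""
    return range(max(0, n - N + K), min(n, K) + 1)


def com_h_weighted(n, K, N, gamma):
    """COM-H pmf as w(x) p(x; 1) / E[w(X)], with HWF w(x) = C(n, x)^(gamma - 1)."""
    w = {x: Fraction(comb(n, x)) ** (gamma - 1) for x in support(n, K, N)}
    e_w = sum(w[x] * hypergeometric_pmf(x, n, K, N) for x in w)   # E[w(X)]
    return {x: w[x] * hypergeometric_pmf(x, n, K, N) / e_w for x in w}


def com_h_direct(n, K, N, gamma):
    """COM-H pmf computed directly from Equation (2)."""
    t = {x: Fraction(comb(n, x)) ** gamma * comb(N - n, K - x) for x in support(n, K, N)}
    h = sum(t.values())                                             # H(gamma, n, Np, N)
    return {x: tx / h for x, tx in t.items()}


for gamma in (-2, 0, 1, 3):
    assert com_h_weighted(10, 20, 50, gamma) == com_h_direct(10, 20, 50, gamma)
print("weighted form agrees with Equation (2)")
```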
We now discuss two important results concerning this weight function. The first relates the log-concavity (log-convexity) of $w(x)$ to that of the weighted hypergeometric distribution (WHD); the second concerns the duality of the WHD.
Theorem 3.
Let $w$ be a positive HWF. If $w(x)$ is log-concave in $x$, then the corresponding COM-H distribution, represented as a weighted hypergeometric distribution (WHD), is log-concave as well.
Proof. 
Let $w$ be the positive HWF; we relate the log-concavity of $w(x)$, $x \in \mathbb{N}$, to that of the WHD. Let $p_x$ denote the COM-H$(\gamma, n, Np, N)$ pmf at $x$. Then
$$\frac{p_x\,p_{x+2}}{p_{x+1}^2} = \frac{p_w(x+2)/p_w(x+1)}{p_w(x+1)/p_w(x)},$$
where
$$\frac{p_w(x+1)}{p_w(x)} = \frac{w(x+1)(n-x)(Np-x)}{w(x)(x+1)(N-Np-n+x+1)}, \qquad \frac{p_w(x+2)}{p_w(x+1)} = \frac{w(x+2)(n-x-1)(Np-x-1)}{w(x+1)(x+2)(N-Np-n+x+2)}.$$
Hence,
$$\frac{p_x\,p_{x+2}}{p_{x+1}^2} < 1\ \ (\text{log-concave}) \iff \frac{w(x+2)\,w(x)}{w^2(x+1)}\cdot\frac{(n-x-1)(x+1)(Np-x-1)(N-Np-n+x+1)}{(n-x)(x+2)(Np-x)(N-Np-n+x+2)} < 1.$$
The second factor is always less than 1, since each of $\frac{n-x-1}{n-x}$, $\frac{x+1}{x+2}$, $\frac{Np-x-1}{Np-x}$, and $\frac{N-Np-n+x+1}{N-Np-n+x+2}$ is less than 1:
$$\frac{(n-x-1)(x+1)(Np-x-1)(N-Np-n+x+1)}{(n-x)(x+2)(Np-x)(N-Np-n+x+2)} < 1.$$
Hence, a sufficient condition for the WHD pmf to be log-concave is
$$\frac{w(x+2)\,w(x)}{w^2(x+1)} \le 1.$$
Thus, we can state that if the HWF is log-concave, then the corresponding WHD is log-concave. Hence, the theorem. □
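A numerical illustration of Theorem 3 (our own sketch, with hypothetical helper names): since $\binom{n}{x}$ is log-concave in $x$, the HWF $w(x) = \binom{n}{x}^{\gamma-1}$ is log-concave for $\gamma \ge 1$, and the second differences of $\log p_x$ should then be non-positive. The parameters $n = 10$, $Np = 32$, $N = 80$ are arbitrary choices.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def is_log_concave(logf, tol=1e-12):
    """True if the discrete second differences of a log-sequence are all <= tol."""
    return all(logf[i] - 2 * logf[i + 1] + logf[i + 2] <= tol for i in range(len(logf) - 2))


def log_hwf(n, gamma):
    """log w(x) for the HWF w(x) = C(n, x)^(gamma - 1), x = 0, ..., n."""
    return [(gamma - 1) * log_comb(n, x) for x in range(n + 1)]


def log_com_h(n, K, N, gamma):
    """Unnormalized log COM-H pmf on its support; the constant log H does not affect concavity."""
    lo, hi = max(0, n - N + K), min(n, K)
    return [gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)]


n, K, N = 10, 32, 80
for gamma in (1.0, 1.2, 2.0, 5.0):
    assert is_log_concave(log_hwf(n, gamma))          # hypothesis of Theorem 3
    assert is_log_concave(log_com_h(n, K, N, gamma))  # conclusion of Theorem 3
print("log-concave HWF gives a log-concave COM-H pmf in these cases")
```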
We now investigate the duality of the WHD based on the HWF, which is of practical importance. Consider the pair of weight functions $w_1(x) = \binom{n}{x}^{\gamma-1}$ and $w_2(x) = \binom{n}{x}^{1-\gamma}$, $\gamma \ge 0$. By construction, $w_1(x) \times w_2(x) = 1$ for all $x$, so the two corresponding WHDs are pointwise dual; for $\gamma \ne 1$, the pair is closed in the sense that one of them exhibits overdispersion and the other underdispersion. Notice that the hypergeometric distribution itself can be regarded as a member of the WHD family; it is “self-dual” [9], with weight function $w(x) = 1$, $x \in \mathbb{N}$. The pointwise duality in the WHD family is displayed in Figure 3 below.
Figure 3 shows the closed dual pair of WHDs. At $x = 0$ and $x = n$ ($= 10$), both weight functions equal 1. For $\gamma = 1.2$, the HWF and the WHD are both log-concave, and the WHD is underdispersed. Similarly, for $\gamma = 0.2$, the HWF is log-convex and the WHD is overdispersed.

7. Convergence of COM-H to COM–Poisson and COMP–Binomial

As is well known, the hypergeometric distribution approaches the binomial distribution, which in turn approaches the Poisson distribution under certain limiting conditions. In this section, we show that similar relationships hold between the COM-H, COMP–binomial, and COM–Poisson distributions.
Theorem 4.
The COM-H$(\gamma, n, Np, N)$ distribution approaches the COMP–binomial distribution as $N \to \infty$ with $n$ and $p$ fixed. In turn, the COMP–binomial distribution approaches the COM–Poisson distribution with parameter $\lambda$ as $n \to \infty$, with $Np$ and $N$ large ($Np > n$) and $\lambda = n^{\gamma}Np/N$ held fixed (so that $p = Np/N \to 0$).
Proof. 
First, let us divide the numerator and the denominator of the COM-H pmf in (2) by $\binom{N}{Np}$. Then the numerator can be written as
$$\frac{\binom{n}{x}^{\gamma}\binom{N-n}{Np-x}}{\binom{N}{Np}} = \binom{n}{x}^{\gamma}\frac{(N-n)!\,(Np)!\,(N-Np)!}{(Np-x)!\,(N-Np-n+x)!\,N!} = \binom{n}{x}^{\gamma}\,\frac{Np(Np-1)\cdots(Np-x+1)}{N(N-1)\cdots(N-x+1)}\times\frac{(N-Np)(N-Np-1)\cdots(N-Np-n+x+1)}{(N-x)(N-x-1)\cdots(N-n+1)}.$$
When $Np$ and $N$ are large, we have, for a fixed $x$,
$$\frac{Np(Np-1)\cdots(Np-x+1)}{N(N-1)\cdots(N-x+1)} = \frac{(Np)^x\left(1-\frac{1}{Np}\right)\cdots\left(1-\frac{x-1}{Np}\right)}{N^x\left(1-\frac{1}{N}\right)\cdots\left(1-\frac{x-1}{N}\right)} \approx \frac{(Np)^x}{N^x} = p^x,$$
when $x$ is small relative to $N$. Using a similar argument, we have
$$\frac{(N-Np)(N-Np-1)\cdots(N-Np-n+x+1)}{(N-x)(N-x-1)\cdots(N-n+1)} \approx \frac{(N-Np)^{n-x}}{N^{n-x}} = (1-p)^{n-x}.$$
Thus, we can approximate the COM-H pmf by the COMP–binomial form as
$$\text{COM-H}(\gamma, n, Np, N) \approx \frac{\binom{n}{x}^{\gamma}p^x(1-p)^{n-x}}{\sum_{j=0}^{n}\binom{n}{j}^{\gamma}p^j(1-p)^{n-j}}.$$
Similarly, the COM-H distribution can be shown to approach the COM–Poisson distribution when we take $\lambda = n^{\gamma}Np/N$ with $Np$ and $N$ large and $n \to \infty$. Thus,
$$\frac{\binom{n}{x}^{\gamma}p^x(1-p)^{n-x}}{\sum_{j=0}^{n}\binom{n}{j}^{\gamma}p^j(1-p)^{n-j}} \to \frac{\lambda^x}{(x!)^{\gamma}}\,\frac{1}{\sum_{m=0}^{\infty}\lambda^m/(m!)^{\gamma}}.$$
Hence, the theorem. □
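The two limits in Theorem 4 can be illustrated numerically (a hedged sketch, not a proof): the total variation distance between COM-H and COMP–binomial should shrink as $N$ grows with $n$ and $p$ fixed, and that between COMP–binomial and COM–Poisson should shrink as $n$ grows with $\lambda = n^{\gamma}p$ fixed. The parameter values and the truncation of the COM–Poisson support at 200 are our own assumptions.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def normalize(logw):
    """Turn {x: log weight} into {x: probability} with the log-sum-exp trick."""
    top = max(logw.values())
    total = sum(math.exp(v - top) for v in logw.values())
    return {x: math.exp(v - top) / total for x, v in logw.items()}


def com_h(n, K, N, gamma):
    """COM-H(gamma, n, Np = K, N) pmf, Equation (2)."""
    lo, hi = max(0, n - N + K), min(n, K)
    return normalize({x: gamma * log_comb(n, x) + log_comb(N - n, K - x)
                      for x in range(lo, hi + 1)})


def cmpb(n, p, gamma):
    """COMP-binomial pmf proportional to C(n, x)^gamma p^x (1 - p)^(n - x)."""
    return normalize({x: gamma * log_comb(n, x) + x * math.log(p) + (n - x) * math.log1p(-p)
                      for x in range(n + 1)})


def com_poisson(lam, gamma, x_max=200):
    """COM-Poisson pmf proportional to lam^x / (x!)^gamma, truncated at x_max."""
    return normalize({x: x * math.log(lam) - gamma * math.lgamma(x + 1)
                      for x in range(x_max + 1)})


def tv(f, g):
    """Total variation distance between two pmfs stored as dicts."""
    return 0.5 * sum(abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in set(f) | set(g))


n, p, gamma = 10, 0.3, 1.5
# COM-H -> COMP-binomial as N grows (Np = round(N p) kept integer)
dist_N = {N: tv(com_h(n, round(N * p), N, gamma), cmpb(n, p, gamma)) for N in (100, 1000, 10000)}
lam = 2.0
# COMP-binomial -> COM-Poisson as the sample size m grows with lam = m^gamma p fixed
dist_n = {m: tv(cmpb(m, lam / m ** gamma, gamma), com_poisson(lam, gamma)) for m in (20, 200, 2000)}
print(dist_N)
print(dist_n)
```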

8. Failure Rate

The monotonicity of the failure rate plays an important role in modeling failure times in reliability studies. As in Gupta et al. [11], we characterize the monotonicity of the failure rate in terms of the log-concavity and log-convexity of the distribution, which we determine from the ratio of consecutive probabilities. Let
$$\eta(x) = \frac{p(x) - p(x+1)}{p(x)};$$
then,
$$\Delta\eta(x) = \eta(x+1) - \eta(x) = \frac{p(x+1)}{p(x)} - \frac{p(x+2)}{p(x+1)}.$$
Thus $\Delta\eta(x) \ge 0$ for all $x$ if and only if $p(x+1)^2 \ge p(x)\,p(x+2)$, that is, the pmf is log-concave.
The following result characterizes the monotonicity of the failure rate of the COM-H distribution.
Theorem 5.
The COM-H distribution is log-concave for $\gamma > 0$ and thus has a nondecreasing failure rate (IFR).
Proof. 
For the COM-H distribution, we have
$$\frac{p(x+1)}{p(x)} = \left(\frac{n-x}{x+1}\right)^{\gamma}\frac{Np-x}{N-Np-n+x+1}$$
and
$$\frac{p(x+2)}{p(x+1)} = \left(\frac{n-x-1}{x+2}\right)^{\gamma}\frac{Np-x-1}{N-Np-n+x+2}.$$
Now, for positive $\gamma$, we have
$$\left(\frac{n-x}{x+1}\right)^{\gamma} > \left(\frac{n-x-1}{x+2}\right)^{\gamma},$$
and since
$$\frac{Np-x}{N-Np-n+x+1} > \frac{Np-x-1}{N-Np-n+x+2},$$
it follows that $\Delta\eta(x) \ge 0$ for $\gamma \ge 0$. Thus, when $\gamma \ge 0$, the distribution is log-concave and has a nondecreasing failure rate (IFR). Hence, the theorem. □
Remark 2.
For $\gamma < 0$, the failure rate of the COM-H distribution is, in general, not monotone.
The behavior of the failure rate function $r(x) = \log\{R(x-1)/R(x)\}$ (see Xie et al. [12]), where $R(x) = P(X > x)$, $x = 1, 2, \ldots$, is displayed in Figure 4. We observe that the model has a nondecreasing failure rate for the positive integer values of $\gamma$ considered. As the positive value of $\gamma$ increases, the failure rate becomes larger, but only beyond a certain value of $x$ (Figure 4).
Figure 5 displays the failure rate function for negative values of $\gamma$. In this case, the failure rate does not follow a monotone pattern such as DFR; instead, it exhibits a bathtub shape.
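The failure rate $r(x) = \log\{R(x-1)/R(x)\}$ with $R(x) = P(X > x)$ can be computed directly from the pmf. The sketch below (our own, with hypothetical names and the illustrative parameters $n = 10$, $Np = 20$, $N = 50$) checks that $r(x)$ is nondecreasing for some positive $\gamma$, as Theorem 5 predicts, and prints its values for a negative $\gamma$ for comparison with Figure 5.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_pmf(n, K, N, gamma):
    """COM-H pmf as a list over x = 0..min(n, K); assumes N - K - n >= 0 (support starts at 0)."""
    logw = [gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(min(n, K) + 1)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def failure_rate(pmf):
    """r(x) = log(R(x - 1) / R(x)), R(x) = P(X > x), for x = 1, ..., x_max - 1."""
    surv = [0.0] * len(pmf)                 # surv[x] = P(X > x); P(X > x_max) = 0
    for x in range(len(pmf) - 2, -1, -1):   # accumulate from the upper tail
        surv[x] = surv[x + 1] + pmf[x + 1]
    return {x: math.log(surv[x - 1] / surv[x]) for x in range(1, len(pmf) - 1)}


def is_nondecreasing(r, tol=1e-9):
    """True if the values of r, ordered by x, never decrease (up to tol)."""
    vals = [r[x] for x in sorted(r)]
    return all(b >= a - tol for a, b in zip(vals, vals[1:]))


for gamma in (1, 2, 3):
    assert is_nondecreasing(failure_rate(com_h_pmf(10, 20, 50, gamma)))
print({x: round(v, 3) for x, v in failure_rate(com_h_pmf(10, 20, 50, -1.5)).items()})
```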

9. Inference for COM-H Model

In this section, we discuss maximum likelihood estimation (MLE) of $\gamma$ when $p$ is known and when it is unknown. The MLE of $N$ is based on the modality of the pmf of the COM-H distribution, and the final estimates must be obtained numerically. Let $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ be a random sample of size $m$ from the COM-H distribution. Then, from the pmf in (2), the likelihood function is given by
$$L(\gamma, N, Np \mid \mathbf{x}) = \prod_{i=1}^{m}\frac{\binom{n}{x_i}^{\gamma}\binom{N-n}{Np-x_i}}{H(\gamma, n, Np, N)},$$
and the corresponding log-likelihood is given by
$$l = \gamma\sum_{i=1}^{m}\log\binom{n}{x_i} + \sum_{i=1}^{m}\log\binom{N-n}{Np-x_i} - m\log\sum_{j=0}^{n}\binom{n}{j}^{\gamma}\binom{N-n}{Np-j}. \tag{9}$$

9.1. MLE of γ (with p Known)

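The likelihood equation with respect to $\gamma$, obtained from (9), can be solved by the Newton–Raphson iterative method. Specifically, we begin with an initial value $\gamma^{(0)}$ and update $\gamma$ as
$$\gamma^{(t)} = \gamma^{(t-1)} - \frac{f(\gamma^{(t-1)})}{f'(\gamma^{(t-1)})}$$
until convergence is attained according to some desired criterion, where
$$\frac{\partial l}{\partial\gamma} = f(\gamma) = \sum_{i=1}^{m}\log\binom{n}{x_i} - m\,\frac{C_H\!\left(\gamma, \log\binom{n}{j}\right)}{C_H(\gamma, 1)},$$
with
$$C_H(\gamma, g(j)) = \sum_{j=0}^{n}\binom{n}{j}^{\gamma}\binom{N-n}{Np-j}\,g(j)$$
and
$$f'(\gamma) = -m\,\frac{C_H\!\left(\gamma, \left(\log\binom{n}{j}\right)^2\right)C_H(\gamma, 1) - C_H\!\left(\gamma, \log\binom{n}{j}\right)^2}{C_H(\gamma, 1)^2}.$$
Since $f'(\gamma)$ equals $-m$ times the variance of $\log\binom{n}{X}$ under the COM-H$(\gamma)$ model, $f'(\gamma) < 0$ whenever this variance is positive, so the log-likelihood is concave in $\gamma$.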
In the case when both γ and p are unknown, the parameters need to be estimated by maximizing the log-likelihood function in (9) with respect to both parameters, as detailed later in Section 9.2.
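The Newton–Raphson iteration of Section 9.1 can be sketched as follows, assuming $Np$ (here `K`), $n$, and $N$ are known. Writing $T = \log\binom{n}{X}$, the ratio $C_H(\gamma, \log\binom{n}{j})/C_H(\gamma, 1)$ is $E_\gamma(T)$, and the bracket in $f'(\gamma)$ divided by $C_H(\gamma, 1)^2$ is $\operatorname{Var}_\gamma(T)$. The frequency-table input and the synthetic data (frequencies proportional to a COM-H pmf with $\gamma = 1.5$, $n = 10$, $Np = 32$, $N = 80$) are our own illustrative assumptions, not the authors' setup.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_pmf(n, K, N, gamma):
    """COM-H pmf of Equation (2) as {x: P(X = x)}, with K = Np."""
    lo, hi = max(0, n - N + K), min(n, K)
    logw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)}
    top = max(logw.values())
    w = {x: math.exp(v - top) for x, v in logw.items()}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}


def mean_var_t(gamma, n, K, N):
    """E and Var of T = log C(n, X) under COM-H(gamma), i.e. the C_H ratios of Section 9.1."""
    pmf = com_h_pmf(n, K, N, gamma)
    e_t = sum(q * log_comb(n, x) for x, q in pmf.items())
    var_t = sum(q * (log_comb(n, x) - e_t) ** 2 for x, q in pmf.items())
    return e_t, var_t


def mle_gamma(counts, n, K, N, gamma=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for the MLE of gamma with Np = K known; counts = {x: frequency}."""
    m = sum(counts.values())
    t_sum = sum(c * log_comb(n, x) for x, c in counts.items())
    for _ in range(max_iter):
        e_t, var_t = mean_var_t(gamma, n, K, N)
        f = t_sum - m * e_t          # score f(gamma)
        f_prime = -m * var_t         # f'(gamma) < 0: the log-likelihood is concave
        step = f / f_prime
        gamma -= step
        if abs(step) < tol:
            break
    return gamma


# Hypothetical frequency table proportional to the COM-H(1.5, n=10, Np=32, N=80) pmf
counts = {x: round(100_000 * q) for x, q in com_h_pmf(10, 32, 80, 1.5).items()}
gamma_hat = mle_gamma(counts, 10, 32, 80)
print(round(gamma_hat, 3))
```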

9.2. Maximum Likelihood Estimation of Both γ and p

For the estimation of both $\gamma$ and $p$ in the COM-H model, we proceed with maximum likelihood estimation, keeping $N$ fixed. We used the optim function from the stats package in the R programming language for this purpose.
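R's `optim` is not reproduced here. As a swapped-in alternative (our own sketch, not the authors' procedure), the joint MLE can be found by profiling: because $Np$ must be an integer count $K$, we loop over the values of $K$ compatible with the data, solve the score equation for $\gamma$ at each $K$ by a safeguarded Newton method (Newton steps that leave a bisection bracket, assumed here to be $[-30, 30]$, are replaced by bisection), and keep the pair with the largest log-likelihood (9). The estimate of $p$ is then $\hat K/N$; the helper names and the synthetic frequency table are hypothetical.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_logpmf(n, K, N, gamma):
    """{x: log P(X = x)} for COM-H(gamma, n, Np = K, N), normalized by log-sum-exp."""
    lo, hi = max(0, n - N + K), min(n, K)
    lw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)}
    top = max(lw.values())
    log_h = top + math.log(sum(math.exp(v - top) for v in lw.values()))
    return {x: v - log_h for x, v in lw.items()}


def score_info(gamma, t_bar, n, K, N):
    """Per-observation score t_bar - E[T] and information Var[T], with T = log C(n, X)."""
    lp = com_h_logpmf(n, K, N, gamma)
    e_t = sum(math.exp(v) * log_comb(n, x) for x, v in lp.items())
    var_t = sum(math.exp(v) * (log_comb(n, x) - e_t) ** 2 for x, v in lp.items())
    return t_bar - e_t, var_t


def solve_gamma(t_bar, n, K, N, a=-30.0, b=30.0, tol=1e-10, max_iter=200):
    """Root of the (decreasing) score on [a, b]: Newton steps, bisection as a fallback."""
    if score_info(a, t_bar, n, K, N)[0] <= 0:
        return a
    if score_info(b, t_bar, n, K, N)[0] >= 0:
        return b
    g = 1.0
    for _ in range(max_iter):
        f, info = score_info(g, t_bar, n, K, N)
        if f == 0:
            return g
        if f > 0:
            a = g          # root lies to the right of g
        else:
            b = g          # root lies to the left of g
        new = g + f / info if info > 0 else 0.5 * (a + b)
        if not a < new < b:
            new = 0.5 * (a + b)   # Newton step left the bracket: bisect instead
        if abs(new - g) < tol:
            return new
        g = new
    return g


def profile_mle(counts, n, N):
    """Joint MLE (gamma_hat, p_hat) from a frequency table counts = {x: frequency}."""
    m = sum(counts.values())
    t_bar = sum(c * log_comb(n, x) for x, c in counts.items()) / m
    xs = [x for x, c in counts.items() if c > 0]
    best = None
    for K in range(max(xs), N - n + min(xs) + 1):  # every K whose support covers the data
        g = solve_gamma(t_bar, n, K, N)
        lp = com_h_logpmf(n, K, N, g)
        ll = sum(c * lp[x] for x, c in counts.items() if c > 0)
        if best is None or ll > best[0]:
            best = (ll, g, K)
    return best[1], best[2] / N


# Hypothetical frequency table proportional to the COM-H(1.5, n=10, Np=32, N=80) pmf
counts = {x: round(100_000 * math.exp(v)) for x, v in com_h_logpmf(10, 32, 80, 1.5).items()}
gamma_hat, p_hat = profile_mle(counts, 10, 80)
print(round(gamma_hat, 3), p_hat)
```

Profiling over the integer $K$ sidesteps the question of how a non-integer $Np$ enters the binomial coefficients, at the cost of a loop over at most $N - n + 1$ candidate values.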

Asymptotic Confidence Intervals for γ and p

Next, we investigated the performance of the MLEs of $\gamma$ and $p$, denoted by $\hat\gamma$ and $\hat p$, in terms of their bias, standard errors, and the coverage probabilities of asymptotic confidence intervals, based on 1000 Monte Carlo simulations with both parameters unknown. The asymptotic normality of the MLEs was used to construct asymptotic confidence intervals for $\gamma$ and $p$. The standard errors $\text{s.e.}(\hat\gamma)$ and $\text{s.e.}(\hat p)$ were obtained as the square roots of the diagonal elements of the inverse of the observed Fisher information matrix, that is, of the Hessian of the negative log-likelihood at the MLE. The asymptotic $100(1-\alpha)\%$ confidence intervals for $\gamma$ and $p$ are then
$$\hat\gamma \pm z_{\alpha/2}\,\text{s.e.}(\hat\gamma), \qquad \hat p \pm z_{\alpha/2}\,\text{s.e.}(\hat p),$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution.
Table 1 and Table 2 report the MLEs, s.e., bias, root-mean-square error (RMSE), and coverage probabilities of the 90% and 95% confidence intervals based on 1000 Monte Carlo simulations, for $n = 10$, $N = 80$ and $n = 20$, $N = 80$, respectively. The results in these tables reveal that when $\gamma$ has a large negative value ($\gamma \le -1.5$), both the bias and the standard error of its estimate are relatively large. The MLE of $p$, on the other hand, behaves well, with small bias and s.e. and with coverage probabilities close to the nominal levels.
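For the one-parameter case of Section 9.1 ($Np$ known), the interval $\hat\gamma \pm z_{\alpha/2}\,\text{s.e.}(\hat\gamma)$ can be sketched in a few lines: the observed information is $-f'(\hat\gamma) = m\operatorname{Var}_{\hat\gamma}(T)$ with $T = \log\binom{n}{X}$, so $\text{s.e.}(\hat\gamma) = \{m\operatorname{Var}_{\hat\gamma}(T)\}^{-1/2}$. This simplification, the helper names, and the synthetic frequency table are our own assumptions; the intervals in Tables 1 and 2 come from the joint $(\gamma, p)$ Hessian instead.

```python
import math
from statistics import NormalDist


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def com_h_pmf(n, K, N, gamma):
    """COM-H pmf of Equation (2) as {x: P(X = x)}, with K = Np."""
    lo, hi = max(0, n - N + K), min(n, K)
    logw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)}
    top = max(logw.values())
    w = {x: math.exp(v - top) for x, v in logw.items()}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}


def moments_t(gamma, n, K, N):
    """E and Var of T = log C(n, X) under COM-H(gamma, n, K, N)."""
    pmf = com_h_pmf(n, K, N, gamma)
    e_t = sum(q * log_comb(n, x) for x, q in pmf.items())
    return e_t, sum(q * (log_comb(n, x) - e_t) ** 2 for x, q in pmf.items())


def mle_gamma(counts, n, K, N, gamma=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson of Section 9.1, per observation: gamma += (t_bar - E[T]) / Var[T]."""
    t_bar = sum(c * log_comb(n, x) for x, c in counts.items()) / sum(counts.values())
    for _ in range(max_iter):
        e_t, var_t = moments_t(gamma, n, K, N)
        step = (t_bar - e_t) / var_t
        gamma += step
        if abs(step) < tol:
            break
    return gamma


def wald_ci_gamma(counts, n, K, N, alpha=0.05):
    """gamma_hat, s.e.(gamma_hat), and the asymptotic 100(1 - alpha)% Wald interval."""
    m = sum(counts.values())
    g_hat = mle_gamma(counts, n, K, N)
    _, var_t = moments_t(g_hat, n, K, N)
    se = 1.0 / math.sqrt(m * var_t)          # observed information -f'(gamma_hat) = m Var[T]
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 quantile of N(0, 1)
    return g_hat, se, (g_hat - z * se, g_hat + z * se)


# Hypothetical table of about m = 1000 observations, proportional to COM-H(1.5, 10, 32, 80)
counts = {x: round(1000 * q) for x, q in com_h_pmf(10, 32, 80, 1.5).items()}
g_hat, se, (lo, hi) = wald_ci_gamma(counts, 10, 32, 80)
print(f"gamma_hat = {g_hat:.3f}, s.e. = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```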

9.3. Likelihood Ratio Test for the Significance of γ

As mentioned earlier, the COM-H distribution reduces to the ordinary hypergeometric distribution with parameters $n$, $Np$, and $N$ when $\gamma = 1$. Since the shape parameter $\gamma$ controls the under- and overdispersion of the distribution, we can use the likelihood ratio test (LRT) to test the null hypothesis $H_0\colon \gamma = 1$ against the alternative $H_1\colon \gamma \ne 1$. In this section, we investigate the power of this test by simulation. Let $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ be a random sample of size $m$ from the pmf $p(x, \theta)$, where $\theta = (\gamma, p)$, and denote the joint distribution by $P(\mathbf{x}, \theta) = p(x_1, \theta)\,p(x_2, \theta)\cdots p(x_m, \theta)$. The restricted MLE of $\theta$ under the null hypothesis is $\hat\theta_0 = (1, \hat p_0)$, and the unrestricted MLE is $\hat\theta = (\hat\gamma, \hat p)$. The likelihood function of the COM-H distribution is $L(\theta \mid \mathbf{x}) = P(\mathbf{x}, \theta)$. The likelihood ratio test of Neyman and Pearson [13] for testing $H_0\colon \gamma = 1$ is based on
$$\Lambda = \frac{L(\hat\theta_0 \mid \mathbf{x})}{L(\hat\theta \mid \mathbf{x})}.$$
By comparing the maximized log-likelihoods under the null and alternative hypotheses, the test indicates whether the improvement in fit of the COM-H model over the hypergeometric model is statistically significant. The LR test statistic is
$$LR = -2\log\Lambda = 2\left[\log L(\hat\theta \mid \mathbf{x}) - \log L(\hat\theta_0 \mid \mathbf{x})\right].$$
Under $H_0$, $LR$ is asymptotically chi-squared distributed with one degree of freedom (d.f.). We assessed the performance of this test through a power study with Monte Carlo simulations. For this purpose, we chose significance levels $\alpha = 5\%$ and $10\%$ and sample sizes $m = 100$ (small), 500 (moderate), and 1000 (large). The departure from the null hypothesis was measured by the effect size $|\gamma - 1|$. We chose $\gamma = -0.5, 0.5, 1.2, 1.4, 1.6$, and 2, with corresponding effect sizes 1.5, 0.5, 0.2, 0.4, 0.6, and 1. The results of the simulation study, with $N = 100$, $n = 10$, and $p = 0.3, 0.5$, are presented in Table 3. The power depended strongly on $m$ and on the effect size: the larger the value of $m$ and the effect size, the higher the power of the test, whereas the power did not vary much with $p$. Moreover, the introduction of $\gamma$ was useful when the amount of underdispersion in the dataset differed from that of the hypergeometric distribution.
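A simplified version of this test, with $Np$ treated as known so that only $\gamma$ is estimated (the study above also estimates $p$), can be sketched as follows. The p-value uses the $\chi^2_1$ tail $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$; the helper names and the synthetic frequency tables (proportional to a COM-H pmf, not random samples) are our own assumptions.

```python
import math


def log_comb(a, b):
    """Natural log of C(a, b) via lgamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def log_pmf(n, K, N, gamma):
    """{x: log P(X = x)} for COM-H(gamma, n, Np = K, N), normalized by log-sum-exp."""
    lo, hi = max(0, n - N + K), min(n, K)
    lw = {x: gamma * log_comb(n, x) + log_comb(N - n, K - x) for x in range(lo, hi + 1)}
    top = max(lw.values())
    log_h = top + math.log(sum(math.exp(v - top) for v in lw.values()))
    return {x: v - log_h for x, v in lw.items()}


def loglik(gamma, counts, n, K, N):
    """Log-likelihood (9) for a frequency table counts = {x: frequency}."""
    lp = log_pmf(n, K, N, gamma)
    return sum(c * lp[x] for x, c in counts.items() if c > 0)


def mle_gamma(counts, n, K, N, gamma=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for gamma (Section 9.1), with T = log C(n, X)."""
    t_bar = sum(c * log_comb(n, x) for x, c in counts.items()) / sum(counts.values())
    for _ in range(max_iter):
        lp = log_pmf(n, K, N, gamma)
        e_t = sum(math.exp(v) * log_comb(n, x) for x, v in lp.items())
        var_t = sum(math.exp(v) * (log_comb(n, x) - e_t) ** 2 for x, v in lp.items())
        step = (t_bar - e_t) / var_t
        gamma += step
        if abs(step) < tol:
            break
    return gamma


def lr_test(counts, n, K, N):
    """LR = 2[l(gamma_hat) - l(1)] with Np known, and its asymptotic chi-square(1) p-value."""
    g_hat = mle_gamma(counts, n, K, N)
    lr = max(0.0, 2 * (loglik(g_hat, counts, n, K, N) - loglik(1.0, counts, n, K, N)))
    return g_hat, lr, math.erfc(math.sqrt(lr / 2))   # P(chi2_1 > lr)


def synthetic_counts(n, K, N, gamma, size):
    """Hypothetical frequency table proportional to the COM-H pmf (not a random sample)."""
    return {x: round(size * math.exp(v)) for x, v in log_pmf(n, K, N, gamma).items()}


for gamma in (1.0, 1.5):
    g_hat, lr, p_value = lr_test(synthetic_counts(10, 32, 80, gamma, 100_000), 10, 32, 80)
    print(f"true gamma = {gamma}: gamma_hat = {g_hat:.3f}, LR = {lr:.2f}, p-value = {p_value:.3g}")
```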

10. Discussion

We introduced a Conway–Maxwell type generalization of the hypergeometric distribution and studied its salient characteristics in this paper. In subsequent publications, we will show some of its applications in fault detection for software reliability and in acceptance sampling for quality control, where the hypergeometric distribution is widely used, and describe it through operating characteristic curves. Furthermore, we will propose parameter estimation techniques for fitting the COM-H distribution and compare them with current estimation methods.

Author Contributions

Conceptualization, S.R. and R.C.T.; Methodology, S.R. and N.B.; Validation, R.C.T.; Resources, R.C.T.; Writing—original draft, S.R.; Writing—review & editing, R.C.T.; Supervision, R.C.T. and N.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The authors declare that no data is used.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shmueli, G.; Borle, S.; Minka, P.T.; Kadane, J.B.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. J. R. Stat. Soc. Ser. C 2005, 54, 127–142.
  2. Borges, P.; Rodrigues, J.; Balakrishnan, N.; Bazan, J. A COM-Poisson type generalization of the binomial distribution and its properties and application. Stat. Probab. Lett. 2014, 87, 158–166.
  3. Roy, S.; Tripathi, R.C.; Balakrishnan, N. A Conway Maxwell Poisson type generalization of the negative hypergeometric distribution. Commun. Stat. Theory Methods 2019, 49, 2410–2428.
  4. Chakraborty, S.; Ong, S.H. A COM-Poisson type generalization of the negative binomial distribution. Commun. Stat. Theory Methods 2016, 45, 4117–4135.
  5. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions; Dover: New York, NY, USA, 1972.
  6. Tripathi, R.C.; Gurland, J. A general family of discrete distributions with hypergeometric probabilities. J. R. Stat. Soc. Ser. B 1977, 39, 349–356.
  7. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; Wiley: New York, NY, USA, 2005.
  8. Kemp, C.D.; Kemp, A.W. Generalized hypergeometric distributions. J. R. Stat. Soc. Ser. B 1956, 18, 202–211.
  9. Kokonendji, C.C.; Mizere, D.; Balakrishnan, N. Connections of the Poisson weight function to over-dispersion and under-dispersion. J. Stat. Plan. Inference 2007, 138, 1287–1296.
  10. Castillo, J.; Pérez-Casany, M. Weighted Poisson distributions for overdispersion and underdispersion situations. Ann. Inst. Stat. Math. 1998, 50, 567–585.
  11. Gupta, P.L.; Gupta, R.C.; Tripathi, R.C. On the monotonic properties of discrete failure rates. J. Stat. Plan. Inference 1997, 65, 255–268.
  12. Xie, M.; Gaudoin, O.; Bracquemond, C. Redefining failure rate function for discrete distribution. Int. J. Reliab. Qual. Saf. Eng. 2002, 9, 275–285.
  13. Neyman, J.; Pearson, E.S. On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 1928, 20, 175–240.
Figure 1. Comparison of the pmfs of Hypergeometric and COM-H (n = 10, Np = 20, N = 50) for Different Values of Positive Integer γ and γ = 0.
Figure 2. Comparison of the pmfs of Hypergeometric and COM-H Distributions (n = 10, Np = 20, N = 50) for Different Values of Negative Integer γ and γ = 0.
Figure 3. Comparison of the weight functions showing duality and closed pair of WHDs.
Figure 4. Failure rate function of COM-H for positive integer values of γ.
Figure 5. Failure rate function of COM-H for negative values of γ in [−1.5, −1].
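Figures 4 and 5 plot the failure rate function of COM-H. The sketch below is an illustration, not the authors' code: it computes the classical discrete failure rate r(x) = P(X = x)/P(X ≥ x), the definition whose monotonicity is studied in [11] ([12] proposes an alternative, and which variant the figures use is not restated here), for any pmf tabulated over consecutive support points. It is applied to the ordinary hypergeometric pmf with the Figure 1 settings as a stand-in, since the COM-H pmf is defined in the main text. Because the hypergeometric pmf is log-concave, its failure rate should be non-decreasing, ending at 1 at the largest support point. All names are hypothetical.

```python
from math import comb


def hypergeometric_pmf(x: int, n: int, K: int, N: int) -> float:
    """Ordinary hypergeometric pmf; K plays the role of Np."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def discrete_failure_rate(pmf_values: list[float]) -> list[float]:
    """Classical discrete failure rate r(x) = P(X = x) / P(X >= x).

    pmf_values must list P(X = x) for consecutive support points in increasing order.
    Tail sums are accumulated from the right to avoid subtracting nearly equal numbers.
    """
    tails = []
    acc = 0.0
    for q in reversed(pmf_values):
        acc += q
        tails.append(acc)
    tails.reverse()
    return [q / t for q, t in zip(pmf_values, tails)]


if __name__ == "__main__":
    n, K, N = 10, 20, 50  # illustrative settings borrowed from Figure 1
    xs = range(max(0, n - (N - K)), min(n, K) + 1)
    rates = discrete_failure_rate([hypergeometric_pmf(x, n, K, N) for x in xs])
    for x, r in zip(xs, rates):
        print(x, round(r, 4))
```

Swapping in a tabulated COM-H pmf for a given γ would reproduce the kind of curve shown in the figures, under whichever failure-rate definition the article adopts.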
Table 1. Coverage probabilities (C.P.) of the asymptotic confidence intervals for different values of γ and p, along with bias, RMSE, and s.e. of the MLEs for COM-H (n = 10, N = 80).

| γ | p | γ̂ (s.e.) | p̂ (s.e.) | Bias (γ) | Bias (p) | RMSE (γ) | RMSE (p) | C.P. 90% (γ) | C.P. 90% (p) | C.P. 95% (γ) | C.P. 95% (p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| −2 | 0.4 | −2.031 (0.166) | 0.39 (0.009) | 0.031 | 0.0014 | 0.16 | 0.007 | 0.923 | 0.931 | 0.972 | 0.971 |
| −1.5 | 0.4 | −1.51 (0.094) | 0.398 (0.005) | 0.009 | 0.001 | 0.077 | 0.007 | 0.906 | 0.897 | 0.955 | 0.958 |
| −1.0 | 0.4 | −1.001 (0.041) | 0.399 (0.005) | 0.0012 | 0.0006 | 0.043 | 0.007 | 0.896 | 0.909 | 0.946 | 0.949 |
| −0.5 | 0.4 | −0.497 (0.026) | 0.399 (0.005) | −0.002 | 0.0008 | 0.0271 | 0.006 | 0.903 | 0.904 | 0.95 | 0.954 |
| 0.5 | 0.4 | 0.502 (0.0344) | 0.399 (0.0055) | −0.0017 | 0.0003 | 0.0349 | 0.0056 | 0.893 | 0.904 | 0.942 | 0.95 |
| 1.0 | 0.4 | 1.003 (0.0547) | 0.399 (0.006) | −0.0028 | 0.0001 | 0.053 | 0.0059 | 0.905 | 0.92 | 0.951 | 0.962 |
| 1.5 | 0.4 | 1.5 (0.077) | 0.399 (0.0067) | −0.0032 | 0.0001 | 0.076 | 0.0068 | 0.89 | 0.896 | 0.946 | 0.952 |
| 2.0 | 0.4 | 2.0 (0.091) | 0.399 (0.0069) | −0.0049 | 0.0001 | 0.103 | 0.0075 | 0.88 | 0.894 | 0.939 | 0.943 |
| −2.0 | 0.6 | −2.023 (0.175) | 0.601 (0.005) | 0.023 | −0.001 | 0.159 | 0.007 | 0.905 | 0.922 | 0.958 | 0.961 |
| −1.5 | 0.6 | −1.502 (0.072) | 0.601 (0.007) | 0.002 | −0.001 | 0.079 | 0.007 | 0.899 | 0.912 | 0.949 | 0.966 |
| −1 | 0.6 | −0.996 (0.044) | 0.601 (0.007) | −0.004 | −0.001 | 0.041 | 0.007 | 0.897 | 0.921 | 0.950 | 0.962 |
| −0.5 | 0.6 | −0.499 (0.028) | 0.601 (0.006) | −0.001 | −0.001 | 0.027 | 0.006 | 0.903 | 0.916 | 0.958 | 0.963 |
| 0.5 | 0.6 | 0.503 (0.034) | 0.601 (0.006) | −0.003 | −0.001 | 0.035 | 0.006 | 0.898 | 0.887 | 0.952 | 0.953 |
| 1 | 0.6 | 0.981 (0.054) | 0.599 (0.006) | 0.019 | 0.001 | 0.034 | 0.005 | 0.958 | 0.942 | 0.976 | 0.974 |
| 1.5 | 0.6 | 1.504 (0.077) | 0.60 (0.007) | −0.004 | 0.000 | 0.075 | 0.007 | 0.903 | 0.913 | 0.958 | 0.962 |
| 2 | 0.6 | 2.006 (0.091) | 0.60 (0.007) | −0.006 | 0.000 | 0.096 | 0.007 | 0.901 | 0.921 | 0.962 | 0.959 |
Table 2. Coverage probabilities (C.P.) of the asymptotic confidence intervals for different values of γ and p, along with bias, RMSE, and s.e. of the MLEs for COM-H (n = 20, N = 80).

| γ | p | γ̂ (s.e.) | p̂ (s.e.) | Bias (γ) | Bias (p) | RMSE (γ) | RMSE (p) | C.P. 90% (γ) | C.P. 90% (p) | C.P. 95% (γ) | C.P. 95% (p) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| −1.5 | 0.6 | −1.517 (0.152) | 0.599 (0.072) | 0.017 | 0.001 | 0.100 | 0.004 | 0.991 | 0.988 | 0.997 | 0.988 |
| −1 | 0.6 | −1.009 (0.127) | 0.597 (0.064) | 0.009 | 0.003 | 0.042 | 0.007 | 0.992 | 0.987 | 0.999 | 0.988 |
| −0.5 | 0.6 | −0.486 (0.047) | 0.607 (0.019) | −0.014 | −0.007 | 0.053 | 0.023 | 0.905 | 0.889 | 0.949 | 0.942 |
| 0.5 | 0.6 | 0.517 (0.037) | 0.602 (0.005) | −0.017 | −0.002 | 0.029 | 0.004 | 0.949 | 0.933 | 0.967 | 0.970 |
| 1 | 0.6 | 1.002 (0.065) | 0.600 (0.005) | −0.002 | 0.000 | 0.058 | 0.005 | 0.911 | 0.897 | 0.950 | 0.944 |
| 1.5 | 0.6 | 1.503 (0.080) | 0.600 (0.005) | −0.003 | 0.000 | 0.083 | 0.006 | 0.893 | 0.900 | 0.951 | 0.934 |
| 2.0 | 0.6 | 2.003 (0.103) | 0.600 (0.006) | −0.003 | 0.000 | 0.105 | 0.006 | 0.904 | 0.898 | 0.950 | 0.945 |
| −1 | 0.4 | −0.923 (0.364) | 0.357 (0.195) | −0.077 | 0.043 | 0.125 | 0.065 | 0.995 | 0.992 | 0.998 | 0.993 |
| −0.5 | 0.4 | −0.514 (0.051) | 0.405 (0.021) | 0.014 | −0.005 | 0.025 | 0.011 | 0.945 | 0.939 | 0.967 | 0.953 |
| 0.5 | 0.4 | 0.516 (0.036) | 0.398 (0.005) | −0.015 | 0.002 | 0.027 | 0.004 | 0.954 | 0.952 | 0.977 | 0.974 |
| 1.0 | 0.4 | 1.005 (0.061) | 0.400 (0.005) | −0.005 | 0.000 | 0.056 | 0.005 | 0.926 | 0.900 | 0.968 | 0.955 |
| 1.5 | 0.4 | 1.506 (0.081) | 0.400 (0.005) | −0.006 | 0.000 | 0.086 | 0.005 | 0.888 | 0.895 | 0.937 | 0.944 |
| 2.0 | 0.4 | 2.010 (0.094) | 0.400 (0.005) | −0.010 | 0.000 | 0.102 | 0.005 | 0.913 | 0.922 | 0.962 | 0.958 |
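Table 3. Simulation of power for the likelihood ratio test (n = 10, N = 100 throughout).

| p | m | α | γ = −0.5 (effect size 1.5) | γ = 0.5 (effect size 0.5) | γ = 1 (effect size 0) | γ = 1.2 (effect size 0.2) | γ = 1.4 (effect size 0.4) | γ = 1.6 (effect size 0.6) | γ = 2 (effect size 1) |
|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 100 | 0.05 | 1 | 1 | 0.058 | 0.142 | 0.545 | 0.864 | 0.998 |
| 0.3 | 100 | 0.1 | 1 | 1 | 0.17 | 0.242 | 0.679 | 0.932 | 0.999 |
| 0.3 | 500 | 0.05 | 1 | 1 | 0.052 | 0.683 | 0.997 | 1 | 1 |
| 0.3 | 500 | 0.1 | 1 | 1 | 0.12 | 0.88 | 0.999 | 1 | 1 |
| 0.3 | 1000 | 0.05 | 1 | 1 | 0.05 | 0.951 | 1 | 1 | 1 |
| 0.3 | 1000 | 0.1 | 1 | 1 | 0.097 | 0.974 | 1 | 1 | 1 |
| 0.5 | 100 | 0.05 | 1 | 1 | 0.052 | 0.199 | 0.55 | 0.843 | 0.993 |
| 0.5 | 100 | 0.1 | 1 | 1 | 0.12 | 0.304 | 0.677 | 0.91 | 0.999 |
| 0.5 | 500 | 0.05 | 1 | 1 | 0.051 | 0.71 | 0.998 | 1 | 1 |
| 0.5 | 500 | 0.1 | 1 | 1 | 0.11 | 0.815 | 1 | 1 | 1 |
| 0.5 | 1000 | 0.05 | 1 | 1 | 0.05 | 0.947 | 1 | 1 | 1 |
| 0.5 | 1000 | 0.1 | 1 | 1 | 0.101 | 0.973 | 1 | 1 | 1 |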
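Table 3 reports the rejection rate of the likelihood ratio test over simulated samples. The effect size equals |γ − 1| in every column, and the γ = 1 column approaches the nominal α as m grows, which suggests that the null hypothesis is γ = 1 and that those entries are empirical sizes rather than powers; this is a reading of the table, not a statement made in this section. Under that reading, and assuming the usual large-sample calibration in which the statistic 2[ℓ(unrestricted MLE) − ℓ(MLE under γ = 1)] is compared with a χ² distribution with one degree of freedom, the rejection rate can be computed as sketched below. The function names are hypothetical, and the COM-H log-likelihood that produces the statistics is not reproduced.

```python
from statistics import NormalDist


def chi2_1_critical(alpha: float) -> float:
    """Upper-alpha point of the chi-square(1) law, using chi2(1) = Z**2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


def empirical_power(lr_statistics: list[float], alpha: float) -> float:
    """Share of replicates in which the LR statistic exceeds the chi-square(1) cut-off."""
    cutoff = chi2_1_critical(alpha)
    return sum(stat > cutoff for stat in lr_statistics) / len(lr_statistics)


def effect_size(gamma: float, gamma0: float = 1.0) -> float:
    """Distance of the true gamma from the (assumed) null value gamma0 = 1."""
    return abs(gamma - gamma0)


if __name__ == "__main__":
    print(round(chi2_1_critical(0.05), 4), round(chi2_1_critical(0.10), 4))
    print([round(effect_size(g), 2) for g in (-0.5, 0.5, 1, 1.2, 1.4, 1.6, 2)])
```

Using the identity χ²₁ = Z² avoids any dependency beyond the standard library; with SciPy available, `scipy.stats.chi2.ppf(1 - alpha, 1)` gives the same cut-off.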
MDPI and ACS Style

Roy, S.; Tripathi, R.C.; Balakrishnan, N. A Conway–Maxwell–Poisson Type Generalization of Hypergeometric Distribution. Mathematics 2023, 11, 762. https://doi.org/10.3390/math11030762
