Article

Modeling Under-Dispersed Count Data by the Generalized Poisson Distribution via Two New MM Algorithms

Xun-Jian Li, Guo-Liang Tian, Mingqian Zhang, George To Sum Ho and Shuang Li
1 Department of Statistics and Data Science, Southern University of Science and Technology, Shenzhen 518055, China
2 Department of Supply Chain and Information Management, The Hang Seng University of Hong Kong, Shatin, N.T., Hong Kong, China
3 Department of Mathematics, Southern University of Science and Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(6), 1478; https://doi.org/10.3390/math11061478
Submission received: 17 January 2023 / Revised: 8 March 2023 / Accepted: 15 March 2023 / Published: 17 March 2023
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract
Under-dispersed count data often appear in clinical trials, medical studies, demography, actuarial science, ecology, biology, industry and engineering. Although the generalized Poisson (GP) distribution possesses the twin properties of under- and over-dispersion, in the past 50 years many authors have treated the GP distribution only as an alternative to the negative binomial distribution for modeling over-dispersed count data. To the best of our knowledge, the issues of calculating the maximum likelihood estimates (MLEs) of the parameters in the GP model without covariates and with covariates in the case of under-dispersion had not been solved until now. In this paper, we first develop a new minorization–maximization (MM) algorithm to calculate the MLEs of the parameters in the GP distribution with under-dispersion, and then we develop another new MM algorithm to compute the MLEs of the vector of regression coefficients for the GP mean regression model in the case of under-dispersion. Three hypothesis tests (i.e., the likelihood ratio, Wald and score tests) are provided. Some simulations are conducted. The Bangladesh demographic and health surveys dataset is analyzed to illustrate the proposed methods, and comparisons with the existing Conway–Maxwell–Poisson regression model are also presented.

1. Introduction

Under-dispersed count data often appear in clinical trials, medical studies, demography, actuarial science, ecology, biology, industry and engineering. Examples include the number of embryonic deaths in mice in a clinical experiment [1], the number of power outages on each of 648 circuits in a power distribution system in the southeastern United States [2], the number of automotive services purchased per visit by a customer at a US automotive services firm [3], species richness, the simplest measure of species diversity [4], the number of births during a period for women who live in Bangladesh (https://www.dhsprogram.com/data, accessed on 28 January 2022), and so on.
The Poisson distribution is suitable for modeling equi-dispersed count data, while the negative binomial distribution is often utilized to model over-dispersed count data. To fit under-dispersed count data, theoretically speaking, researchers should employ the generalized Poisson (GP) distribution because it possesses the twin properties of under- and over-dispersion [5,6,7,8,9,10,11,12,13]. However, in the past 50 years, most authors have treated the GP distribution merely as an alternative to the negative binomial distribution, exploiting its over-dispersion property while seemingly ignoring its under-dispersion characteristic [6,8,9,10,11,12,13], although Consul & Famoye [6] proved that unique MLEs of the parameters exist in both the over- and under-dispersion cases. The main obstacle to using the GP distribution with under-dispersion is that calculating the maximum likelihood estimates (MLEs) of the parameters in GP models without/with covariates by a stable algorithm is not easy. To the best of our knowledge, the issue of calculating the MLEs of the parameters in the GP model without/with covariates in the case of under-dispersion has not been solved until now; in other words, the existing algorithms may not yield the correct MLEs of the parameters; see Section 5.
A non-negative integer-valued random variable (r.v.) $X$ is said to follow a generalized Poisson (GP) distribution with parameters $\lambda > 0$ and $\psi$, denoted by $X \sim \mathrm{GP}(\lambda, \psi)$, if its probability mass function (pmf) is given by [5,14]
$$
p(x \mid \lambda, \psi) = \begin{cases} \dfrac{\lambda (\lambda + \psi x)^{x-1}\, e^{-\lambda - \psi x}}{x!}, & x = 0, 1, \ldots, \\[4pt] 0, & x > r, \text{ when } \psi < 0, \end{cases}
$$
where $\max(-1, -\lambda/r) < \psi < 1$ and $r\,(\geq 4)$ is the largest positive integer for which $\lambda + \psi r > 0$ when $\psi < 0$. The expectation and variance of $X$ are given by [14]
$$
\mathrm{E}(X) = \frac{\lambda}{1-\psi} \quad \text{and} \quad \mathrm{Var}(X) = \frac{\lambda}{(1-\psi)^3},
$$
respectively. The $\mathrm{GP}(\lambda, \psi)$ distribution reduces to the Poisson($\lambda$) when $\psi = 0$, and it has the twin properties of over-dispersion when $\psi > 0$ and under-dispersion when $\psi < 0$.
To formulate the mean regression of the GP distribution, Consul & Famoye [7] introduced a so-called Type I generalized Poisson ($\mathrm{GP^{(I)}}$) distribution, denoted by $Y \sim \mathrm{GP^{(I)}}(\mu, \alpha)$, through the following reparameterizations:
$$
\mu = \lambda (1-\psi)^{-1} > 0 \quad \text{and} \quad \alpha = (1-\psi)^{-1}. \tag{1}
$$
It is easy to show that the pmf of $Y$ is
$$
p(y \mid \mu, \alpha) = \begin{cases} \dfrac{\mu [\mu + (\alpha-1) y]^{y-1} \exp\{-[\mu + (\alpha-1) y]/\alpha\}}{\alpha^{y}\, y!}, & y = 0, 1, \ldots, \\[4pt] 0, & y > m, \text{ when } \alpha < 1, \end{cases} \tag{2}
$$
where $\mu > 0$, $\alpha > \max(1/2,\, 1 - \mu/m)$ and $m\,(\geq 4)$ is the largest positive integer for which $\mu + (\alpha-1) m > 0$ when $\alpha < 1$. The mean and variance of $Y$ are given by:
$$
\mathrm{E}(Y) = \mu \quad \text{and} \quad \mathrm{Var}(Y) = \alpha^2 \mu,
$$
respectively, where $\alpha$ denotes the square root of the index of dispersion. The $\mathrm{GP^{(I)}}(\mu, \alpha)$ distribution reduces to the Poisson($\mu$) when $\alpha = 1$, and it has the twin properties of over-dispersion when $\alpha > 1$ and under-dispersion when $\alpha < 1$. Thus, the mean regression model for the $\mathrm{GP^{(I)}}$ distribution is [7]
$$
\{Y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mu_i, \alpha) \quad \text{and} \quad \log(\mu_i) = \mathbf{w}_i^{\top} \boldsymbol{\beta}, \quad i = 1, \ldots, n, \tag{3}
$$
where the notation "$\{Y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mu_i, \alpha)$" means that $Y_1, \ldots, Y_n$ follow the same $\mathrm{GP^{(I)}}$ distribution family but with different mean parameters $\mu_1, \ldots, \mu_n$, and $Y_1, \ldots, Y_n$ are independent; $\mathbf{w}_i = (1, w_{i1}, \ldots, w_{i,q-1})^{\top}$ is the covariate vector of subject $i$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{q-1})^{\top}$ is the vector of regression coefficients.
This paper mainly focuses on developing two new MM algorithms to stably calculate the MLEs of the parameters in the $\mathrm{GP^{(I)}}(\mu, \alpha)$ distribution with under-dispersion (i.e., $\alpha < 1$), and the MLEs of the vector $\boldsymbol{\beta}$ of regression coefficients and the parameter $\alpha$ in the mean regression model (3). In addition, we compare the goodness-of-fit and computational efficiency of the $\mathrm{GP^{(I)}}$ mean regression model and the Conway–Maxwell–Poisson regression model in simulations and a real data analysis.

2. MLEs of Parameters in Generalized Poisson with Under-Dispersion and Its Mean Regression Model

Let $\{Y_i\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} \mathrm{GP^{(I)}}(\mu, \alpha)$ and $Y_{\mathrm{obs}} = \{y_i\}_{i=1}^{n}$ denote the observed counts. Define
$$
\mathbb{I}_0 \equiv \{i : y_i = 0,\, 1 \leq i \leq n\}, \quad \mathbb{I}_1 \equiv \{i : y_i = 1,\, 1 \leq i \leq n\} \quad \text{and} \quad \mathbb{I}_2 \equiv \{i : y_i \geq 2,\, 1 \leq i \leq n\}.
$$
Let $m_k$ denote the number of elements in $\mathbb{I}_k$ for $k = 0, 1, 2$; then we have $m_0 + m_1 + m_2 = n$. Based on (2), the likelihood function of $\{\mu, \alpha\}$ is
$$
\begin{aligned}
L(\mu, \alpha) &= \prod_{i \in \mathbb{I}_0} e^{-\mu/\alpha} \prod_{i \in \mathbb{I}_1} \frac{\mu}{\alpha}\, e^{-(\mu + \alpha - 1)/\alpha} \times \prod_{i \in \mathbb{I}_2} \frac{\mu [\mu + (\alpha-1) y_i]^{y_i - 1} \exp\{-[\mu + (\alpha-1) y_i]/\alpha\}}{\alpha^{y_i}\, y_i!} \\
&\propto \exp\left(-\frac{m_0 \mu}{\alpha}\right) \cdot \left(\frac{\mu}{\alpha}\right)^{m_1} \exp\left[-\frac{m_1 (\mu + \alpha - 1)}{\alpha}\right] \times \frac{\mu^{m_2}}{\alpha^{\sum_{i \in \mathbb{I}_2} y_i}} \exp\left[-\frac{m_2 \mu + (\alpha-1) \sum_{i \in \mathbb{I}_2} y_i}{\alpha}\right] \cdot \prod_{i \in \mathbb{I}_2} [\mu + (\alpha-1) y_i]^{y_i - 1},
\end{aligned}
$$
where $\sum_{i \in \mathbb{I}_2} y_i = n\bar{y} - m_1$ and $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$. Then, the log-likelihood function of $\{\mu, \alpha\}$ is given by
$$
\begin{aligned}
\ell(\mu, \alpha) &= -\frac{m_0 \mu}{\alpha} + m_1 [\log(\mu) - \log(\alpha)] - \frac{m_1 (\mu + \alpha - 1)}{\alpha} + m_2 \log(\mu) - (n\bar{y} - m_1) \log(\alpha) \\
&\quad - \frac{m_2 \mu + (\alpha - 1)(n\bar{y} - m_1)}{\alpha} + \sum_{i \in \mathbb{I}_2} (y_i - 1) \log[y_i (y_i^{-1} \mu + \alpha - 1)] \\
&= (m_1 + m_2) \log(\mu) - n\bar{y} \log(\alpha) - \frac{n(\mu - \bar{y})}{\alpha} - n\bar{y} + \sum_{i \in \mathbb{I}_2} (y_i - 1) \log(y_i) + \sum_{i \in \mathbb{I}_2} (y_i - 1) \log(y_i^{-1} \mu + \alpha - 1) \\
&= (m_1 + m_2) \log(\mu) - n\bar{y} \log(\alpha) - \frac{n(\mu - \bar{y})}{\alpha} + \sum_{i \in \mathbb{I}_2} (y_i - 1) \log(y_i^{-1} \mu + \alpha - 1) + c_1, \tag{4}
\end{aligned}
$$
where $c_1$ is a constant free from $\{\mu, \alpha\}$.
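As a quick cross-check of (4), the following R sketch (ours; the name `loglik_gp1` is hypothetical) evaluates the log-likelihood up to the additive constant $c_1$ using the $m_1$, $m_2$, $\mathbb{I}_2$ decomposition:

```r
# Log-likelihood (4) up to the constant c1, which is free of (mu, alpha).
loglik_gp1 <- function(mu, alpha, y) {
  n  <- length(y); ybar <- mean(y)
  m1 <- sum(y == 1); m2 <- sum(y >= 2)
  y2 <- y[y >= 2]                       # the observations indexed by I2
  (m1 + m2) * log(mu) - n * ybar * log(alpha) - n * (mu - ybar) / alpha +
    sum((y2 - 1) * log(mu / y2 + alpha - 1))
}
```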

2.1. MLEs of $\{\mu, \alpha\}$ via a New MM Algorithm

This subsection aims to find the MLEs of $\{\mu, \alpha\}$ for the case of $\alpha < 1$. Define $y_{\max} \equiv \max_{i \in \mathbb{I}_2} y_i$. Because $y_i^{-1} \mu + \alpha - 1 > 0$ for all $i \in \mathbb{I}_2$, we have $y_{\max}^{-1} \mu + \alpha - 1 > 0$. Thus, we obtain
$$
\log(y_i^{-1} \mu + \alpha - 1) = \log[(y_i^{-1} - y_{\max}^{-1}) \mu + (y_{\max}^{-1} \mu + \alpha - 1)] \overset{(\mathrm{A2})}{\geq} v_i^{(t,t)} \log(\mu) + [1 - v_i^{(t,t)}] \log[\mu + (\alpha - 1) y_{\max}] + c_{2i}^{(t)}, \tag{5}
$$
for all $i \in \mathbb{I}_2$, where
$$
v_i^{(t,t)} \equiv v_i(\mu^{(t)}, \alpha^{(t)}) \quad \text{and} \quad v_i(\mu, \alpha) \equiv \frac{(y_i^{-1} - y_{\max}^{-1})\, \mu}{y_i^{-1} \mu + \alpha - 1}, \quad i \in \mathbb{I}_2,
$$
and $c_{2i}^{(t)}$ is a constant free from $\{\mu, \alpha\}$.
By combining (4) and (5), we have
$$
\ell(\mu, \alpha) \geq a_1^{(t,t)} \log(\mu) - n\bar{y} \log(\alpha) - \frac{n(\mu - \bar{y})}{\alpha} + a_2^{(t,t)} \log[\mu + (\alpha - 1) y_{\max}] + c_3^{(t)} \equiv Q(\mu, \alpha \mid \mu^{(t)}, \alpha^{(t)}),
$$
which minorizes $\ell(\mu, \alpha)$ at $(\mu, \alpha) = (\mu^{(t)}, \alpha^{(t)})$, where
$$
a_1^{(t,t)} = m_1 + m_2 + \sum_{i \in \mathbb{I}_2} (y_i - 1)\, v_i^{(t,t)}, \quad a_2^{(t,t)} = \sum_{i \in \mathbb{I}_2} (y_i - 1) [1 - v_i^{(t,t)}],
$$
and $c_3^{(t)}$ is a constant free from $\{\mu, \alpha\}$. Thus, by maximizing $Q(\mu, \alpha \mid \mu^{(t)}, \alpha^{(t)})$, we have the following MM iterates:
$$
\mu^{(t+1)} = \frac{a_3(\alpha^{(t)}) + \sqrt{a_3^2(\alpha^{(t)}) + 4 n (1 - 1/\alpha^{(t)})\, y_{\max}\, a_1^{(t,t)}}}{2n} \times \alpha^{(t)} \quad \text{and} \tag{6}
$$
$$
\alpha^{(t+1)} = \frac{a_4(\mu^{(t+1)}) + \sqrt{a_4^2(\mu^{(t+1)}) + 4 [n\bar{y} - a_2^{(t+1,t)}] \times a_5(\mu^{(t+1)})}}{2 [n\bar{y} - a_2^{(t+1,t)}]}, \tag{7}
$$
where $a_2^{(t+1,t)}$ denotes $a_2$ with $v_i^{(t,t)}$ replaced by $v_i(\mu^{(t+1)}, \alpha^{(t)})$, and
$$
a_3(\alpha) = -n(1 - \alpha^{-1})\, y_{\max} + n\bar{y}, \quad a_4(\mu) = n\mu (1 - \bar{y}\, y_{\max}^{-1}) \quad \text{and} \quad a_5(\mu) = n(\mu - \bar{y})(\mu\, y_{\max}^{-1} - 1).
$$
According to the one-to-one transformation (1), we can obtain the MLEs of $\{\lambda, \psi\}$ as
$$
\hat{\psi} = 1 - \hat{\alpha}^{-1} \quad \text{and} \quad \hat{\lambda} = \hat{\alpha}^{-1} \hat{\mu},
$$
where $\{\hat{\mu}, \hat{\alpha}\}$ can be calculated through (6) and (7).
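The two updates (6) and (7) take only a few lines of code. The R sketch below is our own rendering under the notation above (it assumes an under-dispersed sample with $m_2 \geq 1$ so that $y_{\max}$ is well defined); the function `loglik_gp1` sketched earlier can be used to verify that each sweep increases the log-likelihood:

```r
# MM iteration (6)-(7) for the MLEs of (mu, alpha) in GP(I) with alpha < 1.
mm_gp1 <- function(y, mu = mean(y), alpha = 0.9, maxit = 1000, tol = 1e-8) {
  n  <- length(y); ybar <- mean(y)
  m1 <- sum(y == 1); m2 <- sum(y >= 2)
  y2 <- y[y >= 2]; ymax <- max(y2)
  for (t in seq_len(maxit)) {
    mu0 <- mu; alpha0 <- alpha
    # weights v_i(mu^(t), alpha^(t)) and a1^(t,t)
    v  <- (1 / y2 - 1 / ymax) * mu / (mu / y2 + alpha - 1)
    a1 <- m1 + m2 + sum((y2 - 1) * v)
    # mu-update (6)
    a3 <- n * ybar - n * (1 - 1 / alpha) * ymax
    mu <- alpha * (a3 + sqrt(a3^2 + 4 * n * (1 - 1 / alpha) * ymax * a1)) / (2 * n)
    # a2^(t+1,t): recompute v at (mu^(t+1), alpha^(t))
    v  <- (1 / y2 - 1 / ymax) * mu / (mu / y2 + alpha - 1)
    a2 <- sum((y2 - 1) * (1 - v))
    # alpha-update (7)
    a4 <- n * mu * (1 - ybar / ymax)
    a5 <- n * (mu - ybar) * (mu / ymax - 1)
    alpha <- (a4 + sqrt(a4^2 + 4 * (n * ybar - a2) * a5)) / (2 * (n * ybar - a2))
    if (abs(mu - mu0) + abs(alpha - alpha0) < tol) break
  }
  list(mu = mu, alpha = alpha, psi = 1 - 1 / alpha, lambda = mu / alpha)
}
```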

2.2. MLEs of $\{\boldsymbol{\beta}, \alpha\}$ in the Mean Regression Model

In this subsection, we consider the mean regression model (3) with $\alpha < 1$. Similar to (4), the log-likelihood function of $\{\boldsymbol{\beta}, \alpha\}$ is given by
$$
\ell(\boldsymbol{\beta}, \alpha) = \sum_{i=1}^{n} \left[ b_{i1} \mathbf{w}_i^{\top} \boldsymbol{\beta} - \frac{\mu_i - y_i}{\alpha} - y_i \log(\alpha) + b_{i2} \log(y_i^{-1} \mu_i + \alpha - 1) \right] + c_4, \tag{8}
$$
where $b_{i1} \equiv I(y_i \geq 1)$, $b_{i2} \equiv (y_i - 1)\, I(y_i \geq 2)$, $\mu_i = \exp(\mathbf{w}_i^{\top} \boldsymbol{\beta})$, and $c_4$ is a constant free from $\{\boldsymbol{\beta}, \alpha\}$. The goal is to calculate the MLEs of $\{\boldsymbol{\beta}, \alpha\}$.

2.2.1. MLE of $\boldsymbol{\beta}$ Given $\{\boldsymbol{\beta}^{(t)}, \alpha\}$

Since $\partial \mu_i / \partial \boldsymbol{\beta} = \mu_i \mathbf{w}_i$, we have
$$
\frac{\partial \log(y_i^{-1} \mu_i + \alpha - 1)}{\partial \boldsymbol{\beta}} = \frac{y_i^{-1} \mu_i}{y_i^{-1} \mu_i + \alpha - 1}\, \mathbf{w}_i \quad \text{and} \quad \frac{\partial^2 \log(y_i^{-1} \mu_i + \alpha - 1)}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} = \frac{(\alpha - 1)\, y_i^{-1} \mu_i}{(y_i^{-1} \mu_i + \alpha - 1)^2}\, \mathbf{w}_i \mathbf{w}_i^{\top}. \tag{9}
$$
According to (8), we know that $y_i^{-1} \mu_i + \alpha - 1 > 0$; thus $0 < 1 - \alpha < y_i^{-1} \mu_i$. Given $\boldsymbol{\beta}^{(t)}$ and $\alpha$, to calculate the $(t+1)$-th approximation of $\hat{\boldsymbol{\beta}}$, we first restrict $\boldsymbol{\beta}$ to the following convex set
$$
\mathcal{C}^{(t)} = \left\{ \boldsymbol{\beta} : y_i^{-1} \mu_i \geq T_i^{(t)}(\alpha) \equiv \frac{1}{2} \left[ (1 - \alpha) + y_i^{-1} \mu_i^{(t)} \right], \; i \in \mathbb{I}_2 \right\}, \tag{10}
$$
where $T_i^{(t)}(\alpha)$ is the midpoint of the two endpoints of the open interval $(1 - \alpha,\, y_i^{-1} \mu_i^{(t)})$ and $\mu_i^{(t)} \equiv \exp(\mathbf{w}_i^{\top} \boldsymbol{\beta}^{(t)})$. Then, for any $i \in \mathbb{I}_2$, since $\alpha - 1 < 0$, we have
$$
\frac{(\alpha - 1)\, y_i^{-1}}{(y_i^{-1} \mu_i + \alpha - 1)^2} \overset{(10)}{\geq} \frac{(\alpha - 1)\, y_i^{-1}}{[T_i^{(t)}(\alpha) + \alpha - 1]^2} \equiv b_{i3}^{(t)}(\alpha). \tag{11}
$$
On the other hand, we define
$$
h_i^{(t)}(\boldsymbol{\beta} \mid \alpha) = \log[\mu_i + (\alpha - 1) y_i] - b_{i3}^{(t)}(\alpha)\, \mu_i, \quad i \in \mathbb{I}_2.
$$
By combining (9) with (11), we have
$$
\frac{\partial^2 h_i^{(t)}(\boldsymbol{\beta} \mid \alpha)}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} \succeq \mathbf{0}; \tag{12}
$$
i.e., $\partial^2 h_i^{(t)}(\boldsymbol{\beta} \mid \alpha) / \partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}$ is a positive semi-definite matrix. By applying the second-order Taylor expansion of $h_i^{(t)}(\boldsymbol{\beta} \mid \alpha)$ around $\boldsymbol{\beta}^{(t)}$, we have
$$
h_i^{(t)}(\boldsymbol{\beta} \mid \alpha) \geq h_i^{(t)}(\boldsymbol{\beta}^{(t)} \mid \alpha) + b_{i4}^{(t)}(\alpha) \times (\boldsymbol{\beta} - \boldsymbol{\beta}^{(t)})^{\top} \mathbf{w}_i, \tag{13}
$$
where the equality holds iff $\boldsymbol{\beta} = \boldsymbol{\beta}^{(t)}$, and $b_{i4}^{(t)}(\alpha) \equiv \{[\mu_i^{(t)} + (\alpha - 1) y_i]^{-1} - b_{i3}^{(t)}(\alpha)\}\, \mu_i^{(t)}$. Let $\ell_1(\boldsymbol{\beta} \mid \alpha)$ denote the conditional log-likelihood function of $\boldsymbol{\beta}$ given $\alpha$; we have
$$
\ell_1(\boldsymbol{\beta} \mid \alpha) \overset{(8)}{=} \ell(\boldsymbol{\beta}, \alpha) \overset{(12)\,\&\,(13)}{\geq} \sum_{i=1}^{n} \left\{ \left[ b_{i1} + b_{i2} b_{i4}^{(t)}(\alpha) \right] \mathbf{w}_i^{\top} \boldsymbol{\beta} - \left[ \alpha^{-1} - b_{i2} b_{i3}^{(t)}(\alpha) \right] \exp(\mathbf{w}_i^{\top} \boldsymbol{\beta}) \right\} + c_5^{(t)} \equiv Q_1(\boldsymbol{\beta} \mid \boldsymbol{\beta}^{(t)}, \alpha),
$$
which minorizes $\ell_1(\boldsymbol{\beta} \mid \alpha)$ at $\boldsymbol{\beta} = \boldsymbol{\beta}^{(t)}$, where $c_5^{(t)}$ is a constant free from $\boldsymbol{\beta}$.
Note that $Q_1(\boldsymbol{\beta} \mid \boldsymbol{\beta}^{(t)}, \alpha)$ is a weighted log-likelihood function of $\boldsymbol{\beta}$ for the Poisson regression model with weight vector $(\alpha^{-1} - b_{12} b_{13}^{(t)}(\alpha), \ldots, \alpha^{-1} - b_{n2} b_{n3}^{(t)}(\alpha))^{\top}$ and observations $Y_{\mathrm{obs}}^{*} = \{y_i^{*}\}_{i=1}^{n}$ with
$$
y_i^{*} = \frac{b_{i1} + b_{i2} b_{i4}^{(t)}(\alpha)}{\alpha^{-1} - b_{i2} b_{i3}^{(t)}(\alpha)}, \quad i = 1, \ldots, n.
$$
We can calculate the MLE of $\boldsymbol{\beta}$ for this weighted Poisson regression model, denoted by $\boldsymbol{\beta}^{*(t+1)}$, directly through the built-in 'glm' function in R. Since $\boldsymbol{\beta}$ is restricted to the convex set $\mathcal{C}^{(t)}$, we project $\boldsymbol{\beta}^{*(t+1)}$ onto $\mathcal{C}^{(t)}$ and calculate the $(t+1)$-th approximation of $\hat{\boldsymbol{\beta}}$ as
$$
\boldsymbol{\beta}^{(t+1)} = \boldsymbol{\beta}^{(t)} + s^{(t)} (\boldsymbol{\beta}^{*(t+1)} - \boldsymbol{\beta}^{(t)}), \tag{14}
$$
where
$$
s^{(t)} \equiv \min\left\{ \min_{i \in \mathbb{I}_2} s_i^{(t)},\, 1 \right\} \quad \text{and} \quad s_i^{(t)} \equiv \frac{\log[T_i^{(t)}(\alpha)\, y_i] - \mathbf{w}_i^{\top} \boldsymbol{\beta}^{(t)}}{\mathbf{w}_i^{\top} (\boldsymbol{\beta}^{*(t+1)} - \boldsymbol{\beta}^{(t)})}\, I\big(\mathbf{w}_i^{\top} (\boldsymbol{\beta}^{*(t+1)} - \boldsymbol{\beta}^{(t)}) < 0\big) + I\big(\mathbf{w}_i^{\top} (\boldsymbol{\beta}^{*(t+1)} - \boldsymbol{\beta}^{(t)}) \geq 0\big).
$$
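Putting (10)–(14) together, one $\boldsymbol{\beta}$-step can be sketched in R as follows (our own code under the paper's notation, not the authors' implementation; the base-R 'glm' call with non-integer pseudo-responses $y_i^{*}$ emits a harmless warning, which we suppress):

```r
# One beta-update (14): maximize Q1 via a weighted Poisson fit, then step back
# into the convex set C^(t). W is the n x q design matrix (first column all 1s).
beta_step <- function(beta_t, alpha, y, W) {
  mu_t <- exp(drop(W %*% beta_t))
  i2   <- which(y >= 2)
  Ti   <- ((1 - alpha) + mu_t[i2] / y[i2]) / 2      # midpoints T_i^(t)(alpha) in (10)
  bi3  <- (alpha - 1) / y[i2] / (Ti + alpha - 1)^2  # lower bound (11)
  bi4  <- (1 / (mu_t[i2] + (alpha - 1) * y[i2]) - bi3) * mu_t[i2]
  b2b3 <- b2b4 <- numeric(length(y))
  b2b3[i2] <- (y[i2] - 1) * bi3
  b2b4[i2] <- (y[i2] - 1) * bi4
  omega <- 1 / alpha - b2b3                         # Poisson weights, all positive
  ystar <- (as.numeric(y >= 1) + b2b4) / omega      # pseudo-responses y_i^*
  fit   <- suppressWarnings(glm(ystar ~ W - 1, family = poisson, weights = omega))
  beta_star <- unname(coef(fit))
  # step length s^(t) so that beta^(t+1) stays inside C^(t)
  d   <- drop(W[i2, , drop = FALSE] %*% (beta_star - beta_t))
  num <- log(Ti * y[i2]) - drop(W[i2, , drop = FALSE] %*% beta_t)
  si  <- ifelse(d < 0, num / d, 1)
  beta_t + min(si, 1) * (beta_star - beta_t)
}
```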

2.2.2. MLE of $\alpha$ Given $\{\boldsymbol{\beta}, \alpha^{(t)}\}$

Define $T_{\min}(\boldsymbol{\beta}) \equiv \min_{i \in \mathbb{I}_2} (y_i^{-1} \mu_i)$. Given $\boldsymbol{\beta}$, we have
$$
\log(y_i^{-1} \mu_i + \alpha - 1) = \log\{[y_i^{-1} \mu_i - T_{\min}(\boldsymbol{\beta})] + [T_{\min}(\boldsymbol{\beta}) + \alpha - 1]\} \overset{(\mathrm{A2})}{\geq} u_i(\boldsymbol{\beta}, \alpha^{(t)}) \log[T_{\min}(\boldsymbol{\beta}) + \alpha - 1] + c_6^{(t)}, \quad i \in \mathbb{I}_2, \tag{15}
$$
where $c_6^{(t)}$ is a constant free from $\alpha$ and
$$
u_i(\boldsymbol{\beta}, \alpha) \equiv \frac{T_{\min}(\boldsymbol{\beta}) + \alpha - 1}{y_i^{-1} \mu_i + \alpha - 1}.
$$
Let $\ell_2(\alpha \mid \boldsymbol{\beta})$ denote the conditional log-likelihood function of $\alpha$ given $\boldsymbol{\beta}$; we have
$$
\ell_2(\alpha \mid \boldsymbol{\beta}) \overset{(8)}{=} \ell(\boldsymbol{\beta}, \alpha) \overset{(15)}{\geq} \sum_{i=1}^{n} \left\{ \frac{y_i - \mu_i}{\alpha} - y_i \log(\alpha) + b_{i2} \cdot u_i(\boldsymbol{\beta}, \alpha^{(t)}) \log[T_{\min}(\boldsymbol{\beta}) + \alpha - 1] \right\} + c_7^{(t)} \equiv Q_2(\alpha \mid \boldsymbol{\beta}, \alpha^{(t)}),
$$
which minorizes $\ell_2(\alpha \mid \boldsymbol{\beta})$ at $\alpha = \alpha^{(t)}$, where $c_7^{(t)}$ is a constant free from $\alpha$. By setting $\partial Q_2(\alpha \mid \boldsymbol{\beta}, \alpha^{(t)}) / \partial \alpha = 0$, we have the following MM iterates:
$$
\alpha^{(t+1)} = \min(\alpha^{*(t+1)}, 1), \tag{16}
$$
where
$$
\alpha^{*(t+1)} \equiv \frac{-d_2(\boldsymbol{\beta}, \alpha^{(t)}) + \sqrt{d_2^2(\boldsymbol{\beta}, \alpha^{(t)}) - 4 d_1(\boldsymbol{\beta}, \alpha^{(t)})\, d_3(\boldsymbol{\beta})}}{2 d_1(\boldsymbol{\beta}, \alpha^{(t)})}, \quad d_1(\boldsymbol{\beta}, \alpha) \equiv \sum_{i=1}^{n} b_{i2}\, u_i(\boldsymbol{\beta}, \alpha) - n\bar{y},
$$
$$
d_2(\boldsymbol{\beta}, \alpha) \equiv n [\bar{\mu} - \bar{y}\, T_{\min}(\boldsymbol{\beta})], \quad d_3(\boldsymbol{\beta}) \equiv n [T_{\min}(\boldsymbol{\beta}) - 1] (\bar{\mu} - \bar{y}), \quad \bar{\mu} = \frac{1}{n} \sum_{i=1}^{n} \mu_i.
$$
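The $\alpha$-step therefore has a closed form. A matching R sketch (ours, with hypothetical function names) is:

```r
# One alpha-update (16): the root of d1*alpha^2 + d2*alpha + d3 = 0 implied by
# dQ2/dalpha = 0, capped at 1 to keep the under-dispersion constraint.
alpha_step <- function(beta, alpha_t, y, W) {
  mu   <- exp(drop(W %*% beta)); n <- length(y)
  i2   <- which(y >= 2)
  Tmin <- min(mu[i2] / y[i2])
  u    <- (Tmin + alpha_t - 1) / (mu[i2] / y[i2] + alpha_t - 1)
  d1   <- sum((y[i2] - 1) * u) - n * mean(y)
  d2   <- n * (mean(mu) - mean(y) * Tmin)
  d3   <- n * (Tmin - 1) * (mean(mu) - mean(y))
  min((-d2 + sqrt(d2^2 - 4 * d1 * d3)) / (2 * d1), 1)
}
```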

3. Hypothesis Testing

For the $\mathrm{GP^{(I)}}$ mean regression model (3), suppose that we are interested in testing the following general null hypothesis:
$$
H_0 : \mathbf{C} \boldsymbol{\theta} = \mathbf{c}_r \quad \text{against} \quad H_1 : \mathbf{C} \boldsymbol{\theta} \neq \mathbf{c}_r, \tag{17}
$$
where $\mathbf{C}$ is a known $r \times (q+1)$ matrix with $\mathrm{rank}(\mathbf{C}) = r_0 < q + 1$, $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, \alpha)^{\top}$ is the vector of parameters and $\mathbf{c}_r$ is a known $r \times 1$ vector.

3.1. The Likelihood Ratio Test

Let $\ell(\boldsymbol{\theta}) \equiv \ell(\boldsymbol{\beta}, \alpha)$ be given by (8). The likelihood ratio statistic is given by
$$
T_L = 2 \left[ \ell(\hat{\boldsymbol{\theta}}) - \ell(\hat{\boldsymbol{\theta}}_{H_0}) \right], \tag{18}
$$
where $\hat{\boldsymbol{\theta}}$ is the unconstrained MLE of $\boldsymbol{\theta}$, which can be calculated by the MM algorithm (14) and (16), while $\hat{\boldsymbol{\theta}}_{H_0}$ is the constrained MLE of $\boldsymbol{\theta}$ under $H_0$. $T_L$ asymptotically follows a chi-squared distribution with $r_0$ degrees of freedom. The corresponding p-value is
$$
p_L = \Pr(T_L > t_L \mid H_0) = \Pr(\chi^2(r_0) > t_L),
$$
where $t_L$ is the estimated likelihood ratio statistic.
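In R, with `ll_hat` and `ll_H0` denoting the two maximized log-likelihoods (hypothetical names; the constrained fit must be computed separately under $H_0$), the computation is simply:

```r
# Likelihood ratio test (18); r0 is the rank of C.
t_L <- 2 * (ll_hat - ll_H0)
p_L <- pchisq(t_L, df = r0, lower.tail = FALSE)
```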

3.2. The Wald Test

The Wald statistic is given by
$$
T_W = (\mathbf{C} \hat{\boldsymbol{\theta}} - \mathbf{c}_r)^{\top} \left[ \mathbf{C}\, \mathbf{I}^{-1}(\hat{\boldsymbol{\theta}})\, \mathbf{C}^{\top} \right]^{-1} (\mathbf{C} \hat{\boldsymbol{\theta}} - \mathbf{c}_r), \tag{19}
$$
where $\hat{\boldsymbol{\theta}}$ denotes the unconstrained MLE of $\boldsymbol{\theta}$ and $\mathbf{I}(\hat{\boldsymbol{\theta}})$ is the Fisher information matrix (see Appendix B) evaluated at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$. $T_W$ is asymptotically distributed as a chi-squared distribution with $r_0$ degrees of freedom. The corresponding p-value is
$$
p_W = \Pr(T_W > t_W \mid H_0) = \Pr(\chi^2(r_0) > t_W),
$$
where $t_W$ is the estimated Wald statistic.
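A sketch of the corresponding computation in R, assuming `theta_hat`, the inverse Fisher information `Iinv`, and the pair `(C, cr)` are available (names ours):

```r
# Wald test (19); solve() inverts the r x r matrix C Iinv C'.
d   <- drop(C %*% theta_hat - cr)
t_W <- drop(t(d) %*% solve(C %*% Iinv %*% t(C)) %*% d)
p_W <- pchisq(t_W, df = r0, lower.tail = FALSE)
```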

3.3. The Score Test

The score statistic is given by
$$
T_S = [\mathbf{s}(\hat{\boldsymbol{\theta}}_{H_0})]^{\top}\, \mathbf{I}^{-1}(\hat{\boldsymbol{\theta}}_{H_0})\, \mathbf{s}(\hat{\boldsymbol{\theta}}_{H_0}), \tag{20}
$$
where $\hat{\boldsymbol{\theta}}_{H_0}$ denotes the constrained MLE of $\boldsymbol{\theta}$ under $H_0$, and
$$
\mathbf{s}(\boldsymbol{\theta}) \equiv \frac{\partial \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = \left( \frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_0}, \frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_1}, \ldots, \frac{\partial \ell(\boldsymbol{\theta})}{\partial \beta_{q-1}}, \frac{\partial \ell(\boldsymbol{\theta})}{\partial \alpha} \right)^{\top},
$$
with details being presented in Appendix B. $T_S$ is asymptotically distributed as a chi-squared distribution with $r_0$ degrees of freedom. The corresponding p-value is
$$
p_S = \Pr(T_S > t_S \mid H_0) = \Pr(\chi^2(r_0) > t_S),
$$
where $t_S$ is the estimated score statistic.

4. Simulations

4.1. Accuracy of MLEs of Parameters

To investigate the accuracy of the MLEs of the parameters, we consider two dimensions: $q = 2, 4$. The sample sizes are set to be $n = 100, 200, 400$; $\alpha = 0.6, 0.8, 0.95$; and the other parameters are set as follows:
(A1)
When $q = 2$, $\boldsymbol{\beta} = (1, 1)^{\top}$; $\mathbf{w}_i = (1, w_{i1})^{\top}$, $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.3, \sigma_0^2)$ with $\sigma_0^2 = 0.5$;
(B1)
When $q = 4$, $\boldsymbol{\beta} = (1, 1, 2, 2)^{\top}$; $\mathbf{w}_i = (1, w_{i1}, w_{i2}, w_{i3})^{\top}$, $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.3, 0.5)$, $\{w_{i2}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} U(0, 1)$, $\{w_{i3}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(0.5)$.
For a given $\{q, n, \boldsymbol{\beta}, \alpha\}$, we first generate $\{\mathbf{w}_i\}_{i=1}^{n}$, and then generate $\{Y_i = y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mathbf{w}_i^{\top} \boldsymbol{\beta}, \alpha)$ by the inversion method [15] based on the pmf given by (2). Then, we calculate the MLEs $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ via the MM algorithm (14) and (16) with the generated $\{y_i\}_{i=1}^{n}$ and the corresponding covariate vectors $\{\mathbf{w}_i\}_{i=1}^{n}$. Finally, we independently repeat this process 10,000 times.
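The inversion step can be sketched as follows (our own implementation of the generic inversion method; the truncation bound `ymax = 1000` is an assumption adequate for the parameter values used here):

```r
# Inversion sampling from GP(I)(mu, alpha) based on the pmf (2).
rgp1 <- function(n, mu, alpha, ymax = 1000) {
  y <- 0:ymax
  logp <- suppressWarnings(
    log(mu) + (y - 1) * log(mu + (alpha - 1) * y) -
      (mu + (alpha - 1) * y) / alpha - y * log(alpha) - lfactorial(y))
  logp[mu + (alpha - 1) * y <= 0] <- -Inf    # truncated support when alpha < 1
  p <- exp(logp); p <- p / sum(p)            # renormalize the truncated pmf
  cdf <- cumsum(p); cdf[length(cdf)] <- 1    # guard against rounding at the tail
  y[findInterval(runif(n), cdf) + 1]         # invert the cdf
}
```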
The resultant average bias (denoted by Bias; i.e., the average MLE minus the true value of the parameter) and the mean square error (denoted by MSE; i.e., $\mathrm{Bias}^2 + (\text{standard deviation})^2$, where the standard deviation is estimated by the sample standard deviation of the 10,000 MLEs) are reported in Table 1 and Table 2.
Table 1 and Table 2 show that the absolute value of the Bias and the MSE tend to zero as the data size grows for each parameter in Cases (A1) and (B1). With the other parameters fixed, the absolute value of the Bias and the MSE are smaller for a smaller $\alpha$.

4.2. Hypothesis Testing

In this subsection, we explore the performance of the likelihood ratio, Wald and score statistics presented in (18)–(20) for the hypothesis testing in (17) under various parameter configurations. The sample sizes are set to be $n = 50(50)400$, where $n_1(s)n_2$ means from $n_1$ to $n_2$ with step size $s$, and the other parameters are set as follows:
(A2)
When $q = 2$, $\boldsymbol{\beta} = (\beta_0, \beta_1)^{\top}$, $\alpha$ is set to be $0.75, 0.85, 0.95$, $\mathbf{C} = (0, 1, 0)$, $\mathbf{c}_r = 0$ and $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, \alpha)^{\top}$, so that (17) becomes $H_0: \beta_1 = 0$. The true value of $\boldsymbol{\beta}$ under $H_0$ is $\boldsymbol{\beta} = (1, 0)^{\top}$, while the value of $\boldsymbol{\beta}$ under $H_1$ is $\boldsymbol{\beta} = (1, 0.5)^{\top}$. We generate $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.1, 0.2)$ and set $\mathbf{w}_i = (1, w_{i1})^{\top}$;
(B2)
When $q = 4$, $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3)^{\top}$, $\alpha$ is set to be $0.75, 0.85, 0.95$,
$$
\mathbf{C} = \begin{pmatrix} 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \quad \mathbf{c}_r = \mathbf{0}_3 \quad \text{and} \quad \boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, \alpha)^{\top},
$$
so that (17) becomes $H_0: \beta_1 = \beta_2 = \beta_3 = 0$. The true value of $\boldsymbol{\beta}$ under $H_0$ is $\boldsymbol{\beta} = (1, 0, 0, 0)^{\top}$ and the value of $\boldsymbol{\beta}$ under $H_1$ is $\boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$. We generate $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.1, 0.05)$, $\{w_{i2}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} U(0, 0.1)$, $\{w_{i3}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} 0.4 \times \mathrm{Bernoulli}(0.5)$, and set $\mathbf{w}_i = (1, w_{i1}, w_{i2}, w_{i3})^{\top}$;
(A3)
When $q = 2$, $\boldsymbol{\beta} = (1, 1)^{\top}$, $\mathbf{C} = (0, 0, 1)$, $\mathbf{c}_r = 1$ and $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, \alpha)^{\top}$, so that (17) becomes $H_0: \alpha = 1$. The alternative values of $\alpha$ under $H_1$ are set as 0.9 and 0.95. We generate $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(1, 0.1)$ and set $\mathbf{w}_i = (1, w_{i1})^{\top}$;
(B3)
When $q = 4$, $\boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$, $\mathbf{C} = (0, 0, 0, 0, 1)$, $\mathbf{c}_r = 1$ and $\boldsymbol{\theta} = (\boldsymbol{\beta}^{\top}, \alpha)^{\top}$, so that (17) becomes $H_0: \alpha = 1$. The alternative values of $\alpha$ under $H_1$ are set as 0.9 and 0.95. We generate $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.1, 0.05)$, $\{w_{i2}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} U(0, 0.1)$, $\{w_{i3}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} 0.4 \times \mathrm{Bernoulli}(0.5)$, and set $\mathbf{w}_i = (1, w_{i1}, w_{i2}, w_{i3})^{\top}$.
All hypothesis tests are conducted at a significance level of 0.05. To calculate the empirical levels of the three tests, we first generate $\{Y_i = y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mathbf{w}_i^{\top} \boldsymbol{\beta}, \alpha)$ under $H_0$. Repeating this process $L$ (=10,000) times, we obtain $\{Y_{\mathrm{obs}}^{(l)} = \{y_1^{(l)}, \ldots, y_n^{(l)}\}\}_{l=1}^{L}$. Since our MM algorithm (14) & (16) is designed for $\alpha < 1$, we apply a two-stage method to obtain the MLEs of $\boldsymbol{\theta}$ for the $\mathrm{GP^{(I)}}$ regression model. In the first stage, we calculate the MLEs $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ via the MM algorithm (14) & (16) with the generated $\{y_i\}_{i=1}^{n}$ and the corresponding covariate vectors $\{\mathbf{w}_i\}_{i=1}^{n}$. If the estimated $\hat{\alpha} < 1$, implying that the dataset is under-dispersed, we keep the estimation result and do not go to the second stage. If the estimated $\hat{\alpha} = 1$, implying that the dataset may be equi- or over-dispersed, we go to the second stage; that is, we recalculate the MLEs $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ through the 'vglm' function with family 'genpoisson1' in the VGAM R package, because this function can only calculate the MLEs of the parameters when $\alpha \geq 1$. Let $\{r_j\}_{j=1}^{3}$ denote the numbers of rejections of the null hypothesis $H_0$ by the likelihood ratio, Wald and score statistics, respectively. Hence, the actual significance level can be estimated by $r_j / L$ under $H_0$. Similarly, we generate $\{Y_i = y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mathbf{w}_i^{\top} \boldsymbol{\beta}, \alpha)$ under $H_1$, repeat this process $L$ (=10,000) times, and estimate the empirical power in the same way as the empirical level. All results are reported in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8.
Table 3 shows that the significance levels of the three statistics are around 0.05 for different sample sizes. Table 4 shows that the Wald statistic outperforms the likelihood ratio statistic in power, and the likelihood ratio statistic outperforms the score statistic. At the same time, the differences in the empirical powers among the three tests are very small. Thus, we can use any of the likelihood ratio, Wald, and score statistics for the regression hypothesis testing for various values of $\alpha$ when $q = 2$. The differences in performance among the three statistics are presented in Figure 1 and Figure 2.
Table 5 shows that the significance levels of the likelihood ratio and score statistics are around 0.05 for different sample sizes, while the significance levels of the Wald statistic are around 0.07 for different $\alpha$ when $n = 50$ and quickly decrease to 0.05 as the sample size grows. Table 6 shows that the Wald statistic outperforms the likelihood ratio statistic, and the likelihood ratio statistic outperforms the score statistic. Unlike in Case (A2), where the differences in the empirical powers among the three tests are small, Table 6 shows that the differences are more considerable in Case (B2). Therefore, we can use the likelihood ratio and score statistics for the regression hypothesis testing for various values of $\alpha$ and different sample sizes when $q = 4$, and we can use the Wald statistic when the sample size is more than 100. The differences in performance among the three statistics are presented in Figure 3 and Figure 4.
According to Table 7 and Table 8, we can see that the Wald statistic outperforms the other two statistics in Cases (A3)–(B3), and the likelihood ratio statistic outperforms the score statistic. Figure 5 and Figure 6 show a significant difference in empirical power among the three statistics. Furthermore, we can see that the empirical significance levels of the likelihood ratio and score statistics are satisfactorily controlled. In contrast, the significance level of the Wald statistic is over 0.08 when $n = 50$ and gradually decreases to 0.05 as the sample size grows. Thus, we suggest using the likelihood ratio statistic for the dispersion hypothesis testing when the sample size is less than 200, and the Wald statistic when the sample size is more than 200.

4.3. Comparisons of the $\mathrm{GP^{(I)}}$ Regression Model with the Conway–Maxwell–Poisson Regression Model

To compare the goodness-of-fit and computational complexity of the $\mathrm{GP^{(I)}}$ regression model (3) and the Conway–Maxwell–Poisson (CMP) regression model, we use both models to fit datasets generated from each of the two models. A discrete r.v. $Y$ is said to follow the CMP distribution with parameters $\lambda > 0$ and $\nu > 0$, denoted by $Y \sim \mathrm{CMP}(\lambda, \nu)$, if its pmf is [16]:
$$
\Pr(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \quad y = 0, 1, 2, \ldots,
$$
where $Z(\lambda, \nu) = \sum_{s=0}^{\infty} \lambda^{s} / (s!)^{\nu}$ is a normalizing constant and $\nu$ is the dispersion parameter. The $\mathrm{CMP}(\lambda, \nu)$ distribution reduces to the Poisson($\lambda$) when $\nu = 1$, and it has the twin properties of over-dispersion when $\nu < 1$ and under-dispersion when $\nu > 1$. The CMP regression model [3,17] is:
$$
\{Y_i\}_{i=1}^{n} \overset{\mathrm{ind}}{\sim} \mathrm{CMP}(\lambda_i, \nu) \quad \text{and} \quad \log(\lambda_i) = \mathbf{w}_i^{\top} \boldsymbol{\beta}, \quad i = 1, \ldots, n.
$$
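As an aside, the CMP pmf can be evaluated by truncating the infinite series $Z(\lambda, \nu)$; a small R sketch of ours (the truncation point `smax` is an assumption suitable for moderate $\lambda$):

```r
# CMP(lambda, nu) pmf with Z(lambda, nu) truncated at smax + 1 terms;
# the log-sum-exp trick keeps the normalizing constant numerically stable.
dcmp <- function(y, lambda, nu, smax = 1000) {
  s    <- 0:smax
  logz <- s * log(lambda) - nu * lfactorial(s)
  logZ <- max(logz) + log(sum(exp(logz - max(logz))))
  exp(y * log(lambda) - nu * lfactorial(y) - logZ)
}
```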
The sample size is set to be $n = 1000$, $q = 4$, $\boldsymbol{\beta} = (1, 0.5, 1, 0.5)^{\top}$, $\mathbf{w}_i = (1, w_{i1}, w_{i2}, w_{i3})^{\top}$, $\{w_{i1}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} N(0.1, 0.5)$, $\{w_{i2}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} U(0, 1)$, $\{w_{i3}\}_{i=1}^{n} \overset{\mathrm{iid}}{\sim} \mathrm{Bernoulli}(0.5)$, and the other parameter configurations are set as follows:
(A4)
For a fixed $\boldsymbol{\beta}$, set $\alpha = 0.9$ and generate $\{X_i = x_i\} \overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mu_i, \alpha)$ with $\mu_i = \exp(\mathbf{w}_i^{\top} \boldsymbol{\beta})$ for $i = 1, \ldots, n$.
(B4)
For a fixed $\boldsymbol{\beta}$, set $\nu = 1.2$ and generate $\{X_i = x_i\} \overset{\mathrm{ind}}{\sim} \mathrm{CMP}(\lambda_i, \nu)$ with $\lambda_i = \exp(\mathbf{w}_i^{\top} \boldsymbol{\beta})$ for $i = 1, \ldots, n$.
To assess the performance of the two models, we use the following three criteria: the average Akaike information criterion (AIC), the average Bayesian information criterion (BIC) and the average Pearson chi-squared statistic $\chi^2_{n-q-1}$ [18]:
$$
\chi^2_{n-q-1} = \sum_{i=1}^{n} \frac{(x_i - \hat{\mu}_i)^2}{\hat{\sigma}_i^2},
$$
where $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ are the estimated mean and variance of $X_i$, and $(n - q - 1) = 995$ is the number of degrees of freedom of the Pearson chi-squared statistic because we use the MLEs $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ or $\{\hat{\boldsymbol{\beta}}, \hat{\nu}\}$ to calculate $\{\hat{\mu}_i, \hat{\sigma}_i^2\}_{i=1}^{n}$. To show the differences in performance of the two models on the same data, we first generate 1000 datasets from (A4) and fit both the $\mathrm{GP^{(I)}}$ and CMP regression models to each of them. Specifically, we calculate the MLEs of the parameters of the $\mathrm{GP^{(I)}}$ regression model with the MM algorithm (14) & (16), while the MLEs of the parameters of the CMP regression model can be calculated directly through the built-in 'glm.cmp' function in the COMPoissonReg R package. Next, we generate 1000 datasets from (B4) and fit the two models in the same way. By averaging the obtained results, the log-likelihood, AIC, BIC, $\chi^2_{n-q-1}$ and the system time cost at convergence of the algorithm (denoted by Sys. Time) are reported in Table 9.
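For the $\mathrm{GP^{(I)}}$ fit, the estimated variance is $\hat{\sigma}_i^2 = \hat{\alpha}^2 \hat{\mu}_i$, so the Pearson statistic takes two lines in R (a sketch with hypothetical object names `W`, `beta_hat`, `alpha_hat` and response `x`):

```r
# Pearson chi-squared statistic for a fitted GP(I) regression model;
# compare it with its degrees of freedom n - q - 1 (= 995 here).
mu_hat <- exp(drop(W %*% beta_hat))
chisq  <- sum((x - mu_hat)^2 / (alpha_hat^2 * mu_hat))
```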
According to Table 9, in Case (A4), the log-likelihood of the $\mathrm{GP^{(I)}}$ regression model is larger than that of the CMP regression model, and the AIC and BIC of the $\mathrm{GP^{(I)}}$ regression model are smaller than those of the CMP regression model. However, the log-likelihood, AIC and BIC show the inverse relationship between the $\mathrm{GP^{(I)}}$ and CMP regression models in Case (B4). Thus, the $\mathrm{GP^{(I)}}$ (resp. CMP) regression model attains the better log-likelihood, AIC and BIC when the data are generated from the $\mathrm{GP^{(I)}}$ (resp. CMP) model. For the Pearson chi-squared statistic, the $\mathrm{GP^{(I)}}$ regression model outperforms the CMP regression model in Case (A4) because its value is closer to the number of degrees of freedom of the Pearson chi-squared statistic, 995. In Case (B4), the $\chi^2_{n-q-1}$ of the $\mathrm{GP^{(I)}}$ regression model is greater than 995 by around 5, and the $\chi^2_{n-q-1}$ of the CMP regression model is less than 995 by around 5, implying that they have similar performance. As for the time cost, the proposed $\mathrm{GP^{(I)}}$ regression model converges faster than the CMP model in the simulations, with a time cost of nearly half that of the CMP regression model.

5. Births in Last Five Years for Women in Bangladesh

The dataset is obtained from the Bangladesh demographic and health surveys (DHS) program (https://www.dhsprogram.com/data, accessed on 28 January 2022), recording several variables, e.g., Age, Education (educational level), Religion and Division, for 9067 women aged between 30 and 35. Our goal is to better understand the relationship between Births (births in the last five years) and its relevant explanatory variables. In this section, we construct a $\mathrm{GP^{(I)}}$ regression model to link the mean of Births with the values of Age, Education, Religion and Division; the mean regression model is presented as follows:
$$
\begin{aligned}
\mathrm{Births}_i &\overset{\mathrm{ind}}{\sim} \mathrm{GP^{(I)}}(\mu_i, \alpha), \quad i = 1, \ldots, n, \quad \text{and} \\
\log(\mu_i) &= \beta_0 + \mathrm{Age}_i \times \beta_1 + \mathrm{Primary}_i \times \beta_2 + \mathrm{Secondary}_i \times \beta_3 + \mathrm{Higher}_i \times \beta_4 + \mathrm{Islam}_i \times \beta_5 + \mathrm{Hinduism}_i \times \beta_6 \\
&\quad + \mathrm{Chittagong}_i \times \beta_7 + \mathrm{Dhaka}_i \times \beta_8 + \mathrm{Khulna}_i \times \beta_9 + \mathrm{Mymensingh}_i \times \beta_{10} + \mathrm{Rajshahi}_i \times \beta_{11} + \mathrm{Rangpur}_i \times \beta_{12} + \mathrm{Sylhet}_i \times \beta_{13}. \tag{21}
\end{aligned}
$$
Meanwhile, for comparison, we also use the CMP regression model to fit the Bangladesh DHS data:
$$
\begin{aligned}
\mathrm{Births}_i &\overset{\mathrm{ind}}{\sim} \mathrm{CMP}(\lambda_i, \nu), \quad i = 1, \ldots, n, \quad \text{and} \\
\log(\lambda_i) &= \beta_0 + \mathrm{Age}_i \times \beta_1 + \mathrm{Primary}_i \times \beta_2 + \mathrm{Secondary}_i \times \beta_3 + \mathrm{Higher}_i \times \beta_4 + \mathrm{Islam}_i \times \beta_5 + \mathrm{Hinduism}_i \times \beta_6 \\
&\quad + \mathrm{Chittagong}_i \times \beta_7 + \mathrm{Dhaka}_i \times \beta_8 + \mathrm{Khulna}_i \times \beta_9 + \mathrm{Mymensingh}_i \times \beta_{10} + \mathrm{Rajshahi}_i \times \beta_{11} + \mathrm{Rangpur}_i \times \beta_{12} + \mathrm{Sylhet}_i \times \beta_{13}. \tag{22}
\end{aligned}
$$
The MLEs of the parameters of the $\mathrm{GP^{(I)}}$ regression model in (21) can be calculated through the proposed MM algorithm (14) and (16), and the MLEs of the parameters of the CMP regression model in (22) through the built-in 'glm.cmp' function in the COMPoissonReg R package. For a fixed $j$ ($j = 1, \ldots, 13$), the Std of $\hat{\beta}_j$ used by the Wald statistic for testing $H_0: \beta_j = 0$ is $[\mathbf{e}_j^{\top}\, \mathbf{I}^{-1}(\hat{\boldsymbol{\theta}})\, \mathbf{e}_j]^{1/2}$, where $\mathbf{e}_j$ denotes the 15-dimensional vector with 1 as the $(j+1)$-th element and 0's elsewhere, and $\mathbf{I}(\hat{\boldsymbol{\theta}})$ is the Fisher information matrix in Appendix B. Thus, the z-values (i.e., MLE/Std) and p-values can be calculated from the MLEs and their Stds; the estimation results of the $\mathrm{GP^{(I)}}$ and the CMP regression models are presented in Table 10.
Table 10 indicates that the Age coefficient is −0.147, implying that Age affects the number of births in the past five years negatively; that is, the willingness to give birth decreases as age increases. The coefficients of Education show that women with a Higher education level have more births than those with Primary or Secondary education levels. For the religious factor, there is no significant difference among women who are Muslim, Hindu, or Christian. Finally, we can see that the number of births varies widely depending on the Division where the women live. More specifically, women who live in Chittagong, Mymensingh and Sylhet tend to have more children, while those who live in Dhaka, Khulna, Rajshahi and Rangpur have fewer births.
Table 10 shows that there are only minor differences in the coefficients between the $\mathrm{GP^{(I)}}$ and CMP regression models. The coefficients of Education show that Primary and Secondary fail to reject the null hypotheses $H_0: \beta_2 = 0$ and $H_0: \beta_3 = 0$, respectively, in the CMP regression model at the 5% significance level because their corresponding p-values are both larger than 0.05. However, the two explanatory factors are significant in the $\mathrm{GP^{(I)}}$ regression model under the same conditions. It deserves to be noted that the $\mathrm{GP^{(I)}}$ regression links the mean with the covariate vector directly, so the model has a direct statistical interpretation. However, the CMP regression lacks such an interpretation because the regression model only constructs a connection between the parameter $\lambda$ and the covariates.
Furthermore, to better understand the advantages of the proposed MM algorithm (14) and (16), we apply the existing 'vglm' function in the VGAM R package to calculate the MLEs $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ of the $\mathrm{GP^{(I)}}$ regression model in (3). We choose the two families 'genpoisson0' and 'genpoisson' in 'vglm' to calculate the MLEs of the parameters, in which the 'genpoisson0' family restricts $\alpha \geq 1$ while the 'genpoisson' family allows $\alpha > \max(1/2,\, 1 - \mu/m)$. The criteria for the goodness-of-fit, such as the AIC, BIC and the Pearson chi-squared statistic, can be calculated from the obtained MLEs, the number of parameters, the sample size, and the log-likelihoods. The results are presented in Table 11.
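For instance, the competing fits reported in Table 11 can be reproduced along the following lines (a sketch assuming a data frame `dhs` holding the DHS variables; recent VGAM versions provide the 'genpoisson0' family, and COMPoissonReg provides 'glm.cmp'):

```r
library(VGAM)           # vglm() with the genpoisson0 / genpoisson families
library(COMPoissonReg)  # glm.cmp() for the CMP regression model

fit_gp0 <- vglm(Births ~ Age + Education + Religion + Division,
                family = genpoisson0, data = dhs)
fit_cmp <- glm.cmp(Births ~ Age + Education + Religion + Division, data = dhs)
coef(fit_gp0)  # log-lambda based GP fit (alpha restricted as described above)
coef(fit_cmp)
```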
Table 11 shows that the $\mathrm{GP^{(I)}}$ regression model estimated by the proposed MM algorithm and the CMP regression model share similar goodness-of-fit statistics (AIC, BIC and the Pearson chi-squared statistic), implying that both models fit the dataset well. However, our MM algorithm converges to $\{\hat{\boldsymbol{\beta}}, \hat{\alpha}\}$ nearly five times faster than the 'glm.cmp' function for the CMP regression model. We can also see that the log-likelihood, AIC, BIC and $\chi^2_{n-q-1}$ obtained through the 'genpoisson0' and 'genpoisson' families are much worse than those of the $\mathrm{GP^{(I)}}$ and CMP regression models, even though they require relatively less computation time.
To test the dispersion, we use the likelihood ratio, Wald and score statistics, which were shown to be efficient for large sample sizes in Cases (A3)–(B3) in Section 4.2. The results in Table 12 show that the p-values of the three tests are all less than 0.0001, implying that the null hypothesis $H_0: \alpha = 1$ should be rejected.

6. Discussion

In the present paper, given $\{\boldsymbol{\beta}^{(t)}, \alpha\}$, to avoid directly calculating $\boldsymbol{\beta}^{(t+1)}$ by maximizing the original log-likelihood function $\ell(\boldsymbol{\beta}, \alpha)$, we successfully constructed a surrogate function $Q_1(\boldsymbol{\beta} \mid \boldsymbol{\beta}^{(t)}, \alpha)$, which is equivalent to the log-likelihood function of a weighted Poisson regression, so that we can compute $\boldsymbol{\beta}^{*(t+1)}$ directly. By projecting $\boldsymbol{\beta}^{*(t+1)}$ onto the convex set $\mathcal{C}^{(t)}$, we calculated $\boldsymbol{\beta}^{(t+1)}$ as shown in (14). In addition, given $\{\boldsymbol{\beta}, \alpha^{(t)}\}$, we obtained an explicit expression for $\alpha^{(t+1)}$ by maximizing a surrogate function $Q_2(\alpha \mid \boldsymbol{\beta}, \alpha^{(t)})$. The simulation and real data analysis results showed that the proposed MM algorithms can stably obtain the MLEs of the parameters of the $\mathrm{GP^{(I)}}$ distribution without/with covariates for various parameter configurations, while the built-in 'genpoisson1' function in the VGAM R package may converge to wrong parameter estimates. Moreover, the comparison between the proposed model and the existing CMP regression model showed that the two models possess similar goodness-of-fit performance, while the proposed model outperforms the CMP regression model in computational efficiency and statistical interpretability.

Author Contributions

Conceptualization, X.-J.L. and G.-L.T.; Methodology, X.-J.L. and G.-L.T.; Formal analysis, X.-J.L.; Investigation, M.Z., G.T.S.H. and S.L.; Resources, M.Z., G.T.S.H. and S.L.; Data curation, M.Z., G.T.S.H. and S.L.; Writing—original draft, X.-J.L.; Writing—review & editing, X.-J.L. and G.-L.T.; Supervision, G.-L.T.; Funding acquisition, G.-L.T. and G.T.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China: 12171225; Research Grants Council of Hong Kong: UGC/FDS14/P05/20; Big Data Intelligence Centre in The Hang Seng University of Hong Kong.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://www.dhsprogram.com/data].

Acknowledgments

Guo-Liang TIAN’s research was partially supported by National Natural Science Foundation of China (No. 12171225). G.T.S Ho would like to thank the Research Grants Council of Hong Kong for supporting this research under the Grant UGC/FDS14/P05/20. Furthermore, this research is also supported partially by the Big Data Intelligence Centre in The Hang Seng University of Hong Kong.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Discrete Version of Jensen’s Inequality

Let $f(z)$ be a concave function defined on a convex set $\mathcal{C}$; i.e., $f''(z) \leq 0$ for all $z \in \mathcal{C}$. The discrete version of Jensen's inequality is
$$
f\left( \sum_{k=1}^{K} q_k z_k \right) \geq \sum_{k=1}^{K} q_k f(z_k), \tag{A1}
$$
which is true for any probability weights $\{q_k\}_{k=1}^{K}$ satisfying $q_k > 0$ and $\sum_{k=1}^{K} q_k = 1$. In particular, in (A1) set $K = 2$ and $f(\cdot) = \log(\cdot)$, and suppose that $u_1(\boldsymbol{\phi}) > 0$ and $u_2(\boldsymbol{\phi}) > 0$; then we obtain
$$
\log[u_1(\boldsymbol{\phi}) + u_2(\boldsymbol{\phi})] \geq v(\boldsymbol{\phi}^{(t)}) \log\frac{u_1(\boldsymbol{\phi})}{v(\boldsymbol{\phi}^{(t)})} + [1 - v(\boldsymbol{\phi}^{(t)})] \log\frac{u_2(\boldsymbol{\phi})}{1 - v(\boldsymbol{\phi}^{(t)})} = v(\boldsymbol{\phi}^{(t)}) \log u_1(\boldsymbol{\phi}) + [1 - v(\boldsymbol{\phi}^{(t)})] \log u_2(\boldsymbol{\phi}) + c_0^{(t)}, \tag{A2}
$$
where the equality holds iff $\boldsymbol{\phi} = \boldsymbol{\phi}^{(t)}$, $c_0^{(t)}$ is a constant free from $\boldsymbol{\phi}$, and
$$
v(\boldsymbol{\phi}^{(t)}) \equiv \frac{u_1(\boldsymbol{\phi}^{(t)})}{u_1(\boldsymbol{\phi}^{(t)}) + u_2(\boldsymbol{\phi}^{(t)})}.
$$
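A quick numerical check of (A2) in R (our own toy example with $u_1(\phi) = \phi$, $u_2(\phi) = \phi^2$ and expansion point $\phi^{(t)} = 1.5$):

```r
# The surrogate g minorizes f = log(u1 + u2) and touches it at phi_t.
phi_t <- 1.5
v  <- phi_t / (phi_t + phi_t^2)                   # v(phi^(t))
f  <- function(phi) log(phi + phi^2)
c0 <- f(phi_t) - v * log(phi_t) - (1 - v) * log(phi_t^2)
g  <- function(phi) v * log(phi) + (1 - v) * log(phi^2) + c0
phi <- seq(0.2, 5, by = 0.1)
all(f(phi) >= g(phi) - 1e-12)                     # TRUE, with equality at phi_t
```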

Appendix B. The Gradient Vector and Fisher Information Matrix

The gradient vector of $\ell(\boldsymbol{\beta}, \alpha)$ with respect to $\boldsymbol{\beta}$ and $\alpha$ is given by
$$
\frac{\partial \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \left[ 1 + \frac{\mu_i (y_i - 1)}{\mu_i + (\alpha - 1) y_i} - \frac{\mu_i}{\alpha} \right] \mathbf{w}_i, \quad
\frac{\partial \ell(\boldsymbol{\theta})}{\partial \alpha} = \sum_{i=1}^{n} \left[ \frac{(y_i - 1)\, y_i}{\mu_i + (\alpha - 1) y_i} - \frac{y_i}{\alpha} + \frac{\mu_i - y_i}{\alpha^2} \right].
$$
The Hessian matrix is
$$
\mathbf{H}(\boldsymbol{\theta}) = \begin{pmatrix} \dfrac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} & \dfrac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \alpha} \\[8pt] * & \dfrac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \alpha^2} \end{pmatrix},
$$
where
$$
\begin{aligned}
\frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} &= \sum_{i=1}^{n} \left\{ \frac{(\alpha - 1)\, \mu_i (y_i - 1)\, y_i}{[\mu_i + (\alpha - 1) y_i]^2} - \frac{\mu_i}{\alpha} \right\} \mathbf{w}_i \mathbf{w}_i^{\top}, \\
\frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \alpha^2} &= \sum_{i=1}^{n} \left\{ -\frac{(y_i - 1)\, y_i^2}{[\mu_i + (\alpha - 1) y_i]^2} + \frac{y_i}{\alpha^2} - \frac{2(\mu_i - y_i)}{\alpha^3} \right\}, \\
\frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \alpha} &= \sum_{i=1}^{n} \left\{ -\frac{(y_i - 1)\, y_i \mu_i}{[\mu_i + (\alpha - 1) y_i]^2} + \frac{\mu_i}{\alpha^2} \right\} \mathbf{w}_i.
\end{aligned}
$$
The Fisher information matrix is given by
$$
\mathbf{I}(\boldsymbol{\theta}) = -\mathrm{E}[\mathbf{H}(\boldsymbol{\theta})],
$$
where
$$
\begin{aligned}
\mathrm{E}\left[ \frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \boldsymbol{\beta}^{\top}} \right] &= -\sum_{i=1}^{n} \frac{\mu_i [\mu_i + 2\alpha(\alpha - 1)]}{\alpha^2 [\mu_i + 2(\alpha - 1)]}\, \mathbf{w}_i \mathbf{w}_i^{\top}, \\
\mathrm{E}\left[ \frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \alpha^2} \right] &= -\sum_{i=1}^{n} \frac{2 \mu_i}{\alpha^2 [\mu_i + 2(\alpha - 1)]}, \\
\mathrm{E}\left[ \frac{\partial^2 \ell(\boldsymbol{\theta})}{\partial \boldsymbol{\beta}\, \partial \alpha} \right] &= \sum_{i=1}^{n} \frac{2(\alpha - 1)\, \mu_i}{\alpha^2 [\mu_i + 2(\alpha - 1)]}\, \mathbf{w}_i.
\end{aligned}
$$
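The Fisher information matrix used by the Wald and score tests can be assembled directly from these expectations; an R sketch of ours (the function name is hypothetical):

```r
# (q+1) x (q+1) Fisher information I(theta) = -E[H(theta)] for the GP(I)
# regression model; W is the n x q design matrix and mu_i = exp(w_i' beta).
fisher_gp1 <- function(beta, alpha, W) {
  mu  <- exp(drop(W %*% beta))
  den <- alpha^2 * (mu + 2 * (alpha - 1))
  Ibb <- t(W) %*% (W * (mu * (mu + 2 * alpha * (alpha - 1)) / den))
  Iba <- colSums(W * (-2 * (alpha - 1) * mu / den))
  Iaa <- sum(2 * mu / den)
  rbind(cbind(Ibb, Iba), c(Iba, Iaa))
}
```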

References

  1. Saha, K.K. Analysis of one-way layout of count data in the presence of over or under dispersion. J. Stat. Plan. Inference 2008, 138, 2067–2081. [Google Scholar] [CrossRef]
  2. Guikema, S.D.; Goffelt, J.P. A flexible count data regression model for risk analysis. Risk Anal. Int. J. 2008, 28, 213–223. [Google Scholar] [CrossRef]
  3. Sellers, K.F.; Borle, S.; Shmueli, G. The COM-Poisson model for count data: A survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 2012, 28, 104–116. [Google Scholar] [CrossRef]
  4. Lynch, H.J.; Thorson, J.T.; Shelton, A.O. Dealing with under- and over-dispersed count data in life history, spatial, and community ecology. Ecology 2014, 95, 3173–3180. [Google Scholar] [CrossRef]
  5. Consul, P.C.; Jain, G.C. A generalization of the Poisson distribution. Technometrics 1973, 15, 791–799. [Google Scholar] [CrossRef]
  6. Consul, P.C.; Famoye, F. The truncated generalized Poisson distribution and its estimation. Commun. Stat.–Theory Methods 1989, 18, 3635–3648. [Google Scholar] [CrossRef]
  7. Consul, P.C.; Famoye, F. Generalized Poisson regression model. Commun. Stat.-Theory Methods 1992, 21, 89–109. [Google Scholar] [CrossRef]
  8. Angers, J.F.; Biswas, A. A Bayesian analysis of zero-inflated generalized Poisson model. Comput. Stat. Data Anal. 2003, 42, 37–46. [Google Scholar] [CrossRef]
  9. Joe, H.; Zhu, R. Generalized Poisson distribution: The property of mixture of Poisson and comparison with negative binomial distribution. Biom. J. 2005, 47, 219–229. [Google Scholar] [CrossRef] [PubMed]
  10. Yang, Z.; Hardin, J.W.; Addy, C.L.; Vuong, Q.H. Testing approaches for over-dispersion in Poisson regression versus the generalized Poisson model. Biom. J. 2007, 49, 565–584. [Google Scholar] [CrossRef] [PubMed]
  11. Yang, Z.; Hardin, J.W.; Addy, C.L. A score test for over-dispersion in Poisson regression based on the generalized Poisson-2 model. J. Stat. Plan. Inference 2009, 139, 1514–1521. [Google Scholar] [CrossRef]
  12. Sellers, K.F.; Morris, D.S. Underdispersion models: Models that are “under the radar”. Commun. Stat.–Theory Methods 2017, 46, 12075–12086. [Google Scholar] [CrossRef]
  13. Toledo, D.; Umetsu, C.A.; Camargo, A.F.M.; de Lara, I.A.R. Flexible models for non-equidispersed count data: Comparative performance of parametric models to deal with under-dispersion. AStA Adv. Stat. Anal. 2022, 106, 473–497. [Google Scholar] [CrossRef]
  14. Consul, P.C.; Shoukri, M.M. The generalized Poisson distribution when the sample mean is larger than the sample variance. Commun. Stat.–Theory Methods 1985, 14, 667–681. [Google Scholar] [CrossRef]
  15. Seber, G.A.F.; Salehi, M.M. Adaptive Sampling Designs: Inference for Sparse and Clustered Populations, Chapter 5: Inverse sampling methods; Springer: New York, NY, USA, 2012. [Google Scholar]
  16. Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2005, 54, 127–142. [Google Scholar] [CrossRef]
  17. Sellers, K.F.; Shmueli, G. A flexible regression model for count data. Ann. Appl. Stat. 2010, 4, 943–961. [Google Scholar] [CrossRef] [Green Version]
  18. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
Figure 1. The empirical levels of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \beta_1 = 0$ in Case (A2) for different $\alpha$'s. (a) The empirical level with $H_0: \boldsymbol{\beta} = (1, 0)^{\top}$ for $\alpha = 0.75$; (b) the empirical level with $H_0: \boldsymbol{\beta} = (1, 0)^{\top}$ for $\alpha = 0.85$; (c) the empirical level with $H_0: \boldsymbol{\beta} = (1, 0)^{\top}$ for $\alpha = 0.95$.
Figure 2. The empirical powers of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \beta_1 = 0$ in Case (A2) for different $\alpha$'s. (a) The empirical power with $H_1: \boldsymbol{\beta} = (1, 0.5)^{\top}$ for $\alpha = 0.75$; (b) the empirical power with $H_1: \boldsymbol{\beta} = (1, 0.5)^{\top}$ for $\alpha = 0.85$; (c) the empirical power with $H_1: \boldsymbol{\beta} = (1, 0.5)^{\top}$ for $\alpha = 0.95$.
Figure 3. The empirical levels of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ in Case (B2) for different $\alpha$'s. (a) The empirical level with $H_0: \boldsymbol{\beta} = (1, 0, 0, 0)^{\top}$ for $\alpha = 0.75$; (b) the empirical level with $H_0: \boldsymbol{\beta} = (1, 0, 0, 0)^{\top}$ for $\alpha = 0.85$; (c) the empirical level with $H_0: \boldsymbol{\beta} = (1, 0, 0, 0)^{\top}$ for $\alpha = 0.95$.
Figure 4. The empirical powers of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ in Case (B2) for different $\alpha$'s. (a) The empirical power with $H_1: \boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 0.75$; (b) the empirical power with $H_1: \boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 0.85$; (c) the empirical power with $H_1: \boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 0.95$.
Figure 5. The empirical powers/level of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \alpha = 1$ in Case (A3) for different $\alpha$'s. (a) The empirical power with $\boldsymbol{\beta} = (1, 1)^{\top}$ for $\alpha = 0.9$; (b) the empirical power with $\boldsymbol{\beta} = (1, 1)^{\top}$ for $\alpha = 0.95$; (c) the empirical significance level with $\boldsymbol{\beta} = (1, 1)^{\top}$ for $\alpha = 1$.
Figure 6. The empirical powers/level of the three test statistics $(T_L, T_W, T_S)$ for testing $H_0: \alpha = 1$ in Case (B3) for different $\alpha$'s. (a) The empirical power with $\boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 0.9$; (b) the empirical power with $\boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 0.95$; (c) the empirical significance level with $\boldsymbol{\beta} = (1, 1, 1, 0.5)^{\top}$ for $\alpha = 1$.
Table 1. Parameter estimates based on 10,000 replications for Case (A1).

| $n$ | Para. | Bias ($\alpha=0.6$) | MSE ($\alpha=0.6$) | Bias ($\alpha=0.8$) | MSE ($\alpha=0.8$) | Bias ($\alpha=0.95$) | MSE ($\alpha=0.95$) |
|---|---|---|---|---|---|---|---|
| 100 | $\beta_0$ | −0.0017 | 0.0371 | −0.0034 | 0.0494 | −0.0044 | 0.0582 |
| | $\beta_1$ | −0.0011 | 0.0703 | −0.0001 | 0.0966 | −0.0003 | 0.1186 |
| | $\alpha$ | −0.0085 | 0.0394 | −0.0100 | 0.0535 | −0.0187 | 0.0556 |
| 200 | $\beta_0$ | −0.0007 | 0.0260 | −0.0007 | 0.0350 | −0.0020 | 0.0415 |
| | $\beta_1$ | 0.0003 | 0.0543 | −0.0001 | 0.0764 | −0.0001 | 0.0916 |
| | $\alpha$ | −0.0037 | 0.0280 | −0.0051 | 0.0373 | −0.0084 | 0.0417 |
| 400 | $\beta_0$ | −0.0004 | 0.0183 | −0.0003 | 0.0242 | −0.0009 | 0.0292 |
| | $\beta_1$ | 0.0004 | 0.0357 | 0.0000 | 0.0484 | −0.0006 | 0.0589 |
| | $\alpha$ | −0.0014 | 0.0202 | −0.0028 | 0.0267 | −0.0042 | 0.0314 |
Table 2. Parameter estimates based on 10,000 replications for Case (B1).

| $n$ | Para. | Bias ($\alpha=0.6$) | MSE ($\alpha=0.6$) | Bias ($\alpha=0.8$) | MSE ($\alpha=0.8$) | Bias ($\alpha=0.95$) | MSE ($\alpha=0.95$) |
|---|---|---|---|---|---|---|---|
| 100 | $\beta_0$ | 0.0011 | 0.0766 | −0.0043 | 0.1050 | −0.0075 | 0.1248 |
| | $\beta_1$ | 0.0034 | 0.0612 | 0.0001 | 0.0815 | −0.0010 | 0.0990 |
| | $\beta_2$ | −0.0063 | 0.1172 | 0.0026 | 0.1593 | 0.0057 | 0.1917 |
| | $\beta_3$ | 0.0016 | 0.0622 | 0.0007 | 0.0848 | 0.0003 | 0.1021 |
| | $\alpha$ | −0.0122 | 0.0421 | −0.0189 | 0.0542 | −0.0264 | 0.0580 |
| 200 | $\beta_0$ | −0.0001 | 0.0547 | −0.0018 | 0.0743 | −0.0018 | 0.0897 |
| | $\beta_1$ | 0.0004 | 0.0474 | 0.0002 | 0.0655 | 0.0000 | 0.0781 |
| | $\beta_2$ | −0.0018 | 0.0767 | 0.0014 | 0.1039 | 0.0001 | 0.1256 |
| | $\beta_3$ | 0.0003 | 0.0436 | −0.0019 | 0.0606 | −0.0005 | 0.0722 |
| | $\alpha$ | −0.0060 | 0.0291 | −0.0100 | 0.0377 | −0.0132 | 0.0429 |
| 400 | $\beta_0$ | −0.0001 | 0.0363 | −0.0006 | 0.0502 | −0.0016 | 0.0610 |
| | $\beta_1$ | 0.0015 | 0.0297 | 0.0004 | 0.0410 | 0.0001 | 0.0487 |
| | $\beta_2$ | −0.0004 | 0.0511 | 0.0000 | 0.0707 | 0.0004 | 0.0861 |
| | $\beta_3$ | 0.0002 | 0.0297 | 0.0004 | 0.0417 | −0.0001 | 0.0498 |
| | $\alpha$ | −0.0029 | 0.0208 | −0.0048 | 0.0269 | −0.0063 | 0.0313 |
Table 3. The empirical levels of the statistics $(T_L, T_W, T_S)$ for Case (A2). The three column groups correspond to $\alpha = 0.75$, $0.85$, $0.95$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.0530 | 0.0599 | 0.0485 | 0.0521 | 0.0591 | 0.0486 | 0.0530 | 0.0602 | 0.0482 |
| 100 | 0.0512 | 0.0548 | 0.0487 | 0.0502 | 0.0531 | 0.0485 | 0.0490 | 0.0522 | 0.0468 |
| 150 | 0.0490 | 0.0508 | 0.0476 | 0.0525 | 0.0538 | 0.0510 | 0.0522 | 0.0544 | 0.0505 |
| 200 | 0.0514 | 0.0526 | 0.0505 | 0.0545 | 0.0561 | 0.0538 | 0.0515 | 0.0525 | 0.0505 |
| 250 | 0.0495 | 0.0504 | 0.0488 | 0.0494 | 0.0501 | 0.0481 | 0.0528 | 0.0538 | 0.0517 |
| 300 | 0.0504 | 0.0515 | 0.0502 | 0.0451 | 0.0468 | 0.0451 | 0.0499 | 0.0509 | 0.0485 |
| 350 | 0.0545 | 0.0558 | 0.0540 | 0.0510 | 0.0514 | 0.0504 | 0.0503 | 0.0507 | 0.0500 |
| 400 | 0.0542 | 0.0547 | 0.0532 | 0.0480 | 0.0491 | 0.0477 | 0.0504 | 0.0517 | 0.0498 |
Table 4. The empirical powers of the statistics $(T_L, T_W, T_S)$ for Case (A2). The three column groups correspond to $\alpha = 0.75$, $0.85$, $0.95$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.2042 | 0.2252 | 0.1902 | 0.1580 | 0.1768 | 0.1489 | 0.1364 | 0.1505 | 0.1273 |
| 100 | 0.5457 | 0.5607 | 0.5333 | 0.4236 | 0.4392 | 0.4160 | 0.3387 | 0.3526 | 0.3316 |
| 150 | 0.6750 | 0.6844 | 0.6669 | 0.5411 | 0.5511 | 0.5340 | 0.4437 | 0.4549 | 0.4368 |
| 200 | 0.7733 | 0.7805 | 0.7702 | 0.6413 | 0.6519 | 0.6376 | 0.5366 | 0.5438 | 0.5326 |
| 250 | 0.8995 | 0.9030 | 0.8976 | 0.7986 | 0.8022 | 0.7973 | 0.6789 | 0.6853 | 0.6762 |
| 300 | 0.9439 | 0.9459 | 0.9420 | 0.8549 | 0.8574 | 0.8523 | 0.7591 | 0.7633 | 0.7564 |
| 350 | 0.9712 | 0.9719 | 0.9708 | 0.9129 | 0.9147 | 0.9119 | 0.8349 | 0.8371 | 0.8329 |
| 400 | 0.9903 | 0.9907 | 0.9903 | 0.9557 | 0.9571 | 0.9545 | 0.8975 | 0.8987 | 0.8962 |
Table 5. The empirical levels of the statistics $(T_L, T_W, T_S)$ for Case (B2). The three column groups correspond to $\alpha = 0.75$, $0.85$, $0.95$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.0588 | 0.0794 | 0.0439 | 0.0594 | 0.0761 | 0.0449 | 0.0539 | 0.0669 | 0.0405 |
| 100 | 0.0546 | 0.0648 | 0.0475 | 0.0497 | 0.0568 | 0.0435 | 0.0524 | 0.0579 | 0.0459 |
| 150 | 0.0523 | 0.0588 | 0.0486 | 0.0505 | 0.0563 | 0.0470 | 0.0492 | 0.0522 | 0.0449 |
| 200 | 0.0470 | 0.0507 | 0.0421 | 0.0541 | 0.0564 | 0.0506 | 0.0535 | 0.0573 | 0.0501 |
| 250 | 0.0510 | 0.0545 | 0.0485 | 0.0528 | 0.0547 | 0.0502 | 0.0531 | 0.0556 | 0.0506 |
| 300 | 0.0525 | 0.0550 | 0.0506 | 0.0501 | 0.0526 | 0.0486 | 0.0498 | 0.0515 | 0.0479 |
| 350 | 0.0490 | 0.0522 | 0.0476 | 0.0518 | 0.0535 | 0.0502 | 0.0527 | 0.0544 | 0.0510 |
| 400 | 0.0510 | 0.0527 | 0.0487 | 0.0560 | 0.0571 | 0.0540 | 0.0458 | 0.0474 | 0.0453 |
Table 6. The empirical powers of the statistics $(T_L, T_W, T_S)$ for Case (B2). The three column groups correspond to $\alpha = 0.75$, $0.85$, $0.95$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.1629 | 0.2173 | 0.1193 | 0.1310 | 0.1700 | 0.0938 | 0.1124 | 0.1442 | 0.0850 |
| 100 | 0.3998 | 0.4409 | 0.3523 | 0.2918 | 0.3232 | 0.2576 | 0.2289 | 0.2502 | 0.2027 |
| 150 | 0.5320 | 0.5598 | 0.5042 | 0.3957 | 0.4186 | 0.3709 | 0.3089 | 0.3280 | 0.2912 |
| 200 | 0.7670 | 0.7824 | 0.7409 | 0.6148 | 0.6298 | 0.5879 | 0.4893 | 0.4988 | 0.4669 |
| 250 | 0.8241 | 0.8379 | 0.8096 | 0.6747 | 0.6867 | 0.6605 | 0.5487 | 0.5588 | 0.5337 |
| 300 | 0.8814 | 0.8887 | 0.8759 | 0.7551 | 0.7654 | 0.7442 | 0.6292 | 0.6381 | 0.6193 |
| 350 | 0.9657 | 0.9675 | 0.9621 | 0.8895 | 0.8934 | 0.8823 | 0.7897 | 0.7942 | 0.7792 |
| 400 | 0.9786 | 0.9800 | 0.9775 | 0.9215 | 0.9249 | 0.9199 | 0.8367 | 0.8409 | 0.8310 |
Table 7. The empirical levels/powers of the statistics $(T_L, T_W, T_S)$ for Case (A3). The three column groups correspond to $\alpha = 0.9$, $0.95$, $1$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.1652 | 0.2764 | 0.0737 | 0.0881 | 0.1542 | 0.0445 | 0.0603 | 0.0884 | 0.0494 |
| 100 | 0.2553 | 0.3641 | 0.1616 | 0.0999 | 0.1548 | 0.0608 | 0.0555 | 0.0672 | 0.0479 |
| 150 | 0.3562 | 0.4563 | 0.2642 | 0.1203 | 0.1725 | 0.0834 | 0.0525 | 0.0614 | 0.0475 |
| 200 | 0.4738 | 0.5604 | 0.3814 | 0.1426 | 0.1929 | 0.0993 | 0.0533 | 0.0601 | 0.0498 |
| 250 | 0.5652 | 0.6446 | 0.4870 | 0.1689 | 0.2152 | 0.1286 | 0.0523 | 0.0587 | 0.0505 |
| 300 | 0.6560 | 0.7207 | 0.5876 | 0.1901 | 0.2423 | 0.1500 | 0.0516 | 0.0543 | 0.0500 |
| 350 | 0.7272 | 0.7779 | 0.6710 | 0.1996 | 0.2503 | 0.1623 | 0.0546 | 0.0583 | 0.0520 |
| 400 | 0.7891 | 0.8332 | 0.7418 | 0.2286 | 0.2765 | 0.1886 | 0.0516 | 0.0573 | 0.0496 |
Table 8. The empirical levels/powers of the statistics $(T_L, T_W, T_S)$ for Case (B3). The three column groups correspond to $\alpha = 0.9$, $0.95$, $1$, from left to right.

| $n$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ | $T_L$ | $T_W$ | $T_S$ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.2247 | 0.3832 | 0.0878 | 0.1184 | 0.2194 | 0.0486 | 0.0671 | 0.1288 | 0.0417 |
| 100 | 0.3124 | 0.4417 | 0.1916 | 0.1238 | 0.2025 | 0.0731 | 0.0581 | 0.0846 | 0.0464 |
| 150 | 0.4119 | 0.5349 | 0.2957 | 0.1479 | 0.2185 | 0.0968 | 0.0553 | 0.0716 | 0.0487 |
| 200 | 0.5155 | 0.6211 | 0.4160 | 0.1672 | 0.2349 | 0.1165 | 0.0543 | 0.0679 | 0.0485 |
| 250 | 0.6117 | 0.6991 | 0.5210 | 0.1857 | 0.2510 | 0.1343 | 0.0541 | 0.0648 | 0.0492 |
| 300 | 0.6962 | 0.7740 | 0.6167 | 0.2040 | 0.2678 | 0.1595 | 0.0527 | 0.0602 | 0.0488 |
| 350 | 0.7668 | 0.8228 | 0.7022 | 0.2269 | 0.2918 | 0.1772 | 0.0532 | 0.0602 | 0.0502 |
| 400 | 0.8152 | 0.8606 | 0.7614 | 0.2558 | 0.3197 | 0.2076 | 0.0464 | 0.0545 | 0.0448 |
Table 9. Model comparisons based on 1000 replications for Cases (A4) & (B4).

| Case | Model | Log-Likelihood | AIC | BIC | $\chi^2_{n-q-1}$ | Sys. Time |
|---|---|---|---|---|---|---|
| (A4) | $\mathrm{GP^{(I)}}$ | −2009.89 | 4029.77 | 4054.31 | 997.07 | 0.7783 s |
| | CMP | −2011.14 | 4032.28 | 4056.81 | 999.75 | 1.2971 s |
| (B4) | $\mathrm{GP^{(I)}}$ | −2157.29 | 4324.58 | 4349.12 | 999.96 | 1.0038 s |
| | CMP | −2160.11 | 4330.21 | 4354.75 | 990.26 | 1.9550 s |

Sys. Time represents the averaged system time cost at convergence of the algorithm for each repetition.
Table 10. MLEs and CIs of parameters for the $\mathrm{GP^{(I)}}$ regression model in (21) and the CMP regression model.

| Parameter | MLE ($\mathrm{GP^{(I)}}$) | Std | $z$-Value | $p$-Value | MLE (CMP) | Std | $z$-Value | $p$-Value |
|---|---|---|---|---|---|---|---|---|
| Intercept | 4.455 | 0.3685 | 12.09 | <0.0001 | 4.843 | 0.4700 | 10.31 | <0.0001 |
| $\alpha$ | 0.913 | 0.0052 | −16.71 | <0.0001 | | | | |
| $\nu$ | | | | | 1.828 | 0.0666 | 27.44 | <0.0001 |
| Age | −0.147 | 0.0097 | −15.05 | <0.0001 | −0.157 | 0.0121 | −12.98 | <0.0001 |
| Education | | | | | | | | |
| Primary | −0.083 | 0.0401 | −2.06 | 0.0391 | −0.032 | 0.0508 | −0.63 | 0.5271 |
| Secondary | −0.085 | 0.0405 | −2.10 | 0.0361 | 0.002 | 0.0512 | 0.04 | 0.9685 |
| Higher | 0.223 | 0.0534 | 4.18 | <0.0001 | 0.403 | 0.0671 | 6.01 | <0.0001 |
| No education | 0.000 | | | | 0.000 | | | |
| Religion | | | | | | | | |
| Islam | −0.342 | 0.1797 | −1.90 | 0.0573 | −0.247 | 0.2387 | −1.04 | 0.3002 |
| Hinduism | −0.638 | 0.1868 | −3.42 | 0.0006 | −0.684 | 0.2478 | −2.76 | 0.0057 |
| Christianity | 0.000 | | | | 0.000 | | | |
| Division | | | | | | | | |
| Chittagong | 0.064 | 0.0551 | 1.16 | 0.2461 | 0.103 | 0.0685 | 1.50 | 0.1331 |
| Dhaka | −0.060 | 0.0572 | −1.05 | 0.2946 | −0.067 | 0.0711 | −0.94 | 0.3492 |
| Khulna | −0.320 | 0.0640 | −5.00 | <0.0001 | −0.351 | 0.0794 | −4.42 | <0.0001 |
| Mymensingh | 0.052 | 0.0584 | 0.89 | 0.3759 | 0.072 | 0.0727 | 0.99 | 0.3210 |
| Rajshahi | −0.319 | 0.0630 | −5.06 | <0.0001 | −0.359 | 0.0781 | −4.59 | <0.0001 |
| Rangpur | −0.093 | 0.0597 | −1.56 | 0.1198 | −0.116 | 0.0745 | −1.55 | 0.1208 |
| Sylhet | 0.433 | 0.0540 | 8.03 | <0.0001 | 0.576 | 0.0682 | 8.45 | <0.0001 |
| Barisal | 0.000 | | | | 0.000 | | | |
Table 11. Comparisons of goodness-of-fit among the $\mathrm{GP^{(I)}}$ regression model, the CMP regression model, the log-lambda based GP regression model with constraint $\lambda \geq 0$ and the log-lambda based GP regression model without constraint on $\lambda$.

| Model | Log-Likelihood | AIC | BIC | $\chi^2_{n-q-1}$ | Sys. Time |
|---|---|---|---|---|---|
| $\mathrm{GP^{(I)}}$ | −7624.63 | 15,279.26 | 15,385.94 | 8974.17 | 11.0384 s |
| CMP | −7623.66 | 15,277.32 | 15,384.01 | 9017.33 | 52.4071 s |
| genpoisson0 | −7645.95 | 15,321.90 | 15,428.58 | 7675.93 | 3.0545 s |
| genpoisson | −7706.93 | 15,443.87 | 15,550.55 | 7526.34 | 3.7786 s |

genpoisson0 means using the family 'genpoisson0' in 'vglm'; genpoisson means using the family 'genpoisson' in 'vglm'; Sys. Time represents the system time cost at convergence of the algorithm.
Table 12. Dispersion test for testing $H_0: \alpha = 1$.

| Test | Value | $p$-Value |
|---|---|---|
| Likelihood ratio | 164.61 | <0.0001 |
| Wald | 279.27 | <0.0001 |
| Score | 128.22 | <0.0001 |