A Family of Finite Mixture Distributions for Modelling Dispersion in Count Data
Abstract
1. Introduction
2. Finite Mixture of Distributions
2.1. Finite Mixture of Distribution and Its Shifted Distribution
2.2. Finite Mixture of Negative Binomial Distributions
2.3. Weighted Negative Binomial Distribution
2.4. Conditions for Under-, Equi- and Over-Dispersion
2.5. Shapes of the Distribution
2.6. Log-Concavity, Strong Unimodality and Reliability Properties
3. Statistical Inferences
3.1. Parameter Estimation
3.2. Test for Equi-Dispersion
3.2.1. Rao’s Score Test [34]
3.2.2. Generalized Likelihood Ratio Test
3.3. Statistical Power Analysis of the Rao’s Score and Generalized Likelihood Ratio Tests
4. Modeling of Biological Count Data
5. Concluding Remarks
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Bardwell, G.E.; Crow, E.L. A two-parameter family of hyperPoisson distributions. J. Am. Stat. Assoc. 1964, 59, 133–141.
- Efron, B. Double exponential families and their use in generalized linear regression. J. Am. Stat. Assoc. 1986, 81, 709–721.
- Castillo, J.D.; Pérez-Casany, M. Weighted Poisson distributions for overdispersion and under-dispersion situations. Ann. Inst. Stat. Math. 1998, 50, 567–585.
- Consul, P.C. Generalized Poisson Distributions: Properties and Applications; Marcel Dekker Inc.: New York, NY, USA; Basel, Switzerland, 1989.
- Conway, R.W.; Maxwell, W.L. A queueing model with state dependent service rates. J. Ind. Eng. 1962, 12, 132–136.
- Sellers, K.F.; Borle, S.; Shmueli, G. The COM-Poisson model for count data: A survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 2012, 28, 104–116.
- Sellers, K.F.; Swift, A.W.; Weems, K.S. A flexible distribution class for count data. J. Stat. Distrib. Appl. 2017, 4, 1–21.
- Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway–Maxwell–Poisson distribution. Appl. Stat. 2005, 54, 127–142.
- Nelson, D.L. Some Remarks on Generalizations of the Negative Binomial and Poisson Distributions. Technometrics 1975, 17, 135–136.
- Scollnik, D.P.M. On the analysis of the truncated generalized Poisson distribution using a Bayesian method. ASTIN Bull. 1998, 28, 135–152.
- Zhu, F. Modeling time series of counts with COM-Poisson INGARCH models. Math. Comput. Model. 2012, 56, 191–203.
- Sellers, K.F.; Peng, S.J.; Arab, A. A flexible univariate autoregressive time-series model for dispersed count data. J. Time Ser. Anal. 2020, 41, 436–453.
- Huang, A. Mean-parametrized Conway-Maxwell-Poisson regression models for dispersed counts. Stat. Model. 2017, 17, 359–380.
- Sellers, K.F.; Morris, D.S. Underdispersion models: Models that are “under the radar”. Commun. Stat.-Theory Methods 2017, 46, 12075–12086.
- Huang, A. On arbitrarily underdispersed discrete distributions. Am. Stat. 2022, 77, 29–34.
- Sim, S.Z.; Ong, S.H. A generalized inverse trinomial distribution with application. Stat. Methodol. 2016, 33, 217–233.
- Aoyama, K.; Shimizu, K.; Ong, S.H. A first–passage time random walk distribution with five transition probabilities: A generalization of the shifted inverse trinomial. Ann. Inst. Stat. Math. 2008, 60, 1–20.
- Kemp, A.W. Convolutions involving binomial pseudo-variables. Sankhyā 1979, 41, 232–243.
- Borges, P.; Rodrigues, J.; Balakrishnan, N.; Bazan, J. A COM-Poisson type generalization of the binomial distribution and its properties and applications. Stat. Probab. Lett. 2014, 87, 158–166.
- Imoto, T. A generalized Conway-Maxwell-Poisson distribution which includes the negative binomial distribution. Appl. Math. Comput. 2014, 247, 824–834.
- Chakraborty, S.; Imoto, T. Extended Conway-Maxwell-Poisson distribution and its properties and applications. J. Stat. Distrib. Appl. 2016, 3, 1–19.
- Chakraborty, S.; Ong, S.H. A COM-Poisson-type generalization of the negative binomial distribution. Commun. Stat.-Theory Methods 2016, 45, 4117–4135.
- Imoto, T.; Ng, C.M.; Ong, S.H.; Chakraborty, S. A modified Conway-Maxwell-Poisson type binomial distribution and its applications. Commun. Stat.-Theory Methods 2017, 46, 12210–12225.
- Zhang, H.; Tan, K.; Li, B. COM-negative binomial distribution: Modeling overdispersion and ultrahigh zero-inflated count data. Front. Math. China 2018, 13, 967–998.
- Cahoy, D.; Di Nardo, E.; Polito, F. Flexible models for overdispersed and underdispersed count data. Stat. Pap. 2021, 62, 2969–2990.
- Ong, S.H. The computer generation of bivariate binomial variables with given marginals and correlation. Commun. Stat.-Simul. Comput. 1992, 21, 285–299.
- McLachlan, G.J.; Lee, S.X.; Rathnayake, S.I. Finite Mixture Models. Annu. Rev. Stat. Its Appl. 2019, 6, 355–378.
- Rao, C.R. On discrete distributions arising out of methods of ascertainment. Sankhyā Indian J. Stat. Ser. A 1965, 27, 311–324.
- Rao, C.R. Weighted Distributions Arising Out of Methods of Ascertainment: What Population Does a Sample Represent? In A Celebration of Statistics; Atkinson, A.C., Fienberg, S.E., Eds.; Springer: New York, NY, USA, 1985.
- Keilson, J.; Geber, H. Some Results for Discrete Unimodality. J. Am. Stat. Assoc. 1971, 66, 386–389.
- Gupta, P.L.; Gupta, R.C.; Ong, S.H.; Srivastava, H.M. A class of Hurwitz–Lerch Zeta distributions and their applications in reliability. Appl. Math. Comput. 2008, 196, 521–531.
- Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087–1092.
- Sim, S.Z.; Ong, S.H. Parameter estimation for discrete distributions by generalized Hellinger type divergence based on probability generating function. Commun. Stat.-Simul. Comput. 2010, 39, 305–314.
- Rao, C.R. Score Test: Historical Review and Recent Developments. In Advances in Ranking and Selection, Multiple Comparisons, and Reliability; Balakrishnan, N., Nagaraja, H.N., Kannan, N., Eds.; Statistics for Industry and Technology; Birkhäuser: Boston, MA, USA, 2005.
- Sasaki, M.S. Chromosomal biodosimetry by unfolding a mixed Poisson distribution: A generalized model. Int. J. Radiat. Biol. 2003, 79, 83–97.
- Leroux, B.G.; Puterman, M.L. Maximum penalized likelihood estimation for independent and Markov dependent mixture models. Biometrics 1992, 48, 545–558.
- Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Application; Cambridge University Press: Cambridge, UK, 1997.

 | Is Increased | | | and p Are Varied | | |
---|---|---|---|---|---|---|
 | Case (a) | Case (b) | Case (c) | Case (d) | Case (e) | Case (f) |
 | 1 | 10 | 50 | 10 | 10 | 10 |
 | 0.1 | 0.1 | 0.1 | 0.1 | 0.4 | 0.8 |
p | 0.1 | 0.1 | 0.1 | 0.8 | 0.4 | 0.1 |
ID | 1.01 | 1.09 | 1.11 | 0.73 | 1.61 | 4.99 |
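
The ID row above can be reproduced numerically under one assumed reading of the parameters. The sketch below assumes the count is a negative binomial variable with mean νθ/(1 − θ) and variance νθ/(1 − θ)², mixed with a copy of itself shifted by one, with weight p on the shifted component. The names `nu` and `theta` are hypothetical labels for the first two parameter rows; they are not taken from the table.

```python
def nb_shifted_nb_id(nu: float, theta: float, p: float) -> float:
    """Index of dispersion (variance / mean) of the assumed NB / shifted-NB mixture."""
    # Assumed NB parameterization: mean nu*theta/(1-theta), variance nu*theta/(1-theta)^2.
    nb_mean = nu * theta / (1.0 - theta)
    nb_var = nu * theta / (1.0 - theta) ** 2
    # Mixing X and X + 1 with weight p on X + 1 is the same as adding an
    # independent Bernoulli(p) variable to X, so means and variances add.
    mean = nb_mean + p
    var = nb_var + p * (1.0 - p)
    return var / mean


# (nu, theta, p) for each case, read column by column from the table.
cases = {
    "a": (1, 0.1, 0.1),
    "b": (10, 0.1, 0.1),
    "c": (50, 0.1, 0.1),
    "d": (10, 0.1, 0.8),
    "e": (10, 0.4, 0.4),
    "f": (10, 0.8, 0.1),
}

for label, (nu, theta, p) in cases.items():
    print(f"Case ({label}): ID = {nb_shifted_nb_id(nu, theta, p):.2f}")
```

Under this reading, increasing the first parameter pushes the ID toward the negative binomial value 1/(1 − θ), while a large p with a small second parameter (Case (d)) gives under-dispersion.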

N | Significance level | Method | Equi-Dispersion | Over-Dispersion | Over-Dispersion | Under-Dispersion | Under-Dispersion |
---|---|---|---|---|---|---|---|
 | | | 10 | 10 | 40 | 25 | 3 |
 | | | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
 | | | 0.35 | 0.1 | 0.1 | 0.8 | 0.8 |
Index of dispersion | | | 1 | 1.09 | 1.11 | 0.91 | 0.47 |
Effect size | | | 0 | 0.25 | 0.60 | 0.24 | 0.61 |
100 | 0.05 | score | 0.030 | 0.086 | 0.114 | 0.010 | 0.048 |
100 | 0.05 | GLR | 0.036 | 0.077 | 0.105 | 0.072 | 0.911 |
100 | 0.1 | score | 0.056 | 0.148 | 0.160 | 0.030 | 0.362 |
100 | 0.1 | GLR | 0.082 | 0.133 | 0.175 | 0.144 | 0.960 |
500 | 0.05 | score | 0.035 | 0.280 | 0.391 | 0.059 | 1.000 |
500 | 0.05 | GLR | 0.050 | 0.274 | 0.389 | 0.279 | 1.000 |
500 | 0.1 | score | 0.069 | 0.394 | 0.493 | 0.124 | 1.000 |
500 | 0.1 | GLR | 0.099 | 0.387 | 0.509 | 0.407 | 1.000 |
1000 | 0.05 | score | 0.032 | 0.494 | 0.637 | 0.152 | 1.000 |
1000 | 0.05 | GLR | 0.046 | 0.483 | 0.654 | 0.538 | 1.000 |
1000 | 0.1 | score | 0.080 | 0.626 | 0.742 | 0.286 | 1.000 |
1000 | 0.1 | GLR | 0.088 | 0.621 | 0.758 | 0.655 | 1.000 |
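
The parameter values in the equi-dispersion column can be checked with a short derivation. It rests on an assumption not stated in the table: the count is a negative binomial variable with mean μ and variance σ² as given below, plus an independent Bernoulli(p) shift. The symbols ν and θ are hypothetical labels for the first two parameter rows.

```latex
\begin{align*}
\mu &= \frac{\nu\theta}{1-\theta}, \qquad
\sigma^{2} = \frac{\nu\theta}{(1-\theta)^{2}}, \qquad
\mathrm{ID} = \frac{\sigma^{2} + p(1-p)}{\mu + p}, \\
\mathrm{ID} = 1
  &\iff \sigma^{2} - \mu = p^{2}
  \iff \frac{\nu\theta^{2}}{(1-\theta)^{2}} = p^{2}
  \iff p = \frac{\theta\sqrt{\nu}}{1-\theta}, \\
\nu = 10,\ \theta = 0.1
  &\implies p = \frac{0.1\sqrt{10}}{0.9} \approx 0.351 .
\end{align*}
```

Under this reading, p above the threshold gives under-dispersion and p below it gives over-dispersion, which agrees with the index-of-dispersion row for the other four columns.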

No. of Cells | Observed Frequency (Dose 10) | Expected: GPD (MLE) | Expected: GIT (MLE) | Expected: COM-Poisson (MLE) | Expected: NB-Shifted NB (MLE) | Expected: NB-Shifted NB (pgf-Estimator) |
---|---|---|---|---|---|---|
0 | 0 | 1.62 | 1.49 | 1.26 | 0.07 | 0.12 |
1 | 9 | 8.50 | 8.24 | 8.01 | 7.57 | 8.44 |
2 | 26 | 21.54 | 21.52 | 21.73 | 23.56 | 24.39 |
3 | 33 | 35.03 | 35.49 | 35.92 | 37.97 | 37.53 |
4 | 39 | 41.08 | 41.60 | 41.77 | 41.94 | 40.48 |
5 | 36 | 37.02 | 37.10 | 36.97 | 35.63 | 34.30 |
6 | 26 | 26.66 | 26.35 | 26.18 | 24.81 | 24.29 |
7 | 23 | 15.77 | 15.42 | 15.36 | 14.74 | 14.95 |
8 | 3 | 7.81 | 7.65 | 7.65 | 7.68 | 8.20 |
9 | 2 | 3.28 | 3.29 | 3.30 | 3.58 | 4.09 |
10 | 2 | 1.18 | 1.25 | 1.25 | 1.52 | 1.88 |
11 | 1 | 0.50 | 0.62 | 0.60 | 0.91 | 1.33 |
Total | 200 | | | | | |
χ² | | 10.67 | 10.60 | 10.54 | 9.85 | 9.86 |
p value | | 0.30 | 0.30 | 0.31 | 0.28 | 0.28 |
Parameter Estimates | | = 4.82 | = 0.26 | = 1.22 | = 0.08 | = 0.15 |
 | | = −0.09 | = 0.13 | = 6.34 | = 0.99 | = 0.99 |
 | | | n = 10 | | = 38.58 | = 19.75 |
ID | 0.85 | | | | | |
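
The ID entry of this table can be recomputed from the observed frequencies alone. The sketch below assumes ID is the sample variance (with divisor n − 1) divided by the sample mean; the function name `index_of_dispersion` is illustrative.

```python
def index_of_dispersion(observed):
    """Sample variance / sample mean for a frequency table.

    observed[x] is the number of observations equal to x.
    """
    n = sum(observed)
    mean = sum(x * f for x, f in enumerate(observed)) / n
    # Unbiased sample variance (divisor n - 1), an assumption about the convention used.
    var = sum(f * (x - mean) ** 2 for x, f in enumerate(observed)) / (n - 1)
    return var / mean


# Observed frequencies for counts 0..11 (Dose 10 column of the table above).
dose10 = [0, 9, 26, 33, 39, 36, 26, 23, 3, 2, 2, 1]

print(f"n = {sum(dose10)}, ID = {index_of_dispersion(dose10):.2f}")
```

A value below 1 indicates under-dispersion relative to the Poisson distribution, which is why models restricted to over-dispersion are not suitable for this data set.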

No. of Cells | Observed Frequency (Dose 6) | Expected: Poisson (MLE) | Expected: GPD (MLE) | Expected: GIT (MLE) | Expected: COM-Poisson (MLE) | Expected: NB-Shifted NB (MLE) | Expected: NB-Shifted NB (pgf-Estimator) |
---|---|---|---|---|---|---|---|
0 | 19 | 23.77 | 22.87 | 19.18 | 21.91 | 19.08 | 18.95 |
1 | 56 | 50.62 | 50.50 | 55.43 | 50.88 | 56.17 | 56.93 |
2 | 60 | 53.92 | 54.82 | 59.54 | 55.70 | 56.33 | 55.94 |
3 | 31 | 38.28 | 39.01 | 35.04 | 39.28 | 36.73 | 36.22 |
4 | 18 | 20.38 | 20.46 | 17.28 | 20.27 | 18.74 | 18.61 |
5 | 11 | 8.68 | 8.44 | 7.82 | 8.21 | 8.14 | 8.24 |
6 | 5 | 4.35 | 3.91 | 5.71 | 3.74 | 4.81 | 5.12 |
Total | 200 | | | | | | |
χ² | | 4.60 | 4.77 | 1.88 | 4.61 | 2.17 | 2.01 |
p value | | 0.47 | 0.31 | 0.76 | 0.33 | 0.54 | 0.57 |
Parameter Estimates | | = 2.13 | = 2.17 | = 0.34 | = 1.09 | = 0.19 | = 0.21 |
 | | | = −0.02 | = 0.35 | = 2.32 | = 0.63 | = 0.65 |
 | | | | n = 2 | | = 6.44 | = 5.48 |
ID | 0.97 | | | | | | |
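
The goodness-of-fit statistic reported for each model is consistent with the Pearson chi-square statistic computed over the listed cells without pooling. As a minimal sketch, the code below recomputes it for the Poisson column; the degrees-of-freedom rule (cells − 1 − estimated parameters) is the standard one and is an assumption about how the p values were obtained.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Dose 6 data: observed frequencies and Poisson (MLE) expected frequencies.
observed = [19, 56, 60, 31, 18, 11, 5]
poisson_expected = [23.77, 50.62, 53.92, 38.28, 20.38, 8.68, 4.35]

stat = pearson_chi_square(observed, poisson_expected)
# One estimated parameter for the Poisson model.
dof = len(observed) - 1 - 1

print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

The small difference from the tabulated 4.60 comes from the expected frequencies being rounded to two decimals.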

Number of Movements | Observed Frequency (Number of Intervals) | Expected: GPD (MLE) | Expected: GIT (MLE) | Expected: COM-Poisson (MLE) | Expected: NB-Shifted NB (MLE) | Expected: NB-Shifted NB (pgf-Estimator) |
---|---|---|---|---|---|---|
0 | 182 | 182.50 | 176.61 | 176.67 | 182.02 | 181.96 |
1 | 41 | 39.49 | 46.71 | 46.63 | 41.22 | 41.60 |
2 | 12 | 11.62 | 12.29 | 12.30 | 10.30 | 10.14 |
3 | 2 | 3.95 | 3.23 | 3.24 | 3.78 | 3.69 |
4 | 2 | 1.46 | 0.85 | 0.85 | 1.52 | 1.49 |
5 | 0 | 0.57 | 0.22 | 0.23 | 0.65 | 0.63 |
6 | 0 | 0.23 | 0.06 | 0.06 | 0.28 | 0.27 |
7 | 1 | 0.17 | 0.02 | 0.02 | 0.23 | 0.23 |
Total | 240 | | | | | |
χ² | | 6.09 | 48.75 | 48.32 | 4.73 | 4.87 |
p value | | 0.30 | 0.0 | 0.0 | 0.32 | 0.30 |
Parameter estimates | | = 0.66 | = 0.001 | = 0.001 | = 0.50 | = 0.50 |
 | | = 0.22 | = 0.26 | = 0.26 | = 0.08 | = 0.09 |
 | | | n = 1 | | = 0.28 | = 0.27 |
ID | 1.84 | | | | | |

Table 3 | MLE | | | pgf-Estimator | | |
---|---|---|---|---|---|---|
Estimated parameters | 0.08 | 0.99 | 38.58 | 0.15 | 0.99 | 19.75 |
Mean | 0.09 | 0.98 | 57.32 | 0.14 | 0.98 | 36.97 |
Standard error | 0.07 | 0.06 | 36.64 | 0.08 | 0.01 | 32.35 |
Confidence interval | (0.03, 0.24) | (0.83, 0.99) | (10.51, 100) | (0.03, 0.32) | (0.94, 0.99) | (7.41, 100) |
Bias | 0.01 | −0.02 | 18.74 | −0.01 | −0.01 | 17.22 |

Table 4 | MLE | | | pgf-Estimator | | |
---|---|---|---|---|---|---|
Estimated parameters | 0.19 | 0.63 | 6.44 | 0.21 | 0.65 | 5.48 |
Mean | 0.17 | 0.59 | 20.50 | 0.20 | 0.61 | 18.35 |
Standard error | 0.10 | 0.19 | 30.70 | 0.11 | 0.16 | 28.73 |
Confidence interval | (0.02, 0.36) | (0.01, 0.81) | (2.41, 100) | (0.02, 0.40) | (0.08, 0.81) | (2.16, 99.99) |
Bias | −0.02 | −0.05 | 14.06 | −0.02 | −0.04 | 12.88 |

Table 5 | MLE | | | pgf-Estimator | | |
---|---|---|---|---|---|---|
Estimated parameters | 0.5 | 0.08 | 0.28 | 0.5 | 0.09 | 0.27 |
Mean | 0.48 | 0.07 | 0.40 | 0.49 | 0.08 | 0.37 |
Standard error | 0.14 | 0.05 | 0.36 | 0.14 | 0.06 | 0.28 |
Confidence interval | (0.20, 0.72) | (0.01, 0.17) | (0.08, 1.28) | (0.22, 0.74) | (0.01, 0.18) | (0.07, 1.10) |
Bias | −0.02 | −0.01 | 0.12 | −0.01 | −0.01 | 0.10 |
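
The rows Mean, Standard error, Confidence interval and Bias have the shape of a bootstrap summary of an estimator. As a rough sketch of how such rows could be produced, the code below runs a nonparametric bootstrap with percentile intervals. Both choices are assumptions, since the resampling scheme is not stated here. To stay self-contained it bootstraps the index of dispersion of the movement-count data rather than the mixture parameters, and `bootstrap_summary` is a hypothetical helper name.

```python
import random


def index_of_dispersion(sample):
    """Sample variance (divisor n - 1) divided by the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return var / mean


def bootstrap_summary(sample, statistic, n_boot=1000, level=0.95, seed=0):
    """Nonparametric bootstrap: mean, standard error, percentile CI and bias."""
    rng = random.Random(seed)
    estimate = statistic(sample)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(
        statistic(rng.choices(sample, k=len(sample))) for _ in range(n_boot)
    )
    boot_mean = sum(replicates) / n_boot
    boot_se = (sum((r - boot_mean) ** 2 for r in replicates) / (n_boot - 1)) ** 0.5
    lower = replicates[int((1 - level) / 2 * n_boot)]
    upper = replicates[int((1 + level) / 2 * n_boot) - 1]
    return {
        "estimate": estimate,
        "mean": boot_mean,
        "standard_error": boot_se,
        "confidence_interval": (lower, upper),
        "bias": boot_mean - estimate,  # bootstrap estimate of bias
    }


# Movement counts 0..7 with their observed frequencies (240 intervals in total).
frequencies = [182, 41, 12, 2, 2, 0, 0, 1]
sample = [x for x, f in enumerate(frequencies) for _ in range(f)]

summary = bootstrap_summary(sample, index_of_dispersion)
for name, value in summary.items():
    print(name, value)
```

Replacing `index_of_dispersion` with a function that refits a model to each resample would give summaries of the fitted parameters in the same format.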
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ong, S.H.; Sim, S.Z.; Liu, S.; Srivastava, H.M. A Family of Finite Mixture Distributions for Modelling Dispersion in Count Data. Stats 2023, 6, 942-955. https://doi.org/10.3390/stats6030059