1. Introduction
For any $x_0 \in \mathbb{R}$ and $h > 0$, and for any sequence $\{p_k\}_{k \in \mathbb{N}_0}$ such that $p_k \geq 0$ for any $k$ and $\sum_{k=0}^{\infty} p_k = 1$, a random variable $X$ with support $\{x_0 + kh : k \in \mathbb{N}_0\}$ and probability mass function $P(X = x_0 + kh) = p_k$ is a discrete lattice random variable. If $x_0 = 0$ and $h = 1$, so that the support is $\mathbb{N}_0$, $X$ is a count random variable.
In most cases, the probability mass function
is not interesting, since it is difficult to deal with and there is no clear interpretation of the pattern of randomness it describes. The craft of probabilistic modelling (Gani (1986) [
1] uses a diversity of criteria to describe and select models, namely, those arising from randomness patterns (such as counts in Bernoulli trials, sampling with or without replacement, and random draws from urns). Another source for the rational description of count models is characterisation theorems based on structural properties (e.g., a power series distribution with mean = variance, or maximum Shannon entropy with prescribed arithmetic and/or geometric mean). Recurrence relationships (for instance,
) or mathematical properties (for instance, the variance being at most a quadratic function of the expectation) also define interesting families of discrete random variables. On the other hand, asymptotic properties such as arithmetic properties, namely, infinite divisibility, discrete self-decomposability, and stability, serve as guidance in model choice.
Section 2 describes the discrete uniform random variables, modelling equiprobability patterns resulting from the principle of insufficient reason, of which the Bernoulli random variable with parameter
is the simplest example.
Section 3 is a detailed overview of count models—Binomial, Negative Binomial—arising from the observation of random patterns in Bernoulli trials (including the Poisson random variable, as a limit of Binomial random variables under a mean stability restriction, and the Hypergeometric sampling without replacement model, herein in the context of conditioning on the sum of two independent random variables). In
Section 4, the recurrence relation holding for the probability mass function of Binomial, Poisson, and Negative Binomial random variables investigated by Katz [
2] and by Panjer [
3] is extended to describe Hess et al. [
4]’s family of basic count distributions.
Section 5 briefly discusses alternative organisations of count models, namely, via Power Series distributions or Kemp’s [
5] generalised hypergeometric probability distributions.
Section 6 contrasts the equilibrium pattern of Zipf’s [
6] law with the equiprobability modelled by the discrete uniform random variables.
Section 7 and
Section 8 discuss ways of transforming random models, respectively, by randomising parameters and via the discretisation of continuous random variables.
Section 9 briefly discusses the role of characterisations in the craft of modelling count data. Further issues are briefly described in
Section 10.
2. Bernoulli Random Variables, Principle of Insufficient Reason and Discrete Uniform Random Variables
Let
be the probability of the event
occurring as the outcome of an experiment. Either
(sometimes referred to as success) occurs once, or its complementary event
(referred to as failure) occurs, so the number of occurrences of
in a single trial is either 1, with probability
p, or 0, with probability
. We shall use the notation
where the first line indicates the support of the count random variable
B, and the second line the probabilities of the support points.
The above random variable is called “Bernoulli”, with parameter
p, in honour of the brilliant probabilist Jacques Bernoulli, author of the fascinating
Ars Conjectandi [
7], published posthumously in 1713 by his nephew Nicolaus Bernoulli (also a probabilist).
We shall use the notation $B \sim \mathrm{Bernoulli}(p)$; if $p = \frac{1}{2}$, meaning that $A$ and $\bar{A}$ are equiprobable, it should be assumed that there is insufficient reason to assign different probabilities to $A$ and $\bar{A}$.
The principle of insufficient reason (renamed principle of indifference by Maynard Keynes [
8]) was used in the foundation texts of Jacques Bernoulli [
7] and Laplace [
9] for assigning epistemic probabilities to equiprobable events. The natural extension of the Bernoulli$\left(\frac{1}{2}\right)$ model is the equiprobable count model $P(X = k) = \frac{1}{n}$, $k = 1, \ldots, n$, named Discrete Uniform with parameter $n$, that we denote as $X \sim \mathrm{Uniform}\{1, \ldots, n\}$.
3. Count Random Variables in Bernoulli Trials
Let $\{E_k\}_{k \geq 1}$ be a sequence of independent random experiments (trials) whose outcomes are either $A$ (success), with probability $p$, or $\bar{A}$ (failure), with probability $1 - p$. The assumption of independence means that the outcome of trial $E_k$ has no influence whatsoever on the outcome of any other trial.
In this setting, the counts that matter are the following:
The number $k$ of outcomes $A$ in $n$ trials ($n$ fixed, $k$ random). From the definition of Bernoulli trials, it is easy to conclude that, with $X$ denoting such a count random variable, $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$. This random variable is called a binomial random variable with parameters $n$ and $p$, and we shall denote it as $X \sim \mathrm{Binomial}(n, p)$. The expectation is $E[X] = np$ and the variance $\mathrm{Var}[X] = np(1-p)$. Hence, the dispersion index is $\frac{\mathrm{Var}[X]}{E[X]} = 1 - p < 1$. For that reason, we say that the binomial random variable is underdispersed.
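As a minimal numerical sketch (function names and parameter values are ours, for illustration only), the underdispersion of the Binomial model can be checked directly:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
mean = n * p                  # E[X] = np
variance = n * p * (1 - p)    # Var[X] = np(1 - p)
dispersion = variance / mean  # = 1 - p = 0.7 < 1: underdispersed
```

The dispersion index depends only on $p$, not on $n$, so every non-degenerate Binomial law is underdispersed.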
The number $n$ of trials needed to observe $k$ times the outcome $A$ ($k$ fixed, $n$ random). The simple case is $k = 1$. In this case, with $X_1$ denoting such a count random variable, $P(X_1 = n) = p(1-p)^{n-1}$, $n = 1, 2, \ldots$; $X_1$ is called a geometric (or sometimes Pascal) random variable, and we shall use the notation $X_1 \sim \mathrm{Geometric}(p)$. Its expectation is $E[X_1] = \frac{1}{p}$ and its variance is $\mathrm{Var}[X_1] = \frac{1-p}{p^2}$.
More generally, let $X_k$ be the number of trials needed to observe the $k$-th occurrence of $A$. Obviously, due to the independence of the Bernoulli trials, $X_k$ is a Negative Binomial random variable with parameters $k$ and $p$, that we denote $X_k \sim \mathrm{NegativeBinomial}(k, p)$, with $P(X_k = n) = \binom{n-1}{k-1} p^k (1-p)^{n-k}$, $n = k, k+1, \ldots$. Obviously, $X_k = \sum_{i=1}^{k} G_i$, with the $G_i \sim \mathrm{Geometric}(p)$ independent. So $E[X_k] = \frac{k}{p}$ and $\mathrm{Var}[X_k] = \frac{k(1-p)}{p^2}$.
It is sometimes convenient to shift the Negative Binomial random variables to start at 0, i.e., to count the number of $\bar{A}$s that precede the $k$-th $A$. In other words, to define $Y_k = X_k - k$, with $P(Y_k = j) = \binom{j+k-1}{j} p^k (1-p)^j$, $j = 0, 1, \ldots$. Its dispersion index is $\frac{\mathrm{Var}[Y_k]}{E[Y_k]} = \frac{1}{p} > 1$, and for that reason the Negative Binomial random variables are considered overdispersed.
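A companion sketch (again with illustrative names and values) for the shifted Negative Binomial confirms the overdispersion:

```python
from math import comb

def negbin_pmf(j, k, p):
    """P(Y = j): number of failures preceding the k-th success, j = 0, 1, ..."""
    return comb(j + k - 1, j) * p**k * (1 - p)**j

k, p = 3, 0.4
mean = k * (1 - p) / p          # E[Y] = k(1-p)/p
variance = k * (1 - p) / p**2   # Var[Y] = k(1-p)/p^2
dispersion = variance / mean    # = 1/p = 2.5 > 1: overdispersed
```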
Note also that, using the gamma function extension of factorials, $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt$, $z > 0$, satisfying the recurrence relation $\Gamma(z+1) = z\,\Gamma(z)$, so that $\Gamma(n+1) = n!$ for integer $n$, we may consider Negative Binomial random variables with real parameter $k > 0$: $P(Y_k = j) = \frac{\Gamma(k+j)}{\Gamma(k)\, j!}\, p^k (1-p)^j$, $j = 0, 1, \ldots$
In many situations, asymptotic results are paramount in modelling decisions or simplifications. The first central limit theorem (a name coined by Pólya [10] in 1920), establishing that if $X \sim \mathrm{Binomial}(n, p)$ and $n$ is large, then the distribution of $\frac{X - np}{\sqrt{np(1-p)}}$ can be approximated by the standard Gaussian distribution, appeared in the second edition of Abraham de Moivre’s [11] The Doctrine of Chances.
The other important asymptotic result about Binomial sequences is Poisson’s [
12] law of rare events:
Let $X_n \sim \mathrm{Binomial}(n, p_n)$, mean-stable in the sense that $n p_n \to \lambda > 0$ (observe that this implies that $p_n \to 0$, and this is the rationale for the name “law of rare events”). Then, $X_n$ converges in distribution to $Y$, where $Y$ is a Poisson random variable with parameter $\lambda$, that we denote $Y \sim \mathrm{Poisson}(\lambda)$, with $P(Y = k) = e^{-\lambda} \frac{\lambda^k}{k!}$, $k = 0, 1, \ldots$
$E[Y] = \mathrm{Var}[Y] = \lambda$, and hence, in what concerns dispersion, the Poisson random variable is a yardstick, in the sense that its dispersion index is exactly 1.
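The mean-stable Binomial-to-Poisson convergence can be verified numerically; the following fragment (parameter values are ours, chosen only for illustration) compares the two probability mass functions:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam, n = 2.0, 10_000
# mean-stable sequence: Binomial(n, lam/n) should be close to Poisson(lam)
err = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(20))
```

The approximation error decreases at rate roughly $\lambda^2 / n$, consistent with the law of rare events.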
McCabe and Skeels (2020) [
13] and Di Noia et al. (2024) [
14] extensively investigated testing
Poissonness vs. overdispersion or underdispersion; cf. also Mijburgh and Visagie (2020) [
15]’s overview of goodness-of-fit tests for the Poisson distribution.
The Binomial, the Negative Binomial, and the Poisson random variables are the discrete members of Morris’ [
16] Natural Exponential Family (NEF) with quadratic variance function in the mean value (QVF). (Recall that
$X$ is a member of a NEF if its probability density function can be written as $f(x \mid \theta) = h(x)\, e^{\theta x - \psi(\theta)}$, and therefore its cumulant generating function has the simple form $K_X(t) = \psi(\theta + t) - \psi(\theta)$.) In the sequel, Morris [
17] treated “
topics that can be handled within this unified NEF-QVF formulation, including unbiased estimation, Bhattacharyya and Cramér-Rao lower bounds, conditional distributions and moments, quadratic regression, conjugate prior distributions, moments of conjugate priors and posterior distributions, empirical Bayes and minimax, marginal distributions and their moments, parametric empirical Bayes, and characterisations”, and this shows the relevance of Binomial, Negative Binomial, and Poisson count models in Statistical Inference.
It is also worth mentioning that Binomial, Negative Binomial, and Poisson random variables have relevant additive properties, in the sense that
If $X_1 \sim \mathrm{Binomial}(n_1, p)$ and $X_2 \sim \mathrm{Binomial}(n_2, p)$ are independent, then $X_1 + X_2 \sim \mathrm{Binomial}(n_1 + n_2, p)$.
If $X_1 \sim \mathrm{NegativeBinomial}(k_1, p)$ and $X_2 \sim \mathrm{NegativeBinomial}(k_2, p)$ are independent, then $X_1 + X_2 \sim \mathrm{NegativeBinomial}(k_1 + k_2, p)$.
If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.
Furthermore, if $X \sim \mathrm{Poisson}(\lambda)$ is subject to Binomial filtering or thinning with retention probability $p$, i.e., $Y = \sum_{i=1}^{X} B_i$ with the $B_i \sim \mathrm{Bernoulli}(p)$ independent, the resulting $Y$ has probability mass function $P(Y = k) = e^{-\lambda p} \frac{(\lambda p)^k}{k!}$, i.e., $Y \sim \mathrm{Poisson}(\lambda p)$.
Many natural phenomena are subject to filtering; for instance, using a Poisson model for the number of eggs laid by turtles, the number of hatched eggs, the number of surviving newborns until they reach the ocean, the number of turtles surviving the first year, and so forth, can be modelled by a chain of filtered Poisson random variables, a very useful tool in population dynamics modelling.
On the other hand, the Negative Binomial is a “random Poisson”, obtained when its parameter is a random variable with a Gamma distribution: let $X \mid \Lambda = \lambda \sim \mathrm{Poisson}(\lambda)$ with $\Lambda \sim \mathrm{Gamma}\left(k, \frac{p}{1-p}\right)$ (shape and rate, respectively). Then, $P(X = j) = \binom{j+k-1}{j} p^k (1-p)^j$, $j = 0, 1, \ldots$, so we obtain $X \sim \mathrm{NegativeBinomial}(k, p)$, in the form shifted to start at 0. For this reason, and as the Negative Binomial random variable is overdispersed, the Poisson being the yardstick, some authors in population dynamics modelling consider the Negative Binomial a more dispersed Poisson.
The above results on the sums of Morris’ discrete variable have interesting consequences when conditioning is applied:
If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ are independent, $P(X_1 = k \mid X_1 + X_2 = n) = \binom{n}{k} \left(\frac{\lambda_1}{\lambda_1 + \lambda_2}\right)^k \left(\frac{\lambda_2}{\lambda_1 + \lambda_2}\right)^{n-k}$, i.e., $X_1 \mid X_1 + X_2 = n \sim \mathrm{Binomial}\left(n, \frac{\lambda_1}{\lambda_1 + \lambda_2}\right)$.
Reasoning in the same way, if $X_1 \sim \mathrm{Binomial}(n_1, p)$ and $X_2 \sim \mathrm{Binomial}(n_2, p)$ are independent, $P(X_1 = k \mid X_1 + X_2 = n) = \frac{\binom{n_1}{k}\binom{n_2}{n-k}}{\binom{n_1+n_2}{n}}$.
This is the probability mass function of a Hypergeometric random variable with parameters $n_1 + n_2$, $n_1$, and $n$.
If $X \sim \mathrm{Hypergeometric}(N, M, n)$, $E[X] = n\frac{M}{N}$ and $\mathrm{Var}[X] = n\frac{M}{N}\left(1 - \frac{M}{N}\right)\frac{N-n}{N-1}$.
It is interesting to observe that if $X \sim \mathrm{Binomial}(n, p)$, $Y \sim \mathrm{Poisson}(\lambda)$, and $Z \sim \mathrm{NegativeBinomial}(k, \theta)$ (shifted to start at 0) have the same expectation $\mu$, then $\mathrm{Var}[X] < \mathrm{Var}[Y] < \mathrm{Var}[Z]$.
In other words, for these three count models, the number of parameters implies a tradeoff between decreasing simplicity and increasing information. Augmenting the number of parameters supplies more information. However, one should always bear in mind that all models are wrong, but some are useful (as Box [
18] judiciously stated), and that the parsimony principle hints that the simplest model that is useful enough should be used. Observe that in model choice, criteria such as the Akaike Information Criterion (AIC) [
19] penalise the number of parameters of the model, and this often leads us to choose a model with fewer parameters, even when a model with more parameters provides a slightly better fit.
The Hypergeometric random variable is used in simple random sampling without replacement, while the Binomial is appropriate for simple random sampling with replacement. In the Hypergeometric context, we are dealing with exchangeability; in the Binomial setting, with a stronger independence concept.
The Hypergeometric random model is also the basis for estimating the size of a population with the technique of capture–recapture, cf. Seber and Schofield [
20]. Suppose that in a controlled investigation, $k$ animals are captured, marked, and released, and that after a while there is a second instance of capturing, in which $j$ of the $n$ captured are marked (i.e., recaptured). If the unknown size of the population is $N$, $k$ of which have been marked and $N - k$ are unmarked, $j$ is the observed value of $X \sim \mathrm{Hypergeometric}(N, k, n)$, with $E[X] = n\frac{k}{N}$. The Petersen estimator $\hat{N} = \frac{kn}{j}$ of the size of the population is a method of moments estimator (observe that we described a technique with just one recapture count of $j$ marked animals, and in a sample of size one, this is the mean).
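A minimal sketch of the method-of-moments reasoning behind the Petersen estimator (the counts below are illustrative, not data from the reference):

```python
def petersen_estimate(k, n, j):
    """Method-of-moments estimate of N: since E[X] = n * k / N for the
    Hypergeometric recapture count X, matching j to this mean gives N_hat = k * n / j."""
    return k * n / j

# 100 animals marked, second sample of 80 contains 16 recaptures
N_hat = petersen_estimate(k=100, n=80, j=16)
```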
4. Katz Count Models and Extensions
The probability mass functions of the discrete Morris NEF-QVF Binomial, Poisson, and Negative Binomial random variables share the interesting property
$p_k = \left(a + \frac{b}{k}\right) p_{k-1}, \quad k = 1, 2, \ldots \quad (1)$
Consider the non-degenerate count random variables whose probability mass function satisfies the recursive relation (1).
Multiplying both sides of (1) by $s^k$ and adding for $k \geq 1$, we find that the corresponding probability generating functions $G_X(s) = \sum_k p_k s^k$ are the solutions of the differential equation $(1 - as)\, G_X'(s) = (a + b)\, G_X(s)$; for details, cf. Pestana and Velosa (2004) [21]; namely:
Binomial$(n, p)$: $a = -\frac{p}{1-p}$, $b = (n+1)\frac{p}{1-p}$, i.e., $G_X(s) = (1 - p + ps)^n$.
Poisson$(\lambda)$: $a = 0$, $b = \lambda$, i.e., $G_X(s) = e^{\lambda(s-1)}$.
NegativeBinomial$(k, p)$ (shifted to start at 0): $a = 1 - p$, $b = (k-1)(1-p)$, i.e., $G_X(s) = \left(\frac{p}{1 - (1-p)s}\right)^k$.
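The recursion (1) gives a simple generator for these probability mass functions; the $(a, b)$ pairs below follow the standard Katz parameterisation (function names are ours):

```python
from math import comb, exp, factorial

def katz_pmf(a, b, p0, kmax):
    """Probabilities p_0, ..., p_kmax from the recursion p_k = (a + b/k) p_{k-1}."""
    p = [p0]
    for k in range(1, kmax + 1):
        p.append((a + b / k) * p[-1])
    return p

# Poisson(lam): a = 0, b = lam, p_0 = exp(-lam)
lam = 1.5
pois = katz_pmf(0.0, lam, exp(-lam), 12)

# Binomial(n, p): a = -p/(1-p), b = (n+1)p/(1-p), p_0 = (1-p)^n
n, p = 8, 0.25
binom = katz_pmf(-p / (1 - p), (n + 1) * p / (1 - p), (1 - p)**n, n)
```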
The simple expression (
1) for the successive probability atoms has been rediscovered several times in different contexts (McCabe and Skeels, 2020 [
13]). Katz (1945, 1965) [
2,
22] used it to organise a family of discrete models in the same spirit as the continuous Pearson family, and for that reason, the family of random variables
whose probability mass function satisfies (
1) is referred to as the Katz family. In Risk Theory and Insurance,
is known as Panjer’s class, due to Panjer’s (1981) [
3] important breakthrough that the recurrence relation (
1) implies that the distribution of aggregate claims can be iteratively computed or approximated:
Let $X_1, X_2, \ldots$ denote the identically distributed discrete claim sizes, with $f_k = P(X_i = k)$, and assume that the $X_i$ are mutually independent and independent of the claim number $N$. Then, $P(S = s) = \sum_{n=0}^{\infty} P(N = n)\, f^{*n}_s$, where $f^{*n}$ denotes the $n$-th convolution of $f$. Consequently, the probabilities $g_s = P(S = s)$ of the aggregate claim amount $S = \sum_{i=1}^{N} X_i$ (understood as $S = 0$ if $N = 0$) can be obtained recursively using Panjer’s algorithm: when $f_0 = 0$, $g_0 = P(N = 0)$ and
$g_s = \frac{1}{1 - a f_0} \sum_{j=1}^{s} \left(a + \frac{b\, j}{s}\right) f_j\, g_{s-j}, \quad s = 1, 2, \ldots$
This requires only $O(s^2)$ computations to obtain $g_0, \ldots, g_s$, while the traditional convolution method would require $O(s^3)$.
. More generally, the cumulative distribution function
of the aggregate claim
for an arbitrary distribution function
of the individual claim sizes, when the number of claims is
with probability mass
, satisfies the integral equation
for
, if
. For nonnegative claim sizes with
,
For detailed proofs, cf. Rolski et al. [
23] (pp. 118–124). Klugman et al. [
24] (pp. 221–224) use an example to discuss in depth the use of the empirical dispersion index as guidance for model choice in the Panjer family.
If we relax the condition on the support, requiring the recursion (1) to hold only from the second support point onwards, as investigated by Jewell [25], Willmot [26], or Sundt [27], we also obtain the Logarithmic random variable, with probability mass function $P(X = k) = \frac{-\theta^k}{k \ln(1-\theta)}$, $k = 1, 2, \ldots$, $\theta \in (0, 1)$, used by Fisher [28] to model species abundance, and the Engen Extended Negative Binomial (ENB) [29] random variable.
The dispersion indices of the Logarithmic and of the Engen random variables depend on $\theta$.
Hess et al. (2002) [4] investigated $k$-Panjer classes, $k \in \mathbb{N}_0$, whose probability mass function satisfies the recursion (1) for indices beyond the left endpoint $k$ of the support. Excluding the 0-Panjer family, which does not contain the logarithmic and ENB distributions, Hess et al. (2002) [4] established that any $k$-Panjer distribution is the left endpoint truncation of a $(k-1)$-Panjer distribution, for $k \geq 1$. For this reason, they call the Binomial, Poisson, Negative Binomial, Logarithmic, and ENB distributions basic count models.
Klugman et al.’s [
24] notation
has been used by Fackler (2024) [
30] to supply a unified, practical, and intuitive representation of the Panjer distributions and their parameter space, and give an inventory of parameterisations used for Panjer distributions.
Panjer’s recursion has been extended by Tzaninis and Bozikas (2024) [
31] to mixed compound distributions, following the use of finite mixtures for modelling dispersion in count data by Ong et al. (2023) [
32].
6. Uniformity and Power Law Randomness, a Striking Example of Opposite Patterns
The simplest equiprobability pattern of count randomness is modelled by the discrete uniform distribution, $P(X = k) = \frac{1}{n}$, $k = 1, \ldots, n$.
Equiprobability is, however, the exception; natural phenomena are more prone to present other patterns of equilibrium, such as with Zipf’s model, which is the special case $q = 0$ of the Zipf–Mandelbrot family of discrete models given by
$P(X = k) = \frac{(k + q)^{-\rho}}{H_{n, q, \rho}}, \quad k = 1, \ldots, n,$
where $\rho > 0$, $k$ is the rank of the data and $H_{n, q, \rho} = \sum_{i=1}^{n} (i + q)^{-\rho}$ is a generalisation of the harmonic number $H_n = \sum_{i=1}^{n} \frac{1}{i}$. The cumulative distribution function is, for $1 \leq x \leq n$, $F(x) = \frac{H_{\lfloor x \rfloor, q, \rho}}{H_{n, q, \rho}}$, where $\lfloor x \rfloor$ is the largest integer not greater than $x$.
Zipf’s law has been introduced to model the equilibrium—in the sense that rank × frequency is approximately constant—in verbal communication, which employs mainly words of ordinary usage (social trend) and scarcely words tied to the user’s vocabulary and thematic preferences (individual trend). In what concerns extra-pair paternity (EPP), there is some evidence that the observed variability of the number of extra-pair offspring (EPO) in the brood, in various species of passerines, results from a delicate equilibrium of needing a social partner to care for raising the brood and the tendency to have offspring sired by a stronger male. This is the reason why the Zipf–Mandelbrot family seems to be a plausible model, cf. Marques et al. (2005) [
35].
Notice that Zipf’s law is also a particular case of the extended parameter space of the right-truncated logarithmic distribution, where $P(X = k) \propto \frac{\theta^k}{k}$ for $k = 1, \ldots, n$. In fact, when considering the truncated logarithmic model, the parameter space can be extended from $\theta \in (0, 1)$ to $\theta \in (0, 1]$, as already observed. In particular, if $\theta = 1$, we obtain Zipf’s law. For more information, cf., e.g., Johnson et al. (2005) [33] (Chapter 11) and references therein. It is a power law in the sense that $P(X = k) \propto k^{-\rho}$, for some positive $\rho$, and therefore its log–log plot exhibits a linear signature. When an almost linear signature is observed, it is considered strong support for choosing a power law model, namely, in the discrete context, a Zipf–Mandelbrot law.
9. Characterisations of Count Models
Families of discrete distributions have striking structural properties, with relevant consequences for model choice. Characterisation theorems are also helpful in the craft of probabilistic modelling.
As already observed, the dispersion index of the Poisson distribution is 1. In fact, a PSD random variable $X$ has $\frac{\mathrm{Var}[X]}{E[X]} = 1$ if and only if $X \sim \mathrm{Poisson}(\lambda)$, for some $\lambda > 0$, a characterisation due to Kosambi (1949) [39].
Characterisations are a useful tool in model choice. For instance, Kapur (1989) [
40] has shown that a discrete random variable $X$ with fixed arithmetic mean has maximum Shannon entropy $H(X) = -\sum_k p_k \ln p_k$ if and only if $X$ is a Geometric random variable. Observe that choosing the largest entropy model within a class of distributions amounts to selecting as default the least informative model, and so minimising the prior information.
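The Lagrange-multiplier argument behind this characterisation can be sketched as follows (a standard maximum entropy computation, not the reference's own derivation):

```latex
\max_{\{p_k\}} \; -\sum_{k=0}^{\infty} p_k \ln p_k
\quad \text{subject to} \quad
\sum_{k=0}^{\infty} p_k = 1, \qquad \sum_{k=0}^{\infty} k\, p_k = \mu .
```

Stationarity of the Lagrangian gives $-\ln p_k - 1 - \alpha - \beta k = 0$, so $p_k \propto e^{-\beta k}$, i.e., $p_k = (1 - \theta)\,\theta^k$ with $\theta = e^{-\beta}$: a Geometric law on $\{0, 1, \ldots\}$, with $\theta$ determined by $\mu = \frac{\theta}{1 - \theta}$.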
A discrete random variable $X$ with fixed geometric mean has maximum Shannon entropy if and only if it is a Zipf or Discrete Pareto random variable, $P(X = k) = \frac{k^{-\rho}}{\zeta(\rho)}$, $k = 1, 2, \ldots$, $\rho > 1$, where $\zeta(\rho) = \sum_{k=1}^{\infty} k^{-\rho}$ is the Riemann zeta function. A discrete random variable $X$ with fixed arithmetic and geometric means has maximum Shannon entropy if and only if $X$ is a Good random variable, $P(X = k) = \frac{\theta^k k^{-\rho}}{\theta\,\Phi(\theta, \rho, 1)}$, $k = 1, 2, \ldots$, where $\Phi(z, s, a) = \sum_{j=0}^{\infty} \frac{z^j}{(j + a)^s}$ is the Lerch function (subsubsection 9.55 in Gradshteyn and Ryzhik (2007) [37]). Observe that the Geometric random variable with support $\{1, 2, \ldots\}$ is the special case when $\rho = 0$, and the Logarithmic random variable is the case $\rho = 1$.
10. Concluding Remarks
As in the continuous case, asymptotic results also play an important role in modelling; for instance, the Poisson limit in discrete settings is, to a certain extent, comparable to the central limit Gaussian law in general; cf. Steutel and van Harn (1979) [
41]. In the same spirit, extremal discrete laws do arise naturally in order statistics contexts; cf. Hall (1996) [
42]. In addition, useful models can originate other patterns of randomness via truncation (with the eventual broadening of the parameter space), randomly stopped sums, randomly stopped extremes, or the randomisation of parameters.
Moreover, among all continuous probability distributions with support $(0, \infty)$ and mean $\frac{1}{\lambda}$, the exponential distribution with probability density function $f(x) = \lambda e^{-\lambda x}$, $x > 0$, modelling interarrival times in the Poisson process, has the largest differential entropy, $h(X) = 1 - \ln \lambda$, meaning that the Poisson process has the greatest entropy among all the homogeneous point processes with the same given intensity $\lambda$. For that reason, Poisson streams of events are generally interpreted as representing the pattern of unconstrained randomness. This is a rationale for considering Poisson modelling, since entropy is non-decreasing, and therefore the Poisson random variable is a weak limit in very general frameworks, in particular of Binomial and of Negative Binomial sequences under a mean stability condition. In a sense, the Poisson distribution models unconstrained scattering, of the type of throwing rice grains when seeding.
In Population Dynamics, dispersion is a clue to model choice. Zhang et al. (2018) [
43] investigated dispersion patterns of finite mixtures, and Cahoy et al. (2021) [
44] introduced flexible fractional Poisson distributions to model underdispersed and overdispersed count data. Recent advances in what concerns underdispersed count models were made by Huang (2022) [
45] and Seck et al. (2022) [
46]. Rana et al. (2023) [
47] investigated the influence of outliers on overdispersion, and Sengupta and Roy (2023) [
48] the role of under-reported count data and zero-inflated models, an issue also investigated by Aswi et al. (2022) [
49].
From the additive properties of Binomial random variables shown in
Section 3, $X \sim \mathrm{Binomial}(n, p)$ is $n$-divisible, in the sense that it can be decomposed as the sum of $n$ independent and identically distributed $\mathrm{Bernoulli}(p)$ random variables; and $X \sim \mathrm{NegativeBinomial}(k, p)$ is $k$-divisible, in the sense that it can be decomposed as the sum of $k$ independent and identically distributed $\mathrm{Geometric}(p)$ random variables. On the other hand, $X \sim \mathrm{Poisson}(\lambda)$ is infinitely divisible, since for any $n \in \mathbb{N}$, it can be decomposed as the sum of $n$ independent and identically distributed $\mathrm{Poisson}\left(\frac{\lambda}{n}\right)$ random variables.
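The $n$-divisibility of the Poisson law can be checked by explicit convolution (a sketch with illustrative values; truncation at `kmax` does not affect the entries compared):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def convolve(p, q):
    """Convolution of two pmfs on 0..len-1, truncated to the same length."""
    kmax = len(p) - 1
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(kmax + 1)]

lam, n, kmax = 3.0, 4, 12
component = [poisson_pmf(k, lam / n) for k in range(kmax + 1)]
total = component
for _ in range(n - 1):
    total = convolve(total, component)   # sum of n iid Poisson(lam/n)

err = max(abs(total[k] - poisson_pmf(k, lam)) for k in range(kmax + 1))
```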
Moreover, from Raikov’s theorem [
50], if $X \sim \mathrm{Poisson}(\lambda)$ is decomposed as the sum of two independent random variables, $X = Y + Z$, then
Y and
Z are also Poisson random variables. This means that the Poisson random variables are extreme points of the convex set of infinitely divisible random variables. A similar result with the Gaussian random variables, conjectured by Lévy [
51] and proved by Cramér [
52], shows that they are also extreme points of the set of infinitely divisible random variables. From this, Johansen (1966) [
53] supplied a structural proof of the integral representations of infinitely divisible characteristic functions. In fact, the Poisson random variables are the building blocks of infinitely divisible random variables (de Finetti (1929, 1932) [54,55]), which is a strong modelling asset, since many observed phenomena are the result of several contributing effects.