Lagrangian Zero Truncated Poisson Distribution: Properties Regression Model and Applications

Irshad, Muhammed Rasheed; Chesneau, Christophe; Shibu, Damodaran Santhamani; Monisha, Mohanan; Maya, Radhakumari

doi:10.3390/sym14091775


by Muhammed Rasheed Irshad ¹, Christophe Chesneau ²,*, Damodaran Santhamani Shibu ³, Mohanan Monisha ³ and Radhakumari Maya ⁴

¹ Department of Statistics, Cochin University of Science and Technology, Cochin 682 022, Kerala, India

² Department of Mathematics, Université de Caen Basse-Normandie, LMNO, UFR de Sciences, F-14032 Caen, France

³ Department of Statistics, University College, Thiruvananthapuram 695 034, Kerala, India

⁴ Department of Statistics, Government College for Women, Thiruvananthapuram 695 014, Kerala, India

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(9), 1775; https://doi.org/10.3390/sym14091775

Submission received: 3 May 2022 / Revised: 23 May 2022 / Accepted: 21 August 2022 / Published: 25 August 2022

(This article belongs to the Section Mathematics)


Abstract

In this paper, we construct a new Lagrangian discrete distribution, named the Lagrangian zero truncated Poisson distribution (LZTPD). It generalizes the zero truncated Poisson distribution (ZTPD) and provides an alternative to the intervened Poisson distribution (IPD), and it is designed to model both over-dispersed and under-dispersed count datasets. The mathematical properties of the LZTPD are thoroughly investigated, and its connections to other discrete distributions are examined. Further, we define a finite mixture of LZTPDs and establish its identifiability condition along with some distributional properties. Statistical work is then performed. The maximum likelihood and method of moments approaches are used to estimate the unknown parameters of the LZTPD. Simulation studies are also undertaken to assess the long-run performance of the estimates. The significance of the additional parameter in the LZTPD is tested using a generalized likelihood ratio test. Moreover, we propose a new count regression model, named the Lagrangian zero truncated Poisson regression model (LZTPRM), whose parameters are estimated by the maximum likelihood method. Two real-world datasets are considered to demonstrate the practical applicability of the LZTPD, and healthcare data are analyzed to demonstrate the superiority of the LZTPRM.

Keywords:

Lagrangian zero truncated Poisson distribution; intervened Poisson distribution; index of dispersion; regression; maximum likelihood estimation; generalized likelihood ratio test; simulation

1. Introduction

In several cases, researchers cannot observe the complete distribution of counts; in particular, zeros are often not observed, which makes zero truncation an important and common characteristic of various count data processes. With this in mind, ref. [1] employed the zero truncated Poisson distribution (ZTPD) to describe a chance mechanism whose experimental device becomes active only when at least one event occurs. Ref. [2] discussed numerical examples to demonstrate the statistical applications of the ZTPD in such situations. An alternative to the ZTPD, the so-called intervened Poisson distribution (IPD), was proposed by [3] to deal with the real-life situation of a supermarket manager who provided extra assistance to customers at a service counter. An advantage of the IPD over the ZTPD is that it gives information on the effectiveness of the intervention. Ref. [4] applied the IPD in reliability analysis, queuing problems and epidemiological studies. Ref. [5] considered a modified version of the IPD which, unlike the IPD, spreads the probability mass in all directions, so that the clustering of probabilities at the initial values of the operating mechanism is avoided. Ref. [6] illustrated an alternative to the IPD for prevalence reduction. However, the IPD proposed by [3] has the restriction that its variance must be less than its mean. This is referred to as ‘under-dispersion’ in the literature, and this phenomenon is only observed on rare occasions. To overcome this limitation, we propose a new ZTPD based on a Lagrangian approach, dubbed the Lagrangian zero truncated Poisson distribution (LZTPD), which can model both under-dispersed and over-dispersed (variance greater than mean) count datasets. More information on the Lagrangian distributional approach is given below.

The Lagrangian family (LF) of distributions was derived from the Lagrangian expansion, which was first introduced by [7]. Later, refs. [8,9] proposed a discrete LF (DLF), which forms a very large and important class containing numerous families of probability distributions. For example, the Lagrangian negative binomial distribution, obtained by [10], has proved useful for a queuing process. The Lagrangian Katz family was developed by [11]. Ref. [12] considered the applications of Lagrangian probability distributions to inferential problems in random mapping theory. Ref. [13] derived the generalized Poisson gamma dependence model from Lagrangian probability models. Recently, ref. [14] applied Lagrangian probability density function models to collisional turbulent fluid particle flows. Furthermore, ref. [9] proved that all discrete Lagrangian distributions converge to the normal distribution and to the inverse Gaussian distribution under certain conditions. Motivated by the adaptability of Lagrangian distributions and the need for a flexible model capable of handling versatile count datasets, we propose the LZTPD in this article.

On the other hand, regression models for count data are gaining more and more attention. The use of regression models to describe count data is relatively recent, as detailed in [15]. However, in some real-world situations, the system is only engaged if at least one event occurs. Examples include the number of international conflicts, daily accidents, industrial injuries, etc. In many circumstances, modelling count outcomes directly with a normal linear regression model results in inefficient, inconsistent and biased estimation, as described in [16]. Positive count data are analyzed using the zero truncated Poisson regression model (ZTPRM), which is more appropriate than the traditional Poisson regression model for this type of data. Ref. [17] discussed the application of Poisson regression models to the analysis of truncated samples of count data. Recently, ref. [18] developed the intervened Poisson regression model (IPRM), which is an alternative to the ZTPRM. In this paper, we offer an alternative regression model to both the ZTPRM and IPRM, the so-called Lagrangian zero truncated Poisson regression model (LZTPRM). The LZTPD and the LZTPRM are motivated by their suitability for both under-dispersed and over-dispersed count datasets, as well as by their applicability in situations where the modelled data exclude zero counts.

The rest of the paper is organized as follows. A brief introduction to the Lagrangian expansion and the DLF is given in Section 2. The LZTPD and its statistical properties are discussed in Section 3. In Section 4, we propose a mixture of LZTPDs and present the identifiability conditions of finite mixtures of LZTPDs. Parameter estimation for the LZTPD by the maximum likelihood (ML) method is discussed in Section 5. In Section 6, we test the significance of the additional parameter of the LZTPD using a generalized likelihood ratio test. The simulation results for the considered estimation method are presented in Section 7. The LZTPRM is described in Section 8. Empirical illustrations of the proposed LZTPD and LZTPRM are given in Section 9. Discussions and conclusions are given in Section 10 and Section 11, respectively.

2. Some Preliminaries

2.1. Basics on the Discrete Lagrangian Family

In order to introduce the LZTPD, some mathematical background on the Lagrangian expansion and DLF must be recalled. Let f_{1} (z) and f_{2} (z) be two analytic and successively differentiable functions defined on the interval [- 1, 1] such that f_{1} (1) = f_{2} (1) = 1, f_{1} (0) \neq 0 and f_{2} (0) \geq 0. The following power series expansion is obtained by inverting the Lagrange transformation u = z / f_{1} (z), which gives z as a power series in u:

\frac{f_{2} (z)}{1 - \frac{z f_{1}' (z)}{f_{1} (z)}} = \sum_{j = 0}^{\infty} b_{j} u^{j},

(1)

where b_{0} = f_{2} (0) and b_{j} = \frac{1}{j!} D^{j} [(f_{1} (z))^{j} f_{2} (z)] |_{z = 0}, with D^{j} = \frac{\partial^{j}}{\partial z^{j}} and f_{1}' (z) = \frac{\partial f_{1} (z)}{\partial z}.

The details can be found in [19].

Furthermore, if

0 < f_{1}' (1) < 1 \quad \text{and} \quad D^{j} [(f_{1} (z))^{j} f_{2} (z)] |_{z = 0} \geq 0, \quad j \geq 0,

(2)

the Lagrangian expansion (1) defines the DLF. Thus, a random variable (rv) Y belonging to the DLF has the following probability mass function (pmf):

P (Y = y) = (1 - f_{1}^{'} (1)) \frac{D^{y} [{(f_{1} (z))}^{y} f_{2} (z)]}{y!} |_{z = 0}, y = 0, 1, 2, \dots

(3)

See [19,20] for more information. The corresponding probability generating function (pgf) is given by

G (u) = \frac{(1 - f_{1}' (1)) f_{2} (z)}{1 - \frac{z f_{1}' (z)}{f_{1} (z)}},

(4)

where

z = u f_{1} (z) .

2.2. Importance of the Lagrangian Family

In the following, we list some results highlighting the role of the parametric exponential function

f_{1} (z) = e^{λ (z - 1)},

(5)

with 0 < λ < 1, in the DLF definition; this choice generates several important distributions.

Proposition 1.

The distribution of the DLF defined with f_{1} (z) as in (5) and f_{2} (z) = \frac{(e^{z θ} - 1) (1 - λ z)}{(e^{θ} - 1) (1 - λ)} corresponds to the zero truncated generalized Poisson distribution (ZTGPD) defined by [21].

Proof.

Based on (3), the pmf of the considered distribution is obtained as

\begin{aligned} h_{1} (y) &= \frac{1 - λ}{y!} D^{y} \left[ e^{λ y (z - 1)} \frac{(e^{z θ} - 1) (1 - λ z)}{(e^{θ} - 1) (1 - λ)} \right] \Big|_{z = 0} \\ &= \frac{θ (θ + λ y)^{y - 1} e^{- θ - λ y}}{(1 - e^{- θ}) y!}, \quad y = 1, 2, 3, \dots, \end{aligned}

which corresponds to the pmf of the ZTGPD. Hence, the result. □

Proposition 2.

The distribution of the DLF defined with f_{1} (z) as in (5) and f_{2} (z) = e^{θ (z - 1)} corresponds to the Lagrangian type linear function Poisson distribution given by [22].

Proof.

Based on (3), the pmf of the considered distribution is obtained as

\begin{aligned} h_{2} (y) &= \frac{1 - λ}{y!} D^{y} \left[ e^{λ y (z - 1)} e^{θ (z - 1)} \right] \Big|_{z = 0} \\ &= \frac{1 - λ}{y!} (θ + λ y)^{y} e^{- θ - λ y}, \quad y = 0, 1, 2, \dots, \end{aligned}

which is the pmf of the Lagrangian type linear function Poisson distribution. The distribution correspondence is proved. □

Proposition 3.

The distribution of the DLF defined with f_{1} (z) as in (5) and f_{2} (z) = z corresponds to the Sudha Lagrangian distribution given in [23].

Proof.

Based on (3), the pmf of the considered distribution is obtained as

\begin{aligned} h_{3} (y) &= \frac{1 - λ}{y!} D^{y} \left[ e^{λ y (z - 1)} z \right] \Big|_{z = 0} \\ &= (1 - λ) \frac{e^{- λ y} (λ y)^{y - 1}}{(y - 1)!}, \quad y = 1, 2, 3, \dots, \end{aligned}

which is the pmf of the Sudha Lagrangian distribution. Hence, the result. □

Proposition 4.

The distribution of the DLF defined with f_{1} (z) as in (5) and f_{2} (z) = z^{n}, where n is a positive integer, corresponds to the Lagrangian type weighted delta Poisson distribution given in [23].

Proof.

Based on (3), the pmf of the considered distribution is obtained as

\begin{aligned} h_{4} (y) &= \frac{1 - λ}{y!} D^{y} \left[ e^{λ y (z - 1)} z^{n} \right] \Big|_{z = 0} \\ &= (1 - λ) \frac{e^{- λ y} (λ y)^{y - n}}{(y - n)!}, \quad y = n, n + 1, \dots, \end{aligned}

which is the pmf of the Lagrangian type weighted delta Poisson distribution. The desired result follows. □
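As a numerical sanity check of Propositions 1 to 4, the following Python sketch (an illustration, not code from the paper) evaluates the four pmfs on the log scale and sums each over a truncated support; every total should be close to one for 0 < λ < 1. The helper names and the truncation point y_max = 400 are arbitrary choices for this example.

```python
import math

def log_h1(y, theta, lam):
    """Proposition 1 (ZTGPD), support y = 1, 2, ..."""
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y) - theta - lam * y
            - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))

def log_h2(y, theta, lam):
    """Proposition 2 (Lagrangian type linear function Poisson), support y = 0, 1, ..."""
    return math.log1p(-lam) + y * math.log(theta + lam * y) - theta - lam * y - math.lgamma(y + 1)

def log_h4(y, lam, n):
    """Proposition 4 (weighted delta Poisson), support y = n, n + 1, ...; n = 1 gives Proposition 3."""
    return math.log1p(-lam) - lam * y + (y - n) * math.log(lam * y) - math.lgamma(y - n + 1)

def total_mass(log_pmf, support):
    """Sum exp(log_pmf(y)) over a finite (truncated) support."""
    return sum(math.exp(log_pmf(y)) for y in support)

theta, lam, y_max = 1.5, 0.4, 400
print(total_mass(lambda y: log_h1(y, theta, lam), range(1, y_max)))
print(total_mass(lambda y: log_h2(y, theta, lam), range(0, y_max)))
print(total_mass(lambda y: log_h4(y, lam, 1), range(1, y_max)))   # Sudha Lagrangian
print(total_mass(lambda y: log_h4(y, lam, 3), range(3, y_max)))
```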

In view of the distributions generated by the DLF configured with the function f_{1} (z) in (5), it is natural to explore a new distribution through a new choice of the function f_{2} (z). The distribution studied below is based on this idea.

3. Lagrangian Zero Truncated Poisson Distribution (LZTPD)

In this section, based on the DLF, we explicitly define the LZTPD. We also examine its properties, such as the median, mode, pgf, moment generating function (mgf), factorial moments, index of dispersion (IOD), coefficient of variation (CV) and hazard rate function (hrf). Several propositions are also given to discuss the connections between the LZTPD and certain other Lagrangian distributions.

Definition 1.

The LZTPD is the special distribution of the DLF obtained with the following configuration: f_{1} (z) = e^{λ (z - 1)} and f_{2} (z) = \frac{e^{z θ} - 1}{e^{θ} - 1}. Then, a rv Y is said to follow the LZTPD if its pmf has the following form:

h (y) = \frac{(1 - λ) e^{- θ - λ y} [{(θ + λ y)}^{y} - {(λ y)}^{y}]}{(1 - e^{- θ}) y!}, y = 1, 2, 3, \dots,

(6)

where θ > 0 and 0 < λ < 1.

Proof.

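First, note that f_{1} (z) and f_{2} (z) satisfy the conditions in (2): f_{1}' (1) = λ \in (0, 1), and (f_{1} (z))^{j} f_{2} (z) = e^{- λ j} (e^{(θ + λ j) z} - e^{λ j z}) / (e^{θ} - 1) has nonnegative derivatives of all orders at z = 0, since (θ + λ j)^{k} \geq (λ j)^{k} for k \geq 0. Then the pmf given in (3) can be derived as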

\begin{aligned} P (Y = y) &= \frac{1 - λ}{y!} D^{y} \left[ e^{λ y (z - 1)} \frac{e^{z θ} - 1}{e^{θ} - 1} \right] \Big|_{z = 0} \\ &= \frac{(1 - λ) e^{- θ - λ y} [(θ + λ y)^{y} - (λ y)^{y}]}{(1 - e^{- θ}) y!} . \end{aligned}

Hence, the proof. □

A distribution with the pmf given in (6) will be denoted as LZTPD(θ, λ). As λ \to 0, we have (λ y)^{y} \to 0 for y \geq 1, so the LZTPD(θ, λ) reduces to the ZTPD with pmf e^{- θ} θ^{y} / [(1 - e^{- θ}) y!] (see [24]). As a result, the LZTPD(θ, λ) is a generalization of the ZTPD. Figure 1 and Figure 2 display the pmf of the LZTPD for different values of θ and λ.
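The pmf (6) can be evaluated stably on the log scale by writing log[(θ + λy)^{y} - (λy)^{y}] = y log(θ + λy) + log(1 - (λy/(θ + λy))^{y}). The following Python sketch (function names and the truncation point are illustrative assumptions) checks that the pmf sums to one and that it approaches the ZTPD pmf as λ → 0.

```python
import math

def lztpd_logpmf(y, theta, lam):
    """Log of the LZTPD(theta, lam) pmf in (6) for an integer y >= 1 (theta > 0, 0 < lam < 1)."""
    a = theta + lam * y          # theta + lam*y
    b = lam * y                  # lam*y, with 0 < b < a
    # log(a^y - b^y) = y*log(a) + log(1 - (b/a)^y), which avoids overflow for large y
    log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
    return (math.log1p(-lam) - theta - lam * y + log_diff
            - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))

def lztpd_pmf(y, theta, lam):
    return math.exp(lztpd_logpmf(y, theta, lam))

def ztpd_pmf(y, theta):
    """Zero truncated Poisson pmf, the lam -> 0 limit of (6)."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1)) / (-math.expm1(-theta))

theta, lam = 2.0, 0.3
print(sum(lztpd_pmf(y, theta, lam) for y in range(1, 1001)))  # total mass, close to 1
print(lztpd_pmf(3, theta, 1e-9), ztpd_pmf(3, theta))          # nearly identical
```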

The hrf of the LZTPD is obtained by substituting the pmf (6) into the following equation:

h_{y} = P (Y = y | Y \geq y) = \frac{h (y)}{\sum_{j = y}^{\infty} h (j)} .

(7)

From (7), a closed-form expression of the hrf is not readily available. We have therefore drawn the graph of the hrf to determine its possible shapes.

Figure 3 demonstrates that the LZTPD has an increasing hrf.
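Since (7) has no simple closed form, the hrf can be computed numerically from a truncated pmf. The Python sketch below is one way to do this (helper names and the truncation point y_max are assumptions); it obtains the tail sums of h(j) over j ≥ y with a reversed cumulative sum, which stays accurate in the upper tail, unlike 1 - P(Y ≤ y - 1).

```python
import math
import numpy as np

def lztpd_pmf_array(theta, lam, y_max):
    """LZTPD pmf (6) evaluated on y = 1, ..., y_max (log scale for stability)."""
    y = np.arange(1, y_max + 1, dtype=float)
    a = theta + lam * y
    log_diff = y * np.log(a) + np.log1p(-np.exp(y * np.log(lam * y / a)))
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])
    log_p = np.log1p(-lam) - theta - lam * y + log_diff - np.log(-np.expm1(-theta)) - log_fact
    return np.exp(log_p)

def lztpd_hazard(theta, lam, y_max=300):
    """hrf (7): h_y = h(y) / sum_{j >= y} h(j) for y = 1, ..., y_max.
    Values close to y_max are distorted by truncating the support."""
    p = lztpd_pmf_array(theta, lam, y_max)
    tail = np.cumsum(p[::-1])[::-1]   # tail[k] = P(Y >= k + 1), up to truncation
    return p / tail

haz = lztpd_hazard(2.0, 0.3)
print(np.round(haz[:6], 4))   # hazard at y = 1, ..., 6
```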

Proposition 5.

Let Y be a rv following the LZTPD. Then the median of Y is the smallest integer m in {1, 2, \dots} such that

\sum_{y = 1}^{m} \frac{e^{- λ y} ({(θ + λ y)}^{y} - {(λ y)}^{y})}{y!} \geq \frac{e^{θ} - 1}{2 (1 - λ)} .

(8)

Proof.

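By definition, m is the smallest integer in {1, 2, \dots} such that P (Y \leq m) \geq 1/2. Using (6) and (1 - λ) e^{- θ} / (1 - e^{- θ}) = (1 - λ) / (e^{θ} - 1), this is equivalent to \frac{1 - λ}{e^{θ} - 1} \sum_{y = 1}^{m} \frac{e^{- λ y} [(θ + λ y)^{y} - (λ y)^{y}]}{y!} \geq \frac{1}{2}, which is the desired result. □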
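Criterion (8) translates directly into a cumulative search. The Python sketch below (hypothetical helper names) applies (8) and, for comparison, evaluates the cdf P(Y ≤ m) from the pmf (6); the cdf should cross 1/2 exactly at the returned m.

```python
import math

def log_rho(y, theta, lam):
    """log[(theta + lam*y)^y - (lam*y)^y] for y >= 1, computed stably."""
    a, b = theta + lam * y, lam * y
    return y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))

def median_eq8(theta, lam, y_max=10_000):
    """Smallest m satisfying (8)."""
    target = math.expm1(theta) / (2.0 * (1.0 - lam))
    acc = 0.0
    for m in range(1, y_max + 1):
        acc += math.exp(-lam * m + log_rho(m, theta, lam) - math.lgamma(m + 1))
        if acc >= target:
            return m
    raise ValueError("median not reached; increase y_max")

def lztpd_cdf(m, theta, lam):
    """P(Y <= m) from the pmf (6); equals 0 for m = 0."""
    norm = (1.0 - lam) * math.exp(-theta) / (-math.expm1(-theta))
    return sum(norm * math.exp(-lam * y + log_rho(y, theta, lam) - math.lgamma(y + 1))
               for y in range(1, m + 1))

m = median_eq8(2.0, 0.3)
print(m, lztpd_cdf(m - 1, 2.0, 0.3), lztpd_cdf(m, 2.0, 0.3))   # the cdf crosses 1/2 at m
```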

Proposition 6.

Let Y be a rv following the LZTPD. Then, the mode of Y, denoted by y_{m}, exists in {1, 2, \dots} and satisfies

\frac{ϱ (y_{m} + 1)}{ϱ (y_{m})} - e^{λ} \leq y_{m} e^{λ} \leq \frac{ϱ (y_{m})}{ϱ (y_{m} - 1)},

(9)

where ϱ (y_{m}) = (θ + λ y_{m})^{y_{m}} - (λ y_{m})^{y_{m}}.

Proof.

By the definition of the mode, we must find the integer y = y_{m} for which h (y) has the greatest value. That is, we aim to solve h (y) \geq h (y - 1) and h (y) \geq h (y + 1). First, note that h (y) can also be written as:

h (y) = \frac{1 - λ}{e^{θ} - 1} \frac{e^{- λ y} ϱ (y)}{y!},

(10)

where ϱ (y) = (θ + λ y)^{y} - (λ y)^{y}.

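For y \geq 2, the inequality h (y) \geq h (y - 1) implies that

\frac{ϱ (y)}{ϱ (y - 1)} \geq y e^{λ} .

(11)

For y = 1, the condition h (1) \geq h (0) holds trivially because h (0) = 0 (equivalently, ϱ (0) = 0), so the right-hand inequality in (9) only needs to be checked when y_{m} \geq 2. Also, h (y) \geq h (y + 1) implies that

\frac{ϱ (y + 1)}{ϱ (y)} \leq (y + 1) e^{λ} .

(12)

By combining (11) and (12), we get (9), hence the proof. □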
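Because (9) only brackets the mode, a simple way to use it is to locate the argmax of h(y) by direct search and then confirm (9) at that point. This Python sketch does exactly that (helper names are hypothetical); the search range y_max = 100 is an assumption suited to moderate parameter values, and the right-hand inequality is skipped when y_m = 1, since ϱ(0) = 0.

```python
import math

def rho(y, theta, lam):
    """varrho(y) = (theta + lam*y)^y - (lam*y)^y; note varrho(0) = 0."""
    return (theta + lam * y) ** y - (lam * y) ** y

def lztpd_mode(theta, lam, y_max=100):
    """Brute-force argmax of h(y), using h(y) proportional to e^{-lam y} varrho(y) / y! as in (10)."""
    def log_kernel(y):
        a, b = theta + lam * y, lam * y
        log_rho = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
        return -lam * y + log_rho - math.lgamma(y + 1)
    return max(range(1, y_max + 1), key=log_kernel)

def satisfies_eq9(ym, theta, lam):
    """Check both inequalities in (9); the right one is skipped for ym = 1."""
    e = math.exp(lam)
    left_ok = rho(ym + 1, theta, lam) / rho(ym, theta, lam) - e <= ym * e
    right_ok = ym == 1 or ym * e <= rho(ym, theta, lam) / rho(ym - 1, theta, lam)
    return left_ok and right_ok

for theta, lam in [(5.0, 0.2), (2.0, 0.3), (0.5, 0.5)]:
    ym = lztpd_mode(theta, lam)
    print(theta, lam, ym, satisfies_eq9(ym, theta, lam))
```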

Proposition 7.

The LZTPD(θ, λ) is a member of the modified power series family of distributions defined by [25].

Proof.

According to [25], the pmf of the modified power series distribution (MPSD) is given by

h_{5} (y) = \frac{r_{y} [Ψ (θ)]^{y}}{κ (θ)}, \quad y \in G \subseteq N,

where N is the set of non-negative integers, G is a subset of N, r_{y} \geq 0 for all y \in N, and κ (θ) = \sum_{y \in G} r_{y} [Ψ (θ)]^{y} can be viewed as a normalization constant. By its basic definition, the pmf of the LZTPD in (6) satisfies

\sum_{y = 1}^{\infty} h (y) = 1,

which implies that

\sum_{y = 1}^{\infty} \frac{e^{- λ y} [{(θ + λ y)}^{y} - {(λ y)}^{y}]}{y!} = \frac{e^{θ} - 1}{1 - λ} .

Also, we have

\frac{e^{θ} - 1}{1 - λ} = \sum_{y = 1}^{\infty} \frac{(1 + y)^{y - 1} (θ e^{- θ})^{y}}{(1 - λ) y!} = \sum_{y = 1}^{\infty} r_{y} [Ψ (θ)]^{y},

where r_{y} = \frac{(1 + y)^{y - 1}}{(1 - λ) y!}, Ψ (θ) = θ e^{- θ} and κ (θ) = \sum_{y = 1}^{\infty} r_{y} [Ψ (θ)]^{y}.

Hence, the pmf of the LZTPD given in (6) can be expressed under the following form:

h (y) = \frac{r_{y} {[Ψ (θ)]}^{y}}{κ (θ)} .

This completes the proof. □

Proposition 8.

The pgf of a rv Y following the LZTPD(θ, λ) is expressed as

G (u) = E (u^{Y}) = \frac{(1 - λ) (e^{z θ} - 1)}{(e^{θ} - 1) (1 - λ z)},

(13)

where we recall that z and u are related by the equation z = u e^{λ (z - 1)}.

Proof.

Based on (4), and since f_{1}' (1) = λ and z f_{1}' (z) / f_{1} (z) = λ z, we directly obtain

G (u) = \frac{(1 - f_{1}' (1)) f_{2} (z)}{1 - \frac{z f_{1}' (z)}{f_{1} (z)}} = \frac{(1 - λ) (e^{z θ} - 1)}{(e^{θ} - 1) (1 - λ z)} .

Thus, the proof is obtained. □
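The pgf (13) is defined implicitly through z = u e^{λ (z - 1)}. For 0 ≤ u ≤ 1, the map z ↦ u e^{λ (z - 1)} has slope at most λ < 1 on [0, 1], so a fixed-point iteration converges to the required root. The Python sketch below (an illustration with hypothetical helper names) uses this to evaluate (13) and compares it with E(u^{Y}) summed directly from the pmf (6).

```python
import math

def lztpd_pmf(y, theta, lam):
    """LZTPD pmf (6), evaluated on the log scale."""
    a, b = theta + lam * y, lam * y
    log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
    return math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                    - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))

def solve_z(u, lam, tol=1e-15, max_iter=10_000):
    """Root of z = u * exp(lam * (z - 1)) in [0, 1] for 0 <= u <= 1, by fixed-point iteration."""
    z = 0.0
    for _ in range(max_iter):
        z_new = u * math.exp(lam * (z - 1.0))
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def lztpd_pgf(u, theta, lam):
    """pgf (13): G(u) = (1 - lam)(e^{z theta} - 1) / ((e^theta - 1)(1 - lam z))."""
    z = solve_z(u, lam)
    return (1.0 - lam) * math.expm1(z * theta) / (math.expm1(theta) * (1.0 - lam * z))

def pgf_series(u, theta, lam, y_max=1000):
    """E(u^Y) computed directly from the pmf (6), truncated at y_max."""
    return sum(u ** y * lztpd_pmf(y, theta, lam) for y in range(1, y_max + 1))

for u in (0.2, 0.6, 1.0):
    print(u, lztpd_pgf(u, 2.0, 0.3), pgf_series(u, 2.0, 0.3))
```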

Corollary 1.

The mgf of a rv Y following the LZTPD(θ, λ) is obtained by putting z = e^{s} and u = e^{k} in (13), which gives

M (k) = E (e^{k Y}) = \frac{(1 - λ) (e^{θ e^{s}} - 1)}{(e^{θ} - 1) (1 - λ e^{s})},

where s = k + λ (e^{s} - 1), which follows from taking logarithms in z = u e^{λ (z - 1)}.

Corollary 2.

The cumulant generating function (cgf) of a rv Y following the LZTPD(θ, λ) given in (6) is

C (k) = \log [M (k)] = \log \left[ \frac{(1 - λ) (e^{θ e^{s}} - 1)}{(e^{θ} - 1) (1 - λ e^{s})} \right],

where s = k + λ (e^{s} - 1).
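Corollaries 1 and 2 can be checked numerically: for k near 0, the equation s = k + λ(e^{s} - 1) is solved by fixed-point iteration, and central finite differences of C(k) at k = 0 approximate the first two cumulants, which should agree with the mean and variance given in Proposition 11. The step size h = 10^{-4} and the helper names are assumptions of this Python sketch.

```python
import math

def solve_s(k, lam, tol=1e-16, max_iter=10_000):
    """Solve s = k + lam * (e^s - 1) by fixed-point iteration (valid for k near 0)."""
    s = k
    for _ in range(max_iter):
        s_new = k + lam * math.expm1(s)
        if abs(s_new - s) <= tol:
            return s_new
        s = s_new
    return s

def lztpd_cgf(k, theta, lam):
    """C(k) = log M(k) from Corollaries 1 and 2."""
    es = math.exp(solve_s(k, lam))
    mgf = (1.0 - lam) * math.expm1(theta * es) / (math.expm1(theta) * (1.0 - lam * es))
    return math.log(mgf)

theta, lam, h = 2.0, 0.3, 1e-4
c_plus, c_zero, c_minus = (lztpd_cgf(k, theta, lam) for k in (h, 0.0, -h))
mean_fd = (c_plus - c_minus) / (2 * h)             # approximates C'(0) = E(Y)
var_fd = (c_plus - 2 * c_zero + c_minus) / h ** 2  # approximates C''(0) = Var(Y)
print(mean_fd, var_fd)
```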

Proposition 9.

Let Y_{1}, Y_{2}, \dots, Y_{n} be n independent and identically distributed (iid) rvs following the LZTPD(θ, λ). Then the sample sum V = \sum_{i = 1}^{n} Y_{i} has the following pgf:

Ψ (u) = \frac{(1 - λ)^{n} (e^{z θ} - 1)^{n}}{(e^{θ} - 1)^{n} (1 - λ z)^{n}},

where z = u e^{λ (z - 1)}.

Proof.

Based on the pgf of the LZTPD given in (13), the pgf of the rv V becomes

\begin{aligned} Ψ (u) &= E (u^{V}) = E (u^{Y_{1} + Y_{2} + \dots + Y_{n}}) = \prod_{k = 1}^{n} E (u^{Y_{k}}) = \prod_{k = 1}^{n} G (u) = [G (u)]^{n} \\ &= \frac{(1 - λ)^{n} (e^{z θ} - 1)^{n}}{(e^{θ} - 1)^{n} (1 - λ z)^{n}} . \end{aligned}

This completes the proof. □
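Proposition 9 can be illustrated by convolving the pmf of the LZTPD with itself: the pmf of V obtained this way should reproduce [G(u)]^{n}. The following Python sketch (the truncation at y_max = 400 and the helper names are assumptions) compares both sides for n = 3; note that V takes values in {n, n + 1, ...} because each Y_{i} ≥ 1.

```python
import math
import numpy as np

def lztpd_pmf_vector(theta, lam, y_max):
    """Array p with p[y] = h(y) from (6) for y = 0..y_max (p[0] = 0: zero truncated)."""
    p = np.zeros(y_max + 1)
    for y in range(1, y_max + 1):
        a, b = theta + lam * y, lam * y
        log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
        p[y] = math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                        - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))
    return p

def lztpd_pgf(u, theta, lam, iters=2000):
    """pgf (13), with z found by fixed-point iteration of z = u e^{lam (z - 1)}."""
    z = 0.0
    for _ in range(iters):
        z = u * math.exp(lam * (z - 1.0))
    return (1.0 - lam) * math.expm1(z * theta) / (math.expm1(theta) * (1.0 - lam * z))

theta, lam, n, u = 2.0, 0.3, 3, 0.7
p = lztpd_pmf_vector(theta, lam, 400)
p_sum = p.copy()
for _ in range(n - 1):
    p_sum = np.convolve(p_sum, p)                 # pmf of Y_1 + ... + Y_k, indexed by value
lhs = np.sum(u ** np.arange(p_sum.size) * p_sum)  # E(u^V) from the convolved pmf
rhs = lztpd_pgf(u, theta, lam) ** n               # [G(u)]^n as in Proposition 9
print(lhs, rhs)
```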

Proposition 10.

For any integer r \geq 1, the rth factorial moment of a rv Y following the LZTPD(θ, λ) is given by

\begin{aligned} μ_{[r]} &= E [Y (Y - 1) \dots (Y - r + 1)] \\ &= \left\{ (e^{θ} - 1)^{- 1} D_{r} (e^{z θ}) + \frac{λ \sum_{i = 1}^{r} \binom{r}{i} μ_{[r - i]} D_{i} (u e^{λ (z - 1)})}{1 - λ} \right\} \Big|_{u = z = 1}, \end{aligned}

(14)

where z = u e^{λ (z - 1)}, D_{i} denotes the ith derivative with respect to u, and μ_{[0]} = 1.

Proof.

By definition, μ_{[r]} is obtained by differentiating G (u) given in (4) r times with respect to u and setting u = z = 1. Since z = u f_{1} (z), we have z f_{1}' (z) / f_{1} (z) = u f_{1}' (z), so that (4) can be written as

G (u) = \frac{(1 - f_{1}' (1)) f_{2} (z)}{1 - u f_{1}' (z)},

implying that

(1 - u f_{1}' (z)) G (u) = (1 - f_{1}' (1)) f_{2} (z) .

Taking the first derivative with respect to u on both sides, we get

G (u) D_{1} (1 - u f_{1}^{'} (z)) + G^{'} (u) (1 - u f_{1}^{'} (z)) = (1 - f_{1}^{'} (1)) D_{1} f_{2} (z) .

(15)

Again, by taking the derivative of (15) with respect to u on both sides, we get

G (u) D_{2} (1 - u f_{1}' (z)) + 2 D_{1} (1 - u f_{1}' (z)) G' (u) + (1 - u f_{1}' (z)) G'' (u) = (1 - f_{1}' (1)) D_{2} f_{2} (z) .

Proceeding in this way (by the Leibniz rule), the rth derivative takes the following form:

G^{(r)} (u) = \frac{(1 - f_{1}' (1)) D_{r} f_{2} (z) - \sum_{i = 1}^{r} \binom{r}{i} D_{i} (1 - u f_{1}' (z)) G^{(r - i)} (u)}{1 - u f_{1}' (z)} .

(16)

Substituting f_{1} (z) = e^{λ (z - 1)}, f_{2} (z) = \frac{e^{z θ} - 1}{e^{θ} - 1} and z = u = 1 in (16), and noting that u f_{1}' (z) = λ u e^{λ (z - 1)}, we get (14).

Thus the proof is obtained. □
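The recurrence can be checked numerically. In the Python sketch below (a verification aid, not the authors' code), the derivatives D_{i} z at u = 1 are obtained from the power series of z(u), whose coefficients e^{-λk}(λk)^{k-1}/k! form a Borel pmf, and the derivatives D_{r}(e^{zθ}) come from the Lagrange inversion coefficients θ e^{-λk}(θ + λk)^{k-1}/k! of e^{θ z(u)}. The result is compared with μ_{[r]} summed directly over the pmf (6). The sketch uses the weights \binom{r}{i} given by the Leibniz rule for the rth derivative of (1 - λz)G(u); these coincide with (r - i + 1) when r ≤ 2.

```python
import math

def falling(k, r):
    """Falling factorial k (k - 1) ... (k - r + 1)."""
    out = 1
    for j in range(r):
        out *= k - j
    return out

def derivs_at_one(log_coef, r_max, k_max):
    """r-th derivatives at u = 1 of sum_{k>=1} c_k u^k, i.e. sum_k falling(k, r) c_k."""
    c = [math.exp(log_coef(k)) for k in range(1, k_max + 1)]
    return [sum(falling(k, r) * ck for k, ck in zip(range(1, k_max + 1), c))
            for r in range(r_max + 1)]

def factorial_moments_recurrence(theta, lam, r_max, k_max=600):
    # Coefficients of z(u), the root of z = u e^{lam (z - 1)} (a Borel pmf).
    dz = derivs_at_one(lambda k: -lam * k + (k - 1) * math.log(lam * k) - math.lgamma(k + 1),
                       r_max, k_max)
    # Coefficients of e^{theta z(u)} for k >= 1, by Lagrange inversion.
    de = derivs_at_one(lambda k: math.log(theta) - lam * k + (k - 1) * math.log(theta + lam * k)
                       - math.lgamma(k + 1), r_max, k_max)
    mu = [1.0]                                   # mu_[0] = 1
    for r in range(1, r_max + 1):
        s = sum(math.comb(r, i) * mu[r - i] * dz[i] for i in range(1, r + 1))
        mu.append(de[r] / math.expm1(theta) + lam * s / (1.0 - lam))
    return mu

def factorial_moments_direct(theta, lam, r_max, y_max=600):
    """E[Y (Y - 1) ... (Y - r + 1)] summed directly over the pmf (6)."""
    def pmf(y):
        a, b = theta + lam * y, lam * y
        log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
        return math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                        - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))
    p = [pmf(y) for y in range(1, y_max + 1)]
    return [sum(falling(y, r) * py for y, py in zip(range(1, y_max + 1), p))
            for r in range(r_max + 1)]

print(factorial_moments_recurrence(2.0, 0.3, 4))
print(factorial_moments_direct(2.0, 0.3, 4))
```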

Proposition 11.

The mean and variance of a rv Y following the LZTPD(θ, λ) are

μ = E (Y) = \frac{λ}{{(1 - λ)}^{2}} + \frac{θ}{(1 - e^{- θ}) (1 - λ)}

and

σ^{2} = Var (Y) = \frac{λ + λ^{2}}{{(1 - λ)}^{4}} + \frac{θ^{2} (1 - λ) + θ}{(1 - e^{- θ}) {(1 - λ)}^{3}} - \frac{θ^{2}}{{(1 - e^{- θ})}^{2} {(1 - λ)}^{2}},

respectively.

Proof.

The first two factorial moments can be obtained by using (14) as follows:

\begin{aligned} E (Y) = μ &= \frac{f_{2}' (1)}{1 - f_{1}' (1)} + \frac{f_{1}'' (1) + f_{1}' (1) - (f_{1}' (1))^{2}}{(1 - f_{1}' (1))^{2}} \\ &= \frac{λ}{(1 - λ)^{2}} + \frac{θ}{(1 - e^{- θ}) (1 - λ)} \end{aligned}

and

\begin{aligned} E [Y (Y - 1)] &= \frac{f_{2}'' (1) + f_{1}'' (1) + 4 f_{2}' (1) f_{1}' (1) + 2 (f_{1}' (1))^{2}}{(1 - f_{1}' (1))^{2}} \\ &\quad + \frac{f_{1}''' (1) + f_{1}'' (1) + 3 f_{2}' (1) f_{1}'' (1) + 5 f_{1}' (1) f_{1}'' (1)}{(1 - f_{1}' (1))^{3}} \\ &\quad + \frac{3 (f_{1}'' (1))^{2}}{(1 - f_{1}' (1))^{4}} . \end{aligned}

Furthermore, we have

\begin{aligned} Var (Y) &= σ^{2} = E [Y (Y - 1)] + E (Y) - [E (Y)]^{2} \\ &= \frac{f_{2}'' (1) + f_{2}' (1) - (f_{2}' (1))^{2}}{(1 - f_{1}' (1))^{2}} + \frac{(1 + f_{2}' (1)) (f_{1}'' (1) + f_{1}' (1) - (f_{1}' (1))^{2})}{(1 - f_{1}' (1))^{3}} \\ &\quad + \frac{f_{1}''' (1) + f_{1}' (1) f_{1}'' (1) + 2 f_{1}'' (1)}{(1 - f_{1}' (1))^{3}} + \frac{2 (f_{1}'' (1))^{2}}{(1 - f_{1}' (1))^{4}} \\ &= \frac{λ + λ^{2}}{(1 - λ)^{4}} + \frac{θ^{2} (1 - λ) + θ}{(1 - e^{- θ}) (1 - λ)^{3}} - \frac{θ^{2}}{(1 - e^{- θ})^{2} (1 - λ)^{2}}, \end{aligned}

where f_{1}' (1), f_{1}'' (1), f_{1}''' (1), f_{2}' (1) and f_{2}'' (1) denote the successive derivatives of f_{1} (z) and f_{2} (z), respectively, evaluated at z = 1. Here, f_{1}' (1) = λ, f_{1}'' (1) = λ^{2}, f_{1}''' (1) = λ^{3}, f_{2}' (1) = θ / (1 - e^{- θ}) and f_{2}'' (1) = θ^{2} / (1 - e^{- θ}).

Hence, the proof. □
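The closed forms in Proposition 11 can be compared against moments summed directly over the pmf (6). This Python sketch is illustrative; the function names and the truncation point y_max = 1000 are assumptions, adequate for the parameter values shown.

```python
import math

def lztpd_mean_var(theta, lam):
    """Closed-form mean and variance from Proposition 11."""
    e = -math.expm1(-theta)          # 1 - e^{-theta}
    a = 1.0 - lam
    mean = lam / a ** 2 + theta / (e * a)
    var = (lam + lam ** 2) / a ** 4 + (theta ** 2 * a + theta) / (e * a ** 3) - theta ** 2 / (e ** 2 * a ** 2)
    return mean, var

def lztpd_mean_var_numeric(theta, lam, y_max=1000):
    """Mean and variance summed directly over the pmf (6), truncated at y_max."""
    m1 = m2 = 0.0
    for y in range(1, y_max + 1):
        a, b = theta + lam * y, lam * y
        log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
        p = math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                     - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))
        m1 += y * p
        m2 += y * y * p
    return m1, m2 - m1 ** 2

for theta, lam in [(2.0, 0.3), (0.5, 0.6), (4.0, 0.05)]:
    print(lztpd_mean_var(theta, lam), lztpd_mean_var_numeric(theta, lam))
```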

Proposition 12.

The index of dispersion and coefficient of variation of a rv Y following the LZTPD(θ, λ) are

I O D = \frac{(λ + λ^{2}) {(1 - e^{- θ})}^{2} + (θ^{2} - λ θ^{2} + θ) (1 - λ) (1 - e^{- θ}) - θ^{2} {(1 - λ)}^{2}}{λ {(1 - e^{- θ})}^{2} {(1 - λ)}^{2} + θ (1 - e^{- θ}) {(1 - λ)}^{3}}

and

C V = \frac{\sqrt{(λ + λ^{2}) {(1 - e^{- θ})}^{2} + (θ^{2} - λ θ^{2} + θ) (1 - λ) (1 - e^{- θ}) - θ^{2} {(1 - λ)}^{2}}}{λ (1 - e^{- θ}) + θ (1 - λ)},

respectively.

Proof.

A normalized measure of dispersion is given by the variance-to-mean ratio. This measure is the well-known IOD, and it is given by

\begin{aligned} \mathrm{IOD} &= \frac{σ^{2}}{μ} \\ &= \frac{(λ + λ^{2}) (1 - e^{- θ})^{2} + (θ^{2} - λ θ^{2} + θ) (1 - λ) (1 - e^{- θ}) - θ^{2} (1 - λ)^{2}}{λ (1 - e^{- θ})^{2} (1 - λ)^{2} + θ (1 - e^{- θ}) (1 - λ)^{3}} . \end{aligned}

Analogously, the CV is given by

\begin{aligned} \mathrm{CV} &= \frac{\sqrt{σ^{2}}}{μ} \\ &= \frac{\sqrt{(λ + λ^{2}) (1 - e^{- θ})^{2} + (θ^{2} - λ θ^{2} + θ) (1 - λ) (1 - e^{- θ}) - θ^{2} (1 - λ)^{2}}}{λ (1 - e^{- θ}) + θ (1 - λ)} . \end{aligned}

Hence, the proof. □
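The expressions in Proposition 12 are straightforward to code. The Python sketch below (hypothetical function name) evaluates them; for instance, (θ, λ) = (2, 0.01) gives an IOD below one, whereas (θ, λ) = (1, 0.5) gives an IOD above one.

```python
import math

def lztpd_iod_cv(theta, lam):
    """IOD and CV of the LZTPD(theta, lam) from Proposition 12."""
    e = -math.expm1(-theta)          # 1 - e^{-theta}
    a = 1.0 - lam
    num = (lam + lam ** 2) * e ** 2 + (theta ** 2 - lam * theta ** 2 + theta) * a * e - theta ** 2 * a ** 2
    iod = num / (lam * e ** 2 * a ** 2 + theta * e * a ** 3)
    cv = math.sqrt(num) / (lam * e + theta * a)
    return iod, cv

print(lztpd_iod_cv(2.0, 0.01))   # IOD below 1: under-dispersion
print(lztpd_iod_cv(1.0, 0.5))    # IOD above 1: over-dispersion
```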

The coefficients of skewness and kurtosis measure, respectively, the degree of asymmetry and the flatness of a distribution. The former is obtained by dividing the third central moment by the variance raised to the power 3 / 2, and the latter by dividing the fourth central moment by the square of the variance. Together, these coefficients describe the shape of a distribution. The mean, variance, median, mode, CV, IOD, skewness and kurtosis for selected parameter values of the LZTPD(θ, λ) are summarized in Table 1.
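Skewness and kurtosis have no simple closed form here, but they can be approximated from the truncated pmf. The following Python sketch (the helper name and y_max are assumptions) returns the mean, variance, skewness μ_{3}/σ^{3} and (non-excess) kurtosis μ_{4}/σ^{4}, where μ_{k} denotes the kth central moment.

```python
import math

def lztpd_shape(theta, lam, y_max=1000):
    """Mean, variance, skewness and kurtosis (mu_4 / sigma^4) from the truncated pmf (6)."""
    ys = range(1, y_max + 1)
    ps = []
    for y in ys:
        a, b = theta + lam * y, lam * y
        log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
        ps.append(math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                           - math.log(-math.expm1(-theta)) - math.lgamma(y + 1)))
    mean = sum(y * p for y, p in zip(ys, ps))
    # Central moments of orders 2, 3 and 4
    m2, m3, m4 = (sum((y - mean) ** k * p for y, p in zip(ys, ps)) for k in (2, 3, 4))
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2

print(lztpd_shape(2.0, 0.3))
```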

From Table 1, it can be observed that the LZTPD(θ, λ) can be under-dispersed (IOD < 1) as well as over-dispersed (IOD > 1), depending on the parameter values. This is a key difference from the ZTPD and IPD, which are defined on a similar mathematical basis but can only be under-dispersed.

4. Finite Mixtures of Lagrangian Zero Truncated Poisson Distribution

In recent years, finite mixture models have received much attention in practical situations. Mixture models are widely used in astronomy, biology, genetics, medicine, psychiatry, marketing, etc. For details, see [26]. The properties of finite mixtures of the IPD and the modified IPD are discussed by [27]. In this section, we derive finite mixtures based on the LZTPD(θ, λ). Such a mixture model may be suitable for situations involving further interventions.

Let Z be a discrete rv with pmf h (z) = \sum_{i = 1}^{g} l_{i} h_{i} (z), where, for i = 1, 2, \dots, g, l_{i} > 0 with \sum_{i = 1}^{g} l_{i} = 1, h_{i} (z) \geq 0 and \sum_{z} h_{i} (z) = 1. Then, we say that Z has a mixture distribution and h (z) is a finite mixture of distributions. The parameters l_{1}, l_{2}, \dots, l_{g} are known as the mixing weights and h_{1}, h_{2}, \dots, h_{g} as the components of the mixture. The collection of all parameters occurring in the components is denoted by Θ, and the complete collection of all parameters in the mixture model is denoted by Ψ.

Suppose that Δ = \{U (z; θ_{i}) : θ_{i} \in Θ, z \in R\} is the class of pmfs from which mixtures are to be formed. Then the class of finite mixtures of Δ is

\hat{H} = \{H (z) : H (z) = \sum_{i = 1}^{g} l_{i} U (z; θ_{i}), l_{i} > 0, \sum_{i = 1}^{g} l_{i} = 1, U (z; θ_{i}) \in Δ, i = 1, 2, \dots, g\} .

In this setting, \hat{H} is the convex hull of Δ.

Definition 2.

A rv Z is said to have a g component mixture of LZTPDs if its pmf h (z) = P (Z = z) has the following form:

h (z) = \sum_{i = 1}^{g} l_{i} h_{i} (z), \quad z = 1, 2, \dots,

(17)

where 0 \leq l_{i} \leq 1, \sum_{i = 1}^{g} l_{i} = 1, and, for each i = 1, 2, \dots, g,

h_{i} (z) = \frac{(1 - λ_{i}) e^{- θ_{i} - λ_{i} z} [(θ_{i} + λ_{i} z)^{z} - (λ_{i} z)^{z}]}{(1 - e^{- θ_{i}}) z!},

with 0 < λ_{i} < 1 and θ_{i} > 0.

A distribution with the pmf given in (17) is called the Lagrangian zero truncated Poisson mixture distribution with g components, or in short, LZTPMD_g.
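A minimal Python sketch of the pmf (17), assuming two components with illustrative parameter values (not taken from the paper): the mixture pmf should sum to one, and its mean should equal the weighted average of the component means from Proposition 11.

```python
import math

def lztpd_pmf(y, theta, lam):
    """LZTPD pmf (6), evaluated on the log scale."""
    a, b = theta + lam * y, lam * y
    log_diff = y * math.log(a) + math.log1p(-math.exp(y * math.log(b / a)))
    return math.exp(math.log1p(-lam) - theta - lam * y + log_diff
                    - math.log(-math.expm1(-theta)) - math.lgamma(y + 1))

def lztpmd_pmf(z, weights, thetas, lams):
    """pmf (17): sum_i l_i h_i(z) for a g-component LZTPD mixture."""
    return sum(w * lztpd_pmf(z, t, l) for w, t, l in zip(weights, thetas, lams))

def lztpd_mean(theta, lam):
    """Component mean from Proposition 11."""
    return lam / (1 - lam) ** 2 + theta / ((1 - math.exp(-theta)) * (1 - lam))

weights, thetas, lams = [0.4, 0.6], [1.0, 5.0], [0.2, 0.4]   # illustrative values
support = range(1, 1001)
mass = sum(lztpmd_pmf(z, weights, thetas, lams) for z in support)
mean = sum(z * lztpmd_pmf(z, weights, thetas, lams) for z in support)
print(mass, mean, sum(w * lztpd_mean(t, l) for w, t, l in zip(weights, thetas, lams)))
```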

The following theorem from [28] is adopted to construct the identifiability condition of the finite mixture model:

Theorem 1.

A necessary and sufficient condition for \hat{H} to be identifiable is that Δ is linearly independent over the field of real numbers.

Proof.

Proof is given in [28] and hence omitted. □

We are now able to present the identifiability conditions of the LZTPMD_g.

Theorem 2.

The identifiability conditions for the LZTPMD_g with the pmf h (z) given in (17) are θ_{i} \neq θ_{j} and λ_{i} \neq λ_{j} for all i, j \in \{1, 2, \dots, g\} such that i \neq j.

Proof.

Take g = 2 and consider the equation

d_{1} H_{1} (z) + d_{2} H_{2} (z) = 0,

(18)

where d_{1} and d_{2} are two arbitrary real numbers, H_{1} (z) = \sum_{j = 1}^{z} h (j) and H_{2} (z) = \sum_{j = 1}^{z} ϕ (j) for z = 1, 2, \dots, in which ϕ (j) is obtained from h (j) by replacing θ_{j} by δ_{j} and λ_{j} by γ_{j}.

Assume that θ_{1} \neq θ_{2} and λ_{1} \neq λ_{2}. Then, for l_{1} = l, we have

\begin{aligned} H_{1} (z) = \sum_{j = 1}^{z} \Bigg\{ & l \frac{(1 - λ_{1}) e^{- θ_{1} - λ_{1} j} [(θ_{1} + λ_{1} j)^{j} - (λ_{1} j)^{j}]}{(1 - e^{- θ_{1}}) j!} \\ & + (1 - l) \frac{(1 - λ_{2}) e^{- θ_{2} - λ_{2} j} [(θ_{2} + λ_{2} j)^{j} - (λ_{2} j)^{j}]}{(1 - e^{- θ_{2}}) j!} \Bigg\} \end{aligned}

(19)

and

\begin{aligned} H_{2} (z) = \sum_{j = 1}^{z} \Bigg\{ & l \frac{(1 - γ_{1}) e^{- δ_{1} - γ_{1} j} [(δ_{1} + γ_{1} j)^{j} - (γ_{1} j)^{j}]}{(1 - e^{- δ_{1}}) j!} \\ & + (1 - l) \frac{(1 - γ_{2}) e^{- δ_{2} - γ_{2} j} [(δ_{2} + γ_{2} j)^{j} - (γ_{2} j)^{j}]}{(1 - e^{- δ_{2}}) j!} \Bigg\} . \end{aligned}

(20)

Now, from (18)–(20), we obtain the following equations:

\begin{aligned} & d_{1} \sum_{j = 1}^{z} \frac{(1 - λ_{1}) e^{- θ_{1} - λ_{1} j} [(θ_{1} + λ_{1} j)^{j} - (λ_{1} j)^{j}]}{(1 - e^{- θ_{1}}) j!} \\ & + d_{2} \sum_{j = 1}^{z} \frac{(1 - γ_{1}) e^{- δ_{1} - γ_{1} j} [(δ_{1} + γ_{1} j)^{j} - (γ_{1} j)^{j}]}{(1 - e^{- δ_{1}}) j!} = 0 \end{aligned}

(21)

and

\begin{aligned} & d_{1} \sum_{j = 1}^{z} \frac{(1 - λ_{2}) e^{- θ_{2} - λ_{2} j} [(θ_{2} + λ_{2} j)^{j} - (λ_{2} j)^{j}]}{(1 - e^{- θ_{2}}) j!} \\ & + d_{2} \sum_{j = 1}^{z} \frac{(1 - γ_{2}) e^{- δ_{2} - γ_{2} j} [(δ_{2} + γ_{2} j)^{j} - (γ_{2} j)^{j}]}{(1 - e^{- δ_{2}}) j!} = 0 . \end{aligned}

(22)

Solving (21) and (22), we get

\begin{aligned} & d_{1} \sum_{j = 1}^{z} \frac{(1 - λ_{1}) (1 - γ_{2}) e^{- θ_{1} - λ_{1} j - δ_{2} - γ_{2} j} [(θ_{1} + λ_{1} j)^{j} - (λ_{1} j)^{j}] [(δ_{2} + γ_{2} j)^{j} - (γ_{2} j)^{j}]}{(1 - e^{- θ_{1}}) (1 - e^{- δ_{2}}) j!} \\ & = d_{1} \sum_{j = 1}^{z} \frac{(1 - λ_{2}) (1 - γ_{1}) e^{- θ_{2} - λ_{2} j - δ_{1} - γ_{1} j} [(θ_{2} + λ_{2} j)^{j} - (λ_{2} j)^{j}] [(δ_{1} + γ_{1} j)^{j} - (γ_{1} j)^{j}]}{(1 - e^{- θ_{2}}) (1 - e^{- δ_{1}}) j!}, \end{aligned}

(23)

which implies d_{1} = 0 and, consequently, d_{2} = 0 by (18). Hence, H_{1} and H_{2} are linearly independent, and it can be concluded by Theorem 1 that the mixture is identifiable. The argument extends to any positive integer g, and thus the proof follows. □

Proposition 13.

The pgf of a rv following the LZTPMD_g with pmf given in (17) is

G (u) = \sum_{i = 1}^{g} l_{i} \frac{(1 - λ_{i}) (e^{z_{i} θ_{i}} - 1)}{(e^{θ_{i}} - 1) (1 - λ_{i} z_{i})},

where, for each i, z_{i} is related to u by z_{i} = u e^{λ_{i} (z_{i} - 1)}.

Proof.

Given the pgf of the LZTPD(θ, λ) stated in (13), the proof follows directly from Definition 2 and the linearity of expectation. □

5. Estimation

In this section, we consider two popular estimation methods, namely the method of moments (MM) and maximum likelihood (ML), for the parameters of the LZTPD(θ, λ).

5.1. Maximum Likelihood

Let Y_{1}, Y_{2}, \dots, Y_{n} be n iid rvs following the LZTPD(θ, λ) (so with the pmf given in (6)), and let y_{1}, y_{2}, \dots, y_{n} be n observations. Then, up to the factor \prod_{i = 1}^{n} (y_{i}!)^{- 1}, which does not depend on the parameters, the likelihood function of the parameter vector Θ = (θ, λ) is given by

L_{n} = L_{n} (Θ) = \frac{{(1 - λ)}^{n} e^{- λ \sum_{i = 1}^{n} y_{i}} \prod_{i = 1}^{n} [{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}]}{{(e^{θ} - 1)}^{n}} .

The log-likelihood function is given by

ℓ_{n} = ℓ_{n} (Θ) = \log L_{n} (Θ) = n \log (1 - λ) - λ \sum_{i = 1}^{n} y_{i} + \sum_{i = 1}^{n} \log [(θ + λ y_{i})^{y_{i}} - (λ y_{i})^{y_{i}}] - n \log (e^{θ} - 1) .

(24)

The first partial derivatives of the log-likelihood function ℓ_{n} = \log L_{n} given in (24) with respect to the parameters θ and λ are, respectively, given by

\frac{\partial ℓ_{n}}{\partial θ} = \sum_{i = 1}^{n} \frac{y_{i} (θ + λ y_{i})^{y_{i} - 1}}{(θ + λ y_{i})^{y_{i}} - (λ y_{i})^{y_{i}}} - \frac{n e^{θ}}{e^{θ} - 1}

and

\frac{\partial ℓ_{n}}{\partial λ} = \sum_{i = 1}^{n} \frac{y_{i}^{2} [(θ + λ y_{i})^{y_{i} - 1} - (λ y_{i})^{y_{i} - 1}]}{(θ + λ y_{i})^{y_{i}} - (λ y_{i})^{y_{i}}} - \sum_{i = 1}^{n} y_{i} - \frac{n}{1 - λ} .

The ML estimate (MLE) of Θ, say \hat{Θ} = (\hat{θ}, \hat{λ}), is obtained by solving the likelihood equations \partial ℓ_{n} / \partial θ = 0 and \partial ℓ_{n} / \partial λ = 0 with respect to θ and λ, where ℓ_{n} = \log L_{n}. With these notations, \hat{θ} and \hat{λ} are also called the MLEs of θ and λ, respectively. Analytical solutions to the likelihood equations are not available. However, the MLEs can be computed numerically by maximizing the log-likelihood function given in (24), using the optim function in the R programming language with the L-BFGS-B algorithm. The associated R code is provided in Appendix A.
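The paper performs this maximization in R (Appendix A). As a rough Python analogue, and not the authors' code, the sketch below simulates an LZTPD sample by inverse-cdf sampling on a truncated support and maximizes (24) with SciPy's L-BFGS-B implementation, supplying the score functions written in a numerically stable ratio form. The seed, sample size, starting values and bounds are arbitrary choices for illustration.

```python
import math
import numpy as np
from scipy.optimize import minimize

def lztpd_pmf_array(theta, lam, y_max):
    """LZTPD pmf (6) on y = 1..y_max, computed on the log scale."""
    y = np.arange(1, y_max + 1, dtype=float)
    a = theta + lam * y
    log_diff = y * np.log(a) + np.log1p(-np.exp(y * np.log(lam * y / a)))
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])
    return np.exp(np.log1p(-lam) - theta - lam * y + log_diff
                  - np.log(-np.expm1(-theta)) - log_fact)

def sample_lztpd(theta, lam, size, rng, y_max=1000):
    """Inverse-cdf sampling on the truncated support {1, ..., y_max} (simulation aid only)."""
    cdf = np.cumsum(lztpd_pmf_array(theta, lam, y_max))
    return np.searchsorted(cdf, rng.random(size) * cdf[-1]) + 1

def neg_loglik(params, y):
    """Minus the log-likelihood (24); the constant -sum(log y_i!) is omitted."""
    theta, lam = params
    a, b = theta + lam * y, lam * y
    log_diff = y * np.log(a) + np.log1p(-np.exp(y * np.log(b / a)))
    n = y.size
    return -(n * np.log1p(-lam) - lam * y.sum() + log_diff.sum() - n * np.log(np.expm1(theta)))

def neg_score(params, y):
    """Minus the first partial derivatives of (24), written with r = lam*y / (theta + lam*y)."""
    theta, lam = params
    a = theta + lam * y
    r = lam * y / a
    denom = a * (1.0 - r ** y)                   # equals (a^y - (lam*y)^y) / a^(y - 1)
    d_theta = np.sum(y / denom) - y.size / (-np.expm1(-theta))
    d_lam = np.sum(y ** 2 * (1.0 - r ** (y - 1)) / denom) - y.sum() - y.size / (1.0 - lam)
    return -np.array([d_theta, d_lam])

rng = np.random.default_rng(2022)
y = sample_lztpd(2.0, 0.3, 2000, rng).astype(float)
res = minimize(neg_loglik, x0=np.array([1.0, 0.1]), args=(y,), jac=neg_score,
               method="L-BFGS-B", bounds=[(1e-6, 50.0), (1e-6, 1.0 - 1e-6)])
print(res.x, -res.fun)
```

The bounds keep λ strictly inside (0, 1) and θ positive, which is where (24) is defined.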

To obtain asymptotic confidence intervals for the parameters $θ$ and $λ$, we consider the second partial derivatives of $L_{n}$ evaluated at $\hat{Θ} = (\hat{θ}, \hat{λ})$. The Hessian matrix of the LZTPD($θ, λ$) log-likelihood is

H (\hat{Θ}) = {(\begin{matrix} \frac{\partial^{2} L_{n}}{\partial θ^{2}} & \frac{\partial^{2} L_{n}}{\partial θ \partial λ} \\ \frac{\partial^{2} L_{n}}{\partial λ \partial θ} & \frac{\partial^{2} L_{n}}{\partial λ^{2}} \end{matrix})|}_{Θ = \hat{Θ}},

where

\frac{\partial^{2} L_{n}}{\partial λ^{2}} = - \frac{n}{{(1 - λ)}^{2}} + \sum_{i = 1}^{n} y_{i}^{3} \frac{(y_{i} - 1) [{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}] [{(θ + λ y_{i})}^{y_{i} - 2} - {(λ y_{i})}^{y_{i} - 2}] - y_{i} {[{(θ + λ y_{i})}^{y_{i} - 1} - {(λ y_{i})}^{y_{i} - 1}]}^{2}}{{[{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}]}^{2}},

\frac{\partial^{2} L_{n}}{\partial θ^{2}} = \sum_{i = 1}^{n} \frac{y_{i} {(θ + λ y_{i})}^{y_{i} - 2} \{(y_{i} - 1) [{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}] - y_{i} {(θ + λ y_{i})}^{y_{i}}\}}{{[{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}]}^{2}} + \frac{n}{e^{θ} {(1 - e^{- θ})}^{2}}

and

\frac{\partial^{2} L_{n}}{\partial θ \partial λ} = \sum_{i = 1}^{n} \frac{y_{i}^{2} {(θ + λ y_{i})}^{y_{i} - 1} \{(y_{i} - 1) {(θ + λ y_{i})}^{- 1} [{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}] - y_{i} [{(θ + λ y_{i})}^{y_{i} - 1} - {(λ y_{i})}^{y_{i} - 1}]\}}{{[{(θ + λ y_{i})}^{y_{i}} - {(λ y_{i})}^{y_{i}}]}^{2}} .
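The second derivatives can be checked numerically. The sketch below codes them in the same ratio form as the score, with $s_{i} = θ + λ y_{i}$, $r_{i} = λ y_{i} / s_{i}$ and $(θ + λ y_{i})^{y_{i}} - (λ y_{i})^{y_{i}} = s_{i}^{y_{i}} (1 - r_{i}^{y_{i}})$, obtained by differentiating the score equations once more, and compares them with central differences of the score. Function names are hypothetical.

```r
# Ratio form: s = theta + lambda*y, r = lambda*y/s, u = 1 - r^y, v = 1 - r^(y-1), w = 1 - r^(y-2)
lztp_score <- function(par, y) {
  theta <- par[1]; lambda <- par[2]; n <- length(y)
  s <- theta + lambda * y; r <- lambda * y / s; u <- 1 - r^y
  c(sum(y / (s * u)) - n * exp(theta) / expm1(theta),
    sum(y^2 * (1 - r^(y - 1)) / (s * u)) - sum(y) - n / (1 - lambda))
}

lztp_hessian <- function(par, y) {
  theta <- par[1]; lambda <- par[2]; n <- length(y)
  s <- theta + lambda * y; r <- lambda * y / s
  u <- 1 - r^y
  v <- 1 - r^(y - 1)
  w <- ifelse(y > 1, 1 - r^(y - 2), 0)   # only ever multiplied by (y - 1)
  h_tt <- sum(y * ((y - 1) * u - y) / (s * u)^2) + n * exp(theta) / expm1(theta)^2
  h_ll <- sum(y^3 * ((y - 1) * u * w - y * v^2) / (s * u)^2) - n / (1 - lambda)^2
  h_tl <- sum(y^2 * ((y - 1) * u - y * v) / (s * u)^2)
  matrix(c(h_tt, h_tl, h_tl, h_ll), nrow = 2,
         dimnames = list(c("theta", "lambda"), c("theta", "lambda")))
}

# Compare with central differences of the analytic score
y <- c(1, 2, 2, 3, 5, 8, 13)
p <- c(1.2, 0.4); h <- 1e-6
numeric_H <- cbind((lztp_score(p + c(h, 0), y) - lztp_score(p - c(h, 0), y)) / (2 * h),
                   (lztp_score(p + c(0, h), y) - lztp_score(p - c(0, h), y)) / (2 * h))
max(abs(lztp_hessian(p, y) - numeric_H))   # should be close to 0
```

The `ifelse` guard matters at $λ = 0$, where $r_{i}^{y_{i} - 2}$ is infinite for $y_{i} = 1$ even though its coefficient $(y_{i} - 1)$ is zero.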

Therefore, the observed Fisher information matrix $J (\hat{Θ})$ is the negative of the Hessian matrix, that is, $J (\hat{Θ}) = - H (\hat{Θ})$. Moreover, the estimated variance-covariance matrix of the MLEs is the inverse of the observed Fisher information matrix:

Σ = J^{- 1} (\hat{Θ}) = (\begin{matrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{matrix}),

with $Σ_{12} = Σ_{21}$.

Under the standard regularity conditions, the random version of $\hat{Θ}$ is asymptotically normal; that is, the random version of $\hat{Θ} - Θ$ asymptotically follows the bivariate normal distribution $N_{2} (0, Σ)$. For $ρ \in (0, 1)$, the $100 \times (1 - ρ) \%$ asymptotic confidence intervals for the parameters are given by

θ \in \{\hat{θ} \mp Z_{ρ / 2} \sqrt{Σ_{11}}\}, λ \in \{\hat{λ} \mp Z_{ρ / 2} \sqrt{Σ_{22}}\},

where $Z_{ρ}$ is the upper $100 ρ$th percentile of the standard normal distribution.
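These Wald intervals can be sketched in R by asking `optim` for the Hessian of the negative log-likelihood at the optimum, which is the observed information $J (\hat{Θ})$; inverting it gives $Σ$. The function names and starting values are hypothetical, and the Hessian here is a finite-difference approximation rather than the analytic expressions above.

```r
ldlztp <- function(y, theta, lambda) {
  s <- theta + lambda * y
  log1p(-lambda) - lambda * y + y * log(s) + log1p(-(lambda * y / s)^y) -
    lfactorial(y) - log(expm1(theta))
}
lztp_nll <- function(par, y) -sum(ldlztp(y, par[1], par[2]))

# Hypothetical helper: Wald intervals estimate -/+ z_{rho/2} * sqrt(Sigma_jj)
lztp_wald_ci <- function(y, level = 0.95, start = c(theta = 1, lambda = 0.1)) {
  fit <- optim(start, lztp_nll, y = y, method = "L-BFGS-B",
               lower = c(1e-8, 0), upper = c(Inf, 1 - 1e-8), hessian = TRUE)
  J <- fit$hessian            # Hessian of -log L at the MLE = observed information
  Sigma <- solve(J)           # estimated variance-covariance matrix
  z <- qnorm(1 - (1 - level) / 2)
  se <- sqrt(diag(Sigma))
  list(Sigma = Sigma,
       ci = cbind(estimate = fit$par, lower = fit$par - z * se, upper = fit$par + z * se))
}

enroll <- c(1, 2, rep(3, 6), rep(4, 8), rep(5, 5), rep(6, 4), rep(7, 4), rep(8, 2),
            rep(9, 5), 13, 13, 14, 16, 16, 17, 17, 17, 18, 19, 20, rep(24, 4),
            27, 31, 33, 35, 37)
lztp_wald_ci(enroll)
```

Because $λ$ is restricted to $[0, 1)$, a symmetric Wald interval can extend past the bounds; intervals computed on a transformed scale (for example $\log θ$ and $\mathrm{logit}\, λ$) are a common alternative.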

5.2. Method of Moments

In this portion, the parameters of the LZTPD are estimated by the method of moments (MM). The idea is to equate the theoretical moments to the corresponding empirical moments and solve for the parameters. We use the first and second sample moments, $m_{1}$ and $m_{2}$, respectively, where $m_{k} = n^{- 1} \sum_{i = 1}^{n} y_{i}^{k}$. Using this idea, we have

m_{1} = μ_{1}^{'} = \frac{λ}{{(1 - λ)}^{2}} + \frac{θ}{(1 - e^{- θ}) (1 - λ)}

(25)

and

\begin{matrix} m_{2} = μ_{2}^{'} & = \frac{λ + λ^{2}}{{(1 - λ)}^{4}} + \frac{θ^{2} (1 - λ) + θ}{(1 - e^{- θ}) {(1 - λ)}^{3}} - \frac{θ^{2}}{{(1 - e^{- θ})}^{2} {(1 - λ)}^{2}} \\ & + {[\frac{λ}{{(1 - λ)}^{2}} + \frac{θ}{(1 - e^{- θ}) (1 - λ)}]}^{2} . \end{matrix}

(26)
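The moment formulas can be checked by brute-force summation of the pmf over a long truncated support. The sketch below (names hypothetical, pmf form assumed as in (29)) compares $\sum y\, p(y)$ with (25), and $\sum y^{2} p(y) - (\sum y\, p(y))^{2}$ with the variance $\frac{λ + λ^{2}}{(1 - λ)^{4}} + \frac{θ^{2} (1 - λ) + θ}{(1 - e^{- θ}) (1 - λ)^{3}} - \frac{θ^{2}}{(1 - e^{- θ})^{2} (1 - λ)^{2}}$, that is, the first three terms on the right of (26); the second raw moment is this variance plus $μ_{1}^{'2}$.

```r
ldlztp <- function(y, theta, lambda) {
  s <- theta + lambda * y
  log1p(-lambda) - lambda * y + y * log(s) + log1p(-(lambda * y / s)^y) -
    lfactorial(y) - log(expm1(theta))
}

# Mean (25)
lztp_mean <- function(theta, lambda) {
  lambda / (1 - lambda)^2 + theta / (-expm1(-theta)) / (1 - lambda)
}

# Variance: first three terms on the right-hand side of (26)
lztp_var <- function(theta, lambda) {
  e <- -expm1(-theta)   # 1 - exp(-theta)
  (lambda + lambda^2) / (1 - lambda)^4 +
    (theta^2 * (1 - lambda) + theta) / (e * (1 - lambda)^3) -
    theta^2 / (e^2 * (1 - lambda)^2)
}

y <- 1:400                      # truncated support; the tail beyond 400 is negligible here
p <- exp(ldlztp(y, 1, 0.3))
c(brute_force = sum(y * p), formula = lztp_mean(1, 0.3))
c(brute_force = sum(y^2 * p) - sum(y * p)^2, formula = lztp_var(1, 0.3))
```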

Solving the above two equations gives the method of moments estimates (MMEs) of $θ$ and $λ$, say ${\hat{θ}}_{M M}$ and ${\hat{λ}}_{M M}$, respectively, which are governed by the following equations:

\begin{matrix} {\hat{θ}}_{M M} = & \frac{m_{1} {(1 - {\hat{λ}}_{M M})}^{2} - {\hat{λ}}_{M M}}{{(1 - {\hat{λ}}_{M M})}^{5}} \\ [(m_{2} - m_{1}^{2}) {(1 - {\hat{λ}}_{M M})}^{4} - (m_{1} + {\hat{λ}}_{M M}^{2}) {(1 - {\hat{λ}}_{M M})}^{2} + {(m_{1} {(1 - {\hat{λ}}_{M M})}^{2} - {\hat{λ}}_{M M})}^{2}] \end{matrix}

and

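{\hat{λ}}_{M M} = \frac{(1 + 2 m_{1} - q) - \sqrt{{(1 + 2 m_{1} - q)}^{2} - 4 m_{1} (m_{1} - q)}}{2 m_{1}},

where $q = {\hat{θ}}_{M M} / (1 - e^{- {\hat{θ}}_{M M}})$. The root with the minus sign is the one lying in $[0, 1)$: the quadratic $m_{1} λ^{2} - (1 + 2 m_{1} - q) λ + (m_{1} - q)$ obtained from (25) equals $- 1$ at $λ = 1$, so its larger root always exceeds 1.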
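As an alternative to the closed-form expressions (a swapped-in numerical approach, not the authors' derivation), the MM equations can be solved by nested root finding. For a given $λ$, (25) fixes $q = θ / (1 - e^{- θ}) = m_{1} (1 - λ) - λ / (1 - λ)$, which determines $θ$ uniquely because $q$ increases from 1 to $\infty$; $λ$ is then found by matching the variance (the first three terms of (26)) to $m_{2} - m_{1}^{2}$. The bracket $0 \leq λ < 1 - 1 / \sqrt{m_{1}}$ is exactly where $q > 1$, and it assumes that the sample variance lies between the model variances at its two ends. All names are hypothetical.

```r
q_fun <- function(theta) theta / (-expm1(-theta))          # theta / (1 - e^{-theta})
lztp_mean <- function(theta, lambda) lambda / (1 - lambda)^2 + q_fun(theta) / (1 - lambda)
lztp_var <- function(theta, lambda) {
  e <- -expm1(-theta)
  (lambda + lambda^2) / (1 - lambda)^4 +
    (theta^2 * (1 - lambda) + theta) / (e * (1 - lambda)^3) -
    theta^2 / (e^2 * (1 - lambda)^2)
}

# q(theta) > theta, so the root of q(theta) = target > 1 lies in (0, target)
q_inv <- function(target) {
  uniroot(function(t) q_fun(t) - target, c(1e-12, target), tol = 1e-12)$root
}

# Hypothetical MM solver; m1, m2 are the first two sample raw moments
lztp_mme <- function(m1, m2) {
  s2 <- m2 - m1^2
  # (25) solved for q(theta); q > 1 (theta > 0) requires lambda < 1 - 1/sqrt(m1)
  theta_of <- function(lambda) q_inv(m1 * (1 - lambda) - lambda / (1 - lambda))
  gap <- function(lambda) lztp_var(theta_of(lambda), lambda) - s2
  lam_max <- 1 - 1 / sqrt(m1)
  lambda <- uniroot(gap, c(0, lam_max * (1 - 1e-6)), tol = 1e-12)$root
  c(theta = theta_of(lambda), lambda = lambda)
}

# Round trip on the exact moments of the LZTPD(1, 0.3)
m1 <- lztp_mean(1, 0.3)
lztp_mme(m1, lztp_var(1, 0.3) + m1^2)

# University course enrollments data (Section 9.1)
enroll <- c(1, 2, rep(3, 6), rep(4, 8), rep(5, 5), rep(6, 4), rep(7, 4), rep(8, 2),
            rep(9, 5), 13, 13, 14, 16, 16, 17, 17, 17, 18, 19, 20, rep(24, 4),
            27, 31, 33, 35, 37)
lztp_mme(mean(enroll), mean(enroll^2))
```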

6. Generalized Likelihood Ratio Test

In this section, we test the significance of the additional parameter included in the LZTPD using the generalized likelihood ratio test (GLRT); for details, see [29].

Since the LZTPD($θ, 0$) reduces to the ZTPD, testing the significance of the additional parameter $λ$ of the LZTPD($θ, λ$) amounts to testing the null hypothesis $H_{0}$: $λ = 0$ (Y follows the ZTPD) against the alternative hypothesis $H_{1}$: Y follows the LZTPD($θ, λ$).

In the case of the GLRT, the test statistic is

- 2 \log Λ = - 2 \log (\frac{L_{n} ({\hat{Θ}}^{*})}{L_{n} (\hat{Θ})}),

(27)

where $\hat{Θ}$ is the unrestricted MLE of $Θ = (θ, λ)$ and ${\hat{Θ}}^{*}$ is the MLE of $Θ$ under $H_{0}$. The random version of the test statistic (27) is asymptotically distributed as a chi-square distribution with one degree of freedom.
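A small R helper for the statistic (27) and its asymptotic $χ^{2}_{1}$ p-value, computed with `pchisq(..., lower.tail = FALSE)`. As a worked check, it is applied to the maximized log-likelihoods reported for the ZTPD and the LZTPD in Tables 4 and 6; the helper name `glrt` is hypothetical.

```r
# Statistic -2 log(Lambda) = -2 (log L under H0 - log L unrestricted) and chi-square p-value
glrt <- function(loglik_null, loglik_alt, df = 1) {
  stat <- -2 * (loglik_null - loglik_alt)
  c(statistic = stat, p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Log-likelihoods = minus the -L_n values in Tables 4 and 6 (ZTPD under H0, LZTPD under H1)
glrt(loglik_null = -300.1443, loglik_alt = -186.7358)  # university course enrollments
glrt(loglik_null = -150.0619, loglik_alt = -143.0373)  # demographic data
```

Using `lower.tail = FALSE` rather than `1 - pchisq(...)` keeps very small p-values from being rounded to zero.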

7. Simulation Study

In this section, we present a brief simulation exercise to assess the finite-sample performance of the MLEs, computed with the R programming language (see [30]). The process is repeated $N = 1000$ times for each of two parameter settings: $θ = 1$ and $λ = 0.5$ (over-dispersion case), and $θ = 0.5$ and $λ = 0.1$ (under-dispersion case). For each setting, we computed the average bias and the average mean squared error (MSE) of the MLEs. The results are summarized in Table 2 for samples of sizes 25, 50, 200, 500, and 1000.

The average bias of the simulated estimates equals $\frac{1}{1000} \sum_{i = 1}^{1000} ({\hat{Θ}}_{i} - Θ)$ and the average MSE of the simulated estimates equals $\frac{1}{1000} \sum_{i = 1}^{1000} {({\hat{Θ}}_{i} - Θ)}^{2}$, where $i$ indexes the iterations, $Θ \in \{θ, λ\}$, and ${\hat{Θ}}_{i}$ is the MLE of $Θ$ obtained at the $i$th iteration.
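The bias and MSE definitions above correspond to the following simulation sketch in R. It is not the authors' code: the sampler draws from the pmf restricted to a hypothetical finite support $1, \dots,$ `ymax` (which must be large enough for the chosen $θ$ and $λ$), each replicate is fitted with `optim` under L-BFGS-B, and all names are hypothetical.

```r
ldlztp <- function(y, theta, lambda) {
  s <- theta + lambda * y
  log1p(-lambda) - lambda * y + y * log(s) + log1p(-(lambda * y / s)^y) -
    lfactorial(y) - log(expm1(theta))
}
lztp_nll <- function(par, y) -sum(ldlztp(y, par[1], par[2]))

# Draw n values from the pmf on the truncated support 1..ymax (sample() renormalises prob)
rlztp <- function(n, theta, lambda, ymax = 1000) {
  support <- seq_len(ymax)
  sample(support, n, replace = TRUE, prob = exp(ldlztp(support, theta, lambda)))
}

sim_bias_mse <- function(N, n, theta, lambda) {
  est <- t(replicate(N, optim(c(1, 0.1), lztp_nll, y = rlztp(n, theta, lambda),
                              method = "L-BFGS-B", lower = c(1e-8, 0),
                              upper = c(Inf, 1 - 1e-8))$par))
  true <- c(theta, lambda)
  list(estimate = colMeans(est),
       bias = colMeans(est) - true,                # (1/N) sum(hat - true)
       mse = colMeans(sweep(est, 2, true)^2))      # (1/N) sum((hat - true)^2)
}

set.seed(2022)
sim_bias_mse(N = 100, n = 200, theta = 1, lambda = 0.5)
```

For the full design of Table 2, the call would be repeated with `N = 1000` over the sample sizes 25, 50, 200, 500 and 1000 and both parameter settings.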

From Table 2, it can be observed that, for both parameter sets, the MSEs decrease as the sample size increases, and the MLEs of the parameters get closer to the true parameter values.

8. Lagrangian Zero Truncated Poisson Regression Model

The Poisson regression model is the natural first choice for modelling a discrete response variable with associated covariates. However, it assumes equi-dispersion, and it can produce misleading findings when the response variable is over-dispersed or under-dispersed. Several models have been proposed to deal with such dispersion, including mixed Poisson models and generalized Poisson models. Moreover, we frequently encounter count data that contain no zeros; refs. [31,32] provided examples with length of hospital stay data. In this case, the ZTPRM is a natural candidate. Here, we introduce a novel count regression model called the LZTPRM, which is based on the LZTPD and provides additional options for modelling over-dispersed and under-dispersed zero truncated counts. Finally, we show that the LZTPRM performs well compared to the ZTPRM, ZTGPRM, and IPRM for length of hospital stay data.

To link the covariates to the mean of the response rv X, we use the log-link function

μ = E (X) = e^{y^{T} α},

(28)

where $y^{T} = (y_{1}, y_{2}, \dots, y_{k})$ is the vector of k covariates and $α = (α_{1}, α_{2}, \dots, α_{k})$ is the unknown vector of regression coefficients. Now, using the notation of the LZTPD($θ, λ$), we consider the following re-parametrization, obtained by solving the mean equation (25), with $μ_{1}^{'} = μ$, for $λ$:

λ = \frac{[(2 μ + 1) - \frac{θ}{1 - e^{- θ}}] - \sqrt{{[\frac{θ}{1 - e^{- θ}} - (2 μ + 1)]}^{2} - 4 μ (μ - \frac{θ}{1 - e^{- θ}})}}{2 μ},

the pmf of the LZTPD can be re-expressed as

h (x | θ, μ) = \frac{1 - V}{(e^{θ} - 1) x!} e^{- V x} [{(θ + V x)}^{x} - {(V x)}^{x}],

(29)

where

V = \frac{[(2 μ + 1) - \frac{θ}{1 - e^{- θ}}] - \sqrt{{[\frac{θ}{1 - e^{- θ}} - (2 μ + 1)]}^{2} - 4 μ (μ - \frac{θ}{1 - e^{- θ}})}}{2 μ},

$θ > 0$ and $μ \geq \max \{\frac{θ}{1 - e^{- θ}}, \frac{1}{4} (\frac{2 θ}{1 - e^{- θ}} - \frac{θ^{2}}{{(1 - e^{- θ})}^{2}} - 1)\}$. Based on n independent observations of the regression model, say $(x_{1}, y_{1}^{T}), (x_{2}, y_{2}^{T}), \dots, (x_{n}, y_{n}^{T})$, and substituting (28) in (29), $X_{i} | y_{i}^{T}$ follows the LZTPRM($θ, μ_{i}$), where $y_{i}^{T} = (y_{i 1}, y_{i 2}, \dots, y_{i k})$, with the following pmf:

h (x_{i} | y_{i}^{T}, θ) = \frac{1 - W_{i}}{(e^{θ} - 1) x_{i}!} e^{- W_{i} x_{i}} [{(θ + W_{i} x_{i})}^{x_{i}} - {(W_{i} x_{i})}^{x_{i}}],

where

W_{i} = \frac{[(2 e^{y_{i}^{T} α} + 1) - \frac{θ}{1 - e^{- θ}}] - \sqrt{{[\frac{θ}{1 - e^{- θ}} - (2 e^{y_{i}^{T} α} + 1)]}^{2} - 4 e^{y_{i}^{T} α} (e^{y_{i}^{T} α} - \frac{θ}{1 - e^{- θ}})}}{2 e^{y_{i}^{T} α}} .
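The re-parametrization can be made explicit. Writing $q = θ / (1 - e^{- θ})$ and setting the mean (25) equal to $μ$ gives the quadratic $μ λ^{2} + (q - 2 μ - 1) λ + (μ - q) = 0$. Its discriminant simplifies to $(q - 1)^{2} + 4 μ > 0$, and the quadratic equals $- 1$ at $λ = 1$, so exactly one root lies below 1, namely the smaller one; it is non-negative exactly when $μ \geq q$. The R sketch below assumes this root selection; the names `lambda_from_mu` and `lztp_mean` are hypothetical.

```r
# lambda (V, or W_i when mu = exp(y_i^T alpha)) as a function of the mean mu and theta:
# smaller root of mu*lambda^2 + (q - 2*mu - 1)*lambda + (mu - q) = 0
lambda_from_mu <- function(mu, theta) {
  q <- theta / (-expm1(-theta))
  ((2 * mu + 1 - q) - sqrt((q - 1)^2 + 4 * mu)) / (2 * mu)
}

# Mean (25), used here only to check the round trip
lztp_mean <- function(theta, lambda) {
  lambda / (1 - lambda)^2 + theta / (-expm1(-theta)) / (1 - lambda)
}

lambda_from_mu(lztp_mean(1, 0.5), theta = 1)   # recovers lambda = 0.5
lambda_from_mu(c(2, 5, 10), theta = 1)         # vectorised over mu
```

Because `lambda_from_mu` is vectorised over `mu`, it can be applied directly to the vector of fitted means $e^{y_{i}^{T} α}$.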

The log-likelihood function of the LZTPRM based on a sample of n independent observations $(x_{1}, y_{1}^{T}), (x_{2}, y_{2}^{T}), \dots, (x_{n}, y_{n}^{T})$ is expressed as

\begin{matrix} log L & = \sum_{i = 1}^{n} \{log (1 - W_{i}) - W_{i} x_{i} + log [{(θ + W_{i} x_{i})}^{x_{i}} - {(W_{i} x_{i})}^{x_{i}}] - log x_{i}!\} \\ - n log (e^{θ} - 1) . \end{matrix}

(30)

As in Section 5, the MLEs of the parameters are found with the optim function of the R programming language under the L-BFGS-B algorithm.
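A sketch of fitting the LZTPRM by maximizing (30) with `optim` and L-BFGS-B, on simulated data; the design, the coefficient values and the helper names are hypothetical, and this is not the analysis of the hospital stay data. $W_{i}$ is computed from $μ_{i}$ as the root in $[0, 1)$ of the quadratic obtained from (25), i.e., with the minus sign in front of the square root. Parameter values giving $μ_{i} < θ / (1 - e^{- θ})$ (and hence $W_{i} < 0$) are rejected with a large finite penalty, because L-BFGS-B requires finite objective values.

```r
ldlztp <- function(y, theta, lambda) {
  s <- theta + lambda * y
  log1p(-lambda) - lambda * y + y * log(s) + log1p(-(lambda * y / s)^y) -
    lfactorial(y) - log(expm1(theta))
}
lambda_from_mu <- function(mu, theta) {
  q <- theta / (-expm1(-theta))
  ((2 * mu + 1 - q) - sqrt((q - 1)^2 + 4 * mu)) / (2 * mu)
}

# Negative log-likelihood (30); par = c(theta, alpha), Z = covariate matrix incl. intercept
lztprm_nll <- function(par, x, Z) {
  theta <- par[1]
  W <- lambda_from_mu(exp(drop(Z %*% par[-1])), theta)
  if (!all(is.finite(W)) || any(W < 0)) return(1e10)   # mu_i < q(theta): outside the model
  -sum(ldlztp(x, theta, W))
}

# Simulated data with one binary covariate (hypothetical design)
set.seed(3)
n <- 2000
Z <- cbind(1, rbinom(n, 1, 0.5))
theta0 <- 1; alpha0 <- c(1.5, 0.3)
W0 <- lambda_from_mu(exp(drop(Z %*% alpha0)), theta0)
support <- 1:1000
x <- vapply(W0, function(l) sample(support, 1, prob = exp(ldlztp(support, theta0, l))),
            numeric(1))

start <- c(0.5, log(mean(x)), 0)
fit <- optim(start, lztprm_nll, x = x, Z = Z, method = "L-BFGS-B",
             lower = c(1e-6, -Inf, -Inf), upper = c(50, Inf, Inf))
fit$par   # estimates of (theta, alpha_0, alpha_1)
```

For the hospital stay data, `Z` would be the model matrix of an intercept and the four binary covariates $y_{i1}, \dots, y_{i4}$.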

9. Applications and Empirical Study

This section uses three real datasets to demonstrate the empirical importance of the LZTPD. The first two datasets are used to compare the modelling ability of the LZTPD with that of some competing models, while the third dataset is used for the regression study. The shape of the hrf of each dataset is assessed with a graphical method based on the total time on test (TTT) plot: if the empirical TTT plot is convex, concave, convex then concave, or concave then convex, the associated hrf is decreasing, increasing, bathtub-shaped, or upside-down bathtub-shaped, respectively (see [33]). The following distributions are considered to demonstrate the potential advantage of the LZTPD:

The ZTPD proposed by [1], and defined by the following pmf:

$h_{6} (y) = \frac{θ^{y}}{y! (e^{θ} - 1)}, y = 1, 2, \dots,$

with $θ > 0$ .
The ZTGPD proposed by [21], and specified by the following pmf:

$h_{7} (y) = \frac{θ {(θ + λ y)}^{y - 1} e^{- λ y}}{y! (e^{θ} - 1)}, y = 1, 2, \dots,$

with $θ, λ > 0$ .
The IPD elaborated by [3], and indicated by the following pmf:

$h_{8} (y) = \frac{[{(1 + φ)}^{y} - φ^{y}] ζ^{y}}{e^{φ ζ} (e^{ζ} - 1) y!}, y = 1, 2, \dots$

with $ζ > 0$ and $φ \geq 0$ .
The zero truncated discrete Shankar distribution (ZTDSD) proposed by [34], and defined by the following pmf:

$h_{9} (y) = \frac{(θ^{2} + 1 + θ y) (1 - e^{- θ}) - θ e^{- θ}}{(θ^{2} + θ + 1)}, y = 1, 2, \dots,$

with $θ > 0$ .
The two-parameter zero truncated Poisson-Lindley distribution (ZTPLD) introduced by [35], and indicated by the following pmf:

$h_{10} (y) = \frac{θ^{2}}{θ^{2} + 2 θ α + θ + α} \frac{α y + θ + α + 1}{{(θ + 1)}^{y}}, y = 1, 2, \dots,$

with $θ, α > 0$ .
The zero truncated generalized negative binomial distribution (ZTGNBD) proposed by [36], and defined by the following pmf:

$h_{11} (y) = \frac{θ}{θ + λ y} (\binom{θ + λ y}{y}) \frac{α^{y} {(1 - α)}^{θ + λ y - y}}{1 - {(1 - α)}^{θ}}, y = 1, 2, \dots,$

with $θ, λ > 0$ and $0 < α < 1$ .

The MLEs of the parameters, the negative maximized log-likelihood ($- {\hat{L}}_{n}$), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the $χ^{2}$ goodness-of-fit statistic are now calculated for the first two datasets. The best model is the one with the smallest model adequacy measures, such as the AIC and BIC, and the best-fitting model is the one with the smallest goodness-of-fit statistic ($χ^{2}$).
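The AIC and BIC follow from $- {\hat{L}}_{n}$ as $\mathrm{AIC} = 2 (- {\hat{L}}_{n}) + 2 k$ and $\mathrm{BIC} = 2 (- {\hat{L}}_{n}) + k \log n$, where k is the number of parameters and n the sample size. The sketch below (helper name hypothetical) reproduces the LZTPD and ZTPD entries of Table 4 for the university course enrollments data ($n = 56$).

```r
# AIC and BIC from the negative maximised log-likelihood, k parameters, sample size n
info_criteria <- function(neg_loglik, k, n) {
  c(AIC = 2 * neg_loglik + 2 * k, BIC = 2 * neg_loglik + k * log(n))
}

info_criteria(186.7358, k = 2, n = 56)   # LZTPD row of Table 4
info_criteria(300.1443, k = 1, n = 56)   # ZTPD row of Table 4
```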

9.1. University Course Enrollments

Ref. [37] provided the following data on student enrollments in selected senior mathematics and statistics courses at the University of Calgary over a five-year period:

1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 9, 9, 9, 13, 13, 14, 16, 16, 17, 17, 17, 18, 19, 20, 24, 24, 24, 24, 27, 31, 33, 35, 37.

Table 3 shows the descriptive measures of the data, which include the sample size n, minimum (min), first quartile ($Q_{1}$), median (Md), third quartile ($Q_{3}$), maximum (max), and interquartile range (IQR).

The empirical $IOD$ of the data is equal to $7.7131$, so the data are over-dispersed, and the model used to describe them must be able to handle over-dispersion, as the LZTPD does. In addition, the empirical TTT plot of the data in Figure 4 reveals an increasing hrf.
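The descriptive measures of Table 3, the empirical IOD and the scaled TTT transform can be computed directly from the data listed above. The sketch assumes that the IOD is the sample variance (with the $n - 1$ denominator) divided by the sample mean, which matches the value 7.7131, and uses the usual scaled TTT transform; the helper `ttt` is hypothetical.

```r
enroll <- c(1, 2, rep(3, 6), rep(4, 8), rep(5, 5), rep(6, 4), rep(7, 4), rep(8, 2),
            rep(9, 5), 13, 13, 14, 16, 16, 17, 17, 17, 18, 19, 20, rep(24, 4),
            27, 31, 33, 35, 37)
n <- length(enroll)
c(n = n, min = min(enroll), quantile(enroll, c(0.25, 0.5, 0.75)),
  max = max(enroll), IQR = IQR(enroll))
iod <- var(enroll) / mean(enroll)   # var() uses the n - 1 denominator
iod

# Scaled empirical TTT transform: T(i/n) = [sum_{j <= i} x_(j) + (n - i) x_(i)] / sum(x)
ttt <- function(x) {
  x <- sort(x); n <- length(x); i <- seq_len(n)
  (cumsum(x) + (n - i) * x) / sum(x)
}
tt <- ttt(enroll)
# To draw the TTT plot: plot(seq_len(n) / n, tt, type = "b"); abline(0, 1, lty = 2)
```

A curve lying above the diagonal and bending downward (concave) corresponds to an increasing hrf under the rule recalled at the start of this section.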

The MLEs, model adequacy measures, and $χ^{2}$ value for the data are given in Table 4. From Table 4, it is clear that the LZTPD performs better than all the other competing models considered here, since it has the smallest $χ^{2}$ value and the smallest model adequacy measures.

The observed information matrix at the MLEs, i.e., the negative of the Hessian matrix, is

J (\hat{Θ}) = (\begin{matrix} 1392.5626 & 57.7502 \\ 57.7502 & 4.8318 \end{matrix})

and the estimated variance-covariance matrix is

Σ = (\begin{matrix} 0.0014 & - 0.0170 \\ - 0.0170 & 0.4103 \end{matrix}) .
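Since $Σ$ is the inverse of the observed information matrix, the reported $Σ$ can be checked by inverting the reported $2 \times 2$ matrix. The sketch below does this; it reproduces $Σ$ to within rounding, which indicates that the reported matrix is positive definite and plays the role of $J (\hat{Θ}) = - H (\hat{Θ})$.

```r
J <- matrix(c(1392.5626, 57.7502, 57.7502, 4.8318), nrow = 2)
Sigma <- solve(J)    # estimated variance-covariance matrix
round(Sigma, 4)
det(J) > 0           # non-singular (indeed positive definite, as J11 > 0 too)

# Wald interval for lambda_hat = 0.6113 built from the (1, 1) entry
0.6113 + c(-1, 1) * qnorm(0.975) * sqrt(Sigma[1, 1])
```

The last line gives approximately $(0.5374, 0.6853)$, the interval reported below for $λ$. This suggests, as an inference from the numbers rather than a statement in the source, that the rows and columns of the reported matrix are ordered as $(λ, θ)$.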

The determinant of the observed information matrix $J (\hat{Θ})$ is non-zero, and hence the non-singularity condition of the information matrix is met.

For comparison purposes only, we also compute the MMEs of the LZTPD parameters for the university course enrollments data; they are ${\hat{θ}}_{M M} = 2.0154$ and ${\hat{λ}}_{M M} = 0.5815$. The MMEs are reasonably close to the MLEs.

For the GLRT, the value of the test statistic (27) is $2 (- 186.7358 + 300.1443) = 226.817$ (p-value $< 0.00002$). As a result, the null hypothesis is rejected in favor of the alternative hypothesis at any conventional significance level. Hence, we conclude that the additional parameter $λ$ in the LZTPD is significant in the light of the test procedure outlined in Section 6. Furthermore, the approximate $95 \%$ confidence intervals for $θ$ and $λ$ are given by $(2.3321, 2.7431)$ and $(0.5374, 0.6853)$, respectively.

9.2. Demographic Data

Secondly, we use the data available in [38] as a demographic study; they represent the number of mothers of completed fertility who have experienced at least one child death. Table 5 provides the descriptive measures of the data, such as n, min, $Q_{1}$, Md, $Q_{3}$, max and IQR.

The empirical $IOD$ of the data is equal to $0.6787$. Since this value is below 1, the data are under-dispersed, and the LZTPD, which can also handle under-dispersion, is employed to model them. Furthermore, the empirical TTT plot of the data in Figure 5 indicates an increasing hrf.

Table 6 displays the MLEs, model adequacy measures, and $χ^{2}$ value for the data. From Table 6, it is clear that the LZTPD performs better than all the other competing models considered here, since it has the smallest $χ^{2}$ value and the smallest model adequacy measures.

The Hessian matrix related to the MLEs is obtained as

H (\hat{Θ}) = (\begin{matrix} 500.0923 & 95.5925 \\ 95.5925 & 13.5116 \end{matrix})

and the estimated variance-covariance matrix is

Σ = (\begin{matrix} 0.0020 & - 0.0108 \\ - 0.0108 & 0.0740 \end{matrix}) .

Here again, the determinant of the observed information matrix $J (\hat{Θ})$ is non-zero, indicating that the non-singularity condition is satisfied.

For comparison purposes only, we also compute the MMEs of the LZTPD parameters for the demographic data; they are ${\hat{θ}}_{M M} = 4.1979 \times 10^{- 8}$ and ${\hat{λ}}_{M M} = 0.2361$. The MMEs are close to the MLEs.

For the GLRT, the computed value of the test statistic (27) is $2 (- 143.0373 + 150.0619) = 14.0492$ (p-value $\approx 0.0002$). As a result, the null hypothesis is rejected in favor of the alternative hypothesis at any conventional significance level. Hence, we conclude that the additional parameter $λ$ in the LZTPD is significant in the light of the test procedure outlined in Section 6. The approximate $95 \%$ confidence intervals for $θ$ and $λ$ are given by $(8.7193 \times 10^{- 9}, 9.7470 \times 10^{- 9})$ and $(0.5787, 0.7276)$, respectively.

9.3. Biological Science

The third dataset, azpro, which is available in the COUNT package of the R software, concerns cardiovascular patients in Arizona in 1991. There are 3589 patients, and the aim is to model the length of hospital stay, say $x_{i}$ for the $i$th patient, with the following covariates: $y_{i 1}$ is the cardiovascular procedure (1 for the CABG procedure and 0 for the PTCA procedure), $y_{i 2}$ is the sex of the patient (1 for male and 0 for female patients), $y_{i 3}$ is the type of admission (1 for urgent and 0 for elective), and $y_{i 4}$ is the age group of the patient (1 for age $> 75$ and 0 for age $\leq 75$).

The empirical $IOD$ of the response variable is $5.432$, so the response variable is over-dispersed. The LZTPRM, which is able to handle this over-dispersion, is fitted with the configuration

μ_{i} = e^{α_{0} + α_{1} y_{i 1} + α_{2} y_{i 2} + α_{3} y_{i 3} + α_{4} y_{i 4}} .

The following regression models are used to compare the LZTPRM:

the ZTPRM given in [39];
the ZTGPRM given in [40];
the IPRM given in [18].

Table 7 compares the performance of the LZTPRM with that of the ZTPRM, ZTGPRM, and IPRM, and reports summaries based on the real dataset, such as standard errors (SEs), p-values, negative log-likelihood ($- \log L$), and AIC values. According to Table 7, the LZTPRM has the lowest values across all model selection criteria, indicating that it is the best count regression model among the ZTPRM, ZTGPRM, and IPRM.

10. Discussion

10.1. Brief Summary

Over-dispersed and under-dispersed count data call for new count models, which provide additional options for fitting real datasets by selecting the model appropriate to the situation. In this regard, we developed a new count model for over- and under-dispersed data and analysed its regression counterpart. The primary motivation for creating this model has also been explained. The proposed model outperformed the compared models, including its main competitors, the ZTPD and IPD, on the datasets considered.

10.2. This Work

A novel discrete distribution, the LZTPD, is developed and thoroughly examined. Explicit expressions were given for the median, mode, pgf, cgf, factorial moments, mean, variance, $CV$, skewness, and kurtosis. The distribution parameters were estimated using the classical ML and MM techniques, and a simulation study on the MLEs was conducted. On the basis of a real dataset, a new LZTPD-based regression model for count data is developed and compared to competing regression models. The new model was demonstrated using three real-world datasets: university course enrollments data, demographic data, and healthcare data.

10.3. Contributions and Limitations

A new discrete distribution and an associated count regression model are proposed in this work. We believe the proposed models are suitable for the in-depth analysis of data in demography and health care, and we hope that they can be applied in other fields of study as well. A potential shortcoming of the proposed distribution is that it cannot be bimodal.

10.4. Future Work

Depending on the needs of applied scientists, further features of the LZTPD could be explored, such as generalizations based on conventional ideas, as well as bivariate and multivariate extensions. This requires substantial investigation, which we leave for future research.

11. Concluding Remarks

The LZTPD is suitable in both under-dispersed and over-dispersed count datasets, whereas the IPD is only useful in under-dispersed cases. Several key LZTPD features have been determined, and it has been observed that they are more flexible than those of the IPD. The LZTPD is compared to the well-known IPD and a few other competing distributions, and it is discovered that the LZTPD outperforms competing models for the datasets under consideration.

Author Contributions

Conceptualization, M.R.I., C.C., D.S.S., M.M. and R.M.; methodology, M.R.I., C.C., D.S.S., M.M. and R.M.; software, M.R.I., C.C., D.S.S., M.M. and R.M.; validation, M.R.I., C.C., D.S.S., M.M. and R.M.; formal analysis, M.R.I., C.C., D.S.S., M.M. and R.M.; investigation, M.R.I., C.C., D.S.S., M.M. and R.M.; resources, M.R.I., C.C., D.S.S., M.M. and R.M.; data curation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—original draft preparation, M.R.I., C.C., D.S.S., M.M. and R.M.; writing—review and editing, M.R.I., C.C., D.S.S., M.M. and R.M.; visualization, M.R.I., C.C., D.S.S., M.M. and R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the three reviewers and the associate editor for their constructive comments on the initial version of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

LZTPD	Lagrangian Zero Truncated Poisson Distribution
ZTPD	Zero Truncated Poisson Distribution
IPD	Intervened Poisson Distribution
LF	Lagrangian Family
DLF	Discrete Lagrangian Family
LZTPRM	Lagrangian Zero Truncated Poisson Regression Model
ZTPRM	Zero Truncated Poisson Regression Model
ZTGPRM	Zero Truncated Generalized Poisson Regression Model
IPRM	Intervened Poisson Regression Model
pmf	Probability Mass Function
hrf	Hazard Rate Function
IOD	Index of Dispersion
pgf	Probability Generating Function
mgf	Moment Generating Function
cgf	Cumulant Generating Function
CV	Coefficient of Variation
iid	independent and identically distributed
rv	random variable
ML	Maximum Likelihood
MLEs	Maximum Likelihood Estimates
MM	Method of Moments
MMEs	Method of Moments Estimates
GLRT	Generalized Likelihood Ratio Test
MSE	Mean Squared Error
LZTPMD_g	Lagrangian Zero Truncated Poisson Mixture Distribution with g components
TTT	Total Time on Test
ZTGPD	Zero Truncated Generalized Poisson Distribution
ZTDSD	Zero Truncated Discrete Shankar Distribution
ZTPLD	Zero Truncated Poisson Lindley Distribution
ZTGNBD	Zero Truncated Generalized Negative Binomial Distribution
AIC	Akaike Information Criterion
BIC	Bayesian Information Criterion
IQR	Inter Quartile Range
Md	Median
min	Minimum
max	Maximum
SE	Standard Error

Appendix A

Below is the main R-code for determining the MLEs of the LZTPD parameters, as well as the model adequacy measures.

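library(fitdistrplus)

# pmf of the LZTPD(theta, lambda) on 1, 2, ...:
# (1 - lambda) exp(-lambda x) [(theta + lambda x)^x - (lambda x)^x] / ((exp(theta) - 1) x!),
# computed on the log scale to avoid overflow for large counts
dfn <- function(x, theta, lambda, log = FALSE) {
  s <- theta + lambda * x
  ld <- log1p(-lambda) - lambda * x + x * log(s) +
    log1p(-(lambda * x / s)^x) - lfactorial(x) - log(expm1(theta))
  if (log) ld else exp(ld)
}

# cdf: P(Y <= q), the sum of the pmf over 1, ..., floor(q)
pfn <- function(q, theta, lambda) {
  vapply(q, function(k) if (k < 1) 0 else sum(dfn(seq_len(floor(k)), theta, lambda)),
         numeric(1))
}

# x: vector of observed counts (e.g., the data of Section 9.1)
pfn(x, 3, 0.4)

# Bounds kept slightly inside (0, Inf) x [0, 1) so the log-likelihood stays finite
pre <- prefit(x, "fn", "mle", list(theta = 0.1, lambda = 0.1),
              lower = c(1e-8, 0), upper = c(Inf, 1 - 1e-8))

fit.fn <- fitdist(x, "fn",
                  start = list(theta = pre$theta, lambda = pre$lambda),
                  optim.method = "L-BFGS-B", lower = c(1e-8, 0), upper = c(Inf, 1 - 1e-8),
                  discrete = TRUE)

summary(fit.fn)
gofstat(fit.fn)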

References

1. Cohen, A.C. Estimating parameters in a conditional Poisson distribution. Biometrics 1960, 16, 203–211.
2. Singh, J. A characterization of positive Poisson distribution and its application. SIAM J. Appl. Math. 1978, 34, 545–548.
3. Shanmugam, R. An intervened Poisson distribution and its medical application. Biometrics 1985, 41, 1025–1029.
4. Shanmugam, R. An inferential procedure for the Poisson intervention parameter. Biometrics 1992, 48, 559–565.
5. Kumar, C.S.; Shibu, D.S. Modified intervened Poisson distribution. Statistica 2011, 71, 489–499.
6. Singh, B.P.; Dixit, S.; Shukla, U. An alternative to intervened Poisson distribution for prevalence reduction. J. Math. Stat. Sci. 2016, 2016, 730–740.
7. Lagrange, J.L. Mécanique Analytique; Jacques Gabay: Paris, France, 1788.
8. Consul, P.C.; Shenton, L.R. Use of Lagrange expansion for generating generalized probability distributions. SIAM J. Appl. Math. 1972, 23, 239–248.
9. Consul, P.C.; Shenton, L.R. Some interesting properties of Lagrangian distributions. Commun. Stat. 1973, 2, 263–272.
10. Mohanty, S.G. On a generalized two-coin tossing problem. Biom. Z. 1966, 8, 266–272.
11. Consul, P.C.; Famoye, F. Lagrangian Katz family of distributions. Commun. Stat. Theory Methods 1996, 25, 415–434.
12. Berg, K.; Nowicki, K. Statistical inference for a class of modified power series distribution with applications to random mapping theory. J. Stat. Plan. Inference 1991, 28, 247–261.
13. Li, S.; Black, D.; Lee, C.; Famoye, F. Dependence models arising from the Lagrangian probability distributions. Commun. Stat. Theory Methods 2010, 29, 1729–1742.
14. Innocenti, A.R.; Fox, O.; Chibbaro, S. A Lagrangian probability density-function model for collisional turbulent fluid particle flows I. Model derivation. J. Fluid Mech. 2019, 862, 449–489.
15. Dobson, A.J.; Dobson, A. An Introduction to Generalized Linear Models, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2001.
16. Long, J.S.; Freese, J. Regression Models for Categorical Dependent Variables Using Stata, 2nd ed.; Stata Press: College Station, TX, USA, 2005.
17. Shaw, D. On-site samples regression problems of non-negative integers, truncation, and endogenous stratification. J. Econom. 2005, 37, 211–223.
18. Shibu, D.S. On Intervened Poisson Distribution and Its Generalization. Ph.D. Thesis, University of Kerala, Thiruvananthapuram, India, 2013.
19. Janardan, K.G.; Rao, B.R. Lagrangian distributions of second kind and weighted distributions. SIAM J. Appl. Math. 1983, 43, 302–313.
20. Janardan, K.G. A wider class of Lagrange distributions of the second kind. Commun. Stat. Theory Methods 1997, 26, 2087–2097.
21. Consul, P.C.; Famoye, F. The truncated generalized Poisson distribution and its estimation. Commun. Stat. Theory Methods 1989, 18, 3635–3648.
22. Jain, G.C. A Linear function Poisson distribution. Biom. J. 1975, 17, 501–506.
23. Consul, P.C.; Famoye, F. Lagrangian Probability Distributions; Birkhäuser: New York, NY, USA, 2006.
24. Consul, P.C.; Jain, G.C. A generalization of the Poisson distribution. Technometrics 1973, 15, 791–799.
25. Gupta, R.C. Modified power series distribution and some of its applications. Sankhya Ser. B 1974, 35, 288–298.
26. McLachlan, G.; Peel, D. Finite Mixture Models; Wiley: Hoboken, NJ, USA, 2000.
27. Kumar, C.S.; Shibu, D.S. On finite mixtures of modified intervened Poisson distribution and its applications. J. Stat. Theory Appl. 2014, 13, 344–355.
28. Titterington, D.M.; Smith, A.F.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; Wiley: Hoboken, NJ, USA, 1985.
29. Rao, C.R. Minimum variance and the estimation of several parameters. Math. Proc. Camb. Philos. Soc. 1947, 43, 280–283.
30. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria. Available online: https://www.R-project.org/ (accessed on 6 September 2021).
31. Hardin, J.; Hilbe, J. Generalized Linear Models and Extensions, 2nd ed.; StatCorp LP Texas: College Station, TX, USA, 2007.
32. Hilbe, J.M. Negative Binomial Regression; Cambridge University Press: Cambridge, UK, 2007.
33. Aarset, M.V. How to identify a bathtub hazard rate. IEEE Trans. Reliab. 1987, 36, 106–108.
34. Borah, M.; Saikaia, K.R. Zero-truncated discrete Shankar distribution and its applications. Biom. Biostat. Int. J. 2017, 5, 232–237.
35. Shanker, R.; Shukla, K.K. A zero-truncated two-parameter Poisson-Lindley distribution with an application to biological science. Turk. Klin. J. Biostat. 2017, 9, 85–95.
36. Famoye, F.; Consul, P.C. The truncated generalized negative binomial distribution. J. Appl. Stat. Sci. 1993, 1, 141–157.
37. Huang, M.; Fung, K.Y. Intervened truncated Poisson distribution. Sankhya Ser. B 1989, 51, 302–310.
38. Shanker, R.; Fesshaye, H.; Selvaraj, S.; Yemane, A. On zero-truncation of Poisson and Poisson-Lindley distributions and their applications. Biom. Biostat. Int. J. 2015, 2, 168–181.
39. Wang, Y.; Ong, M.; Liu, H. Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method. In Proceedings of the Section on Statistics in Epidemiology (JSM 2011), Miami Beach, FL, USA, 30 July–4 August 2011.
40. Zhao, W.; Feng, Y.; Li, Z. Zero-truncated generalized Poisson regression model and its score tests. J. East China Norm. Univ. Sci. 2010, 1, 17–23.

Figure 1. Various shapes of the LZTPD pmf when $λ$ increases.


Figure 2. Various shapes of the pmf of the LZTPD when $θ$ increases.


Figure 3. Plots of the hrf of the LZTPD distribution.

Figure 4. TTT plot for the university course enrollments data.

Figure 5. TTT plot for the demographic data.

Table 1. Values for some moment measures of the LZTPD for different values of $θ$ and $λ$.


$θ$	$λ$	Mean	Variance	Median	Mode	$IOD$	$CV$	Skewness	Kurtosis
0.5	0.05	1.4045	1.0797	1	1	0.7687	0.7398	1.4039	5.1257
	0.1	1.5531	1.3000	1	1	0.8370	0.7341	1.9099	7.8036
	0.2	1.9061	1.9476	1	1	1.0217	0.7321	2.0265	10.0765
	0.3	2.3599	3.0631	2	1	1.2979	0.7416	1.3564	14.9253
	0.4	2.9650	5.1144	2	1	1.7248	0.7627	2.8461	15.4684
	0.5	3.8122	9.2482	3	1	2.4259	0.7977	3.5461	17.4684
1	0.05	1.6652	2.4915	1	1	1.4962	0.9478	1.9810	5.1030
	0.1	1.7577	2.8956	2	1	1.6473	0.9680	2.0319	9.7862
	0.2	1.9774	4.0345	2	1	2.0402	1.0157	2.4052	10.8162
	0.3	2.2599	5.9023	2	1	2.6117	1.0750	2.9737	12.2294
	0.4	2.6366	9.1847	3	1	3.4835	1.1494	3.1804	15.0489
	0.5	3.1639	15.5245	4	1	4.9066	1.2453	3.6804	17.0489
2	0.05	2.3739	6.7172	2	2	2.8296	1.0917	2.0589	6.1850
	0.1	2.4415	7.6566	2	2	3.1359	1.1333	2.8669	9.0253
	0.2	2.6021	10.2187	3	2	3.9270	1.2284	3.1943	11.4901
	0.3	2.8086	14.2461	3	2	5.0721	1.3438	3.5848	15.2456
	0.4	3.0840	21.0251	4	2	6.8173	1.4867	3.8983	16.0534
	0.5	3.4695	33.5493	5	2	9.6696	1.6694	3.9983	17.0534
3	0.05	3.2125	13.0707	3	3	4.0686	1.1253	1.4589	6.9185
	0.1	3.2741	14.7966	3	3	4.5192	1.1748	1.9866	10.0253
	0.2	3.4202	19.4386	4	3	5.6833	1.2890	2.6943	11.4901
	0.3	3.6082	26.5960	5	3	7.3709	1.4292	2.9848	13.2456
	0.4	3.8587	38.3929	5	4	9.9494	1.6057	3.1983	16.7534
	0.5	4.2095	59.6845	7	4	14.1782	1.8352	3.8983	18.0534

Table 2. Estimates of the parameters and the corresponding bias and MSE.

Parameter Set	Sample Size	Parameters	Estimates	MSE	Bias
$θ = 1, λ = 0.5$	$n = 25$	$θ$	0.7554	0.0598	−0.2464
		$λ$	0.0379	0.0031	−0.0590
	$n = 50$	$θ$	0.7592	0.0579	−0.2488
		$λ$	0.4490	0.0026	−0.0510
	$n = 200$	$θ$	0.7801	0.0483	−0.2199
		$λ$	0.4601	0.0015	−0.0399
	$n = 500$	$θ$	0.8023	0.0390	−0.1977
		$λ$	0.4780	0.0004	−0.0220
	$n = 1000$	$θ$	0.9567	0.0018	−0.0433
		$λ$	0.4901	0.0001	−0.0099
$θ = 0.5, λ = 0.1$	$n = 25$	$θ$	0.3646	0.0183	−0.1354
		$λ$	0.0300	0.0049	−0.0700
	$n = 50$	$θ$	0.3699	0.0169	−0.1301
		$λ$	0.0542	0.0020	−0.0458
	$n = 200$	$θ$	0.3978	0.0104	−0.1022
		$λ$	0.0801	0.0003	−0.0199
	$n = 500$	$θ$	0.4736	0.0006	−0.0264
		$λ$	0.0891	0.0001	−0.0109
	$n = 1000$	$θ$	0.4983	0.00001	−0.0017
		$λ$	0.0940	0.00003	−0.0060
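
The bias and MSE in Table 2 are the usual Monte Carlo summaries. For $R$ replicated samples with estimates $\hat{θ}_{r}$, $r = 1, \ldots, R$, the bias is the average of the $\hat{θ}_{r}$ minus $θ$, and the MSE is the average of $(\hat{θ}_{r} - θ)^{2}$. The Python sketch below shows this loop under a stated assumption: since the LZTPD mass function is not reproduced in this section, it uses the one-parameter ZTPD as a stand-in model. It draws zero truncated Poisson samples by rejecting zeros from Knuth's Poisson sampler and obtains the MLE by solving $θ/(1 - e^{-θ}) = \bar{x}$ with bisection. The function names, the seed and the number of replications are hypothetical choices. In the actual study, the LZTPD sampler and likelihood would take their place.

```python
import math
import random
from statistics import fmean
from typing import Sequence


def sample_ztp(theta: float, rng: random.Random) -> int:
    """One zero truncated Poisson(theta) draw: Knuth's Poisson sampler, rejecting zeros."""
    limit = math.exp(-theta)
    while True:
        k, p = 0, rng.random()
        while p > limit:  # count uniforms multiplied before the product falls below e^-theta
            k += 1
            p *= rng.random()
        if k > 0:
            return k


def ztp_mle(xs: Sequence[int], tol: float = 1e-10) -> float:
    """MLE of theta for the ZTPD: solve theta / (1 - e^-theta) = sample mean by bisection."""
    xbar = fmean(xs)
    if xbar <= 1.0:  # every observation equals 1: the MLE lies on the boundary theta = 0
        return 0.0
    lo, hi = tol, 2.0 * xbar  # the ZTPD mean increases in theta and exceeds xbar at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def monte_carlo(theta: float, n: int, reps: int, seed: int = 2022):
    """Average estimate, bias and MSE over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    est = [ztp_mle([sample_ztp(theta, rng) for _ in range(n)]) for _ in range(reps)]
    bias = fmean(est) - theta
    mse = fmean([(e - theta) ** 2 for e in est])
    return fmean(est), bias, mse


for n in (25, 50, 200):
    avg, bias, mse = monte_carlo(1.0, n, 500)
    print(f"n={n}: estimate={avg:.4f}, bias={bias:.4f}, MSE={mse:.4f}")
```

Because the MSE equals the variance of the estimates plus the squared bias, it can never fall below the squared bias. This identity is a quick sanity check on any simulation table.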

Table 3. Descriptive statistics for the university course enrollments data.

Statistics	n	min	$Q_{1}$	Md	$Q_{3}$	max	IQR
Values	56	1	4	7	17	37	13
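
Tables 3 and 5 report the same seven summaries, and the IQR column is simply $Q_{3} - Q_{1}$ (17 − 4 = 13 here, and 2 − 1 = 1 in Table 5). The paper does not state which quartile convention it uses, so the Python sketch below assumes the `"inclusive"` method of `statistics.quantiles`. Other conventions can shift $Q_{1}$ and $Q_{3}$ slightly for small samples. The function `describe` and the toy counts are hypothetical and are not the paper's data.

```python
from statistics import median, quantiles


def describe(counts):
    """Summary in the layout of Tables 3 and 5: n, min, Q1, Md, Q3, max, IQR."""
    # The quartile rule ("inclusive" linear interpolation) is an assumption.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return {
        "n": len(counts),
        "min": min(counts),
        "Q1": q1,
        "Md": median(counts),
        "Q3": q3,
        "max": max(counts),
        "IQR": q3 - q1,
    }


# Hypothetical toy counts, not the enrollment or demographic data
print(describe([1, 2, 3, 4, 5]))
```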

Table 4. MLEs, model adequacy measures and $χ^{2}$ value for the university course enrollments data.


Model	MLEs	$- {\hat{L}}_{n}$	$χ^{2}$	AIC	BIC
ZTPD	$θ = 11.25$	300.1443	2997.476	602.2886	604.3140
IPD	$φ = 11.249$	300.1443	2998.871	604.2886	608.3393
	$ζ = 3.3197 \times 10^{- 8}$
ZTDSD	$θ = 0.1709$	187.8862	144.3541	377.7725	381.7978
ZTPLD	$θ = 0.1785$	187.856	144.7294	379.7121	383.7628
	$α = 0.2975$
ZTGPD	$θ = 3.8053$	186.8137	154.0068	377.6273	381.6780
	$λ = 0.6540$
ZTGNBD	$θ = 20.5554$	186.8197	153.9518	379.6394	385.7154
	$λ = 4.0554$
	$α = 0.1687$
LZTPD	$θ = 2.5878$	186.7358	126.0983	377.4716	381.5223
	$λ = 0.6113$
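
The AIC and BIC columns follow from the negative log-likelihood: $\mathrm{AIC} = 2k + 2(-\hat{L}_{n})$ and $\mathrm{BIC} = k \ln n + 2(-\hat{L}_{n})$, where $k$ is the number of estimated parameters (the number of MLEs listed for each model) and $n = 56$. The Python sketch below recomputes both criteria for several rows of Table 4 and picks the model with the smallest AIC. The names `FITS`, `aic` and `bic` are illustrative, and the parameter counts are read off the table.

```python
import math

N_OBS = 56  # sample size of the university course enrollments data (Table 3)

# (model, number of estimated parameters, negative log-likelihood) from Table 4
FITS = [
    ("ZTPD", 1, 300.1443),
    ("IPD", 2, 300.1443),
    ("ZTPLD", 2, 187.856),
    ("ZTGPD", 2, 186.8137),
    ("ZTGNBD", 3, 186.8197),
    ("LZTPD", 2, 186.7358),
]


def aic(k: int, neg_loglik: float) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k + 2 * neg_loglik


def bic(k: int, neg_loglik: float, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return k * math.log(n) + 2 * neg_loglik


for model, k, neg_ll in FITS:
    print(f"{model:7s} k={k}  AIC={aic(k, neg_ll):.4f}  BIC={bic(k, neg_ll, N_OBS):.4f}")

best = min(FITS, key=lambda fit: aic(fit[1], fit[2]))[0]
print("Smallest AIC:", best)
```

The recomputed values agree with Table 4 to the reported precision, and the LZTPD attains the smallest AIC and BIC among these models.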

Table 5. Descriptive statistics for the demographic data.

Statistics	n	min	$Q_{1}$	Md	$Q_{3}$	max	IQR
Values	135	1	1	1	2	6	1

Table 6. MLEs, model adequacy measures and $χ^{2}$ value for the demographic data.


Model	MLEs	$- {\hat{L}}_{n}$	$χ^{2}$	AIC	BIC
ZTPD	$θ = 1.0381$	150.0619	7.9012	302.12	305.029
IPD	$φ = 1.0382$	150.0619	14.863	304.70	309.93
	$ζ = 4.5998 \times 10^{- 10}$
ZTDSD	$θ = 0.9999$	148.8624	12.1887	299.7248	302.6301
ZTPLD	$θ = 1.6466$	143.8747	2.8235	291.7493	297.5599
	$α = 0.00038$
ZTGPD	$θ = 0.2838$	143.3546	1.746	290.709	296.520
	$λ = 0.2855$
ZTGNBD	$θ = 0.2041$	143.2366	1.5554	292.4731	301.189
	$λ = 1.0002$
	$α = 0.5281$
LZTPD	$θ = 4.5593 \times 10^{- 8}$	143.0373	1.304	290.6747	296.4852
	$λ = 0.2112$
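
The $χ^{2}$ column in Tables 4 and 6 is a goodness-of-fit statistic. Its standard Pearson form is $χ^{2} = \sum_{j} (O_{j} - E_{j})^{2}/E_{j}$, where $O_{j}$ are the observed frequencies and $E_{j} = n\hat{p}_{j}$ are the expected frequencies under the fitted model. The grouping of count cells used for these tables is not given in this section, so the Python sketch below is generic and uses hypothetical frequencies. When the statistic is referred to a $χ^{2}$ distribution, the usual degrees of freedom are the number of cells minus one minus the number of estimated parameters.

```python
def expected_counts(n, probabilities):
    """Expected frequencies E_j = n * p_j from fitted cell probabilities."""
    return [n * p for p in probabilities]


def pearson_chi_square(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical three-cell example, not data from the paper
observed = [10, 20, 30]
expected = expected_counts(60, [0.2, 0.3, 0.5])
print(pearson_chi_square(observed, expected))
```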

Table 7. Results of the regression models for the length of hospital stay data (standard errors in parentheses).

	ZTPRM		ZTGPRM		IPRM		LZTPRM
Covariates	Estimate	p-Value	Estimate	p-Value	Estimate	p-Value	Estimate	p-Value
$α_{0}$	1.2367	<0.001	1.1961	<0.001	1.1981	<0.001	2.0181	<0.001
	(0.0213)		(0.0160)		(0.0019)		(0.0021)
$α_{1}$	0.5609	<0.001	0.5931	<0.001	0.5751	<0.001	0.1361	<0.001
	(0.0305)		(0.0280)		(0.0345)		(0.0145)
$α_{2}$	−0.0739	<0.001	−0.0781	<0.001	−0.0766	<0.001	−0.0141	<0.001
	(0.0365)		(0.0156)		(0.0019)		(0.0232)
$α_{3}$	0.1452	<0.001	0.1499	<0.001	0.2908	<0.001	0.0982	<0.001
	(0.0168)		(0.0255)		(0.0217)		(0.0251)
$α_{4}$	0.0934	<0.001	0.0991	<0.001	0.1352	<0.001	0.0142	<0.001
	(0.0134)		(0.0346)		(0.0109)		(0.0019)
$- \log L$	6921.34		6629.25		6579.37		6494.85
AIC	13854.68		13272.50		13172.74		13003.70
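
Each coefficient in Table 7 is reported with its standard error and a p-value. Assuming these p-values come from the usual large-sample Wald test of $H_{0}\colon α_{j} = 0$, they can be reproduced from $z = \hat{α}_{j}/\mathrm{SE}(\hat{α}_{j})$ and the two-sided normal tail probability $2\{1 - Φ(|z|)\}$. The Python sketch below applies this to the LZTPRM estimate of $α_{1}$. The function name `wald_test` is illustrative.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: coefficient = 0 with a standard normal reference."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# LZTPRM coefficient alpha_1 and its standard error, from Table 7
z, p = wald_test(0.1361, 0.0145)
print(f"z = {z:.3f}, p = {p:.2e}")
```

Using `math.erfc` avoids the loss of precision that arises from computing $1 - Φ(|z|)$ directly when $|z|$ is large.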

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Irshad, M.R.; Chesneau, C.; Shibu, D.S.; Monisha, M.; Maya, R. Lagrangian Zero Truncated Poisson Distribution: Properties Regression Model and Applications. Symmetry 2022, 14, 1775. https://doi.org/10.3390/sym14091775


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
