1. Introduction
In probability theory, positive discrete distributions called “zero-truncated distributions” are used to model data that exclude zero counts. Examples include the number of times a voter casts a ballot during a general election, the number of journal articles published in various disciplines, the number of stressful events reported by patients, and the length of a hospital stay, which must be at least one day. Various zero-truncated discrete distributions, such as the zero-truncated Poisson distribution (ZTPD) (see [
1]), zero-truncated negative-binomial distribution (see [
2]), zero-truncated Katz distribution (ZTKD) (see [
3]), zero-truncated generalized negative-binomial distribution (ZTGNBD) (see [
4]), zero-truncated generalized Poisson distribution (see [
5]), intervened Poisson distribution (IPD) (see [
6]), intervened generalized Poisson distribution (IGPD) (see [
7]), a generalization of the Poisson–Sujatha distribution (AGPSD) (see [
8]), and zero-truncated discrete Lindley distribution (ZTDLD) (see [
9]), have been proposed in the literature to model such count data. Despite the abundance of practical situations involving count data without a zero category, zero-truncated discrete distributions remain notably scarce in the scientific literature, in contrast to the vast number of classical discrete distributions.
Since the early 1970s, researchers studying discrete distributions seem to have focused more on “Lagrangian distributions”, so named because they are connected to the Lagrange expansions (see [
10,
11]). The authors in [
12] considered the possibility of using Lagrangian distributions to address inferential problems in a random mapping theory. A study in [
13] showed that, in certain circumstances, all the discrete Lagrangian distributions converged to the Gaussian distribution and the inverse Gaussian distribution. The authors in [
14] proposed certain mixture distributions based on Lagrangian distributions. Recently, Lagrangian distributions were used for turbulent collisional fluid–particle flows (see [
15]). A unified method for creating the class of “quasi” distributions, which includes the quasi-binomial, quasi-Polya, quasi-hypergeometric, and several new quasi-distributions, was presented in [
16] using the Lagrange expansions. As a result, distributions arising from Lagrange expansions have gained traction from both theoretical and applied perspectives.
The Lagrangian distributions of the first kind (
) and the Lagrangian distributions of the second kind (
) were the first divisions of the class of Lagrangian distributions. The authors in [
13] were the first to present and study the
. Several Lagrangian distributions have been constructed using the
, but four fundamental distributions, which are the generalized negative binomial distribution, the generalized geometric series distribution, the generalized Poisson distribution, and the generalized logarithmic series distribution, are of particular note and have proven to be very useful in practical applications (see [
4]). The authors in [
17] defined a Lagrangian Katz distribution (LKD) using the
. The author in [
18] showed that the LKD was a subclass of the generalized Polya–Eggenberger family of distributions. The authors in [
19] obtained the LKD as a limiting distribution of the Markov–Polya distribution. The authors in [
20] discussed the application of the LKD to time series data.
On the other hand, the authors in [
21,
22] conducted extensive research on the
. The Geeta distribution and its characteristics were derived in [
23] based on the
. The authors in [
24] proposed the Dev distribution and some of its applications in queuing theory by using the
. Ref. [
25] proposed the Harish distribution and inferred some of its characteristics, with applications in the branching process and queuing theory based on the
. Furthermore, the authors in [
18] also used the
to create the generalized LKD of type two. The capability of the distributions based on the
strongly appealed to our team, and as a result, we proposed Lagrangian versions of the ZTPD, the zero-truncated binomial distribution, and the IPD (see [
26,
27,
28]). Moreover, the authors in [
24] demonstrated that every member of the
was also a member of the
. Thus, the authors observed from the literature that several members of both
and
were based on variants of classical discrete distributions that have been thoroughly explored in the literature. Analogously, we were motivated to address the scarcity of zero-truncated discrete distributions by considering the probability-generating function (PGF) of the ZTKD and generalizing it through the
and we named the resulting distribution the LZTKD.
An overview of the remaining study sections is provided below:
Section 2 provides a brief summary of the Lagrange expansions. The construction of the LZTKD and its statistical features are explored in
Section 3 and
Section 4, respectively. In
Section 5, it is established that the LZTKD belongs to the
class. In
Section 6, the maximum likelihood (ML) estimation approach is employed to explore the parameter estimation of the LZTKD. The significance of the additional parameter in the LZTKD is evaluated using the likelihood ratio test in
Section 7. The simulation results based on the maximum likelihood estimates (MLEs) are included in
Section 8.
Section 9 provides an empirical illustration of the LZTKD, and
Section 10 concludes the article.
3. Lagrangian Zero-Truncated Katz Distribution (LZTKD)
In this section, we adopt the PMF of the
given in Equation (
4) to derive the PMF of the LZTKD. Here, we consider
as the PGF of the KD with parameters
and
, and
as the PGF of the ZTKD with parameters
and
to generate the LZTKD.
The analytic functions given in Equation (
6) satisfy the conditions presented in
Section 2.3. That is, we have
Then, under the transformation
, the PMF of the
given in Equation (
4) can be derived as follows:
where
=
.
Hence, the definition of the LZTKD can be formalized as follows:
Definition 1. Assume that a random variable (RV) Y follows the LZTKD, with , , and . Then, the PMF of Y is given by
with
This distribution is denoted as LZTKD(), and one can write to inform that Y follows the LZTKD with the parameters , , and .
Now,
Figure 1 portrays the graphical representation of the PMF of the LZTKD for different parameter values of
,
, and
. We see that it is monotonically decreasing for increasing values of the parameters
and
and for decreasing values of the parameter
as the value of
y increases. In addition, this graph takes on a bell-shaped appearance as the value of
y increases if both the
and
parameters increase but the parameter
remains constant.
The hazard rate function (HRF) of the LZTKD is obtained by substituting the PMF in the following equation:
From Equation (
8), it is clear that a closed-form expression of the HRF is difficult to obtain. To determine the shape of the HRF, we therefore plotted it.
Figure 2 demonstrates that it has increasing, decreasing, bathtub, and upside-down bathtub shapes for various parameter values.
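As a rough numerical illustration of how such an HRF can be evaluated when no closed form is available, the following Python sketch assumes the usual discrete definition h(y) = P(Y = y) / P(Y ≥ y). Because the LZTKD PMF is not restated at this point, a zero-truncated Poisson PMF (the hypothetical helper `ztp_pmf`) is used purely as a stand-in; any PMF supported on {1, 2, ...} could be passed to `hazard_rate` instead.

```python
import math

def ztp_pmf(y, lam):
    """Stand-in PMF: zero-truncated Poisson on y = 1, 2, ... (NOT the LZTKD)."""
    if y < 1:
        return 0.0
    log_p = (-lam + y * math.log(lam) - math.lgamma(y + 1)
             - math.log1p(-math.exp(-lam)))
    return math.exp(log_p)

def hazard_rate(pmf, y_max):
    """Return [h(1), ..., h(y_max)] with h(y) = P(Y = y) / P(Y >= y)."""
    rates = []
    survival = 1.0  # P(Y >= 1) = 1 on a zero-truncated support
    for y in range(1, y_max + 1):
        p = pmf(y)
        rates.append(p / survival)
        survival -= p  # now equals P(Y >= y + 1)
    return rates

# Hazard rates of the stand-in distribution for y = 1, ..., 8
hrf = hazard_rate(lambda y: ztp_pmf(y, 2.0), 8)
```

The running subtraction used for the survival probability loses accuracy far in the tail, so the sketch is only meant for moderate values of y.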
Proof. For
, the LZTKD defined with the PMF given in Equation (
7) reduces to the ZTKD; the following PMF is obtained:
In this sense, the LZTKD is a generalization of the ZTKD. □
Proof. For
in Equation (
6), the PMF of the
given in Equation (
4) can be rederived as follows:
which is the PMF of the ZTKD given in [
3]. The proof is completed. □
4. Mathematical Properties
In this section, we present some important mathematical properties of the LZTKD, including the median, mode, factorial moments, mean, variance, coefficient of variation (CV), index of dispersion (IOD), skewness, and kurtosis.
4.1. Median
Let
Y be a RV following the LZTKD. The median of
Y is then defined as the smallest integer
such that
, also written as
4.2. Mode
Let
Y be a RV following the LZTKD. Then, the mode of
Y, denoted by
, exists in
. It corresponds to the integer
y for which the PMF
has the greatest value. That is, we aim to solve
and
. First, we note that
can also be written as
where
.
Obviously, the inequality
implies that
Moreover, the inequality
implies that
By combining Equations (
10) and (
11), we obtain the following condition:
4.3. Probability Generating Function
The Lagrangian transformation
, when expanded in powers of
u, provides the PGF of the
given in Equation (
5). That is,
where
with
.
Remark 1. The moment-generating function (MGF) of a RV Y following the LZTKD is obtained by putting and in Equation (13). This yieldswherewith.
4.4. Distribution of Sample Sum
Let
be
n independently and identically distributed (iid) RVs following the LZTKD. Then, the distribution of the sample sum
has the following PGF:
where
with
.
Indeed, based on the PGF of the LZTKD given in Equation (
13), the PGF of the RV
W becomes
4.5. Factorial Moment
For any integer
, the
rth factorial moment
of the LZTKD is calculated by successively differentiating
in Equation (
4)
r times with respect to
u, and by setting
. Thus, we consider
and
Taking the first derivative with respect to
u on both sides, we obtain
Then, taking the second derivative with respect to
u, we obtain
Proceeding in this way, we obtain the
rth derivative in the following form:
For
, Equation (
15) can be written as
We have
and
, which are substituted in Equation (
16) to yield
4.6. Mean and Variance
The mean and variance of the LZTKD are now determined.
Using Equation (
17), we have
and
4.7. Index of Dispersion and Coefficient of Variation
A normalized measure of dispersion can be obtained by using the variance-to-mean relationship. This measure, the well-known IOD, is given by
Analogously, the CV of the RV
Y has the following form:
The skewness and kurtosis coefficients of a distribution are frequently used to measure the degree of asymmetry and flatness, respectively. These coefficients are essential to characterize the shape of any distribution, but for the LZTKD, their closed-form expressions are too lengthy to report here. However, they can be calculated numerically. Their values are given in
Table 1, together with the mean, variance, CV, and IOD, for particular values of the parameters.
It is clear from this table that for and , the LZTKD exhibits overdispersion (IOD > 1) and for and , the LZTKD exhibits underdispersion (IOD < 1). When the parameter value of increases, the mean and variance of the LZTKD increase. Moreover, it is noteworthy that the LZTKD has various kurtosis levels and is mainly right-skewed.
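To make the numerical calculation of these measures concrete, the Python sketch below sums a PMF over a truncated support and returns the mean, variance, IOD (variance over mean), CV (standard deviation over mean), skewness, and kurtosis. It is only an illustration under stated assumptions: the zero-truncated Poisson PMF `ztp_pmf` is a hypothetical stand-in for the LZTKD PMF, and the truncation point `y_max` is an arbitrary choice that must be large enough for the neglected tail mass to be negligible.

```python
import math

def ztp_pmf(y, lam):
    """Stand-in PMF: zero-truncated Poisson on y = 1, 2, ... (NOT the LZTKD)."""
    log_p = (-lam + y * math.log(lam) - math.lgamma(y + 1)
             - math.log1p(-math.exp(-lam)))
    return math.exp(log_p)

def summary_measures(pmf, y_max=200):
    """Approximate moment-based measures by summing over y = 1, ..., y_max."""
    ys = range(1, y_max + 1)
    probs = [pmf(y) for y in ys]
    mean = sum(y * p for y, p in zip(ys, probs))
    var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
    sd = math.sqrt(var)
    skew = sum(((y - mean) / sd) ** 3 * p for y, p in zip(ys, probs))
    kurt = sum(((y - mean) / sd) ** 4 * p for y, p in zip(ys, probs))
    return {
        "mean": mean,
        "variance": var,
        "IOD": var / mean,  # index of dispersion
        "CV": sd / mean,    # coefficient of variation
        "skewness": skew,
        "kurtosis": kurt,
    }

measures = summary_measures(lambda y: ztp_pmf(y, 2.0))
```

An IOD above 1 in the returned dictionary would indicate overdispersion and a value below 1 underdispersion, which is the comparison made in the table.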
6. Estimation of the Parameters
In this section, we estimate the unknown parameters of the LZTKD by the method of maximum likelihood (ML).
As a first remark, the model related to the LZTKD is a three-parameter model with parameters
,
, and
. Let a random sample of size
n be drawn from the LZTKD and let the observed frequencies be
,
, so that
, where
k is the largest observed value with a nonzero frequency. Then, the corresponding likelihood function is given by
Thus, the log-likelihood function is obtained as
where
.
The maximization of
with respect to the parameters gives their respective MLEs. They can also be obtained by considering the following differentiation approach. The score function associated with this log-likelihood function is
Now, by solving
,
=0, and
simultaneously, we obtain the associated nonlinear log-likelihood equations. Consequently, these equations are given by
and
Thus, the solutions of these three equations give the MLEs.
In this research, we obtained the MLEs by maximizing the log-likelihood function numerically. The
fitdistrplus package for R (run in RStudio) was used, with lower and upper bounds fixed for each parameter through the numerical optimization method “L-BFGS-B” (see [
30]). When there are uncertainties about the initial guesses and the convergence of the algorithm,
fitdistrplus is a highly useful tool for obtaining the MLEs. In order to provide the algorithm with good starting values, we employed the
prefit function of that package. Convergence is reported through an integer code returned as one of the components of the
mledist function: “0” denotes successful convergence, “1” indicates that the maximum number of iterations was reached, “10” indicates degeneracy of the algorithm, and “100” indicates that the optimizer encountered an internal error. Further information about this package is available at
https://CRAN.R-project.org/package=fitdistrplus (accessed on 3 January 2023). The corresponding R code is given in
Appendix A.
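The R code of Appendix A is not reproduced here. Purely as an illustration of the idea of maximizing a grouped-data log-likelihood of the form "sum over y of (frequency of y) times log P(Y = y)" within fixed bounds, the Python sketch below makes several simplifying assumptions: a one-parameter zero-truncated Poisson model (`ztp_log_pmf`) stands in for the three-parameter LZTKD, a golden-section search replaces the L-BFGS-B optimizer, and the frequencies in `toy_freqs` are hypothetical values invented for the example.

```python
import math

def ztp_log_pmf(y, lam):
    """Log-PMF of the zero-truncated Poisson stand-in (NOT the LZTKD)."""
    return (-lam + y * math.log(lam) - math.lgamma(y + 1)
            - math.log1p(-math.exp(-lam)))

def log_likelihood(lam, freqs):
    """Grouped-data log-likelihood; freqs maps each value y to its frequency."""
    return sum(f * ztp_log_pmf(y, lam) for y, f in freqs.items())

def maximize_bounded(func, lower, upper, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal func on [lower, upper]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical frequency table (value: frequency), for illustration only
toy_freqs = {1: 40, 2: 30, 3: 18, 4: 8, 5: 4}
lam_hat = maximize_bounded(lambda lam: log_likelihood(lam, toy_freqs), 0.01, 20.0)
```

For the stand-in model, the estimate can be checked against the moment identity of the zero-truncated Poisson, whose fitted mean must equal the sample mean; no such shortcut is assumed for the LZTKD, where the three score equations have to be solved jointly.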
8. Simulation Study
To evaluate the performance of the estimates obtained using the ML estimation approach, we carried out a brief simulation study in this section. We simulated an LZTKD random sample using the inverse transformation method (see [
32]). The following is the inverse transform algorithm for generating a value from the LZTKD:
- Step 1: Generate a random number U from the uniform distribution on (0, 1).
- Step 2: Set i = 1, P = P(X = 1), and F = P.
- Step 3: If U < F, set X = i and stop.
- Step 4: Set i = i + 1, P = P(X = i), and F = F + P.
- Step 5: Go to Step 3.
In the above description, P is the probability that X = i, and F is the probability that X is less than or equal to i.
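The inverse transform steps can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: it assumes that the support starts at 1, uses a zero-truncated Poisson PMF (`ztp_pmf`, a hypothetical stand-in for the LZTKD PMF), and adds a `max_value` cap, which is not part of the algorithm, to guard against an endless loop caused by floating-point rounding in the tail.

```python
import math
import random

def ztp_pmf(y, lam):
    """Stand-in PMF: zero-truncated Poisson on y = 1, 2, ... (NOT the LZTKD)."""
    log_p = (-lam + y * math.log(lam) - math.lgamma(y + 1)
             - math.log1p(-math.exp(-lam)))
    return math.exp(log_p)

def inverse_transform_sample(pmf, rng, start=1, max_value=10_000):
    """Draw one value by accumulating the PMF until it exceeds a uniform draw."""
    u = rng.random()   # Step 1: uniform random number on [0, 1)
    i = start          # Step 2: first support point, P = P(X = i), F = P
    p = pmf(i)
    f = p
    while u >= f and i < max_value:  # Step 3: stop as soon as U < F
        i += 1                       # Step 4: move to the next support point
        p = pmf(i)
        f += p                       # F is now P(X <= i)
    return i

rng = random.Random(2023)  # fixed seed so that the draws are reproducible
sample = [inverse_transform_sample(lambda y: ztp_pmf(y, 2.0), rng)
          for _ in range(5000)]
```

Recomputing the PMF from scratch at every step keeps the sketch generic; a recursion linking successive probabilities, where one is available, would be cheaper.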
The iteration process was repeated times and three parameter sets were considered. The specification of these sets was as follows:
(i) and .
(ii) , and .
(iii) , and .
We then computed the average absolute bias and the average mean square error (MSE) of the MLEs.
The average absolute bias of the simulated estimates was calculated as and the average MSE of the simulated estimates was calculated as , in which i is the number of iterations, and is the MLE of .
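The two summaries can be computed as in the short Python sketch below. It assumes the common definitions, namely the mean over the replications of the absolute deviation of the estimate from the true value, and the mean of the squared deviation; the function names and the `toy_estimates` values are hypothetical and serve only to show the arithmetic.

```python
def average_absolute_bias(estimates, true_value):
    """Mean of |estimate - true value| over the simulated replications."""
    return sum(abs(est - true_value) for est in estimates) / len(estimates)

def average_mse(estimates, true_value):
    """Mean of (estimate - true value)^2 over the simulated replications."""
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)

# Hypothetical MLEs from four replications with true parameter value 2.0
toy_estimates = [1.5, 2.5, 2.0, 3.0]
bias = average_absolute_bias(toy_estimates, 2.0)  # (0.5 + 0.5 + 0.0 + 1.0) / 4
mse = average_mse(toy_estimates, 2.0)             # (0.25 + 0.25 + 0.0 + 1.0) / 4
```

In a full study, these functions would be applied separately to the estimates of each parameter for every sample size.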
Table 2 provides a summary of the study for samples of sizes 50, 250, 500, and 1000. For all three parameter sets, the MSEs decrease as the sample size increases, and the MLEs move closer to the true parameter values, which is in line with the consistency property of the MLEs.