1. Introduction
Compound discrete distributions serve as probabilistic models in various areas of application, for instance, in ecology, genetics and physics; see, for example, [1]. Distributions obtained by compounding a parent distribution with a discrete distribution are very common in statistics and in many applied areas. Suppose we have a system consisting of N components, the lifetime of each of which is a random variable. Let X be the maximum lifetime of the components. Clearly, X has a compound distribution arising from the random number N of components; i.e., X is the maximum of N random lifetimes. On the other hand, for a system consisting of N components whose energy consumption is a random variable, letting Z denote the minimal energy consumption among the components, we obtain the compound distribution of the minimum of N random variables. The compounding principle is applied in many different areas: insurance [2], ruin problems [3], and compound risk models and their actuarial applications [4,5]. The development of the theory of compounding distributions is skipped here, because it has been covered in detail in [6].
The random variable N is often determined by economic conditions, customer demand, etc. There is also a practical reason why N might be considered a random variable: a failure can occur due to initial defects being present in the system. A discrete version of the Lindley distribution has been studied in [7], with applications to count data related to insurance.
We say that a random variable X has the discrete Lindley distribution introduced in [7] if its probability mass function is given by
where
and
The probability generating function (PGF) given in Equation (4) of [7] contains a typographical error. The corrected version is
In this manuscript, we consider the discrete Lindley distribution introduced above for the random variable N. Why do we assume a discrete Lindley distribution? The Poisson distribution, for example, rests on an important assumption: equidispersion of the data. This assumption often fails in real cases. Some alternative distributions for modeling overdispersed data are available, such as the negative binomial, generalized Poisson or zero-inflated Poisson distributions. However, judging by the number of parameters used, these alternatives are more complex than the Poisson distribution. The continuous Lindley distribution has a single parameter, like the Poisson distribution, but it is less suitable for modeling claim counts, because the number of claims is discrete whereas the Lindley distribution is continuous. That is why we use a discrete Lindley distribution, created through discretisation of the continuous one-parameter Lindley distribution.
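To make the dispersion argument concrete, the following sketch compares the variance-to-mean ratio of a discretised Lindley distribution with the value 1 that holds for any Poisson distribution. It assumes the discretisation $P(N=n)=S(n)-S(n+1)$, where $S$ is the survival function of the continuous one-parameter Lindley distribution with parameter $\theta$. The function names and this parameterisation are illustrative assumptions and may differ from the pmf used in [7].

```python
import math


def lindley_survival(x, theta):
    """Survival function S(x) of the continuous one-parameter Lindley law."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)


def discrete_lindley_pmf(n, theta):
    """Assumed discretisation: P(N = n) = S(n) - S(n + 1), n = 0, 1, 2, ..."""
    return lindley_survival(n, theta) - lindley_survival(n + 1, theta)


def mean_and_variance(theta, n_max=500):
    """Mean and variance from the pmf, truncating the series at n_max terms."""
    probs = [discrete_lindley_pmf(n, theta) for n in range(n_max)]
    mean = sum(n * p for n, p in enumerate(probs))
    second_moment = sum(n * n * p for n, p in enumerate(probs))
    return mean, second_moment - mean ** 2


for theta in (0.5, 1.0, 2.0):
    mean, variance = mean_and_variance(theta)
    # A Poisson law would give a variance-to-mean ratio of exactly 1.
    print(f"theta={theta}: mean={mean:.4f}, variance/mean={variance / mean:.4f}")
```

For these three parameter values the ratio exceeds 1, so under the assumed discretisation the counts are overdispersed relative to a Poisson model.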
Assuming that M is the zero-truncated version of N with PGF (1), we construct two new families of distributions: the discrete Lindley-generated families of distributions of the first and second kinds.
The paper is organized as follows. In Section 2, we construct two discrete Lindley generated families. Section 3 is devoted to shape characteristics. In Section 4, we derive some mathematical properties of the families. Estimation issues are investigated in Sections 5 and 6. The simulation study is presented in Section 7. Two applications to real data are addressed in Section 8. The paper ends with concluding remarks.
2. Construction of the Families of Distributions
There are various methods for obtaining the discrete Lindley distribution. For example, in [8], the authors used the method of infinite series to construct the discrete Lindley distribution. On the other hand, in [9], the discrete Lindley distribution was built using the survival function method. In this manuscript, we employ the so-called max-min procedure. This construction is widely used in practice. For a comprehensive literature review, we refer the reader to [10] and references therein.
In this section, we introduce two new families of distributions as follows. Let
be a sequence of independent and identically distributed (iid) random variables with baseline cumulative distribution function (CDF)
, where
and
is the parameter vector. Suppose that
N is a discrete random variable with the PGF
and let
M have the zero-truncated distribution of the random variable
N obtained by removing zero from
N. Then, the probability mass function (pmf) of
M is given by
In order to prove that
, let us recall that
. After some algebra, we find
Using the series representations
and
, one can calculate
Equation (
3) coincides with
This completes the proof that
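The zero-truncation step can also be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' derivation: it takes the discretised Lindley pmf $P(N=n)=S(n)-S(n+1)$ as the law of N (the paper's parameterisation may differ), forms $P(M=m)=P(N=m)/(1-P(N=0))$ for $m\geq 1$, and verifies that this pmf sums to one and that the PGFs satisfy $G_M(s)=(G_N(s)-G_N(0))/(1-G_N(0))$. All function names are hypothetical.

```python
import math


def lindley_survival(x, theta):
    """Survival function S(x) of the continuous one-parameter Lindley law."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)


def discrete_lindley_pmf(n, theta):
    """Assumed discretisation: P(N = n) = S(n) - S(n + 1), n = 0, 1, 2, ..."""
    return lindley_survival(n, theta) - lindley_survival(n + 1, theta)


def zero_truncated_pmf(m, theta):
    """P(M = m) = P(N = m) / (1 - P(N = 0)) for m >= 1, and 0 otherwise."""
    if m < 1:
        return 0.0
    return discrete_lindley_pmf(m, theta) / (1.0 - discrete_lindley_pmf(0, theta))


def pgf(pmf, s, theta, n_terms=500):
    """Probability generating function E[s^K], truncated at n_terms terms."""
    return sum(pmf(k, theta) * s ** k for k in range(n_terms))


theta, s = 1.0, 0.3
g_n0 = discrete_lindley_pmf(0, theta)  # G_N(0) = P(N = 0)
total = sum(zero_truncated_pmf(m, theta) for m in range(1, 500))
lhs = pgf(zero_truncated_pmf, s, theta)
rhs = (pgf(discrete_lindley_pmf, s, theta) - g_n0) / (1.0 - g_n0)
print(total)     # total mass of the truncated pmf
print(lhs, rhs)  # PGF of M computed in two ways
```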
First, we introduce the family of distributions based on the maximum of random variables. We define the random variable
. Then, the CDF and probability density function (PDF) of
X are given by
and
respectively.
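Since M is independent of the baseline variables, the CDF of their maximum equals the PGF of M evaluated at the baseline CDF, $\sum_{m\geq 1} P(M=m)\,F(x)^m$, and the PDF follows by differentiation. The sketch below evaluates both by a truncated series. It assumes an exponential baseline with rate `rate` and the discretised Lindley law $P(N=n)=S(n)-S(n+1)$ for N; these choices, the truncation point and the function names are illustrative assumptions rather than the closed forms of the paper.

```python
import math


def lindley_survival(x, theta):
    """Survival function S(x) of the continuous one-parameter Lindley law."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)


def zero_truncated_pmf(m, theta):
    """P(M = m) = [S(m) - S(m + 1)] / S(1) for m = 1, 2, ... (assumed law of M)."""
    s1 = lindley_survival(1, theta)
    return (lindley_survival(m, theta) - lindley_survival(m + 1, theta)) / s1


def baseline_cdf(x, rate):
    """Exponential baseline CDF."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def baseline_pdf(x, rate):
    """Exponential baseline PDF."""
    return rate * math.exp(-rate * x) if x > 0 else 0.0


def max_cdf(x, theta, rate, m_max=500):
    """CDF of the maximum: sum over m of P(M = m) * F(x)^m."""
    f = baseline_cdf(x, rate)
    return sum(zero_truncated_pmf(m, theta) * f ** m for m in range(1, m_max))


def max_pdf(x, theta, rate, m_max=500):
    """PDF of the maximum: f(x) * sum over m of m * P(M = m) * F(x)^(m - 1)."""
    f = baseline_cdf(x, rate)
    series = sum(m * zero_truncated_pmf(m, theta) * f ** (m - 1)
                 for m in range(1, m_max))
    return baseline_pdf(x, rate) * series


for x in (0.5, 1.0, 2.0):
    print(x, max_cdf(x, 1.0, 1.0), max_pdf(x, 1.0, 1.0))
```

A simple consistency check is that a central finite difference of `max_cdf` agrees with `max_pdf` at the same point.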
Further, if we suppose that the random variable
N has the PGF given by (
1), the CDF and PDF of
X for
are given by
and
respectively. We say that the family of distributions defined by (
4) and (
5) is the
discrete Lindley generated family of the first kind (“LiF1” for short). A random variable
X having PDF (
5) is denoted by
LiF1
.
The hazard rate function (HRF) of
X can be expressed as
Let us study the identifiability of the distribution given by (
4) under the exponential baseline distribution
. This yields the discrete Lindley exponential distribution of the first kind, which we denote by LiE1.
Theorem 1. The LiE1 distribution is identifiable with respect to the parameters λ and θ.
Proof. Let us suppose that
for all
and when
is the CDF of exponential distribution. If we let
into both sides of (
7) and after some algebra, it can be concluded that
. Now it is not hard to verify that
Hence the proof of the theorem. □
Second, in [
6], it was demonstrated that the random variable
has CDF and PDF given by
and
respectively.
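For the minimum, the same conditioning argument gives the CDF $1-\sum_{m\geq 1}P(M=m)\,(1-F(y))^m$. As a hedged sanity check, the sketch below compares this series with a Monte Carlo estimate. It assumes an exponential baseline with rate `rate` and the discretised Lindley law $P(N=n)=S(n)-S(n+1)$ for N, so that $P(M>m)=S(m+1)/S(1)$; the sample size, seed and function names are arbitrary illustrative choices.

```python
import math
import random


def lindley_survival(x, theta):
    """Survival function S(x) of the continuous one-parameter Lindley law."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)


def min_cdf(y, theta, rate, m_max=500):
    """CDF of the minimum: 1 - sum over m of P(M = m) * exp(-rate * y * m)."""
    s1 = lindley_survival(1, theta)
    total = 0.0
    for m in range(1, m_max):
        p_m = (lindley_survival(m, theta) - lindley_survival(m + 1, theta)) / s1
        total += p_m * math.exp(-rate * y * m)
    return 1.0 - total


def sample_m(theta, rng):
    """Draw M by inversion, using P(M > m) = S(m + 1) / S(1)."""
    v = rng.random()
    s1 = lindley_survival(1, theta)
    m = 1
    while lindley_survival(m + 1, theta) / s1 > v:
        m += 1
    return m


def empirical_min_cdf(y, theta, rate, n_samples=20000, seed=0):
    """Fraction of simulated minima of M exponential variables not exceeding y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        m = sample_m(theta, rng)
        y_min = min(rng.expovariate(rate) for _ in range(m))
        hits += y_min <= y
    return hits / n_samples


print(min_cdf(0.3, 1.0, 1.0), empirical_min_cdf(0.3, 1.0, 1.0))
```

The two printed numbers should agree up to Monte Carlo error, which is of order $1/\sqrt{n}$ for $n$ simulated minima.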
Now, inserting (
1) in Equation (
8), the CDF of the random variable
Y becomes
where
is the survival function of the random variable
.
In a similar manner, by inserting (1) into Equation (9), the PDF of Y reduces to
The random variable
Y having the PDF (
11) is called the discrete Lindley generated family of the second kind,
LiF2
.
From Equations (
10) and (
11), the HRF of
Y follows as
where
is the HRF of the random variable
.
There are at least four motivations for having two families of distributions:
Reliability: From the stochastic representations of X and Y, we note that the two families can arise in parallel and series systems with identical components, which appear in many industrial applications and biological organisms.
First-activation scheme: Suppose that an individual is susceptible to a certain type of cancer. Let M denote the number of carcinogenic cells that survived the initial treatment, and consider the time needed for the ith carcinogenic cell to metastasise into a detectable tumour. If these times form a sequence of iid random variables, all independent of M, where M is given by (2), then the time to relapse of cancer of a susceptible individual is given by the random variable Y.
Last-activation scheme: Let M be the number of latent factors that have to be active for failure to occur, and consider the time of disease resistance due to the latent factor i. According to the last-activation scheme, the failure occurs once all M factors are active. If these times are iid random variables with the baseline distribution F, independent of M, where M follows (2), then the random variable X can model the time to failure under the last-activation scheme.
Times to the last and first failures: Suppose that the device failure happens due to M initial defects, which can be identified only after causing the failure and are then repaired perfectly. Consider the time to the device failure due to defect number i. Under the assumption that these times are iid random variables independent of M given by (2), the random variables X and Y are appropriate for modeling the times to the last and first failures, respectively.
3. Shape Characteristics of the Proposed Models under the Exponential Baseline Distribution
Let us examine the shapes of the PDF and HRF for the case of the exponential baseline distribution. Let the random variables
have the exponential distribution with scale parameter
. If we set
and replace it in (
5), we will get the LiE1 distribution. Its PDF is for
The exponential distribution is widely used due to its simplicity and applicability. For its use in the theory of compounding distributions, we recommend [10], which contains a long list of the corresponding references.
To study the shape of this PDF, we first give the following example, which will be used in the proof of Theorem 2. It plays a crucial role in the study of the inequality that determines the shape of the PDF.
Example 1. Suppose Find λ such that
Solution: An analytical solution of the above inequality is not possible, so we will use numerical algorithms. Let us consider the corresponding equation
Using function
Solve in Mathematica software ([
11]), we get that
Furthermore, using the function
Reduce we see that for
the inequality holds. The graphical solution is given in
Figure 1.
Theorem 2. The PDF of LiE1 with parameters and is unimodal if Otherwise, it is decreasing.
Proof. The first derivative of the logarithm of the PDF
can be represented in the form
where
,
and
. We transform the function
to a quadratic function
,
. Let
and
represent the roots of the equation
. Some calculations indicate that
,
,
and
Thus,
so we have
and
After some calculations, it can be shown that discriminant
is positive and that
is concave. We need to find when solution
If we set
, one gets
If
, then
It is not difficult to verify that the right-hand side of the last inequality is positive, so we can square both sides of (13). Then, the inequality (13) reduces to
Now, the assertion of the first part of the theorem follows from Example 1.
In case , is always positive on the interval , and hence the PDF is decreasing. □
Different shapes of the PDF of the LiE1 model are given in Figure 2.
The HRF of the LiE1 distribution is
Determining the shape of a HRF of a distribution is an important issue in statistical reliability and survival analysis. We give it for the LiE1 model in the following theorem.
Theorem 3. The HRF of the LiE1 with parameters and is an increasing function.
Proof. The first derivative of the
can be represented as
where
a and
b were defined in Theorem 2,
and
. After extensive calculations, it can be shown that
Again, using the transformation
where
, we get quadratic equation
with
Thus, we have
The function
is concave, and it holds that
for all
Hence, the HRF is increasing, which proves the theorem. □
Different shapes of the HRF in the case of the LiE1 model are outlined in
Figure 3.
Now, we study the shapes of the discrete Lindley exponential distribution of the second kind (LiE2). By replacing
in Equation (
11), we obtain the PDF of the LiE2 distribution as
The shapes of the LiE2 distribution are given by the following theorem.
Theorem 4. The PDF of the LiE2 with parameters and is a decreasing function with and .
Proof. Similarly to the proof of Theorem 2, we have
where
,
and
. We can prove that
is positive for all
. Letting
, we transform the function
to a quadratic function
;
. Let
represent the roots of the equation
. Since we have
,
and
,
which implies
. Since
and the discriminant
is positive, it follows that
is concave and positive on
, which means that
is positive for
. Finally,
is positive for all
and
. □
The HRF of the LiE2 distribution for
is given by
The shape of the HRF of the LiE2 distribution is given in the following theorem.
Theorem 5. The HRF of the LiE2 distribution with parameters and is an increasing function with and .
Proof. We consider the logarithm of the HRF
. Its first derivative can be expressed as
where
a and
b are defined as in the proof of the previous theorem,
and
. By letting
, we transform the function
to the quadratic function
;
. As before, let
be the roots of the equation
. Some calculations indicate that
,
,
and
, which implies that
Thus, two cases can be considered,
and
. The first case is not possible, since
which follows from the fact that
. Thus,
. Since
and the discriminant
is positive, it follows that
is a convex function and positive on
. This implies that
is positive for all
. Finally,
, which means that the HRF is an increasing function. □
Using similar calculations, we can derive the shapes of the PDF and HRF of
X and
Y given by (
5), (
6), (
11) and (
12), respectively, under various baseline distributions.
Figure 4 shows plots of the LiE2 density function, while Figure 5 shows plots of the LiE2 hazard rate function for various parameter values.
Theorem 6. The LiE2 distribution function is identifiable with respect to the parameters θ and λ.
Proof. As was the case in the proof of Theorem 1, we will assume that for all and is the CDF of an exponential distribution. As a consequence, we have Then, from Theorem 5, we have that when Now, since after some algebra, it can be shown that from follows □
5. On the Maximum-Likelihood Estimation of Parameters
We propose to use the maximum likelihood (ML) estimation method for the parameter estimation of the introduced distributions. The log-likelihood function for the general case (
5) is given by
As a special case, we consider the exponential baseline distribution. For the LiE1 model, the estimating equations are given by
Now, we study the existence of the ML estimator of each parameter when the other parameter is known in advance.
Theorem 7. If the parameter λ is known, then Equation (25) has at least one root in the interval
Proof. One can readily verify that
and
Thus, there exists at least one root of the Equation (
25). □
Theorem 8. Assuming that and if the parameter θ is known, then Equation (26) has at least one root in the interval
Proof. Applying L’Hôpital’s rule, we get and
In order to have at least one solution, it is necessary to have . Hence the theorem. □
On the other hand, the estimating equations for the LiE2 model are given by
The next two theorems examine the existence of the ML estimates obtained from (27) and (28). Their proofs are very similar to those of Theorems 7 and 8, so we omit them.
Theorem 9. If the parameter λ is known, then Equation (27) has at least one root in the interval
Theorem 10. If the parameter θ is known and if it is assumed that then Equation (28) has at least one root in the interval
Clearly, the likelihood equations are nonlinear in the parameters, so the estimators cannot be obtained in closed form. Thus, a numerical iterative method such as the Newton–Raphson method should be used for the estimation.
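As a toy illustration of the Newton–Raphson iteration, the sketch below solves the score equation of a plain exponential sample, $n/\lambda-\sum_i x_i=0$, whose root $n/\sum_i x_i$ is known in closed form. It is not an implementation of the LiE1 or LiE2 estimating equations; the data, starting value, tolerance and function names are hypothetical.

```python
def newton_raphson(score, score_derivative, start, tol=1e-10, max_iter=100):
    """Iterate x <- x - score(x) / score'(x) until the step is below tol."""
    x = start
    for _ in range(max_iter):
        step = score(x) / score_derivative(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical sample from an exponential baseline (illustration only).
data = [0.5, 1.0, 1.5]
n, total = len(data), sum(data)


def score(lam):
    """Derivative of the exponential log-likelihood n*log(lam) - lam*sum(x)."""
    return n / lam - total


def score_derivative(lam):
    """Second derivative of the exponential log-likelihood."""
    return -n / lam ** 2


lam_hat = newton_raphson(score, score_derivative, start=0.5)
print(lam_hat, n / total)  # numerical root and closed-form ML estimate
```

For estimating equations without a closed-form root, the same loop applies once the score and its derivative (analytical or numerical) are supplied, although convergence then depends on the starting value.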
7. Simulation Study
In this section, we consider the LiE1 and LiE2 models and present a simulation study testing the performance of the estimators obtained using the EM algorithm. We generated 10,000 random samples of sizes 50, 100 and 200 from both models.
We can generate random numbers from the
distribution by using the inverse transform method. Let
u be a random number from the uniform distribution on
. Employing some algebra, we have
, a number from the
distribution. Here,
where
,
,
and
.
Similarly, we can generate random numbers from the
distribution by using the inverse transform method. Let
u be a random number from the uniform distribution on
. Following some calculations, we have
, a number from the
distribution. Here,
where
and
.
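When a closed-form inverse of the CDF is not at hand, the inverse transform method can be carried out numerically. The sketch below swaps the explicit inverse for a bisection search, in the spirit of a generic root finder, and applies it to the maximum of M exponential variables. It assumes the discretised Lindley law $P(N=n)=S(n)-S(n+1)$ for N and an exponential baseline with rate `rate`; these choices and all function names are illustrative assumptions, not the expressions used in the paper.

```python
import math
import random


def lindley_survival(x, theta):
    """Survival function S(x) of the continuous one-parameter Lindley law."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * math.exp(-theta * x)


def max_cdf(x, theta, rate, m_max=500):
    """CDF of the maximum of M exponentials: sum of P(M = m) * F(x)^m."""
    if x <= 0:
        return 0.0
    f = 1.0 - math.exp(-rate * x)
    s1 = lindley_survival(1, theta)
    return sum((lindley_survival(m, theta) - lindley_survival(m + 1, theta)) / s1
               * f ** m for m in range(1, m_max))


def quantile(u, theta, rate, tol=1e-10):
    """Solve max_cdf(x) = u for x by bisection (numerical inverse transform)."""
    u = min(u, 1.0 - 1e-12)  # guard: the truncated series stays slightly below 1
    lo, hi = 0.0, 1.0
    while max_cdf(hi, theta, rate) < u:  # expand the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_cdf(mid, theta, rate) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_max(n, theta, rate, seed=0):
    """Draw n values by applying the numerical quantile to uniform numbers."""
    rng = random.Random(seed)
    return [quantile(rng.random(), theta, rate) for _ in range(n)]


print(sample_max(5, theta=1.0, rate=1.0))
```

Bisection is slower than a closed-form inverse but only requires that the CDF can be evaluated, so the same routine can be reused for the minimum or for other baselines.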
We used R (R Core Team, 2020) with the uniroot function to run the EM algorithms. We took the true parameter values as the starting points for the iterations in the algorithms. The algorithms stopped when
. The simulation results of the empirical means and mean square errors (MSEs) are reported in
Table 1 and
Table 2. We observe that the estimates are close to the parameter values and the MSEs decrease with increasing sample size. This makes the use of the EM algorithm plausible for estimation.