**1. Introduction**

Count data are non-negative integers that record the frequency of occurrence of a discrete event. Such data arise in numerous fields, such as actuarial science, finance, medicine and sports. For instance, the yearly number of destructive floods, the monthly number of injured athletes and the hourly number of COVID-19 vaccinations administered are all count data. The growing use of discrete distributions for modelling such data has motivated researchers to propose more flexible distributions that reduce estimation errors. Survival discretization of continuous distributions is one of the most widely followed methods for introducing new discrete distributions, and it is described below. Assume that *X* is a continuous lifetime random variable with survival function (sf) *S*(*x*) = Pr(*X* > *x*). Then, the discretized random variable *Y* = ⌊*X*⌋ has the probability mass function (pmf):

$$\Pr(Y = x) = S(x) - S(x + 1), \quad x = 0, 1, 2, \dots \tag{1}$$
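Equation (1) only needs the sf of the continuous variable, which makes it easy to sketch in code. The example below is a minimal illustration, not the authors' implementation: `discretize_pmf` and `exp_sf` are hypothetical helper names, and the exponential lifetime is an assumed test case. It uses the known fact that discretizing an exponential distribution with rate *θ* gives a geometric pmf with ratio *q* = *e*<sup>−*θ*</sup>.

```python
import math
from typing import Callable, List


def discretize_pmf(sf: Callable[[float], float], x_max: int) -> List[float]:
    """Survival discretization, Equation (1): P(x) = S(x) - S(x + 1)."""
    return [sf(x) - sf(x + 1) for x in range(x_max + 1)]


# Illustrative continuous lifetime (an assumption, not from the text):
# Exponential with rate THETA, whose sf is S(x) = exp(-THETA * x).
THETA = 0.7


def exp_sf(x: float) -> float:
    return math.exp(-THETA * x)


pmf = discretize_pmf(exp_sf, x_max=10)

# Known result: the discretized exponential is geometric, (1 - q) q^x with q = e^{-theta}.
q = math.exp(-THETA)
geometric = [(1 - q) * q**x for x in range(11)]
print(all(math.isclose(a, b) for a, b in zip(pmf, geometric)))  # True
```

Because consecutive terms telescope, the discretized probabilities over {0, …, *n*} always sum to 1 − *S*(*n* + 1), so the construction yields a proper pmf whenever *S* decreases to 0.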

Some recently introduced discrete distributions based on this survival discretization method are the discrete Lindley distribution by [1], the discrete inverse Weibull distribution by [2], the discrete Pareto distribution by [3], the discrete Rayleigh distribution by [4], the two-parameter discrete Lindley distribution by [5], the exponentiated discrete Lindley distribution by [6], the discrete Burr–Hatke distribution by [7], the discrete Bilal distribution by [8] and the discrete three-parameter Lindley distribution by [9]. Recently, the authors of [10] proposed a discrete version of the Ramos–Louzada distribution [11] for asymmetric and over-dispersed data with a leptokurtic shape.

**Citation:** Irshad, M.R.; Chesneau, C.; D'cruz, V.; Maya, R. Discrete Pseudo Lindley Distribution: Properties, Estimation and Application on INAR(1) Process. *Math. Comput. Appl.* **2021**, *26*, 76. https://doi.org/10.3390/mca26040076

Academic Editor: Paweł Olejnik

Received: 14 October 2021; Accepted: 9 November 2021; Published: 12 November 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Furthermore, count data arising as time series are found in many applied research areas. Examples include modelling and predicting an insurance company's number of claims for the next month, or the number of deaths from disasters. The first-order integer-valued autoregressive process, or INAR(1), is appropriate for such cases. The authors of [12,13] independently developed the pioneering works on the INAR(1) process with Poisson innovations. However, since time series of counts are mostly over-dispersed (i.e., the empirical mean is less than the empirical variance), the Poisson distribution, which is equi-dispersed, is a less efficient choice of innovation distribution. Hence, researchers have proposed many alternative innovation distributions for modelling over-dispersed time series of counts. The INAR(1) process with geometric innovations (INAR(1)G) by [14], the INAR(1) process with Poisson–Lindley innovations (INAR(1)PL) by [15], the INAR(1) process with new Poisson weighted exponential innovations (INAR(1)NPWE) by [16], the INAR(1) process with discrete three-parameter Lindley innovations by [9], the INAR(1) process with discrete Bilal innovations by [8], the INAR(1) process with Poisson quasi Gamma innovations (INAR(1)PQX) by [17] and the INAR(1) process with Bell innovations (INAR(1)BL) by [18] are some of the recently developed over-dispersed INAR(1) processes.
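The INAR(1) process is not written out in this section. As a hedged illustration, the sketch below assumes the standard binomial-thinning form *X*<sub>t</sub> = *α* ∘ *X*<sub>t−1</sub> + *ε*<sub>t</sub>, where *α* ∘ *X* is a Binomial(*X*, *α*) count drawn independently of the innovation *ε*<sub>t</sub>. It compares Poisson and geometric innovations through the dispersion index (variance divided by mean). The helper names `simulate_inar1` and `dispersion_index`, the seed and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def simulate_inar1(alpha, draw_innovation, n, rng):
    """Simulate X_t = alpha o X_{t-1} + e_t, where 'o' is binomial thinning."""
    x = np.empty(n, dtype=np.int64)
    x[0] = draw_innovation(rng)
    for t in range(1, n):
        # Binomial thinning: each of the X_{t-1} units survives with probability alpha.
        survivors = rng.binomial(x[t - 1], alpha)
        x[t] = survivors + draw_innovation(rng)
    return x


def dispersion_index(x):
    """Empirical variance-to-mean ratio; values above 1 indicate overdispersion."""
    return x.var(ddof=1) / x.mean()


rng = np.random.default_rng(2021)
alpha, n = 0.5, 50_000

# Poisson(1) innovations (equi-dispersed).
poisson_path = simulate_inar1(alpha, lambda g: g.poisson(1.0), n, rng)
# Geometric innovations on {0, 1, 2, ...}; NumPy's geometric starts at 1, hence "- 1".
geometric_path = simulate_inar1(alpha, lambda g: g.geometric(0.5) - 1, n, rng)

print(round(dispersion_index(poisson_path), 2))
print(round(dispersion_index(geometric_path), 2))
```

Under stationarity, this process has dispersion index (*α* + *D*<sub>ε</sub>)/(1 + *α*), where *D*<sub>ε</sub> is the dispersion index of the innovations. With *α* = 0.5, this gives 1 for Poisson innovations and 5/3 for the shifted geometric(0.5) innovations (*D*<sub>ε</sub> = 2), so the two printed values should lie close to 1 and 1.67. The Poisson case stays equi-dispersed, which is the limitation discussed above.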

Even though these processes handle over-dispersed time series of counts better, they have limitations that can cause computational difficulties. Even when a model has a single parameter, the presence of special functions in its pmf, cumulative distribution function (cdf) and other statistical properties makes it difficult to obtain explicit expressions and, hence, complicates the estimation procedures (see, e.g., [9,19]).

Hence, the main objective of the present work is to introduce a two-parameter discrete distribution, the discrete Pseudo Lindley (DPsL) distribution, which has a simple pmf and cdf and can serve as a model for both under- and over-dispersed data. The main peculiarity of the proposed distribution is that its statistical properties, such as the hazard rate function (hrf), probability-generating function (pgf), moments, skewness, kurtosis, mean past lifetime (mpl), mean residual lifetime (mrl) and stress–strength reliability, have closed-form expressions. We demonstrate the usefulness of the DPsL distribution in the INAR(1) process by using it as the innovation distribution.

The remainder of the paper is organized as follows: Section 2 defines the proposed distribution and derives properties such as moments, the mean residual lifetime, the mean past lifetime and stress–strength reliability. Section 3 presents estimation methods and a simulation study. The INAR(1) process with DPsL innovations is developed in Section 4, together with its parameter estimation and a simulation study. In Section 5, three datasets are analysed with the DPsL distribution and some other competing, well-referenced distributions to demonstrate its applicability. Final remarks are provided in Section 6.

#### **2. The Discrete Pseudo Lindley Distribution**

#### *2.1. Some Basics*

In this section, a discrete analogue of the Pseudo Lindley (PsL) distribution, namely the DPsL distribution, is derived by using the survival discretization method. First, let us briefly recall the PsL distribution introduced by [20], which is obtained by mixing two independent random variables, one having the Exponential(*θ*) distribution and the other having the Gamma(2, *θ*) distribution, with mixing probabilities (*β* − 1)/*β* and 1/*β*, respectively. If *X* is a continuous random variable having the PsL distribution, its probability density function (pdf) and sf are given by:

$$f\_{\mathrm{PsL}}(x; \theta, \beta) = \begin{cases} \dfrac{\theta(\beta - 1 + \theta x) e^{-\theta x}}{\beta}, & x > 0, \\ 0, & \text{otherwise,} \end{cases}$$

and

$$S\_{\mathrm{PsL}}(x; \theta, \beta) = \begin{cases} \dfrac{(\beta + \theta x) e^{-\theta x}}{\beta}, & x > 0, \\ 1, & \text{otherwise,} \end{cases} \tag{2}$$

respectively, where *β* ≥ 1 and *θ* > 0. Applying the survival discretization technique (1) to the sf (2), the pmf of the DPsL distribution is obtained as:

$$P\_{\mathrm{DPsL}}(x; \theta, \beta) = \frac{(\beta + \theta x)e^{-\theta x} - (\beta + \theta(x + 1))e^{-\theta(x + 1)}}{\beta}, \quad x = 0, 1, 2, \dots \tag{3}$$

The parameter *β* can be regarded as a shape parameter and *θ* as a scale parameter. When the parameters need to be indicated, we write DPsL(*θ*, *β*).
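To make Equations (3) and (4) concrete, the following sketch evaluates the pmf and sf and draws DPsL variates. It assumes NumPy, and `dpsl_pmf`, `dpsl_sf` and `rdpsl` are hypothetical helper names rather than code from the original work. The sampler generates *X* from the Exponential/Gamma mixture representation of the PsL distribution and returns ⌊*X*⌋. This matches survival discretization because, for a continuous *X*, Pr(⌊*X*⌋ = *x*) = Pr(*x* ≤ *X* < *x* + 1) = *S*(*x*) − *S*(*x* + 1).

```python
import numpy as np


def dpsl_sf(x, theta, beta):
    """Equation (4): sf of DPsL(theta, beta), i.e. Pr(Y > x)."""
    x = np.asarray(x, dtype=float)
    return (beta + (x + 1) * theta) * np.exp(-theta * (x + 1)) / beta


def dpsl_pmf(x, theta, beta):
    """Equation (3): pmf of DPsL(theta, beta) for x = 0, 1, 2, ..."""
    x = np.asarray(x, dtype=float)
    return ((beta + theta * x) * np.exp(-theta * x)
            - (beta + theta * (x + 1)) * np.exp(-theta * (x + 1))) / beta


def rdpsl(size, theta, beta, rng):
    """Draw DPsL variates as floor(X), with X ~ PsL(theta, beta) from its mixture form."""
    # Gamma(2, theta) with probability 1/beta, Exponential(theta) otherwise.
    from_gamma = rng.random(size) < 1.0 / beta
    scale = 1.0 / theta  # NumPy uses scale = 1 / rate
    x = np.where(from_gamma,
                 rng.gamma(2.0, scale, size),
                 rng.exponential(scale, size))
    return np.floor(x).astype(np.int64)


rng = np.random.default_rng(1)
theta, beta = 0.5, 2.0
support = np.arange(200)
print(np.isclose(dpsl_pmf(support, theta, beta).sum(), 1.0))  # True

sample = rdpsl(200_000, theta, beta, rng)
empirical = np.bincount(sample, minlength=6)[:6] / sample.size
print(np.round(empirical, 3))
print(np.round(dpsl_pmf(np.arange(6), theta, beta), 3))
```

The two printed rows (empirical frequencies and pmf values for *x* = 0, …, 5) should agree to about two decimal places. Note that the mixture representation requires *β* ≥ 1 so that both mixing probabilities are valid.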

The corresponding cdf and sf are given by:

$$F\_{\mathrm{DPsL}}(x; \theta, \beta) = 1 - \frac{e^{-\theta(x+1)}(\beta + (x+1)\theta)}{\beta}$$

and

$$S\_{\mathrm{DPsL}}(x; \theta, \beta) = \frac{e^{-\theta(x+1)}(\beta + (x+1)\theta)}{\beta}, \tag{4}$$

respectively. As a first property, the pmf given in (3) is log-concave, since the ratio

$$\frac{P\_{\mathrm{DPsL}}(x + 1; \theta, \beta)}{P\_{\mathrm{DPsL}}(x; \theta, \beta)} = \frac{\beta + \theta + x\theta - e^{-\theta}(\beta + (x + 2)\theta)}{\beta(e^{\theta} - 1) + \theta((e^{\theta} - 1)x - 1)}$$

is a decreasing function of *x* for all admissible values of the parameters.
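The log-concavity claim can be checked numerically. The hedged sketch below assumes NumPy and uses hypothetical helper names (`dpsl_pmf`, `pmf_ratio`). It evaluates the closed-form ratio above, confirms that it matches the ratio computed directly from the pmf (3), and checks that it is non-increasing in *x* over a small grid of parameter values. A finite grid illustrates the property; it does not prove it.

```python
import numpy as np


def dpsl_pmf(x, theta, beta):
    """Equation (3): pmf of DPsL(theta, beta)."""
    x = np.asarray(x, dtype=float)
    return ((beta + theta * x) * np.exp(-theta * x)
            - (beta + theta * (x + 1)) * np.exp(-theta * (x + 1))) / beta


def pmf_ratio(x, theta, beta):
    """Closed-form ratio P(x + 1) / P(x) stated in the text."""
    x = np.asarray(x, dtype=float)
    e = np.exp(theta)
    num = beta + theta + x * theta - np.exp(-theta) * (beta + (x + 2) * theta)
    den = beta * (e - 1) + theta * ((e - 1) * x - 1)
    return num / den


x = np.arange(0, 30)
all_ok = True
for theta in (0.1, 0.5, 1.0, 2.0):
    for beta in (1.0, 1.5, 3.0, 10.0):
        r = pmf_ratio(x, theta, beta)
        direct = dpsl_pmf(x + 1, theta, beta) / dpsl_pmf(x, theta, beta)
        # The closed form must match the direct ratio and be non-increasing in x.
        all_ok &= bool(np.allclose(r, direct)) and bool(np.all(np.diff(r) <= 0))
print(all_ok)  # True
```

A non-increasing ratio *P*(*x* + 1)/*P*(*x*) is exactly the definition of a log-concave pmf, which is why the check is phrased in terms of `np.diff(r) <= 0`.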

Possible pmf shapes of the DPsL distribution for different values of the parameters are displayed in Figure 1.

**Figure 1.** The pmf plots of the DPsL distribution for selected values of *θ* and *β*.

Figure 1 indicates that the DPsL distribution is right-skewed, with a long right tail.

A mode of the DPsL distribution, say *x*<sub>m</sub>, is an integer value of *x* at which the pmf *P*<sub>DPsL</sub>(*x*; *θ*, *β*) attains its maximum, that is, *P*<sub>DPsL</sub>(*x*<sub>m</sub>; *θ*, *β*) ≥ *P*<sub>DPsL</sub>(*x*<sub>m</sub> + 1; *θ*, *β*) and *P*<sub>DPsL</sub>(*x*<sub>m</sub>; *θ*, *β*) ≥ *P*<sub>DPsL</sub>(*x*<sub>m</sub> − 1; *θ*, *β*). This is equivalent to:

$$\frac{\theta(1+e^{\theta})-\beta(e^{\theta}-1)}{\theta(e^{\theta}-1)}-1 \le x\_m \le \frac{\theta(1+e^{\theta})-\beta(e^{\theta}-1)}{\theta(e^{\theta}-1)}.$$

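Hence, the mode of the DPsL distribution is:

$$x\_m = \begin{cases} \left\lfloor \dfrac{\theta(1+e^{\theta})-\beta(e^{\theta}-1)}{\theta(e^{\theta}-1)} \right\rfloor, & \text{if } \dfrac{\theta(1+e^{\theta})-\beta(e^{\theta}-1)}{\theta(e^{\theta}-1)} \ge 0, \\ 0, & \text{if } \dfrac{\theta(1+e^{\theta})-\beta(e^{\theta}-1)}{\theta(e^{\theta}-1)} < 0. \end{cases}$$

When the upper bound is a positive integer, the pmf takes the same value at both endpoints of the interval, and both are modes. The hrf of the DPsL distribution can be obtained as: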

$$\begin{split} \eta\_{\mathrm{DPsL}}(x; \theta, \beta) &= \frac{P\_{\mathrm{DPsL}}(x; \theta, \beta)}{1 - F\_{\mathrm{DPsL}}(x; \theta, \beta)} \\ &= \frac{(\beta + \theta x)e^{-\theta x} - (\beta + \theta(x+1))e^{-\theta(x+1)}}{e^{-\theta(x+1)}(\beta + (x+1)\theta)}. \end{split}$$
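Dividing both the numerator and the denominator of the hrf by the tail term *e*<sup>−*θ*(*x*+1)</sup>(*β* + (*x* + 1)*θ*) gives the equivalent form *η*(*x*) = *e*<sup>*θ*</sup>(*β* + *θx*)/(*β* + *θ*(*x* + 1)) − 1. Since (*β* + *θx*)/(*β* + *θ*(*x* + 1)) = 1 − *θ*/(*β* + *θ*(*x* + 1)) increases with *x*, this rearrangement suggests an increasing hrf that approaches *e*<sup>*θ*</sup> − 1. The sketch below assumes NumPy, and `dpsl_hrf` and `dpsl_hrf_simplified` are hypothetical helper names. It checks numerically, for one parameter pair, that the two expressions agree and that the hrf increases.

```python
import numpy as np


def dpsl_hrf(x, theta, beta):
    """hrf as defined in the text: P(x) / (1 - F(x)); the common 1/beta cancels."""
    x = np.asarray(x, dtype=float)
    tail = (beta + theta * (x + 1)) * np.exp(-theta * (x + 1))
    return ((beta + theta * x) * np.exp(-theta * x) - tail) / tail


def dpsl_hrf_simplified(x, theta, beta):
    """Rearranged form: e^theta (beta + theta x) / (beta + theta (x + 1)) - 1."""
    x = np.asarray(x, dtype=float)
    return np.exp(theta) * (beta + theta * x) / (beta + theta * (x + 1)) - 1


theta, beta = 0.5, 2.0
x = np.arange(0, 60)
h = dpsl_hrf(x, theta, beta)
print(np.allclose(h, dpsl_hrf_simplified(x, theta, beta)))  # True
print(bool(np.all(np.diff(h) > 0)))  # True: increasing in x
print(round(float(np.exp(theta) - 1), 4))  # limiting value e^theta - 1: 0.6487
```

The simplified form also avoids evaluating tiny exponentials for large *x*, which can be convenient when plotting the hrf far into the tail.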

The hrf of the DPsL distribution was plotted for some set of values for *θ* and *β* in Figure 2.

**Figure 2.** The hrf plots of the DPsL distribution for selected values of *θ* and *β*.

Figure 2 indicates that the hrf of the DPsL distribution is increasing for all the parameter values considered. This agrees with the log-concavity of the pmf, since a log-concave pmf has an increasing hrf.
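The mode inequality given earlier in this section confines *x*<sub>m</sub> to an interval of length one, so an integer mode can be read off its upper bound. The hedged sketch below assumes NumPy, and `dpsl_mode` and `mode_matches_argmax` are hypothetical helpers. It takes *x*<sub>m</sub> as the floor of the upper bound when that bound is non-negative and 0 otherwise, then checks against a brute-force maximization of the pmf (3) over a grid of parameter values.

```python
import numpy as np


def dpsl_pmf(x, theta, beta):
    """Equation (3): pmf of DPsL(theta, beta)."""
    x = np.asarray(x, dtype=float)
    return ((beta + theta * x) * np.exp(-theta * x)
            - (beta + theta * (x + 1)) * np.exp(-theta * (x + 1))) / beta


def dpsl_mode(theta, beta):
    """Mode from the bounds: floor of the upper bound if it is >= 0, else 0."""
    e = np.exp(theta)
    upper = (theta * (1 + e) - beta * (e - 1)) / (theta * (e - 1))
    return int(np.floor(upper)) if upper >= 0 else 0


def mode_matches_argmax(theta, beta, x_max=1000):
    """True if dpsl_mode attains the largest pmf value on {0, ..., x_max}.

    The small relative tolerance absorbs rounding and covers ties between
    two adjacent modes.
    """
    p = dpsl_pmf(np.arange(x_max + 1), theta, beta)
    return bool(p[dpsl_mode(theta, beta)] >= p.max() * (1 - 1e-9))


print([dpsl_mode(t, b) for t, b in [(0.1, 1.0), (1.0, 1.0), (1.0, 3.0)]])  # [10, 1, 0]
print(all(mode_matches_argmax(t, b)
          for t in (0.05, 0.1, 0.3, 0.7, 1.5, 3.0)
          for b in (1.0, 1.3, 2.0, 4.0, 10.0)))  # True
```

Because the pmf is log-concave, its modes form a set of consecutive integers, so checking that the returned value attains the maximum is enough to validate it.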
