*3.4. Simulation Study*

In this section, we assess the finite-sample behavior of the maximum likelihood estimator (MLE) through a simulation study based on the BNDL distribution. The study proceeds in three steps. First, generate *N* = 1000 samples of each size *n* = 25, 50, ... , 500 from the BNDL distribution. Second, compute the MLE of the model parameter for each sample. Finally, compute the mean squared error (MSE) of the estimates over the 1000 replications, given by

$$\text{MSE}(\hat{p}) = \frac{1}{1000} \sum_{i=1}^{1000} (\hat{p}_i - p)^2$$
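The three steps above can be sketched in code. The BNDL pmf is not restated in this section, so the sketch below uses a geometric distribution on {1, 2, ...} purely as a stand-in; the function names (`sample_geometric`, `mle_geometric`, `monte_carlo_mse`), the seed, and the true value *p* = 0.3 are illustrative assumptions, not the authors' implementation. For the BNDL distribution the sampler would be replaced by a BNDL generator, and the closed-form estimator by a numerical maximization of the log-likelihood.

```python
import math
import random


def sample_geometric(p, n, rng):
    """Draw n values from a geometric law on {1, 2, ...} by inversion.

    ASSUMPTION: the geometric law is only a stand-in for the BNDL
    distribution, whose pmf is not given in this section.
    """
    log_q = math.log(1.0 - p)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return [int(math.log(1.0 - rng.random()) / log_q) + 1 for _ in range(n)]


def mle_geometric(xs):
    """Closed-form MLE of p for the geometric stand-in: 1 / sample mean."""
    return len(xs) / sum(xs)


def monte_carlo_mse(p, n, n_rep=1000, seed=0):
    """Average squared error of the MLE over n_rep simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        p_hat = mle_geometric(sample_geometric(p, n, rng))
        total += (p_hat - p) ** 2
    return total / n_rep


# Hypothetical true value and a subset of the sample sizes used in the text.
true_p = 0.3
for n in (25, 50, 100, 200, 500):
    print(f"n = {n:3d}  estimated MSE = {monte_carlo_mse(true_p, n):.6f}")
```

With a fixed seed the run is reproducible, and the printed MSEs should shrink roughly in proportion to 1/*n* for this stand-in model.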

For various values of the parameter, the simulation results in Figure 6 indicate that the estimated MSEs decrease toward zero as the sample size *n* increases. This is empirical evidence that the maximum likelihood estimator of *p* is consistent, and it agrees with the classical large-sample theory of the MLE, which is as follows. In a parametric model, an estimator *p*ˆ based on *X*1, *X*2, *X*3, ... , *Xn* is consistent if *p*ˆ → *p* in probability as *n* → ∞. It is asymptotically normal if √*n*(*p*ˆ − *p*) converges in distribution to a normal distribution. Under the usual regularity conditions, the MLE *p*ˆ is both consistent and asymptotically normal.
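The link between a vanishing MSE and consistency can be made explicit with a short, standard argument, added here as a reminder rather than taken from the simulation output. The MSE splits into variance and squared bias,

$$\text{MSE}(\hat{p}) = E(\hat{p} - p)^2 = \text{Var}(\hat{p}) + \left\{ E(\hat{p}) - p \right\}^2,$$

and Markov's inequality applied to (*p*ˆ − *p*)² gives, for any *ε* > 0,

$$P\left( |\hat{p} - p| > \varepsilon \right) \le \frac{\text{MSE}(\hat{p})}{\varepsilon^2}.$$

An MSE that tends to zero therefore forces both the bias and the variance to vanish and implies convergence in probability. Asymptotic normality is a separate statement about the limiting distribution of √*n*(*p*ˆ − *p*), and the MSE alone does not establish it.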

**Figure 6.** Plots of the estimated parameter and MSEs for various values of *p*.

#### **4. Applications to Count Data**

In this section, we use a real-life data set to illustrate the applicability of the BNDL distribution and to compare its fit with that of competing models. The data set, recently studied by Balakrishnan et al. [24], consists of 744 discrete observations. Santiago, Chile, is recognized as one of the most polluted cities in the world. To monitor the level of air pollution and its adverse effects on humans in Santiago, the National Commission of Environment (CONAMA) of the government of Chile collects data on sulfur dioxide (SO2) concentrations in the air. The data are the hourly SO2 concentrations (in ppm) observed at a monitoring station located in Santiago.


The descriptive statistics of the data set are: mean = 2.93, median = 2, mode = 3, SD = 2.02, coefficient of variation = 0.69, skewness = 4.32, kurtosis = 34.57, range = 24, minimum = 1 and maximum = 25.
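As an illustration of how such a summary can be computed, the sketch below evaluates the same quantities for a small made-up sample. The values in `toy_counts` are hypothetical and are not the Santiago data, and the conventions used (sample standard deviation with divisor *n* − 1, moment-based skewness and non-excess kurtosis) are assumptions, since the text does not say which definitions produced the reported figures.

```python
import statistics


def describe(data):
    """Return common descriptive statistics for a sample of counts.

    ASSUMPTIONS: SD uses the n - 1 divisor; skewness and kurtosis are the
    moment ratios m3 / m2**1.5 and m4 / m2**2 (kurtosis is not excess).
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    # Central moments with divisor n.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sd": sd,
        "cv": sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "range": max(data) - min(data),
        "min": min(data),
        "max": max(data),
    }


# Hypothetical right-skewed counts, NOT the SO2 data from the paper.
toy_counts = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9]
for name, value in describe(toy_counts).items():
    print(f"{name:9s} {value:.3f}")
```

A skewness well above zero together with a large kurtosis, as reported for the SO2 data, points to a long right tail.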

We compare the BNDL distribution with the Binomial–Discrete Lindley distribution (BDLD) of Kuş et al. [15] and the Negative Binomial distribution. The pmf of the BDLD is given as

$$p_x(x; p) = \frac{p^{2x} \left[ \left\{ p^3 - (1 - p)(1 - p - x) \right\} \log(p) + (1 - p) \{ 1 - p(1 - p) \} \right]}{\left\{ 1 - \log(p) \right\} \{ 1 - p(1 - p) \}^{x + 2}}$$

We consider the AIC (Akaike Information Criterion), CAIC (Consistent Akaike Information Criterion), BIC (Bayesian Information Criterion) and HQIC (Hannan–Quinn Information Criterion). The model with the smallest values of these statistics is chosen as the best fit to the data. All results in Table 3 were obtained using R.
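For reference, the four criteria can be computed from a maximized log-likelihood as in the following sketch. The function name `information_criteria` and the example log-likelihood values are hypothetical. The CAIC line uses Bozdogan's consistent form, −2ℓ + *k*(ln *n* + 1); this is an assumption, because some papers in this literature instead report −2ℓ + 2*kn*/(*n* − *k* − 1) under the same abbreviation, and the text does not say which was used.

```python
import math


def information_criteria(log_lik, k, n):
    """Compute AIC, BIC, HQIC and CAIC from a maximized log-likelihood.

    log_lik: maximized log-likelihood, k: number of parameters,
    n: sample size. ASSUMPTION: CAIC is Bozdogan's consistent AIC.
    """
    neg2ll = -2.0 * log_lik
    return {
        "AIC": neg2ll + 2 * k,
        "BIC": neg2ll + k * math.log(n),
        "HQIC": neg2ll + 2 * k * math.log(math.log(n)),
        "CAIC": neg2ll + k * (math.log(n) + 1),
    }


# Made-up log-likelihoods for two candidate models fitted to n = 744
# observations; these are NOT the values behind Table 3.
candidates = {"one-parameter model": (-1500.0, 1), "two-parameter model": (-1499.5, 2)}
for name, (log_lik, k) in candidates.items():
    crit = information_criteria(log_lik, k, n=744)
    print(name, {key: round(value, 2) for key, value in crit.items()})
```

All four criteria penalize the number of parameters, so a model with a slightly higher likelihood can still rank worse when it needs an extra parameter.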

**Table 3.** MLEs and their standard errors (in parentheses) with statistics AIC, BIC, HQIC and CAIC values for given data.


Figure 7 gives the quantile–quantile (Q-Q) plot and the box plot, and Figure 8 gives the Total Time on Test (TTT) plot and the EHRF for the given data set. The TTT plot shows that the data set has an increasing hazard rate, which is confirmed by the EHRF. Figures 9 and 10 show the fitted BNDL model against its competing distributions. These plots indicate that the BNDL model fits the data better than the well-known BDLD and Negative Binomial models.
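The TTT plot mentioned above is based on the scaled TTT transform, which for ordered observations *x*(1) ≤ ... ≤ *x*(*n*) plots *r*/*n* against [*x*(1) + ... + *x*(*r*) + (*n* − *r*) *x*(*r*)] divided by the sum of all observations. The sketch below computes these points; the function name `scaled_ttt` and the toy input are illustrative assumptions and are not the data or code used for Figure 8.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT transform.

    The list starts at (0, 0) and ends at (1, 1). A curve lying above
    the diagonal (concave) is the usual indication of an increasing
    hazard rate; a curve below it suggests a decreasing hazard rate.
    """
    xs = sorted(data)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for r, x in enumerate(xs, start=1):
        running += x
        # Sum of the r smallest values plus (n - r) copies of the r-th one.
        points.append((r / n, (running + (n - r) * x) / total))
    return points


# Hypothetical toy sample, NOT the SO2 data.
for u, t in scaled_ttt([3, 1, 2, 2, 5, 4]):
    print(f"{u:.3f}  {t:.3f}")
```

Plotting the returned points together with the diagonal from (0, 0) to (1, 1) reproduces the usual TTT display.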

**Figure 7.** (**a**) Q-Q plot and (**b**) box plot for the given data.

**Figure 8.** (**a**) TTT plot and (**b**) Expected Hazard Rate Function (EHRF) for the BDLD model for the dataset.

**Figure 9.** Fitted plots of the BNDL and BDLD distributions for the given data set.

**Figure 10.** Fitted plot of the Negative Binomial distribution for the given data set.

#### **5. Concluding Remarks**

A new one-parameter discrete distribution was proposed, and its important distributional, monotonic, and reliability characteristics were explored. Some statistical and reliability properties of the proposed discrete model were derived. Various estimation approaches were discussed. A simulation study was conducted to assess the accuracy and precision of the MLE. The applicability of the proposed distribution in modeling a real-life discrete data set was demonstrated. The comparison indicates that the new distribution provides the best fit to the data set among all the tested distributions, and it should be a useful contribution to the field of count data modeling.

**Author Contributions:** Conceptualization, S.S. and S.K.; methodology, W.M.; software, J.G.; validation, S.S. and S.K.; formal analysis, W.M.; investigation, S.S.; resources, F.J.; data curation, W.M.; writing—original draft preparation, S.S. and W.M.; writing—review and editing, S.K.; visualization, J.G.; supervision, S.K.; project administration, F.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.
