Article

An Extension of the Poisson Distribution: Features and Application for Medical Data Modeling

by Mohamed El-Dawoody 1,*, Mohamed S. Eliwa 2,3,4 and Mahmoud El-Morshedy 1,3
1 Department of Mathematics, College of Science and Humanities in Al-Kharj, Prince Sattam bin Abdulaziz University, Al-Kharj 11942, Saudi Arabia
2 Department of Statistics and Operation Research, College of Science, Qassim University, Buraydah 51482, Saudi Arabia
3 Department of Mathematics, Faculty of Science, Mansoura University, Mansoura 35516, Egypt
4 Section of Mathematics, International Telematic University Uninettuno, I-00186 Rome, Italy
* Author to whom correspondence should be addressed.
Processes 2023, 11(4), 1195; https://doi.org/10.3390/pr11041195
Submission received: 9 March 2023 / Revised: 28 March 2023 / Accepted: 10 April 2023 / Published: 13 April 2023
(This article belongs to the Section Advanced Digital and Other Processes)

Abstract

This paper introduces and studies a new one-parameter discrete distribution that extends the Poisson model: the discrete weighted Poisson Lerch transcendental (DWPLT) distribution. Its basic mathematical and statistical characteristics are derived in closed form, including the probability mass function, the hazard rate function for single and double components, moments with auxiliary statistical measures (expectation, variance, index of dispersion, skewness, kurtosis, negative moments), conditional expectation, the Lorenz function, and order statistics. The DWPLT distribution can be used as a flexible statistical approach to analyze real asymmetric leptokurtic data. Moreover, it can be applied to model overdispersed data. Two estimation methods, maximum likelihood and the method of moments, were derived for the DWPLT parameter, and numerical methods were used to carry out the estimation. A simulation was performed to examine the performance of the DWPLT estimator in terms of bias and mean squared error. The flexibility and goodness of fit of the proposed distribution are demonstrated via a clinical application to a real dataset. The DWPLT model was more flexible and worked well for modeling real lifetime data when compared with other competitive lifetime distributions in the statistical literature.

1. Introduction

In life testing and reliability analysis, it is often difficult to measure the life of a device or component on a continuous scale. For example, in survival analysis, the number of days that a patient survives after treatment, or the number of recorded hours/days from remission to relapse, is a discrete random variable. Similarly, in reliability experiments, the life of a device that receives a number of shocks before failure, or of an on/off switching machine whose longevity depends on the number of times it is switched on/off, is discrete. In all these cases, lifetime and age cannot be measured on a continuous scale but are simply counted, so discrete distributions are better options for modeling them.
In the past few decades, many papers on discrete distributions have been presented for modeling lifetime data in many fields, such as insurance, engineering, agriculture, and the medical, physical, and biological sciences. A continuous random variable may be characterized by, for example, its cumulative distribution function, probability density function, moments, hazard rate function, or reversed hazard rate function. The discrete analog of a continuous model is mainly constructed on the principle of preserving one or more characteristic properties of the continuous model. Thus, there are various techniques for discretizing a continuous model, depending on the property to be preserved.
In stress-force models and analysis, the system or component encounters random stress during its function and has an inherent variable force that allows for it to work only when the force is greater than the stress. Its chance of success is called reliability. If the force and stress distributions are known, reliability can usually be obtained using normal transformation techniques. However, when the functional relationships of force and tension are complex, such analytical techniques are intractable. In this case, the exact solution is not available, and an alternative technique must be adopted to roughly approximate the actual reliability, for example, (i) Monte Carlo simulation approaches, (ii) Taylor series methods, (iii) numerical integration techniques, and (iv) discretization techniques. For more details, see Yari and Tondpour [1]. Here, a discretization approach was applied to create a flexible discrete probabilistic model with one parameter. Because of the flexibility of the discretization approach, it has been used by many authors in the statistical literature to produce adaptive discrete models; for instance, discrete Rayleigh distribution by Roy [2], discrete Burr and discrete Pareto distributions by Krishna and Pundir [3], the generalization of geometric distribution by Gomez-Déniz [4], discrete inverse Weibull distribution by Jazi et al. [5], discrete Burr Type III distribution by Al-Huniti and Al-Dayian [6], the discrete generalized exponential distribution of the second type by Nekoukhou et al. [7], discrete inverse Rayleigh distribution by Hussain and Ahmad [8], two-parameter discrete Lindley distribution by Hussain et al. [9], discrete Lindley distribution by Abebe and Shanker [10], new discrete Lindley distribution by Al-Babtain et al. [11], new three-parameter discrete Lindley distribution by Eliwa et al. [12], and new extended geometric distribution by Almazah et al. [13].
This study focuses on an extension of the Poisson distribution. The Poisson model is used extensively for modeling count data in a range of scientific fields. For this reason, many researchers have studied its properties, extensions, and modifications, and proposed generalizations of the initial distribution proposed by Poisson [14]. Although the equality of the mean and variance of a Poisson distribution is useful in specific situations, it often limits the ability of the distribution to accurately model practical data. To relax this assumption and overcome the limitations of the Poisson distribution, several authors have developed different discrete distributions for modeling over- or underdispersed count data. Castillo and Pérez-Casany [15] proposed a weighted Poisson distribution for over- and underdispersion situations by combining the concept of the weighted distribution introduced by Fisher [16] with the Poisson distribution. The weighted Poisson distribution overcomes the inherent limitation of equidispersion and can model multimodal data. Another valuable property of this distribution is its ability to model and describe truncated data (see Dietz and Böhning [17]). Moreover, several general properties of the weighted Poisson distribution were examined by Kokonendji et al. [18], who expanded on the paper published by Castillo and Pérez-Casany [15], specifically on how the shape of the weight function relates to the dispersion of the resulting weighted Poisson distribution.
Given the significance of the weighted Poisson distribution, a new flexible extension is established here, the discrete weighted Poisson–Lerch transcendent (DWPLT) model, which is a weighted Poisson distribution with weight function $w(x) = \frac{x!}{(x+1)(x+2)}$ taken from the discrete Burr–Hatke model (see El-Morshedy et al. [19]). The cumulative distribution function (CDF) of the DWPLT model can be expressed as follows:
$$F(x; a) = 1 - \frac{a^{x+3}\left[(a-1)\,\Phi(a, 1, x+3) + \frac{1}{x+2}\right]}{a + (1-a)\log(1-a)}; \quad x = 0, 1, 2, 3, \ldots, \tag{1}$$
where x is any value in the support of the discrete random variable X, $0 < a < 1$ is a scale parameter of the DWPLT model, and $\Phi(a, s, z) = \sum_{k=0}^{\infty} \frac{a^{k}}{(z+k)^{s}}$ is the Lerch transcendent function, which can be computed with HurwitzLerchPhi[a, s, z] in Mathematica software (see Wolfram Research [20]). The probability mass function (PMF) corresponding to Equation (1) is as follows:
$$\Pr(X = x; a) = \frac{a^{x+2}}{\left[a + (1-a)\log(1-a)\right](x+1)(x+2)}; \quad x = 0, 1, 2, 3, \ldots. \tag{2}$$
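As a quick numerical check of Equations (1) and (2), the following Python sketch evaluates the PMF directly and compares the closed-form CDF with a running sum of PMF values. It is only an illustration under stated assumptions: the function names are hypothetical, and the Lerch transcendent is approximated by truncating its defining series rather than by calling HurwitzLerchPhi.

```python
import math


def norm_const(a):
    """Normalizing constant a + (1 - a) * log(1 - a) of the DWPLT PMF."""
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    """Pr(X = x; a) from Equation (2), for x = 0, 1, 2, ... and 0 < a < 1."""
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def lerch_phi(a, s, z, tol=1e-16, max_terms=100000):
    """Truncated series sum_k a^k / (z + k)^s (assumed adequate for 0 < a < 1)."""
    total = 0.0
    for k in range(max_terms):
        term = a ** k / (z + k) ** s
        total += term
        if term < tol:
            break
    return total


def cdf_closed(x, a):
    """F(x; a) from the closed form in Equation (1)."""
    bracket = (a - 1.0) * lerch_phi(a, 1, x + 3) + 1.0 / (x + 2)
    return 1.0 - a ** (x + 3) * bracket / norm_const(a)


def cdf_sum(x, a):
    """F(x; a) obtained by adding PMF values from 0 to x."""
    return sum(pmf(t, a) for t in range(x + 1))


if __name__ == "__main__":
    a = 0.5
    for x in range(5):
        print(x, pmf(x, a), cdf_closed(x, a), cdf_sum(x, a))
```

The two CDF columns should agree up to floating-point error, and the PMF values should add up to one over a sufficiently long range of x.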
Equation (2) can be derived via a survival discretization approach as follows:
$$\Pr(X = x; a) = S(x; a) - S(x+1; a); \quad x = 0, 1, 2, 3, \ldots,$$
where $S(x; a) = 1 - F(x; a)$. It is easy to show that $\frac{\Pr(X = x+1; a)}{\Pr(X = x; a)} = \frac{a(x+1)}{x+3} < 1$; hence, the PMF is a decreasing function of x. If random variables $X_1$ and $X_2$ are independent and follow the DWPLT model, the PMFs of $Y = X_1 + X_2$ and $Z = X_1 - X_2$ can be formulated as follows, respectively:
$$\Pr(X_1 + X_2 = k; a) = \sum_{n=0}^{k} \Pr(X_1 = n; a)\,\Pr(X_2 = k-n; a) = \frac{a^{4+k}}{\left[a + (1-a)\log(1-a)\right]^{2}} \sum_{n=0}^{k} \frac{1}{\Omega(n, k)}$$
and
$$\Pr(X_1 - X_2 = k) = \sum_{n=0}^{\infty} \Pr(X_2 = n)\,\Pr(X_1 = k+n) = \frac{a^{k}\left[\Theta(a, k) + \Upsilon(a, k)\right]}{\left[a + (1-a)\log(1-a)\right]^{2} k\,(k^{2}-1)},$$
where $k = 0, 1, 2, \ldots$,
$$\Omega(n, k) = (n+1)(n+2)(k-n+1)(k-n+2),$$
$$\Theta(a, k) = a^{4}(k+1)\,\Phi(a^{2}, 1, k+1) + a^{4}(1-k)\,\Phi(a^{2}, 1, k+2),$$
$$\Upsilon(a, k) = a^{2}(k+1) + \left(1 + k + a^{2}(1-k)\right)\log(1-a^{2}).$$
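The two expressions above can be checked numerically. The Python sketch below convolves the PMF with itself for the sum and truncates the infinite series for the difference, then compares each with the corresponding formula. The helper names and truncation lengths are assumptions made for illustration; because of the factor $k(k^{2}-1)$ in the denominator, the closed form for the difference is evaluated only for $k \ge 2$.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def lerch_phi(a, s, z, n_terms=2000):
    """Truncated Lerch series; n_terms is an assumed cut-off."""
    return sum(a ** k / (z + k) ** s for k in range(n_terms))


def pmf_sum_direct(k, a):
    """Pr(X1 + X2 = k) by direct convolution of the PMF."""
    return sum(pmf(n, a) * pmf(k - n, a) for n in range(k + 1))


def pmf_sum_formula(k, a):
    """Pr(X1 + X2 = k) written with Omega(n, k)."""
    omega_sum = sum(
        1.0 / ((n + 1) * (n + 2) * (k - n + 1) * (k - n + 2))
        for n in range(k + 1)
    )
    return a ** (k + 4) * omega_sum / norm_const(a) ** 2


def pmf_diff_direct(k, a, n_terms=2000):
    """Pr(X1 - X2 = k), k >= 0, by truncating the sum over n."""
    return sum(pmf(n, a) * pmf(k + n, a) for n in range(n_terms))


def pmf_diff_formula(k, a):
    """Closed form with Theta and Upsilon; evaluated here for k >= 2 only."""
    a2 = a * a
    theta = (a ** 4 * (k + 1) * lerch_phi(a2, 1, k + 1)
             + a ** 4 * (1 - k) * lerch_phi(a2, 1, k + 2))
    upsilon = a2 * (k + 1) + (1 + k + a2 * (1 - k)) * math.log1p(-a2)
    return a ** k * (theta + upsilon) / (norm_const(a) ** 2 * k * (k * k - 1))


if __name__ == "__main__":
    a = 0.6
    print(pmf_sum_direct(4, a), pmf_sum_formula(4, a))
    print(pmf_diff_direct(3, a), pmf_diff_formula(3, a))
```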
In medical and engineering fields, the distributions of the two random variables $Y = X_1 + X_2$ and $Z = X_1 - X_2$ are very important. In engineering, random variable Y or Z can be referred to as the sum of two signals from two different sources or the sum of stresses from two sources on a component/machine. In medicine, random variable Y or Z can be, for example, the effect of two diseases on a specific organ in the human body. Assuming that random variable X follows the DWPLT distribution, the hazard rate function (HRF) is as follows:
$$h(x; a) = \frac{a}{\left[(a-1)\,\Phi(a, 1, x+2) + \frac{1}{x+1}\right](x+1)(x+2)}; \quad x = 0, 1, 2, 3, \ldots,$$
where $h(x; a) = \frac{\Pr(X = x; a)}{1 - F(x-1; a)}$. Figure 1 shows the PMF and HRF plots for different values of the DWPLT parameter.
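The behavior of the HRF can be explored without any special functions by working from the ratio $\Pr(X = x; a) / [1 - F(x-1; a)]$ directly. The following Python sketch does this with plain summation; the function names are hypothetical, and the subtraction $1 - F(x-1; a)$ is assumed to be accurate enough only for small x.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def hazard(x, a):
    """h(x; a) = Pr(X = x; a) / [1 - F(x - 1; a)], with F(-1; a) = 0."""
    mass_below = sum(pmf(t, a) for t in range(x))
    return pmf(x, a) / (1.0 - mass_below)


if __name__ == "__main__":
    for a in (0.2, 0.5, 0.9):
        print(a, [round(hazard(x, a), 4) for x in range(6)])
```

For each value of a, the printed sequence should decrease in x, in line with the decreasing shape described for Figure 1.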
Both the PMF and the HRF are always decreasing. Moreover, the PMF can be used to describe asymmetric unimodal data. Suppose that $V_1$ and $V_2$ are independent DWPLT variables with parameters $a_1$ and $a_2$, respectively. Then, the HRF of $U = \min(V_1, V_2)$ is given as follows:
$$\begin{aligned} h_U(x; a_1, a_2) &= \frac{\Pr(\min(V_1, V_2) \ge x) - \Pr(\min(V_1, V_2) \ge x+1)}{\Pr(\min(V_1, V_2) \ge x)} \\ &= \frac{\Pr(V_1 \ge x)\Pr(V_2 = x) + \Pr(V_1 = x)\Pr(V_2 \ge x) - \Pr(V_1 = x)\Pr(V_2 = x)}{\Pr(V_1 \ge x)\Pr(V_2 \ge x)} \\ &= \frac{a_1\left[(a_2-1)\,\Phi(a_2, 1, x+2) + \frac{1}{x+1}\right] + a_2\left[(a_1-1)\,\Phi(a_1, 1, x+2) + \frac{1}{x+1}\right]}{\left[(a_1-1)\,\Phi(a_1, 1, x+2) + \frac{1}{x+1}\right]\left[(a_2-1)\,\Phi(a_2, 1, x+2) + \frac{1}{x+1}\right](x+1)(x+2)}, \end{aligned}$$
where $\Pr(X_1 = x, X_2 = x) \approx 0$. Since the PMF of X is decreasing in x, X has a decreasing reversed HRF (DRHRF). Let F be the DWPLT life distribution, and let the corresponding PMF be denoted by the sequence $\{f_k\}$, $k \ge 0$. Then, this sequence has DRHRF.

2. Statistical Properties

2.1. Moments and Auxiliary Statistical Measures

Descriptive statistics is an invaluable tool used in data analysis, as it allows for summarizing and interpreting large datasets, providing a concise overview of the key points. Descriptive statistics provide quantifiable information about the data, such as the mean, median, variance, standard deviation, skewness, kurtosis, and range, which can be presented in graphical form. This enables us to quickly identify patterns and trends within the data, allowing for more accurate conclusions to be drawn. Additionally, descriptive statistics can be used to identify outliers in the data, thus providing an even more comprehensive understanding of the dataset. Assuming that random variable X follows the DWPLT model, the probability generating function (PrGF) can be expressed as follows:
$$E(s^{X}) = \sum_{k=0}^{\infty} s^{k} \Pr(X = k; a) = \frac{a^{2}}{a + (1-a)\log(1-a)} \sum_{k=0}^{\infty} \frac{(as)^{k}}{(k+1)(k+2)} = \frac{as + (1-as)\log(1-as)}{s^{2}\left[a + (1-a)\log(1-a)\right]}. \tag{6}$$
The moment generating function (MGF) corresponding to Equation (6) is as follows:
$$E(e^{tX}) = \frac{a e^{t} + (1 - a e^{t})\log(1 - a e^{t})}{e^{2t}\left[a + (1-a)\log(1-a)\right]}.$$
The first four moments of X are as follows:
$$\begin{aligned} E(X) &= \frac{-2a + (a-2)\log(1-a)}{a + (1-a)\log(1-a)}, \\ E(X^{2}) &= \frac{-3a^{2} + 4a + (a^{2} - 5a + 4)\log(1-a)}{a(1-a) + (1-a)^{2}\log(1-a)}, \\ E(X^{3}) &= \frac{-4a^{3} + 13a^{2} - 8a + (a^{3} - 10a^{2} + 17a - 8)\log(1-a)}{a(1-a)^{2} + (1-a)^{3}\log(1-a)}, \\ E(X^{4}) &= \frac{-5a^{4} + 32a^{3} - 41a^{2} + 16a + (a^{4} - 19a^{3} + 51a^{2} - 49a + 16)\log(1-a)}{a(1-a)^{3} + (1-a)^{4}\log(1-a)}. \end{aligned}$$
The moments were derived in closed form from the PMF and the definition of mathematical expectation, with the help of symbolic software such as Maple. Variance, skewness, and kurtosis can also be derived in closed form: $\mathrm{Var}(X) = E(X^{2}) - [E(X)]^{2}$, $\mathrm{Skewness}(X) = \frac{E(X^{3}) - 3E(X^{2})E(X) + 2[E(X)]^{3}}{[\mathrm{Var}(X)]^{3/2}}$, and $\mathrm{Kurtosis}(X) = \frac{E(X^{4}) - 4E(X^{3})E(X) + 6E(X^{2})[E(X)]^{2} - 3[E(X)]^{4}}{[\mathrm{Var}(X)]^{2}}$. Figure 2 shows descriptive statistics of the proposed model for various values of the distribution parameter.
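The closed-form moments lend themselves to a simple numerical cross-check. The Python sketch below codes the four raw moments, compares them with truncated sums of $x^{r}\Pr(X = x; a)$, and converts them to the variance, index of dispersion, skewness, and kurtosis using the standard raw-to-central moment identities. Function names and the truncation length are assumptions made for illustration.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def raw_moments(a):
    """First four raw moments of the DWPLT model in closed form."""
    lg = math.log1p(-a)   # log(1 - a)
    q = 1.0 - a
    m1 = (-2 * a + (a - 2) * lg) / (a + q * lg)
    m2 = (-3 * a**2 + 4 * a + (a**2 - 5 * a + 4) * lg) / (a * q + q**2 * lg)
    m3 = ((-4 * a**3 + 13 * a**2 - 8 * a
           + (a**3 - 10 * a**2 + 17 * a - 8) * lg)
          / (a * q**2 + q**3 * lg))
    m4 = ((-5 * a**4 + 32 * a**3 - 41 * a**2 + 16 * a
           + (a**4 - 19 * a**3 + 51 * a**2 - 49 * a + 16) * lg)
          / (a * q**3 + q**4 * lg))
    return m1, m2, m3, m4


def raw_moment_numeric(r, a, n_terms=3000):
    """Truncated sum of x^r * Pr(X = x; a); n_terms is an assumed cut-off."""
    return sum(x ** r * pmf(x, a) for x in range(n_terms))


def shape_measures(a):
    """Mean, variance, index of dispersion, skewness, and kurtosis."""
    m1, m2, m3, m4 = raw_moments(a)
    var = m2 - m1**2
    skew = (m3 - 3 * m2 * m1 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4) / var**2
    return m1, var, var / m1, skew, kurt


if __name__ == "__main__":
    for a in (0.1, 0.5, 0.9):
        print(a, shape_measures(a))
```

An index of dispersion above one in the printed output corresponds to overdispersion.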
The expectation value was less than the value of the variance for all parameter spaces/domains. Thus, the proposed distribution could be used to model overdispersed data. This important feature can be applied to discuss and evaluate actual data. Moreover, it can be used as a flexible probability model for analyzing positively skewed leptokurtic data. The first-order negative moment (NeM) for any 0 < a < 1 and b > 0 is as follows:
$$E\left[(X + b)^{-1}\right] = \frac{a^{2}\,\Phi(a, 1, b) + ab - a + (2a - 1 + b - ab)\log(1-a)}{\left[a + (1-a)\log(1-a)\right](b-1)(b-2)}.$$
In general, the theory behind the existence of negative moments (NeMs) is very difficult and not as complete as that involving positive moments. However, NeMs have many applications, especially in communication networks and the Fourier transforms of function density.
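The first-order negative moment can be verified in the same spirit. The Python sketch below compares the closed form with a truncated sum of $\Pr(X = x; a)/(x + b)$. The names are hypothetical, the Lerch transcendent is again approximated by its truncated series, and the closed form is evaluated only for values of b other than 1 and 2, where its denominator vanishes.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def lerch_phi(a, s, z, n_terms=3000):
    """Truncated Lerch series; n_terms is an assumed cut-off."""
    return sum(a ** k / (z + k) ** s for k in range(n_terms))


def neg_moment_closed(a, b):
    """E[(X + b)^(-1)] from the closed form (b > 0, b not equal to 1 or 2)."""
    lg = math.log1p(-a)
    num = a**2 * lerch_phi(a, 1, b) + a * b - a + (2 * a - 1 + b - a * b) * lg
    return num / (norm_const(a) * (b - 1) * (b - 2))


def neg_moment_direct(a, b, n_terms=3000):
    """E[(X + b)^(-1)] as a truncated sum over the PMF."""
    return sum(pmf(x, a) / (x + b) for x in range(n_terms))


if __name__ == "__main__":
    print(neg_moment_closed(0.5, 3.5), neg_moment_direct(0.5, 3.5))
```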

2.2. Conditional Expectation

Conditional distribution is a statistical concept that describes the probability of an event occurring when another event has already occurred. This type of distribution is useful in examining the probability of outcomes that are dependent on a specific event. Conditional distribution is used in a variety of applications, including predictive analytics, forecasting, and decision making. This type of probability is calculated by dividing the probability of the two events occurring together by the probability of the first event occurring alone. Conditional distribution can also be used to analyze the relationship between two variables. This type of analysis can help in understanding how a change occurs. Considering that random variable X follows the DWPLT model, the conditional expectations for $X \mid X \le x$ and $X \mid X > x$ are as follows, respectively:
$$E(X \mid X \le x) = \frac{1}{F(x)} \sum_{t=0}^{x} t \Pr(X = t; a) = \frac{(a-2)\,a^{x+3}\,\Phi(a, 1, x+3) + \frac{a^{x+3}}{x+2} - 2a + (a-2)\log(1-a)}{F(x; a)\left[a + (1-a)\log(1-a)\right]}$$
and
$$E(X \mid X > x) = \frac{1}{1 - F(x+1)} \sum_{t=x+1}^{\infty} t \Pr(X = t; a) = \frac{a^{x+3}\left[2\,\Phi(a, 1, x+3) - \Phi(a, 1, x+2)\right]}{\left[1 - F(x+1; a)\right]\left[a + (1-a)\log(1-a)\right]}.$$
Once the model parameter has been estimated, the conditional expectations of the DWPLT model can be evaluated.
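For the lower-tail case, the conditional expectation can be evaluated either from its closed form or by summing $t\Pr(X = t; a)$ up to x and dividing by $F(x; a)$. The Python sketch below does both and is meant only as an illustration: the helper names are hypothetical, the CDF is obtained by summation, and the Lerch transcendent is replaced by a truncated series.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def cdf(x, a):
    return sum(pmf(t, a) for t in range(x + 1))


def lerch_phi(a, s, z, n_terms=3000):
    """Truncated Lerch series; n_terms is an assumed cut-off."""
    return sum(a ** k / (z + k) ** s for k in range(n_terms))


def cond_mean_lower_closed(x, a):
    """E(X | X <= x) from the closed-form expression."""
    lg = math.log1p(-a)
    num = ((a - 2) * a ** (x + 3) * lerch_phi(a, 1, x + 3)
           + a ** (x + 3) / (x + 2) - 2 * a + (a - 2) * lg)
    return num / (cdf(x, a) * norm_const(a))


def cond_mean_lower_direct(x, a):
    """E(X | X <= x) by direct summation of t * Pr(X = t; a)."""
    return sum(t * pmf(t, a) for t in range(x + 1)) / cdf(x, a)


if __name__ == "__main__":
    for x in (1, 3, 6):
        print(x, cond_mean_lower_closed(x, 0.7), cond_mean_lower_direct(x, 0.7))
```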

2.3. Order Statistic (OrSc)

In statistics, the n-th OrSc of a statistical sample is equal to its n-th smallest value. Order statistics (OrSs) and rank statistics are among the most basic tools of nonparametric statistics and inference. Important special cases of the OrSs are the minimal and maximal values, the median, and other quantiles of a sample. OrSs are the estimation basis for upper- or lower-score data, and all types of censored data (left, right, climatic, interval, random, progressive). Thus, the possibility of using the corresponding CDF and PMF for recorded/ordered observations is intriguing, especially in medical and engineering fields. Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from the DWPLT model, and let $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ be the corresponding order statistics. Then, the CDF of the i-th order statistic $X_{i:n}$ for an integer value of x is as follows:
$$\begin{aligned} F_{i:n}(x; a) &= \sum_{k=i}^{n} \binom{n}{k} \left[F(x; a)\right]^{k} \left[1 - F(x; a)\right]^{n-k} = \sum_{k=i}^{n} \sum_{j=0}^{k} \Theta^{(j)}(n, k) \left[1 - F(x; a)\right]^{n-k+j} \\ &= \sum_{k=i}^{n} \sum_{j=0}^{k} \Theta^{(j)}(n, k) \left\{ \frac{a^{x+3}\left[(a-1)\,\Phi(a, 1, x+3) + \frac{1}{x+2}\right]}{a + (1-a)\log(1-a)} \right\}^{n-k+j}, \end{aligned}$$
where $\Theta^{(j)}(n, k) = (-1)^{j} \binom{n}{k} \binom{k}{j}$ and $F_{i:n}(-1) = 0$. The PMF of the i-th order statistic can be written as
$$\Pr(X_{i:n} = x; a) = F_{i:n}(x; a) - F_{i:n}(x-1; a); \quad x = 0, 1, 2, 3, \ldots$$
According to the PMF of the i-th order statistics, some descriptive statistics can be derived on the basis of the L-moment concept.
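The order-statistic CDF and PMF can be sketched in a few lines of Python using the binomial form of $F_{i:n}$ and differencing for the PMF. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the DWPLT CDF is computed by summing the PMF.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def cdf(x, a):
    return sum(pmf(t, a) for t in range(x + 1))


def os_cdf(x, i, n, a):
    """CDF of the i-th order statistic in a sample of size n; zero for x < 0."""
    if x < 0:
        return 0.0
    f = cdf(x, a)
    return sum(math.comb(n, k) * f ** k * (1.0 - f) ** (n - k)
               for k in range(i, n + 1))


def os_pmf(x, i, n, a):
    """PMF of the i-th order statistic, obtained by differencing the CDF."""
    return os_cdf(x, i, n, a) - os_cdf(x - 1, i, n, a)


if __name__ == "__main__":
    # Distribution of the sample median (i = 3) for n = 5 and a = 0.5.
    print([round(os_pmf(x, 3, 5, 0.5), 5) for x in range(6)])
```

Setting i = 1 or i = n gives the distributions of the sample minimum and maximum, respectively.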

2.4. Lorenz Curve

Variability in a statistical series can be measured via various scales such as the Lorenz curve (LoC), which is the cumulative percentage curve. Generally, Lorenz curves are applied to measure the variance/variability in the distribution of income and wealth. Hence, the LoC is a measure of deviation in the actual distribution of the statistical series from the line of the isoquant. The extent of this deviation is the Lorenz modulus. If the distance between the LoC and the isoquant curve is greater, there is more inequality or variance in the series and vice versa. Assuming that random variable X had DWPLT distribution, the LoC function of X is defined as follows:
$$L(i) = \frac{1}{\sum_{j=0}^{\infty} j \Pr(X = j; a)} \sum_{j=0}^{i} j \Pr(X = j; a) = \frac{(1-a)\,a^{i+3}\,\Phi(a, 1, i+3) - \frac{a^{i+3}}{i+2} + a + (1-a)\log(1-a)}{-2a + (a-2)\log(1-a)}; \quad i = 0, 1, 2, 3, \ldots$$
The advantages of the LoC are that it is attractive and gives a rough idea of the extent of dispersion. Further, LoC facilitates comparing two or more series/chains. Its disadvantages are as follows: with the use of the LoC, one can only have a relative idea of the dispersion of a given distribution compared to the isotropy line. Moreover, it does not provide any numerical variance values for the given distribution.
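To illustrate how the LoC behaves for a given parameter value, the Python sketch below evaluates the defining ratio of partial to total first moments by direct summation instead of using a closed form. The function names and the truncation length for the infinite sum are assumptions made for illustration.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def pmf(x, a):
    return a ** (x + 2) / (norm_const(a) * (x + 1) * (x + 2))


def lorenz(i, a, n_terms=2000):
    """Ratio of sum_{j<=i} j*Pr(X=j) to the truncated total sum_j j*Pr(X=j)."""
    total = sum(j * pmf(j, a) for j in range(n_terms))
    partial = sum(j * pmf(j, a) for j in range(i + 1))
    return partial / total


if __name__ == "__main__":
    for a in (0.3, 0.7):
        print(a, [round(lorenz(i, a), 4) for i in range(8)])
```

The resulting sequence starts at zero for i = 0, is nondecreasing in i, and approaches one.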

3. Estimation Methods: Unbiased and Consistent Estimators

In this section, two different estimation approaches are derived and discussed in detail: maximum likelihood and the method of moments. The main objective of studying different estimation approaches is to find the best estimator for data analysis in order to improve modeling and prediction.

3.1. Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a method of estimating unknown parameters by selecting values that maximize the likelihood of observing a given set of data. This technique is often used in various types of statistical modeling such as regression and classification. It is a popular approach due to its simplicity and easy implementation. At its core, MLE is a mathematical approach for finding the probability distribution of an unknown variable on the basis of a given sample of data. This is achieved by finding the maximal value of the likelihood function, which is based on the probability distribution of the given data. The likelihood function is computed by taking the product of the probabilities of the observations in the dataset. MLE is used in many areas of study, including economics, biology, engineering, and computer science. In economics, it is used for predictions of the probability of future events based on past observations. In biology, it is used to estimate gene frequencies and the relationships between genes. In this section, we determine the MLE of the DWPLT parameter according to a complete sample. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from the DWPLT distribution. The log-likelihood function (L) is as follows:
$$L(x; a) = \log a \sum_{i=1}^{n} (x_i + 2) - \sum_{i=1}^{n} \log\left[a + (1-a)\log(1-a)\right] - \sum_{i=1}^{n} \log(x_i + 1) - \sum_{i=1}^{n} \log(x_i + 2).$$
To estimate the model parameter a, the first partial derivative $\frac{\partial L(x; a)}{\partial a}$ is obtained as follows:
$$\frac{\partial L(x; a)}{\partial a} = \frac{1}{a} \sum_{i=1}^{n} (x_i + 2) + \sum_{i=1}^{n} \frac{\log(1-a)}{a + (1-a)\log(1-a)}.$$
Setting this derivative equal to zero gives the likelihood ("normal") equation, which cannot be solved analytically. So, an iterative procedure such as Newton–Raphson is required to solve it numerically.
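A minimal sketch of this numerical step is given below. For robustness it uses bisection on the score function rather than Newton–Raphson, which is a substitution made here for simplicity and not the procedure used in the study. The sample is hypothetical, the function names are illustrative, and the search bracket is an assumed default that presumes at least one positive observation and a moderate sample mean.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def score(a, xs):
    """Derivative of the DWPLT log-likelihood with respect to a."""
    n = len(xs)
    return sum(x + 2 for x in xs) / a + n * math.log1p(-a) / norm_const(a)


def mle_bisection(xs, lo=1e-6, hi=1.0 - 1e-12, tol=1e-12):
    """Root of the score on (lo, hi); requires a sign change over the bracket."""
    if not (score(lo, xs) > 0.0 > score(hi, xs)):
        raise ValueError("score does not change sign on the assumed bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0.0:
            lo = mid     # log-likelihood still increasing: root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sample = [0, 0, 1, 0, 2, 0, 5, 1, 0, 3]   # hypothetical counts
    a_hat = mle_bisection(sample)
    print(a_hat, score(a_hat, sample))
```

Bisection converges more slowly than Newton–Raphson but cannot step outside the interval (0, 1), which is why it is used in this sketch.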

3.2. Moment Estimation

Moment estimation (MoE) is a statistical approach used to estimate the parameters of a population or a probability model. This technique is often applied when the moments of a probabilistic model are available in closed form. Suppose that random variable X follows the DWPLT distribution. Then, the estimator $\hat{a}$ can be derived by solving the following equation for a:
$$\frac{-2a + (a-2)\log(1-a)}{a + (1-a)\log(1-a)} - \frac{1}{n}\sum_{i=1}^{n} x_i = 0.$$
The estimator cannot be expressed in closed form. Thus, a numerical approach has to be applied.
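One simple numerical approach is bisection on the difference between the model mean and the sample mean, sketched below in Python. The sample, function names, and bracket are assumptions made for illustration. The sketch also evaluates the score function of Section 3.1 at the moment estimate; for this model, the likelihood equation can be rearranged into the same mean-matching equation, so the value is expected to be close to zero.

```python
import math


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def mean_dwplt(a):
    """E(X) of the DWPLT model in closed form."""
    return (-2.0 * a + (a - 2.0) * math.log1p(-a)) / norm_const(a)


def score(a, xs):
    """Derivative of the DWPLT log-likelihood with respect to a."""
    n = len(xs)
    return sum(x + 2 for x in xs) / a + n * math.log1p(-a) / norm_const(a)


def moment_estimate(xs, lo=1e-4, hi=1.0 - 1e-12, tol=1e-12):
    """Solve mean_dwplt(a) = sample mean by bisection (mean increases with a)."""
    xbar = sum(xs) / len(xs)
    if not (mean_dwplt(lo) < xbar < mean_dwplt(hi)):
        raise ValueError("sample mean lies outside the assumed bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_dwplt(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sample = [0, 0, 1, 0, 2, 0, 5, 1, 0, 3]   # hypothetical counts
    a_hat = moment_estimate(sample)
    print(a_hat, mean_dwplt(a_hat), score(a_hat, sample))
```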

4. Estimator Performance: Simulation Results

Simulation studies are a popular and effective method of testing estimator performance in a variety of scenarios. By running several simulations, it is possible to approximate real-world performance and gain insight into how well the estimator could function in the field. The first step of a simulation study is to establish the parameters of the experiment. This involves setting up criteria for the data to be sampled, including sample number and size, and the sampling process. Once the parameters are established, the next step is to generate a simulated dataset that matches the specified criteria. This dataset should have features that are as close as possible to the features of real-world data. Once the dataset is generated, the estimator can be applied to the data. Depending on the type of estimator, different metrics may be used to measure the performance of the estimator. These metrics include bias, root mean squared error, and mean absolute error. In this section, the performance of the MLE and MoE was tested under the following schemes for the DWPLT parameter, each with sample sizes $n_1 = 20$, $n_2 = 50$, $n_3 = 100$, $n_4 = 150$, $n_5 = 300$, $n_6 = 500$, $n_7 = 700$, and $n_8 = 1000$: Scheme I: a = 0.1; Scheme II: a = 0.4; Scheme III: a = 0.7; Scheme IV: a = 0.9. Numerical assessments were based on the bias and mean squared error (MSE). First, we generated N = 10,000 samples from the DWPLT model; the results are listed in Table 1 and Table 2, and illustrated in Figure 3 and Figure 4.
According to the simulation results, the MSE and bias decreased as $n \to +\infty$, so the estimators behaved as asymptotically unbiased and consistent. Thus, both the maximum likelihood approach and the method of moments could be used to effectively estimate the model parameter.
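A scaled-down version of such a simulation can be sketched as follows. The Python code below draws DWPLT variates by inverting the CDF term by term, estimates a by matching the model mean to the sample mean, and reports the empirical bias and MSE. The function names, the random seed, the small number of replications, and the handling of samples whose mean falls outside the search bracket are all assumptions made for illustration; this is not the code used to produce Tables 1 and 2.

```python
import math
import random


def norm_const(a):
    return a + (1.0 - a) * math.log1p(-a)


def mean_dwplt(a):
    return (-2.0 * a + (a - 2.0) * math.log1p(-a)) / norm_const(a)


def sample_dwplt(a, rng, max_x=10000):
    """Draw one DWPLT variate by accumulating PMF values until they exceed u."""
    u = rng.random()
    x = 0
    p = a * a / (2.0 * norm_const(a))       # Pr(X = 0; a)
    cum = p
    while cum < u and x < max_x:
        p *= a * (x + 1) / (x + 3)           # Pr(X = x + 1) / Pr(X = x)
        x += 1
        cum += p
    return x


def moment_estimate(xs, lo=1e-4, hi=1.0 - 1e-9, tol=1e-10):
    """Mean-matching estimate of a, clipped to the assumed bracket [lo, hi]."""
    xbar = sum(xs) / len(xs)
    if xbar <= mean_dwplt(lo):
        return lo                            # e.g. a sample consisting of zeros
    if xbar >= mean_dwplt(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_dwplt(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(a, n, reps, seed=0):
    """Empirical bias and MSE of the estimate over reps samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        xs = [sample_dwplt(a, rng) for _ in range(n)]
        errors.append(moment_estimate(xs) - a)
    bias = sum(errors) / reps
    mse = sum(e * e for e in errors) / reps
    return bias, mse


if __name__ == "__main__":
    for n in (20, 100, 500):
        print(n, simulate(0.4, n, reps=200, seed=1))
```

With the settings above, the printed MSE should shrink as n grows, which mirrors the qualitative pattern described in the text.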

5. Data Analysis: Kidney Dysmorphogenetics

In this section, we illustrate the flexibility of the DWPLT model by using real medical data. Here, we compare the fitting capability of the DWPLT distribution with that of other competitive distributions, namely, geometric (Geo), discrete Rayleigh (DR; see Roy [2]), discrete inverse Rayleigh (DIR; see Hussain and Ahmad [8]), discrete Bilal (DBL; see Altun et al. [21]), Poisson (Poi; see Poisson [22]), discrete Pareto (DPa; see Krishna and Pundir [3]), one-parameter discrete flexible (DF-I; see Eliwa and El-Morshedy [23]), discrete log-logistic (DLogL; see Para and Jan [24]), discrete inverse Weibull (DIW; see Jazi et al. [5]), discrete Lomax (DLo; see Para and Jan [25]), binomial (Bin), discrete Burr Type II (DB-II; see Para and Jan [25]), one-parameter discrete Lindley (DL-I), two-parameter discrete Lindley (DL-II), three-parameter discrete Lindley (DL-III), Poisson Lindley (PoiL; see Shanker and Mishra [26]), natural discrete Lindley (NDL; see Almazah et al. [27]), discrete inverted Topp-Leone (DITL; see Eldeeb et al. [28]), and discrete gamma Lindley (DGL; see El-Morshedy et al. [29]) distributions. The fitted models were compared using the criteria of negative maximized log-likelihood ($-L$), Akaike information criterion (Ac), corrected Akaike information criterion (CAc), Hannan–Quinn information criterion (Hc), Bayesian information criterion (Bc), and the chi-squared (Chi$^2$) test with its corresponding P-value (Pv) based on the degrees of freedom (Dm). The Ac is a mathematical approach for evaluating and discussing the suitability of a distribution for the real data from which it was generated. In statistics, the Ac is applied to compare different possible distributions and models, and to select the most appropriate distribution/model for the data. The standard correction of Akaike’s information criterion assumes the same predictors for training and validation, thus underestimating the prediction error of random predictors.
The CAc was derived for regression models containing a mixture of random and fixed predictors. Both Ac and CAc are estimators of the prediction error and thus of the relative quality of statistical distributions for a given set of real data. The Hc is a measure of the suitability of a statistical distribution that is often applied as a criterion for distribution selection among a limited set of models; like the Ac, it is based on the log-likelihood function, and it serves as an alternative to the Ac in some practical settings. In statistics, the Bc, or Schwarz information criterion, is used for distribution selection among a finite set of distributions, where models with a lower Bc are generally preferred. It is partly based on the likelihood function and is closely related to the Ac. When fitting distributions, it is possible to increase the likelihood by adding parameters, but this may result in overfitting. Bc, Ac, and CAc address this problem by introducing a penalty term for the number of parameters in the distribution. The penalty term is greater in Bc than in Ac for samples larger than 7. Bc was derived and developed by Gideon E. Schwarz, who provided a Bayesian argument for its adoption. Lastly, the chi-squared test is a goodness-of-fit test used to determine whether a variable is likely to have come from a specified distribution. It is often applied to assess whether sample data are representative of the entire population. The chi-squared goodness-of-fit test is a kind of Pearson chi-squared test. Statisticians can apply it to test whether the observed distribution of a categorical variable differs from their expectations.
The kidney dysmorphogenetic dataset was taken from the study of Chan et al. [30]. It was discussed and analyzed by several authors in the statistical literature; for more details, see citations on Google Scholar for Chan et al. (2010). The data are: 0, 1, 0, 0, 3, 2, 0, 1, 0, 4, 0, 0, 1, 0, 3, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 8, 2, 3, 0, 5, 0, 0, 0, 0, 7, 0, 10, 0, 0, 1, 0, 0, 0, 2, 11, 0, 6, 0, 0, 1, 0, 8, 0, 0, 7, 0, 1, 2, 0, 4, 0, 0, 0, 0, 1, 0, 9, 3, 0, 0, 0, 6, 2, 0, 1, 0, 2, 0, 4, 0, 11, 0, 0, 2, 0, 4, 0, 1, 3, 0, 0, 2, 0, 0, 5, 0, 1, 0, 0, 0, 0, 0, 2, 1, 0, 3, 0, 0, 0, 0, 1. The raw shape of kidney dysmorphogenetic data was explored using nonparametric sketches: relative frequency visualization, box plot, score diagrams, and strip representation. According to the previous nonparametric plots, kidney dysmorphogenetic data were not symmetric, and some extreme values were reported (see, Figure 5).
MLEs, standard errors (SEs), upper (U) and lower (L) confidence intervals (CIs) for the parameters, and goodness-of-fit measures (GOFMs) for kidney dysmorphogenetic data are reported in Table 3, Table 4 and Table 5.
The DPa, DLogL, DIW, DLo, and DB-II distributions worked quite well alongside DWPLT. However, the DWPLT distribution was the best among all tested models. Figure 6 shows that the MLE was unique because the log-likelihood profile was unimodal. A contour diagram could not be drawn because the proposed model has only one parameter; thus, the log-likelihood profile was sufficient to substantiate this claim.
Figure 7 shows the empirical results, which indicate that the DWPLT model fit the dataset well.
The moment estimate coincided with the maximum likelihood estimate, and both were unique.

6. Results and Future Work

In this article, a new one-parameter discrete model, DWPLT, was proposed, which is considered a generalization of the Poisson distribution. Some basic distributional characteristics of the presented distribution were derived in closed form and discussed. The DWPLT model was capable of modeling positively skewed leptokurtic data and describing overdispersed data. Moreover, the hazard rate function was decreasing for all values of the model parameter. Two estimation methods were applied to obtain the best estimator for the DWPLT parameter. Both approaches worked very well for this purpose across the simulation schemes. The practical significance of the DWPLT distribution was demonstrated using a real medical dataset, and it was compared with other competitive lifetime distributions. The DWPLT model had more flexibility in fitting the dataset than the other models considered. Lastly, the DWPLT distribution could be useful in many applications, such as environmental studies, reliability theory, and actuarial and medical sciences. As future work, the regression model and time series analysis will be discussed.

Author Contributions

Conceptualization, M.S.E. and M.E.-M.; methodology, M.E.-D., M.S.E. and M.E.-M.; software, M.E.-M.; validation, M.E.-D.; formal analysis, M.S.E.; investigation, M.E.-M.; resources, M.E.-D.; data curation, M.S.E., M.E.-D. and M.E.-M.; writing—original draft preparation, M.S.E. and M.E.-M.; writing—review and editing, M.S.E.; visualization, M.S.E. and M.E.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Research & Innovation of the Ministry of Education in Saudi Arabia grant number [IF-PSAU-2022/01/22580].

Data Availability Statement

All datasets are listed within the paper.

Acknowledgments

The authors extend their appreciation to the Deputyship for Research and Innovation of the Ministry of Education in Saudi Arabia for funding this research work through project number IF-PSAU-2022/01/22580.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yari, G.; Tondpour, Z. Some new discretization methods with application in reliability. Appl. Appl. Math. Int. J. (AAM) 2018, 13, 6.
  2. Roy, D. Discrete Rayleigh distribution. IEEE Trans. Reliab. 2004, 53, 255–260.
  3. Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188.
  4. Gómez-Déniz, E. Another generalization of the geometric distribution. Test 2010, 19, 399–415.
  5. Jazi, M.A.; Lai, C.D.; Alamatsaz, M.H. A discrete inverse Weibull distribution and estimation of its parameters. Stat. Methodol. 2010, 7, 121–132.
  6. Al-Huniti, A.A.; AL-Dayian, G.R. Discrete Burr type III distribution. Am. J. Math. Stat. 2012, 2, 145–152.
  7. Nekoukhou, V.; Alamatsaz, M.H.; Bidram, H. Discrete generalized exponential distribution of a second type. Statistics 2013, 47, 876–887.
  8. Hussain, T.; Ahmad, M. Discrete inverse Rayleigh distribution. Pak. J. Stat. 2014, 30.
  9. Hussain, T.; Aslam, M.; Ahmad, M. A two parameter discrete Lindley distribution. Rev. Colomb. Estad. 2016, 39, 45–61.
  10. Abebe, B.; Shanker, R. A discrete Lindley distribution with applications in biological sciences. Biom. Biostat. Int. J. 2018, 7, 48–52.
  11. Al-Babtain, A.A.; Ahmed, A.H.N.; Afify, A.Z. A new discrete analog of the continuous Lindley distribution, with reliability applications. Entropy 2020, 22, 603.
  12. Eliwa, M.S.; Altun, E.; El-Dawoody, M.; El-Morshedy, M. A new three-parameter discrete distribution with associated INAR (1) process and applications. IEEE Access 2020, 8, 91150–91162.
  13. Almazah, M.M.A.; Erbayram, T.; Akdoğan, Y.; Al Sobhi, M.M.; Afify, A.Z. A new extended geometric distribution: Properties, regression model, and actuarial applications. Mathematics 2021, 9, 1336.
  14. Poisson, S.D. Mémoire sur l’équilibre et le mouvement des corps élastiques. Mém. Acad. R. Sci. Inst. Fr. 1829, 8, 357–570.
  15. Del Castillo, J.; Pérez-Casany, M. Weighted Poisson distributions for overdispersion and underdispersion situations. Ann. Inst. Stat. Math. 1998, 50, 567–585.
  16. Fisher, R.A. The effect of methods of ascertainment upon the estimation of frequencies. Ann. Eugen. 1934, 6, 13–25.
  17. Dietz, E.; Böhning, D. On estimation of the Poisson parameter in zero-modified Poisson models. Comput. Stat. Data Anal. 2000, 34, 441–459.
  18. Kokonendji, C.C.; Mizere, D.; Balakrishnan, N. Connections of the Poisson weight function to overdispersion and underdispersion. J. Stat. Plan. Inference 2008, 138, 1287–1296.
  19. El-Morshedy, M.; Eliwa, M.S.; Altun, E. Discrete Burr-Hatke distribution with properties, estimation methods and regression model. IEEE Access 2020, 8, 74359–74370.
  20. Wolfram Research. HurwitzLerchPhi Function. J. Appl. Stat. 2008, 32, 1461–1478.
  21. Altun, E.; El-Morshedy, M.; Eliwa, M.S. A study on discrete Bilal distribution with properties and applications on integer-valued autoregressive process. Revstat-Stat. J. 2022, 20, 501–528.
  22. Poisson, S.D. Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités; Bachelier: Paris, France, 1837; Volume 1, p. 1837.
  23. Eliwa, M.S.; El-Morshedy, M. A one-parameter discrete distribution for over-dispersed data: Statistical and reliability properties with applications. J. Appl. Stat. 2022, 49, 2467–2487.
  24. Para, B.A.; Jan, T.R. Discrete version of log-logistic distribution and its applications in genetics. Int. J. Mod. Math. Sci. 2016, 14, 407–422.
  25. Para, B.A.; Jan, T.R. On discrete three parameter Burr type XII and discrete Lomax distributions and their applications to model count data from medical science. Biom. Biostat. J. 2016, 4, 1–15.
  26. Shanker, R.; Mishra, A. A two-parameter Poisson-Lindley distribution. Int. J. Stat. Syst. 2014, 9, 79–85.
  27. Almazah, M.M.A.; Alnssyan, B.; Ahmed, A.H.N.; Afify, A.Z. Reliability properties of the NDL family of discrete distributions with its inference. Mathematics 2021, 9, 1139.
  28. Eldeeb, A.S.; Ahsan-Ul-Haq, M.; Babar, A. A discrete analog of inverted Topp-Leone distribution: Properties, estimation and applications. Int. J. Anal. Appl. 2021, 19, 695–708.
  29. El-Morshedy, M.; Altun, E.; Eliwa, M.S. A new statistical approach to model the counts of novel coronavirus cases. Math. Sci. 2022, 16, 37–50.
  30. Chan, S.K.; Riley, P.R.; Price, K.L.; McElduff, F.; Winyard, P.J.; Welham, S.J.; Long, D.A. Corticosteroid-induced kidney dysmorphogenesis is associated with deregulated expression of known cystogenic molecules, as well as Indian hedgehog. Am. J. Physiol.-Ren. Physiol. 2010, 298, F346–F356.

Figure 1. PMF and HRF plots.
Figure 2. Descriptive statistics of DWPLT distribution.
Figure 3. Simulation results for DWPLT parameters using the MLE method.
Figure 4. Simulation results for DWPLT parameters using the MoE method.
Figure 5. Nonparametric data plots.
Figure 6. L profile of the DWPLT parameter based on the dataset.
Figure 7. Estimated PMFs for the dataset.

Table 1. Simulation results for DWPLT parameters using the MLE method.

| n | Scheme I (a = 0.1): Bias | Scheme I (a = 0.1): MSE | Scheme II (a = 0.4): Bias | Scheme II (a = 0.4): MSE |
|---|---|---|---|---|
| 20 | 0.65434672 | 0.46384901 | 0.43678491 | 0.34579056 |
| 50 | 0.55763945 | 0.37466489 | 0.33578994 | 0.28456410 |
| 100 | 0.37655988 | 0.26654540 | 0.24234667 | 0.17746939 |
| 150 | 0.21457409 | 0.15646744 | 0.13456397 | 0.10099874 |
| 300 | 0.14673813 | 0.10397464 | 0.09478516 | 0.07756379 |
| 500 | 0.07846543 | 0.01547531 | 0.00466599 | 0.00298575 |
| 700 | 0.00847645 | 0.00376549 | 0.00016859 | 0.00008946 |
| 1000 | 0.00006455 | 0.00002135 | 0.00000344 | 0.00000037 |

| n | Scheme III (a = 0.7): Bias | Scheme III (a = 0.7): MSE | Scheme IV (a = 0.9): Bias | Scheme IV (a = 0.9): MSE |
|---|---|---|---|---|
| 20 | 1.28766550 | 0.74578902 | 0.83462601 | 0.62399457 |
| 50 | 0.98464654 | 0.56458012 | 0.66357910 | 0.42095752 |
| 100 | 0.67351458 | 0.31254684 | 0.37847655 | 0.28546781 |
| 150 | 0.38734698 | 0.17455449 | 0.21824765 | 0.13683602 |
| 300 | 0.21985857 | 0.07387475 | 0.12049875 | 0.07488585 |
| 500 | 0.12857676 | 0.00354784 | 0.08465547 | 0.00110487 |
| 700 | 0.08957654 | 0.00018366 | 0.00015424 | 0.00028475 |
| 1000 | 0.00023248 | 0.00004571 | 0.00000154 | 0.00000114 |

Table 2. Simulation results for DWPLT parameters using the MoE method.

| n | Scheme I (a = 0.1): Bias | Scheme I (a = 0.1): MSE | Scheme II (a = 0.4): Bias | Scheme II (a = 0.4): MSE |
|---|---|---|---|---|
| 20 | 0.68456473 | 0.49365504 | 0.44984619 | 0.36204875 |
| 50 | 0.57735547 | 0.38835547 | 0.36825548 | 0.29344648 |
| 100 | 0.39465473 | 0.28143054 | 0.25846510 | 0.18834750 |
| 150 | 0.22287464 | 0.16637550 | 0.13876450 | 0.10274654 |
| 300 | 0.15344649 | 0.11846548 | 0.09535378 | 0.09547785 |
| 500 | 0.08876354 | 0.01610476 | 0.00524367 | 0.00217455 |
| 700 | 0.00345674 | 0.00437386 | 0.00022467 | 0.00002356 |
| 1000 | 0.00009365 | 0.00003765 | 0.00000673 | 0.00000019 |

| n | Scheme III (a = 0.7): Bias | Scheme III (a = 0.7): MSE | Scheme IV (a = 0.9): Bias | Scheme IV (a = 0.9): MSE |
|---|---|---|---|---|
| 20 | 1.13664785 | 0.68465541 | 0.76388994 | 0.58635531 |
| 50 | 0.90354789 | 0.51873654 | 0.61048655 | 0.39746571 |
| 100 | 0.61546489 | 0.28465483 | 0.32286556 | 0.27454547 |
| 150 | 0.35478436 | 0.14438761 | 0.18455675 | 0.12445678 |
| 300 | 0.19354465 | 0.05645831 | 0.11957650 | 0.09877577 |
| 500 | 0.10456738 | 0.00273568 | 0.07754891 | 0.00465587 |
| 700 | 0.07745831 | 0.00004654 | 0.00003456 | 0.00065972 |
| 1000 | 0.00003865 | 0.00000314 | 0.00000064 | 0.00000946 |

Table 3. GOFM for the dataset, Part I. Model columns give expected frequencies.

| X | Observed frequency | DWPLT | Geo | DR | DIR | DBL | Poi | DPa |
|---|---|---|---|---|---|---|---|---|
| 0 | 65 | 63.34 | 45.98 | 10.89 | 60.89 | 32.08 | 27.39 | 65.84 |
| 1 | 14 | 19.78 | 26.76 | 26.62 | 33.99 | 37.10 | 38.08 | 18.27 |
| 2 | 10 | 9.26 | 15.58 | 29.45 | 8.12 | 21.66 | 26.47 | 8.16 |
| 3 | 6 | 5.20 | 9.06 | 22.29 | 3.00 | 10.63 | 12.26 | 4.51 |
| 4 | 4 | 3.25 | 5.28 | 12.63 | 1.42 | 4.84 | 4.26 | 2.82 |
| 5 | 2 | 2.17 | 3.07 | 5.54 | 0.78 | 2.12 | 1.19 | 1.91 |
| 6 | 2 | 1.53 | 1.79 | 1.91 | 0.47 | 0.91 | 0.27 | 1.37 |
| 7 | 2 | 1.11 | 1.04 | 0.53 | 0.31 | 0.38 | 0.05 | 1.02 |
| 8 | 1 | 0.83 | 0.61 | 0.12 | 0.21 | 0.16 | 0.01 | 0.79 |
| 9 | 1 | 0.64 | 0.35 | 0.02 | 0.15 | 0.07 | 0.00 | 0.63 |
| 10 | 1 | 0.49 | 0.21 | 0.00 | 0.11 | 0.03 | 0.00 | 0.51 |
| 11 | 2 | 2.40 | 0.27 | 0.00 | 0.55 | 0.02 | 0.02 | 4.17 |
| Total | 110 | 110 | 110 | 110 | 110 | 110 | 110 | 110 |
| a (MLE) | | 0.937 | 0.582 | 0.901 | 0.554 | 0.643 | 1.390 | 0.268 |
| a (Se) | | 0.028 | 0.030 | 0.009 | 0.049 | 0.020 | 0.112 | 0.034 |
| a (L.C.I) | | 0.881 | 0.522 | 0.883 | 0.458 | 0.604 | 1.171 | 0.201 |
| a (U.C.I) | | 0.992 | 0.641 | 0.919 | 0.649 | 0.682 | 1.611 | 0.336 |
| L | | 169.32 | 178.77 | 277.78 | 186.55 | 207.44 | 246.21 | 171.19 |
| Ac | | 340.64 | 359.53 | 557.56 | 375.09 | 416.87 | 494.42 | 344.38 |
| Bc | | 343.34 | 362.23 | 560.26 | 377.79 | 419.57 | 497.12 | 347.08 |
| CAc | | 340.68 | 359.57 | 557.59 | 375.13 | 416.91 | 494.46 | 344.42 |
| Hc | | 341.74 | 360.63 | 558.65 | 376.19 | 417.97 | 495.52 | 345.48 |
| χ² | | 2.548 | 19.109 | 306.515 | 40.456 | 61.37 | 89.277 | 3.430 |
| Dm | | 4 | 4 | 4 | 2 | 3 | 3 | 4 |
| Pv | | 0.636 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.489 |

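The information criteria in the table can be reproduced from the L row. The following is a minimal Python sketch, assuming that L is the negative maximized log-likelihood, that Ac, Bc, CAc and Hc denote the Akaike, Bayesian, corrected Akaike and Hannan-Quinn criteria in their standard forms, and that the sample size is the table total, n = 110. The function name `information_criteria` is illustrative and does not come from the paper.

```python
import math

def information_criteria(neg_loglik, k, n):
    """Standard information criteria from a negative log-likelihood.

    neg_loglik: negative maximized log-likelihood (assumed meaning of L)
    k: number of estimated parameters
    n: sample size
    """
    aic = 2 * k + 2 * neg_loglik
    bic = k * math.log(n) + 2 * neg_loglik
    caic = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample corrected AIC
    hqic = 2 * k * math.log(math.log(n)) + 2 * neg_loglik
    return {"Ac": aic, "Bc": bic, "CAc": caic, "Hc": hqic}

# DWPLT column: L = 169.32, one parameter, n = 110 observations
dwplt = information_criteria(169.32, 1, 110)
print({name: round(value, 2) for name, value in dwplt.items()})
```

Rounded to two decimals, the result is 340.64, 343.34, 340.68 and 341.74, which agrees with the DWPLT entries for Ac, Bc, CAc and Hc. Because each criterion adds a penalty that grows with k, a one-parameter model such as DWPLT is favored over a two-parameter competitor with a similar L.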
Table 4. GOFM for the dataset, Part II. Model columns give expected frequencies.

| X | Observed frequency | DWPLT | DF-I | DLogL | DIW | DLo | Bin | DB-II |
|---|---|---|---|---|---|---|---|---|
| 0 | 65 | 63.34 | 45.26 | 63.19 | 63.91 | 61.62 | 27.94 | 64.74 |
| 1 | 14 | 19.78 | 29.09 | 20.10 | 20.69 | 21.02 | 38.44 | 19.18 |
| 2 | 10 | 9.26 | 16.51 | 8.64 | 8.05 | 9.69 | 26.29 | 8.48 |
| 3 | 6 | 5.20 | 8.89 | 4.66 | 4.23 | 5.28 | 11.92 | 4.63 |
| 4 | 4 | 3.25 | 4.70 | 2.86 | 2.59 | 3.19 | 4.03 | 2.86 |
| 5 | 2 | 2.17 | 2.49 | 1.92 | 1.75 | 2.09 | 1.08 | 1.92 |
| 6 | 2 | 1.53 | 1.34 | 1.39 | 1.26 | 1.44 | 0.24 | 1.37 |
| 7 | 2 | 1.11 | 0.73 | 1.02 | 0.95 | 1.04 | 0.05 | 1.01 |
| 8 | 1 | 0.83 | 0.41 | 0.79 | 0.74 | 0.77 | 0.00 | 0.78 |
| 9 | 1 | 0.64 | 0.23 | 0.62 | 0.59 | 0.59 | 0.00 | 0.61 |
| 10 | 1 | 0.49 | 0.14 | 0.50 | 0.49 | 0.46 | 0.00 | 0.49 |
| 11 | 2 | 2.40 | 0.21 | 4.31 | 4.75 | 2.81 | 0.01 | 3.93 |
| Total | 110 | 110 | 110 | 110 | 110 | 110 | 110 | 110 |
| a (MLE) | | 0.937 | 0.623 | 0.780 | 0.581 | 0.152 | 170.608 | 0.278 |
| a (Se) | | 0.028 | 0.031 | 0.136 | 0.048 | 0.089 | 0.831 | 0.045 |
| a (L.C.I) | | 0.881 | 0.563 | 0.514 | 0.488 | 0 | 168.979 | 0.189 |
| a (U.C.I) | | 0.992 | 0.684 | 1.046 | 0.675 | 0.345 | 172.237 | 0.366 |
| b (MLE) | | – | – | 1.208 | 1.049 | 1.830 | 0.008 | 1.053 |
| b (Se) | | – | – | 0.159 | 0.146 | 0.952 | 0.012 | 0.167 |
| b (L.C.I) | | – | – | 0.895 | 0.763 | 0 | 0 | 0.725 |
| b (U.C.I) | | – | – | 1.520 | 1.335 | 3.698 | 0.032 | 1.381 |
| L | | 169.32 | 182.29 | 171.72 | 172.94 | 170.48 | 247.74 | 171.14 |
| Ac | | 340.64 | 366.58 | 347.43 | 349.87 | 344.96 | 499.48 | 346.28 |
| Bc | | 343.34 | 369.28 | 352.84 | 355.27 | 350.36 | 504.88 | 351.68 |
| CAc | | 340.68 | 366.61 | 347.55 | 349.99 | 345.07 | 499.59 | 346.39 |
| Hc | | 341.74 | 367.67 | 349.62 | 352.07 | 347.15 | 501.67 | 348.47 |
| χ² | | 2.548 | 31.702 | 4.033 | 6.445 | 3.238 | 94.729 | 2.587 |
| Dm | | 4 | 4 | 3 | 3 | 3 | 2 | 2 |
| Pv | | 0.636 | <0.001 | 0.258 | 0.092 | 0.356 | <0.001 | 0.274 |

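The Pv row can be checked against the chi-square statistic and Dm rows, assuming that Dm is the number of degrees of freedom of a chi-square goodness-of-fit test and that Pv is its upper-tail p-value. For an even number of degrees of freedom the chi-square survival function has a closed form, so a short Python sketch needs only the standard library. The function name `chi2_pvalue_even_df` is illustrative, and the sketch deliberately handles even degrees of freedom only.

```python
import math

def chi2_pvalue_even_df(stat, df):
    """Upper-tail chi-square probability for even degrees of freedom.

    Uses P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = stat / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

# DWPLT: chi-square statistic 2.548 on 4 degrees of freedom
print(round(chi2_pvalue_even_df(2.548, 4), 3))
# DB-II: chi-square statistic 2.587 on 2 degrees of freedom
print(round(chi2_pvalue_even_df(2.587, 2), 3))
```

The two printed values, 0.636 and 0.274, agree with the Pv entries for DWPLT and DB-II. Columns with an odd Dm would need a general chi-square survival function, which this sketch does not provide.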
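Table 5. GOFM for the dataset, Part III. Model columns give expected frequencies.

| X | Observed frequency | DWPLT | DL-I | DL-II | DL-III | NDL | PoiL | DITL | DGL |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 65 | 63.34 | 40.29 | 46.03 | 46.01 | 41.96 | 44.14 | 52.94 | 46.01 |
| 1 | 14 | 19.78 | 29.83 | 26.77 | 26.77 | 28.80 | 28.00 | 28.29 | 26.76 |
| 2 | 10 | 9.26 | 18.36 | 15.57 | 15.58 | 17.57 | 16.7 | 12.09 | 15.57 |
| 3 | 6 | 5.20 | 10.34 | 9.06 | 9.06 | 10.05 | 9.57 | 5.99 | 9.06 |
| 4 | 4 | 3.25 | 5.52 | 5.27 | 5.27 | 5.52 | 5.34 | 3.34 | 5.27 |
| 5 | 2 | 2.17 | 2.85 | 3.07 | 3.07 | 2.95 | 2.92 | 2.03 | 3.07 |
| 6 | 2 | 1.53 | 1.44 | 1.79 | 1.78 | 1.54 | 1.57 | 1.31 | 1.78 |
| 7 | 2 | 1.11 | 0.71 | 1.04 | 1.04 | 0.79 | 0.84 | 0.89 | 1.04 |
| 8 | 1 | 0.83 | 0.35 | 0.60 | 0.60 | 0.40 | 0.44 | 0.63 | 0.60 |
| 9 | 1 | 0.64 | 0.17 | 0.35 | 0.35 | 0.20 | 0.23 | 0.46 | 0.35 |
| 10 | 1 | 0.49 | 0.08 | 0.20 | 0.20 | 0.10 | 0.12 | 0.35 | 0.20 |
| 11 | 2 | 2.40 | 0.06 | 0.25 | 0.27 | 0.12 | 0.13 | 1.68 | 0.29 |
| Total | 110 | 110 | 110 | 110 | 110 | 110 | 110 | 110 | 110 |
| a (MLE) | | 0.937 | 0.436 | 0.581 | 0.582 | 0.542 | 1.087 | 2.281 | 0.582 |
| a (Se) | | 0.028 | 0.026 | 0.045 | 0.005 | 0.026 | 0.109 | 0.221 | 0.045 |
| a (L.C.I) | | 0.881 | 0.385 | 0.492 | 0.493 | 0.491 | 0.873 | 1.849 | 0.493 |
| a (U.C.I) | | 0.992 | 0.488 | 0.670 | 0.671 | 0.594 | 1.301 | 2.714 | 0.670 |
| b (MLE) | | – | – | 0.001 | 358.728 | – | – | – | 0.351 |
| b (Se) | | – | – | 0.058 | 11,863.370 | – | – | – | 0.065 |
| b (L.C.I) | | – | – | 0 | 0 | – | – | – | 0.223 |
| b (U.C.I) | | – | – | 0.116 | 2.3 × 10⁴ | – | – | – | 0.479 |
| c (MLE) | | – | – | – | 0.001 | – | – | – | – |
| c (Se) | | – | – | – | 20.698 | – | – | – | – |
| c (L.C.I) | | – | – | – | 0 | – | – | – | – |
| c (U.C.I) | | – | – | – | 22.691 | – | – | – | – |
| L | | 169.32 | 189.11 | 178.77 | 178.77 | 185.98 | 183.11 | 174.95 | 178.77 |
| Ac | | 340.64 | 380.22 | 361.53 | 363.53 | 373.96 | 368.23 | 351.89 | 361.53 |
| Bc | | 343.34 | 382.92 | 366.93 | 371.63 | 376.66 | 370.93 | 354.59 | 366.93 |
| CAc | | 340.68 | 380.26 | 361.65 | 363.76 | 373.99 | 368.26 | 351.93 | 361.65 |
| Hc | | 341.74 | 381.32 | 363.72 | 366.82 | 375.05 | 369.32 | 352.99 | 363.72 |
| χ² | | 2.548 | 34.635 | 19.091 | 19.096 | 29.505 | 24.824 | 12.065 | 19.092 |
| Dm | | 4 | 4 | 3 | 2 | 4 | 4 | 3 | 4 |
| Pv | | 0.636 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.007 | <0.001 |
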
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
