Article

A New Extended Geometric Distribution: Properties, Regression Model, and Actuarial Applications

by Mohammed Mohammed Ahmed Almazah 1,2, Tenzile Erbayram 3, Yunus Akdoğan 3, Mashail M. AL Sobhi 4 and Ahmed Z. Afify 5,*

1 Department of Mathematics, College of Sciences and Arts (Muhyil), King Khalid University, Muhyil 61421, Saudi Arabia
2 Department of Mathematics and Computer, College of Sciences, Ibb University, Ibb 70270, Yemen
3 Department of Statistics, Faculty of Science, Selçuk University, 42250 Konya, Turkey
4 Department of Mathematics, Umm-Al-Qura University, Makkah 24227, Saudi Arabia
5 Department of Statistics, Mathematics and Insurance, Benha University, Benha 13511, Egypt
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(12), 1336; https://doi.org/10.3390/math9121336
Submission received: 23 April 2021 / Revised: 18 May 2021 / Accepted: 28 May 2021 / Published: 9 June 2021
(This article belongs to the Section Probability and Statistics)

Abstract

In this paper, a new modified version of the geometric distribution is proposed. The new model is called the transmuted record type geometric (TRTG) distribution. It is a good alternative to the negative binomial, Poisson, and geometric distributions for modeling real data encountered in several applied fields. The main statistical properties of the new distribution are obtained. We determine the value at risk and tail value at risk measures for the TRTG distribution; these measures are important quantities in actuarial sciences for portfolio optimization under uncertainty. The TRTG parameters are estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and simulation results are used to explore their performance. Furthermore, a new count regression model based on the TRTG distribution is proposed. Four real data applications illustrate the applicability of the TRTG distribution and its count regression model. These applications show empirically that the TRTG distribution outperforms some important discrete models such as the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.

1. Introduction

Discrete models are very important in handling count data encountered in several theoretical and applied sciences such as medicine, insurance, life testing, biology, and agriculture. Recently, there has been increased interest among statisticians in constructing new flexible discrete distributions. Chakraborty and Chakravarty [1] noted that almost all observed values are actually discrete because they are measured to only a finite number of decimal places and cannot really constitute all points in a continuum.
On the other hand, in some life testing and survival analysis studies, lifetimes can be treated as discrete random variables, and hence the reliability function is a function of a discrete random variable. For example, the reliability of a switching device is a function of the number of times the switch is operated, and the reliability of a computer is a function of the number of times the computer has broken down. Recently, many continuous lifetime distributions have been discretized for modeling discrete lifetime data, for example, the discrete Weibull distribution by [2] and the discrete Burr and discrete Pareto distributions by [3].
Furthermore, some discrete distributions have been proposed by compounding two discrete distributions, for example, the uniform-Poisson distribution by [4], the uniform-geometric distribution by [5], and the binomial discrete-Lindley distribution by [6]. Recently, Al-Babtain et al. [7] proposed a natural discrete analog of the continuous Lindley distribution as a mixture of negative binomial and geometric distributions.
In recent decades, several works in the statistical literature have discretized continuous distributions. However, there is still a clear need for more flexible discrete lifetime distributions to model several types of count data in many applied areas including insurance, social sciences, reliability studies, and economics. It is worth noting that the probability mass functions (pmfs) of most recently introduced discrete distributions are developed by discretizing the survival functions of continuous distributions and have quite a complex structure in terms of parameter estimation, for example, the two discrete Lindley models by [8,9].
Apart from discretization techniques, we were motivated to propose a more flexible extension of the geometric distribution using the transmuted record type (TRT) method due to [10]. The TRT approach can be summarized as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having cdf $G(\cdot)$. Let $X_{U(1)}$ and $X_{U(2)}$ be the first and second upper records, respectively, based on this sample. Consider a random variable $X$ that is defined as follows:
$$X \overset{d}{=} X_{U(1)} \ \text{with probability } 1-\theta, \qquad X \overset{d}{=} X_{U(2)} \ \text{with probability } \theta.$$
Hence, the cdf of $X$ follows as
$$F_X(x) = (1-\theta)\,P\left(X_{U(1)} \le x\right) + \theta\,P\left(X_{U(2)} \le x\right), \tag{1}$$
where $\theta \in (0,1)$.
According to Equation (3.1) of [11], the cdf of the first upper record, $X_{U(1)}$, reduces to:
$$F_{U(1)}(x) = P\left(X_{U(1)} \le x\right) = G(x). \tag{2}$$
The cdf of the second upper record, $X_{U(2)}$, takes the form:
$$F_{U(2)}(x) = P\left(X_{U(2)} \le x\right) = 1 - \left[1 - G(x)\right]\sum_{j=0}^{1} \frac{\left[-\log\left(1 - G(x)\right)\right]^{j}}{j!}. \tag{3}$$
By inserting Equations (2) and (3) in (1), we obtain:
$$F_X(x) = (1-\theta)\,G(x) + \theta\left\{1 - \left[1 - G(x)\right]\left[1 - \log\left(1 - G(x)\right)\right]\right\}.$$
After some algebra, the cdf of $X$ reduces to:
$$F_X(x) = G(x) + \theta\left[1 - G(x)\right]\log\left[1 - G(x)\right], \tag{4}$$
where $G(x)$ is any baseline cdf, and Equation (4) is referred to as the cdf of the TRT method. Tanış and Saraçoğlu [12] constructed the TRT-Weibull distribution using the TRT approach. Further information about the TRT approach can be found in [10].
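The reduction from the mixture form in Equations (1)–(3) to the closed form in Equation (4) can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the unit exponential baseline cdf are illustrative assumptions chosen only to exercise the formulas.

```python
import math

def trt_cdf(G, theta):
    """Closed form (4): F(x) = G(x) + theta * (1 - G(x)) * log(1 - G(x))."""
    def F(x):
        g = G(x)
        if g >= 1.0:  # (1 - g) * log(1 - g) tends to 0 as g -> 1
            return 1.0
        return g + theta * (1.0 - g) * math.log(1.0 - g)
    return F

def mixture_cdf(G, theta):
    """Mixture form (1), using the record cdfs (2) and (3)."""
    def F(x):
        g = G(x)
        if g >= 1.0:
            return 1.0
        s = 1.0 - g
        f_u1 = g
        f_u2 = 1.0 - s * sum((-math.log(s)) ** j / math.factorial(j) for j in range(2))
        return (1.0 - theta) * f_u1 + theta * f_u2
    return F

# Hypothetical baseline for illustration only: the unit exponential cdf.
def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

F_closed = trt_cdf(exp_cdf, 0.3)
F_mix = mixture_cdf(exp_cdf, 0.3)
# Largest disagreement between the two forms on a grid of x values.
max_gap = max(abs(F_closed(x / 10) - F_mix(x / 10)) for x in range(0, 100))
```

Any other baseline cdf could be passed in place of `exp_cdf`; the two forms should agree up to floating-point rounding.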
To the best of our knowledge, this is the first article that applies the TRT method to construct an extended form of the geometric distribution. The proposed discrete model is called the transmuted record type geometric (TRTG) distribution. It is suitable for over-dispersed data and hence can be applied in collective risk models, where it competes with the negative binomial and Poisson distributions for fitting automobile claim frequency data.
Additionally, we derive explicit expressions for its basic distributional properties, including the moment generating and probability generating functions, mean, variance, skewness, kurtosis, quantile function, stochastic orders, mean deviation, and mean residual life. In addition, we derive two important risk measures, namely the value at risk and tail value at risk, for the TRTG model. The TRTG parameters are estimated via the maximum likelihood, moments, proportions, and Bayesian estimation methods, and a simulation study explores the performance of these methods in estimating the TRTG parameters θ and q. The applicability of the TRTG distribution is studied using three data sets from the actuarial sciences, showing its superiority over competing models, namely the transmuted geometric [13], discrete Burr [3], discrete Chen [14], negative binomial, geometric, and Poisson distributions. We were also motivated to propose a count regression model based on the TRTG distribution. The new TRTG regression model outperformed the Poisson, geometric, and Poisson–Lindley (PL) [15] regression models.
The rest of the paper is organized as follows. We define the TRTG distribution in Section 2. Some of its distributional properties are provided in Section 3. In Section 4, we derive two important risk measures of the TRTG model and provide some numerical computations for them. The TRTG parameters, θ and q, are estimated via four estimation methods in Section 5. In Section 6, a Monte Carlo simulation study is conducted to investigate the efficiency of different proposed estimates. In Section 7, we analyze three insurance data sets to illustrate the flexibility of the TRTG model. The TRTG count regression model is discussed in modeling real life count data in Section 8. Finally, the paper is concluded in Section 9.

2. The TRTG Distribution

First, we applied the TRT methodology to propose the two-parameter TRTG distribution as an extended version of the geometric distribution with pmf $p(x; q) = (1-q)\,q^{x}$, $x \in \mathbb{N} = \{0, 1, 2, \ldots\}$, $q \in (0,1)$, and cumulative distribution function (cdf) $G(x; q) = 1 - q^{x+1}$.
By inserting the cdf of the geometric distribution into Equation (4), we obtain the cdf of the TRTG distribution as follows:
$$F(x; q, \theta) = 1 - q^{x+1} + \theta\, q^{x+1} \log q^{x+1}, \quad x \in \mathbb{N}. \tag{5}$$
The TRTG distribution is specified by the following pmf:
$$p(x; q, \theta) = F(x; q, \theta) - F(x-1; q, \theta) = q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right), \quad x \in \mathbb{N}, \tag{6}$$
where $q, \theta \in (0,1)$. If $X$ has the pmf (6), then it is denoted by $X \sim \mathrm{TRTG}(q, \theta)$.
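As a quick numerical companion to Equations (5) and (6), the sketch below evaluates the cdf and pmf and checks that the probabilities sum to one. The function names are our own, and the parameter values are arbitrary illustrations; $\log q^{x+1}$ is computed as $(x+1)\log q$.

```python
import math

def trtg_cdf(x, q, theta):
    """cdf (5): F(x) = 1 - q^(x+1) + theta * q^(x+1) * log(q^(x+1)), x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    t = q ** (x + 1)
    return 1.0 - t + theta * t * (x + 1) * math.log(q)

def trtg_pmf(x, q, theta):
    """pmf (6), i.e. F(x) - F(x - 1), written out term by term."""
    c = math.log(q)
    a = q ** (x + 1) * (theta * (x + 1) * c - 1.0)
    b = q ** x * (theta * x * c - 1.0)
    return a - b

q, theta = 0.5, 0.5                       # illustrative values
probs = [trtg_pmf(x, q, theta) for x in range(200)]
total = sum(probs)                        # should be 1 up to a negligible tail
```

Setting `theta = 0.0` in `trtg_pmf` returns the geometric pmf $(1-q)q^{x}$, which mirrors the limiting behavior of the model as θ approaches zero.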
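The survival function of the TRTG model is specified by
$$S(x; q, \theta) = P(X \ge x) = 1 - P(X \le x-1) = 1 - \sum_{k=0}^{x-1}\left[q^{k+1}\left(\theta \log q^{k+1} - 1\right) - q^{k}\left(\theta \log q^{k} - 1\right)\right] = q^{x}\left(1 - \theta \log q^{x}\right),$$
since the sum telescopes to $F(x-1; q, \theta) = 1 + q^{x}\left(\theta \log q^{x} - 1\right)$.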
It is clear that $\lim_{\theta \to 0} F_X(x; q, \theta) = 1 - q^{x+1}$. In other words, the TRTG distribution behaves like a geometric model when θ is close to zero.
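Consequently, the hazard function (hf) of $X$, the ratio of the pmf (6) to the survival function $S(x; q, \theta) = 1 - F(x-1; q, \theta)$, reduces to:
$$h(x; q, \theta) = \frac{p(x; q, \theta)}{S(x; q, \theta)} = \frac{q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)}{q^{x} - \theta\, q^{x} \log q^{x}}.$$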
Figure 1 presents plots of the pmf of the TRTG model for some choices of q and θ. Figure 1 shows that the pmf can only be decreasing or increasing-then-decreasing (unimodal). Furthermore, in most panels the mode moves to the right as θ increases, which illustrates the flexibility of the TRTG model and shows that even small values of θ have a substantial effect on the shape of the distribution. Figure 2 displays the hf plots of the TRTG model for some choices of q and θ, and it reveals that the TRTG model has a decreasing discrete hazard rate.

3. Distributional Properties

3.1. Moments and Quantile Function

The moment generating function of the TRTG distribution takes the form:
$$M_X(t) = E\left(e^{tX}\right) = \sum_{x=0}^{\infty} e^{tx}\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = \frac{q e^{t}\left(q - 1 - \theta \log q\right) + 1 - q + \theta q \log q}{\left(q e^{t} - 1\right)^{2}}.$$
Note that by using $M_X(t)$, we can obtain the probability generating function of the TRTG distribution as follows:
$$\psi_X(t) = E\left(t^{X}\right) = M_X(\ln t) = \frac{-q\theta (t-1)\log q + (q-1)(qt-1)}{(qt-1)^{2}}.$$
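Hence, the first four moments of $X$ can be derived as
$$E(X) = \sum_{x=0}^{\infty} x\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = \frac{q\left(1 - q - \theta \log q\right)}{(1-q)^{2}},$$
$$E\left(X^{2}\right) = \sum_{x=0}^{\infty} x^{2}\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = \frac{q\left(q^{2} + 3q\theta \log q + \theta \log q - 1\right)}{(q-1)(1-q)^{2}},$$
$$E\left(X^{3}\right) = \sum_{x=0}^{\infty} x^{3}\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = -\frac{\left(q^{3} + 7q^{2}\theta \log q + 3q^{2} + 10q\theta \log q - 3q + \theta \log q - 1\right) q}{(q-1)^{4}}$$
and:
$$E\left(X^{4}\right) = \sum_{x=0}^{\infty} x^{4}\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = \frac{15q^{4}\theta \log q + 55q^{3}\theta \log q + q^{5}}{(q-1)^{5}} + \frac{25q^{2}\theta \log q + 10q^{4} - 10q^{2} + q\theta \log q - q}{(q-1)^{5}}.$$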
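Then, the variance, skewness, and kurtosis of $X$ are given by
$$Var(X) = E\left(X^{2}\right) - \left[E(X)\right]^{2} = \frac{q\left[(q-1)^{2} + \left(q^{2}-1\right)\theta \log q - \theta^{2} q \left(\log q\right)^{2}\right]}{(q-1)^{4}},$$
$$\gamma_1(X) = \frac{E\left[(X - E(X))^{3}\right]}{\left\{E\left[(X - E(X))^{2}\right]\right\}^{3/2}}, \qquad E\left[(X - E(X))^{3}\right] = q\,\frac{-2\theta^{3}\log(q)^{3} q^{2} + \left(3\theta^{2}q^{3} - 3q\theta^{2}\right)\log(q)^{2} - \theta\left(4q + q^{2} + 1\right)(q-1)^{2}\log(q) - (q+1)(q-1)^{3}}{(q-1)^{6}}$$
and:
$$\gamma_2(X) = \frac{E\left[(X - E(X))^{4}\right]}{\left\{E\left[(X - E(X))^{2}\right]\right\}^{2}} - 3 = \frac{-3q^{3}\theta^{4}\log(q)^{4} + \left(-6\theta^{3}q^{2} + 6q^{4}\theta^{3}\right)\log(q)^{3} - 4(q-1)^{2}\left(q^{2} + \tfrac{11}{2}q + 1\right) q\theta^{2}\log(q)^{2}}{q\left(-q\theta^{2}\log(q)^{2} + \left(q^{2}\theta - \theta\right)\log(q) + (q-1)^{2}\right)^{2}} + \frac{\theta(q+1)\left(q^{2} + 16q + 1\right)(q-1)^{3}\log(q) + \left(q^{2} + 7q + 1\right)(q-1)^{4}}{q\left(-q\theta^{2}\log(q)^{2} + \left(q^{2}\theta - \theta\right)\log(q) + (q-1)^{2}\right)^{2}} - 3.$$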
The quantile function (qf) of the TRTG distribution is derived as
$$Q(u) = \frac{1 + \theta\, W\!\left(\dfrac{u-1}{\theta}\exp\left(-1/\theta\right)\right) - \theta \log q}{\theta \log q}, \tag{9}$$
where $W$ refers to the Lambert function. From Equation (9), the $a$th quantile, $x_a$, of the TRTG distribution is given by
$$x_a \begin{cases} = \lfloor Q(a) \rfloor + 1, & Q(a) \neq \lfloor Q(a) \rfloor, \\ \in \left[\,Q(a),\; Q(a) + 1\,\right], & Q(a) = \lfloor Q(a) \rfloor, \end{cases}$$
where $\lfloor x \rfloor$ denotes the integer part of $x$. That is, $x_a$ satisfies $F\left(x_a^{-}\right) \le a \le F\left(x_a\right)$, where $F$ is the cdf (5) of the TRTG distribution. The median of the TRTG distribution follows by setting $a = 0.5$.
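Equation (9) can be evaluated with only the standard library. The sketch below is an illustration under two assumptions of ours: that the lower real branch $W_{-1}$ of the Lambert function is the intended one (solving $F(x) = u$ with $q^{x+1} \in (0,1)$ forces $W < -1$), and that the discrete quantile is taken as the smallest integer $x$ with $F(x) \ge a$. The helper names are hypothetical.

```python
import math

def lambert_w_minus1(z, tol=1e-14, max_iter=200):
    """Lower real branch of the Lambert W function for -1/e < z < 0.

    With w = -v and v > 1, the equation w * exp(w) = z becomes
    v - log(v) = -log(-z), which is solved here by Newton's method.
    """
    if not (-1.0 / math.e < z < 0.0):
        raise ValueError("z must lie in (-1/e, 0)")
    c = -math.log(-z)
    v = 2.0 * c + 2.0                      # starting point to the right of the root
    for _ in range(max_iter):
        step = (v - math.log(v) - c) / (1.0 - 1.0 / v)
        v -= step
        if abs(step) < tol * v:
            break
    return -v

def trtg_qf(u, q, theta):
    """Continuous quantile function Q(u) of Equation (9), 0 < u < 1."""
    w = lambert_w_minus1((u - 1.0) / theta * math.exp(-1.0 / theta))
    return (1.0 + theta * w - theta * math.log(q)) / (theta * math.log(q))

def trtg_cdf(x, q, theta):
    """cdf (5); also accepts non-integer x so that F(Q(u)) = u can be checked."""
    t = q ** (x + 1)
    return 1.0 - t + theta * t * (x + 1) * math.log(q)

def trtg_quantile(a, q, theta):
    """Smallest integer x with F(x) >= a, obtained by rounding Q(a) up."""
    return max(0, math.ceil(trtg_qf(a, q, theta) - 1e-12))

median = trtg_quantile(0.5, 0.5, 0.5)
```

The small offset `1e-12` in `trtg_quantile` only guards against rounding noise when `Q(a)` is numerically an integer.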
An important measure for a count distribution is the dispersion index (DI), which is defined as $DI = Var(X)/E(X)$. Some statistical measures of the TRTG distribution are computed and reported in Table 1. To interpret the individual effects of the parameters θ and q, the results are calculated for fixed θ = 0.5 (varying q) and for fixed q = 0.5 (varying θ). As seen from Table 1, the mean, variance, and DI are increasing functions of the parameter q for fixed θ = 0.5. In addition, the mean and variance are increasing functions of θ for fixed q = 0.5, whereas the DI rises slightly up to θ = 0.3 and decreases thereafter. In every case reported, the DI exceeds 1, so the TRTG distribution is suitable for over-dispersed count data.
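The closed-form mean and variance can be cross-checked against direct summation of the pmf (6). The sketch below uses our own function names and a truncated sum as the brute-force reference; for q = 0.5 and θ = 0.5 the results agree with the corresponding Table 1 entries to three decimal places.

```python
import math

def trtg_pmf(x, q, theta):
    """pmf (6) of the TRTG distribution."""
    c = math.log(q)
    return q ** (x + 1) * (theta * (x + 1) * c - 1.0) - q ** x * (theta * x * c - 1.0)

def trtg_mean(q, theta):
    """Closed-form E(X)."""
    return q * (1.0 - q - theta * math.log(q)) / (1.0 - q) ** 2

def trtg_variance(q, theta):
    """Closed-form Var(X)."""
    c = math.log(q)
    num = (q - 1.0) ** 2 + (q * q - 1.0) * theta * c - theta ** 2 * q * c * c
    return q * num / (q - 1.0) ** 4

def brute_force_moments(q, theta, n_terms=2000):
    """Mean and variance from a truncated sum over x = 0, ..., n_terms - 1."""
    m1 = sum(x * trtg_pmf(x, q, theta) for x in range(n_terms))
    m2 = sum(x * x * trtg_pmf(x, q, theta) for x in range(n_terms))
    return m1, m2 - m1 * m1

q, theta = 0.5, 0.5
mean, var = trtg_mean(q, theta), trtg_variance(q, theta)
di = var / mean                       # dispersion index
bf_mean, bf_var = brute_force_moments(q, theta)
```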

3.2. Stochastic Orders

Shaked and Shanthikumar [16] illustrated that several stochastic orders exist and have many applications. Stochastic orders are important measures to judge comparative behaviors of random variables.
The following theorem shows that the TRTG distribution is ordered according to the likelihood ratio ($lr$) order, the strongest of these stochastic orders.
Theorem 1.
If $X \sim \mathrm{TRTG}(q, \theta_1)$ and $Y \sim \mathrm{TRTG}(q, \theta_2)$, then $X <_{lr} Y$ for all $\theta_1 < \theta_2$.
Proof.
The pmf of $X$ can be expressed as
$$f(x) = \underbrace{q^{x+1}\left(\theta \log q^{x+1} - 1\right)}_{g_1(x;\, q,\, \theta)} + \underbrace{\left[-q^{x}\left(\theta \log q^{x} - 1\right)\right]}_{g_2(x;\, q,\, \theta)}.$$
The density ratio of the TRTG distribution, say $W(x)$, is examined in two parts, $W_1(x)$ and $W_2(x)$, the ratios of the corresponding parts of the pmf. If the two ratios, $W_1(x)$ and $W_2(x)$, are decreasing functions in $x$, then the density ratio, $W(x)$, is also a decreasing function of $x$. Here, $W_1(x)$ and $W_2(x)$ can be expressed as
$$W_1(x) = \frac{g_1(x; q, \theta_1)}{g_1(x; q, \theta_2)} = \frac{\theta_1 \log q^{x+1} - 1}{\theta_2 \log q^{x+1} - 1}$$
and:
$$W_2(x) = \frac{g_2(x; q, \theta_1)}{g_2(x; q, \theta_2)} = \frac{\theta_1 \log q^{x} - 1}{\theta_2 \log q^{x} - 1}.$$
Firstly, the first derivative of $W_1(x)$ with respect to $x$ is given by
$$W_1'(x) = \left(\theta_1 - \theta_2\right)\underbrace{\frac{-\log q}{\left(\theta_2 \log q^{x+1} - 1\right)^{2}}}_{>\,0},$$
where $\theta_1 - \theta_2 < 0$ for $\theta_1 < \theta_2$. Hence, $W_1'(x) < 0$ for $\theta_1 < \theta_2$.
Similarly, the first derivative of $W_2(x)$ with respect to $x$ has the form:
$$W_2'(x) = \left(\theta_1 - \theta_2\right)\underbrace{\frac{-\log q}{\left(\theta_2 \log q^{x} - 1\right)^{2}}}_{>\,0},$$
where $\theta_1 - \theta_2 < 0$ for $\theta_1 < \theta_2$. Then, $W_2'(x) < 0$ for $\theta_1 < \theta_2$. It is seen that both ratios, $W_1(x)$ and $W_2(x)$, are decreasing functions in $x$. Hence, the density ratio $W(x)$ is also a decreasing function in $x$. The proof is completed. □
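As a cross-check on Theorem 1, the monotonicity of the full pmf ratio can also be verified directly from Equation (6). The short derivation below is our own sketch rather than part of the original proof; the symbols $A(\theta)$, $B(\theta)$, and $r(x)$ are introduced here for illustration only.

```latex
\begin{aligned}
p(x; q, \theta) &= q^{x}\left[A(\theta) + B(\theta)\,x\right], \\
A(\theta) &= 1 - q + \theta q \log q, \qquad B(\theta) = -\theta (1-q) \log q > 0, \\
r(x) &= \frac{p(x; q, \theta_1)}{p(x; q, \theta_2)}
      = \frac{A(\theta_1) + B(\theta_1)\,x}{A(\theta_2) + B(\theta_2)\,x}, \\
r'(x) &= \frac{B(\theta_1) A(\theta_2) - A(\theta_1) B(\theta_2)}{\left[A(\theta_2) + B(\theta_2)\,x\right]^{2}}
      = \frac{(1-q)^{2}\left(\theta_2 - \theta_1\right)\log q}{\left[A(\theta_2) + B(\theta_2)\,x\right]^{2}} < 0
      \quad \text{for } \theta_1 < \theta_2 .
\end{aligned}
```

Because $\log q < 0$ for $q \in (0,1)$, the ratio $r(x)$ is decreasing in $x$ whenever $\theta_1 < \theta_2$, which is the condition required for the likelihood ratio order.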
Based on the chain of stochastic orders (see [16] and Definition 4 in [7]), we conclude that the TRTG distribution can also be ordered according to the hazard rate ($hr$), reversed hazard rate ($rh$), mean residual life ($mrl$), and stochastic ($st$) orders. That is, $X <_{hr} Y$, $X <_{rh} Y$, $X <_{mrl} Y$, and $X <_{st} Y$.

3.3. Mean Deviation and Mean Residual Life

The mean deviation ($MD$) of the TRTG model is derived as
$$MD = \sum_{x=0}^{t} x\,p(x) = \sum_{x=0}^{t} x\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right] = \frac{q^{t+2}\,t + q^{t+2} - q^{t+3}\,t - q^{2} - q^{t+1}\,t - q^{t+1} + q^{t+2}\,t + q}{(-1+q)^{2}} + \sum_{x=0}^{t}\left[x\,q^{x+1}\theta \log\left(q^{x+1}\right) - x\,q^{x}\theta \log\left(q^{x}\right)\right].$$
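The $mrl$ of the TRTG model is defined by
$$m(x) = \frac{\sum_{k=x+1}^{\infty} S(k)}{S(x)}, \qquad S(k) = q^{k} - q^{k}\theta \log\left(q^{k}\right).$$
Summing the geometric-type series in the numerator gives
$$m(x) = \frac{q\left[1 - q - \theta\left(x + 1 - xq\right)\log q\right]}{(1-q)^{2}\left(1 - \theta x \log q\right)},$$
which reduces to $E(X)$ at $x = 0$.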

4. Actuarial Measures

In this section, we determined the value at risk (VaR) and tail value at risk (TVaR) measures of the TRTG distribution.

4.1. VaR Measure of the TRTG Distribution

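Let $X$ denote a loss random variable. The $\mathrm{VaR}_\alpha$ of $X$ at the $100\alpha\%$ level, denoted by $\pi_\alpha$, is the $100\alpha$ percentile (or quantile) of the distribution of $X$. Hence, the $\mathrm{VaR}_\alpha$ of the TRTG model is defined by
$$P\left(X > \pi_\alpha\right) = 1 - \alpha \iff \pi_\alpha = F^{-1}(\alpha),$$
where $\alpha \in (0,1)$ and $F$ is the cdf of the TRTG distribution given in (5). The $\mathrm{VaR}_\alpha$ of the TRTG distribution with qf (9) is derived as
$$\pi_\alpha = \frac{1 + \theta\, W\!\left(\dfrac{\alpha-1}{\theta}\exp\left(-1/\theta\right)\right) - \theta \log q}{\theta \log q}.$$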

4.2. TVaR Measure of the TRTG Distribution

Let $X$ denote a loss random variable. The TVaR of $X$ at the $100\alpha\%$ security level, denoted by $\mathrm{TVaR}_\alpha$, is the expected loss given that the loss exceeds the $100\alpha$ percentile of the distribution of $X$. For engineering or actuarial applications, it is more common to consider the distribution of losses; in this case the right-tail TVaR is considered (typically for $\alpha = 95\%$ or $\alpha = 99\%$). The $\mathrm{TVaR}_\alpha$ is defined by
$$\mathrm{TVaR}_\alpha = E\left(X \mid X > \pi_\alpha\right) = \frac{\sum_{x=\pi_\alpha}^{\infty} x\,f(x)}{1 - F\left(\pi_\alpha\right)}.$$
Using the pmf and cdf of the TRTG model, the $\mathrm{TVaR}_\alpha$ is derived as
$$\mathrm{TVaR}_\alpha = E\left(X \mid X > \pi_\alpha\right) = \frac{1}{1-\alpha}\sum_{x=\pi_\alpha}^{\infty} x\left[q^{x+1}\left(\theta \log q^{x+1} - 1\right) - q^{x}\left(\theta \log q^{x} - 1\right)\right].$$
Table 2 provides some numerical computations for the $\mathrm{VaR}_\alpha$ and $\mathrm{TVaR}_\alpha$ measures of the $\mathrm{TRTG}(q, \theta)$ distribution for different parametric values.
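Both risk measures can be computed with a few lines of standard-library code. The sketch below makes three assumptions of ours: the Lambert function is evaluated on its lower real branch, the TVaR sum starts at the smallest integer not below $\pi_\alpha$, and the infinite sum is truncated after a fixed number of terms. Function names are hypothetical.

```python
import math

def trtg_pmf(x, q, theta):
    """pmf (6) of the TRTG distribution."""
    c = math.log(q)
    return q ** (x + 1) * (theta * (x + 1) * c - 1.0) - q ** x * (theta * x * c - 1.0)

def lambert_w_minus1(z):
    """Lower real branch of Lambert W on (-1/e, 0), via Newton on v - log(v) = -log(-z)."""
    c = -math.log(-z)
    v = 2.0 * c + 2.0
    for _ in range(200):
        step = (v - math.log(v) - c) / (1.0 - 1.0 / v)
        v -= step
        if abs(step) < 1e-14 * v:
            break
    return -v

def value_at_risk(alpha, q, theta):
    """VaR_alpha: the quantile function (9) evaluated at alpha."""
    w = lambert_w_minus1((alpha - 1.0) / theta * math.exp(-1.0 / theta))
    return (1.0 + theta * w - theta * math.log(q)) / (theta * math.log(q))

def tail_value_at_risk(alpha, q, theta, n_terms=2000):
    """(1 / (1 - alpha)) * sum of x * p(x) over integers x >= VaR_alpha (truncated)."""
    start = max(0, math.ceil(value_at_risk(alpha, q, theta)))
    tail = sum(x * trtg_pmf(x, q, theta) for x in range(start, start + n_terms))
    return tail / (1.0 - alpha)

results = {
    alpha: (value_at_risk(alpha, 0.25, 0.25), tail_value_at_risk(alpha, 0.25, 0.25))
    for alpha in (0.90, 0.95, 0.99)
}
```

For θ = 0.25 and q = 0.25, the values in `results` agree with the first three rows of Table 2 to roughly three decimal places.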

5. Estimation

In this section, the estimation of the TRTG parameters is examined using some classical and Bayesian methods.

5.1. Method of Maximum Likelihood

Let $x_1, \ldots, x_n$ be the observations of $n$ independent and identically distributed random variables $X_1, \ldots, X_n$ from the TRTG distribution. Then, the corresponding log-likelihood function reduces to:
$$\ell_n(q, \theta) = \log q \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \log\left[\theta q \log q^{x_i+1} - q - \theta \log q^{x_i} + 1\right].$$
Then, the maximum likelihood (ML) estimators of q and θ, say $\hat{q}$ and $\hat{\theta}$, are the solutions of the nonlinear likelihood equations $\partial \ell_n(q, \theta)/\partial q = 0$ and $\partial \ell_n(q, \theta)/\partial \theta = 0$. These equations cannot be solved explicitly, so the ML estimators are obtained by numerical methods. The fminsearch command in Matlab is used for this purpose.
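To illustrate the numerical maximization, the sketch below evaluates the log-likelihood and maximizes it by repeated grid refinement over $(0,1)^2$. This is a deliberately simple stand-in for a simplex search such as fminsearch, not the routine used in the paper; the function names and the small data vector are hypothetical and serve only as an example.

```python
import math

def trtg_loglik(data, q, theta):
    """Log-likelihood of an i.i.d. TRTG sample, with log(q^x) written as x * log(q)."""
    c = math.log(q)
    ll = c * sum(data)
    for x in data:
        ll += math.log(theta * q * (x + 1) * c - q - theta * x * c + 1.0)
    return ll

def grid_mle(data, steps=49, rounds=5):
    """Crude maximizer: evaluate on a grid, then zoom in around the best point."""
    lo_q, hi_q, lo_t, hi_t = 0.0, 1.0, 0.0, 1.0
    best = None
    for _ in range(rounds):
        for i in range(1, steps + 1):
            q = lo_q + (hi_q - lo_q) * i / (steps + 1)
            for j in range(1, steps + 1):
                t = lo_t + (hi_t - lo_t) * j / (steps + 1)
                ll = trtg_loglik(data, q, t)
                if best is None or ll > best[0]:
                    best = (ll, q, t)
        _, bq, bt = best
        wq = (hi_q - lo_q) / (steps + 1)
        wt = (hi_t - lo_t) / (steps + 1)
        lo_q, hi_q = max(0.0, bq - wq), min(1.0, bq + wq)
        lo_t, hi_t = max(0.0, bt - wt), min(1.0, bt + wt)
    return best[1], best[2], best[0]

# Hypothetical count data, for illustration only.
data = [0, 0, 0, 1, 1, 2, 0, 3, 1, 0, 5, 2, 0, 1, 4, 0, 2, 1, 0, 7]
q_hat, theta_hat, ll_hat = grid_mle(data)
```

Grid points always lie strictly inside the unit square, so the logarithms are well defined; a gradient-based or simplex optimizer would normally replace the grid in practice.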

5.2. Method of Moments

The method of moments (MM) estimators of the parameters q and θ are obtained by simultaneously solving the following two equations:
$$\frac{q\left(1 - q - \theta \log q\right)}{(1-q)^{2}} = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{10}$$
and:
$$\frac{q\left(q^{2} + 3q\theta \log q + \theta \log q - 1\right)}{(q-1)(1-q)^{2}} = \frac{1}{n}\sum_{i=1}^{n} X_i^{2}. \tag{11}$$
Equations (10) and (11) can be numerically solved using the Newton–Raphson method. The solutions of these two equations are the MM estimators of the parameters q and θ.

5.3. Method of Proportions

The method of proportions was proposed by [17] to estimate the parameters of the discrete Weibull distribution. We use this method to estimate the TRTG parameters. Let $X_1, X_2, \ldots, X_n$ be a random sample from the $\mathrm{TRTG}(q, \theta)$ distribution. Consider the indicator function, say $\nu(\cdot)$, which is defined as
$$\nu\left(X_i\right) = \begin{cases} 1, & X_i = 0, \\ 0, & X_i > 0, \end{cases} \qquad i = 1, 2, \ldots, n.$$
Then, $Y = \frac{1}{n}\sum_{i=1}^{n} \nu\left(X_i\right)$ denotes the proportion of zeros in the sample and estimates the probability $f(0) = q\left(\theta \log q - 1\right) + 1$. Similarly, $Z$ denotes the proportion of ones in the sample and estimates the probability $f(1) = q^{2}\left(\theta \log q^{2} - 1\right) - q\left(\theta \log q - 1\right)$. Hence, the method of proportions (MP) estimators of the parameters q and θ are obtained by solving the following two equations simultaneously:
$$q\left(\theta \log q - 1\right) + 1 = Y \tag{12}$$
and:
$$q^{2}\left(\theta \log q^{2} - 1\right) - q\left(\theta \log q - 1\right) = Z. \tag{13}$$
Equations (12) and (13) can be numerically solved using the Newton–Raphson method. The solutions of these two equations are the MP estimators of the parameters q and θ.
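One way to sketch the solution of Equations (12) and (13) is by elimination instead of Newton–Raphson. Substituting $\theta q \log q = Y - 1 + q$ from (12) into (13) leaves the quadratic $q^{2} - 2(1-Y)q + (1 - Y - Z) = 0$. This is an observation from the two equations as written here, not a step taken in the original text. The root with $q \le 1 - Y$ keeps $\theta \ge 0$. The function names below are hypothetical.

```python
import math

def trtg_pmf(x, q, theta):
    """pmf (6) of the TRTG distribution."""
    c = math.log(q)
    return q ** (x + 1) * (theta * (x + 1) * c - 1.0) - q ** x * (theta * x * c - 1.0)

def mp_from_proportions(y, z):
    """Solve (12)-(13) for (q, theta) given the proportions of zeros (y) and ones (z).

    Returns None when the quadratic has no admissible root in (0, 1).
    """
    s = 1.0 - y
    disc = s * s - (s - z)
    if disc < 0.0:
        return None
    q = s - math.sqrt(disc)               # root with q <= 1 - y, so theta >= 0
    if not (0.0 < q < 1.0):
        return None
    theta = (y - 1.0 + q) / (q * math.log(q))
    return q, theta

def mp_estimates(sample):
    """MP estimates from a sample of counts."""
    n = len(sample)
    y = sum(1 for x in sample if x == 0) / n
    z = sum(1 for x in sample if x == 1) / n
    return mp_from_proportions(y, z)

# Feeding in the exact probabilities f(0) and f(1) recovers the parameters.
recovered = mp_from_proportions(trtg_pmf(0, 0.5, 0.5), trtg_pmf(1, 0.5, 0.5))
```

The sketch does not force the resulting θ into (0, 1); with sample proportions the estimate can fall outside that range, and a caller would need to handle that case.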

5.4. Bayesian Method

To obtain the Bayes estimators of the parameters q and θ, we suppose that q has a beta distribution with parameters $\alpha_1$ and $\beta_1$, that θ has a beta distribution with parameters $\alpha_2$ and $\beta_2$, and that q and θ are independent. Then, the prior density functions of q and θ are given by
$$\pi_1(q) = \frac{1}{B\left(\alpha_1, \beta_1\right)}\, q^{\alpha_1 - 1}(1-q)^{\beta_1 - 1}, \quad q \in (0,1)$$
and:
$$\pi_2(\theta) = \frac{1}{B\left(\alpha_2, \beta_2\right)}\, \theta^{\alpha_2 - 1}(1-\theta)^{\beta_2 - 1}, \quad \theta \in (0,1),$$
respectively, where $B(\cdot, \cdot)$ is the beta function. Therefore, the joint prior of $(q, \theta)$ can be expressed as
$$\pi(q, \theta) = \pi_1(q)\,\pi_2(\theta) = \frac{1}{B\left(\alpha_1, \beta_1\right) B\left(\alpha_2, \beta_2\right)}\, q^{\alpha_1 - 1}(1-q)^{\beta_1 - 1}\,\theta^{\alpha_2 - 1}(1-\theta)^{\beta_2 - 1}.$$
Under the squared error loss function, the Bayes estimators of q and θ can be expressed as
$$\hat{q} = E(q \mid x) = \frac{\int_0^1\!\int_0^1 q\,L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta}{\int_0^1\!\int_0^1 L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta}$$
and:
$$\hat{\theta} = E(\theta \mid x) = \frac{\int_0^1\!\int_0^1 \theta\,L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta}{\int_0^1\!\int_0^1 L(q, \theta \mid x)\,\pi(q, \theta)\,dq\,d\theta},$$
respectively, where $L(q, \theta \mid x)$ is the likelihood function. These estimators cannot be obtained explicitly, but they can be approximated using the method of Tierney and Kadane (1986) [18].
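The two ratios of integrals can also be approximated by brute-force quadrature on the unit square. The sketch below uses a midpoint rule in place of the Tierney and Kadane approximation, so it is a different numerical technique applied to the same posterior means. The function names, grid size, and default uniform priors are assumptions made for illustration. The prior normalizing constants cancel in the ratio and are omitted.

```python
import math

def trtg_loglik(data, q, theta):
    """Log-likelihood of an i.i.d. TRTG sample."""
    c = math.log(q)
    ll = c * sum(data)
    for x in data:
        ll += math.log(theta * q * (x + 1) * c - q - theta * x * c + 1.0)
    return ll

def bayes_estimates(data, a1=1.0, b1=1.0, a2=1.0, b2=1.0, m=60):
    """Posterior means of (q, theta) under independent beta priors, midpoint rule on m x m cells."""
    nodes = [(i + 0.5) / m for i in range(m)]
    cells = []
    for q in nodes:
        for t in nodes:
            log_post = (trtg_loglik(data, q, t)
                        + (a1 - 1.0) * math.log(q) + (b1 - 1.0) * math.log(1.0 - q)
                        + (a2 - 1.0) * math.log(t) + (b2 - 1.0) * math.log(1.0 - t))
            cells.append((log_post, q, t))
    top = max(lp for lp, _, _ in cells)          # subtract the maximum to avoid underflow
    norm = sum(math.exp(lp - top) for lp, _, _ in cells)
    q_hat = sum(q * math.exp(lp - top) for lp, q, _ in cells) / norm
    theta_hat = sum(t * math.exp(lp - top) for lp, _, t in cells) / norm
    return q_hat, theta_hat

# Hypothetical count data, for illustration only.
data = [0, 0, 0, 1, 1, 2, 0, 3, 1, 0, 5, 2, 0, 1, 4, 0, 2, 1, 0, 7]
q_bayes, theta_bayes = bayes_estimates(data)
```

With no data and uniform priors the function returns the prior means (0.5, 0.5), which is a convenient sanity check.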

6. Simulation Study

To assess the performance of the previous estimators, we conducted a simulation study. In this simulation, we generated samples of size $n = 100, 200, 300, 400, 500, 1000$ from the $\mathrm{TRTG}(q, \theta)$ distribution and then computed the ML, MM, MP, and Bayes estimates of q and θ. We calculated the average absolute biases (ABBs), mean square errors (MSEs), and mean relative errors (MREs) of the estimates for all methods. The ABBs, MSEs, and MREs are calculated by
$$ABBs(\hat{\varphi}) = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{\varphi}_i - \varphi\right|,$$
$$MSE(\hat{\varphi}) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\varphi}_i - \varphi\right)^{2}$$
and:
$$MRE(\hat{\varphi}) = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\hat{\varphi}_i - \varphi\right|}{\varphi},$$
where $\varphi = \theta, q$, $\hat{\varphi} = \hat{\theta}, \hat{q}$, $\hat{\varphi}_i$ is the estimate from the $i$th replication, and $N$ is the number of replications.
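The three performance indices are simple averages over the replications. The sketch below shows them as plain functions; the function names and the four example estimates are hypothetical.

```python
def abb(estimates, true_value):
    """Average absolute bias over the replications."""
    return sum(abs(e - true_value) for e in estimates) / len(estimates)

def mse(estimates, true_value):
    """Mean square error over the replications."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def mre(estimates, true_value):
    """Mean relative error over the replications."""
    return sum(abs(e - true_value) / true_value for e in estimates) / len(estimates)

# Hypothetical estimates of theta = 0.5 from four replications.
theta_estimates = [0.45, 0.55, 0.50, 0.60]
summary = (abb(theta_estimates, 0.5), mse(theta_estimates, 0.5), mre(theta_estimates, 0.5))
```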
The optim-CG routine in R was used, and 5000 trials were generated, to estimate these indices for the ML, MM, MP, and Bayes estimates. Different sample sizes and two parameter settings were considered: θ = 0.5, q = 0.4 and θ = 0.6, q = 0.5. The results are given in Table 3 and Table 4. From Table 3 and Table 4, it is concluded that the ABBs, MREs, and MSEs of all estimates decrease as n increases, as expected. Moreover, the Bayes, ML, and MM methods provide the best estimates in terms of the performance criteria. The Bayes, ML, and MM estimates are almost identical in terms of ABBs, MSEs, and MREs, and they perform better than the MP estimates.

7. Modeling Three Actuarial Data Sets

In this section, the TRTG distribution is fitted to three real actuarial data sets and compared with the transmuted geometric (TRAG), discrete Burr (DB), discrete Chen (DC), negative binomial (NB), geometric (G), and Poisson (P) distributions.
First data set: These data were reported in Klugman et al. [19] and represent the number of claims of automobile liability policies.
Second data set: The data were analyzed by Klugman et al. [19] and represent the number of hospitalizations per family member and year.
Third data set: These data were studied by Willmot [20] and refer to the number of automobile insurance claims per policy in two portfolios from Belgium during the period 1975–1976, respectively.
The TRTG, TRAG, DB, DC, NB, G, and P distributions were fitted to the three data sets. Their parameter estimates were obtained via the ML method. The chi-square procedure was adopted to test $H_0: X \sim \mathrm{TRTG}(q, \theta)$. To compute the $\chi^2$ statistic, the unknown parameters q and θ were estimated from each of the three data sets. Under the null hypothesis, the estimated probabilities can be calculated by
$$\hat{P}(X = i) = \hat{q}^{\,i+1}\left(\hat{\theta} \log \hat{q}^{\,i+1} - 1\right) - \hat{q}^{\,i}\left(\hat{\theta} \log \hat{q}^{\,i} - 1\right), \quad i = 0, 1, \ldots$$
and the estimated expected frequencies are $\hat{e}_i = n\,\hat{\alpha}_i$, where $\hat{\alpha}_i$ is the ML estimate of the cell probability $\alpha_i = P(X = i)$. For the three data sets, the chi-square statistic, $\chi^2$, was computed for the TRTG distribution and the competing distributions. The observed and expected frequencies, $\chi^2$, and $-\ell_n$ are listed in Table 5, Table 6 and Table 7 for the three data sets, respectively. The values in these tables reveal that the TRTG distribution has the lowest values of $\chi^2$ and $-\ell_n$ among all competing discrete models and that it provides a better fit for the given data sets than the TRAG, DB, DC, NB, G, and P distributions. Based on the results, we cannot reject $H_0$ at the $\alpha = 0.05$ significance level.
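The goodness-of-fit computation described above can be sketched as follows. The observed counts and the parameter estimates below are made-up placeholders, not values from Tables 5–7, and pooling the upper tail into a single last cell is an assumption of this sketch.

```python
import math

def trtg_pmf(x, q, theta):
    """pmf (6) of the TRTG distribution."""
    c = math.log(q)
    return q ** (x + 1) * (theta * (x + 1) * c - 1.0) - q ** x * (theta * x * c - 1.0)

def expected_frequencies(n, q_hat, theta_hat, n_cells):
    """Expected counts for cells 0, 1, ..., n_cells - 2 plus one pooled upper-tail cell."""
    probs = [trtg_pmf(i, q_hat, theta_hat) for i in range(n_cells - 1)]
    probs.append(1.0 - sum(probs))            # probability of the pooled tail
    return [n * p for p in probs]

def chi_square(observed, expected):
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical inputs, for illustration only.
observed = [70, 20, 7, 3]                     # counts for x = 0, 1, 2 and x >= 3
q_hat, theta_hat = 0.3, 0.4
expected = expected_frequencies(sum(observed), q_hat, theta_hat, len(observed))
stat = chi_square(observed, expected)
```

The statistic would then be compared with a chi-square critical value whose degrees of freedom account for the two estimated parameters.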
Furthermore, for visual comparisons, the observed and fitted distributions are displayed in Figure 3, Figure 4 and Figure 5 for the three data sets, respectively.

8. TRTG Count Regression Model

Let $X$ be the response variable and $y$ be its associated $p \times 1$ vector of covariates. Assume that the response variable $X$ follows the TRTG distribution with mean $\mu(y)$. Furthermore, the mean of the response variable is linked to the explanatory variables through the log-linear form, i.e., $\mu_i = \exp\left(\beta y_i^{T}\right)$, where $\beta = \left(\beta_1, \beta_2, \ldots, \beta_p\right)$ and $y_i = \left(1, y_{1i}, y_{2i}, \ldots, y_{pi}\right)$. By replacing θ with $\dfrac{(1-q)\left(1 + \mu_i q - \mu_i\right)}{q \log q}$, we obtain the re-parameterized pmf of Equation (6) as
$$p\left(x_i\right) = q^{x_i+1}\left[\frac{(1-q)\left(1 + \mu_i q - \mu_i\right)}{q \log q}\,\log q^{x_i+1} - 1\right] - q^{x_i}\left[\frac{(1-q)\left(1 + \mu_i q - \mu_i\right)}{q \log q}\,\log q^{x_i} - 1\right].$$
The corresponding log-likelihood function takes the form
$$\ell_n(q, \beta) = \log q \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \log\left[\frac{\psi\left(q, \mu_i\right)}{\log q}\,\log q^{x_i+1} - q - \frac{\psi\left(q, \mu_i\right)}{q \log q}\,\log q^{x_i} + 1\right], \tag{14}$$
where $\psi\left(q, \mu_i\right) = (1-q)\left(1 + \mu_i q - \mu_i\right)$. The maximizer of Equation (14) has no closed form, so numerical methods are used to obtain the estimates. We illustrate the application of the TRTG regression model by analyzing real data on the count of infected blood cells (per mm²) on microscope slides prepared from n = 511 randomly selected individuals [21]. The response variable ($x_i$: count of infected blood cells) was related to the following explanatory variables: the smoking status of the subject ($y_{i1}$: 0: yes; 1: no) and their sex ($y_{i2}$: 0: female; 1: male). The maximized log-likelihood $\hat{\ell}_{\max}$ and the Akaike information criterion (AIC) are reported in Table 8. Based on Table 8, we conclude that the proposed TRTG count regression model outperforms the Poisson, geometric, and PL regression models.

9. Conclusions

We derived and studied a new discrete distribution defined on $\mathbb{N}$ by using the transmuted record type approach to extend the geometric distribution. The new model is called the transmuted record type geometric (TRTG) distribution, and it is suitable for over-dispersed count data. We presented the distributional properties of the TRTG distribution along with two actuarial risk measures. The TRTG parameters were estimated by four different estimation methods, and a Monte Carlo simulation study was conducted to investigate the efficiency of the estimators. A new count regression model was proposed as an alternative to the Poisson, geometric, and Poisson–Lindley regression models. In summary, the TRTG model can be considered a good alternative to the negative binomial, geometric, and Poisson distributions. In the insurance data applications, the TRTG distribution outperformed the negative binomial, transmuted geometric, discrete Burr, discrete Chen, geometric, and Poisson distributions.

Author Contributions

Conceptualization, T.E., Y.A. and M.M.A.A.; methodology, A.Z.A.; software, Y.A.; validation, T.E., Y.A., M.M.A.S. and A.Z.A.; formal analysis, Y.A. and M.M.A.S.; writing—original draft preparation, Y.A., M.M.A.S. and A.Z.A.; writing—review and editing, T.E., Y.A. and A.Z.A.; visualization, M.M.A.A.; supervision, A.Z.A.; funding acquisition, M.M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The first author extends their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under grant number (GRP/106/42), Received by Mohammed M. Almazah. www.kku.edu.sa.

Acknowledgments

The authors would like to thank the Editorial Board and the two anonymous reviewers for their constructive comments that greatly improved the final version of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chakraborty, S.; Chakravarty, D. Discrete gamma distribution: Properties and parameter estimation. Commun. Stat. Theory Methods 2012, 41, 3301–3324.
  2. Nakagawa, T.; Osaki, S. Discrete Weibull distribution. IEEE Trans. Reliab. 1975, 24, 300–301.
  3. Krishna, H.; Pundir, P.S. Discrete Burr and discrete Pareto distributions. Stat. Methodol. 2009, 6, 177–188.
  4. Gómez-Déniz, E. A new discrete distribution: Properties and applications in medical care. J. Appl. Stat. 2013, 40, 2760–2770.
  5. Akdoğan, Y.; Kuş, C.; Asgharzadeh, A.; Kınacı, I.; Shafari, F. Uniform-geometric distribution. J. Stat. Comput. Simul. 2016, 86, 1754–1770.
  6. Kuş, C.; Akdoğan, Y.; Asgharzadeh, A.; Kınacı, I.; Karakaya, K. Binomial-discrete Lindley distribution. Commun. Fac. Sci. Univ. Ank. Ser. A1 Math. Stat. 2018, 68, 401–411.
  7. Al-Babtain, A.A.; Ahmed, A.H.N.; Afify, A.Z. A new discrete analog of the continuous Lindley distribution, with reliability applications. Entropy 2020, 22, 603.
  8. Gómez-Déniz, E.; Calderín-Ojeda, E. The discrete Lindley distribution: Properties and applications. J. Stat. Comput. Simul. 2011, 81, 1405–1416.
  9. Bakouch, H.S.; Jazi, M.A.; Nadarajah, S. A new discrete distribution. Statistics 2014, 48, 200–240.
  10. Balakrishnan, N.; He, M. A Record-Based Transmuted Family of Distributions. In Advances in Statistics-Theory and Applications. Emerging Topics in Statistics and Biostatistics; Ghosh, I., Balakrishnan, N., Ng, H.K.T., Eds.; Springer: Cham, Switzerland, 2021.
  11. Shakil, M.; Ahsanullah, M. Record values of the ratio of Rayleigh random variables. Pak. J. Stat. 2011, 27, 307–325.
  12. Tanış, C.; Saraçoğlu, B. On the record-based transmuted model of Balakrishnan and He based on Weibull distribution. Commun. Stat.-Simul. Comput. 2020.
  13. Chakraborty, S.; Bhati, D. Transmuted geometric distribution with applications in modeling and regression analysis of count data. SORT 2016, 40, 153–176.
  14. Noughabi, M.S.; Rezaei Roknabadi, A.H.; Mohtashami Borzadaran, G.R. Some discrete lifetime distributions with bathtub-shaped hazard rate functions. Qual. Eng. 2013, 25, 225–236.
  15. Sankaran, M. The discrete Poisson-Lindley distribution. Biometrics 1970, 26, 145–149.
  16. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer: New York, NY, USA, 2007.
  17. Khan, M.S.A.; Khalique, A.; Abouammoh, A.M. On estimating parameters in a discrete Weibull distribution. IEEE Trans. Reliab. 1989, 38, 348–350.
  18. Tierney, L.; Kadane, J. Accurate approximation for posterior moments and marginal densities. J. Am. Stat. Assoc. 1986, 81, 82–86.
  19. Klugman, S.; Panjer, H.; Willmot, G. Loss Models: From Data to Decisions; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 715.
  20. Willmot, G.E. The Poisson-inverse Gaussian distribution as an alternative to the negative binomial. Scand. Actuar. J. 1987, 1987, 113–127.
  21. Crawley, M.J. The R Book, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2012.
Figure 1. Plots of the pmf of the TRTG distribution for some choices of q and θ.
Figure 2. Plots of the hf of the TRTG distribution for some choices of q and θ.
Figure 3. Observed and fitted distributions for the first data set.
Figure 4. Observed and fitted distributions for the second data set.
Figure 5. Observed and fitted distributions for the third data set.
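Table 1. Some numerical measures of the TRTG model for θ = 0.5 and q = 0.5.

| q (θ = 0.5) | 0.01 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.034 | 0.253 | 0.501 | 0.797 | 1.176 | 1.693 | 2.458 | 3.720 | 6.231 | 13.741 |
| Variance | 0.034 | 0.277 | 0.626 | 1.161 | 2.040 | 3.599 | 6.664 | 13.714 | 35.104 | 157.600 |
| DI | 1.007 | 1.094 | 1.249 | 1.456 | 1.735 | 2.126 | 2.713 | 3.686 | 5.633 | 11.469 |

| θ (q = 0.5) | 0.01 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean | 1.014 | 1.139 | 1.277 | 1.416 | 1.554 | 1.693 | 1.832 | 1.970 | 2.109 | 2.248 |
| Variance | 2.041 | 2.397 | 2.755 | 3.075 | 3.356 | 3.599 | 3.803 | 3.970 | 4.097 | 4.186 |
| DI | 2.013 | 2.105 | 2.157 | 2.172 | 2.159 | 2.126 | 2.076 | 2.015 | 1.943 | 1.862 |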
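The DI row of Table 1 can be checked against the mean and variance rows. The sketch below assumes that DI denotes the usual dispersion index, variance divided by mean; the variable names are illustrative, and the numbers are transcribed from the block of Table 1 in which q varies.

```python
# Values transcribed from Table 1 (q varies, theta fixed at 0.5).
q_values = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
means = [0.034, 0.253, 0.501, 0.797, 1.176, 1.693, 2.458, 3.720, 6.231, 13.741]
variances = [0.034, 0.277, 0.626, 1.161, 2.040, 3.599, 6.664, 13.714, 35.104, 157.600]
reported_di = [1.007, 1.094, 1.249, 1.456, 1.735, 2.126, 2.713, 3.686, 5.633, 11.469]


def dispersion_index(mean, variance):
    """Dispersion index, assumed here to be variance / mean."""
    return variance / mean


recomputed = [dispersion_index(m, v) for m, v in zip(means, variances)]

for q, di_new, di_table in zip(q_values, recomputed, reported_di):
    # Small gaps are expected because the table entries are rounded.
    print(f"q = {q:<4}  recomputed DI = {di_new:.3f}  reported DI = {di_table:.3f}")
```

Every reported DI is at least 1 and grows with q, which is consistent with the overdispersion that motivates comparing the TRTG model with the Poisson distribution.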
Table 2. Numerical values of the VaR_α and TVaR_α measures of the TRTG distribution.

| θ | q | Security level | VaR_α | TVaR_α |
|---|---|---|---|---|
| 0.25 | 0.25 | 0.90 | 1.0478 | 2.5654 |
| | | 0.95 | 1.6280 | 5.1309 |
| | | 0.99 | 2.9433 | 10.8642 |
| 0.25 | 0.50 | 0.90 | 3.0957 | 11.0154 |
| | | 0.95 | 4.2561 | 7.2157 |
| | | 0.99 | 6.8867 | 2.8204 |
| 0.25 | 0.75 | 0.90 | 8.8682 | 15.4903 |
| | | 0.95 | 11.6642 | 9.1255 |
| | | 0.99 | 18.0024 | 2.2378 |
| 0.50 | 0.25 | 0.90 | 1.3601 | 3.6725 |
| | | 0.95 | 1.9669 | 7.3451 |
| | | 0.99 | 3.3210 | 6.5062 |
| 0.75 | 0.25 | 0.90 | 1.6067 | 4.7797 |
| | | 0.95 | 2.2213 | 4.4352 |
| | | 0.99 | 3.5860 | 8.9130 |
Table 3. Simulation results of the TRTG model for θ = 0.5 and q = 0.4.

| n | Measure | ML θ | ML q | MP θ | MP q | MM θ | MM q | Bayes θ | Bayes q |
|---|---|---|---|---|---|---|---|---|---|
| 100 | MSEs | 0.0253 | 0.0035 | 0.0369 | 0.0656 | 0.0416 | 0.0896 | 0.0209 | 0.0008 |
| | ABBs | 0.1592 | 0.0589 | 0.1922 | 0.2561 | 0.2994 | 0.5098 | 0.1445 | 0.0290 |
| | MREs | 0.3183 | 0.1473 | 0.3843 | 0.6402 | 0.5098 | 0.7485 | 0.2890 | 0.0726 |
| 200 | MSEs | 0.0139 | 0.0019 | 0.0244 | 0.0538 | 0.0208 | 0.0574 | 0.0149 | 0.0005 |
| | ABBs | 0.1177 | 0.0436 | 0.1561 | 0.2319 | 0.1442 | 0.2395 | 0.1221 | 0.0226 |
| | MREs | 0.2355 | 0.1090 | 0.3122 | 0.5797 | 0.3605 | 0.5988 | 0.2442 | 0.0564 |
| 300 | MSEs | 0.0091 | 0.0013 | 0.0177 | 0.0426 | 0.0134 | 0.0451 | 0.0103 | 0.0004 |
| | ABBs | 0.0954 | 0.0358 | 0.1332 | 0.2064 | 0.1157 | 0.2124 | 0.1014 | 0.0188 |
| | MREs | 0.1908 | 0.0894 | 0.2663 | 0.5160 | 0.2891 | 0.5311 | 0.2029 | 0.0470 |
| 400 | MSEs | 0.0065 | 0.0010 | 0.0127 | 0.0360 | 0.0101 | 0.0388 | 0.0078 | 0.0003 |
| | ABBs | 0.0804 | 0.0310 | 0.1128 | 0.1897 | 0.1003 | 0.1971 | 0.0884 | 0.0171 |
| | MREs | 0.1608 | 0.0774 | 0.2256 | 0.4743 | 0.2508 | 0.4926 | 0.1768 | 0.0428 |
| 500 | MSEs | 0.0051 | 0.0008 | 0.0107 | 0.0323 | 0.0077 | 0.0343 | 0.0049 | 0.0007 |
| | ABBs | 0.0713 | 0.0276 | 0.1036 | 0.1798 | 0.0876 | 0.1852 | 0.0698 | 0.0278 |
| | MREs | 0.1425 | 0.0690 | 0.2071 | 0.4495 | 0.2189 | 0.4631 | 0.1429 | 0.0678 |
| 1000 | MSEs | 0.0026 | 0.0004 | 0.0053 | 0.0236 | 0.0037 | 0.0251 | 0.0023 | 0.0003 |
| | ABBs | 0.0513 | 0.0200 | 0.0728 | 0.1535 | 0.0608 | 0.1585 | 0.0476 | 0.0186 |
| | MREs | 0.1025 | 0.0499 | 0.1456 | 0.3837 | 0.1521 | 0.3963 | 0.1031 | 0.0510 |
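The three measures in Table 3 can be illustrated with a small Monte Carlo sketch. It does not reproduce the TRTG study. As a stand-in it uses the plain geometric distribution with pmf (1 − q)q^x, whose maximum likelihood estimator has the closed form x̄/(1 + x̄). The definitions of MSE, ABB (average absolute bias), and MRE (mean relative error) in the code are the standard ones and are assumed here. They agree with the ML columns of Table 3, where each MRE is approximately the ABB divided by the true parameter value (for example, 0.1592/0.5 ≈ 0.3183). The function names, seed, and number of replications are hypothetical choices.

```python
import random

TRUE_Q = 0.4          # same true q as in Table 3; the model itself is a stand-in
REPLICATIONS = 200    # illustrative choice, not the paper's setting


def simulate_geometric(q, n, rng):
    """Draw n values from the pmf P(X = x) = (1 - q) * q**x, x = 0, 1, 2, ..."""
    sample = []
    for _ in range(n):
        x = 0
        while rng.random() < q:   # each "success" has probability q
            x += 1
        sample.append(x)
    return sample


def mle_q(sample):
    """Closed-form MLE of q for the geometric pmf above: mean / (1 + mean)."""
    mean = sum(sample) / len(sample)
    return mean / (1.0 + mean)


def simulation_metrics(true_q, n, replications, seed):
    """Return (MSE, ABB, MRE) of the MLE over repeated samples of size n."""
    rng = random.Random(seed)
    errors = [mle_q(simulate_geometric(true_q, n, rng)) - true_q
              for _ in range(replications)]
    mse = sum(e * e for e in errors) / replications
    abb = sum(abs(e) for e in errors) / replications
    mre = sum(abs(e) / true_q for e in errors) / replications
    return mse, abb, mre


results = {n: simulation_metrics(TRUE_Q, n, REPLICATIONS, seed=1)
           for n in (100, 1000)}

for n, (mse, abb, mre) in results.items():
    print(f"n = {n:<5} MSE = {mse:.5f}  ABB = {abb:.5f}  MRE = {mre:.5f}")
```

As in Table 3, all three measures should shrink as n grows, which is the behaviour expected of a consistent estimator.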
Table 4. Simulation results of the TRTG model for θ = 0.6 and q = 0.5.

| n | Measure | ML θ | ML q | MP θ | MP q | MM θ | MM q | Bayes θ | Bayes q |
|---|---|---|---|---|---|---|---|---|---|
| 100 | MSEs | 0.0220 | 0.0026 | 0.0424 | 0.0262 | 0.0318 | 0.0130 | 0.0159 | 0.0018 |
| | ABBs | 0.1482 | 0.0506 | 0.2058 | 0.1620 | 0.1782 | 0.1138 | 0.1262 | 0.0425 |
| | MREs | 0.2964 | 0.0844 | 0.4116 | 0.2699 | 0.2971 | 0.1897 | 0.2104 | 0.0850 |
| 200 | MSEs | 0.0111 | 0.0012 | 0.0277 | 0.0156 | 0.0152 | 0.0055 | 0.0091 | 0.0008 |
| | ABBs | 0.1052 | 0.0353 | 0.1665 | 0.1249 | 0.1233 | 0.0743 | 0.0954 | 0.0274 |
| | MREs | 0.2105 | 0.0589 | 0.3331 | 0.2082 | 0.2055 | 0.1239 | 0.1590 | 0.0549 |
| 300 | MSEs | 0.0074 | 0.0009 | 0.0213 | 0.0122 | 0.0104 | 0.0040 | 0.0060 | 0.0004 |
| | ABBs | 0.0862 | 0.0299 | 0.1458 | 0.1106 | 0.1019 | 0.0634 | 0.0774 | 0.0199 |
| | MREs | 0.1726 | 0.0498 | 0.2917 | 0.1843 | 0.1698 | 0.1057 | 0.1290 | 0.0397 |
| 400 | MSEs | 0.0051 | 0.0006 | 0.0164 | 0.0094 | 0.0078 | 0.0034 | 0.0046 | 0.0004 |
| | ABBs | 0.0711 | 0.0249 | 0.1280 | 0.0969 | 0.0882 | 0.0579 | 0.0632 | 0.0212 |
| | MREs | 0.1423 | 0.0415 | 0.2559 | 0.1615 | 0.1471 | 0.0966 | 0.1142 | 0.0424 |
| 500 | MSEs | 0.0043 | 0.0005 | 0.0140 | 0.0084 | 0.0062 | 0.0031 | 0.0028 | 0.0002 |
| | ABBs | 0.0655 | 0.0224 | 0.1182 | 0.0917 | 0.0787 | 0.0556 | 0.0498 | 0.0196 |
| | MREs | 0.1311 | 0.0374 | 0.2364 | 0.1529 | 0.1312 | 0.0926 | 0.1087 | 0.0265 |
| 1000 | MSEs | 0.0020 | 0.0002 | 0.0081 | 0.0057 | 0.0031 | 0.0028 | 0.0016 | 0.0002 |
| | ABBs | 0.0448 | 0.0156 | 0.0898 | 0.0756 | 0.0557 | 0.0526 | 0.0364 | 0.0127 |
| | MREs | 0.0896 | 0.0260 | 0.1795 | 0.1259 | 0.1114 | 0.0877 | 0.0638 | 0.0172 |
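Table 5. Observed and expected frequencies, with χ² and −ℓ values, for the first data set.

| Count | Observed | Expected, TRTG | Expected, NB | Expected, TRAG | Expected, DB | Expected, DC | Expected, G | Expected, P |
|---|---|---|---|---|---|---|---|---|
| 0 | 99 | 96.65 | 95.86 | 96.32 | 101.65 | 108.84 | 110.06 | 54.01 |
| 1 | 65 | 74.30 | 75.83 | 75.71 | 93.25 | 59.82 | 69.41 | 92.24 |
| 2 | 57 | 50.24 | 50.35 | 50.24 | 39.88 | 44.86 | 43.78 | 78.77 |
| 3 | 35 | 31.71 | 31.3 | 31.10 | 19.60 | 32.56 | 27.61 | 44.85 |
| 4 | 20 | 19.18 | 18.79 | 18.61 | 11.15 | 22.20 | 17.41 | 19.15 |
| 5 | 10 | 11.26 | 11.04 | 10.94 | 7.03 | 14.01 | 10.98 | 6.54 |
| ≥6 | 12 | 9.98 | 10.02 | 10.13 | 15.51 | 15.64 | 12.13 | 1.97 |
| Total | n = 298 | | | | | | | |

| Quantity | TRTG | NB | TRAG | DB | DC | G | P |
|---|---|---|---|---|---|---|---|
| q̂ | 0.5019 | 0.4606 | 0.5748 | 0.5478 | 0.7676 | 0.3693 | |
| θ̂ | 0.5022 | 1.4590 | −0.4172 | 2.2728 | 0.5104 | | 1.7080 |
| χ² | 3.09 | 3.58 | 3.55 | 30.09 | 7.10 | 6.60 | 297.53 |
| −ℓ | 528.62 | 528.77 | 528.73 | 551.45 | 532.82 | 531.46 | 577.01 |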
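The χ² entry for the TRTG model in Table 5 can be approximately recovered from the observed and expected frequencies. The sketch below assumes the Pearson statistic Σ(O − E)²/E over the seven count classes as printed. Whether the authors used exactly this grouping is not stated here, so a small gap from the reported value is to be expected.

```python
# Observed counts and TRTG expected frequencies transcribed from Table 5
# (classes 0, 1, 2, 3, 4, 5 and >= 6).
observed = [99, 65, 57, 35, 20, 10, 12]
expected_trtg = [96.65, 74.30, 50.24, 31.71, 19.18, 11.26, 9.98]
reported_chi2 = 3.09


def pearson_chi_square(obs, exp):
    """Pearson goodness-of-fit statistic: sum over classes of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2_trtg = pearson_chi_square(observed, expected_trtg)
print(f"recomputed chi-square = {chi2_trtg:.2f}, reported = {reported_chi2:.2f}")
```

The recomputed statistic comes out near 3.06, close to the reported 3.09; the difference is plausibly due to rounding of the expected frequencies.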
Table 6. Observed and expected frequencies, with χ² and −ℓ values, for the second data set.

| Count | Observed | Expected, TRTG | Expected, NB | Expected, TRAG | Expected, DB | Expected, DC | Expected, G | Expected, P |
|---|---|---|---|---|---|---|---|---|
| 0 | 96,978 | 96,966.54 | 97,038.57 | 97,031.24 | 96,974.15 | 97,008.08 | 97,148.31 | 96,684.62 |
| 1 | 9240 | 9252.80 | 9080.28 | 9129.30 | 9290.98 | 9145.87 | 8923.19 | 9777.85 |
| 2 | 704 | 703.06 | 783.53 | 747.53 | 611.61 | 786.07 | 819.61 | 494.42 |
| 3 | 43 | 48.25 | 65.71 | 60.58 | 76.24 | 33.40 | 75.28 | 16.67 |
| 4 | 9 | 3.12 | 5.43 | 4.91 | 4.9 | 0.50 | 6.91 | 0.42 |
| Total | n = 106,974 | | | | | | | |

| Quantity | TRTG | NB | TRAG | DB | DC | G | P |
|---|---|---|---|---|---|---|---|
| q̂ | 0.0523 | 0.9209 | 0.0810 | 0.0327 | 0.2512 | 0.9081 | |
| θ̂ | 0.2668 | 1.1844 | −0.1609 | 1.7386 | 0.5945 | | 0.1011 |
| χ² | 4.50 | 25.19 | 13.10 | 41.97 | 20.71 | 54.86 | 118.89 |
| −ℓ | 36,021.61 | 36,111.66 | 36,104.30 | 36,123.01 | 36,122.60 | 36,123.59 | 36,188.25 |
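Table 7. Observed and expected frequencies, with χ² and −ℓ values, for the third data set.

| Count | Observed | Expected, TRTG | Expected, NB | Expected, TRAG | Expected, DB | Expected, DC | Expected, G | Expected, P |
|---|---|---|---|---|---|---|---|---|
| 0 | 2659 | 2658.92 | 2660.38 | 2659.03 | 2658.88 | 2659.68 | 2661.70 | 2649.90 |
| 1 | 244 | 243.98 | 241.05 | 243.65 | 245.29 | 241.56 | 238.77 | 260.80 |
| 2 | 19 | 19.53 | 20.67 | 19.48 | 16.98 | 21.70 | 21.42 | 12.80 |
| ≥3 | 2 | 1.46 | 1.74 | 1.42 | 2.21 | 1.04 | 1.92 | 0.42 |
| Total | n = 2924 | | | | | | | |

| Quantity | TRTG | NB | TRAG | DB | DC | G | P |
|---|---|---|---|---|---|---|---|
| q̂ | 0.0545 | 0.9191 | 0.0799 | 0.0313 | 0.2469 | 0.0896 | |
| θ̂ | 0.22801 | 1.1960 | −0.1489 | 1.6911 | 0.5827 | | 0.0984 |
| χ² | 0.16 | 0.22 | 0.18 | 0.24 | 0.87 | 0.43 | 4.44 |
| −ℓ | 967.24 | 969.12 | 969.06 | 969.62 | 969.42 | 969.25 | 972.26 |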
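The expected frequencies in the Poisson column of Table 7 follow from the fitted mean alone, which makes that column a convenient check on how the table is read. The sketch below assumes that the Poisson entry 0.0984 is the fitted mean and that the last class collects all counts of 3 or more; the function name is hypothetical.

```python
import math

N_OBS = 2924          # total number of observations in the third data set
THETA_HAT = 0.0984    # Poisson estimate from Table 7 (assumed to be the mean)
reported = [2649.90, 260.80, 12.80, 0.42]   # classes 0, 1, 2 and >= 3


def poisson_expected(theta, n, open_class_start):
    """Expected frequencies for counts 0..open_class_start-1 plus an open tail class."""
    probs = [math.exp(-theta) * theta ** k / math.factorial(k)
             for k in range(open_class_start)]
    tail = 1.0 - sum(probs)               # P(X >= open_class_start)
    return [n * p for p in probs] + [n * tail]


expected = poisson_expected(THETA_HAT, N_OBS, open_class_start=3)

for label, e_new, e_table in zip(["0", "1", "2", ">=3"], expected, reported):
    # Differences of a few hundredths come from rounding of the estimate.
    print(f"count {label:<3}  recomputed = {e_new:8.2f}  reported = {e_table:8.2f}")
```

The recomputed frequencies agree with the reported ones to within rounding of the estimate. They also show why the Poisson fit is the weakest in the table: it underestimates the zero class and the classes of 2 or more, while overestimating the class of 1.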
Table 8. Estimated parameters for the TRTG, Poisson, geometric, and PL regression models along with ℓ̂_max and AIC.

| Variable | Parameter | TRTG | PL | Geometric | Poisson |
|---|---|---|---|---|---|
| Intercept | β1 | 0.4508 | 0.6110 | 0.6573 | 0.5091 |
| Smoking | β2 | −1.0077 | −0.8799 | −0.9999 | −1.1775 |
| Gender | β3 | 0.1494 | 0.1956 | 1.9607 | 0.1846 |
| Dispersion | ϕ | 0.4654 | | | |
| ℓ̂_max | | −632.16 | −640.78 | −685.59 | −693.77 |
| AIC | | 1272.32 | 1287.56 | 1377.18 | 1393.53 |
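The AIC row of Table 8 is consistent with the standard definition AIC = 2k − 2ℓ, where ℓ is the maximized log-likelihood and k is the number of estimated parameters. The sketch below assumes k = 4 for the TRTG regression (three regression coefficients plus the dispersion parameter ϕ) and k = 3 for the other models. These counts are inferred from the table layout rather than stated explicitly.

```python
# (maximized log-likelihood, assumed number of parameters, reported AIC)
models = {
    "TRTG":      (-632.16, 4, 1272.32),
    "PL":        (-640.78, 3, 1287.56),
    "Geometric": (-685.59, 3, 1377.18),
    "Poisson":   (-693.77, 3, 1393.53),
}


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik


recomputed_aic = {name: aic(ll, k) for name, (ll, k, _) in models.items()}
best_model = min(recomputed_aic, key=recomputed_aic.get)

for name, (_, _, table_value) in models.items():
    print(f"{name:<10} recomputed AIC = {recomputed_aic[name]:.2f}  reported = {table_value:.2f}")
print("smallest AIC:", best_model)
```

Under these assumed parameter counts the recomputed values match the table to within 0.01, and the TRTG regression keeps the smallest AIC despite carrying one extra parameter.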
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
