Article

The Odds Exponential-Pareto IV Distribution: Regression Model and Application

by Lamya A. Baharith 1,*, Kholod M. AL-Beladi 1,2 and Hadeel S. Klakattawi 1
1 Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
2 Department of Statistics, Faculty of Science, University of Jeddah, Jeddah 21959, Saudi Arabia
* Author to whom correspondence should be addressed.
Entropy 2020, 22(5), 497; https://doi.org/10.3390/e22050497
Submission received: 19 March 2020 / Revised: 15 April 2020 / Accepted: 23 April 2020 / Published: 25 April 2020
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

This article introduces the odds exponential-Pareto IV distribution, which belongs to the odds family of distributions. We studied the statistical properties of this new distribution. The odds exponential-Pareto IV distribution provided decreasing, increasing, and upside-down hazard functions. We employed the maximum likelihood method to estimate the distribution parameters. The performance of the estimators was assessed by conducting simulation studies. A new log location-scale regression model based on the odds exponential-Pareto IV distribution was also introduced. Parameter estimates of the proposed model were obtained using both maximum likelihood and jackknife methods for right-censored data. Real data sets were analyzed under the odds exponential-Pareto IV distribution and the log odds exponential-Pareto IV regression model to show their flexibility and potential.

1. Introduction

The Pareto distribution is named after the Italian economist Vilfredo Pareto (1848–1923). It has gained considerable attention in modeling many applications with heavy-tailed distributions, such as income distribution, earthquakes, forest fire areas, and disk drive sector errors [1,2]. The Pareto IV family is a general family of distributions. The Pareto I, Pareto II, and Pareto III distributions are special cases of the Pareto IV family. Also, the Burr family can be regarded as a special case of Pareto IV (see [3,4]). Several studies in the literature generalize the Pareto distribution to make it richer and more flexible for modeling data. These include the generalized Pareto [5], beta-Pareto [6], beta-generalized Pareto [7], Weibull–Pareto [8], gamma-Pareto [9,10], Kumaraswamy exponentiated Pareto [11], and exponentiated Weibull–Pareto distribution [12].
In recent works, new parameters have been added to existing distributions, or different generating methods have been used, to make the resulting distribution more appropriate and efficient for modeling lifetime data. Many distributions have been generalized in the literature. These include the logit of the Kumaraswamy distribution [13], the generalized beta-generated distribution [14], the Weibull-G family of distributions [15], the gamma-exponentiated exponential distribution [16], and the transmuted Weibull-Pareto distribution [17]. Very recently, some new odd distributions were proposed in the literature, such as the odd Birnbaum–Saunders distribution [18], the odd Burr-III family of distributions [19], the odds exponential-log logistic distribution [20], the odd log-logistic-Fréchet distribution [21], the odd log-logistic-Burr XII distribution [22], the odd exponentiated half-logistic Burr XII distribution [23], the odd Lomax-G family of distributions [24], the odd Dagum-G family of distributions [25], and the odd log-logistic Lindley-exponential distribution [26].
This article uses the transformed-transformer (T-X) family of Alzaatreh et al. [27] to introduce an odds exponential-Pareto IV distribution. In the T-X family, the cumulative distribution function (CDF) is defined by
$$G(x) = \int_{a}^{W(F(x))} r(t)\,dt = R\{W(F(x))\}, \tag{1}$$
where $r(t)$ is the probability density function (PDF) of a random variable $T \in [a, b]$, such that $-\infty \le a < b \le \infty$, $R$ is the CDF of $T$, and $W(F(x))$ is a function of any CDF $F(x)$ that can take different forms; see Alzaatreh et al. [27]. In this study, we consider the odds function form, $W(F(x)) = F(x)/(1 - F(x))$. That is, the CDF is
$$G(x) = \int_{0}^{\frac{F(x)}{1-F(x)}} r(t)\,dt = R\left\{\frac{F(x)}{1-F(x)}\right\}, \tag{2}$$
where we take the exponential distribution for $r(t) = \lambda e^{-\lambda t}$, $t \ge 0$, and $F(x) = 1 - \left[1 + (x/\theta)^{1/a}\right]^{-\alpha}$, $x > 0$, the CDF of the Pareto IV distribution with parameters $(a, \theta, \alpha)$, in Equation (2). The generated distribution provides more flexibility in accommodating different types of hazard function and is more suitable for modeling and fitting different real-life data.
Therefore, we now define the odds exponential-Pareto IV (OEPIV) distribution with CDF given by
$$G(x; \lambda, a, \theta, \alpha) = 1 - \exp\left\{-\lambda\left(\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha} - 1\right)\right\}, \quad x > 0. \tag{3}$$
The PDF of the OEPIV distribution is
$$g(x; \lambda, a, \theta, \alpha) = \frac{\lambda\alpha}{a\theta}\exp(\lambda)\left(\frac{x}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha-1}\exp\left\{-\lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right\}, \quad x > 0, \tag{4}$$
where $\lambda > 0$ and $\alpha > 0$ are the shape parameters, $\theta > 0$ is the scale parameter, and $a > 0$ is the inequality parameter.
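A minimal Python sketch of Equations (3) and (4) may help make the definitions concrete. The function names and the parameter values below are illustrative assumptions, not taken from the article; the final lines check numerically that the PDF is the derivative of the CDF.

```python
import math

def oepiv_cdf(x, lam, a, theta, alpha):
    """CDF of Equation (3): 1 - exp(-lam * (h**alpha - 1)), with h = 1 + (x/theta)**(1/a)."""
    h = 1.0 + (x / theta) ** (1.0 / a)
    return 1.0 - math.exp(-lam * (h ** alpha - 1.0))

def oepiv_pdf(x, lam, a, theta, alpha):
    """PDF of Equation (4); exp(lam) and exp(-lam * h**alpha) are merged into one exponential."""
    s = x / theta
    h = 1.0 + s ** (1.0 / a)
    return (lam * alpha / (a * theta)) * s ** (1.0 / a - 1.0) \
        * h ** (alpha - 1.0) * math.exp(lam - lam * h ** alpha)

# Illustrative parameter values (assumed for this example only)
params = dict(lam=0.5, a=0.5, theta=1.0, alpha=2.0)
x, eps = 0.8, 1e-6
# Central finite difference of the CDF should reproduce the PDF
numeric_pdf = (oepiv_cdf(x + eps, **params) - oepiv_cdf(x - eps, **params)) / (2 * eps)
print(numeric_pdf, oepiv_pdf(x, **params))
```

The two printed numbers should agree to several decimal places, which is a quick sanity check on the reconstruction of both formulas.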
Recently, there has been a great deal of interest in the literature in investigating the relationship between survival time and covariates such as sex, weight, blood pressure, and many others. In a number of applications, different parametric regression models have been used to estimate the effect of covariates on the survival time, including the log-location-scale regression model. The log-location-scale regression model is commonly used in clinical trials and in many other fields of application. It is also widely used in engineering models where failure is accelerated by voltage, temperature, or other stress factors [28]. Several studies in the literature applied the log-location-scale regression model based on different distributions, such as the log-modified Weibull [29], the log-Weibull extended [30], the log-exponentiated Weibull [31], the log-Burr XII [32], the log-beta Weibull [33], the log-beta log-logistic [34], the log-Fréchet [35], the log-exponentiated Fréchet [36], and the log-gamma-logistic [37]. Recent studies have built log-location-scale regression models from the logarithm of odd-type distributions, for instance, the odd log-logistic-Weibull [38], the odd log-logistic generalized half normal [39], and the odd Weibull [40].
This article is organized as follows: In Section 2, we define the survival and hazard functions of the OEPIV distribution with some graphical representations. We derive some of the OEPIV properties in Section 3. In Section 4, we explain the maximum likelihood estimation of the parameters of the odds exponential-Pareto IV distribution. Simulation studies are provided in Section 5 to illustrate the performance of the maximum likelihood estimators of the OEPIV parameters. In Section 6, we address the log odds exponential-Pareto IV (LOEPIV) distribution along with some of its statistical properties, introduce a log-location regression model based on the LOEPIV distribution, and discuss its parameter estimation via maximum likelihood and jackknife methods. Section 7 presents a simulation study of the residuals of the LOEPIV regression model. In Section 8, three applications are analyzed to demonstrate the performance of the new distribution and its regression model. Finally, we report our conclusions in Section 9.

2. The Odds Exponential-Pareto IV Distribution

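The survival function (SF) and hazard function (HF) are, respectively, as follows:
$$SF(x; \lambda, a, \theta, \alpha) = \exp\left\{-\lambda\left(\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha} - 1\right)\right\}, \tag{5}$$
$$HF(x; \lambda, a, \theta, \alpha) = \frac{\lambda\alpha}{a\theta}\left(\frac{x}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha-1}. \tag{6}$$
The Exponential-Pareto (EP) distribution [41] can be treated as a special case of the OEPIV distribution by setting $\alpha = 1$ and $1/a = \theta$. For $\alpha = 1$, $1/a = \sigma$, and $\lambda = 1/\beta$, we obtain the odds exponential-log logistic (OELL) distribution [20].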
Graphical representations of the PDF in Equation (4) and the HF in Equation (6) are shown in Figure 1 and Figure 2, respectively. Figure 1 shows that the OEPIV density takes different shapes at different parameter values, which indicates its flexibility. Figure 2 shows that the OEPIV hazard function can be increasing, decreasing, or upside-down.
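The three hazard shapes can be reproduced with a short Python sketch of Equation (6). The parameter triples below are illustrative assumptions chosen by hand (they are not the values plotted in Figure 2), and `classify` is a hypothetical helper that labels a sequence evaluated on a grid.

```python
import math

def oepiv_hf(x, lam, a, theta, alpha):
    """Hazard function of Equation (6)."""
    s = x / theta
    return (lam * alpha / (a * theta)) * s ** (1.0 / a - 1.0) \
        * (1.0 + s ** (1.0 / a)) ** (alpha - 1.0)

def classify(values):
    """Label a sequence as increasing, decreasing, or upside-down (single interior peak)."""
    diffs = [v - u for u, v in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    peak = values.index(max(values))
    if all(d > 0 for d in diffs[:peak]) and all(d < 0 for d in diffs[peak:]):
        return "upside-down"
    return "other"

grid = [0.05 * i for i in range(1, 101)]  # x from 0.05 to 5.0
# Assumed (lam, a, theta, alpha) choices, one per shape
cases = {
    "increasing": (1.0, 0.5, 1.0, 2.0),
    "decreasing": (1.0, 2.0, 1.0, 0.5),
    "upside-down": (1.0, 0.5, 1.0, 0.2),
}
shapes = {name: classify([oepiv_hf(x, *p) for x in grid]) for name, p in cases.items()}
print(shapes)
```

With $1/a = 2$ the hazard is proportional to $s(1+s^2)^{\alpha-1}$, so $\alpha = 2$ gives a monotone increase while $\alpha = 0.2$ gives a single peak; with $a = 2$ and $\alpha = 0.5$ both factors decrease in $x$.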

3. Statistical Properties

We discuss in this section some statistical properties of the OEPIV distribution.

3.1. The Quantile and Median

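The quantile function of the OEPIV distribution is obtained by inverting the CDF in Equation (3):
$$q_{OEPIV} = \theta\left[\left(-\frac{\log(1-p)}{\lambda} + 1\right)^{\frac{1}{\alpha}} - 1\right]^{a}, \quad 0 < p < 1. \tag{7}$$
Then, the median of the OEPIV distribution can be obtained by setting $p = 0.5$ in Equation (7),
$$Med = \theta\left[\left(\frac{\log(2)}{\lambda} + 1\right)^{\frac{1}{\alpha}} - 1\right]^{a}. \tag{8}$$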
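Because the quantile function is available in closed form, random variates can be drawn by inverse-transform sampling. The sketch below is an illustration under that assumption (the article does not state how its samples were generated); function names are hypothetical, and the parameter values are those of Set I in the simulation study.

```python
import math
import random

def oepiv_cdf(x, lam, a, theta, alpha):
    """CDF of Equation (3)."""
    h = 1.0 + (x / theta) ** (1.0 / a)
    return 1.0 - math.exp(-lam * (h ** alpha - 1.0))

def oepiv_quantile(p, lam, a, theta, alpha):
    """Equation (7); p = 0.5 gives the median of Equation (8)."""
    return theta * ((1.0 - math.log(1.0 - p) / lam) ** (1.0 / alpha) - 1.0) ** a

def oepiv_sample(n, lam, a, theta, alpha, rng):
    """Inverse-transform sampling: apply the quantile function to U(0, 1) draws."""
    return [oepiv_quantile(rng.random(), lam, a, theta, alpha) for _ in range(n)]

params = (0.3, 0.4, 0.5, 0.2)  # (lam, a, theta, alpha), Set I
median = oepiv_quantile(0.5, *params)
sample = oepiv_sample(5000, *params, rng=random.Random(2020))
# Roughly half of the simulated values should fall below the theoretical median
share_below = sum(x <= median for x in sample) / len(sample)
print(median, share_below)
```

Applying `oepiv_cdf` to `oepiv_quantile(p, ...)` should return `p` up to rounding, which confirms that Equation (7) inverts Equation (3).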

3.2. The Mode

The mode of the OEPIV distribution can be obtained by differentiating the log of the PDF in Equation (4) with respect to $x$ and equating to zero,
$$\frac{d}{dx}\log g(x; \lambda, a, \theta, \alpha) = 0,$$
that is,
$$\frac{(1/a - 1)}{x} + \frac{(\alpha - 1)(x/\theta)^{1/a-1}}{a\theta\left(1 + (x/\theta)^{1/a}\right)} - \frac{\lambda\alpha}{\theta a}(x/\theta)^{1/a-1}\left(1 + (x/\theta)^{1/a}\right)^{\alpha-1} = 0. \tag{9}$$
Thus, the mode can be obtained numerically by solving Equation (9).
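One way to solve Equation (9) numerically is plain bisection on its left-hand side. The sketch below assumes a parameter choice with $a < 1$, for which the derivative of the log-density is positive near zero and negative for large $x$; the bracket `[lo, hi]`, the function names, and the parameter values are assumptions made for illustration.

```python
import math

def dlog_pdf(x, lam, a, theta, alpha):
    """Left-hand side of Equation (9): derivative of log g with respect to x."""
    s = x / theta
    sb = s ** (1.0 / a)          # (x/theta)**(1/a)
    h = 1.0 + sb
    return ((1.0 / a - 1.0) / x
            + (alpha - 1.0) * (sb / s) / (a * theta * h)
            - lam * alpha / (theta * a) * (sb / s) * h ** (alpha - 1.0))

def oepiv_mode(lam, a, theta, alpha, lo=1e-8, hi=50.0, tol=1e-12):
    """Bisection for the root of Equation (9); needs a sign change on [lo, hi]."""
    if not (dlog_pdf(lo, lam, a, theta, alpha) > 0 > dlog_pdf(hi, lam, a, theta, alpha)):
        raise ValueError("no sign change on the bracket; the density may have no interior mode")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dlog_pdf(mid, lam, a, theta, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

params = (0.5, 0.5, 1.0, 2.0)  # assumed (lam, a, theta, alpha)
mode = oepiv_mode(*params)
print(mode)
```

For these values Equation (9) reduces to the cubic $2t^3 + 4t^2 - t - 1 = 0$ in $t = x^2$, which has a single positive root, so the bisection result can be cross-checked by hand.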

3.3. The r-th Order Moment and Moment Generating Function

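The $r$-th order raw moment is defined as
$$\mu'_r = \int_0^\infty x^r g(x; \lambda, a, \theta, \alpha)\,dx.$$
Thus,
$$\mu'_r = \int_0^\infty x^r\,\frac{\lambda\alpha}{a\theta}\exp(\lambda)\left(\frac{x}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha-1}\exp\left\{-\lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right\}dx.$$
Let
$$u = \lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha} \;\Rightarrow\; du = \frac{\lambda\alpha}{a\theta}\left(\frac{x}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha-1}dx.$$
Also, $x = \theta\left[\left(\frac{u}{\lambda}\right)^{1/\alpha} - 1\right]^{a}$, and $u = \lambda$ at $x = 0$.
Substituting these expressions into the integral gives
$$\mu'_r = e^{\lambda}\theta^r\int_\lambda^\infty\left[\left(\frac{u}{\lambda}\right)^{1/\alpha} - 1\right]^{ar}e^{-u}\,du.$$
Using the binomial series expansion of $\left[\left(\frac{u}{\lambda}\right)^{1/\alpha} - 1\right]^{ar}$, we obtain
$$\mu'_r = \sum_{k=0}^{\infty}\binom{ar}{k}(-1)^k e^{\lambda}\theta^r\lambda^{-\frac{ar-k}{\alpha}}\int_\lambda^\infty u^{(ar-k)/\alpha}e^{-u}\,du.$$
Using the definition of the upper incomplete gamma function,
$$\Gamma(s, x) = \int_x^\infty t^{s-1}e^{-t}\,dt,$$
the $r$-th moment can be written as
$$\mu'_r = E(x^r) = \sum_{k=0}^{\infty}\binom{ar}{k}(-1)^k e^{\lambda}\theta^r\lambda^{-\frac{ar-k}{\alpha}}\,\Gamma\left(\frac{ar-k}{\alpha} + 1, \lambda\right). \tag{10}$$
Therefore, the moment generating function (mgf) can be obtained from the $r$-th moment of the OEPIV distribution as
$$M_x(t) = E(e^{tx}) = \sum_{r=0}^{\infty}\frac{t^r}{r!}\mu'_r. \tag{11}$$
Substituting Equation (10) into Equation (11), we find
$$M_x(t) = \sum_{r=0}^{\infty}\sum_{k=0}^{\infty}\binom{ar}{k}(-1)^k\frac{(\theta t)^r}{r!}\lambda^{-\frac{ar-k}{\alpha}}e^{\lambda}\,\Gamma\left(\frac{ar-k}{\alpha} + 1, \lambda\right).$$
Then, the mean of the OEPIV distribution is
$$\mu'_1 = E(x) = \sum_{k=0}^{\infty}\binom{a}{k}(-1)^k e^{\lambda}\theta\lambda^{-\frac{a-k}{\alpha}}\,\Gamma\left(\frac{a-k}{\alpha} + 1, \lambda\right).$$
The mean, variance, skewness, and kurtosis of the OEPIV distribution for different values of $\lambda$, $a$, $\theta$, and $\alpha$ are reported in Table 1 to illustrate the effect of the parameters on these measures.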
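The change of variables used in the moment derivation can be checked numerically without evaluating the series. The sketch below compares direct integration of $x^r g(x)$ with the substituted integral $e^{\lambda}\theta^r\int_\lambda^\infty[(u/\lambda)^{1/\alpha}-1]^{ar}e^{-u}\,du$, using a hand-written composite Simpson rule. The parameter values, the truncation points of the integrals, and the function names are assumptions chosen for this illustration; they are not the settings behind Table 1.

```python
import math

def simpson(f, lo, hi, n=100_000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * step)
    return total * step / 3.0

def oepiv_pdf(x, lam, a, theta, alpha):
    """PDF of Equation (4)."""
    s = x / theta
    h = 1.0 + s ** (1.0 / a)
    return (lam * alpha / (a * theta)) * s ** (1.0 / a - 1.0) \
        * h ** (alpha - 1.0) * math.exp(lam - lam * h ** alpha)

def moment_direct(r, lam, a, theta, alpha, upper=6.0):
    """Integrate x**r * g(x) on [0, upper]; upper = 6 is ample for the values used below."""
    return simpson(lambda x: x ** r * oepiv_pdf(x, lam, a, theta, alpha), 0.0, upper)

def moment_substituted(r, lam, a, theta, alpha, width=40.0):
    """Integrate the substituted form over u in [lam, lam + width]."""
    inner = lambda u: ((u / lam) ** (1.0 / alpha) - 1.0) ** (a * r) * math.exp(-u)
    return math.exp(lam) * theta ** r * simpson(inner, lam, lam + width)

params = (0.5, 0.5, 1.0, 2.0)  # assumed (lam, a, theta, alpha)
m1_direct, m1_sub = moment_direct(1, *params), moment_substituted(1, *params)
m2_direct, m2_sub = moment_direct(2, *params), moment_substituted(2, *params)
variance = m2_direct - m1_direct ** 2
print(m1_direct, m1_sub, m2_direct, m2_sub, variance)
```

The two routes should agree to about four decimal places or better; the substituted integrand has a square-root-type endpoint at $u = \lambda$ when $ar$ is not an integer, which limits the accuracy of Simpson's rule there.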

3.4. Order Statistics

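Suppose $X_1, X_2, X_3, \ldots, X_n$ is a random sample from the PDF in Equation (4), and let $X_{(1)}, X_{(2)}, X_{(3)}, \ldots, X_{(n)}$ denote the corresponding order statistics. The probability density function of the $k$-th order statistic, say $Y = X_{(k)}$, is given by
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}F^{k-1}(y)\,[1 - F(y)]^{n-k}f(y), \tag{12}$$
where $f(y)$ and $F(y)$ are the PDF and CDF of the OEPIV distribution given by Equations (4) and (3), respectively. The binomial expansion of $[1 - F(y)]^{n-k}$ is
$$[1 - F(y)]^{n-k} = \sum_{i=0}^{n-k}\binom{n-k}{i}(-1)^i[F(y)]^i. \tag{13}$$
Substituting Equation (13) into (12), we have
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}f(y)\sum_{i=0}^{n-k}\binom{n-k}{i}(-1)^i[F(y)]^{i+k-1}. \tag{14}$$
Substituting Equations (3) and (4) into (14), we obtain
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}\sum_{i=0}^{n-k}(-1)^i\binom{n-k}{i}\frac{\lambda\alpha}{a\theta}\exp(\lambda)\left(\frac{y}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha-1}\left[1 - \exp\left\{-\lambda\left(\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha} - 1\right)\right\}\right]^{i+k-1}\exp\left\{-\lambda\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha}\right\}.$$
Using the binomial expansion of $\left[1 - \exp\left\{-\lambda\left(\left[1 + (y/\theta)^{1/a}\right]^{\alpha} - 1\right)\right\}\right]^{i+k-1}$, we get
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}\sum_{j=0}^{\infty}\sum_{i=0}^{n-k}\binom{n-k}{i}\binom{i+k-1}{j}(-1)^{i+j}\frac{\lambda\alpha}{a\theta}\exp(\lambda)\left(\frac{y}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha-1}\exp\left\{-\lambda j\left(\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha} - 1\right)\right\}\exp\left\{-\lambda\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha}\right\},$$
which simplifies to
$$f_Y(y) = \frac{n!}{(k-1)!\,(n-k)!}\frac{\lambda\alpha}{a\theta}\sum_{j=0}^{\infty}\sum_{i=0}^{n-k}\binom{n-k}{i}\binom{i+k-1}{j}(-1)^{i+j}\exp(\lambda(1+j))\left(\frac{y}{\theta}\right)^{\frac{1}{a}-1}\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha-1}\exp\left\{-\lambda\left[1 + \left(\frac{y}{\theta}\right)^{1/a}\right]^{\alpha}(1+j)\right\}.$$
Because $i + k - 1$ is a non-negative integer, the terms with $j > i + k - 1$ vanish and the sum over $j$ is finite.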

3.5. Rényi Entropy

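The Rényi entropy of a random variable $X$ represents a measure of variation of the uncertainty. It is given by
$$H_R(x) = \frac{1}{1-R}\log\int_0^\infty g(x)^R\,dx, \quad R > 0,\ R \ne 1.$$
Using the PDF in Equation (4), we can write
$$g(x)^R = \left(\frac{\alpha\lambda\exp(\lambda)}{a\theta}\right)^{R}\left(\frac{x}{\theta}\right)^{(1/a-1)R}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{(\alpha-1)R}\exp\left\{-R\lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right\},$$
so that
$$I_R(x) = \int_0^\infty g(x)^R\,dx = \int_0^\infty\left(\frac{\alpha\lambda\exp(\lambda)}{a\theta}\right)^{R}\left(\frac{x}{\theta}\right)^{(1/a-1)R}\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{(\alpha-1)R}\exp\left\{-R\lambda\left[1 + \left(\frac{x}{\theta}\right)^{1/a}\right]^{\alpha}\right\}dx.$$
Let $u = R\lambda\left[1 + (x/\theta)^{1/a}\right]^{\alpha}$, so that $u = R\lambda$ at $x = 0$ and
$$I_R(x) = \frac{e^{\lambda R}}{R}\left(\frac{\alpha\lambda}{a\theta}\right)^{R-1}\int_{R\lambda}^\infty\left(\frac{u}{R\lambda}\right)^{R\left(1-\frac{1}{\alpha}\right)+\frac{1}{\alpha}-1}\left[\left(\frac{u}{R\lambda}\right)^{\frac{1}{\alpha}} - 1\right]^{R(1-a)+a-1}e^{-u}\,du.$$
The binomial series expansion of $\left[\left(\frac{u}{R\lambda}\right)^{1/\alpha} - 1\right]^{R(1-a)+a-1}$ is
$$\left[\left(\frac{u}{R\lambda}\right)^{\frac{1}{\alpha}} - 1\right]^{R(1-a)+a-1} = \sum_{k=0}^{\infty}\binom{R(1-a)+a-1}{k}(-1)^k\left(\frac{u}{R\lambda}\right)^{\frac{R(1-a)+a-1-k}{\alpha}}.$$
Substituting this expansion into the integral gives
$$I_R(x) = \frac{e^{\lambda R}}{R}\left(\frac{\alpha\lambda}{a\theta}\right)^{R-1}\sum_{k=0}^{\infty}\binom{R(1-a)+a-1}{k}(-1)^k\left(\frac{1}{R\lambda}\right)^{\frac{1}{\alpha}(a(1-R)-k)+R-1}\int_{R\lambda}^\infty u^{\frac{1}{\alpha}(a(1-R)-k)+R-1}e^{-u}\,du,$$
$$I_R(x) = e^{\lambda R}\left(\frac{\alpha}{a\theta}\right)^{R-1}\sum_{k=0}^{\infty}\binom{R(1-a)+a-1}{k}(-1)^k\frac{\lambda^{-\frac{1}{\alpha}(a(1-R)-k)}\,\Gamma\left(\frac{1}{\alpha}(a(1-R)-k)+R,\ R\lambda\right)}{R^{\frac{1}{\alpha}(a(1-R)-k)+R}},$$
where $\Gamma(s, x)$ is the upper incomplete gamma function; the lower limit $R\lambda$ of the integral follows from the substitution. Taking logarithms,
$$\log(I_R(x)) = \lambda R + (R-1)\log\left(\frac{\alpha}{a\theta}\right) + \log\left[\sum_{k=0}^{\infty}\binom{R(1-a)+a-1}{k}(-1)^k\frac{\lambda^{-\frac{1}{\alpha}(a(1-R)-k)}\,\Gamma\left(\frac{1}{\alpha}(a(1-R)-k)+R,\ R\lambda\right)}{R^{\frac{1}{\alpha}(a(1-R)-k)+R}}\right].$$
The Rényi entropy of the OEPIV distribution is therefore
$$H_R(x) = \frac{\lambda R}{1-R} - \log\left(\frac{\alpha}{a\theta}\right) + \frac{1}{1-R}\log\left[\sum_{k=0}^{\infty}\binom{R(1-a)+a-1}{k}(-1)^k\frac{\lambda^{-\frac{1}{\alpha}(a(1-R)-k)}\,\Gamma\left(\frac{1}{\alpha}(a(1-R)-k)+R,\ R\lambda\right)}{R^{\frac{1}{\alpha}(a(1-R)-k)+R}}\right].$$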

4. Estimation of the OEPIV Parameters

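We assume that $x_1, x_2, \ldots, x_n$ is a random sample from the OEPIV distribution. Then, the log-likelihood ($\ell$) for $\phi = (\lambda, a, \theta, \alpha)$ is
$$\ell = n\log(\lambda) + n\log(\alpha) - n\log(a) - n\log(\theta) + n\lambda + \left(\frac{1}{a} - 1\right)\sum_{i=1}^{n}\log\left(\frac{x_i}{\theta}\right) + (\alpha - 1)\sum_{i=1}^{n}\log(h_i) - \lambda\sum_{i=1}^{n}(h_i)^{\alpha}, \tag{17}$$
where $h_i = 1 + (x_i/\theta)^{1/a}$. The components of the score vector, which are set to zero to obtain the likelihood equations, are
$$\frac{\partial\ell}{\partial\lambda} = \frac{n}{\lambda} + n - \sum_{i=1}^{n}(h_i)^{\alpha}, \tag{18}$$
$$\frac{\partial\ell}{\partial a} = -\frac{n}{a} - \frac{1}{a^2}\sum_{i=1}^{n}\log\left(\frac{x_i}{\theta}\right) - \frac{(\alpha-1)}{a^2}\sum_{i=1}^{n}\frac{1}{h_i}\left(\frac{x_i}{\theta}\right)^{1/a}\ln\left(\frac{x_i}{\theta}\right) + \frac{\lambda\alpha}{a^2}\sum_{i=1}^{n}h_i^{\alpha-1}\left(\frac{x_i}{\theta}\right)^{1/a}\ln\left(\frac{x_i}{\theta}\right), \tag{19}$$
$$\frac{\partial\ell}{\partial\theta} = -\frac{n}{\theta} - \frac{n\left((1/a) - 1\right)}{\theta} - \frac{(\alpha-1)}{a\theta}\sum_{i=1}^{n}\frac{1}{h_i}\left(\frac{x_i}{\theta}\right)^{1/a} + \frac{\lambda\alpha}{a\theta}\sum_{i=1}^{n}\left(\frac{x_i}{\theta}\right)^{1/a}h_i^{\alpha-1}, \tag{20}$$
and
$$\frac{\partial\ell}{\partial\alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n}\log(h_i) - \lambda\sum_{i=1}^{n}h_i^{\alpha}\log(h_i). \tag{21}$$
We can obtain maximum likelihood (ML) estimates of the parameters by directly maximizing Equation (17), for example with the nlm or optim functions in R, or by solving Equations (18)–(21) set equal to zero. Under standard regularity conditions, approximate interval estimates of the parameters can be obtained from the multivariate normal distribution $N_4(0, J(\hat{\phi})^{-1})$, where the elements of the $4 \times 4$ observed information matrix $J(\phi) = \left\{-\frac{\partial^2\ell}{\partial\phi_j\,\partial\phi_k}\right\}$ are evaluated numerically at $\hat{\phi}$. In addition, the likelihood ratio (LR) test can be applied to discriminate between nested models.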
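Before handing Equation (17) to an optimizer, it is worth checking the analytic score components against finite differences of the log-likelihood. The following Python sketch does that on a small hypothetical sample; the data, the evaluation point, and the function names are assumptions for illustration only, and no optimizer is run here.

```python
import math

def loglik(x, lam, a, theta, alpha):
    """Log-likelihood of Equation (17)."""
    n = len(x)
    h = [1.0 + (xi / theta) ** (1.0 / a) for xi in x]
    return (n * math.log(lam) + n * math.log(alpha) - n * math.log(a)
            - n * math.log(theta) + n * lam
            + (1.0 / a - 1.0) * sum(math.log(xi / theta) for xi in x)
            + (alpha - 1.0) * sum(math.log(hi) for hi in h)
            - lam * sum(hi ** alpha for hi in h))

def score(x, lam, a, theta, alpha):
    """Partial derivatives of the log-likelihood, Equations (18)-(21)."""
    n = len(x)
    s = [(xi / theta) ** (1.0 / a) for xi in x]
    h = [1.0 + si for si in s]
    lg = [math.log(xi / theta) for xi in x]
    d_lam = n / lam + n - sum(hi ** alpha for hi in h)
    d_a = (-n / a - sum(lg) / a ** 2
           - (alpha - 1.0) / a ** 2 * sum(si * li / hi for si, li, hi in zip(s, lg, h))
           + lam * alpha / a ** 2 * sum(hi ** (alpha - 1.0) * si * li for si, li, hi in zip(s, lg, h)))
    d_theta = (-n / theta - n * (1.0 / a - 1.0) / theta
               - (alpha - 1.0) / (a * theta) * sum(si / hi for si, hi in zip(s, h))
               + lam * alpha / (a * theta) * sum(si * hi ** (alpha - 1.0) for si, hi in zip(s, h)))
    d_alpha = n / alpha + sum(math.log(hi) for hi in h) - lam * sum(hi ** alpha * math.log(hi) for hi in h)
    return [d_lam, d_a, d_theta, d_alpha]

x = [0.3, 0.7, 1.1, 0.5, 0.9]   # hypothetical observations
phi = [0.5, 0.5, 1.0, 2.0]      # assumed evaluation point (lam, a, theta, alpha)
eps = 1e-6
numeric = []
for j in range(4):
    up, dn = phi.copy(), phi.copy()
    up[j] += eps
    dn[j] -= eps
    numeric.append((loglik(x, *up) - loglik(x, *dn)) / (2 * eps))
analytic = score(x, *phi)
print(numeric)
print(analytic)
```

Agreement between the two lists indicates that the gradient supplied to a quasi-Newton routine would be consistent with the objective.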

5. Simulation Studies

We conducted a Monte Carlo simulation to illustrate the performance of the ML parameter estimates of the OEPIV distribution. That is, we randomly generated 10,000 samples of sizes 30, 50, 100, 200, and 500 from the OEPIV distribution for two different sets of parameter values as follows:
$$\text{Set I}: \lambda = 0.3,\ a = 0.4,\ \theta = 0.5,\ \alpha = 0.2.$$
$$\text{Set II}: \lambda = 0.2,\ a = 0.1,\ \theta = 0.6,\ \alpha = 0.15.$$
The estimates of the parameters were obtained along with their bias and mean square error (MSE), given by
$$\widehat{Bias}_b = \frac{1}{n}\sum_{i=1}^{n}(\hat{b}_i - b),$$
$$\widehat{MSE}_b = \frac{1}{n}\sum_{i=1}^{n}(\hat{b}_i - b)^2,$$
where $b = \lambda, \theta, a, \alpha$, and $\hat{b}_i$ is the estimate of $b$ obtained from the $i$-th generated sample. The results of the simulation are displayed in Table 2. We concluded from these results that the empirical means tend to the true values of the parameters as the sample size increases. In addition, the MSEs and biases decreased as the sample size increased.
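The bias and MSE computations can be illustrated with a deliberately reduced Monte Carlo experiment. As a simplifying assumption that departs from the article's four-parameter study, the sketch below estimates only $\lambda$ and holds $a$, $\theta$, and $\alpha$ at their true Set I values; in that case setting $\partial\ell/\partial\lambda = 0$ gives the closed form $\hat{\lambda} = n/(\sum_i h_i^{\alpha} - n)$. Samples are drawn by inverting the CDF, and the replication count and seed are arbitrary choices.

```python
import math
import random

LAM, A, THETA, ALPHA = 0.3, 0.4, 0.5, 0.2  # Set I

def draw(rng):
    """One OEPIV variate via the quantile function of Equation (7)."""
    p = rng.random()
    return THETA * ((1.0 - math.log(1.0 - p) / LAM) ** (1.0 / ALPHA) - 1.0) ** A

def lambda_hat(xs):
    """Root of the lambda score equation with a, theta, alpha treated as known (assumption)."""
    total = sum((1.0 + (x / THETA) ** (1.0 / A)) ** ALPHA for x in xs)
    return len(xs) / (total - len(xs))

def bias_and_mse(n, reps, rng):
    """Average error and average squared error of lambda_hat over reps simulated samples."""
    estimates = [lambda_hat([draw(rng) for _ in range(n)]) for _ in range(reps)]
    bias = sum(e - LAM for e in estimates) / reps
    mse = sum((e - LAM) ** 2 for e in estimates) / reps
    return bias, mse

rng = random.Random(1)
results = {n: bias_and_mse(n, 1000, rng) for n in (30, 100, 300)}
for n, (bias, mse) in results.items():
    print(n, bias, mse)
```

Even in this reduced setting the MSE shrinks as the sample size grows, which mirrors the pattern the article reports for the full parameter vector.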

6. The Log Odds Exponential-Pareto IV Regression Model

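If $X$ is a random variable from the OEPIV distribution, as given in Equation (4), then $Y = \log(X)$ is a random variable that has a LOEPIV distribution under the reparameterization $\sigma = a$ and $\mu = \log(\theta)$. Therefore, the PDF and CDF of the LOEPIV distribution are as follows:
$$f(y; \lambda, \alpha, \sigma, \mu) = \frac{\lambda\alpha}{\sigma}\exp(\lambda)\exp\left(\frac{y-\mu}{\sigma}\right)\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha-1}\exp\left\{-\lambda\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha}\right\}, \tag{22}$$
$$F(y; \lambda, \alpha, \sigma, \mu) = 1 - \exp(\lambda)\exp\left\{-\lambda\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha}\right\}, \quad -\infty < y < \infty,$$
where $\sigma > 0$ is the scale parameter, $\lambda > 0$ and $\alpha > 0$ are the shape parameters, and $-\infty < \mu < \infty$ is the location parameter. The LOEPIV model becomes the log exponential-Pareto (LEP) distribution for $\alpha = 1$. The PDF (for $-\infty < y < \infty$) of the LEP distribution with parameters $\lambda > 0$, $\sigma > 0$, and $-\infty < \mu < \infty$ is
$$f(y) = \frac{\lambda}{\sigma}\exp(\lambda)\exp\left(\frac{y-\mu}{\sigma}\right)\exp\left\{-\lambda\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]\right\}.$$
The SF and HF are given by
$$SF(y; \lambda, \alpha, \sigma, \mu) = \exp(\lambda)\exp\left\{-\lambda\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha}\right\},$$
$$HF(y; \lambda, \alpha, \sigma, \mu) = \frac{\lambda\alpha}{\sigma}\exp\left(\frac{y-\mu}{\sigma}\right)\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha-1}.$$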
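The link between the two densities is the usual change of variables: if $Y = \log X$, then $f_Y(y) = g(e^y)\,e^y$ with $\sigma = a$ and $\mu = \log\theta$. The Python sketch below checks this identity at a few points; the function names and parameter values are illustrative assumptions.

```python
import math

def oepiv_pdf(x, lam, a, theta, alpha):
    """PDF of the OEPIV distribution, Equation (4)."""
    s = x / theta
    h = 1.0 + s ** (1.0 / a)
    return (lam * alpha / (a * theta)) * s ** (1.0 / a - 1.0) \
        * h ** (alpha - 1.0) * math.exp(lam - lam * h ** alpha)

def loepiv_pdf(y, lam, alpha, sigma, mu):
    """PDF of the LOEPIV distribution, Equation (22)."""
    e = math.exp((y - mu) / sigma)
    return (lam * alpha / sigma) * e * (1.0 + e) ** (alpha - 1.0) \
        * math.exp(lam - lam * (1.0 + e) ** alpha)

lam, a, theta, alpha = 0.5, 0.5, 2.0, 1.5   # assumed values
sigma, mu = a, math.log(theta)              # reparameterization used in the text
pairs = []
for y in (-1.0, 0.0, 0.5, 1.2):
    x = math.exp(y)
    # Density of Y = log X obtained two ways: directly, and via the Jacobian x = e**y
    pairs.append((loepiv_pdf(y, lam, alpha, sigma, mu), oepiv_pdf(x, lam, a, theta, alpha) * x))
print(pairs)
```

Each pair should contain two equal numbers up to floating-point rounding.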
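The following are the properties of the LOEPIV distribution:
The quantile of the LOEPIV distribution
$$y = \sigma\ln\left\{\left[1 - \frac{1}{\lambda}\ln(1-p)\right]^{\frac{1}{\alpha}} - 1\right\} + \mu.$$
The mode of the LOEPIV distribution
$$\frac{d}{dy}\log f(y; \lambda, \alpha, \sigma, \mu) = \frac{1}{\sigma}\left\{1 + \frac{(\alpha-1)\exp\left(\frac{y-\mu}{\sigma}\right)}{1 + \exp\left(\frac{y-\mu}{\sigma}\right)} - \lambda\alpha\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha-1}\exp\left(\frac{y-\mu}{\sigma}\right)\right\} = 0. \tag{27}$$
Then, the mode can be obtained by solving Equation (27) numerically.
The median of the LOEPIV distribution
$$Med = \sigma\ln\left\{\left[1 + \frac{1}{\lambda}\ln(2)\right]^{\frac{1}{\alpha}} - 1\right\} + \mu.$$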
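The mgf of the LOEPIV distribution
$$M_Y(t) = \int_{-\infty}^{\infty}\exp(ty)\,f(y; \lambda, \alpha, \sigma, \mu)\,dy.$$
Thus,
$$M_Y(t) = \int_{-\infty}^{\infty}\exp(ty)\,\frac{\lambda\alpha}{\sigma}\exp(\lambda)\exp\left(\frac{y-\mu}{\sigma}\right)\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha-1}\exp\left\{-\lambda\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha}\right\}dy.$$
Substituting $u = \left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha}$, so that $du = \frac{\alpha}{\sigma}\exp\left(\frac{y-\mu}{\sigma}\right)\left[1 + \exp\left(\frac{y-\mu}{\sigma}\right)\right]^{\alpha-1}dy$, reduces the above integral to
$$M_Y(t) = \lambda e^{\lambda}\exp(t\mu)\int_1^\infty\left(u^{1/\alpha} - 1\right)^{t\sigma}e^{-\lambda u}\,du.$$
Then, using the binomial series expansion
$$\left(u^{1/\alpha} - 1\right)^{t\sigma} = \sum_{j=0}^{\infty}\binom{t\sigma}{j}(-1)^j\left(u^{1/\alpha}\right)^{t\sigma-j},$$
$M_Y(t)$ can be rewritten as
$$M_Y(t) = \lambda e^{\lambda}\exp(t\mu)\sum_{j=0}^{\infty}\binom{t\sigma}{j}(-1)^j\int_1^\infty\left(u^{1/\alpha}\right)^{t\sigma-j}e^{-\lambda u}\,du.$$
Using the upper incomplete gamma function, the mgf of the LOEPIV distribution is
$$M_Y(t) = e^{\lambda}\exp(t\mu)\sum_{j=0}^{\infty}\binom{t\sigma}{j}(-1)^j\left(\frac{1}{\lambda}\right)^{\frac{t\sigma-j}{\alpha}}\Gamma\left(\frac{t\sigma-j}{\alpha} + 1, \lambda\right).$$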
The standardized random variable for $y$ in Equation (22) is defined as $z = (y - \mu)/\sigma$; then $z$ has the following PDF
$$f(z) = \lambda\alpha\exp(\lambda)\exp(z)\,(1 + \exp(z))^{\alpha-1}\exp\{-\lambda(1 + \exp(z))^{\alpha}\}, \quad -\infty < z < \infty,$$
with SF given as
$$SF(z) = \exp(\lambda)\exp\{-\lambda(1 + \exp(z))^{\alpha}\}.$$
Hence, a linear location-scale regression model with response variable $y_i$ and explanatory vector $x_i = (x_{i1}, \ldots, x_{ip})^T$ can be defined as
$$y_i = \beta^T x_i + \sigma z_i, \quad i = 1, 2, \ldots, n, \tag{31}$$
where $z_i$ is the random error with the standardized PDF $f(z)$ given above, and $\beta = (\beta_1, \ldots, \beta_p)^T$, $\sigma > 0$, $\lambda > 0$, and $\alpha > 0$ are the unknown parameters. Here $\mu_i = \beta^T x_i$ is the location of $y_i$, and the location vector $\mu = (\mu_1, \ldots, \mu_n)^T$ can be represented as a linear model $\mu = X\beta$, in which $X = (x_1, \ldots, x_n)^T$ is the known model matrix. Therefore, the SF of $Y_i \mid x$ is expressed as:
$$SF(y_i \mid x) = \exp(\lambda)\exp\left\{-\lambda\left[1 + \exp\left(\frac{y_i - \beta^T x_i}{\sigma}\right)\right]^{\alpha}\right\}.$$

6.1. Estimation of the LOEPIV Regression Model

6.1.1. ML Method

For right-censored lifetime data, we observe $t_i = \min(f_i, c_i)$, where $f_i$ is the lifetime and $c_i$ is the censoring time, and we set $y_i = \log(t_i)$ for the $i$-th individual, $i = 1, \ldots, n$. Suppose we have a random sample of $n$ observations $(y_1, \tau_1, x_1), \ldots, (y_n, \tau_n, x_n)$, where
$$\tau_i = \begin{cases} 1 & \text{for } y_i = \log(f_i), \\ 0 & \text{for } y_i = \log(c_i), \end{cases}$$
and assume that the censoring times and lifetimes are independent and random. Then, the likelihood function for the regression model in (31) with $\theta = (\lambda, \alpha, \sigma, \beta^T)^T$ under right censoring is as follows:
$$L(\theta) = \prod_{i=1}^{n}\left(f(y_i)\right)^{\tau_i}\left(SF(y_i)\right)^{1-\tau_i},$$
where $f(y_i)$ and $SF(y_i)$ are the LOEPIV PDF in Equation (22) and the corresponding SF of $Y_i$, respectively, evaluated at $\mu_i = \beta^T x_i$. The log-likelihood $\ell$ for $\theta$ reduces to
$$\ell = r\log(\lambda) + r\log(\alpha) - r\log(\sigma) + r\lambda + \sum_{i=1}^{n}\tau_i\left[z_i + (\alpha-1)\log(1 + \exp(z_i)) - \lambda(1 + \exp(z_i))^{\alpha}\right] + \sum_{i=1}^{n}(1-\tau_i)\log\left(\exp(\lambda)\exp\left[-\lambda(1 + \exp(z_i))^{\alpha}\right]\right), \tag{32}$$
where $\sum_{i=1}^{n}\tau_i = r$ is the number of uncensored observations and $z_i = (y_i - \beta^T x_i)/\sigma$. The ML estimate of the parameter vector $\theta$ can be obtained using an optimization algorithm that maximizes Equation (32).
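The censored log-likelihood in Equation (32) is straightforward to code observation by observation: an uncensored point contributes the log-density and a censored point contributes the log-survival. The Python sketch below is an illustration on hypothetical toy data (an intercept plus one covariate); the data, parameter values, and function name are assumptions, and the maximization step is left to whichever optimizer is available.

```python
import math

def censored_loglik(y, tau, X, lam, alpha, sigma, beta):
    """Equation (32): tau[i] = 1 for an observed log-lifetime, 0 for a censored one."""
    total = 0.0
    for yi, ti, xi in zip(y, tau, X):
        z = (yi - sum(b * v for b, v in zip(beta, xi))) / sigma
        w = 1.0 + math.exp(z)
        log_sf = lam - lam * w ** alpha   # log SF(y_i | x_i)
        if ti == 1:
            # log f = log(lam * alpha / sigma) + z + (alpha - 1) * log(w) + log SF
            total += math.log(lam * alpha / sigma) + z + (alpha - 1.0) * math.log(w) + log_sf
        else:
            total += log_sf
    return total

# Hypothetical toy data: each row of X is (1, covariate)
y = [0.2, 1.1, -0.4, 0.9]
tau = [1, 0, 1, 1]
X = [(1.0, 0.1), (1.0, 0.7), (1.0, 0.3), (1.0, 0.9)]
value = censored_loglik(y, tau, X, lam=0.3, alpha=0.36, sigma=0.6, beta=(0.6, 1.0))
print(value)
```

Writing the censored contribution as `lam - lam * w ** alpha` avoids forming the product of two exponentials before taking the logarithm, which is more stable for large `z`.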

6.1.2. Jackknife Method

The jackknife technique was developed by Quenouille (1949) to estimate the bias of an estimator. It is an alternative method to estimate the LOEPIV parameters based on "leaving one out".
Suppose that $\hat{\theta}$ is the parameter estimate from the whole sample and $\hat{\theta}_i$ is the parameter estimate when the $i$-th observation is dropped from the data. The pseudo-value of the $i$-th observation is obtained as
$$\tilde{\theta}_i = n\hat{\theta} - (n-1)\hat{\theta}_i.$$
Then, the jackknife estimate of $\theta$, denoted $\hat{\theta}^*$, is the mean of the pseudo-values,
$$\hat{\theta}^* = \frac{1}{n}\sum_{i=1}^{n}\tilde{\theta}_i.$$
For more details, see [42,43,44].
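The pseudo-value construction is generic and can be sketched independently of the LOEPIV model. In the Python example below, the estimator is a stand-in chosen for illustration (the plug-in variance with divisor $n$), not the LOEPIV maximum likelihood estimator; for this stand-in the jackknife is known to return the unbiased sample variance, which gives a simple correctness check. The data are hypothetical.

```python
import statistics

def jackknife(data, estimator):
    """Return the jackknife estimate (mean of pseudo-values) and the pseudo-values."""
    n = len(data)
    full = estimator(data)
    pseudo = []
    for i in range(n):
        leave_one_out = estimator(data[:i] + data[i + 1:])
        pseudo.append(n * full - (n - 1) * leave_one_out)
    return sum(pseudo) / n, pseudo

def plug_in_variance(values):
    """Biased variance estimator with divisor n (illustrative stand-in estimator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]  # hypothetical observations
jack_estimate, pseudo_values = jackknife(data, plug_in_variance)
print(jack_estimate, statistics.variance(data))
```

For a regression model, `estimator` would refit the model on each leave-one-out data set, so the cost is $n$ additional fits.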

6.2. Sensitivity Analysis: Global Influence

Global influence, introduced by [45], is used to conduct a sensitivity analysis based on case deletion. Case deletion measures the impact on the parameter estimates of dropping the $i$-th observation from the data set. That is, this method compares $\hat{\theta}$ with $\hat{\theta}_i$, where $\hat{\theta}_i$ is the vector of parameter estimates when the $i$-th observation is dropped from the data. If $\hat{\theta}_i$ is distant from $\hat{\theta}$, then this case is considered influential. The case deletion model for the LOEPIV regression Model (31) is
$$Y_J = \beta^T x_J + \sigma Z_J; \quad J = 1, 2, \ldots, n,\ J \ne i.$$
We denote the ML estimate of $\theta$ when the $i$-th observation is dropped by $\hat{\theta}_i = (\hat{\lambda}_{(i)}, \hat{\alpha}_{(i)}, \hat{\sigma}_{(i)}, \hat{\beta}_{(i)})^T$. Then, we describe two methods of global influence below.

6.2.1. Generalized Cook Distance

Generalized Cook distance (GD) is the first measure of global influence and is defined as
$$GD_i(\theta) = (\hat{\theta}_i - \hat{\theta})^T\{\ddot{M}(\hat{\theta})\}(\hat{\theta}_i - \hat{\theta}),$$
where $\ddot{M}(\hat{\theta})$ denotes the observed information matrix.

6.2.2. Likelihood Distance

Likelihood distance (LD) measures the difference between $\hat{\theta}$ and $\hat{\theta}_i$, and is given by
$$LD_i(\theta) = 2\{\ell(\hat{\theta}) - \ell(\hat{\theta}_i)\},$$
where $\ell(\hat{\theta}_i)$ is the log-likelihood function of $\theta$ when the $i$-th observation is dropped from the data.

6.3. Residual Analysis

In the regression model, checking the assumptions and appropriateness of the fitted model is an essential step. Therefore, we used residual analysis to check the assumptions and detect outlier observations. In this study, we consider the following types.

6.3.1. Martingale Residual

Barlow and Prentice [46] proposed the martingale residual as
$$r_{M_i} = \delta_i + \log(SF(y_i; \hat{\theta})),$$
where $\delta_i$ denotes the censoring indicator, with $\delta_i = 0$ if the $i$-th observation is censored and $\delta_i = 1$ if it is not censored, and $SF(y_i; \hat{\theta})$ denotes the SF of the regression model. Therefore, the martingale residual of the LOEPIV regression model is
$$r_{M_i} = \begin{cases} 1 + \log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right] & \text{if } i \text{ is an observed lifetime}, \\ \log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right] & \text{if } i \text{ is censored}, \end{cases}$$
where $r_{M_i}$ ranges between $-\infty$ and 1 and is skewed. A transformation of $r_{M_i}$ is therefore used to reduce the skewness.

6.3.2. Deviance Residual

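The deviance residual is a further improvement of the martingale residual that reduces the skewness and makes the residual more symmetric around zero. It can be expressed as
$$r_{D_i} = \mathrm{sign}(r_{M_i})\left\{-2\left[r_{M_i} + \delta_i\log(\delta_i - r_{M_i})\right]\right\}^{\frac{1}{2}},$$
where $r_{M_i}$ is the martingale residual defined above, and the deviance residual for the LOEPIV regression model is
$$r_{D_i} = \begin{cases} \mathrm{sign}\left(1 + \log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right]\right)\left\{-2\left\{1 + \log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right] + \log\left(-\log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right]\right)\right\}\right\}^{\frac{1}{2}} & \text{if } i \text{ is an observed lifetime}, \\ \mathrm{sign}\left(\log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right]\right)\left\{-2\log\left[\exp(\lambda)\exp\left(-\lambda(1 + \exp(\hat{z}_i))^{\alpha}\right)\right]\right\}^{\frac{1}{2}} & \text{if } i \text{ is censored}. \end{cases}$$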
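Both residuals depend on the fitted model only through $\log SF(\hat{z}_i) = \lambda - \lambda(1 + e^{\hat{z}_i})^{\alpha}$, so they can be computed in a few lines. The Python sketch below follows the general definitions above; the function names and the numerical inputs (which reuse the parameter values of the later simulation study) are assumptions for illustration.

```python
import math

def log_sf(z, lam, alpha):
    """log SF(z) = lam - lam * (1 + exp(z))**alpha for the standardized LOEPIV error."""
    return lam - lam * (1.0 + math.exp(z)) ** alpha

def martingale_residual(z_hat, delta, lam, alpha):
    """delta = 1 for an observed lifetime, 0 for a censored observation."""
    return delta + log_sf(z_hat, lam, alpha)

def deviance_residual(z_hat, delta, lam, alpha):
    """sign(r_M) * sqrt(-2 * [r_M + delta * log(delta - r_M)])."""
    r_m = martingale_residual(z_hat, delta, lam, alpha)
    inner = r_m + (math.log(delta - r_m) if delta == 1 else 0.0)
    # inner is never positive in exact arithmetic; max() guards against rounding
    return math.copysign(math.sqrt(max(0.0, -2.0 * inner)), r_m)

lam, alpha = 0.3, 0.36  # assumed fitted values
for z_hat, delta in [(0.0, 1), (0.0, 0), (2.0, 1), (-1.5, 0)]:
    print(z_hat, delta,
          martingale_residual(z_hat, delta, lam, alpha),
          deviance_residual(z_hat, delta, lam, alpha))
```

For a censored observation the deviance residual reduces to $-\sqrt{-2\log SF}$, so it is never positive, whereas uncensored observations can fall on either side of zero.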

7. Simulation Study for the Log Odds Exponential-Pareto IV Regression Model

We performed a Monte Carlo simulation to explore the empirical distributions of $r_{M_i}$ and $r_{D_i}$ for different values of $n$ and different censoring levels. The lifetimes $t_1, \ldots, t_n$ were generated from the OEPIV distribution in Equation (4), and $x_i$ was generated from uniform $(0, 1)$. We sampled the censoring times $c_1, \ldots, c_n$ from uniform $(0, \rho)$, where $\rho$ was adjusted until we obtained the required censoring level. For each fit, the log lifetimes were obtained as $y_i = \min\{\log(t_i), \log(c_i)\}$. We generated 1000 samples for each combination of $n$, $\lambda$, $\alpha$, $\sigma$, $\beta_0$, $\beta_1$, and the censoring level. The simulation was conducted for $n = 30$, 50, and 100 with $\lambda = 0.3$, $\alpha = 0.36$, $\sigma = 0.6$, $\beta_0 = 0.6$, and $\beta_1 = 1$, and censoring levels of 0.1, 0.3, and 0.5. Figure 3 and Figure 4 present normal probability plots (NPP) for the residuals. These figures show that the empirical distribution of $r_{D_i}$ agreed more closely with the standard normal distribution (SND) than that of $r_{M_i}$. The distribution of $r_{D_i}$ also approached the SND as the sample size increased or the censoring level decreased.

8. Applications

We analyzed three real data sets to investigate the flexibility of the OEPIV distribution and the LOEPIV regression model.

8.1. The Strength of Glass Fibers Data

These data were analyzed by [47] and represent the strength of glass fibers of length 1.5 cm. The data set consists of 63 observations.
We compare the fit of the OEPIV distribution with those of the Pareto IV, Weibull Burr XII (WBXII) in [48], Weibull Fréchet (WFr) in [49], Weibull Lomax (WL) in [50], odd exponential-Weibull (OE-W), odd exponential-normal (OE-N) in [51], and gamma distributions.
We considered the following criteria to compare these distributions: the value of the negative log-likelihood function ($-\hat{\ell}$), the Akaike information criterion (AIC), and the corrected Akaike information criterion (CAIC). The smaller the values of these statistics, the better the fit to the data.
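The article does not spell out the formulas behind these criteria. The Python sketch below assumes the standard definitions: $AIC = -2\hat{\ell} + 2k$, the small-sample corrected form $CAIC = AIC + 2k(k+1)/(n-k-1)$, and, for the regression comparison later on, $BIC = -2\hat{\ell} + k\log n$, where $k$ is the number of estimated parameters and $n$ the sample size. The log-likelihood value used in the example is hypothetical and is not taken from Table 3.

```python
import math

def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k

def caic(loglik, k, n):
    """Corrected AIC (assumed small-sample correction of the AIC)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical maximized log-likelihood for a 4-parameter model fitted to 63 observations
loglik_hat, k, n = -15.0, 4, 63
print(aic(loglik_hat, k), caic(loglik_hat, k, n), bic(loglik_hat, k, n))
```

The correction term shrinks toward zero as $n$ grows relative to $k$, so AIC and CAIC rank models almost identically for large samples.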
The ML estimates, standard errors (SE), $-\hat{\ell}$, AIC, and CAIC statistics for the OEPIV, WBXII, WL, WFr, Pareto IV, OE-W, OE-N, and gamma distributions are reported in Table 3. The results in Table 3 show that the OEPIV distribution provides a better fit to the data, having the lowest AIC and CAIC values, and could be selected as a more appropriate model than the other models. Figure 5 displays the QQ-plot of the OEPIV distribution and the estimated PDFs of the fitted distributions. These plots show that the OEPIV captures the skewness of the glass fibers data better than the other competing fitted distributions.

8.2. Sum of Skin Folds Data

The authors of [52] discussed this data set, which covers 102 male and 100 female athletes and was collected at the Australian Institute of Sports, provided by Richard Telford and Ross Cunningham.
We compare the ML estimates and their corresponding SE, the values of $-\hat{\ell}$, and the AIC and CAIC statistics for the fitted OEPIV distribution with the results for the Kumaraswamy Pareto-IV (KwPIV) in [53], gamma-Pareto IV (GPIV) [10], Pareto IV (PIV) in [53], and exponentiated Pareto (EP) distributions provided in [54], and the Weibull distribution. These results are reported in Table 4. The results in Table 4 show that the OEPIV distribution provides the lowest AIC and CAIC values among the fitted distributions. Therefore, OEPIV could be selected as the best model for these data. Figure 6 displays the QQ-plot of the OEPIV distribution and the estimated PDFs of the fitted distributions. These plots show that the OEPIV provides a good fit to the data.

8.3. Stanford Heart Transplant Data

These data were obtained from Kalbfleisch and Prentice [55] and contain information on $n = 103$ patients. Each patient's survival time was specified as the number of days from acceptance into a heart transplant program to death. The following variables are associated with each patient: $y_i$, the log survival time (days); $status_i$, the censoring indicator (1 = dead, 0 = censored); $x_{i1}$, the age (in years); $x_{i2}$, prior surgery (0 = No, 1 = Yes); and $x_{i3}$, transplant (0 = No, 1 = Yes). This data set was used by [38], [35], and [36] to illustrate the log-odd log-logistic Weibull (LOLLW), log-Fréchet (LF), and log-exponentiated Fréchet (LEF) regression models, respectively. The LOEPIV regression model is compared with the log-Weibull (LW), LEP, LOLLW, LF, and LEF regression models.
That is, we present the results from fitting the following model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \sigma z_i,$$
where $y_i$ follows the LOEPIV distribution in Equation (22).
To examine the suitability of the proposed model, the empirical SF estimated by the Kaplan–Meier (KM) method and the SF of the fitted OEPIV model are displayed in Figure 7. Based on this plot, we concluded that the logarithm of the times to event follows the LOEPIV distribution.

8.3.1. ML and Jackknife Estimation

The estimates, their corresponding SE, p-values, AIC, CAIC, and Bayesian Information Criterion (BIC) statistics for the LOEPIV, LEF, LOLLW, LF, LW, and LEP regression models are shown in Table 5. The results demonstrated that the LOEPIV regression model had the lowest AIC, CAIC, and BIC, which favors the LOEPIV model over the other models. The LR test can be used to discriminate between the LOEPIV and LEP regression models since they are nested. The LR statistic for testing the hypotheses $H_0: \alpha = 1$ versus $H_1: H_0$ is not true, given in Table 6, rejects the LEP model in favor of the LOEPIV model.
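The LR test for $H_0: \alpha = 1$ compares the maximized log-likelihoods of the nested LEP and LOEPIV fits. A minimal Python sketch is given below, assuming the usual asymptotic chi-square reference distribution with one degree of freedom (one restricted parameter); for one degree of freedom the upper-tail probability equals $\mathrm{erfc}(\sqrt{w/2})$. The two log-likelihood values are hypothetical and are not the ones behind Table 6.

```python
import math

def lr_test_1df(loglik_full, loglik_null):
    """LR statistic w = 2 * (l_full - l_null) and its chi-square(1) upper-tail p-value."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximized log-likelihoods: LOEPIV (full) and LEP (alpha fixed at 1)
stat, p_value = lr_test_1df(loglik_full=-170.0, loglik_null=-175.0)
print(stat, p_value)
```

A small p-value leads to rejecting the restricted LEP model, which is the conclusion the article reports.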
Table 7 lists the jackknife parameter estimates of the LOEPIV model, their corresponding SE, and 95% confidence intervals. Based on the results in Table 5 and Table 7, we observed that the explanatory variables $x_1$, $x_2$, and $x_3$ are significant for the fitted model and that both methods gave similar estimates.
The plots of the SF corresponding to the explanatory variables for the fitted LOEPIV regression model are presented in Figure 8. From Figure 8a, we observed that $\hat{S}(1 \mid age = 8) = 0.96808$, which means that about 97% of the patients who are 8 years old are still alive at $y = 1$ (≈3 days). However, for patients aged 44 and 64 years, $\hat{S}(1 \mid age = 44) = 0.34676$ and $\hat{S}(1 \mid age = 64) = 0.00064$, which indicates that the percentage of patients alive at $y = 1$ decreased to about 34.7% and 0.06%, respectively. These results indicate that the survival of the patients decreased as their age increased. Similarly, Figure 8b,c indicates that approximately 58% of the patients who did not have surgery or receive a transplant were alive at $y = 3$ (≈21 days). Furthermore, approximately 98% of the patients who underwent surgery were alive at $y = 3$, while for the patients who received a transplant, $\hat{S}(3 \mid transplant = 1) = 0.9943$, so the survival percentage at $y = 3$ increased to 99%. Therefore, it can be stated that receiving a heart transplant increased the survival time when undergoing surgery.

8.3.2. Global Influence Analysis

The case deletion measures $GD_i(\theta)$ and $LD_i(\theta)$ were numerically computed, and Figure 9 shows the index plots of these influence measures. It is clear that case 99 could be an influential observation in the LOEPIV regression model.

8.3.3. Residual Analysis

To detect possible outlying observations, a plot of $r_{D_i}$ versus the observation index is shown in Figure 10a. It shows that almost all of the observations fall within (−3, 3), except for observation 8. Therefore, observation 8 was a possible outlier. Figure 10b shows the NPP for the deviance residuals with a generated envelope. Nearly all of the observations fell inside the envelope, which indicated that the proposed model was appropriate for the heart transplant data.

9. Concluding Remarks

In this article, we introduced the odds exponential-Pareto IV distribution. We derived some of its statistical and mathematical properties. The model parameters were estimated using the ML method, and simulation studies were carried out to examine the performance of the ML estimators based on biases and mean squared errors. Moreover, a new log-location regression model for censored data based on the OEPIV distribution was introduced. The ML and jackknife estimation methods for right-censored data were used to estimate the unknown parameters of the new regression model. The model assumptions were checked using martingale and deviance residuals. Furthermore, generalized Cook and likelihood distance measures were defined to detect influential observations in the regression model. Finally, we analyzed three real data sets to examine the usefulness of the OEPIV distribution and the LOEPIV regression model. The results demonstrated that the OEPIV distribution outperformed other competing distributions in terms of goodness of fit. In addition, the LOEPIV regression model provided a good fit for the Stanford heart transplant data.

Author Contributions

Conceptualization, L.A.B. and H.S.K.; methodology, L.A.B. and H.S.K.; software, L.A.B. and K.M.A.-B.; validation, L.A.B., H.S.K. and K.M.A.-B.; formal analysis, K.M.A.-B.; investigation of inference, H.S.K. and K.M.A.-B.; writing–original draft preparation, K.M.A.-B.; writing–review and editing, L.A.B. and H.S.K.; visualization, L.A.B., H.S.K. and K.M.A.-B.; supervision, L.A.B. and H.S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the referees and the editor for carefully reading the paper and for their great help in improving the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Burroughs, S.M.; Tebbens, S.F. Upper-truncated power laws in natural systems. Pure Appl. Geophys. 2001, 158, 741–757.
  2. Schroeder, B.; Damouras, S.; Gill, P. Understanding latent sector errors and how to protect against them. ACM Trans. Storage (TOS) 2010, 6, 9.
  3. Brazauskas, V. Information matrix for Pareto (IV), Burr, and related distributions. Commun. Stat. Theory Methods 2003, 32, 315–325.
  4. Arnold, B. Pareto Distributions; International Co-operative Publishing House: Fairland, MD, USA, 1983.
  5. Pickands, J., III. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131.
  6. Akinsete, A.; Famoye, F.; Lee, C. The beta-Pareto distribution. Statistics 2008, 42, 547–563.
  7. Mahmoudi, E. The beta generalized Pareto distribution with application to lifetime data. Math. Comput. Simul. 2011, 81, 2414–2430.
  8. Alzaatreh, A.; Famoye, F.; Lee, C. Weibull-Pareto distribution and its applications. Commun. Stat. Theory Methods 2013, 42, 1673–1691.
  9. Alzaatreh, A.; Famoye, F.; Lee, C. Gamma-Pareto distribution and its applications. J. Mod. Appl. Stat. Methods 2012, 11, 7.
  10. Alzaatreh, A.; Ghosh, I. A study of the Gamma-Pareto (IV) distribution and its applications. Commun. Stat. Theory Methods 2016, 45, 636–654.
  11. Elbatal, I. The Kumaraswamy exponentiated Pareto distribution. Econ. Qual. Control 2013, 28, 1–8.
  12. Afify, A.Z.; Yousof, H.M.; Hamedani, G.; Aryal, G. The exponentiated Weibull-Pareto distribution with application. J. Stat. Theory Appl. 2016, 15, 328–346.
  13. Cordeiro, G.M.; de Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2011, 81, 883–898.
  14. Alexander, C.; Cordeiro, G.M.; Ortega, E.M.; Sarabia, J.M. Generalized beta-generated distributions. Comput. Stat. Data Anal. 2012, 56, 1880–1897.
  15. Bourguignon, M.; Silva, R.B.; Cordeiro, G.M. The Weibull-G family of probability distributions. J. Data Sci. 2014, 12, 53–68.
  16. Ristić, M.M.; Balakrishnan, N. The gamma-exponentiated exponential distribution. J. Stat. Comput. Simul. 2012, 82, 1191–1206.
  17. Afify, A.Z.; Yousof, H.M.; Butt, N.S.; Hamedani, G.G. The transmuted Weibull-Pareto distribution. Pakistan J. Stat. 2016, 32, 183–206.
  18. Ortega, E.M.; Lemonte, A.J.; Cordeiro, G.M.; Nilton da Cruz, J. The odd Birnbaum–Saunders regression model with applications to lifetime data. J. Stat. Theory Pract. 2016, 10, 780–804.
  19. Jamal, F.; Nasir, M.A.; Tahir, M.; Montazeri, N.H. The odd Burr-III family of distributions. J. Stat. Appl. Probab. 2017, 6, 105–122.
  20. Rosaiah, K.; Gadde, S.R.; Kalyani, K.; Charana Udaya Sivakumar, D. Odds Exponential Log Logistic Distribution: Properties and Estimation. J. Math. Stat. 2017, 13, 14–23.
  21. Yousof, H.M.; Altun, E.; Hamedani, G. A New Extension of Fréchet Distribution with Regression Models, Residual Analysis and Characterizations. J. Data Sci. 2018, 16, 743–769.
  22. Altun, E.; Yousof, H.M.; Hamedani, G. A New Log-location Regression Model with Influence Diagnostics and Residual Analysis. Facta Univ. Ser. Math. Informat. 2018, 33, 417–449.
  23. Aldahlan, M.; Afify, A.Z. The odd exponentiated half-logistic Burr XII distribution. Pak. J. Stat. Oper. Res. 2018, 14, 305–317.
  24. Cordeiro, G.M.; Afify, A.Z.; Ortega, E.M.; Suzuki, A.K.; Mead, M.E. The odd Lomax generator of distributions: Properties, estimation and applications. J. Comput. Appl. Math. 2019, 347, 222–237.
  25. Afify, A.; Alizadeh, M. The Odd Dagum Family of Distributions: Properties and Applications. J. Appl. Probab. Stat. 2020, 15, 45–72.
  26. Alizadeh, M.; Afify, A.Z.; Eliwa, M.; Ali, S. The odd log-logistic Lindley-G family of distributions: Properties, Bayesian and non-Bayesian estimation with applications. Comput. Stat. 2020, 35, 281–308.
  27. Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79.
  28. Lawless, J.F. Statistical Models and Methods for Lifetime Data; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 362.
  29. Carrasco, J.M.; Ortega, E.M.; Paula, G.A. Log-modified Weibull regression models with censored data: Sensitivity and residual analysis. Comput. Stat. Data Anal. 2008, 52, 4021–4039.
  30. Silva, G.O.; Ortega, E.M.; Cancho, V.G. Log-Weibull extended regression model: Estimation, sensitivity and residual analysis. Stat. Methodol. 2010, 7, 614–631.
  31. Hashimoto, E.M.; Ortega, E.M.; Cancho, V.G.; Cordeiro, G.M. The log-exponentiated Weibull regression model for interval-censored data. Comput. Stat. Data Anal. 2010, 54, 1017–1035.
  32. Hashimoto, E.M.; Ortega, E.M.; Cordeiro, G.M.; Barreto, M.L. The Log-Burr XII regression model for grouped survival data. J. Biopharm. Stat. 2012, 22, 141–159.
  33. Ortega, E.M.; Cordeiro, G.M.; Kattan, M.W. The log-beta Weibull regression model with application to predict recurrence of prostate cancer. Stat. Pap. 2013, 54, 113–132.
  34. Mahmoud, M.R.; EL-Sheikh, A.; Morad, N.A.; Ahmad, M.A. Log-beta log-logistic regression model. Int. J. Sci. Basic Appl. Res. (IJSBAR) 2015, 22, 389–405.
  35. Alamoudi, H.H.; Mousa, S.A.; Baharith, L.A. Estimation and application in log-Fréchet regression model using censored data. Int. J. Adv. Stat. Probab. 2017, 5, 23–31.
  36. Al-Amoudi, H.H.; Mousa, S.A.; Baharith, L.A. Log-Exponentiated Frechet regression model with censored data. Int. J. Adv. Appl. Sci. 2016, 3, 1–9.
  37. Hashimoto, E.M.; Ortega, E.M.; Cordeiro, G.M.; Hamedani, G. The Log-gamma-logistic Regression Model: Estimation, Sensibility and Residual Analysis. J. Stat. Theory Appl. 2017, 16, 547–564.
  38. Cruz, J.N.d.; Ortega, E.M.; Cordeiro, G.M. The log-odd log-logistic Weibull regression model: Modelling, estimation, influence diagnostics and residual analysis. J. Stat. Comput. Simul. 2016, 86, 1516–1538.
  39. Pescim, R.R.; Ortega, E.M.; Cordeiro, G.M.; Alizadeh, M. A new log-location regression model: Estimation, influence diagnostics and residual analysis. J. Appl. Stat. 2017, 44, 233–252.
  40. Ortega, E.M.; Cordeiro, G.M.; Hashimoto, E.M.; Cooray, K. A log-linear regression model for the odd Weibull distribution with censored data. J. Appl. Stat. 2014, 41, 1859–1880.
  41. Al-Kadim, K.A.; Boshi, M.A. Exponential Pareto Distribution. Math. Theory Model. 2013, 3, 135–146.
  42. Sahinler, S.; Topuz, D. Bootstrap and jackknife resampling algorithms for estimation of regression parameters. J. Appl. Quant. Methods 2007, 2, 188–199.
  43. Algamal, Z.Y.; Rasheed, K.B. Re-sampling in Linear Regression Model Using Jackknife and Bootstrap. Iraqi J. Stat. Sci. 2010, 18, 59–73.
  44. Abdi, H.; Williams, L.J. Jackknife. Encyclopedia of Research Design 2; Salkind, N.J., Ed.; Sage: Thousand Oaks, CA, USA, 2010.
  45. Cook, R.D. Detection of influential observation in linear regression. Technometrics 1977, 19, 15–18.
  46. Barlow, W.E.; Prentice, R.L. Residuals for relative risk regression. Biometrika 1988, 75, 65–74.
  47. Smith, R.L.; Naylor, J. A comparison of maximum likelihood and Bayesian estimators for the three-parameter Weibull distribution. J. R. Stat. Soc. Ser. C 1987, 36, 358–369.
  48. Afify, A.Z.; Cordeiro, G.M.; Ortega, E.M.; Yousof, H.M.; Butt, N.S. The four-parameter Burr XII distribution: Properties, regression model, and applications. Commun. Stat. Theory Methods 2018, 47, 2605–2624.
  49. Afify, A.Z.; Yousof, H.M.; Cordeiro, G.M.; Ortega, E.M.; Nofal, Z.M. The Weibull Fréchet distribution and its applications. J. Appl. Stat. 2016, 43, 2608–2626.
  50. Tahir, M.H.; Cordeiro, G.M.; Mansoor, M.; Zubair, M. The Weibull-Lomax distribution: Properties and applications. Hacet. J. Math. Stat. 2015, 44, 461–480.
  51. Tahir, M.H.; Cordeiro, G.M.; Alizadeh, M.; Mansoor, M.; Zubair, M.; Hamedani, G.G. The odd generalized exponential family of distributions with applications. J. Stat. Distrib. Appl. 2015, 2, 1.
  52. Weisberg, S. Applied Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 528.
  53. Tahir, M.; Cordeiro, G.M.; Mansoor, M. The Kumaraswamy Pareto IV Distribution. Austrian J. Stat. 2015. Available online: https://www.academia.edu/12965162/The_Kumaraswamy_Pareto_IV_distribution (accessed on 25 April 2020).
  54. Gupta, R.C.; Gupta, P.L.; Gupta, R.D. Modeling failure time data by Lehman alternatives. Commun. Stat. Theory Methods 1998, 27, 887–904.
  55. Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 360.
Figure 1. Density function plots of the OEPIV distribution.
Figure 2. Hazard function plots of the OEPIV distribution.
Figure 3. Normal probability plots (NPP) for $r_{M_i}$ for different sample sizes (n) and censoring levels (c). (a) n = 30; c = 0.1 (b) n = 30; c = 0.3 (c) n = 30; c = 0.5 (d) n = 50; c = 0.1 (e) n = 50; c = 0.3 (f) n = 50; c = 0.5 (g) n = 100; c = 0.1 (h) n = 100; c = 0.3 (i) n = 100; c = 0.5.
Figure 4. NPP for $r_{D_i}$ for different sample sizes (n) and censoring levels (c). (a) n = 30; c = 0.1 (b) n = 30; c = 0.3 (c) n = 30; c = 0.5 (d) n = 50; c = 0.1 (e) n = 50; c = 0.3 (f) n = 50; c = 0.5 (g) n = 100; c = 0.1 (h) n = 100; c = 0.3 (i) n = 100; c = 0.5.
Figure 5. QQ-plot of the OEPIV model and the estimated PDFs of the OEPIV and other competitive distributions for the glass fibers data.
Figure 6. QQ-plot of the OEPIV distribution and the estimated PDFs of the OEPIV and other competitive distributions for the skin folds data.
Figure 7. Estimated SF based on the OEPIV distribution and the Kaplan–Meier (KM) model for the heart transplant data.
Figure 8. Fitted SF from the LOEPIV regression model (a) for $x_1$ = age, (b) for $x_2$ = surgery, (c) for $x_3$ = transplant.
Figure 9. The index plot of (a) $GD_i(\theta)$ and (b) $LD_i(\theta)$ for the LOEPIV regression model.
Figure 10. The index plot of (a) the deviance residual and (b) the NPP for the deviance residual with envelopes.
Table 1. Mean, variance, skewness, and kurtosis of the OEPIV model for selected parameter values.

| λ | a | θ | α | Mean | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| 2 | 2.5 | 0.5 | 1.5 | 1.1281 | 23.5677 | 0.5281 | 0.0424 |
| 2 | 3.5 | 0.5 | 1.5 | 4.3493 | 192.0261 | 0.1223 | 0.0665 |
| 2 | 4.5 | 0.5 | 1.5 | 24.8511 | 488.3011 | 13.3934 | 7.6745 |
| 2 | 2.5 | 2.5 | 1.5 | 5.6405 | 589.1917 | 0.5281 | 0.0424 |
| 2 | 2.5 | 3.5 | 1.5 | 7.8967 | 1154.8158 | 0.5281 | 0.0424 |
| 2 | 2.5 | 0.5 | 1.5 | 1.1281 | 23.5677 | 0.5281 | 0.0424 |
| 0.5 | 2.5 | 1.5 | 1.5 | 1.1153 | 5.3631 | 0.7241 | 0.4752 |
| 0.5 | 2.5 | 1.5 | 2.5 | 0.9486 | 9.6007 | 0.0298 | 0.0131 |
| 0.5 | 2.5 | 1.5 | 4.5 | 0.8567 | 13.2012 | 0.0302 | 0.0084 |
| 1.5 | 3.5 | 0.5 | 1.5 | 3.0317 | 47.0037 | 0.5800 | 0.3771 |
| 2.5 | 3.5 | 0.5 | 1.5 | 7.7388 | 568.5549 | 0.1148 | 0.0424 |
| 3.5 | 3.5 | 0.5 | 1.5 | 42.8019 | 1795.2542 | 4.7337 | 2.5407 |

Table 2. Parameter estimates, along with their MSE and bias, for two different cases with different sample sizes.

| n | Parameter | Set I Estimate | Set I MSE | Set I Bias | Set II Estimate | Set II MSE | Set II Bias |
|---|---|---|---|---|---|---|---|
| 30 | λ | 0.7646 | 34.3149 | 0.4646 | 0.4444 | 1.1410 | 0.2444 |
| 30 | a | 0.1806 | 0.1159 | −0.2194 | 0.0347 | 0.0086 | −0.0653 |
| 30 | θ | 1.0773 | 1009.5916 | 0.5773 | 0.6595 | 0.0570 | 0.0595 |
| 30 | α | 0.0778 | 0.0364 | −0.1222 | 0.0440 | 0.0374 | −0.1060 |
| 50 | λ | 0.5774 | 1.1837 | 0.2774 | 0.3526 | 0.4563 | 0.1526 |
| 50 | a | 0.2333 | 0.0893 | −0.1667 | 0.0495 | 0.0074 | −0.0505 |
| 50 | θ | 0.6825 | 0.7605 | 0.1825 | 0.6366 | 0.0235 | 0.0366 |
| 50 | α | 0.1008 | 0.0228 | −0.0992 | 0.0631 | 0.0161 | −0.0869 |
| 100 | λ | 0.4324 | 0.3672 | 0.1324 | 0.2628 | 0.0909 | 0.0628 |
| 100 | a | 0.3072 | 0.0540 | −0.0928 | 0.0683 | 0.0051 | −0.0317 |
| 100 | θ | 0.6042 | 0.2970 | 0.1042 | 0.6147 | 0.0132 | 0.0147 |
| 100 | α | 0.1430 | 0.0128 | −0.0570 | 0.0953 | 0.0105 | −0.0547 |
| 200 | λ | 0.3535 | 0.0982 | 0.0535 | 0.2243 | 0.0221 | 0.0243 |
| 200 | a | 0.3532 | 0.0256 | −0.0468 | 0.0830 | 0.0028 | −0.0170 |
| 200 | θ | 0.5463 | 0.1018 | 0.0463 | 0.6054 | 0.0064 | 0.0054 |
| 200 | α | 0.1718 | 0.0057 | −0.0282 | 0.1211 | 0.0058 | −0.0289 |
| 500 | λ | 0.3156 | 0.0140 | 0.0156 | 0.2069 | 0.0038 | 0.0069 |
| 500 | a | 0.3847 | 0.0082 | −0.0153 | 0.0942 | 0.0010 | −0.0058 |
| 500 | θ | 0.5149 | 0.0211 | 0.0149 | 0.6015 | 0.0020 | 0.0015 |
| 500 | α | 0.1911 | 0.0017 | −0.0089 | 0.1403 | 0.0020 | −0.0097 |

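
The bias and MSE columns of Table 2 follow the usual Monte Carlo definitions, which can be sketched as below. The function name and the toy replicate values are hypothetical. The true value 0.3 used in the final check is inferred from the table itself (estimate minus bias for λ in Set I), not taken from a stated simulation design.

```python
def bias_and_mse(estimates, true_value):
    """Summarise Monte Carlo replicates of one parameter estimate.

    Returns (mean estimate, bias, mean squared error), where
    bias = mean estimate - true value and
    MSE  = average squared distance of the replicates from the true value.
    """
    n_rep = len(estimates)
    mean_est = sum(estimates) / n_rep
    bias = mean_est - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    return mean_est, bias, mse


# Toy replicates, for illustration only (not the article's simulation output).
mean_est, bias, mse = bias_and_mse([0.2, 0.4, 0.6], true_value=0.3)

# Consistency check against Table 2 (Set I, n = 30, parameter lambda):
# the reported bias 0.4646 equals the reported estimate 0.7646 minus 0.3.
reported_gap = 0.7646 - 0.3
```

Because the MSE combines squared bias and variance, a very large MSE alongside a moderate bias (as for θ at n = 30 in Set I) points to a few extreme replicates rather than to a systematic shift.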
Table 3. Maximum likelihood (ML) estimates, SE in (), negative maximized log-likelihood $-\hat{\ell}$, and Akaike information criterion (AIC) and corrected Akaike information criterion (CAIC) statistics for the glass fibers data.

| Distribution | ML estimate (SE) | ML estimate (SE) | ML estimate (SE) | ML estimate (SE) | $-\hat{\ell}$ | AIC | CAIC |
|---|---|---|---|---|---|---|---|
| OEPIV | λ = 0.0401 (0.0810) | a = 0.2862 (0.1368) | θ = 1.1455 (0.4016) | α = 2.1549 (1.4014) | 13.9507 | 35.902 | 36.591 |
| WBXII | a = 0.0026 (0.0032) | b = 1.8888 (0.7680) | α = 1.6077 (0.3760) | β = 2.7409 (1.0100) | 14.3035 | 36.607 | 37.297 |
| WL | a = 581.4052 (28.2900) | b = 5.1752 (0.2010) | α = 17.5336 (102.1130) | β = 110.7104 (659.3920) | 14.934 | 37.868 | 38.558 |
| WFr | a = 1.4762 (4.7820) | b = 16.8561 (20.4850) | α = 0.3865 (0.7990) | β = 0.2436 (0.2850) | 15.5005 | 39.001 | 39.691 |
| Pareto IV | a = 0.1626 (0.0187) | θ = 2.3513 (0.4477) | α = 10.2153 (9.9080) | - | 15.4781 | 36.956 | 37.363 |
| OE-W | λ = 0.0721 (0.0162) | β = 1.9603 (0.0940) | - | - | 16.4613 | 36.922 | 37.123 |
| OE-N | λ = 0.0121 (0.0043) | σ = 0.7385 (0.0364) | - | - | 17.5979 | 39.195 | 39.396 |
| Gamma | β = 17.4411 (3.0783) | θ = 11.5748 (2.0725) | - | - | 23.9515 | 51.9031 | 52.1031 |

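
The information criteria in Table 3 can be reproduced from the negative log-likelihood values with the short sketch below. It assumes the usual definitions AIC = 2k − 2ℓ and CAIC = AIC + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters. The sample size n = 63 for the glass fibers data is an assumption (it is not stated in this part of the article, but it reproduces the reported CAIC values). The function names are hypothetical.

```python
def aic(neg_loglik, k):
    """Akaike information criterion from the negative maximized log-likelihood."""
    return 2.0 * neg_loglik + 2.0 * k


def caic(neg_loglik, k, n):
    """Corrected AIC: AIC plus a small-sample penalty term."""
    return aic(neg_loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


N_GLASS = 63  # assumed sample size of the glass fibers data

# OEPIV row of Table 3: four parameters, negative log-likelihood 13.9507.
oepiv_aic = aic(13.9507, k=4)
oepiv_caic = caic(13.9507, k=4, n=N_GLASS)

# Gamma row of Table 3: two parameters, negative log-likelihood 23.9515.
gamma_aic = aic(23.9515, k=2)
gamma_caic = caic(23.9515, k=2, n=N_GLASS)
```

Under these assumptions the computed values agree with the tabulated AIC and CAIC to about three decimal places, which also shows why models with fewer parameters (for example OE-W) can have a competitive AIC despite a larger negative log-likelihood.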
Table 4. ML estimates, SE in (), negative maximized log-likelihood $-\hat{\ell}$, and AIC and CAIC statistics for the skin folds data.

| Distribution | ML estimate (SE) | ML estimate (SE) | ML estimate (SE) | ML estimate (SE) | ML estimate (SE) | $-\hat{\ell}$ | AIC | CAIC |
|---|---|---|---|---|---|---|---|---|
| OEPIV | λ = 0.348 (0.090) | a = 0.024 (0.006) | θ = 29.579 (0.678) | α = 0.036 (0.010) | - | 944.2687 | 1896.537 | 1896.740 |
| KwPIV | a = 2.928 (1.188) | b = 21.746 (33.283) | α = 0.023 (0.019) | γ = 0.060 (0.033) | θ = 23.430 (4.633) | 945.200 | 1900.401 | 1900.707 |
| GPIV | c = 0.520 (0.198) | α = 81.355 (8.071) | σ = 0.098 (0.035) | - | - | 950.007 | 1906.014 | 1906.135 |
| PIV | α = 0.463 (0.183) | γ = 0.182 (0.041) | θ = 46.812 (5.595) | - | - | 956.333 | 1918.666 | 1918.787 |
| EP | c = 28 | α = 2.155 (0.154) | θ = 2.737 (0.298) | - | - | 951.878 | 1907.757 | 1907.878 |
| Weibull | α = 2.2635 (0.1159) | θ = 78.2664 (2.5832) | - | - | - | 975.2427 | 1954.485 | 1954.545 |

Table 5. The ML estimates, SE in (), p-values in [], AIC, CAIC, and Bayesian Information Criterion (BIC) statistics of the log odds exponential-Pareto IV (LOEPIV), log-exponentiated Fréchet (LEF), log-odd log-logistic Weibull (LOLLW), log-Fréchet (LF), log-Weibull (LW), and log exponential-Pareto (LEP) regression models for the heart transplant data.

| Model | Row | λ | α | σ | $\beta_0$ | $\beta_1$ | $\beta_2$ | $\beta_3$ | AIC | CAIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LOEPIV | Estimate | 1.3754 | 0.1257 | 0.5569 | 3.5186 | −0.0539 | 1.7494 | 2.5405 | 343.42 | 344.61 | 361.87 |
| | SE | (1.9087) | (0.0974) | (0.1689) | (1.0747) | (0.0192) | (0.5524) | (0.3621) | | | |
| | p-value | - | - | - | [0.00106] | [0.00507] | [0.00154] | [<0.001] | | | |
| LEF | Estimate | - | 6.2746 | 3.5882 | 8.6744 | −0.0624 | 0.8910 | 2.7241 | 346.72 | 347.59 | 362.53 |
| | SE | - | (7.5737) | (1.4492) | (3.5491) | (0.0206) | (0.5059) | (0.3780) | | | |
| | p-value | - | - | - | [0.016] | [0.002] | [0.078] | [<0.001] | | | |
| LOLLW | Estimate | - | 4.62831 | 6.20325 | 8.74485 | −0.07692 | 1.40550 | 2.59196 | 347.59 | 348.47 | 363.40 |
| | SE | - | (3.5307) | (4.6851) | (1.7603) | (0.0199) | (0.5745) | (0.3884) | | | |
| | p-value | - | - | - | [<0.001] | [<0.001] | [0.016] | [<0.001] | | | |
| LF | Estimate | - | - | 1.7457 | 4.2129 | −0.0431 | 0.6902 | 2.6572 | 349.15 | 349.77 | 362.33 |
| | SE | - | - | (0.1484) | (0.9153) | (0.0189) | (0.5034) | (0.3782) | | | |
| | p-value | - | - | - | [<0.001] | [0.023] | [0.170] | [<0.001] | | | |
| LW | Estimate | - | - | 1.4658 | 7.9742 | −0.0924 | 1.2143 | 2.5375 | 353.42 | 354.03 | 366.59 |
| | SE | - | - | (0.13148) | (0.93397) | (0.02061) | (0.64700) | (0.37336) | | | |
| | p-value | - | - | - | [<0.001] | [<0.001] | [0.063] | [<0.001] | | | |
| LEP | Estimate | 0.1439 | - | 1.4655 | 5.1321 | −0.0923 | 1.214127 | 2.537713 | 355.42 | 356.29 | 371.22 |
| | SE | (1.1088) | - | (0.1314) | (11.3276) | (0.0206) | (0.6469) | (0.3733) | | | |
| | p-value | - | - | - | [0.6505] | [<0.001] | [0.061] | [<0.001] | | | |

Table 6. LR statistic for the heart transplant data.

| Heart Transplant | Hypotheses | Statistic w | p-Value |
|---|---|---|---|
| LOEPIV vs. LEP | $H_0: \alpha = 1$ versus $H_1: H_0$ is not true | 13.9922 | 0.00018 |

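
The p-value in Table 6 can be checked with the standard library alone, as sketched below. The sketch assumes that the LR statistic w is referred to a chi-square distribution with one degree of freedom (one restricted parameter, α = 1); for one degree of freedom the upper tail probability reduces to erfc(√(w/2)). The function name is hypothetical.

```python
import math


def chi2_upper_tail_1df(w):
    """P(X > w) for X ~ chi-square with 1 degree of freedom.

    Uses the identity P(Z^2 > w) = erfc(sqrt(w / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(w / 2.0))


w = 13.9922                      # LR statistic reported in Table 6
p_value = chi2_upper_tail_1df(w)

# Reject H0 (alpha = 1) at the 5% level when the p-value is below 0.05.
reject_h0 = p_value < 0.05
```

Under this assumption the computed p-value is about 0.00018, in line with the table, so the LEP sub-model is rejected in favour of the LOEPIV model.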
Table 7. The jackknife parameter estimates of the LOEPIV regression model.

| Parameter | Estimate | SE | 95% Confidence Interval |
|---|---|---|---|
| λ | 1.4043 | 1.5262 | (0.0000, 4.3957) |
| α | 0.0838 | 0.0988 | (0.0000, 0.2775) |
| σ | 0.6586 | 0.1885 | (0.2891, 1.0281) |
| $\beta_0$ | 3.8616 | 1.1072 | (1.6915, 6.031) |
| $\beta_1$ | −0.0536 | 0.0196 | (−0.0921, −0.0152) |
| $\beta_2$ | 1.7304 | 0.5262 | (0.6989, 2.7619) |
| $\beta_3$ | 2.5563 | 0.3881 | (1.7955, 3.3172) |

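
As an illustration of how jackknife standard errors and intervals such as those in Table 7 can be obtained, the sketch below implements a generic delete-one jackknife and a normal-approximation interval. This is a minimal sketch under stated assumptions, not the authors' code: the estimator here is a simple sample mean standing in for the LOEPIV fit, the interval is taken as estimate ± 1.96 × SE with the lower limit truncated at zero for positive parameters (a rule that reproduces the tabulated limits), and all function names are hypothetical.

```python
import math


def sample_mean(values):
    """Stand-in estimator; the article refits the regression model instead."""
    return sum(values) / len(values)


def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of an estimator."""
    n = len(data)
    # Re-estimate with each observation left out in turn.
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))


def normal_ci(estimate, se, z=1.96, lower_bound=None):
    """Normal-approximation interval, optionally truncated from below."""
    lo, hi = estimate - z * se, estimate + z * se
    if lower_bound is not None:
        lo = max(lo, lower_bound)
    return lo, hi


se_toy = jackknife_se([1.0, 2.0, 3.0, 4.0, 5.0], sample_mean)

# Reproduce two rows of Table 7 from the reported estimates and SEs.
ci_sigma = normal_ci(0.6586, 0.1885)
ci_lambda = normal_ci(1.4043, 1.5262, lower_bound=0.0)
```

For the sample mean, the jackknife SE coincides with the familiar s/√n, which gives a quick sanity check of the implementation before it is applied to a more expensive estimator.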
