Article

An Extension of the Akash Distribution: Properties, Inference and Application

by Yolanda M. Gómez 1, Luis Firinguetti-Limone 1,*, Diego I. Gallardo 1 and Héctor W. Gómez 2
1 Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4081112, Chile
2 Departamento de Estadística y Ciencias de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(1), 31; https://doi.org/10.3390/math12010031
Submission received: 16 October 2023 / Revised: 9 November 2023 / Accepted: 10 November 2023 / Published: 22 December 2023
(This article belongs to the Section Probability and Statistics)

Abstract

In this article we introduce an extension of the Akash distribution. We use the slash methodology to make the kurtosis of the Akash distribution more flexible. We study the general probability density function of this new model, some properties, moments, skewness and kurtosis coefficients. Statistical inference is performed using the methods of moments and maximum likelihood via the EM algorithm. A simulation study is carried out to observe the behavior of the maximum likelihood estimator. An application to a real data set with high kurtosis is considered, where it is shown that the new distribution fits better than other extensions of the Akash distribution.

1. Introduction

The slash distribution is an extension of the normal distribution. It is defined as the ratio of two independent random variables: one following a normal distribution and the other a power of a uniform distribution. Thus, we say that a random variable $S$ has a slash distribution if
$$S = \frac{U_1}{U_2}, \tag{1}$$
where $U_1 \sim N(0,1)$, $U_2 \sim \text{Beta}(q,1)$, $U_1$ is independent of $U_2$ and $q > 0$; this representation can be found in Johnson et al. [1]. The slash distribution has heavier tails than the normal distribution, that is, a higher kurtosis. Its properties are studied in detail by Rogers and Tukey [2] and Mosteller and Tukey [3]. Kafadar [4] discusses maximum likelihood estimation of the location and scale parameters. Wang and Genton [5] present a multivariate version of the slash distribution as well as a multivariate skew version. Gómez and Venegas [6] extend the slash distribution further by incorporating the multivariate elliptical distributions. This methodology for increasing the weight of the tails has also been used in distributions with positive support. To name a few, we mention the works of Olmos et al. [7] for the half-normal model and Rivera et al. [8] for the Rayleigh model. Rivera et al. [8] propose the scale mixture of Rayleigh (SMR) model. We say that $Y \sim SMR(\theta, q)$, with $\theta > 0$ and $q > 0$, if the probability density function (pdf) of $Y$ is
$$f_Y(y;\theta,q) = \frac{q\,y}{2\theta}\left(\frac{y^2}{2\theta}+1\right)^{-\left(\frac{q}{2}+1\right)}, \quad y > 0. \tag{2}$$
The gamma distribution is also needed in the development of this paper. Its pdf is given by
$$g(t;a,b) = \frac{b^a}{\Gamma(a)}\,t^{a-1}e^{-bt}, \tag{3}$$
where $a, b, t > 0$. Its cumulative distribution function (cdf) is denoted by
$$G(z;a,b) = \int_0^z g(t;a,b)\,dt. \tag{4}$$
Shanker [9] introduced the Akash distribution and applied it to real lifetime data sets from medical science and engineering. We say that a random variable (r.v.) $Y$ has an Akash (AK) distribution with shape parameter $\theta$ if its pdf is
$$f_Y(y;\theta) = \frac{\theta^3}{\theta^2+2}\,(1+y^2)\exp(-\theta y), \tag{5}$$
where $\theta, y > 0$, and we denote it by $Y \sim AK(\theta)$. The parameter $\theta$ is a shape parameter; if we add a scale parameter, the pdf is given by
$$f_Y(y;\sigma,\theta) = \frac{\theta^3}{\sigma(\theta^2+2)}\left(1+\frac{y^2}{\sigma^2}\right)\exp\left(-\frac{\theta y}{\sigma}\right), \tag{6}$$
where $\sigma > 0$ is a scale parameter and $\theta > 0$ is a shape parameter. We denote it by $Y \sim AK(\sigma,\theta)$.
Extensions of the AK distribution have been proposed by Shanker and Shukla [10,11], among others. Both extensions add one parameter, and we will compare them with the new distribution. The two-parameter Akash distribution (TPAD) introduced by Shanker and Shukla [10] has the following pdf:
$$f_Y(y;\theta,\alpha) = \frac{\theta^3}{\alpha\theta^2+2}\,(\alpha+y^2)\exp(-\theta y), \tag{7}$$
where $\theta, \alpha, y > 0$, and we denote it by $Y \sim TPAD(\theta,\alpha)$.
The power Akash distribution (PAD), introduced by Shanker and Shukla [11], has the following pdf:
$$f_Y(y;\theta,\alpha) = \frac{\alpha\,\theta^3}{\theta^2+2}\,\left(1+y^{2\alpha}\right)y^{\alpha-1}\exp\left(-\theta y^{\alpha}\right), \tag{8}$$
where $\theta, \alpha, y > 0$, and we denote it by $Y \sim PAD(\theta,\alpha)$.
The main motivation of this work is to introduce an extended version of the AK distribution given in Equation (6), using the slash methodology, in order to obtain a new distribution with greater kurtosis that can accommodate outliers. Pronounced fluctuations are very frequent in data sets from disciplines as diverse as economics, actuarial science, and the environmental and earth sciences. Heavy-tailed models are therefore necessary for adequate modelling in the presence of extreme values. The normal distribution, for example, does not perform well on data sets with extreme observations. When the r.v. involved has a high kurtosis, the probability of a rare event can be severely underestimated if a model without heavy tails is used; a heavy-tailed model avoids this problem. In economics, practical examples of rare events are pandemics and the 2008–2009 financial crisis, to name a few. In geology, a rare event might be a mega earthquake or the sudden eruption of a volcano that has been dormant for centuries.
The paper is structured as follows: in Section 2 we deliver our proposal and present its properties. In Section 3, we perform inference using the method of moments and maximum likelihood via the EM algorithm and a simulation study is also carried out. In Section 4, we apply the distribution to a real data set and compare it with other extensions of the AK distribution. Finally, in Section 5, we provide the main conclusions.

2. New Density and Its Properties

In this section, we introduce the representation, density and properties of the new distribution.

2.1. Representation

The representation of this new distribution is given by
$$X = \frac{Y}{Z}, \tag{9}$$
where $Y \sim AK(\theta)$, $Z \sim \text{Beta}(q,1)$, and $Y$ and $Z$ are independent r.v.’s with $\theta, q > 0$. We name the distribution of $X$ slash AK (SAK) and denote it by $X \sim SAK(\theta,q)$.
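The representation above translates directly into a sampler. The following is a minimal Python sketch, not the authors' code: it assumes that $AK(\theta)$ can be drawn as a mixture of an exponential and a gamma distribution with shape 3 (which follows from splitting the factor $1+y^2$ in the AK pdf), and the function names `sample_akash` and `sample_sak` are hypothetical.

```python
import random

def sample_akash(theta, rng):
    """Draw Y ~ AK(theta) as a two-component mixture (assumed decomposition)."""
    # AK(theta) pdf = w * Exp(rate=theta) + (1 - w) * Gamma(shape=3, rate=theta),
    # with mixing weight w = theta^2 / (theta^2 + 2).
    w = theta ** 2 / (theta ** 2 + 2.0)
    if rng.random() < w:
        return rng.expovariate(theta)
    return rng.gammavariate(3.0, 1.0 / theta)  # second argument is the scale

def sample_sak(theta, q, rng):
    """Draw X ~ SAK(theta, q) through X = Y / Z with Z ~ Beta(q, 1)."""
    y = sample_akash(theta, rng)
    u = 1.0 - rng.random()      # uniform on (0, 1], so z is never zero
    z = u ** (1.0 / q)          # U^(1/q) follows a Beta(q, 1) distribution
    return y / z

rng = random.Random(2024)
draws = [sample_sak(1.0, 1.0, rng) for _ in range(100_000)]
tail_5 = sum(x > 5.0 for x in draws) / len(draws)
print(round(tail_5, 3))
```

The empirical tail frequency `tail_5` can be compared with the value of $P(X > 5)$ reported for SAK(1,1) in Table 1.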

2.2. Density Function

The following proposition gives the pdf of the SAK distribution, obtained from the representation in Equation (9).
Proposition 1. 
Let $X \sim SAK(\theta,q)$. Then, the pdf of $X$ is given by
$$f_X(x;\theta,q) = \frac{q^2\,\Gamma(q)\,x^{-(q+1)}}{(\theta^2+2)\,\theta^q}\left[\theta^2 G(\theta x;q+1,1) + (q+1)(q+2)\,G(\theta x;q+3,1)\right], \tag{10}$$
where $\theta, q, x > 0$ and $G$ is the cdf of the gamma distribution given in Equation (4).
Proof. 
Using the stochastic representation given in Equation (9) and the Jacobian method, we obtain
$$\left.\begin{array}{l} X = YZ^{-1} \\ V = Z \end{array}\right\} \Rightarrow \left.\begin{array}{l} Y = XV \\ Z = V \end{array}\right\} \Rightarrow J = \begin{vmatrix} \frac{\partial Y}{\partial X} & \frac{\partial Y}{\partial V} \\ \frac{\partial Z}{\partial X} & \frac{\partial Z}{\partial V} \end{vmatrix} = \begin{vmatrix} v & x \\ 0 & 1 \end{vmatrix} = v.$$
$$\begin{aligned} f_{X,V}(x,v) &= |J|\,f_{Y,Z}(xv,v), \\ f_{X,V}(x,v) &= v\,f_Y(xv)\,f_Z(v), \quad x > 0,\ 0 < v < 1, \\ f_{X,V}(x,v) &= \frac{\theta^3 q}{\theta^2+2}\,v^q\,(1+x^2v^2)\exp(-\theta x v), \quad x > 0,\ 0 < v < 1. \end{aligned}$$
Then, marginalizing with respect to $V$, we obtain the pdf of $X$:
$$f_X(x;\theta,q) = \frac{\theta^3 q}{\theta^2+2}\int_0^1 v^q\,(1+x^2v^2)\exp(-\theta x v)\,dv.$$
With the change of variable $t = \theta x v$ and using Equation (4), the result is obtained.    □
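As a numerical check of Proposition 1, the closed-form density can be compared with the integral over $v$ from the proof. The sketch below is illustrative only: `gamma_cdf` is a hypothetical helper that evaluates $G(x;a,1)$ through its power series (adequate for moderate arguments), and Simpson's rule with 2000 subintervals is an arbitrary choice.

```python
import math

def gamma_cdf(x, a):
    """G(x; a, 1): regularized lower incomplete gamma, via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def sak_pdf(x, theta, q):
    """Closed-form SAK density of Proposition 1."""
    t = theta * x
    bracket = (theta ** 2 * gamma_cdf(t, q + 1)
               + (q + 1) * (q + 2) * gamma_cdf(t, q + 3))
    log_const = (2 * math.log(q) + math.lgamma(q) - (q + 1) * math.log(x)
                 - math.log(theta ** 2 + 2) - q * math.log(theta))
    return math.exp(log_const) * bracket

def sak_pdf_integral(x, theta, q, n=2000):
    """Same density from the integral over v in (0, 1), by Simpson's rule."""
    def f(v):
        return v ** q * (1 + x ** 2 * v ** 2) * math.exp(-theta * x * v)
    h = 1.0 / n
    s = f(0.0) + f(1.0)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * h)
    return theta ** 3 * q / (theta ** 2 + 2) * s * h / 3

print(sak_pdf(1.5, 1.0, 2.0), sak_pdf_integral(1.5, 1.0, 2.0))
```

The two printed values should agree to many decimal places; for non-integer small $q$ the integrand is less smooth at $v = 0$ and the quadrature would need more nodes.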
Observation 1. 
As the parameter q decreases, Table 1 and Figure 1 illustrate that the weight of the right tail increases.
In particular, Table 1 compares $P(X > x)$ in the AK and SAK distributions for different values of $x$.

2.3. Properties

The following proposition gives the cdf in closed form. It depends on $G$, the cdf of the gamma distribution given in Equation (4).
Proposition 2. 
Let $X \sim SAK(\theta,q)$. Then, the cdf of $X$ is given by
$$F_X(x;\theta,q) = \frac{\left(\theta^2 + 2\,G(\theta x;3,1)\right)(\theta x)^q - \theta^2 q\,\Gamma(q)\,G(\theta x;q,1) - \Gamma(q+3)\,G(\theta x;q+3,1)}{(\theta^2+2)\,(\theta x)^q}, \tag{11}$$
where $\theta, q, x > 0$ and $G$ is given in Equation (4).
Proof. 
The result is obtained from a direct application of the definition of a cdf, $F_X(x) = P(Y \le xZ) = \int_0^1 F_Y(xz)\,q z^{q-1}\,dz$, where $F_Y$ is the cdf of the $AK(\theta)$ distribution.      □
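The closed-form cdf can be evaluated with a few lines of code and compared with the tail probabilities in Table 1. This is a minimal sketch under stated assumptions: `gamma_cdf` is a hypothetical series-based helper for $G(x;a,1)$, and the coefficient of $G(\theta x;q,1)$ is taken as $\theta^2 q\,\Gamma(q)$, the value for which $F_X(x) \to 0$ as $x \to 0$.

```python
import math

def gamma_cdf(x, a):
    """G(x; a, 1): regularized lower incomplete gamma, via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def sak_cdf(x, theta, q):
    """Closed-form cdf of the SAK(theta, q) distribution."""
    t = theta * x
    tq = t ** q
    num = ((theta ** 2 + 2 * gamma_cdf(t, 3)) * tq
           - theta ** 2 * q * math.gamma(q) * gamma_cdf(t, q)
           - math.gamma(q + 3) * gamma_cdf(t, q + 3))
    return num / ((theta ** 2 + 2) * tq)

def sak_sf(x, theta, q):
    """Reliability (survival) function r(x) = 1 - F(x)."""
    return 1.0 - sak_cdf(x, theta, q)

# Tail probabilities for a few of the settings listed in Table 1.
for theta, q, x in [(1.0, 1.0, 5.0), (1.0, 5.0, 10.0), (0.5, 1.0, 15.0)]:
    print(f"SAK({theta},{q}): P(X > {x}) = {sak_sf(x, theta, q):.3f}")
```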

2.3.1. Reliability Analysis

The reliability function $r(t) = 1 - F(t)$ and the hazard function $h(t) = f(t)/r(t)$ of the SAK distribution are provided in Corollary 1.
Corollary 1. 
The reliability and hazard functions of the $SAK(\theta,q)$ model are given by
1. $r(t) = 1 - \dfrac{\left(\theta^2 + 2\,G(\theta t;3,1)\right)(\theta t)^q - \theta^2 q\,\Gamma(q)\,G(\theta t;q,1) - \Gamma(q+3)\,G(\theta t;q+3,1)}{(\theta^2+2)\,(\theta t)^q}$,
2. $h(t) = \dfrac{q^2\,\Gamma(q)\left[\theta^2 G(\theta t;q+1,1) + (q+1)(q+2)\,G(\theta t;q+3,1)\right]}{t\left\{2\left(1 - G(\theta t;3,1)\right)(\theta t)^q + \theta^2 q\,\Gamma(q)\,G(\theta t;q,1) + \Gamma(q+3)\,G(\theta t;q+3,1)\right\}}$,
where $\theta, q, t > 0$.
In Figure 2, we present the hazard function of the SAK model for several values of $\theta$ and $q$.

2.3.2. Right Tail of the SAK Distribution

According to Rolski et al. [12], a distribution has a heavy right tail if
$$\limsup_{t \to \infty} \frac{-\log r(t)}{t} = 0.$$
The following result shows that the SAK distribution is heavy-tailed.
Proposition 3. 
The r.v. $T \sim SAK(\theta,q)$ is heavy-tailed.
Proof. 
Applying L’Hôpital’s rule twice, we have
$$\limsup_{t \to \infty} \frac{-\log r(t)}{t} = \limsup_{t \to \infty} \frac{f_T(t;\theta,q)}{1 - F_T(t;\theta,q)} = \limsup_{t \to \infty}\left[\frac{q+1}{t} - \frac{\theta^3 g(\theta t;q+1,1) + (q+1)(q+2)\,\theta\,g(\theta t;q+3,1)}{\theta^2 G(\theta t;q+1,1) + (q+1)(q+2)\,G(\theta t;q+3,1)}\right] = 0.$$
   □
The following proposition shows that the SAK distribution can be represented as a scale mixture of the AK and Beta distributions.
Proposition 4. 
If $X \mid Z = z \sim AK(z^{-1},\theta)$ and $Z \sim \text{Beta}(q,1)$, then $X \sim SAK(\theta,q)$.
Proof. 
The joint density of $X$ and $Z$ is given by
$$f_{X,Z}(x,z) = \frac{q\,z^q\,\theta^3}{\theta^2+2}\,(1+z^2x^2)\exp(-\theta z x), \quad x > 0,\ z \in (0,1).$$
The marginal distribution of $X$ is obtained as
$$\begin{aligned} f_X(x) &= \int_0^1 f_{X,Z}(x,z)\,dz = \frac{q\,\theta^3}{\theta^2+2}\left[\int_0^1 z^q \exp(-\theta z x)\,dz + x^2\int_0^1 z^{q+2}\exp(-\theta z x)\,dz\right] \\ &= \frac{q\,\theta^3}{\theta^2+2}\left[\frac{\Gamma(q+1)}{(\theta x)^{q+1}}\int_0^{\theta x}\frac{1}{\Gamma(q+1)}\,w^{(q+1)-1}\exp(-w)\,dw + \frac{x^2\,\Gamma(q+3)}{(\theta x)^{q+3}}\int_0^{\theta x}\frac{1}{\Gamma(q+3)}\,w^{(q+3)-1}\exp(-w)\,dw\right] \\ &= \frac{q\,\Gamma(q+1)\,x^{-(q+1)}}{(\theta^2+2)\,\theta^q}\left[\theta^2 G(\theta x;q+1,1) + (q+1)(q+2)\,G(\theta x;q+3,1)\right]. \end{aligned}$$
   □
The following proposition shows that the AK model is a limiting case of the SAK distribution as $q \to \infty$.
Proposition 5. 
Let $X \sim SAK(\theta,q)$ and $Y \sim AK(\theta)$. If $q \to \infty$, then $X$ converges in law to $Y$.
Proof. 
Using the representation $X = Y/Z$, we analyze the convergence of this quotient, where $Y \sim AK(\theta)$ and $Z \sim \text{Beta}(q,1)$. For the $\text{Beta}(q,1)$ distribution we have $E[Z] = \frac{q}{1+q}$ and $Var[Z] = \frac{q}{(q+2)(q+1)^2}$. Then, applying Chebyshev’s inequality to $Z$, we have for all $\epsilon > 0$
$$P\left(|Z - E[Z]| > \epsilon\right) \le \frac{Var(Z)}{\epsilon^2} = \frac{q}{(q+2)(q+1)^2\,\epsilon^2}. \tag{12}$$
If $q \to \infty$, then the right-hand side of Equation (12) tends to zero, i.e., $W = Z - E[Z]$ converges in probability to 0. Also, $E[Z] = \frac{q}{1+q} \to 1$ as $q \to \infty$, so that
$$Z = W + E[Z] \xrightarrow{P} 1, \quad q \to \infty.$$
As $Y \sim AK(\theta)$, applying Slutsky’s lemma to $X = Y/Z$ gives
$$X \xrightarrow{L} Y \sim AK(\theta), \quad q \to \infty.$$
Thus, as $q$ grows, $X$ converges in law to an $AK(\theta)$ distribution.    □

2.3.3. Moments

In this subsection we obtain the moments of the SAK distribution. To this end, we first introduce the following lemma.
Lemma 1. 
Let $Y \sim AK(\sigma,\theta)$ with $\sigma, \theta > 0$. For $r \in \mathbb{N}$, $E[Y^r]$ exists and is given by
$$E[Y^r] = \frac{\sigma^r\left[r!\,\theta^2 + (r+2)!\right]}{\theta^r(\theta^2+2)}. \tag{13}$$
Proof. 
The $r$-th moment of the r.v. $V \sim AK(\theta)$ is given by Shanker [9] as $E(V^r) = \frac{r!\,\theta^2 + (r+2)!}{\theta^r(\theta^2+2)}$. Computing the $r$-th moment of $Y = \sigma V$, where $\sigma$ is a scale parameter, gives the result.    □
Proposition 6 presents the moments of the SAK distribution.
Proposition 6. 
Let $X \sim SAK(\theta,q)$ with $\theta, q > 0$. For $r \in \mathbb{N}$, $E[X^r]$ is given by
$$\mu_r = E[X^r] = \frac{q\left[r!\,\theta^2 + (r+2)!\right]}{\theta^r(\theta^2+2)(q-r)}, \quad \text{provided that } q > r. \tag{14}$$
Proof. 
Using the representation given in Proposition 4 and Lemma 1, we obtain
$$\mu_r = E[X^r] = E\left[E\left(X^r \mid Z\right)\right] = E\left[Z^{-r}\,\frac{r!\,\theta^2 + (r+2)!}{\theta^r(\theta^2+2)}\right] = \frac{r!\,\theta^2 + (r+2)!}{\theta^r(\theta^2+2)}\,E\left[Z^{-r}\right] = \frac{r!\,\theta^2 + (r+2)!}{\theta^r(\theta^2+2)}\int_0^1 q\,z^{q-r-1}\,dz.$$
Solving the integral gives the result.    □
From Proposition 6, we can obtain expressions for the non-central moments, $\mu_r = E[X^r]$, and the variance, $Var(X)$, of $X \sim SAK(\theta,q)$, which are presented in Corollary 2.
Corollary 2. 
Let $X \sim SAK(\theta,q)$ with $\theta, q > 0$. The non-central moments and the variance of $X$ are
$$\mu_1 = \frac{q\,\kappa_6}{\theta\,\kappa_2\,(q-1)},\ q > 1, \quad \mu_2 = \frac{2q\,\kappa_{12}}{\theta^2\kappa_2\,(q-2)},\ q > 2, \quad \mu_3 = \frac{6q\,\kappa_{20}}{\theta^3\kappa_2\,(q-3)},\ q > 3, \quad \mu_4 = \frac{24q\,\kappa_{30}}{\theta^4\kappa_2\,(q-4)},\ q > 4,$$
$$Var(X) = \frac{q\left[2\kappa_{12}\kappa_2(q-1)^2 - q\,\kappa_6^2\,(q-2)\right]}{\theta^2\kappa_2^2\,(q-1)^2(q-2)}, \quad q > 2,$$
where $\kappa_i = \theta^2 + i$.
Remark 1. 
Note that when $q \to \infty$, $Var(X) \to \frac{\theta^4 + 16\theta^2 + 12}{\theta^2(\theta^2+2)^2}$, which is the variance of an $AK(\theta)$ distribution.
The next corollary presents the skewness coefficient, $\beta_1$, of a $SAK(\theta,q)$ model.
Corollary 3. 
Let $X \sim SAK(\theta,q)$, with $\theta > 0$ and $q > 3$. Then the skewness coefficient of $X$ is
$$\beta_1 = \frac{2\sqrt{q-2}\left[3\kappa_{20}\kappa_2^2(q-1)^3(q-2) - 3q\,\kappa_2\kappa_6\kappa_{12}(q-1)^2(q-3) + q^2\kappa_6^3(q-2)(q-3)\right]}{\sqrt{q}\,(q-3)\left[2\kappa_2\kappa_{12}(q-1)^2 - q(q-2)\kappa_6^2\right]^{3/2}}.$$
Proof. 
Recall that
$$\beta_1 = \frac{E\left[(X - E(X))^3\right]}{(Var(X))^{3/2}} = \frac{\mu_3 - 3\mu_1\mu_2 + 2\mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}},$$
where $\mu_1$, $\mu_2$ and $\mu_3$ were given in Corollary 2.    □
The kurtosis coefficient, $\beta_2$, of a $SAK(\theta,q)$ distribution is given in the following corollary.
Corollary 4. 
Let $X \sim SAK(\theta,q)$ with $\theta > 0$ and $q > 4$. The kurtosis coefficient of $X$ is
$$\beta_2 = \frac{3(q-2)\left[8\kappa_2^3\kappa_{30}\,q_1 - 8q\,\kappa_6\kappa_{20}\kappa_2^2\,q_2 + 4q^2\kappa_6^2\kappa_{12}\kappa_2\,q_3 - q^3\kappa_6^4\,q_4\right]}{q(q-3)(q-4)\left[2\kappa_{12}\kappa_2(q-1)^2 - q\,\kappa_6^2\,(q-2)\right]^2},$$
where $q_1 = (q-1)^4(q-2)(q-3)$, $q_2 = (q-1)^3(q-2)(q-4)$, $q_3 = (q-1)^2(q-3)(q-4)$ and $q_4 = (q-2)(q-3)(q-4)$.
Proof. 
Recall that
$$\beta_2 = \frac{E\left[(X - E(X))^4\right]}{(Var(X))^2} = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{(\mu_2 - \mu_1^2)^2},$$
where $\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$ were given in Corollary 2.    □
Remark 2. 
Note that the skewness coefficient of the $SAK(\theta,q)$ model can be written as
$$\begin{aligned} \beta_1 &= \frac{2(q-1)^3(q-2)^{3/2}}{\sqrt{q}\,(q-1)^3(q-3)} \times \frac{3\kappa_2^2\kappa_{20} - 3\kappa_2\kappa_6\kappa_{12}\,\frac{q(q-3)}{(q-1)(q-2)} + \kappa_6^3\,\frac{q^2(q-3)}{(q-1)^3}}{\left[2\kappa_2\kappa_{12} - \kappa_6^2\,\frac{q(q-2)}{(q-1)^2}\right]^{3/2}} \\ &= 2\left[\frac{(q-2)^3}{q(q-3)^2}\right]^{1/2} \times \frac{3\kappa_2^2\kappa_{20} - 3\kappa_2\kappa_6\kappa_{12}\,\frac{q(q-3)}{(q-1)(q-2)} + \kappa_6^3\,\frac{q^2(q-3)}{(q-1)^3}}{\left[2\kappa_2\kappa_{12} - \kappa_6^2\,\frac{q(q-2)}{(q-1)^2}\right]^{3/2}}. \end{aligned}$$
From here, it is straightforward to check that
$$\lim_{q \to \infty}\beta_1 = \frac{2\left[3\kappa_2^2\kappa_{20} - 3\kappa_2\kappa_6\kappa_{12} + \kappa_6^3\right]}{\left[2\kappa_2\kappa_{12} - \kappa_6^2\right]^{3/2}} = \frac{2\left(\theta^6 + 30\theta^4 + 36\theta^2 + 24\right)}{\left(\theta^4 + 16\theta^2 + 12\right)^{3/2}}.$$
On the other hand, the kurtosis coefficient of the $SAK(\theta,q)$ model can be written as
$$\begin{aligned} \beta_2 &= \frac{3(q-1)^4(q-2)^2(q-3)}{q(q-1)^4(q-3)(q-4)} \times \frac{8\kappa_2^3\kappa_{30} - 8\kappa_2^2\kappa_6\kappa_{20}\,\frac{q(q-4)}{(q-1)(q-3)} + 4\kappa_2\kappa_6^2\kappa_{12}\,\frac{q^2(q-4)}{(q-1)^2(q-2)} - \kappa_6^4\,\frac{q^3(q-4)}{(q-1)^4}}{\left[2\kappa_{12}\kappa_2 - \kappa_6^2\,\frac{q(q-2)}{(q-1)^2}\right]^2} \\ &= \frac{3(q-2)^2}{q(q-4)} \times \frac{8\kappa_2^3\kappa_{30} - 8\kappa_2^2\kappa_6\kappa_{20}\,\frac{q(q-4)}{(q-1)(q-3)} + 4\kappa_2\kappa_6^2\kappa_{12}\,\frac{q^2(q-4)}{(q-1)^2(q-2)} - \kappa_6^4\,\frac{q^3(q-4)}{(q-1)^4}}{\left[2\kappa_2\kappa_{12} - \kappa_6^2\,\frac{q(q-2)}{(q-1)^2}\right]^2}. \end{aligned}$$
Therefore, it is simple to check that
$$\lim_{q \to \infty}\beta_2 = \frac{3\left[8\kappa_2^3\kappa_{30} - 8\kappa_2^2\kappa_6\kappa_{20} + 4\kappa_2\kappa_6^2\kappa_{12} - \kappa_6^4\right]}{\left[2\kappa_2\kappa_{12} - \kappa_6^2\right]^2} = \frac{3\left(3\theta^8 + 128\theta^6 + 408\theta^4 + 576\theta^2 + 240\right)}{\left(\theta^4 + 16\theta^2 + 12\right)^2}.$$
Note that the skewness and kurtosis coefficients of the $SAK(\theta,q)$ model coincide with those of the $AK(\theta)$ model as $q \to \infty$ (see Shanker [9]).
Table 2 shows that the skewness and kurtosis coefficients depend on both $\theta$ and $q$. As $q$ decreases, the skewness and kurtosis coefficients increase. Conversely, as $q$ increases, they approach those of the $AK(\theta)$ distribution (Proposition 5).
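The coefficients in Table 2 can be reproduced from the raw moments of Proposition 6 without expanding the closed forms of Corollaries 3 and 4. The following Python sketch is an illustration only; the function names are hypothetical, and it uses the standard expressions of the third and fourth central moments in terms of $\mu_1, \ldots, \mu_4$.

```python
import math

def sak_raw_moment(r, theta, q):
    """r-th raw moment of SAK(theta, q); requires q > r."""
    if q <= r:
        raise ValueError("the r-th moment requires q > r")
    num = q * (math.factorial(r) * theta ** 2 + math.factorial(r + 2))
    return num / (theta ** r * (theta ** 2 + 2) * (q - r))

def sak_skewness_kurtosis(theta, q):
    """Skewness and kurtosis coefficients from the first four raw moments."""
    m1, m2, m3, m4 = (sak_raw_moment(r, theta, q) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return skew, kurt

for theta in (0.5, 1.0):
    skew, kurt = sak_skewness_kurtosis(theta, 5.0)
    print(theta, round(skew, 3), round(kurt, 3))
```

Working from the raw moments keeps the code short and avoids transcription errors in the long closed-form expressions; both routes give the same values whenever $q > 4$.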

3. Inference

In this section, we study parameter estimation by the method of moments and by maximum likelihood (ML) via the EM algorithm. We also carry out a simulation study to assess the performance of the ML estimators in finite samples.

3.1. Method of Moment Estimators

Let $X_1, \ldots, X_n$ be a random sample from $X \sim SAK(\theta,q)$. Let $\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i$ and $\overline{X^2} = \frac{1}{n}\sum_{i=1}^n X_i^2$ be the first two sample moments.
Proposition 7. 
Given a random sample $X_1, \ldots, X_n$ from $X \sim SAK(\theta,q)$ with $q > 2$, the method of moments estimators of $\theta$ and $q$ satisfy
$$\hat{q}_M = \frac{\overline{X}\,\hat{\theta}_M(\hat{\theta}_M^2+2)}{\hat{\theta}_M(\hat{\theta}_M^2+2)\,\overline{X} - \hat{\theta}_M^2 - 6}, \tag{15}$$
$$\overline{X^2}\,\hat{\theta}_M\left[2(\hat{\theta}_M^2+6) - \hat{\theta}_M\,\overline{X}\,(\hat{\theta}_M^2+2)\right] - 2\,\overline{X}\,(\hat{\theta}_M^2+12) = 0, \tag{16}$$
where Equation (16) must be solved numerically to obtain $\hat{\theta}_M$. Then $\hat{\theta}_M$ is substituted into Equation (15) to obtain $\hat{q}_M$.
Proof. 
The equations for the method of moments are given by
$$E[X] = \frac{q(\theta^2+6)}{\theta(\theta^2+2)(q-1)} = \overline{X}, \tag{17}$$
$$E[X^2] = \frac{2q(\theta^2+12)}{\theta^2(\theta^2+2)(q-2)} = \overline{X^2}. \tag{18}$$
Solving Equation (17) for the parameter $q$, we obtain Equation (15). Substituting $\hat{q}_M$ into Equation (18) then gives Equation (16).    □
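Equation (16) is a polynomial equation in $\hat{\theta}_M$ and may have more than one positive root, so a numerical solver has to pick the root for which Equation (15) yields an admissible $\hat{q}_M > 2$. The sketch below is one possible way to do this and is not the authors' procedure: the search interval, the grid size and the function names are assumptions made for illustration.

```python
def mom_equation(theta, m1, m2):
    """Left-hand side of Equation (16) for sample moments m1 and m2."""
    return (m2 * theta * (2 * (theta ** 2 + 6) - theta * m1 * (theta ** 2 + 2))
            - 2 * m1 * (theta ** 2 + 12))

def sak_moment_estimates(m1, m2, lo=1e-3, hi=50.0, cells=5000):
    """Scan (lo, hi) for roots of Equation (16); return the pair with q > 2."""
    step = (hi - lo) / cells
    for k in range(cells):
        a, b = lo + k * step, lo + (k + 1) * step
        fa, fb = mom_equation(a, m1, m2), mom_equation(b, m1, m2)
        if fa * fb > 0:
            continue                      # no sign change in this cell
        for _ in range(100):              # bisection inside the cell
            mid = 0.5 * (a + b)
            fm = mom_equation(mid, m1, m2)
            if fa * fm <= 0:
                b, fb = mid, fm
            else:
                a, fa = mid, fm
        theta = 0.5 * (a + b)
        scaled = m1 * theta * (theta ** 2 + 2)
        denom = scaled - theta ** 2 - 6
        if denom <= 0:
            continue                      # Equation (15) would give q <= 0
        q = scaled / denom                # Equation (15)
        if q > 2:
            return theta, q
    return None

# Sanity check on population moments of SAK(0.5, 5) instead of sample moments.
theta0, q0 = 0.5, 5.0
m1 = q0 * (theta0 ** 2 + 6) / (theta0 * (theta0 ** 2 + 2) * (q0 - 1))
m2 = 2 * q0 * (theta0 ** 2 + 12) / (theta0 ** 2 * (theta0 ** 2 + 2) * (q0 - 2))
print(sak_moment_estimates(m1, m2))
```

Feeding the exact moments of a known SAK distribution, as in the last lines, should return the parameters used to generate them; with real data, `m1` and `m2` would be the sample moments $\overline{X}$ and $\overline{X^2}$.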

3.2. ML Estimation

Let $X_1, \ldots, X_n$ be a random sample from $X \sim SAK(\theta,q)$. Then the log-likelihood function is
$$l(\theta,q) = c(\theta,q) - (q+1)\sum_{i=1}^n \log(x_i) + \sum_{i=1}^n \log\left[\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)\right],$$
where $c(\theta,q) = 2n\log(q) + n\log(\Gamma(q)) - n\log(\theta^2+2) - nq\log(\theta)$. Taking partial derivatives of $l(\theta,q)$ with respect to $\theta$ and $q$ and setting them equal to zero, we obtain
$$\begin{aligned} \sum_{i=1}^n \frac{2\theta\,G(\theta x_i;q+1,1) + \theta^2 J(x_i,q+1) + (q+1)(q+2)\,J(x_i,q+3)}{\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)} &= \frac{2n\theta}{\theta^2+2} + \frac{nq}{\theta}, \\ \sum_{i=1}^n \frac{\theta^2 H(x_i;q+1) + (2q+3)\,G(\theta x_i;q+3,1) + (q+1)(q+2)\,H(x_i;q+3)}{\theta^2 G(\theta x_i;q+1,1) + (q+1)(q+2)\,G(\theta x_i;q+3,1)} &= \sum_{i=1}^n \log(x_i) - \eta(\theta,q), \end{aligned}$$
where $J(x_i,m) = x_i\,g(\theta x_i;m,1)$, $H(x_i;v) = \int_0^{\theta x_i}\log(t)\,g(t;v,1)\,dt - \psi(v)\,G(\theta x_i;v,1)$, $\eta(\theta,q) = \frac{2n}{q} + n\left(\psi(q) - \log(\theta)\right)$ and $\psi(\cdot)$ denotes the digamma function. Solving this system of equations numerically to find the ML estimates may be difficult because of the functions involved. However, an EM algorithm (see Dempster et al. [13]) can be implemented to obtain the ML estimates. The following subsection is dedicated to this goal.

3.3. EM Algorithm

The SAK model also admits the following hierarchical stochastic representation:
$$X_i \mid U_i = u_i, Z_i = z_i \sim G(1+2u_i,\ \theta z_i), \quad U_i \sim \text{Bernoulli}\left(\frac{2}{\theta^2+2}\right), \quad Z_i \sim \text{Beta}(q,1),$$
where $G(a,b)$ here denotes the gamma distribution with shape $a$ and rate $b$, and $U_i$ and $Z_i$, $i = 1, \ldots, n$, are non-observable variables. This representation can be used for an alternative estimation procedure based on the EM algorithm (Dempster et al. [13]). In this context, the observed data are given by $D_o = x$, where $x = (x_1, \ldots, x_n)$. The vectors $z = (z_1, \ldots, z_n)$ and $u = (u_1, \ldots, u_n)$ are the latent variables and $D_c = (x, z, u)$ is the complete data. Note that the joint distribution of $(X_i, U_i, Z_i)$ is given by
$$\begin{aligned} f(x_i,u_i,z_i) &= f(x_i \mid u_i,z_i) \times f(u_i) \times f(z_i) \\ &= \frac{(\theta z_i)^{1+2u_i}}{\Gamma(1+2u_i)}\,x_i^{2u_i}e^{-\theta z_i x_i} \times \left(\frac{2}{\theta^2+2}\right)^{u_i}\left(\frac{\theta^2}{\theta^2+2}\right)^{1-u_i} \times q\,z_i^{q-1} \\ &= \frac{q\,\theta^3\,z_i^{2u_i+q}\,2^{u_i}}{(\theta^2+2)\,\Gamma(1+2u_i)}\,x_i^{2u_i}e^{-\theta z_i x_i}. \end{aligned}$$
Therefore, up to a constant that does not depend on the vector of parameters $\psi = (\theta,q)$, the complete log-likelihood function for the model is given by
$$\ell_c(\psi; D_c) = n\left[\log q + 3\log\theta - \log(\theta^2+2)\right] + \sum_{i=1}^n\left(q\log z_i - \theta x_i z_i\right).$$
With this, the expected value of $\ell_c(\psi; D_c)$, given the observed data, is
$$Q(\psi \mid \psi^{(k)}) = n\left[\log q + 3\log\theta - \log(\theta^2+2)\right] + \sum_{i=1}^n\left(q\,\hat{\kappa}_i^{(k)} - \theta x_i\,\hat{z}_i^{(k)}\right),$$
where $\hat{z}_i^{(k)} = E(Z_i \mid x_i, \psi = \hat{\psi}^{(k)})$ and $\hat{\kappa}_i^{(k)} = E(\log Z_i \mid x_i, \psi = \hat{\psi}^{(k)})$. Note that
$$f(z_i,u_i \mid x_i) \propto \underbrace{\frac{(\theta x_i)^{2u_i+q+1}}{\Gamma(2u_i+q+1)}\,\frac{z_i^{(2u_i+q+1)-1}e^{-\theta x_i z_i}}{G(1;2u_i+q+1,\theta x_i)}}_{Z_i \mid u_i,x_i\ \sim\ TG_{(0,1)}(2u_i+q+1,\ \theta x_i)} \times \underbrace{\frac{\Gamma(2u_i+q+1)}{\Gamma(2u_i+1)}\left(\frac{2}{\theta^2}\right)^{u_i}G(1;2u_i+q+1,\theta x_i)}_{U_i \mid x_i\ \sim\ \text{Bernoulli}(\nu_i)}, \tag{19}$$
where $\nu_i = \Gamma(q+3)\,G(\theta x_i;q+3)\,/\left[\theta^2\,\Gamma(q+1)\,G(\theta x_i;q+1) + \Gamma(q+3)\,G(\theta x_i;q+3)\right]$, $G(x;a) = \int_0^x \frac{1}{\Gamma(a)}\,t^{a-1}e^{-t}\,dt$ is the cdf of the gamma model with unit rate and $TG_{(0,1)}(a,b)$ denotes the gamma distribution with shape $a$ and rate $b$ truncated to the interval $(0,1)$. Therefore, using properties of conditional expectations, we have $E(Z_i \mid x_i) = E\left[E(Z_i \mid U_i,x_i) \mid x_i\right]$, and by (19) such expectations are simple to compute. In a similar manner, we can compute $E(\log Z_i \mid x_i)$, obtaining
$$E(Z_i \mid x_i) = \nu_i\,\frac{(q+3)\,G(\theta x_i;q+4)}{\theta x_i\,G(\theta x_i;q+3)} + (1-\nu_i)\,\frac{(q+1)\,G(\theta x_i;q+2)}{\theta x_i\,G(\theta x_i;q+1)}, \tag{20}$$
$$E(\log Z_i \mid x_i) = \frac{\nu_i}{\Gamma(q+3)\,G(1;q+3,\theta x_i)}\int_0^{\theta x_i}\log\left(\frac{w_i}{\theta x_i}\right)w_i^{q+2}e^{-w_i}\,dw_i + \frac{1-\nu_i}{\Gamma(q+1)\,G(1;q+1,\theta x_i)}\int_0^{\theta x_i}\log\left(\frac{w_i}{\theta x_i}\right)w_i^{q}e^{-w_i}\,dw_i. \tag{21}$$
Therefore, the $k$th iteration of the algorithm comprises the following steps:
  • E-step: given $\hat{\theta}^{(k-1)}$ and $\hat{q}^{(k-1)}$, for $i = 1, \ldots, n$ compute $\hat{z}_i^{(k)}$ and $\hat{\kappa}_i^{(k)}$ using Equations (20) and (21), respectively.
  • M1-step: update $\hat{q}^{(k)}$ as
    $$\hat{q}^{(k)} = -\frac{n}{\sum_{i=1}^n \hat{\kappa}_i^{(k)}}.$$
  • M2-step: update $\hat{\theta}^{(k)}$ as the solution of the non-linear equation
    $$\frac{3}{\theta} - \frac{2\theta}{\theta^2+2} = \frac{1}{n}\sum_{i=1}^n x_i\,\hat{z}_i^{(k)}.$$
The E, M1 and M2 steps are iterated until convergence, that is, until the difference between the estimates obtained in two consecutive iterations is smaller than a predetermined value. Note that the M1-step has a closed form, whereas the solution for $\theta$ can be obtained, for instance, with the uniroot function of R [14].
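The E, M1 and M2 steps above can be sketched in a few dozen lines. The following Python code is a minimal illustration under stated assumptions and is not the authors' R implementation: the conditional expectations are computed by direct Simpson quadrature of $\int_0^1 z^q(1+x^2z^2)e^{-\theta x z}\,dz$ (after the substitution $z = e^{-y}$) instead of Equations (20) and (21), the quadrature grid (`UPPER`, `NODES`) is an arbitrary choice, the M2-step uses bisection in place of `uniroot`, a fixed number of iterations replaces the convergence criterion, and all function names are hypothetical.

```python
import math
import random

# Fixed Simpson grid in y = -log(z); UPPER and NODES are illustrative choices.
UPPER, NODES = 40.0, 400
STEP = UPPER / NODES
GRID = []
for k in range(NODES + 1):
    weight = 1.0 if k in (0, NODES) else (4.0 if k % 2 else 2.0)
    GRID.append((k * STEP, math.exp(-k * STEP), weight * STEP / 3.0))

def e_step_terms(x, theta, q):
    """Return (density kernel, E[Z | x], E[log Z | x]) by quadrature over z."""
    b = theta * x
    den = num_z = num_log = 0.0
    for y, z, w in GRID:
        # z^q (1 + x^2 z^2) exp(-theta x z) dz, rewritten with z = exp(-y)
        f = w * math.exp(-(q + 1.0) * y - b * z) * (1.0 + (x * z) ** 2)
        den += f
        num_z += f * z
        num_log -= f * y              # log z = -y
    return den, num_z / den, num_log / den

def solve_theta(c):
    """M2-step: root of 3/theta - 2 theta/(theta^2 + 2) = c (left side decreases)."""
    lo, hi = 1e-10, 1e10
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if 3.0 / mid - 2.0 * mid / (mid * mid + 2.0) > c:
            lo = mid                  # left side still too large: root is larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

def em_sak(xs, theta, q, iters=12):
    """Run `iters` EM iterations and track the observed log-likelihood."""
    n = len(xs)
    loglik_path = []
    for _ in range(iters):
        sum_log_z = sum_xz = loglik = 0.0
        for x in xs:                                   # E-step
            den, z_hat, k_hat = e_step_terms(x, theta, q)
            sum_xz += x * z_hat
            sum_log_z += k_hat
            loglik += math.log(den)
        loglik += n * math.log(theta ** 3 * q / (theta ** 2 + 2.0))
        loglik_path.append(loglik)                     # value before the update
        q = -n / sum_log_z                             # M1-step (closed form)
        theta = solve_theta(sum_xz / n)                # M2-step
    return theta, q, loglik_path

def sample_sak(theta, q, rng):
    """Simulate X = Y / Z with Y ~ AK(theta) and Z ~ Beta(q, 1)."""
    if rng.random() < theta ** 2 / (theta ** 2 + 2.0):
        y = rng.expovariate(theta)
    else:
        y = rng.gammavariate(3.0, 1.0 / theta)
    return y / (1.0 - rng.random()) ** (1.0 / q)

rng = random.Random(7)
data = [sample_sak(1.0, 2.0, rng) for _ in range(100)]
theta_hat, q_hat, path = em_sak(data, theta=0.5, q=1.0)
print(theta_hat, q_hat, path[-1] - path[0])
```

A useful diagnostic for any EM implementation is that the observed log-likelihood stored in `path` should never decrease from one iteration to the next. In practice one would iterate until the parameter change falls below a tolerance and start from the values described in the simulation study rather than from the arbitrary starting point used here.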
In the following subsection we run some simulations to study the behavior of the ML estimators.

3.4. Simulation Study

In this subsection, we conduct a brief simulation study using R software 4.3.2 to evaluate the performance of the ML estimators obtained through the EM algorithm for the $SAK(\theta,q)$ model. To generate the data, we consider three values for $\theta$ (0.5, 3 and 10), three values for $q$ (0.5, 1 and 2) and five sample sizes (30, 50, 100, 200 and 500). For each combination of $\theta$, $q$ and $n$, we draw 1000 replicates and compute the ML estimates. The EM algorithm is started at $\hat{\theta}^{(0)}$, the estimate of $\theta$ obtained from the AK model (with scale fixed at 1), and $\hat{q}^{(0)} = 1$. In addition, for each replicate we estimate the standard errors from the observed information matrix. Table 3 reports the average estimated bias (Bias), the mean of the estimated standard errors (SE), the root of the empirical mean squared error (RMSE) and the empirical coverage probability (CP) of the 95% confidence intervals based on the asymptotic distribution of the ML estimators. Table 3 shows that the performance of the estimators improves as the sample size $n$ increases.

4. Application

In this section, we analyze a real data set and show, based on some model selection criteria, that the SAK distribution can be more appropriate than other commonly used distributions for modelling heavy right-tailed data. The data correspond to plasma beta-carotene levels (ng/mL) of 314 patients. This data set contains 14 variables and is available online at http://Lib.stat.cmu.edu/datasets/PlasmaRetinol (accessed on 31 October 2023). In this study, we consider the variable Betaplasma. The medical interest in this variable comes from the fact that low levels of plasma beta-carotene may be associated with a higher risk of developing certain types of cancer. In Table 4, we present some descriptive statistics, including the sample skewness, $b_1$, and the sample kurtosis, $b_2$. We observe a high kurtosis in this data set.
The moment estimates for the parameters of the SAK distribution are θ ^ M = 0.025 and q ^ M = 2.810 . These estimates are useful starting values, required to implement maximum likelihood estimation using numerical methods. The ML estimates for the parameters of the AK, TPAD, PAD, SMR, and SAK models are displayed in Table 5. For each distribution, we present the maximum of the log-likelihood function. It can be seen that the SAK model presented a larger value of log-likelihood than the other models.
In order to compare the fit of the distributions, we considered the usual Akaike information criterion (AIC), introduced by Akaike [15], and the Bayesian information criterion (BIC), proposed by Schwarz [16]. Recall that $AIC = 2k - 2\log lik$ and $BIC = k\log n - 2\log lik$, where $k$ is the number of parameters in the model, $n$ is the sample size and $\log lik$ is the maximized value of the log-likelihood function. Table 6 shows the AIC and BIC for each model, indicating that the SAK distribution leads to a better fit than the other distributions. Figure 3 presents the histogram for the data together with the fitted densities.
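The two criteria are simple functions of the maximized log-likelihood. The short sketch below only illustrates the formulas: the model labels and log-likelihood values are hypothetical placeholders (they are not the values of Table 5), while $n = 314$ is the sample size stated above.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 log-likelihood."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k log(n) - 2 log-likelihood."""
    return k * math.log(n) - 2 * loglik

# Hypothetical log-likelihood values, chosen only to exercise the functions.
fits = {"model_A": (-1950.0, 1), "model_B": (-1930.0, 2)}
n = 314
for name, (loglik, k) in fits.items():
    print(name, round(aic(loglik, k), 2), round(bic(loglik, k, n), 2))
```

Smaller values indicate a better trade-off between fit and complexity; BIC penalizes each extra parameter by $\log n$ instead of 2.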
In our analysis, we also calculated the quantile residuals (QR). If the model is suitable for the data, the QR should be approximately standard normal, as explained in Dunn and Smyth [17]. To check this assumption, conventional normality tests such as the Anderson-Darling (AD), Cramér-von Mises (CVM) and Shapiro-Wilk (SW) tests can be used. Figure 4 shows qqplots of the quantile residuals for the PAD, SMR and SAK distributions. The QR results for the AK and TPAD models are as unsatisfactory as those of the PAD distribution. Considering the outcomes of all three tests, the SAK model offers a better fit for this data set.

5. Discussion

This paper presents an extended version of the AK model based on the slash methodology. Some properties of this new distribution are derived. It is also compared with two other distributions using a real data set. Estimation is performed through ML via the EM algorithm. The new SAK distribution is an alternative to fit heavy-tailed right-skewed data. Additional features of the SAK distribution are:
  • The distribution has two stochastic representations, one of them based on the quotient of two independent r.v.’s and another based on a scale mixture between the AK and Beta distributions.
  • The pdf, cdf and hazard function of the SAK distribution are explicit and are represented by the cdf of the gamma model.
  • The proposed model has a heavy right tail.
  • The model contains the AK distribution as a limit, that is, when the parameter q tends to infinity in the distribution SAK, the AK distribution is obtained.
  • The moments and the skewness and kurtosis coefficients have explicit forms.
  • In the application, observing the AIC and BIC and the AD, CVM and SW statistical tests, we may conclude that the SAK distribution fits the Betaplasma data set better than the PAD and SMR distributions, which are also extensions of the AK distribution.

Author Contributions

Conceptualization, H.W.G.; software, D.I.G.; methodology, Y.M.G., L.F.-L. and D.I.G.; formal analysis, L.F.-L. and H.W.G.; investigation, Y.M.G., D.I.G. and H.W.G.; writing—original draft preparation, Y.M.G., D.I.G. and H.W.G.; writing—review and editing, L.F.-L.; validation, Y.M.G. and L.F.-L.; resources, L.F.-L. All authors have read and agreed to the published version of the manuscript.

Funding

The research was partially funded by Proyecto Puente-UA and Universidad del Bío-Bío.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [http://Lib.stat.cmu.edu/datasets/PlasmaRetinol].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1995; Volume 1.
  2. Rogers, W.H.; Tukey, J.W. Understanding some long-tailed symmetrical distributions. Statist. Neerlandica 1972, 26, 211–226.
  3. Mosteller, F.; Tukey, J.W. Data Analysis and Regression; Addison-Wesley: Reading, MA, USA, 1977.
  4. Kafadar, K. A biweight approach to the one-sample problem. J. Am. Statist. Assoc. 1982, 77, 416–424.
  5. Wang, J.; Genton, M.G. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
  6. Gómez, H.W.; Venegas, O. Erratum to: A new family of slash-distributions with elliptical contours [Statist. Probab. Lett. 77 (2007) 717–725]. Stat. Probab. Lett. 2008, 78, 2273–2274.
  7. Olmos, N.M.; Varela, H.; Gómez, H.W.; Bolfarine, H. An extension of the half-normal distribution. Stat. Pap. 2012, 53, 875–886.
  8. Rivera, P.A.; Barranco-Chamorro, I.; Gallardo, D.I.; Gómez, H.W. Scale Mixture of Rayleigh Distribution. Mathematics 2020, 8, 1842.
  9. Shanker, R. Akash Distribution and Its Applications. Int. J. Probab. Stat. 2015, 4, 65–75.
  10. Shanker, R.; Shukla, K.K. On Two-Parameter Akash Distribution. Biom. Biostat. Int. J. 2017, 6, 00178.
  11. Shanker, R.; Shukla, K.K. Power Akash Distribution and Its Application. J. Appl. Quant. Methods 2017, 12, 1–10.
  12. Rolski, T.; Schmidli, H.; Schmidt, V.; Teugels, J. Stochastic Processes for Insurance and Finance; John Wiley & Sons: Hoboken, NJ, USA, 1999.
  13. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 1977, 39, 1–38.
  14. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023.
  15. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  16. Schwarz, G. Estimating the dimension of a model. Ann. Statist. 1978, 6, 461–464.
  17. Dunn, P.K.; Smyth, G.K. Randomized Quantile Residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
Figure 1. Left side: examples of the SAK(1, 1) (in black), SAK(1, 5) (in blue) and SAK(1, 10) (in red). Right side: examples of the SAK(0.5, 1) (in black), SAK(0.5, 5) (in blue) and SAK(0.5, 10) (in red).
Figure 2. Hazard function of the SAK(0.5, 1) distribution (in black), SAK(0.5, 2) distribution (in blue) and SAK(0.5, 3) distribution (in red).
Figure 3. Betaplasma: histogram and fitted pdf for AK, TPAD, PAD, SMR and SAK distributions.
Figure 4. QQ plots of the quantile residuals for the fitted models and p-values of the AD, CVM and SW tests.
Table 1. Tail comparisons of the AK and SAK distributions.

| Distribution | P(X > 5) | P(X > 10) | Distribution | P(X > 15) | P(X > 20) |
|---|---|---|---|---|---|
| SAK(1, 1) | 0.443 | 0.233 | SAK(0.5, 1) | 0.367 | 0.278 |
| SAK(1, 5) | 0.162 | 0.015 | SAK(0.5, 5) | 0.063 | 0.020 |
| SAK(1, 10) | 0.120 | 0.005 | SAK(0.5, 10) | 0.034 | 0.007 |
| AK(1) | 0.085 | 0.002 | AK(0.5) | 0.018 | 0.003 |
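
Tail probabilities of the kind reported in Table 1 can be checked numerically. The sketch below is illustrative only. It assumes the usual slash construction X = Y / U^(1/q), with Y ~ AK(θ) independent of U ~ U(0, 1), and the standard Akash survival function. The names `ak_sf` and `sak_sf` are hypothetical, and Simpson's rule is used here for convenience rather than being the authors' method.

```python
import math


def ak_sf(x, theta):
    """Survival function P(Y > x) of the Akash distribution AK(theta)."""
    tx = theta * x
    return (1.0 + tx * (tx + 2.0) / (theta**2 + 2.0)) * math.exp(-tx)


def sak_sf(x, theta, q, n=2000):
    """P(X > x) for X = Y / U**(1/q) (assumed slash construction).

    Conditioning on U gives P(X > x) = integral over u in (0, 1) of
    ak_sf(x * u**(1/q)). With t = u**(1/q) this becomes the integral over
    t in (0, 1) of ak_sf(x * t) * q * t**(q - 1), which is smooth for
    q >= 1 and is evaluated with composite Simpson's rule (n must be even).
    """
    if q < 1:
        raise ValueError("this sketch assumes q >= 1")

    def g(t):
        return ak_sf(x * t, theta) * q * t ** (q - 1.0)

    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    # First block of Table 1: theta = 1, thresholds 5 and 10.
    for q in (1, 5, 10):
        print(f"SAK(1, {q}):", round(sak_sf(5, 1.0, q), 3), round(sak_sf(10, 1.0, q), 3))
    print("AK(1):", round(ak_sf(5, 1.0), 3), round(ak_sf(10, 1.0), 3))
```

Under these assumptions the SAK(1, 1), SAK(1, 5) and AK(1) values for P(X > 5) agree with Table 1 to three decimals, which also illustrates how a smaller q puts more mass in the tail.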
Table 2. Skewness and kurtosis of the SAK distribution for various values of the shape parameters.

| θ | q | β1 (skewness) | β2 (kurtosis) |
|---|---|---|---|
| 0.5 | 5 | 1.974 | 16.574 |
| 1 | 5 | 1.952 | 15.180 |
| 0.5 | 6 | 1.570 | 9.039 |
| 1 | 6 | 1.596 | 8.650 |
| 0.5 | 7 | 1.391 | 7.009 |
| 1 | 7 | 1.438 | 6.863 |
| 0.5 | 10 | 1.201 | 5.460 |
| 1 | 10 | 1.271 | 5.470 |
| 0.5 | 100 | 1.085 | 4.788 |
| 1 | 100 | 1.166 | 4.837 |
| 0.5 | ∞ | 1.084 | 4.785 |
| 1 | ∞ | 1.165 | 4.834 |
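
The skewness and kurtosis coefficients in Table 2 follow from the first four raw moments. The sketch below is a hedged illustration, not the authors' code. It assumes the slash representation X = Y / U^(1/q), so that E[X^r] = q/(q − r) · E[Y^r] for q > r, together with the standard Akash raw moments E[Y^r] = r!(θ² + (r + 1)(r + 2)) / (θ^r (θ² + 2)). The function names are hypothetical.

```python
import math


def ak_raw_moment(r, theta):
    """r-th raw moment of the Akash distribution AK(theta)."""
    return (math.factorial(r) * (theta**2 + (r + 1) * (r + 2))
            / (theta**r * (theta**2 + 2)))


def sak_raw_moment(r, theta, q):
    """r-th raw moment of SAK(theta, q), assuming X = Y / U**(1/q).

    The moment exists only for q > r; q = math.inf gives the AK limit.
    """
    if math.isinf(q):
        return ak_raw_moment(r, theta)
    if q <= r:
        raise ValueError("moment of order r requires q > r")
    return q / (q - r) * ak_raw_moment(r, theta)


def sak_skew_kurt(theta, q):
    """Return (skewness, kurtosis) from central moments; requires q > 4."""
    m1, m2, m3, m4 = (sak_raw_moment(r, theta, q) for r in range(1, 5))
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4
    return mu3 / var**1.5, mu4 / var**2


if __name__ == "__main__":
    for q in (5, 6, 7, 10, 100, math.inf):
        skew, kurt = sak_skew_kurt(1.0, q)
        print(f"theta = 1, q = {q}: skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

For θ = 1 with q = 5 and with q = ∞ (the AK limit), this reproduces the corresponding Table 2 entries to three decimals. The requirement q > 4 reflects the fact that the kurtosis needs a finite fourth moment.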
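Table 3. Estimated bias, SE, RMSE and CP of the ML estimators of the parameters of the SAK distribution for different sample sizes.

| θ | q | Estimator | Bias (n=30) | SE (n=30) | RMSE (n=30) | CP (n=30) | Bias (n=50) | SE (n=50) | RMSE (n=50) | CP (n=50) | Bias (n=100) | SE (n=100) | RMSE (n=100) | CP (n=100) | Bias (n=200) | SE (n=200) | RMSE (n=200) | CP (n=200) | Bias (n=500) | SE (n=500) | RMSE (n=500) | CP (n=500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | θ̂ | −0.002 | 0.119 | 0.124 | 0.914 | −0.004 | 0.092 | 0.094 | 0.930 | −0.001 | 0.065 | 0.066 | 0.937 | 0.000 | 0.046 | 0.046 | 0.946 | 0.000 | 0.029 | 0.029 | 0.947 |
| 0.5 | 0.5 | q̂ | 0.036 | 0.122 | 0.139 | 0.961 | 0.025 | 0.092 | 0.100 | 0.958 | 0.012 | 0.063 | 0.065 | 0.952 | 0.005 | 0.043 | 0.044 | 0.952 | 0.001 | 0.027 | 0.027 | 0.951 |
| 0.5 | 1.0 | θ̂ | −0.004 | 0.110 | 0.114 | 0.918 | −0.003 | 0.085 | 0.086 | 0.931 | −0.002 | 0.060 | 0.061 | 0.940 | −0.001 | 0.043 | 0.043 | 0.946 | 0.000 | 0.027 | 0.027 | 0.946 |
| 0.5 | 1.0 | q̂ | −0.159 | 0.236 | 0.253 | 0.924 | −0.112 | 0.161 | 0.171 | 0.929 | −0.087 | 0.108 | 0.115 | 0.939 | −0.059 | 0.074 | 0.081 | 0.948 | −0.046 | 0.046 | 0.051 | 0.948 |
| 0.5 | 2.0 | θ̂ | −0.003 | 0.105 | 0.107 | 0.931 | −0.003 | 0.081 | 0.082 | 0.939 | −0.002 | 0.057 | 0.058 | 0.940 | −0.001 | 0.040 | 0.041 | 0.945 | 0.000 | 0.025 | 0.026 | 0.947 |
| 0.5 | 2.0 | q̂ | −0.137 | 0.597 | 0.622 | 0.904 | −0.125 | 0.395 | 0.420 | 0.924 | −0.077 | 0.233 | 0.250 | 0.932 | −0.041 | 0.151 | 0.162 | 0.942 | −0.023 | 0.092 | 0.095 | 0.948 |
| 3.0 | 0.5 | θ̂ | 0.136 | 1.063 | 1.236 | 0.891 | 0.095 | 0.794 | 0.861 | 0.915 | 0.035 | 0.537 | 0.556 | 0.927 | 0.013 | 0.373 | 0.380 | 0.940 | 0.005 | 0.234 | 0.235 | 0.947 |
| 3.0 | 0.5 | q̂ | 0.059 | 0.156 | 0.206 | 0.963 | 0.030 | 0.110 | 0.124 | 0.958 | 0.015 | 0.075 | 0.079 | 0.955 | 0.009 | 0.052 | 0.054 | 0.953 | 0.003 | 0.032 | 0.033 | 0.952 |
| 3.0 | 1.0 | θ̂ | 0.104 | 0.982 | 1.112 | 0.896 | 0.060 | 0.729 | 0.786 | 0.912 | 0.028 | 0.499 | 0.517 | 0.929 | 0.012 | 0.347 | 0.354 | 0.941 | 0.003 | 0.218 | 0.219 | 0.948 |
| 3.0 | 1.0 | q̂ | −0.087 | 0.398 | 0.446 | 0.892 | −0.057 | 0.245 | 0.296 | 0.925 | −0.021 | 0.145 | 0.188 | 0.938 | −0.012 | 0.097 | 0.117 | 0.948 | −0.002 | 0.060 | 0.066 | 0.947 |
| 3.0 | 2.0 | θ̂ | 0.145 | 0.976 | 1.070 | 0.922 | 0.068 | 0.709 | 0.747 | 0.929 | 0.018 | 0.478 | 0.491 | 0.934 | 0.006 | 0.332 | 0.339 | 0.941 | 0.000 | 0.208 | 0.210 | 0.946 |
| 3.0 | 2.0 | q̂ | −0.105 | 1.025 | 1.090 | 0.915 | −0.084 | 0.724 | 0.790 | 0.924 | −0.069 | 0.440 | 0.485 | 0.935 | −0.048 | 0.255 | 0.282 | 0.942 | −0.008 | 0.140 | 0.155 | 0.948 |
| 10.0 | 0.5 | θ̂ | 0.595 | 4.688 | 5.331 | 0.882 | 0.291 | 3.484 | 3.709 | 0.901 | 0.126 | 2.400 | 2.470 | 0.925 | 0.088 | 1.684 | 1.706 | 0.942 | 0.019 | 1.056 | 1.049 | 0.944 |
| 10.0 | 0.5 | q̂ | 0.069 | 0.175 | 0.184 | 0.964 | 0.035 | 0.113 | 0.128 | 0.963 | 0.016 | 0.075 | 0.080 | 0.957 | 0.007 | 0.052 | 0.053 | 0.951 | 0.003 | 0.032 | 0.033 | 0.951 |
| 10.0 | 1.0 | θ̂ | 0.559 | 4.440 | 4.910 | 0.904 | 0.222 | 3.260 | 3.453 | 0.910 | 0.102 | 2.248 | 2.328 | 0.926 | 0.059 | 1.574 | 1.600 | 0.941 | 0.009 | 0.987 | 0.980 | 0.948 |
| 10.0 | 1.0 | q̂ | −0.097 | 0.508 | 0.631 | 0.899 | −0.051 | 0.284 | 0.389 | 0.903 | −0.031 | 0.152 | 0.199 | 0.939 | −0.023 | 0.098 | 0.117 | 0.948 | −0.012 | 0.060 | 0.080 | 0.948 |
| 10.0 | 2.0 | θ̂ | 0.885 | 4.575 | 4.757 | 0.935 | 0.389 | 3.286 | 3.316 | 0.937 | 0.172 | 2.209 | 2.217 | 0.944 | 0.035 | 1.533 | 1.546 | 0.947 | −0.006 | 0.955 | 0.955 | 0.947 |
| 10.0 | 2.0 | q̂ | −0.068 | 1.224 | 1.222 | 0.924 | −0.057 | 0.834 | 0.950 | 0.931 | −0.037 | 0.440 | 0.483 | 0.935 | −0.027 | 0.305 | 0.313 | 0.942 | −0.018 | 0.149 | 0.159 | 0.943 |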
Table 4. Summary for betaplasma data.

| n | x̄ | s² | b1 | b2 |
|---|---|---|---|---|
| 314 | 190.4968 | 33480.72 | 3.536562 | 16.8145 |
Table 5. ML estimates for AK, TPAD, PAD, SMR and SAK models (standard errors are in parentheses).

| Parameter estimates | AK | TPAD | PAD | SMR | SAK |
|---|---|---|---|---|---|
| θ | 0.387 (0.120) | 0.016 (0.004) | 0.012 (0.003) | 16,998.167 (3399.076) | 0.027 (0.002) |
| α | | 1.830 (0.133) | 1.052 (0.038) | | |
| q | | | | 2.926 (0.385) | 2.331 (0.294) |
| σ | 25.767 (8.697) | | | | |
| log-likelihood | −1952.939 | −1955.297 | −1953.632 | −1910.472 | −1908.147 |
Table 6. AIC and BIC criteria for fitted models.

| Criterion | AK | TPAD | PAD | SMR | SAK |
|---|---|---|---|---|---|
| AIC | 3909.878 | 3914.594 | 3911.264 | 3824.944 | 3820.294 |
| BIC | 3917.376 | 3922.092 | 3918.763 | 3832.443 | 3827.793 |
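
The information criteria in Table 6 can be recomputed from the log-likelihoods in Table 5 with AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ. The sketch below is a minimal check, not the authors' code. It takes n = 314 from Table 4 and assumes k = 2 estimated parameters for every model, which is an inference from the tabulated values rather than something stated in the tables.

```python
import math

# Maximized log-likelihoods as reported in Table 5.
loglik = {
    "AK": -1952.939,
    "TPAD": -1955.297,
    "PAD": -1953.632,
    "SMR": -1910.472,
    "SAK": -1908.147,
}
n = 314  # sample size from Table 4
k = 2    # assumed number of estimated parameters per model


def aic(ll, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * ll


def bic(ll, k, n):
    """Bayesian (Schwarz) information criterion: k * log(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * ll


if __name__ == "__main__":
    for name, ll in loglik.items():
        print(f"{name}: AIC = {aic(ll, k):.3f}, BIC = {bic(ll, k, n):.3f}")
```

Under these assumptions the recomputed values agree with Table 6 up to rounding in the third decimal, with SAK giving the smallest AIC and BIC among the five models.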
