Article

A Heavy-Tailed Distribution Based on the Lomax–Rayleigh Distribution with Applications to Medical Data

1
Departamento de Estadística y Ciencias de Datos, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
2
Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4081112, Chile
3
Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile
4
Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos 13560-095, Brazil
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(22), 4626; https://doi.org/10.3390/math11224626
Submission received: 17 October 2023 / Revised: 31 October 2023 / Accepted: 4 November 2023 / Published: 13 November 2023
(This article belongs to the Section Probability and Statistics)

Abstract

In this paper, we extend the Lomax–Rayleigh distribution to increase its kurtosis. The construction of this distribution is based on the idea of the Slash distribution, that is, its representation is based on the quotient of two independent random variables, one being a random variable with a Lomax–Rayleigh distribution and the other a $\mathrm{Beta}(q, 1)$. Based on the representation of this family, we study its basic properties, such as moments, coefficients of skewness, and kurtosis. We perform statistical inference using the methods of moments and maximum likelihood. To illustrate this methodology, we apply it to two real data sets.

1. Introduction

Interest in modeling non-negative data has grown considerably, as many real data sets take only positive values. Distributions with positive support are used mainly in engineering, reliability, survival analysis, and failure-time studies. An important distribution for positive phenomena, arising in life-testing experiments, reliability analysis, communication theory, physical sciences, engineering, medical imaging science, applied statistics, and clinical studies, is the Rayleigh distribution (see Johnson et al. [1]). Another important distribution for this type of phenomenon is the Lomax distribution, also known as the Pareto type II distribution. Applications of this model include lifetime data, business failure data, and economic and actuarial modeling. Cordeiro et al. [2] proposed a family of univariate, positively supported distributions generated by Lomax random variables, which they called Lomax-G distributions.
The novelty of this work is the introduction of a new heavy-tailed model. In actuarial statistics, distributions of this type have proven well suited to heavy-tailed financial data, which is why they are of great interest to actuaries. Olmos et al. [3] presented a heavy right-tailed distribution with a real data application to the Survey of Consumer Finances (SCF); Zhao et al. [4] presented a new family of heavy-tailed distributions useful for modeling financial data; Riad et al. [5] introduced the new Kavya–Manoharan Lomax model, with a real data application to heavy-tailed (HT) insurance losses; and Afify et al. [6] defined the exponential power-weighted model for financial data. Other studies based on heavy-tailed models can be found in the literature. In practice, the most interesting heavy-tailed distributions are those that have a finite mean and a divergent variance. Cococcioni et al. [7] provided a LogNormal distribution that has a finite mean and a variance that converges to a well-defined infinite value. In addition, Xu et al. [8] provided two robust estimators, the ridge log-truncated M-estimator and the elastic net log-truncated M-estimator.
Venegas et al. [9] introduced the heavy-tailed Lomax–Rayleigh (LR) distribution, taking G to be the cumulative distribution function (cdf) of the Rayleigh model.
The probability density function (pdf) of a random variable X with an LR distribution, denoted $X \sim \mathrm{LR}(\theta, \alpha)$, can be expressed as
$$f_X(x; \theta, \alpha) = \frac{2 \alpha \theta^{\alpha} x}{(\theta + x^2)^{\alpha + 1}}, \quad x > 0, \tag{1}$$
where $\theta > 0$ and $\alpha > 0$ are the scale and shape parameters, respectively.
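As a quick numerical companion to the LR density, the following sketch codes the pdf in (1) together with the CDF and quantile function that follow from integrating it, namely $F_X(x) = 1 - \theta^{\alpha} / (\theta + x^2)^{\alpha}$. The function names are illustrative and do not come from the original work.

```python
import math


def lr_pdf(x, theta, alpha):
    """Density of the Lomax-Rayleigh LR(theta, alpha) model, Equation (1)."""
    if x <= 0:
        return 0.0
    return 2.0 * alpha * theta**alpha * x / (theta + x * x) ** (alpha + 1.0)


def lr_cdf(x, theta, alpha):
    """CDF obtained by integrating the density: 1 - (theta / (theta + x^2))^alpha."""
    if x <= 0:
        return 0.0
    return 1.0 - (theta / (theta + x * x)) ** alpha


def lr_quantile(u, theta, alpha):
    """Inverse of lr_cdf for 0 <= u < 1 (used for inverse-transform sampling)."""
    return math.sqrt(theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0))
```

Because the quantile function is available in closed form, LR variates can be drawn by inverse-transform sampling from a single uniform number.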
On the other hand, the canonical Slash distribution is stochastically defined as the ratio of two independent random variables: one standard normal and the other a power of a uniform(0, 1) random variable, that is,
$$Y = \frac{X}{U^{1/\alpha}},$$
where $X \sim N(0, 1)$ and $U \sim U(0, 1)$ are independent and $\alpha > 0$. This distribution has heavier tails than the normal distribution, that is, it has greater kurtosis. The properties of this family are discussed by Rogers and Tukey [10] and Mosteller and Tukey [11]. The maximum likelihood (ML) estimators of the location and scale parameters are discussed by Kafadar [12]. Wang and Genton [13] provide a multivariate version of the Slash distribution and a multivariate skew version. Gómez et al. [14] extend this family using the family of univariate and multivariate elliptical distributions. Recent works by Olmos et al. [15,16], Acitas et al. [17], Gómez et al. [18], Barrios et al. [19], and Arendarczyk et al. [20] use a methodology similar to the Slash distribution to extend different models. The objective of this work is to extend the LR distribution so that the new distribution has more flexible kurtosis, using the Slash procedure with an LR random variable in the numerator and a beta random variable in the denominator. We call this new model Slash Lomax–Rayleigh (SLR).
The structure of this article is as follows. Section 2 presents the representation of the family and produces the Slash Lomax–Rayleigh PDF, moments, and coefficients of asymmetry and kurtosis. In Section 3, inferences are drawn using the moments and ML estimation methods. Section 4 consists of a simulation study to observe the behavior of the ML estimates of the parameters. Two applications to real data sets are discussed in Section 5. Finally, Section 6 summarizes the main conclusions of this study.

2. The Slash Lomax–Rayleigh Model

In this section, we discuss the stochastic representation of the SLR model, including its PDF, CDF, and some properties of the model.

2.1. Stochastic Representation

Definition 1.
A random variable Z has an SLR distribution with parameters $\theta > 0$ and $\alpha > 0$ if it can be represented by the ratio
$$Z = \frac{X}{Y}, \tag{2}$$
where $X \sim \mathrm{LR}(\theta, \alpha)$ and $Y \sim \mathrm{Beta}(\alpha, 1)$ are two independent random variables. We denote this as $Z \sim \mathrm{SLR}(\theta, \alpha)$.

2.2. PDF, CDF, Hazard Function, and Other Properties

Proposition 1.
Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, the PDF of Z is given by:
$$f_Z(z; \theta, \alpha) = \alpha^2 \theta^{\alpha/2} z^{-(\alpha+1)} B\!\left(\frac{z^2}{\theta + z^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right), \quad z > 0, \tag{3}$$
where $\theta > 0$ and $\alpha > 0$, and $B(w; a, b) = \int_0^w u^{a-1} (1-u)^{b-1} \, du$ is the incomplete beta function.
Proof.
Using the PDF given in (1) and the stochastic representation given in (2), the PDF associated with Z is given by:
$$f_Z(z; \theta, \alpha) = \int_0^1 v \, f_{X,Y}(zv, v) \, dv.$$
Since X and Y are independent, $f_{X,Y}(zv, v) = f_X(zv) f_Y(v)$, and given that $X \sim \mathrm{LR}(\theta, \alpha)$ and $Y \sim \mathrm{Beta}(\alpha, 1)$, we obtain:
$$f_Z(z; \theta, \alpha) = 2 \alpha^2 \theta^{\alpha} z \int_0^1 \frac{v^{\alpha+1}}{(\theta + (zv)^2)^{\alpha+1}} \, dv.$$
After making the change of variable $u = (zv)^2 / \theta$, we obtain:
$$f_Z(z; \theta, \alpha) = \alpha^2 \theta^{\alpha/2} z^{-(\alpha+1)} \int_0^{z^2/\theta} \frac{u^{\alpha/2}}{(1+u)^{\alpha+1}} \, du.$$
In this latter integral, we use the change of variable $w = u/(1+u)$, where $dw = du/(1+u)^2$. Substituting this, we have
$$\begin{aligned}
f_Z(z; \theta, \alpha) &= \alpha^2 \theta^{\alpha/2} z^{-(\alpha+1)} \int_0^{z^2/\theta} \frac{u^{\alpha/2}}{(1+u)^{\alpha+1}} \, du \\
&= \alpha^2 \theta^{\alpha/2} z^{-(\alpha+1)} \int_0^{z^2/\theta} \left(\frac{u}{1+u}\right)^{\alpha/2} (1+u)^{-\frac{\alpha}{2}+1} \, \frac{du}{(1+u)^2} \\
&= \alpha^2 \theta^{\alpha/2} z^{-(\alpha+1)} \int_0^{\frac{z^2}{\theta+z^2}} w^{\alpha/2} (1-w)^{\frac{\alpha}{2}-1} \, dw,
\end{aligned}$$
and the result is obtained.    □
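The two expressions for the density in this proof can be compared numerically. The sketch below evaluates the incomplete-beta form of Proposition 1 and the integral over $v$ that precedes the changes of variable. It relies on a hand-rolled composite Simpson's rule, which is a choice made for this illustration only (a library routine for the incomplete beta function would normally be preferable), and all function names are hypothetical.

```python
import math


def incomplete_beta(w, a, b, panels=2000):
    """Non-regularised incomplete beta B(w; a, b) by composite Simpson's rule.

    Assumes a >= 1 and 0 <= w < 1, which holds for the SLR density
    (a = alpha/2 + 1 and w = z^2 / (theta + z^2)).
    """
    if w <= 0.0:
        return 0.0
    h = w / panels
    total = 0.0
    for i in range(panels + 1):
        u = i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)
    return total * h / 3.0


def slr_pdf(z, theta, alpha):
    """SLR density in the incomplete-beta form of Proposition 1 (z > 0)."""
    w = z * z / (theta + z * z)
    return (alpha**2 * theta ** (alpha / 2.0) * z ** (-(alpha + 1.0))
            * incomplete_beta(w, alpha / 2.0 + 1.0, alpha / 2.0))


def slr_pdf_integral(z, theta, alpha, panels=2000):
    """Same density from the integral over v, before the changes of variable."""
    h = 1.0 / panels
    total = 0.0
    for i in range(panels + 1):
        v = i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * v ** (alpha + 1.0) / (theta + (z * v) ** 2) ** (alpha + 1.0)
    return 2.0 * alpha**2 * theta**alpha * z * total * h / 3.0
```

For $\theta = 1$, $\alpha = 2$ and $z = 1$ both forms can also be checked by hand, since $B(1/2; 2, 1) = 1/8$ gives a density of $4 \cdot 1/8 = 1/2$.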
Proposition 2.
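Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, the CDF of Z is given by:
$$F_Z(z; \theta, \alpha) = 1 - \frac{\theta^{\alpha}}{(\theta + z^2)^{\alpha}} - \alpha \theta^{\alpha/2} p_0(z), \quad z > 0,$$
where $\theta > 0$, $\alpha > 0$ and $p_j(z) = z^{-(\alpha+j)} B\!\left(\frac{z^2}{\theta + z^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right)$, $j \in \mathbb{Z}$.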
Proof. 
Using the definition of the CDF and integration by parts, the result is obtained.    □
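One way to carry out this step (a sketch, not necessarily the exact route taken by the authors) is to condition on $Y = v$ and use the LR CDF implied by (1), $F_X(x) = 1 - \theta^{\alpha} (\theta + x^2)^{-\alpha}$, before integrating by parts:

$$\begin{aligned}
F_Z(z) &= \int_0^1 F_X(zv) \, \alpha v^{\alpha-1} \, dv
= 1 - \theta^{\alpha} \int_0^1 \alpha v^{\alpha-1} (\theta + z^2 v^2)^{-\alpha} \, dv \\
&= 1 - \theta^{\alpha} \left[ (\theta + z^2)^{-\alpha} + 2 \alpha z^2 \int_0^1 \frac{v^{\alpha+1}}{(\theta + z^2 v^2)^{\alpha+1}} \, dv \right] \\
&= 1 - \frac{\theta^{\alpha}}{(\theta + z^2)^{\alpha}} - \alpha \theta^{\alpha/2} z^{-\alpha} B\!\left(\frac{z^2}{\theta + z^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right).
\end{aligned}$$

The last line uses the identity $\int_0^1 v^{\alpha+1} (\theta + z^2 v^2)^{-(\alpha+1)} \, dv = \tfrac{1}{2} \theta^{-\alpha/2} z^{-(\alpha+2)} B\!\left(\frac{z^2}{\theta + z^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right)$, which follows from the two changes of variable in the proof of Proposition 1.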
Proposition 3.
The hazard function of Z is given by:
$$h_Z(z; \theta, \alpha) = \frac{\alpha^2 p_1(z)}{\theta^{\alpha/2} (\theta + z^2)^{-\alpha} + \alpha p_0(z)}, \quad z > 0,$$
where $\theta > 0$ and $\alpha > 0$.
Proof.
Using the definition of the hazard function
$$h_Z(z; \theta, \alpha) = \frac{f_Z(z; \theta, \alpha)}{1 - F_Z(z; \theta, \alpha)},$$
the result follows immediately.    □
Figure 1 illustrates the PDF, CDF, and hazard function of the SLR($\theta$, $\alpha$) model for some combinations of $\theta$ and $\alpha$.
Proposition 4.
If $Z \mid T = t \sim \mathrm{LR}(\theta t^{-2}, \alpha)$ and $T \sim \mathrm{Beta}(\alpha, 1)$, then $Z \sim \mathrm{SLR}(\theta, \alpha)$.
Proof.
The marginal distribution of Z can be calculated as:
$$f_Z(z; \theta, \alpha) = \int_0^1 f_{Z \mid T}(z \mid t) f_T(t) \, dt = 2 \alpha^2 \theta^{\alpha} \int_0^1 \frac{z \, t^{\alpha+1}}{(\theta + (zt)^2)^{\alpha+1}} \, dt.$$
Using the changes of variable in the proof of Proposition 1, we obtain Equation (3).    □
Proposition 5.
Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, $W = aZ \sim \mathrm{SLR}(a^2 \theta, \alpha)$ for all $a > 0$.
Proof. 
The proof follows directly by using the change of variable method.    □
A probability distribution with CDF $F(t)$ is said to be heavy right-tailed (see Rolski et al. [21]) if
$$\limsup_{t \to \infty} \frac{-\log(1 - F(t))}{t} = 0.$$
The following result shows that the SLR distribution is a heavy right-tailed distribution.
Proposition 6.
The distribution of the random variable $T \sim \mathrm{SLR}(\theta, \alpha)$ is heavy right-tailed.
Proof.
By applying L'Hôpital's rule to the upper limit and substituting Equation (3), we obtain:
$$\limsup_{t \to \infty} \frac{-\log(1 - F(t))}{t} = \limsup_{t \to \infty} \frac{f_T(t; \theta, \alpha)}{1 - F_T(t; \theta, \alpha)} = \limsup_{t \to \infty} \frac{\alpha^2 p_1(t)}{\theta^{\alpha/2} (\theta + t^2)^{-\alpha} + \alpha p_0(t)},$$
and, multiplying the numerator and denominator by $t^{\alpha}$, we obtain:
$$= \limsup_{t \to \infty} \frac{\alpha^2 t^{-1} B\!\left(\frac{t^2}{\theta + t^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right)}{\theta^{\alpha/2} \left(\frac{\theta}{t} + t\right)^{-\alpha} + \alpha B\!\left(\frac{t^2}{\theta + t^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right)} = 0.    □$$
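The same conclusion can be read off the survival function directly. The following is a sketch based only on the CDF in Proposition 2, where $B(a, b)$ denotes the complete beta function. As $t \to \infty$, the argument $t^2/(\theta + t^2)$ tends to 1, so

$$1 - F(t) = \frac{\theta^{\alpha}}{(\theta + t^2)^{\alpha}} + \alpha \theta^{\alpha/2} t^{-\alpha} B\!\left(\frac{t^2}{\theta + t^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right) \sim \alpha \theta^{\alpha/2} B\!\left(\frac{\alpha}{2} + 1, \frac{\alpha}{2}\right) t^{-\alpha},$$

because the first term decays like $t^{-2\alpha}$. Hence $-\log(1 - F(t))/t \sim \alpha \log(t)/t \to 0$. The right tail therefore decays as a power law of index $\alpha$, which is consistent with the $r$-th moment existing only for $\alpha > r$.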

2.3. Moments

The following proposition presents the moments of the SLR distribution.
Proposition 7.
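Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. For $r = 1, 2, \ldots$ and $\alpha > r$, the r-th moment of Z is given by:
$$\mu_r = E(Z^r) = \frac{\alpha \theta^{r/2}}{\alpha - r} M_r(\alpha),$$
where $M_r = M_r(\alpha) = \dfrac{\Gamma\!\left(\alpha - \frac{r}{2}\right) \Gamma\!\left(\frac{r}{2} + 1\right)}{\Gamma(\alpha)}$.
Proof.
Using the stochastic representation provided in Equation (2), we directly obtain:
$$\mu_r = E(Z^r) = E(X^r Y^{-r}) = E(X^r) E(Y^{-r}),$$
where $E(Y^{-r}) = \dfrac{\alpha}{\alpha - r}$, $\alpha > r$, and $E(X^r) = \theta^{r/2} \dfrac{\Gamma\!\left(\alpha - \frac{r}{2}\right) \Gamma\!\left(\frac{r}{2} + 1\right)}{\Gamma(\alpha)}$ is the r-th moment of the $\mathrm{LR}(\theta, \alpha)$ distribution.    □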
Corollary 1.
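If $Z \sim \mathrm{SLR}(\theta, \alpha)$, then:
1. $\mu_1 = E(Z) = \dfrac{\alpha \theta^{1/2}}{\alpha - 1} M_1$, $\alpha > 1$;
2. $\mu_2 = E(Z^2) = \dfrac{\alpha \theta}{\alpha - 2} M_2$, $\alpha > 2$;
3. $\mu_3 = E(Z^3) = \dfrac{\alpha \theta^{3/2}}{\alpha - 3} M_3$, $\alpha > 3$;
4. $\mu_4 = E(Z^4) = \dfrac{\alpha \theta^{2}}{\alpha - 4} M_4$, $\alpha > 4$;
5. $\mathrm{Var}(Z) = \alpha \theta \left[ \dfrac{1}{\alpha - 2} M_2 - \dfrac{\alpha}{(\alpha - 1)^2} M_1^2 \right]$, $\alpha > 2$.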
Proof. 
The proof follows directly from Proposition 7.   □
Corollary 2.
Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, the skewness coefficient ($\beta_1$) and kurtosis coefficient ($\beta_2$) are:
$$\beta_1 = \frac{\sqrt{\alpha - 2} \left\{ (\alpha-1)^3 (\alpha-2) M_3 - 3 \alpha M_1 M_2 (\alpha-1)^2 (\alpha-3) + 2 \alpha^2 (\alpha-2)(\alpha-3) M_1^3 \right\}}{\sqrt{\alpha} \, (\alpha-3) \left\{ (\alpha-1)^2 M_2 - \alpha (\alpha-2) M_1^2 \right\}^{3/2}}, \quad \alpha > 3,$$
$$\beta_2 = \frac{(\alpha-1)^3 (\alpha-2)^2 A + 3 (\alpha-2)(\alpha-3)(\alpha-4) \alpha^2 B}{\alpha (\alpha-3)(\alpha-4) \left\{ (\alpha-1)^2 M_2 - \alpha (\alpha-2) M_1^2 \right\}^{2}}, \quad \alpha > 4,$$
where $A = A(\alpha) = (\alpha-1)(\alpha-3) M_4 - 4 \alpha (\alpha-4) M_1 M_3$ and $B = B(\alpha) = 2 (\alpha-1)^2 M_1^2 M_2 - \alpha (\alpha-2) M_1^4$.
Proof.
From the definition of the skewness and kurtosis coefficients, we have:
$$\beta_1 = \frac{\mu_3 - 3 \mu_2 \mu_1 + 2 \mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}} \quad \text{and} \quad \beta_2 = \frac{\mu_4 - 4 \mu_1 \mu_3 + 6 \mu_1^2 \mu_2 - 3 \mu_1^4}{(\mu_2 - \mu_1^2)^{2}}.$$
By replacing $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ presented in Corollary 1, the result is obtained.    □
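The moment formulas of Proposition 7 and the closed forms of Corollary 2 can be cross-checked numerically. The sketch below computes the skewness and kurtosis twice, once from the raw moments and once from the closed-form expressions in terms of $M_1, \ldots, M_4$. The function names are illustrative, and both routines assume $\alpha > 4$ so that all four moments exist.

```python
import math


def m_r(r, alpha):
    """M_r(alpha) = Gamma(alpha - r/2) * Gamma(r/2 + 1) / Gamma(alpha)."""
    return math.gamma(alpha - r / 2.0) * math.gamma(r / 2.0 + 1.0) / math.gamma(alpha)


def slr_moment(r, theta, alpha):
    """r-th raw moment of SLR(theta, alpha); it exists only for alpha > r."""
    if alpha <= r:
        raise ValueError("the r-th moment exists only for alpha > r")
    return alpha * theta ** (r / 2.0) / (alpha - r) * m_r(r, alpha)


def slr_skew_kurt_from_moments(theta, alpha):
    """Skewness and kurtosis from the raw moments mu_1, ..., mu_4."""
    m1, m2, m3, m4 = (slr_moment(r, theta, alpha) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return skew, kurt


def slr_skew_kurt_closed_form(alpha):
    """Closed forms of Corollary 2; they do not depend on the scale theta."""
    a = alpha
    M1, M2, M3, M4 = (m_r(r, a) for r in (1, 2, 3, 4))
    d = (a - 1) ** 2 * M2 - a * (a - 2) * M1**2
    num = ((a - 1) ** 3 * (a - 2) * M3
           - 3 * a * M1 * M2 * (a - 1) ** 2 * (a - 3)
           + 2 * a**2 * (a - 2) * (a - 3) * M1**3)
    skew = math.sqrt(a - 2) * num / (math.sqrt(a) * (a - 3) * d**1.5)
    A = (a - 1) * (a - 3) * M4 - 4 * a * (a - 4) * M1 * M3
    B = 2 * (a - 1) ** 2 * M1**2 * M2 - a * (a - 2) * M1**4
    kurt = (((a - 1) ** 3 * (a - 2) ** 2 * A
             + 3 * (a - 2) * (a - 3) * (a - 4) * a**2 * B)
            / (a * (a - 3) * (a - 4) * d**2))
    return skew, kurt
```

Since $\theta$ is a scale parameter, it cancels in both coefficients, which is why the closed-form routine takes $\alpha$ only.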
Remark 1.
The moments of the SLR distribution are driven primarily by the moments of the LR($\theta$, $\alpha$) model, as shown in Proposition 7. Figure 2 shows the skewness and kurtosis coefficients of the SLR and LR models for different values of $\alpha$, with $\theta = 1$ held fixed. The figure highlights the effect of $\alpha$: the SLR model attains larger skewness and kurtosis than the LR model, which already exhibits heavy tails. As $\alpha$ decreases, both models exhibit higher skewness and kurtosis coefficients, with those of the SLR model exceeding those of the LR model. This observation is also evident in Table 1.

2.4. Incomplete Moments

Proposition 8.
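Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, the k-th incomplete moment is given by
$$T_k(z) = \frac{\alpha^2}{k - \alpha} \left[ \theta^{\alpha/2} p_{-k}(z) - \theta^{k/2} B\!\left(\frac{z^2}{\theta + z^2}; \frac{k}{2} + 1, \alpha - \frac{k}{2}\right) \right], \quad 2\alpha > k, \quad k = 0, 1, \ldots$$
Proof.
By definition, the k-th incomplete moment of the SLR model is given by:
$$T_k(z) = E\!\left(Z^k \, \mathbb{1}\{Z < z\}\right) = \int_0^z t^k f(t; \theta, \alpha) \, dt,$$
and using integration by parts, the result is obtained.    □
In particular, for $k = 0, 1$, we have:
$$T_0(z) = F(z), \qquad T_1(z) = \frac{\alpha^2}{1 - \alpha} \left[ \theta^{\alpha/2} p_{-1}(z) - \theta^{1/2} B\!\left(\frac{z^2}{\theta + z^2}; \frac{3}{2}, \alpha - \frac{1}{2}\right) \right].$$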

2.5. The Lorenz Curve and the Gini Index

The standard definition of the Lorenz curve [22] is given in terms of the first incomplete moment and the expected value of Z, $\mu = E(Z)$. Specifically, for the SLR model, the following closed-form expression is obtained, with $z = F^{-1}(p)$:
$$L(p) = \frac{1}{\mu} \int_0^z t f(t) \, dt = \frac{1}{\mu} T_1(z) = \frac{\alpha^2}{\mu (1 - \alpha)} \left[ \theta^{\alpha/2} p_{-1}(z) - \theta^{1/2} B\!\left(\frac{z^2}{\theta + z^2}; \frac{3}{2}, \alpha - \frac{1}{2}\right) \right].$$
The Gini index, also known as the Gini coefficient (see [23,24]), is a measure of statistical dispersion associated with the Lorenz curve, intended to represent income, wealth, or consumption inequality within a nation or social group. The Gini index is defined as:
$$G(\theta, \alpha) = 1 - \frac{1}{\mu} \int_0^{\infty} \left[ 1 - F(z; \theta, \alpha) \right]^2 dz.$$
Proposition 9.
Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. Then, the Gini index is given by:
$$G(\theta, \alpha) = 1 - \frac{1}{\mu} \int_0^{\infty} \left[ 1 - T_0(z) \right]^2 dz = 1 - \frac{\theta^{2\alpha}}{\mu} \int_0^{\infty} \left[ (\theta + z^2)^{-\alpha} + \alpha \theta^{-\alpha/2} p_0(z) \right]^2 dz.$$
Proof. 
By definition, the proof is direct.    □
Proposition 10.
Let $Z \sim \mathrm{SLR}(\theta, \alpha)$. The mode of Z is obtained as the solution in z of the following non-linear equation:
$$2 \theta^{\alpha/2} (\theta + z^2)^{-(\alpha+1)} - (\alpha + 1) p_2(z) = 0.$$
Proof.
The result is obtained by differentiating the logarithm of the PDF of the SLR model with respect to z and setting the derivative equal to zero.    □
Table 2 provides the numerical values of the mode for $\theta = 1$ and $\alpha = 1, 3, 5, 7, 9$, and 12.
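The mode can be located numerically. Writing $B(z)$ for the incomplete beta factor of the density, differentiating $\log f_Z$ gives the condition $z B'(z) = (\alpha + 1) B(z)$ with $B'(z) = 2 \theta^{\alpha/2} z^{\alpha+1} (\theta + z^2)^{-(\alpha+1)}$, which after division by $z^{\alpha+2}$ reads $2 \theta^{\alpha/2} (\theta + z^2)^{-(\alpha+1)} - (\alpha+1) z^{-(\alpha+2)} B(z) = 0$. The sketch below solves this equation by bisection. The bracketing interval, the Simpson quadrature, and the function names are assumptions of this illustration and are not taken from the original work.

```python
import math


def incomplete_beta(w, a, b, panels=2000):
    """Non-regularised incomplete beta B(w; a, b) by composite Simpson's rule."""
    if w <= 0.0:
        return 0.0
    h = w / panels
    total = 0.0
    for i in range(panels + 1):
        u = i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)
    return total * h / 3.0


def slr_pdf(z, theta, alpha):
    """SLR density in its incomplete-beta form (z > 0)."""
    w = z * z / (theta + z * z)
    return (alpha**2 * theta ** (alpha / 2.0) * z ** (-(alpha + 1.0))
            * incomplete_beta(w, alpha / 2.0 + 1.0, alpha / 2.0))


def mode_equation(z, theta, alpha):
    """Derivative of log f_Z up to a positive factor; positive while f_Z increases."""
    w = z * z / (theta + z * z)
    scaled_beta = z ** (-(alpha + 2.0)) * incomplete_beta(w, alpha / 2.0 + 1.0, alpha / 2.0)
    return (2.0 * theta ** (alpha / 2.0) * (theta + z * z) ** (-(alpha + 1.0))
            - (alpha + 1.0) * scaled_beta)


def slr_mode(theta, alpha, tol=1e-10):
    """Bisection on a bracket proportional to sqrt(theta), the scale of Z."""
    lo, hi = 1e-3 * math.sqrt(theta), 10.0 * math.sqrt(theta)
    if mode_equation(lo, theta, alpha) <= 0 or mode_equation(hi, theta, alpha) >= 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mode_equation(mid, theta, alpha) > 0:
            lo = mid  # density still increasing: the mode lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because $aZ \sim \mathrm{SLR}(a^2 \theta, \alpha)$, the mode scales with $\sqrt{\theta}$, so it is enough to tabulate it for $\theta = 1$.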

2.6. Order Statistics

Order statistics have a wide range of applications in physical and life sciences (see Balakrishnan and Cohen [25]). From a statistical perspective, they allow the computation of useful functions such as the sample range and the sample median. The following result states the PDF of the k-th order statistic from an SLR random sample of size n, which is arranged in a non-decreasing order.
Proposition 11.
Let $Z_{1:n} \le Z_{2:n} \le \cdots \le Z_{n:n}$ be the order statistics of a random sample of size n from the SLR($\theta$, $\alpha$) distribution. Then, for $k = 1, 2, \ldots, n$, the PDF of the k-th order statistic $Z_{k:n}$ is given by:
$$f_{k:n}(z) = \frac{n! \, \alpha^2 \theta^{\alpha/2} p_1(z)}{(k-1)! \, (n-k)!} \left[ 1 - \frac{\theta^{\alpha}}{(\theta + z^2)^{\alpha}} - \alpha \theta^{\alpha/2} p_0(z) \right]^{k-1} \left[ \frac{\theta^{\alpha}}{(\theta + z^2)^{\alpha}} + \alpha \theta^{\alpha/2} p_0(z) \right]^{n-k}.$$
Proof.
The above expression is obtained using the following formula (see Casella and Berger [26])
$$f_{k:n}(z) = \frac{n!}{(k-1)! \, (n-k)!} f(z) \left[ F(z) \right]^{k-1} \left[ 1 - F(z) \right]^{n-k}, \quad k = 1, 2, \ldots, n,$$
where $f(z)$ and $F(z)$ are the PDF and CDF of $Z \sim \mathrm{SLR}(\theta, \alpha)$.    □
Corollary 3.
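Let $Z_{1:n} \le Z_{2:n} \le \cdots \le Z_{n:n}$ be the order statistics of a random sample of size n from the SLR($\theta$, $\alpha$) distribution. Then:
1. The PDF of the first-order statistic $Z_{1:n}$ is given by:
$$f_{1:n}(z) = n \alpha^2 \theta^{n\alpha/2} p_1(z) \left[ \theta^{\alpha/2} (\theta + z^2)^{-\alpha} + \alpha p_0(z) \right]^{n-1}.$$
2. The PDF of the n-th order statistic $Z_{n:n}$ is given by:
$$f_{n:n}(z) = n \alpha^2 \theta^{\alpha/2} p_1(z) \left[ 1 - \frac{\theta^{\alpha}}{(\theta + z^2)^{\alpha}} - \alpha \theta^{\alpha/2} p_0(z) \right]^{n-1}.$$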
Proof. 
The proof follows directly from Proposition 11.    □

3. Inference

In this section, the problem of estimating the parameters of the SLR distribution is addressed. First, we apply the moments method for estimating the parameters, and then the ML method.

3.1. Moment Estimators

Let $Z_1, Z_2, \ldots, Z_n$ be a random sample from $Z \sim \mathrm{SLR}(\theta, \alpha)$. The moment estimators are obtained as the solution to the equations $E(Z^j) = \overline{Z^j}$ for $j = 1, 2$, where $\overline{Z^j} = n^{-1} \sum_{i=1}^{n} z_i^j$ denotes the j-th sample moment. By solving $E(Z) = \bar{Z}$ for $\theta$, we obtain:
$$\hat{\theta}_M = \frac{\bar{Z}^{\,2} (\hat{\alpha}_M - 1)^2}{\hat{\alpha}_M^2 \, M_1^2(\hat{\alpha}_M)}, \tag{4}$$
which depends on the solution for $\alpha$, say $\hat{\alpha}_M$. Therefore, substituting (4) into the equation for the second population moment, the following equation is obtained
$$\frac{\bar{Z}^{\,2} (\hat{\alpha}_M - 1)^2 M_2(\hat{\alpha}_M)}{\hat{\alpha}_M (\hat{\alpha}_M - 2) M_1^2(\hat{\alpha}_M)} - \overline{Z^2} = 0.$$
This equation can be solved numerically using the R software, version 4.3.1 [27].
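A possible numerical solution of the moment equations is sketched below in Python rather than R, purely for illustration. It uses the fact that $E(Z^2)/E(Z)^2 = (\alpha-1)^2 M_2(\alpha) / \{\alpha (\alpha-2) M_1^2(\alpha)\}$ depends on $\alpha$ only, and finds the root by bisection on $\alpha > 2$. The search interval, the assumption that this ratio decreases in $\alpha$ (observed numerically, not proved here), and the function names are assumptions of the sketch.

```python
import math


def log_m_r(r, alpha):
    """log M_r(alpha), computed with lgamma for numerical stability."""
    return math.lgamma(alpha - r / 2.0) + math.lgamma(r / 2.0 + 1.0) - math.lgamma(alpha)


def slr_moment(r, theta, alpha):
    """r-th raw moment of SLR(theta, alpha), valid for alpha > r."""
    return math.exp(math.log(alpha) + 0.5 * r * math.log(theta)
                    - math.log(alpha - r) + log_m_r(r, alpha))


def moment_ratio(alpha):
    """E(Z^2) / E(Z)^2 as a function of alpha only (requires alpha > 2)."""
    return math.exp(2.0 * math.log(alpha - 1.0) + log_m_r(2, alpha)
                    - math.log(alpha) - math.log(alpha - 2.0)
                    - 2.0 * log_m_r(1, alpha))


def slr_moment_estimates(sample, alpha_max=200.0, tol=1e-10):
    """Return (theta_hat, alpha_hat) matching the first two sample moments."""
    n = len(sample)
    zbar = sum(sample) / n
    z2bar = sum(z * z for z in sample) / n
    target = z2bar / zbar**2
    lo, hi = 2.0 + 1e-8, alpha_max
    if not (moment_ratio(hi) < target < moment_ratio(lo)):
        raise ValueError("no root for alpha in the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) > target:
            lo = mid  # ratio still too large: alpha must increase
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    m1 = math.exp(log_m_r(1, alpha_hat))
    theta_hat = zbar**2 * (alpha_hat - 1.0) ** 2 / (alpha_hat**2 * m1**2)
    return theta_hat, alpha_hat
```

The estimator exists only when the sample ratio $\overline{Z^2} / \bar{Z}^{\,2}$ exceeds the limiting value of the population ratio as $\alpha$ grows, which is why the routine raises an error when no sign change is found.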

3.2. ML Estimators

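For $z_1, \ldots, z_n$, a random sample from the SLR($\theta$, $\alpha$) model, the log-likelihood function is given by:
$$\ell(\psi) = 2n \log(\alpha) + \frac{n \alpha}{2} \log(\theta) - (\alpha + 1) \sum_{i=1}^{n} \log(z_i) + \sum_{i=1}^{n} \log B(z_i), \tag{5}$$
where $B(z_i) = B\!\left(\frac{z_i^2}{\theta + z_i^2}; \frac{\alpha}{2} + 1, \frac{\alpha}{2}\right)$ and $\psi = (\theta, \alpha)$. The ML equations are given by:
$$\sum_{i=1}^{n} \frac{B_{\theta}(z_i)}{B(z_i)} = -\frac{n \alpha}{2 \theta}, \tag{6}$$
$$\frac{2n}{\alpha} + \frac{n}{2} \log(\theta) + \sum_{i=1}^{n} \frac{B_{\alpha}(z_i)}{B(z_i)} = \sum_{i=1}^{n} \log(z_i), \tag{7}$$
where $B_{\theta}(z_i) = \dfrac{\partial B(z_i)}{\partial \theta}$ and $B_{\alpha}(z_i) = \dfrac{\partial B(z_i)}{\partial \alpha}$.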
The ML estimators (MLEs) can be obtained by solving the likelihood Equations (6) and (7). The solution for these equations can be obtained using numerical methods such as the Newton–Raphson procedure. Alternative maximization techniques could also be applied, for instance, the proposal by MacDonald [28]. It is important to note that the computational cost of finding the ML estimates can be high in certain cases, as the equations depend on the incomplete beta function.
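To make the numerical maximisation concrete, the sketch below evaluates the log-likelihood obtained by taking logarithms of the PDF in Proposition 1, that is, $2n \log \alpha + (n\alpha/2) \log \theta - (\alpha+1) \sum \log z_i + \sum \log B(z_i)$, and maximises it with a simple compass (pattern) search on $(\log \theta, \log \alpha)$. The compass search is a derivative-free stand-in chosen for brevity, not the Newton–Raphson procedure mentioned above, and the quadrature, tolerances, and function names are assumptions of this illustration.

```python
import math


def incomplete_beta(w, a, b, panels=400):
    """Non-regularised incomplete beta B(w; a, b) by composite Simpson's rule."""
    h = w / panels
    total = 0.0
    for i in range(panels + 1):
        u = i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)
    return total * h / 3.0


def slr_loglik(theta, alpha, data):
    """SLR log-likelihood; returns -inf where the quadrature underflows."""
    n = len(data)
    total = 2.0 * n * math.log(alpha) + 0.5 * n * alpha * math.log(theta)
    for z in data:
        w = z * z / (theta + z * z)
        if w <= 0.0 or w >= 1.0:
            return -math.inf
        b = incomplete_beta(w, alpha / 2.0 + 1.0, alpha / 2.0)
        if b <= 0.0:
            return -math.inf
        total += -(alpha + 1.0) * math.log(z) + math.log(b)
    return total


def maximise_loglik(data, theta0, alpha0, step=0.5, tol=1e-3, max_iter=300):
    """Compass search on (log theta, log alpha), starting from (theta0, alpha0)."""
    x = [math.log(theta0), math.log(alpha0)]
    best = slr_loglik(math.exp(x[0]), math.exp(x[1]), data)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                value = slr_loglik(math.exp(trial[0]), math.exp(trial[1]), data)
                if value > best:
                    x, best, improved = trial, value, True
        if not improved:
            step *= 0.5  # shrink the pattern once no direction improves
    return math.exp(x[0]), math.exp(x[1]), best
```

Working on the log scale keeps both parameters positive without explicit constraints. In practice the moment estimates are natural starting values, and data on a large scale (for example survival times in days) may need rescaling before use.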

3.3. Observed Information Matrix

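The asymptotic variance of the MLEs, say $\hat{\psi} = (\hat{\theta}, \hat{\alpha})$, can be estimated using the Fisher information matrix, defined as $I(\psi) = -E\!\left[ \partial^2 \ell(\psi) / \partial \psi \, \partial \psi^{\top} \right]$, where $\ell(\psi)$ is the log-likelihood function of the SLR model provided in (5). Recall that under regularity conditions,
$$I(\psi)^{1/2} \left( \hat{\psi} - \psi \right) \xrightarrow{D} N_2(0_2, I_2), \quad \text{as } n \to +\infty,$$
where $\xrightarrow{D}$ represents convergence in distribution, and $N_2(0_2, I_2)$ denotes the standard bivariate normal distribution. The elements of the Hessian matrix $\partial^2 \ell(\psi) / \partial \psi \, \partial \psi^{\top}$ are denoted by $I_{\theta\theta} = \partial^2 \ell(\psi) / \partial \theta^2$, $I_{\alpha\theta} = \partial^2 \ell(\psi) / \partial \alpha \, \partial \theta$, and $I_{\alpha\alpha} = \partial^2 \ell(\psi) / \partial \alpha^2$. Explicitly, we have:
$$\begin{aligned}
I_{\theta\theta} &= -\frac{n \alpha}{2 \theta^2} + \sum_{i=1}^{n} \left[ \frac{B_{\theta\theta}(z_i)}{B(z_i)} - \frac{B_{\theta}^2(z_i)}{B^2(z_i)} \right], \\
I_{\alpha\theta} &= \frac{n}{2 \theta} + \sum_{i=1}^{n} \left[ \frac{B_{\theta\alpha}(z_i)}{B(z_i)} - \frac{B_{\theta}(z_i) B_{\alpha}(z_i)}{B^2(z_i)} \right], \\
I_{\alpha\alpha} &= -\frac{2n}{\alpha^2} + \sum_{i=1}^{n} \left[ \frac{B_{\alpha\alpha}(z_i)}{B(z_i)} - \frac{B_{\alpha}^2(z_i)}{B^2(z_i)} \right],
\end{aligned}$$
where $B_{\theta\theta}(z_i) = \dfrac{\partial^2 B(z_i)}{\partial \theta^2}$, $B_{\alpha\alpha}(z_i) = \dfrac{\partial^2 B(z_i)}{\partial \alpha^2}$, and $B_{\theta\alpha}(z_i) = \dfrac{\partial^2 B(z_i)}{\partial \theta \, \partial \alpha}$.
In practice, it is not possible to obtain the expected value of the previous expressions in closed form. Therefore, the covariance matrix of the MLEs, $I(\psi)^{-1}$, can be estimated consistently by $I(\hat{\psi})^{-1}$, where $I(\hat{\psi})$ denotes the observed information matrix, which is obtained as
$$I(\hat{\psi}) = -\left. \frac{\partial^2 \ell(\psi)}{\partial \psi \, \partial \psi^{\top}} \right|_{\psi = \hat{\psi}}.$$
The asymptotic variances of $\hat{\theta}$ and $\hat{\alpha}$ are estimated by the diagonal elements of $I(\hat{\psi})^{-1}$, and their standard errors by the square roots of the asymptotic variances.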

4. Simulation Study

Using the stochastic representation provided in (2), it is possible to generate random numbers from the SLR($\theta$, $\alpha$) distribution using Algorithm 1.
This scheme was used to perform two simulation studies. The first assessed the recovery of the parameters by the MLEs. The second evaluated the performance of different criteria in model selection.
Algorithm 1 Simulating values from the SLR($\theta$, $\alpha$) distribution
  1: Simulate $U_1, U_2 \sim U(0, 1)$ independently.
  2: Calculate $X = \sqrt{\theta \left[ (1 - U_1)^{-1/\alpha} - 1 \right]}$.
  3: Calculate $Z = X / U_2^{1/\alpha}$.
  4: Return Z, where $Z \sim \mathrm{SLR}(\theta, \alpha)$.
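The steps of Algorithm 1 can be sketched in Python as follows. Step 2 is the inverse-transform draw from the LR distribution, whose CDF is $1 - \theta^{\alpha} / (\theta + x^2)^{\alpha}$, and step 3 uses the fact that $U^{1/\alpha} \sim \mathrm{Beta}(\alpha, 1)$ when $U \sim U(0, 1)$. The function name and the seeding interface are illustrative choices.

```python
import math
import random


def simulate_slr(n, theta, alpha, seed=None):
    """Draw n values from SLR(theta, alpha) following Algorithm 1."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u1 = rng.random()        # in [0, 1), so 1 - u1 is never zero
        u2 = 1.0 - rng.random()  # in (0, 1], avoids division by zero below
        x = math.sqrt(theta * ((1.0 - u1) ** (-1.0 / alpha) - 1.0))  # X ~ LR(theta, alpha)
        y = u2 ** (1.0 / alpha)                                      # Y ~ Beta(alpha, 1)
        sample.append(x / y)
    return sample
```

For $\alpha > 1$ the sample mean of a large simulated sample can be compared with $\mu_1 = \alpha \theta^{1/2} M_1 / (\alpha - 1)$ as a sanity check. For small $\alpha$ the mean does not exist and such a check is not meaningful.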

4.1. Recovery Parameters

We used the following design to perform a simulation study evaluating the behavior of the MLEs for the SLR model in finite samples. For $\theta$, we fixed two values: 1 and 10. For $\alpha$, we fixed three values: 1, 2, and 3. For the sample size, we fixed three values: 100, 200, and 500. For each combination of $\theta$, $\alpha$, and n, we simulated 1000 replicates of that size and calculated the MLEs and their standard errors. Table 3 summarizes the mean bias of each estimator (bias), the mean of the standard errors (SE), the estimated root mean squared error (RMSE), and the coverage percentages at 95% (CP). Note that as the sample size increased, the bias, SE, and RMSE decreased, suggesting that the MLEs for the SLR model exhibited acceptable behavior, even in finite samples. Moreover, the SE and RMSE approached each other as the sample size increased, suggesting that the variance of the estimators was well estimated. Finally, the CP approached the nominal value as n increased, suggesting that the asymptotic normality of the MLEs for the SLR model was reasonable, even in finite samples.

4.2. Assessing Model Selection Criteria

In this section, we assess different model selection criteria, namely the Akaike information criterion (AIC) (see Akaike [29]), the Bayesian information criterion (BIC) (see Schwarz [30]), and the Vuong test [31], for deciding between the SLR model and competing models: the Lomax–Rayleigh (see Venegas et al. [9]), Weibull (W), inverse Gaussian (IG), and Slash Half-Normal (SHN) distributions. The data were drawn from the SLR model with the same parameter combinations as in the previous study. Table 4 reports the percentage of occurrences where the AIC and BIC favored the SLR model over the corresponding competitor. The Vuong test was used to test the hypothesis
$$H_0: f(z; \hat{\theta}, \hat{\alpha}) = g(z; \hat{\theta}^{*}, \hat{\alpha}^{*}) \quad \text{versus} \quad H_1: f(z; \hat{\theta}, \hat{\alpha}) \ne g(z; \hat{\theta}^{*}, \hat{\alpha}^{*}),$$
where $f(\cdot; \theta, \alpha)$ denotes the PDF of the SLR model, $\hat{\theta}$ and $\hat{\alpha}$ represent the MLEs for $\theta$ and $\alpha$ in the SLR model, $g(z; \theta^{*}, \alpha^{*})$ is the PDF of the competing model, and $\hat{\theta}^{*}$ and $\hat{\alpha}^{*}$ are the MLEs for that model. The decision is made at a 5% significance level. In Table 4, the row labeled “Vuong” corresponds to the percentage of cases where the Vuong test rejected the previously stated null hypothesis. In such cases, there would be evidence that the SLR model was preferable to the competing model (note that if the null hypothesis was not rejected, this does not mean that there was no preference for the alternative model, but rather that both models provided an equally good fit). First, as the sample size grows, the AIC, the BIC, and the Vuong test increasingly favor the SLR model. Second, the AIC and BIC discriminate well between the SLR and the W, IG, and SHN models. However, a considerable sample size is needed to differentiate between the SLR and LR models.
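The three criteria can be computed from fitted log-likelihoods as sketched below. The AIC and BIC use their standard definitions, $2k - 2\ell$ and $k \log n - 2\ell$. The Vuong statistic is written for non-nested models with the same number of parameters, as is the case for the two-parameter competitors considered here, so no parameter-count correction is applied. The function names and the interface (lists of pointwise log-densities at the MLEs) are assumptions of this sketch.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


def vuong_test(logf, logg):
    """Vuong statistic and two-sided p-value for two non-nested models.

    logf and logg hold the pointwise log-densities of each observation
    under model f and model g, both evaluated at their own MLEs.
    """
    n = len(logf)
    diffs = [a - b for a, b in zip(logf, logg)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    if var == 0.0:
        raise ValueError("log-density ratios have zero variance")
    stat = math.sqrt(n) * mean / math.sqrt(var)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value
```

A large positive statistic favors model f and a large negative one favors model g, while a p-value above the chosen level means the test does not distinguish the two fits.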

5. Applications

In this section, we present two relevant applications to illustrate the superior performance of the SLR model in comparison with other proposals in the literature.

5.1. Application 1

The first data set was drawn from a study carried out by the US Department of Veterans Affairs, which measured survival time (in days). The study included 137 patients with advanced lung cancer. This data set was presented by Kalbfleisch and Prentice [32] and is available in the “survival” package [33] of the R software (version 4.3.1), labeled as “veteran”. Table 5 shows the descriptive statistics of the data, where $b_1$ and $b_2$ are, respectively, the sample coefficients of asymmetry and kurtosis.
The proposed SLR model was compared with some distributions from the literature, using the AIC and BIC, as well as the Vuong test. A distribution fits the data better than another when its AIC and BIC values are lower. In addition, a p-value below 0.05 in the Vuong test suggests that the corresponding model produces a PDF different from that of the SLR model. The distributions compared with the model proposed in this work were the IG, LR, and Slash Fréchet (SFr) models (see Castillo et al. [34]).
The PDF of the SFr distribution is given by:
$$f(y; \theta, \alpha) = \frac{\alpha}{y^{\alpha+1}} \, \Gamma\!\left(1 - \frac{\alpha}{\theta}, \, y^{-\theta}\right), \quad y, \theta, \alpha > 0,$$
where $\theta > \alpha$, and $\Gamma(a, t) = \int_t^{\infty} w^{a-1} e^{-w} \, dw$ is the incomplete gamma function.
Table 6 shows the MLEs for the four models and their SEs (standard errors), as well as their AIC and BIC values, and the results of the Vuong test. Note that the AIC and BIC values were lower for the SLR distribution, and the results of the Vuong test suggest that the three models produced a different PDF compared to that of the SLR model for these data.
Figure 3 shows a histogram of the lung cancer data fitted to the SLR model and the empirical CDF. The graphs confirm the results in Table 6. Therefore, the SLR distribution provides a better fit for the lung cancer data compared to other distributions in the literature. To illustrate the differences in the possible decisions that can be made based on the four fitted models, we calculated the probability of a survival time of at least 4 months for patients of this type. The estimations (and their 95% confidence intervals computed using the delta method) were 0.343 (0.279–0.408) for the SLR, 0.336 (0.261–0.410) for LR, 0.319 (0.259–0.380) for SFr, and 0.279 (0.232–0.326) for IG models. Note that all the models underestimated the referred probability in comparison with the SLR model.
Using the results from Section 3.1, the moment estimates were computed and were as follows: $\hat{\theta}_M = 12{,}103.743$ and $\hat{\alpha}_M = 2.509$. These were used as the initial values for computing the ML estimates.

5.2. Application 2

In the second application, we included a comparison of the SLR model with the Slash Half-Normal (SHN) distribution (see Olmos et al. [15]), the PDF of which is given by:
$$f(z; \theta, \alpha) = \alpha \sqrt{\frac{2^{\alpha}}{\pi}} \, \theta^{\alpha} \, \Gamma\!\left(\frac{\alpha + 1}{2}\right) z^{-(\alpha+1)} \, G\!\left(z^2, \frac{\alpha + 1}{2}, \frac{1}{2 \theta^2}\right), \quad z, \theta, \alpha > 0,$$
where G is the CDF of the gamma distribution. The data set is available in the “survival” package [33] of the R software (version 4.3.1), labeled as “gbsg”, and contains data from a clinical trial performed between 1984 and 1989 by the German Breast Cancer Study Group (GBSG) on 686 node-positive breast cancer patients. In this study, the variable of interest was the number of positive lymph nodes in each patient (see Schumacher et al. [35] for a more detailed description). Table 7 shows the descriptive statistics of the data, where $b_1$ and $b_2$ are, respectively, the sample coefficients of asymmetry and kurtosis. Table 8 shows the MLEs for the four models and their SEs (standard errors), as well as their AIC and BIC values, and the results for the Vuong test. Again, the SLR model exhibited the lowest AIC and BIC values, and the results of the Vuong test suggest that the three competing models produced different PDFs compared to that of the SLR distribution. Figure 4 shows a histogram of the breast cancer data fitted to the SLR model and the empirical CDF. The graphs confirm the results in Table 8. Therefore, the SLR distribution provides a better fit for the breast cancer data compared to other distributions in the literature.
Using the results from Section 3.1, the moment estimates were computed and were as follows: $\hat{\theta}_M = 25.878$ and $\hat{\alpha}_M = 2.739$. These were used as the initial values for computing the ML estimates. Again, to illustrate the differences in the decisions obtained using these models, we show the estimated probability of finding more than 20 positive lymph nodes in patients of this kind. The estimates and their 95% confidence intervals were 0.045 (0.033–0.057) for the SLR, 0.043 (0.031–0.057) for the LR, 0.029 (0.018–0.041) for the SHN, and 0.014 (0.009–0.020) for the W models. In all cases, the competing models gave a lower estimate of this probability than the SLR model.

6. Final Discussion

This paper presents a study of the SLR distribution with two parameters. Some properties are shown, and the performance is compared to other known distributions with two parameters by fitting two data sets using ML estimations. The SLR distribution is a viable alternative for fitting data with positive asymmetry and atypical observations. Some other characteristics of the SLR distribution are:
  • The SLR distribution has a closed expression and depends on the incomplete beta function.
  • The SLR distribution has a heavy right tail.
  • The SLR distribution can also be represented as a scale mixture of the LR and beta distributions.
  • The CDF, hazard function, moments, and incomplete moments are explicit and are represented by known functions.
  • The asymmetry and kurtosis coefficients of the SLR distribution have greater ranges than the coefficients of the LR distribution.
  • The applications show that the SLR distribution is a good alternative when the data present positive asymmetry with a heavy right tail; this is confirmed by the AIC and BIC model selection criteria.

Author Contributions

Conceptualization, K.I.S., I.E.C. and H.W.G.; methodology, D.I.G. and H.W.G.; software, I.E.C.; formal analysis, K.I.S., H.W.G. and O.V.; investigation, K.I.S. and I.E.C.; writing—original draft preparation, K.I.S. and D.I.G.; writing—review and editing, O.V. and D.I.G.; funding acquisition, O.V. and H.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

The research of H.W. Gómez was supported by Semillero UA-2023.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in Applications 1 and 2 are available in the “survival” package [33] of the R software (version 4.3.1).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley: New York, NY, USA, 1995; Volume 1. [Google Scholar]
  2. Cordeiro, G.M.; Cristino, C.T.; Hashimoto, E.M.; Ortega, E.M. The beta generalized Rayleigh distribution with applications to lifetime data. Stat. Pap. 2013, 54, 133–161. [Google Scholar] [CrossRef]
  3. Olmos, N.M.; Gómez-Déniz, E.; Venegas, O. The Heavy-Tailed Gleser Model: Properties, Estimation, and Applications. Mathematics 2022, 10, 4577. [Google Scholar] [CrossRef]
  4. Zhao, J.; Ahmad, Z.; Mahmoudi, E.; Hafez, E.H.; El-Din, M.M.M. A New Class of Heavy-Tailed Distributions: Modeling and Simulating Actuarial Measures. Complexity 2021, 2021, 5580228. [Google Scholar] [CrossRef]
  5. Riad, F.H.; Radwan, A.; Almetwally, E.M.; Elgarhy, M. A new heavy tailed distribution with actuarial measures. J. Radiat. Res. Appl. Sci. 2023, 16, 100562. [Google Scholar] [CrossRef]
  6. Afify, A.Z.; Pescim, R.R.; Cordeiro, G.M.; Mahran, H.A. A New Heavy-Tailed Exponential Distribution: Inference, Regression Model and Applications. Pak. J. Stat. Oper. Res. 2023, 19, 395–411. [Google Scholar] [CrossRef]
  7. Cococcioni, M.; Fiorini, F.; Pagano, M. Modelling Heavy Tailed Phenomena Using a LogNormal Distribution Having a Numerically Verifiable Infinite Variance. Mathematics 2023, 11, 1758. [Google Scholar] [CrossRef]
  8. Xu, L.; Yao, Q.; Zhang, H. Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance Assumption. J. Mach. Learn. Res. 2023, 24, 1–46. [Google Scholar]
  9. Venegas, O.; Iriarte, Y.A.; Astorga, J.M.; Gómez, H.W. Lomax-Rayleigh Distribution with an Application. Appl. Math. Inf. Sci. 2019, 13, 741–748. [Google Scholar] [CrossRef]
  10. Rogers, W.H.; Tukey, J.W. Understanding some long-tailed symmetrical distributions. Stat. Neerl. 2019, 26, 211–226. [Google Scholar] [CrossRef]
  11. Mosteller, F.; Tukey, J.W. Data Analysis and Regression. A Second Course in Statistics; Addison-Wesley: Reading, MA, USA, 1977.
  12. Kafadar, K. A biweight approach to the one-sample problem. J. Am. Stat. Assoc. 1982, 77, 416–424.
  13. Wang, J.; Genton, M. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
  14. Gómez, H.W.; Quintana, F.A.; Torres, F.J. A new family of slash-distributions with elliptical contours. Stat. Probab. Lett. 2007, 77, 717–725.
  15. Olmos, N.M.; Varela, H.; Gómez, H.W.; Bolfarine, H. An extension of the half-normal distribution. Stat. Pap. 2012, 53, 875–886.
  16. Olmos, N.M.; Varela, H.; Bolfarine, H.; Gómez, H.W. An extension of the generalized half-normal distribution. Stat. Pap. 2014, 55, 967–981.
  17. Acitas, S.; Arslan, T.; Senoglu, B. Slash Maxwell Distribution: Definition, Modified Maximum Likelihood Estimation and Applications. Gazi Univ. J. Sci. 2020, 33, 249–263.
  18. Gómez, H.J.; Gallardo, D.I.; Santoro, K.I. Slash Truncation Positive Normal Distribution and its Estimation Based on the EM Algorithm. Symmetry 2021, 13, 2164.
  19. Barrios, L.; Gómez, Y.M.; Venegas, O.; Barranco-Chamorro, I.; Gómez, H.W. The Slashed Power Half-Normal Distribution with Applications. Mathematics 2022, 10, 1528.
  20. Arendarczyk, M.; Kozubowski, T.J.; Panorska, A.K. Slash distributions, generalized convolutions, and extremes. Ann. Inst. Stat. Math. 2023, 74, 593–617.
  21. Rolski, T.; Schmidli, H.; Schmidt, V.; Teugels, J. Stochastic Processes for Insurance and Finance; John Wiley & Sons: Hoboken, NJ, USA, 1999.
  22. Lorenz, M.O. Methods of measuring the concentration of wealth. J. Am. Stat. Assoc. 1905, 9, 209–219.
  23. Gini, C. On the measurement of concentration and variability of characters. Metron 2005, 63, 1–38.
  24. Gini, C. Measurement of inequality of incomes. Econ. J. 1921, 31, 124–126.
  25. Balakrishnan, N.; Cohen, C.A. Order Statistics and Inference: Estimation Methods; Statistical Modeling and Decision Science; Elsevier Science: Amsterdam, The Netherlands, 1991.
  26. Casella, G.; Berger, R.L. Statistical Inference; Duxbury: Pacific Grove, CA, USA, 2002.
  27. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 12 October 2023).
  28. MacDonald, I.L. Does Newton-Raphson really fail? Stat. Methods Med. Res. 2014, 23, 308–311.
  29. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  30. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  31. Vuong, Q.H. Likelihood Ratio Tests for Model Selection and non-nested Hypotheses. Econometrica 1989, 57, 307–333.
  32. Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data; John Wiley and Sons: New York, NY, USA, 1980.
  33. Therneau, T. A Package for Survival Analysis in R; R Package Version 3.5-7; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://cran.r-project.org/package=survival (accessed on 12 March 2023).
  34. Castillo, J.S.; Rojas, M.A.; Reyes, J. A More Flexible Extension of the Fréchet Distribution Based on the Incomplete Gamma Function and Applications. Symmetry 2023, 15, 1608.
  35. Schumacher, M.; Bastert, G.; Bojar, H.; Hübner, K.; Olschewski, M.; Sauerbrei, W.; Schmoor, C.; Beyerle, C.; Neumann, R.L.; Rauschecker, H.F. Randomized 2×2 trial evaluating hormonal treatment and the duration of chemotherapy in node-positive breast cancer patients. German Breast Cancer Study Group. J. Clin. Oncol. 1994, 12, 2086–2093.
Figure 1. (a) PDF, (b) CDF, and (c) hazard function of the SLR(θ, α) distribution for different combinations of θ and α.
Figure 2. (a) Skewness coefficient and (b) kurtosis coefficient for the SLR(θ, α) model.
Figure 3. SLR model fitted using the ML method for the veteran data.
Figure 4. SLR model fitted using the ML method for the gbsg data.
Table 1. Some skewness and kurtosis values for different values of the parameter α.

| α | 5 | 5.5 | 6 | 7 | 10 | 20 |
|---|---|---|---|---|---|---|
| β₁ | 2.142 | 1.826 | 1.613 | 1.349 | 1.018 | 0.777 |
| β₂ | 19.029 | 12.727 | 9.763 | 7.065 | 4.796 | 3.718 |
Table 2. Numerical values of the mode for θ = 1 associated with the SLR distribution.

| α | Mode |
|---|---|
| 1 | 0.824 |
| 3 | 0.462 |
| 5 | 0.348 |
| 7 | 0.288 |
| 9 | 0.251 |
| 12 | 0.215 |
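Table 3. Estimated bias, SE, RMSE, and coverage probability (CP) of the MLEs for the SLR model in finite samples.

| θ | α | Estim. | Bias (n = 100) | SE (n = 100) | RMSE (n = 100) | CP (n = 100) | Bias (n = 200) | SE (n = 200) | RMSE (n = 200) | CP (n = 200) | Bias (n = 500) | SE (n = 500) | RMSE (n = 500) | CP (n = 500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | θ̂ | 0.1437 | 0.4563 | 0.5292 | 0.8970 | 0.0418 | 0.2929 | 0.3174 | 0.9100 | 0.0175 | 0.1730 | 0.1924 | 0.9240 |
| 1 | 1 | α̂ | 0.0270 | 0.1287 | 0.1380 | 0.9210 | 0.0083 | 0.0883 | 0.0902 | 0.9250 | 0.0008 | 0.0534 | 0.0566 | 0.9350 |
| 1 | 2 | θ̂ | 0.1739 | 0.4910 | 0.6288 | 0.9340 | 0.0793 | 0.3130 | 0.3483 | 0.9450 | 0.0278 | 0.1858 | 0.1968 | 0.9470 |
| 1 | 2 | α̂ | 0.1136 | 0.3858 | 0.4428 | 0.9500 | 0.0587 | 0.2576 | 0.2832 | 0.9520 | 0.0181 | 0.1561 | 0.1622 | 0.9490 |
| 1 | 3 | θ̂ | 0.2152 | 0.5945 | 0.7354 | 0.9300 | 0.1068 | 0.3532 | 0.3980 | 0.9480 | 0.0399 | 0.2046 | 0.2087 | 0.9590 |
| 1 | 3 | α̂ | 0.2703 | 0.8385 | 1.0074 | 0.9620 | 0.1357 | 0.5158 | 0.5614 | 0.9560 | 0.0530 | 0.3061 | 0.3159 | 0.9550 |
| 10 | 1 | θ̂ | 1.0928 | 4.6147 | 4.9835 | 0.9200 | 0.6209 | 3.0987 | 3.1515 | 0.9430 | 0.3235 | 1.9047 | 1.9843 | 0.9490 |
| 10 | 1 | α̂ | 0.0215 | 0.1316 | 0.1361 | 0.9560 | 0.0091 | 0.0904 | 0.0908 | 0.9460 | 0.0075 | 0.0569 | 0.0570 | 0.9480 |
| 10 | 2 | θ̂ | 1.8062 | 4.9530 | 6.2205 | 0.9420 | 0.6775 | 3.0951 | 3.3442 | 0.9550 | 0.3048 | 1.8784 | 1.9640 | 0.9510 |
| 10 | 2 | α̂ | 0.1246 | 0.3904 | 0.4693 | 0.9610 | 0.0508 | 0.2559 | 0.2748 | 0.9540 | 0.0200 | 0.1571 | 0.1630 | 0.9480 |
| 10 | 3 | θ̂ | 2.3098 | 6.0022 | 7.1093 | 0.9380 | 1.0098 | 3.5086 | 3.6259 | 0.9560 | 0.3798 | 2.0444 | 2.1095 | 0.9540 |
| 10 | 3 | α̂ | 0.3110 | 0.8537 | 1.0266 | 0.9570 | 0.1285 | 0.5135 | 0.5295 | 0.9550 | 0.0468 | 0.3054 | 0.3111 | 0.9520 |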
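As a rough illustration of how summaries such as those in Table 3 could be obtained from simulation output, the following Python sketch computes the bias, mean standard error, RMSE, and coverage for one parameter. The definitions used here are assumptions, since this part of the text does not define them: SE is taken as the average of the estimated standard errors and CP as the empirical coverage of 95% Wald intervals. The function name `summarize_replicates` and the toy inputs are hypothetical and are not results from the paper.

```python
import math

def summarize_replicates(estimates, std_errors, true_value, z=1.96):
    """Summarize Monte Carlo replicates of a single estimator.

    estimates  : point estimates, one per simulated sample
    std_errors : estimated standard errors, one per simulated sample
    true_value : parameter value used to generate the samples
    z          : normal quantile for the Wald interval (1.96 for 95%)
    """
    m = len(estimates)
    bias = sum(estimates) / m - true_value
    mean_se = sum(std_errors) / m
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    # Count the replicates whose Wald interval contains the true value.
    covered = sum(
        1 for e, s in zip(estimates, std_errors)
        if e - z * s <= true_value <= e + z * s
    )
    return {"bias": bias, "se": mean_se, "rmse": rmse, "cp": covered / m}

# Toy replicates (made-up numbers, for illustration only).
toy = summarize_replicates([1.1, 0.9, 1.2, 0.8], [0.1, 0.1, 0.1, 0.1], 1.0)
print(toy)
```

In a real study, the two input lists would hold the MLEs and their estimated standard errors from each simulated sample.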
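Table 4. Proportion of cases where the AIC and BIC selected the SLR model over the indicated model. The Vuong rows give the proportion of instances where the null hypothesis that the log-likelihood of the SLR model was equal to that of the indicated model was rejected.

| θ | α | Criterion | LR (n = 100) | W (n = 100) | IG (n = 100) | SHN (n = 100) | LR (n = 200) | W (n = 200) | IG (n = 200) | SHN (n = 200) | LR (n = 500) | W (n = 500) | IG (n = 500) | SHN (n = 500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | AIC | 0.64 | 1.00 | 0.84 | 0.92 | 0.64 | 1.00 | 0.95 | 0.97 | 0.65 | 1.00 | 1.00 | 1.00 |
| 1 | 1 | BIC | 0.64 | 1.00 | 0.84 | 0.92 | 0.64 | 1.00 | 0.95 | 0.97 | 0.65 | 1.00 | 1.00 | 1.00 |
| 1 | 1 | Vuong | 0.33 | 0.89 | 0.20 | 0.45 | 0.35 | 1.00 | 0.43 | 0.59 | 0.39 | 1.00 | 0.83 | 0.86 |
| 1 | 2 | AIC | 0.47 | 0.98 | 0.84 | 0.97 | 0.48 | 1.00 | 0.94 | 1.00 | 0.50 | 1.00 | 1.00 | 1.00 |
| 1 | 2 | BIC | 0.47 | 0.98 | 0.84 | 0.97 | 0.48 | 1.00 | 0.94 | 1.00 | 0.50 | 1.00 | 1.00 | 1.00 |
| 1 | 2 | Vuong | 0.38 | 0.59 | 0.18 | 0.60 | 0.41 | 0.84 | 0.30 | 0.78 | 0.45 | 1.00 | 0.69 | 0.98 |
| 1 | 3 | AIC | 0.44 | 0.91 | 0.86 | 0.98 | 0.49 | 0.98 | 0.97 | 1.00 | 0.49 | 1.00 | 1.00 | 1.00 |
| 1 | 3 | BIC | 0.44 | 0.91 | 0.86 | 0.98 | 0.49 | 0.98 | 0.97 | 1.00 | 0.49 | 1.00 | 1.00 | 1.00 |
| 1 | 3 | Vuong | 0.41 | 0.32 | 0.17 | 0.64 | 0.44 | 0.55 | 0.30 | 0.88 | 0.47 | 0.89 | 0.73 | 0.99 |
| 10 | 1 | AIC | 0.62 | 1.00 | 0.83 | 0.94 | 0.61 | 1.00 | 0.96 | 0.97 | 0.64 | 1.00 | 1.00 | 1.00 |
| 10 | 1 | BIC | 0.62 | 1.00 | 0.83 | 0.94 | 0.61 | 1.00 | 0.96 | 0.97 | 0.64 | 1.00 | 1.00 | 1.00 |
| 10 | 1 | Vuong | 0.31 | 0.91 | 0.21 | 0.48 | 0.32 | 0.99 | 0.47 | 0.60 | 0.37 | 1.00 | 0.84 | 0.89 |
| 10 | 2 | AIC | 0.48 | 0.97 | 0.83 | 0.97 | 0.52 | 1.00 | 0.94 | 1.00 | 0.51 | 1.00 | 1.00 | 1.00 |
| 10 | 2 | BIC | 0.48 | 0.97 | 0.83 | 0.97 | 0.52 | 1.00 | 0.94 | 1.00 | 0.51 | 1.00 | 1.00 | 1.00 |
| 10 | 2 | Vuong | 0.35 | 0.60 | 0.19 | 0.58 | 0.37 | 0.85 | 0.32 | 0.80 | 0.38 | 1.00 | 0.71 | 0.99 |
| 10 | 3 | AIC | 0.41 | 0.92 | 0.85 | 0.98 | 0.45 | 0.98 | 0.96 | 1.00 | 0.47 | 1.00 | 1.00 | 1.00 |
| 10 | 3 | BIC | 0.41 | 0.92 | 0.85 | 0.98 | 0.45 | 0.98 | 0.96 | 1.00 | 0.47 | 1.00 | 1.00 | 1.00 |
| 10 | 3 | Vuong | 0.39 | 0.30 | 0.18 | 0.64 | 0.42 | 0.53 | 0.34 | 0.85 | 0.46 | 0.90 | 0.72 | 1.00 |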
Table 5. Descriptive statistics for the data set of patients with lung cancer.

| n | X̄ | S² | b₁ | b₂ |
|---|---|---|---|---|
| 137 | 121.628 | 24,906.12 | 3.092 | 15.554 |
Table 6. Estimated parameters and their standard errors (in parentheses) for the SLR, LR, SFr, and IG models for the lung cancer data set. The AIC and BIC values are also presented.

| Parameter | SLR | LR | SFr | IG |
|---|---|---|---|---|
| θ̂ | 630.939 (223.857) | 1043.268 (855.508) | 0.772 (0.110) | 24.443 (2.953) |
| α̂ | 1.029 (0.112) | 0.497 (0.142) | 0.339 (0.037) | 121.829 (23.277) |
| Log-likelihood | −802.71 | −804.49 | −872.47 | −816.67 |
| AIC | 1609.43 | 1612.99 | 1748.95 | 1637.35 |
| BIC | 1615.27 | 1618.83 | 1754.79 | 1643.19 |
| Vuong test (p-value) | | −2.60 (0.009) | −13.81 (<0.001) | −2.05 (0.040) |
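The AIC and BIC rows of Table 6 can be approximately reproduced from the log-likelihood row with the usual definitions AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ. The Python sketch below does this under two assumptions: k = 2 parameters per model (each column reports two estimates) and n = 137 (the sample size in Table 5). The log-likelihoods are rounded to two decimals, so the results can differ from the table by about 0.01.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: k*ln(n) - 2*loglik."""
    return k * math.log(n) - 2 * loglik

n = 137  # sample size reported in Table 5
k = 2    # assumed number of estimated parameters per model

# Rounded log-likelihoods copied from Table 6.
logliks = {"SLR": -802.71, "LR": -804.49, "SFr": -872.47, "IG": -816.67}

criteria = {name: (aic(ll, k), bic(ll, k, n)) for name, ll in logliks.items()}
for name, (a, b) in criteria.items():
    print(f"{name}: AIC = {a:.2f}, BIC = {b:.2f}")
```

Smaller values indicate a better trade-off between fit and complexity, so the SLR column is preferred under both criteria.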
Table 7. Descriptive statistics for the breast cancer patients data set.

| n | X̄ | S² | b₁ | b₂ |
|---|---|---|---|---|
| 686 | 5.0102 | 29.981 | 2.878 | 16.208 |
Table 8. Estimated parameters and their standard errors (in parentheses) for the SLR, LR, SHN, and W models for the breast cancer data set. The AIC and BIC are also presented.

| Parameter | SLR | LR | SHN | W |
|---|---|---|---|---|
| θ̂ | 4.715 (0.777) | 5.584 (0.789) | 3.250 (0.238) | 5.156 (0.196) |
| α̂ | 1.493 (0.089) | 0.730 (0.052) | 1.926 (0.203) | 1.067 (0.029) |
| Log-likelihood | −1748.00 | −1749.81 | −1790.61 | −1788.88 |
| AIC | 3500.00 | 3503.61 | 3585.22 | 3581.76 |
| BIC | 3509.07 | 3512.68 | 3594.28 | 3590.82 |
| Vuong test (p-value) | | −4.18 (<0.001) | −6.88 (<0.001) | −3.78 (<0.001) |
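The p-values reported beside the Vuong statistics in Tables 6 and 8 agree with two-sided p-values from a standard normal reference distribution. This is an inference from the reported numbers, not something stated in this part of the text. The Python sketch below shows the conversion. The helper name `two_sided_normal_pvalue` is hypothetical.

```python
import math

def two_sided_normal_pvalue(z):
    """Two-sided p-value 2 * P(Z >= |z|) for a standard normal Z."""
    # erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)) for x >= 0.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Vuong statistics copied from Table 6 (lung cancer) and Table 8 (breast cancer).
table6 = {"LR": -2.60, "SFr": -13.81, "IG": -2.05}
table8 = {"LR": -4.18, "SHN": -6.88, "W": -3.78}

pvalues6 = {m: two_sided_normal_pvalue(z) for m, z in table6.items()}
pvalues8 = {m: two_sided_normal_pvalue(z) for m, z in table8.items()}

for label, pvals in (("Table 6", pvalues6), ("Table 8", pvalues8)):
    for model, p in pvals.items():
        print(f"{label}, SLR vs {model}: p = {p:.4f}")
```

Under this reading, every comparison rejects equality of fit at the 5% level.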

Share and Cite

MDPI and ACS Style

Santoro, K.I.; Gallardo, D.I.; Venegas, O.; Cortés, I.E.; Gómez, H.W. A Heavy-Tailed Distribution Based on the Lomax–Rayleigh Distribution with Applications to Medical Data. Mathematics 2023, 11, 4626. https://doi.org/10.3390/math11224626

