Article

A New More Flexible Class of Distributions on (0,1): Properties and Applications to Univariate Data and Quantile Regression

Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1270300, Chile
*
Author to whom correspondence should be addressed.
Symmetry 2023, 15(2), 267; https://doi.org/10.3390/sym15020267
Submission received: 26 December 2022 / Revised: 14 January 2023 / Accepted: 16 January 2023 / Published: 18 January 2023

Abstract

In this paper, we present a new, more flexible class of distributions with support on the interval (0,1), which has heavier tails than other distributions on the same domain, such as the Beta, Kumaraswamy, and Weibull Unitary distributions. The new distribution, which we call the Generalized Unitary Weibull distribution, is obtained as a transformation of two independent random variables with Weibull distributions. Considering a particular case, we obtain a two-parameter alternative to the Beta, Kumaraswamy, and Weibull Unitary distributions, which we call the type 2 unitary Weibull distribution. The probability density function, cumulative distribution function, survival function, hazard rate, and some important properties that support inference are provided. We carry out a simulation study using the maximum likelihood method and analyze the behavior of the parameter estimates. By way of illustration, real data are used to show the flexibility of the new distribution by comparing it with other distributions known in the literature. Finally, we present a quantile regression application in which the proposed distribution fits better than the competing distributions.

1. Introduction

There are various probability distributions with support on (0,1). One of the most widely used is the Beta distribution, a family of continuous probability distributions defined on the interval (0,1) with two positive shape parameters, usually denoted by $\alpha$ and $\beta$.
In Bayesian inference, the Beta distribution is commonly used as the conjugate prior distribution for the Bernoulli, binomial, negative binomial, and geometric distributions. For example, the Beta distribution can be used in Bayesian analysis to describe prior knowledge about the probability of success. In addition, it is a density that is often used to model data expressed as percentages and proportions.
The usual formulation of the Beta distribution is also known as the type I Beta distribution, whose density function is:
$$f_X(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1},$$
where $\alpha, \beta > 0$ are shape parameters and $0 < x < 1$. We denote this by writing $X \sim Beta(\alpha, \beta)$.
A distribution similar to the Beta is the Kumaraswamy distribution [1], which is simpler in the sense that its cumulative distribution function has a closed-form inverse, so that both simulated values and quantiles are easy to obtain. Its density is defined by:
$$f_X(x) = \alpha\beta\, x^{\alpha-1}(1-x^{\alpha})^{\beta-1},$$
where $\alpha, \beta > 0$ are the shape parameters and $0 < x < 1$. We denote this by writing $X \sim KW(\alpha, \beta)$.
Mazucheli et al. [2] introduced the Unitary Weibull distribution and presented some inferential procedures for it. Mazucheli et al. [3] presented a unitary version of the Weibull distribution as an alternative to the KW distribution for modeling quantiles conditional on covariates. The stochastic representation of a Unitary Weibull distribution is $V = e^{-X}$, with $X \sim Weibull(\alpha, \beta)$, denoted by $V \sim UW(\alpha, \beta)$, which has the density function:
$$f_V(v) = \frac{1}{v}\,\alpha\beta\,[-\log(v)]^{\beta-1}\exp\left\{-\alpha[-\log(v)]^{\beta}\right\}, \quad 0 < v < 1.$$
In this paper, the Generalized Unitary Weibull distribution of a random variable $Y$ is presented, based on a transformation of two independent random variables with distributions $Weibull(\theta_1, \alpha)$ and $Weibull(\theta_2, \beta)$, denoted by $Y \sim GUW(\theta_1, \theta_2, \alpha, \beta)$. In particular, we study the case $\theta_1 = \theta_2 = \theta$ and $\alpha = 1$, which we call the Weibull Unitary distribution type 2, denoted by $UW2(\theta, \beta)$, where $\theta, \beta > 0$.
The article is organized as follows. In Section 2, we provide the stochastic representation and the pdf of a random variable with the GUW distribution, present some of its properties, and introduce the UW2 distribution as a particular case. The cumulative distribution function (cdf), quantiles, reliability and hazard rate functions, moments, and skewness and kurtosis coefficients are also provided, together with some further statistical properties and the canonical type 2 unitary Weibull distribution. In Section 3, inference is addressed through maximum likelihood estimation and a simulation study of the parameter estimates. In Section 4, the Beta, KW, UW, and UW2 distributions are fitted to real data sets. In Section 5, a discussion and the main conclusions are presented.

2. The Generalized Unitary Weibull Family of Distributions

A random variable $Y$ has a Generalized Unitary Weibull distribution with parameters $\theta_1, \theta_2, \alpha, \beta > 0$, denoted by $Y \sim GUW(\theta_1, \theta_2, \alpha, \beta)$, if its stochastic representation is:
$$Y = \frac{X_1}{X_1 + X_2}, \tag{4}$$
where $X_1 \sim Weibull(\theta_1, \alpha)$ and $X_2 \sim Weibull(\theta_2, \beta)$ are independent random variables. Its density function is presented below.

2.1. Density Function

Proposition 1.
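Let $Y \sim GUW(\theta_1, \theta_2, \alpha, \beta)$. Then the density function of $Y$ is:
$$f_Y(y) = \frac{\theta_1\theta_2\, y^{\theta_1-1}}{\beta^{\theta_2}\alpha^{\theta_1}(1-y)^{\theta_1+1}} \int_0^{\infty} w^{\theta_1+\theta_2-1} \exp\left\{-\left[\left(\frac{yw}{\alpha(1-y)}\right)^{\theta_1} + \left(\frac{w}{\beta}\right)^{\theta_2}\right]\right\} dw, \tag{5}$$
where $0 < y < 1$ and $\theta_1, \theta_2, \alpha, \beta > 0$.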
Proof. 
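Using the stochastic representation provided in (4), we have that:
$$X_1 \sim Weibull(\theta_1, \alpha) \;\Rightarrow\; f_{X_1}(x) = \frac{\theta_1}{\alpha}\left(\frac{x}{\alpha}\right)^{\theta_1-1}\exp\left\{-\left(\frac{x}{\alpha}\right)^{\theta_1}\right\}, \quad x > 0,$$
$$X_2 \sim Weibull(\theta_2, \beta) \;\Rightarrow\; f_{X_2}(x) = \frac{\theta_2}{\beta}\left(\frac{x}{\beta}\right)^{\theta_2-1}\exp\left\{-\left(\frac{x}{\beta}\right)^{\theta_2}\right\}, \quad x > 0,$$
are independent random variables and, using the Jacobian of the transformation, it follows that:
$$\left.\begin{aligned} y &= \frac{x_1}{x_1+x_2}\\ w &= x_2 \end{aligned}\right\} \Rightarrow \left.\begin{aligned} x_1 &= \frac{yw}{1-y}\\ x_2 &= w \end{aligned}\right\} \Rightarrow J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y} & \dfrac{\partial x_1}{\partial w}\\[2mm] \dfrac{\partial x_2}{\partial y} & \dfrac{\partial x_2}{\partial w} \end{vmatrix} = \begin{vmatrix} \dfrac{w}{(1-y)^2} & \dfrac{y}{1-y}\\[2mm] 0 & 1 \end{vmatrix} = \frac{w}{(1-y)^2}.$$
Hence,
$$f_{Y,W}(y,w) = |J|\, f_{X_1,X_2}\left(\frac{yw}{1-y}, w\right) = \frac{w}{(1-y)^2}\, f_{X_1}\left(\frac{yw}{1-y}\right) f_{X_2}(w) = \frac{\theta_1\theta_2\, y^{\theta_1-1} w^{\theta_1+\theta_2-1}}{\beta^{\theta_2}\alpha^{\theta_1}(1-y)^{\theta_1+1}} \exp\left\{-\left[\left(\frac{yw}{\alpha(1-y)}\right)^{\theta_1} + \left(\frac{w}{\beta}\right)^{\theta_2}\right]\right\},$$
for $0 < y < 1$ and $w > 0$.
Therefore,
$$f_Y(y) = \frac{\theta_1\theta_2\, y^{\theta_1-1}}{\beta^{\theta_2}\alpha^{\theta_1}(1-y)^{\theta_1+1}} \int_0^{\infty} w^{\theta_1+\theta_2-1} \exp\left\{-\left[\left(\frac{yw}{\alpha(1-y)}\right)^{\theta_1} + \left(\frac{w}{\beta}\right)^{\theta_2}\right]\right\} dw,$$
where $0 < y < 1$. □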
Now, we provide some elementary properties.
Proposition 2.
Let $Y \sim GUW(\theta_1, \theta_2, \alpha, \beta)$. Then:
1. If $\theta_1 = \theta_2 = \alpha = \beta = 1$, then $Y \sim U(0,1)$, where $U$ denotes the uniform distribution on (0,1).
2. If $\theta_1 = \theta_2 = \theta$ and $\alpha = \beta = 1$, then $f_Y$ is symmetric.
3. If $\theta_1 = \theta_2 = \alpha = 1$, then $f_Y(y) = \dfrac{\beta}{[1 + (\beta-1)y]^2}$.
Proof. 
Let $Y \sim GUW(\theta_1, \theta_2, \alpha, \beta)$, whose density is given in Proposition 1.
1. The result is obtained by replacing $\theta_1 = \theta_2 = \alpha = \beta = 1$ in the density of $Y$, which gives $Y \sim U(0,1)$.
2. If $\theta_1 = \theta_2 = \theta$ and $\alpha = \beta = 1$, then:
$$f_Y(y) = \frac{\theta\, y^{\theta-1}(1-y)^{\theta-1}}{[y^{\theta} + (1-y)^{\theta}]^2}, \quad 0 < y < 1.$$
Then $f_Y(y) = f_Y(1-y)$.
3. The result follows from substituting $\theta_1 = \theta_2 = \alpha = 1$ into the density of $Y$.
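As an informal numerical check of the stochastic representation in (4) and of items 1 and 3 of Proposition 2, the following minimal Python sketch simulates $Y = X_1/(X_1+X_2)$ and compares empirical proportions with the values implied by the proposition. The function names `sample_guw` and `empirical_cdf`, the seeds, and the sample sizes are illustrative assumptions and are not part of the original article. The sketch assumes that $Weibull(\theta, \alpha)$ has shape $\theta$ and scale $\alpha$, as in the densities used in the proof of Proposition 1.

```python
import random


def sample_guw(theta1, theta2, alpha, beta, n, seed=0):
    """Draw n values of Y = X1 / (X1 + X2), with independent
    X1 ~ Weibull(shape=theta1, scale=alpha) and
    X2 ~ Weibull(shape=theta2, scale=beta)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape), in that order.
        x1 = rng.weibullvariate(alpha, theta1)
        x2 = rng.weibullvariate(beta, theta2)
        sample.append(x1 / (x1 + x2))
    return sample


def empirical_cdf(sample, t):
    """Proportion of sample values that are <= t."""
    return sum(1 for y in sample if y <= t) / len(sample)


# Item 1: with all parameters equal to 1 the sample should look uniform,
# so the proportion of values below 0.25 should be close to 0.25.
uniform_like = sample_guw(1, 1, 1, 1, n=100_000, seed=1)
print(empirical_cdf(uniform_like, 0.25))

# Item 3: theta1 = theta2 = alpha = 1 and beta = 5. Integrating the density
# beta / [1 + (beta - 1) y]^2 gives the cdf beta * t / (1 + (beta - 1) * t).
beta = 5.0
skewed = sample_guw(1, 1, 1, beta, n=100_000, seed=2)
t = 0.5
print(empirical_cdf(skewed, t), beta * t / (1 + (beta - 1) * t))
```

The two numbers printed on the last line should agree up to Monte Carlo error, which is of order $10^{-3}$ for this sample size.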

2.2. Density Function of the Unitary Weibull Distribution Type 2

Definition 1.
Setting $\theta_1 = \theta_2 = \theta$ and $\alpha = 1$ in (5), the density function of $Y$ is given by:
$$f_Y(y) = \frac{\theta\beta^{\theta}\, y^{\theta-1}(1-y)^{\theta-1}}{[(\beta y)^{\theta} + (1-y)^{\theta}]^2}, \quad 0 < y < 1,$$
which we call the Weibull Unitary distribution type 2, denoted by $Y \sim UW2(\theta, \beta)$.
Figure 1 compares pdfs of the UW2 distribution with those of the Beta distribution. For some parameter values the two distributions are very similar, while for others they differ considerably.
Figure 2 shows the pdfs of the UW2 distribution for $\beta = 2$ and different values of $\theta$.
Proposition 3.
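Let $Y \sim UW2(\theta, \beta)$. Then, the cdf of $Y$ is given by:
$$F_Y(t) = \left[1 + \left(\frac{1-t}{\beta t}\right)^{\theta}\right]^{-1}, \quad 0 < t < 1.$$
Proof. 
$$F_Y(t) = \int_0^t f_Y(y)\,dy = \int_0^t \frac{\theta\beta^{\theta}\, y^{\theta-1}(1-y)^{\theta-1}}{[(\beta y)^{\theta} + (1-y)^{\theta}]^2}\,dy = \theta\beta\int_0^t \frac{(\beta y)^{\theta-1}(1-y)^{\theta-1}}{(\beta y)^{2\theta}\left[1 + \left(\frac{1-y}{\beta y}\right)^{\theta}\right]^2}\,dy = \theta\beta\int_0^t \frac{\left(\frac{1-y}{\beta y}\right)^{\theta-1}}{(\beta y)^2\left[1 + \left(\frac{1-y}{\beta y}\right)^{\theta}\right]^2}\,dy.$$
Performing the change of variable $u = \dfrac{1-y}{\beta y}$ and evaluating the integral, we obtain the result. □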
Corollary 1.
Let $Y \sim UW2(\theta, \beta)$. Then the quantile function of $Y$ is given by:
$$t = \left[1 + \beta\left(\frac{1}{p} - 1\right)^{1/\theta}\right]^{-1}, \quad 0 < p < 1.$$
Proof. 
Solving $p = F_Y(t)$ for $t$ gives the result. □
In Figure 3, we graphically illustrate the behavior of the cumulative distribution function of the UW2 distribution for different values of $\theta$ and $\beta = 2$.
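The density in Definition 1, the cdf in Proposition 3, and the quantile function in Corollary 1 translate directly into code. The sketch below is only an illustration: the function names `uw2_pdf`, `uw2_cdf`, and `uw2_quantile` and the parameter values are assumptions made here, not part of the original article. It checks that the quantile function inverts the cdf and that a numerical integral of the pdf reproduces the cdf.

```python
def uw2_pdf(y, theta, beta):
    """Density of UW2(theta, beta) for 0 < y < 1."""
    num = theta * beta**theta * y**(theta - 1) * (1 - y)**(theta - 1)
    den = ((beta * y)**theta + (1 - y)**theta) ** 2
    return num / den


def uw2_cdf(t, theta, beta):
    """Cumulative distribution function for 0 < t < 1."""
    return 1.0 / (1.0 + ((1 - t) / (beta * t))**theta)


def uw2_quantile(p, theta, beta):
    """Quantile function for 0 < p < 1 (inverse of uw2_cdf)."""
    return 1.0 / (1.0 + beta * (1.0 / p - 1.0)**(1.0 / theta))


theta, beta = 2.0, 3.0

# Round trip: F(Q(p)) should return p.
for p in (0.1, 0.5, 0.9):
    print(p, uw2_cdf(uw2_quantile(p, theta, beta), theta, beta))

# Midpoint-rule integral of the pdf over (0, 0.3) versus the cdf at 0.3.
n = 20_000
h = 0.3 / n
area = h * sum(uw2_pdf((i + 0.5) * h, theta, beta) for i in range(n))
print(area, uw2_cdf(0.3, theta, beta))
```

Setting $p = 1/2$ in the quantile function shows that the median is $1/(1+\beta)$ for every $\theta$, which is a convenient sanity check when implementing these functions.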

2.3. The Reliability, Hazard Rate Functions and Increasing Failure Rate

Two important measures of reliability are the reliability function and the hazard (failure) rate function. The reliability function of a random variable $Y$ is defined by $S_Y(t) = 1 - F_Y(t)$, where $F_Y$ denotes the cdf of $Y$. The hazard rate function is defined by $h_Y(t) = f_Y(t)/(1 - F_Y(t))$. For the UW2 distribution, as a direct consequence of Proposition 3, both reliability measures can be expressed in closed form; the hazard rate is given in Proposition 4.
Table 1 shows that the UW2 distribution captures outlying values better than the UW, KW, and Beta distributions, since its reliability is higher in the upper tail.
Proposition 4.
Let $Y \sim UW2(\theta, \beta)$. Then, the hazard rate function of $Y$ is given by:
$$h(t) = \frac{\theta\beta^{\theta}\, t^{\theta-1}}{(1-t)\,[(\beta t)^{\theta} + (1-t)^{\theta}]}. \tag{15}$$
Proof. 
$$h_Y(t) = f_Y(t)/(1 - F_Y(t)).$$
Replacing $f_Y(t)$ and $F_Y(t)$ gives the result. □
Figure 4 shows the hazard rate function of the UW2 distribution for different values of $\theta$ and $\beta = 2$. The graphical representation shows that it takes a wide variety of shapes. Therefore, the new family of distributions is flexible enough to model real data sets.
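The closed forms for the reliability and hazard rate functions can be evaluated directly. The following sketch reproduces the UW2(1,5) column of Table 1 and checks that the hazard rate in (15) equals $f_Y(t)/S_Y(t)$. The function names are illustrative assumptions introduced here and do not come from the original article.

```python
def uw2_pdf(t, theta, beta):
    """Density of UW2(theta, beta)."""
    num = theta * beta**theta * t**(theta - 1) * (1 - t)**(theta - 1)
    return num / ((beta * t)**theta + (1 - t)**theta) ** 2


def uw2_survival(t, theta, beta):
    """Reliability function S(t) = 1 - F(t), written without cancellation."""
    a = (1 - t)**theta
    return a / ((beta * t)**theta + a)


def uw2_hazard(t, theta, beta):
    """Hazard rate h(t) as given in (15)."""
    return (theta * beta**theta * t**(theta - 1)
            / ((1 - t) * ((beta * t)**theta + (1 - t)**theta)))


# Reliability values for UW2(1, 5) at the points used in Table 1.
for t in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95):
    print(f"{t:.2f}  {uw2_survival(t, 1.0, 5.0):.7f}")

# The hazard rate should coincide with pdf / survival for any parameters.
t, theta, beta = 0.4, 2.5, 2.0
print(uw2_hazard(t, theta, beta),
      uw2_pdf(t, theta, beta) / uw2_survival(t, theta, beta))
```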
Next, we present the increasing failure rate (IFR) property, which is studied through the derivative of the failure rate function provided in (15).
Proposition 5.
Let $Y$ have distribution $UW2(\theta, \beta)$. Then for any $\theta$ and $\beta > 0$ the random variable $Y$ has Increasing Failure Rate (IFR).
Proof. 
The first derivative of $h$ provided in (15) can be written as follows:
$$h'(t) = \frac{\beta^{\theta}\theta\, t^{\theta-2}\left[\theta(1-t)^{\theta} + 2t(1-t)^{\theta} - (\beta t)^{\theta} + 2t(\beta t)^{\theta} - (1-t)^{\theta}\right]}{\left[(\beta t)^{\theta} + (1-t)^{\theta}\right]^2 (1-t)^2}.$$
It is clear that $h'(t; \theta, \beta) > 0$ since $t > 0$, $\beta > 0$ and $\theta > 0$, which implies the result. □

2.4. Moments

The following definition gives the moments of the UW2 distribution. These moments are expressed as integrals that must be evaluated numerically (obtaining a closed analytic expression remains an open problem).
Definition 2.
Let $Y \sim UW2(\theta, \beta)$. Then, for $r = 1, 2, 3, \ldots$, we define:
$$\mu_r(\theta, \beta) = E(Y^r; \theta, \beta) = \theta\beta^{\theta}\int_0^1 \frac{y^{r+\theta-1}(1-y)^{\theta-1}}{[(\beta y)^{\theta} + (1-y)^{\theta}]^2}\,dy.$$
Proposition 6.
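Let $Y \sim UW2(\theta, \beta)$. Then:
$$E(Y^r; \theta, \beta) = E\left((1-Y)^r; \theta, \tfrac{1}{\beta}\right).$$
Proof. 
Substituting $y \to 1-y$ in the integral and then dividing the numerator and the denominator by $\beta^{2\theta}$, we obtain:
$$E(Y^r; \theta, \beta) = \theta\beta^{\theta}\int_0^1 \frac{y^{\theta+r-1}(1-y)^{\theta-1}}{[(\beta y)^{\theta} + (1-y)^{\theta}]^2}\,dy = \theta\beta^{\theta}\int_0^1 \frac{(1-y)^{\theta+r-1}\, y^{\theta-1}}{[y^{\theta} + (\beta(1-y))^{\theta}]^2}\,dy = \theta\left(\frac{1}{\beta}\right)^{\theta}\int_0^1 \frac{(1-y)^{\theta+r-1}\, y^{\theta-1}}{\left[\left(\frac{y}{\beta}\right)^{\theta} + (1-y)^{\theta}\right]^2}\,dy = E\left((1-Y)^r; \theta, \tfrac{1}{\beta}\right).$$
In particular, for $r = 1$ we have:
$$\mu_1(\theta, \beta) = 1 - \mu_1\left(\theta, \tfrac{1}{\beta}\right).$$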
Remark 1.
This Proposition allows us to reaffirm that, for $\beta = 1$ and any value of the parameter $\theta$, the UW2 density is symmetric (case $r = 1$).
Remark 2.
From Definition 2, the skewness and kurtosis coefficients can be obtained through:
$$\beta_1 = \frac{\mu_3 - 3\mu_2\mu_1 + 2\mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}} \tag{21}$$
and
$$\beta_2 = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{(\mu_2 - \mu_1^2)^2}, \tag{22}$$
respectively. These coefficients do not have closed-form expressions, so they must be obtained using numerical methods.
Corollary 2.
Let $Y \sim UW2(\theta, \beta)$. Then:
$$\beta_1(\theta, \beta) = -\beta_1\left(\theta, \tfrac{1}{\beta}\right),$$
$$\beta_2(\theta, \beta) = \beta_2\left(\theta, \tfrac{1}{\beta}\right).$$
Proof. 
Using Proposition 6 for $r = 1, 2, 3, 4$ and substituting in (21) and (22), respectively, the required result is obtained. □
Figure 5 and Table 2 show, graphically and numerically, the behavior of the skewness and kurtosis coefficients of the UW2 distribution; they are consistent with Corollary 2. That is, for a given value of the parameter $\theta$, the skewness coefficient has the same magnitude for $\beta$ as for $1/\beta$, but with the opposite sign. For example, $\beta_1(5, 1/2) = -0.4592$ and $\beta_1(5, 2) = 0.4592$. Similarly, for a given value of $\theta$, the kurtosis coefficient is the same for $\beta$ as for $1/\beta$. For example, in Table 2, $\beta_2(5, 1/3) = \beta_2(5, 3) = 4.6954$.
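Since the moments have no closed form, the skewness and kurtosis in (21) and (22) have to be computed numerically. The sketch below uses a composite Simpson rule written with the standard library only; the function names, the number of subintervals, and the choice of Simpson's rule are assumptions made for illustration and are not taken from the original article. The sketch assumes $\theta \ge 1$, so that the integrand is finite at the endpoints; for $\theta < 1$ a quadrature rule that avoids the endpoints would be needed.

```python
def uw2_pdf(y, theta, beta):
    """Density of UW2(theta, beta); finite on [0, 1] when theta >= 1."""
    num = theta * beta**theta * y**(theta - 1) * (1 - y)**(theta - 1)
    return num / ((beta * y)**theta + (1 - y)**theta) ** 2


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def raw_moment(r, theta, beta):
    """r-th raw moment E[Y^r] from Definition 2, by numerical integration."""
    return simpson(lambda y: y**r * uw2_pdf(y, theta, beta), 0.0, 1.0)


def skewness_kurtosis(theta, beta):
    """Skewness and kurtosis coefficients as in (21) and (22)."""
    m1, m2, m3, m4 = (raw_moment(r, theta, beta) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3 * m2 * m1 + 2 * m1**3) / var**1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
    return skew, kurt


# Corollary 2: same kurtosis and opposite skewness for beta and 1/beta.
print(skewness_kurtosis(5.0, 2.0))
print(skewness_kurtosis(5.0, 0.5))
```

The two printed pairs can be compared with the $\theta = 5$ row of Table 2.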

2.5. Some Statistical Properties

2.5.1. Entropy of UW2

The entropy $H(\Theta)$ can be obtained from the density function of $Y$ through the following expression:
$$H(\Theta) = -\int_0^1 f_Y(y; \Theta)\ln\left(f_Y(y; \Theta)\right)dy.$$
If $Y$ is a random variable with a $UW2(\theta, \beta)$ distribution, then the entropy of $Y$ is given by:
$$H(\theta, \beta) = -\int_0^1 \frac{\theta\beta^{\theta}\, y^{\theta-1}(1-y)^{\theta-1}}{\left[(\beta y)^{\theta} + (1-y)^{\theta}\right]^2}\, \ln\left(\frac{\theta\beta^{\theta}\, y^{\theta-1}(1-y)^{\theta-1}}{\left[(\beta y)^{\theta} + (1-y)^{\theta}\right]^2}\right)dy.$$
Table 3 shows the entropy values of the UW2 distribution for different values of the parameters $\theta$ and $\beta$.

2.5.2. Mean Residual Life

An important reliability quantity for positive random variables is the mean residual life, which is defined as
$$\mu(t; \theta, \beta) = \frac{1}{1 - F_Y(t)}\int_t^{\infty}\left(1 - F_Y(y)\right)dy, \quad t > 0.$$
When $Y \sim UW2(\theta, \beta)$, the support is (0,1), so the integral runs up to 1, and the mean residual life of $Y$ is obtained by substituting:
$$F_Y(y) = \left[1 + \left(\frac{1-y}{\beta y}\right)^{\theta}\right]^{-1}, \quad 0 < y < 1.$$

2.5.3. Incomplete Moments

The $r$-th incomplete moment of $Y \sim f(y; \Theta)$ is defined as:
$$m_r(y; \Theta) = \int_0^y t^r f(t; \Theta)\,dt.$$
If $Y \sim UW2(\theta, \beta)$, then the $r$-th incomplete moment of $Y$ is given by:
$$m_r(y; \theta, \beta) = \theta\beta^{\theta}\int_0^y \frac{t^{\theta+r-1}(1-t)^{\theta-1}}{[(\beta t)^{\theta} + (1-t)^{\theta}]^2}\,dt, \quad 0 < y < 1.$$
An interesting application of the first incomplete moment is that the mean deviation of $Y$ about its mean $\mu$ can be obtained directly from the relation (see [4]):
$$E|Y - \mu| = 2\mu F(\mu; \theta, \beta) - 2m_1(\mu; \theta, \beta),$$
where $\mu = E[Y]$.

2.5.4. Lorenz Curve and the Gini Index

The Lorenz curve and the Gini coefficient are tools used in economics to measure income inequality in a society.
The Lorenz curve (see [5]), $L(p; \Theta)$, can be obtained from the quantile function of $Y$ through the following expression:
$$L(p; \Theta) = \frac{1}{\mu_1}\int_0^p F^{-1}(y)\,dy, \quad 0 < p < 1,$$
where $\mu_1 = E(Y)$.
If $Y \sim UW2(\theta, \beta)$, then the Lorenz curve is given by:
$$L(p; \theta, \beta) = \frac{1}{\mu_1}\int_0^p \left[1 + \beta\left(\frac{1}{y} - 1\right)^{1/\theta}\right]^{-1}dy, \quad 0 < p < 1,$$
where $\mu_1 = \displaystyle\int_0^1 \frac{\theta\beta^{\theta}\, y^{\theta}(1-y)^{\theta-1}}{\left[(\beta y)^{\theta} + (1-y)^{\theta}\right]^2}\,dy$.
The Gini index (see [5]) is the measure of inequality associated with the Lorenz curve. For a random variable $Y$ with cdf $F(y; \Theta)$ and mean $\mu$, the Gini index is defined by:
$$G(\Theta) = 1 - \frac{1}{\mu}\int_0^{\infty}\left(1 - F(y; \Theta)\right)^2 dy.$$
In the next result, an integral expression is provided for the Gini index of the UW2 distribution.
Proposition 7.
Let $Y \sim UW2(\theta, \beta)$. Then the Gini index of $Y$ is given by:
$$G(\theta, \beta) = 1 - \frac{1}{\mu}\int_0^1 \left\{1 - \left[1 + \left(\frac{1-y}{\beta y}\right)^{\theta}\right]^{-1}\right\}^2 dy.$$
Figure 6 shows the Lorenz curve of the UW2 distribution for different values of the parameters $\theta$ and $\beta$.
It can be observed that, as $\theta$ increases, the inequality measured by the Gini index decreases, and, as $\beta$ increases, the inequality measured by the Gini index increases.
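The Lorenz curve and the Gini index of the UW2 distribution can be approximated numerically from the quantile and reliability functions. The sketch below uses a simple midpoint rule, which never evaluates the integrands at 0 or 1. The function names, the number of subintervals, and the identity $E[Y] = \int_0^1 S_Y(y)\,dy$ used for the mean are choices made here for illustration and are not taken from the original article.

```python
def uw2_survival(y, theta, beta):
    """Reliability function S(y) = 1 - F(y) of UW2(theta, beta)."""
    a = (1 - y)**theta
    return a / ((beta * y)**theta + a)


def uw2_quantile(p, theta, beta):
    """Quantile function of UW2(theta, beta)."""
    return 1.0 / (1.0 + beta * (1.0 / p - 1.0)**(1.0 / theta))


def midpoint(f, a, b, n=4000):
    """Midpoint rule; it never evaluates f at the endpoints a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def uw2_mean(theta, beta):
    # For a variable supported on (0, 1), E[Y] is the integral of S(y).
    return midpoint(lambda y: uw2_survival(y, theta, beta), 0.0, 1.0)


def uw2_lorenz(p, theta, beta):
    """Lorenz curve L(p): integral of the quantile function over (0, p) / mean."""
    partial = midpoint(lambda u: uw2_quantile(u, theta, beta), 0.0, p)
    return partial / uw2_mean(theta, beta)


def uw2_gini(theta, beta):
    """Gini index: 1 - (1/mean) * integral of S(y)^2 over (0, 1)."""
    s2 = midpoint(lambda y: uw2_survival(y, theta, beta)**2, 0.0, 1.0)
    return 1.0 - s2 / uw2_mean(theta, beta)


for theta, beta in ((1.0, 1.0), (1.0, 2.0), (2.0, 2.0), (1.0, 3.0)):
    print(theta, beta, uw2_lorenz(0.5, theta, beta), uw2_gini(theta, beta))
```

For $\theta = \beta = 1$ the distribution is uniform, so the sketch should return $L(0.5) = 0.25$ and a Gini index of $1/3$, which gives a quick correctness check.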

2.6. Canonical Type 2 Unitary Weibull Distribution

Let $Y \sim UW2(\theta, \beta)$ with $\theta = 1$; then, the distribution of $Y$ is called the canonical type 2 unitary Weibull distribution, denoted by $Y \sim UW2(1, \beta)$, and its density function is:
$$f_Y(y) = \frac{\beta}{\left[1 - (1-\beta)y\right]^2}, \quad 0 < y < 1.$$
Its most important properties are:
1. The cdf of $Y$ is given by:
$$F_Y(t) = \frac{\beta t}{1 + (\beta-1)t}, \quad 0 < t < 1.$$
2. The quantile function of $Y$ is:
$$t = \frac{p}{\beta - (\beta-1)p}, \quad 0 < p < 1.$$
3. The $r$-th moment of $Y$ has the following expression:
$$\mu_r = E[Y^r] = 1 - r\beta\,\frac{\Gamma(r+1)}{\Gamma(r+2)}\;{}_2F_1\left(1, r+1; r+2; \beta-1\right), \quad r = 1, 2, \ldots,$$
where ${}_2F_1(a, b; c; z) = \dfrac{1}{B(b, c-b)}\displaystyle\int_0^1 x^{b-1}(1-x)^{c-b-1}(1+zx)^{-a}\,dx$.
In particular, for $r = 1, 2, 3, 4$ and $\beta \neq 1$ we have:
$$\mu_1 = \frac{\beta\ln(\beta) + 1 - \beta}{(\beta-1)^2},$$
$$\mu_2 = \frac{\beta^2 - 2\beta\ln(\beta) - 1}{(\beta-1)^3},$$
$$\mu_3 = \frac{\beta^3 - 6\beta^2 + 3\beta + 6\beta\ln(\beta) + 2}{2(\beta-1)^4},$$
$$\mu_4 = \frac{\beta^4 - 6\beta^3 + 18\beta^2 - 10\beta - 12\beta\ln(\beta) - 3}{3(\beta-1)^5}.$$
4. The kurtosis coefficient is obtained by substituting $\mu_1, \ldots, \mu_4$ into the expression for $\beta_2$ in Remark 2. Writing $A = \beta\ln(\beta) + 1 - \beta$, $B = \beta^2 - 2\beta\ln(\beta) - 1$, $C = \beta^3 - 6\beta^2 + 3\beta + 6\beta\ln(\beta) + 2$, and $D = \beta^4 - 6\beta^3 + 18\beta^2 - 10\beta - 12\beta\ln(\beta) - 3$ for the numerators of $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$, this gives:
$$\beta_2 = \frac{\tfrac{1}{3}(\beta-1)^3 D - 2(\beta-1)^2 A\,C + 6(\beta-1)A^2 B - 3A^4}{\left[(\beta-1)B - A^2\right]^2}.$$
Figure 7 shows the behavior of the kurtosis of the canonical UW2 distribution for different values of $\beta$.
5. The Lorenz curve of $Y$ is:
$$L(p; 1, \beta) = \frac{p - p\beta - \beta\ln\left(\beta - (\beta-1)p\right) + \beta\ln(\beta)}{\beta\ln(\beta) - \beta + 1}.$$
6. The Gini index of $Y$ is given by:
$$G(1, \beta) = \frac{\beta\left[(1+\beta)\ln(\beta) - 2(\beta-1)\right]}{(\beta-1)\left[1 - \beta\left(1 - \ln(\beta)\right)\right]}.$$
Figure 8 shows the Lorenz curve and the Gini index of the canonical UW2 distribution for different values of $\beta$; the Gini index increases as $\beta$ increases.
7. The entropy of $Y$ is:
$$H(1, \beta) = 2 - \frac{\beta+1}{\beta-1}\ln(\beta).$$
Figure 9 shows the entropy of the canonical UW2 distribution for different values of $\beta$.
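The closed-form moments and entropy of the canonical distribution can be cross-checked against direct numerical integration of the density $f_Y(y) = \beta/[1 + (\beta-1)y]^2$. The sketch below is only a verification aid; the function names and the midpoint rule are assumptions introduced here. The closed forms involve division by powers of $\beta - 1$, so they should not be evaluated at, or very close to, $\beta = 1$.

```python
import math


def canonical_pdf(y, beta):
    """Density of the canonical distribution UW2(1, beta)."""
    return beta / (1.0 + (beta - 1.0) * y) ** 2


def canonical_moments(beta):
    """Closed-form raw moments mu_1, ..., mu_4 of UW2(1, beta), beta != 1."""
    log_b = math.log(beta)
    c = beta - 1.0
    mu1 = (beta * log_b + 1 - beta) / c**2
    mu2 = (beta**2 - 2 * beta * log_b - 1) / c**3
    mu3 = (beta**3 - 6 * beta**2 + 3 * beta + 6 * beta * log_b + 2) / (2 * c**4)
    mu4 = (beta**4 - 6 * beta**3 + 18 * beta**2 - 10 * beta
           - 12 * beta * log_b - 3) / (3 * c**5)
    return mu1, mu2, mu3, mu4


def canonical_entropy(beta):
    """Closed-form entropy H(1, beta), beta != 1."""
    return 2.0 - (beta + 1.0) / (beta - 1.0) * math.log(beta)


def midpoint(f, a, b, n=20_000):
    """Midpoint rule on n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


beta = 2.0
closed = canonical_moments(beta)
numeric = [midpoint(lambda y, r=r: y**r * canonical_pdf(y, beta), 0.0, 1.0)
           for r in (1, 2, 3, 4)]
print(closed)
print(numeric)

entropy_numeric = -midpoint(
    lambda y: canonical_pdf(y, beta) * math.log(canonical_pdf(y, beta)), 0.0, 1.0)
print(canonical_entropy(beta), entropy_numeric)
```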

3. Inference

In this section, we discuss statistical inference for the parameters of the model $Y \sim UW2(\theta, \beta)$.

3.1. Maximum Likelihood Estimate

We now discuss maximum likelihood estimation. Given a random sample $Y_1, \ldots, Y_n$ from the $UW2(\theta, \beta)$ distribution, the logarithm of the likelihood function can be written as:
$$l(\theta, \beta) = n\ln\theta + n\theta\ln\beta + (\theta-1)\sum_{i=1}^n \ln y_i + (\theta-1)\sum_{i=1}^n \ln(1-y_i) - 2\sum_{i=1}^n \ln\left[(\beta y_i)^{\theta} + (1-y_i)^{\theta}\right].$$
Therefore, setting the partial derivatives with respect to $\beta$ and $\theta$ equal to zero, the maximum likelihood equations are:
$$\sum_{i=1}^n \frac{y_i^{\theta}}{(\beta y_i)^{\theta} + (1-y_i)^{\theta}} = \frac{n}{2\beta^{\theta}},$$
$$2\sum_{i=1}^n \left\{\frac{(\beta y_i)^{\theta}\ln(\beta y_i) + (1-y_i)^{\theta}\ln(1-y_i)}{(\beta y_i)^{\theta} + (1-y_i)^{\theta}} - \frac{\ln\left(y_i(1-y_i)\right)}{2}\right\} = \frac{n}{\theta} + n\ln\beta.$$
The solutions to these equations can be obtained using numerical procedures such as the Newton–Raphson method.
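Before passing the likelihood equations to a numerical solver, it is useful to confirm that the analytic score agrees with finite differences of the log-likelihood. The sketch below does this for a small made-up sample; the data values, the function names, and the step size are assumptions chosen for illustration only and do not come from the article.

```python
import math


def loglik(theta, beta, ys):
    """Log-likelihood l(theta, beta) of a UW2 sample ys."""
    n = len(ys)
    total = n * math.log(theta) + n * theta * math.log(beta)
    for y in ys:
        total += (theta - 1) * (math.log(y) + math.log(1 - y))
        total -= 2 * math.log((beta * y)**theta + (1 - y)**theta)
    return total


def score(theta, beta, ys):
    """Partial derivatives (dl/dtheta, dl/dbeta); both vanish at the MLE."""
    n = len(ys)
    d_theta = n / theta + n * math.log(beta)
    d_beta = n * theta / beta
    for y in ys:
        a = (beta * y)**theta
        b = (1 - y)**theta
        d_theta += math.log(y * (1 - y))
        d_theta -= 2 * (a * math.log(beta * y) + b * math.log(1 - y)) / (a + b)
        d_beta -= 2 * theta * beta**(theta - 1) * y**theta / (a + b)
    return d_theta, d_beta


# Hypothetical data in (0, 1), used only to exercise the formulas.
ys = [0.12, 0.35, 0.50, 0.61, 0.83]
theta, beta, h = 1.7, 2.3, 1e-6

fd_theta = (loglik(theta + h, beta, ys) - loglik(theta - h, beta, ys)) / (2 * h)
fd_beta = (loglik(theta, beta + h, ys) - loglik(theta, beta - h, ys)) / (2 * h)
print(score(theta, beta, ys))
print((fd_theta, fd_beta))
```

Setting the two components of `score` to zero and rearranging gives the two likelihood equations stated above, so agreement between the two printed pairs is a check on those equations as well.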

3.2. Simulation Study

We use the Monte Carlo method to generate random numbers from the $UW2(\theta, \beta)$ distribution.
Table 4 presents a simulation study based on 1000 samples of sizes $n = 50$, $100$, and $200$ for different values of the parameters $\theta$ and $\beta$. The random values are obtained by generating $u_i \sim U(0,1)$, $i = 1, 2, \ldots, n$, and substituting them into the quantile function $y_i = \left[1 + \beta\left(\frac{1}{u_i} - 1\right)^{1/\theta}\right]^{-1}$ for given $\theta$ and $\beta$, which yields random values from the $UW2(\theta, \beta)$ distribution. The table shows that, as the sample size increases, the parameter estimates approach the true parameter values, while the standard deviations and the average lengths of the confidence intervals decrease. This agrees with the consistency of the estimators. Finally, the empirical coverage values are as expected, since they are close to the nominal 95% confidence level.
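The inverse-transform step described above can be sketched as follows. The function names, the seed, and the sample size are illustrative assumptions; the sketch covers only the data-generation step and does not reproduce the estimation results of Table 4.

```python
import random


def uw2_quantile(p, theta, beta):
    """Quantile function of UW2(theta, beta)."""
    return 1.0 / (1.0 + beta * (1.0 / p - 1.0)**(1.0 / theta))


def uw2_cdf(t, theta, beta):
    """Cumulative distribution function of UW2(theta, beta)."""
    return 1.0 / (1.0 + ((1 - t) / (beta * t))**theta)


def sample_uw2(theta, beta, n, seed=0):
    """Inverse-transform sampling: y_i = Q(u_i) with u_i ~ U(0, 1)."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u > 0.0:  # random() can return exactly 0, where 1/u is undefined
            sample.append(uw2_quantile(u, theta, beta))
    return sample


theta, beta = 2.0, 3.0
ys = sample_uw2(theta, beta, n=100_000, seed=42)

# Compare the empirical cdf with the theoretical cdf at a few points.
for t in (0.1, 0.25, 0.5, 0.75):
    empirical = sum(1 for y in ys if y <= t) / len(ys)
    print(t, empirical, uw2_cdf(t, theta, beta))
```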

4. Analysis of Real Data

4.1. Example 1: Application to Medical Data

In this example, we compute the MLEs of $(\beta, \alpha, \theta)$ to fit the KW, Beta, UW, and UW2 models to a real data set. The data can be found in the Biostatistics textbook by Daniel ([6], p. 475) and correspond to a study carried out by Slemenda et al. [7], which investigated the effects of lateral bone mineral density (LBMD) on spinal osteoarthritis in 66 women aged 34–87 years. Some descriptive statistics are shown in Table 5. Table 6 shows the MLEs for the KW, Beta, UW, and UW2 models. Using the Akaike information criterion (AIC) [8], the Bayesian information criterion (BIC) [9], the Kolmogorov–Smirnov (KS) test, and the approximate goodness-of-fit statistics (W*, A*) of Chen and Balakrishnan [10], we see that the UW2 model fits the data best. The advantage of the UW2 model is more evident for the more extreme observations; see Figure 10 (right). Figure 11 and Figure 12 show that the UW2 distribution fits the data better than the UW, Beta, and KW distributions.
In Table 6, the AIC and BIC values of the UW2 model are lower than those of its competitors, and the KS, A*, and W* statistics likewise indicate that the UW2 distribution fits better than the KW, Beta, and UW distributions.

4.2. Example 2: An Application to Environment Data

In this section, we compute the MLEs of $(\alpha, \beta, \theta)$ to fit the Beta, KW, UW, and UW2 models to a real environmental data set. The data can be found at https://dga.mop.gob.cl/servicioshidrometeorologicos/Paginas/default.aspx (accessed on 1 December 2022) and correspond to the fluviometric and meteorological data recorded at monitoring stations from Arica to Tierra del Fuego. The site also provides official statistical reports on hydrometeorological variables and water quality obtained from the National Hydrometric Network. The analyzed data are the percentage of dissolved oxygen in a lake. Some descriptive statistics are shown in Table 7. Table 8 shows the MLEs for the Beta, KW, UW, and UW2 models. According to the AIC and BIC, the UW2 model fits the data best. Figure 13 shows that the UW2 model fits the data better than the UW, Beta, and KW models.
The QQ plots of the data for the UW2 distribution, compared with those for the Beta, KW, and UW distributions, all fitted with the maximum likelihood estimates of their parameters, are shown in Figure 14.

4.3. Example 3: An Application to Quantile Regression

4.3.1. One-Dimensional Quantile Regression

Translating the concept of a quantile to the regression setting, we obtain linear quantile regression (see [11]). Assume that:
$$Y_i = \alpha_{0,\tau} + \alpha_{1,\tau}X_i + \epsilon_{i,\tau}, \quad i \in \{1, \ldots, n\},$$
with $\tau \in (0,1)$, and that the conditional expected value of the error is not necessarily zero, but the $\tau$-th quantile of the error conditional on the explanatory variable is zero, that is, $Q_{\tau}(\epsilon_{i,\tau} \mid X) = 0$. Then the $\tau$-th quantile of $Y_i$ conditional on $X$ can be written as:
$$Q_{\tau}(Y_i \mid X) = \alpha_{0,\tau} + \alpha_{1,\tau}X_i.$$
The estimators of $\alpha_{0,\tau}$ and $\alpha_{1,\tau}$ are obtained by:
$$\hat{\alpha}_{\tau} = \arg\min_{\alpha_{\tau} \in \mathbb{R}^2}\left[\sum_{Y_i \geq A} \tau\,|Y_i - \alpha_{0,\tau} - \alpha_{1,\tau}X_i| + \sum_{Y_i < A} (1-\tau)\,|Y_i - \alpha_{0,\tau} - \alpha_{1,\tau}X_i|\right],$$
where $\alpha_{\tau} = (\alpha_{0,\tau}, \alpha_{1,\tau})$ and $A = \alpha_{0,\tau} + \alpha_{1,\tau}X_i$. The parameters are estimated by minimizing this objective function.
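The objective function minimized in linear quantile regression can be written in a few lines. The sketch below only evaluates the objective for given coefficients; it does not perform the minimization. The function name `check_loss` and the toy data are assumptions made here for illustration.

```python
def check_loss(alpha0, alpha1, xs, ys, tau):
    """Quantile-regression objective: residuals at or above the fitted line are
    weighted by tau, residuals below it by (1 - tau)."""
    total = 0.0
    for x, y in zip(xs, ys):
        resid = y - (alpha0 + alpha1 * x)
        if resid >= 0:
            total += tau * resid
        else:
            total += (1 - tau) * (-resid)
    return total


# Toy data lying exactly on the line y = 1 + 2x: the loss is zero there.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(check_loss(1.0, 2.0, xs, ys, tau=0.25))

# With an intercept-only model (alpha1 = 0) and tau = 0.5, the loss is smallest
# at the sample median, here 3.
zs = [1.0, 2.0, 3.0, 4.0, 5.0]
no_x = [0.0] * len(zs)
for a0 in (2.0, 3.0, 4.0):
    print(a0, check_loss(a0, 0.0, no_x, zs, tau=0.5))
```

The asymmetric weights are what make the minimizer a conditional quantile rather than a conditional mean: for $\tau = 0.5$ the two weights coincide and the objective is proportional to the sum of absolute residuals.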

4.3.2. Quantile Regression Unitary Weibull Type 2

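In this case, consider the regression equation:
$$Y_i = \alpha_{0,\tau} + \alpha_{1,\tau}X_i + \epsilon_{i,\tau}, \quad i \in \{1, \ldots, n\},$$
where the response variable follows $Y \sim UW2(\theta, \beta)$. The distribution can be reparameterized in terms of its $\tau$-th quantile as follows.
Let $\mu_{\tau} = \left[1 + \beta\left(\frac{1}{\tau} - 1\right)^{1/\theta}\right]^{-1}$; then $\beta = \dfrac{1-\mu_{\tau}}{\mu_{\tau}}\left(\dfrac{\tau}{1-\tau}\right)^{1/\theta}$ and, substituting into the density function of $Y$, we obtain:
$$f_Y(y) = \frac{\theta\left(\frac{1-\mu_{\tau}}{\mu_{\tau}}\right)^{\theta}\frac{\tau}{1-\tau}\, y^{\theta-1}(1-y)^{\theta-1}}{\left[\left(\frac{1-\mu_{\tau}}{\mu_{\tau}}\right)^{\theta}\frac{\tau}{1-\tau}\, y^{\theta} + (1-y)^{\theta}\right]^2}, \quad 0 < y < 1.$$
The cdf of $Y$ is then:
$$F_Y(y) = \left[1 + \frac{1-\tau}{\tau}\left(\frac{\mu_{\tau}(1-y)}{(1-\mu_{\tau})\,y}\right)^{\theta}\right]^{-1}, \quad 0 < y < 1,$$
so that $F_Y(\mu_{\tau}) = \tau$. We write $Y \sim UW2(\mu_{\tau}, \theta)$, where $0 < \mu_{\tau} < 1$ is the quantile parameter. Considering $\tau$ known, $\mu_{\tau}$ and $\theta$ are estimated by maximum likelihood.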
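The quantile reparameterization can be checked numerically: mapping $(\mu_{\tau}, \theta)$ back to $\beta$ must reproduce the original cdf, and the cdf evaluated at $\mu_{\tau}$ must equal $\tau$. The sketch below performs both checks; the function names and the parameter values are illustrative assumptions and do not come from the article.

```python
def uw2_cdf(t, theta, beta):
    """cdf of UW2(theta, beta) in the original parameterization."""
    return 1.0 / (1.0 + ((1 - t) / (beta * t))**theta)


def beta_from_quantile(mu_tau, theta, tau):
    """Recover beta from the tau-th quantile mu_tau."""
    return (1 - mu_tau) / mu_tau * (tau / (1 - tau))**(1 / theta)


def uw2q_cdf(y, mu_tau, theta, tau):
    """cdf in the quantile parameterization (mu_tau, theta) for known tau."""
    ratio = mu_tau * (1 - y) / ((1 - mu_tau) * y)
    return 1.0 / (1.0 + (1 - tau) / tau * ratio**theta)


def uw2q_pdf(y, mu_tau, theta, tau):
    """Density in the quantile parameterization; k plays the role of beta**theta."""
    k = ((1 - mu_tau) / mu_tau)**theta * tau / (1 - tau)
    num = theta * k * y**(theta - 1) * (1 - y)**(theta - 1)
    return num / (k * y**theta + (1 - y)**theta) ** 2


mu_tau, theta, tau = 0.3, 2.0, 0.25
beta = beta_from_quantile(mu_tau, theta, tau)

# The tau-th quantile is mu_tau by construction.
print(uw2q_cdf(mu_tau, mu_tau, theta, tau), tau)

# Both parameterizations describe the same distribution.
print(uw2q_cdf(0.6, mu_tau, theta, tau), uw2_cdf(0.6, theta, beta))
```

In a regression setting, $\mu_{\tau}$ would be linked to the covariates; the article does not state the link function at this point, so none is assumed in the sketch.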

4.3.3. An Application of Quantile Regression to Prater's Gas Mileage Data

To illustrate this, we consider Prater's gas mileage data, investigated by Simas et al. [12], using the regression equation above with temperature as the explanatory variable. Table 9 shows descriptive statistics of these data. Table 10 shows the maximum likelihood estimates of the regression coefficients $(\alpha_0, \alpha_1)$ and their standard errors for the UW2, UW, and Beta distributions.
Table 11, Figure 15, and Figure 16 show that the UW2 distribution fits better than the Beta and UW distributions in quantile regression when the response variable has high kurtosis.

5. Discussion

In this work, we have introduced a new family of distributions with support on the interval (0,1) and with heavier tails than some similar distributions in the literature. The new family is based on a transformation of two independent random variables with two-parameter Weibull distributions, and we define it through its stochastic representation. We provide its density function and reliability function, together with some statistical properties of interest. In the inferential part, we estimate the parameters of the new model by maximum likelihood, and information criteria are used to select the best model and to evaluate the goodness of fit of the new distribution compared with other similar distributions. A Monte Carlo simulation study was carried out to evaluate empirically the performance of the maximum likelihood estimators of the parameters of the new model. In addition, we report the coverage probabilities and the mean lengths of the confidence intervals obtained for the parameters using the asymptotic normality of these estimators. The simulation study indicated consistent performance of these estimators. Finally, three illustrations with real data were presented: two related to medical and environmental data, and a third related to quantile regression. These analyses indicate that the proposed model performs better than the competing models considered.

Author Contributions

Data curation, J.R.; formal analysis, J.R., M.A.R., P.L.C. and J.A.; investigation, J.R., M.A.R. and P.L.C.; methodology, J.R., M.A.R., P.L.C. and J.A.; writing—original draft, J.R., M.A.R., P.L.C. and J.A.; writing—review and editing, M.A.R., P.L.C. and J.A.; Funding Acquisition, J.R., M.A.R. and J.A. All authors have read and agreed to the published version of the manuscript.

Funding

The research of J.R., M.A.R. and J.A. was supported by the Universidad de Antofagasta through Proyecto Semillero UA 2022.

Data Availability Statement

The analyzed data are available at the URL and in the references given in the article.

Acknowledgments

The authors acknowledge the support of the Universidad de Antofagasta; the research of J. Reyes, M. Rojas and J. Arrué was supported by Proyecto Semillero UA 2022.

Conflicts of Interest

No potential conflict of interest was reported by the authors.

References

  1. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  2. Mazucheli, J.; Menezes, A.F.B.; Ghitany, M.E. The unit-Weibull distribution and associated inference. J. Appl. Probability Stat. 2018, 13, 1–22.
  3. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J. Appl. Stat. 2019, 47, 954–974.
  4. Butler, R.J.; McDonald, J.B. Using incomplete moments to measure inequality. J. Econom. 1989, 42, 109–119.
  5. Gastwirth, J.L. The Estimation of the Lorenz Curve and Gini Index. Rev. Econ. Stat. 1972, 54, 306–316.
  6. Daniel, W.W. Biostatistics: A Foundation for Analysis in the Health Sciences, 9th ed.; John Wiley and Sons, Inc.: Hoboken, NJ, USA, 2005.
  7. Slemenda, C.W.; Turner, C.H.; Peacock, M.; Christian, J.C.; Sorbel, J.; Hui, S.L.; Johnston, C.C. The genetics of proximal femur geometry, distribution of bone mass and bone mineral density. Osteoporos. Int. 1996, 6, 178–182.
  8. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  9. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  10. Chen, G.; Balakrishnan, N. A General Purpose Approximate Goodness-of-Fit Test. J. Qual. Technol. 1995, 27, 154–161.
  11. Buchinsky, M. Quantile regression, Box-Cox transformation model, and the U.S. wage structure. J. Econom. 1995, 65, 109–154.
  12. Simas, A.B.; Barreto-Souza, W.; Rocha, A.V. Improved Estimators for a General Class of Beta Regression Models. Comput. Stat. Data Anal. 2010, 54, 348–366.
Figure 1. U W 2 pdf for θ = 2 and different values of β .
Figure 1. U W 2 pdf for θ = 2 and different values of β .
Symmetry 15 00267 g001
Figure 2. U W 2 pdf for β = 2 and different values of θ .
Figure 2. U W 2 pdf for β = 2 and different values of θ .
Symmetry 15 00267 g002
Figure 3. Cdf of U W 2 for different values θ and β = 2 .
Figure 3. Cdf of U W 2 for different values θ and β = 2 .
Symmetry 15 00267 g003
Figure 4. The hazard rate functions for the U W 2 distribution.
Figure 4. The hazard rate functions for the U W 2 distribution.
Symmetry 15 00267 g004
Figure 5. Plots of the skewness (left) and kurtosis of the U W 2 distribution (right).
Figure 5. Plots of the skewness (left) and kurtosis of the U W 2 distribution (right).
Symmetry 15 00267 g005
Figure 6. U W 2 Lorenz Curve for different values of θ and β .
Figure 6. U W 2 Lorenz Curve for different values of θ and β .
Symmetry 15 00267 g006
Figure 7. U W 2 canonical kurtosis for different values of β .
Figure 7. U W 2 canonical kurtosis for different values of β .
Symmetry 15 00267 g007
Figure 8. Lorenz curve and Gini index of the canonical U W 2 distribution for different values of β .
Figure 8. Lorenz curve and Gini index of the canonical U W 2 distribution for different values of β .
Symmetry 15 00267 g008
Figure 9. Graph of the entropy of the canonical U W 2 distribution for different values of β .
Figure 9. Graph of the entropy of the canonical U W 2 distribution for different values of β .
Symmetry 15 00267 g009
Figure 10. Histogram for LBMD data with Densities U W 2 (solid line), U W (dashed line), B e t a (dotted line), and K W (dashed dotted line) (left) and tails (right).
Figure 10. Histogram for LBMD data with Densities U W 2 (solid line), U W (dashed line), B e t a (dotted line), and K W (dashed dotted line) (left) and tails (right).
Symmetry 15 00267 g010
Figure 11. QQ plots for the LBMD data set: K W (a), B e t a (b), U W (c), and U W 2 (d).
Figure 11. QQ plots for the LBMD data set: K W (a), B e t a (b), U W (c), and U W 2 (d).
Symmetry 15 00267 g011
Figure 12. Comparison of cumulative distributions for the LBMD data set for U W 2 (blue line), U W (red line), B e t a (green line), and K W (orange line).
Figure 12. Comparison of cumulative distributions for the LBMD data set for U W 2 (blue line), U W (red line), B e t a (green line), and K W (orange line).
Symmetry 15 00267 g012
Figure 13. Histogram for percent dissolved oxygen data (left) with densities of U W 2 (solid line), U W (dashed line), B e t a (dotted line), and K W (dashed line) and tails (right).
Figure 13. Histogram for percent dissolved oxygen data (left) with densities of U W 2 (solid line), U W (dashed line), B e t a (dotted line), and K W (dashed line) and tails (right).
Symmetry 15 00267 g013
Figure 14. QQ plots for the data set: B e t a (a), K W (b), U W (c), and U W 2 (d).
Figure 14. QQ plots for the data set: B e t a (a), K W (b), U W (c), and U W 2 (d).
Symmetry 15 00267 g014
Figure 15. Quantile regression for Yield and Temperature data with UW2 density (left) and Beta density (right).
Figure 16. Quantile regression for Yield and Temperature data with UW2 density (left) and UW density (right).
Table 1. Reliability function comparison for the UW2, UW, KW, and Beta distributions.
S_Y(t) = P(Y > t)

| t | UW2(1,5) | UW(1,5) | KW(1,5) | Beta(1,5) |
|---|---|---|---|---|
| 0.70 | 0.0789474 | 0.0057559 | 0.0024300 | 0.0024300 |
| 0.75 | 0.0625000 | 0.0019685 | 0.0009766 | 0.0009766 |
| 0.80 | 0.0476191 | 0.0005531 | 0.0003200 | 0.0003200 |
| 0.85 | 0.0340909 | 0.0001134 | 0.0000759 | 0.0000759 |
| 0.90 | 0.0217391 | 0.0000130 | 0.0000100 | 0.0000100 |
| 0.95 | 0.0104167 | 0.0000004 | 0.0000003 | 0.0000003 |
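The entries of Table 1 can be checked with a few closed-form survival functions. The sketch below is illustrative rather than the authors' code. The Kumaraswamy and Beta(1, b) expressions are standard, and the unit-Weibull CDF F(t) = exp(−α(−log t)^β) reproduces the tabulated UW column. The UW2 expression S(t) = 1 / (1 + (βt/(1 − t))^θ) is an assumption inferred from the ratio-of-Weibulls construction described in the abstract, and it is only checked here against the UW2(1,5) column. All function names are hypothetical.

```python
import math

def surv_uw2(t, theta, beta):
    # ASSUMPTION: survival function inferred from the ratio-of-Weibulls
    # construction, not quoted from the paper.
    return 1.0 / (1.0 + (beta * t / (1.0 - t)) ** theta)

def surv_uw(t, alpha, beta):
    # Unit-Weibull with CDF F(t) = exp(-alpha * (-log t)**beta).
    return 1.0 - math.exp(-alpha * (-math.log(t)) ** beta)

def surv_kw(t, a, b):
    # Kumaraswamy: F(t) = 1 - (1 - t**a)**b, so S(t) = (1 - t**a)**b.
    return (1.0 - t ** a) ** b

def surv_beta_1b(t, b):
    # Beta(1, b) has the closed-form survival (1 - t)**b.
    return (1.0 - t) ** b

if __name__ == "__main__":
    print("t     UW2(1,5)   UW(1,5)    KW(1,5)    Beta(1,5)")
    for t in (0.70, 0.75, 0.80, 0.85, 0.90, 0.95):
        print(f"{t:.2f}  {surv_uw2(t, 1, 5):.7f}  {surv_uw(t, 1, 5):.7f}  "
              f"{surv_kw(t, 1, 5):.7f}  {surv_beta_1b(t, 5):.7f}")
```

Under these assumptions the UW2 tail probability at t = 0.95 is roughly four orders of magnitude larger than the other three, which is the heavier-tail behaviour the table is meant to illustrate.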
Table 2. Skewness and kurtosis values of the UW2 model with different values of θ and β.

| θ | Skewness, β = 1/2 | Skewness, β = 1/3 | Skewness, β = 1 | Skewness, β = 2 | Skewness, β = 3 | Kurtosis, β = 1/2 | Kurtosis, β = 1/3 | Kurtosis, β = 1 | Kurtosis, β = 2 | Kurtosis, β = 3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | −0.4861 | −0.7849 | 0 | 0.4861 | 0.7849 | 2.0928 | 2.5644 | 1.8000 | 2.0928 | 2.5644 |
| 2 | −0.5980 | −0.9739 | 0 | 0.5980 | 0.9739 | 3.0459 | 3.9578 | 2.5013 | 3.0459 | 3.9578 |
| 3 | −0.5744 | −0.9279 | 0 | 0.5744 | 0.9279 | 3.5809 | 4.5497 | 2.9939 | 3.5809 | 4.5497 |
| 4 | −0.5176 | −0.8255 | 0 | 0.5176 | 0.8255 | 3.8564 | 4.7050 | 3.3240 | 3.8564 | 4.7050 |
| 5 | −0.4592 | −0.7237 | 0 | 0.4592 | 0.7237 | 3.9995 | 4.6954 | 3.5460 | 3.9994 | 4.6954 |
| 6 | −0.4077 | −0.6363 | 0 | 0.4077 | 0.6363 | 4.0765 | 4.6377 | 3.6984 | 4.0765 | 4.6377 |
| 7 | −0.3642 | −0.5641 | 0 | 0.3642 | 0.5641 | 4.1201 | 4.5736 | 3.8058 | 4.1201 | 4.5736 |
| 8 | −0.3278 | −0.5048 | 0 | 0.3278 | 0.5048 | 4.1460 | 4.5161 | 3.8836 | 4.1460 | 4.5161 |
| 9 | −0.2972 | −0.4557 | 0 | 0.2972 | 0.4557 | 4.1620 | 4.4678 | 3.9412 | 4.1620 | 4.4678 |
| 10 | −0.2715 | −0.4148 | 0 | 0.2715 | 0.4148 | 4.1723 | 4.4281 | 3.9848 | 4.1723 | 4.4281 |
| 11 | −0.2496 | −0.3802 | 0 | 0.2496 | 0.3802 | 4.1793 | 4.3957 | 4.0186 | 4.1793 | 4.3957 |
| 12 | −0.2307 | −0.3508 | 0 | 0.2307 | 0.3508 | 4.1840 | 4.3692 | 4.0472 | 4.1840 | 4.3693 |
| 13 | −0.2144 | −0.3254 | 0 | 0.2144 | 0.3254 | 4.1874 | 4.3475 | 4.0665 | 4.1874 | 4.3475 |
| 14 | −0.2002 | −0.3033 | 0 | 0.2002 | 0.3033 | 4.1899 | 4.3295 | 4.0765 | 4.1899 | 4.3295 |
| 15 | −0.1877 | −0.2840 | 0 | 0.1877 | 0.2840 | 4.1917 | 4.3144 | 4.0979 | 4.1917 | 4.3145 |
| 16 | −0.1766 | −0.2670 | 0 | 0.1766 | 0.2670 | 4.1932 | 4.3018 | 4.1096 | 4.1931 | 4.3018 |
| 17 | −0.1667 | −0.2518 | 0 | 0.1667 | 0.2518 | 4.1942 | 4.2911 | 4.1194 | 4.1942 | 4.2908 |
| 18 | −0.1578 | −0.2393 | 0 | 0.1578 | 0.2382 | 4.1951 | 4.4034 | 4.1278 | 4.1951 | 4.2837 |
| 19 | −0.1498 | −0.2261 | 0 | 0.1498 | 0.2263 | 4.1958 | 4.2741 | 4.1350 | 4.1958 | 4.2692 |
| 20 | −0.1426 | −0.2405 | 0 | 0.1426 | 0.2149 | 4.1963 | 5.7680 | 4.1411 | 4.1963 | 4.2746 |
Table 3. Entropy values for the distribution UW2(θ, β) for different values of θ and β.

| θ | β = 1/3 | β = 1/2 | β = 1 | β = 2 | β = 3 |
|---|---|---|---|---|---|
| 1 | −0.1976 | −0.0798 | 0.0000 | −0.0781 | −0.1939 |
| 2 | −0.5145 | −0.3657 | −0.2640 | −0.3657 | −0.5146 |
| 3 | −0.8398 | −0.6808 | −0.5714 | −0.6808 | −0.8398 |
| 4 | −1.0984 | −0.9350 | −0.8223 | −0.9350 | −1.0984 |
| 5 | −1.3079 | −1.1423 | −1.0279 | −1.1423 | −1.3079 |
| 6 | −1.4828 | −1.3159 | −1.2006 | −1.3159 | −1.4828 |
| 7 | −1.6324 | −1.4648 | −1.3488 | −1.4648 | −1.6324 |
| 8 | −1.7630 | −1.5949 | −1.4785 | −1.5949 | −1.7630 |
| 9 | −1.8788 | −1.7103 | −1.5936 | −1.7103 | −1.8788 |
| 10 | −1.9827 | −1.8139 | −1.6971 | −1.8139 | −1.9827 |
| 11 | −2.0770 | −1.9080 | −1.7910 | −1.9080 | −2.0770 |
| 12 | −2.1632 | −1.9940 | −1.8769 | −1.9940 | −2.1632 |
| 13 | −2.2426 | −2.0733 | −1.9561 | −2.0733 | −2.2426 |
| 14 | −2.3162 | −2.1469 | −2.0295 | −2.1469 | −2.3162 |
| 15 | −2.3848 | −2.2154 | −2.0980 | −2.2154 | −2.3848 |
| 16 | −2.4490 | −2.2795 | −2.1621 | −2.2795 | −2.4490 |
| 17 | −2.5093 | −2.3398 | −2.2223 | −2.3398 | −2.5093 |
| 18 | −2.5663 | −2.3967 | −2.2792 | −2.3967 | −2.5663 |
| 19 | −2.6201 | −2.4505 | −2.3330 | −2.4505 | −2.6201 |
| 20 | −2.6713 | −2.5016 | −2.3841 | −2.5016 | −2.6713 |
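Entropy values like those in Table 3 can be approximated by integrating −f log f numerically over (0, 1). The following is a minimal sketch under a stated assumption: the density used is the derivative of the assumed CDF F(y) = (βy)^θ / ((βy)^θ + (1 − y)^θ), which is inferred from the ratio-of-Weibulls construction and not quoted from the paper. The function names `uw2_pdf` and `uw2_entropy` are hypothetical, and the midpoint rule is chosen only for simplicity.

```python
import math

def uw2_pdf(y, theta, beta):
    # ASSUMED density: derivative of
    # F(y) = (beta*y)**theta / ((beta*y)**theta + (1-y)**theta).
    num = theta * beta ** theta * (y * (1.0 - y)) ** (theta - 1.0)
    den = ((beta * y) ** theta + (1.0 - y) ** theta) ** 2
    return num / den

def uw2_entropy(theta, beta, n=20000):
    # Midpoint rule for H = -integral of f(y) * log f(y) over (0, 1).
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        f = uw2_pdf((i + 0.5) * h, theta, beta)
        if f > 0.0:  # skip points where the density underflows to zero
            total -= f * math.log(f) * h
    return total

if __name__ == "__main__":
    for theta in (1, 2, 3):
        print(theta, [round(uw2_entropy(theta, b), 4) for b in (0.5, 1.0, 2.0)])
```

Under this assumed density the case θ = 1, β = 1 is the uniform distribution with entropy 0, and for θ = 2, β = 1 the integral equals 2 − ln 2 − π/2 ≈ −0.2639, in line with the corresponding Table 3 entries. The assumed density also satisfies f(y; β) = f(1 − y; 1/β), so the entropy is the same for β and 1/β.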
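Table 4. Simulation of 1000 iterations of the model UW2(θ, β).

| n | β | θ | β̂ | sd(β̂) | c(β̂) | θ̂ | sd(θ̂) | c(θ̂) |
|---|---|---|---|---|---|---|---|---|
| 50 | 2 | 0.5 | 2.2512 | 1.1014 | 90.6 | 0.5105 | 0.0603 | 94.6 |
| 100 | 2 | 0.5 | 2.1292 | 0.7380 | 94.3 | 0.5043 | 0.0422 | 94.0 |
| 200 | 2 | 0.5 | 2.0346 | 0.4975 | 94.2 | 0.5022 | 0.0297 | 94.5 |
| 50 | 2 | 1 | 2.0559 | 0.5021 | 92.8 | 1.0209 | 0.1207 | 94.6 |
| 100 | 2 | 1 | 2.0340 | 0.3523 | 95.1 | 1.0086 | 0.0844 | 94.0 |
| 200 | 2 | 1 | 2.0029 | 0.2450 | 95.0 | 1.0043 | 0.0594 | 94.5 |
| 50 | 2 | 2 | 2.0118 | 0.2455 | 94.3 | 2.0419 | 0.2413 | 94.6 |
| 100 | 2 | 2 | 2.0097 | 0.1740 | 95.4 | 2.0172 | 0.1687 | 94.0 |
| 200 | 2 | 2 | 1.9979 | 0.1222 | 95.2 | 2.0087 | 0.1188 | 94.4 |
| 50 | 2 | 4 | 2.0020 | 0.1221 | 94.1 | 4.0837 | 0.4826 | 94.6 |
| 100 | 2 | 4 | 2.0031 | 0.0867 | 95.8 | 4.0344 | 0.3374 | 94.0 |
| 200 | 2 | 4 | 1.9981 | 0.0611 | 95.3 | 4.0173 | 0.2376 | 94.5 |
| 50 | 0.5 | 4 | 0.5005 | 0.0305 | 94.2 | 4.0837 | 0.4826 | 94.6 |
| 100 | 0.5 | 4 | 0.5008 | 0.0217 | 95.8 | 4.0344 | 0.3374 | 94.0 |
| 200 | 0.5 | 4 | 0.4995 | 0.0153 | 95.3 | 4.0173 | 0.2376 | 94.5 |
| 50 | 0.5 | 2 | 0.5030 | 0.0614 | 94.2 | 2.0419 | 0.2413 | 94.6 |
| 100 | 0.5 | 2 | 0.5024 | 0.0435 | 95.4 | 2.0172 | 0.1687 | 94.0 |
| 200 | 0.5 | 2 | 0.4995 | 0.0306 | 95.2 | 2.0086 | 0.1188 | 94.5 |
| 50 | 1 | 2 | 1.0059 | 0.1228 | 94.3 | 2.0419 | 0.2413 | 94.6 |
| 100 | 1 | 2 | 1.0049 | 0.0870 | 95.4 | 2.0172 | 0.1687 | 94.0 |
| 200 | 1 | 2 | 0.9989 | 0.0611 | 95.2 | 2.0087 | 0.1188 | 94.5 |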
β̂ and θ̂ are the maximum likelihood estimates (MLEs) of β and θ, sd is the standard deviation of the estimates, and c is the empirical coverage (%) of the 95% confidence interval for the respective parameter.
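Each replicate of a simulation study such as the one in Table 4 starts by drawing a sample from UW2(θ, β). The sketch below shows one way this might be done. It assumes, without confirmation from the paper, that Y = X1 / (X1 + X2) with X1 and X2 independent Weibull variables sharing the shape θ and having scales 1 and β. Under that assumption P(Y > t) = 1 / (1 + (βt/(1 − t))^θ), which matches the UW2(1,5) column of Table 1. The helper names are hypothetical, and the maximum likelihood step is not shown.

```python
import random

def sample_uw2(n, theta, beta, rng):
    # ASSUMED construction: Y = X1 / (X1 + X2), with independent
    # X1 ~ Weibull(shape=theta, scale=1) and X2 ~ Weibull(shape=theta, scale=beta).
    sample = []
    for _ in range(n):
        x1 = rng.weibullvariate(1.0, theta)   # arguments are (scale, shape)
        x2 = rng.weibullvariate(beta, theta)
        sample.append(x1 / (x1 + x2))
    return sample

def empirical_survival(sample, t):
    # Fraction of observations strictly above t.
    return sum(1 for y in sample if y > t) / len(sample)

if __name__ == "__main__":
    rng = random.Random(2023)  # fixed seed so the run is repeatable
    ys = sample_uw2(100_000, theta=1.0, beta=5.0, rng=rng)
    for t in (0.70, 0.80, 0.90):
        exact = 1.0 / (1.0 + 5.0 * t / (1.0 - t))
        print(f"t={t:.2f}  empirical={empirical_survival(ys, t):.4f}  assumed={exact:.4f}")
```

A full replication of Table 4 would additionally fit θ and β by maximum likelihood on each sample and record how often the 95% interval covers the true value.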
Table 5. Summary statistics for ant data set of the LBMD.

| n | w̄ | sd | b1 | b2 |
|---|---|---|---|---|
| 66 | 0.5864 | 0.1339 | 0.04085 | 3.5395 |
Table 6. Parameter estimates for the KW, Beta, UW, and UW2 distributions.

| Parameter estimates | KW (sd) | Beta (sd) | UW (sd) | UW2 (sd) |
|---|---|---|---|---|
| α̂ | 5.0241 (1.0507) | 4.4717 (0.7554) | 2.3068 (0.2055) | - |
| β̂ | 3.8972 (0.4386) | 6.4115 (1.1016) | 2.8807 (0.3867) | 0.6992 (0.0513) |
| θ̂ | - | - | - | 2.9370 (0.3041) |
| Log-likelihood | 33.9586 | 35.5639 | 36.333 | 38.0463 |
| AIC | −61.712 | −67.1278 | −68.666 | −72.0926 |
| BIC | −59.333 | −62.748 | −64.281 | −67.713 |
| KS Statistic | 0.1212 | 0.1515 | 0.1212 | 0.0909 |
| W* | 0.1380 | 0.1029 | 0.08197 | 0.06354 |
| A* | 0.9276 | 0.7149 | 0.6091 | 0.4204 |
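The information criteria in Table 6 follow from the maximized log-likelihood via the usual definitions AIC = 2k − 2ℓ and BIC = k ln n − 2ℓ. The sketch below illustrates the arithmetic, assuming k = 2 estimated parameters per model and n = 66 observations (the sample size reported in Table 5). The function names are hypothetical.

```python
import math

def aic(loglik, k):
    # Akaike information criterion: 2k - 2*loglik.
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    # Bayesian information criterion: k*ln(n) - 2*loglik.
    return k * math.log(n) - 2 * loglik

if __name__ == "__main__":
    n = 66  # LBMD sample size from Table 5
    k = 2   # assumed number of estimated parameters per model
    for name, loglik in (("Beta", 35.5639), ("UW2", 38.0463)):
        print(f"{name}: AIC={aic(loglik, k):.4f}  BIC={bic(loglik, k, n):.3f}")
```

For the Beta and UW2 columns this reproduces the tabulated AIC and BIC to the printed precision. Lower values indicate the preferred model, so both criteria favour UW2 here.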
Table 7. Summary statistics for environment data set of the percentage of dissolved oxygen.

| n | w̄ | sd | b1 | b2 |
|---|---|---|---|---|
| 210 | 0.8294 | 0.1283 | −2.3702 | 11.3423 |
Table 8. Parameter estimates for the distributions Beta, KW, UW, and UW2.

| Parameter estimates | Beta (sd) | KW (sd) | UW (sd) | UW2 (sd) |
|---|---|---|---|---|
| α̂ | 7.3538 (0.7436) | 6.9905 (0.5687) | 5.4931 (0.4721) | - |
| β̂ | 1.6043 (0.1435) | 1.8721 (0.2003) | 1.1210 (0.0504) | 0.1616 (0.0089) |
| θ̂ | - | - | - | 2.1259 (0.1740) |
| AIC | −344.234 | −352.005 | −326.509 | −394.225 |
| BIC | −337.540 | −345.311 | −319.814 | −387.530 |
Table 9. Summary statistics for data set of the temperature and yield.

| Data | n | w̄ | sd | b1 | b2 |
|---|---|---|---|---|---|
| Yield | 52 | 332.0938 | 69.7559 | −0.2657 | 1.3058 |
| Temp | 52 | 0.1965 | 0.1070 | 0.3687 | 2.1997 |
Table 10. Parameter estimates and standard errors of the quantile regression coefficients for the UW2, UW, and Beta models, fitted to the dataset at the 0.5 quantile.

| Coef. | UW2 Est. | UW2 sd | UW2 t-value | UW2 p-value | UW Est. | UW sd | UW t-value | UW p-value | Beta Est. | Beta sd | Beta t-value | Beta p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| α0 | −0.1702 | 0.0800 | −2.1256 | 0.0418 | −0.1733 | 0.0963 | −1.7992 | 0.0820 | −0.1339 | 0.0528 | −2.5333 | 0.0167 |
| α1 | 0.0011 | 0.0002 | 4.1976 | 0.0002 | 0.0012 | 0.0003 | 3.6745 | 0.0009 | 0.0009 | 0.0001 | 5.1823 | 0.0000 |
Table 11. AIC and BIC values for the UW2, UW, and Beta models fitted to the Temperature and Yield data.

| Model | AIC | BIC |
|---|---|---|
| UW2 | −58.07373 | −53.67652 |
| UW | −53.75805 | −49.36084 |
| Beta | −43.69310 | −39.29589 |

Reyes, J.; Rojas, M.A.; Cortés, P.L.; Arrué, J. A New More Flexible Class of Distributions on (0,1): Properties and Applications to Univariate Data and Quantile Regression. Symmetry 2023, 15, 267. https://doi.org/10.3390/sym15020267
