Article

An Alternative One-Parameter Distribution for Bounded Data Modeling Generated from the Lambert Transformation

by Yuri A. Iriarte ¹, Mário de Castro ²,* and Héctor W. Gómez ¹,*

¹ Departamento de Matemática, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
² Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos 13560-095, SP, Brazil
* Authors to whom correspondence should be addressed.
Symmetry 2021, 13(7), 1190; https://doi.org/10.3390/sym13071190
Submission received: 5 May 2021 / Revised: 27 May 2021 / Accepted: 28 June 2021 / Published: 1 July 2021

Abstract

The beta and Kumaraswamy distributions are two of the most widely used distributions for modeling bounded data. When the histogram of a certain dataset exhibits increasing or decreasing behavior, one-parameter distributions such as the power, Marshall–Olkin extended uniform and skew-uniform distributions become viable alternatives. In this article, we propose a new one-parameter distribution for modeling bounded data, the Lambert-uniform distribution. The proposal can be considered a natural alternative to well-known one-parameter distributions in the statistical literature and, in certain scenarios, a viable alternative even to the two-parameter beta and Kumaraswamy distributions. We show that the density function of the proposal tends to positive finite values at the ends of the support, a behavior that favors good performance in certain scenarios. The raw moments are derived from the moment-generating function and used to describe the skewness and kurtosis behavior. The quantile function is expressed in closed form in terms of the Lambert W function, which allows the distribution to be reparameterized such that the involved parameter represents the qth quantile. Thus, for the analysis of a bounded-range variable for which a set of covariates is available, we propose a regression model that relates the qth quantile of the response to a linear predictor through a link function. The parameters are estimated by the maximum likelihood method, and the behavior of the estimators is evaluated through simulation experiments. Finally, three application examples are considered to illustrate the usefulness of the proposal.

1. Introduction

It is common to deal with data expressed as a proportion, percentage, rate or fraction in the continuous range $(0, 1)$ when analyzing certain random phenomena, for example, the annual replacement rate of blue-collar workers [1], the proportion of codling moth eggs that die after fumigation with methyl bromide [2] and the percentage difference in nicotine levels in users of first- and new-generation e-cigarette devices [3].
Two probability distributions widely used to model data such as those described above are the two-parameter beta (B) [4] and Kumaraswamy (K) [5] distributions. These distributions have very flexible probability density functions (pdfs), which can be monotonic, unimodal or U-shaped. Although these distributions are usually the first alternatives considered for modeling bounded data, the statistical literature also offers one-parameter distributions that can appropriately model datasets whose histograms show increasing or decreasing behavior. In this scenario, the power (P) distribution, which can be derived as a special case of the B and K distributions, and the Marshall–Olkin extended uniform (MOEU) [6] and skew-uniform (SU) [7] distributions, which result from the popular approach of adding a parameter to a baseline distribution in search of a more flexible distribution, can be considered viable alternatives.
A notable characteristic shared by these latter distributions is that their pdfs tend monotonically to finite values (functions of the parameters) at the extremes of the support, which in certain scenarios allows the extreme sample quantiles to be modeled much more adequately. However, because of the curvature of their pdfs, the most central quantiles may occasionally be poorly modeled, which in turn degrades the fit of the extreme quantiles. In such cases, the P, MOEU and SU distributions perform poorly, whereas the B and K distributions, having two shape parameters, can properly model the most central quantiles thanks to the great flexibility in curvature exhibited by their pdfs.
Motivated by the above, we formulate the following question as a starting point in this work: Based on the approach of adding parameters to a baseline distribution, is it possible to generate a parsimonious distribution that can perform better than the P, MOEU, SU, B and K distributions when modeling data whose histogram exhibits increasing or decreasing behavior?
To answer this question, we consider the Lambert-F distribution generator [8], defined by the cumulative distribution function (cdf) $G(x;\alpha) = 1 - [1 - F(x)]\,\alpha^{F(x)}$, where $\alpha \in (0, e)$ is a shape parameter, $e \approx 2.718$ is Euler's number and $F(x)$ is an arbitrary baseline cdf. A distinctive feature of this generator is that its inverse function, that is, the quantile function, can be expressed in closed form in terms of the Lambert W function defined in Appendix A. If the baseline distribution $F(x)$ is symmetric, it can be verified that $\alpha$ acts as a skewness parameter, allowing asymmetric shapes for the resulting pdf (for more details, see Iriarte et al. [9]).
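To make the generator concrete, the following sketch evaluates $G(x;\alpha)$ for an arbitrary vectorized baseline cdf. Python with NumPy is our choice for all sketches in this text (the article itself works in R), and the function names are hypothetical; the uniform baseline anticipates the distribution introduced below.

```python
import numpy as np


def lambert_f_cdf(x, alpha, baseline_cdf):
    """Lambert-F cdf G(x; alpha) = 1 - [1 - F(x)] * alpha**F(x).

    `baseline_cdf` is any vectorized cdf F; alpha must lie in (0, e).
    """
    if not 0.0 < alpha < np.e:
        raise ValueError("alpha must lie in (0, e)")
    F = baseline_cdf(np.asarray(x, dtype=float))
    return 1.0 - (1.0 - F) * alpha ** F


def uniform_cdf(x):
    """Uniform(0, 1) baseline cdf; plugging it in gives the LU distribution."""
    return np.clip(x, 0.0, 1.0)


grid = np.linspace(0.0, 1.0, 6)
print(lambert_f_cdf(grid, 1.5, uniform_cdf))  # increases from 0 to 1
print(lambert_f_cdf(grid, 1.0, uniform_cdf))  # alpha = 1 returns the baseline F(x)
```

With $\alpha = 1$ the generator returns the baseline cdf unchanged, which is why the resulting family extends its baseline.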
In this article, we introduce a new one-parameter distribution that is especially useful for modeling bounded data from a population whose pdf has a monotonic (increasing or decreasing) behavior. The proposal arises directly from the Lambert-F generator when considering a uniform (U) baseline distribution. We observe that the pdf of the proposed distribution, called the Lambert-uniform (LU) distribution, tends to finite values at the ends of the support, which in certain scenarios favors the good performance of the distribution. Consequently, the LU distribution may perform better in data modeling than the P, MOEU, SU, B and K distributions. We show that the LU distribution can be reparameterized in terms of its qth quantile. Thus, based on this result, we propose a regression model that relates the qth quantile of the response to a linear predictor through a link function. The proposed model can be considered as an alternative to quantile regression models available in the literature, such as the K quantile regression model, and performs adequately in scenarios where the histogram of the observed values of the response variable exhibits an increasing or decreasing behavior.
The article is organized as follows. In Section 2, we propose the LU distribution, derive its main structural properties, describe its skewness and kurtosis characteristics and discuss parameter estimation by the maximum likelihood (ML) method. In Section 3, we propose the quantile regression model based on the LU distribution and discuss the estimation of the regression coefficients via the ML method. In Section 4, the behavior of the estimators is evaluated in scenarios with and without covariates. Section 5 presents three application examples illustrating the usefulness of the proposal. Finally, the main conclusions are reported in Section 6.

2. Lambert-Uniform Distribution

In this section, we define the Lambert-uniform random variable and derive the density, distribution and quantile functions.

2.1. LU Random Variable

Definition 1.
A random variable $X$ follows a Lambert-uniform distribution with shape parameter $\alpha \in (0, e)$, denoted as $X \sim \mathrm{LU}(\alpha)$, if it can be represented as
$$X = \begin{cases} \dfrac{1}{\log(\alpha)}\, W_0\!\left[\dfrac{\log(\alpha)\,(U - 1)}{\alpha}\right] + 1, & \text{if } \alpha \in (0,1)\cup(1,e),\\[2mm] U, & \text{if } \alpha = 1, \end{cases} \tag{1}$$
where $W_0(\cdot)$ is the principal branch of the Lambert W function and $U$ is a uniform$(0, 1)$ random variable.
In Definition 1, since $W_0(\cdot)$ is a monotonic function, $X$ is a one-to-one transformation of the uniform random variable $U$. Furthermore, the expression $W_0[\log(\alpha)(U - 1)/\alpha]/\log(\alpha) + 1$ takes values in the interval $(0, 1)$ for all $U \in (0, 1)$ and $\alpha \in (0, e)$, so $X$ takes values in this same interval.
Proposition 1.
Let $X \sim \mathrm{LU}(\alpha)$. Then the cdf of $X$ is given by $F_X(x;\alpha) = 1 - (1 - x)\,\alpha^{x}$, where $x \in (0, 1)$ and $\alpha \in (0, e)$.
Proof. 
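From Equation (1), for $\alpha \in (1, e)$ we have that
$$F_X(x;\alpha) = P(X \le x) = P\left\{ W_0\!\left[\frac{\log(\alpha)(U - 1)}{\alpha}\right] \le \log(\alpha)\,(x - 1) \right\}.$$
Then, since $W_0$ is increasing and, by definition of the Lambert W function, $W_0(z) \le w$ if and only if $z \le w e^{w}$ for $w \ge -1$, it follows that
$$P(X \le x) = P\left[ \frac{\log(\alpha)(U - 1)}{\alpha} \le \log(\alpha)(x - 1)\,\alpha^{x - 1} \right] = P\left[ U \le 1 - (1 - x)\,\alpha^{x} \right],$$
and the result is obtained taking into account that $P(U \le u) = u$, since $U \sim \text{uniform}(0, 1)$. For $\alpha \in (0, 1)$, the same steps apply with the intermediate inequalities reversed, because $\log(\alpha) < 0$, and they lead to the same expression. Finally, note that the expression obtained is also valid for $\alpha = 1$, since $F_X(x;1) = x$. □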
The pdf of X can be obtained in a straightforward way from Proposition 1.
Corollary 1.
Let $X \sim \mathrm{LU}(\alpha)$. Then the pdf of $X$ is given by $f_X(x;\alpha) = \alpha^{x}\,[1 - \log(\alpha)(1 - x)]$.
Consistent with Definition 1, it is observed that the cdf and the pdf of the LU distribution reduce to the cdf and the pdf of the U distribution when α = 1 . Therefore, the LU distribution can be understood as an extension with one extra parameter of the U distribution.
The shapes of the LU pdf are simple to describe analytically: (i) $\lim_{x\to 0^+} f_X(x;\alpha) = 1 - \log(\alpha)$ and $\lim_{x\to 1^-} f_X(x;\alpha) = \alpha$; (ii) $f_X(x;\alpha)$ is constant when $\alpha = 1$, monotonically decreasing when $\alpha \in (0, 1)$ and monotonically increasing when $\alpha \in (1, e)$, since $f_X'(x;\alpha) = \alpha^{x}\log(\alpha)[2 - \log(\alpha)(1 - x)]$ and the bracket is positive. Property (i) shows that the pdf of the LU distribution converges to finite positive values as $x$ tends to the endpoints 0 and 1 of the support (note that $1 - \log(\alpha) > 0$ because $\alpha < e$). From Property (ii), it follows that the LU distribution is appropriate for fitting bounded data whose relative frequencies show increasing or decreasing behavior. Figure 1 shows some pdf curves of the LU distribution for different values of the parameter $\alpha$; their behavior is consistent with Properties (i) and (ii). In addition, note that the curvature of the pdf varies with its behavior at the ends of the support. Thus, the behavior of the LU pdf is similar to that of the P, MOEU and SU pdfs.
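A minimal sketch of Proposition 1 and Corollary 1 follows; the names `lu_cdf` and `lu_pdf` are hypothetical. It checks with SciPy's `quad` that the pdf integrates to one and prints the boundary values $1 - \log(\alpha)$ and $\alpha$ from Property (i).

```python
import numpy as np
from scipy.integrate import quad


def lu_cdf(x, alpha):
    """LU cdf (Proposition 1): F(x) = 1 - (1 - x) * alpha**x for 0 < x < 1."""
    x = np.asarray(x, dtype=float)
    return 1.0 - (1.0 - x) * alpha ** x


def lu_pdf(x, alpha):
    """LU pdf (Corollary 1): f(x) = alpha**x * [1 - log(alpha) * (1 - x)]."""
    x = np.asarray(x, dtype=float)
    return alpha ** x * (1.0 - np.log(alpha) * (1.0 - x))


for alpha in (0.5, 1.0, 1.5):
    total = quad(lambda t: float(lu_pdf(t, alpha)), 0.0, 1.0)[0]
    # Property (i): f(0+) = 1 - log(alpha) and f(1-) = alpha
    print(alpha, round(total, 8), float(lu_pdf(0.0, alpha)), float(lu_pdf(1.0, alpha)))
```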
Note that, due to the behavior of its pdf at the ends of the support, the LU distribution may fit the extreme sample quantiles more adequately than a distribution whose pdf tends to $\infty$ and 0 at the ends of the support. In Section 5, we see that the LU distribution may fit data better than the P, MOEU and SU distributions, and even better than the two-parameter B and K distributions, whose pdfs (in the monotonic case) tend to $\infty$ and 0 at the ends of the support.
Following steps very similar to those in the proof of Proposition 1, the quantile function (qf) of the LU distribution is easily derived by inverting the cdf given in Proposition 1. For $u \in (0, 1)$, it is given by
$$Q_X(u;\alpha) = \begin{cases} \dfrac{1}{\log(\alpha)}\, W_0\!\left[\dfrac{\log(\alpha)\,(u - 1)}{\alpha}\right] + 1, & \text{if } \alpha \in (0,1)\cup(1,e),\\[2mm] u, & \text{if } \alpha = 1. \end{cases} \tag{2}$$
Since the Lambert W function is implemented in various statistical software packages, Equation (2) can be easily computed.
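Equation (2) also yields a sampler by inverse transform, which is exactly the representation in Definition 1. The sketch below relies on SciPy's `scipy.special.lambertw`, whose branch `k=0` is the principal branch $W_0$; for real arguments above $-1/e$ it returns a complex value with zero imaginary part, so only the real part is kept. The names `lu_quantile` and `lu_rvs` are hypothetical.

```python
import numpy as np
from scipy.special import lambertw


def lu_quantile(u, alpha):
    """LU quantile function, Equation (2), using the principal branch W_0."""
    u = np.asarray(u, dtype=float)
    if alpha == 1.0:
        return u.copy()                       # LU(1) is the uniform distribution
    la = np.log(alpha)
    z = la * (u - 1.0) / alpha                # argument of W_0; exceeds -1/e for alpha < e
    return lambertw(z, k=0).real / la + 1.0


def lu_rvs(alpha, size, rng=None):
    """Draw LU(alpha) variates: X = Q(U; alpha) with U ~ uniform(0, 1) (Definition 1)."""
    rng = np.random.default_rng(rng)
    return lu_quantile(rng.uniform(size=size), alpha)


x = np.linspace(0.05, 0.95, 4)
u = 1.0 - (1.0 - x) * 1.5 ** x                # LU cdf of Proposition 1
print(np.max(np.abs(lu_quantile(u, 1.5) - x)))  # round-off only: Q inverts F
print(lu_rvs(0.5, 5, rng=1))
```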
As a final remark for this section, we highlight that the linear transformation $a + bX$, where $X \sim \mathrm{LU}(\alpha)$, $a \in \mathbb{R}$ and $b > 0$, follows an LU distribution on the continuous range $(a, a + b)$. Therefore, the LU distribution can easily be used to fit bounded data on any finite interval of the real line.

2.2. Related Distributions

It is well known that some distributions, such as the exponential, Rayleigh and power distributions, among others, can be derived as transformations of a uniform random variable. Applying these transformations to an LU random variable, we obtain the following distributions: (1) Let $Y = -\lambda \log(1 - X)$, where $X \sim \mathrm{LU}(\alpha)$ and $\lambda > 0$. Then $Y$ follows the Lambert-exponential distribution (see Iriarte et al. [8]). (2) Let $Y = \sigma\sqrt{-\log(1 - X)}$, where $X \sim \mathrm{LU}(\alpha)$ and $\sigma > 0$. Then $Y$ follows the Lambert–Rayleigh distribution (see Iriarte et al. [8]). (3) Let $Y = X^{1/\delta}$, where $X \sim \mathrm{LU}(\alpha)$ and $\delta > 0$. Then the distribution of $Y$ is a two-parameter distribution that reduces to the P distribution when $\alpha = 1$. In this case, the pdf of $Y$ is $f_Y(y;\delta,\alpha) = \delta y^{\delta - 1}\alpha^{y^{\delta}}[1 - \log(\alpha)(1 - y^{\delta})]$, where $y \in (0, 1)$. We refer to this distribution as the Lambert-power distribution.
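Transformation (3) can be checked numerically. The sketch below draws $X \sim \mathrm{LU}(\alpha)$ through Equation (2), forms $Y = X^{1/\delta}$ and compares the empirical cdf of $Y$ with $F_Y(y) = F_X(y^{\delta}) = 1 - (1 - y^{\delta})\alpha^{y^{\delta}}$, whose derivative is the Lambert-power pdf above. Parameter values, seed and helper names are arbitrary illustrative choices.

```python
import numpy as np
from scipy.special import lambertw


def lambert_power_pdf(y, delta, alpha):
    """pdf of Y = X**(1/delta), X ~ LU(alpha): the Lambert-power distribution."""
    y = np.asarray(y, dtype=float)
    yd = y ** delta
    return delta * y ** (delta - 1.0) * alpha ** yd * (1.0 - np.log(alpha) * (1.0 - yd))


def lambert_power_cdf(y, delta, alpha):
    """cdf of Y, i.e. the LU cdf of Proposition 1 evaluated at y**delta."""
    yd = np.asarray(y, dtype=float) ** delta
    return 1.0 - (1.0 - yd) * alpha ** yd


def lu_rvs(alpha, size, rng):
    """LU(alpha) draws by inverse transform, Equation (2); assumes alpha != 1."""
    la = np.log(alpha)
    u = rng.uniform(size=size)
    return lambertw(la * (u - 1.0) / alpha, k=0).real / la + 1.0


rng = np.random.default_rng(7)
delta, alpha = 2.0, 0.4                        # arbitrary illustrative values
y = lu_rvs(alpha, 50_000, rng) ** (1.0 / delta)
for t in (0.25, 0.5, 0.75):
    # empirical cdf of the transformed draws versus the closed-form cdf
    print(t, np.mean(y <= t), float(lambert_power_cdf(t, delta, alpha)))
```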
Other distributions from the literature can be derived through appropriate transformations of LU random variables. For illustration, this section considers only the three transformations described above.

2.3. Skewness and Kurtosis

In the following, we describe the skewness and kurtosis characteristics of the LU distribution by analyzing Fisher's skewness and kurtosis coefficients. To this end, we first compute the moment-generating function.
Proposition 2.
Let $X \sim \mathrm{LU}(\alpha)$. Then, in the case $\alpha = 1$, the moment-generating function of $X$ is given by $M_X(t) = (e^{t} - 1)/t$. In the case $\alpha \in (0,1)\cup(1,e)$, the moment-generating function is given by
$$M_X(t) = \frac{\log^2(\alpha) - [1 - \log(\alpha)]\,t + \alpha t e^{t}}{[t + \log(\alpha)]^2},$$
for $t \neq -\log(\alpha)$; the value at $t = -\log(\alpha)$ follows by continuity.
Proof. 
In the case $\alpha = 1$, the LU distribution reduces to the standard U distribution, and thus $M_X(t) = E(e^{tX}) = (e^{t} - 1)/t$. In the case $\alpha \in (0,1)\cup(1,e)$, we observe that
$$E(e^{tX}) = \frac{[1 - \log(\alpha)]\,[\alpha \exp(t) - 1]}{t + \log(\alpha)} + \log(\alpha)\int_0^1 x \exp\{x[t + \log(\alpha)]\}\,dx,$$
and the result is obtained by the usual method of integration by parts and some algebra. □
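A quick numerical cross-check of Proposition 2: the sketch below compares the closed-form moment-generating function with direct quadrature of $E(e^{tX})$ using SciPy's `quad`. The helper names are hypothetical, and because the closed form has a removable singularity at $t = -\log(\alpha)$, it is only evaluated away from that point.

```python
import numpy as np
from scipy.integrate import quad


def lu_mgf(t, alpha):
    """Closed-form MGF of LU(alpha) from Proposition 2 (t != -log(alpha))."""
    if alpha == 1.0:
        return np.expm1(t) / t if t != 0.0 else 1.0
    d = np.log(alpha)
    return (d**2 - (1.0 - d) * t + alpha * t * np.exp(t)) / (t + d) ** 2


def lu_mgf_numeric(t, alpha):
    """E[exp(tX)] by quadrature against the LU pdf of Corollary 1."""
    pdf = lambda x: alpha**x * (1.0 - np.log(alpha) * (1.0 - x))
    return quad(lambda x: np.exp(t * x) * pdf(x), 0.0, 1.0)[0]


for alpha in (0.3, 1.0, 2.0):
    for t in (-2.0, 0.7, 3.0):
        print(alpha, t, lu_mgf(t, alpha), lu_mgf_numeric(t, alpha))
```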
Corollary 2.
Let $X \sim \mathrm{LU}(\alpha)$. Then, in the case $\alpha = 1$, the first four raw moments of $X$ are $E(X) = 1/2$, $E(X^2) = 1/3$, $E(X^3) = 1/4$ and $E(X^4) = 1/5$. In the case $\alpha \in (0,1)\cup(1,e)$, writing $\delta = \log(\alpha)$, the first four raw moments are given by
$$E(X) = \frac{\alpha - 1 - \delta}{\delta^2}, \qquad E(X^2) = \frac{2[(\alpha + 1)\delta + 2(1 - \alpha)]}{\delta^3},$$
$$E(X^3) = \frac{3[\alpha\delta^2 - 2\delta(2\alpha + 1) - 6(1 - \alpha)]}{\delta^4}, \qquad E(X^4) = \frac{4[\alpha\delta^2(\delta - 6) + 6\delta(3\alpha + 1) + 24(1 - \alpha)]}{\delta^5}.$$
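The expressions in Corollary 2 can be compared with numerical integration, as in the sketch below (hypothetical helper names). One caveat: the closed forms involve cancellation when $\alpha$ is very close to 1 (where $\delta \to 0$), so in that neighbourhood the $\alpha = 1$ values or quadrature are numerically safer.

```python
import numpy as np
from scipy.integrate import quad


def lu_raw_moments(alpha):
    """First four raw moments E(X^r), r = 1..4, of LU(alpha) from Corollary 2."""
    if alpha == 1.0:
        return np.array([1 / 2, 1 / 3, 1 / 4, 1 / 5])
    a, d = alpha, np.log(alpha)               # d plays the role of delta = log(alpha)
    m1 = (a - 1.0 - d) / d**2
    m2 = 2.0 * ((a + 1.0) * d + 2.0 * (1.0 - a)) / d**3
    m3 = 3.0 * (a * d**2 - 2.0 * d * (2.0 * a + 1.0) - 6.0 * (1.0 - a)) / d**4
    m4 = 4.0 * (a * d**2 * (d - 6.0) + 6.0 * d * (3.0 * a + 1.0) + 24.0 * (1.0 - a)) / d**5
    return np.array([m1, m2, m3, m4])


def lu_raw_moments_numeric(alpha):
    """Same moments by quadrature of x**r times the LU pdf."""
    pdf = lambda x: alpha**x * (1.0 - np.log(alpha) * (1.0 - x))
    return np.array([quad(lambda x, r=r: x**r * pdf(x), 0.0, 1.0)[0] for r in range(1, 5)])


for alpha in (0.2, 1.0, 2.5):
    print(alpha, lu_raw_moments(alpha), lu_raw_moments_numeric(alpha))
```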
Corollary 3.
Let $X \sim \mathrm{LU}(\alpha)$. Then, in the case $\alpha = 1$, the skewness ($\gamma_1(\alpha)$) and kurtosis ($\gamma_2(\alpha)$) coefficients take the values 0 and 9/5, respectively. In the case $\alpha \in (0,1)\cup(1,e)$, the coefficients are given by
$$\gamma_1(\alpha) = \frac{\mu_3 - 3\mu_1\mu_2 + 2\mu_1^3}{(\mu_2 - \mu_1^2)^{3/2}} \quad\text{and}\quad \gamma_2(\alpha) = \frac{\mu_4 - 4\mu_1\mu_3 + 6\mu_1^2\mu_2 - 3\mu_1^4}{(\mu_2 - \mu_1^2)^2},$$
where $\mu_r = E(X^r)$, $r = 1, 2, 3, 4$, are as in Corollary 2.
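The skewness and kurtosis ranges for the LU distribution are
$$\frac{3e - 6e^2 + 2e^3 - 4}{(2 + 2e - e^2)^{3/2}} < \gamma_1(\alpha) < 2 \quad\text{and}\quad \frac{9}{5} < \gamma_2(\alpha) < 9,$$
where the lower bound for $\gamma_1(\alpha)$ is approximately $-0.813$.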
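The coefficients of Corollary 3 are easy to evaluate from the raw moments. The sketch below (hypothetical helper names) computes $\gamma_1$ and $\gamma_2$ and compares $\gamma_1$ near $\alpha = e$ with the closed-form lower bound above; evaluating at $\alpha$ close to $e$ and close to 0 is our way of probing the two ends of the parameter range.

```python
import numpy as np


def lu_raw_moments(alpha):
    """Raw moments E(X^r), r = 1..4, from Corollary 2 (alpha != 1)."""
    a, d = alpha, np.log(alpha)
    m1 = (a - 1.0 - d) / d**2
    m2 = 2.0 * ((a + 1.0) * d + 2.0 * (1.0 - a)) / d**3
    m3 = 3.0 * (a * d**2 - 2.0 * d * (2.0 * a + 1.0) - 6.0 * (1.0 - a)) / d**4
    m4 = 4.0 * (a * d**2 * (d - 6.0) + 6.0 * d * (3.0 * a + 1.0) + 24.0 * (1.0 - a)) / d**5
    return m1, m2, m3, m4


def lu_skew_kurt(alpha):
    """Fisher skewness gamma_1 and kurtosis gamma_2 from Corollary 3."""
    if alpha == 1.0:
        return 0.0, 9.0 / 5.0
    m1, m2, m3, m4 = lu_raw_moments(alpha)
    var = m2 - m1**2
    g1 = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    g2 = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return g1, g2


e = np.e
lower_g1 = (3 * e - 6 * e**2 + 2 * e**3 - 4) / (2 + 2 * e - e**2) ** 1.5
print(round(lower_g1, 4))                      # closed-form lower bound of gamma_1
for alpha in (1e-6, 0.5, 1.0, 1.5, e * (1 - 1e-9)):
    print(alpha, lu_skew_kurt(alpha))
```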
Figure 2 presents plots of the coefficients given in Corollary 3. The figure shows that the LU distribution is symmetric only in the case α = 1 , has positive skewness when α ( 0 , 1 ) and has negative skewness when α ( 1 , e ) . Furthermore, it is observed that the LU distribution can model kurtosis levels higher than the kurtosis level of the U distribution.

2.4. ML Estimation

For a random sample $X_1, \ldots, X_n$ such that $X_i \sim \mathrm{LU}(\alpha)$, $i = 1, \ldots, n$, with observed values $x_1, \ldots, x_n$, the log-likelihood function is given by
$$\ell(\alpha) = \log \prod_{i=1}^n f_X(x_i;\alpha) = n\bar{x}\log(\alpha) + \sum_{i=1}^n \log[1 - \log(\alpha)(1 - x_i)],$$
where $\bar{x}$ is the mean of the observed sample. Thus, the score function is given by
$$U(\alpha) = \frac{\partial \ell(\alpha)}{\partial \alpha} = \frac{n\bar{x}}{\alpha} - \frac{1}{\alpha}\sum_{i=1}^n \frac{1 - x_i}{1 - \log(\alpha)(1 - x_i)}. \tag{4}$$
From Equation (4), it is observed that the ML estimator of $\alpha$ cannot be expressed explicitly. Therefore, the ML estimate of $\alpha$ must be obtained by solving the equation $U(\alpha) = 0$ numerically. The uniroot.all function available in the rootSolve package of the R programming language [10] is a good option for this task.
Since the ML estimator of $\alpha$ does not have a closed form, a good alternative for obtaining the ML estimate is to solve the optimization problem $\max_{\alpha} \ell(\alpha)$ subject to $\alpha \in (0, e)$. To solve this problem, we use the optim function of the R programming language with the L-BFGS-B algorithm. This algorithm requires an initial value in the range of $\alpha$ to start the iterative process; our simulation experiments indicate that $\alpha_0 = 1$ is a good choice.
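Both numerical routes can be sketched as follows. As assumed Python analogues of the R tools named above, we use SciPy's `brentq` to find the root of $U(\alpha)$ and `minimize` with method `"L-BFGS-B"`, bounds just inside $(0, e)$ and start $\alpha_0 = 1$ to maximize $\ell(\alpha)$. The simulated data, sample size, seed and helper names are illustrative only. Note that `brentq` needs a sign change of the score on the bracket, which holds for this sample but may fail for data whose likelihood peaks at the boundary of $(0, e)$.

```python
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.special import lambertw


def lu_loglik(alpha, x):
    """Log-likelihood l(alpha) of an LU sample x."""
    la = np.log(alpha)
    return la * np.sum(x) + np.sum(np.log1p(-la * (1.0 - x)))


def lu_score(alpha, x):
    """Score function U(alpha), Equation (4)."""
    la = np.log(alpha)
    return (np.sum(x) - np.sum((1.0 - x) / (1.0 - la * (1.0 - x)))) / alpha


def lu_mle_lbfgsb(x, alpha0=1.0, eps=1e-8):
    """Maximize l(alpha) over (eps, e - eps) with L-BFGS-B and the analytic score."""
    res = minimize(lambda a: -lu_loglik(a[0], x), x0=[alpha0],
                   jac=lambda a: np.array([-lu_score(a[0], x)]),
                   method="L-BFGS-B", bounds=[(eps, np.e - eps)])
    return res.x[0]


def lu_mle_root(x, eps=1e-6):
    """Solve U(alpha) = 0 by Brent's method on (eps, e - eps)."""
    return brentq(lu_score, eps, np.e - eps, args=(x,))


rng = np.random.default_rng(123)
alpha_true = 0.5
la = np.log(alpha_true)
x = lambertw(la * (rng.uniform(size=5000) - 1.0) / alpha_true).real / la + 1.0  # Eq. (2)

print(lu_mle_lbfgsb(x), lu_mle_root(x))
```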
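The second partial derivative of $\ell(\alpha)$ with respect to $\alpha$ is given by
$$\frac{\partial^2 \ell(\alpha)}{\partial \alpha^2} = -\frac{1}{\alpha^2}\sum_{i=1}^n x_i + \frac{1}{\alpha^2}\sum_{i=1}^n \frac{1 - x_i}{1 - \log(\alpha)(1 - x_i)} - \frac{1}{\alpha^2}\sum_{i=1}^n \left[\frac{1 - x_i}{1 - \log(\alpha)(1 - x_i)}\right]^2.$$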
Thus, under regularity conditions, the Fisher information is given by
$$I(\alpha) = E\left[-\frac{\partial^2 \ell(\alpha)}{\partial \alpha^2}\right] = \frac{n}{\alpha}\int_0^1 \frac{u^2\,\alpha^{-u}}{1 - \log(\alpha)\,u}\,du, \tag{5}$$
where the first two terms of the expectation cancel because $E\{(1 - X)/[1 - \log(\alpha)(1 - X)]\} = \int_0^1 (1 - x)\alpha^{x}\,dx = E(X)$. The integral in Equation (5) can be computed by numerical integration; for example, the integrate function of the R programming language can be used. Then, under regularity conditions, the asymptotic distribution of $(\hat\alpha - \alpha)$ is $N(0, I^{-1}(\alpha))$. Thus, the asymptotic standard error of $\hat\alpha$ is $1/\sqrt{I(\hat\alpha)}$, and an asymptotic $100(1 - \gamma)\%$ confidence interval for $\alpha$ is given by $\hat\alpha \pm z_{\gamma/2}/\sqrt{I(\hat\alpha)}$, where $z_{\gamma/2}$ is the upper $\gamma/2$ quantile of the standard normal distribution.
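As a sketch of the last step, the Fisher information can be computed with SciPy's `quad` (in place of R's integrate) and turned into a Wald interval. The helper names and the example estimate and sample size are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def lu_fisher_info(alpha, n):
    """Expected Fisher information I(alpha), Equation (5), via quadrature."""
    la = np.log(alpha)
    integrand = lambda u: u**2 * alpha ** (-u) / (1.0 - la * u)
    return n / alpha * quad(integrand, 0.0, 1.0)[0]


def lu_wald_interval(alpha_hat, n, level=0.95):
    """Asymptotic SE 1/sqrt(I(alpha_hat)) and interval alpha_hat +/- z * SE."""
    se = 1.0 / np.sqrt(lu_fisher_info(alpha_hat, n))
    z = norm.ppf(1.0 - (1.0 - level) / 2.0)   # upper gamma/2 quantile, gamma = 1 - level
    return se, (alpha_hat - z * se, alpha_hat + z * se)


print(lu_wald_interval(0.52, n=100))           # hypothetical estimate and sample size
```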

3. Quantile Regression Model

In statistical modeling, the regression technique is used to quantify the relationship between the dependent variable (response) and one or more independent variables (covariates). In the case in which the interest lies in quantifying the effect on the conditional mean response, given the covariates, the classical least squares regression model and the generalized linear models are especially valued. These models have been shown to be very useful when analyzing data in various areas of knowledge. However, there are scenarios where it is equally or even more important to quantify the effect on some other measure such as the conditional median or some extreme conditional quantile of the response (see, e.g., [11,12]). In this scenario, a quantile regression model is appropriate because it allows quantifying the effect of the covariates on any conditional quantile of the response.
In the case of a continuous bounded-range response, for example bounded to the range $(0, 1)$, it is possible to use the well-known beta regression model (among others) to quantify the effect of the covariates on the mean response (see [13]). Attractive alternatives to the beta regression model can be found in the literature (see, e.g., [14]). On the other hand, a quick review of the literature shows the K [15] and arc-secant-hyperbolic-normal (ASHN) [16] regression models among the proposals for modeling the conditional quantiles of the response. In these last two models, we emphasize that the regression depends on a shape parameter that provides great flexibility and must be estimated together with the regression coefficients.
A very good description of regression models for bounded response variables can be found in the work of Bayes et al. [17], who proposed a mixed quantile regression model for bounded response variables.
In what follows, we propose a quantile regression model formulated from a reparameterized version of the LU distribution proposed in Section 2.1. In this model, unlike the K and ASHN models, only the regression coefficients must be estimated. This is because it is formulated from a distribution with a single shape parameter that is linked to the linear predictor through an appropriate link function. We highlight that the performance of the proposed model is appropriate in scenarios where the histogram of the observed values of the response variable exhibits a decreasing or increasing behavior.

3.1. The LU Model

From Corollary 2, the mean of the LU distribution has a closed form. However, the shape parameter $\alpha$ cannot be expressed explicitly as a function of the mean, which is a major drawback for formulating a regression model that quantifies the effect of the covariates on the mean response. On the other hand, $\alpha$ can be expressed explicitly as a function of the qth quantile, which allows us to reparameterize the LU distribution in terms of its qth quantile and, consequently, to formulate a quantile regression model in a simple way.
Denoting by $\eta$ the qth quantile of the LU distribution, from Equation (2) we obtain $\alpha = [(1 - q)/(1 - \eta)]^{1/\eta}$. Thus, for a known $q \in (0, 1)$, the LU distribution can be easily reparameterized in terms of its qth quantile, with pdf
$$f_X(x;\eta) = \left(\frac{1 - q}{1 - \eta}\right)^{x/\eta}\left[1 - \frac{1}{\eta}\log\left(\frac{1 - q}{1 - \eta}\right)(1 - x)\right], \quad x, \eta \in (0, 1). \tag{6}$$
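The reparameterization can be checked directly: the sketch below (hypothetical helper names) maps $(\eta, q)$ to $\alpha$, confirms that the LU cdf at $\eta$ equals $q$, and checks that Equation (6) integrates to one. Since Definition 1 requires $\alpha \in (0, e)$, the sketch also guards that range; for some combinations the implied $\alpha$ exceeds $e$ (for instance $q = 0.5$ and $\eta = 0.9$ give $\alpha = 5^{1/0.9} > e$).

```python
import numpy as np
from scipy.integrate import quad


def alpha_from_quantile(eta, q):
    """Shape alpha whose LU q-th quantile equals eta: [(1 - q)/(1 - eta)]**(1/eta)."""
    return ((1.0 - q) / (1.0 - eta)) ** (1.0 / eta)


def lu_pdf_eta(x, eta, q):
    """Reparameterized LU pdf, Equation (6)."""
    L = np.log((1.0 - q) / (1.0 - eta))
    return ((1.0 - q) / (1.0 - eta)) ** (x / eta) * (1.0 - L / eta * (1.0 - x))


for q, eta in [(0.25, 0.2), (0.5, 0.4), (0.75, 0.8)]:
    alpha = alpha_from_quantile(eta, q)
    if not 0.0 < alpha < np.e:                 # Definition 1 requires alpha in (0, e)
        raise ValueError(f"eta={eta}, q={q} imply alpha={alpha:.3f} outside (0, e)")
    cdf_at_eta = 1.0 - (1.0 - eta) * alpha**eta        # Proposition 1, should equal q
    mass = quad(lambda x: lu_pdf_eta(x, eta, q), 0.0, 1.0)[0]
    print(q, eta, round(alpha, 4), cdf_at_eta, mass)
```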
Let $X_1, \ldots, X_n$ be $n$ random variables with observed values $x_1, \ldots, x_n$, and assume that each $X_i$ has pdf $f_{X_i}(x;\eta_i)$ given in Equation (6). The LU quantile regression model is defined by assuming that the qth quantile $\eta_i$ of $X_i$ satisfies the functional relationship $g(\eta_i) = \mathbf{w}_i^{t}\boldsymbol\beta$, $i = 1, \ldots, n$, where $\mathbf{w}_i = (1, w_{i1}, \ldots, w_{i(k-1)})^{t}$ is the vector of covariates associated with the response $x_i$, $\boldsymbol\beta = (\beta_0, \beta_1, \ldots, \beta_{k-1})^{t}$ is a k-dimensional vector of unknown regression coefficients and $g(\cdot)$ is a strictly increasing, twice differentiable link function that maps $(0, 1)$ into $\mathbb{R}$. Commonly used link functions include the logit, probit and log-log functions.

3.2. ML Estimation

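From Equation (6), the log-likelihood function is given by
$$\ell(\boldsymbol\beta) = \log(1 - q)\sum_{i=1}^n \frac{x_i}{\eta_i} - \sum_{i=1}^n \frac{x_i\log(1 - \eta_i)}{\eta_i} + \sum_{i=1}^n \log\left[1 - \frac{1}{\eta_i}\log\left(\frac{1 - q}{1 - \eta_i}\right)(1 - x_i)\right], \tag{7}$$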
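and the score functions are given by
$$\frac{\partial \ell(\boldsymbol\beta)}{\partial \beta_r} = -\log(1 - q)\sum_{i=1}^n \frac{x_i\,\eta_{i,r}}{\eta_i^2} + \sum_{i=1}^n \frac{x_i\log(1 - \eta_i)\,\eta_{i,r}}{\eta_i^2} + \sum_{i=1}^n \frac{x_i\,\eta_{i,r}}{\eta_i(1 - \eta_i)} + \sum_{i=1}^n \frac{\log\!\left(\frac{1 - q}{1 - \eta_i}\right)(1 - x_i)\,\eta_{i,r}/\eta_i^2}{1 - \frac{1}{\eta_i}\log\!\left(\frac{1 - q}{1 - \eta_i}\right)(1 - x_i)} - \sum_{i=1}^n \frac{(1 - x_i)\,\eta_{i,r}/[\eta_i(1 - \eta_i)]}{1 - \frac{1}{\eta_i}\log\!\left(\frac{1 - q}{1 - \eta_i}\right)(1 - x_i)},$$
where $\eta_{i,r} = \partial\eta_i/\partial\beta_r$, $\eta_i = g^{-1}(\mathbf{w}_i^{t}\boldsymbol\beta)$ and $r = 0, 1, \ldots, k - 1$. Note that $\eta_{i,r}$ depends on the link function. For example, for the logit link, $g(u) = \log[u/(1 - u)]$, $u \in (0, 1)$, we have $\eta_{i,r} = \eta_i(1 - \eta_i)w_{ir}$, where $\eta_i = \exp(\mathbf{w}_i^{t}\boldsymbol\beta)/[1 + \exp(\mathbf{w}_i^{t}\boldsymbol\beta)]$ and $w_{i0} = 1$, with $i = 1, 2, \ldots, n$ and $r = 0, 1, \ldots, k - 1$.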
We observe from Equation (7) that the ML estimators of the coefficients $\beta$'s cannot be expressed in closed form. Thus, the ML estimates must be obtained by solving the system of score equations numerically. In the R programming language, the multiroot function of the rootSolve package is a good option for solving this system of equations.
In this case, since the ML estimators do not have a closed form, a good alternative for obtaining the ML estimates is to solve the unconstrained optimization problem $\max_{\boldsymbol\beta} \ell(\boldsymbol\beta)$, with $\beta_r \in \mathbb{R}$, $r = 0, 1, \ldots, k - 1$, where $\ell(\boldsymbol\beta)$ is given in Equation (7). We solved this problem using the optim function of the R programming language with the BFGS algorithm.
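The following sketch fits the LU quantile regression with the logit link, using SciPy's `minimize` with method `"BFGS"` as an assumed counterpart of R's optim, the log-likelihood (7) and the score equations above with $\eta_{i,r} = \eta_i(1 - \eta_i)w_{ir}$. It is a sketch under stated assumptions: the data come from hypothetical coefficients (not the values used in the article), the objective is the average negative log-likelihood (same maximizer, better scaled), and parameter values whose implied density is not positive at some observation are rejected with $-\infty$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, lambertw


def lu_reg_loglik(beta, x, W, q):
    """Log-likelihood (7) of the LU quantile regression with logit link."""
    eta = expit(W @ beta)
    L = np.log1p(-q) - np.log1p(-eta)          # log[(1 - q) / (1 - eta_i)]
    D = 1.0 - L / eta * (1.0 - x)              # bracket inside the last sum of (7)
    if not np.all(np.isfinite(D)) or np.any(D <= 0.0):
        return -np.inf                         # pdf not positive at some observation
    return np.sum(x * L / eta + np.log(D))


def lu_reg_score(beta, x, W, q):
    """Score vector: sum_i (dl_i/deta_i) * eta_i (1 - eta_i) * w_ir (logit link)."""
    eta = expit(W @ beta)
    L = np.log1p(-q) - np.log1p(-eta)
    D = 1.0 - L / eta * (1.0 - x)
    a = 1.0 / (eta * (1.0 - eta))
    dl_deta = x * (a - L / eta**2) + (1.0 - x) * (L / eta**2 - a) / D
    return W.T @ (dl_deta * eta * (1.0 - eta))


def lu_reg_fit(x, W, q):
    """BFGS on the average negative log-likelihood, starting from beta = 0."""
    n = len(x)
    return minimize(lambda b: -lu_reg_loglik(b, x, W, q) / n, np.zeros(W.shape[1]),
                    jac=lambda b: -lu_reg_score(b, x, W, q) / n, method="BFGS")


# Illustrative data from hypothetical coefficients (not the values used in the article).
rng = np.random.default_rng(2021)
n, q, beta_true = 2000, 0.5, np.array([-1.0, 0.5])
W = np.column_stack([np.ones(n), rng.uniform(-2.0, 2.0, n)])
eta = expit(W @ beta_true)
alpha = ((1.0 - q) / (1.0 - eta)) ** (1.0 / eta)  # Section 3.1; here alpha < 1
la = np.log(alpha)
x = lambertw(la * (rng.uniform(size=n) - 1.0) / alpha).real / la + 1.0

res = lu_reg_fit(x, W, q)
print(res.x)
```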
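Under regularity conditions, the asymptotic distribution of $(\hat{\boldsymbol\beta}_{\mathrm{ML}} - \boldsymbol\beta)$ is $N_k(\mathbf{0}, K(\boldsymbol\beta)^{-1})$, where $K(\boldsymbol\beta)$ is the expected information matrix. Since $\ell(\boldsymbol\beta)$ is not simple, an analytical expression for this matrix is not easy to obtain. However, we approximate it by the observed information matrix, whose elements are minus the second partial derivatives of the log-likelihood function with respect to all the parameters, evaluated at the ML estimates. Thus, the observed information matrix is given by
$$I(\boldsymbol\beta) = \begin{pmatrix} \varepsilon_{\beta_0\beta_0} & \varepsilon_{\beta_0\beta_1} & \cdots & \varepsilon_{\beta_0\beta_{k-1}} \\ \varepsilon_{\beta_0\beta_1} & \varepsilon_{\beta_1\beta_1} & \cdots & \varepsilon_{\beta_1\beta_{k-1}} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_{\beta_0\beta_{k-1}} & \varepsilon_{\beta_1\beta_{k-1}} & \cdots & \varepsilon_{\beta_{k-1}\beta_{k-1}} \end{pmatrix}, \qquad \varepsilon_{\beta_r\beta_p} = -\left.\frac{\partial^2 \ell(\boldsymbol\beta)}{\partial\beta_r\,\partial\beta_p}\right|_{\boldsymbol\beta = \hat{\boldsymbol\beta}_{\mathrm{ML}}}, \quad r, p = 0, 1, \ldots, k - 1,$$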
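When the analytical second derivatives are not at hand, the observed information matrix can be approximated by central finite differences of the log-likelihood. The generic sketch below (hypothetical helper names; a numerical substitute, not the analytical derivatives of Appendix C) works for any parameter vector, and is illustrated on the one-parameter LU log-likelihood of Section 2.4 with a toy sample, evaluated at $\alpha = 0.6$, which is not the ML estimate.

```python
import numpy as np


def observed_information(loglik, theta_hat, h=1e-4):
    """Minus the Hessian of `loglik` at theta_hat, by central finite differences."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    k = theta_hat.size
    steps = np.eye(k) * h
    H = np.empty((k, k))
    for r in range(k):
        for p in range(r, k):
            H[r, p] = (loglik(theta_hat + steps[r] + steps[p])
                       - loglik(theta_hat + steps[r] - steps[p])
                       - loglik(theta_hat - steps[r] + steps[p])
                       + loglik(theta_hat - steps[r] - steps[p])) / (4.0 * h * h)
            H[p, r] = H[r, p]                  # the matrix is symmetric
    return -H


def wald_standard_errors(loglik, theta_hat, h=1e-4):
    """Asymptotic standard errors: square roots of the diagonal of I(theta_hat)^{-1}."""
    info = observed_information(loglik, theta_hat, h)
    return np.sqrt(np.diag(np.linalg.inv(info)))


# Example: the one-parameter LU log-likelihood of Section 2.4 on a toy sample.
x = np.linspace(0.01, 0.99, 99)
loglik = lambda th: np.log(th[0]) * x.sum() + np.sum(np.log1p(-np.log(th[0]) * (1.0 - x)))
print(observed_information(loglik, [0.6]), wald_standard_errors(loglik, [0.6]))
```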
where the second derivatives are presented in Appendix C.

4. Simulation Studies

In this section, we first carry out a simulation study to evaluate the behavior of the ML estimator of the shape parameter of the LU distribution. Subsequently, we carry out a second simulation study to evaluate the behavior of the ML estimators of the coefficients of the LU quantile regression model.

4.1. First Simulation Study

In this study, for each sample size $n = 10, 20, \ldots, 1000$, 1000 random samples from the LU distribution were simulated in the scenarios $\alpha = 0.5$ and $\alpha = 1.5$. The samples were generated using the qf given in Equation (2). The LambertW package [18] of the R programming language was used to compute the principal branch of the Lambert W function. The estimates were obtained by maximizing the log-likelihood function as described in Section 2.4.
Figure 3 shows the average estimate (AE), the empirical standard deviation (SD) and the root mean squared error (RMSE) computed from the 1000 estimates obtained for each scenario and sample size. In addition, the average of the asymptotic standard errors (SE) and the coverage probability (CP) of the 95% asymptotic confidence intervals are reported. The figure shows that the AEs approach the true values of $\alpha$ as the sample size increases. The SDs, RMSEs and SEs are close to each other and decrease to zero as the sample size increases, as expected from standard asymptotic theory. The CPs also converge to the nominal level as the sample size increases.
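A scaled-down version of this Monte Carlo study can be sketched as follows, with 200 replicates at a single sample size rather than 1000 replicates over the full grid. Because the problem is one-dimensional, the sketch swaps in SciPy's bounded scalar minimizer (`minimize_scalar` with `method="bounded"`) for L-BFGS-B; the SE uses Equation (5) evaluated at each estimate. Seed and helper names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import lambertw
from scipy.stats import norm


def lu_rvs(alpha, n, rng):
    """LU(alpha) draws via Equation (2); assumes alpha != 1."""
    la = np.log(alpha)
    return lambertw(la * (rng.uniform(size=n) - 1.0) / alpha).real / la + 1.0


def lu_mle(x, eps=1e-8):
    """ML estimate of alpha on (eps, e - eps) with a bounded scalar search."""
    nll = lambda a: -(np.log(a) * x.sum() + np.sum(np.log1p(-np.log(a) * (1.0 - x))))
    return minimize_scalar(nll, bounds=(eps, np.e - eps), method="bounded").x


def lu_se(alpha, n):
    """Asymptotic standard error 1/sqrt(I(alpha)) from Equation (5)."""
    la = np.log(alpha)
    info = n / alpha * quad(lambda u: u**2 * alpha ** (-u) / (1.0 - la * u), 0.0, 1.0)[0]
    return 1.0 / np.sqrt(info)


def monte_carlo(alpha, n, reps=200, level=0.95, seed=0):
    """AE, SD, RMSE, mean SE and CP of Wald intervals over `reps` simulated samples."""
    rng = np.random.default_rng(seed)
    z = norm.ppf(0.5 + level / 2.0)
    est = np.empty(reps)
    se = np.empty(reps)
    for b in range(reps):
        est[b] = lu_mle(lu_rvs(alpha, n, rng))
        se[b] = lu_se(est[b], n)
    covered = np.abs(est - alpha) <= z * se
    return {"AE": est.mean(), "SD": est.std(ddof=1),
            "RMSE": np.sqrt(np.mean((est - alpha) ** 2)),
            "SE": se.mean(), "CP": covered.mean()}


print(monte_carlo(0.5, n=100))
```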
Note that the decreasing patterns observed in the left panels of the figure suggest that $\alpha$ is overestimated when the sample size is small. As complementary information, the AEs obtained for sample sizes smaller than 100 are reported in Appendix B.

4.2. Second Simulation Study

In this study, for each sample size $n = 10, 20, \ldots, 1000$, we simulated 1000 random samples from the reparameterized LU distribution given in Equation (6), where the quantile parameter $\eta$ is linked to three covariates via the logit function. The samples were generated as follows:
  • Definition of covariates: Generate $\mathbf{w}_1 = (w_{11}, \ldots, w_{1n})^{t}$, $\mathbf{w}_2 = (w_{21}, \ldots, w_{2n})^{t}$ and $\mathbf{w}_3 = (w_{31}, \ldots, w_{3n})^{t}$, where $(w_{1j}, w_{2j})$ follows a bivariate normal distribution with parameters $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$ and $\rho = 0.7$, $j = 1, \ldots, n$, and $\mathbf{w}_3$ is a binary variable whose probability of success depends on $\mathbf{w}_1$ through the logistic function, that is, $w_{3j} \sim \text{Bernoulli}(p_j)$, where $p_j = 1/[1 + \exp(w_{1j})]$, $j = 1, \ldots, n$.
  • Definition of scenarios: We considered two scenarios, A and B, and in both we set $\beta_0 = 2$, $\beta_1 = 0.1$, $\beta_2 = 0.5$ and $\beta_3 = 2.5$. For q, we chose the value 0.25 in Scenario A and 0.75 in Scenario B.
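  • Simulate the response variable: Generate $(u_1, \ldots, u_n)^{t}$, $u_j \sim \text{uniform}(0, 1)$, $j = 1, \ldots, n$, and calculate
    $$x_j = \frac{\eta_j}{\log\left(\frac{1 - q}{1 - \eta_j}\right)}\, W_0\!\left[\frac{\log\left(\frac{1 - q}{1 - \eta_j}\right)(u_j - 1)}{\eta_j}\left(\frac{1 - \eta_j}{1 - q}\right)^{1/\eta_j}\right] + 1, \quad j = 1, \ldots, n,$$
    where $\eta_j = \exp(\mathbf{w}_j^{t}\boldsymbol\beta)/[1 + \exp(\mathbf{w}_j^{t}\boldsymbol\beta)]$, with $\mathbf{w}_j^{t}\boldsymbol\beta = \beta_0 + \beta_1 w_{1j} + \beta_2 w_{2j} + \beta_3 w_{3j}$, $j = 1, \ldots, n$.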
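The three generation steps can be sketched as below. This is an illustration under assumptions: the coefficients are hypothetical (not the scenario values listed above), the success probability of $w_{3j}$ uses SciPy's standard logistic function `expit` of $w_{1j}$, and the sketch refuses linear predictors whose implied shape $\alpha_j = \exp[\log((1 - q)/(1 - \eta_j))/\eta_j]$ falls outside $(0, e)$, as Definition 1 requires. By construction, $P(x_j \le \eta_j) = q$, which the last printed value checks empirically.

```python
import numpy as np
from scipy.special import expit, lambertw


def simulate_lu_regression(n, beta, q, rho=0.7, rng=None):
    """One sample generated with the steps listed above (logit link, three covariates)."""
    rng = np.random.default_rng(rng)
    # Covariates: (w1, w2) bivariate normal (means 0, variances 1, correlation rho);
    # w3 ~ Bernoulli(p_j) with p_j given by the standard logistic function of w1.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    w1, w2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    w3 = rng.binomial(1, expit(w1))
    W = np.column_stack([np.ones(n), w1, w2, w3])
    # Response: eta_j = logit^{-1}(w_j^t beta), then the closed-form expression for x_j.
    eta = expit(W @ beta)
    L = np.log((1.0 - q) / (1.0 - eta))
    if np.any(L / eta >= 1.0):                 # implied alpha_j = exp(L/eta) must be < e
        raise ValueError("some eta_j imply alpha_j >= e; choose other coefficients")
    u = rng.uniform(size=n)
    z = L * (u - 1.0) / eta * ((1.0 - eta) / (1.0 - q)) ** (1.0 / eta)
    x = eta / L * lambertw(z, k=0).real + 1.0
    return x, W, eta


beta_hyp = np.array([-2.5, 0.1, -0.5, -0.5])   # hypothetical coefficients, not the article's
x, W, eta = simulate_lu_regression(5000, beta_hyp, q=0.25, rng=42)
print(x.min(), x.max(), np.mean(x <= eta))     # the last value should be close to q = 0.25
```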
In both scenarios, under the different sample sizes, it is possible to verify that the histogram of the simulated values has a decreasing behavior.
For each simulated sample, we calculated the ML estimates of the coefficients $\beta$'s as described in Section 3.2. Figures 4 and 5 show the AEs, SDs, SEs and RMSEs of the estimates obtained for each scenario and sample size. Figure 6 reports the CPs of the 95% asymptotic confidence intervals. As in the first simulation study, the AEs approach the true values of the coefficients $\beta$'s as the sample size increases. The SDs, RMSEs and SEs are close to each other and decrease to zero as the sample size increases, as expected from standard asymptotic theory, and the CPs converge to the nominal level as the sample size increases.
Note that the increasing and decreasing patterns exhibited in the top panels of Figures 4 and 5 suggest an underestimation and an overestimation, respectively, of the individual effects of the covariates on the 0.25th and 0.75th quantiles of the response when the sample size is small. Complementarily, the AEs of the $\beta$'s obtained for sample sizes smaller than 100 are reported in Appendix B.
When comparing the upper right panels of the figures, a different pattern is observed for the AEs of $\beta_3$: the effect of the covariate $w_3$ on the 0.25th quantile of the response may be underestimated when the sample size is small, while its effect on the 0.75th quantile may be overestimated.

5. Data Analysis

In this section, three application examples are presented to illustrate the usefulness of the LU distribution and the LU quantile regression model. In the first, we compare the performance of the LU, P, MOEU, SU, B and K distributions in fitting 1000 samples generated from an LU population. Here, we highlight the virtues of the LU distribution over the aforementioned distributions, while leaving open whether a real-world setting exists in which the distributions behave in this way. In the second example, we compare the performance of the LU, P, MOEU, SU, B and K distributions in fitting a real dataset. In the third example, based on a real data frame and in order to quantify the effect of the covariates on the 0.25th, 0.5th and 0.75th quantiles of the response, we compare the performance of the LU quantile regression model with that of the ASHN [16] and K [15] quantile regression models. In all three models, the logit function is used to relate the qth quantile of the response to the linear predictor. The pdfs of the ASHN and K distributions are given, respectively, by
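$$f(x;\alpha,\eta) = \frac{2\alpha\,\Phi^{-1}(q_\alpha)}{A(\eta)\,x\sqrt{1 - x^2}}\,\phi\!\left(\frac{\Phi^{-1}(q_\alpha)\,A(x)}{A(\eta)}\right)\left[2 - 2\Phi\!\left(\frac{\Phi^{-1}(q_\alpha)\,A(x)}{A(\eta)}\right)\right]^{\alpha - 1},$$
where $x \in (0, 1)$, $q_\alpha = 1 - q^{1/\alpha}/2$, $A(z) = \log[(1 + \sqrt{1 - z^2})/z]$ is the hyperbolic arcsecant function, $\phi$ and $\Phi$ are the standard normal pdf and cdf, $\alpha > 0$ is a shape parameter, $\eta \in (0, 1)$ is a quantile parameter and $q \in (0, 1)$ is known, and
$$f(x;\beta,\eta) = \frac{\beta\log(1 - q)}{\log(1 - \eta^{\beta})}\,x^{\beta - 1}(1 - x^{\beta})^{\log(1 - q)/\log(1 - \eta^{\beta}) - 1}, \quad x \in (0, 1),$$
where $\beta > 0$ is a shape parameter, $\eta \in (0, 1)$ is a quantile parameter and $q \in (0, 1)$ is known.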
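As a sanity check of the two competing densities, the sketch below codes both and verifies by quadrature that each integrates to one and places mass $q$ below $\eta$. The ASHN pdf is evaluated on the log scale with SciPy's `norm.logpdf` and `norm.logsf` (our implementation choice, so that the tails near $x = 0$ do not produce $0 \cdot \infty$); parameter values and helper names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def arcsech(z):
    """Hyperbolic arcsecant A(z) = log[(1 + sqrt(1 - z^2)) / z] for 0 < z < 1."""
    return np.log((1.0 + np.sqrt(1.0 - z**2)) / z)


def ashn_pdf(x, alpha, eta, q):
    """ASHN pdf with q-th quantile eta and shape alpha > 0, evaluated on the log scale."""
    q_alpha = 1.0 - q ** (1.0 / alpha) / 2.0
    c = norm.ppf(q_alpha) / arcsech(eta)       # Phi^{-1}(q_alpha) / A(eta)
    s = c * arcsech(x)
    # 2 - 2 * Phi(s) = 2 * sf(s); logs avoid underflow in the tail near x = 0
    log_f = (np.log(2.0 * alpha * c) - np.log(x) - 0.5 * np.log1p(-x**2)
             + norm.logpdf(s) + (alpha - 1.0) * (np.log(2.0) + norm.logsf(s)))
    return np.exp(log_f)


def k_pdf(x, beta, eta, q):
    """Kumaraswamy pdf with q-th quantile eta and shape beta > 0."""
    b = np.log1p(-q) / np.log1p(-eta**beta)
    return beta * b * x ** (beta - 1.0) * (1.0 - x**beta) ** (b - 1.0)


q, eta = 0.5, 0.6
for name, pdf in [("ASHN", lambda t: ashn_pdf(t, 0.5, eta, q)),
                  ("K", lambda t: k_pdf(t, 2.0, eta, q))]:
    total = quad(pdf, 0.0, 1.0, limit=200)[0]
    below = quad(pdf, 0.0, eta, limit=200)[0]  # should equal q
    print(name, round(total, 6), round(below, 6))
```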
The regression frameworks for bounded responses based on the K and ASHN distributions are very similar to the one based on the LU distribution presented in Section 3.1. The main difference is that the K and ASHN models also depend on a shape parameter, which gives great flexibility to the modeling but must be estimated together with the regression coefficients.
In all three examples, the parameters are estimated by maximizing the corresponding likelihood functions with the optim function of the R programming language. We compare the performance of the models through the maximum value of the log-likelihood function ($\hat\ell$) and the values of the Akaike Information Criterion (AIC) [19] and the Bayesian Information Criterion (BIC) [20]. In general, the best model is the one with the highest $\hat\ell$ value and the lowest AIC, CAIC and BIC values. In addition, we use the usual Anderson–Darling (AD) and Cramér–von Mises (CvM) goodness-of-fit tests to assess the quality of the fits in the first and second examples. In the third example, we use these tests to assess the overall quality of fit of the regression models by testing the hypothesis that the randomized quantile residuals [21] follow the standard normal distribution; these residuals follow a standard normal distribution when the overall quality of fit is appropriate. We use the ad.test and cvm.test functions available in the goftest package [22] of the R programming language to calculate the statistics and p-values of these tests.
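The comparison criteria can be sketched in Python as follows, for an LU fit and a power (P) fit to one simulated sample. Assumptions to note: AIC and BIC use the standard definitions $-2\hat\ell + 2k$ and $-2\hat\ell + k\log(n)$; the AD statistic is computed from its usual formula (no p-value), and the CvM test uses SciPy's `cramervonmises` with the fitted cdf as a callable, which treats the estimated parameter as known, so its p-value is only approximate (unlike the R goftest functions used in the article, these are not drop-in equivalents). The P fit uses its closed-form ML estimate $\hat\delta = -n/\sum\log x_i$. Data, seed and helper names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import lambertw
from scipy.stats import cramervonmises


def lu_loglik(alpha, x):
    """LU log-likelihood (Section 2.4)."""
    la = np.log(alpha)
    return la * x.sum() + np.sum(np.log1p(-la * (1.0 - x)))


def info_criteria(loglik_max, k, n):
    """AIC = -2 l + 2k and BIC = -2 l + k log(n) for k estimated parameters."""
    return -2.0 * loglik_max + 2.0 * k, -2.0 * loglik_max + k * np.log(n)


def anderson_darling(x, cdf):
    """A^2 = -n - (1/n) sum_i (2i - 1)[log F(x_(i)) + log(1 - F(x_(n+1-i)))]."""
    u = np.sort(cdf(np.asarray(x)))
    n = u.size
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))


rng = np.random.default_rng(5)
la0 = np.log(0.3)                              # illustrative LU(0.3) data, Equation (2)
x = lambertw(la0 * (rng.uniform(size=300) - 1.0) / 0.3).real / la0 + 1.0

res = minimize_scalar(lambda a: -lu_loglik(a, x), bounds=(1e-8, np.e - 1e-8), method="bounded")
a_hat, l_lu = res.x, -res.fun
d_hat = -x.size / np.sum(np.log(x))            # power MLE: -n / sum log x
l_p = x.size * np.log(d_hat) + (d_hat - 1.0) * np.sum(np.log(x))

lu_cdf = lambda t: 1.0 - (1.0 - t) * a_hat ** t
print("LU", info_criteria(l_lu, 1, x.size), anderson_darling(x, lu_cdf),
      cramervonmises(x, lu_cdf).pvalue)
print("P ", info_criteria(l_p, 1, x.size), anderson_darling(x, lambda t: t ** d_hat))
```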

5.1. Data from an LU Population

In this example, we generate 1000 random samples of size 300 from an LU population with parameter α = 0.01. This value of α corresponds to a decreasing pdf that converges to 5.605 and 0.01 at the ends of the support.
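This section does not restate the LU cdf, so the sketch below assumes F(x; α) = 1 − (1 − x)α^x on (0, 1), with pdf f(x; α) = α^x[1 − (1 − x) log α]. This is the form we take the Lambert-F generator to produce with a U baseline, and it reproduces the limit values 1 − log(0.01) ≈ 5.605 and α = 0.01 quoted above. Solving F(x) = p gives x = 1 + W₀(−(1 − p) log α / α)/log α, where W₀ is the principal branch of the Lambert W function (Appendix A). Inverting uniform draws through this expression yields LU samples. The function names are ours.

```python
import numpy as np
from scipy.special import lambertw


def lu_cdf(x, alpha):
    # Assumed LU cdf: F(x) = 1 - (1 - x) * alpha**x on (0, 1), with 0 < alpha < e
    return 1.0 - (1.0 - x) * alpha**x


def lu_pdf(x, alpha):
    # Derivative of the assumed cdf: f(x) = alpha**x * (1 - (1 - x) * log(alpha))
    return alpha**x * (1.0 - (1.0 - x) * np.log(alpha))


def lu_quantile(p, alpha):
    # Solve F(x) = p in closed form with the principal branch W_0
    la = np.log(alpha)
    if la == 0.0:  # alpha = 1 gives the uniform distribution
        return np.asarray(p, dtype=float)
    z = -(1.0 - np.asarray(p, dtype=float)) * la / alpha
    return 1.0 + lambertw(z, k=0).real / la


def lu_rvs(alpha, size, rng):
    # Inverse-transform sampling: X = Q(U) with U ~ Uniform(0, 1)
    return lu_quantile(rng.uniform(size=size), alpha)


rng = np.random.default_rng(2021)
samples = [lu_rvs(0.01, 300, rng) for _ in range(1000)]  # 1000 samples of size 300
print(lu_pdf(0.0, 0.01), lu_pdf(1.0, 0.01))              # about 5.605 and 0.01
```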
Based on the AD and CvM goodness-of-fit tests at the 5% significance level, we compute the proportion of samples for which the LU, P, MOEU, SU, B and K distributions are not rejected. We call this the non-rejection rate. We also compute the proportion of samples in which each distribution has the lowest AIC (respectively, BIC) value, that is, the proportion of samples in which it performs best. We call this the hit rate. Table 1 reports the non-rejection and hit rates for the LU, P, MOEU, SU, B and K distributions fitted to the 1000 samples. The two-parameter B and K distributions are not rejected in a large proportion of samples, with non-rejection rates above 0.98. However, the LU distribution performs best in most of the generated samples, with hit rates of 0.865 (AIC) and 0.969 (BIC).
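A non-rejection rate can be estimated by repeating the fit-and-test step over simulated samples. The Python sketch below does this for the LU distribution only, with `scipy.stats.cramervonmises` standing in for goftest's `cvm.test`. It assumes the LU cdf F(x; α) = 1 − (1 − x)α^x and fits α by ML over θ = log α. The log-likelihood Σ[x_iθ + log(1 − (1 − x_i)θ)] is concave in θ, so a bounded scalar search suffices.

Because α is estimated from the same sample, the nominal p-values are conservative, which pushes non-rejection rates upward. The authors' exact procedure may differ. For speed, the sketch uses 200 rather than 1000 replications.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import lambertw
from scipy.stats import cramervonmises


def lu_cdf(x, alpha):
    # Assumed LU cdf: F(x) = 1 - (1 - x) * alpha**x
    return 1.0 - (1.0 - x) * alpha**x


def lu_quantile(p, alpha):
    # Lambert W inverse of the assumed cdf (alpha != 1)
    la = np.log(alpha)
    return 1.0 + lambertw(-(1.0 - p) * la / alpha, k=0).real / la


def lu_mle(x):
    # Negative log-likelihood in theta = log(alpha): -sum[x*theta + log(1 - (1 - x)*theta)]
    def nll(theta):
        return -np.sum(x * theta + np.log1p(-(1.0 - x) * theta))
    # Keep theta < 1 so that alpha < e and the pdf stays positive on (0, 1)
    res = minimize_scalar(nll, bounds=(-20.0, 0.999), method="bounded")
    return np.exp(res.x)


rng = np.random.default_rng(123)
n_rep, n, alpha_true, level = 200, 300, 0.01, 0.05
not_rejected = 0
for _ in range(n_rep):
    x = lu_quantile(rng.uniform(size=n), alpha_true)
    alpha_hat = lu_mle(x)
    pvalue = cramervonmises(x, lu_cdf, args=(alpha_hat,)).pvalue
    not_rejected += int(pvalue >= level)
rate = not_rejected / n_rep
print(f"CvM non-rejection rate for LU: {rate:.3f}")
```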
Computing $\lim_{x\to 0^+} f_X(x)$ for each distribution fitted to a single generated sample, we obtain 5.611 for the LU distribution, 7.618 for the MOEU distribution, 2.000 for the SU distribution and ∞ for the B, K and P distributions. Computing $\lim_{x\to 1^-} f_X(x)$, we obtain 0.009 for the LU distribution, 0.131 for the MOEU distribution, 0.438 for the P distribution and 0 for the SU, B and K distributions.

Figure 7 presents the histogram of this sample together with the fitted LU, P, MOEU, SU, B and K pdfs. The LU, B and K pdfs have similar curvature and therefore fit the central sample quantiles similarly well. However, the LU pdf fits the extreme quantiles more closely, because it tends to the finite values 5.611 and 0.009 at the ends of the support. The ML estimates, the AIC and BIC values and the p-values of the AD and CvM tests for each distribution fitted to this sample are given in Appendix E.
This simulated scenario suggests that real datasets may exhibit the same pattern of relative performance. Section 5.2 presents one such dataset.

5.2. Peak Horizontal Acceleration Data

We consider a dataset of 182 observations of peak horizontal acceleration (in g) recorded at observation stations during earthquakes in California, USA. These data were originally analyzed by Joyner and Boore [23] and are available as the attenu data frame in the datasets package of R. Some descriptive statistics are: minimum, 0.003; maximum, 0.810; skewness, 1.641; and kurtosis, 6.071. The histogram of this dataset, presented in Appendix D, shows decreasing behavior, so we expect the LU distribution to fit these data well.
Table 2 reports the ML estimates, the $\hat{\ell}$, AIC and BIC values and the p-values of the AD and CvM goodness-of-fit tests for each distribution fitted to the peak horizontal acceleration data. Based on these p-values, at the 5% significance level, the SU and P distributions are not appropriate for these data (p < 0.001 in both tests). The MOEU, SU, P and LU distributions all have a single parameter, and among them the LU distribution performs clearly best. Table 2 also shows that the LU distribution has the lowest AIC and BIC values and the highest $\hat{\ell}$ value of all six distributions. It should therefore be selected over the others for the peak horizontal acceleration data.
Figure 8 presents the QQ-plots of the fitted distributions, which confirm that the LU distribution fits the peak horizontal acceleration data appropriately.
Computing $\lim_{x\to 0^+} f_X(x)$ for each fitted distribution, we obtain 6.298 for the LU distribution, 8.923 for the MOEU distribution, 2.000 for the SU distribution and ∞ for the B, K and P distributions. Computing $\lim_{x\to 1^-} f_X(x)$, we obtain 0.005 for the LU distribution, 0.112 for the MOEU distribution, 0.412 for the P distribution and 0 for the SU, B and K distributions. Thus, the relative performance of the LU, P, MOEU, SU, B and K distributions on this dataset closely resembles that observed with the simulated data in Section 5.1.
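The limit values above come from evaluating each fitted pdf at the ends of the support. The Python sketch below does this under assumed pdf forms: f(x; α) = α^x[1 − (1 − x) log α] for LU, f(x; α) = α/[α + (1 − α)x]² for MOEU and f(x; α) = αx^(α−1) for P. With the Table 2 estimates, it gives 6.298 and 0.005 for LU and about 8.93 and 0.112 for MOEU, which match the values quoted above up to the rounding of α̂. The P pdf diverges at 0 because α̂ < 1.

```python
import numpy as np


def lu_limits(alpha):
    # Assumed LU pdf alpha**x * (1 - (1 - x) log alpha): f(0+) = 1 - log(alpha), f(1-) = alpha
    return 1.0 - np.log(alpha), alpha


def moeu_limits(alpha):
    # Assumed MOEU pdf alpha / (alpha + (1 - alpha) x)**2: f(0+) = 1/alpha, f(1-) = alpha
    return 1.0 / alpha, alpha


def power_limits(alpha):
    # Power pdf alpha * x**(alpha - 1): unbounded at 0 if alpha < 1; f(1-) = alpha
    at_zero = np.inf if alpha < 1.0 else alpha * 0.0 ** (alpha - 1.0)
    return at_zero, alpha


# ML estimates of alpha from Table 2 (peak horizontal acceleration data)
fitted = {"LU": (lu_limits, 0.005), "MOEU": (moeu_limits, 0.112), "P": (power_limits, 0.412)}
for name, (limits, a) in fitted.items():
    at_zero, at_one = limits(a)
    print(f"{name}: f(0+) = {at_zero:.3f}, f(1-) = {at_one:.3f}")
```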

5.3. Risk Management Practice Data

In this example, we consider the data frame presented by Schmit and Roth [24]. It contains 73 observations of seven variables, collected through a questionnaire sent to 374 risk managers of large US-based organizations. The variables are the following:
  • FI: the firm’s risk management cost effectiveness, defined as total property and casualty premiums and uninsured losses as a percentage of total assets;
  • AS: the per occurrence retention amount as a percentage of total assets;
  • CA: an indicator of whether the firm owns a captive insurance company;
  • SI: the logarithm of total assets;
  • IN: a measure of the firm’s industry risk;
  • CE: a measure of the importance of the local managers in choosing the amount of risk to be retained;
  • SO: a measure of the degree of importance of using analytical tools.
Gómez-Déniz et al. [25] considered a Log-Lindley regression model to quantify the effect of the variables AS, CA, SI, IN, CE and SO on the mean of the variable FI. Here, we use the LU quantile regression model to quantify the effect of these covariates on the 0.25th, 0.5th and 0.75th quantiles of FI. This complements the mean-based analysis of Gómez-Déniz et al. [25] and gives a more complete description of how FI depends on the covariates. The histogram of FI, presented in Appendix D, shows decreasing behavior, so we expect the LU model to quantify the effect of the covariates on these quantiles appropriately.
As already mentioned, we compare the results with those obtained with the K and ASHN quantile regression models. The regression structure assumed for $\eta_i$ is
$$\operatorname{logit}(\eta_i)=\beta_1^{(q)}+\beta_2^{(q)}\,\mathrm{AS}_i+\beta_3^{(q)}\,\mathrm{CA}_i+\beta_4^{(q)}\,\mathrm{SI}_i+\beta_5^{(q)}\,\mathrm{IN}_i+\beta_6^{(q)}\,\mathrm{CE}_i+\beta_7^{(q)}\,\mathrm{SO}_i,\qquad i=1,\dots,73,$$
where $\eta_i$ denotes the qth quantile of the LU, K and ASHN distributions.
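To make this regression structure concrete, the sketch below fits an LU quantile regression with a logit link to simulated data by direct maximization of the log-likelihood. It mirrors in Python what the authors do with R's optim. The sketch rests on several assumptions:

  • The LU pdf is f(y; α) = α^y[1 − (1 − y) log α] with α = [(1 − q)/(1 − η)]^(1/η), so that η is the qth quantile.
  • The responses are simulated through the Lambert W quantile function.
  • The design matrix, coefficients and sample size are hypothetical.

Nelder–Mead is used because the log-likelihood is undefined wherever 1 − (1 − y_i) log α_i ≤ 0, which the objective signals by returning infinity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, lambertw

Q = 0.5  # quantile level q (known)


def log_alpha(eta, q=Q):
    # log(alpha) when eta is the q-th quantile: alpha = ((1 - q)/(1 - eta))**(1/eta)
    return np.log((1.0 - q) / (1.0 - eta)) / eta


def lu_quantile(p, la):
    # Lambert W inverse of the assumed cdf 1 - (1 - x) exp(la * x), for la != 0
    return 1.0 + lambertw(-(1.0 - p) * la * np.exp(-la), k=0).real / la


def negloglik(beta, y, W, q=Q):
    eta = expit(W @ beta)                 # logit(eta_i) = w_i' beta
    g = log_alpha(eta, q)
    h = 1.0 - (1.0 - y) * g               # H(y_i; eta_i); the pdf requires h > 0
    if not np.all(np.isfinite(h)) or np.any(h <= 0.0):
        return np.inf
    return -np.sum(y * g + np.log(h))


rng = np.random.default_rng(7)
n = 2000
W = np.column_stack([np.ones(n), rng.uniform(size=n), rng.binomial(1, 0.5, size=n)])
beta_true = np.array([-1.5, 0.5, -0.5])   # hypothetical coefficients
eta = expit(W @ beta_true)
y = lu_quantile(rng.uniform(size=n), log_alpha(eta))

fit = minimize(negloglik, x0=np.zeros(3), args=(y, W), method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 5000})
print("estimates:", np.round(fit.x, 3))
print("share of y_i below eta_i:", np.mean(y < eta))  # should be close to q
```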
Table 3 reports the $\hat{\ell}$, AIC and BIC values for the ASHN, K and LU models fitted to the risk management practice data. It also reports the p-values of the AD and CvM tests of the hypothesis that the randomized quantile residuals follow a standard normal distribution. The $\hat{\ell}$, AIC and BIC values change with q because each value of q defines a different conditional quantile of FI. The regression coefficients, which express the rate of change of that quantile, therefore depend on q.

Based on the p-values, at the 5% significance level, normality of the randomized quantile residuals is not rejected for the LU and K models (all p-values are at least 0.134). It is rejected for the ASHN model (p < 0.001). Thus, at this level, the overall fit of the LU and K models is appropriate. Furthermore, for each q, the LU model has the highest $\hat{\ell}$ value and the lowest AIC and BIC values. This suggests that it should be selected to quantify the effect of the covariates on the 0.25th, 0.5th and 0.75th quantiles of the response. As pointed out in Section 3, the K and ASHN models require a shape parameter to be estimated together with the regression coefficients. Even with one fewer parameter to estimate, the LU quantile regression model outperforms them.
Table 4 reports the coefficient estimates of the LU models fitted to the risk management practice data, together with the z statistics and p-values of the significance tests for the individual regression coefficients. The covariates SI and IN (the logarithm of total assets and the measure of the firm’s industry risk) are statistically significant at the usual nominal levels (p < 0.001 for q = 0.25, 0.5 and 0.75). The 0.25th, 0.5th and 0.75th quantiles of the response (the firm’s risk management cost effectiveness) are negatively related to SI and positively related to IN. The covariates AS, CA, CE and SO are not significant.

Figure 9 shows the coefficient estimates with 95% confidence intervals for the LU regression model under different values of q. As q increases, the estimates of the SI and IN coefficients decrease and increase, respectively, moving further away from 0. This indicates that these covariates have a greater effect on the upper quantiles of the firm’s risk management cost effectiveness.
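The z statistics and p-values in Table 4 are consistent with Wald tests, z = estimate/SE with two-sided p-value 2[1 − Φ(|z|)]. The short Python check below (our reconstruction; the exact formula is not stated in this section) recomputes a few entries for q = 0.25 from the rounded estimates and standard errors. It also forms 95% Wald intervals, estimate ± 1.96 SE, which is one common way to obtain intervals like those in Figure 9 but is an assumption here.

```python
from scipy.stats import norm

# (estimate, SE) pairs for q = 0.25, copied from Table 4
rows = {"Intercept": (1.619, 1.543), "SI": (-0.774, 0.164), "IN": (3.494, 0.888)}
z_crit = norm.ppf(0.975)  # about 1.96

for name, (est, se) in rows.items():
    z = est / se                         # Wald statistic
    p = 2.0 * norm.sf(abs(z))            # two-sided p-value, 2[1 - Phi(|z|)]
    ci = (est - z_crit * se, est + z_crit * se)
    print(f"{name:9s} z = {z:7.3f}  p = {p:.4f}  95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```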

6. Final Comments

In this article, we propose a new one-parameter distribution for modeling bounded data whose histograms show increasing or decreasing behavior. The new distribution, called the Lambert-uniform (LU) distribution, arises from the Lambert-F generator with a U baseline distribution. An important feature of the LU distribution is that its pdf tends to finite values at the ends of the support, which allows the extreme sample quantiles to be modeled adequately.
We derive the main structural properties of the LU distribution, including the moment-generating function, which is used to describe its skewness and kurtosis. The quantile function of the LU distribution can be expressed in terms of the Lambert W function. This allows the pdf to be reparameterized in terms of its qth quantile, yielding a pdf with a simple analytical structure that is easy to compute. On this basis, we propose the LU quantile regression model, which relates the qth quantile of the response to a linear predictor through a link function. The parameters are estimated by maximum likelihood in the cases with and without covariates. Since the estimators have no closed form, a numerical optimization routine is required; we use the optim function in R. We evaluate the behavior of the estimators through two simulation studies and conclude that the maximum likelihood method provides acceptable estimates.

Finally, we present three application examples to illustrate the usefulness of the proposal. In the first and second examples, which involve datasets whose histograms show decreasing behavior, the LU distribution can provide a better fit than the P, MOEU, SU, B and K distributions. Thus, when data exhibit such behavior, the LU distribution can be considered a viable alternative to commonly used distributions. In the third example, based on a real data frame, a quantile regression model formulated from the LU distribution performs well when modeling the 0.25th, 0.5th and 0.75th quantiles of the response given a set of covariates. It is therefore a viable alternative to other models such as the ASHN and K quantile regression models.

Author Contributions

Conceptualization, Y.A.I. and M.d.C.; Formal analysis, Y.A.I., M.d.C. and H.W.G.; Investigation, Y.A.I., M.d.C. and H.W.G.; Methodology, Y.A.I. and H.W.G.; Software, Y.A.I.; Supervision, M.d.C. and H.W.G.; and Validation, M.d.C. and H.W.G. All authors contributed significantly to this research article. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Y.A.I. was funded by CONICYT PAI/INDUSTRIA 79090016, Chile. This work was partially done during M.d.C.’s visit to the Universidad de Antofagasta, supported by MINEDUC-UA Project, code ANT1856, Chile. The work of M.d.C. was partially funded by CNPq, Brazil. The research of H.W.G. was supported by Grant SEMILLERO UA-2021 (Chile).

Acknowledgments

The authors would like to thank the editor and the anonymous referees for their comments and suggestions, which significantly improved our manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
U: Uniform
P: Power
B: Beta
K: Kumaraswamy
ASHN: Arcsecant-hyperbolic-normal
LU: Lambert-uniform
ML: Maximum likelihood
cdf: Cumulative distribution function
pdf: Probability density function
qf: Quantile function

Appendix A. Lambert W Function

The Lambert W function, $W(z)$, is defined as the inverse of the function $w\mapsto w\,e^{w}$; that is, it satisfies the equation
$$W(z)\,e^{W(z)}=z,\qquad z\in\mathbb{C}.$$
In general, the Lambert W function is defined for any $z\in\mathbb{C}$. If z is restricted to real values, real solutions exist only for $z\geq -1/e$, where e is Euler’s number. Three cases can be distinguished:
  • If $z<-1/e$, then no solution exists in the reals.
  • If $z\in(-1/e,0)$, then there are two real solutions, given by the principal branch, $W_0(z)\in(-1,0)$, and the non-principal branch, $W_{-1}(z)<-1$.
  • If $z=-1/e$ or $z\geq 0$, then the solution is unique: the two branches meet at $W_0(-1/e)=W_{-1}(-1/e)=-1$, and for $z\geq 0$ only the principal branch $W_0(z)\geq 0$ exists.
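In Python, both real branches are available through `scipy.special.lambertw`, whose second argument `k` selects the branch (`k=0` for the principal branch, `k=-1` for the lower one). The function returns complex values, so the real part is taken for real arguments in the ranges above. The sketch below checks the defining equation W(z)e^{W(z)} = z on each branch and evaluates both branches at the branch point z = −1/e.

```python
import numpy as np
from scipy.special import lambertw

# Real branches: W_0 for z >= -1/e and W_{-1} for -1/e <= z < 0
for z in (-0.2, 0.5, 3.0):
    w0 = lambertw(z, k=0).real
    print(f"z = {z:5.2f}  W_0 = {w0: .6f}  W_0*exp(W_0) = {w0 * np.exp(w0): .6f}")
    if z < 0:
        wm1 = lambertw(z, k=-1).real
        print(f"            W_-1 = {wm1: .6f}  W_-1*exp(W_-1) = {wm1 * np.exp(wm1): .6f}")

# At the branch point z = -1/e both real branches equal -1
branch_point = -np.exp(-1.0)
print(lambertw(branch_point, k=0).real, lambertw(branch_point, k=-1).real)
```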

Appendix B. The AEs Obtained in the Simulation Studies of Section 4.1 and Section 4.2 with Sample Sizes Smaller Than 100

Table A1. The AEs obtained in the simulation study of Section 4.1.
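| Scenario | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 |
|---|---|---|---|---|---|---|---|---|---|
| A (α = 0.5) | 0.566 | 0.534 | 0.523 | 0.508 | 0.520 | 0.509 | 0.498 | 0.510 | 0.512 |
| B (α = 1.5) | 1.619 | 1.535 | 1.530 | 1.512 | 1.507 | 1.521 | 1.516 | 1.504 | 1.509 |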
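The AE values in Table A1 are Monte Carlo averages of ML estimates. The Python sketch below shows how such summaries can be computed for the one-parameter LU model; it is not the authors' simulation code. It assumes the LU cdf F(x; α) = 1 − (1 − x)α^x, sampled through its Lambert W quantile function. It maximizes the log-likelihood over θ = log α and takes standard errors from the observed information via the delta method.

The definitions used for the other summaries (SD of the estimates, average SE, RMSE, and coverage probability CP of 95% Wald intervals) are the usual ones and are our assumption. The sample size and number of replications are also our choice.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import lambertw


def lu_quantile(p, alpha):
    # Lambert W inverse of the assumed LU cdf 1 - (1 - x) * alpha**x (alpha != 1)
    la = np.log(alpha)
    return 1.0 + lambertw(-(1.0 - p) * la / alpha, k=0).real / la


def lu_fit(x):
    # ML estimate of theta = log(alpha) (concave log-likelihood), then delta-method SE of alpha
    def nll(t):
        return -np.sum(x * t + np.log1p(-(1.0 - x) * t))
    t = minimize_scalar(nll, bounds=(-20.0, 0.999), method="bounded").x
    info = np.sum((1.0 - x) ** 2 / (1.0 - (1.0 - x) * t) ** 2)  # -d2 loglik / d theta^2
    alpha_hat = np.exp(t)
    return alpha_hat, alpha_hat / np.sqrt(info)


def summarize(alpha, n, n_rep, rng):
    est, se = np.empty(n_rep), np.empty(n_rep)
    for r in range(n_rep):
        est[r], se[r] = lu_fit(lu_quantile(rng.uniform(size=n), alpha))
    covered = np.abs(est - alpha) <= 1.96 * se          # 95% Wald interval covers alpha
    return {"AE": est.mean(), "SD": est.std(ddof=1), "SE": se.mean(),
            "RMSE": np.sqrt(np.mean((est - alpha) ** 2)), "CP": covered.mean()}


rng = np.random.default_rng(42)
res = summarize(0.5, 50, 500, rng)   # Scenario A (alpha = 0.5), n = 50
print({k: round(float(v), 3) for k, v in res.items()})
```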
Table A2. The AEs obtained in Scenarios A and B of the simulation study in Section 4.2.
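| Parameter | n = 10 | n = 20 | n = 30 | n = 40 | n = 50 | n = 60 | n = 70 | n = 80 | n = 90 |
|---|---|---|---|---|---|---|---|---|---|
| Scenario A (q = 0.25) | | | | | | | | | |
| β₀ | −2.049 | −2.019 | −2.005 | −2.011 | −2.005 | −1.984 | −2.006 | −2.015 | −1.994 |
| β₁ | 0.090 | 0.097 | 0.125 | 0.089 | 0.107 | 0.113 | 0.108 | 0.098 | 0.104 |
| β₂ | 0.604 | 0.529 | 0.515 | 0.520 | 0.511 | 0.501 | 0.495 | 0.496 | 0.502 |
| β₃ | 2.558 | −2.555 | −2.563 | −2.529 | −2.535 | −2.557 | −2.515 | −2.496 | −2.528 |
| Scenario B (q = 0.75) | | | | | | | | | |
| β₀ | −2.275 | −2.110 | −2.063 | −2.046 | −2.034 | −2.008 | −2.026 | −2.028 | −2.009 |
| β₁ | −0.035 | 0.089 | 0.114 | 0.096 | 0.108 | 0.110 | 0.107 | 0.099 | 0.105 |
| β₂ | 0.621 | 0.507 | 0.503 | 0.503 | 0.499 | 0.495 | 0.489 | 0.491 | 0.497 |
| β₃ | −2.275 | −2.454 | −2.494 | −2.491 | −2.503 | −2.529 | −2.493 | −2.482 | −2.511 |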

Appendix C. Second Partial Derivatives of the Log-Likelihood Function Given in Equation (7)

$$\frac{\partial^{2}\ell(\boldsymbol{\beta})}{\partial\beta_p\,\partial\beta_r}=\sum_{i=1}^{n}\left\{\left[x_i-\frac{1-x_i}{H(x_i;\eta_i)}\right]\left[g''(\eta_i)\,\eta_{i,r}\,\eta_{i,p}+g'(\eta_i)\,\eta_{i,r,p}\right]-\frac{(1-x_i)^{2}\,[g'(\eta_i)]^{2}\,\eta_{i,r}\,\eta_{i,p}}{H^{2}(x_i;\eta_i)}\right\},$$
where $g(\eta)=\eta^{-1}\log[(1-q)/(1-\eta)]$ and $H(x_i;\eta_i)=1-(1/\eta_i)\log[(1-q)/(1-\eta_i)](1-x_i)=1-(1-x_i)\,g(\eta_i)$. With this notation, the log-likelihood function in Equation (7) reads $\ell(\boldsymbol{\beta})=\sum_{i=1}^{n}[x_i\,g(\eta_i)+\log H(x_i;\eta_i)]$, and
$$g'(\eta)=\frac{1}{\eta(1-\eta)}-\frac{1}{\eta^{2}}\log\frac{1-q}{1-\eta},\qquad g''(\eta)=-\frac{1-2\eta}{\eta^{2}(1-\eta)^{2}}-\frac{1}{\eta^{2}(1-\eta)}+\frac{2}{\eta^{3}}\log\frac{1-q}{1-\eta}.$$
Here, $\eta_{i,r}$ and $\eta_{i,p}$ are as in Equation (8) and $\eta_{i,r,p}=\partial^{2}\eta_i/(\partial\beta_p\,\partial\beta_r)$, with $r,p=0,1,\dots,k-1$. Under the logit link, $\eta_{i,0,p}=\delta_i w_{ip}$, $\eta_{i,r,0}=\delta_i w_{ir}$ and $\eta_{i,r,p}=\delta_i w_{ir}w_{ip}$, where $\delta_i=\eta_i(1-\eta_i)(1-2\eta_i)$, with $i=1,2,\dots,n$ and $r,p=0,1,\dots,k-1$.
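Second derivatives like these are easy to get wrong, so a numerical check is useful. Write g(η) = η⁻¹ log[(1 − q)/(1 − η)], so that H(x; η) = 1 − (1 − x)g(η). Assume the log-likelihood of Equation (7) is ℓ(β) = Σ[x_i g(η_i) + log H(x_i; η_i)], which is what this H implies. The Python sketch below then assembles the chain-rule Hessian under the logit link and compares it with a central finite-difference Hessian of ℓ(β). The data x, the design matrix and β are arbitrary illustrative values.

```python
import numpy as np
from scipy.special import expit

q = 0.5


def g(eta):
    # g(eta) = (1/eta) log[(1 - q)/(1 - eta)], so that H(x; eta) = 1 - (1 - x) g(eta)
    return np.log((1.0 - q) / (1.0 - eta)) / eta


def g1(eta):
    return 1.0 / (eta * (1.0 - eta)) - np.log((1.0 - q) / (1.0 - eta)) / eta**2


def g2(eta):
    L = np.log((1.0 - q) / (1.0 - eta))
    return (-(1.0 - 2.0 * eta) / (eta**2 * (1.0 - eta) ** 2)
            - 1.0 / (eta**2 * (1.0 - eta)) + 2.0 * L / eta**3)


def loglik(beta, x, W):
    eta = expit(W @ beta)
    return np.sum(x * g(eta) + np.log(1.0 - (1.0 - x) * g(eta)))


def hessian(beta, x, W):
    eta = expit(W @ beta)
    H = 1.0 - (1.0 - x) * g(eta)
    d1 = eta * (1.0 - eta)          # d eta / d(linear predictor) under the logit link
    d2 = d1 * (1.0 - 2.0 * eta)     # delta_i in the text
    a = x - (1.0 - x) / H
    # Weight multiplying w_ir * w_ip in entry (r, p)
    wgt = a * (g2(eta) * d1**2 + g1(eta) * d2) - (1.0 - x) ** 2 * g1(eta) ** 2 * d1**2 / H**2
    return (W * wgt[:, None]).T @ W


def numerical_hessian(f, b, h=1e-4):
    k = len(b)
    out = np.empty((k, k))
    E = np.eye(k) * h
    for r in range(k):
        for p in range(k):
            out[r, p] = (f(b + E[r] + E[p]) - f(b + E[r] - E[p])
                         - f(b - E[r] + E[p]) + f(b - E[r] - E[p])) / (4.0 * h * h)
    return out


rng = np.random.default_rng(0)
W = np.column_stack([np.ones(40), rng.uniform(size=40)])
x = rng.uniform(0.05, 0.95, size=40)
beta = np.array([-1.0, 0.4])
A = hessian(beta, x, W)
N = numerical_hessian(lambda b: loglik(b, x, W), beta)
print(np.max(np.abs(A - N)))
```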

Appendix D. Histograms of the Data Considered in Section 5.2 and Section 5.3

Figure A1. (Left) Histogram of the peak horizontal acceleration data. (Right) Histogram of the response FI (the measure of the firm’s risk management cost effectiveness).

Appendix E. Estimates and Other Fit Measures for the Sample Associated with Figure 7 of Section 5.1

Table A3. The parameter estimates (with standard errors in parentheses); the $\hat{\ell}$, AIC and BIC values; and the p-values of the AD and CvM goodness-of-fit tests for the SU, P, MOEU, B, K and LU distributions fitted to the single generated sample.
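| Parameter | LU | K | B | MOEU | P | SU |
|---|---|---|---|---|---|---|
| α | 0.009 (0.003) | 0.914 (0.050) | 0.914 (0.065) | 0.131 (0.012) | 0.438 (0.025) | 1.000 (0.113) |
| β | - | 4.078 (0.399) | 4.291 (0.375) | - | - | - |
| $\hat{\ell}$ | 232.3 | 227.0 | 226.4 | 216.3 | 136.5 | 143.4 |
| AIC | −462.7 | −450.0 | −448.8 | −430.7 | −271.0 | −284.8 |
| BIC | −459.0 | −442.6 | −441.4 | −427.0 | −267.3 | −281.1 |
| AD (p-value) | 0.915 | 0.485 | 0.399 | 0.005 | <0.001 | <0.001 |
| CvM (p-value) | 0.962 | 0.624 | 0.473 | 0.027 | <0.001 | <0.001 |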

References

  1. McCall, B.P. The impact of unemployment insurance benefit levels on recipiency. J. Bus. Econ. Stat. 1995, 13, 189–198.
  2. Maindonald, J.H.; Waddell, B.C.; Petry, R.J. Apple cultivar effects on codling moth (Lepidoptera: Tortricidae) egg mortality following fumigation with methyl bromide. Postharvest Biol. Technol. 2001, 22, 99–110.
  3. Farsalinos, K.E.; Spyrou, A.; Tsimopoulou, K.; Stefopoulos, C.; Romagna, G.; Voudris, V. Nicotine absorption from electronic cigarette use: Comparison between first and new-generation devices. Sci. Rep. 2014, 4, 4133.
  4. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, Volume 2, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1995; ISBN 0-471-58494-0.
  5. Kumaraswamy, P. A generalized probability density function for double-bounded random processes. J. Hydrol. 1980, 46, 79–88.
  6. Jose, K.; Krishna, E. Marshall-Olkin extended uniform distribution. ProbStat Forum 2011, 4, 78–88.
  7. Shaw, W.T.; Buckley, I.R. The alchemy of probability distributions: Beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map. arXiv 2009, arXiv:0901.0434.
  8. Iriarte, Y.A.; de Castro, M.; Gómez, H.W. The Lambert-F distributions class: An alternative family for positive data analysis. Mathematics 2020, 8, 1398.
  9. Iriarte, Y.A.; de Castro, M.; Gómez, H.W. A unimodal/bimodal skew/symmetric distribution generated from Lambert’s transformation. Symmetry 2021, 13, 269.
  10. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019.
  11. Girma, S.; Görg, H. Foreign Direct Investment, Spillovers and Absorptive Capacity: Evidence from Quantile Regressions. IIIS Discussion Paper 1. GEP Working Paper 2002/14. 2003. Available online: https://ssrn.com/abstract=410742 (accessed on 29 April 2021).
  12. Chunying, Z. A Quantile Regression Analysis on the Relations between Foreign Direct Investment and Technological Innovation in China. In Proceedings of the 2011 International Conference of Information Technology, Computer Engineering and Management Sciences, Nanjing, China, 24–25 September 2011; Volume 4, pp. 38–41.
  13. Ferrari, S.; Cribari-Neto, F. Beta regression for modelling rates and proportions. J. Appl. Stat. 2004, 31, 799–815.
  14. Bayes, C.L.; Bazán, J.L.; García, C. A new robust regression model for proportions. Bayesian Anal. 2012, 7, 841–866.
  15. Mitnik, P.A.; Baek, S. The Kumaraswamy distribution: Median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Stat. Pap. 2013, 54, 177–192.
  16. Korkmaz, M.Ç.; Chesneau, C.; Korkmaz, Z.S. On the arcsecant hyperbolic normal distribution. Properties, quantile regression modeling and applications. Symmetry 2021, 13, 117.
  17. Bayes, C.L.; Bazán, J.L.; de Castro, M. A quantile parametric mixed regression model for bounded response variables. Stat. Interface 2017, 10, 483–493.
  18. Goerg, G.M. LambertW: An R Package for Lambert W × F Random Variables. R Package Version 0.6.4. 2016. Available online: https://CRAN.R-project.org/package=LambertW (accessed on 29 April 2021).
  19. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  20. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  21. Dunn, P.K.; Smyth, G.K. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
  22. Faraway, J.; Marsaglia, G.; Marsaglia, J.; Baddeley, A. goftest: Classical Goodness-of-fit Tests for Univariate Distributions. R Package Version 1.2-2. 2019. Available online: https://CRAN.R-project.org/package=goftest (accessed on 29 April 2021).
  23. Joyner, W.B.; Boore, D.M. Peak horizontal acceleration and velocity from strong-motion records including records from the 1979 Imperial Valley, California, earthquake. Bull. Seismol. Soc. Am. 1981, 71, 2011–2038.
  24. Schmit, J.T.; Roth, K. Cost effectiveness of risk management practices. J. Risk Insur. 1990, 57, 455–470.
  25. Gómez-Déniz, E.; Sordo, M.A.; Calderín-Ojeda, E. The Log–Lindley distribution as an alternative to the beta regression model with applications in insurance. Insur. Math. Econ. 2014, 54, 49–57.
Figure 1. LU pdf curves for different values of α .
Figure 2. Plots of the skewness and kurtosis coefficients of the LU distribution (red color) and the U distribution (circle).
Figure 3. The AE, SD, SE, RMSE and CP computed from the 1000 estimates of α obtained in the scenarios α = 0.5 (top) and α = 1.5 (bottom), for the different sample sizes.
Figure 4. The AE, SD, SE and RMSE computed from the 1000 estimates of the β coefficients obtained in Scenario A, for the different sample sizes.
Figure 5. The AE, SD, SE and RMSE computed from the 1000 estimates of the β coefficients obtained in Scenario B, for the different sample sizes.
Figure 6. The CPs for the estimates of the β coefficients in Scenario A (left) and Scenario B (right).
Figure 7. Histogram of a single sample generated from the LU(0.01) population, fitted with the LU, P, MOEU, SU, B and K distributions.
Figure 8. QQ-plots: (a) LU distribution; (b) K distribution; (c) B distribution; (d) MOEU distribution; (e) P distribution; and (f) SU distribution.
Figure 9. Coefficient estimates and their 95% confidence intervals for the variables AS, CA, SI, IN, CE and SO in the LU quantile regression models with q = 0.1, 0.2, …, 0.9 and response variable FI.
Table 1. Non-rejection and hit rates for the LU, P, MOEU, SU, B and K distributions obtained from the 1000 generated samples.
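| Distribution | Non-rejection rate (AD) | Non-rejection rate (CvM) | Hit rate (AIC) | Hit rate (BIC) |
|---|---|---|---|---|
| LU | 0.997 | 0.999 | 0.865 | 0.969 |
| P | 0.000 | 0.000 | 0.000 | 0.000 |
| MOEU | 0.117 | 0.418 | 0.002 | 0.002 |
| SU | 0.000 | 0.000 | 0.000 | 0.000 |
| B | 0.982 | 0.986 | 0.041 | 0.020 |
| K | 0.993 | 0.995 | 0.092 | 0.009 |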
Table 2. The parameter estimates (with standard errors in parentheses), the $\hat{\ell}$, AIC and BIC values and the p-values of the AD and CvM goodness-of-fit tests for the SU, P, MOEU, B, K and LU distributions fitted to the peak horizontal acceleration data.
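| Parameter | LU | K | B | MOEU | P | SU |
|---|---|---|---|---|---|---|
| α | 0.005 (0.002) | 0.890 (0.062) | 0.877 (0.080) | 0.112 (0.013) | 0.412 (0.030) | 1.000 (0.170) |
| β | - | 4.423 (0.571) | 4.699 (0.533) | - | - | - |
| $\hat{\ell}$ | 158.5 | 157.0 | 156.6 | 149.3 | 98.1 | 91.7 |
| AIC | −315.1 | −310.1 | −309.3 | −296.7 | −194.3 | −181.5 |
| BIC | −311.9 | −303.7 | −302.9 | −293.5 | −191.1 | −178.2 |
| AD (p-value) | 0.978 | 0.882 | 0.486 | 0.069 | <0.001 | <0.001 |
| CvM (p-value) | 0.965 | 0.884 | 0.576 | 0.140 | <0.001 | <0.001 |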
Table 3. The $\hat{\ell}$, AIC and BIC values for the ASHN, K and LU quantile regression models fitted to the risk management practice data, and the p-values of the AD and CvM tests for the randomized quantile residuals.
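| q | Model | $\hat{\ell}$ | AIC | BIC | AD test p-value | CvM test p-value |
|---|---|---|---|---|---|---|
| 0.25 | ASHN | 80.3 | −144.7 | −126.4 | <0.001 | <0.001 |
| | K | 97.9 | −179.8 | −161.5 | 0.166 | 0.198 |
| | LU | 107.8 | −201.6 | −185.6 | 0.134 | 0.231 |
| 0.5 | ASHN | 80.1 | −144.2 | −125.9 | <0.001 | <0.001 |
| | K | 98.8 | −181.6 | −163.3 | 0.150 | 0.169 |
| | LU | 108.1 | −202.2 | −186.2 | 0.154 | 0.252 |
| 0.75 | ASHN | 81.8 | −147.7 | −129.4 | <0.001 | <0.001 |
| | K | 99.9 | −183.9 | −165.6 | 0.151 | 0.147 |
| | LU | 108.6 | −203.2 | −187.2 | 0.158 | 0.230 |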
Table 4. Coefficient estimates for the LU quantile regression models fitted to the risk management practice data and significance tests for the individual regression coefficients.
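| q | Parameter | Estimate | SE | z | p-value |
|---|---|---|---|---|---|
| 0.25 | Intercept | 1.619 | 1.543 | 1.049 | 0.293 |
| | AS | −0.022 | 0.017 | −1.311 | 0.189 |
| | CA | 0.318 | 0.314 | 1.013 | 0.310 |
| | SI | −0.774 | 0.164 | −4.717 | <0.001 |
| | IN | 3.494 | 0.888 | 3.932 | <0.001 |
| | CE | −0.044 | 0.112 | −0.399 | 0.689 |
| | SO | −0.009 | 0.028 | −0.322 | 0.747 |
| 0.50 | Intercept | 2.730 | 1.601 | 1.705 | 0.088 |
| | AS | −0.022 | 0.018 | −1.263 | 0.206 |
| | CA | 0.311 | 0.322 | 0.965 | 0.334 |
| | SI | −0.802 | 0.167 | −4.788 | <0.001 |
| | IN | 3.644 | 0.842 | 4.323 | <0.001 |
| | CE | −0.044 | 0.115 | −0.388 | 0.697 |
| | SO | −0.009 | 0.029 | −0.333 | 0.738 |
| 0.75 | Intercept | 3.850 | 1.715 | 2.245 | 0.024 |
| | AS | −0.023 | 0.019 | −1.173 | 0.240 |
| | CA | 0.293 | 0.340 | 0.863 | 0.387 |
| | SI | −0.855 | 0.175 | −4.869 | <0.001 |
| | IN | 3.946 | 0.826 | 4.777 | <0.001 |
| | CE | −0.045 | 0.122 | −0.374 | 0.707 |
| | SO | −0.010 | 0.031 | −0.347 | 0.728 |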
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Iriarte, Y.A.; de Castro, M.; Gómez, H.W. An Alternative One-Parameter Distribution for Bounded Data Modeling Generated from the Lambert Transformation. Symmetry 2021, 13, 1190. https://doi.org/10.3390/sym13071190


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
