Article

A New Family of Distributions Based on Proportional Hazards

by
Guillermo Martínez-Flórez
1,
Carlos Barrera-Causil
2,
Osvaldo Venegas
3,*,
Heleno Bolfarine
4 and
Héctor W. Gómez
5
1
Departamento de Matemáticas y Estadística, Facultad de Ciencias, Universidad de Córdoba, Córdoba 2300, Colombia
2
Facultad de Ciencias Exactas y Aplicadas, Instituto Tecnológico Metropolitano, Medellín 050034, Colombia
3
Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile
4
Departamento de Estatística, Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
5
Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
*
Author to whom correspondence should be addressed.
Mathematics 2022, 10(3), 378; https://doi.org/10.3390/math10030378
Submission received: 17 October 2021 / Revised: 17 December 2021 / Accepted: 14 January 2022 / Published: 26 January 2022
(This article belongs to the Section Probability and Statistics)

Abstract

In this article, we introduce a new family of symmetric-asymmetric distributions based on skew distributions and on the family of order statistics with proportional hazards. This new family of distributions is able to fit both unimodal and bimodal asymmetric data. Furthermore, it contains, as special cases, the symmetric distribution and the “skew-symmetric” family, and therefore the skew-normal distribution. Another interesting feature of the family is that the parameter controlling the distributional shape in bimodal cases takes values in the interval (0, 1); this is an advantage for computing maximum likelihood estimates of model parameters, which is performed by numerical methods. The practical utility of the proposed distribution is illustrated in two real data applications.

1. Introduction

A seminal paper by [1] revealed the main properties of the “skew-normal” distribution whose probability density function (pdf) is given by
$$\phi_{SN}(z;\lambda) = 2\phi(z)\Phi(\lambda z), \quad z \in \mathbb{R},$$
where $\Phi$ and $\phi$ denote the cumulative distribution and density functions of the standard normal distribution, respectively. Here, $\lambda$ is a parameter that controls the asymmetry of the random variable $Z$. Generally this is denoted by $SN(\lambda)$. Since this work was published, numerous publications have been based on this model, primarily [2,3,4,5,6,7,8,9].
An important lemma demonstrated by [1] represents a fundamental result in the development of asymmetric and symmetric models for both unimodal and bimodal cases. This lemma is presented below.
Lemma 1.
Let $f_0$ be a pdf symmetric around zero and $G$ a distribution function such that $G'$ exists and is a density function symmetric around zero; then
$$f_Z(z;\lambda) = 2 f_0(z) G(\lambda z), \quad z \in \mathbb{R},$$
is a density function for any $\lambda \in \mathbb{R}$. This will be denoted by $S_{f_0}(\lambda)$.
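Lemma 1 can be checked numerically. The following is a minimal sketch, assuming $f_0 = \phi$ and $G = \Phi$ (the skew-normal case); the helper names `skew_density` and `integrate` are illustrative and do not come from the article.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def skew_density(z, lam, f0=STD.pdf, G=STD.cdf):
    """Density 2 * f0(z) * G(lam * z) from Lemma 1 (defaults give SN(lam))."""
    return 2.0 * f0(z) * G(lam * z)


def integrate(func, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


# The integral should be close to 1 whatever the value of lambda.
for lam in (-3.0, 0.0, 0.5, 5.0):
    area = integrate(lambda z: skew_density(z, lam))
    print(f"lambda = {lam:5.1f}: integral = {area:.6f}")
```

Other symmetric choices of `f0` and `G` can be passed in to explore further members of the skew-symmetric family.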

1.1. Asymmetric Models of Fractional Order Statistics

The study of asymmetric models based on order statistics goes back to [10], who introduced a model called the “Lehmann alternative”, which originated from the distribution of the maximum in the sample. It later became an alternative for distributions presenting a high degree of asymmetry and/or kurtosis. This family of distributions is represented by the distribution function
$$\mathcal{F}_F(z;\alpha) = \{F(z)\}^{\alpha}, \quad z \in \mathbb{R},$$
where $F$ is a cumulative distribution function (cdf) and $\alpha$ is a rational number. For $\alpha \in \mathbb{N}$, we have the distribution function of the maximum in the sample.
Subsequently, [11] introduced the distribution of fractional order statistics, which is defined by the pdf
$$\psi_F(z;\alpha) = \alpha f(z)\{F(z)\}^{\alpha-1}, \quad z \in \mathbb{R},$$
where $\alpha \in \mathbb{R}^{+}$ is a shape parameter and $F$ is an absolutely continuous distribution function with pdf $f = dF$. This is called the power-symmetric (PS) model. Derivations and properties of the distributions of order statistics have been widely discussed by [12,13,14], among others. One important special case follows when $f = \phi$: this is called the power-normal (PN) distribution (see [14]). Ref. [15] derived the expected (Fisher) information matrix for the PN distribution and showed that it is nonsingular in the vicinity of symmetry ($\alpha = 1$), in contrast to the case of the SN density, for which the Fisher information matrix is singular at symmetry ($\lambda = 0$).

1.2. Asymmetric Bimodal Models

Data from many fields of science cannot be adequately fitted with distributions such as the skew-normal or those based on fractional order statistics, because these distributions perform well only in unimodal cases, whereas the data are bimodal. Bimodal data sets are common in areas such as the health sciences, engineering and economics. Alternative models are therefore required that can capture possible bimodality without the parameter identifiability problems that often affect proposals based on mixtures of distributions. Consequently, this research is motivated by the interest in estimating, in a simple way, the parameters of a model that is able to fit symmetric or asymmetric bimodal data, a proposal that opens the possibility of new research in these areas.
Models of this type have been studied by [16], who introduced the bimodal extension of the skew-normal model, called the “two-pieces skew-normal (TN) model”. This model is denoted by $TN(\lambda)$, and its pdf is
$$g(z;\lambda) = c_{\lambda}\,\phi(z)\,\Phi(\lambda|z|), \tag{1}$$
where $\lambda$ is a real number and $c_{\lambda} = 2\pi/(\pi + 2\arctan(\lambda))$ is a normalizing constant. For $\lambda > 0$, Kim demonstrates that model (1) is bimodal and symmetric around zero.
Ref. [17] developed the asymmetric bimodal model termed “the extended two-pieces skew-normal (ETN) model”, with a pdf given by
$$h(z;\theta) = 2c_{\lambda}\,\phi(z)\,\Phi(\lambda|z|)\,\Phi(\beta z), \tag{2}$$
where $\beta$ and $\lambda$ are real numbers and $c_{\lambda}$ is a normalizing constant. The model is denoted by $ETN(\lambda,\beta)$ and is an asymmetric extension of Kim’s model.
The proportional hazards model was introduced by [18] and is very important in survival analysis. Although [18] used this model to introduce covariates, it can also be used to introduce a shape parameter into the base distribution (see [19]). One example is the Burr XII distribution (see [20]), which can be obtained as a proportional hazards model from the base distribution function. The main object of this paper is to use this proportional hazards methodology to propose a new family of uni-/bimodal distributions, based on the power-symmetric family of distributions.
The paper is organized as follows. In Section 2, the extended skew model distribution with proportional hazards is derived, and its density function, special cases and moments are presented. In Section 3, parameter estimation is considered using maximum likelihood (ML). Observed and Fisher information matrices are derived, and it is shown that the Fisher information matrix is nonsingular. In Section 4, we perform a small-scale simulation study. In Section 5, two real data sets are analyzed using the proposed distribution and some other competing distributions to illustrate their applicability.

2. Extended Skew Model with Proportional Hazard

Following similar guidelines as in [10,11], we define the density function of the order statistics with proportional hazard.
Let $F$ be a continuous cdf with pdf $f$, continuous and symmetric around zero, and hazard function $h = f/(1-F)$. We say that $Z$ has a distribution with proportional hazards, associated with the cdf $F$ and pdf $f$, and parameter $\alpha > 0$, if its pdf is given by the expression
$$\varphi_F(z;\alpha) = \alpha f(z)\{1-F(z)\}^{\alpha-1}, \quad z \in \mathbb{R}, \tag{3}$$
where $\alpha$ is a positive real number and $F$ is a continuous distribution function with density function $f = dF$, continuous and symmetric around zero. The PS distribution with proportional hazards is denoted by $PSH(\alpha)$. For $f = F'$, a pdf continuous and symmetric around zero, the density (3) matches the density of the variable $Z = -Y$, where $Y \sim PS(\alpha)$.
The cdf of the PSH model is given by
$$\mathcal{F}(z) = 1 - \{1-F(z)\}^{\alpha}, \quad z \in \mathbb{R}.$$
The expression “proportional hazards model” must be understood in the sense that the hazard function of this model, relative to the hazard function $h$ of $F$, is
$$h(x; F, \alpha) = \alpha\, h(x).$$
When $F = \Phi$, we get the PN distribution with proportional hazards, which is denoted by $PNH(\alpha)$. This model also represents an alternative for modeling data with skewness and kurtosis outside the ranges permitted by the normal distribution.
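The proportional hazards property can be illustrated for the $PNH(\alpha)$ case. The sketch below assumes the density $\alpha\phi(z)\{1-\Phi(z)\}^{\alpha-1}$ and the survival function $\{1-\Phi(z)\}^{\alpha}$ given above; the function names are illustrative only.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal


def survival(z):
    """1 - Phi(z), computed with erfc for accuracy in the upper tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def pnh_pdf(z, alpha):
    """PNH(alpha) density: alpha * phi(z) * (1 - Phi(z)) ** (alpha - 1)."""
    return alpha * STD.pdf(z) * survival(z) ** (alpha - 1.0)


def pnh_sf(z, alpha):
    """PNH(alpha) survival function: (1 - Phi(z)) ** alpha."""
    return survival(z) ** alpha


def hazard_ratio(z, alpha):
    """Hazard of PNH(alpha) divided by the hazard of the standard normal."""
    base_hazard = STD.pdf(z) / survival(z)
    return (pnh_pdf(z, alpha) / pnh_sf(z, alpha)) / base_hazard


# The ratio should equal alpha at every point z.
for z in (-2.0, 0.0, 1.5):
    print(z, hazard_ratio(z, 0.25), hazard_ratio(z, 3.0))
```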
Figure 1 depicts how the parameter $\alpha$ controls the skewness and kurtosis of the $PNH(\alpha)$ model.
The PNH model is suitable for fitting asymmetric unimodal data. Although this model is more flexible than the normal model, it is unsuitable for fitting a bimodal data set. A more flexible model than the PNH, able to fit unimodal as well as bimodal data, is defined below. This model is obtained from the $PNH(\alpha)$ model.
We define the extended proportional hazard model by the pdf
$$\varphi(z;\alpha) = \alpha f(z)\{2(1-F(|z|))\}^{\alpha-1}, \quad z \in \mathbb{R}, \tag{4}$$
where $\alpha \in \mathbb{R}^{+}$ and $F$ is an absolutely continuous cdf with pdf $f = F'$, which is symmetric around zero. We use the notation $EPSH(\alpha)$.
Result 1.
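If $Z \sim EPSH(\alpha)$, then the model (4) is symmetric.
Result 2.
If $Z \sim EPSH(\alpha)$, then the cdf of $Z$ is given by:
$$\mathcal{F}_F(z;\alpha) = \begin{cases} \dfrac{1}{2}\{2F(z)\}^{\alpha}, & \text{if } z < 0, \\[2mm] 1 - \dfrac{1}{2}\{2(1-F(z))\}^{\alpha}, & \text{if } z \geq 0. \end{cases}$$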
Result 3.
Let $U \sim U(0,1)$. Then, using the inversion method, a random variable $Z \sim EPSH(\alpha)$ can be obtained by the expression
$$Z = \begin{cases} F^{-1}\left(\dfrac{1}{2}(2U)^{1/\alpha}\right), & \text{with } U < 1/2 \text{ and } Z < 0, \\[2mm] F^{-1}\left(1 - \dfrac{1}{2}\{2(1-U)\}^{1/\alpha}\right), & \text{with } U \geq 1/2 \text{ and } Z \geq 0, \end{cases}$$
where $F^{-1}$ is the inverse function of $F$.
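The inversion method of Result 3 can be sketched as follows for the normal base ($F = \Phi$). This is an illustrative implementation, not the authors' code. The upper branch is written as $-F^{-1}(\cdot)$, which is equivalent for a symmetric $F$ and avoids evaluating the quantile function at arguments that round to 1; the names `epsh_quantile`, `epsh_cdf` and `epsh_sample` are assumptions.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # base distribution F = Phi (an assumption of this sketch)


def epsh_quantile(u, alpha, inv_cdf=STD.inv_cdf):
    """Quantile function of EPSH(alpha) for a base cdf symmetric around zero."""
    if u < 0.5:
        return inv_cdf(0.5 * (2.0 * u) ** (1.0 / alpha))
    # By symmetry, F^{-1}(1 - p) = -F^{-1}(p).
    return -inv_cdf(0.5 * (2.0 * (1.0 - u)) ** (1.0 / alpha))


def epsh_cdf(z, alpha, cdf=STD.cdf):
    """cdf of EPSH(alpha), as given in Result 2."""
    if z < 0:
        return 0.5 * (2.0 * cdf(z)) ** alpha
    return 1.0 - 0.5 * (2.0 * (1.0 - cdf(z))) ** alpha


def epsh_sample(n, alpha, seed=0):
    """Draw n values from EPSH(alpha) by inversion."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # inv_cdf is undefined at 0; skip that (very rare) draw
            draws.append(epsh_quantile(u, alpha))
    return draws


sample = epsh_sample(5, alpha=0.5)
print(sample)
```

For very small values of `alpha` the power `(2u) ** (1 / alpha)` can underflow, so this sketch is intended for moderate values of the shape parameter.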

2.1. Skew-EPsH (SEPSH) Model

Although the EPSH model is adequate for fitting bimodal data sets, it is not suitable when the data set presents asymmetric bimodality. However, supported by the results given in [1], we can obtain a more general model that achieves asymmetric bimodality.
Based on models (4) and (2), we now introduce a new family of distributions with the special feature that, for certain distributions (e.g., the normal), it can fit asymmetric unimodal and bimodal data sets. This new family of distributions has pdf
$$\varphi_{FG}(z;\alpha,\beta) = 2\alpha f(z)\{2(1-F(|z|))\}^{\alpha-1}G(\beta z), \quad z \in \mathbb{R}, \tag{6}$$
where $\alpha \in \mathbb{R}^{+}$, $\beta \in \mathbb{R}$, $F$ is a continuous distribution function with density $f = F'$, which is symmetric around zero, and $G$ is a continuous cdf whose pdf $G'$ is symmetric around zero. This new family of distributions is called the asymmetric extended family with proportional hazards. Note that when $\alpha = 1$, we have the “skew-symmetric” distribution, i.e., this new model can be seen as a generalization of the “skew-symmetric” model and the models of order statistics for the case of proportional hazards.
The proof that function (6) is a density follows from Lemma 1 by taking $f_0(z) = \alpha f(z)\{2(1-F(|z|))\}^{\alpha-1}$, which is symmetric around zero. Therefore, this new family of distributions belongs to the “skew-symmetric” family, and as $f_0$ belongs to the exponentiated family (see [13]) or family of order statistics [11], this model will be called “skew-power-symmetric (SPS)” and will be denoted by $SPS(\alpha,\beta)$.
Result 4.
$SPS(1,\beta) = S_{f_0}(\beta)$.
The proof of this result is immediate since, for $\alpha = 1$, $f_0 = f = dF$ is symmetric around zero. Therefore, the “skew-symmetric” distribution of Azzalini is a special case of the $SPS(\alpha,\beta)$ distribution.

2.2. Skew-Power-Normal Model

Taking $F = G = \Phi$ in (6) leads to the model
$$\varphi_{\Phi}(z;\alpha,\beta) = 2\alpha\,\phi(z)\{2(1-\Phi(|z|))\}^{\alpha-1}\Phi(\beta z), \quad z \in \mathbb{R}, \tag{7}$$
which will be called the “skew-power-normal (SPN)” model and will be denoted by $SPN(\alpha,\beta)$.
  • Properties
The following properties are obtained directly from the model (7).
Property 1.
SPN(1, 0) = N(0, 1).
Property 2.
SPN(1, β) = SN(β).
Property 3.
SPN(α, 0) = PNH(α).
Property 4.
SPN(2, β) = a × SN(β) − b × ETN(β) with a and b positive constants.
Property 5.
SPN(2, 0) = a × N(0, 1) − b × TN(1) with a and b positive constants.
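Properties 1 and 2 can be checked directly from the density of model (7). The sketch below is an illustration under the stated formula $2\alpha\phi(z)\{2(1-\Phi(|z|))\}^{\alpha-1}\Phi(\beta z)$; the names `spn_pdf` and `integrate` and the parameter values are chosen here for demonstration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.pdf is phi, STD.cdf is Phi


def spn_pdf(z, alpha, beta):
    """SPN(alpha, beta) density of model (7)."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate in the tails.
    tail = math.erfc(abs(z) / math.sqrt(2.0))
    return 2.0 * alpha * STD.pdf(z) * tail ** (alpha - 1.0) * STD.cdf(beta * z)


def integrate(func, lo=-10.0, hi=10.0, n=4000):
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


z = 0.7
print(spn_pdf(z, 1.0, 0.0), STD.pdf(z))                             # Property 1
print(spn_pdf(z, 1.0, 2.0), 2.0 * STD.pdf(z) * STD.cdf(2.0 * z))    # Property 2
print(integrate(lambda t: spn_pdf(t, 0.5, 1.5)))                    # close to 1
```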
Result 5.
If $Z \sim SPN(\alpha,\beta)$, then for $\beta \neq 0$, its density function is unimodal asymmetric for $\alpha \geq 1$ and asymmetric bimodal for $\alpha < 1$.
Proof of Result 5. 
Differentiating $f_0(z) = \alpha\,\phi(z)\{2(1-\Phi(|z|))\}^{\alpha-1}$ with respect to $z$ and equating to zero, we obtain that the points where the maxima and minima occur are the solutions of the equations
$$\begin{cases} (\alpha-2)\log[1-\phi(|x|)] + \log(\phi(|x|)) = 0, & \text{if } \alpha \geq 1, \\ (1-\alpha)\,\phi(|x|) = |x|\,(1-\Phi(|x|)), & \text{if } \alpha < 1. \end{cases}$$
Then, $f_0$ is unimodal for $\alpha \geq 1$ and bimodal for $\alpha < 1$. In addition, as $f_0$ is symmetric, this density will be bimodal symmetric for $\alpha < 1$. Therefore, we conclude that $\varphi_{\Phi}(z;\alpha,\beta)$ is asymmetric bimodal if $\alpha < 1$ and asymmetric unimodal otherwise. □
This feature makes the model attractive for fitting data presenting bimodality, since the parameter range is very short (between 0 and 1), making it advantageous for computational procedures taking into account that the starting point of the process maximizing the log-likelihood function is determined more accurately.
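The change from one mode to two can be explored numerically by counting local maxima of the SPN density on a grid. The sketch below is only an illustration for the specific parameter values shown, which are chosen here and are not taken from the article; `count_modes` is a hypothetical helper.

```python
import math


def spn_pdf(z, alpha, beta):
    """SPN(alpha, beta) density: 2*alpha*phi(z)*{2(1-Phi(|z|))}^(alpha-1)*Phi(beta*z)."""
    tail = math.erfc(abs(z) / math.sqrt(2.0))              # 2 * (1 - Phi(|z|))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * math.erfc(-beta * z / math.sqrt(2.0))  # Phi(beta * z)
    return 2.0 * alpha * phi * tail ** (alpha - 1.0) * big_phi


def count_modes(alpha, beta, half_width=5.0, step=0.01):
    """Count strict local maxima of the density on a grid centred at zero."""
    m = round(half_width / step)
    ys = [spn_pdf(i * step, alpha, beta) for i in range(-m, m + 1)]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


for alpha in (0.5, 1.0, 2.0):
    print(f"alpha = {alpha}: {count_modes(alpha, beta=0.3)} mode(s)")
```

A grid search of this kind only detects modes that are resolved by the chosen step size, so it should be read as a visual aid rather than a proof.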

The Location-Scale Case

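Consider a random variable $Z \sim SPN(\alpha,\beta)$, with $\alpha \in \mathbb{R}^{+}$ and $\beta \in \mathbb{R}$. The family of distributions with location-scale parameters for the SPN distribution is defined as the distribution of $X = \xi + \eta Z$ for $\xi \in \mathbb{R}$ and $\eta > 0$, and its density function is given by
$$\varphi_{\Phi}(x;\xi,\eta,\alpha,\beta) = \frac{2\alpha}{\eta}\,\phi\left(\frac{x-\xi}{\eta}\right)\left\{2\left[1-\Phi\left(\left|\frac{x-\xi}{\eta}\right|\right)\right]\right\}^{\alpha-1}\Phi\left(\beta\,\frac{x-\xi}{\eta}\right), \quad x \in \mathbb{R}, \tag{8}$$
where $\xi$ is the location parameter and $\eta$ is the scale parameter. We use the notation $SPN(\xi,\eta,\alpha,\beta)$.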
Figure 2 illustrates the behavior of the pdf (8) for different values of ξ , η , α and β . As can be seen from the figure, the shape of the bimodality depends on the parameters α and β .

2.3. Moments

The following expressions allow the calculation of the moments of a random variable with $SPN(\alpha,\beta)$ distribution:
$$E(Z^r) = \begin{cases} 2^{\alpha}\mu_r(\alpha), & \text{if } r \text{ is even}, \\ 2^{\alpha}\left[2\mu_r(\alpha,\beta) - \mu_r(\alpha)\right], & \text{if } r \text{ is odd}, \end{cases}$$
where
$$\mu_r(\alpha) = \alpha\int_0^{\infty} z^r\phi(z)\{1-\Phi(z)\}^{\alpha-1}\,dz \quad \text{and} \quad \mu_r(\alpha,\beta) = \alpha\int_0^{\infty} z^r\phi(z)\{1-\Phi(z)\}^{\alpha-1}\Phi(\beta z)\,dz.$$
The central moments $\mu_r' = E(Z-E(Z))^r$ for $r = 2, 3, 4$ can be calculated from the raw moments $\mu_r = E(Z^r)$ through the expressions:
$$\mu_2' = \mu_2 - \mu_1^2, \quad \mu_3' = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3 \quad \text{and} \quad \mu_4' = \mu_4 - 4\mu_3\mu_1 + 6\mu_2\mu_1^2 - 3\mu_1^4.$$
Consequently, the variance and the coefficients of asymmetry and kurtosis are given by $\sigma^2 = Var(Z) = \mu_2'$, $\beta_1 = \mu_3'/[\mu_2']^{3/2}$ and $\beta_2 = \mu_4'/[\mu_2']^{2}$.
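The moment expressions can be evaluated by numerical integration. The following sketch assumes the half-line integrals are taken with the weight $\{1-\Phi(z)\}^{\alpha-1}$ and truncates them at $z = 10$; the function names and the Simpson rule are choices made here, not part of the article.

```python
import math

SQRT2 = math.sqrt(2.0)


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cdf via erfc."""
    return 0.5 * math.erfc(-z / SQRT2)


def surv(z):
    """1 - Phi(z) via erfc."""
    return 0.5 * math.erfc(z / SQRT2)


def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def mu(r, alpha, beta=None):
    """mu_r(alpha) when beta is None, otherwise mu_r(alpha, beta)."""
    def integrand(z):
        skew = 1.0 if beta is None else Phi(beta * z)
        return z ** r * phi(z) * surv(z) ** (alpha - 1.0) * skew
    return alpha * simpson(integrand, 0.0, 10.0)


def raw_moment(r, alpha, beta):
    """E(Z^r) for Z ~ SPN(alpha, beta)."""
    if r % 2 == 0:
        return 2.0 ** alpha * mu(r, alpha)
    return 2.0 ** alpha * (2.0 * mu(r, alpha, beta) - mu(r, alpha))


def skewness_kurtosis(alpha, beta):
    """Coefficients of asymmetry and kurtosis from the central moments."""
    m1, m2, m3, m4 = (raw_moment(r, alpha, beta) for r in (1, 2, 3, 4))
    c2 = m2 - m1 ** 2
    c3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    c4 = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4
    return c3 / c2 ** 1.5, c4 / c2 ** 2


print(skewness_kurtosis(1.0, 0.0))   # normal case: skewness near 0, kurtosis near 3
print(skewness_kurtosis(0.5, 0.8))
```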

3. Inference

We study next the ML estimators and the observed and expected information matrices for the parameters of the SPN model.

3.1. The Standard Case

For a random sample $\mathbf{Z} = (Z_1, Z_2, \ldots, Z_n)$ of the $SPN(\alpha,\beta)$ distribution, we have the log-likelihood function
$$\ell(\theta;\mathbf{Z}) = n\log(\alpha) + n\alpha\log(2) + \sum_{i=1}^{n}\log(\phi(z_i)) + (\alpha-1)\sum_{i=1}^{n}\log(1-\Phi(|z_i|)) + \sum_{i=1}^{n}\log(\Phi(\beta z_i)).$$
Therefore, the score function, defined as the derivatives with respect to the parameters $\alpha$ and $\beta$ of the log-likelihood function, is given by
$$U(\alpha) = \frac{n}{\alpha} + \sum_{i=1}^{n}\log[2(1-\Phi(|z_i|))] \quad \text{and} \quad U(\beta) = \sum_{i=1}^{n} z_i\frac{\phi(\beta z_i)}{\Phi(\beta z_i)}.$$
Equating the score function to zero leads to the score equations
$$\hat{\alpha} = -\frac{n}{\sum_{i=1}^{n}\log[2(1-\Phi(|z_i|))]} \quad \text{and} \quad \sum_{i=1}^{n} z_i\frac{\phi(\beta z_i)}{\Phi(\beta z_i)} = 0.$$
The first equation gives $\hat{\alpha}$ in closed form; the second is solved for $\hat{\beta}$ using iterative numerical methods.
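The score equations can be solved as sketched below. The closed form for $\hat{\alpha}$ is $-n/\sum_i \log[2(1-\Phi(|z_i|))]$, which follows from the log-likelihood above; the equation for $\beta$ is solved here by bisection, which is one possible iterative method and not necessarily the one used by the authors. The data values and function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    # erfc keeps Phi accurate (and nonzero) far in the lower tail
    return 0.5 * math.erfc(-z / SQRT2)


def log_tail(z):
    """log[2 * (1 - Phi(|z|))]."""
    return math.log(math.erfc(abs(z) / SQRT2))


def alpha_hat(zs):
    """Closed-form ML estimate of alpha."""
    return -len(zs) / sum(log_tail(z) for z in zs)


def score_alpha(alpha, zs):
    return len(zs) / alpha + sum(log_tail(z) for z in zs)


def score_beta(beta, zs):
    return sum(z * phi(beta * z) / Phi(beta * z) for z in zs)


def beta_hat(zs, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection on the beta score; returns None if there is no sign change."""
    if score_beta(lo, zs) <= 0.0 or score_beta(hi, zs) >= 0.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score_beta(mid, zs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [-1.2, -0.4, 0.3, 0.8, 1.5, 2.1]  # hypothetical standardized observations
print(alpha_hat(data), beta_hat(data))
```

Bisection is applicable because the $\beta$ score is decreasing in $\beta$; when all observations share the same sign there is no finite root, and the sketch returns `None`.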
Therefore, the elements of the observed information matrix, denoted by $j_{\alpha\alpha}$, $j_{\beta\alpha}$, $j_{\beta\beta}$, are given by
$$j_{\alpha\alpha} = \frac{n}{\alpha^2}, \quad j_{\beta\alpha} = 0, \quad j_{\beta\beta} = n\left[\beta\,\overline{z^3 w_1} + \overline{z^2 w_1^2}\right],$$
where $w_{1i} = \phi(\beta z_i)/\Phi(\beta z_i)$ and the bars denote sample averages; then, the parameters $\alpha$ and $\beta$ are orthogonal, so the expected information matrix, defined as $n^{-1}$ times the expectation of the observed information matrix, will be diagonal with elements
$$I(\alpha,\beta) = \begin{pmatrix} 1/\alpha^2 & 0 \\ 0 & \beta a_{131} + a_{122} \end{pmatrix},$$
where $a_{1jk} = E(z^j w_1^k)$. Then for $\beta a_{131} + a_{122} \neq 0$ we have
$$(\hat{\alpha},\hat{\beta})^{\top} \overset{A}{\sim} N_2\left((\alpha,\beta)^{\top}, I(\alpha,\beta)^{-1}\right),$$
which ensures the asymptotic convergence of the ML estimators for the parameters of the model.

3.2. The Location-Scale Case

For a random sample $X_1, X_2, \ldots, X_n$, with $X_i \sim SPN(\xi,\eta,\alpha,\beta)$, the log-likelihood function of $\theta = (\xi,\eta,\alpha,\beta)^{\top}$, given $\mathbf{X}$, is given by:
$$\ell(\theta;\mathbf{X}) = n\log(\alpha) + n\alpha\log(2) - n\log(\eta) + \sum_{i=1}^{n}\log(\phi(z_i)) + (\alpha-1)\sum_{i=1}^{n}\log(1-\Phi(|z_i|)) + \sum_{i=1}^{n}\log(\Phi(\beta z_i)),$$
where $z_i = \dfrac{x_i-\xi}{\eta}$. Thus, the score function is given by:
$$U(\xi) = \frac{1}{\eta}\sum_{i=1}^{n} z_i + \frac{\alpha-1}{\eta}\sum_{i=1}^{n} sgn(z_i)\frac{\phi(|z_i|)}{1-\Phi(|z_i|)} - \frac{\beta}{\eta}\sum_{i=1}^{n}\frac{\phi(\beta z_i)}{\Phi(\beta z_i)},$$
$$U(\eta) = -\frac{n}{\eta} + \frac{1}{\eta}\sum_{i=1}^{n} z_i^2 + \frac{\alpha-1}{\eta}\sum_{i=1}^{n} |z_i|\frac{\phi(|z_i|)}{1-\Phi(|z_i|)} - \frac{\beta}{\eta}\sum_{i=1}^{n} z_i\frac{\phi(\beta z_i)}{\Phi(\beta z_i)},$$
$$U(\alpha) = \frac{n}{\alpha} + n\log(2) + \sum_{i=1}^{n}\log[1-\Phi(|z_i|)],$$
$$U(\beta) = \sum_{i=1}^{n} z_i\frac{\phi(\beta z_i)}{\Phi(\beta z_i)},$$
where “sgn” is the sign function. Equating these expressions to zero, we obtain the corresponding score equations, whose solution by iterative numerical methods yields the ML estimators.

3.3. Observed Information Matrix

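The elements of the observed information matrix are defined similarly to the standard case, as minus the second derivatives of the log-likelihood function, and are denoted by $j_{\xi\xi}, j_{\xi\eta}, \ldots, j_{\alpha\alpha}, j_{\beta\alpha}, j_{\beta\beta}$; differentiating the score functions, they are given by:
$$j_{\xi\xi} = \frac{n}{\eta^2} + \frac{n(\alpha-1)}{\eta^2}\left[\overline{w^2} - \overline{|z|w}\right] + \frac{n\beta^2}{\eta^2}\left[\beta\,\overline{zw_1} + \overline{w_1^2}\right],$$
$$j_{\xi\eta} = \frac{2n}{\eta^2}\bar{z} + \frac{n(\alpha-1)}{\eta^2}\left[\overline{sgn(z)|z|w^2} - \overline{sgn(z)z^2w} + \overline{sgn(z)w}\right] + \frac{n\beta}{\eta^2}\left[\beta^2\,\overline{z^2w_1} + \beta\,\overline{zw_1^2} - \overline{w_1}\right],$$
$$j_{\eta\eta} = -\frac{n}{\eta^2} + \frac{3n}{\eta^2}\overline{z^2} + \frac{n(\alpha-1)}{\eta^2}\left[2\,\overline{|z|w} + \overline{z^2w^2} - \overline{|z|^3w}\right] + \frac{n\beta}{\eta^2}\left[\beta^2\,\overline{z^3w_1} + \beta\,\overline{z^2w_1^2} - 2\,\overline{zw_1}\right],$$
$$j_{\xi\alpha} = -\frac{n}{\eta}\overline{sgn(z)w}, \quad j_{\eta\alpha} = -\frac{n}{\eta}\overline{|z|w}, \quad j_{\alpha\alpha} = \frac{n}{\alpha^2}, \quad j_{\beta\beta} = n\left[\beta\,\overline{z^3w_1} + \overline{z^2w_1^2}\right],$$
$$j_{\alpha\beta} = 0, \quad j_{\beta\eta} = \frac{n}{\eta}\left[\overline{zw_1} - \beta^2\,\overline{z^3w_1} - \beta\,\overline{z^2w_1^2}\right], \quad j_{\beta\xi} = \frac{n}{\eta}\overline{w_1} - \frac{n\beta}{\eta}\left[\beta\,\overline{z^2w_1} + \overline{zw_1^2}\right],$$
where $w_i = \dfrac{\phi(z_i)}{1-\Phi(|z_i|)}$, $\bar{w} = \dfrac{1}{n}\sum_{i=1}^{n} w_i$, $\overline{w^2} = \dfrac{1}{n}\sum_{i=1}^{n} w_i^2$, $\overline{|z|w} = \dfrac{1}{n}\sum_{i=1}^{n} |z_i|w_i$, $\overline{sgn(z)w} = \dfrac{1}{n}\sum_{i=1}^{n} sgn(z_i)w_i$, …, $\overline{z^2w^2} = \dfrac{1}{n}\sum_{i=1}^{n} z_i^2w_i^2$, $w_{1i} = \phi(\beta z_i)/\Phi(\beta z_i)$, $\overline{w_1} = \dfrac{1}{n}\sum_{i=1}^{n} w_{1i}$ and $\overline{w_1^2} = \dfrac{1}{n}\sum_{i=1}^{n} w_{1i}^2$.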

3.4. Expected Information Matrix

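Similar to the standard case, the elements of the expected information matrix are $n^{-1}$ times the expected value of the elements of the observed information matrix, namely:
$$I_{\theta_r\theta_p} = -n^{-1}E\left[\frac{\partial^2\ell(\theta;\mathbf{x})}{\partial\theta_r\,\partial\theta_p}\right], \quad r, p = 1, 2, 3, 4,$$
with $\theta_1 = \xi$, $\theta_2 = \eta$, $\theta_3 = \alpha$ and $\theta_4 = \beta$. Taking $a_{kj} = E\{z^k w^j\}$, $a^{*}_{kj} = E\{|z|^k w^j\}$, $a^{**}_{kj} = E\{sgn(z)z^k w^j\}$ and $a_{1kj} = E\{Z^k(\phi(\beta Z)/\Phi(\beta Z))^j\}$, the elements of the expected information matrix can be expressed as follows:
$$I_{\xi\xi} = \frac{1}{\eta^2}\left[1 + (\alpha-1)(a_{02} - a^{*}_{11})\right] + \frac{\beta^2}{\eta^2}\left[\beta a_{111} + a_{102}\right],$$
$$I_{\eta\xi} = \frac{2}{\eta^2}a_{10} + \frac{\alpha-1}{\eta^2}\left[a^{**}_{01} + a_{12} - a^{**}_{21}\right] + \frac{\beta}{\eta^2}\left[\beta^2a_{121} + \beta a_{112} - a_{101}\right],$$
$$I_{\eta\eta} = -\frac{1}{\eta^2} + \frac{3}{\eta^2}a_{20} + \frac{\alpha-1}{\eta^2}\left[2a^{*}_{11} + a_{22} - a^{*}_{31}\right] + \frac{\beta}{\eta^2}\left[\beta^2a_{131} + \beta a_{122} - 2a_{111}\right],$$
$$I_{\beta\xi} = \frac{1}{\eta}a_{101} - \frac{\beta}{\eta}\left[\beta a_{121} + a_{112}\right], \quad I_{\beta\eta} = \frac{1}{\eta}a_{111} - \frac{\beta}{\eta}\left[\beta a_{131} + a_{122}\right], \quad I_{\alpha\xi} = -\frac{1}{\eta}a^{**}_{01},$$
$$I_{\alpha\eta} = -\frac{1}{\eta}a^{*}_{11}, \quad I_{\alpha\alpha} = \frac{1}{\alpha^2}, \quad I_{\beta\alpha} = 0, \quad I_{\beta\beta} = \beta a_{131} + a_{122}.$$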
These expectations are calculated using numerical integration. When $\alpha = 1$ and $\beta = 0$, then $\varphi(x;\xi,\eta,1,0) = \dfrac{1}{\eta}\phi\left(\dfrac{x-\xi}{\eta}\right)$, which is the location-scale density of the normal distribution. Thus, the information matrix is reduced to
$$I(\theta) = \begin{pmatrix} 1/\eta^2 & 0 & a^{**}_{01}/\eta & \sqrt{2/\pi}/\eta \\ 0 & 2/\eta^2 & a^{*}_{11}/\eta & 0 \\ a^{**}_{01}/\eta & a^{*}_{11}/\eta & 1 & 0 \\ \sqrt{2/\pi}/\eta & 0 & 0 & 2/\pi \end{pmatrix}$$
whose determinant is $|I(\theta)| = \dfrac{4}{\pi\eta^4}\,a^{**2}_{01} = \dfrac{0.30}{\eta^4} \neq 0$; hence, we conclude for the special case of the normal distribution that the expected information matrix for the model is nonsingular. The upper $2 \times 2$ submatrix is the information matrix of the normal distribution, and hence, for large $n$, we have that
$$\hat{\theta} \overset{A}{\sim} N_4\left(\theta, I(\theta)^{-1}\right),$$
so that $\hat{\theta}$ is consistent and asymptotically normally distributed, where $I(\theta)^{-1}$ is the covariance matrix for large samples.

4. Simulation

We now carry out a simulation study to analyze the behavior of the ML estimator of the shape parameter α . The samples were generated using the algorithm described in this document for different sample sizes n = 50, 100, 150, 300 and 1000. In each scenario, we performed 10,000 iterations and studied the mean and the root of the mean squared error (RMSE). The results are presented in Table 1, from which it is observed that for each scenario, the estimates were good for large and small sample sizes, and that when the sample size increases, the mean converges to the true value of the parameter α and the RMSE decreases, which indicates that the estimator α ^ is consistent for α .
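A scaled-down version of such a study can be sketched as follows. The sketch assumes the symmetric case ($\beta = 0$, normal base), generates samples by inversion and uses the closed-form estimator $\hat{\alpha} = -n/\sum_i \log[2(1-\Phi(|z_i|))]$; the true value $\alpha = 0.5$, the seed and the number of replications are illustrative choices, far smaller than the 10,000 iterations reported above.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # normal base distribution (assumption of this sketch)


def epsh_normal_draw(alpha, rng):
    """One draw from the symmetric model with normal base, by inversion."""
    u = rng.random()
    while u == 0.0:  # the quantile function is undefined at 0
        u = rng.random()
    if u < 0.5:
        return STD.inv_cdf(0.5 * (2.0 * u) ** (1.0 / alpha))
    # Upper branch written through symmetry of the normal quantile function.
    return -STD.inv_cdf(0.5 * (2.0 * (1.0 - u)) ** (1.0 / alpha))


def alpha_hat(zs):
    """Closed-form ML estimate of alpha."""
    logs = (math.log(math.erfc(abs(z) / math.sqrt(2.0))) for z in zs)
    return -len(zs) / sum(logs)


def simulate(alpha, n, reps, seed=2022):
    """Mean and RMSE of alpha_hat over `reps` samples of size n."""
    rng = random.Random(seed)
    estimates = [
        alpha_hat([epsh_normal_draw(alpha, rng) for _ in range(n)])
        for _ in range(reps)
    ]
    mean = sum(estimates) / reps
    rmse = math.sqrt(sum((e - alpha) ** 2 for e in estimates) / reps)
    return mean, rmse


for n in (50, 100, 300):
    mean, rmse = simulate(alpha=0.5, n=n, reps=200)
    print(f"n = {n:4d}: mean = {mean:.4f}, RMSE = {rmse:.4f}")
```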

5. Applications

In this section we present two real data illustrations, the first associated with a bimodal data set and the second to a unimodal one.

5.1. Application 1

The first application includes 3848 observations of the variable nub, which measures a geometric feature of pollen grains. These data come from Pollen Data, available at http://lib.stat.cmu.edu/datasets/pollen.data (accessed on 12 August 2021). Table 2 shows the descriptive statistics of the variable nub. The quantities $\sqrt{b_1}$ and $b_2$ indicate, respectively, the sample skewness and kurtosis coefficients.
Note that the skewness and kurtosis coefficients are different from the values expected for the normal distribution, which leads to considering the use of a more flexible model such as the SPN model discussed in this article.
Therefore, the hypothesis to be tested is
$$H_0: (\alpha,\beta) = (1,0) \quad \text{versus} \quad H_1: (\alpha,\beta) \neq (1,0);$$
using the likelihood ratio statistic
$$\Lambda = \frac{\mathcal{L}_{N}(\hat{\theta})}{\mathcal{L}_{SPN}(\hat{\theta})},$$
this leads to
$$-2\log(\Lambda) = -2(-11793.47 + 11774.54) = 37.86,$$
which is greater than the critical 5% chi-squared value, namely, $\chi^2_{1,95\%} = 3.8414$. Therefore, the SPN model seems to be a useful alternative for modeling the nub data. Table 3 shows the ML estimates, with estimated standard errors in parentheses, for the SN, TN, ETN and SPN models. In Figure 3, we can see that the ETN and SPN models fit quite well.
It is evident that the fit of the normal and SN models in this example is inadequate due to the asymmetric behavior and bimodality of the data. The TN, ETN and SPN models, in contrast, can accommodate bimodality, so it is more reasonable to compare the SPN model with the models of [16,17]. To compare the models, which are not nested, we use the AIC criterion [21], namely
$$AIC = -2\hat{\ell}(\cdot) + 2k,$$
where $k$ is the number of parameters of the model under consideration. Furthermore, we consider the consistent AIC (CAIC) criterion, namely
$$CAIC = -2\hat{\ell}(\cdot) + (1+\log(n))k,$$
where k is the number of parameters.
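Both criteria are simple functions of the maximized log-likelihood. The sketch below uses the maximized log-likelihoods that appear in the likelihood ratio computation above ($-11793.47$ for the normal model and $-11774.54$ for the SPN model) with $n = 3848$; the parameter counts $k = 2$ and $k = 4$ are assumptions based on the location-scale versions of the two models.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 * loglik + 2 * k."""
    return -2.0 * loglik + 2.0 * k


def caic(loglik, k, n):
    """Consistent AIC: -2 * loglik + (1 + log(n)) * k."""
    return -2.0 * loglik + (1.0 + math.log(n)) * k


n_obs = 3848
models = {
    "N": (-11793.47, 2),    # (maximized log-likelihood, assumed number of parameters)
    "SPN": (-11774.54, 4),
}
for name, (loglik, k) in models.items():
    print(name, round(aic(loglik, k), 2), round(caic(loglik, k, n_obs), 2))
```

Smaller values indicate a better trade-off between fit and the number of parameters.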
According to the AIC and CAIC criteria, the ETN and SPN models fit the variable nub well, and much better than the TN model. Moreover, no significant differences are noted between the ETN and SPN models. Figure 3a shows clearly that the ETN and SPN models have the same degree of fit; note that the graph of the SPN model is superimposed on that of the ETN model. This shows the SPN model to be a second alternative for modeling bimodal data. Figure 3b shows the qq-plot of the variable nub for the SPN model.
Now we compare the SPN model with the mixture of the two normals model, which can be written as
$$f(x;\mu_1,\sigma_1,\mu_2,\sigma_2,p) = \frac{p}{\sigma_1}\phi(x;\mu_1,\sigma_1) + \frac{1-p}{\sigma_2}\phi(x;\mu_2,\sigma_2),$$
where $\phi(x;\mu_j,\sigma_j)$ denotes the standard normal density evaluated at $(x-\mu_j)/\sigma_j$, $j = 1, 2$, and $0 < p < 1$. We denote the two-normals mixture model as $MN(\mu_1,\sigma_1,\mu_2,\sigma_2,p)$.
The estimated model is
$$MN(2.389, 4.106, 4.649, 3.698, 0.6605)$$
with $AIC = 23561.16$ and $CAIC = 23569.67$. This model presents AIC and CAIC values greater than those for the SPN model, so the SPN model fits the nub data set better than the MN model. Figure 3c shows the estimated densities for the SPN and MN models.

5.2. Application 2

In this second application, we use the data available at http://lib.stat.cmu.edu/jasadata/laslett (accessed on 19 August 2021), which, according to their summary statistics (see Table 4), have appropriate characteristics to be modeled with distributions such as the one proposed in this research. A detailed description of these data can be found in the link above, where the roller surface roughness height is measured. In total, there are 1150 observations measured at 1-micron intervals along the roller drum.
Hence, the PN, SN, and SPN models are fitted to the present data, and the MLE and standard errors (in parentheses) are calculated for each model studied (see Table 5). The results show the goodness of fit of the SPN model, which, compared to the other models, presents the best fit to the data. In addition, the plots of the fitted models are shown in Figure 4a, and the qq-plot for the SPN model is shown in Figure 4b.
In addition, a hypothesis test is performed to compare the normal model against the SPN model. Formally, we have the hypothesis
$$H_{01}: (\alpha,\beta) = (1,0) \quad \text{versus} \quad H_{11}: (\alpha,\beta) \neq (1,0),$$
which can be tested using the statistic
$$\Lambda_1 = \frac{\mathcal{L}_{N}(\hat{\theta})}{\mathcal{L}_{SPN}(\hat{\theta})}.$$
After numerical evaluations, we obtain
$$-2\log(\Lambda_1) = 142.274,$$
which is greater than the 5% critical value of the chi-squared distribution with one degree of freedom, namely $\chi^2_{1,95\%} = 3.8414$.
According to the AIC criterion, the SPN model fits the roller data set better than the SN and PN models; i.e., the SPN model achieves satisfactory fitting of skewness and kurtosis, which are not adequately fitted by the previous models. This can be explained by the fact that the skewness and kurtosis of the data analyzed lie outside the ranges permitted by the SN ((−0.9953, 0.9953) and (3, 3.8692), respectively) and PN ((−0.6115, 0.9007) and (1.7170, 4.3556), respectively) models. This may be an indication that the SPN model has a wider range of skewness and kurtosis than the SN and PN models.

6. Discussion

In this paper, we introduce a new family of continuous uni-/bimodal distributions. This family was generated based on power-symmetric and proportional hazards distributions. The SPN distribution, which is a particular case of this family, is studied in greater detail. The new family presented is a viable alternative for modeling asymmetric unimodal and bimodal data sets. Further specific conclusions are as follows, listed in order:
  • The family of distributions presents flexibility in the modes of the base model, in both unimodal and bimodal cases;
  • The parameters are estimated using the ML method; a simulation study for the maximum likelihood estimators indicates good parameter recovery;
  • We show that the Fisher information matrix for the SPN distribution is nonsingular for the particular case of the normal distribution;
  • In the first example, we contrasted the normal, SN and TN models. It is obvious that these models fail to capture the asymmetric bimodality of the data. In contrast, the ETN and SPN models are more suitable for fitting the distribution of the variable nub. In the second example we see that the normal, SN and PN models fail to adequately capture the high kurtosis of the variable roller. However, the SPN model appears to have more flexibility to fit this special feature.

Author Contributions

Conceptualization, G.M.-F. and C.B.-C.; methodology, H.W.G.; software, G.M.-F. and C.B.-C.; validation, G.M.-F., C.B.-C. and H.B.; formal analysis, O.V. and H.W.G.; investigation, G.M.-F. and C.B.-C.; writing—original draft preparation, H.B. and O.V.; writing—review and editing, O.V. and H.B.; funding acquisition, H.W.G. and O.V. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Héctor W. Gómez was supported by SEMILLERO UA-2022 (Chile). The research of O. Venegas was supported by Vicerrectoría de Investigación y Postgrado of the Universidad Católica de Temuco, Projecto interno FEQUIP 2019-INRN-03.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in Application 1 is available at http://lib.stat.cmu.edu/datasets/pollen.data (accessed on 16 October 2021), and for Application 2 at http://lib.stat.cmu.edu/jasadata/laslett (accessed on 16 October 2021).

Acknowledgments

We thank the anonymous referees for their thorough reading and significant suggestions that undoubtedly improved the presentation of the manuscript. In particular, authors G.M-F. and C.B-C extend their sincere gratitude for their support to the Universidad de Córdoba- Colombia and Instituto Tecnológico Metropolitano (ITM)-Colombia, respectively.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
pdf: Probability density function
cdf: Cumulative distribution function
SN: Skew-normal
$S_{f_0}$: Skew-symmetric
PS: Power-symmetry
PN: Power-normal
TN: Two-pieces skew-normal
ETN: Extended two-pieces skew-normal
ML: Maximum likelihood
PSH: Power-symmetry distribution with proportional hazards
PNH: Power-normal distribution with proportional hazards
EPSH: Extended power-symmetry distribution with proportional hazards
SEPSH: Skew extended power-symmetry distribution with proportional hazards
SPS: Skew-power-symmetric
SPN: Skew-power-normal
AIC: Akaike information criterion
CAIC: Consistent AIC
MN: Two-normals mixture

References

  1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
  2. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208.
  3. Henze, N. A probabilistic representation of the skew-normal distribution. Scand. J. Stat. 1986, 13, 271–275.
  4. Chiogna, M. Notes on Estimation Problems with Scalar Skew-Normal Distributions; Technical Report 15; Dept. Statistical Sciences, University of Padua: Padua, Italy, 1997.
  5. Pewsey, A. Problems of inference for Azzalini’s skew-normal distribution. J. Appl. Stat. 2000, 27, 859–870.
  6. Venegas, O.; Sanhueza, A.I.; Gómez, H.W. An extension of the skew-generalized normal distribution and its derivation. Proyecc. J. Math. 2011, 30, 401–413.
  7. Nadarajah, S.; Kotz, S. Skewed distributions generated by the normal kernel. Stat. Probab. Lett. 2003, 65, 269–277.
  8. Gómez, H.W.; Venegas, O.; Bolfarine, H. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 2002, 18, 395–407.
  9. Gómez-Déniz, E.; Arnold, B.C.; Sarabia, J.M.; Gómez, H.W. Properties and Applications of a New Family of Skew Distributions. Mathematics 2021, 9, 87.
  10. Lehmann, E.L. The power of rank tests. Ann. Math. Stat. 1953, 24, 23–43.
  11. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 1992, 28, 1649–1655.
  12. Eugene, N.; Lee, C.; Famoye, F. Beta-normal distribution and its applications. Commun. Stat. Theory Methods 2002, 31, 497–512.
  13. Gupta, R.C.; Gupta, R.D. Generalized skew normal model. Test 2004, 12, 501–524.
  14. Gupta, R.D.; Gupta, R.C. Analyzing skewed data by power normal model. Test 2008, 17, 197–210.
  15. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789.
  16. Kim, H.J. On a class of two-piece skew-normal distribution. Statistics 2005, 39, 537–553.
  17. Arnold, B.C.; Gómez, H.W.; Salinas, H.S. On multiple constraint skewed models. Statistics 2009, 43, 279–293.
  18. Cox, D. Regression models and life tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220.
  19. Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data, 2nd ed.; John Wiley and Sons: New York, NY, USA, 2002.
  20. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; John Wiley and Sons: New York, NY, USA, 1994; pp. 53–54.
  21. Akaike, H. A new look at statistical model identification. IEEE Trans. Automat. Contr. 1974, 19, 716–723.
Figure 1. Plots of the PNH(α) distribution with α = 0.25 (dotted and dashed line), 1 (solid line), 2 (dashed line) and 3 (dotted line).
Figure 2. Plots of the distributions: (a) SPN(0.25, 0.15, 0.25, 1) (solid line), SPN(0.5, 0.25, 0.5, 1) (dashed line) and SPN(0.25, 0.25, 1.25, 1) (dotted line); (b) SPN(0.25, 0.15, 0.25, 1) (solid line), SPN(0.5, 0.25, 0.5, 1) (dashed line) and SPN(0.25, 0.25, 1.25, 1) (dotted line).
Figure 3. (a) Histogram for the variable nub, with densities fitted by ML: TN (dotted line), SN (dotted and dashed line), ETN (dashed line) and SPN (solid line). (b) Q-Q plot for the variable nub. (c) SPN (solid line) and MN (dashed line).
Figure 4. (a) Histogram for the variable roller, with fitted densities: PN(4.5495, 0.1982, 0.0479) (dashed line), SN(4.2503, 0.9694, 2.7864) (dotted line) and SPN(3.9920, 3.4615, 6.5217, 5.8626) (solid line). (b) Q-Q plot for the variable roller.
Table 1. ML estimator (mean) and RMSE for parameter α, SPN model.
α      n = 50            n = 100           n = 150           n = 300           n = 1000
       Mean     RMSE     Mean     RMSE     Mean     RMSE     Mean     RMSE     Mean     RMSE
0.25   0.2549   0.0365   0.2534   0.0254   0.2520   0.0205   0.2508   0.0143   0.2505   0.0075
0.75   0.7641   0.1105   0.7589   0.0771   0.7542   0.0624   0.7528   0.0439   0.7509   0.0235
1.25   1.2759   0.1853   1.2601   0.1274   1.2603   0.1045   1.2539   0.0727   1.2509   0.0395
1.75   1.7866   0.2603   1.7716   0.1786   1.7582   0.1435   1.7579   0.1013   1.7515   0.0559
2.25   2.2954   0.3311   2.2721   0.2311   2.2679   0.1853   2.2603   0.1327   2.2527   0.0714
2.75   2.8068   0.406    2.7815   0.2837   2.767    0.2286   2.7595   0.1579   2.753    0.0878
3.25   3.3156   0.4762   3.2845   0.3345   3.2713   0.266    3.26     0.1866   3.2521   0.1033
3.75   3.8206   0.5594   3.7875   0.3841   3.7735   0.311    3.7671   0.2187   3.7536   0.1185
4.25   4.3256   0.6287   4.2894   0.4328   4.2769   0.3533   4.2656   0.2472   4.2563   0.1359
4.75   4.8565   0.7205   4.7962   0.4843   4.7839   0.3931   4.7658   0.2728   4.7552   0.1502
5.25   5.3483   0.7617   5.2975   0.5345   5.2802   0.4359   5.2721   0.3104   5.2553   0.1662
5.75   5.8737   0.8458   5.8084   0.5881   5.7918   0.4802   5.781    0.3361   5.7576   0.1825
6.25   6.3938   0.9288   6.3145   0.6354   6.2821   0.5125   6.2683   0.3635   6.2532   0.1975
6.75   6.8795   0.9961   6.8155   0.6885   6.7948   0.5609   6.7714   0.393    6.7581   0.2139
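A quick arithmetic check on Table 1, offered as a sketch rather than as part of the simulation study itself: if the RMSE of the ML estimator of α shrinks roughly like α/√n (an assumption made here for illustration, not a result stated in the table), then RMSE·√n/α should stay close to 1. The helper name `scaled_rmse` is hypothetical, and the RMSE values are copied from three rows of Table 1.

```python
import math

# Sample sizes and RMSE values copied from Table 1 (rows alpha = 0.25, 2.25, 6.75).
sample_sizes = [50, 100, 150, 300, 1000]
rmse_by_alpha = {
    0.25: [0.0365, 0.0254, 0.0205, 0.0143, 0.0075],
    2.25: [0.3311, 0.2311, 0.1853, 0.1327, 0.0714],
    6.75: [0.9961, 0.6885, 0.5609, 0.393, 0.2139],
}


def scaled_rmse(alpha, rmse, n):
    """Hypothetical helper: RMSE * sqrt(n) / alpha.

    The value is near 1 when RMSE is approximately alpha / sqrt(n),
    which is the illustrative assumption being checked here.
    """
    return rmse * math.sqrt(n) / alpha


# One list of scaled values per alpha, ordered by sample size.
ratios = {
    alpha: [scaled_rmse(alpha, r, n) for r, n in zip(rmses, sample_sizes)]
    for alpha, rmses in rmse_by_alpha.items()
}

for alpha, values in ratios.items():
    print(alpha, [round(v, 3) for v in values])
```

For these rows the scaled values fall between about 0.95 and 1.05, so the tabulated RMSE is close to α/√n at every sample size considered.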
Table 2. Descriptive statistics for the variable nub (X).
Variable   n      Mean    Variance   b₁      b₂
X          3848   0.000   26.898     0.072   2.689
Table 3. Parameter estimates and standard errors for the SN, TN, ETN and SPN models.
Parameter        SN               TN               ETN              SPN
ξ                0.034 (0.032)    0.064 (0.074)    1.848 (0.123)    1.7951 (0.1290)
η                5.186 (0.067)    4.777 (0.069)    5.000 (0.062)    3.4988 (0.2395)
α                −0.008 (1.216)   0.409 (0.102)    0.638 (0.131)    0.5112 (0.0540)
β                –                –                −0.417 (0.035)   −0.2839 (0.0316)
Log-likelihood   −11,793.47       −11,783.98       −11,774.47       −11,774.50
AIC              23,590.94        23,573.96        23,556.94        23,557.00
CAIC             23,605.45        23,595.73        23,567.45        23,567.51
Table 4. Summary statistics for the variable roller.
n      Mean    Variance   β₁       β₂
1150   3.535   0.650      −0.986   4.855
Table 5. Parameter estimates (standard error) for the PN, SN and SPN distributions.
Parameter        PN                SN                 SPN
ξ                4.5495 (0.0572)   4.2503 (0.0284)    3.9920 (0.0220)
η                0.1982 (0.0279)   0.9694 (0.0304)    3.4615 (0.9914)
α                0.0479 (0.0156)   −2.7864 (0.2529)   6.5217 (2.1390)
β                –                 –                  −5.8626 (1.6601)
Log-likelihood   −1085.241         −1071.362          −1064.729
AIC              2176.482          2148.724           2137.458
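The AIC row of Table 5 can be reproduced from the log-likelihoods using the usual definition AIC = 2k − 2ℓ [21], where k is the number of estimated parameters and ℓ is the maximized log-likelihood. The sketch below assumes k = 3 for PN and SN and k = 4 for SPN, counted from the estimates listed in the table; the function and variable names are illustrative only.

```python
# Maximized log-likelihoods from Table 5. The parameter counts k are an
# assumption read off the table: (xi, eta, alpha) for PN and SN, plus beta for SPN.
models = {
    "PN": {"loglik": -1085.241, "k": 3},
    "SN": {"loglik": -1071.362, "k": 3},
    "SPN": {"loglik": -1064.729, "k": 4},
}


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * loglik


aic_values = {name: aic(m["loglik"], m["k"]) for name, m in models.items()}

# The preferred model under AIC is the one with the smallest value.
best = min(aic_values, key=aic_values.get)

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.3f}")
print("Smallest AIC:", best)
```

Rounded to three decimals this gives 2176.482, 2148.724 and 2137.458, matching the table, with SPN attaining the smallest AIC.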
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
