Article

A Symmetric/Asymmetric Bimodal Extension Based on the Logistic Distribution: Properties, Simulation and Applications

by Isaac E. Cortés 1, Osvaldo Venegas 2,* and Héctor W. Gómez 3

1 Inter-Institutional Graduation Program in Statistics, Universidade de São Paulo, São Paulo 05508-090, Brazil
2 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile
3 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(12), 1968; https://doi.org/10.3390/math10121968 (registering DOI)
Submission received: 11 May 2022 / Revised: 26 May 2022 / Accepted: 31 May 2022 / Published: 7 June 2022
(This article belongs to the Section Probability and Statistics)

Abstract

In this paper, we introduce two bimodal extensions of the logistic distribution, one symmetric and one asymmetric. We define the new densities and study some of their basic properties. We carry out inference using the moment and maximum likelihood approaches, and present a simulation study to assess the behaviour of the moment and maximum likelihood estimators. We also study the singularity of the Fisher information matrix for particular cases. We offer applications to real data and compare the fits with those of a mixture of logistic distributions.

1. Introduction

Data sets with bimodal behaviour appear in many disciplines, for example mining (see Bolfarine et al. [1]), medicine (see Ely et al. [2]; Elal-Olivero et al. [3]) and meteorology (see Zhang et al. [4]). Mixtures of normal distributions are commonly used to model bimodal data, but these models present some identifiability problems, as discussed in McLachlan and Peel [5] and Marin et al. [6].
Various recent works show bimodal distributions based on skew-symmetric distributions, for example Arnold and Beaver [7], Azzalini and Capitanio [8], Ma and Genton [9], Arellano-Valle et al. [10], Kim [11], Elal-Olivero et al. [3], Gómez et al. [12], Bolfarine et al. [1] and Venegas et al. [13]. These bimodal distributions offer a good alternative to statistical models based on finite mixtures of distributions.
The generalized bimodal (GB) distribution was introduced and studied by Rao et al. [14] and Sarma et al. [15], with density given by:
$$f_X(x;\gamma) = \frac{\gamma + x^2}{1+\gamma}\,\phi(x), \quad x \in \mathbb{R},\ \gamma \ge 0,$$
where $\gamma$ is the shape parameter and $\phi$ is the $N(0,1)$ density.
A simple reparametrization of the GB distribution leads to the expression:
$$f_X(x;\alpha) = \frac{1+\alpha x^2}{1+\alpha}\,\phi(x), \quad x \in \mathbb{R},\ \alpha \ge 0. \tag{1}$$
Note that the $N(0,1)$ distribution is obtained when $\alpha = 0$ in the GB distribution.
This idea can be extended using elliptic distributions; in the univariate case these are distributions of symmetric random variables. A random variable $Y$ is symmetric about zero if its pdf $g_Y$ satisfies
$$g_Y(-y) = g_Y(y),$$
for all $y \in \mathbb{R}$. For further discussion of the family of elliptic distributions, see Kelker [16] and Cambanis et al. [17], and the books by Fang et al. [18] and Gupta and Varga [19]. Using this approach, (1) can be extended as follows:
$$f_X(x;\alpha) = \frac{1+\alpha x^2}{1+\alpha\kappa}\,g_Y(x), \quad x \in \mathbb{R},$$
where $\alpha \ge 0$, $Y$ is a random variable symmetric about zero and $\kappa = E(Y^2)$.
Values of $\kappa$ are reported in Table 1 for some particular cases of the density $g_Y$:
  • $g_Y(x) = \phi(x)$ (Normal);
  • $g_Y(x) = \frac{1}{2}\exp(-|x|)$ (Laplace);
  • $g_Y(x) = c_2\,(1 + x^2/\nu)^{-(1+\nu)/2}$ with $c_2 = (\nu\pi)^{-1/2}\,\Gamma(\nu/2)^{-1}\,\Gamma((\nu+1)/2)$ (Student-t);
  • $g_Y(x) = \exp(x)/(1+\exp(x))^2$ (Logistic).
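As a worked illustration of the constant $\kappa = E(Y^2)$ for the four densities listed above, the following values are standard second moments. They are stated here as a reader aid under the parametrizations shown in the list, and the Student-t value assumes $\nu > 2$ so that the second moment exists.

```latex
\begin{aligned}
\text{Normal:}    \quad & \kappa = 1, \\
\text{Laplace:}   \quad & \kappa = \int_{-\infty}^{\infty} x^2\,\tfrac{1}{2}e^{-|x|}\,dx = 2, \\
\text{Student-}t: \quad & \kappa = \frac{\nu}{\nu-2}, \qquad \nu > 2, \\
\text{Logistic:}  \quad & \kappa = \frac{\pi^2}{3}.
\end{aligned}
```

With the logistic value, the normalizing factor $1+\alpha\kappa$ becomes $1+\alpha\pi^2/3 = (3+\pi^2\alpha)/3$, which is the constant that appears in the BL density.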
One of the principal motivations of the present article is to consider the symmetric bimodal logistic (BL) distribution. The logistic distribution has greater kurtosis than the normal distribution, and its probability density function, given in item 4 of Table 1 and denoted here by $Y \sim L(0,1)$, is simple. This distribution is therefore an alternative for modelling bimodal data when the distribution tails are a little heavier.
In other situations data are observed with greater flexibility in their modes; on this basis we introduce an asymmetric bimodal extension of the logistic distribution.
The paper is organised as follows. In Section 2 we show the density and some properties of the BL distribution. In Section 3 we carry out parameter estimation using the moments (M) and maximum likelihood (ML) methods, and a simulation to show the behaviour of the M and ML estimations. In Section 4 we apply the methodology to one real data set. Section 5 presents an asymmetric extension of the proposed model and two illustrations with real data sets. We finish with some conclusions in Section 6.

2. Density Function and Properties

Definition 1.
We will say that a random variable $Z$ has a BL distribution if its density function is given by
$$f_Z(z;\alpha) = \frac{3\,(1+\alpha z^2)}{3+\pi^2\alpha}\,\frac{\exp(z)}{(1+\exp(z))^2}, \quad z \in \mathbb{R}, \tag{2}$$
where $\alpha \ge 0$ is the shape parameter. We denote this by $Z \sim BL(\alpha)$.
By a straightforward computation we have
$$\int_{-\infty}^{\infty} f_Z(z;\alpha)\,dz = \frac{3}{3+\pi^2\alpha}\left[\int_{-\infty}^{\infty}\frac{\exp(z)}{(1+\exp(z))^2}\,dz + \alpha\int_{-\infty}^{\infty}\frac{z^2\exp(z)}{(1+\exp(z))^2}\,dz\right] = 1.$$
The first integral on the right-hand side of the first equality is the integral of the standard logistic density, and the second is the second moment of the same distribution; their values are therefore 1 and $\pi^2/3$ respectively, which verifies the second equality.
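The normalization argument above can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `logistic_pdf`, `bl_pdf` and `integrate`, the integration range and the Simpson rule are all choices made here for illustration.

```python
import math

def logistic_pdf(z):
    """Standard logistic density exp(z)/(1+exp(z))^2.

    Written with exp(-|z|), which is equivalent by symmetry and cannot overflow.
    """
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2

def bl_pdf(z, alpha):
    """BL(alpha) density of Definition 1, for alpha >= 0."""
    return 3.0 * (1.0 + alpha * z * z) / (3.0 + math.pi ** 2 * alpha) * logistic_pdf(z)

def integrate(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# The area under the density should be 1 for every alpha >= 0.
for alpha in (0.0, 0.25, 1.0, 5.0):
    area = integrate(lambda z: bl_pdf(z, alpha), -60.0, 60.0)
    print(alpha, area)
```

The truncation to $[-60, 60]$ is an assumption that is harmless here because the logistic tails decay exponentially.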
Figure 1 shows the graph of the BL density for different values of parameter α .

2.1. Some Properties

The following properties arise directly from the above definition. Let $Z \sim BL(\alpha)$; then the following basic properties are obtained:
Property 1.
If $\alpha = 0$, the standard logistic distribution, Logistic$(0,1)$, is obtained.
Property 2.
As $\alpha$ tends to infinity, a limiting distribution is obtained, with density
$$f_Z(z) = \frac{3 z^2 \exp(z)}{\pi^2\,(1+\exp(z))^2}.$$
Property 3.
$f_Z(-z;\alpha) = f_Z(z;\alpha)$, for all $z \in \mathbb{R}$.
Property 4.
Taking $Y = |Z|$, the Half BL distribution is obtained. In particular, when $\alpha = 0$ the known Half Logistic distribution is obtained.
Property 5.
The cumulative distribution function (cdf) of the pdf $f_Z$ defined in (2) is given by
$$F_Z(z,\alpha) = \frac{3}{3+\alpha\pi^2}\left[\frac{\exp(z)}{1+\exp(z)} + \alpha\left(\frac{z^2\exp(z)}{1+\exp(z)} - 2z\log(1+\exp(z)) - 2\,\mathrm{Li}_2(-\exp(z))\right)\right].$$
We note that $F_Z$ is expressed in terms of the polylogarithm function, defined as $\mathrm{Li}_s(z) = \sum_{n=1}^{\infty} z^n/n^s$. More precisely, it involves the dilogarithm function $\mathrm{Li}_2$ (taking $s = 2$). For more details about this function we refer the reader to Prudnikov et al. [20].
The functions $\mathrm{Li}_2(-\exp(z))$ and $z\log(1+\exp(z))$ tend to zero when $z$ tends to minus infinity; this can be proved using L'Hôpital's rule, from which we conclude that $F_Z(z,\alpha) \to 0$ as $z \to -\infty$.
A well-known reflection property of the dilogarithm function is $\mathrm{Li}_2(-t) + \mathrm{Li}_2(-1/t) = -\pi^2/6 - \frac{1}{2}\log^2(t)$, in which taking $t = \exp(z)$ gives the relation
$$\mathrm{Li}_2(-\exp(z)) + \mathrm{Li}_2(-\exp(-z)) = -\frac{\pi^2}{6} - \frac{1}{2}z^2.$$
Multiplying this equality by $-2$ and adding $z^2 - 2z\log(1+\exp(z))$ we have that
$$z^2 - 2z\log(1+\exp(z)) - 2\,\mathrm{Li}_2(-\exp(z)) = \frac{\pi^2}{3} + 2\,\mathrm{Li}_2(-\exp(-z)) + 2z^2 - 2z\log(1+\exp(z)). \tag{3}$$
We note that
$$\lim_{z\to+\infty}\left[2z^2 - 2z\log(1+\exp(z))\right] = \lim_{z\to+\infty}\left[2z^2 - 2z\log\big(\exp(z)(1+\exp(-z))\big)\right] = \lim_{z\to+\infty} -2z\log(1+\exp(-z)),$$
and applying L'Hôpital's rule it follows that this limit is zero. Thus, letting $z \to +\infty$ in Equation (3), the left-hand side tends to $\pi^2/3$.
We rewrite $F_Z$, given in Property 5, as follows
$$\frac{3}{3+\alpha\pi^2}\left[\frac{\exp(z)}{1+\exp(z)} + \alpha\left(-\frac{z^2}{1+\exp(z)} + z^2 - 2z\log(1+\exp(z)) - 2\,\mathrm{Li}_2(-\exp(z))\right)\right],$$
and this expression, combined with the previous argument, shows that $F_Z \to 1$ as $z \to +\infty$. Figure 1 shows graphs of the cdf for different values of the parameter $\alpha$.
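The closed-form cdf of Property 5 can be evaluated with only the standard library. The sketch below is an illustration under stated assumptions, not the authors' implementation: the helper `li2_neg` computes $\mathrm{Li}_2(-x)$ for $x \ge 0$ using the reflection property quoted above for $x > 1$ and Landen's identity $\mathrm{Li}_2(-x) = -\mathrm{Li}_2\big(x/(1+x)\big) - \frac{1}{2}\log^2(1+x)$ otherwise (the latter is a standard identity that is not in the source), and `bl_cdf` uses the symmetry of Property 3 for positive arguments to avoid overflow.

```python
import math

def li2_neg(x):
    """Dilogarithm Li_2(-x) for x >= 0."""
    if x > 1.0:
        # Reflection property from the text with t = x.
        return -math.pi ** 2 / 6.0 - 0.5 * math.log(x) ** 2 - li2_neg(1.0 / x)
    # Landen's identity maps the argument to t in [0, 1/2],
    # where the defining power series converges geometrically.
    t = x / (1.0 + x)
    series = sum(t ** n / n ** 2 for n in range(1, 80))
    return -series - 0.5 * math.log1p(x) ** 2

def bl_cdf(z, alpha):
    """Cdf of BL(alpha) from Property 5."""
    if z > 0.0:
        return 1.0 - bl_cdf(-z, alpha)  # symmetry about zero (Property 3)
    e = math.exp(z)  # z <= 0 here, so exp(z) <= 1
    bracket = z * z * e / (1.0 + e) - 2.0 * z * math.log1p(e) - 2.0 * li2_neg(e)
    return 3.0 / (3.0 + alpha * math.pi ** 2) * (e / (1.0 + e) + alpha * bracket)

print(bl_cdf(0.0, 1.0))   # the median of a symmetric distribution is 0
print(bl_cdf(-50.0, 1.0), bl_cdf(50.0, 1.0))
```

A quick sanity check is that a central finite difference of `bl_cdf` reproduces the density (2) at any point.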
We present the following technical Lemma that will allow us to prove the uni/bimodality of the proposed distribution.
Lemma 1.
Let $g$ be a function defined by:
$$g(x) = 2\alpha(1+x) - (\alpha x^2 + 1 - 2\alpha)\,e^x, \quad \alpha > 0,\ x \in \mathbb{R}.$$
The following holds:
  • $g$ tends to $-\infty$ as $x \to \pm\infty$, $g'(x) = 2\alpha - (\alpha x^2 + 2\alpha x + 1 - 2\alpha)\,e^x$ and $g''(x) = -(\alpha x^2 + 4\alpha x + 1)\,e^x$.
  • $g$ is negative for all $x \in \mathbb{R}$ whenever $0 < \alpha \le 1/4$.
  • If $\alpha > 1/4$, then $g$ has two zeros.
Proof. 
A direct calculation gives the result for part 1. For part 2, assume that $g$ attains a maximum at $x_0$; since $g'(x_0) = 0$ we obtain that
$$e^{x_0} = \frac{2\alpha}{\alpha x_0^2 + 2\alpha x_0 + 1 - 2\alpha}, \tag{4}$$
with $\alpha < 1/4$ by hypothesis. If $x_0 \ge 0$, then from relation (4) we obtain that $2\alpha/(\alpha x_0^2 + 2\alpha x_0 + 1 - 2\alpha) > 1$, or $0 < \alpha x_0^2 + 2\alpha x_0 + 1 < 4\alpha$. From the latter inequality we obtain that $1 < 4\alpha$, since $\alpha x_0^2 + 2\alpha x_0 + 1 > 1$; this contradicts the fact that $0 < \alpha \le 1/4$. Therefore the maximum is attained at some $x_0 < 0$ and its value is $g(x_0) = 2\alpha x_0(1+e^{x_0})$, obtained using the definition of $g$ and relation (4); this value is negative, which gives the desired result.
To prove part 3, we first show that $g$ has a single maximum. Assume, in order to obtain a contradiction, that there are two maxima, attained at $x_0$ and $x_1$ with $x_0 < x_1$. Then there is at least one $\tilde{x} \in (x_0, x_1)$ at which $g$ attains a minimum, so we have that
$$g(x_0) - g(\tilde{x}) = 2\alpha x_0(1+e^{x_0}) - 2\alpha\tilde{x}(1+e^{\tilde{x}}) = 2\alpha(x_0 - \tilde{x}) + 2\alpha(x_0 e^{x_0} - \tilde{x}e^{\tilde{x}}).$$
The final expression is negative, since $x_0 < \tilde{x}$, which contradicts the fact that $g$ attains a minimum at $\tilde{x}$.
Now, assume that the maximum value of $g$ is negative and is attained at $x_0$. As $\alpha > 1/4$, $g(0) = 4\alpha - 1 > 0$ and $g \to -\infty$ as $x \to \pm\infty$; this implies the existence of another maximum of $g$, which is a contradiction as in the previous paragraph. Thus, the maximum of $g$ is positive, which gives the desired result.    □
Proposition 1.
Let $X \sim BL(\alpha)$; then the density function is unimodal if $0 < \alpha \le 1/4$ and bimodal if $\alpha > 1/4$.
Proof. 
From Equation (2) it follows that
$$f'(x) = \frac{\alpha x^2 + 2\alpha x + 1 - (\alpha x^2 - 2\alpha x + 1)\,e^x}{(1+\alpha x^2)(1+e^x)}\,f(x).$$
Hence, the zeros of $f'$ are the zeros of $h(x) := \alpha x^2 + 2\alpha x + 1 - (\alpha x^2 - 2\alpha x + 1)\,e^x$. As $h(0) = 0$, we have that $x = 0$ is a critical point of $f$.
We note that $h'(x) = g(x)$, where $g$ is the function defined in Lemma 1. Then, using part 2 of Lemma 1, we obtain that $h'(x) < 0$ for all $x \in \mathbb{R}$. Therefore, $h$ is strictly decreasing and, as $h(0) = 0$, $x = 0$ is the only zero of $f'$. By direct calculation, together with the fact that $0 < \alpha < 1/4$, we have that $f''(0) = 3(4\alpha - 1)/\big(8(3+\pi^2\alpha)\big) < 0$, which implies that $f$ is unimodal; its mode is attained at $x = 0$ and its value is $f(0)$.
When $\alpha = 1/4$, by the above argument and Lemma 1, zero is the only critical point of $f$. Consequently, its mode is attained at zero.
Now, using part 3 of Lemma 1, $h'$ has two zeros when $\alpha > 1/4$ and, as $h(0) = 0$, $h$ has three zeros; one of these is $x = 0$ and the others are $\pm x_0$ with $x_0 \ne 0$ (since $h(-x_0) = -h(x_0)\,e^{-x_0}$). Consequently, $f'$ has three zeros, and from the previous calculation $f''(0) > 0$; hence $f$ attains a minimum at $x = 0$ and its maximum value at the other two zeros, which completes the proof.    □
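The proof locates the modes as zeros of $h$, which suggests a simple numerical recipe. The following is a minimal sketch, assuming plain bisection on $(0, 50]$; the function names, the bracket and the tolerance are illustrative choices made here, and the bracket relies on $h$ being positive just to the right of zero and negative at the upper end when $\alpha > 1/4$ (which may fail numerically if $\alpha$ is extremely close to $1/4$).

```python
import math

def h(x, alpha):
    """Numerator of f'(x)/f(x) from the proof of Proposition 1."""
    return (alpha * x * x + 2 * alpha * x + 1
            - (alpha * x * x - 2 * alpha * x + 1) * math.exp(x))

def bl_modes(alpha, upper=50.0, tol=1e-12):
    """Return [0.0] if alpha <= 1/4, otherwise the two modes [-x0, x0]."""
    if alpha <= 0.25:
        return [0.0]
    lo, hi = 1e-8, upper
    if h(lo, alpha) <= 0.0 or h(hi, alpha) >= 0.0:
        raise ValueError("bracket does not contain a sign change of h")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid, alpha) > 0.0:
            lo = mid  # f is still increasing at mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return [-x0, x0]

print(bl_modes(0.2))  # unimodal case
print(bl_modes(1.0))  # bimodal case, modes symmetric about zero
```

By the symmetry relation $h(-x_0) = -h(x_0)e^{-x_0}$ used in the proof, only the positive root needs to be searched for.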

2.2. Moments

In this subsection we calculate the moments of a random variable with BL distribution and its kurtosis coefficient.
Proposition 2.
Let $Z \sim BL(\alpha)$; then for $r = 0, 1, 2, \dots$ the moments of the random variable $Z$ are:
(a) 
$$E\left(Z^{2r+1}\right) = 0,$$
(b) 
$$E\left(Z^{2r}\right) = \frac{6}{3+\pi^2\alpha}\left[\left(1-2^{1-2r}\right)\Gamma(2r+1)\,\zeta(2r) + \alpha\left(1-2^{-2r-1}\right)\Gamma(2r+3)\,\zeta(2r+2)\right],$$
where $\zeta(s) = \sum_{n=1}^{\infty} 1/n^s$ is the Riemann zeta function, which satisfies $\zeta(s) = \mathrm{Li}_s(1)$.
Proof. 
(a) We can express the odd moments of $Z \sim BL(\alpha)$ as
$$E\left(Z^{2r+1}\right) = \frac{3}{3+\pi^2\alpha}\left[\int_{-\infty}^{\infty}\frac{z^{2r+1}\exp(z)}{(1+\exp(z))^2}\,dz + \alpha\int_{-\infty}^{\infty}\frac{z^{2r+3}\exp(z)}{(1+\exp(z))^2}\,dz\right] = 0.$$
Note that the first and second integrals are the odd moments of orders $2r+1$ and $2r+3$ of the standard logistic distribution, for $r = 0, 1, 2, \dots$. It is well known that the odd moments of this distribution are equal to zero (see Johnson et al. [21]).
(b) Let $X \sim \text{Logistic}(0,1)$; the even moments of $X$ (see Balakrishnan [22]) are given by:
$$E(X^k) = 2\left(1-2^{1-k}\right)\Gamma(k+1)\,\zeta(k), \quad \text{if } k \text{ is even}.$$
Then, using these moments, we can calculate the even moments of $Z \sim BL(\alpha)$ as
$$\begin{aligned} E\left(Z^{2r}\right) &= \frac{3}{3+\pi^2\alpha}\left[\int_{-\infty}^{\infty}\frac{z^{2r}\exp(z)}{(1+\exp(z))^2}\,dz + \alpha\int_{-\infty}^{\infty}\frac{z^{2r+2}\exp(z)}{(1+\exp(z))^2}\,dz\right] \\ &= \frac{3}{3+\pi^2\alpha}\left[2\left(1-2^{1-2r}\right)\Gamma(2r+1)\,\zeta(2r) + 2\alpha\left(1-2^{-2r-1}\right)\Gamma(2r+3)\,\zeta(2r+2)\right] \\ &= \frac{6}{3+\pi^2\alpha}\left[\left(1-2^{1-2r}\right)\Gamma(2r+1)\,\zeta(2r) + \alpha\left(1-2^{-2r-1}\right)\Gamma(2r+3)\,\zeta(2r+2)\right]. \end{aligned}$$
   □
Definition 2.
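Let $Z \sim BL(\alpha)$ and $Y = \mu + \sigma Z$; then the density of $Y$ is given by
$$f_Y(y;\mu,\sigma,\alpha) = \frac{3}{\sigma^3}\,\frac{\sigma^2 + \alpha(y-\mu)^2}{3+\pi^2\alpha}\,\frac{\exp\left(\frac{y-\mu}{\sigma}\right)}{\left(1+\exp\left(\frac{y-\mu}{\sigma}\right)\right)^2}, \quad y \in \mathbb{R},$$
where $\mu \in \mathbb{R}$ is the location parameter, $\sigma > 0$ is the scale parameter and $\alpha \ge 0$ is the bimodality parameter. It is denoted as $Y \sim BL(\mu,\sigma,\alpha)$.
Corollary 1.
Let $Y \sim BL(\mu,\sigma,\alpha)$. Then for $k = 1, 2, \dots$, the $k$-th moments of the random variable $Y$ are:
$$E[Y^k] = \sum_{j=0}^{k}\binom{k}{j}\sigma^j E(Z^j)\,\mu^{k-j} = \sum_{j=0}^{r}\binom{k}{2j}\sigma^{2j} E(Z^{2j})\,\mu^{k-2j},$$
where $r = \mathrm{int}(k/2)$ and $\mathrm{int}(\cdot)$ denotes the integer part function.
Proof. 
Using the binomial theorem, we have that
$$E\left[Y^k\right] = E\left[(\sigma Z + \mu)^k\right] = \sum_{j=0}^{k}\binom{k}{j}\sigma^j E(Z^j)\,\mu^{k-j},$$
and the second equality follows because the odd moments of $Z$ vanish (Proposition 2).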
   □
Corollary 2.
Let $Y \sim BL(\mu,\sigma,\alpha)$. Then
$$E[Y] = \mu, \qquad Var[Y] = \sigma^2\,\frac{7\alpha\pi^4 + 5\pi^2}{5\alpha\pi^2 + 15}.$$
Proof. 
From Corollary 1, we have that $E[Y] = \mu$ and $E[Y^2] = \mu^2 + \sigma^2\,\frac{7\alpha\pi^4 + 5\pi^2}{5\alpha\pi^2 + 15}$. The second central moment then follows from the equality $Var[Y] = E[Y^2] - (E[Y])^2$.    □
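The even-moment formula of Proposition 2(b) and the variance of Corollary 2 can be cross-checked numerically. The sketch below is an illustration written for this purpose, not code from the paper: the lookup table `ZETA` holds the well-known closed forms of $\zeta$ at small even integers (with $\zeta(0) = -1/2$, which makes the formula return $E[Z^0] = 1$), and the function name `bl_even_moment` is hypothetical.

```python
import math

# Closed-form values of the Riemann zeta function at even integers.
ZETA = {
    0: -0.5,
    2: math.pi ** 2 / 6,
    4: math.pi ** 4 / 90,
    6: math.pi ** 6 / 945,
    8: math.pi ** 8 / 9450,
}

def bl_even_moment(r, alpha):
    """E[Z^(2r)] for Z ~ BL(alpha), r = 0, 1, 2, 3, via Proposition 2(b)."""
    k = 2 * r
    first = (1 - 2.0 ** (1 - k)) * math.gamma(k + 1) * ZETA[k]
    second = (1 - 2.0 ** (-k - 1)) * math.gamma(k + 3) * ZETA[k + 2]
    return 6.0 / (3.0 + math.pi ** 2 * alpha) * (first + alpha * second)

alpha = 0.7
variance_formula = (7 * alpha * math.pi ** 4 + 5 * math.pi ** 2) / (5 * alpha * math.pi ** 2 + 15)
print(bl_even_moment(1, alpha), variance_formula)  # both give Var[Z] for sigma = 1
```

Because $E[Z] = 0$, the second moment returned for $r = 1$ is the variance of $Z$, which corresponds to Corollary 2 with $\mu = 0$ and $\sigma = 1$.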
Corollary 3.
Let $Y \sim BL(\mu,\sigma,\alpha)$. Then the kurtosis coefficient ($\beta_2$) of $Y$ is
$$\beta_2(\alpha) = \frac{E\left[(Y - E[Y])^4\right]}{(Var[Y])^2} = \frac{5\,(155\alpha\pi^2 + 49)(\alpha\pi^2 + 3)}{7\,(7\alpha\pi^2 + 5)^2}.$$
We note that:
(1)
Considering $\beta_2$ as a function of $\alpha$, it is increasing in the interval $\left[0, \frac{1}{4\pi^2}\right)$ and decreasing in $\left(\frac{1}{4\pi^2}, \infty\right)$.
(2)
By straightforward calculation, $\lim_{\alpha\to 0^+}\beta_2 = 4.2$ and $\lim_{\alpha\to\infty}\beta_2 = \frac{775}{343}$.
Proof. 
The first derivative of $\beta_2(\alpha)$ with respect to $\alpha$ is $\beta_2'(\alpha) = -\frac{2560\,\pi^2(4\alpha\pi^2 - 1)}{7\,(7\alpha\pi^2 + 5)^3}$, so the critical value is $\alpha = \frac{1}{4\pi^2}$. On the intervals $\left[0, \frac{1}{4\pi^2}\right)$ and $\left(\frac{1}{4\pi^2}, \infty\right)$ it follows that $\beta_2'(\alpha) > 0$ in $\left[0, \frac{1}{4\pi^2}\right)$, so $\beta_2$ is increasing in this interval, and $\beta_2'(\alpha) < 0$ in $\left(\frac{1}{4\pi^2}, \infty\right)$, so $\beta_2$ is decreasing in this interval. Finally, $\beta_2''\left(\frac{1}{4\pi^2}\right) = -\frac{655360}{137781}\,\pi^4 < 0$, and the critical value is therefore a maximum.    □
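The behaviour of the kurtosis coefficient described in Corollary 3 is easy to tabulate. The following is a small sketch (the function name `bl_kurtosis` is chosen here for illustration) that evaluates the closed form and inspects its value at the critical point $\alpha = 1/(4\pi^2)$, where substituting $\alpha\pi^2 = 1/4$ into the formula gives $845/189$.

```python
import math

def bl_kurtosis(alpha):
    """Kurtosis coefficient beta_2(alpha) of the BL distribution (Corollary 3)."""
    u = alpha * math.pi ** 2
    return 5.0 * (155.0 * u + 49.0) * (u + 3.0) / (7.0 * (7.0 * u + 5.0) ** 2)

alpha_star = 1.0 / (4.0 * math.pi ** 2)
print(bl_kurtosis(0.0))         # logistic case
print(bl_kurtosis(alpha_star))  # maximum of beta_2
print(bl_kurtosis(1e12))        # close to the limit 775/343
```

The maximum value $845/189$ is the same upper bound that appears in the existence condition for the moment estimator of $\alpha$.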
Remark 1.
Figure 2 presents a graph of the kurtosis coefficient of the BL distribution. As the BL distribution is symmetric, the asymmetry coefficient is zero. The values of the kurtosis coefficient of the BL model are smaller than those of the Logistic model; this frequently occurs since the presence of more than one mode means that the distribution tails are lighter.

3. Inference

In this section, we develop inferences from the location-scale version of the BL distribution studied above. The inferences are based mainly on M and ML estimation and a simulation study to see the behaviour of the M estimates (ME) and ML estimates (MLE); we will also derive the Fisher information matrix, studying the singularity of a special case.

3.1. Moment Estimators

Proposition 3.
Let $Y_1, \dots, Y_n$ be a random sample of the random variable $Y \sim BL(\theta)$; to find the M estimators for $\theta = (\mu,\sigma,\alpha)$, the following three equations must be solved simultaneously:
$$\hat{\mu}_M = \bar{Y},$$
$$(343\,b_2 - 775)\,\pi^4\hat{\alpha}_M^2 + (490\,b_2 - 2570)\,\pi^2\hat{\alpha}_M = 735 - 175\,b_2, \tag{8}$$
$$\hat{\sigma}_M = \sqrt{S^2\,\frac{5\hat{\alpha}_M\pi^2 + 15}{7\hat{\alpha}_M\pi^4 + 5\pi^2}},$$
where $\bar{Y}$ is the mean of the sample (first sample moment), $S^2$ is the sample variance and $b_2$ is the sample kurtosis coefficient. For a solution to Equation (8) to exist, it must be the case that $\frac{775}{343} < b_2 < \frac{845}{189}$.
Proof. 
From Corollaries 2 and 3, replacing $E[Y]$ with $\bar{Y}$, $Var[Y]$ with $S^2$ and $\beta_2$ with $b_2$, the M estimators $\hat{\theta} = (\hat{\mu}_M, \hat{\sigma}_M, \hat{\alpha}_M)$ are obtained.    □
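The three moment equations can be solved directly, since Equation (8) is a quadratic in $u = \hat{\alpha}_M\pi^2$. The sketch below is an illustration under stated assumptions: the function name `bl_moment_estimates` and its interface (sample mean, sample variance and sample kurtosis passed in as numbers) are hypothetical. Because $\beta_2$ is not monotone in $\alpha$, the quadratic can have two non-negative roots when $b_2 > 4.2$; the source does not say which one to prefer, so this sketch returns every admissible candidate.

```python
import math

def bl_moment_estimates(mean, var, b2):
    """Candidate moment estimates (mu, sigma, alpha) for the BL model.

    Solves a*u^2 + b*u + c = 0 with u = alpha*pi^2 (Equation (8) rearranged)
    and returns one triple per non-negative real root; the list is empty
    when no admissible solution exists.
    """
    a = 343.0 * b2 - 775.0
    b = 490.0 * b2 - 2570.0
    c = 175.0 * b2 - 735.0
    disc = b * b - 4.0 * a * c
    if a <= 0.0 or disc < 0.0:
        return []
    roots = {(-b + sign * math.sqrt(disc)) / (2.0 * a) for sign in (1.0, -1.0)}
    estimates = []
    for u in sorted(roots):
        if u < 0.0:
            continue
        alpha = u / math.pi ** 2
        sigma = math.sqrt(var * (5.0 * u + 15.0) / (math.pi ** 2 * (7.0 * u + 5.0)))
        estimates.append((mean, sigma, alpha))
    return estimates

# Round trip: plug the theoretical variance and kurtosis of BL(3, 2, 1) back in.
u = math.pi ** 2
var = 4.0 * math.pi ** 2 * (7.0 * u + 5.0) / (5.0 * u + 15.0)
b2 = 5.0 * (155.0 * u + 49.0) * (u + 3.0) / (7.0 * (7.0 * u + 5.0) ** 2)
print(bl_moment_estimates(3.0, var, b2))
```

In practice `mean`, `var` and `b2` would be computed from the data; the round trip above only confirms that the algebra recovers the parameters it started from.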

3.2. ML Estimators

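Given a random sample $Y_1, \dots, Y_n$ of the $BL(\mu,\sigma,\alpha)$ distribution, the log-likelihood function can be written as:
$$\ell(\theta) = c(\sigma,\alpha) + \sum_{i=1}^{n} z_i + \sum_{i=1}^{n}\log(1+\alpha z_i^2) - 2\sum_{i=1}^{n}\log\left(1+\exp(z_i)\right), \tag{11}$$
where $c(\sigma,\alpha) = n\left[\log(3) - \log(\sigma) - \log(3+\pi^2\alpha)\right]$, $\theta = (\mu,\sigma,\alpha)$, $z_i = \frac{y_i - \mu}{\sigma}$ and $i = 1, \dots, n$.
To calculate the estimate of $\theta = (\mu,\sigma,\alpha)$ it is necessary to maximize (11), by solving the system of equations $\frac{\partial\ell(\theta)}{\partial\mu} = 0$, $\frac{\partial\ell(\theta)}{\partial\sigma} = 0$ and $\frac{\partial\ell(\theta)}{\partial\alpha} = 0$. More precisely, the equations
$$\sum_{i=1}^{n}\frac{\exp(z_i)}{1+\exp(z_i)} - \alpha\sum_{i=1}^{n}\frac{z_i}{1+\alpha z_i^2} = \frac{n}{2}, \tag{12}$$
$$2\sum_{i=1}^{n}\frac{1}{1+\alpha z_i^2} - \sum_{i=1}^{n} z_i + 2\sum_{i=1}^{n}\frac{z_i\exp(z_i)}{1+\exp(z_i)} = 3n, \tag{13}$$
$$\sum_{i=1}^{n}\frac{z_i^2}{1+\alpha z_i^2} = \frac{\pi^2 n}{3+\pi^2\alpha}, \tag{14}$$
must be solved.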
The solution for Equations (12)–(14) can be obtained using numerical procedures like Newton-Raphson, which can be implemented in the R software [23].
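To make the estimating equations concrete, the log-likelihood (11) and its gradient can be coded in a few lines. This is a minimal sketch rather than the authors' R implementation: the function names are hypothetical, the score is written from the first derivatives listed in Appendix A, and the optimisation step itself (Newton-Raphson or any general-purpose optimiser) is deliberately left out.

```python
import math

def _sigmoid(z):
    """exp(z)/(1+exp(z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bl_loglik(theta, y):
    """Log-likelihood (11) for theta = (mu, sigma, alpha) and data y."""
    mu, sigma, alpha = theta
    n = len(y)
    total = n * (math.log(3.0) - math.log(sigma) - math.log(3.0 + math.pi ** 2 * alpha))
    for yi in y:
        z = (yi - mu) / sigma
        # z - 2*log(1+exp(z)) equals -|z| - 2*log(1+exp(-|z|)), which is overflow-safe.
        total += -abs(z) - 2.0 * math.log1p(math.exp(-abs(z))) + math.log1p(alpha * z * z)
    return total

def bl_score(theta, y):
    """Gradient of the log-likelihood with respect to (mu, sigma, alpha)."""
    mu, sigma, alpha = theta
    d_mu = d_sigma = d_alpha = 0.0
    for yi in y:
        z = (yi - mu) / sigma
        s = _sigmoid(z)
        q = 1.0 + alpha * z * z
        d_mu += (-1.0 + 2.0 * s - 2.0 * alpha * z / q) / sigma
        d_sigma += (-3.0 - z + 2.0 / q + 2.0 * z * s) / sigma
        d_alpha += -math.pi ** 2 / (3.0 + math.pi ** 2 * alpha) + z * z / q
    return d_mu, d_sigma, d_alpha

data = [-2.0, -0.5, 0.3, 1.1, 2.7, 4.0]  # made-up numbers, for illustration only
print(bl_loglik((0.8, 1.3, 0.6), data))
print(bl_score((0.8, 1.3, 0.6), data))
```

Setting the three components of `bl_score` to zero and rearranging gives Equations (12)–(14); a finite-difference comparison against `bl_loglik` is a convenient way to validate the gradient before handing both functions to an optimiser.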

3.3. Simulation Study

In this section we present the results of a simulation study designed to assess the performance of the BL distribution. The acceptance-rejection method was used to carry out the simulations in this study. Each simulation consisted of 1000 samples of sizes 50, 100 and 200, and the following calculations were obtained for each: mean, root mean squared error (RMSE), mean absolute deviation (MAD) of the ME and MLE. Table 2 shows the results for the samples simulated by acceptance-rejection of sizes 50, 100 and 200 respectively.
In Table 2 it is observed that, as the sample size increases, the MLEs of $\mu$, $\log(\sigma)$ and $\log(\alpha)$ approach the true values of the parameters. Furthermore, we note that the RMSE and MAD for the parameters $\mu$, $\log(\sigma)$ and $\log(\alpha)$ decrease as the sample size increases, as expected. However, we observe that the MLEs of the parameters $\log(\sigma)$ and $\log(\alpha)$ respectively underestimate and overestimate the true values in all cases.
In Table 2, RMSE denotes the square root of the empirical mean squared error: for instance, for $\hat{\mu}$, it is calculated as
$$RMSE(\hat{\mu}) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{\mu}_i - \mu)^2},$$
and so on.
The expression in Equation (2) can be rewritten as a mixture of two densities, as follows
$$f_Z(z;\alpha) = \frac{3}{3+\pi^2\alpha}\,\frac{\exp(z)}{(1+\exp(z))^2} + \frac{\alpha\pi^2}{3+\pi^2\alpha}\,\frac{3z^2\exp(z)}{\pi^2(1+\exp(z))^2} = \frac{3}{3+\pi^2\alpha}\,f_1(z) + \frac{\alpha\pi^2}{3+\pi^2\alpha}\,f_2(z) = p_1 f_1(z) + p_2 f_2(z),$$
where $f_1(z)$ is the pdf of the standard logistic distribution, $f_2(z)$ is the pdf given in Property 2, and $p_1$ and $p_2$ are the respective weights. The algorithm for generating random numbers from the $BL(\mu,\sigma,\alpha)$ distribution is shown in Algorithm 1.
Algorithm 1: Generation of random numbers from the distribution $BL(\mu,\sigma,\alpha)$
  • repeat, starting from $i = 1$
  •     $U_{1i} \sim U(0,1)$
  •     if $U_{1i} \le 3/(3+\pi^2\alpha)$ then
  •         $Z_i \sim \text{Logis}(0,1)$
  •         $X_i = Z_i\sigma + \mu$
  •     else
  •         $k = 0$
  •         while $k == 0$ do
  •             $U_{2i} \sim U(0,1)$
  •             $Y_i \sim \text{Cauchy}(0,1)$
  •             if $U_{2i} \le [3y_i^2\exp(y_i)(1+y_i^2)]/[c\,\pi(1+\exp(y_i))^2]$ then
  •                 $Z_i = Y_i$
  •                 $X_i = Z_i\sigma + \mu$
  •                 $k = 1$
  •             end if
  •         end while
  •     end if
  • until $i = n$
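Algorithm 1 translates almost line by line into code. The following is a minimal sketch, not the authors' R code, with several assumptions made explicit: the function names `f2_over_g` and `rbl` are hypothetical; the envelope constant (called `C` here, $c$ in the algorithm) is not specified in the text, so it is approximated by the maximum of $f_2/g$ on a fine grid, inflated by 1% as a safety margin; and logistic and Cauchy draws are produced by inverting their cdfs.

```python
import math
import random

def f2_over_g(y):
    """Ratio f2(y)/g(y): f2 is the density of Property 2, g the Cauchy(0,1) density."""
    e = math.exp(-abs(y))  # exp(y)/(1+exp(y))^2 == e/(1+e)^2 by symmetry
    return 3.0 * y * y * (1.0 + y * y) * e / (math.pi * (1.0 + e) ** 2)

# Assumed envelope constant: grid maximum of f2/g on [0, 40], plus a 1% margin.
C = 1.01 * max(f2_over_g(i / 1000.0) for i in range(0, 40001))

def rbl(n, mu=0.0, sigma=1.0, alpha=1.0, rng=random):
    """Draw n values from BL(mu, sigma, alpha) via the mixture form of the density."""
    p1 = 3.0 / (3.0 + math.pi ** 2 * alpha)  # weight of the logistic component
    out = []
    for _ in range(n):
        if rng.random() <= p1:
            u = rng.random()
            while u == 0.0:  # avoid log(0)
                u = rng.random()
            z = math.log(u / (1.0 - u))  # logistic draw by inversion
        else:
            while True:  # acceptance-rejection for f2 with a Cauchy proposal
                y = math.tan(math.pi * (rng.random() - 0.5))
                if rng.random() * C <= f2_over_g(y):
                    z = y
                    break
        out.append(mu + sigma * z)
    return out

sample = rbl(5, mu=1.0, sigma=2.0, alpha=1.0, rng=random.Random(2022))
print(sample)
```

The expected number of proposals per accepted draw from $f_2$ equals the envelope constant, so a tighter bound than the grid estimate would only change the speed, not the distribution of the output.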

3.4. Fisher Information Matrix

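Below we present the Fisher information matrix for the random variable $Y \sim BL(\mu,\sigma,\alpha)$, which can be written as:
$$I_f(\theta) = \begin{pmatrix} I_{\mu\mu} & I_{\mu\sigma} & I_{\mu\alpha} \\ & I_{\sigma\sigma} & I_{\sigma\alpha} \\ & & I_{\alpha\alpha} \end{pmatrix},$$
with
$$I_{\mu\mu} = \frac{3 - 24\alpha + \alpha\pi^2}{3\sigma^2(3+\pi^2\alpha)} + 4\alpha^2\eta_2, \quad I_{\mu\sigma} = 0, \quad I_{\mu\alpha} = 0,$$
$$I_{\sigma\sigma} = \frac{7\alpha\pi^4 - 15\alpha\pi^2 + 5\pi^2 - 165}{15\sigma^2(3+\alpha\pi^2)} + 4\sigma^2\eta_0, \quad I_{\sigma\alpha} = 2\sigma\eta_2, \quad I_{\alpha\alpha} = \eta_4 - \frac{\pi^4}{(3+\pi^2\alpha)^2},$$
where $\eta_j = E\left[\frac{(Y-\mu)^j}{(\sigma^2 + \alpha(Y-\mu)^2)^2}\right]$ and the first and second derivatives are given in Appendix A.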

3.5. Special Case

Now we consider the Fisher information matrix for a special case of the BL location-scale model. A situation of particular interest is the case $\alpha = 0$ in Equation (2), which yields the logistic model. In this case, the Fisher information matrix is given by
$$I_f(\mu,\sigma,\alpha=0) = \begin{pmatrix} \frac{1}{3\sigma^2} & 0 & 0 \\ 0 & \frac{\pi^2+3}{9\sigma^2} & \frac{2\pi^2}{3\sigma} \\ 0 & \frac{2\pi^2}{3\sigma} & \frac{16\pi^4}{45} \end{pmatrix},$$
and its determinant is given by
$$|I_f| = \frac{16\pi^6 - 132\pi^4}{1215\,\sigma^4} \ne 0.$$
From this we may conclude that the Fisher information matrix is non-singular when $\alpha = 0$. Then, for large samples, the ML estimators $\hat{\theta}$ of $\theta$ have an asymptotically normal distribution, i.e.,
$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{D} N_3\left(0, I_f^{-1}(\theta)\right),$$
where the asymptotic variance of the ML estimators $\hat{\theta}$ is $I_f^{-1}(\theta)/n$. As the parameters are unknown, the Fisher information matrix is estimated using the MLE for the unknown parameters.
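The determinant quoted for the special case $\alpha = 0$ can be verified by expanding the $3 \times 3$ matrix numerically. The snippet below is a small check written for illustration (the names `fisher_alpha0` and `det3` are hypothetical), using only the matrix entries given above.

```python
import math

def fisher_alpha0(sigma):
    """Fisher information matrix of the BL location-scale model at alpha = 0."""
    pi2 = math.pi ** 2
    return [
        [1.0 / (3.0 * sigma ** 2), 0.0, 0.0],
        [0.0, (pi2 + 3.0) / (9.0 * sigma ** 2), 2.0 * pi2 / (3.0 * sigma)],
        [0.0, 2.0 * pi2 / (3.0 * sigma), 16.0 * pi2 ** 2 / 45.0],
    ]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

sigma = 1.7
closed_form = (16.0 * math.pi ** 6 - 132.0 * math.pi ** 4) / (1215.0 * sigma ** 4)
print(det3(fisher_alpha0(sigma)), closed_form)
```

Since $16\pi^2 > 132$, the closed form is strictly positive for every $\sigma > 0$, which is the non-singularity claimed in the text.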

4. Application

In this section the fit of the BL model is illustrated in a data set that has been analysed previously by Gui et al. [24], Hassan and Hijazi [25], and others. To do this, we used the M and ML methods described in Section 3.
The fit of the BL model with this data set is compared with that of the logistic mixture (LM) model. The respective density is given by:
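$$f(x;\mu_1,\sigma_1,\mu_2,\sigma_2,p) = p\,\frac{\exp\left(\frac{x-\mu_1}{\sigma_1}\right)}{\sigma_1\left[1+\exp\left(\frac{x-\mu_1}{\sigma_1}\right)\right]^2} + (1-p)\,\frac{\exp\left(\frac{x-\mu_2}{\sigma_2}\right)}{\sigma_2\left[1+\exp\left(\frac{x-\mu_2}{\sigma_2}\right)\right]^2}.$$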
This data set consists of the height in inches of 126 students of the University of Pennsylvania, [26]. Table 3 presents the basic descriptive statistics of this data set, showing that the asymmetry of the sample is very close to zero. Furthermore, the mean and median values are very similar, indicating that the data are symmetrical.
The estimations of the parameters of the BL model were obtained by the M method. These results are shown in Table 4.
Using the estimates in Table 4 as the initial values, the height data were fitted with the BL distribution by ML. The standard errors (SE) were calculated from the Fisher information matrix presented in Section 3.4 by numerical integration with the integral function of the pracma package in the R software (see Borchers [27]). The estimates and standard errors were also obtained for the LM model in order to compare the fits. Akaike's information criterion (AIC) was used to compare the two models; it is defined as $AIC = 2k - 2\log\text{lik}$, where $k$ is the number of parameters in the model and $\log\text{lik}$ is the maximum value of the log-likelihood function (see Akaike [28]). These results are presented in Table 5, where we observe that the BL model presents a better fit, as its AIC value is lower than that of the LM model.
Figure 3 shows the fit of the two densities to the data, and Figure 4 shows the respective QQ-plots for the BL and LM models.

5. An Asymmetric Extension

To make the modes in the BL distribution more flexible, we introduce an extension based on the methodology presented by Azzalini [29]. Namely, let $f_0$ be a pdf symmetric about zero, and let $G$ be a cdf such that $G'$ exists and is a pdf symmetric about zero; then
$$f(z;\lambda) = 2 f_0(z)\,G(\lambda z), \quad z \in \mathbb{R},$$
is a pdf for any $\lambda \in \mathbb{R}$. Applying this result to the BL distribution yields the skew BL (SBL) distribution, whose pdf is given by
$$f(z;\mu,\sigma,\alpha,\lambda) = \frac{6}{\sigma^3}\,\frac{\sigma^2 + \alpha(z-\mu)^2}{3+\pi^2\alpha}\,\frac{\exp((z-\mu)/\sigma)}{(1+\exp((z-\mu)/\sigma))^2}\,\frac{\exp(\lambda(z-\mu)/\sigma)}{1+\exp(\lambda(z-\mu)/\sigma)},$$
where $z \in \mathbb{R}$, $\mu \in \mathbb{R}$, $\sigma > 0$, $\alpha \ge 0$ and $\lambda \in \mathbb{R}$. We denote it by $Z \sim SBL(\mu,\sigma,\alpha,\lambda)$.
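The SBL density is the BL density multiplied by twice a logistic cdf evaluated at $\lambda$ times the standardized argument, which makes it straightforward to code. The sketch below is illustrative only: the function names and default arguments are choices made here, and the logistic cdf is taken as the skewing function $G$, as in the density displayed above.

```python
import math

def sigmoid(t):
    """Logistic cdf exp(t)/(1+exp(t)), evaluated without overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def sbl_pdf(x, mu=0.0, sigma=1.0, alpha=1.0, lam=0.0):
    """Density of SBL(mu, sigma, alpha, lam); lam = 0 gives BL(mu, sigma, alpha)."""
    z = (x - mu) / sigma
    e = math.exp(-abs(z))
    logistic = e / (1.0 + e) ** 2
    bl = 3.0 * (1.0 + alpha * z * z) / (3.0 + math.pi ** 2 * alpha) * logistic
    return 2.0 * bl * sigmoid(lam * z) / sigma

# With lam = 0 the skewing factor is 1/2, so the two values below coincide.
z = 0.7
e = math.exp(-z)
bl_value = 3.0 * (1.0 + z * z) / (3.0 + math.pi ** 2) * e / (1.0 + e) ** 2
print(sbl_pdf(z, 0.0, 1.0, 1.0, 0.0), bl_value)
```

Positive values of `lam` shift mass to the right-hand mode and negative values to the left-hand mode, while the total area stays equal to one.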
Remark 2.
Figure 5 shows the flexibility in the modes for different values of $\lambda$. When $\lambda = 0$ the BL distribution is obtained. Furthermore, when $\alpha = 0$ the skew-logistic distribution studied in Nadarajah [30] and Gupta and Kundu [31] is obtained. All the properties studied for the BL distribution can be extended to the SBL distribution.

5.1. Application 2

In this subsection we fit the data considered in the previous application. Table 6 shows the ML fits of the BL and SBL distributions, from which, using the AIC criterion, we can observe that the distribution offering the best fit with the data is the BL distribution. Figure 6 shows the graphs of the QQ-plots of the BL and SBL distributions.
We use the estimates given in Table 6 to test the null hypothesis ($\lambda = 0$) that the data are obtained from the BL distribution against the SBL distribution, using the test statistic
$$-2\log\Lambda(z) = -2\log\left[\frac{L_{BL}(\hat{\mu},\hat{\sigma},\hat{\alpha})}{L_{SBL}(\hat{\mu},\hat{\sigma},\hat{\alpha},\hat{\lambda})}\right],$$
where $\Lambda(z)$ is the likelihood-ratio statistic. For the data set under study, $-2\log\Lambda(z) = 0.13$, which is below the critical value $\chi^2_1 = 3.84$ at the 95% level. There is therefore no evidence to reject the null hypothesis; in other words, the additional skewness parameter of the SBL distribution does not improve on the fit of the BL distribution. Consequently, it is concluded that the data in Application 1 have a symmetrical bimodal behaviour.
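The comparison with the $\chi^2_1$ critical value can also be expressed as a p-value. The snippet below is a small illustration with hypothetical function names; it uses the identity $P(\chi^2_1 > x) = \mathrm{erfc}\left(\sqrt{x/2}\right)$, which holds because a $\chi^2_1$ variable is the square of a standard normal variable.

```python
import math

def chi2_1_sf(x):
    """Survival function P(X > x) of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lrt_statistic(loglik_null, loglik_alt):
    """Likelihood-ratio statistic -2 log(L_null / L_alt) from maximised log-likelihoods."""
    return -2.0 * (loglik_null - loglik_alt)

p_value = chi2_1_sf(0.13)   # statistic reported in the text
print(p_value)
print(chi2_1_sf(3.84))      # approximately 0.05, the level of the critical value
```

A p-value far above 0.05 leads to the same conclusion as the comparison with 3.84: the null hypothesis $\lambda = 0$ is not rejected.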

5.2. Application 3

We consider fitting the SBL distribution to a data set available from the web address http://lib.stat.cmu.edu/datasets/pollen.data (accessed on 15 March 2022). More specifically, we analyse the 481 observations of the variable “Nub” in the data file POLLEN2.DAT. This variable measures a geometric characteristic of a specific type of pollen. In Table 7, the descriptive analysis is provided.
Table 8 shows the ML fits of the LM and SBL distributions; we observe that by the AIC criterion, the distribution with the best fit to the data is the SBL distribution. Figure 7 and Figure 8 show the fits for the pollen data sets and the graphs of the QQ-plots of the LM and SBL distributions respectively.

6. Concluding Remarks

We present bimodal extensions of the logistic distribution, one symmetric and the other asymmetric. These distributions are useful for modelling symmetric/asymmetric bimodal data sets. We present some important properties and discuss two methods of estimation based on the M and ML estimators, the latter obtained using numerical procedures such as Newton-Raphson. Three applications to real data sets are presented, which show the usefulness of the BL and SBL distributions in comparison with two other bimodal distributions. Some other characteristics of the new models are:
  • The proposed model has a closed-form expression and contains the logistic model as a particular case.
  • The even moments are based on the Riemann Zeta Function.
  • The simulation study shows that the behaviour of the M and ML estimators is good.
  • The Fisher information matrix is non-singular in the symmetric unimodal special case; this is very important for drawing asymptotic inference from the ML estimators. This result, i.e., a non-singular Fisher information matrix, cannot be obtained with other asymmetric extensions of the logistic model.
  • In the application we observe that the BL model shows a better fit with the data than the typical mixture model.
  • We also introduce an asymmetric bimodal extension of the logistic distribution for situations where there are bimodal data with flexible modes. This extension is an alternative to the finite mixture of logistic distributions, as is shown by the first and third applications.

Author Contributions

Conceptualization, I.E.C. and H.W.G.; methodology, H.W.G.; software, I.E.C.; validation, I.E.C., H.W.G. and O.V.; formal analysis, O.V.; investigation, I.E.C.; writing—original draft preparation, I.E.C.; writing—review and editing, O.V.; supervision, H.W.G. All authors have read and agreed to the published version of the manuscript.

Funding

The research of Héctor W. Gómez was supported by SEMILLERO UA-2022 (Chile).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

For the first two applications the data is available in Subsection 4.6.1.2. page 78 of the PhD thesis [26], https://etda.libraries.psu.edu/files/final_submissions/2429 (accessed on 20 March 2022), and for the latter at http://lib.stat.cmu.edu/datasets/pollen.data (accessed on 15 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivatives

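The first derivatives of $\ell(\theta)$, taking $z = (y-\mu)/\sigma$, are given by
$$\frac{\partial\ell(\theta)}{\partial\mu} = -\frac{1}{\sigma} + \frac{2\exp(z)}{\sigma(1+\exp(z))} - \frac{2\alpha z}{\sigma(1+\alpha z^2)},$$
$$\frac{\partial\ell(\theta)}{\partial\sigma} = -\frac{3}{\sigma} - \frac{z}{\sigma} + \frac{2}{\sigma(1+\alpha z^2)} + \frac{2z\exp(z)}{\sigma(1+\exp(z))},$$
$$\frac{\partial\ell(\theta)}{\partial\alpha} = -\frac{\pi^2}{3+\pi^2\alpha} + \frac{z^2}{1+\alpha z^2},$$
and the second derivatives of $\ell(\theta)$ are
$$\frac{\partial^2\ell(\theta)}{\partial\mu^2} = \frac{2\exp(2z)}{\sigma^2(1+\exp(z))^2} - \frac{2\exp(z)}{\sigma^2(1+\exp(z))} - \frac{4\alpha^2z^2}{\sigma^2(1+\alpha z^2)^2} + \frac{2\alpha}{\sigma^2(1+\alpha z^2)},$$
$$\frac{\partial^2\ell(\theta)}{\partial\mu\,\partial\sigma} = \frac{1}{\sigma^2} - \frac{2\exp(z)}{\sigma^2(1+\exp(z))} + \frac{2z\exp(2z)}{\sigma^2(1+\exp(z))^2} - \frac{2z\exp(z)}{\sigma^2(1+\exp(z))} + \frac{4\alpha z}{\sigma^2(1+\alpha z^2)^2},$$
$$\frac{\partial^2\ell(\theta)}{\partial\mu\,\partial\alpha} = \frac{2\alpha z^3}{\sigma(1+\alpha z^2)^2} - \frac{2z}{\sigma(1+\alpha z^2)},$$
$$\frac{\partial^2\ell(\theta)}{\partial\sigma^2} = \frac{3}{\sigma^2} + \frac{2z}{\sigma^2} - \frac{4z\exp(z)}{\sigma^2(1+\exp(z))} + \frac{2z^2\exp(2z)}{\sigma^2(1+\exp(z))^2} - \frac{2z^2\exp(z)}{\sigma^2(1+\exp(z))} - \frac{4}{\sigma^2(1+\alpha z^2)^2} + \frac{2}{\sigma^2(1+\alpha z^2)},$$
$$\frac{\partial^2\ell(\theta)}{\partial\sigma\,\partial\alpha} = -\frac{2z^2}{\sigma(1+\alpha z^2)^2}, \qquad \frac{\partial^2\ell(\theta)}{\partial\alpha^2} = \frac{\pi^4}{(3+\alpha\pi^2)^2} - \frac{z^4}{(1+\alpha z^2)^2}.$$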

References

  1. Bolfarine, H.; Gómez, H.W.; Rivas, L.I. The log-bimodal-skew-normal model. A geochemical application. J. Chemom. 2011, 25, 329–332.
  2. Ely, J.T.A.; Fudenberg, H.H.; Muirhead, R.J.; LaMarche, M.G.; Krone, C.A.; Buscher, D.; Stern, E.A. Urine Mercury in Micromercurialism: Bimodal Distribution and Diagnostic Implications. Bull. Environ. Contam. Toxicol. 1999, 63, 553–559.
  3. Elal-Olivero, D.; Olivares-Pacheco, J.F.; Venegas, O.; Bolfarine, H.; Gómez, H.W. On Properties of the Bimodal Skew-Normal Distribution and an Application. Mathematics 2020, 8, 703.
  4. Zhang, C.; Mapes, B.E.; Soden, B.J. Bimodality in tropical water vapor. Q. J. R. Meteorol. Soc. 2003, 129, 2847–2866.
  5. McLachlan, G.; Peel, D. Mixture Models: Inference and Applications to Clustering; Marcel Dekker: New York, NY, USA, 2000.
  6. Marin, J.M.; Mengersen, K.; Robert, C. Bayesian modeling and inference on mixtures of distribution. Handb. Stat. 2005, 25, 459–503.
  7. Arnold, B.C.; Beaver, R.J. Skewed multivariate models related to hidden truncation and/or selective reporting (with discussion). Test 2002, 11, 7–54.
  8. Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 367–389.
  9. Ma, Y.; Genton, M.G. Flexible class of skew-symmetric distributions. Scand. J. Stat. 2004, 31, 459–468.
  10. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. Statistical inference for a general class of asymmetric distributions. J. Stat. Plan. Inference 2005, 128, 427–443.
  11. Kim, H.J. On a class of two-piece skew-normal distributions. Statistics 2005, 39, 537–553.
  12. Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal extension based on the skew-normal distribution with application to pollen data. Environmetrics 2011, 22, 50–62.
  13. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Gómez, H.W.; Bolfarine, H. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul. 2018, 88, 156–181.
  14. Rao, K.S.; Narayana, J.L.; Papayya Sastry, V. A bimodal distribution. Bull. Calcutta Math. Soc. 1988, 80, 238–240.
  15. Sarma, P.V.S.; Srinivasa Rao, K.S.; Prabhakara Rao, R. On a family of bimodal distributions. Sankhya B 1990, 52, 287–292.
  16. Kelker, D. Distribution theory of special distributions and location-scale parameter. Sankhya A 1970, 32, 419–430.
  17. Cambanis, S.; Huang, S.; Simons, G. On the theory of elliptically contoured distributions. J. Multivar. Anal. 1981, 11, 368–385.
  18. Fang, K.T.; Kotz, S.; Ng, K.W. Symmetric Multivariate and Related Distributions; Chapman & Hall: New York, NY, USA, 1990.
  19. Gupta, A.K.; Varga, T. Elliptically Contoured Models in Statistics; Kluwer Academics Publishers: Boston, MA, USA, 1993.
  20. Prudnikov, A.P.; Brychkov, Y.A.; Marichev, Y.I. Integrals and Series; Gordon and Breach: Amsterdam, The Netherlands, 1986.
  21. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 1995.
  22. Balakrishnan, N. Handbook of Logistic Distribution; Marcel Dekker: New York, NY, USA, 1992.
  23. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  24. Gui, W.; Chen, P.H.; Wu, H. A symmetric component alpha normal slash distribution: Properties and inferences. J. Stat. Theor. Appl. 2012, 12, 55–66.
  25. Hassan, M.Y.; Hijazi, R. A bimodal exponential power distribution. Pak. J. Stat. 2010, 26, 379–396.
  26. Cruz-Medina, I. Almost Nonparametric and Nonparametric Estimation in Mixture Models. Ph.D. Thesis, The Pennsylvania State University: State College, PA, USA, 2001.
  27. Borchers, H.W. Pracma: Practical Numerical Math Functions. R Package Version 2.3.3. 2021. Available online: https://CRAN.R-project.org/package=pracma (accessed on 22 March 2022).
  28. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [Google Scholar] [CrossRef]
  29. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. [Google Scholar]
  30. Nadarajah, S. The skew logistic distribution. Adv. Stat. Anal. 2009, 93, 187–203. [Google Scholar] [CrossRef]
  31. Gupta, R.D.; Kundu, D. Generalized logistic distributions. J. Appl. Stat. 2010, 18, 51–66. [Google Scholar]
Figure 1. Plots of the BL distribution (left panel) and its corresponding cdf (right panel) for different values of the α parameter: α = 0.5 (black line), α = 1 (light blue line) and α = 5 (red line).
Figure 2. Graph of the kurtosis coefficient of the BL(α) distribution.
Figure 3. Histogram and models fitted for the height data set. The lines represent the fits using maximum likelihood estimation: LM (dashed line) and BL (solid line).
Figure 4. (Left panel): QQ-plot for BL model. (Right panel): QQ-plot for LM model.
Figure 5. Graph of SBL(0, 1, 2, −0.5) (solid line) and SBL(0, 1, 2, 0.5) (dashed line).
Figure 6. (Left panel): QQ-plot for BL model. (Right panel): QQ-plot for SBL model.
Figure 7. Histogram and models fitted for the pollen data set. The lines represent the fits using maximum likelihood estimation: SBL (solid line) and LM (dashed line).
Figure 8. (Left panel): QQ-plot for SBL model. (Right panel): QQ-plot for LM model.
Table 1. Values of k for some distributions.
| Type | k |
|---|---|
| Normal | 1 |
| Laplace | 2 |
| Student-t | $\nu/(\nu - 2)$, $\nu > 2$ |
| Logistic | $\pi^2/3$ |
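Table 2. MLE, RMSE and MAD for the BL model with sample size 50, 100 and 200, respectively.
| Parameter | Value | MLE (n = 50) | RMSE (n = 50) | MAD (n = 50) | MLE (n = 100) | RMSE (n = 100) | MAD (n = 100) | MLE (n = 200) | RMSE (n = 200) | MAD (n = 200) |
|---|---|---|---|---|---|---|---|---|---|---|
| $\mu$ | 68.500 | 68.491 | 0.412 | 0.326 | 68.491 | 0.289 | 0.228 | 68.503 | 0.201 | 0.161 |
| $\log(\sigma)$ | 0.000 | −0.057 | 0.125 | 0.089 | −0.028 | 0.091 | 0.068 | −0.010 | 0.064 | 0.050 |
| $\log(\alpha)$ | −0.693 | −0.182 | 1.751 | 0.605 | −0.499 | 0.478 | 0.346 | −0.622 | 0.333 | 0.262 |
| $\mu$ | 68.500 | 68.521 | 0.389 | 0.305 | 68.510 | 0.272 | 0.217 | 68.509 | 0.193 | 0.154 |
| $\log(\sigma)$ | 0.000 | −0.041 | 0.118 | 0.087 | −0.017 | 0.086 | 0.067 | −0.007 | 0.065 | 0.051 |
| $\log(\alpha)$ | −0.511 | −0.024 | 2.204 | 0.692 | −0.345 | 0.536 | 0.396 | −0.446 | 0.339 | 0.265 |
| $\mu$ | 68.500 | 68.503 | 0.403 | 0.319 | 68.501 | 0.270 | 0.211 | 68.498 | 0.188 | 0.150 |
| $\log(\sigma)$ | 0.000 | −0.032 | 0.112 | 0.086 | −0.011 | 0.084 | 0.066 | −0.004 | 0.061 | 0.048 |
| $\log(\alpha)$ | −0.357 | 0.239 | 3.060 | 0.884 | −0.239 | 0.546 | 0.412 | −0.302 | 0.351 | 0.277 |
| $\mu$ | 68.500 | 68.474 | 0.628 | 0.495 | 68.470 | 0.417 | 0.332 | 68.489 | 0.295 | 0.237 |
| $\log(\sigma)$ | 0.405 | 0.344 | 0.125 | 0.087 | 0.372 | 0.085 | 0.061 | 0.389 | 0.060 | 0.046 |
| $\log(\alpha)$ | −0.693 | −0.076 | 2.052 | 0.613 | −0.436 | 0.496 | 0.338 | −0.561 | 0.326 | 0.238 |
| $\mu$ | 68.500 | 68.483 | 0.572 | 0.451 | 68.475 | 0.390 | 0.31 | 68.486 | 0.291 | 0.232 |
| $\log(\sigma)$ | 0.405 | 0.358 | 0.115 | 0.084 | 0.382 | 0.083 | 0.064 | 0.397 | 0.059 | 0.047 |
| $\log(\alpha)$ | −0.511 | 0.126 | 2.758 | 0.772 | −0.283 | 0.512 | 0.363 | −0.431 | 0.331 | 0.26 |
| $\mu$ | 68.500 | 68.497 | 0.574 | 0.452 | 68.500 | 0.416 | 0.331 | 68.503 | 0.285 | 0.228 |
| $\log(\sigma)$ | 0.405 | 0.369 | 0.110 | 0.082 | 0.383 | 0.084 | 0.065 | 0.397 | 0.058 | 0.046 |
| $\log(\alpha)$ | −0.357 | 0.341 | 3.316 | 0.949 | −0.158 | 0.527 | 0.385 | −0.296 | 0.338 | 0.269 |
| $\mu$ | 69.000 | 69.007 | 0.425 | 0.336 | 68.996 | 0.282 | 0.223 | 68.988 | 0.202 | 0.161 |
| $\log(\sigma)$ | 0.000 | −0.053 | 0.125 | 0.091 | −0.029 | 0.091 | 0.068 | −0.010 | 0.062 | 0.049 |
| $\log(\alpha)$ | −0.693 | −0.263 | 1.141 | 0.516 | −0.485 | 0.514 | 0.367 | −0.623 | 0.319 | 0.252 |
| $\mu$ | 69.000 | 69.006 | 0.414 | 0.324 | 68.995 | 0.268 | 0.219 | 69.003 | 0.196 | 0.156 |
| $\log(\sigma)$ | 0.000 | −0.038 | 0.117 | 0.088 | −0.020 | 0.085 | 0.067 | −0.007 | 0.060 | 0.048 |
| $\log(\alpha)$ | −0.511 | −0.046 | 2.019 | 0.659 | −0.301 | 1.053 | 0.406 | −0.447 | 0.354 | 0.277 |
| $\mu$ | 69.000 | 68.992 | 0.393 | 0.309 | 68.991 | 0.254 | 0.203 | 68.995 | 0.181 | 0.143 |
| $\log(\sigma)$ | 0.000 | −0.031 | 0.112 | 0.085 | −0.012 | 0.081 | 0.063 | −0.008 | 0.057 | 0.046 |
| $\log(\alpha)$ | −0.357 | 0.178 | 2.704 | 0.810 | −0.220 | 0.533 | 0.41 | −0.296 | 0.342 | 0.269 |
| $\mu$ | 69.000 | 68.988 | 0.620 | 0.49 | 69.003 | 0.417 | 0.335 | 69.001 | 0.287 | 0.228 |
| $\log(\sigma)$ | 0.405 | 0.343 | 0.130 | 0.092 | 0.370 | 0.088 | 0.064 | 0.386 | 0.062 | 0.047 |
| $\log(\alpha)$ | −0.693 | −0.098 | 1.996 | 0.610 | −0.435 | 0.503 | 0.336 | −0.559 | 0.321 | 0.236 |
| $\mu$ | 69.000 | 68.962 | 0.602 | 0.481 | 68.995 | 0.405 | 0.323 | 68.988 | 0.277 | 0.219 |
| $\log(\sigma)$ | 0.405 | 0.354 | 0.123 | 0.087 | 0.382 | 0.082 | 0.063 | 0.395 | 0.058 | 0.046 |
| $\log(\alpha)$ | −0.511 | 0.096 | 2.606 | 0.745 | −0.276 | 1.041 | 0.402 | −0.427 | 0.335 | 0.258 |
| $\mu$ | 69.000 | 69.028 | 0.581 | 0.449 | 69.030 | 0.395 | 0.312 | 69.005 | 0.271 | 0.216 |
| $\log(\sigma)$ | 0.405 | 0.369 | 0.112 | 0.083 | 0.387 | 0.080 | 0.062 | 0.402 | 0.057 | 0.045 |
| $\log(\alpha)$ | −0.357 | 0.401 | 3.393 | 0.994 | −0.173 | 0.506 | 0.377 | −0.290 | 0.344 | 0.270 |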
Table 3. Descriptive statistics for the height data set.
| n | Median | Mean | Standard Deviation | Range | Skewness |
|---|---|---|---|---|---|
| 126 | 68.44 | 68.54 | 4.16 | 23.5 | 0.05 |
Table 4. ME for the BL model.
| Model | $\hat{\alpha}_M$ | $\hat{\mu}_M$ | $\hat{\sigma}_M$ |
|---|---|---|---|
| BL | 0.418 | 68.546 | 1.357 |
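Table 5. MLE, with SE in parentheses of the BL and LM models.
| Model | $\hat{\alpha}$ | $\hat{p}$ | $\hat{\mu}_1$ | $\hat{\sigma}_1$ | $\hat{\mu}_2$ | $\hat{\sigma}_2$ | loglik | AIC |
|---|---|---|---|---|---|---|---|---|
| BL | 0.646 | - | 68.627 | 1.268 | - | - | −355.118 | 716.236 |
| | (0.272) | - | (0.299) | (0.097) | - | - | - | - |
| LM | - | 0.534 | 65.867 | 1.589 | 71.628 | 1.589 | −355.75 | 721.5 |
| | - | (0.119) | (0.661) | (0.282) | (0.729) | (0.306) | - | - |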
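The AIC column of Table 5 can be checked against the log-likelihood column with the usual definition AIC = 2k − 2·loglik (Akaike [28]). The short Python sketch below is only an illustrative check, not the authors' code. It assumes k = 3 free parameters for BL (α, μ, σ) and k = 5 for LM (p, μ₁, σ₁, μ₂, σ₂), and it assumes the BL log-likelihood is −355.118. Under those assumptions the tabulated AIC values are reproduced. The function name `aic` and the dictionary layout are hypothetical.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 * loglik."""
    return 2 * n_params - 2 * loglik


# Log-likelihoods read from Table 5; the parameter counts are assumptions
# (BL: alpha, mu, sigma; LM: p, mu1, sigma1, mu2, sigma2).
fits = {
    "BL": {"loglik": -355.118, "n_params": 3},
    "LM": {"loglik": -355.75, "n_params": 5},
}

aic_values = {name: aic(f["loglik"], f["n_params"]) for name, f in fits.items()}

# The model with the smaller AIC is preferred under this criterion.
best_model = min(aic_values, key=aic_values.get)

for name, value in aic_values.items():
    print(f"{name}: AIC = {value:.3f}")
print("Lower AIC:", best_model)
```

With these inputs the sketch gives 716.236 for BL and 721.500 for LM, so BL has the lower AIC despite the two log-likelihoods being close, because it uses fewer parameters.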
Table 6. MLE, with SE in parentheses of the BL and SBL models.
| Model | $\hat{\alpha}$ | $\hat{\mu}$ | $\hat{\sigma}$ | $\hat{\lambda}$ | AIC |
|---|---|---|---|---|---|
| BL | 0.646 | 68.627 | 1.268 | - | 716.236 |
| | (0.272) | (0.299) | (0.097) | - | - |
| SBL | 0.644 | 68.767 | 1.271 | −0.033 | 718.106 |
| | (0.277) | (0.499) | (0.100) | (0.092) | - |
Table 7. Descriptive statistics for the pollen data set.
| n | Median | Mean | Standard Deviation | Range | Skewness |
|---|---|---|---|---|---|
| 481 | −0.298 | −0.055 | 5.071 | 30.609 | −0.149 |
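Table 8. MLE, with SE in parentheses of the SBL and LM models.
| Model | $\hat{\mu}_1$ | $\hat{\sigma}_1$ | $\hat{\mu}_2$ | $\hat{\sigma}_2$ | $\hat{p}$ | $\hat{\alpha}$ | $\hat{\lambda}$ | AIC |
|---|---|---|---|---|---|---|---|---|
| LM | −1.826 | 2.391 | 5.429 | 1.508 | 0.751 | - | - | 2924.36 |
| | (0.466) | (0.172) | (0.564) | (0.220) | (0.064) | - | - | - |
| SBL | 1.716 | 1.700 | - | - | - | 0.506 | −0.235 | 2922.72 |
| | (0.417) | (0.089) | - | - | - | (0.117) | (0.071) | - |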
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Cortés, I.E.; Venegas, O.; Gómez, H.W. A Symmetric/Asymmetric Bimodal Extension Based on the Logistic Distribution: Properties, Simulation and Applications. Mathematics 2022, 10, 1968. https://doi.org/10.3390/math10121968
