1. Introduction
The problem of finding alternatives to the normal distribution for fitting asymmetric data with bimodal or multimodal behavior has been addressed by several authors. Elal-Olivero et al. [1], for example, introduced a bimodal extension of the skew-normal (SN) distribution of Azzalini [2] for modeling skewed bimodal data. In addition, Elal-Olivero [3] studied the bimodal-normal (BN) model, which provides a methodology for analyzing variables with two modes as an extension of the normal distribution. Gómez et al. [4] proposed a class of flexible bimodal SN distributions. Kim [5] considered a type of symmetric bimodal SN distribution, whereas Arnold et al. [6] extended Kim’s distribution to the asymmetric bimodal SN case. Other works in this direction include Elal-Olivero et al. [7], who presented a class of distributions for data with positive support; Bolfarine et al. [8], who studied a bimodal extension of the power-normal (PN) family of distributions; and Martínez-Flórez et al. [9], who proposed a distribution that can be useful for fitting data with up to three modes.
Chakraborty et al. [10] proposed a multimodal skewed extension of the normal distribution based on a trigonometric periodic skew function, and investigated its suitability by fitting data from real situations. Venegas et al. [11] considered a new extension of the generalized skew-normal distribution that incorporates an additional parameter, giving the distribution the flexibility to fit data with unimodal and bimodal behaviors. Statistical inference was carried out using the maximum likelihood method and the EM algorithm. Gómez-Déniz et al. [12] proposed a distribution suitable for modeling bimodality in discrete data that can fit both positively and negatively skewed data. A virtue of this model is that it is capable of representing overdispersion phenomena present in count data obtained through a Poisson distribution. Elal-Olivero et al. [13] developed an alternative to the bimodal skew-normal distribution based on a mixture of skew-normal distributions. For this proposal, which is also called the bimodal skew-normal distribution, the authors studied the stochastic representation and verified the non-singularity of the Fisher information matrix. The proposal gave satisfactory results for modeling bimodal data. Martínez-Flórez et al. [14] proposed two new families of distributions that are capable of modeling unimodal, bimodal, and trimodal data. The proposed distributions extended the normal model to symmetric and asymmetric trimodal situations, and involved fewer parameters to estimate than mixtures of normal distributions.
To fit positive unimodal data with high or low degrees of skewness, the gamma, Weibull, exponential, Birnbaum–Saunders (Birnbaum [15]; Birnbaum and Saunders [16]), and log-normal (LN) distributions are commonly used; the LN distribution, which is obtained by transforming the ordinary normal distribution, is typically used to fit right-skewed data. When the skewness and kurtosis of the data are above or below what is expected under the log-normal distribution, distributions that accommodate these deviations are needed. For positive data with more than one mode, Bolfarine et al. [17] presented the log-skewed bimodal distribution as a logarithmic extension of the skewed bimodal normal distribution introduced by Elal-Olivero [3]; this distribution can be seen as an alternative to the log-normal distribution, which is typically used to fit positive data with only one mode.
It is important to highlight that bimodal distributions based on the skew-normal distribution of Azzalini [2] have a singular information matrix for values of the skewness parameter close to zero. This puts them at a disadvantage compared with other models in the literature, such as those obtained from the power-normal distribution of Durrans [18], whose information matrix is non-singular. This property makes the power-normal distribution a useful basis for studying distributions derived from its generic structure, including those with a bimodal or multimodal basis.
In this work, bimodal distributions for modeling positive data are introduced. The proposals, which are based on the skewed bimodal normal distribution and the bimodal power-normal distribution introduced by Martínez-Flórez et al. [19], are extensions of alpha-power distributions. The main properties of the resulting distributions are studied, including the shape of the probability density function, the cumulative distribution function, the survival function, and the hazard function. In addition, where the moments exist, the moment-generating function, the expectation, the variance, and the skewness and kurtosis coefficients are studied, among others.
The rest of the article is organized as follows: Section 2 introduces the exponentiated bimodal log-normal distribution and presents its main properties. The location–scale extension is developed, and statistical inference for the distribution is carried out using the maximum likelihood method. The Fisher information matrix, which is non-singular, is also presented. In Section 3, the exponentiated elliptical bimodal log-normal distribution is presented. For this new distribution, the probability density function and the cumulative distribution function are shown explicitly. Moments and their properties in general are also presented. Parameter estimation is performed using the maximum likelihood method. Finally, in Section 4, an illustration of the new distribution is presented using a real data set, which shows that this new distribution is a viable alternative to other existing methodologies in the statistical literature.
2. Exponentiated Bimodal Log-Normal Distribution
In this section, the exponentiated bimodal log-normal (EBLN) distribution is introduced, which is an extension of the exponentiated bimodal normal (EBN) model of Martínez-Flórez et al. [19] to the case of bimodal data with positive support.
Definition 1. A random variable X is said to have an exponentiated bimodal log-normal (EBLN) distribution if its probability density function (pdf) is given by:
where $\phi$ and $\Phi$ are the pdf and cumulative distribution function (cdf) of the standard normal distribution, respectively. We used the notation .
Figure 1 shows some forms of the EBLN density for some selected values of the parameter
. Note that for values of
or
, the EBLN density is unimodal with a high degree to the right asymmetry, while for values of
, the shape of the EBLN density is bimodal with positive skewness, so
is a parameter that controls the skewness of the distribution and, therefore, the EBLN distribution can be useful for fitting unimodal or bimodal positively skewed data.
Notice that, by letting:
then:
Therefore, it follows that the cdf of a continuous random variable EBLN is given by:
Given the great flexibility of the EBLN distribution for fitting data with positive support, it can be used to compute, with greater precision, the probability that a subject will survive beyond a given time t. This function, which corresponds to the survival function of the EBLN model, is given by:
In the graphs of
Figure 2, it can be seen that this is a decreasing monotonic function, with
and tending to 0 as
t tends to infinity.
Similarly, because of its flexibility in fitting non-negative data, the EBLN distribution can be used as a basis for determining the failure rate of a system for unimodal or bimodal data with positive support, that is, the instantaneous rate of failure at time t given survival up to that time. For the EBLN model, this is the hazard function of the distribution, which can be expressed in the form:
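The cdf, survival, and hazard functions discussed above can be explored numerically with a short Python sketch. It assumes the alpha-power form f(x; α) = α g(x) G(x)^(α−1), with a bimodal log-normal baseline g(x) = (log x)² φ(log x)/x and G(x) = Φ(log x) − (log x) φ(log x). These expressions and all function names are assumptions of the sketch and should be checked against Definition 1 before use.

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf (erfc keeps accuracy in the lower tail)."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def bln_pdf(x):
    """ASSUMED baseline pdf: g(x) = (log x)^2 * phi(log x) / x, x > 0."""
    y = math.log(x)
    return y * y * phi(y) / x


def bln_cdf(x):
    """ASSUMED baseline cdf: G(x) = Phi(log x) - log(x) * phi(log x)."""
    y = math.log(x)
    return Phi(y) - y * phi(y)


def ebln_pdf(x, alpha):
    """ASSUMED EBLN pdf: alpha * g(x) * G(x)^(alpha - 1)."""
    return alpha * bln_pdf(x) * bln_cdf(x) ** (alpha - 1.0)


def ebln_cdf(x, alpha):
    """EBLN cdf under the assumed form: G(x)^alpha."""
    return bln_cdf(x) ** alpha


def ebln_survival(t, alpha):
    """Survival function S(t) = 1 - F(t)."""
    return 1.0 - ebln_cdf(t, alpha)


def ebln_hazard(t, alpha):
    """Hazard function h(t) = f(t) / S(t)."""
    return ebln_pdf(t, alpha) / ebln_survival(t, alpha)


if __name__ == "__main__":
    # Illustrative parameter value only; not an estimate from the paper.
    for t in (0.5, 1.0, 2.0, 5.0):
        print(t, ebln_survival(t, 2.0), ebln_hazard(t, 2.0))
```

Under these assumptions, the survival function is one minus a power of the baseline cdf, so it is monotonically decreasing in t by construction.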
2.1. Properties
- (i)
The pdf (
1) has, at most, two modes. Indeed:
Let
, where
. By letting
, then
, and therefore,
can be written as:
then:
then:
If
, then:
Observe that, if
and
, the polynomial in (3) is of degree 3; therefore, it has at most three roots. In addition, it has two changes in sign; therefore, by Descartes' rule of signs, it has at most two positive real roots.
For
, it holds that
, which implies that
or
, that is,
or
. In addition,
All the roots correspond to maxima, that is, the distribution is bimodal.
In general, notice that:
where
. Then, for all roots of (
3) such that
and
, there will be two maxima. The same is true for
and
. Thus, it is concluded that there are, at most, two modes.
- (ii)
If
, it follows that there is a bimodal log-normal (BLN) distribution, with the pdf given by:
and the cdf given by:
2.2. Moments
The moments of the EBLN distribution do not have a closed analytic form; however, they can be computed numerically. In general, the
kth moment of a random variable
X with an EBLN distribution can be obtained using the expression given by:
The expected value
, the variance
, and the skewness
and kurtosis
coefficients of the EBLN distribution can be calculated by using (
6) and (
7):
where:
The ranges of values for the coefficients
and
were calculated numerically for values of
, and we obtained:
which shows that the EBLN distribution is capable of fitting data with a high degree of skewness and kurtosis.
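The numerical computation of moments described in this subsection can be sketched in Python as follows. The sketch assumes the alpha-power density f(x; α) = α g(x) G(x)^(α−1) with the bimodal log-normal baseline, integrates on the log scale (y = log x) with a composite Simpson rule, and converts raw moments into skewness and kurtosis with the usual textbook formulas. The density form, the integration limits, and the function names are assumptions of the sketch, not the paper's expressions (6) and (7).

```python
import math


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def ebln_moment(k, alpha, lo=-12.0, hi=16.0, n=4000):
    """k-th raw moment E[X^k] under the ASSUMED EBLN density.

    With y = log x the integrand becomes
    exp(k*y) * alpha * y^2 * phi(y) * G(y)^(alpha-1), G(y) = Phi(y) - y*phi(y).
    Composite Simpson rule on [lo, hi]; n must be even.
    """
    def integrand(y):
        G = Phi(y) - y * phi(y)
        return math.exp(k * y) * alpha * y * y * phi(y) * G ** (alpha - 1.0)

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(lo + i * h)
    return total * h / 3.0


def ebln_summary(alpha):
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (ebln_moment(k, alpha) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


if __name__ == "__main__":
    # Illustrative parameter values only.
    for a in (0.5, 1.0, 2.0):
        print(a, ebln_summary(a))
```

For α = 1 the assumed density reduces to the baseline, for which the raw moments have the closed form E[X^k] = (1 + k²) exp(k²/2); this gives a convenient check on the quadrature.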
2.3. Location–Scale Extension of the EBLN Distribution
The location–scale extension of the EBLN distribution follows from the transformation
, where
X has an exponentiated bimodal normal distribution (EBN, see [19]), with
and
. Its pdf is given by:
where
. The respective cdf is:
Note that, for
, the location–scale version of the BLN distribution was obtained, with the pdf and cdf given by:
and
2.4. Moments and Moment-Generating Function for Location–Scale Case
The
kth moment of a random variable
Z with a distribution
is obtained from the following expression:
Proposition 1. If , then the moment-generating function (MGF) of X does not exist.
Proof. Let us take
as fixed,
, and
; then:
where
and
for all
.
If
is fixed, then:
since
when
. Thus,
when
for all
. Consequently,
when
. □
2.5. Parameter Estimation
Consider a random sample
of size
n, such that
, for
. The log-likelihood function for
is given by:
where
for
. After some calculations, the following elements of the score function are obtained:
where
is the cdf of the BLN distribution given in (
10), and
for
. Taking the second partial derivatives of the log-likelihood function, the following elements of the observed information matrix are obtained:
The elements of the Fisher information matrix
are obtained by taking the expected value of the previous expressions, becoming:
where
and
. Taking
and using numerical methods, the following information matrix is obtained:
whose determinant is equal to:
Then,
is a positive definite matrix; hence,
is non-singular, and therefore, the regularity conditions are satisfied (see Appendix A for more details). It follows that the variance–covariance matrix of the vector
is given by
, and for a large sample size, it follows that:
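To illustrate maximum likelihood estimation and the large-sample normal approximation in the simplest setting, the following Python sketch treats only the shape parameter α of the standard (no location–scale) case. It assumes the alpha-power density f(x; α) = α g(x) G(x)^(α−1) with bimodal log-normal baseline; under that assumption the score equation n/α + Σ log G(x_i) = 0 has the explicit solution α̂ = −n / Σ log G(x_i), with Fisher information n/α². The full three-parameter problem treated in this section still requires numerical optimization. All names are hypothetical.

```python
import math
import random


def phi(y):
    """Standard normal pdf."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(y):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-y / math.sqrt(2.0))


def G(y):
    """ASSUMED baseline cdf on the log scale: Phi(y) - y*phi(y)."""
    return Phi(y) - y * phi(y)


def G_inverse(u, lo=-10.0, hi=10.0, iters=80):
    """Invert the increasing function G by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_ebln(n, alpha, rng):
    """Inverse-transform sampling: F(x) = G(log x)^alpha = U."""
    return [math.exp(G_inverse((1.0 - rng.random()) ** (1.0 / alpha)))
            for _ in range(n)]


def mle_alpha(xs):
    """Closed-form MLE of alpha when only alpha is unknown."""
    s = sum(math.log(G(math.log(x))) for x in xs)
    return -len(xs) / s


def wald_interval(alpha_hat, n, z=1.959964):
    """Approximate 95% interval using the asymptotic variance alpha^2 / n."""
    se = alpha_hat / math.sqrt(n)
    return alpha_hat - z * se, alpha_hat + z * se


if __name__ == "__main__":
    rng = random.Random(2024)            # fixed seed, illustrative only
    xs = sample_ebln(1000, 2.0, rng)     # simulated data, not the nickel data
    a_hat = mle_alpha(xs)
    print(a_hat, wald_interval(a_hat, len(xs)))
```

The interval uses the inverse of the one-parameter Fisher information as the asymptotic variance, which mirrors the role of the inverse information matrix in the multiparameter case.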
3. Exponentiated Elliptical Bimodal Log-Normal Distribution
In this section, a new bimodal distribution, called the exponentiated elliptical bimodal log-normal (EEBLN) distribution for positive data, is presented. This distribution is obtained from the exponentiated elliptical bimodal normal distribution that was also proposed by Martínez-Flórez et al. [19].
Definition 2. A random variable X is said to have an exponentiated elliptical bimodal log-normal distribution if its pdf is given by:
where $\phi$ and $\Phi$ are the pdf and cdf of the standard normal distribution, respectively. We used the notation .
Figure 3 presents some forms of the EEBLN distribution for selected values of the parameters
and
. It can be seen from the figure that the EEBLN density can be useful for fitting unimodal or bimodal data.
The cdf of a random variable with an EEBLN distribution is given by the expression:
From (13), the survival and hazard functions of the EEBLN distribution can be calculated as:
and
respectively. The behavior of the survival function for
values is presented in Figure 4, which shows that it is monotonically decreasing and convergent.
3.1. Properties
- (i)
The pdf (12) has, at most, two modes. To demonstrate this, we took
in (12) again, and differentiated it to obtain:
By reasoning as in Elal-Olivero [3], it follows that
has at most three zeros, so
has at most two modes.
- (ii)
If
, the elliptical bimodal log-normal (ELBLN) distribution is obtained with the pdf given by:
3.2. Moments
Let
X be a random variable with an EEBLN distribution. The expected value
, the variance
, and the skewness
and kurtosis
coefficients of a random variable with an EEBLN distribution can be calculated by using (
7) with:
The ranges of values for the coefficients
and
were calculated numerically for values of
, and we obtained:
3.3. Location–Scale Extension of the EEBLN Distribution
The location–scale extension of the EEBLN distribution follows from the transformation
, where
X has an exponentiated elliptical bimodal normal distribution (EEBN, see [
19]) with
and
. Its pdf is given by:
where
. The respective cdf is:
Note that for
, the location–scale version of the ELBLN distribution is obtained, with the cdf given by:
3.4. Moments and Moment-Generating Function for Location–Scale Case
The
kth moment of a random variable
Z with a distribution of
is obtained from the following expression:
Proposition 2. If , then the moment-generating function (MGF) of X does not exist.
Proof. This result is obtained by following a reasoning similar to that of the EBLN distribution. □
3.5. Parameter Estimation
We considered a random sample
of size
n, such that
, for
. The log-likelihood function for
is given by:
where
for
. After some calculations, the following elements of the score function are obtained:
where
is the cdf of the ELBLN distribution given in (
17), and
for
.
The maximum likelihood estimates are obtained by solving the system of equations that results from setting the score functions equal to zero. This system has no closed-form solution and must be solved via numerical methods such as the Newton–Raphson or quasi-Newton methods.
Taking the second partial derivatives of the log-likelihood function, the following elements of the observed information matrix are obtained:
The elements of the expected information matrix
are obtained by calculating the expected value of the elements of the observed information matrix. Due to the shape of these elements, they cannot be found explicitly, so numerical methods must be used to find the respective expected values. By setting
the expected information matrix is
, where
. Since the observed information matrix converges asymptotically to the expected information matrix, for
and large sample sizes, we have:
4. Application of the EEBLN Distribution
This section contains an illustration of the studied bimodal distributions with real data, in which they are compared with other existing methodologies.
The data set used in this illustration contains 85 observations of the nickel content in soil samples that were analyzed by the Mining Department of the Universidad de Atacama in Chile. The aim is to show the EEBLN distribution as an alternative for modeling unimodal and/or bimodal data.
Table 1 contains the main descriptive statistics of the application data. Note that the data have a high degree of kurtosis and a high degree of positive asymmetry; therefore, the EBLN and EEBLN models can be considered viable to fit this data set.
For comparison with the proposed distributions (EBLN and EEBLN), the flexible Birnbaum–Saunders (FBS), skewed Birnbaum–Saunders (SBS), log-normal (LN), and log-power-normal (LPN) distributions were also fitted. The fits were made using the maxLik function in R (R Development Core Team [21]), obtaining the maximum likelihood estimates (MLE) with their respective standard errors in parentheses, which are obtained numerically as the square root of the diagonal elements of the matrix
, where:
with
and
. The results are presented in
Table 2 for each of the six distributions considered. To compare the distributions in question, the Akaike information criterion (AIC) [22], the corrected AIC (AICC) [23], and the Bayesian information criterion (BIC) [24] were used. The criteria are defined by:
where
p is the number of parameters and
is the log-likelihood function evaluated at the MLEs of the parameters. The best model is the one with the smallest AIC, AICC, or BIC.
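As an illustration of how these criteria are computed from a fitted model, a minimal Python sketch follows. It assumes the standard forms AIC = −2ℓ + 2p and BIC = −2ℓ + p log n, together with the common small-sample correction AICC = AIC + 2p(p + 1)/(n − p − 1); the exact form of the correction used in [23] may differ, and the function name is hypothetical.

```python
import math


def information_criteria(loglik, p, n):
    """Return (AIC, AICC, BIC) for a model with maximized log-likelihood
    `loglik`, `p` estimated parameters and sample size `n`.

    ASSUMPTION: AICC uses the common correction 2p(p+1)/(n-p-1).
    """
    if n - p - 1 <= 0:
        raise ValueError("AICC requires n > p + 1")
    aic = -2.0 * loglik + 2.0 * p
    aicc = aic + 2.0 * p * (p + 1) / (n - p - 1)
    bic = -2.0 * loglik + p * math.log(n)
    return aic, aicc, bic


if __name__ == "__main__":
    # Illustrative log-likelihood values, not the fitted values of Table 2.
    for name, ll, p in (("model A", -100.0, 3), ("model B", -97.5, 4)):
        print(name, information_criteria(ll, p, n=85))
```

All three criteria penalize the maximized log-likelihood by model size, so a model with more parameters is preferred only if its gain in log-likelihood outweighs the penalty.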
To test the significance of the bimodality parameter
in the data set, we considered the hypothesis system as follows:
which compares the fit of the LPN and EEBLN distributions to the data set. We used the likelihood ratio (LR) statistic (see Lehmann and Romano [25]), which is given by:
where
and
are the likelihood functions associated with the log-power-normal and exponentiated elliptical bimodal log-normal distributions, respectively, evaluated at the maximum likelihood estimates. After evaluating, we found that
, with a
; therefore, the null hypothesis
was rejected, and the parameter
was statistically significant for fitting the nickel concentration data. Based on this hypothesis test and the AIC, AICC, and BIC goodness-of-fit criteria (for which the EEBLN distribution has the smallest values among the considered models), it can be concluded that the EEBLN distribution fits the nickel concentration data better than the LPN distribution.
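The likelihood ratio comparison of nested models can be sketched in Python as follows. The sketch assumes the usual large-sample reference distribution, in which −2 log Λ = 2(ℓ_alt − ℓ_null) is compared with a chi-square distribution whose degrees of freedom equal the number of parameters fixed under the null hypothesis. Only one and two degrees of freedom are implemented, since those have simple closed-form tail probabilities. The function names and the numbers in the usage example are hypothetical, not results from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df = 1 or 2."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))   # P(Z^2 > x)
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are implemented in this sketch")


def lr_test(loglik_null, loglik_alt, df):
    """Return (statistic, p_value) with statistic = 2 * (l_alt - l_null)."""
    stat = 2.0 * (loglik_alt - loglik_null)
    return stat, chi2_sf(max(stat, 0.0), df)


if __name__ == "__main__":
    # Illustrative values: a restricted model against a model with one
    # extra parameter (df = 1).
    stat, p = lr_test(loglik_null=-110.0, loglik_alt=-100.0, df=1)
    print(stat, p)
```

Working with maximized log-likelihoods rather than the ratio of likelihoods avoids numerical underflow when the likelihood values are very small.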
Similarly, the hypothesis of EEBLN versus LN was tested through the hypothesis system:
with the likelihood ratio statistic:
where
is the likelihood function associated with the log-normal distribution. The sample data led to
with a
p-value < 0.001. Thus, the null hypothesis was rejected at the 0.05 level. Similar reasoning based on the results of the hypothesis test and the AIC, AICC, and BIC comparison criteria allows us to conclude that both the parameters
and
are statistically significant for fitting the nickel concentration data; that is, the EEBLN distribution also fits these data better than the LN distribution. Taken together, these results indicate that the EEBLN distribution captures the high degree of asymmetry and kurtosis, in addition to the bimodality, present in the data set.
Figure 5 shows the fitted density functions and the empirical distribution function for the nickel concentration variable, which reveals that the fit of the EEBLN model is quite good.
In addition, we performed the Anderson–Darling (AD) goodness-of-fit test for the nickel concentration data. This test measures how well data follow a particular distribution (Anderson and Darling [26,27]); the better the fit, the lower the AD statistic. If the p-value of the test is lower than the specified level of significance (usually 0.05 or 0.10), it is concluded that the data do not follow the specified distribution. Therefore, the larger the p-value, the better the fit of the distribution to the data.
The hypotheses to be tested are:
Hypothesis 1 (H1). The data follow the specified distribution.
versus
Hypothesis 2 (H2). The data do not follow the specified distribution.
with the test statistic:
where
is a distribution function chosen for testing.
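For illustration, the AD statistic can be computed directly from the ordered sample. The Python sketch below assumes the standard computing formula A² = −n − (1/n) Σ (2i − 1)[log F(x_(i)) + log(1 − F(x_(n+1−i)))], where x_(1) ≤ … ≤ x_(n) is the ordered sample and F is the hypothesized cdf. It returns the statistic only; the p-value computation performed by dedicated software is not reproduced. The function name is hypothetical.

```python
import math


def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 for `data` under the hypothesized `cdf`.

    `cdf` must return values strictly between 0 and 1 on the data,
    otherwise the logarithms below are undefined.
    """
    xs = sorted(data)
    n = len(xs)
    u = [cdf(x) for x in xs]
    s = 0.0
    for i in range(1, n + 1):
        s += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - s / n


if __name__ == "__main__":
    # Illustrative data tested against a Uniform(0, 1) cdf.
    sample = [0.12, 0.35, 0.48, 0.61, 0.77, 0.93]
    print(anderson_darling(sample, lambda x: x))
```

Smaller values of A² indicate closer agreement between the empirical and hypothesized distribution functions, with most of the weight placed on the tails.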
Using the ad.test function from the goftest package in R (R Development Core Team [21]), we obtained the value of the AD statistic, as well as its corresponding p-value, for the nickel concentration data, yielding the results in Table 3.
It is clear that the EEBLN distribution presented a lower AD statistic as well as a higher p-value; therefore, it fit the nickel concentration data better compared to the other distributions considered.
5. Concluding Remarks
In this work, two new absolutely continuous probability distributions were presented to model positive bimodal data with high or low degrees of skewness and kurtosis. The new proposals, which are called the exponentiated bimodal log-normal (EBLN) and the exponentiated elliptical bimodal log-normal (EEBLN) distributions, were obtained by extending the bimodal-normal distribution and the alpha-power family. For the introduced distributions, the main properties were studied, including the probability density, cumulative distribution, survival, and hazard functions. Parameter estimates were obtained using the maximum likelihood method. It is highlighted that the expected information matrices for both distributions were non-singular.
In addition, an application was made with nickel concentration data in soil samples, and the results were compared with different existing models in the literature. The results indicate that the EBLN and EEBLN distributions showed a good fit to the aforementioned data, with the EEBLN distribution giving the best fit, which demonstrates the applicability of the proposals in the analysis of real data from different areas of knowledge.
Future work contemplates extending the proposed distributions to situations where the data under analysis are censored, and to regression models. Another field of interest is to carry out inference for these types of distributions from a Bayesian perspective.