1. Introduction
In the last two decades of the twentieth century, most inferential procedures assumed that the data were normally distributed. This assumption is unrealistic in many cases, and the resulting inferential procedures were therefore inappropriate. In such situations, many authors transformed the variables to achieve symmetry or normality. These transformations led to unsatisfactory results because the transformed variables were very difficult to interpret. Azzalini [1] introduced the skew-normal (SN) distribution, which allows asymmetric data to be modelled without any transformation. The probability density function (pdf) of the SN distribution is given by
$$f(z;\lambda) = 2\,\phi(z)\,\Phi(\lambda z), \qquad z \in \mathbb{R},$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ are the pdf and the cumulative distribution function (cdf) of the standard normal distribution, respectively. This is usually denoted by $Z \sim \mathrm{SN}(\lambda)$, where $\lambda \in \mathbb{R}$ is a shape parameter. Many works on the SN distribution have since been published, including those of Azzalini [2], Henze [3], Chiogna [4], Pewsey [5], Arellano-Valle et al. [6], DiCiccio and Monti [7], Salinas et al. [8], Rosco et al. [9], Shafiei and Doostparast [10] and Adcock and Azzalini [11].
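As a quick numerical illustration of the SN density $2\phi(z)\Phi(\lambda z)$, the following Python sketch (standard library only) evaluates it and uses the trapezoidal rule to check that it integrates to one for several values of $\lambda$. The function names and the integration range $[-12, 12]$ are illustrative choices and do not come from the cited works.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cdf Phi(z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sn_pdf(z, lam):
    """Skew-normal density f(z; lam) = 2 * phi(z) * Phi(lam * z)."""
    return 2.0 * norm_pdf(z) * norm_cdf(lam * z)


def trapezoid(f, a, b, n=24000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


for lam in (-3.0, 0.0, 2.0):
    # the mass outside [-12, 12] is negligible for these shapes
    total = trapezoid(lambda z: sn_pdf(z, lam), -12.0, 12.0)
    print(lam, round(total, 6))
```

Setting $\lambda = 0$ recovers $\phi(z)$, and the reflection $f(-z;\lambda) = f(z;-\lambda)$ follows directly from the symmetry of $\phi$.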
Mudholkar and Hutson [12] studied the epsilon-skew-normal (ESN) distribution with an asymmetry parameter $\varepsilon$, which recovers the standard normal distribution when $\varepsilon = 0$. Specifically, X has an ESN distribution if its pdf can be written as
$$f(x;\varepsilon) = \phi\!\left(\frac{x}{1 - \operatorname{sgn}(x)\,\varepsilon}\right), \qquad x \in \mathbb{R},$$
where $\operatorname{sgn}(\cdot)$ is the sign function and $|\varepsilon| < 1$. We denote this by $X \sim \mathrm{ESN}(\varepsilon)$. Mudholkar and Hutson [12] studied the properties of this distribution extensively. Arellano-Valle et al. [13] introduced a general family of epsilon-skew-symmetric (ESS) distributions, which includes the ESN distribution as a particular case. Several works have used distributions from this family. Hansen [14] applied the epsilon-skew-t distribution to economic data, and Gómez et al. [15] applied it to mining data. More recently, Celis et al. [16] introduced an epsilon-positive family of distributions based on the ESS family and applied it to data with and without censoring. Bevilacqua et al. [17] used the ESS family to model atypical data in spatial statistics.
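The piecewise form of the ESN density, $\phi\big(x/(1 - \operatorname{sgn}(x)\varepsilon)\big)$ with $|\varepsilon| < 1$, can be checked the same way. The sketch below assumes this parametrisation, with scale $1+\varepsilon$ to the left of zero and $1-\varepsilon$ to the right. Each half is a half-Gaussian, so the negative axis carries mass $(1+\varepsilon)/2$ and the positive axis carries $(1-\varepsilon)/2$. The density therefore integrates to one without any normalizing constant. The helper names are illustrative.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def esn_pdf(x, eps):
    """ESN density phi(x / (1 - sgn(x) * eps)), valid for |eps| < 1.

    For x < 0 the scale is 1 + eps; for x >= 0 it is 1 - eps
    (both branches give phi(0) at x = 0, so the density is continuous).
    """
    if not -1.0 < eps < 1.0:
        raise ValueError("eps must satisfy |eps| < 1")
    scale = 1.0 + eps if x < 0 else 1.0 - eps
    return norm_pdf(x / scale)


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


eps = 0.5
total = trapezoid(lambda x: esn_pdf(x, eps), -20.0, 20.0, 40000)
left = trapezoid(lambda x: esn_pdf(x, eps), -20.0, 0.0)
print(round(total, 6), round(left, 6))  # analytically 1 and (1 + eps) / 2 = 0.75
```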
The SN and ESN distributions are unimodal, so they are not appropriate in fields such as economics, health and engineering, where the data are often bimodal. The mixture of normal (MN) distributions is a classic choice for modelling bimodal data, but it has been criticised for identifiability problems; see, for example, McLachlan and Peel [18] and Marin et al. [19]. Despite many innovations in this area, the problem persists in many cases, which is a disadvantage when working with models of this type. Hence, researchers continue to develop new symmetric and asymmetric bimodal distributions.
Bimodal distributions have also been obtained from skew-symmetric models. Examples include Azzalini and Capitanio [20], Ma and Genton [21], Arellano-Valle et al. [13], Kim [22], Elal-Olivero et al. [23], Arnold [24], Gómez et al. [25], Hassan and El-Bassiouni [26], da Silva et al. [27], Cordeiro et al. [28], da Braga et al. [29], Altun et al. [30] and Alizadeh et al. [31], among others. For further results on the SN distribution and related families, see Azzalini's book [32].
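Kim [22] studied models of this type and introduced a bimodal extension of the SN model, named the "two-piece skew-normal model (TN)". It is denoted by $Z \sim \mathrm{TN}(\lambda)$, and its pdf can be written as
$$f(z;\lambda) = c_\lambda\,\phi(z)\,\Phi(\lambda|z|), \qquad z \in \mathbb{R}, \tag{2}$$
where $c_\lambda = 2\pi/(\pi + 2\arctan\lambda)$ and $\lambda \in \mathbb{R}$. For $\lambda > 0$, Kim [22] shows that (2) defines a bimodal pdf that is symmetric about zero.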
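The constant in Kim's TN density $c_\lambda\,\phi(z)\,\Phi(\lambda|z|)$ follows from $\int_{\mathbb{R}} \phi(z)\Phi(\lambda|z|)\,dz = 1/2 + \arctan(\lambda)/\pi$. The Python sketch below checks this normalization numerically. It also illustrates bimodality for $\lambda > 0$, since the density at zero lies below its value at $\pm 1$. The function names are hypothetical helpers for illustration.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tn_const(lam):
    """Normalizing constant c_lam = 2*pi / (pi + 2*atan(lam))."""
    return 2.0 * math.pi / (math.pi + 2.0 * math.atan(lam))


def tn_pdf(z, lam):
    """Two-piece skew-normal density c_lam * phi(z) * Phi(lam * |z|)."""
    return tn_const(lam) * norm_pdf(z) * norm_cdf(lam * abs(z))


def trapezoid(f, a, b, n=24000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


lam = 3.0
print(trapezoid(lambda z: tn_pdf(z, lam), -12.0, 12.0))  # close to 1
# For lam > 0 the density at 0 is below the density at +/-1: a dip between two modes.
print(tn_pdf(0.0, lam), tn_pdf(1.0, lam), tn_pdf(-1.0, lam))
```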
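Arnold [24] presented an asymmetric extension of Kim's model, named "the extended two-piece skew-normal model (ETN)". This asymmetric bimodal model has pdf
$$f(z;\lambda,\beta) = 2\,c_\lambda\,\phi(z)\,\Phi(\lambda|z|)\,\Phi(\beta z), \qquad z \in \mathbb{R},$$
where $\lambda, \beta \in \mathbb{R}$ and $c_\lambda = 2\pi/(\pi + 2\arctan\lambda)$ is a normalizing constant. The distribution is denoted by $\mathrm{ETN}(\lambda, \beta)$.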
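The ETN density $2c_\lambda\phi(z)\Phi(\lambda|z|)\Phi(\beta z)$ can reuse the TN constant. The reason is that $\Phi(\beta z) - 1/2$ is an odd function, and the symmetric kernel $\phi(z)\Phi(\lambda|z|)$ integrates it to zero, so the factor $\Phi(\beta z)$ contributes exactly $1/2$ to the total mass. The sketch below assumes this form of the ETN pdf. It checks the normalization for an asymmetric case and confirms that $\beta = 0$ gives back the TN density. The helper names are illustrative.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def c_lam(lam):
    """Same normalizing constant as in the TN model."""
    return 2.0 * math.pi / (math.pi + 2.0 * math.atan(lam))


def etn_pdf(z, lam, beta):
    """ETN density 2 * c_lam * phi(z) * Phi(lam * |z|) * Phi(beta * z)."""
    return 2.0 * c_lam(lam) * norm_pdf(z) * norm_cdf(lam * abs(z)) * norm_cdf(beta * z)


def trapezoid(f, a, b, n=24000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))


# Integrates to one for any beta, e.g. an asymmetric case:
print(trapezoid(lambda z: etn_pdf(z, 2.0, -1.5), -12.0, 12.0))
# With beta = 0, Phi(0) = 1/2 and the ETN density reduces to the TN density.
tn_value = c_lam(2.0) * norm_pdf(0.8) * norm_cdf(2.0 * 0.8)
print(math.isclose(etn_pdf(0.8, 2.0, 0.0), tn_value))  # True
```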
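Another model widely applied in these situations is the MN distribution, whose pdf is given by
$$f(x) = p\,\phi(x;\mu_1,\sigma_1) + (1-p)\,\phi(x;\mu_2,\sigma_2), \qquad x \in \mathbb{R},$$
where $\phi(\cdot;\mu_j,\sigma_j)$ denotes the pdf of the normal distribution with mean $\mu_j$ and standard deviation $\sigma_j$, $j = 1, 2$, and $0 < p < 1$.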
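The identifiability criticism of the MN model can be made concrete with its simplest instance, label switching. Swapping the two components and replacing $p$ by $1-p$ leaves the mixture density $p\,\phi(x;\mu_1,\sigma_1) + (1-p)\,\phi(x;\mu_2,\sigma_2)$ unchanged. The Python sketch below shows this. The argument order `(p, mu1, sigma1, mu2, sigma2)` is a hypothetical convention chosen for the example.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mn_pdf(x, p, mu1, sigma1, mu2, sigma2):
    """Two-component mixture p * N(mu1, sigma1^2) + (1 - p) * N(mu2, sigma2^2)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    return p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2)


# Label switching: relabelling the components (and replacing p by 1 - p)
# leaves the density unchanged, so the two parameter vectors below
# cannot be distinguished from data.
theta_a = (0.3, -1.0, 0.5, 2.0, 1.0)
theta_b = (0.7, 2.0, 1.0, -1.0, 0.5)
for x in (-1.0, 0.5, 2.0):
    print(x, math.isclose(mn_pdf(x, *theta_a), mn_pdf(x, *theta_b)))
```

In practice this ambiguity is usually removed with an ordering constraint such as $\mu_1 < \mu_2$. Other identifiability issues, for example near-degenerate components, are not addressed by such a constraint.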
This article is organised as follows. In Section 2, we give the pdf of the new distribution, its basic properties and its moments. In Section 3, we carry out inference by the maximum likelihood (ML) method, compute the information matrix and perform a simulation study to assess the properties of the ML estimators in finite samples. In Section 4, we fit the model to three real data sets and compare it with other distributions. In Section 5, we present some conclusions.
4. Applications
In this section, we fit the BESN distribution to three real data sets that are widely used in the literature: the roller, birthweight and nickel data sets. The first data set is unimodal, and we compare the fit with that of the normal (N) distribution. The second is symmetric and bimodal, and we compare the fit with those of the N and TN distributions. The third is asymmetric and bimodal, and we compare the fit with those of the SN, ETN and MN distributions. To compare the models, we use the Akaike information criterion (AIC; see Akaike [34]) and the Bayesian information criterion (BIC; see Schwarz [35]). The preferred model is the one with the smallest AIC and/or BIC.
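For reference, let $\hat\ell$ be the maximised log-likelihood, $k$ the number of estimated parameters and $n$ the sample size. Then $\mathrm{AIC} = -2\hat\ell + 2k$ and $\mathrm{BIC} = -2\hat\ell + k\log n$. The Python sketch below computes both criteria and picks the minimiser. The log-likelihood values are hypothetical placeholders and are not results from this paper.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)


def best_model(fits, n):
    """fits maps a model name to (maximised log-likelihood, number of parameters).

    Returns the names of the models with the smallest AIC and the smallest BIC.
    """
    by_aic = min(fits, key=lambda m: aic(*fits[m]))
    by_bic = min(fits, key=lambda m: bic(fits[m][0], fits[m][1], n))
    return by_aic, by_bic


# Hypothetical log-likelihoods, for illustration only (not the paper's results).
fits = {"model_1": (-520.4, 2), "model_2": (-505.1, 4)}
print(best_model(fits, n=500))
```

BIC penalises each extra parameter by $\log n$ rather than 2, so for $n \ge 8$ it favours smaller models more strongly than AIC.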
4.1. First Application: Roller Data
In this first application, we use a data set of 1150 heights measured at 1-micron intervals along the drum of a roller (i.e., parallel to the axis of the roller). The data were collected as part of an extensive study of the surface roughness of the roller and are available for download at http://lib.stat.cmu.edu/jasadata/laslett (accessed on 5 November 2022). Summary statistics for the data set are presented in Table 4.
Given the sample skewness, …, and the sample kurtosis, …, there is strong evidence that an asymmetric model could fit these data better than the normal model. Therefore, we fit the N and BESN distributions to the data set.
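For a normal sample, skewness is close to 0 and kurtosis is close to 3. The Python sketch below assumes the common moment-based definitions $\sqrt{b_1} = m_3/m_2^{3/2}$ and $b_2 = m_4/m_2^2$, where $m_r$ is the $r$-th sample central moment. The exact convention behind Table 4 is not stated here, so this is an assumption.

```python
def sample_skew_kurt(data):
    """Moment-based sample skewness sqrt(b1) = m3 / m2**1.5 and kurtosis b2 = m4 / m2**2,
    where m_r is the r-th central moment computed with divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


print(sample_skew_kurt([1, 2, 3, 4, 5]))  # symmetric data: skewness 0.0, kurtosis 1.7
```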
The ML estimates for each model (N and BESN), with standard errors (SE) in parentheses, are … and … for the N distribution, with AIC = … and BIC = …. For the BESN distribution, they are … and …, with AIC = … and BIC = ….
According to both AIC and BIC, the BESN distribution fits the roller data set better than the N distribution. In other words, the BESN distribution captures the skewness and kurtosis of the data, which the N distribution does not fit adequately. The qq-plots for the variable roller under the N and BESN distributions are shown in Figure 3a,b.
Figure 3c shows the empirical cdf for the variable roller (solid line), while the dotted line corresponds to the cdf for the BESN model. The results suggest a better fit for the BESN model.
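Figure 3c compares the empirical and fitted cdfs visually. A common numerical complement is the Kolmogorov-Smirnov distance $\sup_x |F_n(x) - F(x)|$, where a smaller value means the fitted cdf tracks the empirical cdf more closely. The text does not use this distance; it is shown here only as an illustration. The Python sketch below computes it for any fitted cdf passed as a function. The sample values are hypothetical and are not the roller data.

```python
import math


def ks_distance(data, cdf):
    """Kolmogorov-Smirnov distance sup_x |F_n(x) - F(x)| between the
    empirical cdf F_n of `data` and a fitted continuous cdf F."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at x, so check both sides of the jump
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Illustrative values only (not the roller data).
sample = [0.12, -0.40, 0.95, 0.33, -1.10, 0.58, 0.07, 1.42]
print(ks_distance(sample, lambda x: normal_cdf(x, 0.0, 1.0)))
```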
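In addition, we compare the N and BESN distributions by means of a hypothesis test. Specifically, we consider the hypotheses
which can be tested using the statistic
After numerical evaluation, we obtain …. This value exceeds the 5% critical value of the chi-squared distribution with two degrees of freedom, namely $\chi^2_{2;0.95} = 5.99$. Hence, the null hypothesis is rejected at the 5% significance level, which supports the BESN model over the N model.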
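The comparison with a chi-squared distribution with two degrees of freedom suggests a likelihood ratio test. The two degrees of freedom presumably correspond to the two extra parameters of BESN relative to N. The Python sketch below assumes the statistic is $-2\log\Lambda = 2(\hat\ell_{\mathrm{BESN}} - \hat\ell_{\mathrm{N}})$. For 2 degrees of freedom, the critical value has the closed form $-2\log\alpha$, so no statistics library is needed. The log-likelihood values are hypothetical.

```python
import math


def lr_statistic(loglik_null, loglik_alt):
    """-2 log(L0 / L1) = 2 * (l1 - l0) for maximised log-likelihoods l0 <= l1."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_critical_df2(alpha=0.05):
    """Upper-alpha critical value of the chi-squared distribution with 2 df.

    A chi-squared variable with 2 df is exponential with mean 2, so
    P(X > c) = exp(-c / 2) = alpha gives c = -2 log(alpha).
    """
    return -2.0 * math.log(alpha)


crit = chi2_critical_df2(0.05)
print(round(crit, 3))  # 5.991

# Hypothetical maximised log-likelihoods, for illustration only.
stat = lr_statistic(loglik_null=-1200.0, loglik_alt=-1190.0)
print(stat, stat > crit)  # 20.0 True
```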
4.2. Second Application: Birthweight Data
In the second application, we study the fit of the BESN model to 500 observations of the variable Z = b.weight, the ultrasound weight (birthweight in grams). These data are available as supplementary material. Summary statistics for the data set are presented in Table 5.
Given the symmetry of these data, we fit a BESN model with the asymmetry parameter set to zero (the symmetric case) and then compare it with the fits of the N and TN models.
We use AIC and BIC to compare the fits of the N, TN and BESN models. According to these criteria (see Table 6 and Figure 4a), the BESN model fits better than the N and TN models.
Figure 4b,c show the qq-plot for the BESN model and the empirical cdf, respectively. The plots suggest that the BESN model fits better than its competitors.
4.3. Third Application: Nickel Data
In this third application, we use a data set of the logarithm of the nickel content in soil samples analysed at the Mines Department of Universidad de Atacama, Chile. The data are available as supplementary material. We compare the fits of the SN, BESN, ETN and MN models to this data set.
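The pdf of the MN model can be written as
$$f(y;\mu_1,\mu_2,\sigma_1,\sigma_2,p) = p\,\phi(y;\mu_1,\sigma_1) + (1-p)\,\phi(y;\mu_2,\sigma_2), \qquad y \in \mathbb{R},$$
with parameters $\mu_1, \mu_2 \in \mathbb{R}$, $\sigma_1, \sigma_2 > 0$ and $0 < p < 1$. We denote this model by $\mathrm{MN}(\mu_1,\mu_2,\sigma_1,\sigma_2,p)$.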
The comparisons are made using the AIC and BIC for the variable Y, the logarithm of the nickel concentration. In all cases, the parameters are estimated by ML using the bbmle package of the R software [36]. The SEs of the ML estimates are computed from the observed information matrix of each model.
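The authors compute SEs with the R package bbmle. As an independent illustration of the same principle (not the authors' code), the Python sketch below builds the observed information matrix as a central finite-difference Hessian of the negative log-likelihood at the ML estimate. It then inverts this matrix and takes square roots of the diagonal. The normal model is used as a test case because the exact answers are known: $\mathrm{SE}(\hat\mu) = \hat\sigma/\sqrt{n}$ and $\mathrm{SE}(\hat\sigma) = \hat\sigma/\sqrt{2n}$. The data values and helper names are hypothetical.

```python
import math


def normal_nll(theta, data):
    """Negative log-likelihood of N(mu, sigma^2), with theta = (mu, sigma)."""
    mu, sigma = theta
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return n * math.log(sigma) + ss / (2.0 * sigma ** 2) + 0.5 * n * math.log(2.0 * math.pi)


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]

    def shifted(i, di, j, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return f(t)

    for i in range(k):
        for j in range(k):
            hess[i][j] = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                          - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
    return hess


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Illustrative data only (not from the paper).
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.5, 4.4]
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)  # ML estimate (divisor n)

observed_info = numerical_hessian(lambda t: normal_nll(t, data), [mu_hat, sigma_hat])
cov = inverse_2x2(observed_info)
se_mu, se_sigma = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])

print(se_mu, sigma_hat / math.sqrt(n))  # numerical vs closed form
print(se_sigma, sigma_hat / math.sqrt(2 * n))
```

For the models with more parameters in Table 7, the same idea applies with a general matrix inverse in place of the 2 × 2 helper.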
Table 7 gives the estimated parameters, with the respective SEs in parentheses, and the AIC and BIC for the SN, ETN, BESN and MN models. Figure 5 shows that the BESN model provides quite a good fit.
In all cases, the models are augmented with location and scale parameters. Since the models are not nested, we use AIC and BIC to compare the distributions. According to these criteria, the BESN model provides the best fit to these data. Hence, the BESN model appears to be a useful alternative for modelling the logarithm of the nickel concentration.
In conclusion, the BESN model appears more appropriate than its competitors for the particular data sets analysed here. Figure 5 illustrates this in more detail by displaying the histograms and the fitted curves for the data sets.