*4.1. Nickel Concentration*

For illustrative purposes, we apply the FBS model to a data set on nickel content in soil samples. The data set comprises 85 observations of nickel concentration, with sample mean 21.588, sample standard deviation 16.573, sample skewness 2.392 and sample kurtosis 8.325, much higher than expected under the ordinary BS distribution.

### 4.1.1. FBS versus the BS and SBS Distributions

To fit the nickel concentration variable, we use the BS, skew BS (SBS) and FBS models. Using the function optim in R [28], the following point estimates (with standard errors in parentheses) are obtained for each of the three models under consideration:

BS model: $\hat{\alpha} = 0.789$ (0.060) and $\hat{\beta} = 16.382$ (1.296).

SBS model: $\hat{\alpha} = 1.073$ (0.201), $\hat{\beta} = 8.841$ (1.998) and $\hat{\lambda} = 1.252$ (0.590).

FBS model: $\hat{\alpha} = 0.870$ (0.104), $\hat{\beta} = 5.072$ (0.763), $\hat{\delta} = -1.520$ (0.282) and $\hat{\lambda} = 1.405$ (0.341).

The bimodal hypothesis can be formally tested as follows:

$$H_0: \delta = 0 \quad \text{versus} \quad H_1: \delta \neq 0,$$

which is equivalent to comparing the SBS model with the FBS model. Given the nonsingularity of the Fisher information matrix, and since these models are nested, we can consider the likelihood ratio statistic, namely

$$
\Lambda_1 = L_{SBS}(\widehat{\alpha}, \widehat{\beta}, \widehat{\lambda}) / L_{FBS}(\widehat{\alpha}, \widehat{\beta}, \widehat{\delta}, \widehat{\lambda}).
$$

We obtain $-2\log(\Lambda_1) = 5.618$, which exceeds 3.84, the 5% chi-square critical value with one degree of freedom (df). Therefore, the null hypothesis of no bimodality is rejected at the 5% significance level, leading to the conclusion that the FBS model fits the nickel concentration data better than the unimodal SBS model.

To compare the FBS model with the BS model, we test the null hypothesis of a BS distribution against an FBS distribution, that is,

$$H_0: \ (\delta, \lambda) = (0, 0) \quad \text{versus} \quad H_1: \ (\delta, \lambda) \neq (0, 0)$$

using the likelihood ratio statistic based on the ratio $\Lambda_2 = L_{BS}(\alpha, \beta)/L_{FBS}(\alpha, \beta, \delta, \lambda)$. After substituting the estimated values, we obtain $-2\log(\Lambda_2) = 7.628$, which exceeds 5.99, the 5% chi-square critical value with 2 df. Therefore, the FBS model is preferred to the BS model for this data set.
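The two tests above compare a reported statistic with a chi-square critical value. As a minimal sketch (not the authors' code), the corresponding p-values can be obtained from closed-form chi-square tail probabilities, which exist for 1 and 2 df. The function name `chi2_sf` and the variable names are illustrative assumptions; only the statistics 5.618 and 7.628 come from the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df = 1 and df = 2 are handled, since both have closed forms:
      df = 1: P(Z^2 > x) = erfc(sqrt(x / 2))
      df = 2: exponential with mean 2, so P(X > x) = exp(-x / 2)
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Statistics -2 log(Lambda) as reported in the text.
lrt_sbs_vs_fbs = 5.618  # H0: delta = 0, 1 df
lrt_bs_vs_fbs = 7.628   # H0: (delta, lambda) = (0, 0), 2 df

p_sbs_vs_fbs = chi2_sf(lrt_sbs_vs_fbs, df=1)
p_bs_vs_fbs = chi2_sf(lrt_bs_vs_fbs, df=2)

print(f"SBS vs FBS: p = {p_sbs_vs_fbs:.4f}")
print(f"BS  vs FBS: p = {p_bs_vs_fbs:.4f}")
```

Both p-values fall below 0.05, in agreement with the rejections stated above. For other degrees of freedom a statistical library would be needed instead of these closed forms.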

### 4.1.2. FBS versus a Mixture of Normal Distributions

Another model widely applied in such situations of bimodality is the mixture of two normal distributions. The normal mixture model is given by:

$$f(x; \mu_1, \sigma_1, \mu_2, \sigma_2, p) = p f_1(x; \mu_1, \sigma_1) + (1 - p) f_2(x; \mu_2, \sigma_2) \tag{23}$$

where $f_j$ is the pdf of a normal distribution with parameters $(\mu_j, \sigma_j)$, $j = 1, 2$, and $0 < p < 1$. The model (23) is denoted by $\text{MN}(\mu_1, \sigma_1, \mu_2, \sigma_2, p)$.
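The mixture density (23) is straightforward to evaluate numerically. The following is a minimal sketch, not taken from the paper. The function names and the parameter values in `example_params` are hypothetical and chosen only for illustration. The numerical integration is a sanity check that the mixture is a proper density.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, mu1, sigma1, mu2, sigma2, p):
    """Two-component normal mixture density, as in equation (23)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p * normal_pdf(x, mu1, sigma1) + (1.0 - p) * normal_pdf(x, mu2, sigma2)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# Hypothetical parameters (mu1, sigma1, mu2, sigma2, p), for illustration only.
example_params = (10.0, 2.0, 30.0, 5.0, 0.6)

# The density should integrate to (approximately) one over a wide interval.
total_mass = trapezoid(lambda x: mixture_pdf(x, *example_params), -50.0, 100.0, 20000)
print(f"integral of the mixture density: {total_mass:.6f}")
```

Replacing `normal_pdf` by a log-normal density gives the analogous log-normal mixture without changing `mixture_pdf`.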

To compare the FBS model with the MN model, we use the Akaike information criterion (AIC), see [31], namely $AIC = -2\hat{\ell}(\cdot) + 2k$; the modified AIC criterion (CAIC), typically called the consistent AIC, namely $CAIC = -2\hat{\ell}(\cdot) + (1 + \log(n))k$; and the Bayesian information criterion (BIC), $BIC = -2\hat{\ell}(\cdot) + \log(n)k$, where $k$ is the number of parameters, $n$ is the sample size and $\hat{\ell}(\cdot)$ is the log-likelihood function evaluated at the MLEs of the parameters. The best model is the one with the smallest AIC, CAIC or BIC.

Now we compare the FBS model with $\text{MN}(\mu_1, \sigma_1, \mu_2, \sigma_2, p)$. The estimated mixture model is

$$\text{MN(15.348, 6.622, 40.908, 21.960, 0.755)}$$

with *AIC* = 674.849, *CAIC* = 692.061 and *BIC* = 687.062. On the other hand, for the FBS model, we have *AIC* = 671.859, *CAIC* = 685.628 and *BIC* = 681.630. According to these criteria, the FBS model provides a better fit to the nickel concentration data.
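The three criteria differ only in the penalty applied to the same maximized log-likelihood, so they can be cross-checked against one another. The sketch below is illustrative rather than the authors' code. It assumes $n = 85$, $k = 4$ for the FBS model and $k = 5$ for the MN model, and it backs the log-likelihoods out of the reported AIC values, since the log-likelihoods themselves are not reported in the text. The function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, CAIC, BIC) for maximized log-likelihood loglik,
    k estimated parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    caic = -2.0 * loglik + (1.0 + math.log(n)) * k
    bic = -2.0 * loglik + math.log(n) * k
    return aic, caic, bic


n = 85  # sample size of the nickel data set

# Assumption: log-likelihoods recovered from the reported AIC values,
# using loglik = -(AIC - 2k) / 2.
loglik_fbs = -(671.859 - 2 * 4) / 2.0  # FBS: alpha, beta, delta, lambda
loglik_mn = -(674.849 - 2 * 5) / 2.0   # MN: mu1, sigma1, mu2, sigma2, p

aic_fbs, caic_fbs, bic_fbs = information_criteria(loglik_fbs, k=4, n=n)
aic_mn, caic_mn, bic_mn = information_criteria(loglik_mn, k=5, n=n)

print(f"FBS: AIC = {aic_fbs:.3f}, CAIC = {caic_fbs:.3f}, BIC = {bic_fbs:.3f}")
print(f"MN : AIC = {aic_mn:.3f}, CAIC = {caic_mn:.3f}, BIC = {bic_mn:.3f}")
```

Under these assumptions the recomputed CAIC and BIC agree with the reported values up to rounding in the third decimal place, and the FBS model is smaller on all three criteria.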

### 4.1.3. FBS versus a Mixture of Log-Normal Distributions

Following a reviewer's recommendations, a mixture of two log-normal distributions is also considered. The log-normal mixture model is given by (23) with $f_j$ the pdf of a log-normal distribution with parameters $(\mu_j, \sigma_j)$, $j = 1, 2$, and $0 < p < 1$, and it is denoted by $\text{MLN}(\mu_1, \sigma_1, \mu_2, \sigma_2, p)$. The estimated mixture model is

$$\text{MLN}(2.829, \, 0.177, \, 2.8275, \, 0.877, \, 0.327)$$

with *AIC* = 663.571, *CAIC* = 680.784 and *BIC* = 675.784. All of these are smaller than the corresponding values for the FBS model. So, according to these criteria, the mixture of two log-normal distributions provides a better fit to this data set than the FBS model.

This discussion illustrates that, quite often, the final selection of a model is a matter of choice. The FBS model can be considered appropriate if we want a more parsimonious model, and it fits better than the other BS models and the mixture of two normal distributions. On the other hand, based on AIC, CAIC and BIC, the mixture of two log-normal distributions would be preferred, but this model has more parameters than the FBS distribution and may present identifiability problems. In any case, the final choice must be properly justified.

Figure 4 depicts the fitted models and the empirical cdf for the nickel concentration variable, revealing that the FBS fit is quite good.

**Figure 4.** (**a**) Plots for the FBS (solid line), MLN (dashed line), BS (dotted line) and SBS (dotted and dashed line) models. (**b**) Empirical cdf with estimated FBS cdf (dashed line) and estimated BS cdf (dotted line).

**Remark 3.** *Looking into the origin of this data set, the bimodal behavior of the nickel concentration seems to be due to the fact that the samples were taken from different lithologies. Lithology classifies rock formations according to their physical and chemical elements. Mining operations found different lithologies in these samples, as depicted in Figure 4.*
