**1. Introduction**

Continuous data encountered in practice are frequently bimodal and cannot be modeled adequately by the usual unimodal distributions. It is therefore of interest to investigate distributions that are flexible in their number of modes and that will be useful to practitioners in different fields.

In unimodal distributions, the flexibility is based on the asymmetry and kurtosis of the data. In this context, Azzalini [1] introduced the skew-normal (SN) distribution, with asymmetry parameter *λ*. It has a probability density function (pdf) given by

$$f\left(y;\mu,\sigma,\lambda\right) = \frac{2}{\sigma} \phi\left(\frac{y-\mu}{\sigma}\right) \Phi\left(\lambda \frac{(y-\mu)}{\sigma}\right), \qquad y,\mu,\lambda \in \mathbb{R}, \sigma > 0,\tag{1}$$

where *φ* and Φ denote, respectively, the density and cumulative distribution functions of the *N* (0, 1) distribution. This is denoted as *Y* ∼ *SN*(*λ*). SN(0) becomes the standard normal distribution.
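As a quick illustration of Equation (1), the following minimal Python sketch evaluates the SN density using only the standard library. The function names (`norm_pdf`, `norm_cdf`, `sn_pdf`) are chosen here for illustration and do not come from the cited references.

```python
import math


def norm_pdf(x):
    """Density of the N(0, 1) distribution (phi in Equation (1))."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Cumulative distribution function of N(0, 1) (Phi in Equation (1))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sn_pdf(y, mu=0.0, sigma=1.0, lam=0.0):
    """Skew-normal density of Equation (1); lam is the asymmetry parameter."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return (2.0 / sigma) * norm_pdf(z) * norm_cdf(lam * z)


if __name__ == "__main__":
    # With lam = 0 the two values coincide; lam > 0 moves mass to the right.
    for lam in (0.0, 3.0):
        print(lam, sn_pdf(1.0, lam=lam), sn_pdf(-1.0, lam=lam))
```

Setting `lam=0.0` makes the factor Φ(·) equal to 1/2, so the expression reduces to the normal density, in line with the remark that SN(0) is the standard normal distribution.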

Bimodal distributions generated from skew distributions can be found in Ma and Genton [2], Kim [3], Lin et al. [4,5], Elal-Olivero et al. [6], Arnold et al. [7], Arnold et al. [8], and Venegas et al. [9], among others. The importance of studying these distributions is based on the fact that they do not have identifiability problems and can be used as alternative parametric models to replace the use of mixtures of distributions that present estimation problems from either the classical or the Bayesian point of view (see McLachlan and Peel [10]; Marin et al. [11]). One difficulty with these distributions is that in general, there is no closed-form expression for their cumulative distribution function (cdf). This makes it more difficult to generate data from these distributions for simulation studies or to carry out quantile regression. Additionally, many such bimodal distributions have complicated expressions for a general quantile (say, the *q*-th).

A variety of bimodal data sets and appropriate models have been presented by many authors. For example, Cobb et al. [12] used the quartic exponential density presented by Fisher [13] to model crude birth rates data; Rao et al. [14] used a bimodal distribution to analyze fish length data; Famoye et al. [15] used the beta-normal distribution to analyze egg diameter data; Everitt and Hand [16] discussed some mixture distributions for modeling bimodal data; Chatterjee et al. [17] and Weisberg [18] presented two bimodal data sets on the eruption and interruption times of the Old Faithful geyser; Bansal et al. [19] discussed the bimodality of quantum dot size distribution; Famoye et al. [15] cited a variety of bimodal distributions that arise from different areas of science. On the other hand, the sinh Cauchy (SC) distribution is given by

$$f(z; \Lambda) = \frac{\lambda \cosh(z)}{\sigma \pi (1 + \{\lambda \sinh(z)\}^2)},$$

where Λ = (*λ*, *μ*, *σ*), *z* = (*y* − *μ*)/*σ*, *z* ∈ ℝ, *μ* ∈ ℝ is a location parameter, *σ* > 0 is a scale parameter, and *λ* > 0 is a symmetric parameter. The SC distribution produces unimodal and bimodal densities. The disadvantage of the SC distribution is that it is symmetric, which limits it to modeling only symmetric bimodal data. The main objective of this article is therefore to study a bimodal skew-symmetric model with a closed-form cdf, in order to apply it to quantile regression. To do this, we use an extension of the SC distribution that we call the gamma–sinh Cauchy (GSC) distribution, which is flexible in its modes and also has a closed-form expression for its cdf. The GSC distribution belongs to the gamma-G generator family introduced by Zografos and Balakrishnan [20]. For any baseline cdf *G*(*y*; Λ), *y* ∈ ℝ, they defined the gamma-G generator by the pdf and cdf given by

$$f(y; \phi, \Lambda) = \frac{g(y; \Lambda)}{\Gamma(\phi)} \left\{-\log\left[1 - G(y; \Lambda)\right]\right\}^{\phi - 1},\tag{2}$$

and

$$F(y; \phi, \Lambda) = \frac{\gamma (-\log[1 - G(y; \Lambda)], \phi)}{\Gamma(\phi)} = \frac{1}{\Gamma(\phi)} \int_0^{-\log[1 - G(y; \Lambda)]} u^{\phi - 1} e^{-u} \, du,\tag{3}$$

respectively, where *φ* > 0 is a skewness parameter, Λ is a vector of parameters, $g(y) = \frac{d}{dy} G(y)$, $\gamma(y, a) = \int_0^{y} t^{a-1} e^{-t} \, dt$ is the incomplete gamma function, and Γ(*a*) = *γ*(+∞, *a*) is the usual gamma function. We remark that in the literature, there are many models that can accommodate bimodal distributions. However, in only a few of them do the parameters have an interpretation in terms of measures of central tendency (the mean or the median, for instance) or a general *q*-th quantile. As we will show in Section 2, the main advantage of the GSC is that the location parameter represents the respective *q*-th quantile under a certain restriction on *φ*, which is very convenient for the use of this model in a quantile regression framework.
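One practical consequence of Equation (3) is a simple sampling recipe: if *W* follows a gamma distribution with shape *φ* and scale 1, then *G*⁻¹(1 − e^(−*W*)) has cdf (3), because P(*W* ≤ −log[1 − *G*(*y*)]) is exactly the right-hand side of (3). This observation is a derivation added here, not a procedure stated in the text. The sketch below applies it with the SC baseline, whose cdf is taken to be 1/2 + arctan{*λ* sinh(*z*)}/π (the form implied by Equations (4) and (5)). The function names are hypothetical.

```python
import math
import random


def sc_quantile(u, lam, mu=0.0, sigma=1.0):
    """Quantile function of the SC baseline: solves G(y) = u for y.

    Assumes G(y) = 1/2 + arctan(lam * sinh(z)) / pi with z = (y - mu) / sigma.
    """
    return mu + sigma * math.asinh(math.tan(math.pi * (u - 0.5)) / lam)


def sample_gamma_g(n, phi, baseline_quantile, rng=None):
    """Draw n values from the gamma-G family for a given baseline quantile."""
    rng = rng or random.Random()
    draws = []
    for _ in range(n):
        w = rng.gammavariate(phi, 1.0)  # W ~ Gamma(shape=phi, scale=1)
        u = -math.expm1(-w)             # u = 1 - exp(-W), a value in [0, 1]
        draws.append(baseline_quantile(u))
    return draws


if __name__ == "__main__":
    rng = random.Random(2024)
    ys = sample_gamma_g(5, 2.0, lambda u: sc_quantile(u, lam=0.5), rng)
    print(ys)
```

The baseline is passed as a function so that the same sampler can be reused with any other cdf *G* whose inverse is available.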

The paper is organized as follows. Section 2 develops the GSC distribution, its basic properties, and quantile regression. In Section 3, we perform a small-scale simulation study of the maximum likelihood (ML) estimators for parameters. Two applications to real data are discussed in Section 4, which illustrate the usefulness of the proposed model. Finally, conclusions are given in Section 5.

### **2. Gamma–Sinh Cauchy Distribution**

The GSC distribution is obtained considering *G* in (2) as the cdf of the SC distribution. The pdf can be written as

$$f(z; \Theta) = \frac{\lambda \cosh(z)}{\sigma \pi \Gamma(\phi) (1 + \{\lambda \sinh(z)\}^2)} \left\{-\log\left[0.5 - \frac{1}{\pi} \arctan\left\{\lambda \sinh(z)\right\}\right] \right\}^{\phi - 1},\tag{4}$$

where Θ = (*φ*, *λ*, *μ*, *σ*), and *φ* > 0 is an asymmetry parameter. We denote this by *Z* ∼ *GSC*(*φ*, *λ*, *μ*, *σ*). The cdf is given by

$$F(z; \phi, \lambda, \mu, \sigma) = \frac{1}{\Gamma(\phi)} \gamma \left( -\log \left[ 0.5 - \frac{1}{\pi} \arctan \{ \lambda \sinh(z) \} \right], \phi \right). \tag{5}$$
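Equations (4) and (5) can be evaluated with the standard library alone. The sketch below is one possible implementation under two assumptions that are not part of the text: the regularized incomplete gamma function is computed from its power series, and the quantity −log[0.5 − arctan(*t*)/π] is computed through `atan2` to avoid cancellation in the tails. All function names are illustrative.

```python
import math


def neg_log_survival(t):
    """Return -log[1/2 - arctan(t)/pi] without catastrophic cancellation."""
    if t > 0.0:
        # 1/2 - arctan(t)/pi equals atan2(1, t)/pi
        return -math.log(math.atan2(1.0, t) / math.pi)
    # 1/2 + arctan(t)/pi equals atan2(1, -t)/pi
    return -math.log1p(-math.atan2(1.0, -t) / math.pi)


def reg_lower_gamma(a, x, tol=1e-15, max_terms=100000):
    """Regularized incomplete gamma gamma(x, a) / Gamma(a) via its series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def gsc_pdf(y, phi, lam, mu=0.0, sigma=1.0):
    """GSC density, Equation (4)."""
    z = (y - mu) / sigma
    t = lam * math.sinh(z)
    sc = lam * math.cosh(z) / (sigma * math.pi * (1.0 + t * t))
    return sc * neg_log_survival(t) ** (phi - 1.0) / math.gamma(phi)


def gsc_cdf(y, phi, lam, mu=0.0, sigma=1.0):
    """GSC cumulative distribution function, Equation (5)."""
    z = (y - mu) / sigma
    return reg_lower_gamma(phi, neg_log_survival(lam * math.sinh(z)))


if __name__ == "__main__":
    print(gsc_pdf(0.5, phi=2.0, lam=0.3), gsc_cdf(0.5, phi=2.0, lam=0.3))
```

The series is adequate for moderate arguments; a production implementation would more likely rely on a numerical library routine for the incomplete gamma function.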

Particular cases:


The following proposition states conditions for the symmetry of the GSC distribution.

**Proposition 1.** *The density of the GSC*(*φ*, *λ*, *μ*, *σ*) *model is symmetric if and only if φ* = 1*.*

**Proof.** Without loss of generality, we consider *μ* = 0 and *σ* = 1. For *φ* = 1, the density of the model is

$$f(y;1,\lambda,0,1) = \frac{\lambda \cosh(y)}{\pi (1 + \{\lambda \sinh(y)\}^2)}.$$

This function is clearly even because cosh(*y*) and sinh(*y*)<sup>2</sup> are even. To prove the converse, we argue by contradiction. Suppose there is *φ*<sub>0</sub> ≠ 1 such that the density is symmetric, i.e., *f*(*y*; *φ*<sub>0</sub>, *λ*, 0, 1) = *f*(−*y*; *φ*<sub>0</sub>, *λ*, 0, 1), ∀*y* ∈ ℝ, ∀*λ* > 0. This implies that

$$\left(\frac{\log\left[\frac{1}{2} + \frac{1}{\pi}\arctan\left\{\lambda\sinh(y)\right\}\right]}{\log\left[\frac{1}{2} - \frac{1}{\pi}\arctan\left\{\lambda\sinh(y)\right\}\right]}\right)^{\phi_0 - 1} = 1.$$

Both logarithms are negative, so the ratio is positive, and because *φ*<sub>0</sub> ≠ 1 the ratio itself must equal 1. From this equality, jointly with the fact that the logarithmic function is injective, we find that arctan(*λ* sinh(*y*)) = 0, ∀*y* ∈ ℝ, which implies that *λ* = 0, producing a contradiction.

The unimodal and bimodal regions for *GSC*(*φ*, *λ*, *μ*, *σ*) are illustrated in Figure 1. We can see that for all *φ*, there is *λ* such that *GSC* is bimodal. Figure 2 shows the density function for some values of the parameters *φ* and *λ*, considering the location and scale parameters fixed at 0 and 1, respectively. The distribution assumes symmetric unimodal and bimodal shapes and asymmetric unimodal and bimodal shapes. Figure 3 shows the skewness and kurtosis coefficients for the GSC model under different values of *λ* and *φ* (such coefficients do not depend on *μ* and *σ*). As illustrated previously, the model can assume positive and negative values for the skewness coefficient and can also accommodate kurtosis coefficients lower than, equal to, and greater than the normal model (<3, =3 and >3, respectively).
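The unimodal and bimodal regions described above can be explored numerically. The following sketch counts the local maxima of the standardized GSC density (*μ* = 0, *σ* = 1) on a grid. The grid range, the grid size, and the function names are arbitrary choices made here for illustration, and a grid search is only a heuristic for the number of modes, not the method used to produce Figure 1.

```python
import math


def neg_log_survival(t):
    """Return -log[1/2 - arctan(t)/pi] in a numerically stable way."""
    if t > 0.0:
        return -math.log(math.atan2(1.0, t) / math.pi)
    return -math.log1p(-math.atan2(1.0, -t) / math.pi)


def gsc_pdf_std(z, phi, lam):
    """Standardized GSC density, Equation (4) with mu = 0 and sigma = 1."""
    t = lam * math.sinh(z)
    sc = lam * math.cosh(z) / (math.pi * (1.0 + t * t))
    return sc * neg_log_survival(t) ** (phi - 1.0) / math.gamma(phi)


def count_modes(phi, lam, lo=-15.0, hi=15.0, n=3001):
    """Count local maxima of the density on an equally spaced grid."""
    step = (hi - lo) / (n - 1)
    f = [gsc_pdf_std(lo + i * step, phi, lam) for i in range(n)]
    return sum(
        1
        for i in range(1, n - 1)
        if f[i] > f[i - 1] and f[i] >= f[i + 1]
    )


if __name__ == "__main__":
    # Compare a small and a larger lam in the symmetric case phi = 1.
    for lam in (0.2, 1.0):
        print(lam, count_modes(1.0, lam))
```

Scanning `count_modes` over a grid of (*φ*, *λ*) pairs gives a rough picture of the two regions, subject to the resolution and range of the grid.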

**Figure 1.** Unimodal and bimodal regions for *GSC*(*φ*, *λ*, *μ*, *σ*).

**Figure 2.** Plots for the gamma–sinh Cauchy (GSC) model for different values of the parameters with *μ* = 0, *σ* = 1: (**a**) *φ* = 1, (**b**) *λ* = 0.5, and (**c**,**d**) *λ* = 0.2.

**Figure 3.** Skewness and kurtosis coefficients for the GSC (*φ*, *λ*, *μ*, *σ*) model with different values for *λ* and *φ*.

### *The GSC Model for Quantile Regression*

From Equation (5), it follows that the cdf of the GSC distribution evaluated at *μ* is given by

$$P(Y \le \mu; \phi, \lambda, \mu, \sigma) = F(\mu; \phi, \lambda, \mu, \sigma) = \frac{\gamma(\log(2), \phi)}{\Gamma(\phi)} = G\left(\log(2), \phi\right),\tag{6}$$

where *G*(*y*, *a*) = *γ*(*y*, *a*)/Γ(*a*) corresponds to the cdf of the gamma distribution with shape and scale parameters *a* and 1, respectively. Note that *G*(log(2), *φ*) depends only on *φ* (and not on *σ* or *λ*). As *G*(log(2), *φ*) is a continuous, strictly decreasing function of *φ* that falls from 1 (as *φ* → 0<sup>+</sup>) to 0 (as *φ* → ∞), the equation *G*(log(2), *φ*) = *q* has a unique solution for each *q* ∈ (0, 1), which can also be written as

$$\frac{1}{\Gamma(\phi)} \int_0^{\log(2)} u^{\phi - 1} e^{-u} du = q. \tag{7}$$

Equation (7) can be solved numerically. For instance, in R the uniroot function can be used. Table 1 shows some values for *φ*(*q*) with different values for *q*.
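As an alternative illustration to the R route mentioned above, the sketch below solves Equation (7) in Python by bisection. It assumes a series evaluation of the regularized incomplete gamma function and uses the fact that the left-hand side of (7) is monotone in *φ*. The names `reg_lower_gamma` and `phi_of_q` are hypothetical, and the code is not the routine used to produce Table 1.

```python
import math

LOG2 = math.log(2.0)


def reg_lower_gamma(a, x, tol=1e-15, max_terms=100000):
    """Regularized incomplete gamma gamma(x, a) / Gamma(a) via its series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(-x + a * math.log(x) - math.lgamma(a)))


def phi_of_q(q, tol=1e-12):
    """Solve G(log 2, phi) = q for phi by bisection (Equation (7))."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lo, hi = 1e-12, 1.0
    # The left-hand side of (7) decreases in phi, so grow hi until it is <= q.
    while reg_lower_gamma(hi, LOG2) > q:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(mid, LOG2) > q:
            lo = mid  # value still too large: the root lies at larger phi
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for q in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(q, phi_of_q(q))
```

Bisection is chosen for transparency; any bracketing root finder would serve equally well. Values of *q* extremely close to 1 fall outside the initial bracket and would need a smaller lower bound.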

**Table 1.** Some values for *φ*(*q*) in terms of *q*.


For this reason, for a fixed *q*, if we take *φ* = *φ*(*q*) satisfying (7), by (6) the parameter *μ* directly represents the *q*-th quantile, allowing regression to be performed conveniently through *μ*. Under this setting, a set of *p* available covariates, say **x**<sub>*i*</sub> = (*x*<sub>*i*1</sub>, ..., *x*<sub>*ip*</sub>), for *i* = 1, ..., *n*, can be introduced as follows:

$$\mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}, \quad i = 1, \ldots, n.$$

This is a convenient property of the GSC distribution because it provides a simple way to perform quantile regression in a model that can be unimodal or bimodal, depending only on the parameter *λ* (because *φ* = *φ*(*q*) is considered fixed in this setting).
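To make the regression structure concrete, the following sketch writes down the negative log-likelihood implied by Equation (4) when *μ*<sub>*i*</sub> is the linear predictor and *φ* is held fixed at a value supplied by the user (for example, a solution of Equation (7)). The function names and argument layout are assumptions made for this illustration. No optimizer is included, so this is not a full estimation procedure.

```python
import math


def neg_log_survival(t):
    """Return -log[1/2 - arctan(t)/pi] in a numerically stable way."""
    if t > 0.0:
        return -math.log(math.atan2(1.0, t) / math.pi)
    return -math.log1p(-math.atan2(1.0, -t) / math.pi)


def gsc_logpdf(y, phi, lam, mu, sigma):
    """Logarithm of the GSC density in Equation (4)."""
    z = (y - mu) / sigma
    t = lam * math.sinh(z)
    return (
        math.log(lam)
        + math.log(math.cosh(z))
        - math.log(sigma * math.pi)
        - math.log1p(t * t)
        - math.lgamma(phi)
        + (phi - 1.0) * math.log(neg_log_survival(t))
    )


def neg_loglik(beta, sigma, lam, phi, X, y):
    """Negative log-likelihood with mu_i = x_i' beta and phi held fixed.

    X is a list of covariate rows, y the list of responses.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        mu_i = sum(b * x for b, x in zip(beta, xi))  # linear predictor
        total -= gsc_logpdf(yi, phi, lam, mu_i, sigma)
    return total


if __name__ == "__main__":
    X = [[1.0, 0.2], [1.0, 1.5], [1.0, -0.7]]
    y = [0.4, 2.1, -1.0]
    print(neg_loglik([0.1, 1.0], sigma=1.0, lam=0.5, phi=1.0, X=X, y=y))
```

Minimizing this function over (**β**, *σ*, *λ*) with a general-purpose optimizer, subject to *σ* > 0 and *λ* > 0, would yield estimates of the *q*-th conditional quantile through the fitted linear predictor.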

As far as we know, there is no model in the literature that is parameterized conveniently in terms of the *q*-th quantile and can also be unimodal or bimodal. Figure 1 shows that for any fixed *φ* = *φ*(*q*) in the GSC, there is an interval $\Lambda^{(1)}_{\phi(q)}$ for *λ* where the distribution is unimodal and an interval $\Lambda^{(2)}_{\phi(q)}$ where the distribution is bimodal.

### **3. ML Estimation for the GSC Distribution**
