New Regression Models Based on the Unit-Sinh-Normal Distribution: Properties, Inference, and Applications

Martínez-Flórez, Guillermo; Tovar-Falón, Roger

doi:10.3390/math9111231

by Guillermo Martínez-Flórez ^† and Roger Tovar-Falón ^*,†

Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 230027, Colombia

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2021, 9(11), 1231; https://doi.org/10.3390/math9111231

Submission received: 25 March 2021 / Revised: 13 May 2021 / Accepted: 17 May 2021 / Published: 28 May 2021

Abstract

In this paper, two new distributions were introduced to model unimodal and/or bimodal data. The first distribution, which was obtained by applying a simple transformation to a unit-Birnbaum–Saunders random variable, is useful for modeling data with positive support, while the second is appropriate for fitting data on the (0,1) interval. Extensions to regression models were also studied in this work, and statistical inference was performed from a classical perspective by using the maximum likelihood method. A small simulation study is presented to evaluate the benefits of the maximum likelihood estimates of the parameters. Finally, two applications to real data sets are reported to illustrate the developed methodology.

Keywords:

unit-Birnbaum–Saunders distribution; log-sinh-normal regression model; unit-sinh-normal regression model; maximum likelihood method

1. Introduction

The Birnbaum–Saunders (BS) distribution has been used principally for modeling the lifetime of certain structures under dynamic load, and it was introduced by Birnbaum and Saunders [1]. The probability density function (pdf) of the BS distribution is given by:

\begin{matrix} f_{T} (t) = \frac{t^{- 3 / 2} (t + β)}{2 α \sqrt{β}} ϕ (a_{t}), t > 0, \end{matrix}

(1)

where

ϕ (\cdot)

is the pdf of the standard normal distribution and

a_{t} = \frac{1}{α} (\sqrt{t / β} - \sqrt{β / t})

, where

α > 0

is a shape parameter and

β > 0

is a scale parameter. We use the notation

T \sim BS (α, β)

. The BS model has been extended to a large number of families of distributions. Castillo et al. [2], for example, introduced the epsilon Birnbaum–Saunders family of distributions based on the epsilon-skew-symmetric distribution, while Vilca-Labra and Leiva-Sánchez [3] proposed a new fatigue model from the skew-elliptical family of distributions. The new proposal is called the doubly generalized Birnbaum–Saunders distribution, and within its main properties, it is highlighted that the incorporation of the elliptical aspect allows the kurtosis to be flexible and that the skewness makes the asymmetry flexible. Martínez-Flórez et al. [4] introduced the asymmetric alpha-power extension of the BS model. A generalization referred to as the proportional hazard Birnbaum–Saunders distribution was studied by Moreno-Arenas et al. [5], which includes a new parameter that provides more flexibility in terms of skewness and kurtosis. The BS model also has been used in the study of linear regression models as in Rieck and Nedelman [6], where it was supposed that

Y_{i} = log (T_{i})

with

T_{i} \sim BS (α, β)

for

i = 1, 2, \dots, n

and the errors in the linear model have a sinh-normal (SHN) distribution with parameter vector

{(α, 0, 2)}^{⊤}

. Santos and Cribari-Neto [7] numerically evaluated the finite sample performances of the likelihood ratio, score, and Wald tests in the log-Birnbaum–Saunders regression model and introduced a RESET-like misspecification test for the proposed model by Rieck and Nedelman [6]. Furthermore, Balakrishnan and Zhu [8] discussed the maximum likelihood estimation of the model parameters under a log-linear link function for the BS lifetime regression model with equal and unequal shape parameters.

The pdf of the SHN model is given by:

\begin{matrix} f_{SHN} (y; α, γ, σ) = \frac{2}{α σ} cosh (\frac{y - γ}{σ}) ϕ (\frac{2}{α} sinh (\frac{y - γ}{σ})), y \in R, \end{matrix}

(2)

where

α > 0

is a shape parameter,

γ

is a location parameter, and

σ > 0

is a scale parameter. A random variable Y following the model in (2) is denoted by

SHN (α, γ, σ)

.

The SHN model was extended by Barros et al. [9] by considering a Student t distribution for the errors. This proposed Student t log-BS regression model allows attenuating the influence of the outlying observations. Other extensions of the SHN model were considered by Leiva et al. [10] and Santana et al. [11].

Generalizations of the BS distribution to model data with support in the interval (0, 1) have also been considered by several authors. Mazucheli et al. [12] presented a type of BS distribution with support in the interval

(0, 1)

, which became a new alternative to the beta and Kumaraswamy distributions. This new proposal is called the unit-Birnbaum–Saunders (UBS) model and has the pdf given by:

\begin{matrix} f_{UBS} (x; α, β) = & \frac{1}{2 x α β \sqrt{2 π}} [{(- \frac{β}{log (x)})}^{\frac{1}{2}} + {(- \frac{β}{log (x)})}^{\frac{3}{2}}] \\ \times exp \{\frac{1}{2 α^{2}} [\frac{log (x)}{β} + \frac{β}{log (x)} + 2]\}, \end{matrix}

(3)

where

x \in (0, 1)

,

α > 0

is a shape parameter and

β > 0

is a scale parameter.

To explain response variables between zero and one, such as proportions or rates, Martínez-Flórez et al. [13] studied statistical models that are alternatives to the beta regression model. The beta regression model is useful for studying relations between variables when the response corresponds to rates, proportions, or indexes. Studies on this topic include Ospina et al. [14], Simas et al. [15], Rocha and Simas [16], and Cribari-Neto and Souza [17], among others. Recent applications of the beta regression model can be found in Ghosh [18], who developed the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model. For the applications, the author considered data on health measurements of several athletes collected at the Australian Institute of Sport (AIS data) and data on anxiety, depression, and stress in non-clinical women in Australia (stress-anxiety data). Kim et al. [19], in turn, proposed control charts of the mean and variance by using a copula Markov statistical process control (SPC) and a conditional distribution with diverse copula functions. The authors used beta regression to explain the behavior of the average run lengths of the control charts of conditional variance with Major League Baseball (MLB) batting average (BA) and earned run average (ERA) data from the 1998 to 2016 seasons.

The main objective of this work is to introduce new families of distributions capable of modeling bimodal data with positive support or on the unit interval. The extension to the case of regression models is also studied.

The rest of this paper is organized as follows: Section 2 introduces the non-negative sinh-normal distribution and studies its main statistical properties in detail. In Section 3, the log-sinh-normal regression model is introduced, and its main properties are discussed. Section 4 presents the unit-sinh-normal distribution and its extension to the case of regression models. In Section 5, a small Monte Carlo simulation study is presented. Finally, in Section 6, two real data applications are reported and compared with several rival models.

2. Non-Negative Sinh-Normal Distribution

In this section, a new non-negative distribution is introduced, which is obtained by extension of the UBS model. Let X be a random variable following a UBS distribution. If

Y = - log (X)

, then the distribution of Y has positive support and is referred to as a non-negative sinh-normal (SHN) distribution. The pdf of the non-negative SHN model is given by:

\begin{matrix} f (y; α, β) = \frac{1}{α y} cosh (\frac{log (y) - log (β)}{2}) ϕ (\frac{2}{α} sinh (\frac{log (y) - log (β)}{2})), \end{matrix}

(4)

where

ϕ (\cdot)

is the pdf of the standard normal distribution. The distribution in (4) can also be called log-unit-Birnbaum–Saunders (LUBS). One can see that a more general form of the non-negative SHN model is given by the pdf:

\begin{matrix} f_{LSHN} (y; α, γ, σ) = \frac{2}{α σ y} cosh (\frac{log (y) - γ}{σ}) ϕ (\frac{2}{α} sinh (\frac{log (y) - γ}{σ})), \end{matrix}

(5)

where

y > 0

,

α

,

γ

, and

σ

are the parameters of the shape, location, and scale, respectively. This model is denoted by

LSHN (α, γ, σ)

, and we refer to it as the log-sinh-normal model.

The density function in Equation (5) integrates to one, and the proof of this can be seen in Appendix A. Figure 1 displays some forms of the pdf of the LSHN distribution for selected values of

α

,

γ

, and

σ

. One can see in Figure 1a that the LSHN density is unimodal for

α \leq 2

, whereas for

α > 2

, the LSHN density is bimodal (see Figure 1b). This is a useful result, since it provides a distribution for positive bimodal data.
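As a quick numerical check of these shapes, the following Python sketch evaluates the density in (5) on a grid, counts its local maxima, and approximates its integral with the trapezoidal rule. The function names, the grid, and the parameter values (γ = 0, σ = 0.2) are illustrative assumptions and are not taken from the paper; for these particular values the density has one local maximum when α = 1 and two when α = 5.

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def lshn_pdf(y, alpha, gamma, sigma):
    """Density (5) of LSHN(alpha, gamma, sigma), evaluated at y > 0."""
    z = (math.log(y) - gamma) / sigma
    xi2 = (2.0 / alpha) * math.sinh(z)
    return 2.0 / (alpha * sigma * y) * math.cosh(z) * std_normal_pdf(xi2)


def z_grid(z_max=6.0, n=4001):
    """Uniform grid for z = (log(y) - gamma) / sigma (grid size is an arbitrary choice)."""
    return [-z_max + 2.0 * z_max * i / (n - 1) for i in range(n)]


def count_modes(alpha, gamma, sigma):
    """Count strict local maxima of the density along the grid."""
    f = [lshn_pdf(math.exp(gamma + sigma * z), alpha, gamma, sigma) for z in z_grid()]
    return sum(1 for i in range(1, len(f) - 1) if f[i - 1] < f[i] > f[i + 1])


def total_mass(alpha, gamma, sigma):
    """Trapezoidal approximation of the integral of (5), using dy = sigma * y * dz."""
    zs = z_grid()
    step = zs[1] - zs[0]
    g = []
    for z in zs:
        y = math.exp(gamma + sigma * z)
        g.append(lshn_pdf(y, alpha, gamma, sigma) * sigma * y)
    return step * (sum(g) - 0.5 * (g[0] + g[-1]))


if __name__ == "__main__":
    for alpha in (1.0, 5.0):
        print(alpha, count_modes(alpha, 0.0, 0.2), total_mass(alpha, 0.0, 0.2))
```

The grid is taken uniform in log(y) so that both tails of the density are resolved equally well.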

2.1. Distribution Function, Survival Function, and Hazard Function of the LSHN Model

The cumulative distribution function (cdf) of the LSHN model is given by:

\begin{matrix} F_{LSHN} (y) = F_{SHN} (log (y)) = Φ (\frac{2}{α} sinh (\frac{log (y) - γ}{σ})), \end{matrix}

(6)

where

F_{S H N} (\cdot)

is the cdf of the SHN distribution. It follows from (5) and (6) that the survival and hazard functions are given, respectively, by:

\begin{matrix} S_{LSHN} (t) & = & 1 - F_{SHN} (log (t)) \\ = & 1 - Φ (\frac{2}{α} sinh (\frac{log (t) - γ}{σ})) \\ = & S_{SHN} (log (t)) \end{matrix}

and:

\begin{matrix} r_{LSHN} (t) & = & \frac{f_{LSHN} (t)}{1 - Φ (\frac{2}{α} sinh (\frac{log (t) - γ}{σ}))} \\ = & r_{SHN} (log (t)) \end{matrix}

where

S_{S H N} (\cdot)

and

r_{S H N} (\cdot)

are the survival and hazard functions of the SHN model. The graphs in Figure 2 show the form of the hazard function for some selected values of the parameters. The plots reveal that the LSHN hazard function increases up to a certain value and then decreases to zero.
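The cdf in (6) and the survival and hazard functions derived from it can be evaluated directly. The sketch below is one possible Python implementation; the function names are hypothetical, and the complementary error function is used for the normal tail so that the survival function does not round to zero too early.

```python
import math


def _xi2(y, alpha, gamma, sigma):
    """Argument (2 / alpha) * sinh((log(y) - gamma) / sigma) shared by (5) and (6)."""
    return (2.0 / alpha) * math.sinh((math.log(y) - gamma) / sigma)


def lshn_pdf(y, alpha, gamma, sigma):
    """Density (5)."""
    z = (math.log(y) - gamma) / sigma
    xi2 = _xi2(y, alpha, gamma, sigma)
    phi = math.exp(-0.5 * xi2 * xi2) / math.sqrt(2.0 * math.pi)
    return 2.0 / (alpha * sigma * y) * math.cosh(z) * phi


def lshn_cdf(y, alpha, gamma, sigma):
    """cdf (6): Phi(xi2), written with erfc."""
    return 0.5 * math.erfc(-_xi2(y, alpha, gamma, sigma) / math.sqrt(2.0))


def lshn_survival(y, alpha, gamma, sigma):
    """Survival function 1 - Phi(xi2) = Phi(-xi2)."""
    return 0.5 * math.erfc(_xi2(y, alpha, gamma, sigma) / math.sqrt(2.0))


def lshn_hazard(y, alpha, gamma, sigma):
    """Hazard function: density divided by survival."""
    return lshn_pdf(y, alpha, gamma, sigma) / lshn_survival(y, alpha, gamma, sigma)
```

Since the cdf equals Φ applied to a quantity that vanishes at log(y) = γ, the median of the LSHN model is exp(γ), which gives a simple sanity check for the code.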

2.2. Moments of the LSHN Model

It can be shown that the r-th moment of the random variable Y following a

LSHN (α, γ, σ)

distribution is given by:

E (Y^{r}) = M_{Z} (r)

where

M_{Z} (r)

is the moment-generating function (mgf) of the random variable Z = log (Y), which follows the SHN distribution. Following some results found by Rieck [20], we have that:

\begin{matrix} E (Y^{r}) = e^{r γ} [\frac{k_{a} (α^{- 2}) + k_{b} (α^{- 2})}{2 k_{1 / 2} (α^{- 2})}] \end{matrix}

where

a = \frac{r σ + 1}{2}

,

b = \frac{r σ - 1}{2}

, and

k_{λ} (\cdot)

is the modified Bessel function of the third kind, defined by:

k_{λ} (v) = \frac{1}{2} {(\frac{v}{2})}^{λ} \int_{0}^{\infty} u^{- λ - 1} e^{- u - \frac{v^{2}}{4 u}} d u .

(7)

For the special case of

σ = 2

(the LUBS model), one can prove that:

E (Y) = e^{γ} \frac{2 + α^{2}}{2}, E (Y^{2}) = e^{2 γ} \frac{2 + 4 α^{2} + 3 α^{4}}{2}

and:

Var (Y) = e^{2 γ} \frac{α^{2} (5 α^{2} + 4)}{4} .

From the above results, it can be concluded that the LSHN distribution can be obtained by applying the transformation

Y = e^{Z}

to a random variable

Z \sim SHN (α, γ, σ)

.
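The closed-form moments for σ = 2 can be checked numerically. The sketch below computes E(Y^r) by trapezoidal integration over Z ∼ N(0, 1), writing Y = exp(γ + σ sinh^{-1}(αZ/2)), which is the representation given in Section 2.4. The function name, the integration range, and the grid size are assumptions made for illustration.

```python
import math


def lshn_raw_moment(r, alpha, gamma, sigma, z_max=12.0, n=24001):
    """E(Y^r) for Y = exp(gamma + sigma * asinh(alpha * Z / 2)), Z ~ N(0, 1).

    Uses the trapezoidal rule on [-z_max, z_max]; the normal weight makes the
    truncated tails negligible for moderate r.
    """
    step = 2.0 * z_max / (n - 1)
    total = 0.0
    for i in range(n):
        z = -z_max + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        y_r = math.exp(r * (gamma + sigma * math.asinh(0.5 * alpha * z)))
        total += weight * y_r * math.exp(-0.5 * z * z)
    return step * total / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    alpha, gamma = 1.5, 0.3  # illustrative values
    m1 = lshn_raw_moment(1, alpha, gamma, 2.0)
    m2 = lshn_raw_moment(2, alpha, gamma, 2.0)
    print(m1, math.exp(gamma) * (2 + alpha**2) / 2)
    print(m2 - m1**2, math.exp(2 * gamma) * alpha**2 * (5 * alpha**2 + 4) / 4)
```

For other values of σ the same routine gives a numerical reference against which the Bessel-function expression can be compared.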

2.3. Cumulant-Generating Function and Mode

Let

Y = - log (X)

with

X \sim UBS (α, β)

, then the random variable Y has an LSHN distribution. It follows that:

M_{Y} (t) = E (e^{t Y}) = E (e^{- t log (X)}) = E (X^{- t}) .

Letting

r = - t

, for

t < 0

, and following Mazucheli et al. [21], we have:

M_{Y} (t) = E (X^{r}) = \frac{1}{2 (2 r α^{2} β + 1)} [2 r α^{2} β + \sqrt{2 r α^{2} β + 1} + 1] e^{- \frac{\sqrt{2 r α^{2} β + 1} - 1}{α^{2}}} .

Now, the cumulant-generating function (cgf) is given by:

\begin{matrix} K_{Y} (t) = \sum_{j = 1}^{\infty} \frac{K_{j} (Y) t^{j}}{j!} = log (M_{Y} (t)) \end{matrix}

(8)

where

K_{j} (Y)

is the j-th cumulant of the random variable Y. We have that,

\begin{matrix} K_{Y} (t) = - log (2 (2 r α^{2} β + 1)) - \frac{\sqrt{2 r α^{2} β + 1} - 1}{α^{2}} + log (2 r α^{2} β + \sqrt{2 r α^{2} β + 1} + 1) . \end{matrix}

(9)

The modes of the LSHN distribution can be obtained by maximizing the logarithm of the pdf. Thus, let

ξ_{1} = \frac{2}{α} cosh (\frac{log (Y) - γ}{σ})

and

ξ_{2} = \frac{2}{α} sinh (\frac{log (Y) - γ}{σ})

in the logarithm of the pdf of the LSHN model; taking the derivative and setting the resulting derivative equal to zero, it is obtained that the mode (or modes) of the pdf of the LSHN distribution is (are) the solution(s) of the non-linear equation:

ξ_{2} - ξ_{1}^{2} ξ_{2} - σ ξ_{1} = 0 .

Solving this non-linear equation, the mode(s) of the LSHN distribution is (are) found.
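One simple way to solve the mode equation is to scan for sign changes and refine them by bisection. The Python sketch below works in the variable z = (log(y) − γ)/σ. The left-hand side of the equation has the same sign as the derivative of the log-density, so a change from positive to negative marks a local maximum. The function names, the scan range, and the parameter values are illustrative assumptions.

```python
import math


def mode_equation(z, alpha, sigma):
    """Left-hand side xi2 - xi1^2 * xi2 - sigma * xi1, with z = (log(y) - gamma) / sigma."""
    xi1 = (2.0 / alpha) * math.cosh(z)
    xi2 = (2.0 / alpha) * math.sinh(z)
    return xi2 - xi1 * xi1 * xi2 - sigma * xi1


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f changes sign between lo and hi."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo > 0.0) == (f_mid > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lshn_modes(alpha, gamma, sigma, z_max=6.0, n=1201):
    """Modes y = exp(gamma + sigma * z) at roots where the equation goes from + to -."""

    def g(z):
        return mode_equation(z, alpha, sigma)

    zs = [-z_max + 2.0 * z_max * i / (n - 1) for i in range(n)]
    modes = []
    for a, b in zip(zs[:-1], zs[1:]):
        if g(a) > 0.0 >= g(b):
            modes.append(math.exp(gamma + sigma * bisect(g, a, b)))
    return modes


if __name__ == "__main__":
    print(lshn_modes(1.0, 0.0, 0.2))  # alpha <= 2
    print(lshn_modes(5.0, 0.0, 0.2))  # alpha > 2
```

Bisection is used here only because it needs no derivatives; any bracketing root finder would serve equally well.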

2.4. Asymptotic Distribution

If

Y \sim LSHN (α, γ, σ)

, it can be proven that the random variable

(log (Y) - γ) / (α σ / 2)

converges to a normal distribution when

α \to 0

, that is, the random variable Y converges in distribution to a log-normal (LN) distribution when

α \to 0

. Therefore, it follows from (6) that, if

Y \sim LSHN (α, γ, σ)

, then:

Z = \frac{2}{α} sinh (\frac{log (Y) - γ}{σ}) \sim N (0, 1) .

Thus, if

Z \sim N (0, 1)

, then:

Y = e^{γ + σ {sinh}^{- 1} (\frac{α Z}{2})} \sim LSHN (α, γ, σ),

where

{sinh}^{- 1} (\cdot)

is the inverse function of the

sinh (\cdot)

function.
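This representation gives a direct way to simulate LSHN variates from standard normal draws. The following Python sketch implements both directions of the transformation; the function names and the seeded generator are assumptions made for illustration.

```python
import math
import random


def lshn_from_normal(z, alpha, gamma, sigma):
    """Map a standard normal value z to Y = exp(gamma + sigma * asinh(alpha * z / 2))."""
    return math.exp(gamma + sigma * math.asinh(0.5 * alpha * z))


def normal_from_lshn(y, alpha, gamma, sigma):
    """Inverse map: Z = (2 / alpha) * sinh((log(y) - gamma) / sigma)."""
    return (2.0 / alpha) * math.sinh((math.log(y) - gamma) / sigma)


def sample_lshn(n, alpha, gamma, sigma, rng):
    """Draw n LSHN(alpha, gamma, sigma) variates using the generator rng."""
    return [lshn_from_normal(rng.gauss(0.0, 1.0), alpha, gamma, sigma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed, for reproducibility only
    print(sample_lshn(5, 2.5, 0.1, 0.5, rng))
```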

3. The LSHN Regression Model

Regression models have been a statistical technique widely used in many areas of knowledge to explain the behavior of a response variable, say Y, as a function of other variables called explanatory variables, say

X_{1}, \dots, X_{p}

, and a vector of unknown parameters called regression coefficients, which is denoted by

θ

. In this section, the LSHN linear regression model is introduced by considering a random sample of variables

Y_{i}

, such that:

Y_{i} \sim LSHN (α, x_{i}^{⊤} θ, σ)

(10)

for

i = 1, \dots, n

, with

x_{i} = {(X_{i 1}, \dots, X_{i p})}^{⊤}

and

θ = {(θ_{1}, \dots, θ_{p})}^{⊤}

. In this case, we suppose the functional relationship:

log (Y_{i}) = x_{i}^{⊤} θ + ε_{i}

(11)

where the random variables

ε_{i} \sim SHN (α, 0, σ)

for

i = 1, \dots, n

.

The functional relationship in Equation (11) is justified below from Theorem 1.

Theorem 1.

Let

X \sim U B S (α, β)

, then for

c > 0

,

Y = X^{1 / c}

has a

U B S (α, β / c)

distribution.

Proof.

Consider

X \sim UBS (α, β)

, and let

Y = X^{1 / c}

, then

X = Y^{c}

and

d X / d Y = c Y^{c - 1}

; thus:

\begin{matrix} f (y) = & \frac{1}{2 y^{c} α β \sqrt{2 π}} [{(- \frac{β}{log (y^{c})})}^{\frac{1}{2}} + {(- \frac{β}{log (y^{c})})}^{\frac{3}{2}}] \\ \times exp \{\frac{1}{2 α^{2}} [\frac{log (y^{c})}{β} + \frac{β}{log (y^{c})} + 2]\} c y^{c - 1} \\ = & \frac{1}{2 y α (β / c) \sqrt{2 π}} [{(- \frac{β / c}{log (y)})}^{\frac{1}{2}} + {(- \frac{β / c}{log (y)})}^{\frac{3}{2}}] \\ \times exp \{\frac{1}{2 α^{2}} [\frac{log (y)}{β / c} + \frac{β / c}{log (y)} + 2]\} \end{matrix}

That is,

Y \sim UBS (α, β / c)

. □
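Theorem 1 can also be verified numerically through the cdf. The sketch below assumes the construction X = exp(−T) with T ∼ BS(α, β), which is the relation between the UBS model and a positive variable used in Section 2 (Y = −log(X)), so that P(X ≤ x) = Φ(−a_t) with t = −log(x). It then compares the cdf of X^{1/c} with that of UBS(α, β/c). The function names and test values are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal cdf written with erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ubs_cdf(x, alpha, beta):
    """cdf of UBS(alpha, beta) at x in (0, 1), assuming X = exp(-T), T ~ BS(alpha, beta)."""
    t = -math.log(x)
    a_t = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return std_normal_cdf(-a_t)  # P(X <= x) = P(T >= -log(x))


def cdf_of_power(y, alpha, beta, c):
    """cdf of Y = X^(1/c): P(Y <= y) = P(X <= y^c)."""
    return ubs_cdf(y**c, alpha, beta)


if __name__ == "__main__":
    alpha, beta, c = 0.8, 1.7, 3.0  # illustrative values
    for y in (0.05, 0.3, 0.6, 0.95):
        print(y, cdf_of_power(y, alpha, beta, c), ubs_cdf(y, alpha, beta / c))
```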

To construct the model, we considered a random sample

X_{1}, X_{2}, \dots, X_{n}

, such that

X_{i} \sim UBS (α_{i}, β_{i})

for

i = 1, 2, \dots, n

; and we supposed that

X_{i} = f (Z_{1}, Z_{2}, \dots, Z_{p})

and

β_{i} = exp (z_{i}^{⊤} θ)

for

i = 1, 2, \dots, n,

where

z_{i} = {(Z_{1}, Z_{2}, \dots, Z_{p})}^{⊤}

. Letting

α_{i} = α

and since

X_{i}^{1 / c} \sim UBS (α, β_{i} / c)

(this follows from Theorem 1), taking

X_{i} = δ_{i}^{exp (z_{i}^{⊤} θ)}

where

δ_{i} \sim UBS (α, 1)

, we have that,

X_{i} \sim UBS (α, 1 / (1 / exp (z_{i}^{⊤} θ)))

, that is

X_{i} \sim UBS (α, exp (z_{i}^{⊤} θ))

.

Thus, for

Y_{i} = - log (X_{i}) = - log (δ_{i}^{exp (z_{i}^{⊤} θ)})

, we have that,

\begin{matrix} Y_{i} & = exp (z_{i}^{⊤} θ) \times (- log (δ_{i})) \\ ⟹ log (Y_{i}) & = z_{i}^{⊤} θ + log (- log (δ_{i})) \\ = z_{i}^{⊤} θ + log (ε_{i}^{*}) \\ = z_{i}^{⊤} θ + ε_{i} \end{matrix}

Now, for

ε_{i}^{*} = - log (δ_{i})

, it follows that

ε_{i}^{*} \sim LSHN (α, log (β_{i}), 2)

, then

ε_{i}^{*} \sim LSHN (α, 0, 2)

, and then:

\begin{matrix} f (ε_{i}^{*}) & = \frac{1}{α ε_{i}^{*}} cosh (\frac{log (ε_{i}^{*})}{2}) ϕ (\frac{2}{α} sinh (\frac{log (ε_{i}^{*})}{2})) \\ = \frac{1}{α ε_{i}^{*}} cosh (\frac{log (Y_{i}) - z_{i}^{⊤} θ}{2}) ϕ (\frac{2}{α} sinh (\frac{log (Y_{i}) - z_{i}^{⊤} θ}{2})) \end{matrix}

It can be seen from the previous result that the regression model given in (10) generalizes the obtained model from Theorem 1.

3.1. Maximum Likelihood Estimation in the LSHN Regression Model

To get the estimates of the parameters in the LSHN regression model, we considered the maximum likelihood method. Thus, given a random sample of size n, say

Y = (Y_{1}, \dots, Y_{n})

, where

Y_{i} \sim LSHN (α, x_{i}^{⊤} θ, σ)

, the log-likelihood function for the parameter vector

φ = {(θ^{⊤}, α, σ)}^{⊤}

can be written as follows:

ℓ (φ; Y) \propto - n log (σ) - \sum_{i = 1}^{n} log (Y_{i}) + \sum_{i = 1}^{n} log (ξ_{i 1}) - \frac{1}{2} \sum_{i = 1}^{n} ξ_{i 2}^{2},

(12)

where

ξ_{i 1} = \frac{2}{α} cosh (\frac{log (Y_{i}) - x_{i}^{⊤} θ}{σ})

and

ξ_{i 2} = \frac{2}{α} sinh (\frac{log (Y_{i}) - x_{i}^{⊤} θ}{σ})

for

i = 1, \dots, n

.
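The log-likelihood in (12) translates almost line by line into code. The Python sketch below evaluates it for given parameter values; the function name and the data layout (a list of responses and a list of covariate rows) are assumptions made for illustration, and the additive constant −(n/2) log(2π) is omitted, as in (12).

```python
import math


def lshn_loglik(theta, alpha, sigma, y, X):
    """Log-likelihood (12) of the LSHN regression model, up to an additive constant.

    theta : list of p regression coefficients
    y     : list of n positive responses
    X     : list of n covariate rows, each of length p
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        eta = sum(x_ij * t_j for x_ij, t_j in zip(x_i, theta))
        z = (math.log(y_i) - eta) / sigma
        xi1 = (2.0 / alpha) * math.cosh(z)
        xi2 = (2.0 / alpha) * math.sinh(z)
        total += -math.log(sigma) - math.log(y_i) + math.log(xi1) - 0.5 * xi2 * xi2
    return total
```

In practice, this function (with its sign reversed) would be passed to a general-purpose numerical optimizer.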

After taking partial derivatives of the log-likelihood function (12) with respect to the parameters of interest and setting them equal to zero, we obtain the following score equations:

\begin{matrix} U (θ_{j}) & = \frac{1}{σ} \sum_{i = 1}^{n} x_{i j} (ξ_{i 1} ξ_{i 2} - \frac{ξ_{i 2}}{ξ_{i 1}}), j = 1, \dots, p, \end{matrix}

(13)

\begin{matrix} U (α) & = - \frac{n}{α} + \frac{1}{α} \sum_{i = 1}^{n} ξ_{i 2}^{2}, \end{matrix}

(14)

\begin{matrix} U (σ) & = - \frac{n}{σ} - \frac{1}{σ} \sum_{i = 1}^{n} z_{i} tanh (z_{i}) + \frac{1}{σ} \sum_{i = 1}^{n} z_{i} ξ_{i 1} ξ_{i 2}, \end{matrix}

(15)

where

z_{i} = (log (Y_{i}) - x_{i}^{⊤} θ) / σ

, for

i = 1, \dots, n

. The maximum likelihood estimators for

θ_{1}, \dots, θ_{p}, α

and

σ

, are the solutions to the equations

U (θ_{j}) = 0 (j = 1, \dots, p)

,

U (α) = 0

, and

U (σ) = 0

, which must be solved numerically, for example with the Newton–Raphson method or a quasi-Newton method.
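Before passing the score equations to a numerical solver, it is useful to confirm that they agree with the log-likelihood. The Python sketch below codes (13)–(15) and compares them with central finite differences of (12). The function names, the parameter ordering, and the small data set are hypothetical and serve only as an illustration.

```python
import math


def _pieces(params, y_i, x_i):
    """Return (z_i, xi_i1, xi_i2); params = [theta_1, ..., theta_p, alpha, sigma]."""
    *theta, alpha, sigma = params
    eta = sum(x * t for x, t in zip(x_i, theta))
    z = (math.log(y_i) - eta) / sigma
    return z, (2.0 / alpha) * math.cosh(z), (2.0 / alpha) * math.sinh(z)


def loglik(params, y, X):
    """Log-likelihood (12), up to an additive constant."""
    sigma = params[-1]
    total = 0.0
    for y_i, x_i in zip(y, X):
        _, xi1, xi2 = _pieces(params, y_i, x_i)
        total += -math.log(sigma) - math.log(y_i) + math.log(xi1) - 0.5 * xi2 * xi2
    return total


def score(params, y, X):
    """Score vector from (13)-(15), ordered as [theta_1, ..., theta_p, alpha, sigma]."""
    alpha, sigma = params[-2], params[-1]
    p = len(params) - 2
    u = [0.0] * (p + 2)
    for y_i, x_i in zip(y, X):
        z, xi1, xi2 = _pieces(params, y_i, x_i)
        common = xi1 * xi2 - xi2 / xi1
        for j in range(p):
            u[j] += x_i[j] * common / sigma
        u[p] += (xi2 * xi2 - 1.0) / alpha
        u[p + 1] += (-1.0 - z * math.tanh(z) + z * xi1 * xi2) / sigma
    return u


def numerical_gradient(f, params, h=1e-6):
    """Central finite differences, used only to cross-check the analytic score."""
    grad = []
    for k in range(len(params)):
        up = list(params)
        down = list(params)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


if __name__ == "__main__":
    y = [0.8, 1.5, 2.2, 3.1]  # made-up responses
    X = [[1.0, 0.1], [1.0, 0.5], [1.0, 0.9], [1.0, 1.4]]
    params = [0.2, 0.4, 1.3, 0.7]
    print(score(params, y, X))
    print(numerical_gradient(lambda q: loglik(q, y, X), params))
```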

3.2. Observed and Expected Information Matrix

The elements of the observed information matrix

J (φ)

for the parameter vector

φ = {(θ^{⊤}, α, σ)}^{⊤}

, which are denoted by

j_{φ_{j} φ_{k}}

with

φ_{j} \in {(θ_{1}, \dots, θ_{p}, α, σ)}^{⊤}

, can be obtained by calculating the second partial derivative of the log-likelihood function (12), i.e.,

j_{φ_{j} φ_{k}} = - \partial^{2} ℓ (φ; Y) / \partial φ_{j} \partial φ_{k}

. These elements are given by:

\begin{matrix} j_{θ_{j} θ_{k}} & = \frac{1}{σ^{2}} \sum_{i = 1}^{n} x_{i j} x_{i k} \{2 ξ_{i 2}^{2} + \frac{4}{α^{2}} - 1 + \frac{ξ_{i 2}^{2}}{ξ_{i 2}^{2} + 4 / α^{2}}\} \\ j_{α θ_{j}} & = \frac{2}{σ α} \sum_{i = 1}^{n} x_{i j} ξ_{i 1} ξ_{i 2}, \\ j_{α α} & = - \frac{n}{α^{2}} + \frac{3}{α^{2}} \sum_{i = 1}^{n} ξ_{i 2}^{2}, \\ j_{σ θ_{j}} & = \frac{1}{2 σ} \sum_{i = 1}^{n} x_{i j} [z_{i} (ξ_{i 1}^{2} + ξ_{i 2}^{2} - {sech}^{2} z_{i}) + (ξ_{i 1} ξ_{i 2} - \frac{ξ_{i 2}}{ξ_{i 1}})], \\ j_{σ α} & = \frac{2}{σ α} \sum_{i = 1}^{n} z_{i} ξ_{i 1} ξ_{i 2}, \\ j_{σ σ} & = \frac{2}{σ^{2}} \sum_{i = 1}^{n} z_{i} (ξ_{i 1} ξ_{i 2} - \frac{ξ_{i 2}}{ξ_{i 1}}) + \frac{1}{σ^{2}} \sum_{i = 1}^{n} z_{i}^{2} \{2 ξ_{i 2}^{2} + \frac{4}{α^{2}} - 1 + \frac{ξ_{i 2}^{2}}{ξ_{i 2}^{2} + 4 / α^{2}}\} . \end{matrix}

The previous results are similar to those obtained by Rieck and Nedelman [6]. The elements of the expected information matrix,

I (φ)

, defined as

n^{- 1}

times the expected values of the elements of the observed information matrix, are denoted by

i_{θ θ^{⊤}}

,

i_{α θ_{j}}

,

i_{α α}

,

i_{σ θ}

,

i_{σ α}

, and

i_{σ σ}

. Following Rieck and Nedelman [6], we make:

a_{k} (φ) = E (z_{i}^{k} [2 ξ_{i 2}^{2} + \frac{4}{α^{2}} - 1 + \frac{ξ_{i 2}^{2}}{ξ_{i 2}^{2} + 4 / α^{2}}]), b_{k} (φ) = E (z_{i}^{k} ξ_{i 1} ξ_{i 2}) and

d_{k} (φ) = E (z_{i}^{k} \frac{ξ_{i 2}}{ξ_{i 1}}) .

Then, the following elements of the

I (φ)

matrix are obtained:

\begin{matrix} i_{θ θ^{⊤}} & = \frac{1}{σ^{2}} C (α) X^{⊤} X, & i_{α θ_{j}} & = 0, j = 1, \dots, p, \\ i_{α α} & = \frac{2}{α^{2}}, & i_{σ θ} & = \frac{1}{2 σ} [a_{1} (φ) + b_{0} (φ) - d_{0} (φ)] \bar{X}, \\ i_{σ α} & = \frac{2 b_{1} (φ)}{σ α}, & i_{σ σ} & = \frac{a_{2} (φ)}{σ^{2}} + 2 \frac{b_{1} (φ) - d_{1} (φ)}{σ^{2}}, \end{matrix}

where:

C (α) = 2 + \frac{4}{α^{2}} - \sqrt{\frac{2 π}{α^{2}}} \{1 - erf [{(2 / α^{2})}^{1 / 2}]\} exp (2 / α^{2})

and

erf (x)

is the error function given by:

erf (x) = \frac{2}{\sqrt{π}} \int_{0}^{x} e^{- z^{2}} d z .

One can show that

det (I (φ)) \neq 0

, that is, the information matrix is non-singular, which guarantees the existence of the covariance matrix of the maximum likelihood estimators. The asymptotic covariance matrix of the maximum likelihood estimators is the inverse of the Fisher information matrix, that is,

Var (\hat{φ}) = I^{- 1} (φ)

. The existence of

I^{- 1} (φ)

also guarantees that the vector of maximum likelihood estimators has asymptotic distribution:

\sqrt{n} [{({\hat{θ}}^{⊤}, \hat{α}, \hat{σ})}^{⊤} - {(θ^{⊤}, α, σ)}^{⊤}] \overset{d}{⟶} N_{p + 2} (0, I^{- 1} (φ))

that is the maximum likelihood estimators of the model parameters are consistent and asymptotically follow a normal distribution with the covariance matrix being the inverse of the Fisher information matrix. The approximation

N_{p + 2} (φ, n^{- 1} I^{- 1} (φ))

can be used to construct confidence intervals for the parameters

φ_{j}

. These confidence intervals are given by:

{\hat{φ}}_{j} \pm z_{1 - α / 2} \times se ({\hat{φ}}_{j}),

where

se ({\hat{φ}}_{j})

corresponds to the square root of the j-th diagonal element of the matrix

n^{- 1} I^{- 1} (\hat{φ})

and

z_{1 - α / 2}

denotes the

100 (1 - α / 2) \%

quantile of the standard normal distribution (here, α denotes the significance level rather than the shape parameter).
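Given an estimated covariance matrix, the Wald intervals above require only the diagonal elements and a normal quantile. The Python sketch below shows the computation; the function names and the numerical inputs are hypothetical, and the covariance matrix is assumed to have been obtained elsewhere (for example, from the inverse of the observed information matrix).

```python
import math
from statistics import NormalDist


def standard_errors(cov):
    """Square roots of the diagonal of an estimated covariance matrix (list of lists)."""
    return [math.sqrt(cov[j][j]) for j in range(len(cov))]


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided Wald interval: estimate +/- z * se at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


if __name__ == "__main__":
    cov = [[0.04, 0.0], [0.0, 0.01]]  # made-up covariance matrix
    estimates = [1.0, 0.5]            # made-up estimates
    for est, se in zip(estimates, standard_errors(cov)):
        print(wald_interval(est, se))
```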

4. Unit-Sinh-Normal Distribution

Now, we introduce the SHN model with support on the interval

(0, 1)

, which we call the unit-sinh-normal model and denote by

Y \sim USHN (α, γ, σ)

. The pdf is given by:

\begin{matrix} f_{USHN} (y; α, γ, σ) = & \frac{1}{(1 - y) log {(1 - y)}^{- 1}} \frac{2}{σ α} cosh (\frac{log (- log (1 - y)) - γ}{σ}) \\ \times ϕ (\frac{2}{α} sinh (\frac{log (- log (1 - y)) - γ}{σ})), \end{matrix}

(16)

where

y \in (0, 1)

,

α > 0

is a shape parameter,

γ \in R

is a location parameter, and

σ > 0

is a scale parameter. As the argument of the sinh and cosh functions shows, the density function in (16) is defined through the complementary log-log transformation, which is widely used in generalized linear models. Although the density (16) could also be defined from the log-log link function, we used the complementary log-log link function. Note that, if

y \in (0, 1)

, then

(1 - y) \in (0, 1)

, and the simple transformation

Z = 1 - Y

leads to the model with the log-log link function. Figure 3 displays some plots of the pdf of the USHN distribution for some selected values of the parameters. The plots reveal that the USHN density is unimodal for

α \leq 2

(see Figure 3a), and the density function is bimodal for

α > 2

(see Figure 3b). One of the advantages of the USHN distribution is that it can be used for modeling data sets of proportions and rates with bimodal behaviors.

4.1. Distribution Function, Survival Function, and Hazard Function of the USHN Model

It is easy to see that the corresponding cdf of the random variable

Y \sim USHN (α, γ, σ)

is given by:

\begin{matrix} F_{USHN} (y) & = & F_{SHN} (log (- log (1 - y))) \\ = & Φ (\frac{2}{α} sinh (\frac{log (- log (1 - y)) - γ}{σ})), \end{matrix}

(17)

where

F_{SHN} (\cdot)

is the cdf of the

SHN (α, γ, σ)

distribution. The survival function

S_{USHN} (t)

and hazard function

r_{USHN} (t)

are given by:

\begin{matrix} S_{USHN} (t) & = & 1 - F_{SHN} (log (- log (1 - t))) \\ = & 1 - Φ (\frac{2}{α} sinh (\frac{log (- log (1 - t)) - γ}{σ})) \\ = & S_{SHN} (log (- log (1 - t))) \end{matrix}

and:

\begin{matrix} r_{USHN} (t) & = \frac{f_{USHN} (t)}{1 - Φ (\frac{2}{α} sinh (\frac{log (- log (1 - t)) - γ}{σ}))} \\ = r_{SHN} (log (- log (1 - t))) \end{matrix}

respectively, where

S_{SHN} (\cdot)

and

r_{SHN} (\cdot)

are the survival function and hazard function of the SHN model, respectively. From (17), it is concluded that:

Z = \frac{2}{α} sinh (\frac{log (- log (1 - Y)) - γ}{σ}) \sim N (0, 1),

which implies that

log (- log (1 - Y)) \sim SHN (α, γ, σ)

. Figure 4 shows the behavior of the hazard function of a USHN random variable for some selected values of the parameters. The graphs reveal that the hazard function is increasing up to a certain value and then is decreasing to zero.

One can see that a random variable Y following a

USHN (α, γ, σ)

distribution can be generated by using the expression:

Y = 1 - e^{- e^{γ + σ {sinh}^{- 1} (\frac{α}{2} Φ^{- 1} (U))}},

where

U \sim U (0, 1)

denotes the uniform distribution on the

(0, 1)

interval and

Φ^{- 1} (\cdot)

refers to the inverse of the cdf of the standard normal distribution.
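The generation formula is an inverse-transform sampler and can be coded in a few lines. The Python sketch below implements it together with the cdf in (17), so that the two can be checked against each other; the function names and the seeded generator are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: cdf() and inv_cdf()


def ushn_quantile(u, alpha, gamma, sigma):
    """Y = 1 - exp(-exp(gamma + sigma * asinh((alpha / 2) * Phi^{-1}(u)))), 0 < u < 1."""
    w = math.exp(gamma + sigma * math.asinh(0.5 * alpha * _STD_NORMAL.inv_cdf(u)))
    return -math.expm1(-w)  # 1 - exp(-w), computed stably


def ushn_cdf(y, alpha, gamma, sigma):
    """cdf (17) of USHN(alpha, gamma, sigma) at y in (0, 1)."""
    z = (math.log(-math.log1p(-y)) - gamma) / sigma
    return _STD_NORMAL.cdf((2.0 / alpha) * math.sinh(z))


def sample_ushn(n, alpha, gamma, sigma, rng):
    """Draw n USHN variates by inverse transform using the generator rng."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # inv_cdf is undefined at 0
            draws.append(ushn_quantile(u, alpha, gamma, sigma))
    return draws


if __name__ == "__main__":
    print(sample_ushn(5, 2.5, -0.2, 0.7, random.Random(7)))
```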

4.2. Moments of the USHN Model

The r-th moment of a random variable Y following a

USHN (α, γ, σ)

distribution is given by:

\begin{matrix} E (Y^{r}) & = & E [{(1 - e^{- X})}^{r}] \\ = & \sum_{j = 0}^{r} (\binom{r}{j}) {(- 1)}^{j} E (e^{- j X}) \end{matrix}

where

X \sim LSHN (α, γ, σ)

. Using the Taylor expansion for

e^{- j X}

and from the r-th moment of the

LSHN (α, γ, σ)

distribution, it follows that:

\begin{matrix} E (Y^{r}) & = & \sum_{j = 0}^{r} \sum_{l = 0}^{\infty} (\binom{r}{j}) \frac{{(- 1)}^{j + l} {(j e^{γ})}^{l}}{l!} [\frac{k_{a_{1}} (α^{- 2}) + k_{b_{1}} (α^{- 2})}{2 k_{1 / 2} (α^{- 2})}] \end{matrix}

where

a_{1} = \frac{l σ + 1}{2}

and

b_{1} = \frac{l σ - 1}{2}

, with

k_{a} (\cdot)

the modified Bessel function of the third kind defined in (7).

4.3. Cumulant-Generating Function and Mode

The mgf of the USHN model can be obtained by using:

M_{Y} (t) = E (e^{t Y}) = E (e^{t (1 - e^{- X})}) = e^{t} E (e^{- t e^{- X}}) = e^{- r} E (e^{r Z}) = e^{- r} M_{Z} (r),

where

Z \sim UBS (α, β)

and

r = - t

, with

M_{Z} (r)

being the mgf of the UBS distribution. Thus,

K_{Y} (r) = - r + K_{Z} (r), r > 0,

where

K_{Z} (r)

is the cgf of the UBS distribution. To find the mode of the USHN distribution, we reasoned in the same way as in the LSHN model. Then, let

ξ_{1}^{*} = \frac{2}{α} cosh (\frac{log (- log (1 - Y)) - γ}{σ})

and

ξ_{2}^{*} = \frac{2}{α} sinh (\frac{log (- log (1 - Y)) - γ}{σ})

. Differentiating the logarithm of the pdf of the USHN distribution, substituting

ξ_{1}^{*}

and

ξ_{2}^{*}

, and setting the result equal to zero, we obtain the non-linear equation:

ξ_{1}^{*} (ξ_{1}^{*} ξ_{2}^{*} - σ (- log (1 - Y) - 1)) = ξ_{2}^{*} .

Solving this non-linear equation, the mode(s) of the USHN distribution is (are) found.

4.4. Asymptotic Distribution

Let

Y \sim USHN (α, γ, σ)

. One can prove that the random variable

\frac{log (- log (1 - Y)) - γ}{α σ / 2}

converges to a normal distribution when

α

tends to zero. Thus, if

Y \sim USHN (α, γ, σ)

, then:

Z = \frac{2}{α} sinh (\frac{log (- log (1 - Y)) - γ}{σ}) \sim N (0, 1) .

It follows from the result above that:

Y = 1 - exp (- exp (γ + σ {sinh}^{- 1} (\frac{α Z}{2}))) \sim USHN (α, γ, σ) .

4.5. The LUSHN Regression Model

Now, we introduce the LUSHN linear regression model. We considered a set of p explanatory variables, which are denoted by

x_{i} = {(x_{i 1}, \dots, x_{i p})}^{⊤}

, and a p-dimensional vector of unknown parameters

θ = {(θ_{1}, \dots, θ_{p})}^{⊤}

, such that, for

i = 1, \dots, n

, it follows the functional relationship:

log (- log (1 - Y_{i})) = x_{i}^{⊤} θ + ε_{i},

(18)

where

ε_{i} \sim SHN (α, 0, σ)

. From (18), we have that,

Z_{i} = log (- log (1 - Y_{i})) \sim SHN (α, x_{i}^{⊤} θ, σ);

hence,

E (Z_{i}) = x_{i}^{⊤} θ .

Thus,

{\hat{Z}}_{i} = x_{i}^{⊤} \hat{θ},

and therefore,

\begin{matrix} {\hat{Y}}_{i} & = & 1 - exp (- exp ({\hat{Z}}_{i})) \\ = & 1 - exp (- exp (x_{i}^{⊤} \hat{θ})) . \end{matrix}

(19)
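Equation (19) is the inverse of the complementary log-log transformation applied to the fitted linear predictor. A minimal Python sketch follows; the function names and the coefficient values are hypothetical.

```python
import math


def cloglog(y):
    """Complementary log-log transformation log(-log(1 - y)) for y in (0, 1)."""
    return math.log(-math.log1p(-y))


def inverse_cloglog(eta):
    """Inverse transformation 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))


def fitted_response(x_i, theta_hat):
    """Fitted value (19) for one covariate row x_i and estimated coefficients theta_hat."""
    eta = sum(x * t for x, t in zip(x_i, theta_hat))
    return inverse_cloglog(eta)


if __name__ == "__main__":
    print(fitted_response([1.0, 0.5], [0.75, 0.25]))  # made-up inputs
```

Because the inverse transformation maps the real line into (0, 1), the fitted values always lie in the unit interval, whatever the value of the linear predictor.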

To obtain the estimates of the model parameters, we considered the maximum likelihood method as in the LSHN regression model. Thus, given a random sample of size n, say

Y = (Y_{1}, \dots, Y_{n})

, the log-likelihood function for the parameter vector

ρ = {(θ^{⊤}, α, σ)}^{⊤}

is given by:

\begin{matrix} ℓ (ρ; Y) = & - n log (σ) - \sum_{i = 1}^{n} log (1 - Y_{i}) - \sum_{i = 1}^{n} log (- log (1 - Y_{i})) \\ + \sum_{i = 1}^{n} log (ξ_{i 1}) - \frac{1}{2} \sum_{i = 1}^{n} ξ_{i 2}^{2}, \end{matrix}

where:

ξ_{i 1} = \frac{2}{α} cosh (\frac{log (- log (1 - Y_{i})) - x_{i}^{⊤} θ}{σ}), and ξ_{i 2} = \frac{2}{α} sinh (\frac{log (- log (1 - Y_{i})) - x_{i}^{⊤} θ}{σ}),

for

i = 1, \dots, n

.

The score function and the observed information matrix of the LUSHN regression model have the same form as the respective expressions of the LSHN regression model after replacing

log (Y_{i})

with

log (- log (1 - Y_{i}))

and by defining:

z_{i} = \frac{log (- log (1 - Y_{i})) - x_{i}^{⊤} θ}{σ}

for

i = 1, \dots, n

. The MLEs for

θ

,

α

, and

σ

, are the solutions to the equations

U (θ_{j}) = 0 (j = 1, \dots, p)

,

U (α) = 0

and

U (σ) = 0

, which must be solved numerically, for example with the Newton–Raphson method or a quasi-Newton method.

5. Simulation Study

To analyze the behavior of the estimators of the parameters in the LUSHN regression model, we carried out a small Monte Carlo simulation study. To generate the USHN random variable, we applied the generation algorithm described in Section 4.1. In this simulation study, we analyzed the behavior of the estimators of the parameters of the model:

log (- log (1 - Y_{i})) = θ_{0} + θ_{1} X_{i} + ε_{i}, i = 1, 2, \dots, n .

(20)

where

ε_{i} \sim SHN (α, 0, σ)

. The values of the explanatory variable X were taken from a uniform random variable on the (0,1) interval, that is

X_{i} \sim U (0, 1)

. Without loss of generality, we took the value of the scale parameter equal to

σ = 1.0

; however, the following results can be obtained for any value of the scale parameter from the simple transformation

ε_{i} = σ ν_{i}

with

ν_{i} \sim SHN (α, 0, 1)

. The values of shape parameter were taken as

α

= 0.50, 0.75, 1.25, 1.75, 2.25, 2.75 to take into account different configurations in the form of the pdf of the random variable

ε_{i}

. On the other hand, since the coefficients

θ_{i}

,

i = 0, 1

in the model (20) can be any real numbers, with no restrictions on the values they can take, we took the particular values

θ_{0} = 0.75

and

θ_{1} = 0.25

. To analyze some statistical measures of the maximum likelihood estimator (MLE), we considered small, moderate, and large sample sizes:

n = 10, 25, 50, 75, 100, 200, 500

, and 5000 iterations were performed for each scenario. The studied characteristics were: the relative bias (RB), the root of the mean squared error (RMSE), and the ratio between the standard deviation (SD) of the estimate and the average SD (RSD). Finally, we examined the coverage probability (CP) of the 95% confidence interval based on the asymptotic normality of the ML estimators.
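The data-generating step of this design and two of the summary measures can be sketched as follows. The code draws one sample from model (20) and computes the relative bias and RMSE of a list of estimates. The function names are hypothetical, the RB and RMSE formulas are the usual definitions (assumed here, since the text does not write them out), and the estimation step itself is not included.

```python
import math
import random


def simulate_model_20(n, theta0, theta1, alpha, sigma, rng):
    """Draw (x_i, y_i), i = 1..n, from model (20) with X_i ~ U(0, 1)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()
        # SHN(alpha, 0, sigma) error obtained from a standard normal draw
        eps = sigma * math.asinh(0.5 * alpha * rng.gauss(0.0, 1.0))
        xs.append(x)
        ys.append(-math.expm1(-math.exp(theta0 + theta1 * x + eps)))
    return xs, ys


def relative_bias(estimates, true_value):
    """(mean of the estimates - true value) / true value."""
    return (sum(estimates) / len(estimates) - true_value) / true_value


def rmse(estimates, true_value):
    """Square root of the mean squared error of the estimates."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


if __name__ == "__main__":
    xs, ys = simulate_model_20(10, 0.75, 0.25, 0.75, 1.0, random.Random(3))
    print(list(zip(xs, ys))[:3])
```

In a full replication, each simulated sample would be passed to a maximum likelihood routine, and the resulting estimates collected across iterations before applying the two summary functions.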

Table 1 and Table 2 present the results of the simulation study. It can be observed that the RB and RMSE of the MLEs tend to decrease when the sample size increases, which is consistent with the asymptotic unbiasedness and consistency of the MLE. It is also observed that, for small sample sizes, important biases are obtained in the estimates of

α

and

θ_{1}

. Another notable aspect is that, for values of the shape parameter

α

below one, the bias of

σ

is considerable for small sample sizes, whereas it is negligible for values of

α

above one.

Regarding the coverage rates of the confidence intervals (CP), the simulation results showed that these were higher than 95% for the parameter

α

in all of the considered sample sizes. For the scale parameter

σ

, the CPs were low for small sample sizes (less than 50) and rose to around 90% as the sample size increased. It was also observed that the CPs for the coefficients

θ_{0}

and

θ_{1}

were close to 95% for moderate and large sample sizes (greater than 75).

6. Applications

To illustrate the potential of the proposed distributions, we considered two real data sets taken from the literature. The first data set, an example of positive data, consists of fatigue data for hardened steel. The second data set consists of body fat data for athletes of the Australian Institute of Sport (AIS) and is an example of observations on the

(0, 1)

interval.

6.1. Fatigue Data

This data set consisted of failure times

(T)

in rolling contact fatigue of ten hardened steel specimens tested at each of four values of contact stress, X. The data were obtained using a four-ball rolling contact test rig at the Princeton Laboratories of Mobil Research and Development Co. This data set was analyzed by Chan et al. [22] by considering the regression model:

log (T_{i}) = θ_{1} + θ_{2} log (X_{i}) + ε_{i}, i = 1, \dots, 40 .

We considered that the positive response variable T followed the distribution:

T_{i} \sim LSHN (α, θ_{1} + θ_{2} log (X_{i}), σ) .

For this data set, we fit the log-BS (LBS) model, the log-skewed BS (LSBS) of Lemonte [23], and the proposed LSHN distribution. The MLEs of the parameters of the fitted models are given in Table 3.

To compare the fitted models, we used the AIC and BIC criteria, which are given by:

AIC = - 2 ℓ (\hat{θ}) + 2 p and BIC = - 2 ℓ (\hat{θ}) + p log (n),

where p is the number of parameters of the model in question and n is the sample size. The best model is the one with the smallest AIC or BIC. According to the AIC and BIC values in Table 3, the asymmetric models LSBS and LSHN fit better than the LBS model; that is, the data present a larger degree of asymmetry than allowed by the BS model. Between the two asymmetric models, the LSHN model has the smaller AIC and BIC, so we conclude that the regression model with the LSHN error distribution provides a better fit than the regression model with the LSBS error distribution.
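The two criteria can be sketched in a few lines of Python. The log-likelihood value used below is not reported in the paper; it is back-computed from the AIC of the LSHN model in Table 3 under the assumption that this model has p = 4 parameters and n = 40 observations, and the function names are illustrative.

```python
import math


def aic(loglik, p):
    """Akaike information criterion: -2 * loglik + 2 * p."""
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    """Bayesian information criterion: -2 * loglik + p * log(n)."""
    return -2.0 * loglik + p * math.log(n)


# LSHN fit in Table 3: AIC = 120.099 with (assumed) p = 4 and n = 40.
p, n = 4, 40
loglik = -(120.099 - 2 * p) / 2.0  # back-computed, not reported in the paper
aic_lshn = aic(loglik, p)
bic_lshn = bic(loglik, p, n)
print(f"AIC = {aic_lshn:.3f}, BIC = {bic_lshn:.3f}")
```

With these assumptions, the BIC obtained from the back-computed log-likelihood agrees with the value reported for the LSHN model in Table 3 up to rounding.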

The significance of the variable

log (x_{i})

on the response variable

T_{i}

can be tested through the Wald statistic,

{\hat{θ}}_{2} / se ({\hat{θ}}_{2})

, which gives the value

- 12.618 / 1.371 = - 9.203501

with the respective p-value

< 0.0001

, which indicates that the logarithm of the contact stress significantly affects the failure time of the hardened steel.
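The arithmetic of this test can be reproduced with the following sketch, which assumes a two-sided test based on the standard normal reference distribution; the helper name `wald_test` is illustrative and only the estimate and standard error from Table 3 are taken from the paper.

```python
import math


def wald_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = estimate / std_error
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


z_stat, p_value = wald_test(-12.618, 1.371)
print(f"z = {z_stat:.4f}, p-value = {p_value:.3g}")
```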

We recall that if

Y_{i} \sim LSHN (α, x_{i}^{⊤} θ, σ)

, then:

Z_{i} = \frac{2}{α} sinh (\frac{log (Y_{i}) - x_{i}^{⊤} θ}{σ}) \sim N (0, 1),

for

i = 1, \dots, n

. Figure 5b plots the envelope of the random variable

Z_{i}

. The plot reveals that the LSHN regression model provides a good fit to the fatigue data. Figure 5a depicts the envelope for the log-BS regression model.
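The transformation above can be illustrated with a short simulation. The sketch below draws responses from an LSHN regression by inverting the transformation and then recovers the residuals; it plugs in the LSHN estimates of Table 3, but the four stress levels are hypothetical placeholders (the actual design values are not listed here) and the function names are illustrative.

```python
import math
import random
import statistics


def lshn_sample(alpha, mu, sigma, rng):
    """Draw Y > 0 such that (2/alpha) * sinh((log(Y) - mu) / sigma) ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * math.asinh(alpha * z / 2.0))


def lshn_residual(y, alpha, mu, sigma):
    """Residual Z_i of the LSHN regression model for one observation."""
    return (2.0 / alpha) * math.sinh((math.log(y) - mu) / sigma)


rng = random.Random(2021)
alpha, theta1, theta2, sigma = 0.228, 0.296, -12.618, 8.675  # Table 3, LSHN
stress_levels = [0.9, 1.0, 1.1, 1.2]  # hypothetical values of X

residuals = []
for x in stress_levels:
    mu = theta1 + theta2 * math.log(x)
    for _ in range(1000):
        y = lshn_sample(alpha, mu, sigma, rng)
        residuals.append(lshn_residual(y, alpha, mu, sigma))

res_mean = statistics.fmean(residuals)
res_sd = statistics.stdev(residuals)
print(f"mean = {res_mean:.3f}, sd = {res_sd:.3f}")
```

When the model is correctly specified, the residuals should behave like a standard normal sample, which is the property that the envelope plot examines.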

6.2. Body Fat Data

We considered the AIS data set included in the sn package for R [24], available for download at http://azzalini.stat.unipd.it/SN/index.html (accessed on 20 March 2021). We considered only the data of the 37 rowing athletes in this data set. We were interested in predicting the body fat percentage (Bfat) of each athlete from their lean body mass (lbm). For the analysis, we considered the random variable:

Y_{i} \sim LUSHN (α, θ_{1} + θ_{2} X_{i}, σ),

where

Y_{i}

is the body fat percentage of the i-th athlete for

i = 1, \dots, 37

. We also fit the beta regression model with a logit link for the mean and a natural logarithm link for the dispersion parameter. The MLEs of the parameters and their corresponding standard errors (in parentheses) were: for the beta regression model,

{\hat{θ}}_{1} = 0.3262 (0.259)

,

{\hat{θ}}_{2} = - 0.0313 (0.003)

, and

\hat{σ} = 4.7027 (0.232)

with

AIC = - 136.56

and

BIC = - 131.73

; for the LUSHN regression model,

\hat{α} = 0.0576 (0.025)

,

{\hat{θ}}_{1} = 0.2569 (0.224)

,

{\hat{θ}}_{2} = - 0.0311 (0.003)

, and

\hat{σ} = 8.3982 (3.651)

, with

AIC = - 138.994

and

BIC = - 132.550

. According to the AIC and BIC criteria, the better model was the LUSHN regression model. Figure 6 plots the envelope of the variable

Z_{i} = \frac{2}{α} sinh (\frac{log (- log (Y_{i})) - θ_{1} - θ_{2} X_{i}}{σ}) \sim N (0, 1)

in which we can see that the LUSHN regression model presents a good fit to the body fat data.
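To make the unit-interval version of the transformation concrete, the following sketch inverts the expression for Z_i as printed above, so that log(-log(Y)) is a location-scale sinh-normal variable and Y lies in (0, 1). The parameter values are illustrative rather than the fitted ones, and the function names are assumptions.

```python
import math
import random


def lushn_sample(alpha, mu, sigma, rng):
    """Draw Y in (0, 1) with log(-log(Y)) = mu + sigma * asinh(alpha * Z / 2)."""
    z = rng.gauss(0.0, 1.0)
    return math.exp(-math.exp(mu + sigma * math.asinh(alpha * z / 2.0)))


def lushn_residual(y, alpha, mu, sigma):
    """Residual Z_i as written in the text, using log(-log(y))."""
    return (2.0 / alpha) * math.sinh((math.log(-math.log(y)) - mu) / sigma)


rng = random.Random(7)
alpha, mu, sigma = 1.5, 0.2, 0.5  # illustrative values, not estimates
ys = [lushn_sample(alpha, mu, sigma, rng) for _ in range(2000)]
zs = [lushn_residual(y, alpha, mu, sigma) for y in ys]
print(min(ys), max(ys))
```

Because log(-log(y)) is decreasing in y, larger responses correspond to smaller residuals under this parameterization.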

7. Conclusions

In this paper, two new families of bimodal distributions were introduced. The new families were generated by applying transformations to the unit-Birnbaum–Saunders distribution and are useful alternatives for modeling data on the interval (0,1) or with positive support, owing to their flexibility to fit data with a high degree of asymmetry and/or kurtosis. The main statistical properties of the families and the problem of parameter estimation were studied in detail by using the maximum likelihood method. The observed and expected information matrices were also derived. A small Monte Carlo simulation was carried out, showing that the maximum likelihood estimators had good asymptotic properties for moderate and large sample sizes. Extensions to regression models based on the new families of distributions were also presented. Furthermore, we showed that such families of distributions can provide a better fit to real data sets, especially when covariates are used to explain the response variable in a regression model.

Author Contributions

All authors contributed equally to this work. All authors read and agreed to the published version of the manuscript.

Funding

The research of R.T.-F. and G.M.-F. was supported by the project: Resolución de Problemas de Situaciones Reales Usando Análisis Estadístico a través del Modelamiento Multidimensional de Tasas y Proporciones; Esquemas de Monitoreamiento para Datos Asimétricos no Normales y una Estrategia Didáctica para el Desarrollo del Pensamiento Lógico-Matemático. Universidad de Córdoba, Colombia, Code FCB-05-19.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Details about the available data are given in Section 6.

Acknowledgments

G.M.-F. and R.T.-F. acknowledge the support given by Universidad de Córdoba, Montería, Colombia.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Related Theorems

Theorem A1.

Let

T \sim BS (α, β)

. Then,

Y = log (T) \sim SHN (α, γ, σ = 2)

, where

γ = log (β)

.

Proof.

The density of a Birnbaum–Saunders distribution is:

\begin{matrix} f_{T} (t) & = \frac{t^{- 3 / 2} (t + β)}{2 α \sqrt{β}} ϕ (a_{t}) \\ & = \frac{exp (α^{- 2})}{2 α {(2 π β)}^{1 / 2}} [t^{- 3 / 2} (t + β)] exp [- \frac{1}{2 α^{2}} (\frac{t}{β} + \frac{β}{t})], \end{matrix}

where

ϕ (\cdot)

is the pdf of the standard normal distribution and

a_{t} = \frac{1}{α} (\sqrt{t / β} - \sqrt{β / t})

.

Letting

Y = log (T)

, then

T = e^{Y} = k^{- 1} (y)

, and by applying the theorem for the transformation of random variables, it follows that:

\begin{matrix}
f_{Y} (y) & = f_{T} (k^{- 1} (y)) |\frac{d}{d y} k^{- 1} (y)| \\
& = \frac{exp (α^{- 2})}{2 α {(2 π β)}^{1 / 2}} [{[exp (y)]}^{- 3 / 2} (exp (y) + β)] exp [- \frac{1}{2 α^{2}} (\frac{exp (y)}{β} + \frac{β}{exp (y)})] exp (y) \\
& = {(2 α \sqrt{2 π})}^{- 1} β^{- 1 / 2} {[exp (y)]}^{- 1 / 2} (exp (y) + β) exp [- \frac{α^{- 2}}{2} (exp (y - log β) + exp (- (y - log β)))] exp (α^{- 2}) \\
& = {(2 α \sqrt{2 π})}^{- 1} \{{[β^{- 1} exp (y)]}^{1 / 2} + {[β {(exp (y))}^{- 1}]}^{1 / 2}\} exp [- 2 α^{- 2} (\frac{exp (y - log β)}{4} + \frac{exp (- (y - log β))}{4}) + α^{- 2}] \\
& = {(2 α \sqrt{2 π})}^{- 1} [{[exp (y - log β)]}^{1 / 2} + {[exp (- (y - log β))]}^{1 / 2}] exp [- 2 α^{- 2} (\frac{exp (y - log β)}{4} + \frac{exp (- (y - log β))}{4} - \frac{1}{2})] \\
& = {(2 α \sqrt{2 π})}^{- 1} [exp (\frac{y - log β}{2}) + exp (- \frac{y - log β}{2})] exp [- 2 α^{- 2} {(\frac{{[exp (y - log β)]}^{1 / 2}}{2} - \frac{{[exp (- (y - log β))]}^{1 / 2}}{2})}^{2}] \\
& = {(2 α \sqrt{2 π})}^{- 1} [2 cosh (\frac{y - log β}{2})] exp \{- 2 α^{- 2} {[\frac{exp (\frac{y - log β}{2}) - exp (- \frac{y - log β}{2})}{2}]}^{2}\} \\
& = 2 {(2 α \sqrt{2 π})}^{- 1} [cosh (\frac{y - log β}{2})] exp \{- 2 α^{- 2} {[sinh (\frac{y - log β}{2})]}^{2}\} .
\end{matrix}

Thus,

Y \sim SHN (α, γ = log β, σ = 2)

. □
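The identity in Theorem A1 can also be checked numerically. The sketch below compares the density of log(T), obtained from the BS density by the change of variable, with the SHN density for σ = 2; the SHN density is taken to be Equation (A1) with log(y) replaced by y and without the factor 1/y, and the parameter values and function names are illustrative assumptions.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_pdf(t, alpha, beta):
    """Birnbaum-Saunders density f_T(t) for t > 0."""
    a_t = (math.sqrt(t / beta) - math.sqrt(beta / t)) / alpha
    return t ** -1.5 * (t + beta) / (2.0 * alpha * math.sqrt(beta)) * normal_pdf(a_t)


def shn_pdf(y, alpha, gamma, sigma):
    """Sinh-normal density on the real line (assumed form, see lead-in)."""
    w = (y - gamma) / sigma
    return 2.0 / (alpha * sigma) * math.cosh(w) * normal_pdf(2.0 / alpha * math.sinh(w))


alpha, beta = 0.8, 1.7  # illustrative parameter values
grid = [-2.0 + 0.25 * k for k in range(25)]
max_gap = max(
    abs(bs_pdf(math.exp(y), alpha, beta) * math.exp(y)
        - shn_pdf(y, alpha, math.log(beta), 2.0))
    for y in grid
)
print(f"largest absolute difference on the grid: {max_gap:.2e}")
```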

Proposition A1.

The density function in Equation (5) integrates to one.

Proof.

We considered the density function

f (y)

given by:

\begin{matrix} f (y) = \frac{2}{α σ y} cosh (\frac{log (y) - γ}{σ}) ϕ (\frac{2}{α} sinh (\frac{log (y) - γ}{σ})), y > 0 \end{matrix}

(A1)

and we let:

u = \frac{2}{α} sinh (\frac{log (y) - γ}{σ}) ⟹ d u = \frac{2}{α σ y} cosh (\frac{log (y) - γ}{σ}) d y,

then,

\begin{matrix} \int_{0}^{\infty} f (y) d y & = \int_{0}^{\infty} \frac{2}{α σ y} cosh (\frac{log (y) - γ}{σ}) ϕ (\frac{2}{α} sinh (\frac{log (y) - γ}{σ})) d y \\ & = \int_{- \infty}^{\infty} ϕ (u) d u \\ & = 1 . \end{matrix}

□
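As a numerical companion to Proposition A1, the following sketch integrates the density in Equation (A1) with the trapezoidal rule on a log-spaced grid. The grid limits, the number of steps, and the parameter values (γ = 0.5 and σ = 0.25, as in Figure 1) are choices made for illustration, and the function names are assumptions.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def lshn_pdf(y, alpha, gamma, sigma):
    """Density of Equation (A1) for y > 0."""
    w = (math.log(y) - gamma) / sigma
    return 2.0 / (alpha * sigma * y) * math.cosh(w) * normal_pdf(2.0 / alpha * math.sinh(w))


def integrate_lshn(alpha, gamma, sigma, half_width=8.0, steps=20000):
    """Trapezoidal rule on a grid that is equally spaced in log(y)."""
    lo = gamma - half_width * sigma
    hi = gamma + half_width * sigma
    ys = [math.exp(lo + (hi - lo) * k / steps) for k in range(steps + 1)]
    fs = [lshn_pdf(y, alpha, gamma, sigma) for y in ys]
    return sum(0.5 * (fs[k] + fs[k + 1]) * (ys[k + 1] - ys[k]) for k in range(steps))


totals = {a: integrate_lshn(a, 0.5, 0.25) for a in (0.75, 2.0, 6.5)}
for a, total in totals.items():
    print(f"alpha = {a}: integral = {total:.6f}")
```

The grid is truncated where the argument of the sinh function reaches ±8, beyond which the density is numerically zero for these parameter values.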

References

1. Birnbaum, Z.W.; Saunders, S.C. A new family of life distributions. J. Appl. Prob. 1969, 6, 319–327.
2. Castillo, N.O.; Gómez, H.W.; Bolfarine, H. Epsilon Birnbaum–Saunders distribution family: Properties and inference. Stat. Pap. 2011, 52, 871–883.
3. Vilca-Labra, F.; Leiva-Sánchez, V. A new fatigue life model based on the family of skew-elliptical distributions. Commun. Stat. Theory Methods 2006, 35, 229–244.
4. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. An alpha-power extension for the Birnbaum–Saunders distribution. Stat. Am. J. Theor. Appl. Stat. 2014, 48, 896–912.
5. Moreno-Arenas, G.; Martínez-Flórez, G.; Barrera-Causil, C. Proportional Hazard Birnbaum–Saunders distribution with application to the survival data analysis. Rev. Colomb. Estad. 2016, 39, 129–147.
6. Rieck, J.R.; Nedelman, J.R. A log-linear model for the Birnbaum–Saunders distribution. Technometrics 1991, 33, 51–60.
7. Santos, J.; Cribari-Neto, F. Hypothesis testing in log-Birnbaum–Saunders regressions. Commun. Stat. Simul. Comput. 2017, 46, 3990–4003.
8. Balakrishnan, N.; Zhu, X. Inference for the Birnbaum–Saunders Lifetime Regression Model with Applications. Commun. Stat. Simul. Comput. 2015, 48, 2073–2100.
9. Barros, M.; Paula, G.A.; Leiva, V. A new class of survival regression models with heavy-tailed errors: Robustness and diagnostics. Lifetime Data Anal. 2008, 14, 316–332.
10. Leiva, V.; Vilca-Labra, F.; Balakrishnan, N.; Sanhueza, A. A skewed sinh-normal distribution and its properties and application to air pollution. Commun. Stat. Theory Methods 2010, 39, 426–443.
11. Santana, L.; Vilca, F.; Leiva, V. Influence analysis in skew-Birnbaum–Saunders regression models and applications. J. Appl. Stat. 2011, 38, 1633–1649.
12. Mazucheli, J.; Menezes, A.; Dey, S. The unit-Birnbaum–Saunders distribution with applications. Chil. J. Stat. 2018, 9, 47–57.
13. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Power-models for proportions with zero/one excess. Appl. Math. Inf. Sci. 2018, 12, 293–303.
14. Ospina, R.; Cribari-Neto, F.; Vasconcellos, K.L.P. Improved point and interval estimation for a beta regression model. Comput. Stat. Data Anal. 2006, 51, 960–981.
15. Simas, A.B.; Barreto-Souza, W.; Rocha, A.V. Improved estimators for a general class of beta regression models. Comput. Statist. Data Anal. 2010, 54, 348–366.
16. Rocha, A.V.; Simas, A.B. Influence diagnostics in a general class of beta regression models. Test 2011, 20, 95–119.
17. Cribari-Neto, F.; Souza, T.C. Testing inference in variable dispersion beta regressions. J. Stat. Comput. Sim. 2012, 82, 1827–1843.
18. Ghosh, A. Robust inference under the beta regression model with application to health care studies. Stat. Methods Med. Res. 2019, 28, 871–888.
19. Kim, J.M.; Baik, J.; Reller, M. Control charts of mean and variance using copula Markov SPC and conditional distribution by copula. Commun. Stat. Simul. Comput. 2021, 50, 85–102.
20. Rieck, J.R. Statistical Analysis for the Birnbaum–Saunders Fatigue Life Distribution. Ph.D. Thesis, Department of Mathematical Sciences, Clemson University, Clemson, SC, USA, 1989.
21. Mazucheli, J.; Leiva, V.; Alves, B.; Menezes, A.F.B. A new quantile Regression for modeling bounded data under a unit Birnbaum–Saunders distribution with applications in medicine and politics. Symmetry 2021, 13, 682.
22. Chan, P.S.; Ng, H.K.T.; Balakrishnan, N.; Zhou, Q. Point and interval estimation for extreme-value regression model under Type-II censoring. Comput. Stat. Data Anal. 2012, 52, 4040–4058.
23. Lemonte, A.J. A log-Birnbaum–Saunders regression model with asymmetric errors. J. Stat. Comput. Simul. 2012, 82, 1775–1787.
24. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: http://www.R-project.org (accessed on 22 February 2021).

Figure 1. Probability density function of the

LSHN (α, 0.5, 0.25)

distribution for: (a)

α = 2

(solid line),

α = 1.0

(dashed line), and

α = 0.75

(dotted line); (b)

α = 6.5

(solid line),

α = 4.5

(dashed line), and

α = 2.5

(dotted line).

Figure 2. Hazard function of the

LSHN (α, 0.5, 0.25)

distribution for: (a)

α = 2

(solid line),

α = 1.0

(dashed line), and

α = 0.75

(dotted line); (b)

α = 6.5

(solid line),

α = 4.5

(dashed line), and

α = 2.5

(dotted line).

Figure 3. Probability density function of the

USHN (α, 0.15, 0.5)

distribution for: (a)

α = 2

(solid line),

α = 1.25

(dashed line), and

α = 0.75

(dotted line); (b)

α = 4.5

(solid line),

α = 3.5

(dashed line), and

α = 2.5

(dotted line).

Figure 4. Hazard function of the

USHN (α, 0.15, 0.5)

distribution for: (a)

α = 2

(solid line),

α = 1.25

(dashed line), and

α = 0.75

(dotted line); (b)

α = 4.5

(solid line),

α = 3.5

(dashed line), and

α = 2.5

(dotted line).

Figure 5. Envelopes of the residuals for: (a) the LBS distribution and (b) the LSHN distribution.

Figure 6. Envelope of the residuals for the LUSHN regression model.

Table 1. Empirical relative bias (RB), root of the mean squared error (RMSE), ratio between the standard deviation of the estimate and the average standard deviation (RSD), and coverage probability (CP) of the 95% confidence interval for the MLEs of the

α

and

σ

in the LUBS model.

		$\hat{α}$				$\hat{σ}$
$α$	n	RB	RMSE	RSD	CP	RB	RMSE	RSD	CP
0.50	10	6.956	5.910	0.989	99.98	−0.552	0.695	0.192	40.88
	25	2.789	2.052	0.904	100.0	−0.496	0.614	0.356	46.26
	50	1.446	1.038	0.783	97.16	−0.391	0.551	0.441	55.46
	75	1.063	0.766	0.756	94.54	−0.334	0.492	0.447	60.00
	100	0.868	0.632	0.748	93.66	−0.301	0.456	0.466	63.42
	200	0.513	0.400	0.724	92.76	−0.198	0.402	0.531	70.58
	500	0.239	0.240	0.759	93.70	−0.086	0.330	0.606	79.22
0.75	10	7.166	8.224	0.935	99.96	−0.608	0.716	0.436	27.78
	25	2.312	2.546	0.992	100.0	−0.444	0.575	0.487	43.60
	50	1.045	1.193	0.911	97.34	−0.288	0.489	0.528	57.74
	75	0.679	0.814	0.868	96.06	−0.211	0.443	0.553	65.68
	100	0.479	0.623	0.841	95.48	−0.148	0.415	0.569	71.48
	200	0.237	0.384	0.841	95.26	−0.058	0.368	0.671	79.38
	500	0.080	0.231	0.896	95.68	0.006	0.291	0.812	86.68
1.25	10	5.736	10.877	0.900	100.0	−0.585	0.655	0.684	24.16
	25	1.493	2.854	1.009	100.0	−0.326	0.495	0.662	50.20
	50	0.581	1.263	0.976	99.42	−0.147	0.423	0.705	67.20
	75	0.316	0.833	0.965	98.06	−0.067	0.375	0.745	76.40
	100	0.226	0.668	0.970	97.74	−0.037	0.352	0.813	80.24
	200	0.090	0.415	0.989	97.14	0.003	0.280	0.957	86.68
	500	0.032	0.249	1.011	95.24	0.002	0.164	1.023	90.60
1.75	10	4.809	13.687	0.967	100.0	−0.526	0.598	0.781	26.72
	25	1.160	3.313	1.064	100.0	−0.255	0.436	0.784	56.12
	50	0.407	1.380	1.007	99.92	−0.089	0.386	0.873	73.86
	75	0.230	0.964	1.033	99.04	−0.038	0.330	0.944	80.50
	100	0.169	0.782	1.039	98.28	−0.030	0.285	1.005	83.04
	200	0.076	0.477	1.018	95.98	−0.014	0.187	1.037	88.46
	500	0.028	0.276	1.003	95.80	−0.006	0.109	1.011	92.36
2.25	10	4.371	15.744	0.905	100.0	−0.484	0.562	0.858	29.96
	25	1.004	3.813	1.082	100.0	−0.206	0.422	0.887	61.10
	50	0.361	1.663	1.071	99.94	−0.074	0.342	0.998	76.34
	75	0.208	1.122	1.048	98.20	−0.046	0.272	1.053	82.56
	100	0.130	0.858	1.025	96.94	−0.025	0.226	1.049	86.28
	200	0.065	0.539	1.004	96.24	−0.018	0.143	1.012	89.76
	500	0.026	0.330	1.04	94.88	−0.007	0.090	1.038	92.28
2.75	10	4.028	18.459	0.934	100.0	−0.444	0.531	0.891	34.52
	25	0.900	4.205	1.051	100.0	−0.177	0.398	0.970	64.02
	50	0.327	1.882	1.072	98.60	−0.075	0.277	1.031	78.30
	75	0.187	1.273	1.046	97.28	−0.042	0.225	1.055	84.56
	100	0.137	1.009	1.025	96.60	−0.035	0.187	1.052	87.18
	200	0.066	0.632	1.013	96.36	−0.021	0.122	1.009	91.08
	500	0.027	0.368	0.999	95.98	−0.009	0.076	1.006	93.40

Table 2. Empirical relative bias (RB), root of the mean squared error (RMSE), ratio between the standard deviation of the estimate and the average standard deviation (RSD), and coverage probability (CP) of the 95% confidence interval for the MLEs of the

θ_{0}

and

θ_{1}

in the LUBS model.

		${\hat{θ}}_{0}$				${\hat{θ}}_{1}$
$α$	n	RB	RMSE	RSD	CP	RB	RMSE	RSD	CP
0.50	10	−0.022	0.186	1.232	81.92	0.201	0.331	1.234	81.74
	25	−0.011	0.106	1.121	89.08	0.065	0.184	1.107	89.16
	50	−0.006	0.073	1.069	91.80	0.029	0.126	1.063	92.26
	75	−0.001	0.058	1.048	93.38	0.002	0.101	1.050	93.20
	100	−0.002	0.050	1.043	93.34	0.007	0.086	1.026	94.10
	200	−0.001	0.035	1.018	94.14	0.000	0.060	1.016	94.50
	500	0.000	0.022	1.008	94.48	0.001	0.038	1.016	94.64
0.75	10	−0.002	0.266	1.444	75.54	0.113	0.473	1.432	77.38
	25	−0.006	0.156	1.222	84.92	0.036	0.272	1.207	85.72
	50	0.000	0.106	1.109	90.56	0.000	0.183	1.106	90.74
	75	−0.002	0.085	1.076	92.18	0.011	0.149	1.087	91.88
	100	−0.002	0.071	1.033	93.44	0.003	0.124	1.037	93.70
	200	0.001	0.050	1.011	94.80	−0.004	0.087	1.020	94.56
	500	0.000	0.031	1.004	94.86	−0.001	0.054	0.999	94.88
1.25	10	−0.011	0.405	1.717	68.84	0.018	0.712	1.655	70.64
	25	0.003	0.241	1.299	83.14	−0.019	0.421	1.291	83.76
	50	0.002	0.158	1.131	89.62	−0.011	0.276	1.135	89.92
	75	0.002	0.126	1.080	91.74	−0.015	0.219	1.082	92.06
	100	0.001	0.107	1.050	93.22	−0.006	0.185	1.046	93.30
	200	0.000	0.073	1.012	94.60	−0.004	0.128	1.017	94.38
	500	0.001	0.047	1.014	94.58	−0.011	0.081	1.011	94.46
1.75	10	0.012	0.519	1.805	66.92	−0.008	0.925	1.752	67.98
	25	0.005	0.298	1.321	83.64	−0.063	0.518	1.304	83.42
	50	0.001	0.190	1.124	90.26	0.000	0.335	1.133	90.14
	75	0.005	0.153	1.091	91.64	−0.02	0.264	1.077	92.06
	100	0.000	0.130	1.062	92.66	0.000	0.225	1.057	92.76
	200	0.001	0.091	1.040	93.62	−0.003	0.156	1.030	94.10
	500	0.000	0.056	1.006	94.90	0.001	0.096	1.006	95.00
2.25	10	0.005	0.597	1.867	66.88	−0.019	1.044	1.791	68.74
	25	0.000	0.327	1.313	82.96	0.004	0.568	1.284	84.32
	50	−0.007	0.210	1.124	90.60	0.038	0.366	1.118	90.50
	75	−0.004	0.168	1.089	91.80	0.031	0.293	1.089	92.00
	100	0.000	0.144	1.070	92.52	−0.006	0.249	1.063	93.06
	200	−0.002	0.100	1.047	93.78	0.012	0.173	1.044	93.28
	500	0.000	0.062	1.017	94.94	0.000	0.106	1.008	94.58
2.75	10	0.002	0.654	1.879	67.68	−0.017	1.152	1.800	69.18
	25	0.000	0.338	1.257	85.32	−0.031	0.601	1.265	85.52
	50	−0.005	0.225	1.133	90.16	0.037	0.392	1.125	90.48
	75	−0.003	0.179	1.092	92.34	0.025	0.314	1.100	92.24
	100	0.003	0.150	1.059	93.16	−0.023	0.258	1.049	93.20
	200	0.000	0.101	1.010	94.44	−0.009	0.176	1.010	94.86
	500	0.000	0.064	1.013	94.78	0.004	0.112	1.014	94.60

Table 3. MLE (standard error) for the LBS, LSBS, and LSHN models.

Parameters	LBS	LSBS	LSHN
$α$	1.279(0.143)	2.011(0.313)	0.228(0.076)
$θ_{1}$	0.097(0.170)	−0.961(0.166)	0.296(0.159)
$θ_{2}$	−14.116(1.571)	−13.870(1.602)	−12.618(1.371)
$λ / σ$		−0.932(0.174)	8.675(2.933)
AIC	129.235	125.360	120.099
BIC	134.296	132.115	126.854

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
