Unit Exponential Probability Distribution: Characterization and Applications in Environmental and Engineering Data Modeling

Bakouch, Hassan S.; Hussain, Tassaddaq; Tošić, Marina; Stojanović, Vladica S.; Qarmalah, Najla

doi:10.3390/math11194207


¹ Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia

² Department of Mathematics, Faculty of Science, Tanta University, Tanta 31111, Egypt

³ Department of Statistics, Mirpur University of Science and Technology, Mirpur 10250, Pakistan

⁴ Department of Mathematics, Faculty of Sciences & Mathematics, University of Priština in Kosovska Mitrovica, 38220 Kosovska Mitrovica, Serbia

⁵ Department of Informatics & Computer Sciences, University of Criminal Investigation and Police Studies, 11060 Belgrade, Serbia

⁶ Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

* Authors to whom correspondence should be addressed.

Mathematics 2023, 11(19), 4207; https://doi.org/10.3390/math11194207

Submission received: 8 August 2023 / Revised: 30 September 2023 / Accepted: 7 October 2023 / Published: 9 October 2023

(This article belongs to the Special Issue New Advances in Distribution Theory and Its Applications)


Abstract

Distributions with bounded support are considerably scarcer than those with unbounded support, despite the fact that there are a number of real-world contexts where observations take values from a bounded range (proportions, percentages, and fractions are typical examples). For proportion modeling, a flexible family of two-parameter distribution functions associated with the exponential distribution is proposed here. The mathematical and statistical properties of the novel distribution are examined, including the quantiles, mode, moments, hazard rate function, and its characterization. The parameters are estimated using the maximum likelihood method, and applications to environmental and engineering data are also considered. To this end, various statistical tests are used, along with several information criteria, to determine how well the model fits the data. The proposed model is found to provide the best fit in most cases for the datasets considered.

Keywords:

unit distribution; statistical model; hazard function; characterizations; estimation; simulation; application

MSC:

60E05; 62E15; 62F10

1. Introduction

Proportional variables are often encountered in data science, where they are used as stochastic models that describe, for instance, the number of successes divided by the number of attempts, party votes, the proportion of money spent on a cause, or the attendance rate of public events. Therefore, proportion analysis is necessary in various fields such as healthcare, economics, and engineering. Usually, to model the behavior of such random variables (RVs), distributions defined on a unit interval are used, which are highly valuable in modeling proportions and percentages. It is conceivable to model and forecast such variables, but one must look outside the traditional model because the data are limited to the range

(0, 1)

. For further study, readers are referred to [1,2,3].

In this context, the beta model was proposed by Bayes [4], which in many fields of statistics is a convenient and helpful model widely used for modeling percentages and proportions. However, there are a number of scenarios where it seems to not be a suitable one. Therefore, alternatively, several distributions have been developed for modeling bounded variables like proportions, indices, and rates, such as the unit distribution studied in [5], the unit Johnson distribution proposed in [6], the four-parameter distribution introduced in [7], the distribution proposed in [8], the Topp–Leone distribution studied in [9], and the unit gamma distribution introduced in [10]. More recently, many other unit interval distribution functions have been introduced, such as the cumulative distribution function (CDF) quantile distribution [11], new unit interval distribution [12], the unit-inverse Gaussian distribution [13], the log-xgamma distribution [14], unit Gompertz, unit Lindley, and unit Weibull distributions [15,16,17], the log-weighted exponential distribution [18], the unit Johnson SU distribution [19], the unit log–log distribution [20], the new unit distribution [21], the unit–power Burr X distribution [22], and the unit Teissier distribution [23], while in [24], the unit interval distribution via the conditional distribution approach was studied. Notice that all of these distributions are potential candidates for describing proportions. It is worth noting that the approaches mentioned above are mainly based on conventional strategies, namely the following:

(i): Log transformation approaches;
(ii): The CDF and quantile methodology;
(iii): Reciprocal transformation;
(iv): Exponential transformation;
(v): The conditional distribution methodology;
(vi): The T-X family approach.

However, all of the earlier models and others seem to be casual ways of generating unit interval distributions. In the current study, our motivational strategies begin with recalling the epsilon function examined in [25], which is defined as

ε_{λ, a} (x) = \begin{cases} {(\frac{a + x}{a - x})}^{\frac{λ a}{2}}, & - a < x < a \\ 0, & \text{otherwise}, \end{cases}

(1)

where

λ \in R ∖ {0}

and

a > 0

. For comprehensive details about the above and its bounded version, readers are referred to [25]. The function

y = ε_{λ, a} (x)

is the solution of an epsilon differential equation of the first order:

y^{'} = \frac{λ a^{2} y}{a^{2} - x^{2}} .

In addition, it satisfies the following property of the exponential limit:

lim_{a \to + \infty} ε_{λ, a} (x) = e^{λ x}, \forall x \in (- a, + a) .
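The exponential limit can be illustrated numerically. The following is only a minimal sketch of Equation (1); the function name `epsilon` and the parameter values are illustrative assumptions and do not come from [25].

```python
import math

def epsilon(x, lam, a):
    """Epsilon function of Equation (1); it is zero outside (-a, a)."""
    if not -a < x < a:
        return 0.0
    return ((a + x) / (a - x)) ** (lam * a / 2.0)

# Illustrative values (assumptions): as a grows, the value approaches exp(lam * x).
lam, x = 0.7, 0.3
for a in (1.0, 10.0, 1e3, 1e6):
    print(a, epsilon(x, lam, a), math.exp(lam * x))
```

For fixed x, the printed values should move toward the last column as a increases.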

Furthermore, it is also related to the CDF class proposed in [7], which is based on the exponential function. However, the unit interval variants thus proposed differ from the design of our CDF. As will be seen, the distribution proposed here is much more flexible and exhibits both positive and negative skewness. Moreover, as will be seen below, the hazard rate function (HRF) of the proposed model exhibits purely increasing failure rate (IFR) behavior for all values of $λ > 0$ and thus belongs to the decreasing mean residual life (DMRL) class.

The rest of the manuscript is organized as follows. In the next section, the basic stochastic properties of the proposed distribution are presented. The mode, quantiles, HRF, and characterization of the new distribution, among other properties, are examined. Section 3 shows the procedure for estimating the parameters of the proposed distribution using the maximum likelihood (ML) method, along with a Monte Carlo simulation study. Applications to a number of real-world datasets are given in Section 4, while the last section provides some concluding remarks.

2. The Proposed Unit Exponential Distribution

Let X be a bounded RV, and without loss of generality, it is convenient that values of X belong to the unit interval

[0, 1]

. Also, suppose that the CDF of the RV X is defined by the following equality:

F (x) = \begin{cases} 1 - exp [α (1 - {(\frac{1 + x}{1 - x})}^{β})], & 0 \leq x < 1; \\ 1, & x = 1; \end{cases}

(2)

where

α, β > 0

. The CDF given by Equation (2) is called the unit exponential distribution (UED) (with the parameters

α

and

β

) and referred to as the UED

(α, β)

. Note that the UED is related to the epsilon function defined in Equation (1). Indeed, when taking

a = 1

and

β = λ / 2

, Equation (2) becomes

F (x) = 1 - exp [α (1 - ε_{2 β, 1} (x))],

when

0 \leq x < 1

. Note that in this form, the function

F (x)

represents the composition of the CDF of the so-called one-shifted exponential distribution [26] and the epsilon function mentioned above. At the same time, it is obvious that

F (x)

approaches 0 and 1 when

x \to 0

and

x \to 1

, respectively, and thus represents a valid unit CDF. Graphical representations of the CDFs of the UED for different parameters

α

and

β

are shown in Figure 1. It portrays that for

α \to 0

and

β \geq 3

, the CDF curve is concave (bent inward), while for

α \to 1

, the CDF curve is convex (bent outward).

By differentiating the CDF given by Equation (2), the probability density function (PDF) of the UED when

0 \leq x < 1

can be easily obtained as follows:

f (x) = \frac{2 α β}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β} \bar{F} (x) .

(3)

Here,

\bar{F} (x) = 1 - F (x)

is the tail of the CDF

F (x)

. Notice that the UED has two parameters

α, β > 0

, where one is like a dispersion and the other is like a shape parameter. Also, this PDF structure is similar to one of the simpler forms of the so-called proper dispersion models introduced in [7], but it does not belong to that class.
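Equations (2) and (3) translate directly into code. The sketch below is one possible implementation, not the authors' software; the names `ued_cdf` and `ued_pdf` and the parameter values are assumptions made for illustration, and no guard is included against floating-point overflow extremely close to x = 1.

```python
import math

def ued_cdf(x, alpha, beta):
    """CDF of the UED(alpha, beta), Equation (2)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    w = ((1.0 + x) / (1.0 - x)) ** beta
    return 1.0 - math.exp(alpha * (1.0 - w))

def ued_pdf(x, alpha, beta):
    """PDF of the UED(alpha, beta), Equation (3), for 0 <= x < 1."""
    if not 0.0 <= x < 1.0:
        return 0.0
    w = ((1.0 + x) / (1.0 - x)) ** beta
    return 2.0 * alpha * beta / (1.0 - x * x) * w * math.exp(alpha * (1.0 - w))

# Illustrative check (assumed values): the PDF should match the slope of the CDF.
alpha, beta, x, h = 1.2, 0.8, 0.4, 1e-5
slope = (ued_cdf(x + h, alpha, beta) - ued_cdf(x - h, alpha, beta)) / (2.0 * h)
print(slope, ued_pdf(x, alpha, beta))
```

The central-difference slope of the CDF and the PDF value should agree to several decimal places.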

2.1. Properties of the Model

In practice, it is required that the proposed UED, whose PDF is defined by Equation (3), presents flexibility to describe the data adequately. In this regard, it exhibits negative and positive skewness for all values of

α > 0

and

β > 0

. The flexibility property of the UED can be visualized as in Figure 2, where the various cases of the appropriate PDF are shown, depending on the parameter values

α

and

β > 0

. These plots show the different skewness possibilities and the existence of modes of the UED that can be used to fit some real-world datasets.

2.1.1. Quantile

As a first property, the quantile function of the UED is quite manageable. By inverting the CDF

F (x)

, given by Equation (2), the quantile function is determined as follows:

Q (y) = F^{- 1} (y) = \frac{{(1 - ln (1 - y) / α)}^{1 / β} - 1}{{(1 - ln (1 - y) / α)}^{1 / β} + 1}, y \in (0, 1) .

Thanks to this function, the median of the UED is given by

M_{e} = Q (1 / 2) = \frac{{(1 + ln 2 / α)}^{1 / β} - 1}{{(1 + ln 2 / α)}^{1 / β} + 1} .

Using

Q (y)

, we are able to define various measures of skewness and kurtosis, as well as important actuarial measures (see, for example, [2,27]).
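Because the quantile function is available in closed form, samples from the UED can be drawn by inverse-transform sampling. The sketch below assumes this standard technique; the function names and parameter values are illustrative and are not taken from the paper.

```python
import math
import random

def ued_cdf(x, alpha, beta):
    """CDF of Equation (2) for 0 <= x < 1."""
    w = ((1.0 + x) / (1.0 - x)) ** beta
    return 1.0 - math.exp(alpha * (1.0 - w))

def ued_quantile(y, alpha, beta):
    """Quantile function Q(y) obtained by inverting the CDF, for 0 <= y < 1."""
    w = (1.0 - math.log1p(-y) / alpha) ** (1.0 / beta)
    return (w - 1.0) / (w + 1.0)

def ued_sample(n, alpha, beta, rng):
    """Inverse-transform sampling: apply Q to independent uniform draws."""
    return [ued_quantile(rng.random(), alpha, beta) for _ in range(n)]

# Illustrative values (assumptions), with a fixed seed for reproducibility.
alpha, beta = 1.2, 0.8
median = ued_quantile(0.5, alpha, beta)
draws = ued_sample(5, alpha, beta, random.Random(0))
print(median, draws)
```

Evaluating `ued_cdf(ued_quantile(y, alpha, beta), alpha, beta)` should return y up to rounding, which is a quick way to confirm that the inversion is correct.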

2.1.2. Mode

Note that Figure 2 shows that the PDF of the proposed model can have (at most) one mode. To identify this property, we should prove the following result, which collects these findings and their implications:

Proposition 1.

The PDF

f (x)

, given by Equation (3), has a unique mode if and only if

0 < α < 1

. Otherwise, the UED does not have any modes.

Proof.

The mode of the PDF

f (x)

is a solution of the equation

f^{'} (x) = 0

, which after certain calculations and simplification becomes

x + β - α β {(\frac{1 + x}{1 - x})}^{β} = 0 .

(4)

If we denote by

ψ (x)

the left-hand side of Equation (4), then the following is easily obtained:

lim_{x \to 1^{-}} ψ (x) < 0 and lim_{x \to 0^{+}} ψ (x) = β (1 - α) .

Obviously, the inequalities

0 < α < 1

and

β > 0

give

β (1 - α) > 0

. Then, Equation (4) has real solutions, which guarantee that

f (x)

has at least one mode. Next, the function

ψ (x)

defined above has the derivative

ψ^{'} (x) = 1 - \frac{2 α β^{2}}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β} .

Note that

ψ^{'} (x)

is strictly decreasing because

ψ^{″} (x) = - \frac{4 α β^{2} (x + β)}{{(1 - x^{2})}^{2}} {(\frac{1 + x}{1 - x})}^{β} < 0 .

This fact then implies that the previously detected mode is unique. □
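For the case $0 < α < 1$ covered by Proposition 1, the mode can be located numerically as the root of $ψ (x)$ from Equation (4). The sketch below uses plain bisection, which is an assumption of this example rather than a method stated in the paper; the function names and the parameter values are likewise illustrative.

```python
import math

def psi(x, alpha, beta):
    """Left-hand side of Equation (4), evaluated in log form to avoid overflow."""
    log_term = math.log(alpha * beta) + beta * math.log((1.0 + x) / (1.0 - x))
    if log_term > 700.0:  # exp() would overflow; psi is hugely negative here
        return -math.inf
    return x + beta - math.exp(log_term)

def ued_pdf(x, alpha, beta):
    """PDF of Equation (3) for 0 <= x < 1."""
    w = ((1.0 + x) / (1.0 - x)) ** beta
    return 2.0 * alpha * beta / (1.0 - x * x) * w * math.exp(alpha * (1.0 - w))

def ued_mode(alpha, beta, tol=1e-12):
    """Root of psi on (0, 1) by bisection; only the case 0 < alpha < 1 is handled."""
    if not (0.0 < alpha < 1.0 and beta > 0.0):
        raise ValueError("this sketch only handles 0 < alpha < 1 and beta > 0")
    lo, hi = 0.0, 1.0 - 1e-12  # psi(0) = beta * (1 - alpha) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, alpha, beta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values (assumptions).
mode = ued_mode(0.5, 1.5)
print(mode, ued_pdf(mode, 0.5, 1.5))
```

Bisection is chosen here because $ψ (x)$ changes sign exactly once on the interval in this case, so no derivative information is needed.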

2.1.3. Behavior of the PDF at $x \to 0^{+}$ and $x \to 1^{-}$

The behavior of the PDF

f (x)

at the ends of the unit interval (i.e., when

x \to 0^{+}

and

x \to 1^{-}

) indicates how

f (x)

converges or not in these limits. In terms of data modeling, these facts would reflect the empirical limits on the extremes that the data show. At the limit

x \to 0^{+}

, according to Equations (2) and (3), the following is easily obtained:

lim_{x \to 0^{+}} f (x) = 2 α β .

On the other hand, to analyze the limit of

f (x)

at

x \to 1^{-}

, we observe the function

ln f (x)

, which can be written as

\begin{aligned} ln f (x) & = ln (2 α β) + (β - 1) ln (1 + x) - (β + 1) ln (1 - x) + α (1 - {(\frac{1 + x}{1 - x})}^{β}) \\ & = \frac{1}{{(1 - x)}^{β}} \{{(1 - x)}^{β} [ln (2 α β) + (β - 1) ln (1 + x) - (β + 1) ln (1 - x) + α] - α {(1 + x)}^{β}\} . \end{aligned}

Hence, we obtain

lim_{x \to 1^{-}} {(1 - x)}^{β} ln f (x) = - α 2^{β},

which implies that $f (x)$ decays at an exponential rate when $x \to 1^{-}$, so in a data representation, observations close to 1 would be correspondingly rare.

2.1.4. Moments

Let X be an RV with the CDF given by Equation (2). Then, the rth moment of X, using partial integration, can be expressed as follows:

\begin{aligned} E (X^{r}) & = \int_{0}^{1} x^{r} d F (x) = \int_{1}^{0} x^{r} d (1 - F (x)) = r \int_{0}^{1} x^{r - 1} (1 - F (x)) d x \\ & = r exp (α) \int_{0}^{1} x^{r - 1} exp [- α {(\frac{1 + x}{1 - x})}^{β}] d x . \end{aligned}

This integral can be evaluated numerically using software such as R, Mathematica, or Matlab. The following result proposes a series expansion of $E (X^{r})$ that can be used for numerical approximation:

Proposition 2.

The rth moment of X can be expanded as follows:

E (X^{r}) = \frac{2 r exp (α)}{β} \sum_{k = 0}^{r - 1} \sum_{ℓ = 0}^{+ \infty} \binom{r - 1}{k} \binom{- (r + 1)}{ℓ} {(- 1)}^{k} α^{(k + ℓ + 1) / β} Γ (- \frac{k + ℓ + 1}{β}, α),

where

Γ (a, x)

denotes the upper incomplete gamma function (i.e.,

Γ (a, x) = \int_{x}^{+ \infty} t^{a - 1} exp (- t) d t

).

Proof.

By applying the change in the variable

y = (1 + x) / (1 - x)

, we have

\begin{matrix} E (X^{r}) = 2 r exp (α) \int_{1}^{+ \infty} \frac{{(y - 1)}^{r - 1}}{{(y + 1)}^{r + 1}} exp (- α y^{β}) d y . \end{matrix}

(5)

Then, using the “generalized version” of the binomial formula two times in a row, since

y > 1

, we find

\begin{matrix} \frac{{(y - 1)}^{r - 1}}{{(y + 1)}^{r + 1}} & = y^{- 2} \frac{{(1 - 1 / y)}^{r - 1}}{{(1 + 1 / y)}^{r + 1}} \\ = y^{- 2} [\sum_{k = 0}^{r - 1} (\binom{r - 1}{k}) {(- 1)}^{k} y^{- k}] [\sum_{ℓ = 0}^{+ \infty} (\binom{- (r + 1)}{ℓ}) y^{- ℓ}] \\ = \sum_{k = 0}^{r - 1} \sum_{ℓ = 0}^{+ \infty} (\binom{r - 1}{k}) (\binom{- (r + 1)}{ℓ}) {(- 1)}^{k} y^{- (k + ℓ + 2)} . \end{matrix}

(6)

Also, with the change in the variable

z = α y^{β}

, the following is obtained:

\begin{matrix} \int_{1}^{+ \infty} y^{- (k + ℓ + 2)} exp (- α y^{β}) d y & = \frac{α^{(k + ℓ + 1) / β}}{β} \int_{α}^{+ \infty} z^{- (k + ℓ + 1) / β - 1} exp (- z) d z \\ = \frac{α^{(k + ℓ + 1) / β}}{β} Γ (- \frac{k + ℓ + 1}{β}, α) . \end{matrix}

(7)

Therefore, by substituting Equations (6) and (7) into Equation (5), as well as by inverting the sign of the integral and the sum, the desired result is obtained. □
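As an alternative to the series expansion, the integral representation of $E (X^{r})$ in terms of the survival function can be evaluated by simple quadrature. The sketch below uses a midpoint rule, which is an assumption of this example and not a method from the paper; function names, the number of steps, and parameter values are illustrative.

```python
import math

def ued_survival(x, alpha, beta):
    """Survival function 1 - F(x) for 0 <= x < 1, guarded against overflow."""
    t = beta * math.log((1.0 + x) / (1.0 - x))
    if t > 700.0:  # the survival probability is numerically zero here
        return 0.0
    return math.exp(alpha * (1.0 - math.exp(t)))

def ued_moment(r, alpha, beta, n_steps=20000):
    """Midpoint-rule approximation of E(X^r) = r * integral of x^(r-1) * (1 - F(x))."""
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * h
        total += x ** (r - 1) * ued_survival(x, alpha, beta)
    return r * h * total

# Illustrative values (assumptions): mean and variance of UED(1.2, 0.8).
mean = ued_moment(1, 1.2, 0.8)
variance = ued_moment(2, 1.2, 0.8) - mean ** 2
print(mean, variance)
```

Doubling `n_steps` and comparing the two results gives a rough indication of the quadrature error.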

2.1.5. Failure (Hazard) Rate Function

The HRF of the UED is given by

h (x) = \frac{f (x)}{\bar{F} (x)} = \frac{2 α β}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β} .

(8)

When $x \to 0^{+}$, the limit of $h (x)$ is $2 α β > 0$, and when $x \to 1^{-}$, the limit is $+ \infty$. Moreover, $h^{'} (x) / h (x) = 2 (x + β) / (1 - x^{2}) > 0$ for $0 < x < 1$, so this function is strictly increasing, as can be seen in Figure 3, meaning that when x increases, the frequency at which an engineered system or component fails also increases.
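The monotonicity of the HRF in Equation (8) can be checked numerically. The following sketch evaluates the HRF on a grid and compares its logarithmic derivative with $2 (x + β) / (1 - x^{2})$; the function name and the parameter values are assumptions made for illustration.

```python
import math

def ued_hazard(x, alpha, beta):
    """Hazard rate function of Equation (8) for 0 <= x < 1."""
    return 2.0 * alpha * beta / (1.0 - x * x) * ((1.0 + x) / (1.0 - x)) ** beta

# Illustrative values (assumptions).
alpha, beta = 1.2, 0.8
grid = [i / 100.0 for i in range(100)]
values = [ued_hazard(x, alpha, beta) for x in grid]
print(all(b > a for a, b in zip(values, values[1:])))

# Logarithmic derivative by central differences versus the closed form.
x, h = 0.4, 1e-6
numeric = (math.log(ued_hazard(x + h, alpha, beta))
           - math.log(ued_hazard(x - h, alpha, beta))) / (2.0 * h)
print(numeric, 2.0 * (x + beta) / (1.0 - x * x))
```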

2.2. Characterizations

To interpret the HRF realistically, we shall try to characterize Equation (3) with hazard and mean residual life functions. Characterization in general terms implies that under certain conditions, a family of distributions is the only one possessing a designated property. Researchers can identify the actual probability distribution with the help of characterization. For detailed study, readers are referred to the works of Ahsanullah et al. [28,29] and Hamedani [30]. In this regard, we characterize the proposed model with the HRF and truncated moments, and the characterizing conditions are defined as follows:

Proposition 3.

The RV

X : Ω ⟶ (0, + \infty)

has a continuous PDF

f (x)

if and only if the HRF

h (x)

satisfies the following equation:

\frac{f^{'} (x)}{f (x)} = \frac{h^{'} (x)}{h (x)} - h (x) .

(9)

Proof.

According to the definition of the HRF, given by the first equality in Equation (8), it follows that

\frac{h^{'} (x)}{h (x)} = \frac{f^{'} (x) \bar{F} (x) + f^{2} (x)}{{\bar{F}}^{2} (x)} \cdot \frac{\bar{F} (x)}{f (x)} = \frac{f^{'} (x)}{f (x)} + h (x) .

Thus, the statement of proposition immediately follows. □

Proposition 4.

The RV

X : Ω ⟶ (0, + \infty)

has a UED

(α, β)

if and only if the HRF

h (x)

, defined by Equation (8), satisfies the following equation:

\frac{h^{'} (x)}{{(h (x))}^{2}} = \frac{x + β}{α β} {(\frac{1 - x}{1 + x})}^{β} .

(10)

Proof.

Necessity: Assume that

X \sim UED (α, β)

, with the PDF

f (x)

defined by Equation (3). Then, the logarithm of this PDF, in the same way as in Section 2.1.3, can be expressed as:

ln (f (x)) = ln (2 α β) + (β - 1) ln (1 + x) - (β + 1) ln (1 - x) + α (1 - {(\frac{1 + x}{1 - x})}^{β}) .

By differentiating both sides of this equality with respect to

x,

we obtain

\begin{matrix} \frac{f^{'} (x)}{f (x)} & = \frac{β - 1}{1 + x} + \frac{β + 1}{1 - x} - \frac{2 α β}{{(1 - x)}^{2}} {(\frac{1 + x}{1 - x})}^{β - 1} = \frac{2}{1 - x^{2}} (x + β - α β {(\frac{1 + x}{1 - x})}^{β}) . \end{matrix}

(11)

Thus, according to Equations (8) and (9), it follows that

\frac{h^{'} (x)}{h (x)} = \frac{f^{'} (x)}{f (x)} + h (x) = \frac{2 (x + β)}{1 - x^{2}},

which after certain simplification yields Equation (10).

Sufficiency: Suppose that Equation (10) holds. After integration, it can be rewritten as follows:

\int \frac{h^{'} (x)}{{(h (x))}^{2}} d x = \int \frac{x + β}{α β} {(\frac{1 - x}{1 + x})}^{β} d x,

That is, we have

- \frac{1}{h (x)} = \frac{x^{2} - 1}{2 α β} {(\frac{1 - x}{1 + x})}^{β} .

From the above equation, we obtain the HRF

h (x)

as shown in Equation (8). Furthermore, by replacing this function in Equation (9), and after integration, we obtain

\begin{matrix} \int \frac{f^{'} (x)}{f (x)} d x & = 2 \int [\frac{x + β}{1 - x^{2}} - \frac{α β}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β}] d x + C_{1} \\ = (β - 1) ln (1 + x) - (β + 1) ln (1 - x) - α {(\frac{1 + x}{1 - x})}^{β} + C_{1}, \end{matrix}

that is, we have

f (x) = \frac{exp [C_{1} - α {(\frac{x + 1}{1 - x})}^{β}]}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β} .

Another integration implies that

F (x) = \int f (x) d x + C_{2} = - \frac{exp [C_{1} - α {(\frac{x + 1}{1 - x})}^{β}]}{2 α β} + C_{2},

whereby from the conditions

F (0) = 0

and

F (1) = 1

, the constants

C_{1} = α + ln (2 α β)

and

C_{2} = 1

are obtained. Thus, the function

F (x)

is indeed the CDF from

U E D (α, β)

, which completes the proof. □

The following theorem was used in [31] as well as [28,29] in order to characterize different univariate continuous distributions:

Theorem 1.

Let

(Ω; F; P)

be a given probability space, and let

H = [a, b]

be an interval for some

a < b

, where

a = - \infty

and

b = + \infty

might as well be allowed. Also, let

X : Ω \to H

be a continuous RV with the CDF

F (x)

and

g (x)

and

t (x)

be two real functions defined on

H

and such that

E [g (X) | X \geq x] = ξ (x) E [t (X) | X \geq x], x \in H

is defined with some real function

ξ (x)

. Assume that

g (x), t (x) \in C^{1} (H), ξ (x) \in C^{2} (H)

, and

F (x)

is a twice continuously differentiable and strictly monotone function on the set

H

. Finally, assume that the equation

t (x) ξ (x) = g (x)

has no real solution in the interior of

H

. Then,

F (x)

is uniquely determined by the functions

g (x), t (x)

, and

ξ (x)

as follows:

F (x) = C \int_{0}^{x} |\frac{ξ^{'} (u)}{ξ (u) t (u) - g (u)}| e^{- s (u)} d u,

(12)

where the function

s (x)

is a solution of the differential equation

s^{'} (x) = \frac{ξ^{'} (x) t (x)}{ξ (x) t (x) - g (x)},

and C is a constant such that

\int_{H} d F (x) = 1 .

Now, we discuss the characterization of the UED based on Theorem 1 and some simple relationship between two functions and the RV

X \sim U E D (α, β)

.

Proposition 5.

Let

X : Ω \to [0, 1)

be a continuous RV and

\begin{matrix} t (x) & = 3 exp [2 α (1 - {(\frac{1 + x}{1 - x})}^{β})], x \in [0, 1) \\ g (x) & = 2 exp [α (1 - {(\frac{1 + x}{1 - x})}^{β})], x \in [0, 1) . \end{matrix}

The RV X has a PDF defined by Equation (3) if and only if there exists a function

ξ (x)

, defined as in Theorem 1, that satisfies the differential equation

\frac{ξ^{'} (x)}{ξ (x) t (x) - g (x)} = \frac{2 α β}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β} exp [- 2 α (1 - {(\frac{1 + x}{1 - x})}^{β})], 0 \leq x < 1 .

(13)

Proof.

Necessity: For the RV

X \sim U E D (α, β)

, with the CDF and PDF given by Equations (2) and (3), respectively, after a certain computation, we obtain

\begin{matrix} (1 - F (x)) E [t (X) | X \geq x] & = 3 e^{α r (x; β)} \int_{x}^{1} \frac{2 α β}{1 - u^{2}} {(\frac{1 + u}{1 - u})}^{β} e^{3 α r (u; β)} d u \\ = exp [4 α (1 - {(\frac{1 + x}{1 - x})}^{β})], \\ (1 - F (x)) E [g (X) | X \geq x] & = 2 e^{α r (x; β)} \int_{x}^{1} \frac{2 α β}{1 - u^{2}} {(\frac{1 + u}{1 - u})}^{β} e^{2 α r (u; β)} d u \\ = exp [3 α (1 - {(\frac{1 + x}{1 - x})}^{β})], \end{matrix}

where

0 < x < 1

and

r (x) : = 1 - {(\frac{1 + x}{1 - x})}^{β}

. This implies that

ξ (x) : = \frac{E (g (x) | X \geq x)}{E (t (x) | X \geq x)} = exp [- α (1 - {(\frac{1 + x}{1 - x})}^{β})], 0 < x < 1,

(14)

that is, we have

ξ (x) t (x) - g (x) = 3 e^{α r (x; β)} - 2 e^{α r (x; β)} = exp [α (1 - {(\frac{1 + x}{1 - x})}^{β})] > 0, 0 < x < 1 .

Hence, Equation (13) clearly holds.

Sufficiency: If the function

ξ (x)

satisfies the differential Equation (13), then it follows that

s^{'} (x) = \frac{ξ^{'} (x) t (x)}{ξ (x) t (x) - g (x)} = \frac{6 α β}{1 - x^{2}} {(\frac{1 + x}{1 - x})}^{β}, 0 < x < 1,

Therefore, one can take

s (x) = - 3 α (1 - {(\frac{1 + x}{1 - x})}^{β}) .

Using Equation (12), it is easy to obtain that the RV X has a PDF given by Equation (3). □

According to the previous proposition, one immediately obtains the following:

Corollary 1.

Let

X : Ω \to [0, + \infty)

be a continuous RV and functions

t (x)

and

g (x)

be given as in Proposition 5. Then,

X \sim U E D (α, β)

, with the PDF as shown in Equation (3) if and only if the function

ξ (x)

has the form in Equation (14).

3. Estimation and Simulation Procedures

Let us assume that

x_{1}

, …,

x_{n}

are observed values of the sample of size n taken from the

U E D (α, β)

. We propose the maximum likelihood method for estimating the pair of parameters

(α, β)

. This means that the estimates of those parameters are the ones that maximize the likelihood function

L (α, β | x_{1}, \dots, x_{n}) = \prod_{i = 1}^{n} f (x_{i}) .

As is known, this solution also corresponds to the one that maximizes the log-likelihood function; in other words, it maximizes

l = l (α, β | x_{1}, \dots, x_{n}) = \sum_{i = 1}^{n} ln f (x_{i}) .

By differentiating the function l with respect to each parameter, the estimators of

α

and

β

can be obtained by solving the coupled equations

\begin{matrix} \frac{\partial l}{\partial α} & = & \frac{n}{α} + \sum_{i = 1}^{n} (1 - {(\frac{1 + x_{i}}{1 - x_{i}})}^{β}) = 0 \\ \frac{\partial l}{\partial β} & = & \frac{n}{β} + \sum_{i = 1}^{n} ln (\frac{1 + x_{i}}{1 - x_{i}}) - α \sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β} ln (\frac{1 + x_{i}}{1 - x_{i}}) = 0 . \end{matrix}

From the first equation, we obtain

α = {[\frac{1}{n} \sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β} - 1]}^{- 1},

and by replacing this output in the second coupled equation, we obtain

\frac{n}{β} + \sum_{i = 1}^{n} ln (\frac{1 + x_{i}}{1 - x_{i}}) + \frac{\sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β} ln (\frac{1 + x_{i}}{1 - x_{i}})}{1 - \frac{1}{n} \sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β}} = 0 .

Obviously, the last equation has only

β

as an unknown parameter. Now, by denoting

z_{i} = (1 + x_{i}) / (1 - x_{i}) > 1

, i = 1, …, n, and

L (β) = \frac{n}{β} + \sum_{i = 1}^{n} ln z_{i} + \frac{\sum_{i = 1}^{n} z_{i}^{β} ln z_{i}}{1 - \frac{1}{n} \sum_{i = 1}^{n} z_{i}^{β}},

then by applying the L’Hopital’s rule, one obtains

\begin{matrix} lim_{β \to 0^{+}} L (β) & = \sum_{i = 1}^{n} ln z_{i} + n lim_{β \to 0^{+}} \frac{\sum_{i = 1}^{n} (1 - z_{i}^{β} + β z_{i}^{β} ln z_{i})}{β \sum_{i = 1}^{n} (1 - z_{i}^{β})} \\ = \sum_{i = 1}^{n} ln z_{i} + n lim_{β \to 0^{+}} \frac{\sum_{i = 1}^{n} (- z_{i}^{β} ln z_{i} + ln z_{i})}{\sum_{i = 1}^{n} (1 - z_{i}^{β} - β z_{i}^{β} ln z_{i})} \\ = \sum_{i = 1}^{n} ln z_{i} + n lim_{β \to 0^{+}} \frac{\sum_{i = 1}^{n} (- z_{i}^{β} {ln}^{2} z_{i})}{\sum_{i = 1}^{n} (- z_{i}^{β} ln z_{i} - ln z_{i})} \\ = \sum_{i = 1}^{n} ln z_{i} + \frac{n}{2} \cdot \frac{\sum_{i = 1}^{n} {ln}^{2} z_{i}}{\sum_{i = 1}^{n} ln z_{i}} > 0 . \end{matrix}

On the other hand, assuming that

z_{1} > max {z_{2}, \dots, z_{n}}

, it follows that

\begin{matrix} lim_{β \to + \infty} L (β) & = & \sum_{i = 1}^{n} ln z_{i} + lim_{β \to + \infty} \frac{ln z_{1} + \sum_{i = 2}^{n} {(\frac{z_{i}}{z_{1}})}^{β} ln z_{i}}{z_{1}^{- β} - \frac{1}{n} - \frac{1}{n} \sum_{i = 2}^{n} {(\frac{z_{i}}{z_{1}})}^{β}} \\ = & \sum_{i = 1}^{n} ln z_{i} - n ln z_{1} < 0 . \end{matrix}

Hence, the equation $L (β) = 0$ has at least one solution, and it can be solved numerically, for instance, by using the Newton–Raphson algorithm or a bracketing root finder such as the function “uniroot” available in the statistical programming software “R” (version 4.3.1). Once $β$ is estimated, this output can be used for estimating $α$.

For computing the interval estimators for $θ = {(α, β)}^{'}$ and testing hypotheses on these parameters, we use the observed information matrix:

I (θ) = - (\begin{matrix} \frac{\partial^{2} l (θ)}{\partial α^{2}} & \frac{\partial^{2} l (θ)}{\partial α \partial β} \\ \frac{\partial^{2} l (θ)}{\partial β \partial α} & \frac{\partial^{2} l (θ)}{\partial β^{2}} \end{matrix}),

where

\begin{matrix} \frac{\partial^{2} l (θ)}{\partial α^{2}} & = & - \frac{n}{α^{2}} \\ \frac{\partial^{2} l (θ)}{\partial α \partial β} & = & \frac{\partial^{2} l (θ)}{\partial β \partial α} = - \sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β} ln (\frac{1 + x_{i}}{1 - x_{i}}) \\ \frac{\partial^{2} l (θ)}{\partial β^{2}} & = & - \frac{n}{β^{2}} - α \sum_{i = 1}^{n} {(\frac{1 + x_{i}}{1 - x_{i}})}^{β} {ln}^{2} (\frac{1 + x_{i}}{1 - x_{i}}) . \end{matrix}

Note that

I (\hat{θ})

is a consistent estimator of the expected Fisher information matrix

E [I (θ)]

(see, for example, [32]). Under some suitable conditions, the approximation to a normal distribution

\hat{θ} \approx N (θ, I {(\hat{θ})}^{- 1})

holds, and more generally, we have

a^{'} \hat{θ} \approx N (a^{'} θ, a^{'} I {(\hat{θ})}^{- 1} a),

for any vector

a = {(a_{1}, a_{2})}^{'}

. By choosing $a = {(1, 0)}^{'}$ or $a = {(0, 1)}^{'}$, we find the $100 \times (1 - δ) %$ confidence interval for each parameter:

\hat{θ}_{i} \pm z_{δ / 2} \sqrt{{(I {(\hat{θ})}^{- 1})}_{i i}}, i = 1, 2,

where $0 < δ < 1$ and $z_{δ / 2}$ is the $1 - δ / 2$ quantile of the standard normal distribution.
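The observed information matrix and the resulting Wald-type intervals can be sketched as follows. The function names are assumptions of this example, the default value 1.959964 is the standard normal quantile for a 95% interval, and in practice the arguments `alpha_hat` and `beta_hat` would be the ML estimates; the small dataset and parameter values in the usage lines are placeholders, not estimates from the paper.

```python
import math

def ued_loglik(alpha, beta, xs):
    """Log-likelihood: sum of ln f(x_i) with f from Equation (3)."""
    total = 0.0
    for x in xs:
        w = ((1.0 + x) / (1.0 - x)) ** beta
        total += (math.log(2.0 * alpha * beta) + (beta - 1.0) * math.log(1.0 + x)
                  - (beta + 1.0) * math.log(1.0 - x) + alpha * (1.0 - w))
    return total

def observed_information(alpha, beta, xs):
    """Negative Hessian of the log-likelihood, using the closed-form entries."""
    n = len(xs)
    logs = [math.log((1.0 + x) / (1.0 - x)) for x in xs]
    powers = [math.exp(beta * t) for t in logs]
    i_aa = n / alpha ** 2
    i_ab = sum(p * t for p, t in zip(powers, logs))
    i_bb = n / beta ** 2 + alpha * sum(p * t * t for p, t in zip(powers, logs))
    return [[i_aa, i_ab], [i_ab, i_bb]]

def wald_intervals(alpha_hat, beta_hat, xs, z=1.959964):
    """Intervals estimate +/- z * sqrt of the diagonal of the inverse information."""
    (i_aa, i_ab), (_, i_bb) = observed_information(alpha_hat, beta_hat, xs)
    det = i_aa * i_bb - i_ab ** 2
    if det <= 0.0:
        raise ValueError("information matrix is not positive definite here")
    se_alpha = math.sqrt(i_bb / det)  # (I^-1)_11 = I_22 / det
    se_beta = math.sqrt(i_aa / det)   # (I^-1)_22 = I_11 / det
    return ((alpha_hat - z * se_alpha, alpha_hat + z * se_alpha),
            (beta_hat - z * se_beta, beta_hat + z * se_beta))

# Placeholder data and parameter values (assumptions, not ML estimates).
data = [0.1, 0.25, 0.4, 0.55, 0.7]
print(observed_information(0.3, 0.9, data))
print(wald_intervals(0.3, 0.9, data))
```

The determinant check matters because the matrix is only guaranteed to be positive semidefinite at a maximum of the log-likelihood, not at arbitrary parameter values.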

Simulation Study

In this part, we shall discuss the effectiveness of the proposed MLE procedure, which will be used in application for better predictions of a phenomenon. In this regard, we considered four sets of parameters and conducted a Monte Carlo simulation with 20,000 replications in order to generate samples of various sizes (i.e.,

n = 25, 50, 150, 350, 500

) from the UED

(α, β)

. The parameter combinations are listed below:

Set-I: $α = 0.9856$ , $β = 0.2178$ ;
Set-II: $α = 1.8986$ , $β = 0.3218$ ;
Set-III: $α = 2.4390$ , $β = 2.5145$ ;
Set-IV: $α = 0.4390$ , $β = 1.5145$ .

For all of them, the MLE estimates were obtained by using Mathematica 13.0 software. The simulation results are portrayed in Table 1, Table 2, Table 3 and Table 4, where they are compiled according to the following definitions:

\begin{matrix} Bias : = E (\hat{Θ}) - Θ; \\ Mean square error (MSE) : = E ({(\hat{Θ} - Θ)}^{2}); \\ Lower Confidence Limit : = LCL = \hat{Θ} - z_{\frac{δ}{2}} \frac{\sqrt{V a r (\hat{Θ})}}{n}; \\ Upper Confidence Limit : = UCL = \hat{Θ} + z_{\frac{δ}{2}} \frac{\sqrt{V a r (\hat{Θ})}}{n}, \end{matrix}

where

Θ = (α, β)

. From these tables, there is evidence that both the bias and MSE of the MLE estimates tended toward zero as the sample sizes increased, whereas the 95% confidence limits became compact as the sample size increased.

Moreover, Table 1 shows a downward bias for

\hat{α}

and an upward one for

\hat{β}

. Similarly, the MSE approached zero as the sample size increased.

Also, Table 2 portrays a downward bias for

\hat{α}

and

\hat{β}

when the sample sizes were less than or equal to 50, while there was an upward bias for

\hat{α}

when the sample size increased. However, the MSE approached zero as the sample size increased.

In the case of Set-III, shown in Table 3, the bias was downward for

\hat{α}

for all sample sizes. On the contrary, it was upward for

\hat{β}

when the sample sizes were higher, usually for those greater than 150. Notice that all biases were negligibly small and approached zero as the sample size increased.

Similarly, Table 4 shows a downward bias for

\hat{α}

and an upward bias for

\hat{β}

, but all biases were negligibly small and approached zero as the sample size increased, and the same is true for the MSE. In summary, the above results show that the MLE is a suitable estimation method for realistic forecasting.

4. Model Compatibility and Its Application to Real-World Data

Here, the possibility of applying the UED model in terms of modeling empirical distributions of some real-world processes is discussed in more detail. To that end, by using several typical statistical indicators, the quality of fitting with the UED was also checked. The obtained results were also compared with the results of fitting using some of the previously known unit interval probability distributions, which additionally checked the possibility of applying the UED.

4.1. Measures of Goodness-of-Fit

In order to test the null hypothesis

H_{0} : F_{n} (x) = F_{0} (x)

, where

F_{n} (x)

is the empirical CDF and

F_{0} (x)

is the CDF of some specified (theoretical) distribution, well-known statistical tests are typically used. To test the hypothesis that some real-world data are taken from the UED (or from some other candidate distribution), the following statistical tests are used here:

The Kolmogorov–Smirnov (KS) test, whose test statistic is defined by

$KS = max_{1 \leq i \leq k} \{\frac{i}{k} - z_{i}, z_{i} - \frac{i - 1}{k}\},$

where k denotes the number of classes and $z_{i}$ represents the values of the theoretical CDF.
The Anderson–Darling (AD $_{0}^{*}$ ) test, which usually attaches more weight to the tails of the distribution and whose test statistic is

$A_{0}^{*} = (\frac{2.25}{k^{2}} + \frac{0.75}{k} + 1) \{- k - \frac{1}{k} \sum_{i = 1}^{k} (2 i - 1) ln (z_{i} (1 - z_{k - i + 1}))\} .$
The Cramér–von Mises (CVM $_{0}^{*}$ ) test, a derived version of the KS test, with the test statistic defined by

$W_{0}^{*} = \sum_{i = 1}^{k} {(z_{i} - \frac{2 i - 1}{2 k})}^{2} + \frac{1}{12 k} .$
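The three statistics above can be computed from the fitted CDF values. The sketch below assumes that the $z_{i}$ are the theoretical CDF values evaluated at the observations and sorted in increasing order, with k equal to their number; the function name is hypothetical.

```python
import math

def gof_statistics(cdf_values):
    """Return (KS, A*_0, W*_0) from theoretical CDF values strictly inside (0, 1)."""
    z = sorted(cdf_values)
    k = len(z)
    ks = max(max(i / k - z[i - 1], z[i - 1] - (i - 1) / k) for i in range(1, k + 1))
    # z_{k-i+1} in one-based notation is z[k - i] with zero-based indexing
    log_sum = sum((2 * i - 1) * math.log(z[i - 1] * (1.0 - z[k - i]))
                  for i in range(1, k + 1))
    ad = (2.25 / k ** 2 + 0.75 / k + 1.0) * (-k - log_sum / k)
    cvm = sum((z[i - 1] - (2 * i - 1) / (2.0 * k)) ** 2
              for i in range(1, k + 1)) + 1.0 / (12.0 * k)
    return ks, ad, cvm

# Toy input (an assumption): two CDF values at the ideal plotting positions.
print(gof_statistics([0.25, 0.75]))
```

Smaller values of all three statistics indicate closer agreement between the empirical and theoretical CDFs.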

Additionally, in order to check the quality of fitting certain real-world data using the UED (or some other distribution), the following indicators were used:

The Akaike information criterion (AIC), defined as

$AIC = 2 m - 2 ℓ (\hat{Θ}),$

where m denotes the number of parameters.
The corrected Akaike information criterion (AICc), expressed as

$AICc = AIC + \frac{2 m (m + 1)}{n - m - 1} .$
The Bayesian information criterion (BIC), which is defined as

$BIC = m ln (n) - 2 ℓ (\hat{Θ}) .$
The Hannan–Quinn information criterion (HQIC), expressed as

$HQIC = - 2 ℓ (\hat{Θ}) + 2 m ln (ln (n)) .$
The consistent Akaike information criterion (CAIC), given as

$CAIC = - 2 ℓ (\hat{Θ}) + m (ln (n) + 1) .$
The Vuong test was also used for model selection purposes.
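The information criteria listed above depend only on the maximized log-likelihood, the number of parameters m, and the sample size n. The sketch below collects them in one hypothetical helper, with the HQIC penalty written in its standard form $2 m ln (ln (n))$; the input values in the usage line are placeholders and not results from the paper.

```python
import math

def information_criteria(loglik, m, n):
    """AIC, AICc, BIC, HQIC, and CAIC for a model with m parameters and n observations."""
    aic = 2.0 * m - 2.0 * loglik
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * m * (m + 1) / (n - m - 1),
        "BIC": m * math.log(n) - 2.0 * loglik,
        "HQIC": -2.0 * loglik + 2.0 * m * math.log(math.log(n)),
        "CAIC": -2.0 * loglik + m * (math.log(n) + 1.0),
    }

# Placeholder inputs (assumptions): log-likelihood -10 with m = 2 and n = 15.
print(information_criteria(-10.0, 2, 15))
```

For every criterion, the model with the smaller value is preferred when comparing fits to the same dataset.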

For comprehensive details about these measures, readers may refer to Akaike [33], Hussain et al. [34], Murthy et al. [35], and Vuong [36], respectively.

4.2. Comparative Models

We also compared the proposed UED model with well-known unit interval models defined by the following PDFs:

The beta distribution (BD) [4]:

$\begin{matrix} f_{α, β}^{BD} (x) & = \frac{1}{B (α, β)} x^{β - 1} {(1 - x)}^{α - 1}, α, β > 0, 0 < x < 1, \end{matrix}$
The Johnson $S_{B}$ distribution (JSBD) [6]:

$\begin{matrix} f_{α, β}^{JSBD} (x) & = \frac{β exp [- \frac{1}{2} {(α + β ln (\frac{x}{1 - x}))}^{2} - β x]}{\sqrt{2 π} x (1 - x)}, α, β > 0, 0 < x < 1, \end{matrix}$
The Kumaraswamy distribution (KwD) [8]:

$\begin{matrix} f_{α, β}^{KwD} (x) & = α β x^{α - 1} {(1 - x^{α})}^{β - 1}, α, β > 0, 0 < x < 1, \end{matrix}$
The unit Gompertz distribution (UGoMD) [15]:

$\begin{matrix} f_{α, β}^{UGoMD} (x) & = α β x^{- α - 1} e^{- β (x^{- α} - 1)}, α, β > 0, 0 < x < 1 . \end{matrix}$
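To make the comparator densities concrete, here is a minimal sketch of three of them (BD, KwD, and UGoMD) written exactly as parameterized above, together with a crude midpoint-rule check that each integrates to approximately one on (0, 1). The function names and the parameter values in the usage lines are illustrative assumptions, not the authors' code.

```python
import math

def beta_pdf(x, a, b):
    # Parameterization used above: x^(b-1) * (1-x)^(a-1) / B(a, b)
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((b - 1) * math.log(x) + (a - 1) * math.log1p(-x) - log_B)

def kumaraswamy_pdf(x, a, b):
    return a * b * x ** (a - 1) * (1 - x ** a) ** (b - 1)

def unit_gompertz_pdf(x, a, b):
    return a * b * x ** (-a - 1) * math.exp(-b * (x ** (-a) - 1))

def integrate_unit(pdf, *params, steps=20000):
    """Midpoint rule on (0, 1); avoids evaluating at the endpoints."""
    h = 1.0 / steps
    return h * sum(pdf((j + 0.5) * h, *params) for j in range(steps))

# Illustrative (not fitted) parameter values
print(integrate_unit(beta_pdf, 2.0, 3.0))
print(integrate_unit(kumaraswamy_pdf, 2.0, 3.0))
print(integrate_unit(unit_gompertz_pdf, 2.0, 1.5))
```

Each printed value should be close to 1, which is a simple sanity check before such densities are used in a likelihood.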

In order to compare the fitting results, we considered four different real-world datasets classified into two sections: (1) environmental and (2) engineering. The results obtained from the statistical analysis of these datasets are discussed below.

4.3. Environmental Datasets

Datasets I and II. The first two datasets were reported by Maity [37], and they represent the following measured values:

-: Soil moisture (Dataset I): 0.0179, 0.0798, 0.0959, 0.0444, 0.0938, 0.0443, 0.0917, 0.0882, 0.0439, 0.049, 0.0774, 0.0171, 0.0305, 0.0757, and 0.0468;
-: Permanent wilting points (PWP) (Dataset II): 0.0821, 0.0561, 0.0202, 0.051, 0.0041, 0.0226, 0.0556, 0.0829, 0.0062, 0.0695, 0.0557, 0.0243, 0.0083, 0.0532, and 0.0118.

In this regard, we compiled both the descriptive and theoretical (UED) statistics, which are listed in Table 5 and Table 6, respectively. Note that the descriptive statistics of all datasets include the sample size (SS), mean, median, standard deviation (SD), skewness (SK), and kurtosis (KU).
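The descriptive statistics can be recomputed directly from the listed values. The sketch below does this for Dataset I. It assumes moment-based skewness and kurtosis (central moments divided by n), which appears to match the tabulated SK and KU; the estimator behind the tabulated SD is not stated, so the sample standard deviation used here may differ from Table 5 in the last digit. The function name `describe` is hypothetical.

```python
import statistics

# Dataset I (soil moisture), as listed above
soil_moisture = [0.0179, 0.0798, 0.0959, 0.0444, 0.0938, 0.0443, 0.0917,
                 0.0882, 0.0439, 0.049, 0.0774, 0.0171, 0.0305, 0.0757,
                 0.0468]

def describe(data):
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n (an assumption about the estimators)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "SS": n,
        "Mean": mean,
        "Median": statistics.median(data),
        "SD": statistics.stdev(data),   # sample (n - 1) standard deviation
        "SK": m3 / m2 ** 1.5,           # moment-based skewness
        "KU": m4 / m2 ** 2,             # moment-based (non-excess) kurtosis
    }

for name, value in describe(soil_moisture).items():
    print(name, round(value, 4))
```

The same function can be applied to Dataset II by passing the permanent wilting point values instead.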

In addition, the total time on test (TTT) plot, introduced in [38], is shown in Figure 4 for both datasets. Notice that the TTT plots indicate that the empirical HRF has an increasing failure rate (IFR). Table 5 and Table 6 also reveal that the theoretical UED statistics and the observed descriptive statistics are close to each other, so it appears that both datasets can be modeled by the proposed distribution. Furthermore, it is evident from Figure 5 that neither dataset contained any outliers.
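For readers who want to reproduce a TTT plot, the scaled TTT transform can be sketched as follows. The formula used is the usual empirical one, $T(r/n) = [\sum_{i=1}^{r} x_{(i)} + (n - r) x_{(r)}] / \sum_{i=1}^{n} x_{(i)}$, which is assumed here rather than quoted from the article; the function name `scaled_ttt` is hypothetical. A curve lying above the diagonal (concave shape) is the usual indication of an IFR.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled TTT transform."""
    x = sorted(data)
    n = len(x)
    total = sum(x)
    points = []
    running = 0.0   # sum of the r smallest observations
    for r, xr in enumerate(x, start=1):
        running += xr
        points.append((r / n, (running + (n - r) * xr) / total))
    return points

# Dataset I (soil moisture), as listed in Section 4.3
soil_moisture = [0.0179, 0.0798, 0.0959, 0.0444, 0.0938, 0.0443, 0.0917,
                 0.0882, 0.0439, 0.049, 0.0774, 0.0171, 0.0305, 0.0757,
                 0.0468]

for u, t in scaled_ttt(soil_moisture):
    print(f"{u:.3f}  {t:.3f}")
```

Plotting the second column against the first gives the empirical TTT curve for the dataset.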

Table 7 shows that the proposed UED model is the best strategy for analyzing Dataset I relative to all the other unit interval distributions. Namely, although the p value of the KS statistic was the highest for the KwD, the other nonparametric tests, CVM$_{0}^{*}$ and AD$_{0}^{*}$, attained their minimum values for the UED. Also, based on the estimated values of the Vuong statistic given in Table 8, the comparison between the KwD and the UED was indecisive. Overall, the UED is the best strategy, which is also confirmed by Figure 6.

Similarly, Table 9 shows that the proposed UED model is also one of the best strategies for the analysis of Dataset II in all aspects. Namely, the KS, CVM$_{0}^{*}$, and AD$_{0}^{*}$ test statistics had the lowest values compared with all the selected, previously known unit interval models. In addition, the Vuong statistic, which compares models based on the likelihood ratio, clearly supported the UED. Finally, Figure 6 also confirms our claim that the UED is the best strategy. Moreover, Table 10 and Table 11 yield the lowest information criterion values for the UED compared with the competing models.
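The article does not spell out the Vuong statistic, so the following is only a sketch of its commonly used unadjusted form, $Z = \sqrt{n}\, \bar{d} / s_{d}$, where $d_{i}$ is the difference between the pointwise log-likelihoods of the two fitted models. The function name `vuong_statistic` and the numbers in the usage line are hypothetical. Since all five models here have two parameters, a parameter-count correction would presumably vanish, which is why none is applied.

```python
import math

def vuong_statistic(loglik_a, loglik_b):
    """Unadjusted Vuong Z for two non-nested models (a sketch).

    loglik_a, loglik_b: pointwise log-likelihoods log f(x_i) of models A
    and B at their fitted parameters, for the same observations.
    """
    d = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((di - mean_d) ** 2 for di in d) / n   # divisor n
    return math.sqrt(n) * mean_d / math.sqrt(var_d)

# Hypothetical pointwise log-likelihoods for three observations
z = vuong_statistic([1.2, 2.1, 3.4], [0.2, 0.1, 0.4])
print(z)
```

A large positive value favors model A, a large negative value favors model B, and values near zero leave the comparison indecisive; the cut-off used for the "Suitability" columns is not stated in this section.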

4.4. Engineering Datasets

Datasets III and IV. The third and fourth datasets were first introduced and studied in [39] and consist of burr measurements on iron sheets. The third dataset comprises 50 observations of the burr (in millimeters) for a hole diameter of 12 mm and a sheet thickness of 3.15 mm. The fourth dataset also comprises 50 observations, for a hole diameter of 9 mm and a sheet thickness of 2 mm. Hole diameter readings were taken for jobs with respect to one hole, which was selected and fixed as per a predetermined orientation. These two datasets refer to two different machines being compared, and the technical details of the measurements can be found in [39]. Note that both datasets were also analyzed in [19,40,41,42].

The descriptive statistics of these datasets, as well as the corresponding theoretical statistics for the UED, are presented in Table 12 and Table 13, respectively. The TTT plots and box plots of the observed data are given in Figure 7 and Figure 8, respectively. It can be observed that Datasets III and IV were positively skewed and platykurtic in nature, which is confirmed by Table 12 and Table 13. In addition, from Figure 8, it is evident that the empirical and theoretical aspects of these datasets, in terms of the absence of outliers, are in close agreement and indicate that the proposed model can be used effectively. These findings are also supported by Table 14 and Table 15, which show that the UED exhibited the minimum values of the goodness-of-fit statistics in almost all cases, which suggests that the UED is one of the best strategies.

Moreover, the log-likelihood and information criterion values also favor the proposed UED model, as can be seen in Table 16 and Table 17 for Datasets III and IV, respectively. Furthermore, the shape of the proposed model, as shown in Figure 9, matched the data better than the other competing models. Finally, the Vuong statistic, as reported in Table 18, supports the proposed model in most pairwise comparisons; the exceptions, both for Dataset III, are an indecisive comparison with the BD and a preference for the KwD.

5. Concluding Remarks

We introduced a two-parameter bounded model called the unit exponential distribution (UED), which is appropriate for modeling skewed and IFR data. Some of its mathematical properties were studied, including the moments, quantiles, and other distributional behavior. A characterization of the UED via the HRF was given, which provides the identification requirements of the distribution and thus supports reliable prediction in comparison with the well-known unit domain models. The model parameters were estimated with the MLE method. We also provided a guideline for choosing the best model by using various goodness-of-fit statistics. Applications of the newly defined distribution showed that the proposed model has better modeling abilities than the competing models. For this purpose, we used four datasets from two different disciplines, namely environmental and engineering, and it was found that the proposed strategy was the best one in the unit interval domain in most cases. Moreover, in a further study, the proposed model could also be generalized over the interval $[0, s)$ by introducing the function

$F (x) = 1 - \exp [α (1 - {(\frac{s + x}{s - x})}^{β s})],$

where, obviously, $F(x) = 0$ when $x = 0$ and $F(x) \to 1$ as $x \to s$.
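The generalized CDF above, and the quantile function obtained by inverting it, can be sketched as follows. The function names are hypothetical, and the closed-form inverse $x = s (u - 1)/(u + 1)$ with $u = (1 - \ln(1 - p)/α)^{1/(β s)}$ is derived here from the stated CDF rather than quoted from the article. The case s = 1 is assumed to correspond to the UED itself.

```python
import math

def gen_cdf(x, alpha, beta, s=1.0):
    """Generalized CDF on [0, s), as written in the concluding remarks."""
    ratio = (s + x) / (s - x)
    return 1.0 - math.exp(alpha * (1.0 - ratio ** (beta * s)))

def gen_quantile(p, alpha, beta, s=1.0):
    """Inverse of gen_cdf for 0 <= p < 1 (derived, not from the article)."""
    u = (1.0 - math.log(1.0 - p) / alpha) ** (1.0 / (beta * s))
    return s * (u - 1.0) / (u + 1.0)

# Boundary behaviour on [0, s) with illustrative parameter values
print(gen_cdf(0.0, 0.5, 2.0, s=3.0))     # exactly 0 at x = 0
print(gen_cdf(2.999, 0.5, 2.0, s=3.0))   # approaches 1 as x -> s

# With s = 1 and the Dataset I estimates from Table 7, this gives the
# fitted median of the soil moisture data
print(gen_quantile(0.5, alpha=0.0773, beta=18.4218))
```

With s = 1 and the Dataset I estimates, the median comes out near 0.062, in line with the theoretical median reported in Table 6. Since the quantile function is available in closed form, random variates can be generated by applying `gen_quantile` to uniform random numbers.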

Author Contributions

Conceptualization, H.S.B. and T.H.; methodology, H.S.B., M.T. and N.Q.; software, H.S.B. and T.H.; validation, H.S.B., M.T. and V.S.S.; formal analysis, H.S.B., T.H. and V.S.S.; data curation, H.S.B. and T.H.; writing—original draft preparation, H.S.B., T.H. and V.S.S.; writing—review and editing, M.T., V.S.S. and N.Q.; visualization, T.H. and M.T.; supervision, H.S.B. and V.S.S.; project administration, M.T. and N.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors gratefully acknowledge Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R376), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia for the financial support for this project.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fleiss, J.L.; Levin, B.; Paik, M.C. Statistical Methods for Rates and Proportions, 3rd ed.; John Wiley & Sons Inc.: Hoboken, NJ, USA, 1993.
2. Gilchrist, W. Statistical Modelling with Quantile Functions; CRC Press: Abingdon, UK, 2000.
3. Seber, G.A.F. Statistical Models for Proportions and Probabilities; Springer: Berlin/Heidelberg, Germany, 2013.
4. Bayes, T. An Essay Towards Solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, F.R.S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philos. Trans. R. Soc. 1763, 53, 370–418.
5. Leipnik, R.B. Distribution of the Serial Correlation Coefficient in a Circularly Correlated Universe. Ann. Math. Stat. 1947, 18, 80–87.
6. Johnson, N. Systems of Frequency Curves Derived From the First Law of Laplace. Trabajos Estadistica 1955, 5, 283–291.
7. Jørgensen, B. Proper Dispersion Models. Braz. J. Probab. Stat. 1997, 11, 89–128.
8. Kumaraswamy, P. A Generalized Probability Density Function for Double-Bounded Random Processes. J. Hydrol. 1980, 46, 79–88.
9. Topp, C.W.; Leone, F.C. A Family of J-Shaped Frequency Functions. J. Am. Stat. Assoc. 1955, 50, 209–219.
10. Consul, P.C.; Jain, G.C. On the Log-Gamma Distribution and Its Properties. Stat. Hefte 1971, 12, 100–106.
11. Smithson, M.; Shou, Y. CDF-Quantile Distributions for Modelling RVs on the Unit Interval. Br. J. Math. Stat. Psychol. 2017, 70, 412–438.
12. Nakamura, L.R.; Cerqueira, P.H.R.; Ramires, T.G.; Pescim, R.R.; Rigby, R.A.; Stasinopoulos, D.M. A New Continuous Distribution on the Unit Interval Applied to Modelling the Points Ratio of Football Teams. J. Appl. Stat. 2019, 46, 416–431.
13. Ghitany, M.E.; Mazucheli, J.; Menezes, A.F.B.; Alqallaf, F. The Unit-Inverse Gaussian Distribution: A New Alternative to Two-Parameter Distributions on the Unit Interval. Commun. Stat. Theory Methods 2019, 48, 3423–3438.
14. Altun, E.; Hamedani, G. The Log-Xgamma Distribution with Inference and Application. J. Soc. Fr. Stat. 2018, 159, 40–55.
15. Mazucheli, J.; Menezes, A.F.; Dey, S. Unit-Gompertz Distribution with Applications. Statistica 2019, 79, 25–43.
16. Mazucheli, J.; Menezes, A.F.B.; Chakraborty, S. On the One Parameter Unit-Lindley Distribution and Its Associated Regression Model for Proportion Data. J. Appl. Stat. 2019, 46, 700–714.
17. Mazucheli, J.; Menezes, A.F.B.; Fernandes, L.B.; de Oliveira, R.P.; Ghitany, M.E. The Unit-Weibull Distribution as an Alternative to the Kumaraswamy Distribution for the Modeling of Quantiles Conditional on Covariates. J. Appl. Stat. 2019, 47, 954–974.
18. Altun, E. The Log-Weighted Exponential Regression Model: Alternative to the Beta Regression Model. Commun. Stat. Theory Methods 2020, 50, 2306–2321.
19. Gündüz, S.; Mustafa, Ç.; Korkmaz, M.C. A New Unit Distribution Based on the Unbounded Johnson Distribution Rule: The Unit Johnson SU Distribution. Pak. J. Stat. Oper. Res. 2020, 16, 471–490.
20. Korkmaz, M.Ç.; Korkmaz, Z.S. The Unit Log–log Distribution: A New Unit Distribution with Alternative Quantile Regression Modeling and Educational Measurements Applications. J. Appl. Stat. 2023, 50, 889–908.
21. Afify, A.Z.; Nassar, M.; Kumar, D.; Cordeiro, G.M. A New Unit Distribution: Properties and Applications. Electron. J. Appl. Stat. 2022, 15, 460–484.
22. Fayomi, A.; Hassan, A.S.; Baaqeel, H.; Almetwally, E.M. Bayesian Inference and Data Analysis of the Unit–Power Burr X Distribution. Axioms 2023, 12, 297.
23. Krishna, A.; Maya, R.; Chesneau, C.; Irshad, M.R. The Unit Teissier Distribution and Its Applications. Math. Comput. Appl. 2022, 27, 12.
24. Biswas, A.; Chakraborty, S. A new method for constructing continuous distributions on the unit interval. arXiv 2021, arXiv:2101.04661.
25. Dombi, J.; Jónás, T.; Tóth, Z.E. The Epsilon Probability Distribution and its Application in Reliability Theory. Acta Polytech. Hung. 2018, 15, 197–216.
26. Aslam, M.; Noor, F.; Ali, S. Shifted Exponential Distribution: Bayesian Estimation, Prediction and Expected Test Time Under Progressive Censoring. J. Test. Eval. 2020, 48, 1576–1593.
27. Artzner, P.; Delbaen, F.; Eber, J.-M.; Heath, D. Coherent Measures of Risk. Math. Financ. 1999, 9, 203–228.
28. Ahsanullah, M.; Shakil, M.; Kibria, B.M.G. Characterizations of Continuous Distributions by Truncated Moment. J. Mod. Appl. Stat. Methods 2016, 15, 316–331.
29. Ahsanullah, M.; Ghitany, M.E.; Al-Mutairi, D.K. Characterization of Lindley Distribution by Truncated Moments. Commun. Stat. Theory Methods 2017, 46, 6222–6227.
30. Hamedani, G.G. Characterizations of Univariate Continuous Distributions Based on Truncated Moments of Functions of Order Statistics. Stud. Sci. Math. Hung. 2010, 47, 462–468.
31. Glänzel, W. A Characterization Theorem Based on Truncated Moments and Its Application to Some Distribution Families. In Mathematical Statistics and Probability Vol. B; Bauer, P., Konecny, F., Wertz, W., Eds.; D. Reidel Publishing Company: Dordrecht, The Netherlands, 1987; pp. 75–84.
32. Lindsay, B.G.; Li, B. On second-order optimality of the observed Fisher information. Ann. Stat. 1997, 25, 2172–2199.
33. Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
34. Hussain, T.; Bakouch, H.S.; Chesneau, C. A New Probability Model with Application to Heavy-Tailed Hydrological Data. Environ. Ecol. Stat. 2019, 26, 127–151.
35. Murthy, D.N.P.; Xie, M.; Jiang, R. Weibull Models; John Wiley and Sons: Hoboken, NJ, USA, 2004.
36. Vuong, Q.H. Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica 1989, 57, 307–333.
37. Maity, R. Statistical Methods in Hydrology and Hydroclimatology; Springer Nature Singapore Pte Ltd.: Singapore, 2018.
38. Aarset, M.V. How to Identify a Bathtub Hazard Rate. IEEE Trans. Reliab. 1987, 36, 106–108.
39. Dasgupta, R. On the Distribution of Burr with Applications. Sankhya B 2011, 73, 1–19.
40. Dey, S.; Mazucheli, J.; Anis, M. Estimation of Reliability of Multicomponent Stress–strength for a Kumaraswamy Distribution. Commun. Stat. Theory Methods 2017, 46, 1560–1572.
41. Dey, S.; Mazucheli, J.; Nadarajah, S. Kumaraswamy Distribution: Different Methods of Estimation. Comput. Appl. Math. 2018, 37, 2094–2111.
42. ZeinEldin, R.A.; Chesneau, C.; Jamal, F.; Elgarhy, M. Different Estimation Methods for Type I Half-Logistic Topp–Leone Distribution. Mathematics 2019, 7, 985.

Figure 1. Plots of the CDFs of the UED for varying parameters.

Figure 2. Plots of the PDFs of the UED for varying parameters.

Figure 3. Plots of the HRFs of the UED for varying parameters.

Figure 4. TTT plots of Datasets I and II.

Figure 5. Box plots for Datasets I and II.

Figure 6. Datasets I and II (given by histograms) fitted via unit interval distributions (given by lines).

Figure 7. TTT plots of Datasets III and IV.

Figure 8. Box plots for Datasets III and IV.

Figure 9. Datasets III and IV (given by histograms) fitted via unit interval distributions (given by lines).

Table 1. Mean, bias, MSE, LCL, and UCL for Set-I.

Sample Size	Parameter	Estimate	Bias	MSE	LCL	UCL
n = 25	$α$	0.9417	−0.0439	0.0111	0.9335	0.9499
n = 25	$β$	0.2180	0.0013	0.00008	0.2172	0.2189
n = 50	$α$	0.9511	−0.0344	0.0068	0.9479	0.9544
n = 50	$β$	0.2189	0.0012	0.00007	0.2186	0.2193
n = 150	$α$	0.9655	−0.0200	0.0032	0.9648	0.9663
n = 150	$β$	0.2192	0.0014	0.00004	0.2192	0.2192
n = 350	$α$	0.9685	−0.0171	0.0022	0.9683	0.9688
n = 350	$β$	0.2194	0.0016	0.00003	0.2194	0.2194
n = 500	$α$	0.9729	−0.0126	0.0015	0.9728	0.9732
n = 500	$β$	0.2194	0.0016	0.00002	0.2194	0.2194

Table 2. Mean, bias, MSE, LCL, and UCL for Set-II.

Sample Size	Parameter	Estimate	Bias	MSE	LCL	UCL
n = 25	$α$	1.8547	−0.0438	0.0472	1.8376	1.8717
n = 25	$β$	0.3140	−0.0077	0.0004	0.3125	0.3155
n = 50	$α$	1.8902	−0.0084	0.0230	1.8843	1.8962
n = 50	$β$	0.3149	−0.0068	0.0003	0.3143	0.3156
n = 150	$α$	1.9204	0.0217	0.0117	1.9161	1.9246
n = 150	$β$	0.3167	−0.0050	0.0001	0.3163	0.3172
n = 350	$α$	1.9346	0.0359	0.0071	1.9341	1.9352
n = 350	$β$	0.3171	−0.0046	0.0001	0.3172	0.3172
n = 500	$α$	1.9337	0.0351	0.0063	1.9334	1.9342
n = 500	$β$	0.3171	−0.0047	0.00008	0.3170	0.3171

Table 3. Mean, bias, MSE, LCL, and UCL for Set-III.

Sample Size	Parameter	Estimate	Bias	MSE	LCL	UCL
n = 25	$α$	2.3256	−0.1133	0.0578	2.307	2.3445
n = 25	$β$	2.4995	−0.0149	0.0126	2.4908	2.5084
n = 50	$α$	2.3539	−0.0850	0.0316	2.3472	2.3620
n = 50	$β$	2.4871	−0.0274	0.0136	2.4826	2.4916
n = 150	$α$	2.3915	−0.0475	0.0123	2.3900	2.3929
n = 150	$β$	2.5121	−0.0023	0.0048	2.5113	2.5132
n = 350	$α$	2.4044	−0.0345	0.0065	2.4040	2.40491
n = 350	$β$	2.5155	0.0010	0.0028	2.5152	2.5158
n = 500	$α$	2.4051	−0.0338	0.0052	2.4048	2.4055
n = 500	$β$	2.5180	0.0035	0.0023	2.5179	2.5183

Table 4. Mean, bias, MSE, LCL, and UCL for Set-IV.

Sample Size	Parameter	Estimate	Bias	MSE	LCL	UCL
n = 25	$α$	0.4173	−0.0217	0.0024	0.4134	0.4212
n = 25	$β$	1.5168	0.0023	0.0037	1.5120	1.5215
n = 50	$α$	0.4251	−0.0138	0.0013	0.4237	0.4264
n = 50	$β$	1.5166	0.0021	0.0027	1.5146	1.5186
n = 150	$α$	0.4285	−0.0105	0.0007	0.4281	0.4288
n = 150	$β$	1.5205	0.0060	0.0015	1.5200	1.5210
n = 350	$α$	0.4314	−0.0076	0.0004	0.4313	0.4315
n = 350	$β$	1.5236	0.0091	0.0009	1.5234	1.5238
n = 500	$α$	0.4332	−0.0058	0.0003	0.4332	0.4333
n = 500	$β$	1.5203	0.0058	0.0009	1.5202	1.5203

Table 5. Descriptive statistics for Datasets I and II.

Dataset	SS	Mean	Median	SD	SK	KU
I	15	0.0598	0.0490	0.0277	−0.1083	1.6247
II	15	0.0402	0.0510	0.0277	0.1083	1.6247

Table 6. Theoretical statistics from the UED.

Dataset	SS	Mean	Median	SD	SK	KU
I	15	0.0606	0.0621	0.0254	−0.2107	2.3825
II	15	0.0406	0.0384	0.0247	0.2942	2.3050

Table 7. ML estimates and goodness-of-fit statistics for Dataset I.

Distribution	$\hat{β}$	$\hat{α}$	CVM $_{0}^{*}$	AD $_{0}^{*}$	KS	p-Value
UED	18.4218	0.0773	0.6239	0.1026	0.2079	0.5361
BD	3.8233	60.2492	0.6858	0.1041	0.2099	0.5232
KwD	719.3842	2.4408	0.6887	0.1109	0.2003	0.5844
JSBD	4.9859	1.7279	0.7751	0.1117	0.2128	0.5056
UGoMD	1.6525	0.0048	1.0587	0.1613	0.2353	0.3769

Table 8. Vuong test statistics for Datasets I and II.

Models	Dataset I	Suitability	Dataset II	Suitability
UED-BD	1.4601	UED	2.5935	UED
UED-KwD	0.9738	Indecisive	3.4585	UED
UED-JSBD	1.5427	UED	1.6793	UED
UED-UGoMD	2.2142	UED	1.5955	UED

Table 9. MLE and goodness-of-fit statistics for Dataset II.

Distribution	$\hat{β}$	$\hat{α}$	CVM $_{0}^{*}$	AD $_{0}^{*}$	KS	p-Value
UED	11.8676	0.4607	0.6239	0.1096	0.1960	0.6118
BD	1.5370	36.8071	0.6869	0.1199	0.2481	0.3142
KwD	78.9162	1.4011	0.7074	0.1224	0.2409	0.3487
JSBD	3.5837	1.0177	0.8112	0.1364	0.2619	0.2549
UGoMD	0.9497	0.0219	0.9011	0.1499	0.2386	0.3603

Table 10. Estimates of the maximum log-likelihood and information criteria for Dataset I.

Distribution	$ℓ$	AIC	AICC	BIC	HQIC	CAIC
UED	33.8617	−63.7233	−62.7233	−62.3072	−63.7384	−60.3072
BD	32.8026	−61.6052	−60.6052	−60.1891	−61.6203	−58.1891
KwD	33.3796	−62.7592	−61.7592	−61.3431	−62.7743	−59.3431
JSBD	32.0631	−60.1262	−59.1262	−58.7101	−60.1413	−56.7101
UGoMD	29.6463	−55.2925	−54.2925	−53.8764	−55.3076	−51.8764

Table 11. Estimates of the maximum log-likelihood and information criteria for Dataset II.

Distribution	$ℓ$	AIC	AICC	BIC	HQIC	CAIC
UED	35.2604	−66.5208	−65.5208	−65.1047	−66.5359	−63.1047
BD	34.1097	−64.2194	−63.2194	−62.8033	−64.2345	−60.8033
KwD	34.3392	−64.6784	−63.6784	−63.2623	−64.6935	−61.2623
JSBD	33.0448	−62.0896	−61.0896	−60.6735	−62.1047	−58.6735
UGoMD	31.1648	−58.3296	−57.3296	−56.9135	−58.3447	−54.9135

Table 12. Descriptive statistics for Datasets III and IV.

Dataset	SS	Mean	Median	SD	SK	KU
III	50	0.1632	0.1600	0.0810	0.0723	2.2166
IV	50	0.1520	0.1600	0.0785	0.0061	2.3012

Table 13. Theoretical statistics from the UED.

Dataset	SS	Mean	Median	SD	SK	KU
III	50	0.1633	0.1641	0.0809	0.0259	2.2511
IV	50	0.1519	0.1521	0.0777	0.0262	2.2521

Table 14. MLEs and goodness-of-fit statistics for Dataset III.

Distribution	$\hat{β}$	$\hat{α}$	CVM $_{0}^{*}$	AD $_{0}^{*}$	KS	p-Value
UED	4.7879	0.1756	0.3274	0.0419	0.1242	0.9881
BD	2.6824	13.8640	0.1538	0.9120	0.1414	0.5555
KwD	1.0746	0.0925	12.2879	2.3943	0.7222	0.0000
JSBD	2.3767	1.3175	0.2495	1.4647	0.1740	0.0968
UGoMD	0.0924	1.0747	0.5213	3.0810	0.2046	0.0304

Table 15. MLEs and goodness-of-fit statistics for Dataset IV.

Distribution	$\hat{β}$	$\hat{α}$	CVM $_{0}^{*}$	AD $_{0}^{*}$	KS	p-Value
UED	4.8518	0.1996	0.3224	0.0339	0.1239	0.9928
BD	2.4003	13.5218	0.2871	1.5649	0.1981	0.7340
KwD	1.9606	31.3769	0.2093	1.2683	0.1691	0.8825
JSBD	2.3682	1.2374	0.4145	2.2458	0.2285	0.5579
UGoMD	0.0916	1.0250	0.6091	3.4278	0.2312	0.5426

Table 16. Estimates of the maximum log-likelihood and information criteria for Dataset III.

Distribution	$- l$	AIC	AICC	BIC	HQIC	CAIC
UED	−57.0712	−110.142	−109.887	−106.318	−108.686	−104.318
BD	−54.6066	−105.213	−104.958	−101.389	−103.757	−99.3892
KwD	−56.0686	−108.137	−107.882	−104.313	−106.681	−102.313
JSBD	−51.3231	−98.6462	−98.3909	−94.8222	−97.19	−92.8222
UGoMD	−40.672	−77.344	−77.0887	−73.52	−75.8878	−71.52

Table 17. Estimates of the maximum log-likelihood and information criteria for Dataset IV.

Distribution	$- l$	AIC	AICC	BIC	HQIC	CAIC
UED	−59.3536	−114.707	−114.452	−110.883	−113.251	−108.883
BD	−55.9312	−107.862	−107.607	−104.038	−106.406	−102.038
KwD	−57.5214	−111.043	−110.788	−107.219	−109.587	−105.219
JSBD	−52.305	−100.61	−100.355	−96.786	−99.1538	−94.786
UGoMD	−42.6099	−81.2198	−80.9645	−77.3957	−79.7636	−75.3957

Table 18. Vuong test statistic for Datasets III and IV.

Models	Dataset III	Suitability	Dataset IV	Suitability
UED-BD	0.4137	Indecisive	3.5339	UED
UED-KwD	−2.3203	KwD	3.9633	UED
UED-JSBD	2.1336	UED	3.4202	UED
UED-UGoMD	4.9679	UED	4.0306	UED

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
