A Gauss Hypergeometric-Type Model for Heavy-Tailed Survival Times in Biomedical Research

Gillariose, Jiju; Abdelwahab, Mahmoud M.; Joseph, Joshin; Hasaballah, Mustafa M.

doi:10.3390/sym17111795

¹ Department of Statistics and Data Science, Christ University, Hosur Road, Bangalore 560029, Karnataka, India

² Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia

³ School of Commerce and Professional Studies, Marian College Kuttikkanam, Kuttikkanam P.O., Peermade, Idukki 685531, Kerala, India

⁴ Department of Basic Sciences, Marg Higher Institute of Engineering and Modern Technology, Cairo 11721, Egypt

* Authors to whom correspondence should be addressed.

Symmetry 2025, 17(11), 1795; https://doi.org/10.3390/sym17111795

Submission received: 15 September 2025 / Revised: 12 October 2025 / Accepted: 15 October 2025 / Published: 24 October 2025

(This article belongs to the Section Mathematics)

Abstract

In this study, we introduced and analyzed the Slash–Log–Logistic (SlaLL) distribution, a novel statistical model developed by applying the slash methodology to log–logistic and beta distributions. The SlaLL distribution is particularly suited for modeling datasets characterized by heavy tails and extreme values, frequently encountered in survival time analyses. We derived the mathematical representation of the distribution involving Gauss hypergeometric and beta functions, explicitly established the probability density function, cumulative distribution function, hazard rate function, and reliability function, and provided clear definitions of its moments. Through comprehensive simulation studies, the accuracy and robustness of maximum likelihood and Bayesian methods for parameter estimation were validated. Comparative empirical analyses demonstrated the SlaLL distribution’s superior fitting performance over well-known slash-based models, emphasizing its practical utility in accurately capturing the complexities of real-world survival time data.

Keywords:

Bayesian estimation; Gauss hypergeometric function; heavy-tailed data; log–logistic distribution; maximum likelihood estimation; survival analysis

MSC:

62G30; 62E10

1. Introduction

The log–logistic (LL) or Fisk distribution arises by exponentiating a logistic variate. In its two-parameter form, the probability density function (PDF) and cumulative distribution function (CDF) are

g (x) = \frac{κ}{λ} {(\frac{x}{λ})}^{κ - 1} {[1 + {(x / λ)}^{κ}]}^{- 2}, G (x) = \frac{1}{1 + {(x / λ)}^{- κ}}, x > 0, λ > 0, κ > 0 .

Introduced for income studies by Fisk [1], the LL law was soon adopted in survival analysis because its hazard rate function (HRF) can either fall monotonically (

κ \leq 1

) or rise to a peak before declining (

κ > 1

)—a versatility documented by [2]. The closed-form survival function

S (x) = 1 - G (x) = {[1 + {(x / λ)}^{κ}]}^{- 1}

simplifies censored-data likelihoods and underlies the accelerated failure time (AFT) model now standard in survival software.

The nth raw moment is finite only when

κ > n

, so small

κ

values yield heavy right tails—an advantage when the log-normal underestimates extremes. The analytic quantile

Q (p) = λ {[p / (1 - p)]}^{1 / κ}

permits direct random generation, and because

log X

is logistic, diagnostic plots based on logistic residuals are easy to construct. Thanks to a hazard that may be monotone or unimodal, the LL baseline often outperforms Weibull or log-normal rivals in reliability stress–strength analysis, oncology AFT studies, and hydrological drought indices where Standardized Precipitation Evapotranspiration Index (SPEI) is routinely normalized with an LL fit.
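The closed-form quantile makes inverse-transform sampling straightforward. The following Python sketch is illustrative only: the function names are hypothetical, and the paper's own analyses were run in R. It implements the LL density, CDF, and quantile defined above and draws variates by plugging uniform numbers into Q(p).

```python
import math
import random


def ll_pdf(x, lam, kappa):
    """Log-logistic density g(x) with scale lam and shape kappa (x > 0)."""
    z = (x / lam) ** kappa
    return (kappa / x) * z / (1.0 + z) ** 2


def ll_cdf(x, lam, kappa):
    """Log-logistic CDF G(x) = 1 / (1 + (x/lam)^(-kappa))."""
    z = (x / lam) ** kappa
    return z / (1.0 + z)


def ll_quantile(p, lam, kappa):
    """Quantile Q(p) = lam * (p / (1 - p))^(1/kappa) for 0 <= p < 1."""
    return lam * (p / (1.0 - p)) ** (1.0 / kappa)


def ll_sample(n, lam, kappa, rng):
    """Inverse-transform sampling: apply Q to uniform draws on [0, 1)."""
    return [ll_quantile(rng.random(), lam, kappa) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = sorted(ll_sample(10001, lam=2.0, kappa=1.5, rng=rng))
    print("sample median:", draws[5000], "theoretical median:", 2.0)
```

Because Q(0.5) = λ, the sample median gives a quick sanity check on the scale parameter.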

Since 2023, a surge of shape-enhanced variants has appeared. The exponentiated α-power LL adds two shape parameters and admits ten distinct hazard shapes, outperforming competing models on three medical datasets [3]. Ref. [4] proposed the generalized Kavya–Manoharan LL, capable of symmetric, J-, and reversed-J densities, whereas Ishaq et al. [5] fused Maxwell and LL laws to capture both left- and right-skewness in COVID-19 mortality counts. On the inferential side, Ref. [6] built a Bayesian framework for LL lifetimes under progressive-stress-accelerated life tests, and Ref. [7] verified that the LL transform remains the best normalizing choice for the multiscalar SPEI across 107 Oklahoma stations. Collectively, these advances confirm the LL family as a flexible platform for modeling heavy-tailed, non-monotone lifetimes in the coming decade.

On the other hand, the three-parameter Lindley slash (LS) distribution [8] extends the Lindley law to data with high kurtosis. A random variable Y follows

Y ~ LS (σ, θ, α)

if it can be written as

Y = σ \frac{Z}{U^{1 / α}},

where

Z ~ L (θ)

and

U ~ Uniform (0, 1)

are independent. By analogy, we define the log–logistic slash (SlaLL) distribution. A variable

Y ~ SlaLL (σ, κ, α)

satisfies

Y = σ \frac{X}{U^{1 / α}},

(1)

with independent

X ~ LL (σ, κ)

and

U ~ Uniform (0, 1)

.

Motivated by the challenges posed by modeling datasets characterized by heavy tails, extreme values, and skewness commonly observed in survival and reliability analyses, this study introduces and analyzes the Slash–Log–Logistic (SlaLL) distribution. That is, using the empirical findings of [9], we now introduce a new heavy-tailed extension of the LL law. Setting

σ = α = 1

in Equation (1) and assuming that U follows a beta distribution with mean

α / (α + β)

,

α, β > 0

, yields a three-parameter density in which one shape parameter governs unimodality while the remaining two control kurtosis. Conventional distributions often inadequately capture the complexities of real-world survival time data, leading to inaccurate predictions and assessments. Addressing this gap, the SlaLL distribution leverages the flexibility of the slash methodology combined with the LL and beta distributions. The resulting model effectively handles extreme observations and provides improved accuracy and robustness in parameter estimation. By deriving the mathematical framework, including the PDF, CDF, HRF, reliability function (RF), and explicitly defined moments, we offer a comprehensive toolkit for practical applications. Extensive simulation and empirical comparative analyses underscore the SlaLL distribution’s superior fitting performance, highlighting its potential to enhance modeling precision in diverse applied fields. Analyses were performed in the R programming language (version 4.5.1; https://www.r-project.org).

This article unfolds as follows: In Section 2, we introduce the SlaLL distribution and describe its principal structural characteristics. Section 3 explains parameter estimation, with a comprehensive simulation study to assess the efficacy of these estimators. Moving forward, Section 4 showcases an application of the SlaLL distribution using real-world data, demonstrating its practical significance. Finally, in Section 5, we conclude with a few closing remarks.

2. Slash–Log–Logistic Distribution

In this section, we introduce the SlaLL distribution, obtained by slashing a two-parameter LL variable with an independent beta variate and derive its main structural properties.

2.1. Structure of the SlaLL Model

A random variable Y follows the SlaLL distribution, denoted

Y ~ SlaLL (λ, κ, α, β)

, if it can be represented as

Y = \frac{X}{U}, X ~ LL (λ, κ), U ~ Beta (α, β),

where X and U are independent. Throughout,

λ > 0

and

κ > 0

are, respectively, the scale and shape parameters of the log–logistic law, while

α, β > 0

shape the beta divisor. The PDF and CDF of the SlaLL distribution are defined in Proposition 1 and Corollary 1, respectively.

Proposition 1.

Let

Y ~ SlaLL (λ, κ, α, β)

. Then the PDF of Y is

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v, y > 0,

where

B (α, β) = \int_{0}^{1} u^{α - 1} {(1 - u)}^{β - 1} d u

is the beta function.

Proof.

Set

Y = X / U

with

X ~ LL (λ, κ)

and

U ~ Beta (α, β)

. Introduce the transformation

y = \frac{x}{u}, v = u, so that x = y v, u = v .

The Jacobian determinant of

(x, u) \mapsto (y, v)

is

J = |\partial (x, u) / \partial (y, v)| = |\begin{matrix} \partial x / \partial y & \partial x / \partial v \\ \partial u / \partial y & \partial u / \partial v \end{matrix}| = |\begin{matrix} v & y \\ 0 & 1 \end{matrix}| = v .

Hence the joint density of

(Y, V)

is

f_{Y, V} (y, v) = f_{X, U} (y v, v) J = f_{X} (y v) f_{U} (v) v, y > 0, 0 < v < 1,

with

f_{X} (x) = \frac{κ}{λ} \frac{{(x / λ)}^{κ - 1}}{{[1 + {(x / λ)}^{κ}]}^{2}}

and

f_{U} (u) = \frac{u^{α - 1} {(1 - u)}^{β - 1}}{B (α, β)} .

Marginalizing over v yields

f_{Y} (y) = \int_{0}^{1} f_{Y, V} (y, v) d v = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v,

establishing the stated result. □

Corollary 1.

From Proposition 1, the CDF of

Y ~ SlaLL (λ, κ, α, β)

is

F_{Y} (y) = \int_{0}^{y} f_{Y} (t) d t = \frac{κ}{λ^{κ} B (α, β)} \int_{0}^{y} \int_{0}^{1} t^{κ - 1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(t v / λ)}^{κ}]}^{- 2} d v d t, y > 0 .

The R code for representing the PDF and CDF of the SlaLL distribution is provided in Appendix A. Figure 1 demonstrates the remarkable shape flexibility of the SlaLL density: the scale

λ

merely stretches the curve horizontally, the shape parameter

κ

toggles the profile from strictly decreasing (

κ < 1

) to unimodal (

κ > 1

), the slash parameter

α

sharpens or flattens the peak without affecting the tails, and the kurtosis parameter

β

controls tail heaviness. So, by varying

(λ, κ, α, β)

, the model reproduces the full spectrum from steep L-shaped to gently peaked right-skewed forms commonly observed in reliability, survival, and income data. Throughout this paper,

λ > 0

denotes a scale parameter for the proposed distribution. This parameter sets the time scale of the model: a larger

λ

stretches the distribution of survival times (yielding longer typical time frames), whereas a smaller

λ

compresses the distribution (shortening the overall time scale).
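As a rough numerical companion to Proposition 1, the Python sketch below evaluates the integral form of the PDF with a plain midpoint rule. For the CDF it uses the equivalent mixture form F_Y(y) = E[G(yU)], which is the identity the paper derives in the proof of Proposition 4. The function names and the midpoint rule are assumptions made for illustration (this is not the R code of Appendix A), and the rule converges slowly when β < 1 or α + κ < 1 because of endpoint singularities.

```python
import math


def beta_fn(a, b):
    """Euler beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def slall_pdf(y, lam, kappa, alpha, beta, n=4000):
    """Integral form of the SlaLL density (Proposition 1), midpoint rule on (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        z = (y * v / lam) ** kappa
        total += v ** (alpha + kappa - 1) * (1 - v) ** (beta - 1) / (1 + z) ** 2
    const = kappa * y ** (kappa - 1) / (lam ** kappa * beta_fn(alpha, beta))
    return const * total * h


def slall_cdf(y, lam, kappa, alpha, beta, n=4000):
    """Mixture form F_Y(y) = E[G_LL(y U)] with U ~ Beta(alpha, beta)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        z = (y * v / lam) ** kappa
        total += v ** (alpha - 1) * (1 - v) ** (beta - 1) * z / (1 + z)
    return total * h / beta_fn(alpha, beta)


if __name__ == "__main__":
    params = (1.0, 2.0, 2.0, 3.0)  # (lam, kappa, alpha, beta), arbitrary values
    for y in (0.5, 1.5, 5.0):
        print(y, slall_pdf(y, *params), slall_cdf(y, *params))
```

A finite-difference derivative of `slall_cdf` should agree with `slall_pdf`, which gives a simple consistency check between the two representations.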

For completeness, we provide the explicit forms of the probability density function (PDF), cumulative distribution function (CDF), survival function, and hazard rate of the proposed heavy-tailed distribution. Let T be a random survival time following our Gauss hypergeometric-type model with shape parameters

α > 0

and

β > 0

and scale parameter

λ > 0

. Then, the distribution is supported on

t > 0

with

f (t; α, β, λ) = \frac{α β}{λ} {(\frac{t}{λ})}^{α - 1} {[1 + {(\frac{t}{λ})}^{α}]}^{- (β + 1)},

(2)

F (t; α, β, λ) = 1 - {[1 + {(\frac{t}{λ})}^{α}]}^{- β}, S (t; α, β, λ) = 1 - F (t; α, β, λ) = {[1 + {(\frac{t}{λ})}^{α}]}^{- β},

(3)

h (t; α, β, λ) = \frac{f (t)}{S (t)} = \frac{α β}{λ} \frac{{(\frac{t}{λ})}^{α - 1}}{1 + {(\frac{t}{λ})}^{α}} .

(4)

Here,

α

and

β

are shape parameters that control the form of the density and tail, and

λ

is the scale parameter (as defined earlier). The shape parameters influence different aspects of the distribution:

α

primarily governs the behavior of the distribution at early times and the overall tail heaviness, while

β

predominantly affects the tail decay rate. For instance, smaller values of

α

or

β

produce heavier tails (slower decay of

S (t)

as

t \to \infty

), whereas larger values yield lighter tails. Limiting Behavior: As

t \to 0

,

S (t) \to 1

and

F (t) \to 0

, as expected for a non-negative random variable. The behavior of the PDF near the origin depends on

α

. If

α < 1

, then from Equation (2)

f (t) ~ \frac{α β}{λ} {(\frac{t}{λ})}^{α - 1}

, which diverges as

t ↓ 0

(the density has an integrable singularity at

t = 0

). In contrast, if

α \geq 1

, the density approaches a finite limit as

t \to 0

(

f (0) = 0

for

α > 1

, and

f (0) = β / λ

for

α = 1

). Accordingly, the hazard rate

h (t) = f (t) / S (t)

is unbounded near

t = 0

when

α < 1

(a high instantaneous failure rate at time 0), but remains finite when

α \geq 1

(

h (0) = 0

if

α > 1

, or

h (0) = β / λ

if

α = 1

). As

t \to \infty

, the distribution exhibits a heavy tail. From Equation (3), for large t, we have

S (t) ~ {(\frac{t}{λ})}^{- α β}

, which decays polynomially. Equivalently,

f (t) ~ \frac{α β}{λ} {(\frac{t}{λ})}^{- α β - 1}

as

t \to \infty

. Consequently, the hazard function declines to zero for large t; indeed,

h (t) ~ \frac{α β}{t}

as

t \to \infty

. This asymptotic behavior confirms the heavy-tailed nature of the model—long-term survivors experience an ever-decreasing hazard rate (unlike, e.g., the exponential distribution where the hazard is constant).
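The closed forms in Equations (2)–(4) are easy to check numerically. The Python sketch below is a hedged illustration with hypothetical function names: it codes the density, survival function, and hazard, and adds a quantile function obtained here by inverting Equation (3) (the quantile is not stated in the text). It then evaluates t·h(t) at a large t to illustrate the limit h(t) ~ αβ/t.

```python
import math


def model_pdf(t, alpha, beta, lam):
    """Density of Equation (2)."""
    z = (t / lam) ** alpha
    return (alpha * beta / lam) * (t / lam) ** (alpha - 1) * (1 + z) ** (-(beta + 1))


def model_surv(t, alpha, beta, lam):
    """Survival function S(t) of Equation (3)."""
    return (1 + (t / lam) ** alpha) ** (-beta)


def model_cdf(t, alpha, beta, lam):
    """CDF F(t) = 1 - S(t) of Equation (3)."""
    return 1.0 - model_surv(t, alpha, beta, lam)


def model_hazard(t, alpha, beta, lam):
    """Hazard rate of Equation (4)."""
    z = (t / lam) ** alpha
    return (alpha * beta / lam) * (t / lam) ** (alpha - 1) / (1 + z)


def model_quantile(p, alpha, beta, lam):
    """Solve F(t) = p: t = lam * ((1 - p)^(-1/beta) - 1)^(1/alpha), 0 <= p < 1."""
    return lam * ((1 - p) ** (-1.0 / beta) - 1.0) ** (1.0 / alpha)


if __name__ == "__main__":
    a, b, lam = 0.5, 2.0, 1.0
    t = 1e8
    print("t * h(t) =", t * model_hazard(t, a, b, lam), "vs alpha * beta =", a * b)
```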

2.2. Characterizations

Proposition 2.

Let

Y ~ SlaLL (λ, κ, α, β)

with shape parameters

α, β, κ > 0

and scale

λ > 0

. Then the PDF of Y admits the closed form

f_{Y} (y) = W (α, β, κ, λ) y^{κ - 1} {}_{2}F_{1} (2, α + κ; α + β + κ; - {(y / λ)}^{κ}), y > 0,

where

W (α, β, κ, λ) = \frac{κ}{λ^{κ}} \frac{B (α + κ, β)}{B (α, β)} = \frac{κ}{λ^{κ}} \frac{{(α)}_{κ}}{{(α + β)}_{κ}},

B (\cdot, \cdot)

denotes the Euler beta function,

{(a)}_{n} = Γ (a + n) / Γ (a)

is the Pochhammer (rising factorial) symbol, and

{}_{2}F_{1} (a, b; c; z)

is Gauss’s hypergeometric function.

Proof.

Starting from the integral representation

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v,

use the Euler–Gauss identity

\int_{0}^{1} t^{c - 1} {(1 - t)}^{d - c - 1} {(1 + b t)}^{- a} d t = B (c, d - c) {}_{2}F_{1} (a, c; d; - b)

with parameters

a = 2, b = {(y / λ)}^{κ}, c = α + κ, d = α + β + κ

. This yields

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ}} \frac{B (α + κ, β)}{B (α, β)} {}_{2}F_{1} (2, α + κ; α + β + κ; - {(y / λ)}^{κ}),

and rewriting the beta-ratio as Pochhammer symbols gives the stated result. □
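The last step, rewriting the beta ratio as a ratio of Pochhammer symbols with a real-valued index κ, can be confirmed numerically. The short Python sketch below uses hypothetical helper names and only the standard-library log-gamma function. It computes W(α, β, κ, λ) from the beta-function form so that it can be compared with the Pochhammer form.

```python
import math


def log_beta(a, b):
    """Logarithm of the Euler beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def pochhammer(a, n):
    """Rising factorial (a)_n = Gamma(a + n) / Gamma(a), real n allowed."""
    return math.exp(math.lgamma(a + n) - math.lgamma(a))


def norm_const_W(alpha, beta, kappa, lam):
    """W = (kappa / lam^kappa) * B(alpha + kappa, beta) / B(alpha, beta)."""
    ratio = math.exp(log_beta(alpha + kappa, beta) - log_beta(alpha, beta))
    return kappa / lam ** kappa * ratio


if __name__ == "__main__":
    alpha, beta, kappa, lam = 2.0, 3.0, 1.5, 2.0
    via_beta = norm_const_W(alpha, beta, kappa, lam)
    via_poch = kappa / lam ** kappa * pochhammer(alpha, kappa) / pochhammer(alpha + beta, kappa)
    print(via_beta, via_poch)  # the two forms should agree up to rounding
```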

Proposition 3.

Let

Y ~ SlaLL (λ, κ, α, β)

with

α, β, κ > 0

and

λ > 0

. Then the CDF of Y can be expressed as

F_{Y} (y) = \frac{{(α)}_{κ}}{{(α + β)}_{κ}} {(\frac{y}{λ})}^{κ} {}_{2}F_{1} (1, α + κ; α + β + κ; - {(y / λ)}^{κ}), y > 0,

where

{(a)}_{n} = Γ (a + n) / Γ (a)

is the Pochhammer symbol and

{}_{2}F_{1}

denotes Gauss’s ordinary hypergeometric function.

Proof.

From Proposition 2, the PDF can be written as

f_{Y} (t) = \frac{κ t^{κ - 1}}{λ^{κ}} \frac{{(α)}_{κ}}{{(α + β)}_{κ}} {}_{2}F_{1} (2, α + κ; α + β + κ; - {(t / λ)}^{κ}), t > 0 .

Set

u = {(\frac{t}{λ})}^{κ} ⟹ t = λ u^{1 / κ}, d t = \frac{λ}{κ} u^{1 / κ - 1} d u ⟹ t^{κ - 1} d t = \frac{λ^{κ}}{κ} d u .

The CDF is

\begin{matrix} F_{Y} (y) & = \int_{0}^{y} f_{Y} (t) d t \\ & = \frac{{(α)}_{κ}}{{(α + β)}_{κ}} \int_{0}^{{(y / λ)}^{κ}} {}_{2}F_{1} (2, α + κ; α + β + κ; - u) d u . \end{matrix}

Recall the power-series expansion

{}_{2}F_{1} (2, b; c; - u) = \sum_{n = 0}^{\infty} \frac{{(2)}_{n} {(b)}_{n}}{{(c)}_{n} n!} {(- u)}^{n}, | u | < 1 .

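Integrating term-by-term (justified for

0 \leq z < 1

by uniform convergence of the power series on compact subsets of its disc of convergence, and extended to larger z by analytic continuation),

\int_{0}^{z} {}_{2}F_{1} (2, b; c; - u) d u = \sum_{n = 0}^{\infty} \frac{{(2)}_{n} {(b)}_{n}}{{(c)}_{n} n!} \frac{{(- 1)}^{n} z^{n + 1}}{n + 1} .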

But

\frac{{(2)}_{n}}{n + 1} = {(1)}_{n}

, so the right-hand side is

z \sum_{n = 0}^{\infty} \frac{{(1)}_{n} {(b)}_{n}}{{(c)}_{n} n!} {(- z)}^{n} = z {}_{2}F_{1} (1, b; c; - z) .

Putting

b = α + κ

,

c = α + β + κ

and

z = {(y / λ)}^{κ}

gives

F_{Y} (y) = \frac{{(α)}_{κ}}{{(α + β)}_{κ}} {(\frac{y}{λ})}^{κ} {}_{2}F_{1} (1, α + κ; α + β + κ; - {(y / λ)}^{κ}),

which is the asserted closed form. □

Proposition 4

(Slash–correction representation). Let

Y ~ SlaLL (λ, κ, α, β)

with

λ, κ, α, β > 0

and parent log–logistic CDF

G_{LL} (x) = \frac{{(x / λ)}^{κ}}{1 + {(x / λ)}^{κ}}, x > 0 .

Define the slash–correction kernel

C_{α, β}^{(κ, λ)} (y) : = \frac{κ}{λ^{κ}} \int_{0}^{y} w^{κ - 1} {[1 + {(w / λ)}^{κ}]}^{- 2} I_{w / y} (α, β) d w, y > 0,

where

I_{x} (α, β)

is the regularized incomplete beta function. Then, the CDF of Y satisfies the exact decomposition

F_{Y} (y) = G_{LL} (y) - C_{α, β}^{(κ, λ)} (y), y > 0 .

(5)

Moreover, the kernel enjoys the following properties:

(i): $0 \leq C_{α, β}^{(κ, λ)} (y) \leq G_{LL} (y)$ for every $y > 0$ ;
(ii): $C_{α, β}^{(κ, λ)} (y) \underset{y ↓ 0}{\to} 0$ and $C_{α, β}^{(κ, λ)} (y) \underset{y \to \infty}{\to} 0$ ;
(iii): $C_{α, β}^{(κ, λ)} (y) \to 0$ as $β ↓ 0$ with $α$ fixed (the divisor U then degenerates to 1), so (5) collapses to the parent log–logistic CDF.

Proof.

Starting from the SlaLL density

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v,

integrate over y and exchange the order of integration to obtain

F_{Y} (y) = \frac{1}{B (α, β)} \int_{0}^{1} v^{α - 1} {(1 - v)}^{β - 1} G_{LL} (y v) d v .

Write

G_{LL} (y v) = G_{LL} (y) - [G_{LL} (y) - G_{LL} (y v)]

and convert the remainder to an integral in

w = y v

; the details lead precisely to (5) with the stated kernel.

For (i)–(iii), the bounds in (i) follow from

I_{w / y} (α, β) \in [0, 1]

, and the limit

y ↓ 0

in (ii) then follows from (i) because

G_{LL} (y) \to 0

. The limit

y \to \infty

is obtained by dominated convergence, since

I_{w / y} (α, β) \to 0

for each fixed w. Finally, as

β ↓ 0

with α fixed, the Beta(α, β) law concentrates at 1, so

I_{w / y} (α, β) \to 0

for every w < y and

C_{α, β}^{(κ, λ)} (y)

vanishes in the limit. □
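The step from the mixture form of the CDF to the decomposition (5) can be spelled out as follows. In this sketch, g_{LL} denotes the parent log–logistic PDF and f_U the Beta(α, β) density (both symbols are introduced here for brevity), and the interchange of integrals is justified by Tonelli's theorem because the integrand is non-negative.

```latex
\begin{aligned}
G_{LL}(y) - F_{Y}(y)
  &= \int_{0}^{1} f_{U}(v)\,\bigl[G_{LL}(y) - G_{LL}(y v)\bigr]\,dv \\
  &= \int_{0}^{1} f_{U}(v) \int_{y v}^{y} g_{LL}(w)\,dw\,dv \\
  &= \int_{0}^{y} g_{LL}(w) \int_{0}^{w/y} f_{U}(v)\,dv\,dw \\
  &= \int_{0}^{y} g_{LL}(w)\, I_{w/y}(\alpha, \beta)\,dw
   = C_{\alpha, \beta}^{(\kappa, \lambda)}(y),
\qquad
g_{LL}(w) = \frac{\kappa}{\lambda^{\kappa}}\, w^{\kappa - 1}
            \bigl[1 + (w/\lambda)^{\kappa}\bigr]^{-2}.
\end{aligned}
```

The third line swaps the order of integration over the region {0 < v < 1, yv < w < y}, which is the same as {0 < w < y, 0 < v < w/y}.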

Moreover, two important reliability measures are the reliability function (RF) and the HRF. The RF of a random variable Y is defined by

R_{Y} (y) = 1 - F_{Y} (y)

, where

F_{Y}

denotes the CDF of Y. The HRF is obtained by

h_{Y} (y) = \frac{f_{Y} (y)}{1 - F_{Y} (y)}

. For the SlaLL distribution, as a direct consequence of Propositions 3 and 4, both reliability measures can be obtained. Their expressions are given in Corollary 2.

Corollary 2.

Under the hypotheses of Proposition 4, the HRF

h_{Y} (y) = f_{Y} (y) / [1 - F_{Y} (y)]

of the slash–log–logistic distribution is

h_{Y} (y) = \frac{\frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v}{\frac{1}{1 + {(y / λ)}^{κ}} + C_{α, β}^{(κ, λ)} (y)}, y > 0,

(6)

where

C_{α, β}^{(κ, λ)} (y)

is the slash–correction kernel defined in Proposition 4.

Proof.

By definition

h_{Y} (y) = f_{Y} (y) / {\bar{F}}_{Y} (y)

, where

{\bar{F}}_{Y} (y) = 1 - F_{Y} (y)

. From Proposition 4,

R_{Y} (y) = {\bar{F}}_{Y} (y) = 1 - G_{LL} (y) + C_{α, β}^{(κ, λ)} (y) = \frac{1}{1 + {(y / λ)}^{κ}} + C_{α, β}^{(κ, λ)} (y) .

The PDF of Y is obtained either by differentiating (5) or from the constructive representation

Y = X / U

, which is given by

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v, y > 0 .

Dividing this expression by

{\bar{F}}_{Y} (y)

yields (6). □

Figure 2 illustrates the behavior of the HRF under the SlaLL distribution across various parameter settings. An examination of the figure reveals that HRFs of the SlaLL distribution exhibit a wide variety of shapes depending on the values of the parameters

λ

,

κ

,

α

, and

β

. Across the plots, we observe monotonically decreasing HRFs when

κ

and

α

are moderate and

β

is large, indicating a declining failure rate over time. In cases with small

κ

or small

β

, the HRFs show non-monotonic or unimodal behavior, starting low or increasing initially before declining. Some curves display U-shaped hazard patterns when

β

is very small, reflecting early-life and late-life failure risks. When

α

and

β

are both large and balanced, the HRF tends to be approximately constant, indicating a stable risk over time. These patterns demonstrate the flexibility of the SlaLL model in capturing diverse real-world hazard behaviors through appropriate parameter tuning.

In deriving the results above, we impose certain conditions and choices for the parameters to ensure mathematical validity. All model parameters are assumed to be positive (

α > 0

,

β > 0

,

λ > 0

), which guarantees that integrals converge and the distribution is well-defined on

t > 0

. In particular, many of our expressions involve the Gauss hypergeometric function

{}_{2}F_{1} (\cdot)

arising from series expansions. The hypergeometric series

{}_{2}F_{1} (a, b; c; z)

is defined by a power series that converges for

| z | < 1

(and converges conditionally on the boundary

| z | = 1

under certain parameter conditions). In the context of our model, terms like

{(1 + {(t / λ)}^{α})}^{- β}

were expanded as hypergeometric series in intermediate steps, which is valid for

{(t / λ)}^{α} < 1

(i.e.,

t < λ

). This condition ensured absolute convergence of the series and justified term-by-term operations (such as integration or differentiation) within that radius. Analytic continuation of the hypergeometric function allows us to extend those results to the entire domain

t > 0

, so the final formulas hold globally even though the power-series derivation was initially restricted to

t < λ

. We also sometimes adopt specific parameter values in derivations to simplify the algebra. For example, choosing

β

to be an integer can make the hypergeometric series terminate after a finite number of terms (turning it into a polynomial), which greatly simplifies calculations and provides insight (indeed, for integer

β

, our model reduces to a simpler closed-form family). Importantly, however, all final results and properties we derive remain valid for general (non-integer)

β

as well, by continuity and the convergence properties noted above. These parameter conditions and choices ensure that the mathematical steps in our derivations are rigorously justified and that the resulting expressions for the model are accurate.

2.3. Moments of SlaLL Random Variable

The moment expressions, including the mean, variance, skewness, and kurtosis of the SlaLL distribution, are presented in the following propositions and are followed by the corresponding corollaries.

Proposition 5.

Let

Y ~ SlaLL (λ, κ, α, β)

, i.e.,

Y = X / U

, where

X ~ LL (λ, κ)

and

U ~ Beta (α, β)

are independent. Then, for any integer

r \geq 1

such that

0 < r < κ and α > r,

the rth raw moment of Y is

μ_{r} = E [Y^{r}] = E [X^{r}] E [U^{- r}] = λ^{r} \frac{\frac{r π}{κ}}{sin (\frac{r π}{κ})} \times \frac{B (α - r, β)}{B (α, β)} .

Proof.

Using the independence representation

Y = X / U

,

E [Y^{r}] = E [X^{r}] E [U^{- r}] .

Since

X ~ LL (λ, κ)

, for

0 < r < κ

,

E [X^{r}] = λ^{r} B (1 - \frac{r}{κ}, 1 + \frac{r}{κ}) = λ^{r} \frac{\frac{r π}{κ}}{sin (\frac{r π}{κ})} .

Since

U ~ Beta (α, β)

, for

α > r

,

E [U^{- r}] = \frac{B (α - r, β)}{B (α, β)} .

Combining these two gives the stated result. □

Corollary 3

(Mean and Variance). Under the assumptions of Proposition 5, the first two moments of

Y ~ SlaLL (λ, κ, α, β)

are

μ_{1} = E [Y] = λ \frac{\frac{π}{κ}}{sin (\frac{π}{κ})} \times \frac{B (α - 1, β)}{B (α, β)}, (κ > 1, α > 1),

μ_{2} = E [Y^{2}] = λ^{2} \frac{\frac{2 π}{κ}}{sin (\frac{2 π}{κ})} \times \frac{B (α - 2, β)}{B (α, β)}, (κ > 2, α > 2),

and hence

Var (Y) = μ_{2} - μ_{1}^{2} .

Corollary 4

(Skewness and Kurtosis). Define the central moments

μ_{r}^{'} = E [{(Y - μ_{1})}^{r}]

. Then, the skewness and (excess) kurtosis are

γ_{1} = \frac{μ_{3}^{'}}{{(μ_{2}^{'})}^{3 / 2}} = \frac{E [Y^{3}] - 3 μ_{1} E [Y^{2}] + 2 μ_{1}^{3}}{{(E [Y^{2}] - μ_{1}^{2})}^{3 / 2}},

γ_{2} = \frac{μ_{4}^{'}}{{(μ_{2}^{'})}^{2}} - 3 = \frac{E [Y^{4}] - 4 μ_{1} E [Y^{3}] + 6 μ_{1}^{2} E [Y^{2}] - 3 μ_{1}^{4}}{{(E [Y^{2}] - μ_{1}^{2})}^{2}} - 3,

where for

r = 3, 4

,

E [Y^{r}] = λ^{r} \frac{\frac{r π}{κ}}{sin (\frac{r π}{κ})} \times \frac{B (α - r, β)}{B (α, β)}, (r < κ, α > r) .
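The moment formulas of Proposition 5 and Corollaries 3 and 4 translate directly into code. The Python sketch below is an illustration with hypothetical function names: it evaluates the rth raw moment, enforces the existence conditions r < κ and α > r, and assembles the variance, skewness, and excess kurtosis from the first four raw moments.

```python
import math


def log_beta(a, b):
    """Logarithm of the Euler beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def raw_moment(r, lam, kappa, alpha, beta):
    """E[Y^r] = lam^r * (r*pi/kappa) / sin(r*pi/kappa) * B(alpha - r, beta) / B(alpha, beta)."""
    if not (0 < r < kappa and alpha > r):
        raise ValueError("moment of order r requires 0 < r < kappa and alpha > r")
    x = r * math.pi / kappa
    ll_part = lam ** r * x / math.sin(x)
    slash_part = math.exp(log_beta(alpha - r, beta) - log_beta(alpha, beta))
    return ll_part * slash_part


def moment_summary(lam, kappa, alpha, beta):
    """Mean, variance, skewness, and excess kurtosis (needs kappa > 4 and alpha > 4)."""
    m1, m2, m3, m4 = (raw_moment(r, lam, kappa, alpha, beta) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    ex_kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2 - 3
    return m1, var, skew, ex_kurt


if __name__ == "__main__":
    print(moment_summary(lam=1.0, kappa=6.0, alpha=8.0, beta=2.0))
```

Raising an error when a moment does not exist avoids silently returning a meaningless value, since the sine term changes sign once r exceeds κ.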

Figure 3 plots the variation in skewness

γ_{1}

and excess kurtosis

γ_{2}

for the SlaLL distribution over the

(λ = 1, κ, α, β)

parameter space. Both measures peak at low values of the LL shape parameter

κ

and the slash–mixture parameter (

α

or

β

), indicating pronounced asymmetry and heavy tails. Increasing either parameter causes a rapid decline toward the milder, finite moments of the standard log–logistic model.

3. Estimation and Simulation

This section focuses on estimating the parameters of the SlaLL distribution using both the maximum likelihood estimation (MLE) and Bayesian estimation (BE) methods.

3.1. Maximum Likelihood Estimation

In this section, we derive the MLEs of the four parameters of the SlaLL distribution. Let

Y_{1}, \dots, Y_{n}

be an independent random sample from

Y ~ SlaLL (λ, κ, α, β) .

We first rewrite the PDF in a form that isolates the integral component. Define

Z (y) = \int_{0}^{1} v^{α + κ - 1} {(1 - v)}^{β - 1} {[1 + {(y v / λ)}^{κ}]}^{- 2} d v,

(7)

so that

f_{Y} (y) = \frac{κ y^{κ - 1}}{λ^{κ} B (α, β)} Z (y), y > 0 .

(8)

The likelihood function for the sample is

L (λ, κ, α, β; {y_{i}}) = \prod_{i = 1}^{n} f_{Y} (y_{i}) = {[\frac{κ}{λ^{κ} B (α, β)}]}^{n} \prod_{i = 1}^{n} y_{i}^{κ - 1} Z (y_{i}),

and hence the log-likelihood is

\begin{matrix} ℓ (λ, κ, α, β) & = log L (λ, κ, α, β) = n [log κ - κ log λ - log B (α, β)] + (κ - 1) \sum_{i = 1}^{n} log y_{i} + \sum_{i = 1}^{n} log Z (y_{i}) . \end{matrix}

(9)
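Equation (9) can be evaluated numerically once Z(y) is approximated. The Python sketch below is one possible implementation under stated assumptions: Z(y) is computed with a midpoint rule, which is an illustrative choice and not necessarily the authors' method, and the function names are hypothetical. Invalid (non-positive) parameters return minus infinity so that a generic optimizer can reject them.

```python
import math


def log_beta(a, b):
    """Logarithm of the Euler beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def Z(y, lam, kappa, alpha, beta, n=2000):
    """Midpoint-rule approximation of the integral Z(y) in Equation (7)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        v = (i + 0.5) * h
        z = (y * v / lam) ** kappa
        total += v ** (alpha + kappa - 1) * (1 - v) ** (beta - 1) / (1 + z) ** 2
    return total * h


def log_lik(params, ys, n=2000):
    """Log-likelihood of Equation (9) for params = (lam, kappa, alpha, beta)."""
    lam, kappa, alpha, beta = params
    if min(params) <= 0:
        return -math.inf
    m = len(ys)
    out = m * (math.log(kappa) - kappa * math.log(lam) - log_beta(alpha, beta))
    out += (kappa - 1) * sum(math.log(y) for y in ys)
    out += sum(math.log(Z(y, lam, kappa, alpha, beta, n)) for y in ys)
    return out


if __name__ == "__main__":
    data = [0.4, 0.9, 1.3, 2.7, 8.1]  # made-up values, for illustration only
    print(log_lik((1.0, 2.0, 2.0, 3.0), data))
```

For κ = α = β = 1 the integral has the elementary value [log(1 + c) + 1/(1 + c) − 1]/c² with c = y/λ, which offers a convenient accuracy check for the quadrature.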

The MLEs

\hat{λ}, \hat{κ}, \hat{α}, and \hat{β}

are obtained by solving

\partial ℓ / \partial λ = 0

,

\partial ℓ / \partial κ = 0

,

\partial ℓ / \partial α = 0

, and

\partial ℓ / \partial β = 0

simultaneously (typically via numerical optimization by statistical software). To investigate the finite-sample behavior of these MLEs, we perform a Monte Carlo study. Random variates

Y ~ SlaLL (λ, κ, α, β)

can be simulated by the following steps:

(1) Generate $X ~ LL (λ, κ)$ .
(2) Generate $U ~ Beta (α, β)$ , independently of X.
(3) Compute $Y = X U^{- 1}$ .
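The simulation steps can be sketched in Python as follows, with U drawn from Beta(α, β) as in the model definition of Section 2.1. The function name `rslall` and the seed handling are assumptions for illustration; the paper's simulations were run in R. X is generated by inverse transform from the LL quantile function.

```python
import math
import random


def rslall(n, lam, kappa, alpha, beta, seed=None):
    """Draw n variates Y = X / U with X ~ LL(lam, kappa) and U ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        p = rng.random()                               # uniform on [0, 1)
        x = lam * (p / (1.0 - p)) ** (1.0 / kappa)     # LL variate via its quantile
        u = rng.betavariate(alpha, beta)               # independent beta divisor
        out.append(x / u)                              # very small alpha could give u == 0.0
    return out


if __name__ == "__main__":
    ys = rslall(20000, lam=1.0, kappa=6.0, alpha=8.0, beta=2.0, seed=1)
    # With kappa > 1 and alpha > 1 the mean exists and can be compared with Corollary 3.
    theory = (math.pi / 6.0) / math.sin(math.pi / 6.0) * (8.0 + 2.0 - 1.0) / (8.0 - 1.0)
    print(sum(ys) / len(ys), theory)
```

The comparison uses B(α − 1, β)/B(α, β) = (α + β − 1)/(α − 1), which follows from the gamma-function form of the beta function.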

The estimates can be obtained numerically with the Nelder–Mead, quasi-Newton, or conjugate-gradient optimization algorithms as implemented in the statistical package R. Under standard regularity conditions, the vector of MLEs

\hat{θ} = {(\hat{λ}, \hat{κ}, \hat{α}, \hat{β})}^{T}

is asymptotically multivariate normal:

\hat{θ} \dot{~} N_{4} (θ, I^{- 1} (θ)),

where

I (θ) = - E [\nabla^{2} ℓ (θ)]

is the Fisher information matrix and ℓ is the log-likelihood. Denote by

{[I^{- 1} (\hat{θ})]}_{j j}

the estimated variance of

{\hat{θ}}_{j}

. Then, an approximate

(1 - ε) 100 %

confidence interval (CI) for each component

θ_{j}

is

{\hat{θ}}_{j} \pm z_{1 - ε / 2} \sqrt{{[I^{- 1} (\hat{θ})]}_{j j}},

where

z_{1 - ε / 2}

is the

(1 - ε / 2)

-quantile of the standard normal distribution.
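Given an estimate and its estimated variance (the corresponding diagonal entry of the inverse information matrix), the interval above is a one-line computation. The Python helper below is a minimal sketch with a hypothetical name. It assumes the variance has already been obtained, for example from a numerically evaluated Hessian.

```python
import math
from statistics import NormalDist


def wald_ci(estimate, variance, eps=0.05):
    """Approximate (1 - eps) * 100% Wald interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - eps / 2.0)  # standard normal quantile
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    print(wald_ci(2.0, 0.04))
```

Because all four parameters are positive, a Wald interval computed this way can extend below zero in small samples; building the interval on the log scale and transforming back is a common workaround.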

3.2. Bayesian Estimation

In this section, we obtain the BEs of the parameters of the SlaLL distribution under the assumption of a squared error loss function (SELF). The SELF is symmetric and penalizes estimation errors quadratically. If

\tilde{θ}

is the estimator of the true parameter

θ

, then the loss function is given by

L (\tilde{θ}, θ) = {(\tilde{θ} - θ)}^{2} .

Under the SELF, the BE that minimizes the posterior risk is the posterior mean. That is,

{\tilde{θ}}_{Bayes} = E [θ ∣ data] .

We assign independent Gamma priors to all positive parameters:

\begin{matrix} λ ~ Γ (a_{λ}, b_{λ}), \\ κ ~ Γ (a_{κ}, b_{κ}), \\ α ~ Γ (a_{α}, b_{α}), \\ β ~ Γ (a_{β}, b_{β}), \end{matrix}

where

(a, b)

are chosen to reflect prior knowledge or to be weakly informative. This choice ensures support on

(0, \infty)

and can encode vague priors by small shape and rate. For instance, one might set

a_{λ} = a_{κ} = 1

(exponential) or

a_{α} = a_{β} = 2

with moderate rate. The joint prior density is

π (λ, κ, α, β) \propto λ^{a_{λ} - 1} e^{- b_{λ} λ} κ^{a_{κ} - 1} e^{- b_{κ} κ} α^{a_{α} - 1} e^{- b_{α} α} β^{a_{β} - 1} e^{- b_{β} β} .

By Bayes’ rule, the posterior is proportional to the product of the likelihood and priors:

p (λ, κ, α, β ∣ y_{1 : n}) \propto [\prod_{i = 1}^{n} f_{Y} (y_{i})] π (λ) π (κ) π (α) π (β) .

(10)

Substituting

f_{Y} (y_{i})

gives an expression involving the integral over v for each i. Equivalently, introducing latent variables

v_{i} ~ Beta (α, β)

lets us write the augmented posterior:

p (λ, κ, α, β, {v_{i}} ∣ y_{1 : n}) \propto \prod_{i = 1}^{n} f (y_{i} ∣ v_{i}, λ, κ) f (v_{i} ∣ α, β) \times π (λ) π (κ) π (α) π (β),

where

f (y_{i} ∣ v_{i})

is the PDF of a log–logistic variable with scale λ/v_i and shape κ (the law of X/v_i for fixed v_i), and

f (v_{i}) \propto v_{i}^{α - 1} {(1 - v_{i})}^{β - 1}

. Because of the integral form, the posterior must be explored through sampling methods. We implement a Metropolis-within-Gibbs sampler to draw from the posterior.
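A Metropolis-within-Gibbs scheme on the augmented posterior can be sketched as below. This is a simplified illustration and not the authors' sampler: the random-walk proposals, step sizes, starting values, and prior rates are all assumptions, and proposals that fall outside the support are simply rejected. Each parameter and each latent v_i is updated in turn with a symmetric Gaussian random walk.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_term(y, v, lam, kappa, alpha, beta):
    """log f(y | v, lam, kappa) + log of the unnormalized Beta(alpha, beta) kernel at v."""
    lx = math.log(y * v / lam)
    return (math.log(kappa / lam) + (kappa - 1.0) * lx - 2.0 * softplus(kappa * lx)
            + alpha * math.log(v) + (beta - 1.0) * math.log1p(-v))


def log_post(theta, vs, ys, prior):
    """Augmented log-posterior up to a constant; prior = [(shape, rate)] * 4."""
    if min(theta) <= 0.0:
        return -math.inf
    lp = sum((a - 1.0) * math.log(t) - b * t for t, (a, b) in zip(theta, prior))
    lp -= len(ys) * log_beta(theta[2], theta[3])
    return lp + sum(log_term(y, v, *theta) for y, v in zip(ys, vs))


def mwg_sampler(ys, n_iter, prior, step=0.3, v_step=0.2, seed=0):
    rng = random.Random(seed)
    theta = [1.0, 1.0, 1.0, 1.0]          # (lam, kappa, alpha, beta), arbitrary start
    vs = [0.5] * len(ys)                   # latent beta divisors
    draws, accepted = [], 0
    for _ in range(n_iter):
        for j in range(4):                 # one random-walk Metropolis step per parameter
            prop = list(theta)
            prop[j] += step * rng.gauss(0.0, 1.0)
            log_ratio = log_post(prop, vs, ys, prior) - log_post(theta, vs, ys, prior)
            if math.log(1.0 - rng.random()) < log_ratio:
                theta, accepted = prop, accepted + 1
        for i, y in enumerate(ys):         # update each latent v_i on (0, 1)
            v_new = vs[i] + v_step * rng.gauss(0.0, 1.0)
            if 0.0 < v_new < 1.0:
                log_ratio = log_term(y, v_new, *theta) - log_term(y, vs[i], *theta)
                if math.log(1.0 - rng.random()) < log_ratio:
                    vs[i] = v_new
        draws.append(tuple(theta))
    return draws, accepted / (4.0 * n_iter)


if __name__ == "__main__":
    data = [0.6, 1.1, 1.9, 2.4, 3.8, 7.5]  # made-up positive values
    prior = [(1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (2.0, 1.0)]
    draws, rate = mwg_sampler(data, 500, prior)
    print("acceptance rate:", rate, "last draw:", draws[-1])
```

In practice the early draws would be discarded as burn-in and the step sizes tuned; the sketch omits both to stay short.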

Each iteration cycles through all parameters. Proposal variances

σ

are tuned to achieve reasonable acceptance rates (20–50%). Multiple chains with dispersed initial values are run to assess convergence. In the Bayesian approach, inference about each parameter

θ_{j} \in {λ, κ, α, β}

is based on its posterior distribution in (10). A

(1 - ε) 100 %

equal-tailed credible interval (CrI) for

θ_{j}

is given by the posterior quantiles

[θ_{j, (ε / 2)}, θ_{j, (1 - ε / 2)}],

where

\int_{- \infty}^{θ_{j (ε / 2)}} π (θ_{j} ∣ y) d θ_{j} = \frac{ε}{2}, \int_{- \infty}^{θ_{j (1 - ε / 2)}} π (θ_{j} ∣ y) d θ_{j} = 1 - \frac{ε}{2} .

Alternatively, one can report the highest-posterior-density (HPD) interval

[a, b]

, which satisfies

\int_{a}^{b} π (θ_{j} ∣ y) d θ_{j} = 1 - ε and π (a ∣ y) = π (b ∣ y)

so that every point inside the interval has posterior density at least as large as any point outside. In practice, these intervals are obtained by sampling from the posterior (e.g., via Markov Chain Monte Carlo (MCMC)) and then computing the appropriate quantiles (for equal-tailed) or the shortest interval covering a

1 - ε

fraction of the draws (for HPD).
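Both interval types can be computed from a vector of posterior draws for a single parameter. The Python sketch below uses hypothetical function names and simple order-statistic conventions, which are one of several reasonable choices. The HPD routine returns the shortest window covering a 1 − ε fraction of the sorted draws, which is appropriate when the posterior is unimodal.

```python
import math


def equal_tailed(draws, eps=0.05):
    """Equal-tailed credible interval from the empirical eps/2 and 1 - eps/2 quantiles."""
    s = sorted(draws)
    n = len(s)
    lo = s[int(math.floor((eps / 2.0) * (n - 1)))]
    hi = s[int(math.ceil((1.0 - eps / 2.0) * (n - 1)))]
    return lo, hi


def hpd(draws, eps=0.05):
    """Shortest interval containing at least a (1 - eps) fraction of the draws."""
    s = sorted(draws)
    n = len(s)
    k = int(math.ceil((1.0 - eps) * n))    # number of draws the window must cover
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


if __name__ == "__main__":
    sample = [i ** 2 for i in range(100)]  # right-skewed toy "posterior" draws
    print("equal-tailed:", equal_tailed(sample))
    print("HPD:", hpd(sample))
```

For a right-skewed sample the HPD interval shifts toward the mode and is no wider than the equal-tailed interval.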

We conducted a Monte Carlo simulation study to assess the finite-sample performance of the maximum likelihood estimators (MLEs) for the model parameters. In particular, we were interested in evaluating the bias and variability of the MLEs under different sample sizes, given the heavy-tailed nature of the proposed distribution. Simulation Setup: For the simulation, we selected a set of true parameter values that represents a heavy-tailed scenario in order to challenge the estimation procedure. Specifically, we chose, for example,

α = 0.5

and

β = 2

for the shape parameters, with

λ = 1

as the scale. The choice of a small

α

(significantly less than 1) produces an extremely heavy-tailed distribution—a non-negligible probability of very large survival times—which is a difficult but important case for estimator performance. This scenario corresponds to a situation with a high proportion of long-term survivors or outliers. We fix

λ = 1

(time units) for simplicity and without loss of generality, since the scale mainly rescales time and does not affect the bias trends (using a different

λ

would scale all survival times but the relative estimator performance would be similar). The parameter

β = 2

was chosen to yield a moderately heavy-tailed exponent (

α β = 1

in this case) and to ensure that the theoretical first moment is infinite (an indication of a heavy tail). Overall, these true values create a challenging setting that tests the robustness of the MLE under heavy-tailed conditions.

Using the above parameter values, we generated random samples of varying sizes n from the proposed distribution. We considered sample sizes ranging from small to moderate ($n = 50, 100, 200, 500$, for instance) to examine how quickly the estimator performance improves as n increases. For each sample size, we drew $N_{rep} = 1000$ independent datasets of that size. For each dataset, we computed the MLEs of $α$, $β$, and $λ$ by maximizing the log-likelihood (using a numerical optimization routine to solve the likelihood equations). We then summarized the estimation error across the 1000 replications in terms of bias and mean squared error (MSE) for each parameter. The bias for a given parameter is calculated as the average difference between the MLE and the true value across simulations:

\mathrm{Bias}(\hat{θ}) = \frac{1}{N_{rep}} \sum_{r = 1}^{N_{rep}} (\hat{θ}_{r} - θ_{true}).

The MSE is calculated as

\mathrm{MSE}(\hat{θ}) = \frac{1}{N_{rep}} \sum_{r = 1}^{N_{rep}} (\hat{θ}_{r} - θ_{true})^{2},

which incorporates both the variance and the squared bias of the estimator. These metrics are plotted in Figure 2 (bias) and Figure 3 (MSE) as functions of the sample size n.
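
The two summary measures above translate directly into code. The following is a minimal sketch, not the authors' implementation: the function name `bias_and_mse` and the five replicated estimates are hypothetical and serve only to show the arithmetic.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Empirical bias and MSE of replicated estimates of a single parameter."""
    errors = [est - true_value for est in estimates]
    bias = fmean(errors)                      # average signed error
    mse = fmean([e * e for e in errors])      # average squared error
    return bias, mse


# Hypothetical replicated estimates of alpha (true value 0.5), for illustration only.
alpha_hats = [0.46, 0.52, 0.49, 0.55, 0.43]
bias, mse = bias_and_mse(alpha_hats, true_value=0.5)

# With the 1/N_rep normalization, MSE splits into variance plus squared bias.
variance = mse - bias ** 2
print(f"bias={bias:.4f}  mse={mse:.5f}  variance={variance:.5f}")
```

In a full study, `alpha_hats` would hold the $N_{rep}$ estimates obtained at one sample size, and the call would be repeated for each parameter and each n.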

Results: Figure 2 shows the empirical bias of the MLE for each parameter at different sample sizes. The bias is small in magnitude for all parameters, even at the smaller sample sizes, and it tends to decrease toward zero as n increases. This trend indicates that the MLEs are approximately unbiased in small samples and essentially unbiased in large samples (as expected from consistency). Among the parameters, the tail-index parameter $α$ exhibits a slightly larger bias when n is very small (e.g., at $n = 50$, the average $\hat{α}$ is somewhat below the true $α = 0.5$). This is intuitive because the heavy-tail behavior (governed by $α$) is difficult to estimate from limited data: a few extreme observations can heavily influence the estimate, and with small n, there is a higher chance that those extremes are under-represented. However, by $n = 100$ or 200, the bias in $\hat{α}$ has nearly vanished, and the biases for $β$ and $λ$ are also negligible. The scale parameter $λ$ in particular shows almost no bias across all sample sizes (since $λ$ primarily rescales time, its MLE benefits from the information in even moderate sample sizes). Overall, no systematic bias is evident for $n \geq 100$, confirming that the MLEs perform as expected in terms of accuracy.

Figure 3 presents the mean squared error (MSE) of the MLEs for $α$, $β$, and $λ$ as a function of sample size. The MSE curves decline sharply with increasing n, demonstrating a clear improvement in estimator precision with larger samples. On a log–log plot, the MSE approximately follows a straight line consistent with the theoretical $O(1/n)$ rate of convergence for efficient estimators. For instance, when the sample size increases from $n = 50$ to $n = 200$ (a four-fold increase), the MSE for each parameter drops by approximately a factor of 4, which aligns with the $1/n$ scaling. Among the three parameters, the estimator of $α$ shows the highest MSE at small sample sizes, reflecting greater uncertainty in pinning down the tail exponent from limited data. This again is expected: the heavy-tail index is difficult to estimate precisely without a substantial number of observations in the distribution’s tail. By $n = 200$ or 500, however, the MSE for $\hat{α}$ has decreased dramatically and is of the same order of magnitude as the MSEs for $\hat{β}$ and $\hat{λ}$. In fact, at $n = 500$, all three parameters have quite small MSEs (see Figure 3), indicating that the estimates are tightly concentrated around the true values. These results suggest that even in a markedly heavy-tailed situation, a few hundred observations are sufficient for the MLE to achieve high accuracy.
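
As a quick arithmetic check of the $1/n$ scaling, suppose the MSE behaves like $c/n$ for some constant $c$ (a symbol introduced here only for illustration, not a quantity estimated in the study):

```latex
\frac{\mathrm{MSE}(n = 200)}{\mathrm{MSE}(n = 50)} \approx \frac{c/200}{c/50} = \frac{50}{200} = \frac{1}{4},
\qquad
\log \mathrm{MSE}(n) \approx \log c - \log n .
```

The second relation is why such a curve appears as a straight line with slope $-1$ on a log–log plot.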

In summary, the simulation study confirms that the proposed model’s parameters can be reliably estimated using MLE. The estimators are essentially unbiased for moderate-to-large sample sizes and exhibit decreasing variance as n grows. Even under an extreme heavy-tail scenario (where the true distribution has infinite mean and a significant probability of outliers), the MLE performs well: with $n \approx 100$, it yields reasonably accurate estimates, and with larger n, the precision improves to a high level. This provides reassurance for practical applications, as it indicates that standard inference techniques (like MLE and associated confidence intervals) are applicable and effective for our heavy-tailed model. Users analyzing heavy-tailed survival data can expect that, given a sufficient sample size, the model’s parameter estimates will be both accurate and stable.

3.3. Simulation Study

To evaluate the finite-sample performance of the maximum-likelihood and Bayesian estimators for the SlaLL distribution, we carried out a Monte Carlo simulation with 10,000 replications per method. Independent samples of size $n \in \{25, 50, 100, 150, 200, 250, 300\}$ were drawn under two distinct parameter settings (we assume $β = 1 + 100 / α$):

Set 1: $λ = 3, κ = 1.5, α = 2$ .
Set 2: $λ = 2, κ = 2.5, α = 3$ .

For each estimator and each $(n, \text{set})$ combination, we computed the following:

The empirical bias and mean squared error (MSE) of $\hat{λ}, \hat{κ}, \hat{α}$;
The lower and upper bounds (LB, UB) of the nominal 95% confidence interval (CI) and credible interval (CrI).

The performance of the maximum likelihood estimation (MLE) and Bayesian estimation (BE) methods for the SlaLL distribution is compared in Table 1 and Table 2, corresponding to parameter sets 1 and 2, respectively. For various sample sizes n, the tables present the estimated values of the parameters $λ$, $κ$, and $α$, along with their MSEs and the corresponding 95% confidence and credible intervals. The results indicate that Bayesian estimators generally yield lower MSEs compared to MLE, particularly for smaller sample sizes, highlighting the efficiency and stability of the Bayesian approach in parameter estimation for the SlaLL distribution. Figure 4 and Figure 5 illustrate the bias of the estimated parameters $λ$, $κ$, and $α$ for parameter sets 1 and 2, respectively, under both MLE and BE methods across various sample sizes. The bias decreases as the sample size increases, with the Bayesian estimates generally showing lower bias, particularly for the parameter $α$. Figure 6 and Figure 7 present the MSE of the estimators for the same parameter sets. As expected, the MSEs decrease with larger sample sizes, and the Bayesian estimation method consistently outperforms MLE in terms of lower MSEs for all parameters. These visual comparisons further reinforce the robustness and efficiency of Bayesian methods in estimating the parameters of the SlaLL distribution.
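
One way to quantify how fast the reported MSEs shrink is a least-squares fit on the log–log scale. The sketch below is an illustration rather than part of the study: the helper `loglog_slope` is a hypothetical name, and the inputs are the MSE values of the MLE of $λ$ under parameter set 1 as listed in Table 1.

```python
import math


def loglog_slope(ns, mses):
    """Least-squares slope of log(MSE) regressed on log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(m) for m in mses]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# MSE of the MLE of lambda under parameter set 1, copied from Table 1.
ns = [25, 50, 100, 150, 200, 250, 300]
mse_mle_lambda = [0.2143, 0.1045, 0.0524, 0.0347, 0.0236, 0.0174, 0.0128]

slope = loglog_slope(ns, mse_mle_lambda)
# A slope near -1 is what MSE proportional to 1/n would produce.
print(f"log-log slope: {slope:.2f}")
```

The same function can be applied to any other column of Table 1 or Table 2 to compare decay rates between MLE and BE.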

4. Modeling Heavy-Tailed Survival Times

This section examines three datasets representing heavy-tailed, right-skewed time-to-event data in the biomedical domain: acute myelogenous leukemia (AML) survival times, COVID-19 patient survival times, and veteran survival times. The AML dataset from the survival package in R includes times (in weeks) until relapse or censoring for patients undergoing maintenance chemotherapy, exhibiting pronounced right-skewness and censoring, with one patient’s duration extending to 261 weeks [10]. The COVID-19 dataset captures survival times (in days) for 53 fatal cases from China, demonstrating early fatalities with a prolonged tail extending to approximately 20 days [11]. Finally, the veteran dataset, also from R’s survival package, comprises survival durations (in days) from a clinical trial of chemotherapy for advanced lung cancer, ranging extensively from 1 to 999 days [10,12]. To assess model fit, we compared the SlaLL distribution with the Slash Lindley (SL) [13], Slash Modified Lindley (SML) [14], and Slash Inverse Exponential (SIE) [15] distributions. Parameter estimates were derived using the maximum likelihood estimation (MLE) method. Model performance was assessed using the standard error (SE), negative log-likelihood ($-\log L$), Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Kolmogorov–Smirnov (K-S) statistic and its corresponding p-value, as well as the Anderson–Darling ($A^{*}$) and Cramér–von Mises ($W^{*}$) statistics. Detailed descriptions, evaluations, and analyses are provided in the subsequent subsections.
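
The information criteria follow from the negative log-likelihood by simple arithmetic. The sketch below assumes the usual definitions, AIC $= 2k - 2\log L$ and BIC $= k \ln n - 2\log L$, with $k$ the number of estimated parameters; the function names are illustrative. It uses the SlaLL entry of Table 4 (four estimated parameters, $n = 53$ observations) as a worked case.

```python
import math


def aic(neg_loglik, n_params):
    """Akaike Information Criterion from the negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_loglik


def bic(neg_loglik, n_params, n_obs):
    """Bayesian Information Criterion from the negative log-likelihood."""
    return n_params * math.log(n_obs) + 2.0 * neg_loglik


# SlaLL fit to the COVID-19 data: -log L taken from Table 4,
# k = 4 estimated parameters, n = 53 observations.
neg_ll, k, n = 136.7253, 4, 53
aic_slall = aic(neg_ll, k)
bic_slall = bic(neg_ll, k, n)
print(round(aic_slall, 4), round(bic_slall, 4))
```

Under these definitions the computed values agree with the AIC and BIC reported for SlaLL in Table 4 to the printed precision.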

4.1. Dataset 1: AML Survival Times

The AML dataset contains survival times until relapse or censoring; for more details, see [10]. Model fitting results are summarized in Table 3.

4.2. Dataset 2: COVID-19 Patient Survival Times

This dataset, which can be retrieved from https://www.worldometers.info/coronavirus/ (accessed on 20 October 2024), consists of recorded survival times (in days) for COVID-19 patients [16]:

\begin{matrix} 0.054, 0.064, 0.704, 0.816, 0.235, 0.976, 0.865, 0.364, 0.479, 0.568, 0.352, \\ 0.978, 0.787, 0.976, 0.087, 0.548, 0.796, 0.458, 0.087, 0.437, 0.421, 1.978, \\ 1.756, 2.089, 2.643, 2.869, 3.867, 3.890, 3.543, 3.079, 3.646, 3.348, 4.093, \\ 4.092, 4.190, 4.237, 5.028, 5.083, 6.174, 6.743, 7.274, 7.058, 8.273, 9.324, \\ 10.827, 11.282, 13.324, 14.278, 15.287, 16.978, 17.209, 19.092, 20.083 \end{matrix}

Model fitting metrics are presented in Table 4, with graphical diagnostics in Figure 9.

4.3. Model Fit Metrics for Dataset 3

The third dataset represents survival durations (in days) for veterans in a clinical trial for lung cancer chemotherapy [17,18]:

\begin{matrix} 72, 411, 228, 126, 118, 10, 82, 110, 314, 100, 42, 8, 144, 25, 11, 30, 384, 4, 54, 13, 123, 97, \\ 153, 59, 117, 16, 151, 22, 56, 21, 18, 139, 20, 31, 52, 287, 18, 51, 122, 27, 54, 7, 63, 392, \\ 10, 8, 92, 35, 117, 132, 12, 162, 3, 95, 177, 162, 216, 553, 278, 12, 260, 200, 156, 182, 143, \\ 105, 103, 250, 100, 999, 112, 87, 231, 242, 991, 111, 1, 587, 389, 33, 25, 357, 467, 201, 1, \\ 30, 44, 283, 15, 25, 103, 21, 13, 87, 2, 20, 7, 24, 99, 8, 99, 61, 25, 95, 80, 51, 29, 24, 18, 83, \\ 31, 51, 90, 52, 73, 8, 36, 48, 7, 140, 186, 84, 19, 45, 80, 52, 164, 19, 53, 15, 43, 340, 133, \\ 111, 231, 378, 49 \end{matrix}

Table 5 summarizes the model-fitting results.

4.4. Result Analysis

Table 3, Table 4 and Table 5 summarize descriptive analyses and model fit metrics for datasets 1, 2, and 3, respectively. Among the evaluated models, the SlaLL distribution consistently demonstrated superior performance, indicated by the lowest values of $-\log L$, AIC, BIC, K-S, $A^{*}$, and $W^{*}$, as well as the highest p-values.

Graphical assessments presented in Figure 8, Figure 9 and Figure 10 confirm the adequacy of the SlaLL model for the analyzed datasets. Each figure shows, from left to right, the following: (1) a violin plot illustrating the overall distributional shape and spread of the data; (2) the empirical cumulative distribution function (CDF) overlaid with the fitted SlaLL CDF to assess closeness of fit; (3) a histogram with the empirical density and fitted SlaLL PDF curve for comparing model and data frequencies; and (4) a P–P plot comparing model-predicted and empirical cumulative probabilities. Across all three datasets, the fitted SlaLL curves (red) closely follow the empirical curves (black) in both the CDF and PDF plots, indicating that the model captures the central tendency and tail behavior well. The P–P plots show points lying approximately along the 45-degree line, suggesting good agreement between theoretical and empirical distributions and supporting the suitability of the SlaLL model for these data. The violin and kernel density plots further highlight the model’s ability to reflect the observed variability and skewed nature of the data.
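
The K-S distance and the P–P plot coordinates both compare the empirical CDF with a fitted CDF evaluated at the sorted observations. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the model CDF is passed in as a callable, and a uniform CDF stands in for a fitted SlaLL CDF, whose formula is not reproduced here.

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def pp_points(sample, cdf):
    """(model probability, empirical probability) pairs for a P-P plot."""
    xs = sorted(sample)
    n = len(xs)
    # (i - 0.5)/n is one common plotting position; other conventions exist.
    return [(cdf(x), (i - 0.5) / n) for i, x in enumerate(xs, start=1)]


def uniform_cdf(x):
    """Uniform(0, 1) CDF, used here only as a stand-in for a fitted model CDF."""
    return min(max(x, 0.0), 1.0)


toy_sample = [0.7, 0.1, 0.4]  # toy data, for illustration only
d_toy = ks_statistic(toy_sample, uniform_cdf)
points = pp_points(toy_sample, uniform_cdf)
print(d_toy, points)
```

Points from `pp_points` that lie close to the 45-degree line correspond to a small value of `ks_statistic`, which is the agreement described for Figures 8 to 10.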

Comparative Discussion: The proposed heavy-tailed survival model offers distinct advantages over classical distributions such as the Weibull and log-normal. The Weibull distribution (commonly used in survival analysis) has an exponentially decaying tail and a hazard function that is monotonic (always increasing for shape $k > 1$ and always decreasing for $k < 1$). This limits Weibull’s ability to represent extremely heavy tails or non-monotonic failure rates. In contrast, our Gauss hypergeometric-type model allows for polynomial-decaying tails (far heavier than the exponential tail of Weibull), and it can capture a wider range of hazard shapes. For example, with $α < 1$, our model produces a hazard that starts high and decreases over time (eventually tending to zero as $t \to \infty$); a Weibull with $k < 1$ also has a decreasing hazard, but it cannot combine that shape with a polynomial tail. If $α > 1$, our model’s hazard is non-monotonic: it rises from 0 at $t = 0$ to a peak and then declines toward 0 at infinity, which is a pattern that can resemble, but also eventually extends beyond, the log-normal hazard shape. The log-normal distribution is another two-parameter model often used for long-tailed data; it does have a heavy (sub-exponential) tail and a non-monotonic hazard (peaking at an intermediate time and then declining). However, the log-normal’s tail is lighter than a power law (its survival function decays faster than any polynomial) and it lacks the flexibility of additional shape parameters. Our model’s extra shape parameter(s) provide greater flexibility: one can adjust tail heaviness and the hazard curvature independently to better fit complex datasets (especially those with outliers or a subset of long-term survivors). Furthermore, the new model retains closed-form expressions (in terms of standard special functions) for the PDF and CDF, which facilitates estimation and prediction. For instance, computing tail probabilities $P(T > t)$ or other reliability metrics is straightforward via the closed-form survival function in Equation (3), whereas the log-normal CDF does not have an elementary closed form (it involves the error function). It is also noteworthy that our distribution encompasses certain existing heavy-tailed models as special cases. For example, when $β = 1$, our CDF in Equation (3) simplifies to $F(t) = 1 - [1 + (t / λ)^{α}]^{-1}$, which is the CDF of the Burr type XII (log–logistic) distribution, a known heavy-tailed survival distribution. In this sense, the Gauss hypergeometric-type model generalizes the Burr/log–logistic family and extends it with an additional shape parameter for even greater flexibility. Overall, compared to Weibull and log-normal, the proposed model can better accommodate heavy-tailed data (with a polynomial tail decay) and diverse hazard rate trajectories, making it a valuable addition to the toolbox for survival analysis.
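
The hazard shapes described above can be checked numerically in the $β = 1$ special case, where the CDF is the log–logistic form quoted in the text. The sketch below covers only that special case, not the full model; the function names, the seed, and the parameter values are illustrative assumptions.

```python
import random
import statistics


def loglogistic_cdf(t, lam, alpha):
    """F(t) = 1 - [1 + (t/lam)^alpha]^(-1), the beta = 1 special case."""
    return 1.0 - 1.0 / (1.0 + (t / lam) ** alpha)


def loglogistic_hazard(t, lam, alpha):
    """Hazard f(t)/S(t) of the same special case, for t > 0."""
    z = (t / lam) ** alpha
    return (alpha / t) * z / (1.0 + z)


def loglogistic_sample(n, lam, alpha, rng):
    """Inverse-transform sampling: solve F(t) = u for t with u uniform on [0, 1)."""
    draws = []
    for _ in range(n):
        u = rng.random()
        draws.append(lam * (u / (1.0 - u)) ** (1.0 / alpha))
    return draws


# alpha < 1: hazard decreases; alpha > 1: hazard rises to a peak, then falls.
for alpha in (0.5, 2.0):
    values = [loglogistic_hazard(t, 1.0, alpha) for t in (0.5, 1.0, 2.0)]
    print(alpha, [round(v, 4) for v in values])

# F(lam) = 0.5, so the sample median should be close to lam.
draws = loglogistic_sample(20000, lam=3.0, alpha=2.0, rng=random.Random(0))
sample_median = statistics.median(draws)
print(round(sample_median, 3))
```

Because $F(λ) = 1/2$ in this special case, comparing the sample median with $λ$ gives a simple sanity check on the sampler.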

5. Conclusions

In this paper, we introduced a new survival time distribution, a Gauss hypergeometric-type model, specifically designed to handle heavy-tailed lifetime data. The proposed model is characterized by flexible shape parameters that allow it to capture extremely heavy right tails (higher probabilities of very large survival times) as well as a variety of hazard function shapes. We derived explicit formulas for the model’s PDF, CDF, survival function, and hazard rate, and we established key properties such as the power-law tail behavior of the survival function. These mathematical characterizations confirm that the model can represent scenarios with, for example, decreasing hazards and polynomial tail decay, phenomena that classical survival distributions (like exponential or Weibull) cannot adequately model. We also developed a maximum likelihood estimation procedure for the model’s parameters and showed that it performs well; our simulation study demonstrated that the MLEs are essentially unbiased and achieve increasing precision as the sample size grows. Even under an extreme heavy-tail scenario (where conventional methods might struggle), the estimators exhibited the expected $1/n$ convergence in MSE and negligible bias for moderate n. This evidence highlights the practical reliability of the model for inference.

The heavy-tailed Gauss hypergeometric-type model offers several practical advantages for biomedical survival analysis. Many biomedical studies encounter data with outliers or subgroups of long-term survivors (for instance, a fraction of patients who respond exceptionally well to a treatment and live far longer than others). In such cases, standard survival models with exponentially bounded tails can severely under-predict the probability of long-term survival. Our model, by contrast, can assign significantly higher probabilities to extreme survival times, aligning more closely with empirical heavy-tailed patterns. The added flexibility of two shape parameters means that it can fit a wide range of dataset shapes, adjusting independently the early-time behavior (e.g., whether the hazard is high initially or not) and the tail decay rate. This can lead to better goodness-of-fit and more accurate estimation of tail-related quantities such as the probability of surviving beyond a very large time horizon. From a computational standpoint, the availability of closed-form expressions for the CDF and PDF (in terms of well-known special functions) means that standard likelihood-based methods (estimation, model checking, etc.) can be applied without numerical complications. We anticipate that this model can be immediately useful in analyzing survival data in fields like oncology, epidemiology, or reliability engineering, where heavy tails (e.g., cure fractions or extremely long survival times) are present.

Future directions: The development of this heavy-tailed model opens up multiple avenues for further research. One important extension will be to incorporate covariates into the model, for example by formulating a survival regression model (such as an accelerated failure time or proportional hazards extension) where the Gauss hypergeometric-type distribution serves as the baseline for the time-to-event outcome. This would allow one to assess the effects of patient characteristics or treatment factors on heavy-tailed survival outcomes. Another promising direction is to explore frailty or mixture models in conjunction with the proposed distribution. By introducing random effects or mixture components, we could capture unobserved heterogeneity in populations and see how the heavy-tailed component interacts with individual-level variability. On the theoretical side, further study of the new distribution’s properties could be fruitful, for instance investigating the conditions under which it converges to or bounds other distributions, or examining the behavior of its entropy and information measures in comparison to lighter-tailed models. Finally, applying the Gauss hypergeometric-type model to additional real datasets (such as survival times of patients in other clinical trials or failure times of mechanical components) would further demonstrate its practical value. Such case studies would allow us to compare the model’s performance against a wider set of existing survival distributions in terms of goodness-of-fit and predictive accuracy. In conclusion, the proposed model provides a robust and flexible tool for modeling heavy-tailed survival data, and we believe it can greatly enhance the analyst’s ability to draw insights from data with extreme outcomes. The encouraging results from this study motivate both the application of this model to complex real-world data and the extension of the model within broader survival analysis frameworks.

Author Contributions

Conceptualization, J.G. and J.J.; Methodology, J.G., J.J. and M.M.H.; Software, J.G. and J.J.; Validation, J.G. and J.J.; Formal analysis, M.M.H.; Investigation, M.M.A.; Resources, M.M.A. and M.M.H.; Data curation, M.M.A. and M.M.H.; Writing—original draft, J.G. and J.J.; Writing—review and editing, M.M.H.; Visualization, M.M.A.; Funding acquisition, M.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2502).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) for funding this work through Research Group: IMSIU-DDRSP2502.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Listing A1. R routines for the SlaLL distribution.

References

Fisk, P.R. The Graduation of Income Distributions. Econometrica 1961, 29, 171–185.
Bennett, S. Log–logistic regression models for survival data. J. R. Stat. Soc. Ser. C 1983, 32, 165–171.
Kariuki, V.; Wanjoya, A.; Ngesa, O.; Aljohani, H.M. Properties, estimation and applications of the extended log–logistic distribution. Sci. Rep. 2024, 14, 20967.
Afify, A.Z.; Abdelall, Y.Y.; AlQadi, H.; Mahran, H.A. The modified log–logistic distribution: Properties and inference with real-life data applications. Contemp. Math. 2025, 6, 862–890.
Ishaq, A.I.; Panitanarak, U.; Abiodun, A.A.; Daud, H.; Suleiman, A.A. A new Maxwell–log–logistic distribution and its applications for mortality rate data. J. Niger. Soc. Phys. Sci. 2025, 7, 1976–1989.
Mahto, A.K.; Tripathi, Y.M.; Dey, S.; Alsaedi, B.S.; Alhelali, M.H.; Alghamdi, F.M.; Alrumayh, A.; Alshawarbeh, E. Bayesian estimation and prediction under progressive-stress accelerated life tests for a log–logistic model. Alex. Eng. J. 2024, 101, 330–342.
Lee, S.; Moriasi, D.N.; Mehr, A.D.; Mirchi, A. Sensitivity of the Standardised Precipitation–Evapotranspiration Index to distribution choice and PET method. J. Hydrol. Reg. Stud. 2024, 53, 101761.
Gui, W. Statistical properties and applications of the Lindley slash distribution. J. Appl. Stat. Sci. 2012, 20, 283–298.
Rojas, M.A.; Bolfarine, H.; Gómez, H.W. An extension of the slash-elliptical distribution. SORT 2014, 38, 215–230.
Therneau, T.M. Survival analysis. In The aml Dataset in R’s survival Package Records Weeks to Relapse or Censoring for AML Patients; John Wiley & Sons: Hoboken, NJ, USA, 1997.
Liu, X.; Ahmad, Z.; Gemeay, A.M.; Abdulrahman, T.A.; Hafez, E.H.; Khalil, N. Modeling the survival times of COVID-19 patients with a new statistical model: A case study from China. PLoS ONE 2021, 16, e0254999.
Miller, R.G. Survival Analysis, 2nd ed.; Wiley Classics Library; John Wiley & Sons: Hoboken, NJ, USA, 1997; ISBN 0-471-25218-2.
Rojas, M.A.; Iriarte, Y.A. A Lindley-Type Distribution for Modeling High-Kurtosis Data. Mathematics 2022, 10, 2240.
Gillariose, J.; Joseph, J.; Chesneau, C. Extended Slash Modified Lindley Distribution to Model Economic Variables Showing Asymmetry. Comput. Econ. 2024, 66, 3497–3516.
Elbatal, I.; Joseph, J.; Gillariose, J.; Jamal, F.; Ben Ghorbal, A. Confluent hypergeometric inverse exponential distribution with application to climate and economic data. J. Radiat. Res. Appl. Sci. 2025, 18, 101516.
Kamal, M.; Alsolmi, M.M.; Nayabuddin; Al Mutairi, A.; Hussam, E.; SidAhmed Mustafa, M.; Nassr, S.G. A new distributional approach: Estimation, Monte Carlo simulation and applications to the biomedical data sets. Netw. Heterog. Media 2023, 18, 1575–1599.
Ewuru, D.A.; Etikan, I. The Survival Analysis of Veterans’ Administration Lung Cancer Dataset. medRxiv 2022.
Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data; Wiley: New York, NY, USA, 1980.

Figure 1. Graphs of PDF of the SlaLL distribution.

Figure 2. Graphs of HRF of the SlaLL distribution.

Figure 3. Heatmaps of skewness (top row) and excess kurtosis (bottom row) of the SlaLL distribution over ($κ$, $λ$).

Figure 4. Bias of the estimated parameters for parameter set 1.

Figure 5. Bias of the estimated parameters for parameter set 2.

Figure 6. MSE of the estimated parameters for parameter set 1.

Figure 7. MSE of the estimated parameters for parameter set 2.

Figure 8. Diagnostic plots for the SlaLL model for dataset 1. Panels (from left): (i) violin plot with embedded boxplot (central dot = median; short bar = IQR; whiskers = 1.5 × IQR); (ii) empirical CDF (black) and fitted SlaLL CDF (red); (iii) histogram with empirical density (black) and fitted SlaLL PDF (red); (iv) P–P plot with empirical probabilities (dots), fitted curve (red), and 45° reference line (grey dotted).

Figure 9. Diagnostic plots for the SlaLL model for dataset 2. Panels (from left): (i) violin plot with embedded boxplot (central dot = median; short bar = IQR; whiskers = 1.5 × IQR); (ii) empirical CDF (black) and fitted SlaLL CDF (red); (iii) histogram with empirical density (black) and fitted SlaLL PDF (red); (iv) P–P plot with empirical probabilities (dots), fitted curve (red), and 45° reference line (grey dotted).

Figure 10. Diagnostic plots for the SlaLL model for dataset 3. Panels (from left): (i) violin plot with embedded boxplot (central dot = median; short bar = IQR; whiskers = 1.5 × IQR); (ii) empirical CDF (black) and fitted SlaLL CDF (red); (iii) histogram with empirical density (black) and fitted SlaLL PDF (red); (iv) P–P plot with empirical probabilities (dots), fitted curve (red), and 45° reference line (grey dotted).

Table 1. Comparison of MLE and BE for the SlaLL model under parameter set 1.

n	Parameter	MLE	MSE (MLE)	CI_LB	CI_UB	BE	MSE (BE)	CrI_LB	CrI_UB
25	$λ$	3.0421	0.2143	3.0114	3.0728	3.0087	0.0321	3.0002	3.0172
25	$κ$	1.4813	0.1982	1.4567	1.5059	1.5021	0.0294	1.4903	1.5139
25	$α$	2.0934	0.2975	2.0621	2.1247	2.0209	0.0483	2.0002	2.0232
50	$λ$	3.0193	0.1045	3.0027	3.0360	3.0052	0.0211	2.9967	3.0137
50	$κ$	1.4927	0.1023	1.4736	1.5118	1.4996	0.0219	1.4883	1.5109
50	$α$	2.0496	0.1574	2.0269	2.0723	2.0204	0.0331	2.0093	2.0315
100	$λ$	3.0081	0.0524	2.9972	3.0190	3.0033	0.0113	2.9968	3.0098
100	$κ$	1.4987	0.0536	1.4889	1.5085	1.5013	0.0164	1.4928	1.5098
100	$α$	2.0289	0.0816	2.0135	2.0443	2.0115	0.0227	2.0013	2.0217
150	$λ$	3.0042	0.0347	2.9961	3.0123	3.0025	0.0092	2.9956	3.0094
150	$κ$	1.5001	0.0362	1.4925	1.5077	1.5008	0.0141	1.4935	1.5081
150	$α$	2.0154	0.0532	2.0038	2.0270	2.0103	0.0179	2.0016	2.0190
200	$λ$	3.0028	0.0236	2.9955	3.0101	3.0017	0.0081	2.9952	3.0082
200	$κ$	1.5012	0.0269	1.4946	1.5078	1.5019	0.0107	1.4955	1.5083
200	$α$	2.0109	0.0371	2.0012	2.0206	2.0076	0.0142	1.9998	2.0154
250	$λ$	3.0017	0.0174	2.9954	3.0080	3.0012	0.0065	2.9954	3.0070
250	$κ$	1.5009	0.0205	1.4947	1.5071	1.5006	0.0093	1.4943	1.5069
250	$α$	2.0084	0.0287	2.0003	2.0165	2.0058	0.0121	1.9987	2.0129
300	$λ$	3.0011	0.0128	2.9955	3.0067	3.0007	0.0051	2.9953	3.0061
300	$κ$	1.5004	0.0171	1.4950	1.5058	1.5002	0.0081	1.4949	1.5055
300	$α$	2.0062	0.0223	1.9993	2.0131	2.0041	0.0102	1.9978	2.0104

Table 2. Comparison of MLE and BE for the SlaLL model under parameter set 2.

n	Parameter	MLE	MSE (MLE)	CI_LB	CI_UB	BE	MSE (BE)	CrI_LB	CrI_UB
25	$λ$	2.0224	0.1712	1.9968	2.0480	2.0075	0.1090	1.9989	2.0160
25	$κ$	2.4894	0.2964	2.4557	2.5231	2.4965	0.0390	2.4852	2.5077
25	$α$	3.0570	0.3848	3.0186	3.0954	3.0190	0.0498	3.0062	3.0318
50	$λ$	2.0112	0.0856	1.9931	2.0293	2.0056	0.0214	1.9965	2.0147
50	$κ$	2.4947	0.1482	2.4708	2.5186	2.4973	0.0370	2.4854	2.5093
50	$α$	3.0285	0.1924	3.0013	3.0557	3.0143	0.0481	3.0007	3.0278
100	$λ$	2.0056	0.0428	1.9928	2.0184	2.0037	0.0190	1.9952	2.0123
100	$κ$	2.4973	0.0741	2.4805	2.5142	2.4982	0.0329	2.4870	2.5095
100	$α$	3.0143	0.0962	2.9950	3.0335	3.0095	0.0428	2.9967	3.0223
150	$λ$	2.0037	0.0285	1.9933	2.0142	2.0028	0.0160	1.9949	2.0107
150	$κ$	2.4982	0.0494	2.4845	2.5120	2.4987	0.0278	2.4883	2.5090
150	$α$	3.0095	0.0641	2.9938	3.0252	3.0071	0.0361	2.9954	3.0189
200	$λ$	2.0028	0.0214	1.9937	2.0119	2.0022	0.0137	1.9950	2.0095
200	$κ$	2.4987	0.0370	2.4867	2.5106	2.4989	0.0237	2.4894	2.5085
200	$α$	3.0071	0.0481	2.9935	3.0207	3.0057	0.0308	2.9948	3.0166
250	$λ$	2.0022	0.0171	1.9941	2.0103	2.0019	0.0119	1.9951	2.0086
250	$κ$	2.4989	0.0296	2.4883	2.5096	2.4991	0.0206	2.4902	2.5080
250	$α$	3.0057	0.0385	2.9935	3.0179	3.0048	0.0267	2.9946	3.0149
300	$λ$	2.0019	0.0143	1.9945	2.0093	2.0016	0.0105	1.9953	2.0079
300	$κ$	2.4991	0.0247	2.4894	2.5089	2.4992	0.0181	2.4909	2.5076
300	$α$	3.0048	0.0321	2.9937	3.0158	3.0041	0.0236	2.9946	3.0136

Table 3. Model fit metrics for dataset 1.

Distribution	Estimates	$-\log L$	AIC	BIC	K-S	p-Value	$A^{*}$	$W^{*}$
SlaLL	$\hat{λ} = 5.6814$	151.9841	311.9683	317.5731	0.11469	0.8249	0.44319	0.08109
	$\hat{κ} = 1.4973$
	$\hat{α} = 2.6052$
	$\hat{β} = 8.8743$
SL	$\hat{θ} = 1.2453$	176.0303	358.0606	365.8761	0.84724	<2.2 × 10⁻¹⁶	91.038	8.2901
	$\hat{α} = 1.4946$
	$\hat{β} = 0.6136$
SML	$\hat{θ} = 6.4862$	212.7549	431.5097	439.3252	0.39514	0.000125	7.4294	1.4276
	$\hat{α} = 7.7069$
	$\hat{β} = 123.9153$
SI	$\hat{θ} = 1.3455$	153.8979	313.7957	317.9993	0.14453	0.5579	0.61235	0.08441
	$\hat{α} = 1.5009$
	$\hat{β} = 12.1172$

Table 4. Model fit metrics for dataset 2.

Distribution	Estimates	$-\log L$	AIC	BIC	K-S	p-Value	$A^{*}$	$W^{*}$
SlaLL	$\hat{λ} = 0.8452$	136.7253	281.4506	289.3318	0.1076	0.5718	0.7929	0.1263
	$\hat{κ} = 1.1661$
	$\hat{α} = 3.2863$
	$\hat{β} = 4.4443$
SL	$\hat{θ} = 31.7889$	137.9096	295.8192	291.7301	0.1561	0.1508	1.0068	0.1553
	$\hat{α} = 3.0101$
	$\hat{β} = 323.0142$
SML	$\hat{θ} = 6.4862$	212.7549	431.5097	439.3252	0.1708	0.0907	3.9034	0.4814
	$\hat{α} = 7.7069$
	$\hat{β} = 123.9153$
SI	$\hat{θ} = 1.3455$	153.8979	313.7957	317.9993	0.1708	0.0907	3.6643	0.5643
	$\hat{α} = 0.5089$
	$\hat{β} = 2.172$

Table 5. Model fit metrics for dataset 3.

Distribution	Estimates	$-\log L$	AIC	BIC	K-S	p-Value	$A^{*}$	$W^{*}$
SlaLL	$\hat{λ}$ = 4.6677	793.8678	1595.736	1607.416	0.0791	0.3586	0.7319	0.1156
	$\hat{κ}$ = 1.4038
	$\hat{α}$ = 6.4576
	$\hat{β}$ = 75.4064
SL	$\hat{θ}$ = 37.8879	934.9096	1775.0992	1881.7301	0.7974	<2.2 × 10⁻¹⁶	3.5788	6.9228
	$\hat{α}$ = 31.1101
	$\hat{β}$ = 23.7142
SML	$\hat{θ}$ = 1.87432	912.7549	2318.0090	2436.3252	0.7974	<2.2 × 10⁻¹⁶	6.8642	7.9642
	$\hat{α}$ = 8.7069
	$\hat{β}$ = 43.9153
SI	$\hat{θ}$ = 0.1369	807.4439	1620.888	1629.648	0.1128	0.0612	2.7439	3.8753
	$\hat{α}$ = 1.5058
	$\hat{β}$ = 266.9019

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gillariose, J.; Abdelwahab, M.M.; Joseph, J.; Hasaballah, M.M. A Gauss Hypergeometric-Type Model for Heavy-Tailed Survival Times in Biomedical Research. Symmetry 2025, 17, 1795. https://doi.org/10.3390/sym17111795
