1. Introduction
The Weibull distribution is a traditional model for positive real data. However, it cannot accommodate data whose hazard function is unimodal or bathtub shaped. Several modifications of the Weibull distribution have been proposed to model non-monotone hazard rates, including the extended Weibull (EW) model [1]. Many further extensions seek hazard functions that are unimodal or bathtub shaped (see [2,3,4], which provide surveys of the modified Weibull distributions). Most recently, refs. [5,6] defined the Maxwell-Weibull and the alpha power Kumaraswamy Weibull distributions, respectively.
Two papers on the EW distribution [7,8] have been seminal in that they pioneered the development of distributions for bathtub-shaped hazard rates. Since their publication, many distributions, in particular other generalizations of the two-parameter Weibull distribution, have been proposed, each allowing for non-monotone and bathtub-shaped hazard rates. It has been reported in the literature that the EW distribution provides significantly better fits than traditional models based on the exponential, gamma, Weibull and lognormal distributions. This is the main reason for choosing it as the baseline model in this article.
The Weibull-G (W-G) class [9] is still little explored compared with competing classes. Some recently proposed distributions within this class are the Weibull–Dagum [10], Weibull–Kumaraswamy [11], Weibull Birnbaum–Saunders [12], Weibull inverse Lomax [13] and Weibull–Power Lomax [14]. More recently, ref. [15] addressed the Weibull–Beta Prime distribution.
Influenza data have been studied from a non-parametric point of view [16], by logistic regression [17] and by functional data analysis [18]. On the other hand, spatial regression [19], machine learning models [20], Markov chains [21], and epidemiological models of the fractal–fractional Caputo type [22] have been used in studies with hepatitis data. Our main aim in the applications to real data is to show the flexibility of the new distribution, which adds one more parameter to the EW distribution, and of the new log-Weibull extended Weibull (LWEW) regression model. To illustrate the distribution, we use the times (in days) from hospitalization until cure of influenza patients. To illustrate the LWEW regression model, we use a data set from the literature on hepatitis patients, in which the variable of interest is the time until death from hepatitis. The “time until the occurrence of an event of interest” is the response in survival analysis studies, and one of the main characteristics of such studies is censoring, i.e., the partial observation of the response. Furthermore, the regression structure allows us to analyze how characteristics of the individuals in the sample may influence the response variable.
The three-parameter EW probability density function (pdf) of the random variable
X is
where
and
are the shapes, and
is the scale. The support of the EW distribution is
, and its
rth ordinary moment becomes
where
and
are the gamma and beta functions, respectively.
For lifetime models, it is of interest to know the
rth incomplete moment of
X, say
, which has the form
where
is the hypergeometric function defined by
and
is the incomplete gamma function.
We define the Weibull extended Weibull (WEW) distribution in Section 2. The quantile function (qf) and linear representation are reported in Section 3. Estimation by the maximum likelihood method is discussed in Section 4. A simulation and a misspecification study are presented in Section 5. We define the log-Weibull extended Weibull (LWEW) regression in Section 6 and perform a simulation study for this model. Applications to influenza and hepatitis data are reported in Section 7. Some conclusions are summarized in Section 8.
2. The WEW Distribution
Consider the W-G class of distributions [
9] with scale
and shape
. By taking the pdf (
1) for the baseline in this class, the cumulative distribution function (cdf) and pdf of the WEW distribution become (for
)
respectively.
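As an illustration of how a member of the W-G class is built from a baseline, the following minimal Python sketch assumes that the class of [9] has the odds-ratio form F(x) = 1 - exp{-a [G(x)/(1 - G(x))]^b}, where a is the scale, b is the shape and G is the baseline cdf. Since the EW baseline is not reproduced here, a two-parameter Weibull cdf is used as a stand-in baseline, which corresponds to the WW special case under an assumed shape/scale parameterization. The function names are hypothetical.

```python
import math

def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull cdf, used only as a stand-in baseline G(x)."""
    if x <= 0.0:
        return 0.0
    return -math.expm1(-((x / scale) ** shape))

def wg_cdf(x, a, b, baseline_cdf):
    """Assumed W-G cdf: F(x) = 1 - exp(-a * [G(x) / (1 - G(x))]**b)."""
    g = baseline_cdf(x)
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return 1.0
    odds = g / (1.0 - g)  # baseline odds G / (1 - G)
    try:
        return -math.expm1(-a * odds ** b)
    except OverflowError:
        # odds**b too large to represent: the cdf is numerically 1
        return 1.0

# WW-type cdf on a small grid (parameter values are arbitrary illustrations)
grid = [0.25 * i for i in range(1, 9)]
ww_values = [wg_cdf(x, 1.5, 0.8, lambda t: weibull_cdf(t, 2.0, 1.0)) for x in grid]
```

Replacing the stand-in baseline by the EW cdf would give the WEW cdf under the same assumption on the form of the class.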
Henceforth, we change the notation and let
have pdf (
5). The WEW distribution has some special cases: the EW when
, W-Weibull (WW) when
, W-exponential (WE) when
and
.
Figure 1 and Figure 2 report the densities and hazard rate functions (hrfs), respectively, for selected parameter values. The WEW hrf can be inverted-bathtub shaped, bathtub shaped, monotonically increasing, or monotonically decreasing.
4. Estimation
Let
be a sample of size
n from (
5). The log-likelihood function for
from this sample reduces to
Equation (
10) for
gives the log-likelihood for the WW distribution. The maximum likelihood estimates (MLEs) can be found by maximizing
using the AdequacyModel library [25] of the R software; another option is the maxLik function in the maxLik library, which provides a convenient interface for computing MLEs [26], or the optim function, which lets the user select an optimization method (for example, BFGS, CG, or SANN) and can also return the Hessian matrix. We can also maximize (10) numerically using SAS (PROC NLMIXED) or the Ox program (sub-routine MaxBFGS), among others. The score components in
(for
) are reported in
Appendix A.
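To make the objective function concrete, the sketch below codes the log-likelihood for the WW special case in Python. It is only a sketch under two assumptions not confirmed by the text: the W-G class has the form F(x) = 1 - exp{-a [G(x)/(1 - G(x))]^b}, and the Weibull baseline is G(x) = 1 - exp{-(x/s)^k}. Under these assumptions, with z = (x/s)^k, the log-density is log a + log b + log k - log s + (k - 1) log(x/s) + b z + (b - 1) log(1 - e^{-z}) - a (e^z - 1)^b. The names ww_logpdf and ww_loglik are hypothetical.

```python
import math

def ww_logpdf(x, a, b, k, s):
    """Log-density of the assumed WW model at a single point x > 0."""
    z = (x / s) ** k
    return (math.log(a) + math.log(b) + math.log(k) - math.log(s)
            + (k - 1.0) * math.log(x / s)
            + b * z
            + (b - 1.0) * math.log(-math.expm1(-z))  # log(1 - exp(-z))
            - a * math.expm1(z) ** b)                # a * (exp(z) - 1)**b

def ww_loglik(params, data):
    """Log-likelihood of an uncensored sample; -inf outside the parameter space."""
    a, b, k, s = params
    if min(a, b, k, s) <= 0.0:
        return -math.inf
    try:
        return sum(ww_logpdf(x, a, b, k, s) for x in data)
    except (OverflowError, ValueError):
        # numerical overflow or log of zero: treat as an infeasible point
        return -math.inf

# Evaluate at arbitrary illustrative values
value = ww_loglik((1.0, 1.0, 1.5, 2.0), [0.5, 1.2, 2.3])
```

The negative of ww_loglik can be passed to any general-purpose numerical optimizer, in the same way that the log-likelihood is passed to optim or maxLik in R.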
6. The LWEW Regression Model
If
X has the WEW pdf (
5), then
has the log-Weibull extended Weibull (LWEW) pdf (with real support) reparameterized in terms of
and
, which can be expressed as (for
)
where
and
. For
, we obtain the log-Weibull Weibull (LWW) model, where
is a location and
is a scale.
The survival function of
Y has the form
The density of
(for
) can be expressed as
We construct a regression based on the LWEW distribution
where
has pdf (
13),
is the vector of coefficients, and
is the vector of covariates for the
ith response
, which models the location parameter
.
Let F and C denote the sets of individuals whose lifetimes are observed (failures) and censored, respectively. The log-likelihood for
can be found from (
13) and (
14) as
where
q is the number of failures, and
. The MLE
of
can be found by maximizing (
15).
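The structure of a log-likelihood with right censoring, in which failures contribute through the density and censored observations through the survival function, can be sketched in Python as follows. The sketch assumes a location-scale model y_i = x_i'β + σ z_i. Because the LWEW density and survival function are not reproduced here, the standard smallest-extreme-value distribution (the law of the logarithm of a Weibull variable) is used as a stand-in for the error; all function names are hypothetical.

```python
import math

def sev_logpdf(z):
    """Standard smallest-extreme-value log-density: log f(z) = z - exp(z)."""
    return z - math.exp(z)

def sev_logsf(z):
    """Standard smallest-extreme-value log-survival: log S(z) = -exp(z)."""
    return -math.exp(z)

def censored_loglik(beta, sigma, y, X, delta, logpdf=sev_logpdf, logsf=sev_logsf):
    """Log-likelihood of y_i = x_i' beta + sigma * z_i under right censoring.

    delta[i] is 1 for an observed failure and 0 for a censored response.
    """
    if sigma <= 0.0:
        return -math.inf
    total = 0.0
    for yi, xi, di in zip(y, X, delta):
        mu = sum(bj * xj for bj, xj in zip(beta, xi))  # linear predictor
        z = (yi - mu) / sigma                          # standardized residual
        if di == 1:
            total += logpdf(z) - math.log(sigma)  # failure: log of f(z) / sigma
        else:
            total += logsf(z)                     # censored: log S(z)
    return total

# Two observations, intercept plus one covariate; the second is censored
ll = censored_loglik([1.0, 1.0], 2.0, [1.0, 2.0], [[1.0, 0.0], [1.0, 1.0]], [1, 0])
```

Passing the log-density and log-survival function of the standardized LWEW error as logpdf and logsf would give the corresponding objective for the regression model considered here.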
Regression Simulation Study
A simulation study was conducted using the BFGS algorithm in R to examine the accuracy of the MLEs of the LWEW regression model with parameters: , , , and . We considered 1000 Monte Carlo replications for , 50, and 100, and censoring percentages 0%, 10%, 30%, and 66% generated using the inverse transformation method. Occurrences of the Bernoulli distribution with success probability are generated to obtain the censored observations, where p is the percentage of censoring. The location parameter is , where .
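The two sampling ingredients mentioned above, inverse transformation and Bernoulli censoring indicators, can be sketched in Python as follows. This is not the simulation code of the study: it assumes the WW special case with cdf F(x) = 1 - exp{-a [exp((x/s)^k) - 1]^b}, whose quantile function is Q(u) = s {log(1 + t)}^{1/k} with t = {-log(1 - u)/a}^{1/b}, and it assumes that an observation is a failure with probability 1 - p. How the censored times themselves are formed is not specified here, so only the indicators are generated. The function names are hypothetical.

```python
import math
import random

def ww_quantile(u, a, b, k, s):
    """Quantile function of the assumed WW model, for 0 <= u < 1."""
    t = (-math.log1p(-u) / a) ** (1.0 / b)  # baseline odds at level u
    return s * math.log1p(t) ** (1.0 / k)

def simulate(n, a, b, k, s, p, rng):
    """Draw n lifetimes by inverse transformation with censoring indicators.

    Returns a list of (lifetime, delta), where delta = 1 marks a failure and
    delta = 0 marks an observation flagged as censored (probability p).
    """
    sample = []
    for _ in range(n):
        x = ww_quantile(rng.random(), a, b, k, s)
        delta = 1 if rng.random() >= p else 0
        sample.append((x, delta))
    return sample

rng = random.Random(2024)  # fixed seed for reproducibility
draws = simulate(50, 1.0, 1.5, 2.0, 1.0, 0.3, rng)
```

Taking logarithms of the simulated lifetimes gives responses on the scale of the log-linear regression model.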
The AEs, biases, and MSEs are reported in Table 5. The biases and MSEs usually decrease as n grows. For a fixed sample size, increasing the percentage of censoring decreases the biases and MSEs for most parameters, which indicates an improvement in the accuracy of the estimators.
The same behavior is not observed for b. A probable explanation is that the estimators are naturally biased, since under censoring the likelihood function includes the contribution of the survival function.
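For reference, the summary measures of a Monte Carlo study can be computed as in the short Python sketch below. It assumes that AE denotes the average of the estimates over the replications, that the bias is the AE minus the true value, and that the MSE is the mean squared deviation of the estimates from the true value. The function name is hypothetical.

```python
def monte_carlo_summary(estimates, true_value):
    """Return (AE, bias, MSE) of a list of replicated estimates."""
    r = len(estimates)
    ae = sum(estimates) / r                                  # average estimate
    bias = ae - true_value                                   # AE minus truth
    mse = sum((e - true_value) ** 2 for e in estimates) / r  # mean squared error
    return ae, bias, mse

# Three illustrative replications of an estimator whose true value is 2.0
ae, bias, mse = monte_carlo_summary([1.0, 2.0, 3.0], 2.0)
```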
8. Conclusions
We introduced the Weibull extended Weibull density and provided some of its properties. The consistency of the maximum likelihood estimators is supported by a simulation study. An application to real influenza data revealed its flexibility. We constructed the log-Weibull extended Weibull regression model and performed simulations to study the behavior of the estimators in small and large samples. We compared its fit to acute viral hepatitis data with that of other existing models and performed a residual analysis for the final model. Overall, the two applications showed the utility of the new models for symmetric and asymmetric data, censored or uncensored. In future work, other systematic components could be selected for the regression model and, as an alternative estimation method, a Bayesian approach could be adopted.