1. Introduction
The zero-inflated gamma (ZIG) distribution is suitable for fitting data comprising both positive and zero observations: the number of zero values is binomially distributed, while the positive values follow a gamma distribution with shape and rate parameters. Estimation (point and interval) and hypothesis testing are the two basic approaches used in statistical inference about a model parameter. The confidence interval (CI) is the most popular interval estimation method, and numerous researchers have studied CIs for the ZIG distribution. Kaewprasert et al. [1] broadened the scope by constructing CIs for the difference between the means of two ZIG distributions using the fiducial method, Bayesian methods, and highest posterior density (HPD) intervals. Wang et al. [2] created CIs for the mean of a ZIG distribution based on fiducial inference, the parametric bootstrap (PB), and the method of variance estimates recovery (MOVER). Khooriphan et al. [3] proposed Bayesian estimation of rainfall dispersion in Thailand using ZIG distributions. Khooriphan et al. [4] proposed CIs for the ratio of the variances of ZIG distributions using fiducial quantities, Bayesian credible intervals, and HPD intervals. Muralidharan and Kale [5] proposed CIs for the mean of a modified gamma distribution with singularity at zero.
The mean is the most widely used measure of central tendency. The means of several populations can be compared by constructing simultaneous CIs for their pairwise differences, provided that each population is independently and identically distributed (i.i.d.). However, when two populations are compared using the difference between their means, this difference may be small, which makes firm and conclusive inference difficult. Hence, when investigating multiple populations, simultaneously comparing the ratios of the means can be more informative than comparing the differences between them. Ren et al. [6] provided simultaneous CIs for the difference between the means of several ZIG distributions based on the fiducial approach. Wang et al. [7] proposed CIs for the difference between the means of two gamma populations. Maneerat et al. [8] constructed Bayesian CIs for a single mean and for the difference between two means of delta-lognormal distributions. Maneerat et al. [9] created simultaneous CIs for the differences between the means of several delta-lognormal distributions based on the PB, the fiducial generalized CI (GCI), the MOVER, and Bayesian credible intervals. Malekzadeh and Kharrati-Kopaei [10] constructed simultaneous CIs for the pairwise quantile differences of several heterogeneous two-parameter exponential distributions. Jana and Gautam [11] proposed CIs for the difference and the ratio of the means of zero-adjusted inverse Gaussian distributions using the MOVER and Bayesian approaches. Long et al. [12] suggested population mean ratio estimators that use either the first or the third quartile of the auxiliary variable. Maneerat and Niwitpong [13] created CIs for the ratio of the means of two delta-lognormal distributions using Bayesian credible intervals, the fiducial GCI, and the MOVER. Zhang et al. [14] created simultaneous CIs for the ratios of the means of several zero-inflated lognormal distributions using the fiducial method and the MOVER. However, the data considered in this study, daily rainfall from six regions of Thailand in September 2021, comprise positive values that conform to a gamma distribution rather than a lognormal distribution. To the best of our knowledge, simultaneous CIs for the ratios of the means of several ZIG distributions have not yet been reported. Moreover, applying such intervals to rainfall datasets from several regions that fit ZIG distributions is also an interesting research topic.
In this study, we constructed simultaneous CIs for the ratios of the means of several ZIG populations (k > 2; k = 3 or 6 here) and used them to estimate the ratios of the means of natural rainfall datasets from six regions in Thailand during September, at the height of the rainy season. The fiducial GCI approach, Bayesian and HPD interval methods based on the Jeffreys' rule or uniform prior, and the MOVER were used to construct the simultaneous CIs. The study of Ren et al. [6] inspired our adoption of the fiducial approach to construct simultaneous CIs, while the use of several priors by Maneerat et al. [9] inspired our development of simultaneous CIs based on the HPD interval and the MOVER. These studies motivated our contribution to this research area: simultaneous CIs, based on the proposed techniques, that clarify the pairwise ratios between the means of multiple ZIG distributions. As a practical demonstration, we estimated the pairwise ratios of the means of daily rainfall records from the Northern, Northeastern, Central, Eastern, Western, and Southern regions of Thailand. Importantly, such comparisons could help to identify and anticipate natural disasters in a specific region.
The rest of this paper is organized as follows. In Section 2, we describe the methods for constructing simultaneous CIs for the ratios of the means of multiple ZIG populations. In Section 3 and Section 4, we conduct simulation studies and analyze a rainfall dataset from six regions in Thailand, respectively. Finally, a discussion and conclusions are offered in Section 5 and Section 6, respectively.
2. Materials and Methods
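For $k$ populations of observations, let $\delta_i$ denote the probability of observing a zero response in the $i$th group, while the nonzero observations follow a gamma distribution. For a sample $Y_{ij}$, $j = 1, 2, \ldots, n_i$, randomly generated from a ZIG distribution, the probability density function (pdf) is
$$f(y_{ij}; \delta_i, \alpha_i, \beta_i) = \delta_i I_0[y_{ij}] + (1-\delta_i)\, g(y_{ij}; \alpha_i, \beta_i)\, I_{(0,\infty)}[y_{ij}],$$
where
$$g(y_{ij}; \alpha_i, \beta_i) = \frac{\beta_i^{\alpha_i}}{\Gamma(\alpha_i)}\, y_{ij}^{\alpha_i-1} e^{-\beta_i y_{ij}}$$
is the pdf of the gamma distribution with shape parameter $\alpha_i$ and rate parameter $\beta_i$, and $I_0[\cdot]$ and $I_{(0,\infty)}[\cdot]$ are the indicator functions of a zero and a positive value, respectively. The number of zero observations follows a binomial distribution, $n_{i(0)} \sim \mathrm{Bin}(n_i, \delta_i)$, while $n_{i(1)} = n_i - n_{i(0)}$, where $n_{i(0)}$ and $n_{i(1)}$ are the numbers of zero and nonzero values, respectively.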
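To make the two-part structure concrete, the sketch below draws a ZIG sample in Python (standard library only), assuming the parameterization described here: a zero with probability $\delta_i$, otherwise a gamma value with shape $\alpha_i$ and rate $\beta_i$. The helpers `sample_zig` and `count_zeros` are hypothetical names used only for illustration. Python's `random.gammavariate` takes a shape and a *scale*, so the rate is inverted.

```python
import random


def sample_zig(n, delta, alpha, beta, rng=random):
    """Draw n values from ZIG(delta, alpha, beta) (hypothetical helper).

    delta: probability of a zero; alpha: gamma shape; beta: gamma rate.
    random.gammavariate(shape, scale) expects a scale, hence 1 / beta.
    """
    return [0.0 if rng.random() < delta else rng.gammavariate(alpha, 1.0 / beta)
            for _ in range(n)]


def count_zeros(sample):
    """Return (n_i(0), n_i(1)): the numbers of zero and nonzero values."""
    n0 = sum(1 for y in sample if y == 0.0)
    return n0, len(sample) - n0


rng = random.Random(2021)
y = sample_zig(50, delta=0.3, alpha=2.0, beta=1.0, rng=rng)
n0, n1 = count_zeros(y)   # n0 is one realization of Bin(50, 0.3)
```

Across many samples, the proportion of zeros should settle near $\delta_i$ and the mean of the positive values near $\alpha_i/\beta_i$.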
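Krishnamoorthy et al. [15] and Krishnamoorthy and Wang [16] showed that the nonzero observations $Y_{ij}$ can be transformed by using the cube-root approximation. As a result, $X_{ij} = Y_{ij}^{1/3}$ approximately follows a normal distribution with mean and variance respectively given by
$$\mu_i = \left(\frac{\alpha_i}{\beta_i}\right)^{1/3}\left(1 - \frac{1}{9\alpha_i}\right) \quad\text{and}\quad \sigma_i^2 = \left(\frac{\alpha_i}{\beta_i}\right)^{2/3}\frac{1}{9\alpha_i}.$$
Since $\alpha_i/\beta_i$ is the mean of a gamma distribution, $\alpha_i$ and $\beta_i$ can be respectively rewritten as
$$\alpha_i = \frac{\left(\mu_i/2 + \sqrt{\mu_i^2/4 + \sigma_i^2}\right)^2}{9\sigma_i^2} \quad\text{and}\quad \beta_i = \frac{1}{9\sigma_i^2\left(\mu_i/2 + \sqrt{\mu_i^2/4 + \sigma_i^2}\right)}.$$
Thus, $\theta_i = \alpha_i/\beta_i = \left(\mu_i/2 + \sqrt{\mu_i^2/4 + \sigma_i^2}\right)^3$ is the mean of a gamma distribution and $\eta_i = (1-\delta_i)\theta_i$, for $i = 1, 2, \ldots, k$, is the mean of a ZIG distribution.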
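The cube-root moments and their inversion can be checked numerically. The Python sketch below (standard library only; all helper names are hypothetical) assumes the approximation $\mu_i = (\alpha_i/\beta_i)^{1/3}(1 - 1/(9\alpha_i))$ and $\sigma_i^2 = (\alpha_i/\beta_i)^{2/3}/(9\alpha_i)$ (the Wilson-Hilferty form), then recovers $\alpha_i$, $\beta_i$, and $\theta_i = (\mu_i/2 + \sqrt{\mu_i^2/4 + \sigma_i^2})^3$. The recovery is exact algebra: $t = \theta_i^{1/3}$ is the positive root of $t^2 - \mu_i t - \sigma_i^2 = 0$, so the round trip should reproduce the inputs up to floating-point error.

```python
import math


def cube_root_moments(alpha, beta):
    """Approximate mean and variance of Y**(1/3) for Y ~ Gamma(shape=alpha, rate=beta)."""
    theta = alpha / beta                      # gamma mean
    mu = theta ** (1 / 3) * (1 - 1 / (9 * alpha))
    sigma2 = theta ** (2 / 3) / (9 * alpha)
    return mu, sigma2


def _t(mu, sigma2):
    # Positive root of t**2 - mu*t - sigma2 = 0, i.e. t = theta**(1/3).
    return mu / 2 + math.sqrt(mu ** 2 / 4 + sigma2)


def shape_rate_from_moments(mu, sigma2):
    """Recover (alpha, beta) from (mu, sigma2)."""
    t = _t(mu, sigma2)
    return t ** 2 / (9 * sigma2), 1 / (9 * sigma2 * t)


def gamma_mean_from_moments(mu, sigma2):
    """theta = alpha / beta = (mu/2 + sqrt(mu**2/4 + sigma2))**3."""
    return _t(mu, sigma2) ** 3


def zig_mean(delta, mu, sigma2):
    """eta = (1 - delta) * theta."""
    return (1 - delta) * gamma_mean_from_moments(mu, sigma2)
```

For example, $\alpha_i = 2$ and $\beta_i = 0.5$ give $\theta_i = 4$, and with $\delta_i = 0.25$ the ZIG mean is $\eta_i = 3$.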
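We are interested in constructing simultaneous CIs for the ratios of the means of several ZIG populations, namely
$$\eta_{il} = \frac{\eta_i}{\eta_l} = \frac{(1-\delta_i)\left(\mu_i/2 + \sqrt{\mu_i^2/4 + \sigma_i^2}\right)^3}{(1-\delta_l)\left(\mu_l/2 + \sqrt{\mu_l^2/4 + \sigma_l^2}\right)^3},$$
where $i, l = 1, 2, \ldots, k$ and $i \neq l$.
One can replace $\delta_i$, $\mu_i$, and $\sigma_i^2$ with their maximum likelihood estimators $\hat\delta_i = n_{i(0)}/n_i$, $\hat\mu_i = \bar x_i = \sum_{j=1}^{n_{i(1)}} x_{ij}/n_{i(1)}$, and $\hat\sigma_i^2 = \sum_{j=1}^{n_{i(1)}}(x_{ij} - \bar x_i)^2/n_{i(1)}$, respectively, where $x_{ij}$ are the cube roots of the nonzero observations. Thus, $\hat\eta_i = (1-\hat\delta_i)\left(\hat\mu_i/2 + \sqrt{\hat\mu_i^2/4 + \hat\sigma_i^2}\right)^3$.
Similarly, the estimated ratios of the means of several ZIG populations can be defined as
$$\hat\eta_{il} = \frac{\hat\eta_i}{\hat\eta_l}.$$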
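The plug-in estimates can be sketched in a few lines of Python (standard library only). This is an illustration, not the authors' R code: the helper names are hypothetical, indices are 0-based rather than $1, \ldots, k$, and the variance uses the maximum likelihood divisor $n_{i(1)}$ (replacing `statistics.pvariance` with `statistics.variance` would give the sample variance instead).

```python
import math
import statistics


def zig_mean_estimate(sample):
    """Plug-in estimate (1 - delta_hat) * theta_hat of the ZIG mean."""
    nonzero = [y for y in sample if y > 0]
    delta_hat = 1 - len(nonzero) / len(sample)        # n_i(0) / n_i
    x = [y ** (1 / 3) for y in nonzero]               # cube-root transform
    mu_hat = statistics.fmean(x)
    sigma2_hat = statistics.pvariance(x, mu_hat)      # divisor n_i(1)
    t = mu_hat / 2 + math.sqrt(mu_hat ** 2 / 4 + sigma2_hat)
    return (1 - delta_hat) * t ** 3


def ratio_estimates(samples):
    """Return {(i, l): eta_hat_i / eta_hat_l} for all ordered pairs i != l."""
    eta = [zig_mean_estimate(s) for s in samples]
    return {(i, l): eta[i] / eta[l]
            for i in range(len(eta)) for l in range(len(eta)) if i != l}
```

With $k$ samples the dictionary holds $k(k-1)$ ordered ratios; $\hat\eta_{li}$ is simply the reciprocal of $\hat\eta_{il}$.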
2.1. The Fiducial GCI Method
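Hannig et al. [17] first introduced the fiducial generalized pivotal quantity, a subclass of the generalized pivotal quantity (GPQ), which we use here to construct simultaneous fiducial intervals. Let $Y_{ij}$, $j = 1, 2, \ldots, n_i$, be a random sample from a ZIG distribution with parameter of interest $\eta_{il}$ across $k$ independent samples, and let $y_{ij}$, $j = 1, 2, \ldots, n_i$, represent the observed values. A GPQ $R_{\eta_{il}}(\boldsymbol{Y}; \boldsymbol{y}, \eta_{il})$ for $\eta_{il}$ is referred to as a fiducial GPQ if it satisfies the following two requirements:
The conditional distribution of $R_{\eta_{il}}(\boldsymbol{Y}; \boldsymbol{y}, \eta_{il})$ given $\boldsymbol{Y} = \boldsymbol{y}$ is free of unknown parameters for each $\boldsymbol{y}$.
The observed value of $R_{\eta_{il}}(\boldsymbol{Y}; \boldsymbol{y}, \eta_{il})$ at $\boldsymbol{Y} = \boldsymbol{y}$ is the parameter of interest, $\eta_{il}$.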
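Let $X_{ij} = Y_{ij}^{1/3}$ denote the cube-root-transformed nonzero observations, and let $\bar X_i$ and $S_i^2$ be their sample mean and variance, with observed values $\bar x_i$ and $s_i^2$, respectively. Then
$$Z = \frac{\bar X_i - \mu_i}{\sigma_i/\sqrt{n_{i(1)}}} \quad\text{and}\quad U_i = \frac{(n_{i(1)}-1)S_i^2}{\sigma_i^2}$$
follow the standard normal distribution and the chi-squared distribution with $n_{i(1)}-1$ degrees of freedom, respectively. By replacing $\bar X_i$ and $S_i^2$ with $\bar x_i$ and $s_i^2$ and solving for $\sigma_i^2$ and $\mu_i$, we obtain the respective fiducial GPQs for $\sigma_i^2$, $\mu_i$, and $\delta_i$:
$$R_{\sigma_i^2} = \frac{(n_{i(1)}-1)s_i^2}{U_i}, \qquad R_{\mu_i} = \bar x_i - Z\sqrt{\frac{R_{\sigma_i^2}}{n_{i(1)}}},$$
and
$$R_{\delta_i} \sim \frac{1}{2}\,\mathrm{Beta}(n_{i(0)}, n_{i(1)}+1) + \frac{1}{2}\,\mathrm{Beta}(n_{i(0)}+1, n_{i(1)}).$$
Subsequently, the fiducial GPQ of $\theta_i$ is simply
$$R_{\theta_i} = \left(\frac{R_{\mu_i}}{2} + \sqrt{\frac{R_{\mu_i}^2}{4} + R_{\sigma_i^2}}\right)^3.$$
Therefore, the fiducial GPQ for the ratios of the means of several ZIG distributions can be written as
$$R_{\eta_{il}} = \frac{(1-R_{\delta_i})R_{\theta_i}}{(1-R_{\delta_l})R_{\theta_l}}.$$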
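Hence, the two-sided $100(1-\alpha)\%$ simultaneous CI for $\eta_{il}$ based on the fiducial GCI method can be written as $[L_{il}, U_{il}] = [R_{\eta_{il}}(\alpha/2), R_{\eta_{il}}(1-\alpha/2)]$, where $R_{\eta_{il}}(\alpha/2)$ and $R_{\eta_{il}}(1-\alpha/2)$ are the $(\alpha/2)$th and $(1-\alpha/2)$th quantiles of $R_{\eta_{il}}$, respectively.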
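A Monte Carlo sketch of the fiducial GCI for a single pair $(i, l)$, in Python with the standard library. It assumes the GPQs take the forms $R_{\sigma_i^2} = (n_{i(1)}-1)s_i^2/U_i$ and $R_{\mu_i} = \bar x_i - Z\sqrt{R_{\sigma_i^2}/n_{i(1)}}$ on the cube-root scale, and, for $\delta_i$, the equal mixture of $\mathrm{Beta}(n_{i(0)}, n_{i(1)}+1)$ and $\mathrm{Beta}(n_{i(0)}+1, n_{i(1)})$ (Hannig's binomial GPQ). Chi-squared variates are drawn as gamma variates with scale 2. The function names are hypothetical, and the order-statistic quantile is just one of several reasonable percentile rules.

```python
import math
import random


def fiducial_eta_draws(sample, m, rng):
    """m draws of the fiducial GPQ (1 - R_delta) * R_theta for one ZIG sample."""
    x = [y ** (1 / 3) for y in sample if y > 0]   # cube roots of nonzero values
    n1 = len(x)
    n0 = len(sample) - n1
    if n1 < 2:
        raise ValueError("need at least two nonzero observations")
    xbar = sum(x) / n1
    s2 = sum((v - xbar) ** 2 for v in x) / (n1 - 1)
    draws = []
    for _ in range(m):
        u = rng.gammavariate((n1 - 1) / 2, 2.0)       # U ~ chi-square(n1 - 1)
        r_sigma2 = (n1 - 1) * s2 / u
        r_mu = xbar - rng.gauss(0.0, 1.0) * math.sqrt(r_sigma2 / n1)
        # Equal mixture of Beta(n0, n1 + 1) and Beta(n0 + 1, n1);
        # Beta(0, b) is a point mass at 0, so handle n0 == 0 explicitly.
        if rng.random() < 0.5:
            r_delta = rng.betavariate(n0, n1 + 1) if n0 > 0 else 0.0
        else:
            r_delta = rng.betavariate(n0 + 1, n1)
        t = r_mu / 2 + math.sqrt(r_mu ** 2 / 4 + r_sigma2)   # R_theta ** (1/3)
        draws.append((1 - r_delta) * t ** 3)
    return draws


def quantile(sorted_vals, p):
    """Order-statistic quantile without interpolation."""
    idx = min(len(sorted_vals) - 1, max(0, int(p * len(sorted_vals))))
    return sorted_vals[idx]


def fiducial_ratio_ci(sample_i, sample_l, m=2000, level=0.95, seed=1):
    """Percentile interval of R_eta_il = R_eta_i / R_eta_l."""
    rng = random.Random(seed)
    r_i = fiducial_eta_draws(sample_i, m, rng)
    r_l = fiducial_eta_draws(sample_l, m, rng)
    ratios = sorted(a / b for a, b in zip(r_i, r_l))
    a = 1 - level
    return quantile(ratios, a / 2), quantile(ratios, 1 - a / 2)
```

Like the interval described in the text, this takes plain percentiles of each pairwise GPQ and applies no further multiplicity adjustment.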
2.2. The Bayesian Methods
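The joint likelihood function of $k$ independent ZIG distributions can be obtained from the distribution of $Y_{ij}$, for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$, with the unknown parameters $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_k)$, $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_k)$, and $\boldsymbol{\sigma}^2 = (\sigma_1^2, \ldots, \sigma_k^2)$. On the cube-root scale of the nonzero observations, it is proportional to
$$L(\boldsymbol{\delta}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \mid \boldsymbol{y}) \propto \prod_{i=1}^{k} \delta_i^{n_{i(0)}}(1-\delta_i)^{n_{i(1)}}(\sigma_i^2)^{-n_{i(1)}/2}\exp\left\{-\frac{1}{2\sigma_i^2}\sum_{j=1}^{n_{i(1)}}(x_{ij}-\mu_i)^2\right\}.$$
The Fisher information matrix of the unknown parameters is the negative expected value of the matrix of second-order partial derivatives of the log-likelihood function with respect to the unknown parameters:
$$I(\boldsymbol{\psi}) = -E\left[\frac{\partial^2 \ln L}{\partial\boldsymbol{\psi}\,\partial\boldsymbol{\psi}^{\top}}\right], \qquad \boldsymbol{\psi} = (\boldsymbol{\delta}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2).$$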
The Jeffreys’ rule and uniform priors used to construct equal-tailed simultaneous CIs and simultaneous HPD intervals are covered in the following subsections.
2.2.1. The Jeffreys Rule Prior
The Jeffreys rule prior is proportional to the square root of the determinant of the Fisher information matrix. A ZIG distribution combines a binomial component for the zeros and a gamma component for the positive values. From the mean $\eta_i = (1-\delta_i)\theta_i$, the parameters of interest are $\delta_i$, $\mu_i$, and $\sigma_i^2$; Harvey and Van Der Merwe [18] used the Jeffreys rule priors $p(\delta_i) \propto \delta_i^{-1/2}(1-\delta_i)^{-1/2}$ and $p(\mu_i, \sigma_i^2) \propto \sigma_i^{-3}$ for these parameters, respectively.
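The joint posterior density function is proportional to the product of the likelihood function and the prior distribution of a ZIG distribution:
$$p(\boldsymbol{\delta}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \mid \boldsymbol{y}) \propto \prod_{i=1}^{k}\delta_i^{n_{i(0)}-1/2}(1-\delta_i)^{n_{i(1)}-1/2}(\sigma_i^2)^{-(n_{i(1)}+3)/2}\exp\left\{-\frac{(n_{i(1)}-1)s_i^2 + n_{i(1)}(\bar x_i-\mu_i)^2}{2\sigma_i^2}\right\},$$
where $\bar x_i = \sum_{j=1}^{n_{i(1)}} x_{ij}/n_{i(1)}$ and $s_i^2 = \sum_{j=1}^{n_{i(1)}}(x_{ij}-\bar x_i)^2/(n_{i(1)}-1)$.
The respective posterior distributions of $\mu_i$, $\sigma_i^2$, and $\delta_i$ obtained by integration are
$$\mu_i \mid \sigma_i^2, \boldsymbol{y} \sim N\!\left(\bar x_i, \frac{\sigma_i^2}{n_{i(1)}}\right), \quad \sigma_i^2 \mid \boldsymbol{y} \sim IG\!\left(\frac{n_{i(1)}}{2}, \frac{(n_{i(1)}-1)s_i^2}{2}\right), \quad \delta_i \mid \boldsymbol{y} \sim \mathrm{Beta}\!\left(n_{i(0)}+\frac{1}{2}, n_{i(1)}+\frac{1}{2}\right).$$
That is, $\mu_i$ follows a normal distribution, $\sigma_i^2$ follows an inverse gamma distribution, and $\delta_i$ follows a beta distribution. Replacing $\mu_i$, $\sigma_i^2$, and $\delta_i$ with draws $\mu_i^{(B)}$, $\sigma_i^{2(B)}$, and $\delta_i^{(B)}$ from these posteriors results in
$$\eta_{il}^{(B)} = \frac{(1-\delta_i^{(B)})\left(\mu_i^{(B)}/2 + \sqrt{(\mu_i^{(B)})^2/4 + \sigma_i^{2(B)}}\right)^3}{(1-\delta_l^{(B)})\left(\mu_l^{(B)}/2 + \sqrt{(\mu_l^{(B)})^2/4 + \sigma_l^{2(B)}}\right)^3}.$$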
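Therefore, the $100(1-\alpha)\%$ equal-tailed simultaneous CI and simultaneous HPD interval for $\eta_{il}$ based on the Bayesian method are $[L_{il}, U_{il}]$, where $L_{il}$ and $U_{il}$ are the lower and upper bounds of the intervals, respectively. For the equal-tailed interval, these are the $(\alpha/2)$th and $(1-\alpha/2)$th quantiles of the posterior draws of $\eta_{il}$; to determine the $100(1-\alpha)\%$ simultaneous HPD intervals for $\eta_{il}$, we computed $L_{il}$ and $U_{il}$ using an R package.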
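The following Python sketch (standard library only) draws from the posteriors under the Jeffreys rule prior and forms both intervals for $\eta_{il}$. It assumes $\delta_i \mid \boldsymbol{y} \sim \mathrm{Beta}(n_{i(0)}+1/2, n_{i(1)}+1/2)$, $\sigma_i^2 \mid \boldsymbol{y} \sim IG(n_{i(1)}/2, (n_{i(1)}-1)s_i^2/2)$, and $\mu_i \mid \sigma_i^2, \boldsymbol{y} \sim N(\bar x_i, \sigma_i^2/n_{i(1)})$, which is what the priors $p(\delta_i) \propto \delta_i^{-1/2}(1-\delta_i)^{-1/2}$ and $p(\mu_i, \sigma_i^2) \propto \sigma_i^{-3}$ yield for a normal model on the cube roots. The HPD step uses the common shortest-interval rule on sorted draws as a stand-in for the R package mentioned in the text; all helper names are hypothetical.

```python
import math
import random


def jeffreys_eta_draws(sample, m, rng):
    """Posterior draws of eta_i = (1 - delta_i) * theta_i (Jeffreys rule prior)."""
    x = [y ** (1 / 3) for y in sample if y > 0]
    n1 = len(x)
    n0 = len(sample) - n1
    xbar = sum(x) / n1
    s2 = sum((v - xbar) ** 2 for v in x) / (n1 - 1)
    draws = []
    for _ in range(m):
        delta = rng.betavariate(n0 + 0.5, n1 + 0.5)
        # IG(a, b) is 1 / Gamma(shape a, rate b); gammavariate wants scale 1/b.
        sigma2 = 1.0 / rng.gammavariate(n1 / 2, 2.0 / ((n1 - 1) * s2))
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n1))
        t = mu / 2 + math.sqrt(mu ** 2 / 4 + sigma2)
        draws.append((1 - delta) * t ** 3)
    return draws


def equal_tailed(draws, level=0.95):
    """(alpha/2, 1 - alpha/2) order-statistic quantiles of the draws."""
    s = sorted(draws)
    n = len(s)
    a = (1 - level) / 2
    return s[int(a * n)], s[min(n - 1, int((1 - a) * n))]


def hpd(draws, level=0.95):
    """Shortest interval that contains ceil(level * m) of the sorted draws."""
    s = sorted(draws)
    n = len(s)
    w = max(1, math.ceil(level * n))
    j = min(range(n - w + 1), key=lambda k: s[k + w - 1] - s[k])
    return s[j], s[j + w - 1]


rng = random.Random(11)
y_i = [0.0 if rng.random() < 0.3 else rng.gammavariate(2.0, 1.0) for _ in range(50)]
y_l = [0.0 if rng.random() < 0.3 else rng.gammavariate(2.0, 0.5) for _ in range(50)]
ratio = [a / b for a, b in zip(jeffreys_eta_draws(y_i, 2000, rng),
                                jeffreys_eta_draws(y_l, 2000, rng))]
et = equal_tailed(ratio)
hp = hpd(ratio)
```

Because the ratio's posterior is right-skewed, the HPD interval is typically shifted left of the equal-tailed one and is never wider than it.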
2.2.2. The Uniform Prior
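Bolstad and Curran [19] proposed uniform priors equal to 1 for $\delta_i$, $\mu_i$, and $\sigma_i^2$ ($p(\delta_i) = 1$ and $p(\mu_i, \sigma_i^2) = 1$, respectively), because a uniform prior assigns a constant prior density. Subsequently, $p(\delta_i, \mu_i, \sigma_i^2) = 1$ is the uniform prior for a ZIG distribution, for which the joint posterior density function is
$$p(\boldsymbol{\delta}, \boldsymbol{\mu}, \boldsymbol{\sigma}^2 \mid \boldsymbol{y}) \propto \prod_{i=1}^{k}\delta_i^{n_{i(0)}}(1-\delta_i)^{n_{i(1)}}(\sigma_i^2)^{-n_{i(1)}/2}\exp\left\{-\frac{(n_{i(1)}-1)s_i^2 + n_{i(1)}(\bar x_i-\mu_i)^2}{2\sigma_i^2}\right\},$$
where $\bar x_i$ and $s_i^2$ are the sample mean and variance of the cube-root-transformed nonzero observations $x_{ij}$, $j = 1, 2, \ldots, n_{i(1)}$.
Integrating this density gives the respective posterior distributions of $\mu_i$, $\sigma_i^2$, and $\delta_i$:
$$\mu_i \mid \sigma_i^2, \boldsymbol{y} \sim N\!\left(\bar x_i, \frac{\sigma_i^2}{n_{i(1)}}\right), \quad \sigma_i^2 \mid \boldsymbol{y} \sim IG\!\left(\frac{n_{i(1)}-3}{2}, \frac{(n_{i(1)}-1)s_i^2}{2}\right), \quad \delta_i \mid \boldsymbol{y} \sim \mathrm{Beta}(n_{i(0)}+1, n_{i(1)}+1).$$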
To construct the equal-tailed simultaneous CI and simultaneous HPD interval, posterior draws of $\mu_i$, $\sigma_i^2$, and $\delta_i$ can be substituted into Equation (1).
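Only the posterior parameters change between the two priors, so a single sampler with a prior switch shows the difference. This Python sketch (standard library only; names are hypothetical) assumes that under the uniform prior $\delta_i \mid \boldsymbol{y} \sim \mathrm{Beta}(n_{i(0)}+1, n_{i(1)}+1)$ and $\sigma_i^2 \mid \boldsymbol{y} \sim IG((n_{i(1)}-3)/2, (n_{i(1)}-1)s_i^2/2)$, while under the Jeffreys rule prior the shapes are $n_{i(0)}+1/2$, $n_{i(1)}+1/2$, and $n_{i(1)}/2$. In both cases $\mu_i \mid \sigma_i^2, \boldsymbol{y} \sim N(\bar x_i, \sigma_i^2/n_{i(1)})$.

```python
import math
import random

# Posterior parameters implied by each prior (assumed forms, see lead-in):
#   prior       delta | y                   sigma2 | y
#   "jeffreys"  Beta(n0 + 1/2, n1 + 1/2)    IG(n1/2,       (n1 - 1) s2 / 2)
#   "uniform"   Beta(n0 + 1,   n1 + 1)      IG((n1 - 3)/2, (n1 - 1) s2 / 2)
PRIORS = {"jeffreys": (0.5, 0.0), "uniform": (1.0, -1.5)}


def eta_draws(sample, m, rng, prior="uniform"):
    """Posterior draws of the ZIG mean eta_i under the chosen prior."""
    beta_shift, ig_shift = PRIORS[prior]
    x = [y ** (1 / 3) for y in sample if y > 0]
    n1 = len(x)
    n0 = len(sample) - n1
    ig_shape = n1 / 2 + ig_shift
    if ig_shape <= 0:
        raise ValueError("too few nonzero observations for this prior")
    xbar = sum(x) / n1
    s2 = sum((v - xbar) ** 2 for v in x) / (n1 - 1)
    out = []
    for _ in range(m):
        delta = rng.betavariate(n0 + beta_shift, n1 + beta_shift)
        # IG(a, b) drawn as 1 / Gamma(shape a, scale 1/b).
        sigma2 = 1.0 / rng.gammavariate(ig_shape, 2.0 / ((n1 - 1) * s2))
        mu = rng.gauss(xbar, math.sqrt(sigma2 / n1))
        t = mu / 2 + math.sqrt(mu ** 2 / 4 + sigma2)
        out.append((1 - delta) * t ** 3)
    return out
```

Ratios $\eta_{il}$ then follow by dividing paired draws from two samples, exactly as for the Jeffreys rule prior. Note that the uniform prior needs more than three nonzero observations for the inverse gamma shape to be positive.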
2.3. Method of Variance Estimates Recovery (MOVER)
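First introduced by Donner and Zou [20], the MOVER approach is applied to construct the $100(1-\alpha)\%$ two-sided simultaneous CI for $\eta_{il} = \eta_i/\eta_l$, denoted $[L_{il}, U_{il}]$, where the lower and upper bounds of the interval are, respectively,
$$L_{il} = \frac{\hat\eta_i\hat\eta_l - \sqrt{(\hat\eta_i\hat\eta_l)^2 - l_{\eta_i}u_{\eta_l}(2\hat\eta_i - l_{\eta_i})(2\hat\eta_l - u_{\eta_l})}}{u_{\eta_l}(2\hat\eta_l - u_{\eta_l})}$$
and
$$U_{il} = \frac{\hat\eta_i\hat\eta_l + \sqrt{(\hat\eta_i\hat\eta_l)^2 - u_{\eta_i}l_{\eta_l}(2\hat\eta_i - u_{\eta_i})(2\hat\eta_l - l_{\eta_l})}}{l_{\eta_l}(2\hat\eta_l - l_{\eta_l})},$$
for $i, l = 1, 2, \ldots, k$ and $i \neq l$, where $[l_{\eta_i}, u_{\eta_i}]$ and $[l_{\eta_l}, u_{\eta_l}]$ are the $100(1-\alpha)\%$ CIs for $\eta_i$ and $\eta_l$, respectively.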
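The MOVER limits for a ratio need only the two point estimates and their separate CIs. Below is a minimal Python transcription of the Donner and Zou ratio limits (standard library only; the function name is hypothetical). It assumes positive estimates and limits, with $u_{\eta_l} < 2\hat\eta_l$ so that both denominators are positive. Clipping the discriminant at zero, to absorb tiny negative values from rounding, is a choice of this sketch.

```python
import math


def mover_ratio(est_i, ci_i, est_l, ci_l):
    """MOVER (Donner and Zou) limits for eta_i / eta_l from separate CIs.

    est_*: point estimates; ci_*: (lower, upper) limits. Assumes everything is
    positive and ci_l[1] < 2 * est_l, so both denominators are positive.
    """
    l_i, u_i = ci_i
    l_l, u_l = ci_l
    p = est_i * est_l
    # Clip tiny negative discriminants caused by rounding.
    d_low = max(p ** 2 - l_i * u_l * (2 * est_i - l_i) * (2 * est_l - u_l), 0.0)
    d_up = max(p ** 2 - u_i * l_l * (2 * est_i - u_i) * (2 * est_l - l_l), 0.0)
    lower = (p - math.sqrt(d_low)) / (u_l * (2 * est_l - u_l))
    upper = (p + math.sqrt(d_up)) / (l_l * (2 * est_l - l_l))
    return lower, upper


lo, hi = mover_ratio(2.0, (1.8, 2.2), 1.0, (0.9, 1.1))   # roughly (1.735, 2.305)
```

When both input CIs collapse to their point estimates, the limits collapse to $\hat\eta_i/\hat\eta_l$, which is a quick sanity check on the transcription.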
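The parameters of interest in $\eta_i = (1-\delta_i)\theta_i$ are $\delta_i$, $\mu_i$, and $\sigma_i^2$, for which it is possible to construct CIs. From Hannig's [21] paper on the fiducial GPQ of $\delta_i$ in Equation (4), the $100(1-\alpha)\%$ CI for $\delta_i$ can be written as
$$[l_{\delta_i}, u_{\delta_i}] = [R_{\delta_i}(\alpha/2), R_{\delta_i}(1-\alpha/2)],$$
where $R_{\delta_i}(\alpha/2)$ and $R_{\delta_i}(1-\alpha/2)$ are the $(\alpha/2)$th and $(1-\alpha/2)$th quantiles of $R_{\delta_i}$, respectively.
By using the CI definitions for parameters $\mu_i$ and $\sigma_i^2$ in Equations (2) and (3), respectively, we can define the $100(1-\alpha)\%$ CI for $\theta_i$, denoted $[l_{\theta_i}, u_{\theta_i}]$. This CI is then combined with the CI for $\delta_i$: assuming that the relevant standardized estimators follow a standard normal distribution, the $100(1-\alpha)\%$ MOVER interval for $\eta_i$ becomes $[l_{\eta_i}, u_{\eta_i}]$. Similarly, we can obtain $[l_{\eta_l}, u_{\eta_l}]$. Therefore, the $100(1-\alpha)\%$ two-sided simultaneous CI for $\eta_{il}$ based on the MOVER method can be obtained as $[L_{il}, U_{il}]$, for $i, l = 1, 2, \ldots, k$ and $i \neq l$. This process is specified in Algorithm 1.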
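Algorithm 1. All six methods.
Begin loop M.
Generate $y_{ij}$, $j = 1, 2, \ldots, n_i$, with sample size $n_i$ from $\mathrm{ZIG}(\delta_i, \alpha_i, \beta_i)$.
Perform the cube-root transformation on the nonzero observations and estimate $\hat\delta_i$, $\hat\mu_i$, and $\hat\sigma_i^2$. Get $\eta_i$ and $\eta_{il}$ by computing the parameters.
(a) Fiducial GCI: compute $R_{\sigma_i^2}$, $R_{\mu_i}$, $R_{\delta_i}$, $R_{\theta_i}$, and $R_{\eta_{il}}$.
(b) Bayesian and HPD based on the Jeffreys rule prior: compute posterior draws of $\mu_i$, $\sigma_i^2$, $\delta_i$, $\theta_i$, and $\eta_{il}$.
(c) Bayesian and HPD based on the uniform prior: compute posterior draws of $\mu_i$, $\sigma_i^2$, $\delta_i$, $\theta_i$, and $\eta_{il}$.
(d) MOVER: compute the point estimates and CI bounds for $\delta_i$, $\theta_i$, and $\eta_i$ required for $L_{il}$ and $U_{il}$.
Repeat steps (3) and (4) a total of $m = 2000$ times.
Compute the $100(1-\alpha)\%$ simultaneous CI for $\eta_{il}$.
(a) Fiducial GCI: compute $L_{il}$ and $U_{il}$ using Equation (5).
(b) Bayesian based on the Jeffreys rule prior: compute $L_{il}$ and $U_{il}$ using Equation (6).
(c) HPD based on the Jeffreys rule prior: use Equation (6) to compute $L_{il}$ and $U_{il}$ with an R package.
(d) Bayesian based on the uniform prior: compute $L_{il}$ and $U_{il}$.
(e) HPD based on the uniform prior: compute $L_{il}$ and $U_{il}$.
(f) MOVER: compute the simultaneous CIs based on the MOVER using Equations (7) and (8).
End loop M.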
3. Simulation Study
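We conducted simulation studies to assess how well the proposed methods perform with finite samples. The coverage probabilities and expected lengths are derived as
$$CP = \frac{c(L_{il} \le \eta_{il} \le U_{il})}{M} \quad\text{and}\quad EL = \frac{1}{M}\sum_{s=1}^{M}\left(U_{il}^{(s)} - L_{il}^{(s)}\right),$$
where $c(L_{il} \le \eta_{il} \le U_{il})$ is the number of simulation runs in which $\eta_{il}$ is contained in the interval, $L_{il}$ and $U_{il}$ are the lower and upper bounds of the interval, respectively, and $M$ is the total number of simulations that were run for the study.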
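Both performance measures reduce to simple averages over the $M$ simulated intervals. The Python sketch below (hypothetical helper names) computes them for one pair $(i, l)$. It also includes a stricter joint variant that counts a run as covered only when every pairwise interval covers its target; this variant is offered for comparison and is not necessarily the criterion used in the study.

```python
def coverage_and_length(intervals, true_value):
    """Empirical CP and EL over M simulated intervals (L, U) for one eta_il."""
    M = len(intervals)
    covered = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    total_length = sum(hi - lo for lo, hi in intervals)
    return covered / M, total_length / M


def joint_coverage(runs, truths):
    """Stricter variant: share of runs in which every pair (i, l) is covered.

    runs: list of {(i, l): (L, U)} dicts; truths: {(i, l): eta_il}.
    """
    hits = sum(1 for run in runs
               if all(run[p][0] <= truths[p] <= run[p][1] for p in truths))
    return hits / len(runs)


cp, el = coverage_and_length([(0.0, 2.0), (1.0, 3.0), (2.5, 4.0)], true_value=1.5)
# cp = 2/3 (two of three intervals contain 1.5), el = (2 + 2 + 1.5) / 3
```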
For each scenario, the best-performing CI has a coverage probability above or close to the nominal confidence level (0.95) and the shortest expected length. The performances of the proposed methods were compared via a Monte Carlo simulation study carried out with the aid of the R statistical software suite. For each set of parameters, 5000 iterations of the simulations were run. In addition, for each parameter combination, 2000 replications of the fiducial and Bayesian methods were performed.
Figure 1 shows a flowchart of the simulation study. The chosen sample sizes were 30, 50, or 100. As reported in Table 1 and Table 2, we used 12 parameter settings for the sample sizes $n_i$, the probabilities $\delta_i$, and the gamma parameters, with $k = 3$ or $k = 6$.
5. Discussion
We applied the approach laid out by Kaewprasert et al. [1], who generated CIs for the mean and the difference between the means of several ZIG distributions by using the fiducial GCI and Bayesian and HPD interval methods and found the HPD interval based on the Jeffreys rule prior to be the optimal approach. In addition, by utilizing the fiducial GCI, we extended the method of Zhang et al. [14] for constructing simultaneous CIs for distributions containing some zero observations. In the present study, we used the fiducial GCI, Bayesian, HPD interval, and MOVER approaches to construct CIs for comparing the means of multiple ZIG distributions, evaluated them in simulation studies, and applied them to real rainfall datasets containing zero observations from six regions in Thailand.
The outcomes of the simulation study with a range of sample sizes and probabilities of nonzero values shed light on the behavior of the simultaneous CIs. For k = 3 or 6, we found that the HPD interval based on the Jeffreys rule prior was the most suitable approach for all of the scenarios tested. The coverage probabilities and expected lengths of the 95% simultaneous CIs for k = 3 were comparable to those for k = 6 for various sample sizes. Moreover, the expected lengths of the approaches decreased as the probability of nonzero values increased.
Importantly, the practicability of these methods was demonstrated by estimating the ratios of the means of multiple daily rainfall datasets in September 2021 for the six regions in Thailand. The rainfall station selected for each region had the same average number of rainy days, so the probabilities of nonzero values were roughly the same. The results of this empirical application agreed with the simulation results in that the HPD interval based on the Jeffreys rule prior was the most appropriate. Hence, the approach may help to anticipate the ratio of rainfall in September of the following year in regions of Thailand that have a comparable chance of frequent rainfall. Therefore, it could support early warnings of natural disasters such as floods and landslides so that people can make preparations in advance.
6. Conclusions
Herein, six methods for constructing simultaneous CIs for the ratios of the means of multiple ZIG distributions are presented: the fiducial GCI approach, Bayesian and HPD interval approaches based on the Jeffreys rule or uniform prior, and the MOVER. The coverage probabilities and expected lengths from a simulation study indicate that the HPD interval based on the Jeffreys rule prior performed the best in most cases, while in some situations the fiducial GCI performed well, for both k = 3 and k = 6. Applying the methods to compare the rainfall datasets for September 2021 from six regions in Thailand showed that the HPD interval based on the Jeffreys rule prior and the fiducial GCI again performed the best, which is consistent with the simulation results. Hence, we recommend the HPD interval based on the Jeffreys rule prior for constructing simultaneous CIs for the ratios of the means of multiple ZIG datasets, with the fiducial GCI as an alternative for some applications. Researchers who are interested in analyzing rainfall means can use the R package we developed. Future studies will investigate other statistical parameters, such as the coefficient of variation, because they are important when making statistical inferences. In addition, we found that the coefficient of variation is a useful tool for evaluating rainfall dispersion. Few studies have been published on CIs for the coefficient of variation of a zero-inflated gamma population; therefore, we will investigate this in future work.