1. Introduction
Variance, the second central moment, measures the spread of a distribution and is widely used in probability and statistical inference. Many researchers have constructed confidence intervals for the variance of various distributions by using several methods. For example, Harvey and van der Merwe [1] proposed Bayesian confidence interval methods for the means and variances of lognormal and bivariate lognormal distributions. Niwitpong [2] suggested the generalized confidence interval approach for a function of the variance of a lognormal distribution. Puggard et al. [3] constructed confidence intervals for the variance and difference between the variances of several Birnbaum–Saunders distributions. Puggard et al. [4] proposed the confidence interval for comparing the variances of two independent Birnbaum–Saunders distributions.
Populations containing positive observations, such as environmental data, can be reasonably modeled by using a gamma distribution [5]. Gibbons and Coleman [6] pointed out that a gamma distribution is more appropriate than a normal distribution when variability and concentration are related, as is the case with many environmental datasets. However, rainfall data often contain zero observations, which violates the requirement of strictly positive data for gamma modeling, and so this must be taken into account when studying this phenomenon. Aitchison [7] provided guidelines for modeling populations containing zero observations whereby the probability of obtaining zeros ($\delta$) is constrained by $0 < \delta < 1$ while the positive observations comprise the remaining probability ($1 - \delta$). Later, Aitchison and Brown [8] coped with this issue by introducing the delta-lognormal distribution, in which the number of zero observations is viewed as a random variable with a binomial distribution while the positive observations are assumed to come from a lognormal distribution.
Many researchers have developed methods for constructing confidence intervals for various parameters of a delta-lognormal distribution. For example, Yosboonruang et al. [9] constructed the confidence interval for the coefficient of variation of a single delta-lognormal distribution. Maneerat et al. [10,11] suggested Bayesian confidence interval methods for the variance of a delta-lognormal distribution and the difference between the variances of delta-lognormal distributions and applied them to analyze rainfall dispersion. Maneerat et al. [12] used the Bayesian approach to construct the confidence interval for the ratio of the variances of delta-lognormal-distributed rainfall dispersion datasets in Thailand. Zhang et al. [13] proposed simultaneous confidence intervals for the ratio of the means of zero-inflated lognormal populations. In a slightly different approach, Ren et al. [14] proposed simultaneous confidence intervals for the difference between the means of multiple zero-inflated gamma distributions by using one exact and two approximate fiducial methods. They applied these methods to precipitation datasets and found that the exact method provided more accurate results. Muralidharan and Kale [15] proposed a modified gamma distribution with a singularity at zero and thereby obtained the confidence interval for the mean of the mixed distribution. Lecomte et al. [16] provided compound Poisson-gamma and delta-gamma distributions to handle zero-inflated continuous data under a variable sampling regime. Kaewprasert et al. [17] used Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand. Khooriphan et al. [18] proposed a Bayesian estimation of rainfall dispersion in Thailand using a gamma distribution with excess zeros. Wang et al. [19] proposed confidence interval methods for the parameters of a zero-inflated gamma distribution.
Because rainfall data containing zero observations can be modeled by using the delta-gamma distribution, the ratio of the variances of two such populations is a suitable measure for comparing rainfall dispersion in two areas. Thus, we constructed the confidence interval for the ratio of the variances of delta-gamma distributions by using six Bayesian approaches: Bayesian credible intervals based on the Jeffreys (BAY-J), uniform (BAY-U), or normal-gamma-beta (BAY-NGB) priors and highest posterior density (HPD) intervals based on the Jeffreys (HPD-J), uniform (HPD-U), or normal-gamma-beta (HPD-NGB) priors. We compared them with the fiducial quantity (FQ) approach.
This article is organized as follows. The theoretical background for the proposed methods for constructing the confidence interval for the ratio of variances of delta-gamma distributions is covered in Section 2. Simulation study parameters and results are presented in Section 3. The application of the methods to real datasets is reported in Section 4. Finally, conclusions based on the study are covered in Section 5.
2. The Confidence Interval for the Ratio of the Variances of Two Delta-Gamma Distributions
Let
be a random sample from a delta-gamma distribution, denoted as
. The distribution function of a delta-gamma can be derived as
where
stands for the gamma cumulative distribution function. The mean and variance of a gamma
distribution with shape parameter
and scale parameter
can be defined as
and
, respectively. The zero and non-zero observed values are denoted as
and
, respectively, where
. The zero observations follow a binomial distribution
while the non-zero observations follow a gamma distribution.
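The two-part structure described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `sample_delta_gamma`, the parameter values, and the seed are hypothetical, and the gamma part is assumed to use a shape and rate parameterization (converted to the scale that Python's `random.gammavariate` expects).

```python
import random

def sample_delta_gamma(n, delta, shape, rate, rng):
    """Draw n values: 0 with probability delta, otherwise Gamma(shape, rate)."""
    scale = 1.0 / rate  # random.gammavariate takes (shape, scale), not rate
    return [0.0 if rng.random() < delta else rng.gammavariate(shape, scale)
            for _ in range(n)]

rng = random.Random(2024)  # hypothetical seed, for reproducibility only
x = sample_delta_gamma(200, delta=0.3, shape=2.0, rate=1.0, rng=rng)

n0 = sum(1 for v in x if v == 0.0)     # number of zero observations
n1 = len(x) - n0                       # number of positive observations
positives = [v for v in x if v > 0.0]
delta_hat = n0 / len(x)                # sample proportion of zeros
mean_pos = sum(positives) / n1         # sample mean of the positive part
print(n0, n1, round(delta_hat, 3), round(mean_pos, 3))
```

The zero count and the positive observations are the two summaries that the later interval constructions work from.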
The maximum likelihood estimators of parameters and can be defined as
where
is the sample mean of
[20].
According to Aitchison [7], the population mean and variance of
can be written as
Subsequently, the ratio of the two variances becomes
The methods to construct the confidence interval for are proposed in the following sub-sections.
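As a numerical illustration of the quantity of interest, the sketch below evaluates the mean and variance of a zero-inflated gamma variable from the general mixture identities $E(X) = (1-\delta)\mu$ and $\mathrm{Var}(X) = (1-\delta)\sigma^2 + \delta(1-\delta)\mu^2$, where $\mu$ and $\sigma^2$ are the mean and variance of the positive part. It assumes a shape and rate parameterization of the gamma part ($\mu = \alpha/\beta$, $\sigma^2 = \alpha/\beta^2$); the function names and parameter values are hypothetical.

```python
import math

def delta_gamma_moments(delta, shape, rate):
    """Mean and variance of a variable that is 0 w.p. delta, else Gamma(shape, rate)."""
    mu = shape / rate            # mean of the positive (gamma) part
    sigma2 = shape / rate ** 2   # variance of the positive (gamma) part
    mean = (1.0 - delta) * mu
    var = (1.0 - delta) * sigma2 + delta * (1.0 - delta) * mu ** 2
    return mean, var

def variance_ratio(params1, params2):
    """Ratio of the two population variances; params are (delta, shape, rate)."""
    return delta_gamma_moments(*params1)[1] / delta_gamma_moments(*params2)[1]

mean1, var1 = delta_gamma_moments(0.2, 2.0, 1.0)
mean2, var2 = delta_gamma_moments(0.5, 2.0, 1.0)
ratio = variance_ratio((0.2, 2.0, 1.0), (0.5, 2.0, 1.0))
print(round(var1, 4), round(var2, 4), round(ratio, 4))
```

In this hypothetical setting the two populations share the same gamma part, so the ratio differs from 1 only because of the different zero probabilities.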
2.1. The Fiducial Quantity Method
Krishnamoorthy et al. [21] developed FQs based on cube-root-transforming a sample. Let
be a random sample from a delta-gamma distribution with shape parameter
and scale parameter
. For
, then
is approximately normally distributed with respective means and variances
and
given by
Consider the stochastic representations
where notation “
” means “distributed as”. Let
and
be the observed values of
and
, respectively;
is an independent random variable from a standard normal distribution; and
is an independent random variable from a Chi-squared distribution (
is the sample size). By solving the above equations for
and
, we arrive at
by replacing
and
with
and
respectively. Thus the FQs of
and
are
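In the above FQs, the observed statistics of the cube-root-transformed sample are fixed, whereas the standard normal and Chi-squared variates are random variables whose distributions do not depend on any parameters.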
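The construction above can be sketched in Python. This is a minimal sketch, not the article's exact equations: it assumes the standard representations $\bar{Y} \sim \mu + Z\sigma/\sqrt{n}$ and $S^2 \sim \sigma^2 U/(n-1)$ with $U$ Chi-squared on $n-1$ degrees of freedom, where $n$ is taken to be the number of cube-root-transformed observations. The function name and the summary values are hypothetical.

```python
import math
import random

def fq_normal_draws(ybar, s2, n, m, rng):
    """Return m fiducial draws (mu, sigma2) given observed ybar, s2 and size n."""
    draws = []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)                   # Z ~ N(0, 1)
        u = rng.gammavariate((n - 1) / 2.0, 2.0)  # U ~ Chi-squared(n - 1)
        sigma2 = (n - 1) * s2 / u                 # solve S^2 = sigma^2 U / (n - 1)
        mu = ybar - z * math.sqrt(sigma2 / n)     # solve Ybar = mu + Z sigma / sqrt(n)
        draws.append((mu, sigma2))
    return draws

# Hypothetical summary statistics of a cube-root-transformed sample.
ybar, s2, n = 1.20, 0.04, 50
draws = fq_normal_draws(ybar, s2, n, m=5000, rng=random.Random(11))
mean_mu = sum(d[0] for d in draws) / len(draws)
mean_sigma2 = sum(d[1] for d in draws) / len(draws)
print(round(mean_mu, 3), round(mean_sigma2, 4))
```

Only `z` and `u` are random in each draw; `ybar`, `s2`, and `n` stay fixed at their observed values.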
Meanwhile, the respective FQs for
are as follows [22]:
The FQs for the means are [5]
We can express the FQs for the variances as follows:
Let
. Thus, we can rewrite Equation (5) as
We can find
by solving the above equations for
. Thus, the FQs for the variances are obtained as:
where
and
are defined as in Equation (7).
Hence, the FQs for
and
become
Thus, the FQs for the ratio of the variances of two delta-gamma distributions can be derived as
Therefore, the equal-tailed
FQ interval for the ratio of variances can be defined by
where
and
are the
and
percentiles of the distribution of
, respectively.
The confidence interval for the ratio of variances
can be obtained by executing Algorithm 1.
Algorithm 1 FQ
1: For a given sample from , compute and of the cube-root-transformed sample.
2: Generate standard normal variate and Chi-squared variate .
3: Generate and .
4: Compute , , and from Equations (7) and (8).
5: Compute the FQs for the mean () and variance () of a gamma distribution from Equations (9) and (10), respectively.
6: Compute and from Equations (11) and (12), respectively.
7: Repeat Steps 2–6 5000 times and obtain an array of .
8: Compute the 95% confidence interval for from Equation (13).
9: Repeat Steps 1–8 10,000 times to compute the coverage probability (CP) and the average length (AL).
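Step 8 of Algorithm 1 reduces to taking empirical percentiles of the simulated array. The Python sketch below shows one simple way to do this. The function name is hypothetical, the percentile rule (order statistics without interpolation) is an assumption, and the lognormal draws are only a stand-in for the simulated FQ values of the variance ratio.

```python
import math
import random

def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed interval from the empirical percentiles of simulated draws."""
    xs = sorted(draws)
    tail = (1.0 - level) / 2.0
    lo_idx = int(math.floor(tail * (len(xs) - 1)))         # lower order statistic
    hi_idx = int(math.ceil((1.0 - tail) * (len(xs) - 1)))  # upper order statistic
    return xs[lo_idx], xs[hi_idx]

# Stand-in draws (NOT the article's FQ): a right-skewed positive quantity.
rng = random.Random(5)
fq_array = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
lower, upper = equal_tailed_interval(fq_array, level=0.95)
print(round(lower, 3), round(upper, 3))
```

With 5000 draws, as in Step 7, the two endpoints are the order statistics nearest the 2.5th and 97.5th percentiles.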
2.2. The Bayesian Methods
The Bayesian credible interval involves estimating the parameter of interest from the posterior distribution [23], while HPD intervals are Bayesian intervals in which every parameter value inside the credible region has a higher posterior density than any value outside it [24]. The HPD interval is regarded as the narrowest possible interval for the parameter of interest for probability
[25]. Box and Tiao [26] defined the HPD region as follows:
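Definition 1.
Let $p(\theta \mid x)$ be a posterior density function. A region $R$ in the parameter space of $\theta$ is called an HPD region of content $1 - \gamma$ if
(i) $P(\theta \in R \mid x) = 1 - \gamma$,
(ii) for $\theta_1 \in R$ and $\theta_2 \notin R$, $p(\theta_1 \mid x) \geq p(\theta_2 \mid x)$.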
In other words, the HPD interval has two main properties: for a given probability level, the interval has the narrowest length, and every point within the interval has a higher probability density than the points outside of it.
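When only posterior draws are available, a common numerical approximation to the HPD interval is the shortest window of sorted draws that contains the required share of them. The Python sketch below illustrates this under the assumption of a unimodal posterior. The function name is hypothetical, and the gamma draws are a stand-in for posterior samples, not output of any method in this article.

```python
import math
import random

def hpd_interval(draws, level=0.95):
    """Shortest interval containing a `level` share of the sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = int(math.ceil(level * n))  # number of draws the interval must contain
    # Slide a window of k consecutive order statistics and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

# Stand-in posterior draws from a right-skewed distribution.
rng = random.Random(3)
draws = [rng.gammavariate(2.0, 1.0) for _ in range(5000)]
lower, upper = hpd_interval(draws, level=0.95)
print(round(lower, 3), round(upper, 3))
```

For a skewed posterior such as this one, the window shifts toward the mode, which is why it is no longer than the equal-tailed interval built from the same draws.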
In this section, the Bayesian credible interval approaches based on the Jeffreys, uniform, and normal-gamma-beta priors are presented.
2.2.1. The Bayesian Methods Using the Jeffreys Prior
The Jeffreys prior is proportional to the square root of the determinant of the Fisher information matrix [27]; i.e.,
. Let
be random samples from a delta-gamma distribution. For
, then
is approximately normally distributed with mean
and variance
. The delta-gamma distribution for three unknown parameters can be denoted as
with likelihood function
Therefore, the Fisher information for becomes
Bolstad and Curran [25] defined the Jeffreys prior for
in a binomial distribution as
. This allows us to obtain the marginal posterior distribution of
as
Subsequently, the Jeffreys prior for
in a lognormal distribution is
. Therefore, the marginal posterior distribution of
is
and the marginal posterior distribution of
is
We compute the mean and variance of the gamma distribution by using
and
, respectively, as follows:
Subsequently,
such that
The Bayesian credible interval and HPD interval for the ratio of variances of delta-gamma distributions are respectively obtained as
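For the binomial part, the Jeffreys prior is the Beta(1/2, 1/2) distribution, so by conjugacy the posterior of the zero probability is again a beta distribution. The Python sketch below draws from it, assuming the parameter denotes the probability of a zero observation, so that the posterior is Beta($n_0$ + 1/2, $n_1$ + 1/2) for $n_0$ zeros and $n_1$ positive values. The function name and the counts are hypothetical.

```python
import random

def draw_zero_prob_posterior(n0, n1, m, rng, a0=0.5, b0=0.5):
    """Draw m values from the Beta(n0 + a0, n1 + b0) posterior of the zero probability.

    a0 = b0 = 0.5 corresponds to the Jeffreys prior for a binomial proportion.
    """
    return [rng.betavariate(n0 + a0, n1 + b0) for _ in range(m)]

# Hypothetical counts: 12 zero and 38 positive observations.
n0, n1 = 12, 38
post = draw_zero_prob_posterior(n0, n1, m=5000, rng=random.Random(9))
post_mean = sum(post) / len(post)
print(round(post_mean, 3))
```

Each draw would then be combined with draws for the gamma part to produce one posterior draw of the variance of the delta-gamma distribution.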
2.2.2. The Bayesian Methods Using the Uniform Prior
The uniform prior assigns equal prior probability to all possible values [28], so the prior density is a constant function [29]. The uniform prior for
in a binomial distribution is
[25], which leads to the marginal posterior distribution of
as
Kalkur and Rao [30] defined the uniform prior of
as
, with the marginal posterior distribution of
being
Similarly, the marginal posterior distribution of
is
We can compute the mean and variance of gamma by using
and
, respectively, as follows:
Hence,
such that
Thus, the Bayesian credible interval and HPD interval for the ratio of the variances of two delta-gamma distributions are respectively obtained as
2.2.3. The Bayesian Methods Using the Normal-Gamma-Beta Prior
Maneerat and Niwitpong [31] proposed an HPD-NGB interval for the common mean of several delta-lognormal distributions, which performed well for small-to-large sample sizes and outperformed the HPD-J interval derived by Harvey and van der Merwe [1]. Let
be a random variable from a normal distribution with mean
and precision
where
and
. The HPD based on the normal-gamma-beta prior of
is defined as
, where
follows a normal-gamma distribution, and
follows a beta distribution. Thus, the respective marginal posterior distributions of
,
and
are as follows:
where
denotes a Student’s t distribution with
degrees of freedom.
We compute the mean and variance of gamma using
and
, respectively, as follows:
Hence,
such that
The credible interval and HPD interval based on the BAY-NGB and HPD-NGB methods for the ratio of variances of delta-gamma distributions are respectively obtained as
The confidence interval for the ratio of variances
can be obtained by executing Algorithm 2.
Algorithm 2 Bayesian interval
1: Generate and compute and of the cube-root-transformed sample.
2: Generate from Equations (14), (22), and (30).
3: Generate from Equations (15), (23), and (31).
4: Generate from Equations (16), (24), and (32).
5: Compute the mean and variance of a gamma distribution.
6: Compute and .
7: Compute the 95% credible intervals and HPD intervals for by using Equations (21), (29), and (37).
8: Repeat Steps 1–7 10,000 times to compute the CP and the AL.
3. The Simulation Study and Results
A simulation study was conducted to evaluate the confidence intervals for the ratio of the variances of two independent delta-gamma distributions constructed by using the proposed methods. It was run in R statistical software version 4.1.0 with 10,000 replications (M), 5000 repetitions (m) for FQ, and the nominal confidence level set as 0.95. For equal sample sizes , we used (30,30), (50,50), (100,100), or (200,200), and for unequal sample sizes , we used (30,50), (50,100), or (100,200). For the two probabilities of data containing zeros , we set shape parameters as (7.00,7.00), (7.00,7.50), (7.50,7.00), or (7.50,7.50); for , we set as (2.00,2.00), (2.00,2.50), (2.50,2.00), or (2.50,2.50); and for , we set as (1.25,1.25), (1.25,1.50), (1.50,1.25), or (1.50,1.50); we set rate parameters as (1,1) for all cases. The performances of FQ, BAY-J, HPD-J, BAY-U, HPD-U, BAY-NGB, and HPD-NGB were assessed by comparing their CPs and ALs; the best-performing method for a particular scenario is the one with a CP close to or greater than 0.95 and the shortest AL.
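The CP and AL bookkeeping used in such a study can be sketched as follows. The sketch is in Python rather than the R code used for the study, and it is not one of the proposed methods: the interval under test is a toy z-interval for a normal mean with known unit variance, chosen only because its true coverage and length are known. All function names, the sample size, and the seed are hypothetical.

```python
import math
import random

def coverage_and_length(simulate, interval, truth, reps, rng):
    """Monte Carlo estimates of coverage probability (CP) and average length (AL)."""
    covered = 0
    total_length = 0.0
    for _ in range(reps):
        lower, upper = interval(simulate(rng))
        if lower <= truth <= upper:
            covered += 1
        total_length += upper - lower
    return covered / reps, total_length / reps

N = 30  # hypothetical sample size for the toy example

def simulate_normal(rng):
    """Toy data generator: N draws from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(N)]

def z_interval(sample):
    """Toy 95% interval for the mean, assuming the variance is known to be 1."""
    centre = sum(sample) / len(sample)
    half_width = 1.959964 / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

cp, al = coverage_and_length(simulate_normal, z_interval, truth=0.0,
                             reps=2000, rng=random.Random(7))
print(round(cp, 3), round(al, 4))
```

Replacing `simulate_normal` and `z_interval` with a delta-gamma sampler and one of the interval methods would give the CP and AL for that method at one parameter setting.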
The findings show that FQ, HPD-J, HPD-U, BAY-NGB, and HPD-NGB attained CPs greater than or close to the nominal confidence level of 0.95. For small-to-moderate sample sizes, FQ, BAY-NGB, and HPD-NGB performed well for both small and large
whereas the HPD-J and HPD-U performed well for small
. For large
, the ALs of HPD-NGB were the shortest. For large sample sizes, FQ and HPD-J performed well for small
whereas HPD-U, BAY-NGB, and HPD-NGB performed well for large
. For small
, the ALs of FQ and HPD-J were shorter than those of the other methods whereas for large
, the ALs of HPD-NGB were the shortest. The results in Figure 1 and Figure 3 reveal that FQ, BAY-NGB, and HPD-NGB performed well in almost all cases. Figure 2 and Figure 4 show that BAY-J and HPD-J provided the shortest ALs.
Maneerat et al. [12] proposed the confidence interval for the ratio of the variances of two delta-lognormal distributions using an HPD based on the normal-gamma prior (HPD-NG), as well as the method of variance estimates recovery (MOVER). These proposed methods were compared with the existing HPD-J, the HPD based on the Jeffreys’ rule prior, the generalized confidence interval (GCI), and the fiducial GCI. They found that HPD-NG performed very well in various situations while MOVER could be recommended for scenarios with small equal sample sizes. From the simulation results of the present study, it can be seen that HPD-NGB performed well for moderate-to-large sample sizes, while HPD-J and HPD-NGB both performed well for small-to-large sample sizes. Hence, both methods can be recommended for constructing the confidence interval for the ratio of the variances of two delta-gamma distributions.