The Bayesian Posterior and Marginal Densities of the Hierarchical Gamma–Gamma, Gamma–Inverse Gamma, Inverse Gamma–Gamma, and Inverse Gamma–Inverse Gamma Models with Conjugate Priors

Zhang, Li; Zhang, Ying-Ying

doi:10.3390/math10214005


by Li Zhang ^1,† and Ying-Ying Zhang ^1,2,3,*,†

¹ Department of Statistics and Actuarial Science, College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China

² Chongqing Key Laboratory of Analytic Mathematics and Applications, Chongqing University, Chongqing 401331, China

³ Department of Statistics, School of Mathematics and Statistics, Yunnan University, Kunming 650500, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2022, 10(21), 4005; https://doi.org/10.3390/math10214005

Submission received: 3 October 2022 / Revised: 20 October 2022 / Accepted: 24 October 2022 / Published: 28 October 2022

(This article belongs to the Section D1: Probability and Statistics)


Abstract

Positive, continuous, and right-skewed data are fit by a mixture of gamma and inverse gamma distributions. Of the 16 hierarchical models of gamma and inverse gamma distributions, only 8 have conjugate priors. We first discuss some typical problems shared by the eight hierarchical models that do not have conjugate priors. Then, we calculate the Bayesian posterior densities and marginal densities of the eight hierarchical models that have conjugate priors. After that, we discuss the relations among the eight analytical marginal densities. Furthermore, we find some relations among the random variables of the marginal densities and the beta densities. Moreover, we discuss random variable generation for the gamma and inverse gamma distributions by using the R software. In addition, some numerical simulations are performed to illustrate four aspects: the plots of marginal densities, the generation of random variables from the marginal density, the transformations of the moment estimators of the hyperparameters of a hierarchical model, and the conclusions about the properties of the eight marginal densities that do not have a closed form. Finally, we illustrate our method with a real data example, in which the original and transformed data are fit by the marginal density with different hyperparameters.

Keywords: conjugate prior; gamma and inverse gamma distribution; hierarchical model and mixture distribution; marginal density; posterior density

MSC: 62C10; 62F15; 93A13

1. Introduction

Mixture distribution refers to a distribution arising from a hierarchical structure. According to [1], a random variable X is said to have a mixture distribution if the distribution of X depends on a quantity that also has a distribution. In general, a hierarchical model will lead to a mixture distribution. In Bayesian analysis, we have a likelihood and a prior, and they naturally assemble into a hierarchical model. Therefore, the likelihood and prior naturally lead to a mixture distribution, which is the marginal distribution of the hierarchical model. Important hierarchical models or mixture distributions include binomial Poisson (also known as the Poisson binomial distribution; see [2,3,4,5,6]), binomial–negative binomial ([1]), Poisson gamma ([7,8,9,10,11,12,13,14]), binomial beta (also known as the beta binomial distribution; see [15,16,17,18,19]), negative binomial beta (also known as the beta negative binomial distribution; see [20,21,22,23,24]), multinomial Dirichlet ([25,26,27,28,29]), Chi-squared Poisson ([1]), normal–normal ([1,30,31,32,33,34]), normal–inverse gamma ([18,30,35,36]), normal–normal inverse gamma ([30,37,38,39]), gamma–gamma ([35]), inverse gamma–inverse gamma ([40]), and many others. See also [1,30,35] and the references therein.

By introducing new parameters, several researchers have proposed generalizations of the two-parameter gamma distribution, including [41,42,43,44]. Using the generalized gamma function of [45], ref. [44] defined the generalized gamma-type distribution with four parameters, based on which [46] introduced a new type of three-parameter finite mixture of gamma distributions, which can be regarded as mixing the shape parameter of the gamma distribution by a discrete distribution.

However, the present paper does not follow the direction of the literature in the previous paragraph. Instead, it follows the line of [35,40]. Reference [35] mixed the rate parameter of a gamma distribution by a gamma distribution. Reference [40] mixed the rate parameter of an inverse gamma distribution by an inverse gamma distribution. In this paper, we mix the scale or rate parameter of a gamma or inverse gamma distribution by a gamma or inverse gamma distribution. It is worth mentioning that the resulting mixture distributions are no longer gamma or inverse gamma distributions.

A positive continuous datum can be modeled by one of the gamma or inverse gamma distributions $G(ν, θ)$, $\tilde{G}(ν, θ)$, $IG(ν, θ)$, and $\widetilde{IG}(ν, θ)$, where $ν$ is a shape parameter and $θ$, the unknown parameter of interest, is a scale or rate parameter. Suppose that $θ$ in turn has one of the gamma or inverse gamma priors $G(α, β)$, $\tilde{G}(α, β)$, $IG(α, β)$, and $\widetilde{IG}(α, β)$. The definitions of $G$, $\tilde{G}$, $IG$, and $\widetilde{IG}$ are given in Section 2.1. We then have $16 = 4 \times 4$ hierarchical models. The novel contributions of this paper are as follows. To the best of our knowledge, no existing work considers the 16 hierarchical models as a whole. Among the 16 hierarchical models of the gamma and inverse gamma distributions, we found that only 8 have conjugate priors, namely $G$–$IG$, $G$–$\widetilde{IG}$, $\tilde{G}$–$G$, $\tilde{G}$–$\tilde{G}$, $IG$–$IG$, $IG$–$\widetilde{IG}$, $\widetilde{IG}$–$G$, and $\widetilde{IG}$–$\tilde{G}$. We then calculated the posterior densities and marginal densities of these eight hierarchical models that have conjugate priors. Moreover, some numerical simulations were performed to illustrate four aspects: the plots of marginal densities, the generation of random variables from the marginal density, the transformations of the moment estimators of the hyperparameters of a hierarchical model, and the conclusions about the properties of the eight marginal densities that do not have a closed form. Finally, we investigated a real data example, in which the original and transformed data, the closing prices of the Shanghai, Shenzhen, and Beijing (SSB) A shares, were fit by the marginal density with different hyperparameters.

The rest of the paper is organized as follows. In Section 2, we first provide some preliminaries. Then, we discuss some typical problems shared by the eight hierarchical models that do not have conjugate priors. After that, we analytically calculate the Bayesian posterior and marginal densities of the eight hierarchical models that have conjugate priors. The relations among the marginal densities, and the relations among the random variables of the marginal densities and the beta densities, are also given in that section. Moreover, we discuss random variable generation for the gamma and inverse gamma distributions by using the R software. In Section 3, numerical simulations are performed to illustrate four aspects. In Section 4, we illustrate our method with a real data example, in which the original and transformed data are fit by the marginal density with different hyperparameters. Finally, some conclusions and discussions are provided in Section 5.

2. Main Results

In this section, we first provide some preliminaries. Then, we discuss some common typical problems for the eight hierarchical models that do not have conjugate priors. After that, we analytically calculate the Bayesian posterior and marginal densities of the eight hierarchical models that have conjugate priors. Furthermore, we find some relations among the marginal densities. Moreover, we discover some relations among the random variables of the marginal densities and the beta densities. Finally, we discuss random variable generations for the gamma and inverse gamma distributions by using the R software.

2.1. Preliminaries

In this subsection, we provide some preliminaries. We first give the probability density functions (pdfs) of the $G(α, β)$, $IG(α, β)$, $\tilde{G}(α, β)$, and $\widetilde{IG}(α, β)$ distributions. After that, we summarize the 16 hierarchical models of the gamma and inverse gamma distributions in a table. Note that only 8 of these hierarchical models have conjugate priors; the other 8 do not.

Suppose that $X \sim G(α, β)$ and $Y = 1/X \sim IG(α, β)$. The pdfs of $X$ and $Y$ are, respectively, given by

$$\begin{aligned} f_{X}(x \mid α, β) &= \frac{1}{Γ(α) β^{α}} x^{α - 1} \exp\left(-\frac{x}{β}\right), \quad x > 0, \ α, β > 0, \\ f_{Y}(y \mid α, β) &= \frac{1}{Γ(α) β^{α}} \left(\frac{1}{y}\right)^{α + 1} \exp\left(-\frac{1}{β y}\right), \quad y > 0, \ α, β > 0. \end{aligned}$$

Suppose that $X \sim \tilde{G}(α, β)$ and $Y = 1/X \sim \widetilde{IG}(α, β)$. The pdfs of $X$ and $Y$ are, respectively, given by

$$\begin{aligned} f_{X}(x \mid α, β) &= \frac{β^{α}}{Γ(α)} x^{α - 1} \exp(-β x), \quad x > 0, \ α, β > 0, \\ f_{Y}(y \mid α, β) &= \frac{β^{α}}{Γ(α)} \left(\frac{1}{y}\right)^{α + 1} \exp\left(-\frac{β}{y}\right), \quad y > 0, \ α, β > 0. \end{aligned}$$
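The four pdfs above can be written down directly in code. The following is a minimal sketch in Python (the paper itself works in R); the function names `pdf_G`, `pdf_IG`, `pdf_G_tilde`, `pdf_IG_tilde`, and `reciprocal_gap` are hypothetical and chosen only for this illustration. The helper checks the change-of-variables identity $f_{Y}(y) = f_{X}(1/y) / y^{2}$ that links each gamma density to its inverse gamma counterpart.

```python
import math


def pdf_G(x, alpha, beta):
    """G(alpha, beta): gamma pdf with shape alpha and scale beta."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)


def pdf_IG(y, alpha, beta):
    """IG(alpha, beta): pdf of Y = 1/X with X ~ G(alpha, beta)."""
    return (1 / y) ** (alpha + 1) * math.exp(-1 / (beta * y)) / (math.gamma(alpha) * beta ** alpha)


def pdf_G_tilde(x, alpha, beta):
    """G~(alpha, beta): gamma pdf with shape alpha and rate beta."""
    return beta ** alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


def pdf_IG_tilde(y, alpha, beta):
    """IG~(alpha, beta): pdf of Y = 1/X with X ~ G~(alpha, beta)."""
    return beta ** alpha * (1 / y) ** (alpha + 1) * math.exp(-beta / y) / math.gamma(alpha)


def reciprocal_gap(y, alpha, beta):
    """Largest deviation from f_Y(y) = f_X(1/y) / y^2 over the two pairs."""
    gap_plain = abs(pdf_IG(y, alpha, beta) - pdf_G(1 / y, alpha, beta) / y ** 2)
    gap_tilde = abs(pdf_IG_tilde(y, alpha, beta) - pdf_G_tilde(1 / y, alpha, beta) / y ** 2)
    return max(gap_plain, gap_tilde)
```

Calling `reciprocal_gap(0.7, 2.5, 1.5)` should return a value at the level of floating-point rounding, which is one quick way to confirm that the two parameterizations have been transcribed consistently.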

The 16 hierarchical models of the gamma and inverse gamma distributions are summarized in Table 1. However, only eight hierarchical models have conjugate priors, namely $G$–$IG$, $G$–$\widetilde{IG}$, $\tilde{G}$–$G$, $\tilde{G}$–$\tilde{G}$, $IG$–$IG$, $IG$–$\widetilde{IG}$, $\widetilde{IG}$–$G$, and $\widetilde{IG}$–$\tilde{G}$, where the former density represents the likelihood and the latter density represents the prior, that is, likelihood–prior. Note that the eight hierarchical models that have conjugate priors are highlighted in boxes in the table. The posterior densities and marginal densities of these eight hierarchical models will be calculated later.

The other eight hierarchical models in Table 1, namely $G$–$G$, $G$–$\tilde{G}$, $\tilde{G}$–$IG$, $\tilde{G}$–$\widetilde{IG}$, $IG$–$G$, $IG$–$\tilde{G}$, $\widetilde{IG}$–$IG$, and $\widetilde{IG}$–$\widetilde{IG}$, do not have conjugate priors. Therefore, the posterior densities and marginal densities of these eight hierarchical models cannot be recognized. In particular, the posterior densities are neither gamma nor inverse gamma distributions. Moreover, the marginal densities of these eight hierarchical models cannot be obtained in analytical form, although they are proper densities, that is, they integrate to 1.

2.2. Common Typical Problems for the Eight Hierarchical Models That Do Not Have Conjugate Priors

In this subsection, we discuss some typical problems shared by the eight hierarchical models that do not have conjugate priors, using the hierarchical $G$–$G$ model as an example.

Let us calculate the posterior density and marginal density for the hierarchical $G$–$G$ model:

$$\begin{cases} X \mid θ \sim G(ν, θ), \\ θ \sim G(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters. It is easy to obtain

$$\begin{aligned} f(x \mid θ) &= \frac{1}{Γ(ν) θ^{ν}} x^{ν - 1} \exp\left(-\frac{x}{θ}\right), \\ π(θ) &= \frac{1}{Γ(α) β^{α}} θ^{α - 1} \exp\left(-\frac{θ}{β}\right). \end{aligned}$$

By Bayes' theorem, we have

$$\begin{aligned} π(θ \mid x) &\propto f(x \mid θ) π(θ) \\ &\propto \frac{1}{θ^{ν}} \exp\left(-\frac{x}{θ}\right) θ^{α - 1} \exp\left(-\frac{θ}{β}\right) \\ &= θ^{α - ν - 1} \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right) \\ &= \exp\left(\log θ^{α - ν - 1}\right) \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right) \\ &= \exp\left[(α - ν - 1) \log θ - \frac{x}{θ} - \frac{θ}{β}\right], \end{aligned}$$

which cannot be recognized as a familiar density. In particular, it is neither a gamma distribution nor an inverse gamma distribution. However, $π(θ \mid x)$ belongs to an exponential family (see [1]). Moreover, the marginal density of $X$ is

$$\begin{aligned} m(x \mid α, β, ν) &= \int_{0}^{\infty} f(x, θ) \, dθ \\ &= \int_{0}^{\infty} f(x \mid θ) π(θ) \, dθ \\ &= \int_{0}^{\infty} \frac{1}{Γ(ν) θ^{ν}} x^{ν - 1} \exp\left(-\frac{x}{θ}\right) \frac{1}{Γ(α) β^{α}} θ^{α - 1} \exp\left(-\frac{θ}{β}\right) dθ \\ &= \int_{0}^{\infty} \frac{1}{Γ(ν) Γ(α)} \frac{1}{β^{α}} x^{ν - 1} θ^{α - ν - 1} \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right) dθ \\ &= \frac{1}{Γ(ν) Γ(α)} \frac{1}{β^{α}} x^{ν - 1} \int_{0}^{\infty} θ^{α - ν - 1} \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right) dθ. \end{aligned} \tag{1}$$

The above integral cannot be analytically calculated, and thus, the marginal density cannot be obtained in its analytical form. However, $m(x \mid α, β, ν)$ is a proper density, as

$$\begin{aligned} \int_{0}^{\infty} m(x \mid α, β, ν) \, dx &= \int_{0}^{\infty} \int_{0}^{\infty} f(x, θ) \, dθ \, dx \\ &= \int_{0}^{\infty} \int_{0}^{\infty} f(x, θ) \, dx \, dθ \\ &= \int_{0}^{\infty} \int_{0}^{\infty} f(x \mid θ) π(θ) \, dx \, dθ \\ &= \int_{0}^{\infty} \left[\int_{0}^{\infty} f(x \mid θ) \, dx\right] π(θ) \, dθ \\ &= \int_{0}^{\infty} π(θ) \, dθ \\ &= 1, \end{aligned}$$

since $f(x \mid θ)$ and $π(θ)$ are proper densities.

In the above calculations, we encountered some typical problems shared by the eight hierarchical models that do not have conjugate priors. The problem for $π(θ \mid x)$ is that its kernel is

$$θ^{α - ν - 1} \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right),$$

which is neither a gamma kernel nor an inverse gamma kernel. The problem for $m(x \mid α, β, ν)$ is that the integral

$$\int_{0}^{\infty} θ^{α - ν - 1} \exp\left(-\frac{x}{θ} - \frac{θ}{β}\right) dθ$$

cannot be analytically calculated, as the integrand cannot be recognized as a familiar kernel. We remark that the above integral and the marginal density given by (1) can be evaluated numerically with the R built-in function integrate(), which performs adaptive quadrature of functions of one variable over a finite or infinite interval.
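As a rough illustration of such a numerical evaluation, the Python sketch below approximates the integral in (1) with a plain trapezoidal rule after the substitution $θ = e^{u}$. It is not the authors' R code and does not reproduce `integrate()`; the function names `gg_integral` and `gg_marginal`, the truncation limits, and the step count are assumptions made for this example only.

```python
import math


def gg_integral(x, alpha, beta, nu, lo=-40.0, hi=40.0, steps=4000):
    """Approximate int_0^inf theta^(alpha-nu-1) exp(-x/theta - theta/beta) d(theta).

    With theta = exp(u), d(theta) = exp(u) du, so the integrand becomes
    exp((alpha - nu) * u - x * exp(-u) - exp(u) / beta), which decays very
    quickly in both tails. The limits lo and hi are assumed wide enough.
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        log_g = (alpha - nu) * u - x * math.exp(-u) - math.exp(u) / beta
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * math.exp(log_g)
    return h * total


def gg_marginal(x, alpha, beta, nu):
    """Marginal density (1) of the G-G model, evaluated numerically."""
    const = x ** (nu - 1) / (math.gamma(nu) * math.gamma(alpha) * beta ** alpha)
    return const * gg_integral(x, alpha, beta, nu)
```

For instance, `gg_marginal(0.8, 1.5, 2.0, 1.0)` evaluates the marginal density at $x = 0.8$ for $α = 1.5$, $β = 2$, and $ν = 1$. The log-scale substitution is a design choice for this sketch; very small $x$ or extreme hyperparameters may call for wider limits or an adaptive rule.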

2.3. The Bayesian Posterior and Marginal Densities of the Eight Hierarchical Models That Have Conjugate Priors

In this subsection, we calculate the Bayesian posterior and marginal densities of the eight hierarchical models that have conjugate priors, that is, $G$–$IG$, $G$–$\widetilde{IG}$, $\tilde{G}$–$G$, $\tilde{G}$–$\tilde{G}$, $IG$–$IG$, $IG$–$\widetilde{IG}$, $\widetilde{IG}$–$G$, and $\widetilde{IG}$–$\tilde{G}$. The straightforward but lengthy calculations can be found in the Supplementary Materials.

Model 1. Suppose that we observe $X$ from the hierarchical $G$–$IG$ model:

$$\begin{cases} X \mid θ \sim G(ν, θ), \\ θ \sim IG(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim IG(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \frac{β}{x β + 1}.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{ν} \left(\frac{1}{x β + 1}\right)^{α}, \quad x > 0, \ α, β, ν > 0.$$
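The calculation behind Model 1 is short enough to sketch here. The steps below are a reconstruction from the pdfs in Section 2.1, not a copy of the authors' Supplementary derivation; they use only the $G(ν, θ)$ likelihood, the $IG(α, β)$ prior, and the normalizing constant of the $IG$ density.

```latex
% Posterior: collect the powers of 1/theta and the exponent.
\begin{aligned}
π(θ \mid x) &\propto f(x \mid θ)\, π(θ)
  \propto \frac{1}{θ^{ν}} \exp\left(-\frac{x}{θ}\right)
          \left(\frac{1}{θ}\right)^{α + 1} \exp\left(-\frac{1}{β θ}\right) \\
 &= \left(\frac{1}{θ}\right)^{(α + ν) + 1}
    \exp\left[-\frac{1}{θ}\left(x + \frac{1}{β}\right)\right],
\end{aligned}
% This is the IG(α*, β*) kernel with α* = α + ν and 1/β* = x + 1/β,
% that is, β* = β / (xβ + 1).

% Marginal: the θ-integral equals the IG normalizing constant Γ(α*) (β*)^{α*}.
\begin{aligned}
m(x \mid α, β, ν)
 &= \frac{x^{ν - 1}}{Γ(ν) Γ(α) β^{α}}
    \int_{0}^{\infty} \left(\frac{1}{θ}\right)^{α^{*} + 1}
    \exp\left(-\frac{1}{β^{*} θ}\right) dθ
  = \frac{x^{ν - 1}}{Γ(ν) Γ(α) β^{α}}\, Γ(α^{*}) \left(β^{*}\right)^{α^{*}} \\
 &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)}\, \frac{x^{ν - 1} β^{ν}}{(x β + 1)^{α + ν}}
  = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right)
    \left(\frac{x β}{x β + 1}\right)^{ν} \left(\frac{1}{x β + 1}\right)^{α}.
\end{aligned}
```

The other seven conjugate models follow the same pattern: match the posterior kernel to a gamma or inverse gamma kernel, then read the marginal off the corresponding normalizing constant.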

Model 2. Suppose that we observe $X$ from the hierarchical $G$–$\widetilde{IG}$ model:

$$\begin{cases} X \mid θ \sim G(ν, θ), \\ θ \sim \widetilde{IG}(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim \widetilde{IG}(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = x + β.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{ν} \left(\frac{β}{x + β}\right)^{α}, \quad x > 0, \ α, β, ν > 0.$$

Model 3. Suppose that we observe $X$ from the hierarchical $\tilde{G}$–$G$ model:

$$\begin{cases} X \mid θ \sim \tilde{G}(ν, θ), \\ θ \sim G(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim G(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \frac{β}{x β + 1}.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{ν} \left(\frac{1}{x β + 1}\right)^{α}, \quad x > 0, \ α, β, ν > 0.$$

Model 4. Suppose that we observe $X$ from the hierarchical $\tilde{G}$–$\tilde{G}$ model:

$$\begin{cases} X \mid θ \sim \tilde{G}(ν, θ), \\ θ \sim \tilde{G}(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim \tilde{G}(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = x + β.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{ν} \left(\frac{β}{x + β}\right)^{α}, \quad x > 0, \ α, β, ν > 0.$$

Model 5. Suppose that we observe $X$ from the hierarchical $IG$–$IG$ model:

$$\begin{cases} X \mid θ \sim IG(ν, θ), \\ θ \sim IG(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim IG(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \left(\frac{1}{x} + \frac{1}{β}\right)^{-1}.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{α} \left(\frac{β}{x + β}\right)^{ν}, \quad x > 0, \ α, β, ν > 0.$$

Model 6. Suppose that we observe $X$ from the hierarchical $IG$–$\widetilde{IG}$ model:

$$\begin{cases} X \mid θ \sim IG(ν, θ), \\ θ \sim \widetilde{IG}(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim \widetilde{IG}(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \frac{1}{x} + β.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{α} \left(\frac{1}{x β + 1}\right)^{ν}, \quad x > 0, \ α, β, ν > 0.$$

Model 7. Suppose that we observe $X$ from the hierarchical $\widetilde{IG}$–$G$ model:

$$\begin{cases} X \mid θ \sim \widetilde{IG}(ν, θ), \\ θ \sim G(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim G(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \left(\frac{1}{x} + \frac{1}{β}\right)^{-1}.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{α} \left(\frac{β}{x + β}\right)^{ν}, \quad x > 0, \ α, β, ν > 0.$$

Model 8. Suppose that we observe $X$ from the hierarchical $\widetilde{IG}$–$\tilde{G}$ model:

$$\begin{cases} X \mid θ \sim \widetilde{IG}(ν, θ), \\ θ \sim \tilde{G}(α, β), \end{cases}$$

where $α > 0$, $β > 0$, and $ν > 0$ are hyperparameters.

The posterior density of $θ$ is

$$π(θ \mid x) \sim \tilde{G}(α^{*}, β^{*}),$$

where

$$α^{*} = α + ν \quad \text{and} \quad β^{*} = \frac{1}{x} + β.$$

Moreover, the marginal density of $X$ is

$$m(x \mid α, β, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{α} \left(\frac{1}{x β + 1}\right)^{ν}, \quad x > 0, \ α, β, ν > 0.$$

2.4. Relations among the Marginal Densities

In this subsection, we find some relations among the marginal densities.

It is not difficult to see that the marginal densities of the eight hierarchical models that have conjugate priors can be divided into four types, denoted as $m_{i}(x \mid α, β, ν)$ $(i = 1, 2, 3, 4)$, where

$$\begin{aligned} m_{1}(x \mid α, β, ν) &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{ν} \left(\frac{1}{x β + 1}\right)^{α}, && (G\text{–}IG, \ \tilde{G}\text{–}G), \\ m_{2}(x \mid α, β, ν) &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{ν} \left(\frac{β}{x + β}\right)^{α}, && (G\text{–}\widetilde{IG}, \ \tilde{G}\text{–}\tilde{G}), \\ m_{3}(x \mid α, β, ν) &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β}\right)^{α} \left(\frac{β}{x + β}\right)^{ν}, && (IG\text{–}IG, \ \widetilde{IG}\text{–}G), \\ m_{4}(x \mid α, β, ν) &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β}{x β + 1}\right)^{α} \left(\frac{1}{x β + 1}\right)^{ν}, && (IG\text{–}\widetilde{IG}, \ \widetilde{IG}\text{–}\tilde{G}), \end{aligned} \tag{2}$$

for $x > 0$ and $α, β, ν > 0$.

There are some associations among the marginal densities of the hierarchical models with the same likelihood function and different prior distributions, for example, $G$–$IG$ and $G$–$\widetilde{IG}$, $\tilde{G}$–$G$ and $\tilde{G}$–$\tilde{G}$, $IG$–$IG$ and $IG$–$\widetilde{IG}$, and $\widetilde{IG}$–$G$ and $\widetilde{IG}$–$\tilde{G}$. Suppose that $X \sim m_{3}(x \mid α, β_{3}, ν)$ and $β_{4} = 1/β_{3}$; it is easy to see that

$$\begin{aligned} m_{3}(x \mid α, β_{3}, ν) &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + β_{3}}\right)^{α} \left(\frac{β_{3}}{x + β_{3}}\right)^{ν} \\ &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x}{x + \frac{1}{β_{4}}}\right)^{α} \left(\frac{\frac{1}{β_{4}}}{x + \frac{1}{β_{4}}}\right)^{ν} \\ &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right) \left(\frac{x β_{4}}{x β_{4} + 1}\right)^{α} \left(\frac{1}{x β_{4} + 1}\right)^{ν} \\ &= m_{4}(x \mid α, β_{4}, ν). \end{aligned}$$

Similarly,

$$m_{1}(x \mid α, β_{1}, ν) = m_{2}(x \mid α, β_{2}, ν),$$

where $β_{2} = 1/β_{1}$.

Moreover, we can check that

$$m_{1}(x \mid α, β, ν) = m_{4}(x \mid α = ν, β, ν = α),$$

and

$$m_{2}(x \mid α, β, ν) = m_{3}(x \mid α = ν, β, ν = α).$$

In summary, we have

$$m_{1}(x \mid α, β, ν) = m_{2}\left(x \mid α, β = \frac{1}{β}, ν\right) = m_{3}\left(x \mid α = ν, β = \frac{1}{β}, ν = α\right) = m_{4}(x \mid α = ν, β, ν = α). \tag{3}$$

From (3), we find that the four families of marginal densities are the same, that is,

$$\begin{aligned} &\{m_{1}(x \mid α, β, ν) \mid x > 0, \ α, β, ν > 0\} \\ &= \{m_{2}(x \mid α, β, ν) \mid x > 0, \ α, β, ν > 0\} \\ &= \{m_{3}(x \mid α, β, ν) \mid x > 0, \ α, β, ν > 0\} \\ &= \{m_{4}(x \mid α, β, ν) \mid x > 0, \ α, β, ν > 0\}. \end{aligned} \tag{4}$$

In other words, for every $α, β, ν > 0$, there exists a marginal density in each of the four families, such that the four marginal densities are the same. For example,

$$\begin{aligned} &m_{1}(x \mid α = 1, β = 2, ν = 3) \\ &= m_{2}\left(x \mid α = 1, β = \tfrac{1}{2}, ν = 3\right) \\ &= m_{3}\left(x \mid α = 3, β = \tfrac{1}{2}, ν = 1\right) \\ &= m_{4}(x \mid α = 3, β = 2, ν = 1) \\ &= \frac{Γ(4)}{Γ(1) Γ(3)} \left(\frac{1}{x}\right) \left(\frac{2x}{2x + 1}\right)^{3} \left(\frac{1}{2x + 1}\right)^{1}, \quad x > 0. \end{aligned}$$
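Relation (3) is easy to check numerically. The Python sketch below transcribes the four marginal densities in (2) and measures how far the three right-hand sides of (3) are from $m_{1}$ at a given point. The function names (`m1` to `m4`, `relation_3_gap`) are hypothetical labels for this illustration, and Python is used here only for convenience (the paper works in R).

```python
import math


def _const(alpha, nu):
    """Common factor Gamma(alpha + nu) / (Gamma(alpha) * Gamma(nu))."""
    return math.gamma(alpha + nu) / (math.gamma(alpha) * math.gamma(nu))


def m1(x, alpha, beta, nu):
    return _const(alpha, nu) / x * (x * beta / (x * beta + 1)) ** nu * (1 / (x * beta + 1)) ** alpha


def m2(x, alpha, beta, nu):
    return _const(alpha, nu) / x * (x / (x + beta)) ** nu * (beta / (x + beta)) ** alpha


def m3(x, alpha, beta, nu):
    return _const(alpha, nu) / x * (x / (x + beta)) ** alpha * (beta / (x + beta)) ** nu


def m4(x, alpha, beta, nu):
    return _const(alpha, nu) / x * (x * beta / (x * beta + 1)) ** alpha * (1 / (x * beta + 1)) ** nu


def relation_3_gap(x, alpha, beta, nu):
    """Largest absolute difference between m1 and the other three sides of (3)."""
    ref = m1(x, alpha, beta, nu)
    others = [
        m2(x, alpha=alpha, beta=1 / beta, nu=nu),   # beta -> 1/beta
        m3(x, alpha=nu, beta=1 / beta, nu=alpha),   # swap alpha and nu, beta -> 1/beta
        m4(x, alpha=nu, beta=beta, nu=alpha),       # swap alpha and nu
    ]
    return max(abs(value - ref) for value in others)
```

With the hyperparameters of the worked example, `relation_3_gap(x, 1, 2, 3)` should be at the level of floating-point rounding for any $x > 0$.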

In summary, for the 16 hierarchical models of the gamma and inverse gamma distributions, only the 8 hierarchical models that have conjugate priors have analytical marginal densities. Moreover, the eight marginal densities can be divided into four types. Furthermore, in terms of families of marginal densities, there is only one.

2.5. Relations among the Random Variables of the Marginal Densities and the Beta Densities

In this subsection, we discover some relations among the random variables of the marginal densities and the beta densities.

Suppose that $X \sim m_{2}(x \mid α, β, ν)$ and $Y \sim \mathrm{Beta}(α, ν)$, where $α, β, ν > 0$, and the probability density function (pdf) of $Y$ is given by

$$f_{Y}(y \mid α, ν) = \frac{Γ(α + ν)}{Γ(α) Γ(ν)} y^{α - 1} (1 - y)^{ν - 1}, \quad 0 < y < 1, \ α, ν > 0.$$

In the Supplementary Materials, we provide two methods to prove that

$$X = \left(\frac{1}{Y} - 1\right) β, \tag{5}$$

where the equality is understood in distribution.

Let $X_{i} \sim m_{i}(x \mid α, β, ν)$ $(i = 1, 2, 3, 4)$, $Y \sim \mathrm{Beta}(α, ν)$, and $Z \sim \mathrm{Beta}(ν, α)$. Then, the random variables $X_{i}$ $(i = 1, 2, 3, 4)$, $Y$, and $Z$ have the following relationships:

$$\begin{aligned} X_{1} &= \left(\frac{1}{Y} - 1\right) \frac{1}{β}, \\ X_{2} &= \left(\frac{1}{Y} - 1\right) β, \\ X_{3} &= \left(\frac{1}{Z} - 1\right) β, \\ X_{4} &= \left(\frac{1}{Z} - 1\right) \frac{1}{β}. \end{aligned} \tag{6}$$

Alternatively, the marginal densities can be expressed as

$$\begin{aligned} m_{1}(x \mid α, β, ν) &= f_{X_{1}}(x) = f_{\left(\frac{1}{\mathrm{Beta}(α, ν)} - 1\right) \frac{1}{β}}(x), \\ m_{2}(x \mid α, β, ν) &= f_{X_{2}}(x) = f_{\left(\frac{1}{\mathrm{Beta}(α, ν)} - 1\right) β}(x), \\ m_{3}(x \mid α, β, ν) &= f_{X_{3}}(x) = f_{\left(\frac{1}{\mathrm{Beta}(ν, α)} - 1\right) β}(x), \\ m_{4}(x \mid α, β, ν) &= f_{X_{4}}(x) = f_{\left(\frac{1}{\mathrm{Beta}(ν, α)} - 1\right) \frac{1}{β}}(x), \end{aligned}$$

where $f_{X_{i}}(x)$ is the pdf of the random variable $X_{i}$ $(i = 1, 2, 3, 4)$.
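One way to see the relation between $m_{2}$ and the beta density is a direct change of variables. The sketch below is a reconstruction offered for orientation and may differ from the two proofs in the authors' Supplementary Materials; it starts from $Y \sim \mathrm{Beta}(α, ν)$ and derives the density of $(1/Y - 1) β$.

```latex
% Let X = (1/Y - 1) β with Y ~ Beta(α, ν). Then Y = β / (X + β), so
% 1 - Y = X / (X + β) and |dy/dx| = β / (x + β)^2.
\begin{aligned}
f_{X}(x) &= f_{Y}\left(\frac{β}{x + β}\right) \frac{β}{(x + β)^{2}} \\
 &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)}
    \left(\frac{β}{x + β}\right)^{α - 1}
    \left(\frac{x}{x + β}\right)^{ν - 1}
    \frac{β}{(x + β)^{2}} \\
 &= \frac{Γ(α + ν)}{Γ(α) Γ(ν)} \left(\frac{1}{x}\right)
    \left(\frac{x}{x + β}\right)^{ν}
    \left(\frac{β}{x + β}\right)^{α}
  = m_{2}(x \mid α, β, ν), \quad x > 0.
\end{aligned}
```

The other three relationships follow from the same argument after replacing $β$ by $1/β$, or swapping the roles of $α$ and $ν$, in line with the correspondences among $m_{1}$, $m_{2}$, $m_{3}$, and $m_{4}$.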

2.6. Random Variable Generations for the Gamma and Inverse Gamma Distributions

In this subsection, we discuss random variable generation for the gamma and inverse gamma distributions $G(α, β)$, $IG(α, β)$, $\tilde{G}(α, β)$, and $\widetilde{IG}(α, β)$ by using the R software ([47]).

To generate random variables from the gamma and inverse gamma distributions $G(α, β)$, $IG(α, β)$, $\tilde{G}(α, β)$, and $\widetilde{IG}(α, β)$, we need to be careful about whether $β$ is a scale or rate parameter of the gamma and inverse gamma distributions.

The definition of the scale parameter can be found in Definition 3.5.4 of [1]. For convenience, we state it as follows. Let $f(x)$ be any pdf. Then, for any $σ > 0$, the family of pdfs $(1/σ) f(x/σ)$, indexed by the parameter $σ$, is called the scale family with standard pdf $f(x)$, and $σ$ is called the scale parameter of the family.

Alternatively, we can equivalently define the scale parameter by a random variable. Let $Z$ be a standard random variable having the standard pdf $f(z)$. Note that the standard random variable $Z$ in this paper is a random variable corresponding to the standard pdf $f(z)$. It does not mean that $Z$ is a random variable with unit variance. Then,

$$\{σ Z : σ > 0 \text{ and } Z \sim f(z)\}$$

is a scale family and $σ$ is called the scale parameter of the family. It is easy to show that the pdf of $X = σ Z$ is $(1/σ) f(x/σ)$.

For convenience, now let $G(α, β)$, $IG(α, β)$, $\tilde{G}(α, β)$, and $\widetilde{IG}(α, β)$ also denote random variables having the corresponding distributions. For example, $G(α, β)$ is a random variable having the $G(α, β)$ distribution.

It is straightforward to show that

$$G(α, β) = β G(α, 1),$$

by checking that the pdfs of the random variables on both sides are equal. Therefore,

$$\{G(α, β) : α, β > 0\} = \{β G(α, 1) : α, β > 0\}$$

is a scale family with scale parameter $β$ and standard random variable $G(α, 1)$.

Moreover, we have

$$\widetilde{IG}(α, β) = \frac{1}{\tilde{G}(α, β)} = \frac{1}{G(α, \frac{1}{β})} = \frac{1}{\frac{1}{β} G(α, 1)} = \frac{1}{\frac{1}{β} \tilde{G}(α, 1)} = β \, \widetilde{IG}(α, 1).$$

Hence,

$$\{\widetilde{IG}(α, β) : α, β > 0\} = \{β \, \widetilde{IG}(α, 1) : α, β > 0\}$$

is a scale family with scale parameter $β$ and standard random variable $\widetilde{IG}(α, 1)$.

Furthermore, we have

$$\tilde{G}(α, β) = G\left(α, \frac{1}{β}\right) = \frac{1}{β} G(α, 1) = \frac{1}{β} \tilde{G}(α, 1).$$

Thus,

$$\{\tilde{G}(α, β) : α, β > 0\} = \left\{\frac{1}{β} \tilde{G}(α, 1) : α, β > 0\right\}$$

is a scale family with

$$\text{scale} = \frac{1}{β} \quad \text{or} \quad β = \frac{1}{\text{scale}} = \text{rate}$$

and standard random variable $\tilde{G}(α, 1)$. Hence, $β$ is the rate parameter of the $\tilde{G}(α, β)$ distribution.

Finally, it is easy to obtain

$$IG(α, β) = \frac{1}{G(α, β)} = \frac{1}{β G(α, 1)} = \frac{1}{β} IG(α, 1).$$

Consequently,

$$\{IG(α, β) : α, β > 0\} = \left\{\frac{1}{β} IG(α, 1) : α, β > 0\right\}$$

is a scale family with

$$\text{scale} = \frac{1}{β} \quad \text{or} \quad β = \frac{1}{\text{scale}} = \text{rate}$$

and standard random variable $IG(α, 1)$. Hence, $β$ is the rate parameter of the $IG(α, β)$ distribution.

In summary, $β$ is the scale parameter of the $G(α, β)$ and $\widetilde{IG}(α, β)$ distributions, while $β$ is the rate parameter of the $\tilde{G}(α, β)$ and $IG(α, β)$ distributions, as depicted in Figure 1.

Now, let us generate random variables from the gamma and inverse gamma distributions in the R software. Let $α$ and $β$ be positive numbers. Then,

$$G = \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{scale} = β)$$

is a random variable from the $G(α, β)$ distribution, where rgamma() is an R built-in function for random number generation for the gamma distribution with parameters shape and scale, and n is the number of observations. Moreover,

$$\tilde{G} = \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{rate} = β)$$

is a random variable from the $\tilde{G}(α, β)$ distribution, where rate is the rate parameter of the gamma distribution, an alternative way to specify the scale. The two parameters are related by rate = 1/scale. Unfortunately, there are no built-in R functions that can directly generate random variables from the $IG(α, β)$ and $\widetilde{IG}(α, β)$ distributions. To generate random variables from these two distributions, we can utilize the reciprocal relationships of the gamma and inverse gamma distributions:

$$IG(α, β) = \frac{1}{G(α, β)} \quad \text{and} \quad \widetilde{IG}(α, β) = \frac{1}{\tilde{G}(α, β)}.$$

Therefore,

$$IG = 1 / \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{scale} = β)$$

is a random variable from the $IG(α, β)$ distribution. Similarly,

$$\widetilde{IG} = 1 / \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{rate} = β)$$

is a random variable from the $\widetilde{IG}(α, β)$ distribution.

It is worth mentioning that we do not use

$$IG' = 1 / \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{rate} = β)$$

to generate an $IG(α, β)$ random variable, although $β$ is the rate parameter of the $IG(α, β)$ distribution, as

$$IG' = \frac{1}{\tilde{G}(α, β)} = \widetilde{IG}(α, β).$$

Similarly, we do not use

$$\widetilde{IG}' = 1 / \texttt{rgamma}(n = 1, \ \texttt{shape} = α, \ \texttt{scale} = β)$$

to generate an $\widetilde{IG}(α, β)$ random variable, although $β$ is the scale parameter of the $\widetilde{IG}(α, β)$ distribution, as

$$\widetilde{IG}' = \frac{1}{G(α, β)} = IG(α, β).$$
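The same bookkeeping carries over to other environments. The sketch below mirrors the four generators in Python's standard library rather than R; it relies on the fact that `random.gammavariate(alpha, beta)` takes a shape and a scale, so the rate-parameterized cases pass `1 / beta`. The function names `r_G`, `r_G_tilde`, `r_IG`, and `r_IG_tilde` are hypothetical and introduced only for this illustration.

```python
import random


def r_G(alpha, beta, rng=random):
    """One draw from G(alpha, beta): shape alpha, scale beta."""
    return rng.gammavariate(alpha, beta)


def r_G_tilde(alpha, beta, rng=random):
    """One draw from G~(alpha, beta): shape alpha, rate beta (scale 1/beta)."""
    return rng.gammavariate(alpha, 1.0 / beta)


def r_IG(alpha, beta, rng=random):
    """One draw from IG(alpha, beta) = 1 / G(alpha, beta)."""
    return 1.0 / r_G(alpha, beta, rng)


def r_IG_tilde(alpha, beta, rng=random):
    """One draw from IG~(alpha, beta) = 1 / G~(alpha, beta)."""
    return 1.0 / r_G_tilde(alpha, beta, rng)
```

Passing a seeded generator, for example `rng = random.Random(2022)`, makes the draws reproducible. As in the R discussion above, the inverse gamma draws take the reciprocal of the gamma draw with the same tilde, not the one whose $β$ happens to share the "scale" or "rate" label.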

3. Simulations

In this section, we conduct numerical simulations to illustrate four aspects. First, we plot the marginal densities for various hyperparameters. Second, we illustrate that the random variables can be generated from the marginal density by three methods. Third, we illustrate that the transformations of the moment estimators of the hyperparameters of one of the 8 hierarchical models that have conjugate priors can be used to obtain the moment estimators of the hyperparameters of another one of the 8 hierarchical models that have conjugate priors. Fourth, we obtain some conclusions about the properties of the eight marginal densities that do not have a closed form by simulation techniques and numerical integration.

3.1. Marginal Densities for Various Hyperparameters

In this subsection, we plot the marginal densities $m_{1}(x \mid α, β, ν)$ given by (2) for various hyperparameters $α$, $β$, and $ν$. We explore how the marginal densities change around $m_{1}(x \mid α = 1, β = 2, ν = 3)$. Other numerical values of the hyperparameters can also be specified. The goal of this subsection is to see what kind of data could be modeled by the marginal densities.

Figure 2 plots the marginal densities $m_{1}(x \mid α, β, ν)$ for $α = 1, 2, 3, 4$, holding $β = 2$ and $ν = 3$ fixed. From the figure, we can see that as $α$ increases, the peak value of the curve increases and the position of the peak moves toward zero. Moreover, the marginal densities are right-skewed.

Figure 3 plots the marginal densities $m_{1}(x \mid α, β, ν)$ for $β = 1, 2, 3, 4$, holding $α = 1$ and $ν = 3$ fixed. From the figure, we can also see that as $β$ increases, the peak value of the curve increases and the position of the peak moves toward zero. The marginal densities are also right-skewed.

Figure 4 plots the marginal densities $m_{1}(x \mid α, β, ν)$ for $ν = 1, 2, 3, 4$, holding $α = 1$ and $β = 2$ fixed. It is observed from the figure that as $ν$ increases, the peak value of the curve decreases, and the position of the peak moves away from zero. Moreover, all the marginal densities are right-skewed.
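The movement of the peak position described above can be checked against the formula for $m_{1}$ in (2). The short derivation below is an added sketch, not part of the original analysis, and it assumes $ν > 1$ so that an interior mode exists.

```latex
% Up to a constant, m_1(x | α, β, ν) ∝ x^{ν - 1} (xβ + 1)^{-(α + ν)}.
\begin{aligned}
\frac{d}{dx} \log m_{1}(x \mid α, β, ν)
  &= \frac{ν - 1}{x} - \frac{(α + ν) β}{x β + 1} = 0 \\
\Longleftrightarrow \quad (ν - 1)(x β + 1) &= (α + ν) β x \\
\Longleftrightarrow \quad x^{*} &= \frac{ν - 1}{β (α + 1)}, \qquad ν > 1.
\end{aligned}
% Example: α = 1, β = 2, ν = 3 gives x* = 2 / (2 · 2) = 1/2.
```

Under this assumption, the mode $x^{*}$ decreases in $α$ and in $β$ and increases in $ν$, which agrees with the direction of the peak movement reported for Figures 2, 3, and 4.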

3.2. Random Variable Generations from the Marginal Density by Three Methods

In this subsection, we generate random variables from the marginal densities

m_{i} (x | α, β, ν) (i = 1, 2, 3, 4)

by three methods. The goal of this subsection is to illustrate the theoretical result that the random variables can be generated from

m_{i} (x | α, β, ν) (i = 1, 2, 3, 4)

by three methods.

Let

α = 1

,

β = 2

,

ν = 3

, and

n = 10, 000

. Figure 5 shows the histograms of the samples generated from

m_{1} (x | α = 1, β = 2, ν = 3)

by the three methods:

(a): Method 1: $x_{i} (i = 1, \dots, n)$ are generated from the $G - I G$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $I G (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $G (ν, θ_{i})$ .
(b): Method 2: $x_{i} (i = 1, \dots, n)$ are generated from the $\tilde{G} - G$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $G (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $\tilde{G} (ν, θ_{i})$ .
(c): Method 3: $x_{i} (i = 1, \dots, n)$ are generated from the transformations of beta random variables. First, we draw $y_{i} (i = 1, \dots, n)$ iid from $\mathrm{Beta} (α, ν)$ . After that, $x_{i} (i = 1, \dots, n)$ are obtained by the transformations $x_{i} = (1 / y_{i} - 1) / β$ .
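The three methods listed above can be sketched with the Python standard library (the article itself used R). The parameterization is an assumption inferred from the beta transformation in Method 3: G(a, b) is taken to be a gamma distribution with shape a and scale b, the tilde version uses b as a rate, and θ from IG(α, β) means that 1/θ is gamma with shape α and scale β. The function name sample_m1_three_ways is hypothetical.

```python
import random
import statistics

def sample_m1_three_ways(alpha, beta, nu, n, seed=0):
    """Return three samples of size n, one per method, for m_1(x | alpha, beta, nu)."""
    rng = random.Random(seed)
    method1, method2, method3 = [], [], []
    for _ in range(n):
        # Method 1 (G-IG). Assumed: theta ~ IG(alpha, beta) means 1/theta is gamma
        # with shape alpha and scale beta; x | theta is gamma with shape nu, scale theta.
        theta = 1.0 / rng.gammavariate(alpha, beta)
        method1.append(rng.gammavariate(nu, theta))
        # Method 2 (G~-G). theta is gamma with shape alpha and scale beta;
        # x | theta is gamma with shape nu and rate theta (scale 1/theta).
        theta = rng.gammavariate(alpha, beta)
        method2.append(rng.gammavariate(nu, 1.0 / theta))
        # Method 3. y ~ Beta(alpha, nu), then x = (1/y - 1) / beta.
        y = rng.betavariate(alpha, nu)
        method3.append((1.0 / y - 1.0) / beta)
    return method1, method2, method3

samples = sample_m1_three_ways(alpha=1, beta=2, nu=3, n=10_000)
for name, xs in zip(("Method 1", "Method 2", "Method 3"), samples):
    print(name, round(statistics.median(xs), 3))
```

The sketch compares sample medians rather than sample means because, under the assumed parameterization, the mean of the marginal distribution is not finite when α = 1.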

Let

α = 1

,

β = 1 / 2

,

ν = 3

, and n = 10,000. Figure 6 shows the histograms of the samples generated from

m_{2} (x | α = 1, β = 1 / 2, ν = 3) = m_{1} (x | α = 1, β = 2, ν = 3)

by the three methods:

(a): Method 1: $x_{i} (i = 1, \dots, n)$ are generated from the $G - \tilde{I G}$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $\tilde{I G} (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $G (ν, θ_{i})$ .
(b): Method 2: $x_{i} (i = 1, \dots, n)$ are generated from the $\tilde{G} - \tilde{G}$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $\tilde{G} (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $\tilde{G} (ν, θ_{i})$ .
(c): Method 3: $x_{i} (i = 1, \dots, n)$ are generated from the transformations of beta random variables. First, we draw $y_{i} (i = 1, \dots, n)$ iid from $\mathrm{Beta} (α, ν)$ . After that, $x_{i} (i = 1, \dots, n)$ are obtained by the transformations $x_{i} = (1 / y_{i} - 1) β$ .

Let

α = 3

,

β = 1 / 2

,

ν = 1

, and

n = 10,000

. Figure 7 shows the histograms of the samples generated from

m_{3} (x | α = 3, β = 1 / 2, ν = 1) = m_{1} (x | α = 1, β = 2, ν = 3)

by the three methods:

(a): Method 1: $x_{i} (i = 1, \dots, n)$ are generated from the $I G - I G$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $I G (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $I G (ν, θ_{i})$ .
(b): Method 2: $x_{i} (i = 1, \dots, n)$ are generated from the $\tilde{I G} - G$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $G (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $\tilde{I G} (ν, θ_{i})$ .
(c): Method 3: $x_{i} (i = 1, \dots, n)$ are generated from the transformations of beta random variables. First, we draw $z_{i} (i = 1, \dots, n)$ iid from $\mathrm{Beta} (ν, α)$ . After that, $x_{i} (i = 1, \dots, n)$ are obtained by the transformations $x_{i} = (1 / z_{i} - 1) β$ .

Let

α = 3

,

β = 2

,

ν = 1

, and

n = 10,000

. Figure 8 shows the histograms of the samples generated from

m_{4} (x | α = 3, β = 2, ν = 1) = m_{1} (x | α = 1, β = 2, ν = 3)

by the three methods:

(a): Method 1: $x_{i} (i = 1, \dots, n)$ are generated from the $I G - \tilde{I G}$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $\tilde{I G} (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $I G (ν, θ_{i})$ .
(b): Method 2: $x_{i} (i = 1, \dots, n)$ are generated from the $\tilde{I G} - \tilde{G}$ model. First, we draw $θ_{i} (i = 1, \dots, n)$ iid from $\tilde{G} (α, β)$ . After that, we draw $x_{i} (i = 1, \dots, n)$ independently from $\tilde{I G} (ν, θ_{i})$ .
(c): Method 3: $x_{i} (i = 1, \dots, n)$ are generated from the transformations of beta random variables. First, we draw $z_{i} (i = 1, \dots, n)$ iid from $\mathrm{Beta} (ν, α)$ . After that, $x_{i} (i = 1, \dots, n)$ are obtained by the transformations $x_{i} = (1 / z_{i} - 1) / β$ .
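The four beta transformations in Method 3 can be collected into one routine, which also makes the stated equalities among the four densities easy to check. This is a minimal sketch in Python, not the authors' R code; the function name sample_via_beta and the integer model index are hypothetical.

```python
import random

def sample_via_beta(model, alpha, beta, nu, n, seed=0):
    """Draw n values from m_i (i = model) through the Method 3 beta transformation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # m_1 and m_2 use Beta(alpha, nu); m_3 and m_4 use Beta(nu, alpha).
        if model in (1, 2):
            b = rng.betavariate(alpha, nu)
        else:
            b = rng.betavariate(nu, alpha)
        odds = 1.0 / b - 1.0
        # m_1 and m_4 divide by beta; m_2 and m_3 multiply by beta.
        out.append(odds / beta if model in (1, 4) else odds * beta)
    return out

# The four parameter settings used for Figures 5 to 8.
s1 = sample_via_beta(1, alpha=1, beta=2, nu=3, n=5)
s2 = sample_via_beta(2, alpha=1, beta=0.5, nu=3, n=5)
s3 = sample_via_beta(3, alpha=3, beta=0.5, nu=1, n=5)
s4 = sample_via_beta(4, alpha=3, beta=2, nu=1, n=5)
print(s1 == s2 == s3 == s4)
```

With a common seed, the four settings reduce to the same Beta(1, 3) draws and the same scaling by 1/2, so the four samples coincide term by term.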

From the three plots in each of Figure 5, Figure 6, Figure 7 and Figure 8, we see that the histograms of the samples generated from

m_{i} (x | α, β, ν) (i = 1, 2, 3, 4)

by the three methods are almost identical, which illustrates the theoretical result that the random variables can be generated from

m_{i} (x | α, β, ν) (i = 1, 2, 3, 4)

by the three methods.
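The agreement of Method 3 with the other two methods can also be checked by a change of variables. The following derivation is a sketch for i = 1, starting only from the Method 3 transformation stated above; the resulting expression should coincide with the closed form of the marginal density given earlier in the article, which is not reproduced in this section.

```latex
\begin{aligned}
Y \sim \mathrm{Beta}(\alpha, \nu), \quad X = \frac{1/Y - 1}{\beta}
&\;\Longrightarrow\;
Y = \frac{1}{1 + \beta X}, \qquad
\left| \frac{dy}{dx} \right| = \frac{\beta}{(1 + \beta x)^{2}}, \\
f_{X}(x)
&= \frac{1}{B(\alpha, \nu)}
   \left( \frac{1}{1 + \beta x} \right)^{\alpha - 1}
   \left( \frac{\beta x}{1 + \beta x} \right)^{\nu - 1}
   \frac{\beta}{(1 + \beta x)^{2}} \\
&= \frac{\beta \, (\beta x)^{\nu - 1}}{B(\alpha, \nu) \, (1 + \beta x)^{\alpha + \nu}},
   \qquad x > 0 .
\end{aligned}
```

The exponents of (1 + βx) add up as (α - 1) + (ν - 1) + 2 = α + ν, so X is a scaled beta prime variable.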

3.3. Transformations of the Moment Estimators of the Hyperparameters of One of the Eight Hierarchical Models That Have Conjugate Priors

In this subsection, we illustrate that the transformations of the moment estimators of the hyperparameters of one of the 8 hierarchical models that have conjugate priors can be used to obtain the moment estimators of the hyperparameters of another one of the 8 hierarchical models that have conjugate priors. For example, the transformations of the moment estimators of the hyperparameters of the

I G - I G

model can be used to obtain the moment estimators of the hyperparameters of the other seven hierarchical models that have conjugate priors (

G - I G

,

G - \tilde{I G}

,

\tilde{G} - G

,

\tilde{G} - \tilde{G}

,

I G - \tilde{I G}

,

\tilde{I G} - G

, and

\tilde{I G} - \tilde{G}

). The goal of this subsection is to illustrate (3).

Let us illustrate the derivations of the moment estimators of the hyperparameters by Model 5 (

I G - I G

). The first three moments of X are, respectively, given by

E X = \frac{α β}{ν - 1}, for ν > 1,

E X^{2} = \frac{(α + 1) α β^{2}}{(ν - 1) (ν - 2)}, for ν > 2,

and

E X^{3} = \frac{(α + 2) (α + 1) α β^{3}}{(ν - 1) (ν - 2) (ν - 3)}, for ν > 3,

which can be obtained by iterated expectation. The moment estimators of

α

,

β

, and

ν

are calculated by equating the population moments to the sample moments, that is,

\begin{matrix} E X & = \frac{α β}{ν - 1} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} = A_{1}, \\ E X^{2} & = \frac{(α + 1) α β^{2}}{(ν - 1) (ν - 2)} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}^{2} = A_{2}, \\ E X^{3} & = \frac{(α + 2) (α + 1) α β^{3}}{(ν - 1) (ν - 2) (ν - 3)} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}^{3} = A_{3}, \end{matrix}

where

A_{k} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}^{k}

,

k = 1, 2, 3

, is the sample kth moment of X. After some tedious calculations, we obtain

\begin{matrix} α_{5} (n) & = \frac{2 A_{1}^{2} A_{3} - 2 A_{1} A_{2}^{2}}{- 2 A_{1}^{2} A_{3} + A_{1} A_{2}^{2} + A_{2} A_{3}}, \\ β_{5} (n) & = \frac{- 2 A_{1}^{2} A_{3} + A_{1} A_{2}^{2} + A_{2} A_{3}}{A_{1}^{2} A_{2} + A_{1} A_{3} - 2 A_{2}^{2}}, \\ ν_{5} (n) & = \frac{A_{1}^{2} A_{2} + 3 A_{1} A_{3} - 4 A_{2}^{2}}{A_{1}^{2} A_{2} + A_{1} A_{3} - 2 A_{2}^{2}} . \end{matrix}

The calculations of the moment estimators of the hyperparameters of the eight hierarchical models that have conjugate priors can be found in the Supplementary Materials.
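The closed-form moment estimators above translate directly into code. The following Python sketch (not the authors' R code) evaluates the three displayed formulas for Model 5; the function names igig_moment_estimators and sample_moments are hypothetical.

```python
def sample_moments(xs):
    """Return the first three sample moments (A_1, A_2, A_3)."""
    n = len(xs)
    return tuple(sum(x ** k for x in xs) / n for k in (1, 2, 3))

def igig_moment_estimators(a1, a2, a3):
    """Moment estimators (alpha, beta, nu) of the IG-IG model from A_1, A_2, A_3."""
    # Shared sub-expressions of the three displayed formulas.
    d_alpha_beta = -2 * a1 ** 2 * a3 + a1 * a2 ** 2 + a2 * a3
    d_beta_nu = a1 ** 2 * a2 + a1 * a3 - 2 * a2 ** 2
    alpha = (2 * a1 ** 2 * a3 - 2 * a1 * a2 ** 2) / d_alpha_beta
    beta = d_alpha_beta / d_beta_nu
    nu = (a1 ** 2 * a2 + 3 * a1 * a3 - 4 * a2 ** 2) / d_beta_nu
    return alpha, beta, nu

# Population moments for alpha = 4, beta = 5, nu = 6:
# EX = 4, EX^2 = 25, EX^3 = 250.
print(igig_moment_estimators(4.0, 25.0, 250.0))  # (4.0, 5.0, 6.0)
```

Feeding the exact population moments into the formulas recovers the true hyperparameters, which is a quick consistency check on the algebra. With real data, sample_moments(xs) supplies the three inputs.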

Let

α = 4

,

β = 5

,

ν = 6

, and

n = 1,000,000

. The reason why we chose these values is that

α > 3

,

β > 0

, and

ν > 3

are required in the moment estimations of the hyperparameters. Other numerical values of the hyperparameters can also be specified.

First, let us generate a random sample

x

from the

I G - I G (α = 4, β = 5, ν = 6)

model. The histogram of the sample

x

with the marginal density

m_{3} (x | α = 4, β = 5, ν = 6)

is plotted in Figure 9. From the figure, we see that the histogram fits the marginal density very well.

Second, let us compute the moment estimators of the hyperparameters of the eight hierarchical models that have conjugate priors from the sample

x

, and the numerical results are summarized in Table 2. Moreover, the transformations (3) of the moment estimators of the hyperparameters of the

I G - I G

model are also summarized in this table. More precisely, for Model 5 (

I G - I G

) and Model 7 (

\tilde{I G} - G

), the identical transformation was used (

α

,

β

, and

ν

are unchanged); for Model 2 (

G - \tilde{I G}

) and Model 4 (

\tilde{G} - \tilde{G}

), the transformation was the position change of

α

and

ν

(

α \leftrightarrow ν

); for Model 6 (

I G - \tilde{I G}

) and Model 8 (

\tilde{I G} - \tilde{G}

), the transformation was replacing

β

by

1 / β

(

β \leftrightarrow 1 / β

); for Model 1 (

G - I G

) and Model 3 (

\tilde{G} - G

), the transformation was both the position change of

α

and

ν

and replacing

β

by

1 / β

(

α \leftrightarrow ν

,

β \leftrightarrow 1 / β

). From the table, we see that the moment estimators of the hyperparameters of the eight hierarchical models that have conjugate priors and the transformations of the moment estimators of the hyperparameters of the

I G - I G

model are the same.
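The four transformations described above amount to a small lookup from the IG-IG estimates to the other seven models. The following Python sketch illustrates this mapping; the function name transform_from_igig and the dictionary keys are hypothetical, and "G~" and "IG~" stand for the tilde versions of the distributions.

```python
def transform_from_igig(alpha, beta, nu):
    """Map IG-IG moment estimates (alpha, beta, nu) to all eight conjugate models."""
    identity = (alpha, beta, nu)            # alpha, beta, nu unchanged
    swapped = (nu, beta, alpha)             # alpha <-> nu
    inverted = (alpha, 1.0 / beta, nu)      # beta -> 1 / beta
    both = (nu, 1.0 / beta, alpha)          # alpha <-> nu and beta -> 1 / beta
    return {
        "Model 1 (G-IG)": both,
        "Model 2 (G-IG~)": swapped,
        "Model 3 (G~-G)": both,
        "Model 4 (G~-G~)": swapped,
        "Model 5 (IG-IG)": identity,
        "Model 6 (IG-IG~)": inverted,
        "Model 7 (IG~-G)": identity,
        "Model 8 (IG~-G~)": inverted,
    }

# Each value is an (alpha, beta, nu) triple for the named model.
for model, params in transform_from_igig(4.0, 5.0, 6.0).items():
    print(model, params)
```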

3.4. The Marginal Densities of the Eight Hierarchical Models That Do Not Have Conjugate Priors

In this subsection, we obtain some conclusions about the properties of the eight marginal densities that do not have a closed form by simulation techniques and numerical integration. We take the hierarchical

G - G

model as an example. The results for the other seven hierarchical models that do not have conjugate priors are similar, and thus, they are omitted. The results thus obtained are compatible with those in Section 2.2.

The histogram of the sample with sample size

n = 5000

from the hierarchical

G (ν, θ) - G (α, β)

model with

α = 1

,

β = 2

, and

ν = 3

and the fit marginal densities are plotted in Figure 10. In Plot (a), the histogram is fit by

m_{i} (x | α = 1, β = 2, ν = 3)

,

i = 1, 2, 3, 4

. From this plot, we see that the histogram is poorly fit by the marginal densities. In other words, the four marginal densities

m_{i} (x | α = 1, β = 2, ν = 3)

,

i = 1, 2, 3, 4

, are not the marginal densities for the hierarchical

G (ν, θ) - G (α, β)

model with

α = 1

,

β = 2

, and

ν = 3

. In Plot (b), the histogram is fit by the four marginal densities

m_{i} (x | α = {\hat{α}}_{i}, β = {\hat{β}}_{i}, ν = {\hat{ν}}_{i})

,

i = 1, 2, 3, 4

, where

{\hat{α}}_{i}

,

{\hat{β}}_{i}

, and

{\hat{ν}}_{i}

are the moment estimators of the hyperparameters. Moreover, the histogram is also fit by

m_{G - G} (x | α = 1, β = 2, ν = 3)

given by (1). From this plot, we see that the four marginal densities

m_{i} (x | α = {\hat{α}}_{i}, β = {\hat{β}}_{i}, ν = {\hat{ν}}_{i})

,

i = 1, 2, 3, 4

, are the same, as the four curves overlap. It is easy to see that the four marginal densities do not fit the histogram well, especially when x is close to 0. It is worth noting that the marginal density

m_{G - G} (x | α = 1, β = 2, ν = 3)

fits the histogram very well. We would like to point out that the marginal density

m_{G - G} (x | α, β, ν)

given by (1) was evaluated numerically with the built-in R function integrate().
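A marginal density without a closed form can be approximated by one-dimensional quadrature. The following Python sketch is not the R integrate() call used in the article. It assumes that (1), which is not reproduced in this section, is the mixture integral of a gamma density with shape ν and scale θ against a gamma prior with shape α and scale β, and it applies the trapezoidal rule after the substitution θ = exp(s). The function name marginal_gg and its grid arguments are hypothetical.

```python
import math

def marginal_gg(x, alpha, beta, nu, lo=-30.0, hi=30.0, steps=600):
    """Approximate m(x) = int_0^inf G(x | nu, theta) G(theta | alpha, beta) d(theta), x > 0.

    Both gamma densities are assumed to be in the shape-scale parameterization.
    """
    h = (hi - lo) / steps
    # Normalizing constants of the two gamma densities, on the log scale.
    const = -math.lgamma(nu) - math.lgamma(alpha) - alpha * math.log(beta)
    total = 0.0
    for j in range(steps + 1):
        s = lo + j * h                      # s = log(theta)
        theta = math.exp(s)
        log_f = ((nu - 1) * math.log(x) - nu * s - x / theta
                 + (alpha - 1) * s - theta / beta + const)
        weight = 0.5 if j in (0, steps) else 1.0
        # The extra "+ s" is the Jacobian of the substitution: d(theta) = theta ds.
        total += weight * math.exp(log_f + s)
    return total * h

print(round(marginal_gg(1.0, alpha=1, beta=2, nu=3), 4))
```

Working on the log scale keeps the integrand smooth for small x, where the integrand in θ is sharply peaked near θ = x; this is one way to sidestep the kind of round-off difficulty that adaptive quadrature can report in that region.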

The fit marginal densities, D-values, and p-values of the one-sample Kolmogorov–Smirnov (KS) test for a sample with sample size

n = 5000

from the hierarchical

G (ν, θ) - G (α, β)

model when

α = 1

,

β = 2

, and

ν = 3

are summarized in Table 3. In this table, it is worth noting that the D-values and p-values of the four marginal densities

m_{i} (x | α = {\hat{α}}_{i}, β = {\hat{β}}_{i}, ν = {\hat{ν}}_{i})

,

i = 1, 2, 3, 4

, are the same, where

{\hat{α}}_{i}

,

{\hat{β}}_{i}

, and

{\hat{ν}}_{i}

are the moment estimators of the hyperparameters. From the table, we see that the p-values for the first eight marginal densities are all less than

0.05

, which indicates that the sample is not from the first eight marginal densities. However, the p-value for the marginal density

m_{G - G} (x | α = 1, β = 2, ν = 3)

is

0.792 > 0.05

, which indicates that the sample can be regarded as generated from the marginal density

m_{G - G} (x | α = 1, β = 2, ν = 3)

. Moreover, the D-value for the marginal density

m_{G - G} (x | α = 1, β = 2, ν = 3)

is

0.009

, which is the smallest among the D-values in this table. Therefore, the marginal density of the hierarchical

G (ν, θ) - G (α, β)

model when

α = 1

,

β = 2

, and

ν = 3

,

m_{G - G} (x | α = 1, β = 2, ν = 3)

, is not from the family of marginal densities (4). It is worth pointing out that the argument stop.on.error = FALSE is needed in integrate() to avoid the error “roundoff error is detected in the extrapolation table”, which arises for small x (less than about

0.035

) when applying the one-sample KS test for

m_{G - G} (x | α = 1, β = 2, ν = 3)

.

4. A Real Data Example

In this section, we illustrate our method by a real data example.

The data were the close prices of the Shanghai, Shenzhen, and Beijing (SSB) A shares on 6 May 2022. The official name of the A shares is Renminbi (RMB) ordinary shares, that is, shares issued by Chinese-registered companies, listed in China, and carrying a face value in RMB, for subscription and trading in RMB by individuals and institutions in China (excluding Hong Kong, Macao, and Taiwan). There are 4593 SSB A shares in total; however, 18 of them had no close price on that day because trading was suspended. Therefore, the sample size of the real data (henceforth, the original data

x

) was 4575

(= 4593 - 18)

. The range of the original data

x

was

[1.21, 1793.00]

, meaning the lowest close price was RMB

1.21

and the highest close price was RMB

1793.00

. Some basic statistical indicators of the original data

x

are summarized in Table 4. From this table, we see that the original data

x

are right-skewed (skewness =

18.12 > 0

) and fat-tailed (kurtosis =

621.40 > 0

).

Moreover, the histogram of the original data

x

with a fit marginal density

m_{1} (x | α = 3.367, β = 0.010, ν = 0.482)

is plotted in Figure 11. From the figure, we see that the marginal density

m_{1} (x | α = 3.367, β = 0.010, ν = 0.482)

does not fit the original data

x

well. Furthermore, the one-sample KS test shows that the D-value is

0.30204

and the p-value (<2.2

\times 10^{- 16}

) is less than

0.05

, indicating a bad fit of the original data

x

by the marginal density.

Since the marginal density with the hyperparameters estimated by the moment estimators from the original data

x

does not fit the original data

x

well, we would like to transform the original data

x

to see whether the marginal density with the hyperparameters estimated by the moment estimators from the transformed data fits the transformed data well.

In the following, we encounter some vector operations, which are defined componentwise. For example, let

x = (x_{1}, \dots, x_{n})

and

y = (y_{1}, \dots, y_{n})

be two vectors of the same length and c be a constant. Then,

\begin{matrix} x + y & = (x_{1} + y_{1}, \dots, x_{n} + y_{n}), \\ x - y & = (x_{1} - y_{1}, \dots, x_{n} - y_{n}), \\ x \cdot y & = (x_{1} y_{1}, \dots, x_{n} y_{n}), \\ x / y & = (x_{1} / y_{1}, \dots, x_{n} / y_{n}), \\ log (x) & = (log (x_{1}), \dots, log (x_{n})), \\ \sqrt{x} & = (\sqrt{x_{1}}, \dots, \sqrt{x_{n}}), \\ 1 / x & = (1 / x_{1}, \dots, 1 / x_{n}), \\ x \pm c & = (x_{1} \pm c, \dots, x_{n} \pm c) . \end{matrix}

Because

x > 0

in the marginal density

m_{1} (x | α, β, ν)

, every transformation used here is chosen to ensure that the transformed data satisfy

y_{i} > 0

. A modified transformation is

y_{2} = x - min (x) + 1 \times 10^{- 50},

where

min (x) = 1.21

is the minimum of

x

,

x - min (x) \geq 0

, adding 1

\times 10^{- 50}

ensures that

y_{2} > 0

, and the range of

y_{2}

is

[1 \times 10^{- 50}, 1791.79]

. The moment estimators of the hyperparameters are

α_{1} (n) = 3.388

,

β_{1} (n) = 0.009

, and

ν_{1} (n) = 0.415

. Moreover, the one-sample KS test shows that the D-value is

0.2772

and the p-value (<2.2

\times 10^{- 16}

) is less than

0.05

, still indicating a bad fit of the transformed data

y_{2}

by the marginal density

m_{1} (x | α, β, ν)

.

Next, we proceed with a reciprocal transformation:

y_{3} = \frac{1}{x},

where the range of

y_{3}

is

[0.001, 0.826]

and

y_{3} > 0

. The moment estimators of the hyperparameters are

α_{1} (n) = - 93.694

,

β_{1} (n) = - 0.123

, and

ν_{1} (n) = 1.407

. Because the moment estimators of

α

and

β

are negative, the one-sample KS test cannot be performed. Hence, the marginal density

m_{1} (x | α, β, ν)

cannot be used to fit the transformed data

y_{3}

.

Similarly, we use a modified reciprocal transformation:

y_{4} = \frac{1}{x} - min (\frac{1}{x}) + 1 \times 10^{- 50},

so that

y_{4} > 0

, where

min (1 / x) = 0.0005577245

, and the range of

y_{4}

is

[1 \times 10^{- 50}, 0.826]

. The moment estimators of the hyperparameters are

α_{1} (n) = - 78.616

,

β_{1} (n) = - 0.145

, and

ν_{1} (n) = 1.388

. Because the moment estimators of

α

and

β

are negative, the one-sample KS test cannot be performed. Hence, the marginal density

m_{1} (x | α, β, ν)

cannot be used to fit the transformed data

y_{4}

.

After that, we continue with a log transformation:

y_{5} = log (x),

where the range of

y_{5}

is

[0.191, 7.492]

and

y_{5} > 0

. The moment estimators of the hyperparameters are

α_{1} (n) = - 56.947

,

β_{1} (n) = - 0.042

, and

ν_{1} (n) = 6.081

. Because the moment estimators of

α

and

β

are negative, the one-sample KS test cannot be performed. Hence, the marginal density

m_{1} (x | α, β, ν)

cannot be used to fit the transformed data

y_{5}

.

Similarly, we carry on with a modified log transformation:

y_{6} = log (x) - min (log (x)) + 1 \times 10^{- 50},

so that

y_{6} > 0

, where

min (log (x)) = 0.1906204

, and the range of

y_{6}

is

[1 \times 10^{- 50}, 7.301]

. The moment estimators of the hyperparameters are

α_{1} (n) = - 31.706

,

β_{1} (n) = - 0.065

, and

ν_{1} (n) = 4.866

. Because the moment estimators of

α

and

β

are negative, the one-sample KS test cannot be performed. Hence, the marginal density

m_{1} (x | α, β, ν)

cannot be used to fit the transformed data

y_{6}

.

Then, we attempt a square root transformation:

y_{7} = \sqrt{x},

where the range of

y_{7}

is

[1.100, 42.344]

and

y_{7} > 0

. The moment estimators of the hyperparameters are

α_{1} (n) = 5.072

,

β_{1} (n) = 2.109

, and

ν_{1} (n) = 33.695

. Moreover, the one-sample KS test shows that the D-value is

0.032274

and the p-value (=0.0001452) is less than

0.05

, still indicating a bad fit of the transformed data

y_{7}

by the marginal density

m_{1} (x | α, β, ν)

.

Finally, we seek out a modified square root transformation:

y_{8} = \sqrt{x} - min (\sqrt{x}) + 1 \times 10^{- 50},

so that

y_{8} > 0

, where

min (\sqrt{x}) = 1.100

, and the range of

y_{8}

is

[1 \times 10^{- 50}, 41.244]

. The moment estimators of the hyperparameters are

α_{1} (n) = 5.375

,

β_{1} (n) = 0.257

, and

ν_{1} (n) = 3.175

. Moreover, the one-sample KS test shows that the D-value is

0.018468

and the p-value (=0.08825) is larger than

0.05

, indicating a good fit of the transformed data

y_{8}

by the marginal density

m_{1} (x | α, β, ν)

.
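The D-value and p-value of a one-sample KS test can be sketched as follows in Python. The p-value uses the asymptotic Kolmogorov series; whether the software used in the article applied exactly this approximation is an assumption, although the series reproduces the two p-values reported above for the square root and modified square root transformations. The function names ks_statistic and ks_pvalue_asymptotic are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of the sample and a hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Two-sided p-value from the Kolmogorov limit: 2 * sum (-1)^(k-1) exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is 1 to many decimals.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

# D-values and sample size reported in this section.
print(ks_pvalue_asymptotic(0.032274, 4575))
print(ks_pvalue_asymptotic(0.018468, 4575))
```

To test a fitted marginal density, cdf would be its distribution function evaluated at the estimated hyperparameters; the sketch leaves that function to the caller.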

The transformations, the moment estimators of the hyperparameters of the marginal density

m_{1} (x | α, β, ν)

, and the D-value and p-value of the KS test are summarized in Table 5. In the table, NA indicates that the one-sample KS test could not be performed because the moment estimators of α and β are negative. From the table, we see that the modified square root transformation is the only one for which the KS test does not reject, at the 0.05 level, the fit of the transformed data

y_{8}

by the marginal density

m_{1} (x | α, β, ν)

. The transformation is

y_{8} = \sqrt{x} - 1.1

or

x = {(y_{8} + 1.1)}^{2} .
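The transformations used in this section, together with the inverse of the modified square root transformation, can be sketched in Python as follows. The toy price list is made up for illustration and is not the SSB data; the names EPS, shift_positive, transformations, and invert_y8 are hypothetical.

```python
import math

EPS = 1e-50  # the small positive constant added in the modified transformations

def shift_positive(values):
    """Subtract the minimum and add EPS so that every entry is strictly positive."""
    lowest = min(values)
    return [v - lowest + EPS for v in values]

def transformations(x):
    """Return the transformed data y2, ..., y8 for a list of positive values x."""
    recip = [1.0 / v for v in x]
    logs = [math.log(v) for v in x]
    roots = [math.sqrt(v) for v in x]
    return {
        "y2": shift_positive(x),
        "y3": recip,
        "y4": shift_positive(recip),
        "y5": logs,
        "y6": shift_positive(logs),
        "y7": roots,
        "y8": shift_positive(roots),
    }

def invert_y8(y8, min_sqrt):
    """Map modified square root data back to the original scale."""
    return [(v + min_sqrt) ** 2 for v in y8]

toy_prices = [1.21, 10.0, 1793.0]  # hypothetical values, not the real data
t = transformations(toy_prices)
print(min(t["y8"]), invert_y8(t["y8"], math.sqrt(min(toy_prices))))
```

Because 1e-50 is far below double-precision resolution for values of ordinary size, it changes only the entry at the minimum, which becomes 1e-50 instead of 0.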

The histogram of the transformed data

y_{8}

with a fit marginal density

m_{1} (x | α = 5.375, β = 0.257, ν = 3.175)

is plotted in Figure 12. From the figure, we see that the marginal density

m_{1} (x | α = 5.375, β = 0.257, ν = 3.175)

fits the transformed data

y_{8}

very well.

5. Conclusions and Discussion

For the 16 hierarchical models of the gamma and inverse gamma distributions, only 8 have conjugate priors, namely

G - I G

,

G - \tilde{I G}

,

\tilde{G} - G

,

\tilde{G} - \tilde{G}

,

I G - I G

,

I G - \tilde{I G}

,

\tilde{I G} - G

, and

\tilde{I G} - \tilde{G}

. We first discussed some common typical problems for the eight hierarchical models that do not have conjugate priors. Then, we calculated the Bayesian posterior densities and marginal densities of the eight hierarchical models that have conjugate priors. After that, we discussed the relations among the eight analytical marginal densities of the hierarchical models that have conjugate priors and found that there is only one family of marginal densities. Furthermore, we found some relations (6) among the random variables of the marginal densities and the beta densities. Moreover, we discussed the random variable generations for the gamma and inverse gamma distributions

G (α, β)

,

I G (α, β)

,

\tilde{G} (α, β)

, and

\tilde{I G} (α, β)

by using the R software.

In the numerical simulations, we plotted the marginal densities

m_{1} (x | α, β, ν)

given by (2) for various hyperparameters

α

,

β

, and

ν

. Moreover, we illustrated the theoretical result that the random variables can be generated from

m_{i} (x | α, β, ν) (i = 1, 2, 3, 4)

by three methods from the almost identical histograms of the samples. Furthermore, we illustrated that the transformations of the moment estimators of the hyperparameters of one of the 8 hierarchical models that have conjugate priors can be used to obtain the moment estimators of the hyperparameters of another one of the 8 hierarchical models that have conjugate priors. In addition, we obtained some conclusions about the properties of the eight marginal densities that do not have a closed form by simulation techniques and numerical integration, taking the hierarchical

G - G

model as an example.

We illustrated our method by a real data example, in which the original and transformed data, the close prices of the SSB A shares, were fit by the marginal density

m_{1} (x | α, β, ν)

with different hyperparameters. Because

x > 0

in the marginal density

m_{1} (x | α, β, ν)

, every transformation used was chosen to ensure that the transformed data satisfy

y_{i} > 0

,

i = 1, \dots, 8

.

We emphasize that the 8 hierarchical models that have conjugate priors are suitable for the study of positive, continuous, and right-skewed data, as the marginal densities of the 8 hierarchical models are positive, continuous, and right-skewed distributions. Moreover, the KS test can be used to judge whether the positive, continuous, and right-skewed data have a good fit by the marginal densities.

The eight hierarchical models that have conjugate priors are of particular interest in empirical Bayes analysis, which relies on conjugate prior modeling, where the hyperparameters are estimated from the observations by the marginal densities and the estimated prior is then used as a regular prior in the subsequent inference. See [40,48,49,50] and the references therein.

Supplementary Materials

The following Supporting Information can be downloaded at: https://www.mdpi.com/article/10.3390/math10214005/s1, Supplementary: Some proofs of the article. R folder: R codes used in the article. The R folder will be supplied after acceptance of the article.

Author Contributions

Conceptualization, Y.-Y.Z.; funding acquisition, Y.-Y.Z.; investigation, L.Z. and Y.-Y.Z.; methodology, L.Z. and Y.-Y.Z.; software, L.Z. and Y.-Y.Z.; validation, L.Z. and Y.-Y.Z.; writing—original draft preparation, L.Z. and Y.-Y.Z.; writing—review and editing, L.Z. and Y.-Y.Z. All authors have read and agreed to the current version of the manuscript.

Funding

The research was supported by the National Social Science Fund of China (21XTJ001).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Casella, G.; Berger, R.L. Statistical Inference, 2nd ed.; Pacific Grove: Duxbury, MA, USA, 2002.
2. Shumway, R.; Gurland, J. Fitting the poisson binomial distribution. Biometrics 1960, 16, 522–533.
3. Chen, L.H.Y. Convergence of poisson binomial to poisson distributions. Ann. Probab. 1974, 2, 178–180.
4. Ehm, W. Binomial approximation to the poisson binomial distribution. Stat. Probab. Lett. 1991, 11, 7–16.
5. Daskalakis, C.; Diakonikolas, I.; Servedio, R.A. Learning Poisson Binomial Distributions. Algorithmica 2015, 72, 316–357.
6. Duembgen, L.; Wellner, J.A. The density ratio of Poisson binomial versus Poisson distributions. Stat. Probab. Lett. 2020, 165, 1–7.
7. Geoffroy, P.; Weerakkody, G. A Poisson-Gamma model for two-stage cluster sampling data. J. Stat. Comput. Simul. 2001, 68, 161–172.
8. Vijayaraghavan, R.; Rajagopal, K.; Loganathan, A. A procedure for selection of a gamma-Poisson single sampling plan by attributes. J. Appl. Stat. 2008, 35, 149–160.
9. Wang, J.P. Estimating species richness by a Poisson-compound gamma model. Biometrika 2010, 97, 727–740.
10. Jakimauskas, G.; Sakalauskas, L. Note on the singularity of the Poisson-gamma model. Stat. Probab. Lett. 2016, 114, 86–92.
11. Zhang, Y.Y.; Wang, Z.Y.; Duan, Z.M.; Mi, W. The empirical Bayes estimators of the parameter of the Poisson distribution with a conjugate gamma prior under Stein’s loss function. J. Stat. Comput. Simul. 2019, 89, 3061–3074.
12. Schmidt, M.; Schwabe, R. Optimal designs for Poisson count data with Gamma block effects. J. Stat. Plan. Inference 2020, 204, 128–140.
13. Cabras, S. A Bayesian-deep learning model for estimating COVID-19 evolution in Spain. Mathematics 2021, 9, 2921.
14. Wu, S.J. Poisson-Gamma mixture processes and applications to premium calculation. Commun. Stat.-Theory Methods 2022, 51, 5913–5936.
15. Singh, S.K.; Singh, U.; Sharma, V.K. Expected total test time and Bayesian estimation for generalized Lindley distribution under progressively Type-II censored sample where removals follow the beta-binomial probability law. Appl. Math. Comput. 2013, 222, 402–419.
16. Zhang, Y.Y.; Zhou, M.Q.; Xie, Y.H.; Song, W.H. The Bayes rule of the parameter in (0,1) under the power-log loss function with an application to the beta-binomial model. J. Stat. Comput. Simul. 2017, 87, 2724–2737.
17. Luo, R.; Paul, S. Estimation for zero-inflated beta-binomial regression model with missing response data. Stat. Med. 2018, 37, 3789–3813.
18. Zhang, Y.Y.; Xie, Y.H.; Song, W.H.; Zhou, M.Q. Three strings of inequalities among six Bayes estimators. Commun. Stat.-Theory Methods 2018, 47, 1953–1961.
19. Zhang, Y.Y.; Xie, Y.H.; Song, W.H.; Zhou, M.Q. The Bayes rule of the parameter in (0,1) under Zhang’s loss function with an application to the beta-binomial model. Commun. Stat.-Theory Methods 2020, 49, 1904–1920.
20. Gerstenkorn, T. A compound of the generalized negative binomial distribution with the generalized beta distribution. Cent. Eur. J. Math. 2004, 2, 527–537.
21. Broderick, T.; Mackey, L.; Paisley, J.; Jordan, M.I. Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 290–306.
22. Oliveira, C.C.F.; Cristino, C.T.; Lima, P.F. Unimodal behaviour of the negative binomial beta distribution. Sigmae 2015, 4, 1–5.
23. Heaukulani, C.; Roy, D.M. The combinatorial structure of beta negative binomial processes. Bernoulli 2016, 22, 2301–2324.
24. Zhou, M.Q.; Zhang, Y.Y.; Sun, Y.; Sun, J.; Rong, T.Z.; Li, M.M. The empirical Bayes estimators of the probability parameter of the beta-negative binomial model under Zhang’s loss function. Chin. J. Appl. Probab. Stat. 2021, 37, 478–494.
25. Jiang, C.J.; Cockerham, C.C. Use of the multinomial dirichlet model for analysis of subdivided genetic populations. Genetics 1987, 115, 363–366.
26. Lenk, P.J. Hierarchical bayes forecasts of multinomial dirichlet data applied to coupon redemptions. J. Forecast. 1992, 11, 603–619.
27. Duncan, K.A.; Wilson, J.L. A Multinomial-Dirichlet Model for Analysis of Competing Hypotheses. Risk Anal. 2008, 28, 1699–1709.
28. Samb, R.; Khadraoui, K.; Belleau, P.; Deschenes, A.; Lakhal-Chaieb, L.; Droit, A. Using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling. Stat. Appl. Genet. Mol. Biol. 2015, 14, 517–532.
29. Grover, G.; Deo, V. Application of Parametric Survival Model and Multinomial-Dirichlet Bayesian Model within a Multi-state Setup for Cost-Effectiveness Analysis of Two Alternative Chemotherapies for Patients with Chronic Lymphocytic Leukaemia. Stat. Appl. 2020, 18, 35–53.
30. Mao, S.S.; Tang, Y.C. Bayesian Statistics, 2nd ed.; China Statistics Press: Beijing, China, 2012.
31. Zhang, Y.Y.; Ting, N. Bayesian sample size determination for a phase III clinical trial with diluted treatment effect. J. Biopharm. Stat. 2018, 28, 1119–1142.
32. Zhang, Y.Y.; Ting, N. Sample size considerations for a phase III clinical trial with diluted treatment effect. Stat. Biopharm. Res. 2020, 12, 311–321.
33. Zhang, Y.Y.; Rong, T.Z.; Li, M.M. The estimated and theoretical assurances and the probabilities of launching a phase iii trial. Chin. J. Appl. Probab. Stat. 2022, 38, 53–70.
34. Zhang, Y.Y.; Ting, N. Can the concept be proven? Stat. Biosci. 2021, 13, 160–177.
35. Robert, C.P. The Bayesian Choice: From Decision-Theoretic Motivations to Computational Implementation, 2nd ed.; Springer: New York, NY, USA, 2007.
36. Zhang, Y.Y. The Bayes rule of the variance parameter of the hierarchical normal and inverse gamma model under Stein’s loss. Commun. Stat.-Theory Methods 2017, 46, 7125–7133.
37. Chen, M.H. Bayesian Statistics Lecture; Statistics Graduate Summer School, School of Mathematics and Statistics, Northeast Normal University: Changchun, China, 2014.
38. Xie, Y.H.; Song, W.H.; Zhou, M.Q.; Zhang, Y.Y. The Bayes posterior estimator of the variance parameter of the normal distribution with a normal-inverse-gamma prior under Stein’s loss. Chin. J. Appl. Probab. Stat. 2018, 34, 551–564.
39. Zhang, Y.Y.; Rong, T.Z.; Li, M.M. The empirical Bayes estimators of the mean and variance parameters of the normal distribution with a conjugate normal-inverse-gamma prior by the moment method and the MLE method. Commun. Stat.-Theory Methods 2019, 48, 2286–2304.
40. Sun, J.; Zhang, Y.Y.; Sun, Y. The empirical Bayes estimators of the rate parameter of the inverse gamma distribution with a conjugate inverse gamma prior under Stein’s loss function. J. Stat. Comput. Simul. 2021, 91, 1504–1523.
41. Lee, M.; Gross, A. Lifetime distributions under unknown environment. J. Stat. Plan. Inference 1991, 29, 137–143.
42. Pham, T.; Almhana, J. The generalized gamma distribution: Its hazard rate and strength model. IEEE Trans. Reliab. 1995, 44, 392–397.
43. Agarwal, S.K.; Kalla, S.L. A generalized gamma distribution and its application in reliability. Commun. Stat.-Theory Methods 1996, 25, 201–210.
44. Agarwal, S.K.; Al-Saleh, J.A. Generalized gamma type distribution and its hazard rate function. Commun. Stat.-Theory Methods 2001, 30, 309–318.
45. Kobayashi, K. On generalized gamma functions occurring in diffraction theory. J. Phys. Soc. Jpn. 1991, 60, 1501–1512.
46. Al-Saleh, J.A.; Agarwal, S.K. Finite mixture of certain distributions. Commun. Stat.-Theory Methods 2002, 31, 2123–2137.
47. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022.
48. Berger, J.O. Statistical Decision Theory and Bayesian Analysis, 2nd ed.; Springer: New York, NY, USA, 1985.
49. Maritz, J.S.; Lwin, T. Empirical Bayes Methods, 2nd ed.; Chapman & Hall: London, UK, 1989.
50. Carlin, B.P.; Louis, A. Bayes and Empirical Bayes Methods for Data Analysis, 2nd ed.; Chapman & Hall: London, UK, 2000.

Figure 1. The gamma and inverse gamma distributions with a scale or rate parameter $β$.

Figure 2. The marginal densities $m_{1}(x \mid α, β, ν)$ for $α = 1, 2, 3, 4$, with $β = 2$ and $ν = 3$ held fixed.

Figure 3. The marginal densities $m_{1}(x \mid α, β, ν)$ for $β = 1, 2, 3, 4$, with $α = 1$ and $ν = 3$ held fixed.

Figure 4. The marginal densities $m_{1}(x \mid α, β, ν)$ for $ν = 1, 2, 3, 4$, with $α = 1$ and $β = 2$ held fixed.

Figure 5. Random variable generation from $m_{1}(x \mid α = 1, β = 2, ν = 3)$ by three methods. (a) $x_{i}$ ($i = 1, \dots, n$) are iid from $G - IG$. (b) $x_{i}$ ($i = 1, \dots, n$) are iid from $\tilde{G} - G$. (c) $x_{i} = (1 / y_{i} - 1) / β$, where $y_{i}$ are iid from $Beta(α, ν)$ for $i = 1, \dots, n$.
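The agreement between methods (a) and (c) of Figure 5 can be sketched numerically. The following is a minimal illustration, not the authors' R code. It assumes that $G(ν, θ)$ is a gamma distribution with shape $ν$ and scale $θ$, and that $θ \sim IG(α, β)$ means $1 / θ$ follows a gamma distribution with shape $α$ and scale $β$; this is the convention under which the beta transformation in panel (c) reproduces the hierarchical draw. The function names and the seed are hypothetical.

```python
import random
import statistics

def sample_hierarchical(n, alpha, beta, nu, rng):
    """Method (a): draw theta from the inverse gamma prior, then x | theta from the gamma likelihood."""
    xs = []
    for _ in range(n):
        # Assumed convention: theta ~ IG(alpha, beta) means 1/theta ~ gamma(shape=alpha, scale=beta).
        theta = 1.0 / rng.gammavariate(alpha, beta)
        # Assumed convention: G(nu, theta) is a gamma with shape nu and scale theta.
        xs.append(rng.gammavariate(nu, theta))
    return xs

def sample_beta_transform(n, alpha, beta, nu, rng):
    """Method (c): x = (1/y - 1)/beta with y ~ Beta(alpha, nu)."""
    return [(1.0 / rng.betavariate(alpha, nu) - 1.0) / beta for _ in range(n)]

rng = random.Random(2022)  # arbitrary seed, chosen only for reproducibility
xa = sample_hierarchical(20000, 1.0, 2.0, 3.0, rng)
xc = sample_beta_transform(20000, 1.0, 2.0, 3.0, rng)

# Compare medians rather than means: with alpha = 1 the mean is infinite.
median_a = statistics.median(xa)
median_c = statistics.median(xc)
print(median_a, median_c)
```

Under these assumptions both samples have the same distribution, so the two medians should agree up to Monte Carlo error. The median is used because the sample mean is unstable when $α \leq 1$.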

Figure 6. Random variable generation from $m_{2}(x \mid α = 1, β = 1 / 2, ν = 3) = m_{1}(x \mid α = 1, β = 2, ν = 3)$ by three methods. (a) $x_{i}$ ($i = 1, \dots, n$) are iid from $G - \tilde{IG}$. (b) $x_{i}$ ($i = 1, \dots, n$) are iid from $\tilde{G} - \tilde{G}$. (c) $x_{i} = (1 / y_{i} - 1) β$, where $y_{i}$ are iid from $Beta(α, ν)$ for $i = 1, \dots, n$.

Figure 7. Random variable generation from $m_{3}(x \mid α = 3, β = 1 / 2, ν = 1) = m_{1}(x \mid α = 1, β = 2, ν = 3)$ by three methods. (a) $x_{i}$ ($i = 1, \dots, n$) are iid from $IG - IG$. (b) $x_{i}$ ($i = 1, \dots, n$) are iid from $\tilde{IG} - G$. (c) $x_{i} = (1 / z_{i} - 1) β$, where $z_{i}$ are iid from $Beta(ν, α)$ for $i = 1, \dots, n$.

Figure 8. Random variable generation from $m_{4}(x \mid α = 3, β = 2, ν = 1) = m_{1}(x \mid α = 1, β = 2, ν = 3)$ by three methods. (a) $x_{i}$ ($i = 1, \dots, n$) are iid from $IG - \tilde{IG}$. (b) $x_{i}$ ($i = 1, \dots, n$) are iid from $\tilde{IG} - \tilde{G}$. (c) $x_{i} = (1 / z_{i} - 1) / β$, where $z_{i}$ are iid from $Beta(ν, α)$ for $i = 1, \dots, n$.
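The equalities stated in the captions of Figures 6 to 8 can be checked pointwise. The sketch below is an illustration only: the four densities are not copied from the paper's equations but reconstructed from the beta transformations in panel (c) of Figures 5 to 8 by a change of variables, using the standard fact that $(1 - Y) / Y$ follows a beta prime distribution with shapes $(b, a)$ when $Y \sim Beta(a, b)$. The function names are hypothetical.

```python
import math

def beta_prime_pdf(t, a, b):
    """Density of the beta prime distribution with shapes a, b at t > 0."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) - (a + b) * math.log1p(t) - log_b)

# Each density is a reconstruction from the caption's beta transformation,
# not the paper's own formula.
def m1(x, alpha, beta, nu):   # x = (1/y - 1)/beta, y ~ Beta(alpha, nu)
    return beta * beta_prime_pdf(beta * x, nu, alpha)

def m2(x, alpha, beta, nu):   # x = (1/y - 1)*beta, y ~ Beta(alpha, nu)
    return beta_prime_pdf(x / beta, nu, alpha) / beta

def m3(x, alpha, beta, nu):   # x = (1/z - 1)*beta, z ~ Beta(nu, alpha)
    return beta_prime_pdf(x / beta, alpha, nu) / beta

def m4(x, alpha, beta, nu):   # x = (1/z - 1)/beta, z ~ Beta(nu, alpha)
    return beta * beta_prime_pdf(beta * x, alpha, nu)

# Evaluate the four parameterizations named in the captions on a small grid.
grid = (0.1, 0.5, 1.0, 2.0, 5.0)
rows = [(m1(x, 1, 2, 3), m2(x, 1, 0.5, 3), m3(x, 3, 0.5, 1), m4(x, 3, 2, 1)) for x in grid]
for x, row in zip(grid, rows):
    print(x, row)
```

If the reconstruction is right, each printed row contains four equal values, which is the numerical counterpart of the identities $m_{2} = m_{3} = m_{4} = m_{1}$ in the captions.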

Figure 9. The histogram of the sample $x$ with the marginal density $m_{3}(x \mid α = 4, β = 5, ν = 6)$.

Figure 10. The histogram of a sample of size $n = 5000$ from the hierarchical $G(ν, θ) - G(α, β)$ model with $α = 1$, $β = 2$, and $ν = 3$, together with the fit marginal densities. (a) The histogram is fit by $m_{i}(x \mid α = 1, β = 2, ν = 3)$, $i = 1, 2, 3, 4$. (b) The histogram is fit by $m_{i}(x \mid α = \hat{α}_{i}, β = \hat{β}_{i}, ν = \hat{ν}_{i})$, $i = 1, 2, 3, 4$, where $\hat{α}_{i}$, $\hat{β}_{i}$, and $\hat{ν}_{i}$ are the moment estimators of the hyperparameters. Moreover, the histogram is also fit by $m_{G - G}(x \mid α = 1, β = 2, ν = 3)$.

Figure 11. The histogram of the original data $x$ with a fit marginal density $m_{1}(x \mid α = 3.367, β = 0.010, ν = 0.482)$.

Figure 12. The histogram of the transformed data $y_{8}$ with a fit marginal density $m_{1}(x \mid α = 5.375, β = 0.257, ν = 3.175)$.

Table 1. The 16 hierarchical models of the gamma and inverse gamma distributions. The 8 hierarchical models that have conjugate priors are highlighted in boxes.

	Prior
Likelihood	$G (α, β)$	$\tilde{G} (α, β)$	$IG (α, β)$	$\tilde{IG} (α, β)$
$G (ν, θ)$	$G (ν, θ) - G (α, β)$	$G (ν, θ) - \tilde{G} (α, β)$	$G (ν, θ) - I G (α, β)$	$G (ν, θ) - \tilde{I G} (α, β)$
$\tilde{G} (ν, θ)$	$\tilde{G} (ν, θ) - G (α, β)$	$\tilde{G} (ν, θ) - \tilde{G} (α, β)$	$\tilde{G} (ν, θ) - I G (α, β)$	$\tilde{G} (ν, θ) - \tilde{I G} (α, β)$
$I G (ν, θ)$	$I G (ν, θ) - G (α, β)$	$I G (ν, θ) - \tilde{G} (α, β)$	$I G (ν, θ) - I G (α, β)$	$I G (ν, θ) - \tilde{I G} (α, β)$
$\tilde{I G} (ν, θ)$	$\tilde{I G} (ν, θ) - G (α, β)$	$\tilde{I G} (ν, θ) - \tilde{G} (α, β)$	$\tilde{I G} (ν, θ) - I G (α, β)$	$\tilde{I G} (ν, θ) - \tilde{I G} (α, β)$

Table 2. The moment estimators of the hyperparameters of the 8 hierarchical models that have conjugate priors and the transformations of the moment estimators of the hyperparameters of the $IG - IG$ model.

	Moment Estimators of the Hyperparameters			Transformations from the $IG - IG$ Model
	$α$	$β$	$ν$	$α$	$β$	$ν$
Model 1 ( $G - I G$ )	$5.779$	$0.218$	$4.178$	$5.779$	$0.218$	$4.178$
Model 2 ( $G - \tilde{I G}$ )	$5.779$	$4.579$	$4.178$	$5.779$	$4.579$	$4.178$
Model 3 ( $\tilde{G} - G$ )	$5.779$	$0.218$	$4.178$	$5.779$	$0.218$	$4.178$
Model 4 ( $\tilde{G} - \tilde{G}$ )	$5.779$	$4.579$	$4.178$	$5.779$	$4.579$	$4.178$
Model 5 ( $I G - I G$ )	$4.178$	$4.579$	$5.779$	$4.178$	$4.579$	$5.779$
Model 6 ( $I G - \tilde{I G}$ )	$4.178$	$0.218$	$5.779$	$4.178$	$0.218$	$5.779$
Model 7 ( $\tilde{I G} - G$ )	$4.178$	$4.579$	$5.779$	$4.178$	$4.579$	$5.779$
Model 8 ( $\tilde{I G} - \tilde{G}$ )	$4.178$	$0.218$	$5.779$	$4.178$	$0.218$	$5.779$
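The right half of Table 2 follows a simple pattern that can be written down in a few lines. The sketch below is only a reading of the table, not the paper's derivation: starting from the $IG - IG$ estimates, the other rows are obtained by swapping $α$ and $ν$ and/or inverting $β$. The function name and the dictionary keys are hypothetical labels.

```python
def from_ig_ig(alpha, beta, nu):
    """Map IG-IG hyperparameters (alpha, beta, nu) to those of the other models,
    following the pattern read off Table 2 (an assumption, not a quoted formula)."""
    return {
        "Models 1 and 3": (nu, 1.0 / beta, alpha),   # swap alpha and nu, invert beta
        "Models 2 and 4": (nu, beta, alpha),         # swap alpha and nu
        "Models 5 and 7": (alpha, beta, nu),         # unchanged
        "Models 6 and 8": (alpha, 1.0 / beta, nu),   # invert beta
    }

# IG-IG moment estimates taken from the Model 5 row of Table 2.
mapped = from_ig_ig(4.178, 4.579, 5.779)
for name, (a, b, v) in mapped.items():
    print(name, round(a, 3), round(b, 3), round(v, 3))
```

With the Model 5 estimates as input, the inverted scale $1 / 4.579$ rounds to $0.218$, matching the $β$ column for Models 1, 3, 6, and 8.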

Table 3. The fit marginal densities, D-values, and p-values of the one-sample KS test for a sample of size $n = 5000$ from the hierarchical $G(ν, θ) - G(α, β)$ model when $α = 1$, $β = 2$, and $ν = 3$.

Fit Marginal Densities	D-Value	p-Value
$m_{1}(x \mid α = 1, β = 2, ν = 3)$	$0.169$	$< 2.2 \times 10^{-16}$
$m_{2}(x \mid α = 1, β = 2, ν = 3)$	$0.259$	$< 2.2 \times 10^{-16}$
$m_{3}(x \mid α = 1, β = 2, ν = 3)$	$0.534$	$< 2.2 \times 10^{-16}$
$m_{4}(x \mid α = 1, β = 2, ν = 3)$	$0.777$	$< 2.2 \times 10^{-16}$
$m_{1}(x \mid α = 11.887, β = 0.011, ν = 0.724)$	$0.036$	$4.7 \times 10^{-6}$
$m_{2}(x \mid α = 11.887, β = 91.382, ν = 0.724)$	$0.036$	$4.7 \times 10^{-6}$
$m_{3}(x \mid α = 0.724, β = 91.382, ν = 11.887)$	$0.036$	$4.7 \times 10^{-6}$
$m_{4}(x \mid α = 0.724, β = 0.011, ν = 11.887)$	$0.036$	$4.7 \times 10^{-6}$
$m_{G - G}(x \mid α = 1, β = 2, ν = 3)$	$0.009$	$0.792$
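The D-values in Table 3 are one-sample Kolmogorov–Smirnov statistics, which can be computed directly from a sorted sample and a candidate CDF. The sketch below illustrates this for the first row only and rests on several assumptions: $G(a, b)$ is taken to be a gamma distribution with shape $a$ and scale $b$, and the CDF of $m_{1}$ for $α = 1$ is derived here from the transformation $x = (1 / y - 1) / β$ with $y \sim Beta(1, ν)$ rather than taken from the paper. The function names and the seed are hypothetical, and this is not the authors' R code.

```python
import random

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def m1_cdf_alpha1(x, beta, nu):
    """CDF of m_1 when alpha = 1, derived from x = (1/y - 1)/beta with y ~ Beta(1, nu)."""
    u = beta * x / (1.0 + beta * x)
    return u ** nu

rng = random.Random(5000)  # arbitrary seed, chosen only for reproducibility
sample = []
for _ in range(5000):
    # Assumed convention: G(a, b) is a gamma with shape a and scale b.
    theta = rng.gammavariate(1.0, 2.0)           # theta ~ G(alpha = 1, beta = 2)
    sample.append(rng.gammavariate(3.0, theta))  # x | theta ~ G(nu = 3, theta)

d_value = ks_statistic(sample, lambda x: m1_cdf_alpha1(x, 2.0, 3.0))
print(round(d_value, 3))
```

If the assumed parameterization matches the paper's, the printed statistic should be of the same order as the $0.169$ reported in the first row of Table 3, up to sampling variation.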

Table 4. Some basic statistical indicators of the original data $x$.

Sample Size	Mean	Variance	Standard Deviation	Skewness	Kurtosis
4575	$21.01$	$1907.64$	$43.68$	$18.12$	$621.40$

Table 5. The transformations, the moment estimators of the hyperparameters of the marginal density $m_{1}(x \mid α, β, ν)$, and the D-value and p-value of the KS test.

Transformations	Moment Estimators			KS Test
	$α$	$β$	$ν$	D-Value	p-Value
$y_{1} = x$	$3.367$	$0.010$	$0.482$	$0.3020$	<2.2 $\times 10^{- 16}$
$y_{2} = x - min (x) + 1 \times 10^{- 50}$	$3.388$	$0.009$	$0.415$	$0.2772$	<2.2 $\times 10^{- 6}$
$y_{3} = 1 / x$	$- 93.694$	$- 0.123$	$1.407$	NA	NA
$y_{4} = 1 / x - min (1 / x) + 1 \times 10^{- 50}$	$- 78.616$	$- 0.145$	$1.388$	NA	NA
$y_{5} = log (x)$	$- 56.947$	$- 0.042$	$6.081$	NA	NA
$y_{6} = log (x) - min (log (x)) + 1 \times 10^{- 50}$	$- 31.706$	$- 0.065$	$4.866$	NA	NA
$y_{7} = \sqrt{x}$	$5.072$	$2.109$	$33.695$	$0.0323$	$0.0001$
$y_{8} = \sqrt{x} - min (\sqrt{x}) + 1 \times 10^{- 50}$	$5.375$	$0.257$	$3.175$	$0.0185$	$0.0883$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

