1. Introduction
Bayesian inference requires the specification of a prior, which encodes a priori knowledge about the parameter(s). If the selected prior is flawed, inferences may be erroneous. The goal of this paper is to measure the sensitivity of inferences to a chosen prior (known as robustness). Since, in most cases, it is very challenging to settle on a single prior distribution, we consider a class $\Gamma$ of possible priors over the parameter space. To construct $\Gamma$, a preliminary prior $\pi_0$ is elicited; robustness is then sought for all priors $\pi$ in a neighborhood of $\pi_0$. A commonly accepted way to construct neighborhoods around $\pi_0$ is through contamination. Specifically, we will consider two different classes of contaminated (mixture) priors, given by

$$\Gamma_1 = \big\{\pi : \pi(\theta) = (1-\epsilon)\,\pi_0(\theta) + \epsilon\, q(\theta),\ q \in Q\big\} \quad (1)$$

and

$$\Gamma_2 = \big\{\pi : \pi(\theta) = C(\epsilon)\,\pi_0^{1-\epsilon}(\theta)\,q^{\epsilon}(\theta),\ q \in Q\big\}, \quad (2)$$

where $\pi_0$ is the elicited prior, $Q$ is a class of distributions, $C(\epsilon)$ is a normalizing constant, and $\epsilon$ is a small given number denoting the amount of contamination. For other possible classes of priors see, for instance, De Robertis and Hartigan (1981) [1] and Das Gupta and Studden (1988a, 1988b) [2,3].
The class (1) is known as the $\epsilon$-contaminated class of priors, and many papers about it are found in the literature. For instance, Berger (1984, 1990) [4,5], Berger and Berliner (1986) [6], and Sivaganesan and Berger (1989) [7] used various choices of $Q$. Wasserman (1989) [8] used (1) to study robustness of likelihood regions. Dey and Birmiwal (1994) [9] studied robustness based on the curvature. Al-Labadi and Evans (2017) [10] studied robustness of relative belief ratios (Evans, 2015 [11]) under class (1).
On the other hand, the class (2) will be referred to as the geometric contamination or mixture class. This class was first studied, in the context of Bayesian robustness, by Gelfand and Dey (1991) [12], where posterior robustness was measured using the Kullback-Leibler divergence. Dey and Birmiwal (1994) [9] generalized the results of Gelfand and Dey (1991) [12] under (1) and (2) by using the $\phi$-divergence defined by

$$D_\phi(\pi_1, \pi_2) = E_{\pi_2}\!\left[\phi\!\left(\frac{\pi_1(\theta\,|\,x)}{\pi_2(\theta\,|\,x)}\right)\right] \quad (3)$$

for a smooth convex function $\phi$. For example, $\phi(t) = t\log t$ gives the Kullback-Leibler divergence.
In this paper, we extend the results of Gelfand and Dey (1991) [12] and Dey and Birmiwal (1994) [9] by applying Rényi divergence to both classes (1) and (2). This yields a local sensitivity analysis of the effect of small perturbations of the prior. Rényi entropy, developed by the Hungarian mathematician Alfréd Rényi in 1961, generalizes the Shannon entropy and includes other entropy measures as special cases. It finds applications, for instance, in statistics [13], pattern recognition [14], economics [15], and biomedicine [16].
Although the focus of this paper is on Rényi divergence, it also covers the $(h, \phi)$ family of divergence measures (Menéndez et al., 1995 [17]). Examples of $(h, \phi)$-divergences include the Rényi divergence, the Sharma–Mittal divergence, and the Bhattacharyya divergence. We refer the reader to Pardo (2006) [18] for more details about the $(h, \phi)$-divergence.
An outline of this paper is as follows. In Section 2, we give definitions, notation, and some properties of Rényi divergence. In Section 3, we develop curvature formulas for measuring robustness based on Rényi divergence and the $(h, \phi)$-divergence. In Section 4, three examples are studied to illustrate the results numerically. Section 5 ends with a brief summary of the results.
2. Definitions and Notations
Suppose we have a statistical model given by the density function $f(x\,|\,\theta)$ (with respect to some measure), where $\theta$ is an unknown parameter belonging to the parameter space $\Theta$. Let $\pi(\theta)$ be the prior distribution of $\theta$. After observing the data $x$, by Bayes’ theorem, the posterior distribution of $\theta$ is given by the density

$$\pi(\theta\,|\,x) = \frac{f(x\,|\,\theta)\,\pi(\theta)}{m(x)},$$

where $m(x) = \int_\Theta f(x\,|\,\theta)\,\pi(\theta)\,d\theta$ is the prior predictive density of the data.
To measure the divergence between two posterior distributions, we consider Rényi divergence (Rényi, 1961 [19]). The Rényi divergence of order $a$ between two posterior densities $\pi_1(\theta\,|\,x)$ and $\pi_2(\theta\,|\,x)$ is defined as

$$D_a(\pi_1, \pi_2) = \frac{1}{a-1}\,\log E_{\pi_2}\!\left[\left(\frac{\pi_1(\theta\,|\,x)}{\pi_2(\theta\,|\,x)}\right)^{a}\right], \quad (4)$$

where $a > 0$, $a \neq 1$, and $E_{\pi_2}$ denotes the expectation with respect to the density $\pi_2(\theta\,|\,x)$. It is known that $D_a(\pi_1, \pi_2) \geq 0$ for all $a$, and $D_a(\pi_1, \pi_2) = 0$ if and only if $\pi_1 = \pi_2$. Please note that the case $a = 1$ is defined by letting $a \to 1$, which recovers the Kullback-Leibler divergence. Other values of $a$ of particular interest are $a = 1/2$ and $a \to \infty$ (van Erven and Harremoës, 2014 [20]). For further properties of Rényi divergence consult, for example, Li and Turner (2016) [21].
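As a quick numerical illustration of these properties, the following sketch evaluates the Rényi divergence (4) between two discrete distributions and checks nonnegativity and the $a \to 1$ Kullback-Leibler limit; the two distributions are arbitrary toy examples, not taken from the paper.

```python
import numpy as np

def renyi_divergence(p, q, a):
    """Rényi divergence of order a (a > 0, a != 1) between two discrete
    distributions: D_a(p, q) = log(sum_i p_i^a * q_i^(1-a)) / (a - 1)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.log(np.sum(p**a * q**(1.0 - a))) / (a - 1.0)

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

d_half = renyi_divergence(p, q, 0.5)        # order a = 1/2
d_near_one = renyi_divergence(p, q, 1.0001) # a -> 1 approaches the KL divergence
kl = float(np.sum(p * np.log(p / q)))       # Kullback-Leibler divergence
```

Note that $D_a$ is nondecreasing in $a$, so the order-$1/2$ value never exceeds the Kullback-Leibler value.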
Rényi divergence belongs to the following general family of divergence measures, called the $(h, \phi)$-divergence (Menéndez et al., 1995 [17]).

Definition 1. Let $h$ be a differentiable increasing real function mapping from $[0, \infty)$ to $[0, \infty)$. The $(h, \phi)$-divergence measure between two posterior distributions $\pi_1(\theta\,|\,x)$ and $\pi_2(\theta\,|\,x)$ is defined as

$$D_\phi^h(\pi_1, \pi_2) = h\big(D_\phi(\pi_1, \pi_2)\big),$$

where $D_\phi$ is the $\phi$-divergence defined in (3).

Please note that Rényi divergence is an $(h, \phi)$-divergence measure with

$$h(t) = \frac{1}{a-1}\log\big(1 + a(a-1)t\big), \qquad \phi(t) = \frac{t^a - a(t-1) - 1}{a(a-1)},$$

for $a \neq 0, 1$. To see this, from Definition 1, we have

$$D_\phi(\pi_1, \pi_2) = \frac{E_{\pi_2}\!\big[\big(\pi_1(\theta\,|\,x)/\pi_2(\theta\,|\,x)\big)^{a}\big] - 1}{a(a-1)},$$

so that

$$h\big(D_\phi(\pi_1, \pi_2)\big) = \frac{1}{a-1}\log E_{\pi_2}\!\left[\left(\frac{\pi_1(\theta\,|\,x)}{\pi_2(\theta\,|\,x)}\right)^{a}\right],$$

which is Rényi divergence as defined in (4).
Similar to McCulloch (1989) [22] and Dey and Birmiwal (1994) [9], who calibrate, respectively, the Kullback-Leibler divergence and the $\phi$-divergence, it is also possible to calibrate Rényi divergence as follows. Consider a biased coin where $X = 1$ (heads) occurs with probability $p$. Then the Rényi divergence between an unbiased and a biased coin is

$$D_a = \frac{1}{a-1}\log\Big(2^{-a}\big(p^{1-a} + (1-p)^{1-a}\big)\Big), \quad (5)$$

where, for $a \neq 1$, the unbiased coin plays the role of $\pi_1$ and the biased coin that of $\pi_2$. Now, setting $D_a = d$ gives

$$2^{-a}\big(p^{1-a} + (1-p)^{1-a}\big) = e^{(a-1)d}. \quad (6)$$

Then the number $p$ is the calibration of $d$. In general, Equation (6) needs to be solved numerically for $p$. Please note that for the case $a \to 1$ (i.e., the Kullback-Leibler divergence) one may use the following explicit formula for $p$, due to McCulloch (1989) [22]:

$$p = \frac{1}{2}\Big(1 + \sqrt{1 - e^{-2d}}\Big). \quad (7)$$

Values of $p$ close to 1 indicate that $\pi_1$ and $\pi_2$ are quite different, while values of $p$ close to 0.5 imply that they are similar. Restricting $p$ to lie between 0.5 and 1 ensures a one-to-one correspondence between $p$ and $d$.
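A minimal sketch of this calibration, assuming the divergence in (5) is taken from the unbiased to the biased coin (the direction consistent with McCulloch's explicit formula (7)). The divergence is monotone in $p$ on $[0.5, 1)$, so a simple bisection solves the analogue of Equation (6); the function names are ours.

```python
import math

def renyi_coin(p, a):
    """Rényi divergence of order a between an unbiased coin and a coin with
    heads probability p; increasing in p on [0.5, 1)."""
    return math.log(2.0**(-a) * (p**(1 - a) + (1 - p)**(1 - a))) / (a - 1)

def calibrate(d, a, tol=1e-12):
    """Solve renyi_coin(p, a) = d for p in [0.5, 1) by bisection,
    a numerical analogue of Equation (6)."""
    lo, hi = 0.5, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if renyi_coin(mid, a) < d:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# For a -> 1 the numerical solution matches McCulloch's explicit formula (7):
#   p = (1 + sqrt(1 - exp(-2 d))) / 2
d = 0.1
p_numeric = calibrate(d, a=1.0001)
p_explicit = 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-2.0 * d)))
```

Bisection is used rather than a closed form because (6) has no explicit solution for general $a$.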
A motivating key fact about Rényi divergence follows from its Taylor expansion. Let $D_a(\epsilon)$ denote the Rényi divergence between $\pi_\epsilon(\cdot\,|\,x)$ and $\pi_0(\cdot\,|\,x)$, where $\pi_\epsilon(\theta\,|\,x)$ is the posterior distribution of $\theta$ given the data $x$ under a prior $\pi_\epsilon$ of the form (1) or (2). Assuming differentiability with respect to $\epsilon$, the Taylor expansion of $D_a(\epsilon)$ about $\epsilon = 0$ is given by

$$D_a(\epsilon) = D_a(0) + \epsilon\,D_a'(0) + \frac{\epsilon^2}{2}\,D_a''(0) + o(\epsilon^2).$$

Clearly, $D_a(0) = 0$. If integration and differentiation are interchangeable, we have $D_a'(0) = 0$. On the other hand, differentiating a second time and evaluating at $\epsilon = 0$ gives $D_a''(0) = a\,I(0)$, where $I(\epsilon)$ is the Fisher information function of the family $\{\pi_\epsilon(\cdot\,|\,x)\}$ (Lehmann and Casella, 1998 [23]). Thus, for small $\epsilon$, we have

$$D_a(\epsilon) = \frac{a\,\epsilon^2}{2}\,I(0) + o(\epsilon^2). \quad (8)$$

Please note that $a\,I(0)$ is known as the local curvature of Rényi divergence at $\epsilon = 0$. Formula (8) justifies the use of the curvature to measure the Bayesian robustness of the two classes of priors defined in (1) and (2), respectively. Also, this formula provides a direct relationship between Fisher information and the curvature of Rényi divergence.
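As a sanity check on the quadratic approximation (8), the following sketch contaminates a discrete baseline distribution linearly, as in class (1), and compares the exact Rényi divergence with $(a\,\epsilon^2/2)\,I(0)$, where for this linear contamination $I(0)$ equals the variance of $q/\pi_0$ under $\pi_0$. The distributions are arbitrary toy choices, not from the paper.

```python
import numpy as np

a, eps = 0.5, 1e-3
p0 = np.array([0.2, 0.3, 0.5])    # baseline distribution (plays the role of pi_0)
q = np.array([0.4, 0.4, 0.2])     # contaminant
p_eps = (1 - eps) * p0 + eps * q  # linear contamination, as in class (1)

# Exact Rényi divergence of order a between p_eps and p0
exact = np.log(np.sum(p_eps**a * p0**(1 - a))) / (a - 1)

# Curvature approximation (8): D_a(eps) ~ (a * eps^2 / 2) * I(0),
# with I(0) = Var_{p0}[q / p0] (the mean of q/p0 under p0 is 1)
i0 = np.sum(p0 * (q / p0 - 1.0)**2)
approx = 0.5 * a * eps**2 * i0
```

For $\epsilon$ this small, the exact divergence and the curvature approximation agree to within a couple of percent.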
4. Examples
In this section, the derived results are illustrated through three examples: the Bernoulli model, the multinomial model, and the location normal model. In each example, the curvature values for the two classes (1) and (2) are reported. Additionally, in Example 1, we compute the Rényi divergence between $\pi_0(\theta\,|\,x)$ and $\pi_\epsilon(\theta\,|\,x)$ and report the calibrated value $p$ as described in (6) and (7). Recall that curvature values close to zero indicate robustness of the used prior, whereas larger values suggest lack of robustness. On the other hand, values of $p$ close to 0.5 suggest robustness, whereas values of $p$ close to 1 indicate absence of robustness.
Example 1 (Bernoulli Model). Suppose $x = (x_1, \ldots, x_n)$ is a sample from a Bernoulli distribution with parameter $\theta$. Let the prior $\pi_0$ be Beta$(\alpha, \beta)$, i.e.,

$$\pi_0(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < \theta < 1,$$

where $B(\cdot,\cdot)$ denotes the beta function. Thus, $\pi_0(\theta\,|\,x)$ is Beta$\big(\alpha + n\bar{x},\ \beta + n(1-\bar{x})\big)$, where $\bar{x} = n^{-1}\sum_{i=1}^n x_i$. Let $q$ be Beta$(c\alpha, c\beta)$ for $c > 0$. Now consider two samples of sizes $n_1$ and $n_2$ generated from a Bernoulli$(\theta_0)$ distribution. For comparison purposes, we consider several values of $(\alpha, \beta)$ and $c$. Although it is possible to find exact formulas for the curvature by some algebraic manipulation, it is more convenient to use a Monte Carlo approach in this example. The computational steps are summarized in Algorithm 1.
Algorithm 1 Computing the curvature based on a Monte Carlo approach

1. For $i = 1, \ldots, N$, generate $\theta_i$ from the posterior $\pi_0(\theta\,|\,x)$.
2. For each $\theta_i$, compute $r_i = q(\theta_i)/\pi_0(\theta_i)$ and $\log r_i$.
3. Compute the sample variance of the values $r_1, \ldots, r_N$ and denote it by $s_1^2$. Return $a\,s_1^2$ as the curvature value under class (1).
4. Compute the sample variance of the values $\log r_1, \ldots, \log r_N$ and denote it by $s_2^2$. Return $a\,s_2^2$ as the curvature value under class (2).
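A minimal Python sketch of this Monte Carlo recipe for the Beta–Bernoulli setting of Example 1. It assumes the contaminant $q =$ Beta$(c\alpha, c\beta)$ and returns $a$ times the sample variance of $q(\theta_i)/\pi_0(\theta_i)$ and of its logarithm as the class (1) and class (2) curvature values; the data and hyperparameter values are illustrative, not those of the paper.

```python
import math
import numpy as np

def beta_logpdf(t, a1, b1):
    """Log density of Beta(a1, b1) evaluated at the points in array t."""
    c = math.lgamma(a1 + b1) - math.lgamma(a1) - math.lgamma(b1)
    return c + (a1 - 1) * np.log(t) + (b1 - 1) * np.log(1 - t)

def curvatures(x, alpha, beta, c, a, N=100_000, seed=0):
    """Monte Carlo curvature values at eps = 0 for the Beta-Bernoulli model:
    prior pi0 = Beta(alpha, beta), contaminant q = Beta(c*alpha, c*beta)."""
    rng = np.random.default_rng(seed)
    s, n = int(np.sum(x)), len(x)
    theta = rng.beta(alpha + s, beta + n - s, size=N)  # posterior draws
    log_r = (beta_logpdf(theta, c * alpha, c * beta)
             - beta_logpdf(theta, alpha, beta))        # log(q / pi0)
    r = np.exp(log_r)
    # Classes (1) and (2): a * sample variance of r and of log r
    return a * np.var(r, ddof=1), a * np.var(log_r, ddof=1)

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=50)
c1, c2 = curvatures(x, alpha=1.0, beta=1.0, c=2.0, a=0.5)
c1_null, c2_null = curvatures(x, alpha=1.0, beta=1.0, c=1.0, a=0.5)  # q = pi0
```

When $c = 1$ the contaminant coincides with $\pi_0$ and both curvature values vanish, matching the behavior reported for Table 1.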
The values of the curvature for both classes (1) and (2) are reported in Table 1. Remarkably, for the cases of the uniform prior on $[0,1]$ ($\alpha = \beta = 1$) and Jeffreys’ prior ($\alpha = \beta = 1/2$), the curvature values are prominently small for all values of $c$. Also, it is clear that when $c = 1$, the curvature values are 0. It is worth noticing that, when fixing the parameters $(\alpha, \beta)$ and $c$, the curvature decreases as the sample size increases. This supports the fact that the effect of the prior dissipates with increasing sample size.
While it is easier to quantify the curvature based on Theorems 1 and 2, in this example, for comparison purposes, we also computed the Rényi divergence between $\pi_0(\theta\,|\,x)$ and $\pi_\epsilon(\theta\,|\,x)$ under classes (1) and (2). It can be shown that under class (1) in (9),

$$\pi_\epsilon(\theta\,|\,x) = \lambda(x)\,\pi_0(\theta\,|\,x) + \big(1 - \lambda(x)\big)\,q(\theta\,|\,x),$$

where

$$\lambda(x) = \frac{(1-\epsilon)\,m_{\pi_0}(x)}{(1-\epsilon)\,m_{\pi_0}(x) + \epsilon\,m_q(x)},$$

and $m_{\pi_0}(x)$ and $m_q(x)$ denote the prior predictive densities under $\pi_0$ and $q$, respectively. Also, from (17), it can easily be concluded that the posterior $\pi_\epsilon(\theta\,|\,x)$ under class (2) is Beta$\big((1-\epsilon)\alpha + \epsilon c\alpha + n\bar{x},\ (1-\epsilon)\beta + \epsilon c\beta + n(1-\bar{x})\big)$. Please note that since these posteriors are available in closed form, it is possible to compute the distance based on a Monte Carlo approach. When $a \to 1$, $D_a$ reduces to the Kullback-Leibler divergence. We also calibrated the Rényi divergence values as described in (6) and (7). To save space, the results based on classes (1) and (2) for the first sample are reported in Table 2 and Table 3, respectively.
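Since both posteriors are Beta densities under the assumed contaminant $q =$ Beta$(c\alpha, c\beta)$, the Rényi distance between them can be estimated by Monte Carlo, sampling from the baseline posterior. The sketch below does this for the class (2) posterior; all parameter values are illustrative, not those used for Tables 2 and 3.

```python
import math
import numpy as np

def renyi_beta_mc(a1, b1, a2, b2, a, N=400_000, seed=0):
    """Monte Carlo Rényi divergence of order a between Beta(a1, b1) and
    Beta(a2, b2): D_a = log(E_2[(pi_1/pi_2)^a]) / (a - 1), sampling from pi_2."""
    rng = np.random.default_rng(seed)
    t = rng.beta(a2, b2, size=N)
    def logpdf(t, p, q):
        c = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
        return c + (p - 1) * np.log(t) + (q - 1) * np.log(1 - t)
    log_ratio = logpdf(t, a1, b1) - logpdf(t, a2, b2)
    return math.log(np.mean(np.exp(a * log_ratio))) / (a - 1)

# Baseline posterior Beta(alpha + s, beta + n - s) versus the class (2)
# posterior, whose prior is again a Beta under the assumed geometric mixture.
alpha, beta, c, epsilon, s, n = 1.0, 1.0, 5.0, 0.3, 14, 50  # illustrative values
a_eps = (1 - epsilon) * alpha + epsilon * c * alpha
b_eps = (1 - epsilon) * beta + epsilon * c * beta
d = renyi_beta_mc(a_eps + s, b_eps + n - s, alpha + s, beta + n - s, a=0.5)
```

The resulting distance can then be calibrated by solving (6) for $p$, as described above.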
Please note that, from (8), by multiplying the curvature value in Table 1 by $\epsilon^2/2$, one may obtain the value of the corresponding distance in Table 2 and Table 3. For instance, taking the curvature value from Table 1 for a given choice of $(\alpha, \beta)$ and $c$ and multiplying it by $\epsilon^2/2$ gives a value close to the corresponding distance reported in Table 2.
Now we consider the Australian AIDS survival data, available in the R package “MASS”. There are 2843 patients diagnosed with AIDS in Australia before 1 July 1991. The data frame contains the following columns: state, sex, date of diagnosis, date of death or end of observation, status (“A” (alive) or “D” (dead) at end of observation), reported transmission category, and age at diagnosis. There are 1082 alive and 1761 dead cases. We consider the values of the column status. Under the prior distribution given above, the values of the curvature for the two classes (1) and (2) are summarized in Table 4 for a random subsample and for the whole data. It is interesting to notice that, unlike the subsample, for the whole data set (i.e., $n = 2843$), the value of the curvature is small for all cases of $(\alpha, \beta)$ and $c$, demonstrating the reduced effect of the prior in the presence of a large sample size.
Example 2 (Multinomial Model). Suppose that $x = (x_1, \ldots, x_k)$ is an observation from a multinomial distribution with parameters $(n, \theta_1, \ldots, \theta_k)$, where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k \theta_i = 1$. Let the prior $\pi_0$ be Dirichlet$(\alpha_1, \ldots, \alpha_k)$. Then the posterior $\pi_0(\theta\,|\,x)$ is Dirichlet$(\alpha_1 + x_1, \ldots, \alpha_k + x_k)$. Let $q$ be Dirichlet$(c\alpha_1, \ldots, c\alpha_k)$. We consider an observation generated from a multinomial distribution. As in Example 1, we use a Monte Carlo approach to compute the curvature values. Table 5 reports values of the curvature for different values of $(\alpha_1, \ldots, \alpha_k)$ and $c$. For the cases of the uniform prior over the simplex ($\alpha_i = 1$) and Jeffreys’ prior ($\alpha_i = 1/2$), the curvature values are prominently small.
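The Monte Carlo recipe of Algorithm 1 carries over directly, sampling from the Dirichlet posterior. The sketch below assumes the contaminant $q =$ Dirichlet$(c\alpha_1, \ldots, c\alpha_k)$; the counts and hyperparameters are illustrative.

```python
import math
import numpy as np

def dirichlet_logpdf(t, alpha):
    """Log density of Dirichlet(alpha) at the rows of t (each row sums to 1)."""
    c = math.lgamma(sum(alpha)) - sum(math.lgamma(a_i) for a_i in alpha)
    return c + np.sum((np.asarray(alpha) - 1) * np.log(t), axis=1)

def curvatures_multinomial(x, alpha, c, a, N=100_000, seed=0):
    """Monte Carlo curvature values at eps = 0 for the multinomial-Dirichlet
    model, with pi0 = Dirichlet(alpha) and contaminant q = Dirichlet(c*alpha)."""
    rng = np.random.default_rng(seed)
    post = np.asarray(alpha, float) + np.asarray(x, float)  # Dirichlet posterior
    theta = rng.dirichlet(post, size=N)
    log_r = (dirichlet_logpdf(theta, [c * a_i for a_i in alpha])
             - dirichlet_logpdf(theta, alpha))              # log(q / pi0)
    return a * np.var(np.exp(log_r), ddof=1), a * np.var(log_r, ddof=1)

x = [12, 18, 20]  # illustrative multinomial counts
k1, k2 = curvatures_multinomial(x, alpha=[1.0, 1.0, 1.0], c=2.0, a=0.5)
k1_null, k2_null = curvatures_multinomial(x, alpha=[1.0, 1.0, 1.0], c=1.0, a=0.5)
```

As in the Bernoulli case, $c = 1$ makes the contaminant coincide with $\pi_0$ and both curvature values vanish.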
Example 3 (Location Normal Model). Suppose that $x = (x_1, \ldots, x_n)$ is a sample from an $N(\theta, \sigma_0^2)$ distribution with $\sigma_0^2$ known. Let the prior of $\theta$ be $N(\mu_0, \tau_0^2)$, so that the posterior $\pi_0(\theta\,|\,x)$ is also normal, and let the contaminant $q$ be normal as well. Due to some interesting theoretical properties in this example, we present the exact formulas of the curvature for class (1) and class (2). The ratio $q(\theta)/\pi_0(\theta)$ is the exponential of a polynomial in $\theta$. Therefore, for class (1), the curvature

$$a\,\mathrm{Var}_{\pi_0(\cdot\,|\,x)}\!\left[\frac{q(\theta)}{\pi_0(\theta)}\right]$$

can be evaluated in closed form through $M_{\pi_0(\cdot\,|\,x)}$, the moment generating function with respect to the posterior density $\pi_0(\theta\,|\,x)$. On the other hand, for the geometric contaminated class, the curvature is

$$a\,\mathrm{Var}_{\pi_0(\cdot\,|\,x)}\!\left[\log\frac{q(\theta)}{\pi_0(\theta)}\right].$$
Interestingly, the curvature for the geometric contaminated class depends on the sample only through its size $n$. For fixed values of the prior hyperparameters and $c$, the curvature tends to 0 as $n \to \infty$, which indicates robustness. Also, for fixed values of the hyperparameters and $n$, the curvature grows without bound as the contamination becomes extreme, and no robustness will be found.
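To make the $n$-only dependence concrete, here is a small sketch under the illustrative assumption of a mean-shifted contaminant $q = N(\mu_1, \tau_0^2)$ (this particular choice is ours, not necessarily the paper's): $\log\big(q(\theta)/\pi_0(\theta)\big)$ is then linear in $\theta$, so its posterior variance involves only the posterior variance of $\theta$, which depends on the data only through $n$.

```python
import numpy as np

def curvature_class2_normal(x, sigma, mu0, tau, mu1, a):
    """Exact class (2) curvature a * Var_post[log(q/pi0)] for the location
    normal model, with pi0 = N(mu0, tau^2) and a mean-shifted contaminant
    q = N(mu1, tau^2) (illustrative choice).  log(q/pi0) is linear in theta,
    so the result depends on the data only through the sample size n."""
    n = len(x)
    v_post = 1.0 / (n / sigma**2 + 1.0 / tau**2)  # posterior variance of theta
    slope = (mu1 - mu0) / tau**2                  # coefficient of theta in log(q/pi0)
    return a * slope**2 * v_post

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=30)
x2 = rng.normal(5.0, 1.0, size=30)  # very different sample, same size n
c_a = curvature_class2_normal(x1, 1.0, 0.0, 1.0, 1.0, a=0.5)
c_b = curvature_class2_normal(x2, 1.0, 0.0, 1.0, 1.0, a=0.5)
```

The two samples give identical curvature values despite having very different observations, and the value shrinks as $n$ grows, in line with the robustness discussion above.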
Now we consider a numerical example by generating a sample of size $n$ from a normal distribution. Table 6 reports the values of the curvature for different values of the hyperparameters and $c$. Clearly, for a diffuse prior (large $\tau_0^2$), the value of the curvature is small, which is an indication of robustness. For instance, for a fixed $c$ in Table 6, the value of the curvature under a large prior variance is much smaller than under a small prior variance.