Conjugate Representations and Characterizing Escort Expectations in Information Geometry

Wada, Tatsuaki; Matsuzoe, Hiroshi

doi:10.3390/e19070309

by Tatsuaki Wada ¹,* and Hiroshi Matsuzoe ²

¹ Department of Electrical and Electronic Engineering, Ibaraki University, Nakanarusawa-cho, Hitachi, Ibaraki 316-8511, Japan

² Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan

* Author to whom correspondence should be addressed.

Entropy 2017, 19(7), 309; https://doi.org/10.3390/e19070309

Submission received: 29 May 2017 / Revised: 22 June 2017 / Accepted: 27 June 2017 / Published: 28 June 2017

(This article belongs to the Special Issue Information Geometry II)

Abstract: Based on the maximum entropy (MaxEnt) principle for a generalized entropy functional and the conjugate representations introduced by Zhang, we have reformulated the method of information geometry. For a set of conjugate representations, the associated escort expectation is naturally introduced and characterized by the generalized score function which has zero-escort expectation. Furthermore, we show that the escort expectation induces a conformal divergence.

Keywords: conjugate representations; information geometry; escort expectations; deformed exponential functions; conformal divergences

1. Introduction

Information geometry (IG) [1,2] is a differential geometrical method based on a Riemannian metric on a statistical manifold, which is constructed from a given parameterized probability distribution function (pdf)

p_{θ} (x)

. It provides a useful tool to study, for example, the dually flat structures of a statistical manifold. Recently, much effort has been made to study some deformed exponential families of pdfs, in which the standard exponential function

exp (x)

and its inverse function

ln (x)

are replaced with a deformed exponential and its inverse function, which is called a deformed logarithmic function. Among different deformed exponential functions, relatively well known ones include Tsallis’ q-deformed exponential [3] and Kaniadakis’

κ

-deformed exponential functions [4]. Naudts [5] introduced the so-called

φ

-logarithmic function in terms of a positive increasing function

φ (x)

, and studied the generalized thermostatistics. It is shown that a q-deformed relative entropy is proportional to Amari’s

α

-divergence and is related to the

α

-geometry on the statistical manifold with a constant curvature [6]. In order to construct a suitable statistical manifold in IG, usually the

α

-representation (rep.), or

α

-immersion, of a pdf is used. It is well known that the

α

-rep. works fine with an exponential family but does not necessarily work fine for a non-exponential pdf, e.g., a

κ

-deformed exponential family [4]. A generalization of the

α

-rep. is conjugate representations, or

(ρ, τ)

-reps., by Zhang [7]. He also introduced the

(ρ, τ)

-divergences from the point of view of “representation duality”. By finding out a suitable conjugate rep. for

κ

-deformed exponential pdf, the IG of the

κ

-generalized thermostatistics [8] was studied. We further studied the IG structures among the thermodynamic potentials in the

κ

-thermostatistics [9], in which the escort pdfs and escort expectations play an important role. Zhang [10] further showed that his conjugate reps. also include Naudts’

φ

-logarithm [5] as a special case. In this way, Zhang’s conjugate reps. are very useful as a generalization of

α

-reps. in IG. Amari [11] showed that

(ρ, τ)

-divergences generate a dually flat structure in the manifold of positive measures and in that of positive-definite matrices. In this contribution, we reformulate the IG structures based on Zhang’s conjugate reps. and the maximum entropy (MaxEnt) principle for a generalized entropy functional. Our approach is different from the previous works [10,11] in that we relate the

ρ

-rep. of a pdf

p (x)

with the Lagrange multipliers in the MaxEnt problem. This enables us to introduce a generalized score function and characterize the escort expectations.

The rest of the paper is organized as follows. The next section provides us with the preliminaries for the basics of IG for the exponential families of pdf. In Section 3, after a brief review of conjugate reps. introduced by Zhang [7], we reformulate the IG structures based on the conjugate reps. and discuss the maximum entropy (MaxEnt) principle for a generalized entropy. For a set of conjugate reps., the associated generalized score function is introduced. The escort rep. and escort expectation are then naturally induced. Section 4 relates the conformal divergence to the difference of the entropies in terms of the escort expectations. The final section is devoted to our concluding remarks. Throughout the paper, we use the abbreviations

\partial_{i}

for

\partial / \partial θ^{i}

, and

\partial^{i}

for

\partial / \partial η_{i}

.

2. Preliminaries

Information geometry [1,2] provides a useful tool for studying a family

\begin{matrix} S = \{p_{θ} (x) | p_{θ} (x) > 0, \int d x p_{θ} (x) = 1\}, \end{matrix}

(1)

of probability distribution functions (pdfs)

p_{θ} (x)

characterized by a set of real parameters

θ = (θ^{1}, θ^{2}, \dots, θ^{M})

.

S

is called an (M-dimensional) statistical model and the pdf

p_{θ} (x)

of

S

can be regarded as a point in a differential manifold

M

with local coordinates

{θ^{i}}

.

M

is called a statistical manifold and a Riemannian metric on

M

is provided by the Fisher information matrix

g_{i j}^{F}

[2],

\begin{matrix} g_{i j}^{F} (θ) = E_{p_{θ}} [\partial_{i} ℓ_{θ} (x) \partial_{j} ℓ_{θ} (x)], i, j = 1, 2, \dots, M, \end{matrix}

(2)

where

ℓ_{θ} (x) \equiv ln p_{θ} (x)

. In this contribution, we assume that

g^{F}

is positive definite, and

E_{p_{θ}} [\cdot]

stands for the linear expectation with respect to the pdf

p_{θ} (x)

.

A manifold

M

is said to be e-flat (exponential-flat) if a set of coordinate systems

{θ^{i}}

satisfies

\begin{matrix} E_{p_{θ}} [\partial_{i} \partial_{j} ℓ_{θ} (x) \partial_{k} ℓ_{θ} (x)] = 0, \forall i, j, k, \end{matrix}

(3)

identically. Any set of coordinates

{θ^{i}}

satisfying (3) is called e-affine coordinates. A well-known example of e-flat manifolds is the exponential family

\begin{matrix} S_{\exp} = \{p_{θ} (x) | p_{θ} (x) = exp [\sum_{m = 1}^{M} θ^{m} F_{m} (x) - Ψ (θ)], \int d x p_{θ} (x) = 1\}, \end{matrix}

(4)

where each

F_{m} (x)

is a given function of a random variable x and

Ψ (θ)

is the normalization factor of a pdf

p_{θ} (x)

. From the normalization of the pdf

p_{θ} (x)

, we see that

\begin{matrix} E_{p_{θ}} [\partial_{i} ℓ_{θ} (x)] = \partial_{i} E_{p_{θ}} [1] = 0, \end{matrix}

(5)

and, for the exponential family, we have

\begin{matrix} \partial_{i} \partial_{j} ℓ_{θ} (x) = - \partial_{i} \partial_{j} Ψ (θ), \end{matrix}

(6)

which does not depend on x. Hence, the condition (3) is satisfied and one confirms that the exponential family is e-flat. In addition, for the exponential family, we have

\begin{matrix} \partial_{i} ℓ_{θ} (x) = F_{i} (x) - \partial_{i} Ψ (θ) . \end{matrix}

(7)

Taking the expectation of both sides and using Equation (5), we see that the m-affine coordinates (

η

-coordinates) of the exponential family are given by

\begin{matrix} η_{i} = E_{p_{θ}} [F_{i} (x)] = \partial_{i} Ψ (θ) . \end{matrix}

(8)

Accounting for Equations (7) and (8), from definition (2), we obtain

\begin{matrix} g_{i j}^{F} (θ) = E_{p_{θ}} [(F_{i} - E_{p_{θ}} [F_{i}]) (F_{j} - E_{p_{θ}} [F_{j}])], i, j = 1, 2, \dots, M, \end{matrix}

(9)

which is the covariance matrix for the statistical model

S_{\exp}

.

A manifold

M

is said to be m-flat (mixture-flat) if a set of coordinate systems

{η_{i}}

satisfies

\begin{matrix} E_{p_{θ}} [\frac{1}{p_{η} (x)} \partial^{i} \partial^{j} p_{η} (x) \partial^{k} ln p_{η} (x)] = 0, \forall i, j, k, \end{matrix}

(10)

identically. In this case, the set of coordinates

{η_{i}}

is called m-affine coordinates.

In a dually flat structure, the

θ

- and

η

-coordinates are related by the Legendre transformation

\begin{matrix} Ψ (θ) & + Ψ^{*} (η) - θ \cdot η = 0, \end{matrix}

(11)

\begin{matrix} θ^{i} & = \partial^{i} Ψ^{*} (η), \end{matrix}

(12)

\begin{matrix} η_{i} & = \partial_{i} Ψ (θ), \end{matrix}

(13)

where

Ψ (θ)

and

Ψ^{*} (η)

are Legendre–Fenchel dual to each other and are called

θ

- and

η

-potential functions, respectively. The canonical divergence function [2] for two pdfs

p_{θ_{p}} (x)

and

r_{θ_{r}} (x)

can be defined by

\begin{matrix} D (p | r) \equiv Ψ (θ_{p}) - Ψ (θ_{r}) - \nabla Ψ (θ_{r}) \cdot (θ_{p} - θ_{r}), \end{matrix}

(14)

which is a Bregman divergence with the convex function

Ψ (θ)

.
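
As a minimal numerical sketch of Equation (14), one can check on a small finite sample space that the Bregman divergence built from Ψ(θ) agrees with a Kullback–Leibler divergence (here with the arguments in the order KL(r‖p)). The sample space `X`, the functions in `F`, and the parameter values are hypothetical choices made only for illustration, and sums replace the integrals.

```python
import math

# Hypothetical finite sample space and functions F_m(x); sums replace integrals.
X = [0, 1, 2, 3]
F = [lambda x: float(x), lambda x: float(x * x)]

def psi(theta):
    """theta-potential of Eq. (18): logarithm of the partition sum."""
    return math.log(sum(math.exp(sum(t * f(x) for t, f in zip(theta, F))) for x in X))

def pdf(theta):
    """Exponential-family probabilities of Eq. (4) on X."""
    z = psi(theta)
    return [math.exp(sum(t * f(x) for t, f in zip(theta, F)) - z) for x in X]

def eta(theta):
    """eta-coordinates of Eq. (8): expectations of F_i, equal to the gradient of Psi."""
    p = pdf(theta)
    return [sum(pi * f(x) for pi, x in zip(p, X)) for f in F]

def canonical_divergence(theta_p, theta_r):
    """Bregman divergence of Eq. (14) with the convex function Psi."""
    grad = eta(theta_r)
    linear = sum(g * (a - b) for g, a, b in zip(grad, theta_p, theta_r))
    return psi(theta_p) - psi(theta_r) - linear

def kl(a, b):
    """Kullback-Leibler divergence KL(a || b) for two probability vectors."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

theta_p, theta_r = [0.3, -0.2], [-0.5, 0.1]
d_breg = canonical_divergence(theta_p, theta_r)
d_kl = kl(pdf(theta_r), pdf(theta_p))
print(d_breg, d_kl)
```

The two printed values should agree up to rounding, which illustrates why the canonical divergence of an exponential family is identified with the Kullback–Leibler divergence.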

For a dually flat manifold

S

, the Pythagorean relation is generalized in terms of the divergence. Let

p, r, s

be three probability distributions in

S

. When the e-geodesic connecting p and r is orthogonal at r to the m-geodesic connecting r and s, the following generalized Pythagorean relation [2] holds.

\begin{matrix} D (p | r) + D (r | s) = D (p | s) . \end{matrix}

(15)

As is well known, maximizing the Boltzmann–Gibbs–Shannon (BGS) entropy

\begin{matrix} S^{BGS} \equiv - \int d x p (x) ln p (x) = E_{p_{θ}} [- ln p], \end{matrix}

(16)

under the M-constraints

\begin{matrix} E_{p_{θ}} [F_{m} (x)] & = U_{m}, m = 1, 2, \dots, M, \end{matrix}

(17)

for a given set of

U_{m}

and the normalization

\int d x p (x) = 1

, leads to the optimized pdf belonging to the exponential family

S_{\exp}

. The Lagrange multipliers are the control parameters

{θ^{m}}

for the above M-constraints. From the normalization of an exponential pdf, we readily obtain the

θ

-potential function

Ψ (θ)

as

\begin{matrix} Ψ (θ) = ln (\int d x exp [\sum_{m = 1}^{M} θ^{m} F_{m} (x)]) . \end{matrix}

(18)

We note that, in addition to Equation (2), the Fisher metric

g^{F}

can be written equivalently in the following forms

\begin{matrix} g_{i j}^{F} & = \int d x \partial_{i} p_{θ} (x) \partial_{j} ℓ_{θ} (x) \end{matrix}

(19)

\begin{matrix} = - \int d x p_{θ} (x) \partial_{i} \partial_{j} ℓ_{θ} (x) \end{matrix}

(20)

\begin{matrix} = \int d x \frac{1}{p_{θ} (x)} \partial_{i} p_{θ} (x) \partial_{j} p_{θ} (x) . \end{matrix}

(21)

In particular, combining Equation (6) with (20), we readily confirm the important relation

\begin{matrix} g_{i j}^{F} = \partial_{i} \partial_{j} Ψ (θ), \end{matrix}

(22)

that is, the Fisher metric coincides with the Hessian matrix of the

θ

-potential function

Ψ (θ)

. It is known that an exponential family naturally has the dualistic Hessian structures and their canonical divergences coincide with the Kullback–Leibler divergences. Furthermore, using Equation (8), the Fisher matrix can also be rewritten as

\begin{matrix} g_{i j}^{F} = \partial_{i} η_{j} = \partial_{i} E_{p_{θ}} [F_{j}], \end{matrix}

(23)

which holds for the exponential family

S_{\exp}

.
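
The identities (9) and (22) can be illustrated numerically. The following sketch, assuming a hypothetical four-point sample space and two hypothetical functions `F`, compares the covariance matrix of the F_m with a finite-difference Hessian of Ψ(θ); the step size `h` is an arbitrary choice.

```python
import math

# Hypothetical finite sample space and functions F_m(x); sums replace integrals.
X = [0, 1, 2, 3]
F = [lambda x: float(x), lambda x: float(x * x)]

def psi(theta):
    """theta-potential of Eq. (18)."""
    return math.log(sum(math.exp(sum(t * f(x) for t, f in zip(theta, F))) for x in X))

def pdf(theta):
    """Exponential-family probabilities of Eq. (4)."""
    z = psi(theta)
    return [math.exp(sum(t * f(x) for t, f in zip(theta, F)) - z) for x in X]

def fisher_as_covariance(theta):
    """Fisher metric written as the covariance matrix of the F_m, Eq. (9)."""
    p = pdf(theta)
    mean = [sum(pi * f(x) for pi, x in zip(p, X)) for f in F]
    return [[sum(pi * (fi(x) - mi) * (fj(x) - mj) for pi, x in zip(p, X))
             for fj, mj in zip(F, mean)]
            for fi, mi in zip(F, mean)]

def hessian_psi(theta, h=1e-4):
    """Central finite-difference Hessian of Psi, approximating Eq. (22)."""
    n = len(theta)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(si, sj):
                t = list(theta)
                t[i] += si * h
                t[j] += sj * h
                return psi(t)
            hess[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * h * h)
    return hess

theta = [0.3, -0.2]
cov = fisher_as_covariance(theta)
hess = hessian_psi(theta)
max_gap = max(abs(cov[i][j] - hess[i][j]) for i in range(2) for j in range(2))
print(max_gap)
```

The printed gap reflects only finite-difference error; the covariance form needs no numerical differentiation and is the more stable way to evaluate the metric in this setting.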

In general, the dual affine connections are induced from the metric. By applying

\partial_{i}

to Equation (19) for

g^{F}

, we see that the following relation holds

\begin{matrix} \partial_{i} g_{j k}^{F} = Γ_{i j, k}^{(e)} + Γ_{i k, j}^{(m)}, \end{matrix}

(24)

where the Christoffel symbol of the first kind for the e-affine connection and that for the m-affine connection are defined by

\begin{matrix} Γ_{i j, k}^{(e)} & \equiv \int d x \partial_{k} p_{θ} (x) \partial_{i} \partial_{j} ℓ_{θ} (x) = E_{p_{θ}} [\partial_{k} ℓ_{θ} \partial_{i} \partial_{j} ℓ_{θ}], \end{matrix}

(25)

\begin{matrix} Γ_{i j, k}^{(m)} & \equiv \int d x \partial_{i} \partial_{j} p_{θ} (x) \partial_{k} ℓ_{θ} (x) = E_{p_{θ}} [\frac{1}{p_{θ} (x)} \partial_{i} \partial_{j} p_{θ} (x) \partial_{k} ℓ_{θ}], \end{matrix}

(26)

respectively. In addition, we can introduce a cubic form

\begin{matrix} C_{i j k} \equiv Γ_{i j, k}^{(m)} - Γ_{i j, k}^{(e)}, \end{matrix}

(27)

which characterizes the difference between the affine connection

\nabla^{(e)}

(or

\nabla^{(m)}

) and Levi–Civita connection

\nabla^{(0)}

through the relations

\begin{matrix} Γ_{i j, k}^{(e)} & = Γ_{i j, k}^{(0)} - \frac{1}{2} C_{i j k}, \end{matrix}

(28)

\begin{matrix} Γ_{i j, k}^{(m)} & = Γ_{i j, k}^{(0)} + \frac{1}{2} C_{i j k} . \end{matrix}

(29)

3. Conjugate Representations

Here, we briefly review Zhang’s conjugate representations [7]. For a parameterized probability density function (pdf)

p_{θ} (x)

with a set of real parameters

{θ^{m}}, m = 1, 2, \dots, M

, information geometry was founded by Amari [1] on the basis of his

α

-representations (reps.) defined by

\begin{matrix} ℓ^{α} (p_{θ} (x)) = \frac{2}{1 - α} p_{θ}^{\frac{1 - α}{2}} (x), ℓ^{- α} (p_{θ} (x)) = \frac{2}{1 + α} p_{θ}^{\frac{1 + α}{2}} (x), \end{matrix}

(30)

for a real parameter

α \neq \pm 1

, and on the

α

-divergence

\begin{matrix} D^{α} (p | r) = \frac{4}{1 - α^{2}} \int d x \{(\frac{1 - α}{2}) p (x) + (\frac{1 + α}{2}) r (x) - p^{\frac{1 - α}{2}} (x) r^{\frac{1 + α}{2}} (x)\} . \end{matrix}

(31)

As a generalization of the

α

-reps., Zhang [7] introduced the conjugate representations as follows.

Definition 1.

A ρ-representation of a real positive number ξ is a mapping

ξ \mapsto ρ (ξ)

, where

ρ (ξ)

is a strictly monotone function. For a smooth and strictly convex function

f (ρ)

, a τ-representation:

ξ \mapsto τ (ξ)

is said to be conjugate to the ρ-representation with respect to

f (ρ)

if the following relations are satisfied,

\begin{matrix} τ (ξ) = \frac{d f (ρ)}{d ρ} |_{ρ = ρ (ξ)}, & ρ (ξ) = \frac{d f^{★} (τ)}{d τ} |_{τ = τ (ξ)}, \end{matrix}

(32)

where the convex functions

f (ρ)

and

f^{★} (τ)

are Legendre dual to each other:

\begin{matrix} f (ρ) = ρ τ (ρ) - f^{★} (τ (ρ)), f^{★} (τ) = ρ (τ) τ - f (ρ (τ)) . \end{matrix}

(33)

By utilizing the conjugate reps., the associated Bregman divergence can be defined as

\begin{matrix} D_{f, ρ} (p | r) & = \int d x [f (ρ (p (x))) - f (ρ (r (x))) - τ (r (x)) (ρ (p (x)) - ρ (r (x)))], \end{matrix}

(34)

\begin{matrix} D_{f^{★}, τ} (p | r) & = \int d x [f^{★} (τ (p (x))) - f^{★} (τ (r (x))) - ρ (r (x)) (τ (p (x)) - τ (r (x)))] . \end{matrix}

(35)

The

α

-rep. is, of course, an example of the conjugate reps., and they are related as follows.

\begin{matrix} ρ_{α} (p) & = ℓ^{α} (p), & τ_{α} (p) & = ℓ^{- α} (p), \end{matrix}

(36)

\begin{matrix} f_{α} (ρ_{α}) & = \frac{2}{1 + α} {(\frac{1 - α}{2} ρ_{α})}^{\frac{2}{1 - α}}, & f_{α}^{★} (τ_{α}) & = \frac{2}{1 - α} {(\frac{1 + α}{2} τ_{α})}^{\frac{2}{1 + α}} . \end{matrix}

(37)

The

α

-divergence (31) is expressed as

D_{f_{α}, ρ_{α}} (p | r)

.
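
A quick numerical sketch of this identification: on two hypothetical probability vectors `p` and `r`, the Bregman divergence (34) built from the functions in Equations (36) and (37) is compared with the α-divergence (31). The value of `ALPHA` is an arbitrary choice with α ≠ ±1, and sums replace the integrals.

```python
ALPHA = 0.5  # hypothetical choice; any real value with ALPHA != +1 and ALPHA != -1

def rho(p):
    """alpha-rep. of Eq. (36): rho_alpha(p) = l^{alpha}(p)."""
    return 2.0 / (1.0 - ALPHA) * p ** ((1.0 - ALPHA) / 2.0)

def tau(p):
    """Conjugate rep. of Eq. (36): tau_alpha(p) = l^{-alpha}(p)."""
    return 2.0 / (1.0 + ALPHA) * p ** ((1.0 + ALPHA) / 2.0)

def f(r):
    """Convex function f_alpha of Eq. (37), evaluated at rho = r."""
    return 2.0 / (1.0 + ALPHA) * ((1.0 - ALPHA) / 2.0 * r) ** (2.0 / (1.0 - ALPHA))

def bregman(p, r):
    """Discrete analogue of the Bregman divergence D_{f, rho}(p | r), Eq. (34)."""
    return sum(f(rho(a)) - f(rho(b)) - tau(b) * (rho(a) - rho(b)) for a, b in zip(p, r))

def alpha_divergence(p, r):
    """Discrete analogue of the alpha-divergence, Eq. (31)."""
    total = sum((1.0 - ALPHA) / 2.0 * a + (1.0 + ALPHA) / 2.0 * b
                - a ** ((1.0 - ALPHA) / 2.0) * b ** ((1.0 + ALPHA) / 2.0)
                for a, b in zip(p, r))
    return 4.0 / (1.0 - ALPHA ** 2) * total

p = [0.1, 0.2, 0.3, 0.4]
r = [0.4, 0.3, 0.2, 0.1]
d_bregman = bregman(p, r)
d_alpha = alpha_divergence(p, r)
print(d_bregman, d_alpha)
```

Both numbers should coincide up to rounding for any admissible `ALPHA`, since the identity holds term by term in the integrand.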

Remark 1.

We assume that ρ- and τ-functions satisfy the suitable regularity conditions throughout this paper. It is important to describe the domains and the tangents of the relevant ρ- and τ-functions. However, this is a very difficult matter in general. For example, consider a statistical manifold which is a set of q-Gaussian distributions, and use the α-rep. (36). We see that it is an α-affine manifold with

α = 2 q - 1

[1,7]. In this case, if the domain Ω (the total sample space) is

R

, then α must satisfy

1 < α < 5

. If

Ω = R^{2}

, then α must satisfy

1 < α < 3

. The lower bound comes from the regularity conditions of the statistical manifold (Amari and Nagaoka [1], Chapter 2), and the upper bounds come from the integrability conditions of probability densities. In this way, the regularity conditions for a set of ρ- and τ-functions are not determined from these functions themselves only, but depend on the total sample space and the given statistical model. Some arguments have been given in our previous paper [12].

3.1. MaxEnt

For a set

(ρ, τ, f (ρ), f^{★} (τ))

of conjugate reps., let us introduce a generalized entropy functional S defined by

\begin{matrix} S : = - \int d x f^{★} (τ (p (x))), \end{matrix}

(38)

and consider the following MaxEnt problem.

\begin{matrix} \frac{δ}{δ p (x)} [S + \sum_{m = 1}^{M} θ^{m} \int d x τ (p (x)) F_{m} (x) - γ \int d x τ (p (x))] = 0, \end{matrix}

(39)

where

F_{m} (x)

is a given function of x, and

θ^{m}, m = 1, 2, \dots, M

and

γ

are the Lagrange multipliers. Using the relation

d f^{★} (τ) / d τ = ρ

defined in Equation (32), this MaxEnt problem leads to

\begin{matrix} ρ (p (x)) τ^{'} (p (x)) = \sum_{m = 1}^{M} θ^{m} F_{m} (x) τ^{'} (p (x)) - γ (θ) τ^{'} (p (x)), \end{matrix}

(40)

where

τ^{'} (p)

stands for

d τ (p) / d p

. We assume

τ^{'} (p (x)) \neq 0

because if

τ^{'} (p) = 0

then the

τ

-rep. is a constant mapping, which fails to work as a rep., or immersion, of a pdf

p (x)

. We thus obtain

\begin{matrix} ρ (p (x)) = \sum_{m = 1}^{M} θ^{m} F_{m} (x) - γ (θ) . \end{matrix}

(41)
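
The step from Equation (39) to Equations (40) and (41) can be spelled out term by term. The following is a sketch of the variation, assuming that τ and f^★ are differentiable and that the variation of p(x) is arbitrary:

```latex
\begin{aligned}
\frac{\delta}{\delta p(x)} \Big[ -\int dx\, f^{\star}(\tau(p(x))) \Big]
  &= -\frac{d f^{\star}(\tau)}{d\tau}\, \tau'(p(x)) = -\rho(p(x))\, \tau'(p(x)), \\
\frac{\delta}{\delta p(x)} \Big[ \sum_{m=1}^{M} \theta^{m} \int dx\, \tau(p(x))\, F_{m}(x) \Big]
  &= \sum_{m=1}^{M} \theta^{m} F_{m}(x)\, \tau'(p(x)), \\
\frac{\delta}{\delta p(x)} \Big[ -\gamma \int dx\, \tau(p(x)) \Big]
  &= -\gamma\, \tau'(p(x)) .
\end{aligned}
```

Setting the sum of the three terms to zero gives Equation (40), and dividing by τ'(p(x)) ≠ 0 gives Equation (41).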

Remark 2.

Note that unless

τ (p) = p

, the constraints of this generalized MaxEnt problem are neither the standard expectations

\int d x p (x) F_{m} (x)

nor the normalization

\int d x p (x) = 1

of the pdf

p (x)

. However, the solution of this MaxEnt problem is expressed in terms of the inverse function of

ρ (p)

as

\begin{matrix} p (x) = ρ^{- 1} (\sum_{m = 1}^{M} θ^{m} F_{m} (x) - γ (θ)) . \end{matrix}

(42)

Definition 2.

For any given ρ-rep., the generalized score function

s_{ρ} (x)

is defined by

\begin{matrix} s_{ρ} (x) : = \partial_{i} ρ (p (x)) . \end{matrix}

(43)

Remark 3.

In the above MaxEnt setting, substituting Equation (42) into the generalized score function, we obtain that

\begin{matrix} s_{ρ} (x) = F_{i} (x) - \partial_{i} γ . \end{matrix}

(44)

Theorem 1.

For any set of conjugate reps., and the associated generalized score function

s_{ρ} (x)

,

\begin{matrix} \int d x τ (p (x)) s_{ρ} (x) = \partial_{i} \int d x f (ρ (p (x))) \end{matrix}

(45)

holds.

Proof.

From the definition (32) of the conjugate reps., we see

τ = d f (ρ) / d ρ

. It follows that

\begin{matrix} τ (p (x)) \partial_{i} ρ (p (x)) = \frac{d f (ρ)}{d ρ} \partial_{i} ρ (p (x)) = \partial_{i} f (ρ (p (x))), \end{matrix}

(46)

and integrating both sides over x, we obtain the result. ☐

Definition 3 (Escort rep.).

For a given ρ-rep. which satisfies

d ρ (p) / d p \neq 0

, we can introduce a new

\tilde{τ}

-rep., which is called the escort rep. of a pdf

p (x)

and is defined by

\begin{matrix} \tilde{τ} (p) : = \frac{c}{\frac{d ρ (p)}{d p}} = c \frac{d ρ^{- 1} (ξ)}{d ξ} |_{ξ = ρ (p)}, \end{matrix}

(47)

where c is an appropriate constant and

ρ^{- 1} (ξ)

is the inverse function of

ρ (ξ)

.

Remark 4.

For the α-reps., we have

\begin{matrix} ρ_{α}^{- 1} (ξ) = {(\frac{1 - α}{2} ξ)}^{\frac{2}{1 - α}} . \end{matrix}

(48)

We thus see that

\begin{matrix} \tilde{τ} (p) = \frac{2}{1 + α} \frac{d ρ_{α}^{- 1} (ξ)}{d ξ} |_{ξ = ρ_{α} (p)} = \frac{2}{1 + α} p^{\frac{1 + α}{2}} = τ_{α} (p), \end{matrix}

(49)

which states that

τ_{α} (p)

is a self-escort rep. with the constant

c = 2 / (1 + α)

.

One of the merits of introducing the escort rep.

\tilde{τ} (p)

is the next theorem.

Theorem 2.

A

\tilde{τ}

-rep. satisfies

\begin{matrix} \int d x \tilde{τ} (p (x)) s_{ρ} (x) = \int d x \tilde{τ} (p (x)) \partial_{i} ρ (p (x)) = 0 . \end{matrix}

(50)

Proof.

Substituting the relation

\begin{matrix} \partial_{i} ρ (p) = \frac{d ρ (p)}{d p} \partial_{i} p, \end{matrix}

(51)

into Equation (50) leads to

\begin{matrix} \partial_{i} \int d x p (x) = \partial_{i} 1 = 0, \end{matrix}

(52)

because the pdf

p (x)

is normalized. ☐
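
Theorem 2 can be illustrated numerically. The sketch below assumes the α-rep. as the ρ-rep., a hypothetical normalized two-parameter family on a four-point sample space, and central finite differences for the derivative with respect to θ^i; none of these choices come from the text.

```python
import math

ALPHA = 0.5                 # hypothetical choice with ALPHA != +1, -1
C = 2.0 / (1.0 + ALPHA)     # the constant c used in Remark 4
X = [0, 1, 2, 3]            # hypothetical finite sample space

def rho(p):
    """alpha-rep. used as the rho-rep., Eq. (36)."""
    return 2.0 / (1.0 - ALPHA) * p ** ((1.0 - ALPHA) / 2.0)

def drho_dp(p):
    """Derivative of rho with respect to p."""
    return p ** (-(1.0 + ALPHA) / 2.0)

def tau_tilde(p):
    """Escort rep. of Eq. (47): c divided by d rho / d p."""
    return C / drho_dp(p)

def pdf(theta):
    """Hypothetical normalized family p_theta(x) on X (illustration only)."""
    w = [math.exp(theta[0] * x + theta[1] * x * x) for x in X]
    z = sum(w)
    return [wi / z for wi in w]

def escort_weighted_score(theta, i, h=1e-6):
    """Discrete analogue of the left-hand side of Eq. (50) for index i."""
    up = list(theta)
    up[i] += h
    dn = list(theta)
    dn[i] -= h
    p0, pu, pd = pdf(theta), pdf(up), pdf(dn)
    return sum(tau_tilde(a) * (rho(b) - rho(d)) / (2.0 * h)
               for a, b, d in zip(p0, pu, pd))

theta = [0.3, -0.2]
residuals = [escort_weighted_score(theta, i) for i in range(2)]
print(residuals)
```

The residuals should vanish up to finite-difference error for any normalized family, because the escort weight cancels the factor dρ/dp and leaves the derivative of the total probability.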

For this

\tilde{τ}

-rep., we can introduce the associated convex functions

\tilde{f} (ρ)

and

{\tilde{f}}^{★} (\tilde{τ})

that satisfy

\begin{matrix} \frac{d \tilde{f} (ρ)}{d ρ} = \tilde{τ}, \frac{d {\tilde{f}}^{★} (\tilde{τ})}{d \tilde{τ}} = ρ, \end{matrix}

(53)

respectively.

Note that combining Theorem 1 with Theorem 2 leads to

\begin{matrix} \int d x \tilde{τ} (p (x)) s_{ρ} (x) = \partial_{i} \int d x \tilde{f} (ρ (p (x))) = 0 . \end{matrix}

(54)

We then obtain the following corollary.

Corollary 1.

For the conjugate reps. ρ and

\tilde{τ}

, the associated

\tilde{f} (ρ)

function satisfies that

\begin{matrix} \int d x \tilde{f} (ρ (p (x))) = c \int d x p (x) = c, \end{matrix}

(55)

which is the constant c defined in Equation (47) for any normalized pdf

p (x)

.

Remark 5.

For the α-rep., we see that

\begin{matrix} f_{α} (ρ_{α} (p)) = \frac{2}{1 + α} p . \end{matrix}

(56)

Definition 4 (Escort pdf and escort expectation).

Define the escort pdf

P^{esc} (x)

with regard to a pdf

p (x)

by utilizing the escort rep.

\tilde{τ} (p)

as follows.

\begin{matrix} P^{esc} (x) : = \frac{\tilde{τ} (p (x))}{\int d x \tilde{τ} (p (x))}, \end{matrix}

(57)

and define the escort expectation with regard to

p (x)

of a given function

F_{i} (x)

as

\begin{matrix} E_{P_{esc}} [F_{i}] : = \int d x P^{esc} (x) F_{i} (x) . \end{matrix}

(58)

Theorem 3.

In the MaxEnt setting of Equation (39), the score function

s_{ρ} (x)

has zero-escort expectation, i.e.,

E_{P_{esc}} [s_{ρ} (x)] = 0

, and it follows that

\begin{matrix} \partial_{i} γ (θ) = E_{P_{esc}} [F_{i}] . \end{matrix}

(59)

Proof.

From Equation (41) we have

\begin{matrix} E_{P_{esc}} [s_{ρ}] = E_{P_{esc}} [\partial_{i} ρ (p (x))] = E_{P_{esc}} [F_{i}] - \partial_{i} γ, \end{matrix}

(60)

where we used

E_{P_{esc}} [1] = 1

. Since

E_{P_{esc}} [s_{ρ}] = 0

, we obtain Equation (59). ☐
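
Equation (59) can be checked numerically for a concrete case. The sketch below assumes the α-rep. with α < 1, a hypothetical four-point sample space, and two hypothetical functions `F`; γ(θ) is found by bisection so that the solution (42) is normalized, and its derivative is approximated by central finite differences. It also assumes θ is small enough that a normalizing γ exists on the branch where the argument of ρ^{-1} is positive.

```python
ALPHA = 0.5                      # hypothetical choice with ALPHA < 1
X = [0, 1, 2, 3]                 # hypothetical finite sample space
F = [lambda x: float(x), lambda x: float(x * x)]  # hypothetical F_m(x)

def rho_inv(xi):
    """Inverse alpha-rep., Eq. (48); used on the branch xi > 0."""
    return ((1.0 - ALPHA) / 2.0 * xi) ** (2.0 / (1.0 - ALPHA))

def linear_part(theta):
    """Values of sum_m theta^m F_m(x) for each x in X."""
    return [sum(t * f(x) for t, f in zip(theta, F)) for x in X]

def solve_gamma(theta, iters=200):
    """Bisection for gamma(theta) such that the solution (42) sums to one."""
    lin = linear_part(theta)
    hi = min(lin)          # lin - gamma must stay positive
    lo = hi - 100.0        # far enough that the total mass exceeds one
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        mass = sum(rho_inv(v - mid) for v in lin)
        if mass > 1.0:
            lo = mid       # too much mass: gamma must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def maxent_pdf(theta):
    """MaxEnt solution of Eq. (42) on the finite sample space."""
    gamma = solve_gamma(theta)
    return [rho_inv(v - gamma) for v in linear_part(theta)]

def escort_expectation(theta, i):
    """Escort expectation of F_i, Eq. (58); the constant c cancels."""
    p = maxent_pdf(theta)
    w = [pi ** ((1.0 + ALPHA) / 2.0) for pi in p]
    return sum(wi * F[i](x) for wi, x in zip(w, X)) / sum(w)

def dgamma(theta, i, h=1e-5):
    """Central finite-difference approximation of the derivative of gamma."""
    up = list(theta)
    up[i] += h
    dn = list(theta)
    dn[i] -= h
    return (solve_gamma(up) - solve_gamma(dn)) / (2.0 * h)

theta = [0.5, -0.1]
p_star = maxent_pdf(theta)
for i in range(2):
    print(dgamma(theta, i), escort_expectation(theta, i))
```

Each printed pair should agree up to finite-difference error, which is the content of Equation (59) in this discrete setting.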

Remark 6.

In our formalism, the escort expectation is characterized by the generalized score function

s_{ρ} (x)

which is unbiased, i.e.,

s_{ρ} (x)

has zero-escort expectation.

We see that the Lagrange multiplier

γ (θ)

is the θ-potential

\tilde{Ψ} (θ)

associated with the escort expectation. The dual affine coordinate

η^{esc}

is

\begin{matrix} η_{i}^{esc} = \partial_{i} γ (θ) = E_{P_{esc}} [F_{i}], \end{matrix}

(61)

and the associated Riemannian metric and cubic form are

\begin{matrix} g_{i j}^{esc} (θ) = \partial_{i} \partial_{j} γ (θ), \end{matrix}

(62)

\begin{matrix} C_{i j k}^{esc} (θ) = \partial_{i} \partial_{j} \partial_{k} γ (θ), \end{matrix}

(63)

respectively. Since

g^{esc}

is a Hessian metric, the statistical manifold described by the θ- and

η^{esc}

-coordinates is dually flat.

4. Conformal Divergence

Let us consider the Bregman divergence (35) of the escort reps., i.e.,

\begin{matrix} D_{{\tilde{f}}^{★}, \tilde{τ}} (p | r) = \int d x [{\tilde{f}}^{★} (\tilde{τ} (p (x))) - {\tilde{f}}^{★} (\tilde{τ} (r (x))) - ρ (r (x)) (\tilde{τ} (p (x)) - \tilde{τ} (r (x)))] . \end{matrix}

(64)

The next theorem is a main result of this contribution.

Theorem 4.

The relative escort expectation of ρ-reps. is the conformal (or scaled) divergence of

D_{{\tilde{f}}^{★}, \tilde{τ}} (p | r)

with the scaling factor

1 / \int d x \tilde{τ} (p (x))

, i.e.,

\begin{matrix} E_{P_{esc}} [ρ (p (x))] - E_{P_{esc}} [ρ (r (x))] = \frac{1}{\int d x \tilde{τ} (p (x))} D_{{\tilde{f}}^{★}, \tilde{τ}} (p | r) . \end{matrix}

(65)

Proof.

From Corollary 1, we see that

\begin{matrix} \int d x {\tilde{f}}^{★} (\tilde{τ} (p (x))) = \int d x \tilde{τ} (p (x)) ρ (p (x)) - c, \end{matrix}

(66)

where c is an appropriate constant for any normalized pdf

p (x)

, and it follows that

\begin{matrix} \int d x [{\tilde{f}}^{★} (\tilde{τ} (p (x))) - {\tilde{f}}^{★} (\tilde{τ} (r (x)))] = \int d x [\tilde{τ} (p (x)) ρ (p (x)) - \tilde{τ} (r (x)) ρ (r (x))] . \end{matrix}

(67)

Substituting this relation into Equation (64) leads to

\begin{matrix} D_{{\tilde{f}}^{★}, \tilde{τ}} (p | r) = \int d x \tilde{τ} (p (x)) \{ρ (p (x)) - ρ (r (x))\} . \end{matrix}

(68)

Dividing both sides by

\int d x \tilde{τ} (p (x))

and using the escort expectation, we obtain the result. ☐
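
Theorem 4 can be illustrated for the α-rep. case. The sketch below assumes two hypothetical normalized probability vectors `p` and `r`, an arbitrary `ALPHA` with α ≠ ±1, and the explicit f^★ from Equation (37); it evaluates both sides of Equation (65) with sums in place of integrals.

```python
ALPHA = 0.5  # hypothetical choice with ALPHA != +1, -1

def rho(p):
    """alpha-rep. of Eq. (36)."""
    return 2.0 / (1.0 - ALPHA) * p ** ((1.0 - ALPHA) / 2.0)

def tau_tilde(p):
    """Escort rep. for the alpha-rep., Eq. (49), equal to tau_alpha(p)."""
    return 2.0 / (1.0 + ALPHA) * p ** ((1.0 + ALPHA) / 2.0)

def f_star(t):
    """Dual convex function f_alpha^star of Eq. (37)."""
    return 2.0 / (1.0 - ALPHA) * ((1.0 + ALPHA) / 2.0 * t) ** (2.0 / (1.0 + ALPHA))

def bregman_dual(p, r):
    """Discrete analogue of the divergence in Eq. (64)."""
    return sum(f_star(tau_tilde(a)) - f_star(tau_tilde(b))
               - rho(b) * (tau_tilde(a) - tau_tilde(b))
               for a, b in zip(p, r))

def escort_expectation(p, values):
    """Escort expectation of Eq. (58) with respect to p."""
    w = [tau_tilde(a) for a in p]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

p = [0.1, 0.2, 0.3, 0.4]
r = [0.4, 0.3, 0.2, 0.1]
lhs = (escort_expectation(p, [rho(a) for a in p])
       - escort_expectation(p, [rho(b) for b in r]))
rhs = bregman_dual(p, r) / sum(tau_tilde(a) for a in p)
print(lhs, rhs)
```

Both sides agree only when `p` and `r` are normalized, since the proof uses Corollary 1 for each of them.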

Remark 7.

As an example of Theorem 4, let us consider the α-rep. case. Since

{\tilde{τ}}_{α} = τ_{α}

as shown in Remark 4, it follows that

{\tilde{f}}_{α}^{★} (\tilde{τ}) = f_{α}^{★} (τ)

. The corresponding escort pdf becomes

\begin{matrix} P_{α}^{esc} (x) = \frac{p^{\frac{1 + α}{2}}}{\int d x p^{\frac{1 + α}{2}}}, \end{matrix}

(69)

and Equation (65) becomes

\begin{matrix} \frac{2}{1 - α} (E_{P_{esc}} [p^{\frac{1 - α}{2}}] - E_{P_{esc}} [r^{\frac{1 - α}{2}}]) = \frac{(1 + α) / 2}{\int d x p^{\frac{1 + α}{2}} (x)} D_{{\tilde{f}}_{α}^{★}, τ_{α}} (p | r) . \end{matrix}

(70)

When we set

q = (1 + α) / 2

this relation becomes

\begin{matrix} E_{P_{esc}} [{ln}_{q} p] - E_{P_{esc}} [{ln}_{q} r] = \frac{q}{\int d x p^{q} (x)} D_{{\tilde{f}}_{α}^{★}, τ_{α}} (p | r), \end{matrix}

(71)

which was first shown by Matsuzoe and Ohara [13].
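
The passage from Equation (70) to Equation (71) can be made explicit. The sketch below assumes the standard Tsallis q-logarithm, \ln_{q} p = (p^{1-q} - 1)/(1-q), a definition that is not restated in the text:

```latex
\begin{aligned}
1 - q &= \frac{1 - \alpha}{2}, \\
\rho_{\alpha}(p) &= \frac{2}{1 - \alpha}\, p^{\frac{1 - \alpha}{2}}
  = \frac{p^{1 - q}}{1 - q} = \ln_{q} p + \frac{1}{1 - q}, \\
E_{P_{esc}}[\rho_{\alpha}(p)] - E_{P_{esc}}[\rho_{\alpha}(r)]
  &= E_{P_{esc}}[\ln_{q} p] - E_{P_{esc}}[\ln_{q} r], \\
\frac{1}{\int dx\, \tilde{\tau}_{\alpha}(p(x))}
  &= \frac{q}{\int dx\, p^{q}(x)} .
\end{aligned}
```

The additive constant 1/(1-q) cancels in the difference because the escort pdf is normalized, and the last line uses \tilde{\tau}_{\alpha}(p) = p^{q}/q.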

5. Concluding Remarks

We have discussed and reformulated the method of information geometry in terms of the conjugate reps. introduced by Zhang [7]. For an appropriate set of conjugate reps., the MaxEnt principle for a generalized entropy relates the associated Lagrange multipliers to the corresponding

ρ

-rep. (41) of the optimal pdf (42). For a generalized score function (Definition 2), the escort rep. and escort expectation are then naturally induced. The conformal divergence is related to the difference of the entropies in terms of the escort expectations, as shown in Theorem 4.

In previous work [9], we studied, for the

κ

-deformed exponential family, the dualistic Hessian geometries among the thermodynamic potentials in the

κ

-deformed thermostatistics, and found that there exist two different kinds of dual affine-coordinates: one

η

is associated with the standard expectation; and the other

η^{esc}

is associated with the escort expectation. There, the double escort distributions, i.e., the escort of the escort distributions, play an important role. For the q-deformed exponential family, one of the authors (H.M.) further studied a sequence (or hierarchy) of escort distributions [14]. We think that these results are not specific to the q- or

κ

-deformed exponential pdf. We believe that these results [9,14] can be systematically studied by applying the reformulated method developed in this work. Further studies are needed and will be carried out in future work.

Acknowledgments

We acknowledge an anonymous reviewer for providing useful comments to improve our manuscript. The first named author is partially supported by Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI) Grant Number JP17K05341. The second named author is partially supported by the JSPS Grants-in-Aid for Scientific Research (KAKENHI) Grant Number JP26108003 and JP15K04842.

Author Contributions

Tatsuaki Wada designed the main subject of this research and mainly wrote the manuscript. Hiroshi Matsuzoe commented on the manuscript at all stages. All authors equally promoted the research and discussed the results. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Amari, S.-I.; Nagaoka, H. Methods of Information Geometry; Oxford University Press: Oxford, UK, 2000.
2. Amari, S.-I. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
3. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009.
4. Kaniadakis, G. Theoretical foundations and mathematical formalism of the power-law tailed statistical distributions. Entropy 2013, 15, 3983–4010.
5. Naudts, J. Generalized Thermostatistics; Springer: Berlin, Germany, 2011.
6. Amari, S.-I.; Ohara, A. A Geometry of q-Exponential Family of Probability Distributions. Entropy 2011, 13, 1170–1185.
7. Zhang, J. Divergence function, duality and convex analysis. Neural Comput. 2004, 16, 159–195.
8. Wada, T.; Scarfone, A.M. Information geometry on the κ-thermostatistics. Entropy 2015, 17, 1204–1217.
9. Wada, T.; Matsuzoe, H.; Scarfone, A.M. Dualistic Hessian structures among the thermodynamic potentials in the κ-thermostatistics. Entropy 2015, 17, 7213–7229.
10. Zhang, J. On monotone embedding in information geometry. Entropy 2015, 17, 4485–4499.
11. Amari, S.-I. Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure. Entropy 2014, 16, 2131–2145.
12. Matsuzoe, H.; Wada, T. Deformed Algebras and Generalizations of Independence on Deformed Exponential Families. Entropy 2015, 17, 5729–5751.
13. Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and Its Related Fields, Proceedings of the 2nd International Colloquium on Differential Geometry and Its Related Fields, Veliko Tarnovo, Bulgaria, 6–10 September 2010; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific: Hackensack, NJ, USA, 2011; pp. 55–71.
14. Matsuzoe, H. A sequence of escort distributions and generalizations of expectations on q-exponential family. Entropy 2017, 19, 7.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
