Article

Geometry of q-Exponential Family of Probability Distributions

1 Laboratory for Mathematical Neuroscience, RIKEN Brain Science Institute, Hirosawa 2-1, Wako-shi, Saitama 351-0198, Japan
2 Department of Electrical and Electronics Engineering, Graduate School of Engineering, University of Fukui, Bunkyo 3-9-1, Fukui-shi, Fukui 910-8507, Japan
* Authors to whom correspondence should be addressed.
Entropy 2011, 13(6), 1170-1185; https://doi.org/10.3390/e13061170
Submission received: 11 February 2011 / Revised: 1 June 2011 / Accepted: 2 June 2011 / Published: 14 June 2011
(This article belongs to the Special Issue Distance in Information and Statistical Physics Volume 2)

Abstract

The Gibbs distribution of statistical physics is an exponential family of probability distributions, which has a mathematical basis of duality in the form of the Legendre transformation. Recent studies of complex systems have found lots of distributions obeying the power law rather than the standard Gibbs type distributions. The Tsallis q-entropy is a typical example capturing such phenomena. We treat the q-Gibbs distribution or the q-exponential family by generalizing the exponential function to the q-family of power functions, which is useful for studying various complex or non-standard physical phenomena. We give a new mathematical structure to the q-exponential family different from those previously given. It has a dually flat geometrical structure derived from the Legendre transformation and the conformal geometry is useful for understanding it. The q-version of the maximum entropy theorem is naturally induced from the q-Pythagorean theorem. We also show that the maximizer of the q-escort distribution is a Bayesian MAP (Maximum A posteriori Probability) estimator.


1. Introduction

Statistical physics is founded on the Gibbs distribution for microstates, which forms an exponential family of probability distributions in the statistical sense. Important macro-quantities such as energy, entropy and free energy are connected with it. However, recent studies show that there are non-standard complex systems which are subject to the power law instead of the exponential law of the Gibbs-type distributions. See [1,2] as well as the extensive literature cited therein.
Tsallis [3] defined the q-entropy to elucidate various physical phenomena of this type, followed by many related research works on this subject (see [1]). The concept of the q-Gibbs distribution or q-exponential family of probability distributions is naturally induced from this framework (see also [4]). However, its mathematical structure has not yet been sufficiently explored [2,5,6], while the Gibbs-type distribution has been well studied as the exponential family of distributions [7]. We need a mathematical (geometrical) foundation to study the properties of the q-exponential family. This paper presents a geometrical foundation for the q-exponential family based on information geometry [8], giving geometrical definitions of the q-potential function, q-entropy and q-divergence in a unified way.
We define the q-geometrical structure consisting of a Riemannian metric and a pair of dual affine connections. By using this framework, we prove that a family of q-exponential distributions is dually flat, in which the q-Pythagorean theorem holds. This naturally induces the corresponding q-maximum entropy theorem similarly to the case of the Tsallis q-entropy [1,9,10]. The q-structure is ubiquitous since the family $S_n$ of all discrete probability distributions can always be endowed with the structure of the q-exponential family for arbitrary q. It is possible to generalize the q-structure to any family of probability distributions. Further, it has a close relation with the α-geometry [8], which is an information-geometric structure of constant curvature. This new dually flat structure, different from the conventional one arising from the invariance principle in information geometry, can also be obtained by conformal flattening of the α-geometry [11,12], using a technique from conformal and projective geometry [13,14,15].
The present framework prepares mathematical tools for analyzing physical phenomena subject to the power law. The Legendre transformation again plays a fundamental role in deriving the geometrical dual structure. There are many applications of q-geometry to information theory ([16] and others) and statistics, including Bayes q-statistics.
It is possible to generalize our framework to a more general non-linear family of distributions by using a positive convex function instead of the q-exponential function (see [2,17]). A good example is the κ-exponential family [18,19,20], but we do not discuss it here.

2. q-Gibbs or q-Exponential Family of Distributions

2.1. q-Logarithm and q-Exponential Function

The first step is to generalize the logarithm and exponential functions to a family of power functions that contains them as the limiting case [1,5,21]. This was also used for defining the α-family of distributions in information geometry [8]. We define the q-logarithm by
$$\log_q(u) = \frac{1}{1-q}\left(u^{1-q} - 1\right), \quad u > 0 \tag{1}$$
and its inverse function, the q-exponential, by
$$\exp_q(u) = \left\{1 + (1-q)u\right\}^{\frac{1}{1-q}}, \quad u > -\frac{1}{1-q} \tag{2}$$
for a positive q with $q \neq 1$. The limiting case $q \to 1$ reduces to
$$\log_1(u) = \log u \tag{3}$$
$$\exp_1(u) = \exp u \tag{4}$$
so that $\log_q$ and $\exp_q$ are defined for $q > 0$.
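As an illustration, the q-logarithm and q-exponential defined above can be sketched in a few lines of Python. The function names `log_q` and `exp_q` and the explicit handling of q = 1 are choices made for this sketch, not part of the original text.

```python
import math

def log_q(u: float, q: float) -> float:
    """q-logarithm for u > 0; uses the ordinary logarithm at q = 1."""
    if u <= 0.0:
        raise ValueError("log_q requires u > 0")
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(u: float, q: float) -> float:
    """q-exponential, the inverse of log_q; uses exp at q = 1."""
    if q == 1.0:
        return math.exp(u)
    base = 1.0 + (1.0 - q) * u
    if base <= 0.0:
        raise ValueError("argument outside the domain of exp_q")
    return base ** (1.0 / (1.0 - q))

q = 0.5
u = 2.0
round_trip = exp_q(log_q(u, q), q)       # should return u
near_limit = log_q(u, 1.0 - 1e-6)        # should be close to log(u)
```

The round trip checks that the two functions are mutually inverse, and the last line probes the limit $q \to 1$ numerically.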

2.2. q-Exponential Family

The standard form of an exponential family of distributions is written as
$$p(x, \theta) = \exp\left\{\sum_i \theta^i x_i - \psi(\theta)\right\} \tag{5}$$
with respect to an adequate measure $\mu(x)$, where $x = (x_1, \dots, x_n)$ is a set of random variables and $\theta = (\theta^1, \dots, \theta^n)$ are the canonical parameters to describe the underlying system. The Gibbs distribution is of this type. Here, $\psi(\theta)$ is called the free energy, which is the cumulant generating function.
The power version of the Gibbs distribution is written as
$$p(x, \theta) = \exp_q\left\{\theta \cdot x - \psi_q(\theta)\right\} \tag{6}$$
$$\log_q p(x, \theta) = \theta \cdot x - \psi_q(\theta) \tag{7}$$
where $\theta \cdot x = \sum_i \theta^i x_i$. This is the q-Gibbs distribution or q-exponential family [4], which we denote by S, where the domain of x is restricted such that $p(x, \theta) > 0$ holds. The function $\psi_q(\theta)$, called the q-free energy or q-potential function, is determined from the normalization condition:
$$\int \exp_q\left\{\theta \cdot x - \psi_q(\theta)\right\} dx = 1 \tag{8}$$
where we replaced $d\mu(x)$ by $dx$ for brevity's sake. The function $\psi_q$ depends on q, but we hereafter omit the suffix q in most cases. Research on the q-exponential family can be found, for example, in [2,4,19]. The q-Gaussian distribution is given by
$$p(x, \mu, \sigma) = \exp_q\left\{-\frac{(x-\mu)^2}{2\sigma^2} - \psi(\mu, \sigma)\right\} \tag{9}$$
and is studied in [22,23,24,25] in detail. Here, we need to introduce a vector random variable $\mathbf{x} = (x, x^2)$ and a new parameter $\theta$, which is a vector-valued function of μ and σ, to represent it in the standard form (7). It is an interesting observation that the domain of x in the q-Gaussian case depends on q if $0 < q < 1$. Hence, the q- and q′-Gaussian distributions are in general not absolutely continuous with respect to each other when $q \neq q'$.
It should be remarked that the q-exponential family itself is the same as the α-family of distributions in information geometry [8]. Here, we introduce a different geometrical structure, generalizing the result of [24].
We mainly use the family $S_n$ of discrete distributions over $(n+1)$ elements $X = \{x_0, x_1, \dots, x_n\}$, although we can easily extend the results to the case of continuous random variables. Here, random variable x takes values over X. We also treat the case of $0 < q < 1$, and the limiting cases of $q = 0$ or $1$ give the well-known ones.
Let us put $p_i = \mathrm{Prob}\{x = x_i\}$ and denote the probability distribution by vector $\mathbf{p} = (p_0, p_1, \dots, p_n)$, where
$$\sum_{i=0}^{n} p_i = 1 \tag{10}$$
The probability of x is also written as
$$p(x) = \sum_{i=0}^{n} p_i \delta_i(x) \tag{11}$$
where
$$\delta_i(x) = \begin{cases} 1, & x = x_i, \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$
Theorem 1
The family $S_n$ of discrete probability distributions has the structure of a q-exponential family for any q.
Proof
We take $\log_q$ of the distribution $p(x)$ of (11). For any function $f(u)$, we have
$$f\left(\sum_{i=0}^{n} p_i \delta_i(x)\right) = \sum_{i=0}^{n} f(p_i)\,\delta_i(x) \tag{13}$$
By taking
$$\delta_0(x) = 1 - \sum_{i=1}^{n} \delta_i(x) \tag{14}$$
into account, discrete distribution (11) can be rewritten in the form (7) as
$$\log_q p(x) = \frac{1}{1-q}\left\{\sum_{i=1}^{n}\left(p_i^{1-q} - p_0^{1-q}\right)\delta_i(x) + p_0^{1-q} - 1\right\} \tag{15}$$
where
$$p_0 = 1 - \sum_{i=1}^{n} p_i \tag{16}$$
is treated as a function of $p_1, \dots, p_n$. Hence, $S_n$ is a q-exponential family (6) for any q, with the following q-canonical parameters, random variables and q-potential function:
$$\theta^i = \frac{1}{1-q}\left(p_i^{1-q} - p_0^{1-q}\right), \quad i = 1, \dots, n \tag{17}$$
$$x_i = \delta_i(x) \tag{18}$$
$$\psi(\theta) = -\log_q p_0 \tag{19}$$
This completes the proof. □
Note that the q-potential $\psi(\theta)$ and the canonical parameter $\theta$ depend on q, as is seen in (17) and (19). It should also be remarked that Theorem 1 does not contradict Theorem 1 in [19], which states that a parametrized family of probability distributions can belong to at most one q-exponential family. The author of [19] considers an m-dimensional parametrized submanifold in $S_n$ with $m < n$, where the canonical parameter depending on q is given via the variational principle. Therefore, by denoting the q-canonical parameter by $\theta_q \in \mathbb{R}^m$, we can restate his theorem in geometrical terms: a linear submanifold parametrized by $\theta_q \in \mathbb{R}^m$ is not a linear submanifold parametrized by $\theta_{q'} \in \mathbb{R}^m$ when $q \neq q'$. On the other hand, the present theorem states that there exists a q-canonical parameter $\theta_q \in \mathbb{R}^n$ on the whole of $S_n$ for any q, and the manifold has a linear structure with respect to any $\theta_q$. This is a surprising new finding.
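Theorem 1 can be checked numerically on a small example. The sketch below computes the canonical parameters and the potential of (17) and (19) for one fixed distribution and several values of q, then rebuilds the distribution with the q-exponential. The helper names `canonical_parameters` and `reconstruct` and the sample distribution are assumptions of this sketch.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(u, q):
    return (1.0 + (1.0 - q) * u) ** (1.0 / (1.0 - q))

def canonical_parameters(p, q):
    """theta^i and psi for p = (p_0, ..., p_n), following (17) and (19)."""
    p0_pow = p[0] ** (1.0 - q)
    theta = [(pi ** (1.0 - q) - p0_pow) / (1.0 - q) for pi in p[1:]]
    psi = -log_q(p[0], q)
    return theta, psi

def reconstruct(theta, psi, q):
    """Rebuild p from (theta, psi) with x_i = delta_i(x)."""
    # theta . x is 0 on outcome x_0 and theta^i on outcome x_i, i >= 1
    return [exp_q(-psi, q)] + [exp_q(t - psi, q) for t in theta]

p = [0.5, 0.3, 0.2]
max_error = 0.0
for q in (0.2, 0.5, 0.9):
    theta, psi = canonical_parameters(p, q)
    rebuilt = reconstruct(theta, psi, q)
    max_error = max(max_error, max(abs(a - b) for a, b in zip(p, rebuilt)))
```

The same distribution is recovered for every q tried, while the values of `theta` and `psi` differ from one q to the next, in line with the remark that both depend on q.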

2.3. q-Potential Function

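We study the q-geometrical structure of S. The q-log-likelihood is a linear form defined by
$$l_q(x, \theta) = \log_q p(x, \theta) = \sum_{i=1}^{n} \theta^i x_i - \psi(\theta) \tag{20}$$
By differentiating it with respect to $\theta^i$, with the abbreviated notation $\partial_i = \partial / \partial\theta^i$, we have
$$\partial_i l_q(x, \theta) = x_i - \partial_i \psi(\theta) \tag{21}$$
$$\partial_i \partial_j l_q(x, \theta) = -\partial_i \partial_j \psi(\theta) \tag{22}$$
From this we have the following important theorem.
Theorem 2
The q-free energy or q-potential $\psi_q(\theta)$ is a convex function of $\theta_q$.
Proof
We omit the suffix q for simplicity's sake. We have
$$\partial_i p(x, \theta) = p(x, \theta)^q \left(x_i - \partial_i \psi\right) \tag{23}$$
$$\partial_i \partial_j p(x, \theta) = q\, p(x, \theta)^{2q-1} \left(x_i - \partial_i \psi\right)\left(x_j - \partial_j \psi\right) - p(x, \theta)^q\, \partial_i \partial_j \psi \tag{24}$$
The following identities hold:
$$\int \partial_i p(x, \theta)\, dx = \partial_i \int p(x, \theta)\, dx = 0 \tag{25}$$
$$\int \partial_i \partial_j p(x, \theta)\, dx = \partial_i \partial_j \int p(x, \theta)\, dx = 0 \tag{26}$$
Here, we define an important functional
$$h_q(\theta) = h_q[p(x, \theta)] = \int p(x, \theta)^q\, dx \tag{27}$$
in particular for discrete $S_n$,
$$h_q(\mathbf{p}) = \sum_{i=0}^{n} p_i^q \tag{28}$$
for $0 < q < 1$. This function plays a key role in the following. From (25) and (26), by using (23) and (24), we have
$$\partial_i \psi(\theta) = \frac{1}{h_q(\theta)} \int x_i\, p(x, \theta)^q\, dx \tag{29}$$
$$\partial_i \partial_j \psi(\theta) = \frac{q}{h_q(\theta)} \int \left(x_i - \partial_i \psi\right)\left(x_j - \partial_j \psi\right) p(x, \theta)^{2q-1}\, dx \tag{30}$$
The latter shows that $\partial_i \partial_j \psi(\theta)$ is positive-definite, and hence ψ is convex. □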
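The role of the normalization condition and of the gradient formula (29) can be illustrated for the discrete case. The sketch below solves the normalization condition for ψ by bisection, then compares a finite-difference gradient of ψ with the quantities $p_i^q / h_q$, and finally tests midpoint convexity on one pair of points. The solver `psi_of_theta`, its bracket and the sample parameters are assumptions of this sketch, valid for 0 < q < 1 and moderate θ.

```python
def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def distribution(theta, psi, q):
    # outcome x_0 carries theta . x = 0, outcome x_i (i >= 1) carries theta^i
    return [exp_q(-psi, q)] + [exp_q(t - psi, q) for t in theta]

def psi_of_theta(theta, q, lo=-50.0, hi=50.0, iters=200):
    """Solve sum_x exp_q(theta . x - psi) = 1 for psi by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(distribution(theta, mid, q)) > 1.0:
            lo = mid   # total mass decreases as psi grows, so psi is too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = 0.5
theta = [0.4, -0.3]
psi = psi_of_theta(theta, q)
p = distribution(theta, psi, q)
h = sum(pi ** q for pi in p)
expected_gradient = [pi ** q / h for pi in p[1:]]

eps = 1e-5
gradient = []
for i in range(len(theta)):
    up = list(theta); up[i] += eps
    dn = list(theta); dn[i] -= eps
    gradient.append((psi_of_theta(up, q) - psi_of_theta(dn, q)) / (2.0 * eps))

# midpoint convexity of psi between two parameter points
a, b = [0.4, -0.3], [-0.2, 0.5]
mid_point = [0.5 * (x + y) for x, y in zip(a, b)]
convex_gap = 0.5 * (psi_of_theta(a, q) + psi_of_theta(b, q)) - psi_of_theta(mid_point, q)
```

A positive `convex_gap` on one pair of points is of course only a spot check of convexity, not a proof.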

2.4. q-Divergence

A convex function $\psi(\theta)$ makes it possible to define a divergence of the Bregman type between two probability distributions $p(x, \theta_1)$ and $p(x, \theta_2)$ [8,26,27]. It is given by using the gradient $\nabla = \partial / \partial\theta$,
$$D_q\left[p(x, \theta_1) : p(x, \theta_2)\right] = \psi(\theta_2) - \psi(\theta_1) - \nabla\psi(\theta_1) \cdot (\theta_2 - \theta_1) \tag{31}$$
satisfying the non-negativity condition
$$D_q\left[p(x, \theta_1) : p(x, \theta_2)\right] \geq 0 \tag{32}$$
with equality when and only when $\theta_1 = \theta_2$. This gives a q-divergence in $S_n$ different from the invariant divergence of $S_n$ [28]. The divergence is canonical in the sense that it is uniquely determined in accordance with the dually flat structure of the q-exponential family in Section 3 and Section 4. The canonical divergence is different from the α-divergence or conventional Tsallis relative entropy used in information geometry (see the discussion at the end of this subsection). Note that it is used in [16].
Theorem 3
For two discrete distributions $p(x) = \mathbf{p}$ and $r(x) = \mathbf{r}$, the q-divergence is given by
$$D_q[\mathbf{p} : \mathbf{r}] = \frac{1}{(1-q)\, h_q(\mathbf{p})}\left(1 - \sum_{i=0}^{n} p_i^q\, r_i^{1-q}\right) \tag{33}$$
Proof
The potentials are, from (19),
$$\psi(\mathbf{p}) = -\log_q p_0, \quad \psi(\mathbf{r}) = -\log_q r_0 \tag{34}$$
for $\mathbf{p}$ and $\mathbf{r}$. We need to calculate $\nabla\psi(\theta)$ given in (29). In our case, $x_i = \delta_i(x)$ and hence
$$\partial_i \psi = \frac{p_i^q}{h_q(\mathbf{p})} \tag{35}$$
By using this and (17), we obtain (33). □
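The statement of Theorem 3 can be verified numerically by evaluating the Bregman form of the divergence from the potential, the canonical parameters and the gradient, and comparing it with the closed form. The function names and the two sample distributions below are assumptions of this sketch.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def h_q(p, q):
    return sum(pi ** q for pi in p)

def theta_of(p, q):
    p0_pow = p[0] ** (1.0 - q)
    return [(pi ** (1.0 - q) - p0_pow) / (1.0 - q) for pi in p[1:]]

def psi_of(p, q):
    return -log_q(p[0], q)

def grad_psi(p, q):
    hq = h_q(p, q)
    return [pi ** q / hq for pi in p[1:]]

def bregman_divergence(p, r, q):
    """psi(theta_r) - psi(theta_p) - grad psi(theta_p) . (theta_r - theta_p)."""
    th_p, th_r = theta_of(p, q), theta_of(r, q)
    inner = sum(g * (b - a) for g, a, b in zip(grad_psi(p, q), th_p, th_r))
    return psi_of(r, q) - psi_of(p, q) - inner

def closed_form_divergence(p, r, q):
    s = sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * h_q(p, q))

q = 0.5
p = [0.5, 0.3, 0.2]
r = [0.2, 0.2, 0.6]
d_bregman = bregman_divergence(p, r, q)
d_closed = closed_form_divergence(p, r, q)
```

Both routes should agree to rounding error, and the divergence should vanish when the two arguments coincide.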
It is useful to consider a related probability distribution,
$$\hat{p}_q(x) = \frac{1}{h_q[p(x)]}\, p(x)^q \tag{36}$$
for defining the q-expectation. This is called the q-escort probability distribution [1,4,29]. Introducing the q-expectation of random variable $f(x)$ by
$$E_{\hat{p}}[f(x)] = \frac{1}{h_q[p(x)]} \int p(x)^q f(x)\, dx \tag{37}$$
we can rewrite the q-divergence (31) for $p(x), r(x) \in S$ as
$$D_q[p(x) : r(x)] = E_{\hat{p}}\left[\log_q p(x) - \log_q r(x)\right] \tag{38}$$
because of the relations (20) and (29). The expression (38) is also valid on the exterior of $S \times S$ when it is integrable. This is different from the definition of the Tsallis relative entropy [30,31]
$$\tilde{D}_q[p(x) : r(x)] = \frac{1}{1-q}\left(1 - \int p(x)^q\, r(x)^{1-q}\, dx\right) \tag{39}$$
which is equal to the well-known α-divergence up to a constant factor where $\alpha = 1 - 2q$ (see [8,28]), satisfying the invariance criterion. We have
$$D_q[p(x) : r(x)] = \frac{1}{h_q[p(x)]}\, \tilde{D}_q[p(x) : r(x)] \tag{40}$$
This is a conformal transformation of divergence, as we see in the following. See also the derivation based on affine differential geometry [12].
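For discrete distributions, the escort distribution, the q-expectation and the relation between the q-divergence and the Tsallis relative entropy can be written out directly. The following sketch uses hypothetical helper names and sample values; integrals become finite sums.

```python
def log_q(u, q):
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)

def h_q(p, q):
    return sum(pi ** q for pi in p)

def escort(p, q):
    """q-escort distribution: p_i^q normalized by h_q(p)."""
    hq = h_q(p, q)
    return [pi ** q / hq for pi in p]

def q_expectation(p, values, q):
    """Expectation of the given values under the escort distribution of p."""
    return sum(e * v for e, v in zip(escort(p, q), values))

def q_divergence(p, r, q):
    """Escort expectation of log_q p - log_q r."""
    diffs = [log_q(pi, q) - log_q(ri, q) for pi, ri in zip(p, r)]
    return q_expectation(p, diffs, q)

def tsallis_relative_entropy(p, r, q):
    s = sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))
    return (1.0 - s) / (1.0 - q)

q = 0.5
p = [0.5, 0.3, 0.2]
r = [0.2, 0.2, 0.6]
lhs = q_divergence(p, r, q)
rhs = tsallis_relative_entropy(p, r, q) / h_q(p, q)
```

Here `lhs` and `rhs` should coincide, which is the discrete counterpart of the conformal relation between the two divergences.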

2.5. q-Riemannian Metric

When $\theta_2$ is infinitesimally close to $\theta_1$, by putting $\theta_1 = \theta$, $\theta_2 = \theta + d\theta$ and using the Taylor expansion, we have
$$D_q\left[p(x, \theta) : p(x, \theta + d\theta)\right] = \sum_{i,j} g^{(q)}_{ij}(\theta)\, d\theta^i d\theta^j \tag{41}$$
where
$$g^{(q)}_{ij}(\theta) = \partial_i \partial_j \psi(\theta) \tag{42}$$
is a positive-definite matrix. We call $g^{(q)}_{ij}(\theta)$ the q-Fisher information matrix. When $q = 1$, this reduces to the ordinary Fisher information matrix given by
$$g^{(1)}_{ij}(\theta) = g^{F}_{ij}(\theta) = E\left[\partial_i \log p(x, \theta)\, \partial_j \log p(x, \theta)\right] \tag{43}$$
The positive-definite matrix $g^{(q)}_{ij}(\theta)$ defines a Riemannian metric on $S_n$, giving it the q-Riemannian structure.
When a metric tensor $g_{ij}(\theta)$ is transformed to
$$\tilde{g}_{ij}(\theta) = \sigma(\theta)\, g_{ij}(\theta) \tag{44}$$
by a positive function $\sigma(\theta)$, we call it a conformal transformation. See, e.g., [13,14,15,32]. The conformal transformation of divergence induces that of the Riemannian metric.
Theorem 4
The q-Fisher information metric is given by a conformal transformation of the Fisher information metric $g^{F}_{ij}$ as
$$g^{(q)}_{ij}(\theta) = \frac{q}{h_q(\theta)}\, g^{F}_{ij}(\theta) \tag{45}$$
Proof
The q-metric is derived from the Taylor expansion of $D_q[p : p + dp]$. We have
$$D_q\left[p(x, \theta) : p(x, \theta + d\theta)\right] = \frac{1}{(1-q)\, h_q(\theta)}\left(1 - \int p(x, \theta)^q\, p(x, \theta + d\theta)^{1-q}\, dx\right) = \frac{q}{h_q(\theta)} \sum_{i,j} \int \frac{1}{p(x, \theta)}\, \partial_i p(x, \theta)\, \partial_j p(x, \theta)\, dx\, d\theta^i d\theta^j \tag{46}$$
using the identities (25) and (26). When $q = 1$, this is the Fisher information given by (43). Hence, the q-Fisher information is given by (45). □
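The conformal factor of Theorem 4 can be observed numerically without computing either metric explicitly. The Kullback-Leibler divergence is governed to leading order by the Fisher information metric, and the q-divergence by the q-Fisher metric, so for two nearby distributions the ratio of the two divergences should approach $q / h_q$. The sketch below checks this for a discrete distribution; the perturbation direction and step sizes are arbitrary choices made for illustration.

```python
import math

def h_q(p, q):
    return sum(pi ** q for pi in p)

def q_divergence(p, r, q):
    s = sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * h_q(p, q))

def kl_divergence(p, r):
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

q = 0.5
p = [0.5, 0.3, 0.2]
direction = [1.0, -2.0, 1.0]   # sums to zero, so p + eps * direction stays normalized
conformal_factor = q / h_q(p, q)

ratios = []
for eps in (1e-2, 1e-3, 1e-4):
    r = [pi + eps * di for pi, di in zip(p, direction)]
    ratios.append(q_divergence(p, r, q) / kl_divergence(p, r))
```

The entries of `ratios` should move towards `conformal_factor` as the step shrinks.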
A Riemannian metric defines the length of a tangent vector $X = (X^1, \dots, X^n)$ at $\theta$ by
$$\|X\|^2 = \sum_{i,j} g_{ij}(\theta)\, X^i X^j \tag{47}$$
Similarly, for two tangent vectors X and Y, their inner product is defined by
$$\langle X, Y \rangle = \sum_{i,j} g_{ij}\, X^i Y^j \tag{48}$$
When $\langle X, Y \rangle$ vanishes, X and Y are said to be orthogonal. The orthogonality, or more generally the angle, of two vectors X and Y does not change by a conformal transformation, although their magnitudes change.

3. Dually Flat Structure of q-Exponential Family

3.1. Legendre Transformation and q-Entropy

Given a convex function $\psi(\theta)$, the Legendre transformation is defined by
$$\eta = \nabla\psi(\theta) \tag{49}$$
where $\nabla = (\partial / \partial\theta^i)$ is the gradient. Since the correspondence between $\theta$ and $\eta$ is one-to-one, we may consider $\eta$ as another coordinate system of S.
The dual potential function is defined by
$$\varphi(\eta) = \max_{\theta}\left\{\theta \cdot \eta - \psi(\theta)\right\} \tag{50}$$
which is convex with respect to $\eta$. The original coordinates are recovered from the inverse transformation given by
$$\theta = \nabla\varphi(\eta) \tag{51}$$
where $\nabla = (\partial / \partial\eta_i)$, so that $\theta$ and $\eta$ are in dual correspondence.
The following theorem gives explicit relations among these quantities.
Theorem 5
The dual coordinates $\eta$ are given by
$$\eta = E_{\hat{p}}[x] \tag{52}$$
and the dual potential is given by
$$\varphi(\eta) = \frac{1}{1-q}\left(\frac{1}{h_q(p)} - 1\right) \tag{53}$$
Proof
The relation (52) is immediate from (29). From the Legendre duality, the dual potential satisfies
$$\varphi(\eta) + \psi(\theta) - \theta \cdot \eta = 0 \tag{54}$$
when $\theta$ and $\eta$ correspond to each other by $\eta = \nabla\psi(\theta)$. Therefore,
$$\varphi(\eta) = \sum_{i=1}^{n} \theta^i \eta_i - \psi(\theta) \tag{55}$$
$$= E_{\hat{p}}\left[\log_q p(x, \theta)\right] \tag{56}$$
$$= \frac{1}{(1-q)\, h_q(\theta)}\left(1 - \int p^q(x, \theta)\, dx\right) \tag{57}$$
$$= \frac{1}{1-q}\left(\frac{1}{h_q(\theta)} - 1\right) \tag{58}$$
This is a convex function of $\eta$. □
We call the q-dual potential
$$\varphi(\eta) = E_{\hat{p}}\left[\log_q p(x, \theta)\right] = \frac{1}{1-q}\left(\frac{1}{h_q} - 1\right) \tag{59}$$
the negative q-entropy, because it is the Legendre dual of the q-free energy $\psi(\theta)$. There are various definitions of q-entropy. The Tsallis q-entropy [3] is originally defined by
$$H_{\text{Tsallis}} = \frac{1}{1-q}\left(h_q - 1\right) \tag{60}$$
while the Rényi q-entropy [33] is
$$H_{\text{Rényi}} = \frac{1}{1-q} \log h_q \tag{61}$$
They are mutually related by monotone functions. When $q \to 1$, all of them reduce to the Shannon entropy.
Our definition of
$$H_q = \frac{1}{1-q}\left(1 - \frac{1}{h_q}\right) = \frac{H_{\text{Tsallis}}}{h_q} \tag{62}$$
is also monotonically connected with the previous ones, but is more natural from the point of view of q-geometry. The entropy $H_q$ has been known as the normalized q-entropy, which was studied in [16,34,35,36,37].
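The Legendre relation between the two potentials and the connections among the three entropies can be checked on a discrete distribution. In the sketch below, all function names and the sample distribution are assumptions; the dual potential is taken as the negative of the normalized q-entropy, as stated in the text.

```python
import math

def h_q(p, q):
    return sum(pi ** q for pi in p)

def tsallis_entropy(p, q):
    return (h_q(p, q) - 1.0) / (1.0 - q)

def renyi_entropy(p, q):
    return math.log(h_q(p, q)) / (1.0 - q)

def normalized_entropy(p, q):
    return (1.0 - 1.0 / h_q(p, q)) / (1.0 - q)

def shannon_entropy(p):
    return -sum(pi * math.log(pi) for pi in p)

def legendre_residual(p, q):
    """phi(eta) + psi(theta) - theta . eta for a discrete distribution p."""
    hq = h_q(p, q)
    p0_pow = p[0] ** (1.0 - q)
    theta = [(pi ** (1.0 - q) - p0_pow) / (1.0 - q) for pi in p[1:]]
    eta = [pi ** q / hq for pi in p[1:]]
    psi = -(p0_pow - 1.0) / (1.0 - q)          # psi = -log_q p_0
    phi = -normalized_entropy(p, q)            # dual potential = negative q-entropy
    return phi + psi - sum(t * e for t, e in zip(theta, eta))

p = [0.5, 0.3, 0.2]
q = 0.5
residual = legendre_residual(p, q)
ratio_gap = normalized_entropy(p, q) - tsallis_entropy(p, q) / h_q(p, q)

q_near_one = 1.0 - 1e-6
limits = [f(p, q_near_one) for f in (tsallis_entropy, renyi_entropy, normalized_entropy)]
```

The residual and `ratio_gap` should vanish up to rounding, and each entry of `limits` should lie close to `shannon_entropy(p)`.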

3.2. q-Dually Flat Structure

There are two dually coupled coordinate systems $\theta$ and $\eta$ in the q-exponential family S, with two potential functions $\psi(\theta)$ and $\varphi(\eta)$ for each q. Two affine structures are introduced by the two convex functions ψ and φ; see the information geometry of dually flat spaces [8]. Although S is a Riemannian manifold given by the q-Fisher information matrix (45), we may nevertheless regard S as an affine manifold where $\theta$ is an affine coordinate system. The coordinates $\theta$ represent intensive quantities of a physical system. Dually, we introduce a dual affine structure to S, where $\eta$ is another affine coordinate system. The coordinates $\eta$ represent extensive quantities. We can define two types of straight lines or geodesics in S due to the q-affine structures.
For two distributions $p(x, \theta_1)$ and $p(x, \theta_2)$ in S, a curve $p(x, \theta(t))$ is said to be a q-geodesic connecting them, when
$$\theta(t) = t\,\theta_1 + (1-t)\,\theta_2 \tag{63}$$
where t is the parameter of the curve. Dually, in terms of dual coordinates $\eta$, when
$$\eta(t) = t\,\eta_1 + (1-t)\,\eta_2 \tag{64}$$
holds, the curve is said to be a dual q-geodesic.
More generally, the q-geodesic connecting two distributions $p_1(x)$ and $p_2(x)$ is given by
$$\log_q p(x, t) = t \log_q p_1(x) + (1-t) \log_q p_2(x) - c(t) \tag{65}$$
where $c(t)$ is a normalizing term. This is rewritten as
$$p(x, t)^{1-q} = t\, p_1(x)^{1-q} + (1-t)\, p_2(x)^{1-q} - c(t) \tag{66}$$
Dually, the dual q-geodesic connecting $p_1(x)$ and $p_2(x)$ is given by using the escort distributions as
$$\hat{p}(x, t) = t\, \hat{p}_1(x) + (1-t)\, \hat{p}_2(x) \tag{67}$$
Since the manifold S has a q-Riemannian structure, the orthogonality of two tangent vectors is defined by the Riemannian metric. We rewrite the orthogonality of two geodesics in terms of the affine coordinates. Let us consider two small deviations $d_1 p(x)$ and $d_2 p(x)$ of $p(x)$, that is, from $p(x)$ to $p(x) + d_1 p(x)$ and $p(x) + d_2 p(x)$, which are regarded as two (infinitesimal) tangent vectors of S at $p(x)$.
Lemma 1
The inner product of two deviations $d_1 p$ and $d_2 p$ is given by
$$\langle d_1 p(x), d_2 p(x) \rangle = \int d_1 \hat{p}(x)\, d_2 \log_q p(x)\, dx \tag{68}$$
Proof
By simple calculations, we have
$$\int d_1 \hat{p}(x)\, d_2 \log_q p(x)\, dx = \frac{q}{h_q} \int \frac{d_1 p(x)\, d_2 p(x)}{p(x)}\, dx \tag{69}$$
of which the right-hand side is the Riemannian inner product in the form of (46). □
Corollary.
Two curves $\theta_1(t)$ and $\eta_2(t)$, intersecting at $t = 0$, are orthogonal when $\langle \dot{\theta}_1(0), \dot{\eta}_2(0) \rangle = 0$. Here, $\dot{\theta}_1(t)$ and $\dot{\eta}_2(t)$ denote derivatives of $\theta_1(t)$ and $\eta_2(t)$ with respect to t, respectively.
The two geodesics and the orthogonality play a fundamental role in S, as will be seen in the following.
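Both kinds of geodesics described in this section can be traced numerically for discrete distributions. In the sketch below, a point on the q-geodesic is obtained by interpolating $p^{1-q}$ and fixing the normalizing shift by bisection, and a point on the dual q-geodesic by mixing escort distributions and mapping the mixture back. The function names, the bisection bracket (valid for 0 < q < 1 and t in [0, 1]) and the sample distributions are assumptions of this sketch.

```python
def h_q(p, q):
    return sum(pi ** q for pi in p)

def escort(p, q):
    hq = h_q(p, q)
    return [pi ** q / hq for pi in p]

def from_escort(e, q):
    """Invert the escort map: p_i is proportional to e_i^(1/q)."""
    w = [ei ** (1.0 / q) for ei in e]
    total = sum(w)
    return [wi / total for wi in w]

def theta_of(p, q):
    p0_pow = p[0] ** (1.0 - q)
    return [(pi ** (1.0 - q) - p0_pow) / (1.0 - q) for pi in p[1:]]

def q_geodesic_point(p1, p2, t, q, iters=200):
    """Point at parameter t: p^(1-q) is the interpolation minus a shift c(t)."""
    m = [t * a ** (1.0 - q) + (1.0 - t) * b ** (1.0 - q) for a, b in zip(p1, p2)]

    def build(c):
        return [max(mi - c, 0.0) ** (1.0 / (1.0 - q)) for mi in m]

    lo, hi = -1.0, 0.0   # total mass exceeds 1 at c = -1 and is at most 1 at c = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(build(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return build(0.5 * (lo + hi))

def dual_q_geodesic_point(p1, p2, t, q):
    e1, e2 = escort(p1, q), escort(p2, q)
    return from_escort([t * a + (1.0 - t) * b for a, b in zip(e1, e2)], q)

q = 0.5
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.2, 0.6]
mid_q = q_geodesic_point(p1, p2, 0.5, q)
mid_dual = dual_q_geodesic_point(p1, p2, 0.5, q)

theta_mid = theta_of(mid_q, q)
theta_avg = [0.5 * (a + b) for a, b in zip(theta_of(p1, q), theta_of(p2, q))]
```

The canonical parameters of `mid_q` should equal the average of the endpoint parameters, and the escort distribution of `mid_dual` should equal the average of the endpoint escort distributions.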

4. q-Pythagorean and q-Max-Ent Theorems

A dually flat Riemannian manifold admits the generalized Pythagorean theorem and the related projection theorem [8]. We state them in our case.
q-Pythagorean Theorem.
For three distributions $p_1(x)$, $p_2(x)$ and $p_3(x)$ in S, it holds that
$$D_q[p_1 : p_2] + D_q[p_2 : p_3] = D_q[p_1 : p_3] \tag{70}$$
when the dual geodesic connecting $p_1(x)$ and $p_2(x)$ is orthogonal at $p_2(x)$ to the geodesic connecting $p_2(x)$ and $p_3(x)$ (see Figure 1).
Figure 1. q-Pythagorean theorem.
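The q-Pythagorean relation can be illustrated on three-point distributions. In the sketch below, the dual geodesic from $p_2$ to $p_1$ is represented by the difference of dual coordinates, and $p_3$ is placed along a direction in the canonical coordinates whose pairing with that difference vanishes. The solver `from_theta`, the step length 5.0 and the sample distributions are assumptions of this sketch; a second, non-orthogonal configuration is included as a control.

```python
def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def h_q(p, q):
    return sum(pi ** q for pi in p)

def theta_of(p, q):
    p0_pow = p[0] ** (1.0 - q)
    return [(pi ** (1.0 - q) - p0_pow) / (1.0 - q) for pi in p[1:]]

def eta_of(p, q):
    hq = h_q(p, q)
    return [pi ** q / hq for pi in p[1:]]

def from_theta(theta, q, lo=-50.0, hi=50.0, iters=200):
    """Recover p from theta by solving the normalization condition for psi."""
    def build(psi):
        return [exp_q(-psi, q)] + [exp_q(t - psi, q) for t in theta]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(build(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return build(0.5 * (lo + hi))

def q_divergence(p, r, q):
    s = sum(pi ** q * ri ** (1.0 - q) for pi, ri in zip(p, r))
    return (1.0 - s) / ((1.0 - q) * h_q(p, q))

q = 0.5
p1 = [0.5, 0.3, 0.2]
p2 = [0.3, 0.4, 0.3]
d_eta = [a - b for a, b in zip(eta_of(p1, q), eta_of(p2, q))]

# n = 2, so a direction with zero pairing against d_eta is a rotation by 90 degrees
d_theta = [-5.0 * d_eta[1], 5.0 * d_eta[0]]
p3 = from_theta([t + d for t, d in zip(theta_of(p2, q), d_theta)], q)
gap = q_divergence(p1, p2, q) + q_divergence(p2, p3, q) - q_divergence(p1, p3, q)

# control: move p3 along d_eta itself, which is not orthogonal
p3_bad = from_theta([t + 5.0 * d for t, d in zip(theta_of(p2, q), d_eta)], q)
gap_bad = q_divergence(p1, p2, q) + q_divergence(p2, p3_bad, q) - q_divergence(p1, p3_bad, q)
```

With the orthogonal choice, `gap` should vanish up to rounding, whereas `gap_bad` should be clearly nonzero.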
Given a distribution $p(x) \in S$ and a submanifold $M \subset S$, a distribution $r(x) \in M$ is said to be the q-projection (dual q-projection) of $p(x)$ to M, when the q-geodesic (dual q-geodesic) connecting $p(x)$ and $r(x)$ is orthogonal to M at $r(x)$ (Figure 2).
Figure 2. q-projection of p to M.
q-Projection Theorem.
Let M be a submanifold of S. Given $p(x) \in S$, the point $r(x) \in M$ that minimizes $D_q[p(x) : r(x)]$ is given by the dual q-projection of $p(x)$ to M. The point $r(x) \in M$ that minimizes $D_q[r(x) : p(x)]$ is given by the q-projection of $p(x)$ to M.
We show that the well-known q-max-ent theorem in the case of Tsallis q-entropy [1,4,9,11] is a direct consequence of the above q-Pythagorean and q-projection theorems.
q-Max-Ent Theorem.
Probability distributions maximizing the q-entropies $H_{\text{Tsallis}}$, $H_{\text{Rényi}}$ and $H_q$ under q-linear constraints for m random variables $c_k(x)$ and various values of $a_k$
$$E_{\hat{p}}\left[c_k(x)\right] = a_k, \quad k = 1, \dots, m \tag{71}$$
form a q-exponential family
$$\log_q p(x, \theta) = \sum_{i=1}^{m} \theta^i c_i(x) - \psi(\theta) \tag{72}$$
The proof is easily obtained by the standard analytical method. Here, we give a geometrical proof. Let us consider the subspace $M^* \subset S$ whose member $p(x)$ satisfies the m constraints
$$E_{\hat{p}}\left[c_k(x)\right] = \int \hat{p}(x)\, c_k(x)\, dx = a_k, \quad k = 1, \dots, m.$$
Since the constraints are linear in the dual affine coordinates $\eta$ or $\hat{p}(x)$, $M^*$ is a linear subspace of S with respect to the dual affine connection. Let $p_0(x, \theta_0)$ be the uniform distribution defined by $\theta_0 = 0$, which implies $p_0(x, \theta_0) = \text{const}$ from (6). Let $\bar{p}(x) \in M^*$ be the q-projection of $p_0(x)$ to $M^*$ (Figure 3). Then, the divergence $D_q[p : p_0]$ from $p(x) \in M^*$ to $p_0(x)$ is decomposed as
$$D_q[p : p_0] = D_q[p : \bar{p}] + D_q[\bar{p} : p_0]$$
Let $\eta_p$ be the dual coordinates of $p(x)$. Since the divergence is written as
$$D_q[p : p_0] = \psi(\theta_0) + \varphi(\eta_p) - \theta_0 \cdot \eta_p$$
the minimizer of $D_q[p : p_0]$ among $p(x) \in M^*$ is just $\bar{p}(x)$, which is also the maximizer of the entropy $-\varphi(\eta_p)$.
The trajectories of $\bar{p}(x)$ for various values of $a_k$ form a flat subspace orthogonal to $M^*$, implying that they form a q-exponential family of the form (6) (see Figure 3). The tangent directions $d\hat{p}(x)$ of $M^*$ satisfy
$$\int d\hat{p}(x)\, c_k(x)\, dx = 0, \quad k = 1, \dots, m.$$
Hence, a q-exponential family of the form
$$\log_q p(x, \xi) = \sum_{i=1}^{m} \xi^i d_i(x) - \psi(\xi)$$
is orthogonal to $M^*$, when
$$\int d\hat{p}(x)\, d\log_q p(x, \xi)\, dx = 0$$
This implies that $d_i(x) = c_i(x)$. Hence, we have the q-exponential family (72) that maximizes the q-entropies.
Figure 3. q-Max-Ent theorem.
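The q-max-ent statement can be probed on a three-state example with a single constraint function. The sketch below builds one member of the q-exponential family generated by $c(x)$, then perturbs its escort distribution along a direction that preserves both normalization and the escort mean of $c$, and compares the normalized q-entropy $H_q$. The constraint values, the parameter 0.3, the perturbation sizes and the helper names are all assumptions of this sketch. It also uses the identity $1/h_q = (\sum_i \hat{p}_i^{1/q})^q$, which follows from inverting the escort map.

```python
def exp_q(u, q):
    base = 1.0 + (1.0 - q) * u
    return base ** (1.0 / (1.0 - q)) if base > 0.0 else 0.0

def q_family_member(theta, c, q, lo=-50.0, hi=50.0, iters=200):
    """p(x) = exp_q(theta * c(x) - psi), with psi fixed by normalization."""
    def build(psi):
        return [exp_q(theta * ci - psi, q) for ci in c]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(build(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return build(0.5 * (lo + hi))

def escort(p, q):
    hq = sum(pi ** q for pi in p)
    return [pi ** q / hq for pi in p]

def entropy_from_escort(e, q):
    """Normalized q-entropy H_q expressed through the escort distribution."""
    inv_h = sum(ei ** (1.0 / q) for ei in e) ** q
    return (1.0 - inv_h) / (1.0 - q)

q = 0.5
c = [0.0, 1.0, 2.0]
p_star = q_family_member(0.3, c, q)
e_star = escort(p_star, q)
h_star = entropy_from_escort(e_star, q)
constraint_value = sum(ei * ci for ei, ci in zip(e_star, c))

v = [1.0, -2.0, 1.0]   # sum(v) = 0 and sum(v * c) = 0: mass and escort mean are kept
competitors = []
for eps in (-0.05, -0.02, 0.02, 0.05):
    e = [ei + eps * vi for ei, vi in zip(e_star, v)]
    competitors.append((sum(ei * ci for ei, ci in zip(e, c)), entropy_from_escort(e, q)))
```

Every competitor satisfies the same constraint and should have a smaller $H_q$ than the family member. Since the Tsallis and Rényi entropies are monotone functions of $h_q$, the same ordering holds for them.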

5. q-Bayesian MAP Estimator

Given N iid observations $x_1, \dots, x_N$ from a statistical model $M = \{p(x, \xi)\}$, we have
$$p(x_1, \dots, x_N, \xi) = \prod_{i=1}^{N} p(x_i, \xi)$$
Since $\log_q u$ is a monotonically increasing function, the maximizer of the q-likelihood
$$l_q(x_1, \dots, x_N, \xi) = \log_q p(x_1, \dots, x_N, \xi)$$
is the same as the ordinary maximum likelihood estimator (mle). However, the maximizer of the q-escort distribution, which maximizes the q-escort log-likelihood,
$$\frac{1}{q}\, \hat{l}(x_1, \dots, x_N, \xi) = \log p(x_1, \dots, x_N, \xi) - \frac{N}{q} \log h_q(\xi)$$
is different from this. We show that the q-escort mle is a Bayesian MAP (maximum a posteriori probability) estimator. This clarifies the meaning of the q-escort mle.
The q-escort mle is the maximizer of the q-escort distribution,
$$\hat{\xi}_q = \arg\max_{\xi}\, \hat{p}(x_1, \dots, x_N, \xi)$$
Theorem 6
The q-escort mle $\hat{\xi}_q$ is the Bayesian MAP estimator with the prior distribution
$$\pi(\xi) = h_q(\xi)^{-N/q}$$
Proof
The Bayesian MAP is the maximizer of the posterior distribution with prior $\pi(\xi)$
$$p(\xi \mid x_1, \dots, x_N) = \frac{\pi(\xi)\, p(x_1, \dots, x_N, \xi)}{p(x_1, \dots, x_N)}$$
which also maximizes
$$\left\{\pi(\xi)\, p(x_1, \dots, x_N, \xi)\right\}^q, \quad \text{for } q > 0$$
On the other hand, the q-escort mle is the maximizer of
$$\hat{p}(x_1, \dots, x_N, \xi) = \prod_{i=1}^{N} \hat{p}(x_i, \xi) = \prod_{i=1}^{N} \frac{p(x_i, \xi)^q}{h_q(\xi)}$$
Hence, when
$$\pi(\xi) = h_q(\xi)^{-N/q}$$
the two estimators are identical. □
The theorem shows that the Bayesian prior has a peak at the maximizer of our q-entropy $H_q$.
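Theorem 6 can be illustrated with a two-outcome model. The sketch below assumes a hypothetical Bernoulli-type family $p(x, \xi) = (1-\xi, \xi)$, with N observations of which k equal one, and locates both the q-escort mle and the MAP estimate under the prior $h_q(\xi)^{-N/q}$ by a simple grid search. The model, the sample counts and the grid are assumptions made only for this illustration.

```python
import math

def log_posterior(xi, k, n, q):
    """log of h_q(xi)^(-n/q) times the likelihood, up to an additive constant."""
    h = xi ** q + (1.0 - xi) ** q
    return k * math.log(xi) + (n - k) * math.log(1.0 - xi) - (n / q) * math.log(h)

def log_escort_likelihood(xi, k, n, q):
    """log of the product over observations of p(x_i, xi)^q / h_q(xi)."""
    h = xi ** q + (1.0 - xi) ** q
    return q * (k * math.log(xi) + (n - k) * math.log(1.0 - xi)) - n * math.log(h)

q, n, k = 0.5, 10, 7
grid = [i / 1000.0 for i in range(1, 1000)]
map_estimate = max(grid, key=lambda xi: log_posterior(xi, k, n, q))
escort_mle = max(grid, key=lambda xi: log_escort_likelihood(xi, k, n, q))
ordinary_mle = k / n
```

The two objective functions differ only by the positive factor q, so the grid maximizers coincide, while both differ from the ordinary mle k/N.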

6. Conclusions

Much attention has recently been paid to probability distributions subject to the power law, instead of the exponential law, since Tsallis proposed the q-entropy and related theories. The power law is also found in various communication networks. It is now a hot topic of research.
However, a geometrical foundation has been lacking, whereas that for the ordinary family of probability distributions is given by information geometry [8]. The present paper has tried to give a geometrical foundation to the q-family of probability distributions. We introduced a new notion of q-geometry. The q-structure is ubiquitous in the sense that the family of all discrete probability distributions (and the family of all continuous probability distributions, if we neglect delicate problems involved in the infinite dimensionality) belongs to the q-exponential family of distributions for any q. That is, we can introduce the q-geometrical structure to an arbitrary family of probability distributions, because any parametrized family of probability distributions forms a submanifold embedded in the entire manifold.
The q-structure consists of a Riemannian metric together with a pair of dually coupled affine connections, which sits in the framework of standard information geometry. However, the q-structure is essentially different from the standard one derived from the invariance criterion of the manifold of probability distributions. The viewpoint of conformal transformation gives a novel view of the theory related to the q-entropy. This leads us to unified definitions of various quantities such as the q-entropy, q-divergence, q-potential function and their duals, as well as new interpretations of known quantities.
This is a geometrical foundation, and we expect that the paper will contribute to further developments in this field.

References

  1. Tsallis, C. Introduction to Nonextensive Statistical Mechanics; Springer: New York, NY, USA, 2009.
  2. Naudts, J. Generalised Thermostatistics; Springer: London, UK, 2011.
  3. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  4. Naudts, J. The q-exponential family in statistical Physics. Cent. Eur. J. Phys. 2009, 7, 405–413.
  5. Suyari, H. Mathematical structures derived from the q-multinomial coefficient in Tsallis statistics. Physica A 2006, 368, 63–82.
  6. Suyari, H.; Wada, T. Multiplicative duality, q-triplet and μ, ν, q-relation derived from the one-to-one correspondence between the (μ, ν)-multinomial coefficient and Tsallis entropy Sq. Physica A 2008, 387, 71–83.
  7. Barndorff-Nielsen, O.E. Information and Exponential Families in Statistical Theory; Wiley: New York, NY, USA, 1978.
  8. Amari, S.; Nagaoka, H. Methods of Information Geometry (Translations of Mathematical Monographs); Oxford University Press: Oxford, UK, 2000.
  9. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A 2007, 370, 184–193.
  10. Furuichi, S. On the maximum entropy principle and the minimization of the Fisher information in Tsallis statistics. J. Math. Phys. 2009, 50, 013303.
  11. Ohara, A. Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation. Eur. Phys. J. B 2009, 70, 15–28.
  12. Ohara, A.; Matsuzoe, H.; Amari, S. A dually flat structure with escort probability and its application to alpha-Voronoi diagrams. In arXiv; 2010; arXiv:cond-mat/1010.4965.
  13. Kurose, T. On the Divergence of 1-conformally Flat Statistical Manifolds. Tôhoku Math. J. 1994, 46, 427–433.
  14. Matsuzoe, H. Geometry of contrast functions and conformal geometry. Hiroshima Math. J. 1999, 29, 175–191.
  15. Kurose, T. Conformal-projective geometry of statistical manifolds. Interdisciplinary Information Sciences 2002, 8, 89–100.
  16. Yamano, T. Information theory based on non-additive information content. Phys. Rev. E 2001, 63, 046105.
  17. Naudts, J. Estimators, escort probabilities, and phi-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 2004, 5, 102.
  18. Pistone, G. kappa-exponential models from the geometrical viewpoint. Eur. Phys. J. B 2009, 70, 29–37.
  19. Naudts, J. Generalized exponential families and associated entropy functions. Entropy 2008, 10, 131–149.
  20. Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Deformed logarithms and entropies. Physica A 2004, 340, 41–49.
  21. Yamano, T. Some properties of q-logarithmic and q-exponential functions in Tsallis statistics. Physica A 2002, 305, 486–496.
  22. Tsallis, C.; Levy, S.V.F.; Souza, A.M.C.; Maynard, R. Statistical-mechanical foundation of the ubiquity of Levy distributions in nature. Phys. Rev. Lett. 1995, 75, 3589–3593, Erratum Phys. Rev. Lett. 1996, 77, 5442.
  23. Tanaka, M. A consideration on the family of q-Gaussian distributions. IEICE (Japan) 2002, J85–D2, 161–173. (in Japanese).
  24. Zhang, Z.; Zhong, F.; Sun, H. Information geometry of the power inverse Gaussian distribution. Appl. Sci. 2007, 9, 194–203.
  25. Ohara, A.; Wada, T. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A: Math. Theor. 2010, 43, 035002.
  26. Wada, T. Generalized log-likelihood functions and Bregman divergences. J. Math. Phys. 2009, 50, 113301.
  27. Cichocki, A.; Cruces, S.; Amari, S. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 2011, 13, 134–170.
  28. Amari, S. α-divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Trans. Inform. Theor. 2009, 55, 4925–4931.
  29. Beck, C.; Schlögl, F. Thermodynamics of Chaotic Systems; Cambridge University Press: Cambridge, UK, 1993.
  30. Borland, L.; Plastino, A.R.; Tsallis, C. Information gain within nonextensive thermostatistics. J. Math. Phys. 1998, 39, 6490–6501.
  31. Furuichi, S. Fundamental properties of Tsallis relative entropy. J. Math. Phys. 2004, 45, 4868–4877.
  32. Okamoto, I.; Amari, S.; Takeuchi, K. Asymptotic theory of sequential estimation procedures for curved exponential families. Ann. Stat. 1991, 19, 961–981.
  33. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; pp. 547–561.
  34. Landsberg, P.T.; Vedral, V. Distributions and channel capacities in generalized statistical mechanics. Phys. Lett. A 1998, 247, 211–217.
  35. Rajagopal, A.K.; Abe, S. Implications of form invariance to the structure of nonextensive entropies. Phys. Rev. Lett. 1999, 83, 1711–1714.
  36. Yamano, T. Source coding theorem based on a nonadditive information content. Physica A 2002, 305, 190–195.
  37. Wada, T.; Scarfone, A.M. Connections between Tsallis’ formalisms employing the standard linear average energy and ones employing the normalized q-average energy. Phys. Lett. A 2005, 335, 351–362.

Amari, S.-i.; Ohara, A. Geometry of q-Exponential Family of Probability Distributions. Entropy 2011, 13, 1170-1185. https://doi.org/10.3390/e13061170
