Article

A Dissipation of Relative Entropy by Diffusion Flows

Department of Information Sciences, Ochanomizu University, 2-1-1, Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
Entropy 2017, 19(1), 9; https://doi.org/10.3390/e19010009
Submission received: 5 October 2016 / Revised: 18 December 2016 / Accepted: 22 December 2016 / Published: 27 December 2016

Abstract

Given a probability measure, we consider the diffusion flows of probability measures associated with the Fokker–Planck partial differential equation (PDE). Our flows of probability measures are defined as solutions of the Fokker–Planck equation with the same strictly convex potential, which means that the flows share the same equilibrium. We then investigate the time derivative of the relative entropy in the case where both the object and the reference measures move according to the above diffusion flows, from which we obtain a dissipation formula and also an integral representation of the relative entropy.

1. Introduction

We shall begin with the definitions and fix some notation. Several results in the literature that we will use later are also gathered in this section. Probability measures on $\mathbb{R}^n$ in this paper are always assumed to be absolutely continuous with respect to the Lebesgue measure. Thus, we say that a probability measure $\mu$ has the continuous density $f$, which means $d\mu(x) = f(x)\,dx$, and the measure $\mu$ is sometimes identified with its density $f$. Throughout this paper, an integral symbol written without specifying any domain means the integral over the whole space $\mathbb{R}^n$.
Definition 1.
For a probability measure μ on R n with the density f, the entropy of μ (or f) is defined by
$$H(\mu) = H(f) = -\int f \log f \, dx,$$
and, if the density f is smooth, then we define the Fisher information of μ (or f) as
$$I(\mu) = I(f) = \int \frac{\|\nabla f\|^2}{f}\, dx = \int \|\nabla \log f\|^2\, f \, dx.$$
For an R n -valued random variable X, if X is distributed according to the probability measure μ, then we define the entropy H ( X ) and the Fisher information I ( X ) of X by H ( X ) = H ( μ ) and I ( X ) = I ( μ ) , respectively.
In the one-dimensional case, the gradient $\nabla \log f$ in (2) is usually called the score function of $X$ (or $\mu$) and denoted by $\rho_X$ (or $\rho_\mu$). For a differentiable function $\xi$ with bounded derivative, the score function satisfies
$$\int \xi(x)\, \rho_\mu(x)\, d\mu(x) = -\int \xi'(x)\, d\mu(x),$$
which is known as Stein’s identity.
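As a quick numerical illustration (our own sketch, not part of the original argument), the Python snippet below evaluates the entropy, the Fisher information, and Stein's identity on a grid for the standard Gaussian density; the grid, the helper names, and the test function $\xi$ are illustrative choices.

```python
import numpy as np

# Grid and a test density: the standard Gaussian, for which H = (1/2) log(2*pi*e)
# and I = 1 are known in closed form.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def entropy(f):
    """H(f) = -int f log f dx, by the trapezoidal rule."""
    return -np.trapz(np.where(f > 0, f * np.log(f), 0.0), dx=dx)

def fisher(f):
    """I(f) = int |f'|^2 / f dx, with f' from central differences."""
    df = np.gradient(f, dx)
    return np.trapz(np.where(f > 0, df**2 / f, 0.0), dx=dx)

print(entropy(f), 0.5 * np.log(2 * np.pi * np.e))   # both ~ 1.4189
print(fisher(f), 1.0)                               # both ~ 1.0

# Stein's identity with xi(x) = sin(x): int xi * rho dmu  vs  -int xi' dmu.
rho = np.gradient(f, dx) / f                        # score function f'/f
print(np.trapz(np.sin(x) * rho * f, dx=dx), -np.trapz(np.cos(x) * f, dx=dx))
```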
Let $X$ be an $\mathbb{R}^n$-valued random variable distributed according to the probability measure $\mu$, and let $Z$ be an $n$-dimensional standard (with mean vector $0$ and identity covariance matrix $I_n$) Gaussian random variable independent of $X$. Then, for $\tau > 0$, the independent sum $X + \sqrt{\tau}\, Z$ is called the Gaussian perturbation of $X$.
We denote by $\mu_\tau$ the probability measure corresponding to the Gaussian perturbation $X + \sqrt{\tau}\, Z$, and $f_\tau$ stands for the density of $\mu_\tau$. It is fundamental that the density function $f_\tau$ satisfies the heat equation:
$$\frac{\partial}{\partial \tau} f_\tau = \frac{1}{2}\, \Delta f_\tau,$$
where $\Delta = \nabla \cdot \nabla$ is the Laplacian operator.
A remarkable relationship between the entropy and the Fisher information, known as the de Bruijn identity, can be established via the Gaussian perturbation as follows (see, for instance, [1] or [2]).
Lemma 1.
Let X be an R n -valued random variable distributed according to the probability measure μ. Then, for the Gaussian perturbation, it holds that
$$\frac{d}{d\tau} H\big(X + \sqrt{\tau}\, Z\big) = \frac{1}{2}\, I\big(X + \sqrt{\tau}\, Z\big) \quad \text{for } \tau > 0.$$
Namely, using the density f τ of the Gaussian perturbed measure μ τ , we can write
$$\frac{d}{d\tau} H(f_\tau) = \frac{1}{2}\, I(f_\tau) \quad \text{for } \tau > 0.$$
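A minimal numerical sketch of the de Bruijn identity (our own illustration, assuming a uniform initial density and a simple grid convolution for the Gaussian perturbation; all names and discretization parameters are ours): the finite difference of the entropy in $\tau$ should be close to half the Fisher information.

```python
import numpy as np

x = np.linspace(-12, 12, 6001)
dx = x[1] - x[0]
f0 = np.where(np.abs(x) <= 1, 0.5, 0.0)            # uniform density on (-1, 1)

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def perturb(tau):
    """Density of X + sqrt(tau) Z: convolution of f0 with the N(0, tau) density."""
    return np.convolve(f0, gaussian(x, tau), mode="same") * dx

def entropy(f):
    return -np.trapz(np.where(f > 0, f * np.log(f), 0.0), dx=dx)

def fisher(f):
    df = np.gradient(f, dx)
    return np.trapz(np.where(f > 0, df**2 / f, 0.0), dx=dx)

tau, h = 0.5, 1e-3
lhs = (entropy(perturb(tau + h)) - entropy(perturb(tau - h))) / (2 * h)
rhs = 0.5 * fisher(perturb(tau))
print(lhs, rhs)    # the de Bruijn identity: the two values should nearly agree
```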
Definition 2.
Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^n$ with $\mu \ll \nu$ ($\mu$ is absolutely continuous with respect to $\nu$). We denote the probability density functions of $\mu$ and $\nu$ by $f$ and $g$, respectively. Then, as ways of quantifying the difference between the two measures, we shall introduce the following quantities: the relative entropy $H(\mu\,|\,\nu)$ of $\mu$ with respect to $\nu$ (equivalently, $H(f\,|\,g)$ of $f$ with respect to $g$) is defined by
$$H(\mu \,|\, \nu) = H(f \,|\, g) = \int \log\frac{f}{g}\; f\, dx.$$
Although it does not appear to have received widespread attention, it is natural to define the relative Fisher information I ( μ | ν ) of μ with respect to ν, I ( f | g ) of f with respect to g as (see, for instance, [3])
$$I(\mu \,|\, \nu) = I(f \,|\, g) = \int \Big\| \nabla \log\frac{f}{g} \Big\|^2 f\, dx = 4\int \Big\| \nabla \sqrt{\frac{f}{g}}\; \Big\|^2 g\, dx,$$
where the relative density f / g is assumed to be sufficiently smooth such that the above expressions make sense.
The relative entropy $H(f\,|\,g)$ and the relative Fisher information $I(f\,|\,g)$ take non-negative values, and they are $0$ if and only if $f(x) = g(x)$ for almost all $x \in \mathbb{R}^n$. Similar to Definition 1, for random variables $X$ and $Y$ with the distributions $\mu$ and $\nu$, the relative entropy and the relative Fisher information of $X$ with respect to $Y$ are defined as $H(X\,|\,Y) = H(\mu\,|\,\nu)$ and $I(X\,|\,Y) = I(\mu\,|\,\nu)$, respectively.
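For concreteness, here is a small sketch (our own illustration) that evaluates the relative entropy and the relative Fisher information numerically for two centered Gaussians, for which both quantities have elementary closed forms to compare against.

```python
import numpy as np

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

v1, v2 = 1.0, 4.0
f, g = gaussian(x, v1), gaussian(x, v2)             # two centered Gaussian densities

# Relative entropy H(f|g) = int f log(f/g) dx.
H_rel = np.trapz(f * np.log(f / g), dx=dx)
# Relative Fisher information I(f|g) = int |(log(f/g))'|^2 f dx.
I_rel = np.trapz(np.gradient(np.log(f / g), dx)**2 * f, dx=dx)

# Closed forms for centered Gaussians N(0, v1) and N(0, v2).
print(H_rel, 0.5 * (np.log(v2 / v1) + v1 / v2 - 1))     # ~ 0.3181
print(I_rel, (1 / v1 - 1 / v2)**2 * v1)                 # ~ 0.5625
```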
In view of the de Bruijn identity, one might expect that there is a similar connection between the relative entropy and the relative Fisher information. Indeed, Verdú in [4] investigated the derivative in $\tau$ of $H(X + \sqrt{\tau}\,Z \,|\, Y + \sqrt{\tau}\,Z)$ for two Gaussian perturbations, and derived the following identity of de Bruijn type via the minimum mean-square error (MMSE) in estimation theory.
Lemma 2.
Let $X$ and $Y$ be $\mathbb{R}^n$-valued random variables distributed according to the probability measures $\mu$ and $\nu$, respectively. Then, for the Gaussian perturbations, it holds that
$$\frac{d}{d\tau} H\big(X + \sqrt{\tau}\, Z \,\big|\, Y + \sqrt{\tau}\, Z\big) = -\frac{1}{2}\, I\big(X + \sqrt{\tau}\, Z \,\big|\, Y + \sqrt{\tau}\, Z\big) \quad \text{for } \tau > 0,$$
that is,
$$\frac{d}{d\tau} H(\mu_\tau \,|\, \nu_\tau) = -\frac{1}{2}\, I(\mu_\tau \,|\, \nu_\tau) \quad \text{for } \tau > 0,$$
where μ τ and ν τ are the corresponding measures of the Gaussian perturbations.
An alternative proof of this identity by direct calculation with integration by parts has been given in [5]. It should be noted here that, in the formula of Lemma 2, the reference measure also moves according to the same heat equation.
Other derivative formulas of the relative entropy have been investigated in [6,7,8], which are closely related to the theory of optimal transport and to functional inequalities involving information quantities. It is common in these fields that the reference measure is kept fixed at the equilibrium measure. Here, we shall recall such a derivative formula and list some useful related results.
Let $V$ be a $C^1$ function on $\mathbb{R}^n$ and consider the probability measure $\kappa$ defined by
$$d\kappa(x) = \frac{1}{Z}\, e^{-V(x)}\, dx,$$
where $Z = \int e^{-V(x)}\, dx$ is the normalization constant. Such a probability measure $\kappa$ is called the equilibrium (or Gibbs) measure for the potential function $V$.
Given a probability measure $\mu_0$, we consider the diffusion flow of probability measures $\{\mu_t\}_{t \ge 0}$ associated with the gradient $\nabla V$, that is, the density $f_t$ of the measure $\mu_t$ ($t > 0$) is defined as the solution to the partial differential equation:
$$\frac{\partial}{\partial t} f_t = \nabla \cdot \big( \nabla f_t + f_t\, \nabla V \big),$$
which is called the Fokker–Planck equation. It is easily found that the long-time asymptotically stationary measure for Fokker–Planck Equation (12) is given by the above equilibrium (Gibbs) measure.
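The convergence to the equilibrium measure can be observed with a simple explicit finite-difference scheme for the one-dimensional Fokker–Planck equation written in conservative (flux) form; the quadratic potential, grid, initial density, and step sizes below are our illustrative choices, not part of the original text.

```python
import numpy as np

# Explicit conservative finite-difference scheme for the 1-D Fokker-Planck equation
#     df/dt = d/dx ( df/dx + f V'(x) ),
# with the quadratic potential V(x) = x^2/2 (so the equilibrium is the standard Gaussian).
L, N = 6.0, 601
x = np.linspace(-L, L, N)
dx = x[1] - x[0]
dt = 0.2 * dx**2                        # small enough for stability of the explicit scheme

dV = x                                  # V'(x) for V(x) = x^2 / 2

f = np.where(np.abs(x) <= 1, 0.5, 0.0)  # initial density: uniform on (-1, 1)
f /= np.trapz(f, dx=dx)

equilibrium = np.exp(-x**2 / 2)
equilibrium /= np.trapz(equilibrium, dx=dx)

def step(f):
    # Flux at midpoints: J_{i+1/2} = (f_{i+1}-f_i)/dx + (f_i+f_{i+1})/2 * V'(x_{i+1/2})
    dV_half = 0.5 * (dV[:-1] + dV[1:])
    J = (f[1:] - f[:-1]) / dx + 0.5 * (f[:-1] + f[1:]) * dV_half
    f_new = f.copy()
    f_new[1:-1] += dt * (J[1:] - J[:-1]) / dx   # interior update; boundary densities are negligible
    return f_new

t = 0.0
while t < 3.0:
    f = step(f)
    t += dt

# H(f_t | equilibrium) should already be small at t = 3, since the flow converges
# to the equilibrium measure.
fpos = np.clip(f, 1e-300, None)
H = np.trapz(f * np.log(fpos / equilibrium), dx=dx)
print(H)
```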
Setting the equilibrium measure as the reference, we can understand the relationship between the relative entropy and the relative Fisher information via the Fokker–Planck equation as follows (see, for instance, [8]):
Proposition 1.
Let $\{\mu_t\}_{t \ge 0}$ be a diffusion flow of probability measures associated with the gradient $\nabla V$, and let $\kappa$ be the equilibrium measure for the potential function $V$. Then, the following differential formula holds:
$$\frac{d}{dt} H(\mu_t \,|\, \kappa) = -I(\mu_t \,|\, \kappa).$$
Definition 3.
A $C^2$ function $V$ on $\mathbb{R}^n$ is called strictly $K$-convex if there exists a constant $K > 0$ such that $\mathrm{Hess}(V) \ge K\, I_n$.
In the case where the potential function V has the above convexity, we can obtain the inequality between the relative entropy and the relative Fisher information with respect to the equilibrium measure for the potential V, which is known as the logarithmic Sobolev inequality (see, for instance, [7,9,10]).
Theorem 1.
Let $\kappa$ be the equilibrium measure for the potential function $V$. If the potential function $V$ is strictly $K$-convex, then it follows that, for any probability measure $\mu$ ($\ll \kappa$),
$$H(\mu \,|\, \kappa) \le \frac{1}{2K}\, I(\mu \,|\, \kappa).$$
Combining Proposition 1 and Theorem 1, we can obtain the following convergence of the diffusion flow to the equilibrium:
Proposition 2.
Let $\{\mu_t\}_{t \ge 0}$ be the diffusion flow of probability measures by the Fokker–Planck equation associated with the strictly $K$-convex potential $V$, and let $\kappa$ be the equilibrium measure for the potential $V$. Then, it follows that
$$\frac{d}{dt} H(\mu_t \,|\, \kappa) \le -2K\, H(\mu_t \,|\, \kappa),$$
which implies that $\mu_t$ converges exponentially fast, as $t \to \infty$, to the equilibrium $\kappa$ in the relative entropy. Namely,
$$H(\mu_t \,|\, \kappa) \le e^{-2Kt}\, H(\mu_0 \,|\, \kappa) \quad \text{for } t > 0.$$
The diffusion flow for the quadratic potential $V(x) = \frac{\|x\|^2}{2}$ is called the Ornstein–Uhlenbeck flow, and the corresponding Fokker–Planck equation reduces to
$$\frac{\partial}{\partial t} f_t = \nabla \cdot \big( \nabla f_t + f_t\, x \big) \quad \text{for } t > 0.$$
In this case, we can obtain the explicit solution f t , and it follows that the equilibrium measure becomes the standard Gaussian.
Furthermore, it is known that the solution to Equation (17) can be represented in terms of random variables as follows: let X be a random variable on R n having the initial density f 0 , and Z be an n-dimensional standard Gaussian random variable independent of X. Then, the density function of the independent sum
$$X_t = e^{-t} X + \sqrt{1 - e^{-2t}}\; Z \quad \text{for } t > 0$$
gives the solution $f_t$ to the partial differential Equation (17). Since the Ornstein–Uhlenbeck flow has a Gaussian equilibrium, it has been widely used as a technical tool in the proofs of Gross's logarithmic Sobolev inequality [9] and the Talagrand inequality [11].
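A quick Monte Carlo sanity check of the representation (18) (our own sketch, with a uniform initial variable as an illustrative choice): sampling $X_t = e^{-t}X + \sqrt{1-e^{-2t}}\,Z$, the empirical variance should match $e^{-2t}\operatorname{Var}(X) + (1 - e^{-2t})$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Initial variable X: uniform on (-1, 1), variance 1/3; Z: standard Gaussian.
X = rng.uniform(-1.0, 1.0, n)
Z = rng.standard_normal(n)

t = 0.7
Xt = np.exp(-t) * X + np.sqrt(1 - np.exp(-2 * t)) * Z   # evolute under the OU semigroup

# The variance of X_t should be e^{-2t} Var(X) + (1 - e^{-2t}).
print(Xt.var(), np.exp(-2 * t) * (1 / 3) + (1 - np.exp(-2 * t)))
```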
Here, we shall mention one more useful result concerned with the convergence of the relative entropy, which is called the Csiszár–Kullback–Pinsker inequality (see, for instance, [12] or [13]).
Lemma 3.
Convergence in the relative entropy is stronger than convergence in $L^1$-norm; that is, it holds for probability densities that
$$\frac{1}{2} \left( \int | f(x) - g(x) |\, dx \right)^2 \le H(f \,|\, g).$$
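The Csiszár–Kullback–Pinsker inequality is easy to check numerically; the sketch below (our own, with an illustrative choice of two Gaussian densities) compares $\frac{1}{2}\|f-g\|_1^2$ with $H(f\,|\,g)$ on a grid.

```python
import numpy as np

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]

def gaussian(x, mean, var):
    return np.exp(-(x - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

f, g = gaussian(x, 0.0, 1.0), gaussian(x, 0.5, 2.0)

l1 = np.trapz(np.abs(f - g), dx=dx)
H = np.trapz(f * np.log(f / g), dx=dx)
print(0.5 * l1**2, "<=", H)   # Csiszar-Kullback-Pinsker: (1/2)||f-g||_1^2 <= H(f|g)
```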
The problem of finding the time derivative of the relative entropy between two densities evolving under the same continuity equation has been investigated in [14,15]. In this paper, we treat the Fokker–Planck equation with a strictly convex potential as our continuity equation, because it is the first natural extension of the heat equation, and a dissipation formula similar to Lemma 2 of Verdú can be derived by the fundamental method of integration by parts, as in [5].
The time integration of our formula will give an integral representation of the relative entropy. Applying this representation to the Ornstein–Uhlenbeck flows, we can give an extension of the formula for entropy gap.

2. Dissipation of the Relative Entropy

We will calculate the time derivative of the relative entropy in the case where the objective and the reference measures both evolve by the Fokker–Planck equation with the same strictly convex potential. We shall begin by describing our setting precisely.
• Situation A:
Let $\mu_0$ and $\nu_0$ be Lebesgue absolutely continuous probability measures on $\mathbb{R}^n$ with $\mu_0 \ll \nu_0$, and let $\mu_t$ and $\nu_t$ ($t \ge 0$) be the diffusion flows by the Fokker–Planck equation with the strictly $K$-convex potential function $V$ starting from $\mu_0$ and $\nu_0$, respectively. Here, the growth rate of the potential function $V$ is assumed to be at most polynomial.
We assume that, for $t \ge 0$, the measures $\mu_t$ and $\nu_t$ have finite Fisher information, $I(\mu_t) < \infty$ and $I(\nu_t) < \infty$, and are absolutely continuous with respect to the Lebesgue measure, with densities $f_t$ and $g_t$ that are sufficiently smooth and rapidly decreasing at infinity. Furthermore, it is naturally required that $\mu_t \ll \nu_t$.
Here, we shall impose the following assumption on the relative densities, which does not cause any loss of generality and is made for simplicity of the proof.
• Assumption on the relative densities D:
Let $d\kappa(x) = e^{-V(x)}\, dx$ be the equilibrium measure of the potential function $V$, where the potential function $V$ is normalized (shifted) so that $Z = 1$. The Fokker–Planck equation is not affected by this normalization (shift) because it depends only on the gradient $\nabla V$.
We may assume that the relative densities $f_t(x)\, e^{V(x)}$ and $g_t(x)\, e^{V(x)}$ are bounded away from zero and infinity for sufficiently large $t$. Namely, there exist uniform constants $0 < m_1 \le M_1 < \infty$ and $0 < m_2 \le M_2 < \infty$ such that, for sufficiently large $t$,
$$m_1 \le f_t(x)\, e^{V(x)} \le M_1 \quad \text{for } x \in \mathbb{R}^n$$
and
$$m_2 \le g_t(x)\, e^{V(x)} \le M_2 \quad \text{for } x \in \mathbb{R}^n.$$
Hence, the relative density $\frac{f_t}{g_t}$ is also bounded away from zero and infinity for sufficiently large $t$, that is, there exist uniform constants $0 < m_0 \le M_0 < \infty$ such that
$$m_0 \le \frac{f_t(x)}{g_t(x)} \le M_0 \quad \text{for } x \in \mathbb{R}^n,$$
for sufficiently large t.
Remark 1.
The above technical assumptions on the relative densities are justified by the non-linear approximation argument given by Otto and Villani in [8] and by the following fact: in our situation, the density $f_t$ of the diffusion flow of probability measures by the Fokker–Planck equation converges to the equilibrium $e^{-V}$ in $L^1$-norm as $t \to \infty$ by combining Proposition 2 with Lemma 3, and so does $g_t$.
Proposition 3.
Let $\mu_t$ and $\nu_t$ ($t \ge 0$) be the flows of the probability measures on $\mathbb{R}^n$ by the Fokker–Planck equation as in Situation A with the assumptions on the relative densities D. Then, it holds that, for $t > 0$,
$$\frac{d}{dt} H(\mu_t \,|\, \nu_t) = -I(\mu_t \,|\, \nu_t).$$
Proof. 
We expand the derivative of the relative entropy as
$$\frac{d}{dt} H(f_t \,|\, g_t) = \frac{d}{dt} \int (\log f_t)\, f_t\, dx - \frac{d}{dt} \int (\log g_t)\, f_t\, dx.$$
We know that $f_t$ and $g_t$ converge to the equilibrium $e^{-V}$ in $L^1$, that the time derivatives $\partial_t f_t$ and $\partial_t g_t$ converge to $0$ as $t \to \infty$, and that the densities $f_t$, $g_t$ and $|\partial_t f_t|$, $|\partial_t g_t|$ are uniformly bounded in $t$. Furthermore, by our assumptions on the relative densities, $\frac{f_t}{g_t}$ is bounded away from zero and infinity. Hence, we are allowed to exchange integration and $t$-differentiation, which is justified by a routine argument with the bounded convergence theorem (see, for instance, [2] and also [8]).
Then, the first term on the right-hand side of (24) is calculated with the Fokker–Planck equation as follows:
$$\begin{aligned}
\frac{d}{dt} \int (\log f_t)\, f_t\, dx
&= \int \frac{\partial_t f_t}{f_t}\, f_t\, dx + \int (\log f_t)\, (\partial_t f_t)\, dx \\
&= \int \partial_t f_t\, dx + \int (\log f_t) \big( \Delta f_t + \nabla \cdot ( f_t\, \nabla V ) \big)\, dx \\
&= \underbrace{\int \partial_t f_t\, dx}_{(\mathrm{I})}
 + \underbrace{\int (\log f_t)\, \Delta f_t\, dx}_{(\mathrm{II})}
 + \underbrace{\int (\log f_t)\, \nabla \cdot ( f_t\, \nabla V )\, dx}_{(\mathrm{III})}.
\end{aligned}$$
The integral (I) in (25) is clearly 0. By applying integration by parts, the integral (II) can be written as
$$(\mathrm{II}) = -\int \nabla \log f_t \cdot \nabla f_t\, dx = -\int \Big\| \frac{\nabla f_t}{f_t} \Big\|^2 f_t\, dx.$$
Here, it should be noted that $(\log f_t)\, \nabla f_t$ will vanish at infinity by the following observation: if we factorize it as
$$(\log f_t)\, \nabla f_t = 2 \left( \sqrt{f_t}\, \log \sqrt{f_t} \right) \frac{\nabla f_t}{\sqrt{f_t}},$$
then, as $\mu_t$ has finite Fisher information $I(\mu_t) < \infty$, $\frac{\nabla f_t}{\sqrt{f_t}}$ has finite norm in $L^2(\mathbb{R}^n, dx)$ and must be bounded at infinity. Furthermore, $\sqrt{f_t}\, \log \sqrt{f_t}$ will vanish at infinity by the limit formula $\lim_{\xi \to +0} \xi \log \xi = 0$.
The integral (III) in (25) becomes
$$(\mathrm{III}) = -\int \nabla f_t \cdot \nabla V\, dx,$$
by the following observations: since $f_t$ is rapidly decreasing at infinity and the growth rate of the potential function $V$ is at most polynomial by our assumption, we have $\lim_{|x| \to \infty} f_t\, \nabla V = 0$. The limit $\lim_{|x| \to \infty} \sqrt{f_t}\, \log \sqrt{f_t} = 0$ is obtained as above. Thus,
$$(\log f_t)\, f_t\, \nabla V = 2 \left( \sqrt{f_t}\, \log \sqrt{f_t} \right) \left( \sqrt{f_t}\, \nabla V \right)$$
will vanish at infinity.
Substituting (26) and (28) into (25), we can obtain
$$\frac{d}{dt} \int (\log f_t)\, f_t\, dx = -\int \Big\| \frac{\nabla f_t}{f_t} \Big\|^2 f_t\, dx - \int \nabla f_t \cdot \nabla V\, dx.$$
Next, we shall see the second term on the right-hand side of (24), which can be reformulated by the Fokker–Planck equation as follows:
$$\begin{aligned}
\frac{d}{dt} \int (\log g_t)\, f_t\, dx
&= \int \frac{\partial_t g_t}{g_t}\, f_t\, dx + \int (\log g_t)\, (\partial_t f_t)\, dx \\
&= \int \frac{\Delta g_t + \nabla \cdot ( g_t\, \nabla V )}{g_t}\, f_t\, dx + \int (\log g_t) \big( \Delta f_t + \nabla \cdot ( f_t\, \nabla V ) \big)\, dx \\
&= \underbrace{\int \Delta g_t\, \frac{f_t}{g_t}\, dx}_{(\mathrm{IV})}
 + \underbrace{\int \nabla \cdot ( g_t\, \nabla V )\, \frac{f_t}{g_t}\, dx}_{(\mathrm{V})}
 + \underbrace{\int (\log g_t)\, \Delta f_t\, dx}_{(\mathrm{VI})}
 + \underbrace{\int (\log g_t)\, \nabla \cdot ( f_t\, \nabla V )\, dx}_{(\mathrm{VII})}.
\end{aligned}$$
The integral (IV) in (31) can be reformulated by applying integration by parts as follows:
$$\begin{aligned}
(\mathrm{IV}) &= -\int \nabla g_t \cdot \nabla \frac{f_t}{g_t}\, dx
= -\int \nabla g_t \cdot \frac{g_t\, (\nabla f_t) - f_t\, (\nabla g_t)}{g_t^2}\, dx \\
&= -\int \frac{\nabla g_t}{g_t} \cdot \frac{\nabla f_t}{f_t}\, f_t\, dx + \int \Big\| \frac{\nabla g_t}{g_t} \Big\|^2 f_t\, dx,
\end{aligned}$$
where we can see that $\nabla g_t\, \frac{f_t}{g_t}$ vanishes at infinity by the following observation: factorize it as
$$\nabla g_t\, \frac{f_t}{g_t} = \frac{\nabla g_t}{\sqrt{g_t}}\; \sqrt{\frac{f_t}{g_t}}\; \sqrt{f_t},$$
and then the boundedness of $\frac{f_t}{g_t}$ follows from our assumptions on the relative densities, that of $\frac{\nabla g_t}{\sqrt{g_t}}$ comes from the finiteness of the Fisher information $I(\nu_t) < \infty$, and $\sqrt{f_t}$ vanishes at infinity since $f_t$ is rapidly decreasing.
Applying integration by parts again, the integral (V) in (31) becomes
$$\begin{aligned}
(\mathrm{V}) &= -\int g_t\, \nabla V \cdot \nabla \frac{f_t}{g_t}\, dx
= -\int g_t\, \nabla V \cdot \frac{g_t\, (\nabla f_t) - f_t\, (\nabla g_t)}{g_t^2}\, dx \\
&= -\int \nabla V \cdot \nabla f_t\, dx + \int \nabla V \cdot \frac{\nabla g_t}{g_t}\, f_t\, dx.
\end{aligned}$$
Because the growth rate of the function $V$ is at most polynomial and $f_t$ is rapidly decreasing at infinity, the surface term $g_t\, \nabla V\, \frac{f_t}{g_t} = f_t\, \nabla V$ vanishes at infinity.
The integral (VI) in (31) can be reformulated as
$$(\mathrm{VI}) = -\int \nabla \log g_t \cdot \nabla f_t\, dx = -\int \frac{\nabla g_t}{g_t} \cdot \frac{\nabla f_t}{f_t}\, f_t\, dx,$$
where we can find that $(\log g_t)\, \nabla f_t$ vanishes at infinity by factorizing
$$(\log g_t)\, \nabla f_t = 2 \left( \sqrt{g_t}\, \log \sqrt{g_t} \right) \frac{\nabla f_t}{\sqrt{f_t}}\; \sqrt{\frac{f_t}{g_t}},$$
with the assumption $I(\mu_t) < \infty$ and the boundedness of $\frac{f_t}{g_t}$.
The last integral (VII) in (31) becomes
$$(\mathrm{VII}) = -\int \nabla \log g_t \cdot ( f_t\, \nabla V )\, dx = -\int \frac{\nabla g_t}{g_t} \cdot \nabla V\, f_t\, dx,$$
where $(\log g_t)\, f_t\, \nabla V$ vanishes at infinity with the following factorization, for the same reasons as above:
$$(\log g_t)\, f_t\, \nabla V = 2 \left( \sqrt{g_t}\, \log \sqrt{g_t} \right) \sqrt{\frac{f_t}{g_t}}\; \left( \sqrt{f_t}\, \nabla V \right).$$
In reformulating the integrals (VI) and (VII), we have, of course, used integration by parts.
Substituting the Equations (32) to (37) into (31), we obtain
$$\frac{d}{dt} \int (\log g_t)\, f_t\, dx = \int \Big\| \frac{\nabla g_t}{g_t} \Big\|^2 f_t\, dx - 2 \int \frac{\nabla f_t}{f_t} \cdot \frac{\nabla g_t}{g_t}\, f_t\, dx - \int \nabla V \cdot \nabla f_t\, dx.$$
Finally, combining (30) and (39), we obtain that
$$\begin{aligned}
\frac{d}{dt} H(f_t \,|\, g_t)
&= -\int \Big\| \frac{\nabla f_t}{f_t} \Big\|^2 f_t\, dx + 2 \int \frac{\nabla f_t}{f_t} \cdot \frac{\nabla g_t}{g_t}\, f_t\, dx - \int \Big\| \frac{\nabla g_t}{g_t} \Big\|^2 f_t\, dx \\
&= -\int \big\| \nabla \log f_t - \nabla \log g_t \big\|^2 f_t\, dx = -I(f_t \,|\, g_t). \qquad \square
\end{aligned}$$
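The dissipation formula of Proposition 3 can be checked numerically in the Ornstein–Uhlenbeck case, where both flows have explicit densities. The sketch below (our own, with illustrative initial measures: the uniform distribution on $(-1, 1)$ and a centered Gaussian of variance 3) compares a finite difference of $H(f_t\,|\,g_t)$ in $t$ with $-I(f_t\,|\,g_t)$.

```python
import numpy as np

x = np.linspace(-12, 12, 6001)
dx = x[1] - x[0]

def gaussian(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def f_t(t):
    """Density of the OU evolute of the uniform distribution on (-1, 1)."""
    s = np.exp(-t)                                   # scale of the initial variable
    u = np.where(np.abs(x) <= s, 1.0 / (2 * s), 0.0) # density of e^{-t} X
    noise = gaussian(x, 1 - np.exp(-2 * t))          # density of sqrt(1-e^{-2t}) Z
    return np.convolve(u, noise, mode="same") * dx

def g_t(t):
    """OU evolute of a centered Gaussian with variance 3 (Gaussians stay Gaussian)."""
    return gaussian(x, 3 * np.exp(-2 * t) + (1 - np.exp(-2 * t)))

def rel_entropy(f, g):
    return np.trapz(f * np.log(f / g), dx=dx)

def rel_fisher(f, g):
    score_diff = np.gradient(np.log(f / g), dx)
    return np.trapz(score_diff**2 * f, dx=dx)

t, h = 1.0, 1e-3
lhs = (rel_entropy(f_t(t + h), g_t(t + h)) - rel_entropy(f_t(t - h), g_t(t - h))) / (2 * h)
rhs = -rel_fisher(f_t(t), g_t(t))
print(lhs, rhs)   # the finite difference should be close to -I(f_t | g_t)
```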
Remark 2.
The assumption of dropping surface terms in the integrations by parts, that is, of vanishing at infinity in the proof of Proposition 3, is rather common in various fields of physics, and has also been repeatedly employed in a series of works by Plastino et al., for instance, in [16,17].
Next, we will see the convergence of the relative entropy for the pair of time evolutes by the same Fokker–Planck equations.
Proposition 4.
Under Situation A with Assumption D, the relative entropy H ( f t | g t ) converges exponentially fast to 0 as t .
Proof. 
We first expand the relative entropy H ( f t | g t ) as
$$H(f_t \,|\, g_t) = \int \big( \log f_t(x) - \log g_t(x) \big)\, f_t(x)\, dx = \int \Big( \big( \log f_t(x) + V(x) \big) - \big( \log g_t(x) + V(x) \big) \Big)\, f_t(x)\, dx,$$
where V ( x ) is the potential function of the Fokker–Planck equation. Then, we obtain
$$H(f_t \,|\, g_t) \le \int \big( \log f_t(x) + V(x) \big)\, f_t(x)\, dx + \int \big| \log g_t(x) + V(x) \big|\, f_t(x)\, dx.$$
Since the first term on the right-hand side of (42) is the relative entropy $H(f_t \,|\, e^{-V})$, we concentrate our attention on the second term.
We define the set $P \subset \mathbb{R}^n$ by $P = \{ x \in \mathbb{R}^n : \log g_t(x) + V(x) \ge 0 \}$, and then we have
$$\big| \log g_t(x) + V(x) \big| =
\begin{cases}
\log \big( g_t(x)\, e^{V(x)} \big) \le g_t(x)\, e^{V(x)} - 1 & \text{for } x \in P, \\[4pt]
\log \dfrac{e^{-V(x)}}{g_t(x)} \le \dfrac{e^{-V(x)}}{g_t(x)} - 1 & \text{for } x \in P^c = \mathbb{R}^n \setminus P.
\end{cases}$$
Thus, for sufficiently large t, it can be evaluated as follows:
$$\begin{aligned}
\int \big| \log g_t(x) + V(x) \big|\, f_t(x)\, dx
&\le \int_P \big( g_t(x)\, e^{V(x)} - 1 \big)\, f_t(x)\, dx + \int_{P^c} \Big( \frac{e^{-V(x)}}{g_t(x)} - 1 \Big)\, f_t(x)\, dx \\
&= \int_P \big| g_t(x) - e^{-V(x)} \big|\, f_t(x)\, e^{V(x)}\, dx + \int_{P^c} \big| g_t(x) - e^{-V(x)} \big|\, \frac{f_t(x)}{g_t(x)}\, dx \\
&\le M_1 \int_P \big| g_t(x) - e^{-V(x)} \big|\, dx + M_0 \int_{P^c} \big| g_t(x) - e^{-V(x)} \big|\, dx,
\end{aligned}$$
where the last inequality holds by virtue of the assumptions on the relative densities. Consequently, we have the estimate
$$\int \big| \log g_t(x) + V(x) \big|\, f_t(x)\, dx \le \overline{M} \int \big| g_t(x) - e^{-V(x)} \big|\, dx,$$
with the positive constant $\overline{M} = \max\{ M_0, M_1 \}$.
As we have mentioned in Lemma 3, the relative entropy controls the $L^1$-norm, so we have
$$\int \big| g_t(x) - e^{-V(x)} \big|\, dx \le \sqrt{ 2\, H\big( g_t \,\big|\, e^{-V} \big) }.$$
Thus, we obtain that, for sufficiently large t,
$$H(f_t \,|\, g_t) \le H\big( f_t \,\big|\, e^{-V} \big) + \overline{M}\, \sqrt{ 2\, H\big( g_t \,\big|\, e^{-V} \big) }.$$
Taking the limit $t \to \infty$, it follows that $H(f_t \,|\, g_t) \to 0$ exponentially fast because $H(f_t \,|\, e^{-V})$ and $H(g_t \,|\, e^{-V})$ converge to 0 exponentially fast with rate $2K$. ☐
By the dissipation formula in Proposition 3, together with the above convergence, we can obtain the following integral representation of relative entropy.
Theorem 2.
Let $f_t$ and $g_t$ ($t \ge 0$) be the flows of probability densities on $\mathbb{R}^n$ by the Fokker–Planck equation under Situation A with the assumptions on the relative densities D. Then, we have the integral representation of the relative entropy
$$H( f_0 \,|\, g_0 ) = \int_0^\infty I( f_t \,|\, g_t )\, dt.$$
If we particularly choose the equilibrium $e^{-V}$ as the initial reference measure $g_0$, then it is stationary, that is, $g_t = e^{-V}$ for all $t \ge 0$. Hence, as a direct consequence of the above theorem, we have the following integral formula:
Corollary 1.
$$H\big( f_0 \,\big|\, e^{-V} \big) = \int_0^\infty I\big( f_t \,\big|\, e^{-V} \big)\, dt.$$
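As a numerical illustration of Corollary 1 (our own sketch, with a Gaussian-mixture initial density chosen so that the Ornstein–Uhlenbeck evolute stays explicit), the relative entropy at time 0 should match the time integral of the relative Fisher information:

```python
import numpy as np

x = np.linspace(-15, 15, 6001)
dx = x[1] - x[0]

def gaussian(x, mean, var):
    return np.exp(-(x - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

kappa = gaussian(x, 0.0, 1.0)          # standard Gaussian equilibrium e^{-V}, V = x^2/2

def f_t(t):
    """OU evolute of the mixture (1/2) N(-2, 1/2) + (1/2) N(2, 1/2):
    each component stays Gaussian with mean m e^{-t} and variance
    v e^{-2t} + (1 - e^{-2t})."""
    s = np.exp(-t)
    var_t = 0.5 * np.exp(-2 * t) + (1 - np.exp(-2 * t))
    return 0.5 * gaussian(x, -2 * s, var_t) + 0.5 * gaussian(x, 2 * s, var_t)

def rel_entropy(f, g):
    return np.trapz(f * np.log(f / g), dx=dx)

def rel_fisher(f, g):
    score_diff = np.gradient(np.log(f / g), dx)
    return np.trapz(score_diff**2 * f, dx=dx)

# Right-hand side of the integral representation, truncated at a large time.
ts = np.linspace(0.0, 10.0, 2001)
integrand = [rel_fisher(f_t(t), kappa) for t in ts]
rhs = np.trapz(integrand, ts)

print(rel_entropy(f_t(0.0), kappa), rhs)   # the two values should be close
```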

3. An Application to the Entropy Gap

In this section, we shall apply the formula of the time integration in Theorem 2 to the Ornstein–Uhlenbeck flows, which gives an extension of the formula of the entropy gap. For simplicity, we will consider the one-dimensional case in this section.
Among random variables with unit variance, the Gaussian has the largest entropy. Let $X$ be a standardized (mean 0 and variance 1) random variable, and let $Z$ be a standard Gaussian random variable. Then, the quantity $H(Z) - H(X)$ is called the entropy gap or the non-Gaussianity, which coincides, of course, with the relative entropy $H(X \,|\, Z)$. It is known (see, for instance, [18]) that this entropy gap can be written as the integral of the Fisher information. Namely,
$$H( X \,|\, Z ) = \int_0^\infty \big( I( X_t ) - 1 \big)\, dt,$$
where $X_t$ is the time evolute at $t$ of the random variable $X$ by the Ornstein–Uhlenbeck semigroup in (17). It is easy to see that our formula (48) of Theorem 2 covers (50) as a special case.
In the formula (48) of Theorem 2, even in the case of the quadratic potential $V(x) = \frac{x^2}{2}$, we can choose the initial reference measure $\nu_0$ more freely than just the standard Gaussian, as we will illustrate below. Let $X$ be a centered random variable of variance $\sigma^2$ (not necessarily unit) and $G$ be a centered Gaussian of the same variance $\sigma^2$. Then, applying the integral formula with the potential function $V(x) = \frac{x^2}{2}$, the relative entropy $H(X \,|\, G)$, which is equal to the entropy gap $H(G) - H(X)$, can be written as the integral
$$H( X \,|\, G ) = \int_0^\infty I( X_t \,|\, G_t )\, dt,$$
where X t and G t are the time evolutes by the Ornstein–Uhlenbeck semigroup of X and G, respectively.
By formula (18), X t and G t can be given as
$$X_t = e^{-t} X + \sqrt{1 - e^{-2t}}\; Z, \qquad G_t = e^{-t} G + \sqrt{1 - e^{-2t}}\; Z,$$
where Z is a standard Gaussian random variable independent of X and G.
Since the time evolute $G_t$ is a Gaussian random variable of variance $e^{-2t} \sigma^2 + (1 - e^{-2t})$, its score function is given by
$$\rho_{G_t}(x) = -\frac{x}{\operatorname{Var}(G_t)} = -\frac{x}{e^{-2t} \sigma^2 + (1 - e^{-2t})},$$
and it is easy to see that the time evolute $X_t$ has the same variance as $G_t$, that is, $\operatorname{Var}(X_t) = \operatorname{Var}(G_t)$.
We denote by μ t the probability distribution of X t , and let ρ X t and ρ G t be the score functions of the random variables X t and G t , respectively. Then, by direct calculation, we obtain that
$$\int x\, \rho_{X_t}(x)\, d\mu_t(x) = -1,$$
which corresponds to the special case of Stein's identity (3) with $\xi(x) = x$.
With the above observations, we can reformulate the relative Fisher information I X t | G t as follows:
$$\begin{aligned}
I( X_t \,|\, G_t )
&= \int \big( \rho_{X_t}(x) - \rho_{G_t}(x) \big)^2\, d\mu_t(x)
= \int \Big( \rho_{X_t}(x)^2 - 2\, \rho_{X_t}(x)\, \rho_{G_t}(x) + \rho_{G_t}(x)^2 \Big)\, d\mu_t(x) \\
&= \int \rho_{X_t}(x)^2\, d\mu_t(x) + \frac{2}{\operatorname{Var}(G_t)} \int x\, \rho_{X_t}(x)\, d\mu_t(x) + \frac{1}{\operatorname{Var}(G_t)^2} \int x^2\, d\mu_t(x) \\
&= I( X_t ) - \frac{2}{\operatorname{Var}(G_t)} + \frac{\operatorname{Var}(X_t)}{\operatorname{Var}(G_t)^2}
= I( X_t ) - \frac{1}{\operatorname{Var}(X_t)},
\end{aligned}$$
where we have used formula (54) and the fact that Var ( G t ) = Var ( X t ) in the last equality. Now, the following formula can be obtained as a direct consequence of Theorem 2.
Proposition 5.
Let $X$ be a centered random variable of finite variance $\sigma^2$, and let $G$ be a centered Gaussian variable of the same variance $\sigma^2$. For $t \ge 0$, we denote by $X_t$ the evolute of $X_0 = X$ by the Ornstein–Uhlenbeck semigroup, that is,
$$X_t = e^{-t} X + \sqrt{1 - e^{-2t}}\; Z,$$
where Z stands for the standard Gaussian random variable independent of X. Then, it follows that
$$H( X \,|\, G ) = \int_0^\infty \Big( I( X_t ) - \frac{1}{\operatorname{Var}(X_t)} \Big)\, dt = \int_0^\infty \Big( I( X_t ) - \frac{1}{(\sigma^2 - 1)\, e^{-2t} + 1} \Big)\, dt.$$
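Proposition 5 can also be checked numerically; in the sketch below (our own, with a Gaussian-mixture choice of $X$ so that the density of $X_t$ is explicit), the entropy gap computed directly is compared with the truncated time integral of $I(X_t) - 1/\operatorname{Var}(X_t)$.

```python
import numpy as np

x = np.linspace(-15, 15, 6001)
dx = x[1] - x[0]

def gaussian(x, mean, var):
    return np.exp(-(x - mean)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

sigma2 = 1.25                                   # variance of the mixture below

def density_Xt(t):
    """OU evolute of X ~ (1/2) N(-1, 1/4) + (1/2) N(1, 1/4)."""
    var_t = 0.25 * np.exp(-2 * t) + (1 - np.exp(-2 * t))
    m = np.exp(-t)
    return 0.5 * gaussian(x, -m, var_t) + 0.5 * gaussian(x, m, var_t)

def fisher(f):
    df = np.gradient(f, dx)
    return np.trapz(df**2 / f, dx=dx)

# Left-hand side: H(X|G) with G a centered Gaussian of the same variance.
f0, g0 = density_Xt(0.0), gaussian(x, 0.0, sigma2)
lhs = np.trapz(f0 * np.log(f0 / g0), dx=dx)

# Right-hand side: int_0^infty [ I(X_t) - 1/Var(X_t) ] dt, truncated at t = 10.
ts = np.linspace(0.0, 10.0, 2001)
integrand = [fisher(density_Xt(t)) - 1.0 / ((sigma2 - 1) * np.exp(-2 * t) + 1) for t in ts]
rhs = np.trapz(integrand, ts)

print(lhs, rhs)   # both approximate the entropy gap H(G) - H(X)
```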
Remark 3.
The Ornstein–Uhlenbeck evolution can be regarded as a time-dependent dilation of the Gaussian perturbation. Thus, we can rewrite formula (57) in terms of the Gaussian perturbation. Since the Fisher information behaves under dilation of a random variable as $I(\alpha Y) = \frac{1}{\alpha^2}\, I(Y)$, we obtain
$$I( X_t ) = I\Big( e^{-t} X + \sqrt{1 - e^{-2t}}\; Z \Big) = e^{2t}\, I\bigg( X + \sqrt{ \frac{1 - e^{-2t}}{e^{-2t}} }\; Z \bigg).$$
Changing the variable to $\tau = \frac{1 - e^{-2t}}{e^{-2t}} = e^{2t} - 1$, the integral on the right-hand side of (57) becomes
$$\int_0^\infty (1 + \tau) \Big( I\big( X + \sqrt{\tau}\, Z \big) - \frac{1 + \tau}{\sigma^2 + \tau} \Big)\, \frac{d\tau}{2 (1 + \tau)} = \frac{1}{2} \int_0^\infty \Big( I\big( X + \sqrt{\tau}\, Z \big) - \frac{1}{\sigma^2 + \tau} \Big)\, d\tau.$$
Hence, we obtain
$$H(G) - H(X) = \frac{1}{2} \int_0^\infty \Big( I\big( X + \sqrt{\tau}\, Z \big) - \frac{1}{\sigma^2 + \tau} \Big)\, d\tau,$$
that is,
$$H(X) = \frac{1}{2} \log ( 2 \pi e\, \sigma^2 ) - \frac{1}{2} \int_0^\infty \Big( I\big( X + \sqrt{\tau}\, Z \big) - \frac{1}{\sigma^2 + \tau} \Big)\, d\tau,$$
which is the known integral representation for the entropy of a random variable $X$ in terms of the Fisher information of Gaussian perturbations, derived by Barron in Section 2 of [2].

4. Some Numerical Examples

In this section, we give numerical examples for the case of the Ornstein–Uhlenbeck flows, which are generated by the potential $V = \frac{x^2}{2}$ and hence have the standard Gaussian equilibrium. As we mentioned in (18), the densities of the flows at time $t$ can be calculated analytically by convolving the scaled initial measures with scaled Gaussians.
Example 1.
In the first numerical example, we take the uniform distribution on the interval $(-1, 1)$ as the initial objective measure. Namely, the density $f_0(x) = u(x)$ is given by
$$u(x) =
\begin{cases}
\dfrac{1}{2} & (-1 \le x \le 1), \\[4pt]
0 & \text{otherwise},
\end{cases}$$
which has mean 0 and variance $\frac{1}{3}$. We set the centered Gaussian of variance $\frac{1}{3}$ as the initial reference measure, namely, $g_0(x) = \varphi(x, \frac{1}{3})$, where the function $\varphi$ is defined by
$$\varphi(x, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\Big( -\frac{x^2}{2 \sigma^2} \Big).$$
Here, we can calculate the densities $f_t$ and $g_t$ of the Ornstein–Uhlenbeck flows at time $t$ analytically as follows: we rescale the time parameter $t$ as $\tau = e^{-t}$ and put
$$u_\tau(x) = \frac{1}{\tau}\, u\Big( \frac{x}{\tau} \Big), \qquad
\varphi_\tau(x) = \frac{1}{\sqrt{1 - \tau^2}}\, \varphi\Big( \frac{x}{\sqrt{1 - \tau^2}},\; 1 \Big).$$
Then, f t and g t are given by
$$f_t(x) = u_\tau * \varphi_\tau\, (x) = \int u_\tau(y)\, \varphi_\tau(x - y)\, dy = \frac{1}{2 \tau \sqrt{1 - \tau^2}} \int_{-\tau}^{\tau} \varphi\Big( \frac{x - y}{\sqrt{1 - \tau^2}},\; 1 \Big)\, dy,$$
and
$$g_t(x) = \varphi\Big( x,\; 1 - \frac{2}{3}\, \tau^2 \Big),$$
respectively. Now, we illustrate the convergence of $H(f_t \,|\, g_t)$ and the upper bound on the right-hand side of (47) numerically by graphs. We note that the constant $\overline{M}$ can be taken to be 1 because, under our assumptions (20), (21), and (22), the relative densities converge uniformly to 1. The convergence of $H(f_t \,|\, g_t)$ is illustrated in Figure 1.
In Figure 2, the dashed curve indicates the convergence of H f t | g t in Figure 1.
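For readers who wish to reproduce these curves, the following sketch computes $H(f_t \,|\, g_t)$ for Example 1 by the convolution formula above; the grid and the sampled time points are our illustrative choices.

```python
import numpy as np

# OU evolutes of the uniform density on (-1, 1) and the centered Gaussian of variance 1/3.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]

def phi(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def f_t(t):
    tau = np.exp(-t)                                   # scale of the initial variable
    u_tau = np.where(np.abs(x) <= tau, 1 / (2 * tau), 0.0)
    return np.convolve(u_tau, phi(x, 1 - tau**2), mode="same") * dx

def g_t(t):
    tau = np.exp(-t)
    return phi(x, 1 - (2.0 / 3.0) * tau**2)

def rel_entropy(f, g):
    return np.trapz(np.where(f > 0, f * np.log(f / g), 0.0), dx=dx)

for t in [0.1, 0.5, 1.0, 2.0, 3.0]:
    print(t, rel_entropy(f_t(t), g_t(t)))    # the values decrease toward 0 as t grows
```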
Example 2.
In the second example, we take the centered Gaussian of variance 3 as the initial reference measure, that is, $g_0(x) = \varphi(x, 3)$, while $f_0(x) = u(x)$ is unchanged. In Figure 3, the convergence of $H(f_t \,|\, g_t)$ is illustrated.
In Figure 4, the dashed curve indicates the convergence of H f t | g t in Figure 3.
Example 3.
In the third numerical example, the initial objective and the initial reference measures are given as the uniform distributions on the intervals $(-1, 1)$ and $(-\frac{1}{2}, \frac{1}{2})$, respectively. Namely, we set the densities as $f_0(x) = u(x)$ and $g_0(x) = 2\, u(2x)$. We illustrate the convergence of $H(f_t \,|\, g_t)$ in Figure 5.
In Figure 6, the dashed curve indicates the convergence of H f t | g t in Figure 5.

5. Conclusions

The Fokker–Planck partial differential equation describes the flow of probability measures for a diffusion process. The diffusion by the Fokker–Planck equation with a strictly convex potential $V$ has the long-time asymptotic stationary measure $e^{-V}$. In the case where the relative entropy is taken with the stationary measure $e^{-V}$ as reference, the dissipation formula for the relative entropy of the diffusion flow by the Fokker–Planck equation with the potential $V$ is known in the literature.
In this paper, we have derived a similar dissipation formula in a more flexible situation. Namely, we have considered the situation in which the reference measure is also evolved by the Fokker–Planck equation with the same potential function $V$ as the objective measure. We have then obtained another integral representation of the relative entropy, which gives an extension of the formula for the entropy gap.

Acknowledgments

The author is grateful for comments and suggestions by anonymous reviewers. In particular, the numerical simulations are based on their valuable comments.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Contr. 1959, 2, 101–112. [Google Scholar] [CrossRef]
  2. Barron, A. Entropy and the central limit theorem. Ann. Probab. 1986, 14, 336–342. [Google Scholar] [CrossRef]
  3. Villani, C. Topics in Optimal Transportation; American Mathematical: Providence, RI, USA, 2003. [Google Scholar]
  4. Verdú, S. Mismatched estimation and relative entropy. IEEE Trans. Inf. Theory 2010, 56, 3712–3719. [Google Scholar] [CrossRef]
  5. Hirata, M.; Nemoto, A.; Yoshida, H. An integral representation of the relative entropy. Entropy 2012, 14, 1469–1477. [Google Scholar] [CrossRef]
  6. Arnold, A.; Markowich, P.; Toscani, G.; Unterreiter, A. On convex Sobolev inequalities and the rate of convergence to equilibrium for Fokker–Planck type equations. Commun. Partial Differ. Equ. 2001, 26, 43–100. [Google Scholar] [CrossRef]
  7. Bakry, D.; Émery, M. Diffusions hypercontractives. In Séminaire de Probabilités XIX; Springer: Berlin/Heidelberg, Germany, 1985; pp. 177–206. [Google Scholar]
  8. Otto, F.; Villani, C. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 2000, 173, 361–400. [Google Scholar] [CrossRef]
  9. Gross, L. Logarithmic Sobolev inequalities. Am. J. Math. 1975, 97, 1061–1083. [Google Scholar] [CrossRef]
  10. Carlen, E. Superadditivity of Fisher’s information and logarithmic Sobolev inequalities. J. Funct. Anal. 1991, 101, 194–211. [Google Scholar] [CrossRef]
  11. Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 1996, 6, 587–600. [Google Scholar] [CrossRef]
  12. Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Stud. Sci. Math. Hung. 1967, 2, 299–318. [Google Scholar]
  13. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed.; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  14. Yamano, T. De Bruijn-type identity for systems with flux. Eur. Phys. J. B 2013, 86, 363. [Google Scholar] [CrossRef]
  15. Yamano, T. Phase space gradient of dissipated work and information: A role of relative Fisher information. J. Math. Phys. 2013, 54, 113301. [Google Scholar] [CrossRef]
  16. Daffertshofer, A.; Plastino, A.R.; Plastino, A. Classical No-Cloning Theorem. Phys. Rev. Lett. 2002, 88, 210601. [Google Scholar] [CrossRef] [PubMed]
  17. Plastino, A.R.; Daffertshofer, A. Liouville Dynamics and the Conservation of Classical Information. Phys. Rev. Lett. 2004, 93, 138701. [Google Scholar] [CrossRef] [PubMed]
  18. Carlen, E.A.; Soffer, A. Entropy production by block variable summation and central limit theorems. Commun. Math. Phys. 1991, 140, 339–371. [Google Scholar] [CrossRef]
Figure 1. The value of $H(f_t \,|\, g_t)$.
Figure 2. The value of $H(f_t \,|\, e^{-V}) + \sqrt{2\, H(g_t \,|\, e^{-V})}$.
Figure 3. The value of $H(f_t \,|\, g_t)$.
Figure 4. The value of $H(f_t \,|\, e^{-V}) + \sqrt{2\, H(g_t \,|\, e^{-V})}$.
Figure 5. The value of $H(f_t \,|\, g_t)$.
Figure 6. The value of $H(f_t \,|\, e^{-V}) + \sqrt{2\, H(g_t \,|\, e^{-V})}$.
