1. Introduction
We shall begin with the definitions and fix some notations. Several results in the literature that we will use later are also gathered in this section. Probability measures on $\mathbb{R}^n$ in this paper are always assumed to be absolutely continuous with respect to the Lebesgue measure. Thus, when we say that a probability measure μ has the continuous density f, we mean that $d\mu(x) = f(x)\,dx$, and the measure μ is sometimes identified with its density f. Throughout this paper, if we simply write an integral symbol without specifying any domain, it means the integral over the whole space $\mathbb{R}^n$.
Definition 1. For a probability measure μ on $\mathbb{R}^n$ with the density f, the entropy of μ (or of f) is defined by
$$\mathrm{Ent}(\mu) = -\int f(x)\log f(x)\,dx, \qquad (1)$$
and, if the density f is smooth, then we define the Fisher information of μ (or of f) as
$$I(\mu) = \int \bigl|\nabla \log f(x)\bigr|^{2} f(x)\,dx = \int \frac{|\nabla f(x)|^{2}}{f(x)}\,dx. \qquad (2)$$
For an $\mathbb{R}^n$-valued random variable X, if X is distributed according to the probability measure μ, then we define the entropy and the Fisher information of X by $\mathrm{Ent}(X) = \mathrm{Ent}(\mu)$ and $I(X) = I(\mu)$, respectively. In the one-dimensional case, the gradient $\nabla \log f = f'/f$ in (2) is usually called the score function of X (or of μ) and is denoted by $\rho_X$ (or $\rho_\mu$). For a differentiable function ξ with bounded derivative, the score function satisfies
$$\mathbb{E}\bigl[\xi(X)\,\rho_X(X)\bigr] = -\,\mathbb{E}\bigl[\xi'(X)\bigr], \qquad (3)$$
which is known as Stein's identity.
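As a quick illustration (not part of the original text), the following Python snippet checks Stein's identity (3) by Monte Carlo for a standard Gaussian X, whose score function is $\rho_X(x) = -x$; the test function $\xi(x) = \sin x$ is an arbitrary choice.

```python
import numpy as np

# Monte Carlo check of Stein's identity (3) for X ~ N(0, 1), whose score is rho(x) = -x:
#   E[xi(X) * rho(X)] = -E[xi'(X)]   for a differentiable xi with bounded derivative.
rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)

lhs = np.mean(np.sin(X) * (-X))   # E[xi(X) rho(X)] with xi = sin
rhs = -np.mean(np.cos(X))         # -E[xi'(X)]
print(lhs, rhs)                   # the two values agree up to Monte Carlo error
```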
Let X be an $\mathbb{R}^n$-valued random variable distributed according to the probability measure μ, and let Z be an n-dimensional standard (with zero mean vector and identity covariance matrix) Gaussian random variable independent of X. Then, for $\tau > 0$, the independent sum $X + \sqrt{\tau}\,Z$ is called the Gaussian perturbation of X. We denote by $\mu_\tau$ the probability measure corresponding to the Gaussian perturbation $X + \sqrt{\tau}\,Z$, and $f_\tau$ stands for the density of $\mu_\tau$. It is fundamental that the density function $f_\tau$ satisfies the heat equation:
$$\frac{\partial f_\tau}{\partial \tau} = \frac{1}{2}\,\Delta f_\tau,$$
where $\Delta$ is the Laplacian operator.
The remarkable relationship between the entropy and the Fisher information can be established by the Gaussian perturbation as follows (see, for instance, [1] or [2]), which is known as the de Bruijn identity.
Lemma 1. Let X be an $\mathbb{R}^n$-valued random variable distributed according to the probability measure μ. Then, for the Gaussian perturbation, it holds that
$$\frac{d}{d\tau}\,\mathrm{Ent}\bigl(X + \sqrt{\tau}\,Z\bigr) = \frac{1}{2}\,I\bigl(X + \sqrt{\tau}\,Z\bigr).$$
Namely, using the density $f_\tau$ of the Gaussian perturbed measure $\mu_\tau$, we can write
$$\frac{d}{d\tau}\,\mathrm{Ent}(f_\tau) = \frac{1}{2}\,I(f_\tau).$$
Definition 2. Let μ and ν be probability measures on $\mathbb{R}^n$ with $\mu \ll \nu$ (μ is absolutely continuous with respect to ν). We denote the probability density functions of μ and ν by f and g, respectively. Then, as ways of indicating the difference between two measures, we shall introduce the following quantities: the relative entropy of μ with respect to ν (of f with respect to g) is defined by
$$D(\mu\,\|\,\nu) = \int f(x)\log\frac{f(x)}{g(x)}\,dx.$$
Although it does not appear to have received widespread attention, it is natural to define the relative Fisher information of μ with respect to ν (of f with respect to g) as (see, for instance, [3])
$$I(\mu\,\|\,\nu) = \int \Bigl|\nabla\log\frac{f(x)}{g(x)}\Bigr|^{2} f(x)\,dx,$$
where the relative density $f/g$ is assumed to be sufficiently smooth so that the above expressions make sense. The relative entropy and the relative Fisher information take non-negative values, and they are 0 if and only if $f(x) = g(x)$ for almost all x. Similarly to Definition 1, for random variables X and Y with the distributions μ and ν, the relative entropy and the relative Fisher information of X with respect to Y are defined as $D(X\,\|\,Y) = D(\mu\,\|\,\nu)$ and $I(X\,\|\,Y) = I(\mu\,\|\,\nu)$, respectively.
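The following short numerical sketch (not from the original text) illustrates the de Bruijn identity of Lemma 1 in one dimension: it compares a finite-difference derivative of the entropy of a Gaussian perturbation of a uniform random variable with half of its Fisher information, both evaluated by quadrature on a grid.

```python
import numpy as np

# Numerical illustration of the de Bruijn identity (Lemma 1) in one dimension,
# for X ~ Uniform[-1, 1] and the Gaussian perturbation X + sqrt(tau) Z.
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

def perturbed_density(tau):
    # density of X + sqrt(tau) Z as a convolution of Uniform[-1, 1] with N(0, tau)
    u = np.linspace(-1.0, 1.0, 401)
    du = u[1] - u[0]
    kernel = np.exp(-(x[None, :] - u[:, None]) ** 2 / (2.0 * tau)) / np.sqrt(2.0 * np.pi * tau)
    return 0.5 * np.sum(kernel, axis=0) * du   # 0.5 is the uniform density on [-1, 1]

def entropy(f):
    return -np.sum(np.where(f > 0, f * np.log(np.where(f > 0, f, 1.0)), 0.0)) * dx

def fisher(f):
    df = np.gradient(f, dx)
    safe = np.where(f > 1e-300, f, 1.0)
    return np.sum(np.where(f > 1e-300, df ** 2 / safe, 0.0)) * dx

tau, h = 0.5, 1e-4
lhs = (entropy(perturbed_density(tau + h)) - entropy(perturbed_density(tau - h))) / (2.0 * h)
rhs = 0.5 * fisher(perturbed_density(tau))
print(lhs, rhs)   # the two values should nearly coincide
```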
In view of the de Bruijn identity, one might expect that there is a similar connection between the relative entropy and the relative Fisher information. Indeed, Verdú in [4] investigated the derivative in τ of the relative entropy for two Gaussian perturbations, and derived the following identity of the de Bruijn type via the minimum mean-square error (MMSE) in estimation theory.
Lemma 2. Let X and Y be $\mathbb{R}^n$-valued random variables distributed according to the probability measures μ and ν, respectively. Then, for the Gaussian perturbations, it holds that
$$\frac{d}{d\tau}\,D\bigl(X + \sqrt{\tau}\,Z \,\big\|\, Y + \sqrt{\tau}\,Z\bigr) = -\frac{1}{2}\,I\bigl(X + \sqrt{\tau}\,Z \,\big\|\, Y + \sqrt{\tau}\,Z\bigr),$$
that is,
$$\frac{d}{d\tau}\,D(\mu_\tau\,\|\,\nu_\tau) = -\frac{1}{2}\,I(\mu_\tau\,\|\,\nu_\tau),$$
where $\mu_\tau$ and $\nu_\tau$ are the corresponding measures of the Gaussian perturbations. An alternative proof of this identity by direct calculation with integration by parts has been given in [5]. It should be noted here that the reference measure does move, by the same heat equation as in the formula of Lemma 1.
Other derivative formulas of the relative entropy have been investigated in [6,7,8], which are closely related to the theory of optimal transport and to functional inequalities of information. It is common in these fields that the reference measure is kept unchanged as the equilibrium measure. Here, we shall recall such a derivative formula and list some useful related results.
Let V be a smooth map on $\mathbb{R}^n$ and consider the probability measure κ given by
$$d\kappa(x) = \frac{1}{Z}\,e^{-V(x)}\,dx,$$
where $Z = \int e^{-V(x)}\,dx$ is the normalization constant. Such a probability measure κ is called the equilibrium (or Gibbs) measure for the potential function V.
Given a probability measure $\mu_0$, we consider the diffusion flow of probability measures $(\mu_t)_{t \ge 0}$ associated with the gradient $\nabla V$, that is, the density $f_t$ of the measure $\mu_t$ ($t \ge 0$) is defined as the solution to the partial differential equation:
$$\frac{\partial f_t}{\partial t} = \nabla\cdot\bigl(\nabla f_t + f_t\,\nabla V\bigr), \qquad (12)$$
which is called the Fokker–Planck equation. It is easily found that the long-time asymptotic stationary measure for the Fokker–Planck Equation (12) is given by the above equilibrium (Gibbs) measure.
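As a quick check of the last claim (this verification is not in the original text), substituting the Gibbs density $e^{-V}/Z$ into the flux term of (12) gives
$$\nabla\Bigl(\frac{e^{-V}}{Z}\Bigr) + \frac{e^{-V}}{Z}\,\nabla V = -\frac{e^{-V}}{Z}\,\nabla V + \frac{e^{-V}}{Z}\,\nabla V = 0,$$
so the right-hand side of (12) vanishes and the flow started at κ stays at κ.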
Setting the equilibrium measure as the reference, we can understand the relationship between the relative entropy and the relative Fisher information via the Fokker–Planck equation as follows (see, for instance, [8]):
Proposition 1. Let $(\mu_t)_{t \ge 0}$ be a diffusion flow of probability measures associated with the gradient $\nabla V$, and let κ be the equilibrium measure for the potential function V. Then, the following differential formula holds:
$$\frac{d}{dt}\,D(\mu_t\,\|\,\kappa) = -\,I(\mu_t\,\|\,\kappa).$$
Definition 3. A function V on $\mathbb{R}^n$ is called strictly K-convex if there exists a positive constant K such that $\mathrm{Hess}\,V(x) \ge K\,\mathrm{Id}$ for all $x \in \mathbb{R}^n$.
In the case where the potential function V has the above convexity, we can obtain the inequality between the relative entropy and the relative Fisher information with respect to the equilibrium measure for the potential V, which is known as the logarithmic Sobolev inequality (see, for instance, [7,9,10]).
Theorem 1. Let κ be the equilibrium measure for the potential function V. If the potential function V is strictly K-convex, then it follows that, for any probability measure μ,
$$D(\mu\,\|\,\kappa) \le \frac{1}{2K}\,I(\mu\,\|\,\kappa).$$
Combining Proposition 1 and Theorem 1, we can obtain the following convergence of the diffusion flow to the equilibrium:
Proposition 2. Let $(\mu_t)_{t \ge 0}$ be the diffusion flow of probability measures by the Fokker–Planck equation associated with the strictly K-convex potential V, and let κ be the equilibrium measure for the potential V. Then, it follows that
$$\frac{d}{dt}\,D(\mu_t\,\|\,\kappa) \le -2K\,D(\mu_t\,\|\,\kappa),$$
which implies that $\mu_t$ converges exponentially fast, as $t \to \infty$, to the equilibrium κ in the relative entropy. Namely,
$$D(\mu_t\,\|\,\kappa) \le e^{-2Kt}\,D(\mu_0\,\|\,\kappa).$$
The diffusion flow for the quadratic potential $V(x) = \frac{|x|^2}{2}$ is called the Ornstein–Uhlenbeck flow, and the corresponding Fokker–Planck equation reduces to
$$\frac{\partial f_t}{\partial t} = \Delta f_t + \nabla\cdot\bigl(x\,f_t\bigr). \qquad (17)$$
In this case, we can obtain the explicit solution, and it follows that the equilibrium measure becomes the standard Gaussian.
Furthermore, it is known that the solution to Equation (17) can be represented in terms of random variables as follows: let X be a random variable on $\mathbb{R}^n$ having the initial density $f_0$, and let Z be an n-dimensional standard Gaussian random variable independent of X. Then, the density function of the independent sum
$$e^{-t}X + \sqrt{1 - e^{-2t}}\,Z \qquad (18)$$
gives the solution $f_t$ to the partial differential Equation (17). Since the Ornstein–Uhlenbeck flow has the Gaussian equilibrium, it has been widely used as a technical tool for the proofs of Gross's logarithmic Sobolev inequality [9] and the Talagrand inequality [11].
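As an illustrative sketch (not from the original text), the representation (18) makes the Ornstein–Uhlenbeck flow easy to simulate: sample X from the initial law, mix it with an independent standard Gaussian, and watch the law approach the standard Gaussian equilibrium.

```python
import numpy as np

# Simulate the Ornstein-Uhlenbeck flow via the representation (18),
#   X_t = exp(-t) X + sqrt(1 - exp(-2t)) Z,
# starting from a uniform initial law with mean 0 and variance 1,
# and watch the fourth moment approach 3, its value under the standard Gaussian.
rng = np.random.default_rng(1)
n = 500_000
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)  # E[X] = 0, Var[X] = 1, E[X^4] = 1.8
Z = rng.standard_normal(n)

for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    Xt = np.exp(-t) * X + np.sqrt(1.0 - np.exp(-2.0 * t)) * Z
    print(f"t = {t:3.1f}: Var = {Xt.var():.3f}, E[X_t^4] = {np.mean(Xt ** 4):.3f}")
```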
Here, we shall mention one more useful result concerned with the convergence of the relative entropy, which is called the Csiszár–Kullback–Pinsker inequality (see, for instance, [12] or [13]).
Lemma 3. The convergence in the relative entropy is stronger than the convergence in $L^1$-norm; that is, for probability densities f and g it holds that
$$\|f - g\|_{L^1}^{2} \le 2\,D(f\,\|\,g).$$
The problem of finding the time derivative of the relative entropy between two densities under the same continuity equation has been investigated in [14,15]. In this paper, we will treat the Fokker–Planck equation with a strictly convex potential as our continuity equation, because it is the first natural extension of the heat equation and because a dissipation formula similar to that in Lemma 2 of Verdú can be derived by the fundamental method of integration by parts, as in [5].
The time integration of our formula gives an integral representation of the relative entropy. Applying this representation to the Ornstein–Uhlenbeck flows, we can give an extension of the formula for the entropy gap.
2. Dissipation of the Relative Entropy
We will calculate the time derivative of the relative entropy for the case where the objective and the reference measures are evolved by the Fokker–Planck equation with the same strictly convex potential. We shall begin by describing our setting precisely.
• Situation A:
Let $\mu_0$ and $\nu_0$ be Lebesgue absolutely continuous probability measures on $\mathbb{R}^n$ with $\mu_0 \ll \nu_0$, and let $\mu_t$ and $\nu_t$ ($t \ge 0$) be the diffusion flows by the Fokker–Planck equation with the strictly K-convex potential function V starting from $\mu_0$ and $\nu_0$, respectively. Here, the growth rate of the potential function V is assumed to be at most polynomial.
We assume that, for every $t \ge 0$, the measures $\mu_t$ and $\nu_t$ have finite Fisher information $I(\mu_t)$ and $I(\nu_t)$ and are absolutely continuous with respect to the Lebesgue measure, the densities $f_t$ and $g_t$ of which are sufficiently smooth and rapidly decreasing at infinity. Furthermore, it is naturally required that $\mu_t \ll \nu_t$.
Here, we shall impose the following assumption on the relative densities, which does not cause any loss of generality but is made for simplicity of the proof.
• Assumption on the relative densities D:
Let κ be the equilibrium measure of the potential function V, where the potential function V is normalized (shifted) so that $\int e^{-V(x)}\,dx = 1$; that is, the density of κ is $e^{-V}$. The Fokker–Planck equation is not affected by this normalization (shift) because it depends only on the gradient $\nabla V$.
We may assume that the relative densities $f_t / e^{-V}$ and $g_t / e^{-V}$ are bounded away from zero and infinity for sufficiently large t. Namely, there exist uniform positive constants such that, for sufficiently large t,
$$c_1 \le \frac{f_t(x)}{e^{-V(x)}} \le C_1 \qquad (20)$$
and
$$c_2 \le \frac{g_t(x)}{e^{-V(x)}} \le C_2. \qquad (21)$$
Hence, the relative density $f_t / g_t$ is also bounded away from zero and infinity for sufficiently large t; that is, there exist uniform positive constants c and C such that
$$c \le \frac{f_t(x)}{g_t(x)} \le C \qquad (22)$$
for sufficiently large t.
Remark 1. The above technical assumptions on the relative densities are justified by the non-linear approximation argument given by Otto and Villani in [8] and by the following fact: in our situation, the density $f_t$ of the diffusion flow of a probability measure by the Fokker–Planck equation converges to the equilibrium in $L^1$-norm as $t \to \infty$ (by combining Proposition 2 with Lemma 3), and so does $g_t$.
Proposition 3. Let $\mu_t$ and $\nu_t$ be the flows of the probability measures on $\mathbb{R}^n$ by the Fokker–Planck equation as in Situation A with the assumptions on the relative densities D. Then, it holds that, for $t > 0$,
$$\frac{d}{dt}\,D(\mu_t\,\|\,\nu_t) = -\,I(\mu_t\,\|\,\nu_t).$$
Proof. We expand the derivative of the relative entropy as
$$\frac{d}{dt}\,D(\mu_t\,\|\,\nu_t) = \frac{d}{dt}\int f_t \log f_t\,dx - \frac{d}{dt}\int f_t \log g_t\,dx. \qquad (24)$$
Since we know that $f_t$ and $g_t$ converge to the equilibrium density $e^{-V}$ in $L^1$, that the time derivatives $\partial f_t/\partial t$ and $\partial g_t/\partial t$ converge to 0 as $t \to \infty$, and that the densities $f_t$, $g_t$ and their time derivatives are uniformly bounded in t, and since, by our assumptions on the relative density, $f_t/g_t$ is bounded away from zero and infinity, we are allowed to exchange integration and t-differentiation; this is justified by a routine argument with the bounded convergence theorem (see, for instance, [2] and also [8]).
Then, the first term on the right-hand side of (24) is calculated with the Fokker–Planck equation as in (25). The integral (I) in (25) is clearly 0. By applying integration by parts, the integral (II) can be written as in (26). Here, it should be noted that the corresponding boundary term will vanish at infinity by the following observation: if we factorize it as the product of $\sqrt{f_t}\,\log f_t$ and $\nabla f_t / \sqrt{f_t}$, then, as $\mu_t$ has the finite Fisher information $I(\mu_t)$, the factor $\nabla f_t / \sqrt{f_t}$ has finite $L^2$-norm and must be bounded at infinity. Furthermore, $\sqrt{f_t}\,\log f_t$ will vanish at infinity by the limit formula $\lim_{u \downarrow 0} \sqrt{u}\,\log u = 0$.
The integral (III) in (25) becomes (28) by the following observations: since $f_t$ is rapidly decreasing at infinity and the growth rate of the potential function V is at most polynomial by our assumption, the factor involving $\nabla V$ remains under control at infinity, and the remaining limit is the same as above. Thus, the corresponding boundary term will vanish at infinity.
Substituting (26) and (28) into (25), we can obtain (30).
Next, we shall examine the second term on the right-hand side of (24), which can be reformulated by the Fokker–Planck equation as in (31). The integral (IV) in (31) can be reformulated by applying integration by parts, where the corresponding boundary term vanishes at infinity by the following observation: factorizing it into two factors, the boundedness of one factor follows from our assumptions on the relative density, and that of the other comes from the finiteness of the Fisher information $I(\mu_t)$.
Applying integration by parts again, the integral (V) in (31) is reformulated in the same manner. Because the growth rate of the function V is at most polynomial and $f_t$ is rapidly decreasing at infinity, the boundary term will vanish at infinity.
The integral (VI) in (31) can be reformulated similarly, where we can find that the boundary term vanishes at infinity by a factorization using the assumption on the relative density and the boundedness coming from the finiteness of the Fisher information.
The last integral (VII) in (31) is treated in the same way, where the boundary term will vanish, with an analogous factorization, for the same reasons as above.
In the reformulation of the integrals (VI) and (VII), we have, of course, used integration by parts.
Substituting the Equations (32) to (37) into (31), we can obtain (39). Finally, combining (30) and (39), we obtain that
$$\frac{d}{dt}\,D(\mu_t\,\|\,\nu_t) = -\,I(\mu_t\,\|\,\nu_t).$$
☐
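The dissipation formula of Proposition 3 can be sanity-checked numerically in a case where everything is explicit (this check is not part of the original text): for the Ornstein–Uhlenbeck potential $V(x) = x^2/2$ and centered Gaussian initial measures, the flows stay Gaussian, and both sides of the formula are elementary functions of the variances.

```python
import numpy as np

# Check d/dt D(mu_t || nu_t) = -I(mu_t || nu_t) for the Ornstein-Uhlenbeck flow
# (potential V(x) = x^2/2) started from two centered Gaussians N(0, v0) and N(0, w0).
# Along the flow, the variances satisfy v(t) = 1 + (v0 - 1) exp(-2t), same for w(t);
# for centered Gaussians:
#   D(N(0,v) || N(0,w)) = (v/w - 1 - log(v/w)) / 2,
#   I(N(0,v) || N(0,w)) = (v - w)^2 / (v * w^2).
v0, w0 = 4.0, 2.0

def var(s0, t):
    return 1.0 + (s0 - 1.0) * np.exp(-2.0 * t)

def rel_entropy(t):
    v, w = var(v0, t), var(w0, t)
    return 0.5 * (v / w - 1.0 - np.log(v / w))

def rel_fisher(t):
    v, w = var(v0, t), var(w0, t)
    return (v - w) ** 2 / (v * w ** 2)

t, h = 0.7, 1e-6
lhs = (rel_entropy(t + h) - rel_entropy(t - h)) / (2.0 * h)   # d/dt D(mu_t || nu_t)
rhs = -rel_fisher(t)                                          # -I(mu_t || nu_t)
print(lhs, rhs)   # the two values agree to high precision
```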
Remark 2. The assumption of dropping surface terms in integrations by parts, that is, the vanishing at infinity in the proof of Proposition 3, is rather common in various areas of physics, and it has also been repeatedly employed in a series of works by Plastino et al., for instance, in [16,17]. Next, we will see the convergence of the relative entropy for the pair of time evolutes by the same Fokker–Planck equation.
Proposition 4. Under Situation A with Assumption D, the relative entropy $D(\mu_t\,\|\,\nu_t)$ converges exponentially fast to 0 as $t \to \infty$.
Proof. We first expand the relative entropy $D(\mu_t\,\|\,\nu_t)$ by inserting the equilibrium density $e^{-V}$, where V is the potential function of the Fokker–Planck equation. Then, we obtain
$$D(\mu_t\,\|\,\nu_t) = \int f_t \log\frac{f_t}{e^{-V}}\,dx + \int f_t \log\frac{e^{-V}}{g_t}\,dx. \qquad (42)$$
Since the first term on the right-hand side of (42) is the relative entropy $D(\mu_t\,\|\,\kappa)$, we concentrate our attention on the second term.
We restrict the second integral to the set on which its integrand is non-negative; then, for sufficiently large t, it can be evaluated there, where the last inequality is by virtue of the assumption on the relative densities. Consequently, we can obtain an estimate of the second term by the $L^1$-distance $\|g_t - e^{-V}\|_{L^1}$ multiplied by a positive constant.
As we have mentioned in Lemma 3, the relative entropy controls the $L^1$-norm, so that $\|g_t - e^{-V}\|_{L^1} \le \sqrt{2\,D(\nu_t\,\|\,\kappa)}$. Thus, we obtain, for sufficiently large t, the bound (47) on $D(\mu_t\,\|\,\nu_t)$ in terms of $D(\mu_t\,\|\,\kappa)$ and $D(\nu_t\,\|\,\kappa)$. Taking the limit $t \to \infty$, it follows that $D(\mu_t\,\|\,\nu_t) \to 0$ exponentially fast, because $D(\mu_t\,\|\,\kappa)$ and $D(\nu_t\,\|\,\kappa)$ converge to 0 exponentially fast with rate $e^{-2Kt}$ by Proposition 2. ☐
By the dissipation formula in Proposition 3, together with the above convergence, we can obtain the following integral representation of relative entropy.
Theorem 2. Let $\mu_t$ and $\nu_t$ be the flows of probability densities on $\mathbb{R}^n$ by the Fokker–Planck equation under Situation A with the assumptions on the relative densities D. Then, we have the integral representation of the relative entropy
$$D(\mu_0\,\|\,\nu_0) = \int_0^{\infty} I(\mu_t\,\|\,\nu_t)\,dt. \qquad (48)$$
If we choose, in particular, the equilibrium κ as the initial measure of the reference, $\nu_0 = \kappa$, then it is stationary, so that $\nu_t = \kappa$ for all $t \ge 0$. Hence, as a direct consequence of the above theorem, we have the following integral formula:
$$D(\mu_0\,\|\,\kappa) = \int_0^{\infty} I(\mu_t\,\|\,\kappa)\,dt.$$
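As an illustrative numerical check of (48) (not part of the original text), for the Ornstein–Uhlenbeck flow with a centered Gaussian initial measure and the standard Gaussian equilibrium as the reference, both sides of (48) are explicit and can be compared directly.

```python
import numpy as np
from scipy.integrate import quad

# Check of the integral representation (48) for the Ornstein-Uhlenbeck flow with
# mu_0 = N(0, s2) and the equilibrium kappa = N(0, 1) as the (stationary) reference.
# Then mu_t = N(0, v(t)) with v(t) = 1 + (s2 - 1) exp(-2t), and for centered Gaussians
#   D(N(0,v) || N(0,1)) = (v - 1 - log v) / 2,
#   I(N(0,v) || N(0,1)) = (v - 1)^2 / v.
s2 = 4.0  # initial variance

def v(t):
    return 1.0 + (s2 - 1.0) * np.exp(-2.0 * t)

relative_entropy = 0.5 * (s2 - 1.0 - np.log(s2))                      # left-hand side of (48)
integral, _ = quad(lambda t: (v(t) - 1.0) ** 2 / v(t), 0.0, np.inf)   # right-hand side of (48)
print(relative_entropy, integral)   # both values should be about 0.807
```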
3. An Application to the Entropy Gap
In this section, we shall apply the formula of the time integration in Theorem 2 to the Ornstein–Uhlenbeck flows, which gives an extension of the formula of the entropy gap. For simplicity, we will consider the one-dimensional case in this section.
Among random variables with unit variance, the Gaussian has the largest entropy. Let X be a standardized (mean 0 and variance 1) random variable, and let Z be a standard Gaussian random variable. Then, the quantity $\mathrm{Ent}(Z) - \mathrm{Ent}(X)$ is called the entropy gap or the non-Gaussianity, which coincides, of course, with the relative entropy $D(X\,\|\,Z)$. It is known (see, for instance, [18]) that this entropy gap can be written as an integral of the Fisher information. Namely,
$$\mathrm{Ent}(Z) - \mathrm{Ent}(X) = \int_0^{\infty}\bigl(I(X_t) - 1\bigr)\,dt, \qquad (50)$$
where $X_t$ is the time evolute at t of the random variable X by the Ornstein–Uhlenbeck semigroup in (17). It is easy to find that our formula (48) of Theorem 2 covers (50) as one of the special cases.
In the formula (48) of Theorem 2, even for the case of the quadratic potential $V(x) = \frac{x^2}{2}$, we can choose the initial reference measure more freely than the standard Gaussian, as we will illustrate below. Let X be a centered random variable of variance $\sigma^2$ (not a unit in general) and G be a centered Gaussian of the same variance $\sigma^2$. Then, applying the integral formula with the potential function $V(x) = \frac{x^2}{2}$, the relative entropy $D(X\,\|\,G)$, which is equal to the entropy gap $\mathrm{Ent}(G) - \mathrm{Ent}(X)$, can be written as the integral
$$D(X\,\|\,G) = \int_0^{\infty} I(X_t\,\|\,G_t)\,dt,$$
where $X_t$ and $G_t$ are the time evolutes by the Ornstein–Uhlenbeck semigroup of X and G, respectively.
By formula (18), $X_t$ and $G_t$ can be given as
$$X_t = e^{-t}X + \sqrt{1 - e^{-2t}}\,Z \quad\text{and}\quad G_t = e^{-t}G + \sqrt{1 - e^{-2t}}\,Z,$$
where Z is a standard Gaussian random variable independent of X and G.
Since the time evolute $G_t$ is a Gaussian random variable of variance $\sigma_t^2 = \sigma^2 e^{-2t} + 1 - e^{-2t}$, its score function is given by
$$\rho_{G_t}(x) = -\frac{x}{\sigma_t^2},$$
and it is easy to find that the time evolute $X_t$ has the same variance as $G_t$, namely $\mathrm{Var}(X_t) = \sigma_t^2$.
We denote by $\mu_t$ the probability distribution of $X_t$, and let $\rho_{X_t}$ and $\rho_{G_t}$ be the score functions of the random variables $X_t$ and $G_t$, respectively. Then, by direct calculation, we obtain that
$$\mathbb{E}\bigl[X_t\,\rho_{X_t}(X_t)\bigr] = -1, \qquad (54)$$
which corresponds to the special case of Stein's identity in (3) with $\xi(x) = x$.
With the above observations, we can reformulate the relative Fisher information $I(X_t\,\|\,G_t)$ as follows:
$$I(X_t\,\|\,G_t) = \mathbb{E}\Bigl[\bigl(\rho_{X_t}(X_t) - \rho_{G_t}(X_t)\bigr)^2\Bigr] = \mathbb{E}\Bigl[\Bigl(\rho_{X_t}(X_t) + \frac{X_t}{\sigma_t^2}\Bigr)^2\Bigr] = I(X_t) - \frac{1}{\sigma_t^2},$$
where we have used formula (54) and the fact that $\mathbb{E}[X_t^2] = \sigma_t^2$ in the last equality. Now, the following formula can be obtained as a direct consequence of Theorem 2.
Proposition 5. Let X be a centered random variable of finite variance $\sigma^2$, and let G be a centered Gaussian variable of the same variance $\sigma^2$. For $t \ge 0$, we denote by $X_t$ the evolute of X by the Ornstein–Uhlenbeck semigroup, that is,
$$X_t = e^{-t}X + \sqrt{1 - e^{-2t}}\,Z, \qquad (56)$$
where Z stands for the standard Gaussian random variable independent of X. Then, it follows that
$$\mathrm{Ent}(G) - \mathrm{Ent}(X) = \int_0^{\infty}\Bigl(I(X_t) - \frac{1}{\sigma_t^2}\Bigr)\,dt, \qquad \sigma_t^2 = \sigma^2 e^{-2t} + 1 - e^{-2t}. \qquad (57)$$
Remark 3. The Ornstein–Uhlenbeck model can be regarded as a time-dependent dilation of the Gaussian perturbation. Thus, we can rewrite formula (57) in terms of the Gaussian perturbation. Since the Fisher information behaves under dilation of a random variable as $I(\lambda X) = \lambda^{-2} I(X)$, we obtain
$$I(X_t) = e^{2t}\,I\bigl(X + \sqrt{e^{2t} - 1}\,Z\bigr).$$
Changing the variables by $\tau = e^{2t} - 1$, the integral on the right-hand side of (57) becomes
$$\int_0^{\infty}\Bigl(I(X_t) - \frac{1}{\sigma_t^2}\Bigr)\,dt = \frac{1}{2}\int_0^{\infty}\Bigl(I\bigl(X + \sqrt{\tau}\,Z\bigr) - \frac{1}{\sigma^2 + \tau}\Bigr)\,d\tau.$$
Hence, we obtain
$$\mathrm{Ent}(G) - \mathrm{Ent}(X) = \frac{1}{2}\int_0^{\infty}\Bigl(I\bigl(X + \sqrt{\tau}\,Z\bigr) - \frac{1}{\sigma^2 + \tau}\Bigr)\,d\tau,$$
that is,
$$\mathrm{Ent}(X) = \frac{1}{2}\log\bigl(2\pi e\,\sigma^2\bigr) - \frac{1}{2}\int_0^{\infty}\Bigl(I\bigl(X + \sqrt{\tau}\,Z\bigr) - \frac{1}{\sigma^2 + \tau}\Bigr)\,d\tau,$$
which is the known integral representation for the entropy of a random variable X by the Fisher information of Gaussian perturbations derived by Barron in Section 2 of [2].
4. Some Numerical Examples
In this section, we will give numerical examples for the case of the Ornstein–Uhlenbeck flows, which are given by the potential $V(x) = \frac{x^2}{2}$ and, hence, have the standard Gaussian equilibrium. As we mentioned in (18), the densities of the flows at time t can be calculated analytically by convolution of the scaled initial measures.
Example 1.
In the first numerical example, we take a uniform distribution centered at the origin as the initial objective measure, so that the density $f_0$ has mean 0. We set a centered Gaussian as the initial reference measure; namely, $g_0$ is a dilation of the standard Gaussian density φ, where
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$
Here, we can calculate the densities $f_t$ and $g_t$ of the Ornstein–Uhlenbeck flows at time t analytically: by the representation (18), $f_t$ and $g_t$ are obtained as convolutions of the rescaled initial densities with a Gaussian kernel. Now, we shall illustrate the convergence of the relative entropy $D(\mu_t\,\|\,\nu_t)$ and the upper bound of the right-hand side in (47) numerically by graphs. We claim that the constant in (47) can be assumed to be 1 because, in our assumptions (20), (21), and (22), the relative densities converge uniformly to 1. The convergence of $D(\mu_t\,\|\,\nu_t)$ is illustrated in Figure 1.
In Figure 2, the dashed curve indicates the convergence of $D(\mu_t\,\|\,\nu_t)$ shown in Figure 1, drawn together with the upper bound from (47).
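The following sketch is not from the original paper; the interval $[-1, 1]$ and the reference variance 2 are illustrative choices, since the exact values used in Example 1 are not reproduced here. It shows how such convergence curves can be computed on a grid: the Ornstein–Uhlenbeck evolutes are built from the representation (18), and the relative entropy is evaluated by quadrature.

```python
import numpy as np

# Grid-based computation of t -> D(mu_t || nu_t) for Ornstein-Uhlenbeck flows, with
# mu_0 = Uniform[-1, 1] and nu_0 = N(0, 2) as illustrative initial measures.
# By (18), the density of X_t = exp(-t) X + sqrt(1 - exp(-2t)) Z is the convolution
# of the rescaled initial density with a Gaussian kernel of variance 1 - exp(-2t).
x = np.linspace(-12.0, 12.0, 4801)
dx = x[1] - x[0]

def ou_density_from_uniform(t, a=1.0):
    s = np.sqrt(1.0 - np.exp(-2.0 * t))         # std. dev. of the Gaussian part
    u = np.linspace(-a, a, 401) * np.exp(-t)    # support of the rescaled uniform part
    du = u[1] - u[0]
    kernel = np.exp(-(x[None, :] - u[:, None]) ** 2 / (2.0 * s ** 2)) / np.sqrt(2.0 * np.pi * s ** 2)
    return np.sum(kernel, axis=0) * du / (2.0 * a * np.exp(-t))

def gaussian_density(t, var0=2.0):
    v = var0 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)   # variance of G_t
    return np.exp(-x ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def relative_entropy(f, g):
    mask = (f > 1e-300) & (g > 1e-300)
    return np.sum(f[mask] * np.log(f[mask] / g[mask])) * dx

for t in [0.25, 0.5, 1.0, 2.0, 4.0]:
    f_t = ou_density_from_uniform(t)
    g_t = gaussian_density(t)
    print(f"t = {t:4.2f}: D(mu_t || nu_t) = {relative_entropy(f_t, g_t):.5f}")
```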
Example 2.
In the second example, we put the initial reference measure as $g_0(x) = \frac{1}{\sqrt{3}}\,\varphi\bigl(x/\sqrt{3}\bigr)$, that is, we take the centered Gaussian of variance 3 as the initial reference measure, but the initial objective measure $f_0$ is unchanged. In Figure 3, the convergence of $D(\mu_t\,\|\,\nu_t)$ is illustrated. In Figure 4, the dashed curve indicates the convergence of $D(\mu_t\,\|\,\nu_t)$ shown in Figure 3, drawn together with the upper bound from (47).
Example 3.
In the third numerical example, the initial objective and the initial reference measures are given as uniform distributions on two intervals, respectively; we set the densities $f_0$ and $g_0$ as the corresponding uniform densities. We illustrate the convergence of $D(\mu_t\,\|\,\nu_t)$ in Figure 5. In Figure 6, the dashed curve indicates the convergence of $D(\mu_t\,\|\,\nu_t)$ shown in Figure 5, drawn together with the upper bound from (47).