Lagrangian Function on the Finite State Space Statistical Bundle

Pistone, Giovanni

doi:10.3390/e20020139

by

Giovanni Pistone

De Castro Statistics, Collegio Carlo Alberto, 10122 Torino, Italy

Entropy 2018, 20(2), 139; https://doi.org/10.3390/e20020139

Submission received: 26 December 2017 / Revised: 21 January 2018 / Accepted: 24 January 2018 / Published: 22 February 2018

(This article belongs to the Special Issue Theoretical Aspect of Nonlinear Statistical Physics)

Abstract

The statistical bundle is the set of couples (Q, W) of a probability density Q and a random variable W such that 𝔼_{Q} [W] = 0. On a finite state space, we assume Q to be a probability density with respect to the uniform probability and give an affine atlas of charts such that the resulting manifold is a model for Information Geometry. Velocity and acceleration of a one-dimensional statistical model are computed in this set up. The Euler–Lagrange equations are derived from the Lagrange action integral. An example Lagrangian using minus the entropy as potential energy is briefly discussed.

Keywords:

Information Geometry; statistical bundle; Lagrangian function

1. Introduction

The set-up of classical Lagrangian Mechanics is a finite-dimensional Riemannian manifold. For example, see the monographs by V.I. Arnold ([1], Chapters III–IV), R. Abraham and J.E. Marsden ([2], Chapter 3), J.E. Marsden and T.S. Ratiu ([3], Chapter 7). Classical Information Geometry, as it was first defined in the monograph by S.-I. Amari and H. Nagaoka [4], views parametric statistical models as a manifold endowed with a dually-flat connection. In a recent paper, M. Leok and J. Zhang [5] have pointed out the natural relation between these two topics and have given a wide overview of the mathematical structures involved.

In the present paper, we take up the same research program with two further qualifications. First, we assume a non-parametric approach by considering the full set of positive probability functions on a finite set, as was done, for example, in our review paper [6]. The discussion is restricted here to a finite state space to avoid difficult technical problems. Second, we consider a specific expression of the tangent space of the statistical manifold, which is a Hilbert bundle that we call a statistical bundle. Our aim is to emphasize the basic statistical intuition of the geometric quantities involved. Because of that, we chose to systematically use the language of non-parametric differential geometry as it is developed in the monograph of S. Lang [7].

Herein, we use our version of Information Geometry; see the review paper [6]. Preliminary versions of this paper have been presented at the SigmaPhy2017 Conference held in Corfu, Greece, 10–14 July 2017, and at a seminar held at Collegio Carlo Alberto, Moncalieri, on 5 September 2017. In these early versions, we did not refer to Leok and Zhang’s work, which we were unaware of at that time.

In Section 2, we review the definition and properties of the statistical bundle, and of the affine atlas that endows it with both a manifold structure and a natural family of transports between the fibers. In Section 3, we develop the formalism of the tangent space of the statistical bundle and derive the expression of the velocity and the acceleration of a one-dimensional statistical model in the given affine atlas. Section 4 introduces the Lagrangian function on the statistical bundle and its partial derivatives. The derivation of the Euler–Lagrange equations from the action integral, together with a relevant example, is discussed in Section 5.

2. Statistical Bundle

We consider a finite sample space

Ω

, with

# Ω = N

. The probability simplex is

Δ (Ω)

, and

Δ^{\circ} (Ω)

is its interior. The uniform probability on

Ω

is denoted as

μ

,

μ (x) = \frac{1}{N}

,

x \in Ω

. The maximal exponential family

ε (μ)

is the set of all strictly positive probability densities of

(Ω, μ)

. The expected value of

f : Ω \to R

with respect to the density

P \in ε (μ)

is denoted

𝔼_{P} [f] = 𝔼_{μ} [f P] = \frac{1}{N} \sum_{x \in Ω} f (x) P (x)

.

In [6,8,9], we made the case for the statistical bundle being the key structure of Information Geometry. The statistical bundle with base

Ω

is

S ε (μ) = \{(Q, V) | Q \in ε (μ), 𝔼_{Q} [V] = 0\} .

The statistical bundle is a semi-algebraic subset of

R^{2 N}

; i.e., it is defined by algebraic equations and strict inequalities. It is trivially a real manifold. At each

Q \in ε (μ)

, the fiber

S_{Q} ε (μ)

is endowed with the scalar product

(V_{1}, V_{2}) \mapsto {〈V_{1}, V_{2}〉}_{Q} = 𝔼_{Q} [V_{1} V_{2}] = {Cov}_{Q} (V_{1}, V_{2}) .
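The definitions above are easy to try out on a small state space. The following is a minimal sketch, assuming densities are stored as plain lists of values with respect to the uniform probability; the helper names `expect`, `center`, and `inner` and the numerical values are illustrative and not taken from the paper.

```python
# Illustrative helpers; the names are assumptions, not notation from the paper.
def expect(P, f):
    """E_P[f] = (1/N) * sum_x f(x) P(x), with P a density w.r.t. the uniform probability."""
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    """Map f into the fiber S_Q by subtracting its Q-mean."""
    m = expect(Q, f)
    return [fx - m for fx in f]

def inner(Q, V1, V2):
    """Scalar product <V1, V2>_Q = E_Q[V1 V2] on the fiber S_Q."""
    return expect(Q, [a * b for a, b in zip(V1, V2)])

# Probabilities (0.1, 0.2, 0.3, 0.4) on N = 4 points give the density Q = N * probabilities.
Q = [0.4, 0.8, 1.2, 1.6]
ones = [1.0] * len(Q)
V = center(Q, [1.0, 0.0, -1.0, 2.0])
W = center(Q, [0.5, 0.5, 3.0, -1.0])

mass = expect(ones, Q)    # E_mu[Q], equal to 1 for a density
mean_V = expect(Q, V)     # equal to 0, so (Q, V) belongs to the bundle
gram = inner(Q, V, W)     # covariance of the two uncentered variables under Q
```

Storing Q as a density with respect to the uniform probability, rather than as a probability vector, keeps every expectation in the form used in the text.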

To this structure we add a special affine atlas of charts in order to show a structure of affine manifold, which is of interest in the statistical applications. The exponential atlas of the statistical manifold

S ε (μ)

is the collection of charts given for each

P \in ε (μ)

by

s_{P} : S ε (μ) ∋ (Q, V) \mapsto (s_{P} (Q),^{e} U_{Q}^{P} V) \in S_{P} ε (μ) \times S_{P} ε (μ),

(1)

where (with a slight abuse of notation)

s_{P} (Q) = log \frac{Q}{P} - 𝔼_{P} [log \frac{Q}{P}],^{e} U_{Q}^{P} V = V - 𝔼_{P} [V] .

(2)

As

s_{P} (P, V) = (0, V)

, we say that

s_{P}

is the chart centered at P. If

s_{P} (Q) = U

, it is easy to derive the exponential form of Q as a density with respect to P; namely,

Q = e^{U - 𝔼_{P} [log \frac{P}{Q}]} \cdot P

. As

𝔼_{μ} [Q] = 1

, then

1 = 𝔼_{P} [e^{U - 𝔼_{P} [log \frac{P}{Q}]}] = 𝔼_{P} [e^{U}] e^{- 𝔼_{P} [log \frac{P}{Q}]}

, so that the cumulant function

K_{P}

is defined on

S_{P} ε (μ)

by

K_{P} (U) = log 𝔼_{P} [e^{U}] = 𝔼_{P} [log \frac{P}{Q}] = D (P ∥ Q);

that is,

K_{P} (U)

is the expression in the chart centered at P of the Kullback–Leibler divergence

Q \mapsto D (P ∥ Q)

, and we can write

Q = e^{U - K_{P} (U)} \times P = e_{P} (U) .

The patch centered at P is

s_{P}^{- 1} = e_{P} : {(S_{P} ε (μ))}^{2} ∋ (U, W) \mapsto (e_{P} (U),^{e} U_{P}^{e_{P} (U)} W) \in S ε (μ) .

In statistical terms, the random variable

log (Q / P)

is the relative point-wise information about Q relative to the reference P, while

s_{P} (Q)

is the deviation from its mean value at P. The expression of the other divergence in the chart centered at P is

D (Q ∥ P) = 𝔼_{Q} [log \frac{Q}{P}] = 𝔼_{Q} [U - K_{P} (U)] = 𝔼_{Q} [U] - K_{P} (U) .

The equation above shows that the two divergences are convex conjugate functions in the proper charts; see [10].
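The chart, the patch, and the two divergence identities of this section can be checked numerically. The sketch below is only an illustration under the same list-based representation of densities; the function names `s`, `e`, `K`, and `kl` mirror the symbols of the text but are otherwise assumptions.

```python
import math

def expect(P, f):
    """E_P[f] for a density P w.r.t. the uniform probability."""
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def s(P, Q):
    """Chart centered at P: s_P(Q) = log(Q/P) - E_P[log(Q/P)]."""
    lr = [math.log(q / p) for q, p in zip(Q, P)]
    m = expect(P, lr)
    return [x - m for x in lr]

def K(P, U):
    """Cumulant function K_P(U) = log E_P[exp(U)]."""
    return math.log(expect(P, [math.exp(u) for u in U]))

def e(P, U):
    """Patch centered at P: e_P(U) = exp(U - K_P(U)) * P."""
    k = K(P, U)
    return [math.exp(u - k) * p for u, p in zip(U, P)]

def kl(P, Q):
    """Kullback-Leibler divergence D(P || Q) = E_P[log(P/Q)]."""
    return expect(P, [math.log(p / q) for p, q in zip(P, Q)])

P = [0.4, 0.8, 1.2, 1.6]
Q = [1.6, 1.2, 0.8, 0.4]
U = s(P, Q)

Q_back = e(P, U)                                     # the patch inverts the chart
gap_first = K(P, U) - kl(P, Q)                       # K_P(U) = D(P || Q)
gap_second = kl(Q, P) - (expect(Q, U) - K(P, U))     # D(Q || P) = E_Q[U] - K_P(U)
```

Both gaps vanish up to rounding, and `Q_back` reproduces `Q`.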

The transition maps of the exponential atlas in Equations (1) and (2) are

\begin{matrix} s_{P_{2}} \circ e_{P_{1}} (U, W) = s_{P_{2}} (e_{P_{1}} (U),^{e} U_{P_{1}}^{e_{P_{1}} (U)} W) = s_{P_{2}} (e^{U - K_{P_{1}} (U)} \times P_{1}, W - 𝔼_{e_{P_{1}} (U)} [W]) = \\ (U - K_{P_{1}} (U) + log \frac{P_{1}}{P_{2}} - 𝔼_{P_{2}} [U - K_{P_{1}} (U) + log \frac{P_{1}}{P_{2}}], W - 𝔼_{e_{P_{1}} (U)} [W] - 𝔼_{P_{2}} [W - 𝔼_{e_{P_{1}} (U)} [W]]) = \\ (^{e} U_{P_{1}}^{P_{2}} U + s_{P_{2}} (P_{1}),^{e} U_{P_{1}}^{P_{2}} W), \end{matrix}

so that the exponential atlas is indeed affine. Notice that the linear part is

^{e} U_{P_{1}}^{P_{2}}

.
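The affine form of the transition map can be confirmed on an example. This is a minimal sketch with hypothetical densities `P1`, `P2` and chart coordinates `U`, `W`; it composes the patch at `P1` with the chart at `P2` and compares the result with the closed expression derived above.

```python
import math

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(P, f):
    """Exponential transport into the fiber at P: f - E_P[f]."""
    m = expect(P, f)
    return [fx - m for fx in f]

def s(P, Q):
    """Chart centered at P, first component."""
    return center(P, [math.log(q / p) for q, p in zip(Q, P)])

def e(P, U):
    """Patch centered at P, first component."""
    k = math.log(expect(P, [math.exp(u) for u in U]))
    return [math.exp(u - k) * p for u, p in zip(U, P)]

P1 = [0.4, 0.8, 1.2, 1.6]
P2 = [1.0, 1.5, 0.5, 1.0]
U = center(P1, [0.3, -0.2, 0.1, 0.4])   # first chart coordinate at P1
W = center(P1, [1.0, 2.0, -1.0, 0.0])   # second chart coordinate at P1

Q = e(P1, U)
# Left-hand side: patch at P1 followed by the chart at P2.
lhs_U = s(P2, Q)
lhs_W = center(P2, center(Q, W))
# Right-hand side: the affine expression, linear part plus the shift s_{P2}(P1).
rhs_U = [a + b for a, b in zip(center(P2, U), s(P2, P1))]
rhs_W = center(P2, W)

err = max(abs(a - b) for a, b in zip(lhs_U + lhs_W, rhs_U + rhs_W))
```

The value of `err` is at rounding level, which reflects that the intermediate centering at Q drops out of the composition.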

3. The Tangent Space of the Statistical Bundle

Let us compute the expression of the velocity at time t of a smooth curve

t \mapsto γ (t) = (Q (t), W (t)) \in S ε (μ)

in the chart centered at P. The expression of the curve is

γ_{P} (t) = s_{P} (γ (t)) = (s_{P} (Q (t)),^{e} U_{Q (t)}^{P} W (t)),

and hence we have, by denoting the derivative in

R^{N}

by the dot,

\frac{d}{d t} s_{P} (Q (t)) = \frac{d}{d t} (log \frac{Q (t)}{P} - 𝔼_{P} [log \frac{Q (t)}{P}]) = \frac{\dot{Q} (t)}{Q (t)} - 𝔼_{P} [\frac{\dot{Q} (t)}{Q (t)}] =^{e} U_{Q (t)}^{P} \frac{\dot{Q} (t)}{Q (t)},

(3)

and

\frac{d}{d t}^{e} U_{Q (t)}^{P} W (t) = \frac{d}{d t} (W (t) - 𝔼_{P} [W (t)]) = \dot{W} (t) - 𝔼_{P} [\dot{W} (t)] =^{e} U_{Q (t)}^{P} (\dot{W} (t) - 𝔼_{Q (t)} [\dot{W} (t)]) .

(4)

If we define the velocity of

t \mapsto Q (t) = e^{U (t) - K_{P} (U (t))} \times P

to be

\overset{⋆}{Q} (t) = \frac{\dot{Q} (t)}{Q (t)} = \frac{d}{d t} log Q (t) = \dot{U} (t) - d K_{P} (U (t)) [\dot{U} (t)] \in S_{Q (t)} ε (μ),

then

t \mapsto (Q (t), \overset{⋆}{Q} (t))

is a curve in the statistical bundle whose expression in the chart centered at P is

t \mapsto (U (t), \dot{U} (t))

. The velocity as defined above is nothing but the score function of the one-dimensional statistical model; see, e.g., the textbook by B. Efron and T. Hastie ([11], Section 4.2). The variance of the score function (i.e., the squared norm of

\overset{⋆}{Q} (t)

in

S_{Q (t)} ε (μ)

) is classically known as the Fisher information at t.

We define the second statistical bundle to be

S^{2} ε (μ) = \{(Q, W, X, Y) | (Q, W) \in S ε (μ), X, Y \in S_{Q} ε (μ)\},

with charts

s_{P} (Q, W, X, Y) = (s_{P} (Q, W),^{e} U_{Q}^{P} X,^{e} U_{Q}^{P} Y) .

We can identify the second bundle with the tangent space of the first bundle as follows.

For each curve

t \mapsto γ (t) = (Q (t), W (t))

in the statistical bundle, define its velocity at t to be

\overset{⋆}{γ} (t) = (Q (t), W (t), \overset{⋆}{Q} (t), \dot{W} (t) - 𝔼_{Q (t)} [\dot{W} (t)]),

because

t \mapsto \overset{⋆}{γ} (t)

is a curve in the second statistical bundle, and its expression in the chart at P has the last two components equal to the values given in Equations (3) and (4).

In particular, consider the curve

t \mapsto χ (t) = (Q (t), \overset{⋆}{Q} (t))

. The velocity is

\overset{⋆}{χ} (t) = (Q (t), \overset{⋆}{Q} (t), \overset{⋆}{Q} (t), \overset{* *}{Q} (t)),

where the acceleration

\overset{* *}{Q} (t)

is

\overset{* *}{Q} (t) = \frac{d}{d t} \frac{\dot{Q} (t)}{Q (t)} - 𝔼_{Q (t)} [\frac{d}{d t} \frac{\dot{Q} (t)}{Q (t)}] = \frac{\ddot{Q} (t)}{Q (t)} - (\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}])

(5)

It should be noted that the acceleration has been defined without explicitly mentioning the relevant connection. In fact, the connection here is implicitly defined by the transports

^{e} U_{P}^{Q}

, which is unusual in Differential Geometry, but is quite natural from the probabilistic point of view; see P. Gibilisco and G. Pistone [12]. We shall see below that the non-parametric approach to Information Geometry allows the definition of a dual transport, hence a dual connection as it was in [4]. Because of that, we could have defined other types of acceleration together with the one we have defined. Namely, we could consider an exponential acceleration

^{e} D^{2} Q (t) = \overset{* *}{Q} (t)

, a mixture acceleration

^{m} D^{2} Q (t) = \ddot{Q} (t) / Q (t)

, and a Riemannian acceleration

^{0} D^{2} Q (t) = \frac{1}{2} (^{e} D^{2} Q (t) +^{m} D^{2} Q (t)) = \frac{\ddot{Q} (t)}{Q (t)} - \frac{1}{2} ({(\frac{\dot{Q} (t)}{Q (t)})}^{2} - 𝔼_{Q (t)} [{(\frac{\dot{Q} (t)}{Q (t)})}^{2}]),

(6)

each acceleration being associated with a specific connection; see the review paper [6]. We do not further discuss the different second-order geometries associated with the statistical bundle in this paper.
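The second equality in Equation (5) follows in two short steps. As a worked derivation, using only the definitions above and the normalization 𝔼_{μ} [Q (t)] = 1 for all t:

```latex
\begin{aligned}
\frac{d}{dt}\frac{\dot Q(t)}{Q(t)}
  &= \frac{\ddot Q(t)}{Q(t)} - \left(\frac{\dot Q(t)}{Q(t)}\right)^{2}
   = \frac{\ddot Q(t)}{Q(t)} - \overset{\star}{Q}(t)^{2}, \\
\mathbb{E}_{Q(t)}\left[\frac{\ddot Q(t)}{Q(t)}\right]
  &= \mathbb{E}_{\mu}\left[\ddot Q(t)\right]
   = \frac{d^{2}}{dt^{2}}\,\mathbb{E}_{\mu}\left[Q(t)\right] = 0, \\
\overset{\star\star}{Q}(t)
  &= \frac{\ddot Q(t)}{Q(t)}
   - \left(\overset{\star}{Q}(t)^{2} - \mathbb{E}_{Q(t)}\left[\overset{\star}{Q}(t)^{2}\right]\right).
\end{aligned}
```

The middle line also shows that the mixture acceleration \ddot{Q} (t) / Q (t) is already centered at Q (t), so that each of the three accelerations belongs to the fiber at Q (t).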

Example 1 (Boltzmann–Gibbs).

Let us compare the formalism we have introduced above with standard computations in Statistical Physics. The Boltzmann–Gibbs distribution gives to point

x \in Ω

the probability

e^{- (1 / θ) H (x)} / Z (θ)

, with

Z (θ) = \sum_{x \in Ω} e^{- (1 / θ) H (x)}

and

θ > 0

, see Landau and Lifshitz ([13], Chapter 3). As a curve in

ε (μ)

, it is

Q (θ) = N e^{- (1 / θ) H} / Z (θ)

because of the reference to the uniform probability. The velocity defined above becomes in this case

\overset{⋆}{Q} (θ) = θ^{- 2} (H - 𝔼_{θ} [H])

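, while the acceleration of Equation (5) is

\overset{* *}{Q} (θ) = - 2 θ^{- 3} (H - 𝔼_{θ} [H])

, because the derivative in θ of the velocity differs from this expression only by a term that is constant on Ω and is removed by the centering. Notice that we have the equation

θ \overset{* *}{Q} (θ) + 2 \overset{⋆}{Q} (θ) = 0

.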
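The velocity and acceleration of the Boltzmann–Gibbs curve can be cross-checked by finite differences. The sketch below assumes a hypothetical list of energy levels `H` and a hypothetical temperature `theta`; it compares the closed-form score with a central difference of log Q, computes the acceleration as the centered derivative of the velocity as in Equation (5), and records the pointwise ratio between acceleration and velocity so that it can be compared with the closed form stated in the text.

```python
import math

H = [0.0, 1.0, 2.0, 5.0]   # hypothetical energy levels, one per state
N = len(H)

def gibbs(theta):
    """Boltzmann-Gibbs density w.r.t. the uniform probability: N exp(-H/theta) / Z."""
    w = [math.exp(-hx / theta) for hx in H]
    Z = sum(w)
    return [N * x / Z for x in w]

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    m = expect(Q, f)
    return [fx - m for fx in f]

def velocity(theta):
    """Closed-form score: theta^{-2} (H - E_theta[H])."""
    return [x / theta ** 2 for x in center(gibbs(theta), H)]

theta, h = 1.5, 1e-5
Q = gibbs(theta)

# Score as a central difference of log Q(theta).
score_fd = [(math.log(a) - math.log(b)) / (2 * h)
            for a, b in zip(gibbs(theta + h), gibbs(theta - h))]

# Acceleration: derivative of the velocity, then centering at Q(theta).
dv = [(a - b) / (2 * h) for a, b in zip(velocity(theta + h), velocity(theta - h))]
accel = center(Q, dv)

v = velocity(theta)
ratios = [a / b for a, b in zip(accel, v)]   # constant in x if accel is proportional to v
```

The entries of `ratios` coincide across states, which confirms that the acceleration is a multiple of the velocity depending on θ only.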

Following the original construction of Amari’s Information Geometry [4], we have defined on the statistical bundle a manifold structure which is both an affine and a Riemannian manifold. The base manifold

ε (μ)

is actually a Hessian manifold with respect to any of the convex functions

K_{P} (U) = log 𝔼_{P} [e^{U}]

,

U \in S_{P} ε (μ)

(see [14]). Many computations are actually performed using the Hessian structure. The following equations are easily checked and frequently used:

𝔼_{e_{P} (U)} [H] = d K_{P} (U) [H];

(7)

^{e} U_{P}^{e_{P} (U)} H = H - d K_{P} (U) [H];

(8)

d^{2} K_{P} (U) [H_{1}, H_{2}] = {〈^{e} U_{P}^{e_{P} (U)} H_{1},^{e} U_{P}^{e_{P} (U)} H_{2}〉}_{e_{P} (U)};

(9)

d^{3} K_{P} (U) [H_{1}, H_{2}, H_{3}] = 𝔼_{e_{P} (U)} [^{e} U_{P}^{e_{P} (U)} H_{1} \times^{e} U_{P}^{e_{P} (U)} H_{2} \times^{e} U_{P}^{e_{P} (U)} H_{3}] .

(10)
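Equations (7) and (9) can be verified by differentiating the cumulant function numerically. This is a rough sketch with hypothetical data; the step size `h` and the helper `shift` are choices made here for illustration.

```python
import math

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    m = expect(Q, f)
    return [fx - m for fx in f]

def K(P, U):
    """Cumulant function K_P(U) = log E_P[exp(U)]."""
    return math.log(expect(P, [math.exp(u) for u in U]))

def e(P, U):
    k = K(P, U)
    return [math.exp(u - k) * p for u, p in zip(U, P)]

P = [0.4, 0.8, 1.2, 1.6]
U = center(P, [0.3, -0.2, 0.1, 0.4])
H1 = center(P, [1.0, 0.0, -1.0, 2.0])
H2 = center(P, [0.5, 0.5, 3.0, -1.0])
Q = e(P, U)
h = 1e-4

def shift(a, b):
    """Point U + a*H1 + b*H2 in the fiber at P."""
    return [u + a * x + b * y for u, x, y in zip(U, H1, H2)]

# First and mixed second directional derivatives of K_P by central differences.
dK_fd = (K(P, shift(h, 0.0)) - K(P, shift(-h, 0.0))) / (2 * h)
d2K_fd = (K(P, shift(h, h)) - K(P, shift(h, -h))
          - K(P, shift(-h, h)) + K(P, shift(-h, -h))) / (4 * h * h)

mean_H1 = expect(Q, H1)                                                  # Eq. (7)
cov = expect(Q, [a * b for a, b in zip(center(Q, H1), center(Q, H2))])   # Eq. (9)
```

Here `dK_fd` agrees with `mean_H1` and `d2K_fd` with `cov` up to the finite-difference error.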

We have defined a centering operation that can be thought of as a transport among fibers,

^{e} U_{P}^{Q} : S_{P} ε (μ) \to S_{Q} ε (μ),

whose adjoint is the transport from the fiber at Q to the fiber at P given by

^{m} U_{Q}^{P} V = \frac{Q}{P} V

. In fact, for

U \in S_{P} ε (μ)

and

V \in S_{Q} ε (μ)

,

{〈^{e} U_{P}^{Q} U, V〉}_{Q} = 𝔼_{Q} [(U - 𝔼_{Q} [U]) V] = 𝔼_{Q} [U V] = 𝔼_{P} [U (\frac{Q}{P} V)] = {〈U,^{m} U_{Q}^{P} V〉}_{P} .

Moreover, if

U, V \in S_{P} ε (μ)

, then

{〈^{e} U_{P}^{Q} U,^{m} U_{P}^{Q} V〉}_{Q} = {〈^{e} U_{Q}^{P}^{e} U_{P}^{Q} U, V〉}_{P} = {〈U, V〉}_{P} .
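The duality between the two transports lends itself to a direct numerical check. The following sketch uses hypothetical densities and fiber elements; `m_transport(src, dst, V)` is an illustrative name for the mixture transport from the fiber at `src` to the fiber at `dst`.

```python
def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    """Exponential transport into the fiber at Q: f - E_Q[f]."""
    m = expect(Q, f)
    return [fx - m for fx in f]

def m_transport(src, dst, V):
    """Mixture transport from S_src to S_dst: V is multiplied by src/dst."""
    return [a / b * v for a, b, v in zip(src, dst, V)]

def inner(Q, V1, V2):
    return expect(Q, [a * b for a, b in zip(V1, V2)])

P = [0.4, 0.8, 1.2, 1.6]
Q = [1.0, 1.5, 0.5, 1.0]
U = center(P, [0.3, -0.2, 0.1, 0.4])     # element of S_P
V_P = center(P, [1.0, 0.0, -1.0, 2.0])   # element of S_P
V_Q = center(Q, [0.5, 0.5, 3.0, -1.0])   # element of S_Q

# Adjointness: <eU_P^Q U, V>_Q = <U, mU_Q^P V>_P for V in S_Q.
lhs = inner(Q, center(Q, U), V_Q)
rhs = inner(P, U, m_transport(Q, P, V_Q))

# Joint transport preserves the scalar product of two elements of S_P.
pair = inner(Q, center(Q, U), m_transport(P, Q, V_P))
base = inner(P, U, V_P)
```

The pairs `lhs`, `rhs` and `pair`, `base` agree up to rounding.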

Example 2 (Entropy flow).

This example is taken from [8]. In the scalar field

ℋ (Q) = - 𝔼_{Q} [log Q]

, there is no dependence on the fiber. If

t \mapsto Q (t) = e^{V (t) - K_{P} (V (t))} \cdot P

is a smooth curve in

ε (μ)

expressed in the chart centered at P, then we can write

\begin{matrix} ℋ (Q (t)) = - 𝔼_{Q (t)} [V (t) - K_{P} (V (t)) + log P] = \\ K_{P} (V (t)) - 𝔼_{Q (t)} [V (t) + log P + ℋ (P)] + ℋ (P) = \\ K_{P} (V (t)) - d K_{P} (V (t)) [V (t) + log P + ℋ (P)] + ℋ (P), \end{matrix}

(11)

where the argument of the last expectation belongs to the fiber

S_{P} ε (μ)

and we have expressed the expected value as a derivative by using Equation (7).

Again using Equations (7) and (9), we compute the derivative of the entropy along the given curve as

\begin{matrix} \frac{d}{d t} ℋ (Q (t)) & = \frac{d}{d t} K_{P} (V (t)) - \frac{d}{d t} d K_{P} (V (t)) [V (t) + log P + ℋ (P)] = \\ d K_{P} (V (t)) [\dot{V} (t)] - d^{2} K_{P} (V (t)) [V (t) + log P + ℋ (P), \dot{V} (t)] - d K_{P} (V (t)) [\dot{V} (t)] = \\ - 𝔼_{Q (t)} [^{e} U_{P}^{Q (t)} (V (t) + log P)^{e} U_{P}^{Q (t)} \dot{V} (t)] . \end{matrix}

We use now the equations

V (t) + log P = log Q (t) + K_{P} (V (t)),^{e} U_{P}^{Q (t)} (log Q (t) + K_{P} (V (t))) = log Q (t) + ℋ (Q (t)),

and

^{e} U_{P}^{Q (t)} \dot{V} (t) = \overset{⋆}{Q} (t)

to obtain

\frac{d}{d t} ℋ (Q (t)) = - {〈log Q (t) + ℋ (Q (t)), \overset{⋆}{Q} (t)〉}_{Q (t)} .

We have identified the gradient of the entropy in the statistical bundle,

grad ℋ (Q) = - (log Q + ℋ (Q)) .

(12)

Notice that the previous computation could have been done using the exponential family

Q (t) = e_{P} (t V)

. See the computation of the gradient flow in [8].
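The gradient formula of Equation (12) can be tested along the exponential family mentioned above. This is a minimal sketch, assuming a hypothetical reference density `P` and direction `V`; it compares a finite-difference derivative of the entropy with the scalar product of the gradient and the velocity.

```python
import math

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    m = expect(Q, f)
    return [fx - m for fx in f]

def e(P, U):
    k = math.log(expect(P, [math.exp(u) for u in U]))
    return [math.exp(u - k) * p for u, p in zip(U, P)]

def entropy(Q):
    """Entropy H(Q) = -E_Q[log Q] of a density w.r.t. the uniform probability."""
    return -expect(Q, [math.log(q) for q in Q])

P = [0.4, 0.8, 1.2, 1.6]
V = center(P, [1.0, 0.0, -1.0, 2.0])

def curve(t):
    """Exponential family Q(t) = e_P(t V)."""
    return e(P, [t * v for v in V])

t, h = 0.7, 1e-5
Q = curve(t)

dH_fd = (entropy(curve(t + h)) - entropy(curve(t - h))) / (2 * h)

velocity = center(Q, V)                               # score of the family at t
ent = entropy(Q)
grad = [-(math.log(q) + ent) for q in Q]              # Equation (12)
dH_formula = expect(Q, [g * v for g, v in zip(grad, velocity)])
```

The two values `dH_fd` and `dH_formula` agree to within the finite-difference error.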

In the next section, we extend the computation illustrated in the example above to scalar fields on the statistical bundle.

4. Lagrangian Function

A Lagrangian function is a smooth scalar field on the statistical bundle

L : S ε (μ) ∋ (Q, W) \mapsto L (Q, W) \in R .

At each fixed density

Q \in ε (μ)

, the partial mapping

S_{Q} ε (μ) ∋ W \mapsto L (Q, W)

(13)

is defined on the vector space

S_{Q} ε (μ)

; hence, we can use the ordinary derivative, which in this case is called the fiber derivative,

d_{2} L (Q, W) [H_{2}] = {\frac{d}{d t} L (Q, W + t H_{2})|}_{t = 0}, H_{2} \in S_{Q} ε (μ) .

(14)

Example 3 (Running Example 1).

If

L (Q, W) = \frac{1}{2} {〈W, W〉}_{Q} + κ ℋ (Q), κ \geq 0,

(15)

then

d_{2} L (Q, W) [H_{2}] = {〈W, H_{2}〉}_{Q}

. The example is suggested by the form of the classical Lagrangian function in mechanics, where the first term is the kinetic energy and

- κ ℋ (Q)

is the potential energy.
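The fiber derivative of this Lagrangian is simple enough to confirm by a difference quotient. A short sketch follows, in which the value of `KAPPA` and the data are hypothetical; since the Lagrangian is quadratic in W, the central difference is exact up to rounding.

```python
import math

KAPPA = 0.5   # hypothetical value of the constant kappa

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    m = expect(Q, f)
    return [fx - m for fx in f]

def lagrangian(Q, W):
    """L(Q, W) = 1/2 <W, W>_Q + kappa * H(Q), with H(Q) = -E_Q[log Q]."""
    kinetic = 0.5 * expect(Q, [w * w for w in W])
    entropy = -expect(Q, [math.log(q) for q in Q])
    return kinetic + KAPPA * entropy

Q = [0.4, 0.8, 1.2, 1.6]
W = center(Q, [1.0, 0.0, -1.0, 2.0])
H2 = center(Q, [0.5, 0.5, 3.0, -1.0])
h = 1e-3

W_plus = [w + h * x for w, x in zip(W, H2)]
W_minus = [w - h * x for w, x in zip(W, H2)]
d2L_fd = (lagrangian(Q, W_plus) - lagrangian(Q, W_minus)) / (2 * h)
d2L_formula = expect(Q, [w * x for w, x in zip(W, H2)])   # <W, H2>_Q
```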

As the statistical bundle

S ε (μ)

is non-trivial, the computation of the partial derivative of the Lagrangian with respect to the first variable requires some care. We want to compute the expression of the total derivative in a chart of the affine atlas defined in Equations (1) and (2).

Let

t \mapsto γ (t) = (Q (t), W (t))

be a smooth curve in the statistical bundle. In the chart centered at P, we have

Q (t) = e^{U (t) - K_{P} (U (t))} \times P = e_{P} (U (t)), W (t) =^{e} U_{P}^{e_{P} (U (t))} V (t),

with

t \mapsto γ_{P} (t) = (U (t), V (t))

being a smooth curve in

{(S_{P} ε (μ))}^{2}

. Let us compute the velocity of variation of the Lagrangian L along the curve

γ

.

\frac{d}{d t} L (γ (t)) = \frac{d}{d t} L (Q (t), W (t)) = \frac{d}{d t} L (e_{P} (U (t)),^{e} U_{P}^{e_{P} (U (t))} V (t)) = \frac{d}{d t} L_{P} (U (t), V (t)),

with

L_{P} (U, V) = L (e_{P} (U),^{e} U_{P}^{e_{P} (U)} V)

. It follows that

\frac{d}{d t} L (Q (t), W (t)) = d_{1} L_{P} (U (t), V (t)) [\dot{U} (t)] + d_{2} L_{P} (U (t), V (t)) [\dot{V} (t)] .

(16)

If we write

Q = e_{P} (U)

and

W =^{e} U_{P}^{e_{P} (U)} V

, then we have

d_{2} L_{P} (U, V) [H_{2}] = {\frac{d}{d t} L_{P} (U, V + t H_{2})|}_{t = 0} = {\frac{d}{d t} L (Q, W + t^{e} U_{P}^{Q} H_{2})|}_{t = 0} = d_{2} L (Q, W) [^{e} U_{P}^{Q} H_{2}],

(17)

where

d_{2} L

is the fiber derivative of L. As

\dot{U} (t) =^{e} U_{Q (t)}^{P} \overset{⋆}{Q} (t)

and

^{e} U_{P}^{e_{P} (U (t))} \dot{V} (t) = \overset{⋆}{W} (t)

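, where \overset{⋆}{W} (t) = \dot{W} (t) - 𝔼_{Q (t)} [\dot{W} (t)] denotes the last component of the velocity \overset{⋆}{γ} (t), it follows from Equations (16) and (17) that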

\frac{d}{d t} L (Q (t), W (t)) = d_{1} L_{P} (U (t), V (t)) [^{e} U_{Q (t)}^{P} \overset{⋆}{Q} (t)] + d_{2} L (Q (t), W (t)) [\overset{⋆}{W} (t)] .

In the equation above, the first term on the RHS does not depend on P because the LHS and the second term of the RHS do not depend on P. Hence, we define the first partial derivative of the Lagrangian function to be

d_{1} L (Q, W) [H_{1}] = d_{1} L_{P} (U, V) [^{e} U_{e_{P} (U)}^{P} H_{1}], H_{1} \in S_{Q} ε (μ),

(18)

so that the derivative of L along

γ

becomes

\frac{d}{d t} L (Q (t), W (t)) = d_{1} L (Q (t), W (t)) [\overset{⋆}{Q} (t)] + d_{2} L (Q (t), W (t)) [\overset{⋆}{W} (t)] .

(19)

In particular, if

W (t) = \overset{⋆}{Q} (t)

, then

\frac{d}{d t} L (Q (t), \overset{⋆}{Q} (t)) = d_{1} L (Q (t), \overset{⋆}{Q} (t)) [\overset{⋆}{Q} (t)] + d_{2} L (Q (t), \overset{⋆}{Q} (t)) [\overset{* *}{Q} (t)],

see Equation (5).

Example 4 (Running Example 2).

With the Lagrangian of Equation (15), we have

\begin{matrix} L_{P} (U, V) = \frac{1}{2} {〈^{e} U_{P}^{e_{P} (U)} V,^{e} U_{P}^{e_{P} (U)} V〉}_{e_{P} (U)} - κ 𝔼_{e_{P} (U)} [U - K_{P} (U) + log P] = \\ \frac{1}{2} d^{2} K_{P} (U) [V, V] + κ (K_{P} (U) - d K_{P} (U) [U + log P + ℋ (P)] + ℋ (P)), \end{matrix}

see Equations (9) and (11). The first partial derivative is

\begin{matrix} d_{1} L_{P} (U, V) [H_{1}] = \\ \frac{1}{2} d^{3} K_{P} (U) [V, V, H_{1}] + κ (d K_{P} (U) [H_{1}] - d^{2} K_{P} (U) [U + log P + ℋ (P), H_{1}] - d K_{P} (U) [H_{1}]) = \\ \frac{1}{2} d^{3} K_{P} (U) [V, V, H_{1}] - κ d^{2} K_{P} (U) [U + log P + ℋ (P), H_{1}] = \\ \frac{1}{2} 𝔼_{Q} [W^{2}^{e} U_{P}^{e_{P} (U)} H_{1}] - κ 𝔼_{Q} [(log Q + ℋ (Q))^{e} U_{P}^{e_{P} (U)} H_{1}] = \\ 𝔼_{Q} [(\frac{1}{2} (W^{2} - 𝔼_{Q} [W^{2}]) - κ (log Q + ℋ (Q)))^{e} U_{P}^{e_{P} (U)} H_{1}], \end{matrix}

where we have used Equations (9) and (10) together with

^{e} U_{P}^{e_{P} (U)} (U + log P + ℋ (P)) = log Q + ℋ (Q)

.

We have found that

d_{1} L (Q, W) [H_{1}] = {〈\frac{1}{2} (W^{2} - 𝔼_{Q} [W^{2}]) - κ (log Q + ℋ (Q)), H_{1}〉}_{Q}, H_{1} \in S_{Q} ε (μ),

(20)

and also

d_{1} L (Q (t), \overset{⋆}{Q} (t)) [\overset{⋆}{Q} (t)] = {〈\frac{1}{2} ({\overset{⋆}{Q} (t)}^{2} - 𝔼_{Q} [{\overset{⋆}{Q} (t)}^{2}]) - κ (log Q + ℋ (Q)), \overset{⋆}{Q} (t)〉}_{Q} .

Using the fiber derivative computed in the first part of the running example, we find

\frac{d}{d t} L (Q (t), \overset{⋆}{Q} (t)) = {〈\frac{1}{2} ({\overset{⋆}{Q} (t)}^{2} - 𝔼_{Q} [{\overset{⋆}{Q} (t)}^{2}]) - κ (log Q + ℋ (Q)), \overset{⋆}{Q} (t)〉}_{Q} + {〈\overset{⋆}{Q} (t), \overset{* *}{Q} (t)〉}_{Q} .

Notice that, by Equation (12), the term

- κ (log Q + ℋ (Q))

in the equations above equals

κ grad ℋ (Q)

.
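Equation (20) can be checked against the definition in Equation (18) by differentiating the chart expression of the Lagrangian. The sketch below is an illustration with hypothetical data and a hypothetical constant `KAPPA`; the direction in the fiber at Q is first transported to the fiber at P, and the chart expression is then differentiated by a central difference.

```python
import math

KAPPA = 0.5   # hypothetical value of the constant kappa

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def center(Q, f):
    m = expect(Q, f)
    return [fx - m for fx in f]

def e(P, U):
    k = math.log(expect(P, [math.exp(u) for u in U]))
    return [math.exp(u - k) * p for u, p in zip(U, P)]

def entropy(Q):
    return -expect(Q, [math.log(q) for q in Q])

def lagrangian(Q, W):
    return 0.5 * expect(Q, [w * w for w in W]) + KAPPA * entropy(Q)

def lagrangian_chart(P, U, V):
    """L_P(U, V) = L(e_P(U), V - E_{e_P(U)}[V])."""
    Q = e(P, U)
    return lagrangian(Q, center(Q, V))

P = [0.4, 0.8, 1.2, 1.6]
U = center(P, [0.3, -0.2, 0.1, 0.4])
V = center(P, [1.0, 0.0, -1.0, 2.0])
Q = e(P, U)
W = center(Q, V)

H1 = center(Q, [0.5, 0.5, 3.0, -1.0])   # direction in the fiber at Q
K1 = center(P, H1)                      # its transport to the fiber at P
h = 1e-5

U_plus = [u + h * k for u, k in zip(U, K1)]
U_minus = [u - h * k for u, k in zip(U, K1)]
d1L_fd = (lagrangian_chart(P, U_plus, V) - lagrangian_chart(P, U_minus, V)) / (2 * h)

# Closed form of Equation (20).
W2 = [w * w for w in W]
m, ent = expect(Q, W2), entropy(Q)
field = [0.5 * (w2 - m) - KAPPA * (math.log(q) + ent) for w2, q in zip(W2, Q)]
d1L_formula = expect(Q, [f * x for f, x in zip(field, H1)])
```

Repeating the computation with a different reference density `P` leaves `d1L_formula` unchanged, in line with the chart independence argued in the text.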

5. Action Integral

If

[0, 1] ∋ t \mapsto Q (t)

is a smooth curve in the exponential manifold, then the action integral

A (Q) = \int_{0}^{1} L (Q (t), \overset{⋆}{Q} (t)) d t

is well defined. We consider the expression of Q in the chart centered at P,

Q (t) = e^{U (t) - K_{P} (U (t))} \times P

.

Given

φ \in C^{1} ([0, 1])

with

φ (0) = φ (1) = 0

, for each

δ \in R

and

H \in S_{P} ε (μ)

, we define the perturbed curve

Q_{δ} (t) = e^{(U (t) + δ φ (t) H) - K_{P} (U (t) + δ φ (t) H)} \times P .

We have

Q_{δ} (0) = Q (0)

,

Q_{δ} (1) = Q (1)

, and

{\overset{⋆}{Q}}_{δ} (t) = \dot{U} (t) + δ \dot{φ} (t) H - 𝔼_{Q_{δ} (t)} [\dot{U} (t) + δ \dot{φ} (t) H],

whose expression in the chart centered at P is

\dot{U} (t) + δ \dot{φ} (t) H

.

Let us consider the variation in

δ

of the action integral. We apply Equation (19) to the smooth curve in

S ε (μ)

given by

δ \mapsto (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)),

where t is fixed. As

\frac{d}{d δ} log Q_{δ} (t) = \frac{d}{d δ} (U (t) + δ φ (t) H) - 𝔼_{Q_{δ} (t)} [\frac{d}{d δ} (U (t) + δ φ (t) H)] = φ (t) (H - 𝔼_{Q_{δ} (t)} [H])

and

^{e} U_{P}^{Q_{δ} (t)} \frac{d}{d δ} (\dot{U} (t) + δ \dot{φ} (t) H) = \dot{φ} (t) (H - 𝔼_{Q_{δ} (t)} [H]),

we obtain

\begin{matrix} \frac{d}{d δ} A (Q_{δ}) = \int_{0}^{1} \frac{d}{d δ} L (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)) d t = \\ \int_{0}^{1} (φ (t) d_{1} L (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)) [H - 𝔼_{Q_{δ} (t)} [H]] + \dot{φ} (t) d_{2} L (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)) [H - 𝔼_{Q_{δ} (t)} [H]]) d t = \\ \int_{0}^{1} φ (t) (d_{1} L (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)) [H - 𝔼_{Q_{δ} (t)} [H]] - \frac{d}{d t} d_{2} L (Q_{δ} (t), {\overset{⋆}{Q}}_{δ} (t)) [H - 𝔼_{Q_{δ} (t)} [H]]) d t . \end{matrix}
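The last equality above is an integration by parts in t. As a short worked step, writing g_{δ} (t) for the fiber-derivative term (a shorthand introduced only for this remark), the boundary contribution vanishes because φ (0) = φ (1) = 0:

```latex
\begin{aligned}
g_{\delta}(t) &= d_{2} L\big(Q_{\delta}(t), \overset{\star}{Q}_{\delta}(t)\big)
  \big[H - \mathbb{E}_{Q_{\delta}(t)}[H]\big], \\
\int_{0}^{1} \dot{\varphi}(t)\, g_{\delta}(t)\, dt
  &= \big[\varphi(t)\, g_{\delta}(t)\big]_{t=0}^{t=1}
   - \int_{0}^{1} \varphi(t)\, \frac{d}{dt} g_{\delta}(t)\, dt
   = - \int_{0}^{1} \varphi(t)\, \frac{d}{dt} g_{\delta}(t)\, dt .
\end{aligned}
```

The time derivative acts on the whole of g_{δ} (t), including the centering H - 𝔼_{Q_{δ} (t)} [H], which depends on t through Q_{δ} (t).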

If

t \mapsto Q (t)

is a critical curve of the action integral, then

{\frac{d}{d δ} A (Q_{δ})|}_{δ = 0} = 0

; hence, for all

φ

and H, we have

\int_{0}^{1} φ (t) (d_{1} L (Q (t), \overset{⋆}{Q} (t)) [H - 𝔼_{Q (t)} [H]] - \frac{d}{d t} d_{2} L (Q (t), \overset{⋆}{Q} (t)) [H - 𝔼_{Q (t)} [H]]) d t = 0 .

(21)

This in turn implies that for each

t \in [0, 1]

and

H \in S_{Q (t)} ε (μ)

, the Euler–Lagrange equation holds:

d_{1} L (Q (t), \overset{⋆}{Q} (t)) [H] - \frac{d}{d t} d_{2} L (Q (t), \overset{⋆}{Q} (t)) [H] = 0 .

(22)

Example 5 (Running Example 3).

For the Lagrangian of Equation (15), we can use Equation (20) in the form

\begin{matrix} d_{1} L (Q (t), \overset{⋆}{Q} (t)) [H - 𝔼_{Q (t)} [H]] = \\ {〈\frac{1}{2} (\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}]) - κ (log (Q (t)) + ℋ (Q (t))), H - 𝔼_{Q (t)} [H]〉}_{Q (t)}, \end{matrix}

with

H \in S_{P} ε (μ)

. For the other term, we have

d_{2} L (Q (t), \overset{⋆}{Q} (t)) [H - 𝔼_{Q (t)} [H]] = {〈\overset{⋆}{Q} (t), H - 𝔼_{Q (t)} [H]〉}_{Q (t)} = d^{2} K_{P} (U (t)) [\dot{U} (t), H],

whose derivative is

\begin{matrix} \frac{d}{d t} d^{2} K_{P} (U (t)) [\dot{U} (t), H] = d^{3} K_{P} (U (t)) [\dot{U} (t), \dot{U} (t), H] + d^{2} K_{P} (U (t)) [\ddot{U} (t), H] = \\ 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2} (H - 𝔼_{Q (t)} [H])] + 𝔼_{Q (t)} [\overset{* *}{Q} (t) (H - 𝔼_{Q (t)} [H])] = \\ 𝔼_{Q (t)} [(\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}]) (H - 𝔼_{Q (t)} [H])] + 𝔼_{Q (t)} [\overset{* *}{Q} (t) (H - 𝔼_{Q (t)} [H])] . \end{matrix}

Dropping the generic H, the Euler–Lagrange equation becomes

\overset{* *}{Q} (t) + (\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}]) = \frac{1}{2} (\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}]) - κ (log (Q (t)) + ℋ (Q (t)));

that is,

\overset{* *}{Q} (t) + \frac{1}{2} (\overset{⋆}{Q} {(t)}^{2} - 𝔼_{Q (t)} [\overset{⋆}{Q} {(t)}^{2}]) = - κ (log (Q (t)) + ℋ (Q (t))) .

The equation above has been derived using the exponential affine geometry of the statistical bundle and involves

\overset{* *}{Q} (t)

. However, by using Equations (5), (6), and (12), we find the equivalent form

^{0} D^{2} Q (t) = κ grad ℋ (Q (t)) .
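The Euler–Lagrange equation of the running example can be explored numerically. The following is a rough sketch, not a method from the paper: it rewrites the equation as a second-order system for the density Q (t), with \ddot{Q} / Q equal to one half of the centered squared score minus κ (log Q + ℋ (Q)), and integrates it with a classical fourth-order Runge–Kutta step. The constant `KAPPA`, the initial data, and the step size are hypothetical. As in classical mechanics, one expects the total energy, kinetic energy minus κ ℋ (Q), to be conserved along solutions, and the sketch monitors its drift together with the normalization of Q.

```python
import math

KAPPA = 0.5   # hypothetical value of the constant kappa

def expect(P, f):
    return sum(fx * px for fx, px in zip(f, P)) / len(P)

def entropy(Q):
    return -expect(Q, [math.log(q) for q in Q])

def accel(Q, Qd):
    """Second derivative of Q implied by the Euler-Lagrange equation."""
    S2 = [(d / q) ** 2 for d, q in zip(Qd, Q)]   # squared score
    m, ent = expect(Q, S2), entropy(Q)
    return [q * (0.5 * (s2 - m) - KAPPA * (math.log(q) + ent))
            for q, s2 in zip(Q, S2)]

def energy(Q, Qd):
    """Kinetic energy plus the potential energy -kappa * H(Q)."""
    S2 = [(d / q) ** 2 for d, q in zip(Qd, Q)]
    return 0.5 * expect(Q, S2) - KAPPA * entropy(Q)

def add(x, y, a):
    """Componentwise x + a * y."""
    return [xi + a * yi for xi, yi in zip(x, y)]

def rk4_step(Q, Qd, dt):
    """One classical Runge-Kutta step for the system (Q, Qd)' = (Qd, accel)."""
    k1q, k1v = Qd, accel(Q, Qd)
    k2q = add(Qd, k1v, dt / 2)
    k2v = accel(add(Q, k1q, dt / 2), k2q)
    k3q = add(Qd, k2v, dt / 2)
    k3v = accel(add(Q, k2q, dt / 2), k3q)
    k4q = add(Qd, k3v, dt)
    k4v = accel(add(Q, k3q, dt), k4q)
    Q_new = [q + dt / 6 * (a + 2 * b + 2 * c + d)
             for q, a, b, c, d in zip(Q, k1q, k2q, k3q, k4q)]
    Qd_new = [v + dt / 6 * (a + 2 * b + 2 * c + d)
              for v, a, b, c, d in zip(Qd, k1v, k2v, k3v, k4v)]
    return Q_new, Qd_new

# Initial density and a centered initial score, scaled to keep Q positive on [0, 1].
Q = [0.4, 0.8, 1.2, 1.6]
raw = [1.0, -1.0, 0.5, 0.0]
m0 = expect(Q, raw)
Qd = [q * 0.3 * (r - m0) for q, r in zip(Q, raw)]

E0 = energy(Q, Qd)
for _ in range(1000):
    Q, Qd = rk4_step(Q, Qd, 1e-3)

drift = energy(Q, Qd) - E0    # energy drift over the time interval [0, 1]
mass = sum(Q) / len(Q)        # E_mu[Q], which should remain equal to 1
```

The scheme does not enforce positivity of Q, so the sketch is only meaningful for initial data and time horizons for which the solution stays in the interior of the simplex.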

6. Discussion

We have shown that the research program consisting of applying concepts taken from Classical Mechanics to Statistics makes sense, even if no practical application has been produced in this paper. Some simple examples have been discussed in order to show clearly that the language from classical mechanics is indeed suggestive when applied to typical concepts in Statistics such as Fisher score and statistical entropy. The derivation of the Euler–Lagrange equations is classically done in the set-up of the Riemannian geometry, while here we have used the affine structure of Information Geometry. The present provisional results prompt a generalization to non-finite sample spaces and the development of applications. Finally, the related Hamiltonian formalism remains to be investigated.

Acknowledgments

The Author gratefully thanks Hiroshi Matsuzoe (Nagoya Institute of Technology, Japan), Lamberto Rondoni (Politecnico di Torino, Italy), Antonio Scarfone (CNR and Politecnico di Torino, Italy), Tatsuaki Wada (Ibaraki University, Japan), for their interesting comments on early versions of this piece of research. He thanks two anonymous referees for their useful and enlightening comments. He acknowledges the support of de Castro Statistics, Collegio Carlo Alberto, and of GNAMPA-INdAM.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Arnold, V.I. Mathematical Methods of Classical Mechanics, 2nd ed.; Graduate Texts in Mathematics; Springer: New York, NY, USA, 1989; Volume 60, p. xvi+516.
2. Abraham, R.; Marsden, J.E. Foundations of Mechanics, 2nd ed.; Advanced Book Program, Reading, Mass; Benjamin/Cummings Publishing Co., Inc.: San Francisco, CA, USA, 1978; pp. xxii+m–xvi+806.
3. Marsden, J.E.; Ratiu, T.S. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, 2nd ed.; Texts in Applied Mathematics; Springer: New York, NY, USA, 1999; Volume 17, p. xviii+582.
4. Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000; p. x+206.
5. Leok, M.; Zhang, J. Connecting Information Geometry and Geometric Mechanics. Entropy 2017, 19, 518.
6. Pistone, G. Nonparametric information geometry. In Geometric Science of Information, Proceedings of the First International Conference, GSI 2013, Paris, France, 28–30 August 2013; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science; Springer: Heidelberg, Germany, 2013; Volume 8085, pp. 5–36.
7. Lang, S. Differential and Riemannian Manifolds, 3rd ed.; Graduate Texts in Mathematics; Springer: Berlin, Germany, 1995; Volume 160, p. xiv+364.
8. Pistone, G. Examples of the application of nonparametric information geometry to statistical physics. Entropy 2013, 15, 4042–4065.
9. Lods, B.; Pistone, G. Information Geometry Formalism for the Spatially Homogeneous Boltzmann Equation. Entropy 2015, 17, 4323–4363.
10. Pistone, G.; Rogantin, M. The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli 1999, 5, 721–760.
11. Efron, B.; Hastie, T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science; Cambridge University Press: New York, NY, USA, 2016; Volume 5, p. xix+475.
12. Gibilisco, P.; Pistone, G. Connections on non-parametric statistical manifolds by Orlicz space geometry. IDAQP 1998, 1, 325–347.
13. Landau, L.D.; Lifshits, E.M. Course of Theoretical Physics. Statistical Physics, 3rd ed.; Butterworth-Heinemann: Oxford, UK, 1980; Volume 5.
14. Shima, H. The Geometry of Hessian Structures; World Scientific Publishing Co. Pte. Ltd.: Hackensack, NJ, USA, 2007; p. xiv+246.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
