Connecting Information Geometry and Geometric Mechanics

Leok, Melvin; Zhang, Jun

doi:10.3390/e19100518

by Melvin Leok ¹ and Jun Zhang ²,*

¹ Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA

² Departments of Psychology and Mathematics, University of Michigan, Ann Arbor, MI 48109, USA

* Author to whom correspondence should be addressed.

Entropy 2017, 19(10), 518; https://doi.org/10.3390/e19100518

Submission received: 13 July 2017 / Revised: 10 September 2017 / Accepted: 20 September 2017 / Published: 27 September 2017

(This article belongs to the Special Issue Information Geometry II)


Abstract:

The divergence function in information geometry and the discrete Lagrangian in discrete geometric mechanics each induce a differential geometric structure on the product manifold

Q \times Q

. We aim to investigate the relationship between these two objects, and the fundamental role that duality, in the form of Legendre transforms, plays in both fields. By establishing an analogy between these two approaches, we will show how a fruitful cross-fertilization of techniques may arise from switching formulations based on the cotangent bundle

T^{*} Q

(as in geometric mechanics) and the tangent bundle

T Q

(as in information geometry). In particular, we establish, through variational error analysis, that the divergence function agrees with the exact discrete Lagrangian up to third order if and only if Q is a Hessian manifold.

Keywords:

Lagrangian; Hamiltonian; Legendre map; symplectic form; divergence function; generating function; Hessian manifold

1. Introduction

Information geometry and geometric mechanics each induce geometric structures on an arbitrary manifold Q, and we investigate the relationship between these two approaches. More specifically, we study the interaction of three objects:

T Q

, the tangent bundle on which a Lagrangian function

L (q, v)

is defined;

T^{*} Q

, the cotangent bundle on which a Hamiltonian function

H (q, p)

is defined; and

Q \times Q

, the product manifold on which the divergence function (from the information geometric perspective) or the Type I generating function (from the geometric mechanics perspective) is defined. In discrete mechanics, while the

Q \times Q ⟷ T^{*} Q

correspondence is via a symplectomorphism given by the time-h flow map associated with the Hamiltonian

H (q, p)

, and the

Q \times Q ⟷ T Q

correspondence is via the map relating the boundary-value and initial-value formulation of the Euler–Lagrange flow, it is the correspondence between

T Q ⟷ T^{*} Q

through the fiberwise Legendre map based on L or H that actually serves to couple the Hamiltonian flow with Lagrangian flow, leading to one and the same dynamics. We propose a decoupling of the Lagrangian and Hamiltonian dynamics through the use of a divergence function

D (q, v, p)

defined on the Pontryagin bundle

T Q \oplus T^{*} Q

that measures the discrepancy (or duality gap) between

H (q, p)

and

L (q, v)

. We also establish, through variational error analysis, that the divergence function agrees with the exact discrete Lagrangian up to third order if and only if Q is a Hessian manifold.

Geometric mechanics [1] investigates the equations of motion using the Lagrangian, Hamiltonian, and Hamilton–Jacobi formulations of classical Newtonian mechanics. Two apparently different principles are used in those formulations: the principle of conservation (energy, momentum, etc.) leading to Hamiltonian dynamics and the principle of variation (least action) leading to Lagrangian dynamics. The conservation properties of the Hamiltonian approach are stated with respect to the underlying symplectic geometry on the cotangent bundle, whereas the variational principles that result in the Euler–Lagrange equations of motion and the Hamilton–Jacobi equations reflect the geometry of the semispray on the tangent bundle. Lagrangian and Hamiltonian mechanics are two sides of the same coin. That they describe identical dynamics on the configuration space (base manifold) is both remarkable and to be expected from their construction, because the Lagrangian and Hamiltonian are dual to each other and related via the Legendre transform.

Information geometry [2,3], in the broadest sense of the term, provides a dualistic Riemannian geometric structure that is induced by a class of functions called divergence functions, which essentially provide a method of smoothly measuring a directed distance between any two points on the manifold, where the manifold is the space of probability densities. It arises in various branches of information science, including statistical inference, machine learning, neural computation, information theory, and optimization and control. Various geometric structures can be induced from divergence functions, including a metric, affine connections, and a symplectic structure; these are reviewed in [4]. Convex duality and the Legendre transform play a key role in both constructing the divergence function and characterizing the various dualities encoded by information geometry [5,6].

Given that geometric mechanics and information geometry both prescribe dualistic geometric structures on a manifold, it is interesting to explore the extent to which these two frameworks are related. In geometric mechanics, the Legendre transform provides a link between the Hamiltonian function that is defined on the cotangent bundle

T^{*} Q

, with the Lagrangian function that is defined on the tangent bundle

T Q

, whereas in information geometry, it provides a link between the biorthogonal coordinates of the base manifold Q if it is dually-flat and exhibits the Hessian geometry. To understand their deep relationship, it turns out that we need to resort to the discrete formulation of geometric mechanics, and investigate the product manifold

Q \times Q

. The basic tenet of discrete geometric mechanics is to preserve the fact that Hamiltonian flow is a symplectomorphism, and construct discrete time maps that are symplectic. This results in two ways of viewing discrete-time mechanics, either as maps on

Q \times Q

or

T^{*} Q

, which are related via discrete Legendre transforms. The shift in focus from

T^{*} Q

to

Q \times Q

precisely lends itself to establishing a connection to information geometry, as the divergence function is naturally defined on

Q \times Q

, and in both information geometry and discrete geometric mechanics, induces a symplectic structure on

Q \times Q

. This is the basic observation that connects geometric mechanics and information geometry, and we will explore the implications of this connection in the paper.

Our paper is organized as follows. Section 2 provides a contemporary viewpoint of geometric mechanics, with Lagrangian and Hamiltonian systems discussed in parallel with one another, in terms of geometry on

T Q

and

T^{*} Q

, respectively, including a discussion of Dirac mechanics on the Pontryagin bundle

T Q \oplus T^{*} Q

, which provides a unified treatment of Lagrangian and Hamiltonian mechanics. Section 3 summarizes the results of the discrete formulation of geometric mechanics, which is naturally defined on the product manifold

Q \times Q

. Section 4 is a review of now-classical information geometry, including the Riemannian metric and affine connections on

T Q

, and the manner in which the divergence function naturally induces dualistic Riemannian structures. The special cases of Hessian geometry and biorthogonal coordinates are highlighted, showing how the Legendre transform is essential for characterizing dually-flat spaces. Section 5 starts with a presentation of the symplectic structure on

Q \times Q

induced by a divergence function, which is naturally identified with the Type I generating function on it. We follow up by investigating its transformation into a Type II generating function (which plays a key role in discrete Hamiltonian mechanics). We then propose to decouple the discrete Hamiltonian and Lagrangian dynamics by using the divergence function to measure their duality gap. Finally, we perform variational error analysis to show that on a dually-flat Hessian manifold, the Bregman divergence is third-order accurate with respect to the exact discrete Lagrangian. Section 6 closes with a summary and discussion.

2. A Review of Geometric Mechanics

Consider an n-dimensional configuration manifold Q, with local coordinates

(q^{1}, \dots, q^{n})

. The Lagrangian formulation of mechanics is defined on the tangent bundle

T Q

, in terms of a Lagrangian

L : T Q \to R

. From this, one can construct an action integral

S

which is a functional of the curve

q : [t_{1}, t_{2}] \to Q

, given by

S (q) = \int_{t_{1}}^{t_{2}} L (q (t), \dot{q} (t)) d t .

Then, Hamilton’s variational principle states that,

δ S (q) = 0,

(1)

where the variation

δ S

is induced by an infinitesimal variation

δ q

of the trajectory q, subject to the condition that the variations vanish at the endpoints, i.e.,

δ q (t_{1}) = δ q (t_{2}) = 0

. Applying standard results from the calculus of variations, we obtain the following Euler–Lagrange equations of motion,

\frac{d}{d t} \frac{\partial L}{\partial {\dot{q}}^{i}} - \frac{\partial L}{\partial q^{i}} = 0 .

(2)

The Hamiltonian formulation of mechanics is defined on the cotangent bundle

T^{*} Q

, and the fiberwise Legendre transform,

F L : T Q \to T^{*} Q

, relates the tangent bundle and the cotangent bundle as follows,

(q^{i}, {\dot{q}}^{i}) \mapsto (q^{i}, p_{i}) = (q^{i}, \frac{\partial L}{\partial {\dot{q}}^{i}}),

where

p_{i}

is the conjugate momentum to

q^{i}

:

p = \frac{\partial L}{\partial \dot{q}} .

(3)

The term fiberwise is used to emphasize the fact that

F L

establishes a pointwise correspondence between

T_{q} Q

and

T_{q}^{*} Q

for any point q on Q. The cotangent bundle forms the phase space, on which one can define a Hamiltonian

H : T^{*} Q \to R

,

H (q, p) = 〈 p, \dot{q} (q, p) 〉 - L (q, \dot{q} (q, p)),

where

\dot{q}

is viewed as a function of

(q, p)

by inverting the Legendre transform (3), and

{〈 p, v 〉}_{q} = \sum_{i} p_{i} v^{i}

denotes the duality or natural pairing between a vector v and covector p at the point

q \in Q

. A straightforward calculation shows that

\frac{\partial H}{\partial p_{i}} = {\dot{q}}^{i},

(4)

and

\frac{\partial H}{\partial q^{i}} = - \frac{\partial L}{\partial q^{i}} .

From these, we transform the Euler–Lagrange equations into Hamilton’s equations,

\frac{d q^{i}}{d t} = \frac{\partial H}{\partial p_{i}}, \frac{d p_{i}}{d t} = - \frac{\partial H}{\partial q^{i}} .

(5)
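The passage from L to H can be checked numerically. The following is a minimal sketch, assuming a one-dimensional Lagrangian of the form L(q, v) = M v²/2 − V(q) with a quadratic potential; the constant `M`, the potential, and all function names are illustrative choices, not taken from the paper. It inverts the Legendre transform (3) by bisection, builds H(q, p) = 〈p, v〉 − L(q, v), and compares the result with Equation (4) and with the relation between the q-derivatives of H and L.

```python
# Sketch: Legendre transform of an assumed 1-D Lagrangian L(q, v) = M v^2 / 2 - V(q).
# M, potential(), and every function name here are illustrative, not from the paper.

M = 2.0  # assumed constant mass


def potential(q):
    return 0.5 * q * q  # assumed quadratic potential V(q)


def lagrangian(q, v):
    return 0.5 * M * v * v - potential(q)


def d_first(f, a, b, eps=1e-6):
    """Central difference of f(a, b) with respect to its first argument."""
    return (f(a + eps, b) - f(a - eps, b)) / (2.0 * eps)


def d_second(f, a, b, eps=1e-6):
    """Central difference of f(a, b) with respect to its second argument."""
    return (f(a, b + eps) - f(a, b - eps)) / (2.0 * eps)


def velocity_from_momentum(q, p, lo=-100.0, hi=100.0, tol=1e-12):
    """Invert p = dL/dv by bisection; dL/dv increases in v because L is convex in v."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_second(lagrangian, q, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hamiltonian(q, p):
    """H(q, p) = <p, v(q, p)> - L(q, v(q, p))."""
    v = velocity_from_momentum(q, p)
    return p * v - lagrangian(q, v)


q, p = 0.3, 1.5
v = velocity_from_momentum(q, p)
print(hamiltonian(q, p), p * p / (2.0 * M) + potential(q))     # numerical vs closed form
print(d_second(hamiltonian, q, p), v)                          # dH/dp vs qdot, Equation (4)
print(d_first(hamiltonian, q, p), -d_first(lagrangian, q, v))  # dH/dq vs -dL/dq
```

Bisection is used only because the fiber derivative is monotone for a Lagrangian that is strictly convex in v; for a degenerate Lagrangian the inversion step would fail, which is the situation that motivates the Dirac formulation discussed later in this section.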

The canonical symplectic form

ω_{c a n}

on

T^{*} Q

can be identified with a quadratic form induced by the skew-symmetric matrix J, i.e.,

ω_{c a n} (v, w) = v^{T} J w

. With that identification, Hamilton’s equations can be expressed as,

[\begin{matrix} \dot{q} \\ \dot{p} \end{matrix}] = [\begin{matrix} 0 & I \\ - I & 0 \end{matrix}] [\begin{matrix} \frac{\partial H}{\partial q} \\ \frac{\partial H}{\partial p} \end{matrix}] = J [\begin{matrix} \frac{\partial H}{\partial q} \\ \frac{\partial H}{\partial p} \end{matrix}] .

Alternatively, Hamilton’s equations (5) can be derived using Hamilton’s phase space variational principle, which states that,

δ \int_{t_{1}}^{t_{2}} [〈 p, \dot{q} 〉 - H (q, p)] d t = 0,

for infinitesimal variations

δ q

that vanish at the endpoints. The infinitesimal variation of the integral is computed by differentiating under the integral, integrating by parts, and using the fact that the infinitesimal variations

δ q

vanish at the endpoints, which yields:

\int_{t_{1}}^{t_{2}} (〈 δ p, \dot{q} 〉 + 〈 p, δ \dot{q} 〉 - \frac{\partial H}{\partial q} δ q - \frac{\partial H}{\partial p} δ p) d t = \int_{t_{1}}^{t_{2}} [(- \dot{p} - \frac{\partial H}{\partial q}) δ q + (\dot{q} - \frac{\partial H}{\partial p}) δ p] d t,

and by the fundamental lemma of the calculus of variations, which implies that the integral vanishes for all admissible variations only when the terms in the parentheses multiplying the independent variations

δ q

and

δ p

vanish, we recover Hamilton’s equations (5).

Lagrangian and Hamiltonian mechanics are typically viewed as different representations of the same dynamical system, with the Legendre transform relating the two formulations. Here, the Legendre transform

F L

(with

F H

as its inverse) refers to both the map relating two sets of variables,

(q, \dot{q})

with

(q, p)

, as well as the relationship between two functions, the Lagrangian

L (q, \dot{q})

and the Hamiltonian

H (q, p)

. The Legendre transform links pairs of convex conjugate functions; in classical mechanics, the Lagrangian L and Hamiltonian H are always related in this sense of forming a convex pair. The requirement that

L (q, \dot{q})

be strictly convex in the variable

\dot{q}

is referred to as hyperregularity. When the Lagrangian is positively homogeneous of degree one in the velocity (and therefore singular), the Legendre transform yields a Hamiltonian function that is identically zero. In such cases, the Hamiltonian analogue of the Lagrangian system does not exist, which is problematic in the context of analytic mechanics. To address such degeneracy, it is necessary to consider Dirac mechanics on Dirac manifolds, which is a simultaneous generalization of Lagrangian and Hamiltonian mechanics.

In geometric mechanics, including the contemporary Dirac formulation, the Lagrangian L and Hamiltonian H are always coupled via the fiberwise Legendre transform

F L

. In information geometry, it is a well-known fact that one can construct the divergence function (to be defined later), which captures the departure from such perfect coupling. In other words, we can view Lagrangian and Hamiltonian systems as two separate systems, which are endowed with their own dynamics and are in some sense dual to each other, and we then use the divergence function to measure their duality gap. For this reason, we will review the Lagrangian and Hamiltonian formulation of mechanics in terms of

L (q, \dot{q})

and

H (q, p)

, respectively, without necessarily assuming that the Lagrangian and Hamiltonian are related by the Legendre transform.

2.1. Lagrangian Mechanics as an Extremization System on $T Q$

As noted previously, the Euler–Lagrange equations (2) arise from the stationarity conditions that describe the extremal curves of the action integral, over the class of varied curves that fix the endpoints. Carrying out the differentiation in (2) explicitly yields,

\sum_{j} \frac{\partial^{2} L}{\partial v^{i} \partial v^{j}} \frac{d v^{j}}{d t} + \sum_{j} v^{j} \frac{\partial^{2} L}{\partial q^{j} \partial v^{i}} - \frac{\partial L}{\partial q^{i}} = 0 .

(6)

The fundamental tensor

g_{i j}

associated with the Lagrangian

L (q, v)

is given by,

g_{i j} (q, v) = \frac{\partial^{2} L (q, v)}{\partial v^{i} \partial v^{j}},

which is assumed to be positive-definite, i.e., the Lagrangian L is hyperregular. Let

g^{i l}

denote the matrix inverse of

g_{i j}

, then (6) can be written as

\frac{d^{2} q^{i}}{d t^{2}} + 2 G^{i} (q, \frac{d q}{d t}) = 0,

(7)

where

G^{i} (q, v) = \sum_{l} \frac{g^{i l}}{2} (\sum_{k} \frac{\partial^{2} L}{\partial q^{k} \partial v^{l}} v^{k} - \frac{\partial L}{\partial q^{l}}) .

So, Equation (7) with the above

G^{i}

is the system of Euler–Lagrange equations in disguise, and its solutions are extremal curves of the action integral.
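The formula for the spray coefficient can be evaluated directly from a Lagrangian. The sketch below is a hedged illustration in one dimension, assuming a position-dependent mass m(q) = 1 + q² and a quadratic potential (both hypothetical choices); it approximates the derivatives of L by finite differences, assembles G, and compares it with the closed form obtained by hand for this particular L.

```python
# Sketch: 1-D spray coefficient G(q, v) for an assumed Lagrangian
# L(q, v) = m(q) v^2 / 2 - V(q). mass() and potential() are illustrative choices.


def mass(q):
    return 1.0 + q * q  # assumed position-dependent mass


def potential(q):
    return 0.5 * q * q  # assumed potential


def lagrangian(q, v):
    return 0.5 * mass(q) * v * v - potential(q)


def spray_coefficient(L, q, v, eps=1e-4):
    """G = (1 / (2 g)) * (d2L/dq dv * v - dL/dq), with g = d2L/dv2 (1-D case)."""
    g = (L(q, v + eps) - 2.0 * L(q, v) + L(q, v - eps)) / eps**2
    mixed = (L(q + eps, v + eps) - L(q + eps, v - eps)
             - L(q - eps, v + eps) + L(q - eps, v - eps)) / (4.0 * eps**2)
    dL_dq = (L(q + eps, v) - L(q - eps, v)) / (2.0 * eps)
    return (mixed * v - dL_dq) / (2.0 * g)


def spray_coefficient_closed_form(q, v):
    """Hand computation for this L: G = (m'(q) v^2 / 2 + V'(q)) / (2 m(q))."""
    return (q * v * v + q) / (2.0 * mass(q))


q, v = 0.5, 2.0
G = spray_coefficient(lagrangian, q, v)
print(G, spray_coefficient_closed_form(q, v))
print("acceleration from Equation (7):", -2.0 * G)
```

The acceleration printed at the end is the right-hand side of the second-order system (7) at the chosen point, so the same routine could drive a time integrator for the semispray.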

Recall that a smooth curve on Q can be lifted to a curve on

T Q

in a natural way: a curve

t \mapsto q (t) \in Q

becomes

t \mapsto (q (t), \dot{q} (t)) \in T Q

. Given an arbitrary

G^{i} (q, v)

, the system of equations (7) specifies a family of curves, called a semispray. As seen above, semisprays arise naturally in variational calculus as extremal curves of the action integral associated with a Lagrangian.

Semisprays can be more generally described by a vector field. Recall that a vector field on Q is a section of

T Q

. Now, consider a vector field on the tangent bundle

T Q

; it is a section of the double tangent bundle

T (T Q)

. The integral surfaces of the semispray induce a decomposition of the total space

T (T Q)

into the horizontal subspace

H (T Q)

and the vertical subspace

V (T Q)

, which defines an Ehresmann connection. A vector on

T Q

encodes the second-order derivative of curves on Q, and a semispray defines the following vector field V on

T Q

:

V (q, v) = \sum_{i} (v^{i} {\frac{\partial}{\partial q^{i}}|}_{(q, v)} - 2 G^{i} (q, v) {\frac{\partial}{\partial v^{i}}|}_{(q, v)}),

where the factor

- 2

is there by convention. The integral curve of the semispray satisfies the second-order ordinary differential equation (7), and we say that a semispray is a vector field on the tangent bundle

T Q

which encodes a second-order system of differential equations on the base manifold Q.

A semispray is called a full spray if the spray coefficients

G^{i}

satisfy

G^{i} (q, λ v) = λ^{2} G^{i} (q, v),

for

λ > 0

. In this case, the integral curve remains invariant under reparameterization by a positive number, i.e., it satisfies homogeneity. When the semispray becomes a (full) spray, the Lagrange geometry becomes Finsler geometry, and the fundamental tensor

g_{i j} (q, v)

becomes the Finsler–Riemann metric tensor (which includes the Riemann metric as a special case).

As noted above, a semispray induces an Ehresmann connection on Q and this connection is torsion-free and typically nonlinear. Conversely, given a torsion-free connection, one can construct a semispray. The connection is homogeneous if and only if the semispray is a full spray. Moreover, if the spray is affine, then the connection is affine as well; an affine spray

G^{i}

takes the form

G^{i} = \frac{1}{2} \sum_{j k} Γ_{j k}^{i} (q) v^{j} v^{k},

where

Γ_{j k}^{i}

is referred to as the affine connection.
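As a worked check of the affine case, consider (as an illustrative assumption) the purely kinetic Lagrangian

L (q, v) = \frac{1}{2} \sum_{j k} g_{j k} (q) v^{j} v^{k},

where the symmetric tensor g depends on q only. Its fundamental tensor is g itself, and substituting into the expression for the spray coefficients gives

G^{i} (q, v) = \sum_{l} \frac{g^{i l}}{2} \sum_{j k} (\frac{\partial g_{l j}}{\partial q^{k}} - \frac{1}{2} \frac{\partial g_{j k}}{\partial q^{l}}) v^{j} v^{k} .

Symmetrizing the first term in the indices j and k, which is permitted because it is contracted with the symmetric product of the velocities, yields

G^{i} (q, v) = \frac{1}{2} \sum_{j k} Γ_{j k}^{i} (q) v^{j} v^{k}, Γ_{j k}^{i} = \frac{1}{2} \sum_{l} g^{i l} (\frac{\partial g_{l j}}{\partial q^{k}} + \frac{\partial g_{l k}}{\partial q^{j}} - \frac{\partial g_{j k}}{\partial q^{l}}),

so the spray is affine, the coefficients are the Christoffel symbols of the Levi-Civita connection of g, and Equation (7) reduces to the geodesic equation.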

To summarize, Lagrangian dynamics is related to action minimization by the Euler operator, and leads to a semispray on the configuration manifold Q. Under suitable conditions, the Lagrangian function

L (q, v)

defined on

T Q

will lead to a torsion-free but generally nonlinear connection, and an affine connection only for a very special form of Lagrangian.

2.2. Hamiltonian Mechanics as a Conservative System on $T^{*} Q$

Given a Hamiltonian

H : T^{*} Q \to R

, we consider the Hamiltonian vector field

X_{H} \in X (T^{*} Q)

(where

X

denotes a section) defined by

X_{H} = (\frac{\partial H}{\partial p_{i}}, - \frac{\partial H}{\partial q^{i}}) .

(8)

It is straightforward to verify that

H = c o n s t

along the dynamical flow of

X_{H}

:

\frac{d H}{d t} = \sum_{i} (\frac{\partial H}{\partial q^{i}} \frac{d q^{i}}{d t} + \frac{\partial H}{\partial p_{i}} \frac{d p_{i}}{d t}) = 0 .

So, a Hamiltonian vector field

X_{H}

advects the Hamiltonian H along its flow, so that H is constant along solution curves, which implies that the Lie derivative

L

of H along the flow of

X_{H}

vanishes,

L_{X_{H}} H = 0 .

Formally, starting from the tautological 1-form

θ = \sum_{i} p_{i} d q^{i}

on

T^{*} Q

, one obtains a 2-form

ω

, called the Poincaré 2-form,

ω = - d θ = \sum_{i} d q^{i} \land d p_{i},

which is the canonical symplectic form on

T^{*} Q

:

ω (X, Y) = \sum_{i} (d q^{i} \land d p_{i}) (X, Y) = \sum_{i} (d q^{i} (X) d p_{i} (Y) - d p_{i} (X) d q^{i} (Y)),

where

X, Y

are vector fields on

T^{*} Q

.

More generally, given a Hamiltonian H along with a symplectic form

ω

, which is, by definition, a closed, nondegenerate 2-form, one obtains the Hamiltonian vector field

X_{H}

on

T^{*} Q

, defined in abstract notation by

ι_{X_{H}} ω = d H,

(9)

or equivalently in a more familiar notation,

ω (X_{H}, \cdot) = d H (\cdot) .

One can define the Poisson bracket

\{\cdot, \cdot\}

of two functions F and G by using their respective Hamiltonian vector fields and the symplectic form,

\{F, G\} \equiv ω (X_{F}, X_{G}) .

For the canonical symplectic form, it has the following coordinate expression,

\{F, G\} = \sum_{i} (\frac{\partial F}{\partial q^{i}} \frac{\partial G}{\partial p_{i}} - \frac{\partial F}{\partial p_{i}} \frac{\partial G}{\partial q^{i}}) .

In this way, Hamilton’s equations can be expressed in terms of the Poisson bracket as follows,

\frac{d q^{i}}{d t} = \{q^{i}, H\}, \frac{d p_{i}}{d t} = \{p_{i}, H\} .

(10)
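The bracket form of Hamilton's equations is easy to test numerically. The following sketch assumes a single degree of freedom and a pendulum Hamiltonian H = p²/2 + 1 − cos q (an illustrative choice); the helper names are hypothetical. It evaluates the canonical Poisson bracket by central differences and compares the brackets of q and p with H against the right-hand sides of Hamilton's equations (5).

```python
import math

# Sketch: canonical Poisson bracket for one degree of freedom, z = (q, p).
# The pendulum Hamiltonian and all helper names are illustrative assumptions.


def partial(f, z, i, eps=1e-6):
    """Central-difference partial derivative of f at z along coordinate i."""
    zp, zm = list(z), list(z)
    zp[i] += eps
    zm[i] -= eps
    return (f(zp) - f(zm)) / (2.0 * eps)


def poisson_bracket(F, G, z):
    """{F, G} = dF/dq dG/dp - dF/dp dG/dq in canonical coordinates."""
    return partial(F, z, 0) * partial(G, z, 1) - partial(F, z, 1) * partial(G, z, 0)


def hamiltonian(z):
    q, p = z
    return 0.5 * p * p + (1.0 - math.cos(q))  # assumed pendulum Hamiltonian


def coord_q(z):
    return z[0]


def coord_p(z):
    return z[1]


z = (0.7, -0.4)
print(poisson_bracket(coord_q, hamiltonian, z), z[1])              # {q, H} vs dH/dp
print(poisson_bracket(coord_p, hamiltonian, z), -math.sin(z[0]))   # {p, H} vs -dH/dq
print(poisson_bracket(hamiltonian, hamiltonian, z))                # {H, H}
```

The last bracket vanishes by antisymmetry, which is the bracket form of the statement that H is constant along the flow of its own Hamiltonian vector field.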

By Darboux’s theorem, it is always possible to choose local coordinates

(q, p)

on

T^{*} Q

, referred to as canonical coordinates, such that the symplectic form has the expression

ω = \sum_{i} d q^{i} \land d p_{i}

. In these coordinates, Hamilton’s equations defined in terms of the symplectic structure (9) and Poisson structure (10) recover the canonical Hamiltonian vector field (8).

Note that any smooth function H on

T^{*} Q

induces a Hamiltonian vector field. An arbitrary vector field X on

T^{*} Q

is locally Hamiltonian, i.e., induced by a smooth function H, if

ι_{X} ω

is closed, i.e.,

L_{X} ω = 0

. In addition, a Hamiltonian vector field preserves the volume form

Ω

, i.e.,

L_{X} (Ω) = 0,

where

Ω

is the n-fold exterior product of

ω

,

Ω = \frac{{(- 1)}^{\frac{n (n - 1)}{2}}}{n!} ω \land ω \dots \land ω .

2.3. Symplectic Maps and Symplectic Flows

A symplectic map is a diffeomorphism of

T^{*} Q

that preserves its symplectic structure

ω

. We first consider a one-parameter family of symplectic maps generated by the flow map

F_{X} : T^{*} Q \to T^{*} Q

of a vector field

X \in X (T^{*} Q)

. Since the entire family of symplectic maps leaves

ω

invariant, it follows that

L_{X} ω = 0

. It can be shown (using Cartan’s magic formula, and the fact that

ω

is closed) that a vector field

X \in X (T^{*} Q)

is symplectic if

i_{X} ω

is closed, i.e.,

d (i_{X} ω) = 0

. By the Poincaré lemma, this implies that

i_{X} ω

is locally exact, that is, in the neighborhood of any point, there exists some function

H : T^{*} Q \to R

such that

i_{X} ω = d H

. So, locally, there always exists a Hamiltonian

H : T^{*} Q \to R

that generates a vector field X whose flow is symplectic with respect to

ω

.

More generally, a diffeomorphism

ϕ : M_{1} \to M_{2}

is a symplectic map from a symplectic space

(M_{1}, ω_{1})

to another space

(M_{2}, ω_{2})

if:

ϕ^{*} ω_{2} = ω_{1},

(11)

where

ω_{1}, ω_{2}

are the symplectic forms on

M_{1}, M_{2}

, respectively. The above condition (11) holds if and only if for any functions f, g:

(i): $\{f, g\} \circ ϕ = \{f \circ ϕ, g \circ ϕ\}$ ,
(ii): $ϕ^{*} X_{H} = X_{H \circ ϕ}$ .

With respect to Darboux coordinates about a point

z = (q, p) \in M_{1}

, the condition (11) that a map

ϕ : M_{1} \to M_{2}

is symplectic can be expressed locally by

{(D_{z} ϕ)}^{T} J (D_{z} ϕ) = J

, where

D_{z} ϕ

denotes the Jacobian of

ϕ

at z.

A canonical transformation of

T^{*} Q

is an automorphism

ϕ : T^{*} Q \to T^{*} Q

,

ϕ : \begin{cases} (q, p) \mapsto q^{' i} (q^{1}, \dots, q^{n}, p_{1}, \dots, p_{n}), \\ (q, p) \mapsto p_{i}^{'} (q^{1}, \dots, q^{n}, p_{1}, \dots, p_{n}), \end{cases}

such that

ω = \sum_{i} d q^{i} \land d p_{i} = \sum_{i} d q^{' i} \land d p_{i}^{'} .

The significance of canonical transformations is that they preserve the form of Hamilton’s equations, and one can check that an automorphism

ϕ

is canonical by verifying that

{(D_{z} ϕ)}^{T} J (D_{z} ϕ) = J

in a Darboux coordinate chart.
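The Jacobian criterion can be checked for concrete maps. The sketch below, under the assumption of a pendulum with potential 1 − cos q and a step size `STEP` chosen arbitrarily, compares two one-step maps on a two-dimensional phase space: a symplectic Euler update and an explicit Euler update. Both maps and all helper names are illustrative and are not taken from the paper.

```python
import math

# Sketch: test (D phi)^T J (D phi) = J for two assumed one-step maps of a pendulum.
# STEP, the maps, and the helper names are illustrative choices.

J = [[0.0, 1.0], [-1.0, 0.0]]
STEP = 0.1  # assumed time step


def symplectic_euler(z):
    q, p = z
    p_new = p - STEP * math.sin(q)  # update momentum first
    q_new = q + STEP * p_new        # then position, using the new momentum
    return (q_new, p_new)


def explicit_euler(z):
    q, p = z
    return (q + STEP * p, p - STEP * math.sin(q))


def jacobian(phi, z, eps=1e-6):
    """2x2 Jacobian D[i][j] = d phi_i / d z_j by central differences."""
    D = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        fp, fm = phi(zp), phi(zm)
        for i in range(2):
            D[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return D


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]


def symplectic_defect(phi, z):
    """Largest entry of (D phi)^T J (D phi) - J at the point z."""
    D = jacobian(phi, z)
    M = matmul(transpose(D), matmul(J, D))
    return max(abs(M[i][j] - J[i][j]) for i in range(2) for j in range(2))


z = (0.7, -0.4)
print("symplectic Euler defect:", symplectic_defect(symplectic_euler, z))
print("explicit Euler defect:  ", symplectic_defect(explicit_euler, z))
```

For one degree of freedom the criterion reduces to the Jacobian having unit determinant, so the explicit Euler defect equals STEP² cos q up to finite-difference error, while the symplectic Euler defect is at the level of rounding.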

2.4. Symplectic Structure on $T Q$ Pulled Back from $T^{*} Q$

If we endow

T^{*} Q

with the canonical symplectic form, we can construct a symplectic form on

T Q

in such a way that these two spaces are symplectomorphic.

The mapping between

T Q

and

T^{*} Q

can be constructed in two different ways. Case I involves the Legendre transform:

\begin{matrix} T Q & ⟷ & T^{*} Q \\ (q, v) & \mapsto & (q, \frac{\partial L (q, v)}{\partial v}) \\ (q, \frac{\partial H (q, p)}{\partial p}) & ↤ & (q, p) \end{matrix}

(12)

and Case II involves the Riemannian metric tensor g (on Q):

\begin{matrix} T Q & ⟷ & T^{*} Q \\ (q, v) & \mapsto & (q, \sum_{j} g_{i j} v^{j}) \\ (q, \sum_{j} g^{i j} p_{j}) & ↤ & (q, p) \end{matrix}

(13)

Note that a pseudo-Riemannian metric g on Q acts on a pair of tangent vectors in the tangent space

T_{q} Q

at a point q of Q; it can be viewed as a symmetric

(0, 2)

-tensor that maps

T Q \times T Q \to R

. On the other hand, the symplectic form

ω

is a skew-symmetric

(0, 2)

-tensor that acts on a pair of tangent vectors on

T^{*} Q

, so it maps

T (T^{*} Q) \times T (T^{*} Q) \to R

.

Case I. Given the Lagrangian

L : T Q \to R

, this induces the fiberwise Legendre transform

F L : T Q \to T^{*} Q

, which is given by

(q, v) \mapsto (q, p) = (q, \frac{\partial L}{\partial v})

. If L is hyperregular, then this map is a diffeomorphism. If we endow

T Q

with the pullback symplectic form

{(F L)}^{*} ω

, which is given by

\sum_{i j} \frac{\partial^{2} L}{\partial v^{j} \partial v^{i}} d q^{i} \land d v^{j} + \sum_{i j} \frac{\partial^{2} L}{\partial q^{j} \partial v^{i}} d q^{i} \land d q^{j},

then the Legendre transform is a symplectomorphism (by construction).

Case II. The Riemannian metric g induces the musical isomorphisms

♭ : T Q \to T^{*} Q

and

♯ : T^{*} Q \to T Q

between

T Q

and

T^{*} Q

, which are the operations that lower and raise the index, respectively. If we endow

T Q

with the pullback symplectic form

♭^{*} ω

, which is given by

ω = d q \land d p (q, v) = \sum_{i, j} g_{i j} d q^{i} \land d v^{j} + \sum_{i j k} \frac{\partial g_{i j}}{\partial q^{k}} v^{i} d q^{j} \land d q^{k},

then the musical isomorphism is a symplectomorphism (by construction).

Link between Case I and Case II. It is possible that the two ways of identifying

T Q \leftrightarrow T^{*} Q

may be the same; this happens when g on

T Q

coincides with the second derivatives of

L (q, v)

with respect to the v-variable:

g_{i j} (q, v) = \frac{\partial^{2} L (q, v)}{\partial v^{i} \partial v^{j}},

assuming L is hyperregular. The inverse of g, denoted

\tilde{g}

, can be obtained from

{\tilde{g}}^{i j} (q, p) = \frac{\partial^{2} H (q, p)}{\partial p_{i} \partial p_{j}},

using the Hamiltonian

H (q, p)

defined on

T^{*} Q

. Note that when the Lagrangian has the form

L (q, v) = \frac{1}{2} v^{T} M (q) v - V (q)

, this corresponds to the Riemannian metric g being given by the kinetic energy metric

M (q)

.

2.5. Hamilton-Jacobi Theory and Dirichlet-to-Neumann Map

In classical mechanics, the Hamilton–Jacobi equation is first introduced as a partial differential equation that the action integral satisfies. Recall that the action integral S along the solution of the Euler–Lagrange equation (2) over the time interval

[t_{0}, t]

is

S_{q_{0}} (q (t)) : = \int_{t_{0}}^{t} L (q (s), \dot{q} (s)) d s, where q (\cdot) is the solution of the Euler–Lagrange equation .

(14)

This is referred to as Jacobi’s solution of the Hamilton–Jacobi equation. Here, we assume that the initial position

q (t_{0}) = q_{0}

is fixed and the final position

q (t)

depends on the initial velocity

v_{0} = \dot{q} (t_{0})

. By taking a variation

δ q (t)

of the endpoint

q (t)

, one obtains a partial differential equation satisfied by

S (q, t)

:

\frac{\partial S}{\partial t} + H (q, \frac{\partial S}{\partial q}) = 0 .

(15)

This is the Hamilton–Jacobi equation, when H does not explicitly depend on t.

Conversely, it can be shown that if

S_{q_{0}} (q)

is a solution of the Hamilton–Jacobi equation then

S_{q_{0}} (q)

is a generating function for the family of canonical transformations (or symplectic flows) that describe the dynamics defined by Hamilton’s equations. This result is the theoretical basis for the powerful technique of exact integration called separation of variables.

There are two uses of

S_{q_{0}} (q)

. First, it serves to characterize the Dirichlet-to-Neumann map, which refers to the correspondence between the boundary data

(q_{0}, q) \in Q \times Q

and the initial data

(q_{0}, v_{0}) \in T Q

of a dynamical system. Second, it provides a foliation of the configuration space Q, around the point

q_{0}

and parameterized by t, that is defined by the condition

S_{q_{0}} (q (t)) = c o n s t

.

In the rest of the paper, we will view

S_{q_{0}} (q)

as a scalar-valued function of

(q_{0}, q (t))

, which we refer to as the exact discrete Lagrangian

L_{d}^{E} : Q \times Q \to R

,

L_{d}^{E} (q_{0}, q_{1}; h) = \underset{\begin{matrix} q \in C^{2} ([0, h], Q) \\ q (0) = q_{0}, q (h) = q_{1} \end{matrix}}{ext} \int_{0}^{h} L (q (t), \dot{q} (t)) d t .

(16)

This is equivalent to the expression for Jacobi’s solution, as the stationarity conditions of this variational characterization are simply the Euler–Lagrange equations. Furthermore, this characterization has the added benefit that it is well-defined even if the Lagrangian is degenerate. The exact discrete Lagrangian provides us with the time-h flow map for the Euler–Lagrange equation. Given a fixed initial point

q_{0}

, this defines a map which takes

q \in Q

to an initial velocity

v_{0}

, such that the Euler–Lagrange trajectory

q (t)

with initial condition

(q (0), \dot{q} (0)) = (q_{0}, v_{0})

has boundary values

(q (0), q (h)) = (q_{0}, q)

. This is the Dirichlet-to-Neumann map

Q \times Q \to T Q

,

(q_{0}, q) \mapsto (q_{0}, v_{0})

.
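For a concrete instance, the exact discrete Lagrangian and the Dirichlet-to-Neumann map are available in closed form for the harmonic oscillator. The sketch below assumes unit mass, L(q, v) = v²/2 − ω²q²/2, and a step h with sin(ωh) ≠ 0; the frequency `OMEGA` and all function names are illustrative. It uses the standard closed-form classical action of the oscillator and checks it against a Simpson quadrature of the action along the exact trajectory.

```python
import math

# Sketch: exact discrete Lagrangian (16) and Dirichlet-to-Neumann map for an
# assumed unit-mass harmonic oscillator. OMEGA and all names are illustrative.

OMEGA = 1.3  # assumed angular frequency


def exact_discrete_lagrangian(q0, q1, h):
    """Closed-form action between q0 and q1 over time h (requires sin(OMEGA h) != 0)."""
    s, c = math.sin(OMEGA * h), math.cos(OMEGA * h)
    return OMEGA * ((q0 * q0 + q1 * q1) * c - 2.0 * q0 * q1) / (2.0 * s)


def dirichlet_to_neumann(q0, q1, h):
    """Map boundary data (q0, q1) to initial data (q0, v0)."""
    s, c = math.sin(OMEGA * h), math.cos(OMEGA * h)
    return q0, OMEGA * (q1 - q0 * c) / s


def trajectory(q0, v0, t):
    return q0 * math.cos(OMEGA * t) + (v0 / OMEGA) * math.sin(OMEGA * t)


def velocity(q0, v0, t):
    return -q0 * OMEGA * math.sin(OMEGA * t) + v0 * math.cos(OMEGA * t)


def action_by_quadrature(q0, v0, h, n=2000):
    """Composite Simpson rule for the action along the exact trajectory (n even)."""
    def integrand(t):
        q, v = trajectory(q0, v0, t), velocity(q0, v0, t)
        return 0.5 * v * v - 0.5 * OMEGA**2 * q * q
    dt = h / n
    total = integrand(0.0) + integrand(h)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * dt)
    return total * dt / 3.0


q0, q1, h = 0.4, -0.2, 0.5
_, v0 = dirichlet_to_neumann(q0, q1, h)
print("endpoint reached:", trajectory(q0, v0, h), "target:", q1)
print("closed form:", exact_discrete_lagrangian(q0, q1, h))
print("quadrature: ", action_by_quadrature(q0, v0, h))
```

The restriction sin(ωh) ≠ 0 reflects the fact that the boundary-value problem loses unique solvability at conjugate points, where the correspondence between boundary data and initial data breaks down.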

To address the Dirichlet-to-Neumann map more generally, let us first recall the definition of a retraction:

Definition 1

([7], Definition 4.1.1 on p. 55). A retraction on a manifold Q is a smooth mapping

R

:

T Q \to Q

with the following properties: Let

R_{q} : T_{q} Q \to Q

be the restriction of

R

to

T_{q} Q

for an arbitrary

q \in Q

; then,

(i): $R_{q} (0_{q}) = q$ , where $0_{q}$ denotes the zero element of $T_{q} Q$ ;
(ii): with the identification $T_{0_{q}} (T_{q} Q) ≃ T_{q} Q$ , $R_{q}$ satisfies

$T_{0_{q}} R_{q} = {i d}_{T_{q} Q},$

(17)

where $T_{0_{q}} R_{q}$ is the tangent map of $R_{q}$ at $0_{q} \in T_{q} Q$ .

Equation (17) implies that the map

R_{q} : T_{q} Q \to Q

is invertible in some neighborhood of

0_{q}

in

T_{q} Q

. It is convenient to introduce the associated map

\tilde{R} : T Q \to Q \times Q

, which is defined by

\tilde{R} (v_{q}) : = (q, R_{q} (v_{q})) .

(18)

It is easy to see that

\tilde{R} : T Q \to Q \times Q

is also invertible in some neighborhood of

0_{q} \in T Q

for any

q \in Q

.
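A simple retraction that satisfies both conditions of Definition 1 is projection onto the unit circle. The sketch below assumes Q is the unit circle embedded in the plane and uses the map R_q(v) = (q + v)/|q + v|, a common textbook choice that is not taken from the paper; function names are hypothetical. It checks condition (i) directly and condition (17) by a finite-difference approximation of the tangent map at the zero vector.

```python
import math

# Sketch: projection retraction on the unit circle Q = S^1 in R^2.
# The choice of retraction and all names are illustrative assumptions.


def retract(q, v):
    """R_q(v) = (q + v) / |q + v| for a unit vector q and a tangent vector v (v . q = 0)."""
    x, y = q[0] + v[0], q[1] + v[1]
    n = math.hypot(x, y)
    return (x / n, y / n)


def retract_tilde(q, v):
    """The map (18): v_q -> (q, R_q(v_q)) in Q x Q."""
    return q, retract(q, v)


def tangent_map_at_zero(q, v, eps=1e-6):
    """Central-difference approximation of d/dt R_q(t v) at t = 0."""
    rp = retract(q, (eps * v[0], eps * v[1]))
    rm = retract(q, (-eps * v[0], -eps * v[1]))
    return ((rp[0] - rm[0]) / (2.0 * eps), (rp[1] - rm[1]) / (2.0 * eps))


angle, speed = 0.4, 0.8
q = (math.cos(angle), math.sin(angle))
v = (-speed * math.sin(angle), speed * math.cos(angle))  # tangent to the circle at q

print("R_q(0) =", retract(q, (0.0, 0.0)), "q =", q)          # condition (i)
print("T_0 R_q (v) =", tangent_map_at_zero(q, v), "v =", v)   # condition (17)
print("R~(v_q) =", retract_tilde(q, v))
```

Because q + v never vanishes for v tangent to the circle, this particular retraction is defined on all of each tangent space, although it is only invertible onto the open half-circle centered at q.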

Let us introduce a special class of coordinate charts that are compatible with a given retraction map

R : T Q \to Q

. A coordinate chart

(U, φ)

with U an open subset in Q and

φ : U \to R^{n}

is said to be retraction compatible at

q \in U

if

(i): $φ$ is centered at q, i.e., $φ (q) = 0$ ;
(ii): the compatibility condition

R (v_{q}) = φ^{- 1} \circ T_{q} φ (v_{q})

(19)

holds, where we identify

T_{0} R^{n}

with

R^{n}

as follows: Let

φ = (q^{1}, \dots, q^{n})

with

q^{i} : U \to R

for

i = 1, \dots, n

. Then

v^{i} \frac{\partial}{\partial q^{i}} \mapsto (v^{1}, \dots, v^{n}),

(20)

where

\partial / \partial q^{i}

is the unit vector in the

q^{i}

-direction in

T_{0} R^{n}

.

An atlas for the manifold Q is retraction compatible if it consists of retraction compatible coordinate charts.

In Equation (19), we assumed that

T_{q} φ (v_{q}) \in φ (U) \subset R^{n}

and so strictly speaking

R_{q}

is defined on

{(T_{q} φ)}^{- 1} (φ (U)) \subset T_{q} Q

. However, it is always possible to define a coordinate chart such that

φ (U) = R^{n}

by stretching out the open set

φ (U)

to

R^{n}

so that (19) is defined for any

v_{q} \in T_{q} Q

.

Retraction maps provide a general means to relate

Q \times Q

to

T Q

: in essence, they provide a correspondence between

\{q\} \times Q

and

T_{q} Q

for all

q \in Q

(we may take

q \in Q

to mean the projection of

Q \times Q

onto either the first or the second slot).

2.6. Variational Mechanics and the Pontryagin Bundle

Lagrangian and Hamiltonian mechanics can be combined into Dirac mechanics [8,9], which is described on the Pontryagin bundle

T Q \oplus T^{*} Q

, which has position, velocity, and momentum as local coordinates.

Just as the Euler–Lagrange equations of motion arise out of Hamilton’s principle, Hamilton’s equations can also arise from Hamilton’s phase space principle:

δ \int_{t_{1}}^{t_{2}} [〈 p, \dot{q} 〉 - H (q, p)] d t = 0 .

On the Pontryagin bundle

T Q \oplus T^{*} Q

, which has local coordinates

(q, v, p)

, a relaxation of Hamilton’s principle (1) is the Hamilton–Pontryagin variational principle, which uses a Lagrange multiplier p to impose the second-order condition

v = \dot{q}

,

δ \int_{t_{1}}^{t_{2}} (L (q, v) + {〈 p, \dot{q} - v 〉}_{q}) d t = 0 .

(21)

This encapsulates both Hamilton’s and Hamilton’s phase space variational principles, as well as the Legendre transform, and gives the implicit Euler–Lagrange equations,

\dot{q} = v, \dot{p} = \frac{\partial L}{\partial q}, p = \frac{\partial L}{\partial v} .

The last equation explicitly imposes the primary constraint condition, which is important when describing degenerate Lagrangian systems, such as electrical circuits. Note that p is interpreted as a Lagrange multiplier [10] in addition to its usual interpretation as the conjugate momentum. The three equations can be combined by eliminating v and p to recover the Euler–Lagrange equations.
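
As a brief worked example of this elimination, consider the mechanical Lagrangian $L (q, v) = \frac{1}{2} m v^{2} - V (q)$, where the mass $m$ and potential $V$ are assumptions introduced only for illustration. The implicit Euler–Lagrange equations then reduce to Newton's second law:

```latex
\begin{aligned}
\dot{q} &= v, \qquad
\dot{p} = \frac{\partial L}{\partial q} = - V'(q), \qquad
p = \frac{\partial L}{\partial v} = m v, \\
&\Longrightarrow \quad
m \ddot{q} = \frac{d}{dt}(m v) = \dot{p} = - V'(q).
\end{aligned}
```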

An important application of Hamilton–Jacobi theory is in optimal control theory. Consider a typical optimal control problem,

min_{u (\cdot)} \int_{0}^{T} L (q, u) d t,

subject to the constraints,

\dot{q} = f (q, u),

and the boundary conditions

q (0) = q_{0}

and

q (T) = q_{T}

. We convert constrained optimization to unconstrained optimization by using Lagrange multipliers p (sometimes called the costate or auxiliary variables), and we can define the augmented cost functional:

\hat{S} [u] : = \int_{0}^{T} \{L (q, u) + 〈 p, \dot{q} - f (q, u) 〉\} d t = \int_{0}^{T} \{〈 p, \dot{q} 〉 - \hat{H} (q, p, u)\} d t,

where we introduced the costate variables p, and also defined the control Hamiltonian,

\hat{H} (q, p, u) : = 〈 p, f (q, u) 〉 - L (q, u) .

The variables

(q, p)

form a Hamiltonian system, so we impose the optimality condition,

\frac{\partial \hat{H}}{\partial u} (q, p, u) = 0,

to obtain the equation for the optimal control

u = u^{*} (q, p)

, and we obtain the Hamiltonian,

H (q, p) : = max_{u} \hat{H} (q, p, u) = \hat{H} (q, p, u^{*} (q, p)) .

We also define the optimal cost-to-go function,

\begin{matrix} C (q, t) & : = \int_{t}^{T} \{L (\hat{q}, u^{*}) + 〈 \hat{p}, \dot{\hat{q}} - f (\hat{q}, u^{*}) 〉\} d s \\ & = \int_{t}^{T} \{〈 \hat{p}, \dot{\hat{q}} 〉 - H (\hat{q}, \hat{p})\} d s = S^{*} - S (q, t), \end{matrix}

where

(\hat{q} (s), \hat{p} (s))

for

s \in [0, T]

is the solution of Hamilton’s equations with the above H such that

\hat{q} (t) = q

; and

S^{*}

is the optimal cost

S^{*} : = \int_{0}^{T} \{〈 \hat{p}, \dot{\hat{q}} 〉 - H (\hat{q}, \hat{p})\} d s = \int_{0}^{T} \{〈 \hat{p}, \dot{\hat{q}} 〉 - \hat{H} (\hat{q}, \hat{p}, u^{*} (\hat{q}, \hat{p}))\} d s = \hat{S} [u^{*}],

and the function

S (q, t)

is defined by

S (q, t) : = \int_{0}^{t} \{〈 \hat{p}, \dot{\hat{q}} 〉 - H (\hat{q}, \hat{p})\} d s .

Since this definition coincides with (14), the function

S (q, t) = S^{*} - C (q, t)

satisfies the Hamilton–Jacobi equation (15); this reduces to the Hamilton–Jacobi–Bellman (HJB) equation for the optimal cost-to-go function

C (q, t)

:

\frac{\partial C}{\partial t} + min_{u} \{〈\frac{\partial C}{\partial q}, f (q, u)〉 + L (q, u)\} = 0 .

(22)

It can also be shown that the costate p of the optimal solution is related to the solution of the Hamilton–Jacobi–Bellman equation.
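
To make the construction concrete, here is a small worked example under assumed data (a scalar state, running cost $L (q, u) = \frac{1}{2} (q^{2} + u^{2})$ and dynamics $f (q, u) = u$, none of which appear in the text). It traces the steps from the control Hamiltonian to the HJB equation (22):

```latex
\begin{aligned}
\hat{H}(q, p, u) &= p\,u - \tfrac{1}{2}\left(q^{2} + u^{2}\right), \qquad
\frac{\partial \hat{H}}{\partial u} = p - u = 0
\;\Longrightarrow\; u^{*}(q, p) = p, \\
H(q, p) &= \hat{H}(q, p, u^{*}) = \tfrac{1}{2}\left(p^{2} - q^{2}\right), \\
0 &= \frac{\partial C}{\partial t}
 + \min_{u}\left\{ \frac{\partial C}{\partial q}\,u + \tfrac{1}{2}\left(q^{2} + u^{2}\right) \right\}
 = \frac{\partial C}{\partial t} + \tfrac{1}{2} q^{2}
 - \tfrac{1}{2}\left(\frac{\partial C}{\partial q}\right)^{2}.
\end{aligned}
```

The minimizer in the last line is $u = - \partial C / \partial q$, which agrees with $u^{*} = p$ when $p = - \partial C / \partial q$. This is the relation one expects from $S = S^{*} - C$ if the momentum is given by $p = \partial S / \partial q$, as is standard in Hamilton–Jacobi theory.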

3. Discrete Formulation of Geometric Mechanics

In this section, we review various schemes for discretizing mechanics (see, e.g., [11]). Geometric mechanics focuses on the differential geometric structure of the configuration manifolds, the associated symplectic and Poisson structures on the phase space, and the conservation laws generated by symmetries. Geometric structure-preserving numerical integration aims to preserve as many of these geometric properties as possible under discretization. The main idea is to start from the canonical symplectic form

ω

on

T^{*} Q

, and look at the symplectomorphisms that preserve

ω

or its pullback via the Legendre transforms to

Q \times Q

or

T Q

.

3.1. Symplectomorphisms from $T^{*} Q$ to $T Q$ and to $Q \times Q$

Given a cotangent bundle

T^{*} Q

with a symplectic form

ω

, we wish to endow the bundles

T Q

and

Q \times Q

with a symplectic structure. Given a function

L : T Q \to R

, the Legendre transform is viewed as the fiber derivative

F L : T Q \to T^{*} Q

,

(q, \dot{q}) \mapsto (q, \partial L / \partial \dot{q})

. The pullback of

ω

with respect to

F L

yields a symplectic structure

ω_{L} = F L^{*} ω

on

T Q

.

Similarly, given a function

L_{d} : Q \times Q \to R

, we define two discrete fiber derivatives,

F L_{d}^{\pm}

:

Q \times Q \to T^{*} Q

, which serve as discrete Legendre transforms:

\begin{matrix} F L_{d}^{+} (q_{k}, q_{k + 1}) & = (q_{k + 1}, D_{2} L_{d} (q_{k}, q_{k + 1})), \end{matrix}

(23)

\begin{matrix} F L_{d}^{-} (q_{k}, q_{k + 1}) & = (q_{k}, - D_{1} L_{d} (q_{k}, q_{k + 1})) . \end{matrix}

(24)

Here

D_{1}, D_{2}

refer to taking a derivative with respect to the first or second slot, respectively:

D_{1} L_{d} (q_{k}, q_{k + 1}) = \frac{\partial L_{d} (q_{k}, q_{k + 1})}{\partial q_{k}}, D_{2} L_{d} (q_{k}, q_{k + 1}) = \frac{\partial L_{d} (q_{k}, q_{k + 1})}{\partial q_{k + 1}} .

The two choices of discrete fiber derivatives correspond to whether one views

Q \times Q

as a bundle over Q with respect to

π^{-} : (q_{k}, q_{k + 1}) \mapsto q_{k}

or

π^{+} : (q_{k}, q_{k + 1}) \mapsto q_{k + 1}

, i.e., projection onto the first or the second slot. These induce symplectic structures

ω_{L_{d}}^{\pm} = {(F L_{d}^{\pm})}^{*} ω

on

Q \times Q

by pullback.

Let

F : T^{*} Q \to T^{*} Q

be a symplectic map and let the maps denoted by the dotted arrows in the figure above be defined by requiring that the diagram commutes. Then, these maps are also symplectic maps, and the fiber derivative

F L

is a symplectomorphism between

(T Q, ω_{L})

and

(T^{*} Q, ω)

, and the discrete fiber derivatives

F L_{d}^{\pm}

are symplectomorphisms between

(Q \times Q, ω_{L_{d}}^{\pm})

and

(T^{*} Q, ω)

.

3.2. Discrete Lagrangian Mechanics

The aim of geometric structure-preserving numerical integration is to preserve as many geometric conservation laws as possible under discretization. Discrete variational mechanics [11] is based on the discrete Hamilton’s principle,

δ \sum_{i = 0}^{n - 1} L_{d} (q_{i}, q_{i + 1}) = 0,

(25)

where the endpoints

q_{0}

and

q_{n}

are fixed, and the discrete Lagrangian,

L_{d} : Q \times Q \to R

, is a Type I generating function of the symplectic map. Recall that there exists an exact discrete Lagrangian

L_{d}^{E}

(16), that generates the exact time-h flow of a Lagrangian system, but it cannot be computed in general. One possible method of constructing computable discrete Lagrangians is the Galerkin approach, which involves replacing the infinite-dimensional function space

C^{2} ([0, h], Q)

and the integral in (16) with a finite-dimensional function space and a quadrature formula, respectively. Below are two examples of discrete Lagrangians:

(i): Symplectic midpoint integrator

$L_{d} (q_{0}, q_{1}, h) = h L (\frac{q_{0} + q_{1}}{2}, \frac{q_{1} - q_{0}}{h}) .$

This can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the midpoint rule as the quadrature formula.
(ii): Störmer–Verlet integrator

$L_{d} (q_{0}, q_{1}, h) = \frac{h}{2} [L (q_{0}, \frac{q_{1} - q_{0}}{h}) + L (q_{1}, \frac{q_{1} - q_{0}}{h})] .$

This can be obtained from the Galerkin approach by considering the family of linear polynomials as the finite-dimensional function space, and the trapezoidal rule as the quadrature formula.

Performing variational calculus on the discrete Hamilton’s principle (25) yields the discrete Euler–Lagrange (DEL) equations,

D_{2} L_{d} (q_{k - 1}, q_{k}) + D_{1} L_{d} (q_{k}, q_{k + 1}) = 0 .

(26)

The above equation implicitly defines the discrete Lagrangian map

F_{L_{d}} : (q_{k - 1}, q_{k}) \mapsto (q_{k}, q_{k + 1})

at points sufficiently close to the diagonal of

Q \times Q

. This is equivalent to the implicit discrete Euler–Lagrange (IDEL) equations,

p_{k} = - D_{1} L_{d} (q_{k}, q_{k + 1}), p_{k + 1} = D_{2} L_{d} (q_{k}, q_{k + 1}),

(27)

which are precisely the characterization of a symplectic map in terms of a Type I generating function. They implicitly define the discrete Hamiltonian map

{\tilde{F}}_{L_{d}} : (q_{k}, p_{k}) \mapsto (q_{k + 1}, p_{k + 1})

, and it is symplectic with respect to the canonical symplectic form

ω_{c a n}

on

T^{*} Q

, i.e.,

{\tilde{F}}_{L_{d}}^{*} ω_{c a n} = ω_{c a n}

.
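
The following minimal sketch shows how the IDEL equations (27) define a discrete Hamiltonian map in practice. It assumes a one-dimensional pendulum, $L (q, v) = \frac{1}{2} m v^{2} + \cos q$, and a Störmer–Verlet discrete Lagrangian with the velocity approximated by $(q_{1} - q_{0}) / h$. The system, the step size, and all function names are illustrative assumptions, not taken from the text. The first equation of (27) is solved for $q_{k + 1}$, the second then gives $p_{k + 1}$, and a finite-difference Jacobian is used to check symplecticity, which in one degree of freedom amounts to unit determinant.

```python
import math

def grad_V(q):
    # Pendulum potential V(q) = -cos(q), so V'(q) = sin(q) (illustrative choice).
    return math.sin(q)

def hamiltonian_map(q0, p0, h, m=1.0):
    """One step (q_k, p_k) -> (q_{k+1}, p_{k+1}) defined by the IDEL equations (27).

    Uses L_d(q0, q1) = h*(m/2)*((q1 - q0)/h)**2 - (h/2)*(V(q0) + V(q1)).
    """
    # p_k = -D_1 L_d(q_k, q_{k+1}) is solved for q_{k+1}; explicit for this L_d.
    q1 = q0 + (h / m) * (p0 - 0.5 * h * grad_V(q0))
    # p_{k+1} = D_2 L_d(q_k, q_{k+1}).
    p1 = m * (q1 - q0) / h - 0.5 * h * grad_V(q1)
    return q1, p1

def jacobian_determinant(q, p, h, eps=1e-6):
    """Central-difference estimate of det d(q1, p1)/d(q0, p0)."""
    qa, pa = hamiltonian_map(q + eps, p, h)
    qb, pb = hamiltonian_map(q - eps, p, h)
    qc, pc = hamiltonian_map(q, p + eps, h)
    qd, pd = hamiltonian_map(q, p - eps, h)
    dq_dq, dp_dq = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    dq_dp, dp_dp = (qc - qd) / (2 * eps), (pc - pd) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

def energy(q, p, m=1.0):
    """Continuous Hamiltonian H = p^2/(2m) - cos(q), used only as a diagnostic."""
    return p * p / (2 * m) - math.cos(q)

if __name__ == "__main__":
    q, p, h = 1.0, 0.0, 0.1
    print("det of Jacobian:", jacobian_determinant(q, p, h))
    e0 = energy(q, p)
    for _ in range(1000):
        q, p = hamiltonian_map(q, p, h)
    print("energy drift after 1000 steps:", energy(q, p) - e0)
```

The energy is not conserved exactly, but for a symplectic method of this kind its error typically stays bounded over long times rather than drifting.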

The two discrete fiber derivatives

F L_{d}^{\pm}

induce a single unique discrete symplectic form

ω_{L_{d}} = ω_{L_{d}}^{\pm} = {(F L_{d}^{\pm})}^{*} ω_{c a n}

on

Q \times Q

,

ω_{L_{d}} (q_{k}, q_{k + 1}) = \frac{\partial^{2} L_{d}}{\partial q_{k} \partial q_{k + 1}} (q_{k}, q_{k + 1}) d q_{k} \land d q_{k + 1},

(28)

and the discrete Lagrangian map is symplectic with respect to

ω_{L_{d}}

on

Q \times Q

, i.e.,

F_{L_{d}}^{*} ω_{L_{d}} = ω_{L_{d}}

.

The discrete Lagrangian and Hamiltonian maps can be expressed in terms of the discrete fiber derivatives,

F_{L_{d}} = {(F L_{d}^{-})}^{- 1} \circ F L_{d}^{+}

, and

{\tilde{F}}_{L_{d}} = F L_{d}^{+} \circ {(F L_{d}^{-})}^{- 1}

, respectively. This characterization of the discrete flow maps underlies the proof of the variational error analysis theorem.

These properties may be summarized in the following commutative diagram,

If the exact discrete Lagrangian

L_{d}^{E}

is used, then the discrete Hamiltonian map

{\tilde{F}}_{L_{d}}

is equal to the time-h flow map of Hamilton’s equations, and the dotted arrow is the time-h flow map of the Euler–Lagrange equations.

The variational integrator approach to constructing symplectic integrators simplifies the numerical analysis of these methods. In particular, the task of establishing the geometric conservation properties and order of accuracy of the discrete Lagrangian map

F_{L_{d}}

reduces to the simpler task of verifying certain properties of the discrete Lagrangian instead.
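
As a worked check of the DEL equations (26), consider the illustrative choice $L (q, \dot{q}) = \frac{1}{2} m {\dot{q}}^{2} - V (q)$ (an assumption made for this example) together with a Störmer–Verlet discrete Lagrangian in which the velocity is approximated by $(q_{1} - q_{0}) / h$. Computing the two slot derivatives and adding them yields the familiar central-difference scheme:

```latex
\begin{aligned}
L_{d}(q_{0}, q_{1}) &= \frac{m}{2h}\,(q_{1} - q_{0})^{2}
  - \frac{h}{2}\left[ V(q_{0}) + V(q_{1}) \right], \\
D_{2} L_{d}(q_{k-1}, q_{k}) &= \frac{m}{h}\,(q_{k} - q_{k-1}) - \frac{h}{2}\,V'(q_{k}), \\
D_{1} L_{d}(q_{k}, q_{k+1}) &= -\frac{m}{h}\,(q_{k+1} - q_{k}) - \frac{h}{2}\,V'(q_{k}), \\
D_{2} L_{d}(q_{k-1}, q_{k}) + D_{1} L_{d}(q_{k}, q_{k+1}) = 0
&\;\Longleftrightarrow\;
m\,\frac{q_{k+1} - 2 q_{k} + q_{k-1}}{h^{2}} = - V'(q_{k}).
\end{aligned}
```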

3.3. Discrete Hamilton–Jacobi Formulation

In the context of discrete variational mechanics, discrete Hamilton–Jacobi theory can be viewed as a composition theorem which relates the composition of symplectic maps generated by a Type II generating function

H_{d}^{+} (q_{k}, p_{k + 1})

with a symplectic map generated by a Type I generating function

S_{d}^{k} (q_{0}, q_{k})

. By convention, the first argument

q_{0}

in the composition generating function is typically omitted, and we simply consider it to be a function

S_{d}^{k} (q_{k})

of the final position

q_{k}

.

The right discrete Hamiltonian,

H_{d}^{+} (q_{k}, p_{k + 1})

[12], is related to the discrete Lagrangian by the Legendre transform,

H_{d}^{+} (q_{k}, p_{k + 1}) = 〈 p_{k + 1}, q_{k + 1} 〉 - L_{d} (q_{k}, q_{k + 1}),

where we impose the condition that

p_{k + 1} = D_{2} L_{d} (q_{k}, q_{k + 1})

. Equivalently, this can be characterized variationally by

H_{d}^{+} (q_{k}, p_{k + 1}) = {ext}_{q_{k + 1}} [〈 p_{k + 1}, q_{k + 1} 〉 - L_{d} (q_{k}, q_{k + 1})]

. This leads to a discrete Hamilton’s principle in phase space,

δ \sum_{i = 0}^{n - 1} \{〈 p_{i + 1}, q_{i + 1} 〉 - H_{d}^{+} (q_{i}, p_{i + 1})\} = 0,

which yields the right discrete Hamilton’s equations,

q_{k + 1} = D_{2} H_{d}^{+} (q_{k}, p_{k + 1}), p_{k} = D_{1} H_{d}^{+} (q_{k}, p_{k + 1}),

(29)

which are precisely the characterization of a symplectic map in terms of a Type II generating function.

The continuous Hamilton–Jacobi equation can be derived by considering the evolution properties of Jacobi’s solution, which is the action integral evaluated along the solution of the Euler–Lagrange equations. One can derive a discrete Hamilton–Jacobi theory by considering a discrete analogue of Jacobi’s solution, expressed in terms of the right discrete Hamiltonian,

S_{d}^{k} (q_{k}) \equiv \sum_{i = 0}^{k - 1} \{p_{i + 1} \cdot q_{i + 1} - H_{d}^{+} (q_{i}, p_{i + 1})\},

which we evaluate along a solution of the right discrete Hamilton’s equations (29). From this, we have,

S_{d}^{k + 1} (q_{k + 1}) - S_{d}^{k} (q_{k}) = p_{k + 1} \cdot q_{k + 1} - H_{d}^{+} (q_{k}, p_{k + 1}),

(30)

where

p_{k + 1}

is considered to be a function of

q_{k}

and

q_{k + 1}

. Taking derivatives with respect to

q_{k + 1}

, we obtain,

D S_{d}^{k + 1} (q_{k + 1}) = p_{k + 1} + \frac{\partial p_{k + 1}}{\partial q_{k + 1}} \cdot [q_{k + 1} - D_{2} H_{d}^{+} (q_{k}, p_{k + 1})],

but the term inside the brackets vanishes as we are restricting this to a solution of the right discrete Hamilton’s equations. Therefore, we have that

D S_{d}^{k + 1} (q_{k + 1}) = p_{k + 1},

which when substituted into (30) yields the discrete Hamilton–Jacobi equation,

S_{d}^{k + 1} (q_{k + 1}) - S_{d}^{k} (q_{k}) = D S_{d}^{k + 1} (q_{k + 1}) \cdot q_{k + 1} - H_{d}^{+} (q_{k}, D S_{d}^{k + 1} (q_{k + 1})) .

3.4. Discrete Hamilton–Pontryagin Principle

Leok and Ohsawa [13] considered the discrete Hamilton’s principle and relaxed the discrete second-order condition,

q_{k}^{1} = q_{k + 1}^{0},

and reimposed it using Lagrange multipliers

p_{k + 1}

, in order to derive the discrete Hamilton–Pontryagin principle on

(Q \times Q) \times_{Q} T^{*} Q

,

δ [\sum_{i = 0}^{n - 1} L_{d} (q_{i}^{0}, q_{i}^{1}) + \sum_{i = 0}^{n - 2} p_{i + 1} (q_{i + 1}^{0} - q_{i}^{1})] = 0 .

(31)

Here, the superscript 0 or 1 on

q_{i}

refers to the first or second slot, respectively, in

Q \times Q

. This in turn yields the implicit discrete Euler–Lagrange equations,

q_{k}^{1} = q_{k + 1}^{0}, p_{k + 1} = D_{2} L_{d} (q_{k}^{0}, q_{k}^{1}), p_{k} = - D_{1} L_{d} (q_{k}^{0}, q_{k}^{1}),

(32)

where

D_{1}, D_{2}

denote as before the partial derivative with respect to the first or second argument in

L_{d}

. Making the identification

q_{k} = q_{k}^{0} = q_{k - 1}^{1}

, the last two equations define the discrete fiber derivatives,

F L_{d}^{\pm} : Q \times Q \to T^{*} Q

as given by (23) and (24). Discrete fiber derivatives induce a discrete symplectic form,

ω_{L_{d}} \equiv {(F L_{d}^{\pm})}^{*} ω_{c a n}

, and the discrete Lagrangian map

F_{L_{d}} \equiv {(F L_{d}^{-})}^{- 1} \circ F L_{d}^{+} : (q_{k - 1}, q_{k}) \mapsto (q_{k}, q_{k + 1})

and the discrete Hamiltonian map

{\tilde{F}}_{L_{d}} \equiv F L_{d}^{+} \circ {(F L_{d}^{-})}^{- 1} : (q_{k}, p_{k}) \mapsto (q_{k + 1}, p_{k + 1})

preserve

ω_{L_{d}}

and

ω_{c a n}

, respectively.

4. Information Geometry

4.1. Statistical Structure on $M$

On a differentiable manifold

M

endowed with a metric g and a torsion-free affine connection ∇, the compatibility of a metric g and a connection ∇ is encoded by the cubic form 3-tensor field

C = \nabla g

, i.e., the covariant derivative of g. In a local coordinate system with basis

\partial_{i} \equiv \partial / \partial x^{i}

, the metric tensor g is locally represented by

g_{i j} (x) = g (\partial_{i}, \partial_{j}),

(33)

and the components of ∇ take the contravariant form

Γ_{i j}^{l}

, where

\nabla_{\partial_{i}} \partial_{j} = \sum_{l} Γ_{i j}^{l} \partial_{l},

(34)

or the covariant form

Γ_{i j, k}

, where

Γ_{i j, k} = g (\nabla_{\partial_{i}} \partial_{j}, \partial_{k}) = \sum_{l} g_{l k} Γ_{i j}^{l} .

(35)

Torsion-freeness of ∇ implies the symmetry of its (first) two lower indices, i.e.,

Γ_{i j}^{l} (x) = Γ_{j i}^{l} (x), Γ_{i j, l} (x) = Γ_{j i, l} (x) .

We can now compute the cubic form,

C (\partial_{i}, \partial_{j}, \partial_{k}) = (\nabla_{\partial_{k}} g) (\partial_{i}, \partial_{j}) = \partial_{k} g (\partial_{i}, \partial_{j}) - g (\nabla_{\partial_{k}} \partial_{i}, \partial_{j}) - g (\partial_{i}, \nabla_{\partial_{k}} \partial_{j}),

or in components,

C_{i j k} = \frac{\partial g_{i j}}{\partial x^{k}} - Γ_{k i, j} - Γ_{k j, i} .

(36)

When the cubic form is identically zero, ∇ is said to be parallel with respect to g. A torsion-free connection parallel to g is called the Levi-Civita connection

\hat{\nabla}

associated to the given metric g:

\partial_{k} g (\partial_{i}, \partial_{j}) = g ({\hat{\nabla}}_{\partial_{k}} \partial_{i}, \partial_{j}) + g (\partial_{i}, {\hat{\nabla}}_{\partial_{k}} \partial_{j}) .

(37)

The fundamental theorem of Riemannian geometry establishes the existence and uniqueness of the Levi-Civita connection, which is a solution of (37), and is given by,

{\hat{Γ}}_{i j}^{k} = \sum_{l} \frac{g^{k l}}{2} (\frac{\partial g_{i l}}{\partial x^{j}} + \frac{\partial g_{j l}}{\partial x^{i}} - \frac{\partial g_{i j}}{\partial x^{l}}) .

Generalizing the notion of parallelism of a connection is the notion of conjugacy (denoted by ∗) between two connections. A connection

\nabla^{*}

is said to be conjugate (or dual) to ∇ with respect to g if

\partial_{k} g (\partial_{i}, \partial_{j}) = g (\nabla_{\partial_{k}} \partial_{i}, \partial_{j}) + g (\partial_{i}, \nabla_{\partial_{k}}^{*} \partial_{j}) .

(38)

Clearly,

{(\nabla^{*})}^{*} = \nabla

. Moreover,

\hat{\nabla}

, which satisfies (37), is special in the sense that it is self-conjugate

{(\hat{\nabla})}^{*} = \hat{\nabla}

. Writing out (38):

\frac{\partial g_{i j}}{\partial x^{k}} = Γ_{k i, j} + Γ_{k j, i}^{*},

(39)

where

Γ_{k j, i}^{*}

is defined analogously to (34) and (35),

\nabla_{\partial_{i}}^{*} \partial_{j} = \sum_{l} Γ_{i j}^{* l} \partial_{l},

so that

Γ_{k j, i}^{*} = g (\nabla_{\partial_{k}}^{*} \partial_{j}, \partial_{i}) = \sum_{l} g_{i l} Γ_{k j}^{* l} .

From (36) and (38), the cubic form can now be written as

C_{i j k} = Γ_{k j, i}^{*} - Γ_{k j, i} .

Clearly,

C_{i j k} = C_{j i k}

, i.e., it is symmetric with respect to its first two indices. When both ∇ and

\nabla^{*}

are torsion-free, this implies that

Γ_{k j, i} = Γ_{j k, i}, Γ_{k j, i}^{*} = Γ_{j k, i}^{*},

then

C_{i j k} = C_{i k j}

, which leads to C being totally symmetric in all the indices,

C_{i j k} = C_{i k j} = C_{k i j} = C_{k j i} = C_{j k i} = C_{j i k} .

Requiring that

C_{i j k}

is totally symmetric imposes a compatibility condition between g and ∇, so that they form a Codazzi pair (see [14]), which generalizes the Levi-Civita coupling (whose corresponding cubic form

C_{i j k} \equiv 0

). Lauritzen [15] defined a statistical manifold

(M, g, \nabla)

to be a manifold

M

equipped with metric g and connection ∇ such that (i) ∇ is torsion-free; (ii)

\nabla g \equiv C

is totally symmetric. Equivalently, a manifold has a statistical structure when the conjugate (with respect to g)

\nabla^{*}

of a torsion-free connection ∇ is also torsion-free. In this case,

\nabla^{*} g = - C

, and the Levi-Civita connection

\hat{\nabla} = (\nabla + \nabla^{*}) / 2

.

On a statistical manifold, one can define a one-parameter family of affine connections

\nabla^{(α)}

, called

α

-connections (

α \in R

) [2]:

\nabla^{(α)} = \frac{1 + α}{2} \nabla + \frac{1 - α}{2} \nabla^{*} .

(40)

Obviously,

\nabla^{(0)} = \hat{\nabla}

is the Levi-Civita connection, and the cubic form is given by

\nabla^{(α)} g = α C

.
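
The last identity can be verified in one line. Because the coefficients in (40) sum to one, the covariant derivative of g with respect to $\nabla^{(α)}$ is the same weighted combination of $\nabla g = C$ and $\nabla^{*} g = - C$:

```latex
\nabla^{(\alpha)} g
  = \frac{1+\alpha}{2}\,\nabla g + \frac{1-\alpha}{2}\,\nabla^{*} g
  = \frac{1+\alpha}{2}\,C - \frac{1-\alpha}{2}\,C
  = \alpha\,C .
```

In particular, $α = 0$ gives $\nabla^{(0)} g = 0$, consistent with $\nabla^{(0)}$ being the Levi-Civita connection.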

The curvature/flatness of a connection ∇ is described by the Riemann curvature tensor R, defined as

R (\partial_{i}, \partial_{j}) \partial_{k} = (\nabla_{\partial_{i}} \nabla_{\partial_{j}} - \nabla_{\partial_{j}} \nabla_{\partial_{i}}) \partial_{k} .

Writing

R (\partial_{i}, \partial_{j}) \partial_{k} = \sum_{l} R_{k i j}^{l} \partial_{l}

and substituting (34), the components of the Riemann curvature tensor are

R_{k i j}^{l} (x) = \frac{\partial Γ_{j k}^{l} (x)}{\partial x^{i}} - \frac{\partial Γ_{i k}^{l} (x)}{\partial x^{j}} + \sum_{m} Γ_{i m}^{l} (x) Γ_{j k}^{m} (x) - \sum_{m} Γ_{j m}^{l} (x) Γ_{i k}^{m} (x) .

By definition,

R_{k i j}^{l}

is antisymmetric when

i \leftrightarrow j

. The covariant form of the Riemann curvature is

R_{l k i j} = \sum_{m} g_{l m} R_{k i j}^{m} .

When the connection is torsion-free,

R_{l k i j}

is antisymmetric when

i \leftrightarrow j

or when

k \leftrightarrow l

, and symmetric when

(i, j) \leftrightarrow (l, k)

. It is related to the Ricci tensor Ric by

{R i c}_{k j} = \sum_{i, l} R_{l k i j} g^{i l}

.

In addition, it can be shown that the curvatures

R_{l k i j}, R_{l k i j}^{*}

for the pair of conjugate connections

\nabla, \nabla^{*}

satisfy

R_{l k i j} = R_{l k i j}^{*} .

A connection is said to be flat when

R_{k i j}^{l} (x) \equiv 0

. So, ∇ is flat if and only if

\nabla^{*}

is flat. In this case, the manifold is said to be dually-flat, and the metric g takes on a particular form (to be discussed later).

4.2. Divergence Function and Induced Geometry

A divergence function

D : M \times M \to R_{\geq 0}

on a manifold

M

with respect to a local chart

V \subseteq R^{n}

is a

C^{3}

function satisfying

(i): $D (x, y) \geq 0, \forall x, y \in V$ , with equality holding if and only if $x = y$ ;
(ii): $D_{i} (x, x) = D_{, j} (x, x) = 0, \forall i, j \in \{1, 2, \dots, n\}$ ;
(iii): $- D_{i, j} (x, x)$ is positive-definite.

Here

D_{i} (x, y) = \partial_{x^{i}} D (x, y)

,

D_{, i} (x, y) = \partial_{y^{i}} D (x, y)

denote partial derivatives with respect to the i-th component of point x and of point y, respectively, and

D_{i, j} (x, y) = \partial_{x^{i}} \partial_{y^{j}} D (x, y)

the second-order mixed derivative, and so on.

On a manifold, divergence functions act as pseudo-distance functions that are nonnegative but need not be symmetric. Every divergence function induces a dualistic Riemannian structure, i.e., statistical structure, which was first demonstrated by Eguchi (see [16]).

Lemma 1.

A divergence function induces a Riemannian metric g and a pair of torsion-free conjugate connections

\nabla, \nabla^{*}

given as

\begin{matrix} g_{i j} (x) & = - {D_{i, j} (x, y)|}_{x = y}, \\ Γ_{i j, k} (x) & = - {D_{i j, k} (x, y)|}_{x = y}, \\ Γ_{i j, k}^{*} (x) & = - {D_{k, i j} (x, y)|}_{x = y} . \end{matrix}

The

Γ_{i j, k}, Γ_{i j, k}^{*}

are torsion-free and are conjugate with respect to the induced metric

g_{i j}

. Hence, the divergence function

D

induces

(M, g, \nabla, \nabla^{*})

, which is a statistical manifold (Lauritzen [15]).

A popular divergence function is the Bregman divergence

B_{Φ} (x, y)

[17], which is associated to a strictly convex function

Φ

:

B_{Φ} (x, y) = Φ (y) - Φ (x) - 〈 y - x, \partial Φ (x) 〉,

(41)

where

\partial Φ = [\partial_{1} Φ, \dots, \partial_{n} Φ]

denotes the exterior derivative, and

{〈 \cdot, \cdot 〉}_{n}

denotes the canonical pairing of a vector

x = [x^{1}, \dots, x^{n}] \in R^{n}

and a covector

u = [u_{1}, \dots, u_{n}] \in {\tilde{R}}_{n}

(dual to

R^{n}

), i.e.,

〈 x, u 〉 = \sum_{i = 1}^{n} x^{i} u_{i} .

(42)

Where there is no danger of confusion, the subscript n in

{〈 \cdot, \cdot 〉}_{n}

is often omitted. A basic fact in convex analysis is that the necessary and sufficient condition for a smooth function

Φ

to be strictly convex is

B_{Φ} (x, y) > 0,

(43)

for all

x \neq y

.

Recall that when

Φ

is convex, its convex conjugate,

\tilde{Φ} : \tilde{V} \subseteq {\tilde{R}}_{n} \to R

, is defined through the Legendre transform:

\tilde{Φ} (u) = 〈 {(\partial Φ)}^{- 1} (u), u 〉 - Φ ({(\partial Φ)}^{- 1} (u)),

(44)

with

\tilde{\tilde{Φ}} = Φ

and

(\partial Φ) = {(\partial \tilde{Φ})}^{- 1}

. Since

\tilde{Φ}

is also convex, by (43), we obtain the Fenchel inequality,

Φ (x) + \tilde{Φ} (u) - 〈 x, u 〉 \geq 0,

for any

x \in V

,

u \in \tilde{V}

, with equality holding if and only if

u = (\partial Φ) (x) = {(\partial \tilde{Φ})}^{- 1} (x) ⟷ x = (\partial \tilde{Φ}) (u) = {(\partial Φ)}^{- 1} (u),

(45)

or, in component form,

u_{i} = \frac{\partial Φ}{\partial x^{i}} ⟷ x^{i} = \frac{\partial \tilde{Φ}}{\partial u_{i}} .

(46)
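
A minimal numerical sketch of the Legendre transform (44) and the Fenchel inequality is given below. It assumes the one-dimensional convex function $Φ (x) = e^{x}$, whose conjugate is $\tilde{Φ} (u) = u \log u - u$ for $u > 0$. This choice and the function names are illustrative assumptions, not part of the text.

```python
import math

def phi(x):
    """Illustrative strictly convex function Phi(x) = exp(x)."""
    return math.exp(x)

def grad_phi(x):
    """u = dPhi/dx, the conjugate variable in (46)."""
    return math.exp(x)

def phi_conj(u):
    """Convex conjugate of exp: u*log(u) - u, defined for u > 0."""
    return u * math.log(u) - u

def grad_phi_conj(u):
    """x = d(conjugate)/du = log(u), the inverse of grad_phi."""
    return math.log(u)

def fenchel_gap(x, u):
    """Left-hand side of the Fenchel inequality: Phi(x) + conj(u) - <x, u>."""
    return phi(x) + phi_conj(u) - x * u

if __name__ == "__main__":
    for x in (-1.0, 0.0, 2.0):
        for u in (0.5, 1.0, 3.0):
            print(x, u, fenchel_gap(x, u))
        # The gap closes exactly on the graph u = dPhi/dx.
        print(x, "on graph:", fenchel_gap(x, grad_phi(x)))
```

The gap is nonnegative for every pair and vanishes, up to rounding, when $u = (\partial Φ) (x)$, which is the equality case (45).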

Using conjugate variables, we can introduce the canonical divergence

A_{Φ} : V \times \tilde{V} \to R_{+}

(and

A_{\tilde{Φ}} : \tilde{V} \times V \to R_{+}

),

A_{Φ} (x, v) = Φ (x) + \tilde{Φ} (v) - 〈 x, v 〉 = A_{\tilde{Φ}} (v, x) .

They are related to the Bregman divergence (41) via the relation

B_{Φ} (x, {(\partial Φ)}^{- 1} (v)) = A_{Φ} (x, v) = B_{\tilde{Φ}} ((\partial Φ) (x), v) .

Though the Bregman divergence is not a metric, it satisfies a quadrilateral relation [18]: For any four points

x, x^{'}, x^{″}, x^{‴} \in V

,

B_{Φ} (x, x^{'}) + B_{Φ} (x^{‴}, x^{″}) - B_{Φ} (x, x^{″}) - B_{Φ} (x^{‴}, x^{'}) = 〈 x^{″} - x^{'}, \partial Φ (x) - \partial Φ (x^{‴}) 〉 .

As a special case, when

x^{‴} = x^{'}

,

B_{Φ} (x^{‴}, x^{'}) = 0

, the above equality reduces to the Pythagorean (generalized cosine) relation among three points

x, x^{'}, x^{″} \in V

:

B_{Φ} (x, x^{'}) + B_{Φ} (x^{'}, x^{″}) - B_{Φ} (x, x^{″}) = 〈 x^{″} - x^{'}, \partial Φ (x) - \partial Φ (x^{'}) 〉 .

This is the Pythagorean relation [3] for a dually-flat space. Using this relation, one can state minimization problems for divergence functions.
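
The quadrilateral and three-point relations can be checked numerically. The sketch below implements the Bregman divergence exactly as written in (41) for the assumed convex function $Φ (x) = \sum_{i} e^{x^{i}}$ and evaluates the difference between the two sides of the quadrilateral relation. The choice of $Φ$ and the function names are hypothetical, for illustration only.

```python
import math

def phi(x):
    """Illustrative strictly convex function on R^n: sum of exponentials."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def pairing(x, u):
    """Canonical pairing <x, u> = sum_i x^i u_i, as in (42)."""
    return sum(a * b for a, b in zip(x, u))

def bregman(x, y):
    """B_Phi(x, y) = Phi(y) - Phi(x) - <y - x, dPhi(x)>, following (41)."""
    diff = [yi - xi for xi, yi in zip(x, y)]
    return phi(y) - phi(x) - pairing(diff, grad_phi(x))

def quadrilateral_defect(x, x1, x2, x3):
    """LHS minus RHS of the quadrilateral relation for the points x, x', x'', x'''."""
    lhs = bregman(x, x1) + bregman(x3, x2) - bregman(x, x2) - bregman(x3, x1)
    dx = [b - a for a, b in zip(x1, x2)]
    du = [g - k for g, k in zip(grad_phi(x), grad_phi(x3))]
    return lhs - pairing(dx, du)

if __name__ == "__main__":
    x, x1, x2, x3 = [0.1, -0.3], [0.5, 0.2], [-0.4, 0.9], [1.0, 0.0]
    print("quadrilateral defect:", quadrilateral_defect(x, x1, x2, x3))
    # Setting x''' = x' recovers the three-point relation.
    print("three-point defect:", quadrilateral_defect(x, x1, x2, x1))
    print("asymmetry:", bregman(x, x1), bregman(x1, x))
```

Both defects vanish up to rounding error, while the last line illustrates that the divergence itself is not symmetric in its arguments.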

The quadrilateral relation can be expressed in terms of the canonical divergence

A

as follows,

A_{Φ} (x, u) + A_{Φ} (y, v) - A_{Φ} (x, v) - A_{Φ} (y, u) = 〈 x - y, v - u 〉,

for any four points

x, y \in V

,

v, u \in \tilde{V}

.

Zhang [5] introduced the

α

-indexed family of

Φ

-divergence functions

D_{Φ}^{(α)}

on

V \times V

,

D_{Φ}^{(α)} (x, y) = \frac{4}{1 - α^{2}} (\frac{1 - α}{2} Φ (x) + \frac{1 + α}{2} Φ (y) - Φ (\frac{1 - α}{2} x + \frac{1 + α}{2} y)) .

(47)

Furthermore,

D_{Φ}^{(\pm 1)} (x, y)

is defined by taking

{lim}_{α \to \pm 1}

:

\begin{matrix} D_{Φ}^{(- 1)} (x, y) & = D_{Φ}^{(1)} (y, x) = B_{Φ} (x, y), \\ D_{Φ}^{(1)} (x, y) & = D_{Φ}^{(- 1)} (y, x) = B_{Φ} (y, x) . \end{matrix}

Note that

D_{Φ}^{(α)} (x, y)

satisfies the relation (called referential duality in [5,19]),

D_{Φ}^{(α)} (x, y) = D_{Φ}^{(- α)} (y, x),

that is, exchanging the two points in the directed distance amounts to

α \leftrightarrow - α

.

4.3. Hessian Manifolds and Biorthogonal Coordinates

Applying Lemma 1 to the Bregman divergence

B_{Φ}

induces the following metric,

g_{i j} (x) = \frac{\partial^{2} Φ (x)}{\partial x^{i} \partial x^{j}},

and the pair of torsion-free conjugate connections,

Γ_{i j, k} (x) = 0, Γ_{i j, k}^{*} (x) = \frac{\partial^{3} Φ (x)}{\partial x^{i} \partial x^{j} \partial x^{k}} .

In this case,

M

is dually-flat. This yields a Hessian manifold, where g takes the form of the Hessian of a strictly convex function

Φ

. More generally, as shown in [5], the

Φ

-divergence

D_{Φ}^{(α)}

of (47), which degenerates to the Bregman divergence

B_{Φ}

when

α = \pm 1

, induces an

α

-independent Hessian metric

g_{i j}

along with the following

α

-connections

Γ_{i j, k}^{(α)} (x) = \frac{1 - α}{2} \frac{\partial^{3} Φ (x)}{\partial x^{i} \partial x^{j} \partial x^{k}}, Γ_{i j, k}^{(α) *} (x) = \frac{1 + α}{2} \frac{\partial^{3} Φ (x)}{\partial x^{i} \partial x^{j} \partial x^{k}} .
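
These formulas can be checked numerically from Lemma 1. The sketch below is a one-dimensional illustration with the assumed convex function $Φ (x) = e^{x}$. It evaluates the divergence (47) and approximates the mixed derivatives $- D_{i, j}$, $- D_{i j, k}$ and $- D_{k, i j}$ on the diagonal by central differences. The function names, the step size, and the choice of $Φ$ are assumptions made for this example.

```python
import math

def phi(x):
    """Illustrative strictly convex function Phi(x) = exp(x)."""
    return math.exp(x)

def alpha_divergence(x, y, alpha):
    """The alpha-indexed Phi-divergence (47) in one dimension; requires alpha != +-1."""
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(a * x + b * y))

def induced_geometry(div, x, h=1e-2):
    """Return (g, Gamma, Gamma_star) at x via Lemma 1, using central differences."""
    def D(dx, dy):
        # Sample the divergence on a small grid around the diagonal point (x, x).
        return div(x + dx * h, x + dy * h)
    # Mixed second derivative d_x d_y D.
    d_xy = (D(1, 1) - D(1, -1) - D(-1, 1) + D(-1, -1)) / (4 * h ** 2)
    # Third derivatives d_x^2 d_y D and d_x d_y^2 D.
    d_xxy = ((D(1, 1) - 2 * D(0, 1) + D(-1, 1))
             - (D(1, -1) - 2 * D(0, -1) + D(-1, -1))) / (2 * h ** 3)
    d_xyy = ((D(1, 1) - 2 * D(1, 0) + D(1, -1))
             - (D(-1, 1) - 2 * D(-1, 0) + D(-1, -1))) / (2 * h ** 3)
    return -d_xy, -d_xxy, -d_xyy

if __name__ == "__main__":
    alpha, x = 0.5, 0.3
    g, gamma, gamma_star = induced_geometry(
        lambda s, t: alpha_divergence(s, t, alpha), x)
    print("g          :", g, "vs", phi(x))
    print("Gamma      :", gamma, "vs", (1 - alpha) / 2 * phi(x))
    print("Gamma_star :", gamma_star, "vs", (1 + alpha) / 2 * phi(x))
```

Since every derivative of $e^{x}$ equals $e^{x}$, the reference values are simply $Φ'' = e^{x}$ for the metric and $\frac{1 \mp α}{2} e^{x}$ for the two connection coefficients. The finite-difference estimates agree with them up to discretization error.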

Hessian manifolds enjoy a special status in information geometry, as they exhibit biorthogonal coordinates on

M

that are globally affine coordinates despite the nontrivial Riemannian (Hessian) metric on

M

.

Consider the coordinate transform

x \mapsto u

,

\partial^{i} \equiv \frac{\partial}{\partial u_{i}} = \sum_{l} \frac{\partial x^{l}}{\partial u_{i}} \frac{\partial}{\partial x^{l}} = \sum_{l} F^{l i} \partial_{l},

where the Jacobian matrix F is given by

F_{i j} (x) = \frac{\partial u_{i}}{\partial x^{j}}, F^{i j} (u) = \frac{\partial x^{i}}{\partial u_{j}}, \sum_{l} F_{i l} F^{l j} = δ_{j}^{i},

(48)

where

δ_{j}^{i}

is the Kronecker delta, which takes the value 1 when

i = j

and 0 otherwise. If the new coordinate system

u = [u_{1}, \dots, u_{n}]

(with components denoted by subscripts) is such that

F_{i j} (x) = g_{i j} (x),

(49)

then the x-coordinate system and the u-coordinate system are said to be biorthogonal to each other, since, from the definition of the metric tensor (33),

g (\partial_{i}, \partial^{j}) = g (\partial_{i}, \sum_{l} F^{l j} \partial_{l}) = \sum_{l} F^{l j} g (\partial_{i}, \partial_{l}) = \sum_{l} F^{l j} g_{i l} = δ_{i}^{j} .

In this case, we define

g^{i j} (u) = g (\partial^{i}, \partial^{j}),

(50)

which is equal to

F^{i j}

, the Jacobian of the inverse coordinate transform

u \mapsto x

. We also introduce the contravariant representation of the affine connection ∇ with respect to the u-coordinate system, and denote it by an unconventional notation

Γ_{t}^{r s}

, which is defined by

\nabla_{\partial^{r}} \partial^{s} = \sum_{t} Γ_{t}^{r s} \partial^{t};

similarly,

Γ_{t}^{* r s}

is defined by

\nabla_{\partial^{r}}^{*} \partial^{s} = \sum_{t} Γ_{t}^{* r s} \partial^{t} .

The covariant representation of the affine connections will be denoted by superscripted

Γ

and

Γ^{*}

,

Γ^{i j, k} (u) = g (\nabla_{\partial^{i}} \partial^{j}, \partial^{k}), Γ^{* i j, k} (u) = g (\nabla_{\partial^{i}}^{*} \partial^{j}, \partial^{k}) .

(51)

The representation of the affine connections in the u-coordinate system (denoted by superscripts) and the x-coordinate system (denoted by subscripts) are related by

Γ_{t}^{r s} (u) = \sum_{k} (\sum_{i, j} \frac{\partial x^{r}}{\partial u_{i}} \frac{\partial x^{s}}{\partial u_{j}} Γ_{i j}^{k} (x) + \frac{\partial^{2} x^{k}}{\partial u_{r} \partial u_{s}}) \frac{\partial u_{k}}{\partial x^{t}},

(52)

and

Γ^{r s, t} (u) = \sum_{i, j, k} \frac{\partial x^{r}}{\partial u_{i}} \frac{\partial x^{s}}{\partial u_{j}} \frac{\partial x^{t}}{\partial u_{k}} Γ_{i j, k} (x) + \frac{\partial^{2} x^{t}}{\partial u_{r} \partial u_{s}} .

(53)

Similar relations hold between

Γ_{t}^{* r s} (u)

and

Γ_{i j}^{* k} (x)

, and between

Γ^{* r s, t} (u)

and

Γ_{i j, k}^{*} (x)

.

Analogous to (39), we have the following identity,

\frac{\partial^{2} x^{t}}{\partial u_{s} \partial u_{r}} = \frac{\partial g^{r t} (u)}{\partial u_{s}} = Γ^{r s, t} (u) + Γ^{* t s, r} (u) .

Therefore, with respect to biorthogonal coordinates, a pair of conjugate connections

\nabla, \nabla^{*}

satisfy,

Γ^{* t s, r} (u) = - \sum_{i, j, k} g^{i r} (u) g^{j s} (u) g^{k t} (u) Γ_{i j, k} (x),

(54)

and

Γ_{r}^{* t s} (u) = - \sum_{j} g^{j s} (u) Γ_{j r}^{t} (x) .

(55)

We now investigate conditions for the existence of biorthogonal coordinates on a Riemannian manifold

(M, g)

. From its definition (49), it can easily be shown that

Proposition 1

([20]). A Riemannian manifold

M

with metric

g_{i j}

admits biorthogonal coordinates if and only if

\frac{\partial g_{i j}}{\partial x^{k}}

is totally symmetric, i.e.,

\frac{\partial g_{i j} (x)}{\partial x^{k}} = \frac{\partial g_{i k} (x)}{\partial x^{j}} .

(56)

In this case,

M

is Hessian.

That (56) is satisfied for biorthogonal coordinates is evident by virtue of (48) and (49). Conversely, given (56), there must be n functions

u_{i} (x), i = 1, 2, \dots, n

, such that,

\frac{\partial u_{i} (x)}{\partial x^{j}} = g_{i j} (x) = g_{j i} (x) = \frac{\partial u_{j} (x)}{\partial x^{i}} .

The above identity implies that there exists a function

Φ

such that

u_{i} = \partial_{i} Φ

and, by positive definiteness of

g_{i j}

,

Φ

must be a strictly convex function. In this case, the x- and u-variables satisfy (45), and the pair of convex functions,

Φ

and its conjugate

\tilde{Φ}

, are related to

g_{i j}

and

g^{i j}

by

g_{i j} (x) = \frac{\partial^{2} Φ (x)}{\partial x^{i} \partial x^{j}} ⟷ g^{i j} (u) = \frac{\partial^{2} \tilde{Φ} (u)}{\partial u_{i} \partial u_{j}} .

It follows from Proposition 1 that a necessary and sufficient condition for a Riemannian manifold to admit biorthogonal coordinates is that its Levi-Civita connection is given by

{\hat{Γ}}_{i j, k} (x) \equiv \frac{1}{2} (\frac{\partial g_{i k}}{\partial x^{j}} + \frac{\partial g_{j k}}{\partial x^{i}} - \frac{\partial g_{i j}}{\partial x^{k}}) = \frac{1}{2} \frac{\partial g_{i j}}{\partial x^{k}} .

From this, the following can be shown:

Corollary 1.

A Riemannian manifold

(M, g)

admits a pair of biorthogonal coordinates x and u if and only if there exists a pair of conjugate connections ∇ and

\nabla^{*}

such that

Γ_{i j, k} (x) = 0, Γ^{* r s, t} (u) = 0

.

In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of connections. In fact, we can now define a pair of torsion-free connections by

Γ_{i j, k} (x) = 0, Γ_{i j, k}^{*} (x) = \frac{\partial g_{i j}}{\partial x^{k}},

and show that they are conjugate with respect to g, that is, they satisfy (38). This means that we select an affine connection ∇ such that x is its affine coordinate system. From (53), when

\nabla^{*}

is expressed in u-coordinates,

\begin{matrix} Γ^{* r s, t} (u) & = \sum_{i, j, k} g^{i r} (u) g^{j s} (u) \frac{\partial x^{k}}{\partial u_{t}} \frac{\partial g_{i j} (x)}{\partial x^{k}} + \frac{\partial g^{t s} (u)}{\partial u_{r}} \\ = \sum_{i, j} g^{i r} (u) (- \frac{\partial g^{j s} (u)}{\partial u_{t}} g_{i j} (x)) + \frac{\partial g^{t s} (u)}{\partial u_{r}} \\ = - \sum_{j} δ_{j}^{r} \frac{\partial g^{j s} (u)}{\partial u_{t}} + \frac{\partial g^{t s} (u)}{\partial u_{r}} = 0 . \end{matrix}

This implies that u is an affine coordinate system with respect to

\nabla^{*}

. Furthermore,

g^{i j} (u) = \frac{\partial^{2} \tilde{Φ} (u)}{\partial u_{i} \partial u_{j}}, Γ^{i j, k} (u) = \frac{\partial^{3} \tilde{Φ} (u)}{\partial u_{i} \partial u_{j} \partial u_{k}},

where

\tilde{Φ}

is the convex conjugate of

Φ

. Therefore, biorthogonal coordinates are affine coordinates for a pair of dually-flat connections. On the manifold of parameterized probability density functions, if the x-coordinates are the natural parameters, then the u-coordinates are the expectations.
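The relations between the two coordinate systems and the two Hessian metrics can be checked numerically. The following is a minimal sketch, assuming a specific example that is not in the text: the log-partition function of a categorical distribution with n + 1 outcomes, so that x are natural parameters and u are expectations. All function names are illustrative.

```python
import numpy as np

# Hypothetical example: Phi is the log-partition function of a categorical
# distribution with n + 1 outcomes, written in natural parameters x.
def phi(x):
    return np.log1p(np.sum(np.exp(x)))

def grad_phi(x):
    """u_i = dPhi/dx^i, the expectation coordinates (outcome probabilities)."""
    e = np.exp(x)
    return e / (1.0 + e.sum())

def phi_conj(u):
    """Convex conjugate of Phi: the negative entropy of (u, 1 - sum(u))."""
    rest = 1.0 - u.sum()
    return np.sum(u * np.log(u)) + rest * np.log(rest)

def grad_phi_conj(u):
    """x^i = dPhi~/du_i, the inverse of grad_phi."""
    return np.log(u / (1.0 - u.sum()))

def metric_x(x):
    """g_ij(x) = Hessian of Phi = diag(u) - u u^T."""
    u = grad_phi(x)
    return np.diag(u) - np.outer(u, u)

def metric_u(u):
    """g^ij(u) = Hessian of Phi~ = diag(1/u) plus 1/(1 - sum(u)) in every entry."""
    return np.diag(1.0 / u) + 1.0 / (1.0 - u.sum())

x = np.array([0.3, -1.2, 0.8])
u = grad_phi(x)

print(np.allclose(grad_phi_conj(u), x))                   # the two gradient maps are inverse
print(np.isclose(phi(x) + phi_conj(u), x @ u))            # Legendre identity
print(np.allclose(metric_x(x) @ metric_u(u), np.eye(3)))  # g_ij and g^ij are inverse matrices
```

Each of the three checks should print `True`; the last one is the statement that the Hessians of the two conjugate potentials are mutually inverse.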

5. Linking Information Geometry with Geometric Mechanics

5.1. Symplectic Structure on $Q \times Q$ Induced from the Divergence Function $D$

We will now establish the connection between information geometry and discrete geometric mechanics. The divergence function from information geometry can be viewed as a Type I generating function of a symplectic map, and in particular, it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. More specifically, let the configuration manifold be the information manifold, i.e.,

Q = M

, and the discrete Lagrangian be the divergence function, i.e.,

L_{d} = D

. With this identification, we observe that the information geometric construction of symplectic structure on

M \times M

described below is nothing but the discrete symplectic structure on

Q \times Q

given in (28) where the discrete Lagrangian

L_{d}

is replaced with the divergence function

D

.

From information geometry, a divergence function

D

is given as a scalar-valued binary function on Q (of dimension n). We now view it as a unary function on

Q \times Q

(of dimension

2 n

) that vanishes along the diagonal

Δ_{Q} \subset Q \times Q

. In this subsection, we investigate the conditions under which a divergence function can serve as a generating function of a symplectic structure on

Q \times Q

. A compatible metric on

Q \times Q

will also be derived. When restricted to the diagonal submanifold

Δ_{Q}

, the skew-symmetric symplectic form will vanish, so

Δ_{Q}

, which carries a statistical structure, is actually a Lagrangian submanifold (see [21,22]).

First, we fix a point x in the first slot and a point y in the second slot of

(x, y) \in Q \times Q

– this results in two n-dimensional submanifolds of

Q \times Q

that will be denoted,

Q_{x} = Q \times {y} ≃ Q

(with the y point fixed) and

Q_{y} = {x} \times Q ≃ Q

(with the x point fixed), respectively. The canonical symplectic form

ω_{x}

on the cotangent bundle

T^{*} Q_{x}

is given by

ω_{x} = \sum_{i} d x^{i} \land d ξ_{i} .

Given

D

, we define a map

L_{D}

from

Q_{y} \subset Q \times Q

to

T^{*} Q_{x}, (x, y) \mapsto (x, ξ)

, which is given by,

L_{D} : (x, y) \mapsto (x, \sum_{i} D_{i} (x, y) d x^{i}) .

Recall that the comma in the subscript of a divergence function

D

indicates whether it is being differentiated with respect to a variable in the first or second slot. It is easily checked that there exists a neighborhood of the diagonal

Δ_{Q} \subset Q \times Q

, such that the map

L_{D}

is a diffeomorphism. In particular, the Jacobian of the map is given by

(\begin{matrix} δ_{i j} & D_{i j} \\ 0 & D_{i, j} \end{matrix}),

which is nondegenerate in a neighborhood of the diagonal

Δ_{Q}

.

We calculate the pullback by

L_{D}

of the canonical symplectic form

ω_{x}

on

T^{*} Q_{x}

to

Q \times Q

:

\begin{matrix} L_{D}^{*} ω_{x} & = L_{D}^{*} (\sum_{i} d x^{i} \land d ξ_{i}) = \sum_{i} d x^{i} \land d D_{i} (x, y) \\ = \sum_{i} d x^{i} \land \sum_{j} (D_{i j} (x, y) d x^{j} + D_{i, j} d y^{j}) = \sum_{i j} D_{i, j} (x, y) d x^{i} \land d y^{j} . \end{matrix}

Here,

\sum_{i j} D_{i j} d x^{i} \land d x^{j} = 0

, since by the equality of mixed partials,

D_{i j} (x, y) = D_{j i} (x, y)

always holds.

Similarly, we consider the canonical symplectic form

ω_{y} = \sum_{i} d y^{i} \land d η_{i}

on

T^{*} Q_{y}

and define a map

R_{D}

from

Q \times Q \to T^{*} Q_{y}, (x, y) \mapsto (y, η)

, which is given by

R_{D} : (x, y) \mapsto (y, \sum_{i} D_{, i} (x, y) d y^{i}) .

Using

R_{D}

to pull back

ω_{y}

to

Q \times Q

yields an analogous formula:

R_{D}^{*} ω_{y} = - \sum_{i j} D_{i, j} (x, y) d x^{i} \land d y^{j} .

Therefore, based on canonical symplectic forms on

T^{*} Q_{x}

and

T^{*} Q_{y}

, we obtain, up to an overall sign, the same symplectic form on

Q \times Q

ω_{D} (x, y) = - \sum_{i j} D_{i, j} (x, y) d x^{i} \land d y^{j} .

(57)

Theorem 1

([22]). A divergence function

D

induces a symplectic form

ω_{D}

(57) on

Q \times Q

which is the pullback of the canonical symplectic forms

ω_{x}

and

ω_{y}

by the maps

L_{D}

and

R_{D}

,

L_{D}^{*} ω_{x} = \sum_{i j} D_{i, j} (x, y) d x^{i} \land d y^{j} = - R_{D}^{*} ω_{y} .

(58)

With the symplectic form

ω_{D}

given as above, it is easy to check that

ω_{D}

is closed:

d ω_{D} = \sum_{i j k} \{\frac{\partial^{3} D}{\partial x^{k} \partial x^{i} \partial y^{j}} d x^{k} \land d x^{i} \land d y^{j} + \frac{\partial^{3} D}{\partial y^{k} \partial x^{i} \partial y^{j}} d y^{k} \land d x^{i} \land d y^{j}\} = 0 .

It was Barndorff-Nielsen and Jupp [21] who first proposed (57) as an induced symplectic form on

Q \times Q

, apart from a minus sign; they called the divergence function

D

a yoke.

The fact that this symplectic structure coincides with the one introduced in discrete mechanics should come as no surprise. The

Q_{x}

and

Q_{y}

submanifolds are related to the two ways of viewing

Q \times Q

as a bundle over Q, depending on whether one chooses

π_{1} : Q \times Q \to Q

,

(x, y) \mapsto x

or

π_{2} : Q \times Q \to Q

,

(x, y) \mapsto y

as the bundle projection. Then, the maps

L_{D}

,

R_{D}

are, up to a sign, simply the discrete fiber derivatives

F L_{d}^{\pm}

, where the discrete Lagrangian

L_{d}

is replaced by the divergence function

D

.
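The coefficients of the induced two-form can be estimated numerically for a concrete divergence. The sketch below assumes a Bregman divergence built from an illustrative log-partition potential (not taken from the text) and estimates the mixed partials D_{i,j} by central differences; for a Bregman divergence these equal minus the Hessian of the potential at x. All names are hypothetical.

```python
import numpy as np

# Illustrative convex potential (an assumption of this sketch).
def phi(x):
    return np.log1p(np.sum(np.exp(x)))

def grad_phi(x):
    e = np.exp(x)
    return e / (1.0 + e.sum())

def hess_phi(x):
    u = grad_phi(x)
    return np.diag(u) - np.outer(u, u)

def bregman(x, y):
    """B_Phi(x, y) = Phi(y) - Phi(x) - <dPhi(x), y - x>."""
    return phi(y) - phi(x) - grad_phi(x) @ (y - x)

def mixed_partials(D, x, y, eps=1e-4):
    """Central-difference estimate of D_{i,j} = d^2 D / dx^i dy^j."""
    n = x.size
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            M[i, j] = (D(x + ei, y + ej) - D(x + ei, y - ej)
                       - D(x - ei, y + ej) + D(x - ei, y - ej)) / (4 * eps**2)
    return M

x = np.array([0.3, -1.2, 0.8])
y = np.array([-0.5, 0.4, 0.1])
M = mixed_partials(bregman, x, y)

# Matrix of omega_D = -sum_ij D_{i,j} dx^i ^ dy^j in the basis (d/dx, d/dy).
Omega = np.block([[np.zeros((3, 3)), -M], [M.T, np.zeros((3, 3))]])

print(np.allclose(M, -hess_phi(x), atol=1e-6))   # D_{i,j}(x, y) = -g_ij(x)
print(np.allclose(Omega, -Omega.T))              # skew-symmetry
print(np.linalg.matrix_rank(Omega) == 6)         # nondegeneracy
```

In this example the mixed partials do not depend on y, so the two-form is nondegenerate on all of Q × Q and not only near the diagonal.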

5.2. Divergence as a Type I Generating Function

As we have seen previously, symplectic maps are a natural way of describing the flow of Hamiltonian mechanics on the cotangent bundle

T^{*} Q

. We will now consider the characterization of symplectic maps in terms of generating functions, and in particular, we review three different parameterizations based on the classification given in Goldstein [23].

Lemma 2.

Given

(q_{0}, q_{1}) \in Q \times Q

, then

(q_{0}, p_{0}) \mapsto (q_{1}, p_{1})

on

T^{*} Q

is symplectic if and only if there exists

S_{1} : Q \times Q \to R

such that

p_{0} = - D_{1} S_{1}, p_{1} = D_{2} S_{1} .

(59)

To prove this, observe that

d S_{1} (q_{0}, q_{1}) = (D_{1} S_{1}) d q_{0} + (D_{2} S_{1}) d q_{1},

from which, we immediately obtain

- d p_{0} \land d q_{0} + d p_{1} \land d q_{1} = 0 = d^{2} S_{1} (q_{0}, q_{1}) = d (D_{1} S_{1}) \land d q_{0} + d (D_{2} S_{1}) \land d q_{1} .

Identifying the corresponding terms yields (59).
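As an illustration of (59), the sketch below takes a hypothetical Type I generating function S_1(q_0, q_1) = (q_1 - q_0)^2 / (2h) - h V(q_0) with a pendulum potential (both are assumptions made for this example), solves the implicit relations for (q_1, p_1), and checks numerically that the resulting map preserves the canonical symplectic form.

```python
import numpy as np

h = 0.1  # time step (illustrative value)

def dV(q):
    """V'(q) for the pendulum potential V(q) = -cos(q)."""
    return np.sin(q)

def step(q0, p0):
    """Map (q0, p0) -> (q1, p1) defined by p0 = -D1 S1 and p1 = D2 S1,
    with S1(q0, q1) = (q1 - q0)^2 / (2h) - h V(q0)."""
    # p0 = -D1 S1 = (q1 - q0)/h + h V'(q0), solved for q1
    q1 = q0 + h * (p0 - h * dV(q0))
    # p1 = D2 S1 = (q1 - q0)/h
    p1 = (q1 - q0) / h
    return q1, p1

def jacobian(q0, p0, eps=1e-6):
    """Central-difference Jacobian of step with respect to (q0, p0)."""
    dq = (np.array(step(q0 + eps, p0)) - np.array(step(q0 - eps, p0))) / (2 * eps)
    dp = (np.array(step(q0, p0 + eps)) - np.array(step(q0, p0 - eps))) / (2 * eps)
    return np.column_stack([dq, dp])

J = jacobian(0.7, -0.4)
Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Symplecticity: J^T Omega J = Omega (equivalently det J = 1 in one degree of freedom).
print(np.allclose(J.T @ Omega @ J, Omega, atol=1e-8))
```

For this choice of S_1 the map is the familiar symplectic Euler scheme; other generating functions give other symplectic maps through the same two relations.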

Type I generating functions

S_{1}

are linked with other types of generating functions via partial Legendre transforms. Fixing the first or second variable slot leads to Type II or Type III generating functions, denoted

S_{2}, S_{3}

, respectively.

Let

H_{+}

be a submanifold, with local coordinates

(q_{0}, p_{1})

, of

Q \times (T^{*} Q)

, with local coordinates

(q_{0}, (q_{1}, p_{1}))

, where

q_{1}

is dependent on

q_{0}

and

p_{1}

. Then

(q_{0}, p_{0}) \mapsto (q_{1}, p_{1})

on

T^{*} Q

is symplectic if and only if there exists

S_{2} : H_{+} \to R

such that

p_{0} = D_{1} S_{2}, q_{1} = D_{2} S_{2} .

(60)

Likewise, let

H_{-}

be a submanifold, whose local coordinates are

(p_{0}, q_{1})

, of

(T^{*} Q) \times Q

with local coordinates

((q_{0}, p_{0}), q_{1})

where

q_{0}

is dependent on

p_{0}

and

q_{1}

. Then

(q_{0}, p_{0}) \mapsto (q_{1}, p_{1})

on

T^{*} Q

is symplectic if and only if there exists

S_{3} : H_{-} \to R

such that

q_{0} = - D_{1} S_{3}, p_{1} = - D_{2} S_{3} .

(61)

In the case of discrete mechanics, the Type II generating function is denoted by

H^{+} (q_{0}, p_{1})

and the Type III generating function is denoted by

H^{-} (q_{1}, p_{0})

. We compute their exterior derivatives:

\begin{matrix} d H^{+} (q_{0}, p_{1}) & = D_{1} H^{+} (q_{0}, p_{1}) d q_{0} + D_{2} H^{+} (q_{0}, p_{1}) d p_{1}, \end{matrix}

(62)

\begin{matrix} d H^{-} (q_{1}, p_{0}) & = D_{1} H^{-} (q_{1}, p_{0}) d q_{1} + D_{2} H^{-} (q_{1}, p_{0}) d p_{0} . \end{matrix}

(63)

From this, we obtain,

\begin{matrix} 0 & = d^{2} H^{+} (q_{0}, p_{1}) = d (D_{1} H^{+} (q_{0}, p_{1}) d q_{0} + D_{2} H^{+} (q_{0}, p_{1}) d p_{1}) \end{matrix}

(64)

\begin{matrix} & = d (p_{0} d q_{0} + q_{1} d p_{1}) = d p_{0} \land d q_{0} + d q_{1} \land d p_{1}, \end{matrix}

(65)

\begin{matrix} 0 & = d^{2} H^{-} (q_{1}, p_{0}) = d (D_{1} H^{-} (q_{1}, p_{0}) d q_{1} + D_{2} H^{-} (q_{1}, p_{0}) d p_{0}) \end{matrix}

(66)

\begin{matrix} & = d (p_{1} d q_{1} + q_{0} d p_{0}) = d p_{1} \land d q_{1} + d q_{0} \land d p_{0} . \end{matrix}

(67)

Therefore, symplectic maps can be defined implicitly in terms of a Type II generating function

H^{+} (q_{0}, p_{1})

,

q_{1} = D_{2} H^{+} (q_{0}, p_{1}), p_{0} = D_{1} H^{+} (q_{0}, p_{1}),

and a Type III generating function

H^{-} (q_{1}, p_{0})

,

q_{0} = D_{2} H^{-} (q_{1}, p_{0}), p_{1} = D_{1} H^{-} (q_{1}, p_{0}) .

More explicitly, these are related to the discrete Lagrangian

L_{d} (q_{0}, q_{1})

, which is a Type I generating function, by the following partial Legendre transforms:

\begin{matrix} H^{+} (q_{0}, p_{1}) & = \underset{q_{1}}{ext} \{〈 p_{1}, q_{1} 〉 - L_{d} (q_{0}, q_{1})\}, \end{matrix}

(68)

\begin{matrix} H^{-} (q_{1}, p_{0}) & = \underset{q_{0}}{ext} \{〈 p_{0}, q_{0} 〉 - L_{d} (q_{0}, q_{1})\}, \end{matrix}

(69)

or equivalently,

\begin{matrix} H^{+} (q_{0}, p_{1}) - 〈 p_{1}, q_{1}^{*} 〉 + L_{d} (q_{0}, q_{1}^{*}) & = 0, \end{matrix}

(70)

\begin{matrix} H^{-} (q_{1}, p_{0}) - 〈 p_{0}, q_{0}^{*} 〉 + L_{d} (q_{0}^{*}, q_{1}) & = 0 . \end{matrix}

(71)

The upshot of the above discussion is that

p_{0}

,

p_{1}

are Legendre dual variables with respect to

q_{0}

,

q_{1}

, whereas in the fiberwise Legendre transform

F L

, it is

(q_{0}, v_{0})

,

(q_{1}, v_{1})

which are dual to

(q_{0}, p_{0})

,

(q_{1}, p_{1})

. In the discrete setting, the dual correspondence is

q \leftrightarrow p

, instead of

v \leftrightarrow p

. As before, the two discrete Legendre dualities are due to the two ways of viewing

Q \times Q

as a bundle over Q.
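A short worked example may help make the partial Legendre transform (68) concrete. It assumes a simple discrete Lagrangian with step size h and potential V, chosen here for illustration and not taken from the text.

```latex
% Illustrative discrete Lagrangian (an assumption of this example)
\begin{align*}
L_d(q_0, q_1) &= \frac{(q_1 - q_0)^2}{2h} - h\,V(q_0). \\
\intertext{Extremizing $\langle p_1, q_1 \rangle - L_d(q_0, q_1)$ over $q_1$ gives}
p_1 &= D_2 L_d(q_0, q_1^*) = \frac{q_1^* - q_0}{h}
  \quad\Longrightarrow\quad q_1^* = q_0 + h\,p_1 . \\
\intertext{Substituting into (68),}
H^{+}(q_0, p_1) &= p_1 (q_0 + h\,p_1) - \frac{h\,p_1^2}{2} + h\,V(q_0)
  = p_1 q_0 + h \Big( \tfrac{1}{2} p_1^2 + V(q_0) \Big). \\
\intertext{The Type II relations then read}
q_1 &= D_2 H^{+} = q_0 + h\,p_1, \qquad
p_0 = D_1 H^{+} = p_1 + h\,V'(q_0),
\end{align*}
```

which agrees with p_0 = -D_1 L_d and p_1 = D_2 L_d computed directly from the Type I form. In this example H^+ equals the pairing p_1 q_0 plus h times the continuous Hamiltonian evaluated at (q_0, p_1).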

In the context of information geometry,

H^{\pm}

is nothing but the partial Legendre transform of the divergence function

D (x, y)

with respect to the first or second argument. Consider the Bregman divergence

B_{Φ} (q_{0}, q_{1})

,

B_{Φ} (q_{0}, q_{1}) \equiv Φ (q_{1}) - Φ (q_{0}) - 〈 \partial Φ (q_{0}), q_{1} - q_{0} 〉,

and view it as a discrete Lagrangian

L_{d} (q_{0}, q_{1})

. Then, its partial Legendre transform with respect to

q_{1}

, the Type II generating function

H^{+}

, is

H^{+} (q_{0}, p_{1}) = 〈q_{1}, \frac{\partial B_{Φ} (q_{0}, q_{1})}{\partial q_{1}}〉 - B_{Φ} (q_{0}, q_{1}),

which evaluates to

H^{+} (q_{0}, p_{1}) = B_{Φ} (q_{1} (q_{0}, p_{1}), q_{0}),

where

q_{1} = {(\partial Φ)}^{- 1} (p_{1} + \partial Φ (q_{0})),

is obtained by solving

p_{1} = \frac{\partial B_{Φ} (q_{0}, q_{1})}{\partial q_{1}} = \partial Φ (q_{1}) - \partial Φ (q_{0}) .

By substitution, we obtain,

H^{+} (q_{0}, p_{1}) = B_{Φ} ({(\partial Φ)}^{- 1} (\partial Φ (q_{0}) + p_{1}), q_{0}) = B_{\tilde{Φ}} (\partial Φ (q_{0}), \partial Φ (q_{0}) + p_{1}) .

Note that in this case, the Legendre dual of

q_{1}

is no longer

p_{1} = \partial Φ (q_{1})

as given by the fiberwise Legendre map, but is rather shifted by an amount

\partial Φ (q_{0})

. It is interesting that

H^{+}

still takes the form of

B

, as does

L_{d}

. This is a special property of taking the Bregman divergence as the generating function.
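The Bregman identities used above can be checked numerically in one dimension. The sketch assumes the illustrative potential Φ(x) = exp(x), whose conjugate is u log u - u. It also assumes that the pairing in the Legendre transform is taken against the displacement q_1 - q_0 rather than q_1; the two conventions differ by the term ⟨p_1, q_0⟩. All names are hypothetical.

```python
import numpy as np

# Illustrative 1-D convex potential (an assumption of this sketch): Phi(x) = exp(x).
phi = np.exp
dphi = np.exp
dphi_inv = np.log          # inverse of the gradient map

def phi_conj(u):
    """Convex conjugate of exp: u log u - u, for u > 0."""
    return u * np.log(u) - u

dphi_conj = np.log         # gradient of the conjugate

def bregman(f, df, a, b):
    """B_f(a, b) = f(b) - f(a) - df(a) * (b - a)."""
    return f(b) - f(a) - df(a) * (b - a)

q0, p1 = 0.4, 0.9
q1 = dphi_inv(p1 + dphi(q0))     # solves p1 = dPhi(q1) - dPhi(q0)

primal = bregman(phi, dphi, q1, q0)                              # B_Phi(q1, q0)
dual = bregman(phi_conj, dphi_conj, dphi(q0), dphi(q0) + p1)     # B_Phi~(dPhi(q0), dPhi(q0) + p1)
legendre = p1 * (q1 - q0) - bregman(phi, dphi, q0, q1)           # displacement pairing

print(np.isclose(primal, dual))
print(np.isclose(primal, legendre))
```

Both checks should print `True`: the first is the primal-dual identity for Bregman divergences, and the second is the statement that the Legendre transform in the displacement again has Bregman form with the arguments swapped.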

5.3. $D$-Divergence for Decoupling L and H

In geometric mechanics, Hamiltonian and Lagrangian dynamics represent one and the same dynamics, that is, they are coupled; this is because

H (q, p)

and

L (q, v)

are related by the fiberwise Legendre transform

F L = {(F H)}^{- 1}

; in fact, they are a Legendre pair. The conservation properties of the Hamiltonian approach with respect to the underlying symplectic geometry and the variational principles that arise in the Lagrangian and Hamilton–Jacobi theories reflect two sides of the same coin.

(72)

To appreciate this, we look at the interaction of three manifolds

Q \times Q

,

T^{*} Q

and

T Q

. We take

(q_{k}, q_{k + 1})

to be the configuration variable q at successive time steps; it is the dynamical equation that governs the evolution from

q_{k}

to

q_{k + 1}

. The Hamiltonian dynamics, which is encoded in the preservation of

ω_{c a n}

of

T^{*} Q

, governs discrete Hamiltonian flow

H (q_{k}, p_{k})

, through a Type I generating function

L_{d} (q_{k}, q_{k + 1})

. On the other hand, the Lagrangian flow is governed by the retraction map

R_{q}

, such as the Dirichlet-to-Neumann map induced by Jacobi’s solution

S_{q_{0}} (q)

to the Hamilton–Jacobi equation. Those two dynamic updates

q_{k} \to q_{k + 1}

need not be identical. In mechanics, the Hamiltonian energy conservation system and the Lagrangian extremization system lead to one and the same dynamics, precisely because

L (q_{k}, v_{k})

and

H (q_{k}, p_{k})

are linked through the fiberwise Legendre transform

F L

at

q_{k}

:

L (q_{k}, v_{k}) + H (q_{k}, p_{k}) - {〈 v_{k}, p_{k} 〉}_{q_{k}} \equiv 0 .

In other words, L and H are perfectly coupled, with no duality gap.

Information geometry, on the other hand, starts with a divergence (or contrast) function

D (q, v, p)

on

T Q \oplus T^{*} Q

, which measures the discrepancy between the two systems. Given

H (q, p)

on

T^{*} Q

and

L (q, v)

on

T Q

, we write

D (q, v, p) = L (q, v) + H (q, p) - {〈 v, p 〉}_{q} .

Theorem 2.

Let

L (q, v)

and

H (q, p)

be strictly convex functions, defined on

T Q

and

T^{*} Q

in terms of the variables

(q, v)

and

(q, p)

, respectively. Then, for the following statements, any two imply the rest:

(i): $D = 0$ ;
(ii): $H (q, \cdot)$ and $L (q, \cdot)$ are (fiberwise) convex conjugate (Legendre dual) to each other;
(iii): $p = \frac{\partial L}{\partial v} = F L (v)$ ;
(iv): $v = \frac{\partial H}{\partial p} = F H (p)$ .

When

D = 0

,

ω_{L} (ξ_{L}, \cdot) = ι_{ξ_{L}} ω_{L} = d E_{L}

with

ξ_{L} = (\dot{q}, \dot{v})

, and

E_{L} (q, v) = \sum_{i} v^{i} \frac{\partial L (q, v)}{\partial v^{i}} - L (q, v) .

Then,

d E_{L} = \sum_{i} (\sum_{j} \frac{\partial^{2} L}{\partial q^{i} \partial v^{j}} v^{j} - \frac{\partial L}{\partial q^{i}}) d q^{i} + \sum_{i j} v^{j} \frac{\partial^{2} L}{\partial v^{i} \partial v^{j}} d v^{i} .

The Euler–Lagrange equations are equivalent to

ξ_{L} = (\frac{\partial E_{L}}{\partial q}, \frac{\partial E_{L}}{\partial v}) .

Our insight here is that

D

does not have to vanish identically. The consequence is that we do not require the Lagrangian dynamics (extremization dynamics) and Hamiltonian dynamics (conservation dynamics) to be coupled; they will be allowed to evolve independently. The function

D

allows us to study fiberwise symplectomorphisms of Dirac manifolds.
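The behaviour of D as a duality gap can be seen on a single fiber. The sketch below assumes the quadratic pair L(v) = m v^2 / 2 and H(p) = p^2 / (2m), with the q-dependence suppressed; these choices and the value of m are illustrative. In this case D(v, p) = (m v - p)^2 / (2m), which is nonnegative and vanishes exactly on the graph of the Legendre map p = m v.

```python
import numpy as np

m = 2.0  # illustrative mass parameter

def L(v):
    return 0.5 * m * v**2

def H(p):
    return p**2 / (2.0 * m)

def D(v, p):
    """Duality gap D = L(v) + H(p) - <v, p> on one fiber of TQ (+) T*Q."""
    return L(v) + H(p) - v * p

v = np.linspace(-3.0, 3.0, 61)
p = np.linspace(-6.0, 6.0, 61)
V, P = np.meshgrid(v, p)
gap = D(V, P)

print(gap.min() >= -1e-12)                             # D is nonnegative (Fenchel-Young)
print(np.allclose(D(v, m * v), 0.0))                   # D = 0 when p = dL/dv
print(np.allclose(gap, (m * V - P)**2 / (2.0 * m)))    # closed form of the gap
```

Away from the graph p = m v the gap is strictly positive, which is the sense in which v and p are allowed to evolve independently.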

Let us consider the case that (ii) holds, i.e.,

H (q, \cdot)

and

L (q, \cdot)

are Legendre duals to each other. Then, the canonical divergence

D

can be written as the Bregman divergence

B_{H}

and

B_{L}

, after applying the fiberwise Legendre map

F L

or

F H

,

p = \frac{\partial L (q, v)}{\partial v} ⟺ v = \frac{\partial H (q, p)}{\partial p} .

This implies that,

\begin{matrix} B_{L} (q, v_{0}, v_{1}) & \equiv D (q, v_{1}, \frac{\partial L (q, v_{0})}{\partial v_{0}}) = L (q, v_{1}) - L (q, v_{0}) - 〈\frac{\partial L (q, v_{0})}{\partial v_{0}}, v_{1} - v_{0}〉, \\ B_{H} (q, p_{0}, p_{1}) & \equiv D (q, \frac{\partial H (q, p_{0})}{\partial p_{0}}, p_{1}) = H (q, p_{1}) - H (q, p_{0}) - 〈\frac{\partial H (q, p_{0})}{\partial p_{0}}, p_{1} - p_{0}〉, \end{matrix}

and they satisfy,

B_{L} (q, v_{0}, v_{1}) = B_{H} (q, p_{1}, p_{0}) .

This is the reference-representation biduality [18,19], which is satisfied whenever L and H are Legendre duals of each other.
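The biduality can be verified for a non-quadratic Legendre pair. The sketch assumes L(v) = v^4 / 4 on a single fiber, whose convex conjugate is H(p) = (3/4) |p|^{4/3}; this pair is chosen for illustration only.

```python
import numpy as np

# Illustrative Legendre pair on one fiber (an assumption of this sketch).
def L(v):
    return v**4 / 4.0

def dL(v):
    return v**3

def H(p):
    return 0.75 * np.abs(p) ** (4.0 / 3.0)

def dH(p):
    return np.cbrt(p)

def D(v, p):
    """Duality gap L(v) + H(p) - <v, p>."""
    return L(v) + H(p) - v * p

def B_L(v0, v1):
    return L(v1) - L(v0) - dL(v0) * (v1 - v0)

def B_H(p0, p1):
    return H(p1) - H(p0) - dH(p0) * (p1 - p0)

v0, v1 = 1.0, 2.0
p0, p1 = dL(v0), dL(v1)          # fiberwise Legendre map applied to v0 and v1

print(np.isclose(B_L(v0, v1), B_H(p1, p0)))   # biduality: the arguments swap
print(np.isclose(B_L(v0, v1), D(v1, p0)))     # both equal the gap between v1 and p0
```

Note the reversed order of the arguments on the Hamiltonian side: the reference point on the tangent side becomes the comparison point on the cotangent side.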

5.4. Variational Error Analysis

Recall that we previously defined the exact discrete Lagrangian

L_{d}^{E}

(16), which is related to Jacobi’s solution of the Hamilton–Jacobi equation. The significance of the exact discrete Lagrangian is that it generates the exact discrete time flow of a Lagrangian system, but in general it cannot be computed explicitly. Instead, a computable discrete Lagrangian

L_{d}

is used to construct a discretization of Lagrangian mechanics, and it induces the discrete Lagrangian map

F_{L_{d}}

.

Since discrete variational mechanics is expressed in terms of discrete Lagrangians, and the exact discrete Lagrangian generates the exact flow map of a continuous Lagrangian system, it is natural to ask whether we can characterize the order of accuracy of the Lagrangian map

F_{L_{d}}

as an approximation of the exact flow map, in terms of the extent to which the discrete Lagrangian

L_{d}

approximates the exact discrete Lagrangian

L_{d}^{E}

. This is indeed possible, and is referred to as variational error analysis. Theorem 2.3.1 of [11] shows that if a discrete Lagrangian

L_{d}

approximates the exact discrete Lagrangian

L_{d}^{E}

to order p, i.e.,

L_{d} (q_{0}, q_{1}; h) = L_{d}^{E} (q_{0}, q_{1}; h) + O (h^{p + 1})

, then the discrete Hamiltonian map,

{\tilde{F}}_{L_{d}} : (q_{k}, p_{k}) \mapsto (q_{k + 1}, p_{k + 1})

, viewed as a one-step method, is order p accurate.
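The order condition can be observed numerically on a problem where the exact discrete Lagrangian is available in closed form. The sketch below assumes the harmonic oscillator L(q, v) = v^2/2 - q^2/2 and two standard quadrature-based discrete Lagrangians (rectangle and midpoint rules); the initial data and step sizes are illustrative.

```python
import numpy as np

A, B = 1.0, 0.5   # illustrative initial position q(0) and velocity v(0)

def q_exact(t):
    """Exact solution of the harmonic oscillator with q(0) = A, q'(0) = B."""
    return A * np.cos(t) + B * np.sin(t)

def exact_discrete_lagrangian(h):
    """Action integral of L = v^2/2 - q^2/2 along q_exact over [0, h]."""
    return 0.5 * ((B**2 - A**2) * np.sin(2 * h) / 2 + A * B * (np.cos(2 * h) - 1))

def lagrangian(q, v):
    return 0.5 * v**2 - 0.5 * q**2

def rectangle(q0, q1, h):
    """L_d = h L(q0, (q1 - q0)/h)."""
    return h * lagrangian(q0, (q1 - q0) / h)

def midpoint(q0, q1, h):
    """L_d = h L((q0 + q1)/2, (q1 - q0)/h)."""
    return h * lagrangian(0.5 * (q0 + q1), (q1 - q0) / h)

def observed_exponent(Ld, h):
    """Estimate k in |L_d - L_d^E| ~ C h^k by halving the step size."""
    def err(s):
        return abs(Ld(q_exact(0.0), q_exact(s), s) - exact_discrete_lagrangian(s))
    return np.log2(err(h) / err(h / 2))

k_rect = observed_exponent(rectangle, 0.1)
k_mid = observed_exponent(midpoint, 0.1)
print(k_rect, k_mid)
```

The printed exponents should be close to 2 and 3, that is, p + 1 with p = 1 for the rectangle rule and p = 2 for the midpoint rule, consistent with the order statement above.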

As mentioned above, the divergence function

D

from information geometry can serve as a Type I generating function of a symplectic map, and hence it can be viewed as a discrete Lagrangian in the sense of discrete Lagrangian mechanics. A divergence function also generates the Riemannian metric and affine connection structures on the diagonal manifold (Lemma 1), in addition to generating the symplectic structure on

Q \times Q

. Viewed in this way, a natural question is to what extent we can view the divergence function as corresponding to the exact Lagrangian flow of an associated continuous Lagrangian. We can show the following.

Theorem 3.

The exact discrete Lagrangian

L_{d}^{E} (q (0), q (h), h)

associated with the geodesic flow, with respect to the induced metric g, can be approximated by a divergence function

D (q (0), q (h))

up to third order

O (h^{3})

accuracy,

L_{d}^{E} (q (0), q (h), h) = h L (q (0), v (0)) + D (q (0), q (h)) + O (h^{3}),

if and only if Q is a Hessian manifold, i.e.,

D

is the Bregman divergence

B_{Φ}

, for some strictly convex function Φ.

Proof.

Let us expand the exact discrete Lagrangian to obtain,

\begin{matrix} L_{d}^{E} (q (0), q (h), h) & = h L (q, \dot{q}) + \frac{h^{2}}{2} (\frac{\partial L}{\partial q} (q, \dot{q}) \cdot \dot{q} + \frac{\partial L}{\partial \dot{q}} (q, \dot{q}) \cdot \ddot{q}) + O (h^{3}) \\ & = h L (q, v) + \frac{h^{2}}{2} (\frac{\partial L}{\partial q^{i}} v^{i} + \frac{\partial L}{\partial v^{i}} a^{i}) + O (h^{3}), \end{matrix}

(73)

where

q (0) = q, v = \dot{q} (0), a = \ddot{q} (0)

.

From the definition of a divergence function:

D_{, j} (q, q) = 0 .

Differentiating with respect to q,

0 = \frac{\partial}{\partial q^{i}} D_{, j} (q, q) = D_{i, j} (q, q) + D_{, i j} (q, q),

so

D_{i, j} (q, q) = - D_{, i j} (q, q) .

Differentiating with respect to q again,

\frac{\partial}{\partial q^{k}} D_{, i j} (q, q) = D_{k, i j} (q, q) + D_{, i j k} (q, q) .

Observe that the left-hand side is the derivative of the metric induced by the divergence function,

\frac{\partial}{\partial q^{k}} D_{, i j} (q, q) = \frac{\partial g_{i j} (q)}{\partial q^{k}}, g_{i j} (q) = D_{, i j} (q, q) = - D_{i, j} (q, q) .

Expanding

D (q, q^{'})

around

q = q (0)

for

q^{'} = q (h)

:

q^{'} = q + v h + \frac{1}{2} a h^{2} + O (h^{3}),

we obtain

D (q, q^{'}) = \frac{h^{2}}{2} g_{i j} (q) v^{i} v^{j} + \frac{h^{3}}{4} g_{i j} (q) (v^{i} a^{j} + v^{j} a^{i}) + \frac{h^{3}}{6} Γ_{i j k} (q) v^{i} v^{j} v^{k} + O (h^{4}),

where

D (q, q) = 0, D_{, i} (q, q) = 0, D_{, i j} (q, q) \equiv g_{i j} (q) = - D_{i, j} (q, q), D_{, i j k} (q, q) \equiv Γ_{i j k} (q) .

Clearly,

g_{i j} = g_{j i}

, and

Γ_{i j k} = Γ_{i k j} = Γ_{k i j} = Γ_{k j i} = Γ_{j k i} = Γ_{j i k} .

Comparing the corresponding terms in powers of h, we obtain,

\begin{matrix} L (q, v) & = \frac{1}{2} g_{i j} v^{i} v^{j}, \end{matrix}

(74)

\begin{matrix} v^{i} \frac{\partial L}{\partial q^{i}} (q, v) + a^{i} \frac{\partial L}{\partial v^{i}} (q, v) & = \frac{1}{2} g_{i j} (a^{i} v^{j} + a^{j} v^{i}) + \frac{1}{3} Γ_{i j k} v^{i} v^{j} v^{k} . \end{matrix}

(75)

Substituting (74) into (75) yields

\frac{\partial g_{i j}}{\partial q^{k}} = Γ_{i j k},

with

\frac{\partial g_{i j}}{\partial q^{k}} = \frac{\partial g_{i k}}{\partial q^{j}} .

This, according to Proposition 1, demonstrates that the manifold

M

is Hessian, and hence dually-flat. So, for the expansions to agree to

O (h^{3})

, the inducing divergence function

D

must be the Bregman divergence

B_{Φ}

. ☐
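The expansion of the divergence used in the proof can be checked numerically in one dimension. The sketch assumes a Bregman divergence with the illustrative potential Φ(q) = exp(q), for which g = Φ'' and Γ = Φ''' (so that the derivative of g equals Γ, as the Hessian condition requires). The values of q, v, a are arbitrary.

```python
import numpy as np

# Illustrative potential Phi = exp; all of its derivatives are exp as well.
phi = d1 = d2 = d3 = np.exp

def D(q, qp):
    """Bregman divergence B_Phi(q, q') = Phi(q') - Phi(q) - Phi'(q) (q' - q)."""
    return phi(qp) - phi(q) - d1(q) * (qp - q)

q, v, a = 0.2, 0.7, -0.3
g, Gamma = d2(q), d3(q)      # g = D_{,11}(q, q), Gamma = D_{,111}(q, q)

def residual(h):
    """Difference between D(q, q(h)) and its expansion through order h^3."""
    qp = q + v * h + 0.5 * a * h**2
    series = (0.5 * h**2 * g * v**2
              + 0.25 * h**3 * g * (2 * v * a)
              + h**3 / 6.0 * Gamma * v**3)
    return abs(D(q, qp) - series)

exponent = np.log2(residual(0.1) / residual(0.05))
print(exponent)
```

The printed exponent should be close to 4, confirming that the truncated series matches the divergence up to an O(h^4) remainder.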

6. Summary

In this paper, we show the differences and connections between geometric mechanics and information geometry in canonically prescribing differential geometric structures on a smooth manifold Q. The Legendre transform plays crucial roles in both; however, they serve very different purposes. In geometric mechanics, the fiberwise Legendre map serves to link the cotangent bundle

T^{*} Q

with tangent bundle

T Q

, whereas in information geometry, the Legendre transform relates the pair of biorthogonal coordinates, which are special coordinates on a dually-flat manifold Q. More specifically,

F L

(or its inverse

F H

) is invoked to establish the isomorphism between

T^{*} Q \leftrightarrow T Q

in geometric mechanics, whereas in information geometry, a Hessian metric g built upon a convex function on Q is used for the correspondence between two coordinate systems on Q, and also for potentially (but not necessarily) establishing a correspondence between

T Q

and

T^{*} Q

.

The link between information geometry and discrete mechanics is much stronger when one considers the discrete version (as opposed to the traditional, continuous version) of geometric mechanics. Both endow a symplectic structure

ω_{\times}

on

Q \times Q

, through the use of a discrete Lagrangian

L_{d}

in the case of geometric mechanics and a divergence function

D

in the case of information geometry; in fact, both are Type I generating functions inducing

ω_{\times}

on

Q \times Q

via pullback from the canonical symplectic structure

ω_{c a n}

on

T^{*} Q

. Using the Legendre transform, Type II generating functions can be constructed, which lead to the (right) discrete Hamiltonian

H^{+}

in geometric mechanics and to the dual divergence function in information geometry.

Our analyses draw a distinction between the fiberwise Legendre map (which is used in continuous mechanics setting), the Legendre transform between biorthogonal coordinates (which is used in information geometry), and the Legendre transform between Type I and Type II generating functions (which is used in the setting of both discrete geometric mechanics and information geometry). The distinctions are more prominent when one considers the Pontryagin bundle

T Q \oplus T^{*} Q

. There, we can construct a divergence function that actually measures the duality gap between the Lagrangian function and the Hamiltonian function that generate a pair of (forward and backward) Legendre maps. In so doing, we demonstrate that information geometry can be viewed as an extension of geometric mechanics based on Dirac mechanics and geometry, with a full-blown duality between the Lagrangian and Hamiltonian components.

7. Discussion and Future Directions

Noda [24] showed that, with respect to the symplectic structure

ω_{\times}

on

Q \times Q

, the Hamiltonian flow of the canonical divergence

A

induces geodesic flows for ∇ and

\nabla^{*}

. He interpreted biorthogonal coordinates as a single coordinate system on

Q \times Q

, in a way that is consistent with treating

A

as the Type I generating function on

Q \times Q

. It remains unclear how the resulting Hamiltonian flow is related to dynamical flow on the Dirac manifold.

In another related work, Ay and Amari [25] sought to characterize the canonical form of divergence functions for general (non dually-flat) manifolds. They investigated the retraction map

R_{q} : {q} \times Q \to T_{q} Q

which we discussed in Section 2.5, and used the exponential map associated to any torsion-free affine connection ∇ on

T Q

. This approach, based on parallel transport, in essence generates a semispray on

T Q

, and is quite different from characterizing the dynamics using the Hamiltonian flow on

T^{*} Q

. Note that even though one may define a symplectic structure (through pullback) on

T Q

as well, Ay and Amari [25] treat the semispray on

T Q

as the primary geometric object. Future research will clarify its relation to our approach, which is based on defining a symplectic structure on

Q \times Q

directly.

Finally, comparing information geometry with geometric mechanics may shed light on universal machine learning algorithms. In machine learning or state estimation applications, we wish to have the estimated distribution be influenced by the observations, so that the estimated distribution eventually becomes consistent with the observed data. Let

{x_{i}}

denote the sequence of predictions by (possibly a series of) model distributions

X

, and let

{y_{i}}

denote the actual data generated by an unknown distribution

Y

that we are trying to estimate. In practice, the divergence functions are constructed so that the pseudo-distance between two distributions

X

and

Y

can be computed using only complete information about

X

and samples from

Y

. As such, we can measure the mismatch between the current prediction

x_{i}

and the actual data

y_{i}

using

D (x_{i}, y_{i})

, since the asymmetry in the definition of

D

is such that we only require samples

y_{i}

from the true but unknown distribution. So, adding a momentum term to ensure gentle change in model predictions, a possible choice of a discrete Lagrangian for generating the discrete dynamics for the machine learning application might be given by

L (x_{i}, x_{i + 1}) = D (x_{i}, x_{i + 1}) + D (x_{i + 1}, y_{i + 1}),

where the first term can be interpreted as the action associated with the kinetic energy, and the second term is the action associated with the potential energy. By construction, the term

D (x_{i}, y_{i})

vanishes when the prediction

x_{i}

is consistent with the actual observation

y_{i}

, and it is positive otherwise, so the term

D (x_{i}, y_{i})

can be viewed as a potential energy term that penalizes mismatch between the estimated distribution and the observational data. Our variational error analysis may thus shed light on an asymptotic theory of inference where sample size

N \to \infty

is akin to discretization step

h \to 0

.

The link between geometric mechanics and information geometry, as revealed through our present investigation, is still rather preliminary. The possibility of a unified mathematical framework for information and mechanics is intriguing and remains a challenge for future research.

Acknowledgments

We thank the anonymous reviewers for helping to improve this paper. The first author is supported by NSF grants CMMI-1334759 and DMS-1411792. The second author is supported by DARPA/ARO Grant W911NF-16-1-0383.

Author Contributions

Both authors contributed equally to the research and writing of the manuscript. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Marsden, J.E.; Ratiu, T.S. Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, 2nd ed.; Springer: New York, NY, USA, 1999. [Google Scholar]
Amari, S. Differential-Geometrical Methods in Statistics; Lecture Notes in Statistics; Springer: New York, NY, USA, 1985. [Google Scholar]
Amari, S.; Nagaoka, H. Methods of Information Geometry (Translations of Mathematical Monographs); Translated from the 1993 Japanese original by Daishi Harada; American Mathematical Society: Providence, RI, USA; Oxford University Press: Oxford, UK, 2000. [Google Scholar]
Zhang, J. Divergence Functions and Geometric Structures They Induce on a Manifold. In Geometric Theory of Information; Nielsen, F., Ed.; Springer: Berlin, Germany, 2014; pp. 1–30. [Google Scholar]
Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. 2004, 16, 159–195. [Google Scholar] [CrossRef] [PubMed]
Zhang, J.; Matsuzoe, H. Dualistic Riemannian Manifold Structure Induced from Convex Functions. In Advances in Applied Mathematics and Global Optimization: In Honor of Gilbert Strang; Gao, D., Sherali, H., Eds.; Springer: Boston, MA, UK, 2009; pp. 437–464. [Google Scholar]
Abraham, R.; Marsden, J. Foundations of Mechanics, 2nd ed.; Benjamin/Cummings Publishing: Reading, MA, USA, 1978. [Google Scholar]
Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems. J. Geom. Phys. 2006, 57, 133–156. [Google Scholar] [CrossRef]
Yoshimura, H.; Marsden, J. Dirac structures in Lagrangian mechanics Part II: Variational structures. J. Geom. Phys. 2006, 57, 209–250. [Google Scholar] [CrossRef]
Tulczyjew, W.M.; Urbanski, P. A slow and careful Legendre transformation for singular Lagrangians. Acta Phys. Pol. B 1999, 30, 2909–2978. [Google Scholar]
Marsden, J.; West, M. Discrete mechanics and variational integrators. Acta Numer. 2001, 10, 317–514. [Google Scholar] [CrossRef]
Lall, S.; West, M. Discrete variational Hamiltonian mechanics. J. Phys. A 2006, 39, 5509–5519.
Leok, M.; Ohsawa, T. Variational and Geometric Structures of Discrete Dirac Mechanics. Found. Comput. Math. 2011, 11, 529–562.
Simon, U. Affine differential geometry. In Handbook of Differential Geometry; Elsevier Science: Amsterdam, The Netherlands, 2000; Volume I, pp. 905–961.
Lauritzen, S. Statistical manifolds. In Differential Geometry in Statistical Inference; IMS Lecture Notes; Amari, S., Barndorff-Nielsen, O., Kass, R., Lauritzen, S., Rao, C., Eds.; IMS: Hayward, CA, USA, 1987; Volume 10, pp. 163–216.
Eguchi, S. Geometry of minimum contrast. Hiroshima Math. J. 1992, 22, 631–647.
Bregman, L.M. The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Comput. Math. Math. Phys. 1967, 7, 200–217.
Zhang, J. Dual scaling of comparison and reference stimuli in multi-dimensional psychological space. J. Math. Psychol. 2004, 48, 409–424.
Zhang, J. Reference duality and representation duality in information geometry. AIP Conf. Proc. 2015, 1641, 130–146.
Shima, H. The Geometry of Hessian Structures; World Scientific Publishing: Hackensack, NJ, USA, 2007.
Barndorff-Nielsen, O.E.; Jupp, P.E. Yokes and symplectic structures. J. Stat. Plan. Inference 1997, 63, 133–146.
Zhang, J.; Li, F. Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions. In Proceedings of the Geometric Science of Information, Paris, France, 28–30 August 2013.
Goldstein, H. Classical Mechanics, 2nd ed.; Addison-Wesley Series in Physics; Addison-Wesley Publishing: Reading, MA, USA, 1980.
Noda, T. Symplectic structures on statistical manifolds. J. Aust. Math. Soc. 2011, 90, 371–384.
Ay, N.; Amari, S. A novel approach to canonical divergences within information geometry. Entropy 2015, 17, 8111–8129.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
