1. Introduction
The set-up of classical Lagrangian Mechanics is a finite-dimensional Riemannian manifold. For example, see the monographs by V.I. Arnold ([1], Chapters III–IV), R. Abraham and J.E. Marsden ([2], Chapter 3), and J.E. Marsden and T.S. Ratiu ([3], Chapter 7). Classical Information Geometry, as it was first defined in the monograph by S.-I. Amari and H. Nagaoka [4], views a parametric statistical model as a manifold endowed with a dually-flat connection. In a recent paper, M. Leok and J. Zhang [5] have pointed out the natural relation between these two topics and have given a wide overview of the mathematical structures involved.
In the present paper, we take up the same research program with two further qualifications. First, we assume a non-parametric approach by considering the full set of positive probability functions on a finite set, as was done, for example, in our review paper [6]. The discussion is restricted here to a finite state space to avoid difficult technical problems. Second, we consider a specific expression of the tangent space of the statistical manifold, which is a Hilbert bundle that we call a statistical bundle. Our aim is to emphasize the basic statistical intuition of the geometric quantities involved. Because of that, we chose to systematically use the language of non-parametric differential geometry as it is developed in the monograph of S. Lang [7].
Herein, we use our version of Information Geometry; see the review paper [6]. Preliminary versions of this paper have been presented at the SigmaPhy2017 Conference held in Corfu, Greece, 10–14 July 2017, and at a seminar held at Collegio Carlo Alberto, Moncalieri, on 5 September 2017. In these early versions, we did not refer to Leok and Zhang’s work, which we were unaware of at that time.
In Section 2, we review the definition and properties of the statistical bundle, and of the affine atlas that endows it with both a manifold structure and a natural family of transports between the fibers. In Section 3, we develop the formalism of the tangent space of the statistical bundle and derive the expression of the velocity and the acceleration of a one-dimensional statistical model in the given affine atlas. The derivation of the Euler–Lagrange equations, together with a relevant example, is discussed in Sections 4 and 5.
2. Statistical Bundle
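We consider a finite sample space $\Omega$, with $\#\Omega = N$. The probability simplex is $\Delta(\Omega)$, and $\Delta^{\circ}(\Omega)$ is its interior. The uniform probability on $\Omega$ is denoted as $\mu$, $\mu(x) = 1/N$, $x \in \Omega$. The maximal exponential family $\mathcal{E}(\mu)$ is the set of all strictly positive probability densities of $(\Omega, \mu)$. The expected value of $f \colon \Omega \to \mathbb{R}$ with respect to the density $P \in \mathcal{E}(\mu)$ is denoted $\mathbb{E}_P[f] = \mathbb{E}_\mu[fP]$.
In [6,8,9], we made the case for the statistical bundle being the key structure of Information Geometry. The statistical bundle with base $\mathcal{E}(\mu)$ is
$$S\mathcal{E}(\mu) = \left\{ (Q, V) \,\middle|\, Q \in \mathcal{E}(\mu),\ \mathbb{E}_Q[V] = 0 \right\}.$$
The statistical bundle is a semi-algebraic subset of $\mathbb{R}^{2N}$; i.e., it is defined by algebraic equations and strict inequalities. It is trivially a real manifold. At each $Q \in \mathcal{E}(\mu)$, the fiber $S_Q\mathcal{E}(\mu)$ is endowed with the scalar product
$$\langle V_1, V_2 \rangle_Q = \mathbb{E}_Q[V_1 V_2].$$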
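To this structure we add a special affine atlas of charts in order to show a structure of affine manifold, which is of interest in the statistical applications. The exponential atlas of the statistical manifold $S\mathcal{E}(\mu)$ is the collection of charts given for each $P \in \mathcal{E}(\mu)$ by
$$s_P \colon S\mathcal{E}(\mu) \ni (Q, V) \mapsto \left( s_P(Q),\ {}^{e}\mathbb{U}_Q^P V \right) \in S_P\mathcal{E}(\mu) \times S_P\mathcal{E}(\mu), \tag{1}$$
where (with a slight abuse of notation)
$$s_P(Q) = \log\frac{Q}{P} - \mathbb{E}_P\left[\log\frac{Q}{P}\right], \qquad {}^{e}\mathbb{U}_Q^P V = V - \mathbb{E}_P[V]. \tag{2}$$
As $s_P(P, V) = (0, V)$, we say that $s_P$ is the chart centered at $P$. If $s_P(Q) = U$, it is easy to derive the exponential form of $Q$ as a density with respect to $P$; namely, $Q = e^{U + \mathbb{E}_P[\log(Q/P)]} \cdot P$. As $\mathbb{E}_\mu[Q] = 1$, then $1 = \mathbb{E}_P[e^{U}]\, e^{\mathbb{E}_P[\log(Q/P)]}$, so that the cumulant function $K_P$ is defined on $S_P\mathcal{E}(\mu)$ by
$$K_P(U) = \log \mathbb{E}_P\left[e^{U}\right] = \mathbb{E}_P\left[\log\frac{P}{Q}\right] = D(P \,\|\, Q);$$
that is, $K_P(U)$ is the expression in the chart at $P$ of the Kullback–Leibler divergence $Q \mapsto D(P \,\|\, Q)$, and we can write
$$Q = e^{U - K_P(U)} \cdot P = e_P(U).$$
The patch centered at $P$ is
$$s_P^{-1} = e_P \colon \left(S_P\mathcal{E}(\mu)\right)^2 \ni (U, W) \mapsto \left( e_P(U),\ {}^{e}\mathbb{U}_P^{e_P(U)} W \right) \in S\mathcal{E}(\mu).$$
In statistical terms, the random variable $\log(Q/P)$ is the point-wise information about $Q$ relative to the reference $P$, while $s_P(Q)$ is the deviation from its mean value at $P$. The expression of the other divergence in the chart centered at $P$ is
$$D(Q \,\|\, P) = \mathbb{E}_Q\left[\log\frac{Q}{P}\right] = \mathbb{E}_Q\left[U - K_P(U)\right] = dK_P(U)[U] - K_P(U),$$
where $dK_P(U)[H] = \mathbb{E}_{e_P(U)}[H]$ is the derivative of $K_P$ at $U$ in the direction $H$.
The equation above shows that the two divergences are convex conjugate functions in the proper charts; see [10].
The transition maps of the exponential atlas in Equations (1) and (2) are
$$s_{P_2} \circ s_{P_1}^{-1}(U, W) = \left( {}^{e}\mathbb{U}_{P_1}^{P_2} U + s_{P_2}(P_1),\ {}^{e}\mathbb{U}_{P_1}^{P_2} W \right),$$
so that the exponential atlas is indeed affine. Notice that the linear part is ${}^{e}\mathbb{U}_{P_1}^{P_2}$.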
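The chart, the patch, and the cumulant function can be checked numerically on a small state space. The following is a minimal sketch, not the paper's own code: it assumes that the chart is the centered log-ratio $\log(Q/P) - \mathbb{E}_P[\log(Q/P)]$, it represents $P$ and $Q$ as plain probability vectors (the ratio $Q/P$ is the same as for densities with respect to the uniform probability), and all function names are hypothetical.

```python
import math

def expect(p, f):
    """Expected value of the random variable f under the probability vector p."""
    return sum(pi * fi for pi, fi in zip(p, f))

def chart(p, q):
    """Chart centered at p: log(q/p) minus its mean under p (assumed form)."""
    r = [math.log(qi / pi) for pi, qi in zip(p, q)]
    m = expect(p, r)
    return [ri - m for ri in r]

def cumulant(p, u):
    """Cumulant function: log of the p-expectation of exp(u)."""
    return math.log(expect(p, [math.exp(ui) for ui in u]))

def patch(p, u):
    """Inverse of the chart: the probability vector exp(u - K_p(u)) * p."""
    k = cumulant(p, u)
    return [pi * math.exp(ui - k) for pi, ui in zip(p, u)]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q)."""
    return expect(p, [math.log(pi / qi) for pi, qi in zip(p, q)])

p = [0.2, 0.3, 0.5]
q = [0.5, 0.25, 0.25]
u = chart(p, q)        # coordinate of q in the chart centered at p
q_back = patch(p, u)   # should reproduce q

# Change of chart from p to p2: re-center u under p2 and add the coordinate of p.
p2 = [0.1, 0.6, 0.3]
shift = chart(p2, p)
m2 = expect(p2, u)
u2 = [ui - m2 + si for ui, si in zip(u, shift)]
```

In this sketch, `cumulant(p, u)` agrees with `kl(p, q)`, `expect(q, u) - cumulant(p, u)` agrees with `kl(q, p)`, and `u2` agrees with `chart(p2, q)`, which illustrates the affine form of the change of chart.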
3. The Tangent Space of the Statistical Bundle
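Let us compute the expression of the velocity at time $t$ of a smooth curve $t \mapsto Q(t) \in \mathcal{E}(\mu)$ in the chart centered at $P$. The expression of the curve is
$$U(t) = s_P(Q(t)) = \log\frac{Q(t)}{P} - \mathbb{E}_P\left[\log\frac{Q(t)}{P}\right],$$
and hence we have, by denoting the derivative in $\mathbb{R}^N$ by the dot,
$$\dot U(t) = \frac{\dot Q(t)}{Q(t)} - \mathbb{E}_P\left[\frac{\dot Q(t)}{Q(t)}\right]$$
and
$$\frac{\dot Q(t)}{Q(t)} = \dot U(t) - \mathbb{E}_{Q(t)}\left[\dot U(t)\right].$$
If we define the velocity of $t \mapsto Q(t)$ to be
$$\overset{\star}{Q}(t) = \frac{\dot Q(t)}{Q(t)} = \frac{d}{dt} \log Q(t),$$
then $t \mapsto (Q(t), \overset{\star}{Q}(t))$ is a curve in the statistical bundle whose expression in the chart centered at $P$ is $t \mapsto (U(t), \dot U(t))$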
. The velocity as defined above is nothing other than the score function of the one-dimensional statistical model; see, e.g., the textbook by B. Efron and T. Hastie ([11], Section 4.2). The variance of the score function (i.e., the squared norm of the velocity in the fiber at $Q(t)$) is classically known as the Fisher information at $t$.
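As an illustration of the velocity as a score function, the following minimal sketch computes the score and the Fisher information along a mixture curve between two probability vectors. The choice of curve, the function names, and the finite-difference helper are assumptions made for this example only.

```python
import math

def mixture(t, p0, p1):
    """Point at time t of the mixture curve (1 - t) * p0 + t * p1."""
    return [(1 - t) * a + t * b for a, b in zip(p0, p1)]

def score(t, p0, p1):
    """Velocity of the curve: time derivative of q(t) divided by q(t)."""
    q = mixture(t, p0, p1)
    return [(b - a) / qi for a, b, qi in zip(p0, p1, q)]

def fisher_information(t, p0, p1):
    """Variance of the score under q(t); the score has zero mean under q(t)."""
    q = mixture(t, p0, p1)
    s = score(t, p0, p1)
    return sum(qi * si * si for qi, si in zip(q, s))

def log_derivative_fd(t, p0, p1, h=1e-6):
    """Central finite difference of log q(t), used only as a cross-check."""
    hi = mixture(t + h, p0, p1)
    lo = mixture(t - h, p0, p1)
    return [(math.log(a) - math.log(b)) / (2 * h) for a, b in zip(hi, lo)]

p0 = [0.2, 0.3, 0.5]
p1 = [0.5, 0.25, 0.25]
t = 0.4
s = score(t, p0, p1)
info = fisher_information(t, p0, p1)
```

The score computed in closed form should match the finite-difference derivative of $\log Q(t)$, and its mean under $Q(t)$ should vanish, which is the condition for belonging to the fiber at $Q(t)$.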
We define the
second statistical bundle to be
with charts
we can identify the second bundle with the tangent space of the first bundle as follows.
For each curve
in the statistical bundle, define its
velocity at t to be
because
is a curve in the second statistical bundle, and its expression in the chart at
P has the last two components equal to the values given in Equations (3) and (4).
In particular, consider a curve
. The velocity is
where the
acceleration is
It should be noted that the acceleration has been defined without explicitly mentioning the relevant connection. In fact, the connection here is implicitly defined by the transports
, which is unusual in Differential Geometry, but is quite natural from the probabilistic point of view; see P. Gibilisco and G. Pistone [12]. We shall see below that the non-parametric approach to Information Geometry allows the definition of a dual transport, and hence of a dual connection, as in [4]. Because of that, we could have defined other types of acceleration together with the one we have defined. Namely, we could consider an
exponential acceleration , a
mixture acceleration , and a
Riemannian acceleration
each acceleration being associated with a specific connection; see the review paper [6]. We do not further discuss the different second-order geometries associated with the statistical bundle in this paper.
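The acceleration induced by the transports can be checked by a short computation. The following is a sketch under two assumptions stated here for self-containment: the velocity is the score $\dot Q / Q$, written $\overset{\star}{Q}$, and the transport into the fiber at $Q(t)$ acts by re-centering, that is, by subtracting the mean under $Q(t)$.

```latex
\begin{align*}
\frac{d}{dt}\,\overset{\star}{Q}(t)
  &= \frac{d}{dt}\,\frac{\dot Q(t)}{Q(t)}
   = \frac{\ddot Q(t)}{Q(t)} - \left(\overset{\star}{Q}(t)\right)^{2}, \\
\mathbb{E}_{Q(t)}\!\left[\frac{\ddot Q(t)}{Q(t)}\right]
  &= \mathbb{E}_{\mu}\!\left[\ddot Q(t)\right]
   = \frac{d^{2}}{dt^{2}}\,\mathbb{E}_{\mu}\!\left[Q(t)\right] = 0, \\
\frac{d}{dt}\,\overset{\star}{Q}(t)
  - \mathbb{E}_{Q(t)}\!\left[\frac{d}{dt}\,\overset{\star}{Q}(t)\right]
  &= \frac{\ddot Q(t)}{Q(t)}
   - \left(\left(\overset{\star}{Q}(t)\right)^{2}
   - \mathbb{E}_{Q(t)}\!\left[\left(\overset{\star}{Q}(t)\right)^{2}\right]\right).
\end{align*}
```

Under these assumptions, the centered derivative of the velocity equals $\ddot Q / Q$ minus the centered square of the score, and the mean of the squared score is the Fisher information.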
Example 1 (Boltzmann–Gibbs).
Let us compare the formalism we have introduced above with standard computations in Statistical Physics. The
Boltzmann–Gibbs distribution gives to point
the probability
, with
and
, see Landau and Lifshitz ([13], Chapter 3). As a curve in
, it is
because of the reference to the uniform probability. The velocity defined above becomes in this case
, while the acceleration of Equation (5) is
. Notice that we have the equation
.
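The computation in this example can be reproduced numerically. The sketch below is an illustration under assumptions of its own: it parametrizes the Boltzmann–Gibbs family by a hypothetical inverse temperature `beta`, with probabilities proportional to $e^{-\beta E(x)}$, which need not be the parametrization used in the example, and the energy levels are arbitrary.

```python
import math

def gibbs(beta, energy):
    """Probabilities proportional to exp(-beta * E(x)) (assumed parametrization)."""
    w = [math.exp(-beta * e) for e in energy]
    z = sum(w)
    return [wi / z for wi in w]

def mean(q, f):
    """Expected value of f under the probability vector q."""
    return sum(qi * fi for qi, fi in zip(q, f))

def velocity(beta, energy):
    """Score in beta: the negative centered energy, -(E - E_q[E])."""
    q = gibbs(beta, energy)
    m = mean(q, energy)
    return [-(e - m) for e in energy]

def energy_variance(beta, energy):
    """Variance of the energy under the Gibbs probability at beta."""
    q = gibbs(beta, energy)
    m = mean(q, energy)
    return mean(q, [(e - m) ** 2 for e in energy])

energy = [0.0, 1.0, 3.0]   # arbitrary illustrative energy levels
beta = 0.7
q = gibbs(beta, energy)
v = velocity(beta, energy)
fisher = mean(q, [vi * vi for vi in v])
```

In this parametrization the Fisher information equals the variance of the energy. The derivative of the velocity with respect to `beta` is the same constant at every point of the state space, so its centered version vanishes; with a different parametrization, such as the temperature, this is no longer the case.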
Following the original construction of Amari’s Information Geometry [4], we have defined on the statistical bundle a manifold structure which is both affine and Riemannian. The base manifold $\mathcal{E}(\mu)$ is actually a Hessian manifold with respect to any of the convex cumulant functions $K_P$, $P \in \mathcal{E}(\mu)$ (see [14]). Many computations are actually performed using the Hessian structure. The following equations are easily checked and frequently used:
We have defined a centering operation that can be thought of as a
transport among fibers,
whose adjoint is
. In fact, is the adjoint of
,
Moreover, if
, then
Example 2 (Entropy flow).
This example is taken from [8]. In the scalar field
, there is no dependence on the fiber. If
is a smooth curve in
expressed in the chart centered at
P, then we can write
where the argument of the last expectation belongs to the fiber
and we have expressed the expected value as a derivative by using Equation (7).
Again using Equations (7) and (9), we compute the derivative of the entropy along the given curve as
We now use the equations
and
to obtain
We have identified the gradient of the entropy in the statistical bundle,
Notice that the previous computation could have been done using the exponential family
. See the computation of the gradient flow in [8].
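The identification of the gradient of the entropy can be cross-checked numerically. The sketch below assumes that the entropy is $H(Q) = -\mathbb{E}_Q[\log Q]$ and that the candidate gradient is $-(\log Q + H(Q))$; it then compares the derivative of the entropy along a mixture curve with the scalar product of the candidate gradient and the velocity. Probability vectors are used in place of densities, and all names are hypothetical.

```python
import math

def entropy(q):
    """Entropy of the probability vector q."""
    return -sum(qi * math.log(qi) for qi in q)

def entropy_gradient(q):
    """Candidate gradient -(log q + H(q)); it has zero mean under q."""
    h = entropy(q)
    return [-(math.log(qi) + h) for qi in q]

def inner(q, v, w):
    """Scalar product of v and w in the fiber at q: expectation of v * w under q."""
    return sum(qi * vi * wi for qi, vi, wi in zip(q, v, w))

def curve(t, p0, p1):
    """Mixture curve between p0 and p1, used only as a test curve."""
    return [(1 - t) * a + t * b for a, b in zip(p0, p1)]

def velocity(t, p0, p1):
    """Score of the test curve: time derivative of q(t) divided by q(t)."""
    return [(b - a) / qi for a, b, qi in zip(p0, p1, curve(t, p0, p1))]

p0 = [0.2, 0.3, 0.5]
p1 = [0.5, 0.25, 0.25]
t, h = 0.4, 1e-6
q = curve(t, p0, p1)
# Left side: central finite difference of the entropy along the curve.
lhs = (entropy(curve(t + h, p0, p1)) - entropy(curve(t - h, p0, p1))) / (2 * h)
# Right side: scalar product of the candidate gradient with the velocity.
rhs = inner(q, entropy_gradient(q), velocity(t, p0, p1))
```

The two sides agree up to finite-difference error, which is consistent with reading the derivative of a scalar field along a curve as the scalar product of its gradient with the velocity.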
In the next section, we extend the computation illustrated in the example above to scalar fields on the statistical bundle.
4. Lagrangian Function
A
Lagrangian function is a smooth scalar field on the statistical bundle
At each fixed density
, the partial mapping
is defined on the vector space
; hence, we can use the ordinary derivative, which in this case is called the
fiber derivative,
Example 3 (Running Example 1).
If
then
. The example is suggested by the form of the classical Lagrangian function in mechanics, where the first term is the kinetic energy and
is the potential energy.
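To make the fiber derivative concrete, here is a worked computation for a hypothetical kinetic-plus-potential Lagrangian. The constant $m$, the potential $\Phi$, and the sign convention are assumptions chosen for illustration and need not coincide with the running example.

```latex
\begin{align*}
L(Q, V) &= \frac{m}{2}\,\langle V, V\rangle_{Q} - \Phi(Q),
  \qquad \langle V, W\rangle_{Q} = \mathbb{E}_{Q}[V W], \\
L(Q, V + \varepsilon H)
  &= L(Q, V) + \varepsilon\, m\,\langle V, H\rangle_{Q}
   + \frac{\varepsilon^{2} m}{2}\,\langle H, H\rangle_{Q}, \\
\left.\frac{d}{d\varepsilon}\, L(Q, V + \varepsilon H)\right|_{\varepsilon = 0}
  &= m\,\langle V, H\rangle_{Q}.
\end{align*}
```

The potential term does not depend on the fiber variable, so only the kinetic term contributes, and the fiber derivative at $(Q, V)$ is represented by $mV$ in the fiber at $Q$ through the scalar product.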
As the statistical bundle
is non-trivial, the computation of the partial derivative of the Lagrangian with respect to the first variable requires some care. We want to compute the expression of the total derivative in a chart of the affine atlas defined in Equations (1) and (2).
Let
be a smooth curve in the statistical bundle. In the chart centered at
P, we have
with
being a smooth curve in
. Let us compute the velocity of variation of the Lagrangian
L along the curve
.
with
. It follows that
If we write
and
, then we have
where
is the fiber derivative of
L. As
and
, it follows from Equations (16) and (17) that
In the equation above, the first term on the RHS does not depend on
P because the LHS and the second term of the RHS do not depend on
P. Hence, we define the first partial derivative of the Lagrangian function to be
so that the derivative of
L along
becomes
In particular, if
, then
see Equation (5).
Example 4 (Running Example 2).
With the Lagrangian of Equation (15), we have
see Equations (9) and (11). The first partial derivative is
where we have used Equations (9) and (10) together with
.
We have found that
and also
Using the fiber derivative computed in the first part of the running example, we find
Notice that Equation (12) shows that one of the terms in the equations above is
.
5. Action Integral
If
is a smooth curve in the exponential manifold, then the
action integral
is well defined. We consider the expression of
Q in the chart centered at
P,
.
Given
with
, for each
and
, we define the perturbed curve
We have
,
, and
whose expression in the chart centered at
P is
.
Let us consider the variation in
of the action integral. We apply Equation (19) to the smooth curve in
given by
where
t is fixed. As
and
we obtain
If
is a critical curve of the action integral, then
; hence, for all
and
H, we have
This in turn implies that for each
and
, the Euler–Lagrange equation holds:
Example 5 (Running Example 3).
For the Lagrangian of Equation (15), we can use Equation (20) in the form
with
. For the other term, we have
whose derivative is
Dropping the generic
H, the Euler–Lagrange equation becomes
that is,
The equation above has been derived using the exponential affine geometry of the statistical bundle and involves
. However, by using Equations (5), (6), and (12), we find the equivalent form