Anisotropically Weighted and Nonholonomically Constrained Evolutions on Manifolds

Stefan Sommer

doi:10.3390/e18120425

Abstract

We present evolution equations for a family of paths that results from anisotropically weighting curve energies in non-linear statistics of manifold valued data. This situation arises when performing inference on data that have non-trivial covariance and are anisotropic distributed. The family can be interpreted as most probable paths for a driving semi-martingale that through stochastic development is mapped to the manifold. We discuss how the paths are projections of geodesics for a sub-Riemannian metric on the frame bundle of the manifold, and how the curvature of the underlying connection appears in the sub-Riemannian Hamilton–Jacobi equations. Evolution equations for both metric and cometric formulations of the sub-Riemannian metric are derived. We furthermore show how rank-deficient metrics can be mixed with an underlying Riemannian metric, and we relate the paths to geodesics and polynomials in Riemannian geometry. Examples from the family of paths are visualized on embedded surfaces, and we explore computational representations on finite dimensional landmark manifolds with geometry induced from right-invariant metrics on diffeomorphism groups.

Keywords:

sub-Riemannian geometry; geodesics; most probable paths; stochastic development; non-linear data analysis; statistics

1. Introduction

When manifold valued data have non-trivial covariance (i.e., when anisotropy asserts higher variance in some directions than others), non-zero curvature necessitates special care when generalizing Euclidean space normal distributions to manifold valued distributions: in the Euclidean situation, normal distributions can be seen as transition distributions of diffusion processes, but on the manifold, holonomy makes transport of covariance path-dependent in the presence of curvature, preventing a global notion of a spatially constant covariance matrix. To handle this, in the diffusion principal component analysis (PCA) framework [1], and with the class of anisotropic normal distributions on manifolds defined in [2,3], data on non-linear manifolds are modelled as being distributed according to transition distributions of anisotropic diffusion processes that are mapped from Euclidean space to the manifold by stochastic development (see [4]). The construction is connected to a non-bracket-generating sub-Riemannian metric on the bundle of linear frames of the manifold, the frame bundle, and the requirement that covariance stays covariantly constant gives a nonholonomically constrained system.

Velocity vectors and geodesic distances are conventionally used for estimation and statistics in Riemannian manifolds; for example, for estimation of the Frechét mean [5], for Principal Geodesic Analysis [6], and for tangent space statistics [7]. In contrast to this, anisotropy as modelled with anisotropic normal distributions makes a distance for a sub-Riemannian metric the natural vehicle for estimation and statistics. This metric naturally accounts for anisotropy in a similar way as the precision matrix weights the inner product in the negative log-likelihood of a Euclidean normal distribution. The connection between the weighted distance and statistics of manifold valued data was presented in [2], and the underlying sub-Riemannian and fiber-bundle geometry, together with properties of the generated densities, was further explored in [3]. The fundamental idea is to perform statistics on manifolds by maximum likelihood (ML) instead of parametric constructions that use, for example, approximating geodesic subspaces; by defining natural families of probability distributions (in this case using diffusion processes), ML parameter estimates give a coherent way to statistically model non-linear data. The anisotropically weighted distance and the resulting family of extremal paths arises in this situation when the diffusion processes have non-isotropic covariance (i.e., when the distribution is not generated from a standard Brownian motion).

In this paper, we focus on the family of most probable paths for the semi-martingales that drives the stochastic development, and in turn the manifold valued anisotropic stochastic processes. Such paths, as exemplified in Figure 1, extremize the anisotropically weighted action functional. We present derivations of evolution equations for the paths from different viewpoints, and we discuss the role of frames as representing either metrics or cometrics. In the derivation, we explicitly see the influence of the connection and its curvature. We then turn to the relation between the sub-Riemannian metric and the Sasaki–Mok metric on the frame bundle, and we develop a construction that allows the sub-Riemannian metric to be defined as a sum of a rank-deficient generator and an underlying Riemannian metric. Finally, we relate the paths to geodesics and polynomials in Riemannian geometry, and we explore computational representations on different manifolds including a specific case: the finite dimensional manifolds arising in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) [8] landmark matching problem. The paper ends with a discussion concerning statistical implications, open questions, and concluding remarks.

Figure 1. (a) A most probable path (MPP) for a driving Euclidean Brownian motion on an ellipsoid. The gray ellipsis over the starting point (red dot) indicates the covariance of the anisotropic diffusion. A frame

u_{t}

(black/gray vectors) representing the square root covariance is parallel transported along the curve, enabling the anisotropic weighting with the precision matrix in the action functional. With isotropic covariance, normal MPPs are Riemannian geodesics. In general situations, such as the displayed anisotropic case, the family of MPPs is much larger; (b) The corresponding anti-development in

R^{2}

(red line) of the MPP. Compare with the anti-development of a Riemannian geodesic with same initial velocity (blue dotted line). The frames

u_{t} \in GL (R^{2}, T_{x_{t}} M)

provide local frame coordinates for each time t.

Background

Generalizing common statistical tools for performing inference on Euclidean space data to manifold valued data has been the subject of extensive work (e.g., [9]). Perhaps most fundamental is the notion of Frechét or Karcher means [5,10], defined as minimizers of the square Riemannian distance. Generalizations of the Euclidean principal component analysis procedure to manifolds are particularly relevant for data exhibiting anisotropy. Approaches include principal geodesic analysis (PGA, [6]), geodesic PCA (GPCA, [11]), principal nested spheres (PNS, [12]), barycentric subspace analysis (BSA, [13]), and horizontal component analysis (HCA, [14]). Common to these constructions are explicit representations of approximating low-dimensional subspaces. The fundamental challenge here is that the notion of Euclidean linear subspace on which PCA relies has no direct analogue in non-linear spaces.

A different approach taken by diffusion PCA (DPCA, [1,2]) and probabilistic PGA [15] is to base the PCA problem on a maximum likelihood fit of normal distributions to data. In Euclidean space, this approach was first introduced with probabilistic PCA [16]. In DPCA, the process of stochastic development [4] is used to define a class of anisotropic distributions that generalizes the family of Euclidean space normal distributions to the manifold context. DPCA is then a simple maximum likelihood fit in this family of distributions mimicking the Euclidean probabilistic PCA. The approach transfers the geometric complexities of defining subspaces common in the approaches listed above to the problem of defining a geometrically natural notion of normal distributions.

In Euclidean space, squared distances

∥ x - x_{0} ∥^{2}

between observations x and the mean

x_{0}

are affinely related to the negative log-likelihood of a normal distribution

N (x_{0}, Id)

. This makes an ML fit of the mean such as performed in probabilistic PCA equivalent to minimizing squared distances. On a manifold, distances

d_{g} {(x, x_{0})}^{2}

coming from a Riemannian metric g are equivalent to tangent space distances

∥ {Log}_{x_{0}} {x ∥}^{2}

when mapping data from M to

T_{x_{0}} M

using the inverse exponential map

{Log}_{x_{0}}

. Assuming

{Log}_{x_{0}} x

are distributed according to a normal distribution in the linear space

T_{x_{0}} M

, this restores the equivalence with a maximum likelihood fit. Let

{e_{1}, \dots, e_{d}}

be the standard basis for

R^{d}

. If

u : R^{d} \to T_{x_{0}} M

is a linear invertible map with

u e_{1}, \dots, u e_{d}

orthonormal with respect to g, the normal distribution in

T_{x_{0}} M

can be defined as

u N (0, Id)

(see Figure 2).

Figure 2. (a) Normal distributions

u N (0, Id)

in the tangent space

T_{x_{0}} M

with covariance

u u^{T}

(blue ellipsis) can be mapped to the manifold by applying the exponential map

{Exp}_{x_{0}}

to sampled vectors

v \in T_{x_{0}} M

(red vectors). This effectively linearises the geometry around

x_{0}

; (b) The stochastic development map

φ_{u}

maps

R^{d}

valued paths

w_{t}

to M by transporting the covariance in each step (blue ellipses) giving a covariance

u_{t}

along the entire sample path. The approach does not linearise around a single point. Holonomy of the connection implies that the covariance “rotates” around closed loops—an effect which can be illustrated by continuing the transport along the loop created by the dashed path. The anisotropic metric

g_{F M}

weights step lengths by the transported covariance at each time t.

The map u can be represented as a point in the frame bundle

F M

of M. When the orthonormal requirement on u is relaxed so that

u N (0, Id)

is a normal distribution in

T_{x_{0}} M

with anisotropic covariance, the negative log-likelihood in

T_{x_{0}} M

is related to

{(u^{- 1} {Log}_{x_{0}} x)}^{T} (u^{- 1} {Log}_{x_{0}} x)

in the same way as the precision matrix

Σ^{- 1}

is related to the negative log-likelihood

{(x - x_{0})}^{T} Σ^{- 1} (x - x_{0})

in Euclidean space. The distance is thus weighted by the anisotropy of u, and u can be interpreted as a square root covariance matrix

Σ^{1 / 2}

.

However, the above approach does not specify how u changes when moving away from the base point

x_{0}

. The use of

{Log}_{x_{0}} x

effectively linearises the geometry around

x_{0}

, but a geometrically natural way to relate u at points nearby to

x_{0}

will be to parallel transport it, equivalently specifying that u when transported does not change as measured from the curved geometry. This constraint is nonholonomic, and it implies that any path from

x_{0}

to x carries with it a parallel transport of u lifting paths from M to paths in the frame bundle

F M

. It therefore becomes natural to equip

F M

with a form of metric that encodes the anisotropy represented by u. The result is the sub-Riemannian metric on

F M

defined below that weights infinitesimal movements on M using the parallel transport of the frame u. Optimal paths for this metric are sub-Riemannian geodesics giving the family of most probable paths for the driving process that this paper concerns. Figure 1 shows one such path for an anisotropic normal distribution with M an ellipsoid embedded in

R^{3}

.

2. Frame Bundles, Stochastic Development, and Anisotropic Diffusions

Let M be a finite dimensional manifold of dimension d with connection

C

, and let

x_{0}

be a fixed point in M. When a Riemannian metric is present, and

C

is its Levi–Civita connection, we denote the metric

g_{R}

. For a given interval

[0, T]

, we let

W (M)

denote the Wiener space of continuous paths in M starting at

x_{0}

. Similarly,

W (R^{d})

is the Wiener space of paths in

R^{d}

. We let

H (R^{d})

denote the subspace of

W (R^{d})

of finite energy paths.

Let now

u = (u_{1}, \dots, u_{d})

be a frame for

T_{x} M

,

x \in M

; i.e.,

u_{1}, \dots, u_{d}

is an ordered set of linearly independent vectors in

T_{x} M

with

span {u_{1}, \dots, u_{d}} = T_{x} M

. We can regard the frame as an isomorphism

u : R^{d} \to T_{x} M

with

u (e_{i}) = u_{i}

, where

e_{1}, \dots, e_{d}

denotes the standard basis in

R^{d}

. Stochastic development (e.g., [4]) provides an invertible map

φ_{u}

from

W (R^{d})

to

W (M)

. Through

φ_{u}

, Euclidean semi-martingales map to stochastic processes on M. When M is Riemannian and u orthonormal, the result is the Eells–Elworthy–Malliavin construction of Brownian motion [17]. We here outline the geometry behind development, stochastic development, the connection, and curvature, focusing in particular on frame bundle geometry.

2.1. The Frame Bundle

For each point

x \in M

, let

F_{x} M

be the set of frames for

T_{x} M

(i.e., the set of ordered bases for

T_{x} M

). The set

{F_{x} M}_{x \in M}

can be given a natural differential structure as a fiber bundle on M called the frame bundle

F M

. It can equivalently be defined as the principal bundle

GL (R^{d}, T M)

. We let the map

π : F M \to M

denote the canonical projection. The kernel of

π_{*} : T F M \to T M

is the sub-bundle of

T F M

that consists of vectors tangent to the fibers

π^{- 1} (x)

. It is denoted the vertical subspace

V F M

. We will often work in a local trivialization

u = (x, u_{1}, \dots, u_{d}) \in F M

, where

x = π (u) \in M

denotes the base point, and for each

i = 1, \dots, d

,

u_{i} \in T_{x} M

is the ith frame vector. For

v \in T_{x} M

and

u \in F M

with

π (u) = x

, the vector

u^{- 1} v \in R^{d}

expresses v in components in terms of the frame u. We will denote the vector

u^{- 1} v

frame coordinates of v.

For a differentiable curve

x_{t}

in M with

x = x_{0}

, a frame u for

T_{x_{0}} M

can be parallel transported along

x_{t}

by parallel transporting each vector in the frame, thus giving a path

u_{t} \in F M

. Such paths are called horizontal, and have zero acceleration in the sense

C ({\dot{u}}_{i, t}) = 0

. For each

x \in M

, their derivatives form a d-dimensional subspace of the

d + d^{2}

-dimensional tangent space

T_{u} F M

. This horizontal subspace

H F M

and the vertical subspace

V F M

together split the tangent bundle of

F M

(i.e.,

T F M = H F M \oplus V F M

). The split induces a map

π_{*} : H F M \to T M

, see Figure 3. For fixed

u \in F M

, the restriction

π_{*} |_{H_{u} F M} : H_{u} F M \to T_{x} M

is an isomorphism. Its inverse is called the horizontal lift and is denoted

h_{u}

in the following. Using

h_{u}

, horizontal vector fields

H_{e}

on

F M

are defined for vectors

e \in R^{d}

by

H_{e} (u) = h_{u} (u e)

. In particular, the standard basis

(e_{1}, \dots, e_{d})

on

R^{d}

gives d globally defined horizontal vector fields

H_{i} \in H F M

,

i = 1, \dots, d

by

H_{i} = H_{e_{i}}

. Intuitively, the fields

H_{i} (u)

model infinitesimal transformations in M of

x_{0}

in direction

u_{i} = u e_{i}

with corresponding infinitesimal parallel transport of the vectors

u_{1}, \dots, u_{d}

of the frame along the direction

u_{i}

. A horizontal lift of a differentiable curve

x_{t} \in M

is a curve in

F M

tangent to

H F M

that projects to

x_{t}

. Horizontal lifts are unique up to the choice of initial frame

u_{0}

.

Figure 3. Relations between the manifold, frame bundle, the horizontal distribution

H F M

, the vertical bundle

V F M

, a Riemannian metric

g_{R}

, and the sub-Riemannian metric

g_{F M}

, defined below. The connection

C

provides the splitting

T F M = H F M \oplus V F M

. The restrictions

π_{*} |_{H_{u} M}

are invertible maps

H_{u} M \to T_{π (u)} M

with inverse

h_{u}

, the horizontal lift. Correspondingly, the vertical bundle

V F M

is isomorphic to the trivial bundle

F M \times gl (n)

. The metric

g_{F M} : T^{*} F M \to T F M

has an image in the subspace

H F M

.

2.2. Development and Stochastic Development

Let

x_{t}

be a differentiable curve on M and

u_{t}

a horizontal lift. If

s_{t}

is a curve in

R^{d}

with components

s_{t}^{i}

such that

{\dot{x}}_{t} = H_{i} (u) s_{t}^{i}

,

x_{t}

is said to be a development of

s_{t}

. Correspondingly,

s_{t}

is the anti-development of

x_{t}

. For each t, the vector

s_{t}

contains frame coordinates of

{\dot{x}}_{t}

as defined above. Similarly, let

W_{t}

be an

R^{d}

valued Brownian motion so that sample paths

W_{t} (ω) \in W (R^{d})

. A solution to the stochastic differential equation

d U_{t} = \sum_{i = 1}^{d} H_{i} (U_{t}) \circ d W_{t}^{i}

in

F M

is called a stochastic development of

W_{t}

in

F M

. The solution projects to a stochastic development

X_{t} = π (U_{t})

in M. We call the process

W_{t}

in

R^{d}

, that through φ maps to

X_{t}

, the driving process of

X_{t}

. Let

φ_{u} : W (R^{d}) \to W (M)

be the map that for fixed u sends a path in

R^{d}

to its development on M. Its inverse

φ_{u}^{- 1}

is the anti-development in

R^{d}

of paths on M given u.

Equivalent to the fact that normal distributions

N (0, Σ)

in

R^{d}

can be obtained as the transition distributions of diffusion processes

Σ^{1 / 2} W_{t}

stopped at time

t = 1

, a general class of distributions on the manifold M can be defined by stochastic development of processes

W_{t}

, resulting in M-valued random variables

X = X_{1}

. This family of distributions on M introduced in [2] is denoted anisotropic normal distributions. The stochastic development by construction ensures that

U_{t}

is horizontal, and the frames are thus parallel transported along the stochastic displacements. The effect is that the frames stay covariantly constant, thus resembling the Euclidean situation where

Σ^{1 / 2}

is spatially constant and therefore does not change as

W_{t}

evolves. Thus, as further discussed in Section 3.2, the covariance is kept constant at each of the infinitesimal stochastic displacements. The existence of a smooth density for the target process

X_{t}

and small time asymptotics are discussed in [3].

Stochastic development gives a map

\int_{Diff} : F M \to Prob (M)

to the space of probability distributions on M. For each point

u \in F M

, the map sends a Brownian motion in

R^{d}

to a distribution

μ_{u}

by stochastic development of the process

U_{t}

in

F M

, starting at u and letting

μ_{u}

be the distribution of

X = π (U_{1})

. The pair

(x, u)

,

x = π (u)

is analogous to the parameters

(μ, Σ)

for a Euclidean normal distribution: the point

x \in M

represents the starting point of the diffusion, and the frame u represents a square root

Σ^{1 / 2}

of the covariance Σ. In the general situation where

μ_{u}

has smooth density, the construction can be used to fit the parameters u to data by maximum likelihood. As an example, diffusion PCA fits distributions obtained through

\int_{Diff}

by maximum likelihood to observed samples in M; i.e., it optimizes for the most likely parameters

u = (x, u_{1}, \dots, u_{d})

for the anisotropic diffusion process, giving a fit to the data of the manifold generalization of the Euclidean normal distribution.

2.3. Adapted Coordinates

For concrete expressions of the geometric constructions related to frame bundles, and for computational purposes, it is useful to apply coordinates that are adapted to the horizontal bundle

H F M

and the vertical bundle

V F M

together with their duals

H^{*} F M

and

V^{*} F M

. The notation below follows the notation used in, for example, [18]. Let

z = (u, ξ)

be a local trivialization of

T^{*} F M

, and let

(x^{i}, u_{α}^{i})

be coordinates on

F M

with

u_{α}^{i}

satisfying

u_{α} = u_{α}^{i} \partial_{x^{i}}

for each

α = 1, \dots, d

.

To find a basis that is adapted to the horizontal distribution, define the d linearly independent vector fields

D_{j} = \partial_{x^{j}} - Γ_{j}^{h_{γ}} \partial_{u_{γ}^{h}}

where

Γ_{j}^{h_{γ}} = {Γ^{h}}_{i j} u_{γ}^{i}

is the contraction of the Christoffel symbols

{Γ^{h}}_{i j}

for the connection

C

with

u_{α}^{i}

. We denote this adapted frame D. The vertical distribution is correspondingly spanned by

D_{j_{β}} = \partial_{u_{β}^{j}}

. The vectors

D^{h} = d x^{h},

and

D^{h_{γ}} = Γ_{j}^{h_{γ}} d x^{j} + d u_{γ}^{h}

constitutes a dual coframe

D^{*}

. The map

π_{*} : H F M \to T M

is in coordinates of the adapted frame

π_{*} (w^{j} D_{j}) = w^{j} \partial_{x^{j}}

. Correspondingly, the horizontal lift

h_{u}

is

h_{u} (w^{j} \partial_{x^{j}}) = w^{j} D_{j}

. The map

u : R^{d} \to T_{x} M

is given by the matrix

[u_{α}^{i}]

so that

u v = u_{α}^{i} v^{α} \partial_{x^{i}} = u_{α} v^{α}

.

Switching between standard coordinates and the adapted frame and coframes can be expressed in terms of the component matrices A below the frame and coframe induced by the coordinates

(x^{i}, u_{α}^{i})

and the adapted frame D and coframe

D^{*}

. We have

_{(\partial_{x^{i}}, \partial_{u_{α}^{i}})} A_{D} = [\begin{matrix} I & 0 \\ - Γ & I \end{matrix}] w i t h i n v e r s e_{D} A_{(\partial_{x^{i}}, \partial_{u_{α}^{i}})} = [\begin{matrix} I & 0 \\ Γ & I \end{matrix}]

writing Γ for the matrix

[Γ_{j}^{h_{γ}}]

. Similarly, the component matrices of the dual frame

D^{*}

are

_{{(\partial_{x^{i}}, \partial_{u_{α}^{i}})}^{*}} A_{D^{*}} = [\begin{matrix} I & Γ^{T} \\ 0 & I \end{matrix}] a n d_{D^{*}} A_{{(\partial_{x^{i}}, \partial_{u_{α}^{i}})}^{*}} = [\begin{matrix} I & - Γ^{T} \\ 0 & I \end{matrix}] .

2.4. Connection and Curvature

The

T M

valued connection

C : T M \times T M \to T M

lifts to a principal connection

T F M \times T F M \to V F M

on the principal bundle

F M

.

C

can then be identified with the

gl (n)

-valued connection form ω on

T F M

. The identification occurs by the isomorphism ψ between

F M \times gl (n)

and

V F M

given by

ψ (u, v) = \frac{d}{d t} {u exp (t v) |}_{t = 0}

(e.g., [19,20]).

The map ψ is equivariant with respect to the

GL (n)

action

g \mapsto u g^{- 1}

on

F M

. In order to explicitly see the connection between the usual covariant derivative

\nabla : Γ (T M) \times Γ (T M) \to Γ (T M)

on M determined by

C

and

C

regarded as a connection on the principal bundle

F M

, following [19], we let

s : M \to T M

be a local vector field on M; equivalently,

s \in Γ (T M)

is a local section of

T M

. s determines a map

s^{F M} : F M \to R^{d}

by

s^{F M} (u) = u^{- 1} s (π (u))

; i.e., it gives the coordinates of

s (x)

in the frame u at x. The pushforward

{(s^{F M})}_{*} : T F M \to R^{d}

has in its ith component the exterior derivative

d {(s^{F M})}^{i}

. Let now

w (x)

be a local section of

F M

. The composition

w \circ {(s^{F M})}_{*} \circ h_{w} : T M \to T M

is identical to the covariant derivative

\nabla_{\cdot} s : T M \to T M

. The construction is independent of the choice of w because of the

GL (n)

-equivariance of

s^{F M}

. The connection form ω can be expressed as the matrix

(s_{1}^{F M} \circ h_{w}, \dots, s_{d}^{F M} \circ h_{w})

when letting

s_{i}^{F M} (u) = e_{i}

.

The identification becomes particularly simple if the covariant derivative is taken along a curve

x_{t}

on which

w_{t}

is the horizontal lift. In this case, we can let

s_{t} = w_{t, i} s_{t}^{i}

. Then,

s^{F M} (w_{t}) = {(s_{t}^{1}, \dots, s_{t}^{d})}^{T}

, and

w_{t}^{- 1} \nabla_{{\dot{x}}_{t}} s = {(s^{F} M)}_{*} (h_{w_{t}} ({\dot{x}}_{t})) = \frac{d}{d t} {(s_{t}^{1}, \dots, s_{t}^{d})}^{T};

(1)

i.e., the covariant derivative takes the form of the standard derivative applied to the frame coordinates

s_{t}^{i}

.

The curvature tensor

R \in T_{1}^{3} (M)

gives the

gl (n)

-valued curvature form

Ω : T F M \times T F M \to gl (n)

on

T F M

by

Ω (v_{u}, w_{u}) = u^{- 1} R (π_{*} (v_{u}), π_{*} (w_{u})) u, v_{u}, w_{v} \in T F M .

Note that

Ω (v_{u}, w_{u}) = Ω (h_{u} (π_{*} (v_{u})), h_{u} (π_{*} (w_{u})))

, which we can use to write the curvature R as the

gl (n)

-valued map

R_{u} : T^{2} (T_{π (u)} M) \to gl (n)

,

(v, w) \mapsto Ω (h_{u} (π_{*} (v_{u})), h_{u} (π_{*} (w_{u})))

for fixed u. In coordinates, the curvature is

R_{i j k}^{s} = Γ_{i k}^{l} Γ_{j l}^{s} - Γ_{j k}^{l} Γ_{i l}^{s} + Γ_{i k; j}^{s} - Γ_{j k; i}^{s}

where

{Γ^{s}}_{i k; j} = \partial_{x^{j}} {Γ^{s}}_{i k}

.

Let

x_{t, s}

be a family of paths in M, and let

u_{t, s} \in π^{- 1} (x_{t, s})

be horizontal lifts of

x_{t, s}

for each fixed s. Write

{\dot{x}}_{t, s} = \partial_{t} x_{t, s}

and

{\dot{u}}_{t, s} = \partial_{t} u_{t, s}

. The s-derivative of

u_{t, s}

can be regarded a pushforward of the horizontal lift and is the curve in

T F M

\begin{matrix} \partial_{s} u_{t, s} & = ψ (u_{t, s}, ψ_{u_{0, s}}^{- 1} (C (\partial_{s} u_{0, s})) + \int_{0}^{s} Ω ({\dot{u}}_{r, s}, \partial_{s} u_{r, s}) d r) + h_{u_{t, s}} (\partial_{s} x_{t, s}) \\ = ψ (u_{t, s}, ψ_{0, s}^{- 1} (C (\partial_{s} u_{0, s})) + \int_{0}^{s} R_{u_{r, s}} ({\dot{x}}_{r, s}, \partial_{s} x_{r, s}) d r) + h_{u_{t, s}} (\partial_{s} x_{t, s}) . \end{matrix}

(2)

This follows from the structure equation

d ω = - ω \land ω + Ω

(e.g., [21]). Note that the curve depends on the vertical variation

C (\partial_{s} u_{0, s})

at only one point along the curve. The remaining terms depend on the horizontal variation or, equivalently,

\partial_{s} x_{t, s}

. The t-derivative of

\partial_{s} u_{t, s}

is the curve in

T T F M

satisfying

\begin{matrix} \partial_{s} h_{u_{t, s}} ({\dot{x}}_{t, s}) & = ψ (u_{t, s}, R_{u_{t, s}} ({\dot{x}}_{t, s}, \partial_{s} x_{t, s})) + \partial_{t} ψ (u_{t, s}, ψ_{0, s}^{- 1} (C (\partial_{s} u_{0, s}))) + \partial_{t} (h_{u_{t, s}} (\partial_{s} x_{t, s})) \\ = ψ (u_{t, s}, R_{u_{t, s}} ({\dot{x}}_{t, s}, \partial_{s} x_{t, s})) + \partial_{t} ψ (u_{t, s}, ψ_{0, s}^{- 1} (C (\partial_{s} u_{0, s}))) \\ + h_{u_{t, s}} (\partial_{t} \partial_{s} x_{t, s}) + (\partial_{t} h_{u_{t, s}}) (\partial_{s} x_{t, s}) . \end{matrix}

(3)

Here, the first and third term in the last expression are identified with elements of

T_{\partial_{s} u_{t, s}} T F M

by the natural mapping

T_{u_{t, s}} F M \to T_{\partial_{s} u_{t, s}} T F M

. When

C (\partial_{s} u_{0, s})

is zero, the relation reflects the property that the curvature arises when computing brackets between horizontal vector fields. Note that the first term of (3) has values in

V F M

, while the third term has values in

H F M

.

3. The Anisotropically Weighted Metric

For a Euclidean driftless diffusion process with spatially constant stochastic generator Σ, the log-probability of a sample path can formally be written

ln {\tilde{p}}_{Σ} (x_{t}) \propto - \int_{0}^{1} {∥ {\dot{x}}_{t} ∥}_{Σ}^{2} d t + c_{Σ}

(4)

with the norm

{∥ \cdot ∥}_{Σ}

given by the inner product

{⟨v, w⟩}_{Σ} = ⟨Σ^{- 1 / 2} v, Σ^{- 1 / 2} w⟩ = v Σ^{- 1} w

; i.e., the inner product weighted by the precision matrix

Σ^{- 1}

. Though only formal, as the sample paths are almost surely nowhere differentiable, the interpretation can be given a precise meaning by taking limits of piecewise linear curves [21]. Turning to the manifold situation with the processes mapped to M by stochastic development, the probability of observing a differentiable path can either be given a precise meaning in the manifold by taking limits of small tubes around the curve, or in

R^{d}

by considering infinitesimal tubes around the anti-development of the curves. With the former formulation, a scalar curvature correction term must be added to (4), giving the Onsager–Machlup function [22]. The latter formulation corresponds to defining a notion of path density for the driving

R^{d}

-valued process

W_{t}

. When M is Riemannian and Σ unitary, taking the maximum of (4) gives geodesics as most probable paths for the driving process.

Let now

u_{t}

be a path in

F M

, and choose a local trivialization

u_{t} = (x_{t}, u_{1, t}, \dots, u_{d, t})

such that the matrix

[u_{α, t}^{i}]

represents the square root covariance matrix

Σ^{1 / 2}

at

x_{t}

. Since

u_{t}

being a frame defines an invertible map

R^{d} \to T_{x_{t}} M

, the norm

{∥ \cdot ∥}_{Σ}

above has a direct analogue in the norm

{∥ \cdot ∥}_{u_{t}}

defined by the inner product

{⟨v, w⟩}_{u_{t}} = {⟨u_{t}^{- 1} v, u_{t}^{- 1} w⟩}_{R^{d}}

(5)

for vectors

v, w \in T_{x_{t}} M

. The transport of the frame along paths in effect defines a transport of inner product along sample paths: the paths carry with them the inner product weighted by the precision matrix, which in turn is a transport of the square root covariance

u_{0}

at

x_{0}

.

The inner product can equivalently be defined as a metric

g_{u} : T_{x}^{*} M \to T_{x} M

. Again using that u can be considered a map

R^{d} \to T_{x} M

,

g_{u}

is defined by

ξ \mapsto u {(ξ \circ u)}^{♯}

, where ♯ is the standard identification

{(R^{d})}^{*} \to R^{d}

. The sequence of mappings defining

g_{u}

is illustrated below:

\begin{matrix} T_{x}^{*} M & \to & {(R^{d})}^{*} & \to & R^{d} & \to & T_{x} M \\ ξ & \mapsto & ξ \circ u & \mapsto & {(ξ \circ u)}^{♯} & \mapsto & u {(ξ \circ u)}^{♯} . \end{matrix}

(6)

This definition uses the

R^{d}

inner product in the definition of ♯. Its inverse gives the cometric

g_{u}^{- 1} : T_{x} M \to T_{x}^{*} M

; i.e.,

v \mapsto {(u^{- 1} v)}^{♭} \circ u^{- 1}

.

\begin{matrix} T_{x} M & \to & R^{d} & \to & {(R^{d})}^{*} & \to & T_{x}^{*} M \\ v & \mapsto & u^{- 1} v & \mapsto & {(u^{- 1})}^{♭} & \mapsto & {(u^{- 1})}^{♭} \circ u^{- 1} . \end{matrix}

(7)

3.1. Sub-Riemannian Metric on the Horizontal Distribution

We now lift the path-dependent metric defined above to a sub-Riemannian metric on

H F M

. For any

w, \tilde{w} \in H_{u} F M

, the lift of (5) by

π_{*}

is the inner product

⟨w, \tilde{w}⟩ = {⟨u^{- 1} π_{*} w, u^{- 1} π_{*} \tilde{w}⟩}_{R^{d}} .

The inner product induces a sub-Riemannian metric

g_{F M} : T F M^{*} \to H F M \subset T F M

by

⟨w, g_{F M} (ξ)⟩ = (ξ | w), \forall w \in H_{u} F M

(8)

with

(ξ | w)

denoting the evaluation

ξ (w)

for the covector

ξ \in T^{*} F M

. The metric

g_{F M}

gives

F M

a non-bracket-generating sub-Riemannian structure [23] on

F M

(see also Figure 3). It is equivalent to the lift

ξ \mapsto h_{u} (g_{u} (ξ \circ h_{u})), ξ \in T_{u} F M

(9)

of the metric

g_{u}

above. In frame coordinates, the metric takes the form

u^{- 1} π_{*} g_{F M} (ξ) = (\begin{matrix} ξ (H_{1} (u)) \\ ⋮ \\ ξ (H_{d} (u)) \end{matrix}) .

(10)

In terms of the adapted coordinates for

T F M

described in Section 2.3, with

w = w^{j} D_{j}

and

\tilde{w} = {\tilde{w}}^{j} D_{j}

, we have

\begin{matrix} ⟨w, \tilde{w}⟩ & = ⟨w^{i} D_{i}, {\tilde{w}}^{j} D_{j}⟩ = ⟨u^{- 1} w^{i} \partial_{x^{i}}, u^{- 1} {\tilde{w}}^{j} \partial_{x^{j}}⟩ \\ = {⟨w^{i} u_{i}^{α}, {\tilde{w}}^{j} u_{j}^{α}⟩}_{R^{d}} = δ_{α β} w^{i} u_{i}^{α} {\tilde{w}}^{j} u_{j}^{β} = W_{i j} w^{i} {\tilde{w}}^{j} \end{matrix}

where

[u_{i}^{α}]

is the inverse of

[u_{α}^{i}]

and

W_{i j} = δ_{α β} u_{i}^{α} u_{j}^{β}

. Define now

W^{k l} = δ^{α β} u_{α}^{k} u_{β}^{l}

, so that

W^{i r} W_{r j} = δ_{j}^{i}

and

W_{i r} W^{r j} = δ_{i}^{j}

. We can then write the metric

g_{F M}

directly as

g_{F M} (ξ_{h} D^{h} + ξ_{h_{γ}} D^{h_{γ}}) = W^{i h} ξ_{h} D_{i},

(11)

because

⟨w, g_{F M} (ξ)⟩ = ⟨w, W^{j h} ξ_{h} D_{j}⟩ = W_{i j} w^{i} W^{j h} ξ_{h} = w^{i} ξ_{i} = ξ_{h} D^{h} (w^{j} D_{j}) = ξ (w)

. One clearly recognizes the dependence on the horizontal

H^{*} F M

part of

T^{*} F M

only, and the fact that

g_{F M}

has image in

H F M

. The sub-Riemannian energy of an almost everywhere horizontal path

u_{t}

is

l_{F M} (u_{t}) = \int g_{F M} ({\dot{u}}_{t}, {\dot{u}}_{t}) d t;

i.e., the line element is

d s^{2} = W_{i j} D^{i} D^{j}

in adapted coordinates. The corresponding distance is given by

d_{F M} (u_{1}, u_{2}) = inf {l_{F M} (γ) ∣ γ (0) = u_{1}, γ (1) = u_{2}} .

If we wish to express

g_{F M}

in canonical coordinates on

T^{*} F M

, we can switch between the adapted frame and the coordinates

(x^{i}, u_{α}^{i}, ξ^{i}, ξ_{α}^{i})

. From (11),

g_{F M}

has

D, D^{*}

components

_{D} g_{F M, D^{*}} = [\begin{matrix} W^{- 1} & 0 \\ 0 & 0 \end{matrix}] .

Therefore,

g_{F M}

has the following components in the coordinates

(x^{i}, u_{α}^{i}, ξ_{h}, ξ_{h_{γ}})

_{(\partial_{x^{i}}, \partial_{u_{α}^{i}})} g_{F M, {(\partial_{x}, \partial_{u_{α}^{i}})}^{*}} =_{(\partial_{x^{i}}, \partial_{u_{α}^{i}})} A_{D}_{D} g_{F M, D^{*}}_{D^{*}} A_{{(\partial_{x^{i}}, \partial_{u_{α}^{i}})}^{*}} = [\begin{matrix} W^{- 1} & - W^{- 1} Γ^{T} \\ - Γ W^{- 1} & Γ W^{- 1} Γ^{T} \end{matrix}]

or

g_{F M}^{i j} = W^{i j}

,

g_{F M}^{i j_{β}} = - W^{i h} Γ_{h}^{j_{β}}

,

g_{F M}^{i_{α} j} = - Γ_{h}^{i_{α}} W^{h j}

, and

g_{F M}^{i_{α} j_{β}} = Γ_{k}^{i_{α}} W^{k h} Γ_{h}^{j_{β}}

.

3.2. Covariance and Nonholonomicity

The metric

g_{F M}

encodes the anisotropic weighting given the frame u, thus up to an affine transformation measuring the energy of horizontal paths equivalently to the negative log-probability of sample paths of Euclidean anisotropic diffusions as formally given in (4). In addition, the requirement that paths must stay horizontal almost everywhere enforces that

C ({\dot{u}}_{t}) = 0

a.e., i.e., that no change of the covariance is measured by the connection. The intuitive effect is that covariance is covariantly constant as seen by the connection. Globally, curvature of

C

will imply that the covariance changes when transported along closed loops, and torsion will imply that the base point “slips” when travelling along covariantly closed loops on M. However, the zero acceleration requirement implies that the covariance is as close to spatially constant as possible with the given connection. This is enabled by the parallel transport of the frame, and it ensures that the model closely resembles the Euclidean case with spatially constant stochastic generator.

With non-zero curvature of

C

, the horizontal distribution is non-integrable (i.e., the brackets

[H_{i}, H_{j}]

are non-zero for some

i, j

). This prevents integrability of the horizontal distribution

H F M

in the sense of the Frobenius theorem. In this case, the horizontal constraint is nonholonomic similarly to nonholonomic constraints appearing in geometric mechanics (e.g., [24]). The requirement of covariantly constant covariance thus results in a nonholonomic system.

3.3. Riemannian Metrics on $F M$

If the horizontality constraint is relaxed, a related Riemannian metric on

F M

can be defined by pulling back a metric on

gl (n)

to each fiber using the isomorphism

ψ {(u, \cdot)}^{- 1} : V_{u} F M \to gl (n)

. Therefore, the metric on

H F M

can be extended to a Riemannian metric on

F M

. Such metrics incorporate the anisotropically weighted metric on

H F M

, however, allowing vertical variations and thus that covariances can change unrestricted.

When M is Riemannian, the metric

g_{F M}

is in addition related to the Sasaki–Mok metric on

F M

[18] that extends the Sasaki metric on

T M

. As for the above Riemannian metric on

F M

, the Sasaki–Mok metric allows paths in

F M

to have derivatives in the vertical space

V F M

. On

H F M

, the Riemannian metric

g_{R}

is here lifted to the metric

g_{S M} = (v_{u}, w_{u}) = g_{R} (π_{*} (v_{u}), π_{*} (w_{u}))

(i.e., the metric is not anisotropically weighted). The line element is in this case

d s^{2} = g_{i j} d x^{i} d x^{j} + X_{β α} g_{i j} D^{α_{i}} D^{β_{j}}

.

Geodesics for

g_{S M}

are lifts of Riemannian geodesics for

g_{R}

on M, in contrast to the sub-Riemannian normal geodesics for

g_{F M}

which we will characterize below. The family of curves arising as projections to M of normal geodesics for

g_{F M}

includes Riemannian geodesics for

g_{R}

(and thus projections of geodesics for

g_{S M}

), but the family is in general larger than geodesics for

g_{R}

.

4. Constrained Evolutions

Extremal paths for (5) can be interpreted as most probable paths for the driving process

W_{t}

when

u_{0}

defines an anisotropic diffusion. This is captured in the following definition [3]:

Definition 1.

A most probable path for the driving process (MPP) from

x = π (u_{0}) \in M

to

y \in M

is a smooth path

x_{t} : [0, 1] \to M

with

x_{0} = x

and

x_{1} = y

such that its anti-development

φ_{u_{0}}^{- 1} (x_{t})

is a most probable path for

W_{t}

; i.e.,

x_{t} \in {argmin}_{σ, σ_{0} = x, σ_{1} = y} \int_{0}^{1} - L_{R^{d}} (φ_{u_{0}}^{- 1} (σ_{t}), \frac{d}{d t} φ_{u_{0}}^{- 1} (σ_{t})) d t

with

L_{R^{d}}

being the Onsager–Machlup function for the process

W_{t}

on

R^{d}

[22].

The definition uses the one-to-one relation between

W (R^{d})

and

W (M)

provided by

φ_{u_{0}}

to characterize the paths using the

R^{d}

Onsager–Machlup function

L_{R^{d}}

. When M is Riemannian with metric

g_{R}

, the Onsager–Machlup function for a g-Brownian motion on M is

L (x_{t}, {\dot{x}}_{t}) = - \frac{1}{2} {∥ {\dot{x}}_{t} ∥}_{g_{R}}^{2} + \frac{1}{12} S_{g_{R}} (x_{t})

with

S_{g_{R}}

denoting the scalar curvature. This curvature term vanishes on

R^{d}

, and therefore

L_{R^{d}} (γ_{t}, {\dot{γ}}_{t}) = - \frac{1}{2} {∥ {\dot{γ}}_{t} ∥}^{2}

for a curve

γ_{t} \in R^{d}

.

By pulling

x_{t} \in M

back to

R^{d}

using

φ_{u_{0}}^{- 1}

, the construction removes the

\frac{1}{12} S_{g_{R}} (x_{t})

scalar curvature correction term present in the non-Euclidean Onsager–Machlup function. It thereby provides a relation between geodesic energy and most probable paths for the driving process. This is contained in the following characterization of most probable paths for the driving process as extremal paths of the sub-Riemannian distance [3] that follows from the Euclidean space Onsager–Machlup theorem [22].

Theorem 1

([3]) Let

Q (u_{0})

denote the principal sub-bundle of

F M

of points

z \in F M

reachable from

u_{0} \in F M

by horizontal paths. Suppose the Hörmander condition is satisfied on

Q (u_{0})

, and that

Q (u_{0})

has compact fibers. Then, most probable paths from

x_{0}

to

y \in M

for the driving process of

X_{t}

exist, and they are projections of sub-Riemannian geodesics in

F M

minimizing the sub-Riemannian distance from

u_{0}

to

π^{- 1} (y)

.

Below, we will derive evolution equations for the set of such extremal paths that correspond to normal sub-Riemannian geodesics.

4.1. Normal Geodesics for $g_{F M}$

Connected to the metric

g_{F M}

is the Hamiltonian

H (z) = \frac{1}{2} (z | g_{F M} (z))

(12)

on the symplectic space

T^{*} F M

. Letting

\hat{π}

denote the projection on the bundle

T^{*} F M \to F M

, (8) gives

H (z) = \frac{1}{2} ⟨g_{F M} (z) | g_{F M} (z)⟩ = \frac{1}{2} {∥ z \circ h_{\hat{π} (z)} \circ \hat{π} (z) ∥}_{{(R^{d})}^{*}}^{2} = \frac{1}{2} \sum_{i = 1}^{d} ξ {(H_{i} (u))}^{2} .

Normal geodesics in sub-Riemannian manifolds satisfy the Hamilton–Jacobi equations [23] with Hamiltonian flow

{\dot{z}}_{t} = X_{H} = Ω^{#} d H (z)

(13)

where Ω here is the canonical symplectic form on

T^{*} F M

(e.g., [25]). We denote (13) the MPP equations, and we let projections

x_{t} = π_{T^{*} F M} (z_{t})

of minimizing curves satisfying (13) be denoted normal MPPs. The system (13) has

2 (d + d^{2})

degrees of freedom, in contrast to the usual

2 d

degrees of freedom for the classical geodesic equation. Of these,

d^{2}

describes the current frame at time t, while the remaining

d^{2}

allows the curve to “twist” while still being horizontal. We will see this effect visualized in Section 6.

In a local canonical trivialization

z = (u, ξ)

, (13) gives the Hamilton–Jacobi equations

\begin{matrix} \dot{u} & = \partial_{ξ} H (u, ξ) = g_{F M} (u, ξ) = h_{u} (u {(ξ (H_{1} (u)), \dots, ξ (H_{d} (u)))}^{T}) \\ \dot{ξ} & = - \partial_{u} H (u, ξ) = - \partial_{u} \frac{1}{2} {∥ ξ \circ h_{u} \circ u ∥}_{{(R^{d})}^{*}}^{2} = - \partial_{u} \frac{1}{2} \sum_{i = 1}^{d} ξ {(H_{i} (u))}^{2} . \end{matrix}

(14)

Using (3), we have for the second equation

\begin{matrix} \dot{ξ} & = - \sum_{i = 1}^{d} ξ (H_{i} (u)) ξ (\partial_{u} h_{u} (u e_{i})) \\ = - \sum_{i = 1}^{d} ξ (H_{i} (u)) ξ (ψ (u, R_{u} (u e_{i}, π_{*} (\partial_{u}))) + \partial_{h_{u} (u e_{i})} ψ (u, ψ^{- 1} (C (\partial_{u}))) + \partial_{h_{u} (u e_{i})} h_{u} (π_{*} (\partial_{u}))) \\ = - ξ (ψ (u, R_{u} (π_{*} (\dot{u}), π_{*} (\partial_{u}))) + \partial_{\dot{u}} ψ (u, ψ^{- 1} (C (\partial_{u}))) + \partial_{\dot{u}} h_{u} (π_{*} (\partial_{u}))) . \end{matrix}

(15)

Here

\partial_{\dot{u}}

denotes u-derivative in the direction

\dot{u}

, equivalently

\partial_{\dot{u}} h_{u} (v) = \partial_{t} (h_{u}) (v)

. While the first equation of (14) involves only the horizontal part of ξ, the second equation couples the vertical part of ξ through the evaluation of ξ on the term

ψ (u, R_{u} (π_{*} (\dot{u}), π_{*} (\partial_{u}))

. If the connection is curvature-free, which in non-flat cases implies that it carries torsion, this vertical term vanishes. Conversely, when M is Riemannian,

C

the

g_{R}

Levi–Civita connection, and

u_{0}

is

g_{R}

orthonormal,

g_{F M} (h_{u} (v), h_{u} (w)) = g_{R} (v, w)

for all

v, w \in T_{π (u_{t})} M

. In this case, a normal MPP

π (u_{t})

will be a Riemannian

g_{R}

geodesic.

4.2. Evolution in Coordinates

In coordinates

u = (x^{i}, u_{α}^{i}, ξ_{i}, ξ_{i_{α}})

for

T^{*} F M

, we can equivalently write

\begin{matrix} {\dot{x}}^{i} = g^{i j} ξ_{j} + g^{i j_{β}} ξ_{j_{β}} = W^{i j} ξ_{j} - W^{i h} Γ_{h}^{j_{β}} ξ_{j_{β}} \\ {\dot{X}}_{α}^{i} = g^{i_{α} j} ξ_{j} + g^{i_{α} j_{β}} ξ_{j_{β}} = - Γ_{h}^{i_{α}} W^{h j} ξ_{j} + Γ_{k}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} ξ_{j_{β}} \\ {\dot{ξ}}_{i} = - \frac{1}{2} (\partial_{y^{i}} g_{y}^{h k} ξ_{h} ξ_{k} + \partial_{y^{i}} g_{y}^{h k_{δ}} ξ_{h} ξ_{k_{δ}} + \partial_{y^{i}} g_{y}^{h_{γ} k} ξ_{h_{γ}} ξ_{k} + \partial_{y^{i}} g_{y}^{h_{γ} k_{δ}} ξ_{h_{γ}} ξ_{k_{δ}}) \\ {\dot{ξ}}_{i_{α}} = - \frac{1}{2} (\partial_{y^{i_{α}}} g_{y}^{h k} ξ_{h} ξ_{k} + \partial_{y^{i_{α}}} g_{y}^{h k_{δ}} ξ_{h} ξ_{k_{δ}} + \partial_{y^{i_{α}}} g_{y}^{h_{γ} k} ξ_{h_{γ}} ξ_{k} + \partial_{y^{i_{α}}} g_{y}^{h_{γ} k_{δ}} ξ_{h_{γ}} ξ_{k_{δ}}) \end{matrix}

with

Γ_{k, i}^{h_{γ}}

for

\partial_{y^{i}} Γ_{k}^{h_{γ}}

, and where

\begin{matrix} \partial_{y^{l}} g^{i j} = 0, \partial_{y^{l}} g^{i j_{β}} = - W^{i h} Γ_{h, l}^{j_{β}}, \partial_{y^{l}} g^{i_{α} j} = - Γ_{h, l}^{i_{α}} W^{h j}, \partial_{y^{l}} g^{i_{α} j_{β}} = Γ_{k, l}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} W^{k h} Γ_{h, l}^{j_{β}}, \end{matrix}

\begin{matrix} \partial_{y^{l_{ζ}}} g^{i j} = {W^{i j}}_{, l ζ}, \partial_{y^{l_{ζ}}} g^{i j_{β}} = - {W^{i h}}_{, l ζ} Γ_{h}^{j_{β}} - W^{i h} Γ_{h, l_{ζ}}^{j_{β}}, \partial_{y^{l_{ζ}}} g^{i_{α} j} = - Γ_{h, l_{ζ}}^{i_{α}} W^{h j} - Γ_{h}^{i_{α}} {W^{h j}}_{, l ζ}, \\ \partial_{y^{l_{ζ}}} g^{i_{α} j_{β}} = Γ_{k, l_{ζ}}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} {W^{k h}}_{, l ζ} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} W^{k h} Γ_{h, l_{ζ}}^{j_{β}}, \\ Γ_{h, l_{ζ}}^{i_{α}} = \partial_{y^{l_{ζ}}} (Γ_{h k}^{i} u_{α}^{k}) = δ^{ζ α} Γ_{h l}^{i}, {W^{i j}}_{, l ζ} = δ^{i l} u_{ζ}^{j} + δ^{j l} u_{ζ}^{i} . \end{matrix}

Combining these expressions, we obtain

\begin{matrix} {\dot{x}}^{i} = W^{i j} ξ_{j} - W^{i h} Γ_{h}^{j_{β}} ξ_{j_{β}}, {\dot{X}}_{α}^{i} = - Γ_{h}^{i_{α}} W^{h j} ξ_{j} + Γ_{k}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} ξ_{j_{β}} \\ {\dot{ξ}}_{i} = W^{h l} Γ_{l, i}^{k_{δ}} ξ_{h} ξ_{k_{δ}} - \frac{1}{2} (Γ_{k, i}^{h_{γ}} W^{k h} Γ_{h}^{k_{δ}} + Γ_{k}^{h_{γ}} W^{k h} Γ_{h, i}^{k_{δ}}) ξ_{h_{γ}} ξ_{k_{δ}} \\ {\dot{ξ}}_{i_{α}} = Γ_{k, i_{α}}^{h_{δ}} W^{k h} Γ_{h}^{k_{δ}} ξ_{h_{γ}} ξ_{k_{δ}} - ({W^{h l}}_{, i α} Γ_{l}^{k_{δ}} + W^{h l} Γ_{l, i_{α}}^{k_{δ}}) ξ_{h} ξ_{k_{δ}} - \frac{1}{2} ({W^{h k}}_{, i α} ξ_{h} ξ_{k} + Γ_{k}^{h_{δ}} {W^{k h}}_{, i α} Γ_{h}^{k_{δ}} ξ_{h_{γ}} ξ_{k_{δ}}) . \end{matrix}

4.3. Acceleration and Polynomials for $C$

We can identify the covariant acceleration

\nabla_{{\dot{x}}_{t}} {\dot{x}}_{t}

of curves satisfying the MPP equations, and hence normal MPPs through their frame coordinates. Let

(u_{t}, ξ_{t})

satisfy (13). Then,

u_{t}

is a horizontal lift of

x_{t} = π (u_{t})

and hence by (1), (3), (10), and (15),

\begin{matrix} u_{t}^{- 1} \nabla_{{\dot{x}}_{t}} {\dot{x}}_{t} & = \frac{d}{d t} (\begin{matrix} ξ (h_{u_{t}} (u_{t} e_{1})) \\ ⋮ \\ ξ (h_{u_{t}} (u_{t} e_{d})) \end{matrix}) = (\begin{matrix} \dot{ξ} (h_{u_{t}} (u_{t} e_{1})) \\ ⋮ \\ \dot{ξ} (h_{u_{t}} (u_{t} e_{d})) \end{matrix}) + (\begin{matrix} ξ (\partial_{t} h_{u_{t}} (u_{t} e_{1})) \\ ⋮ \\ ξ (\partial_{t} h_{u_{t}} (u_{t} e_{d})) \end{matrix}) \\ = - (\begin{matrix} ξ (\partial_{h_{u_{t}} (u_{t} e_{1})} h_{u_{t}} (π_{*} ({\dot{u}}_{t})) \\ ⋮ \\ ξ (\partial_{h_{u_{t}} (u_{t} e_{d})} h_{u_{t}} (π_{*} ({\dot{u}}_{t})) \end{matrix}) + (\begin{matrix} ξ (\partial_{h_{u_{t}} (π_{*} ({\dot{u}}_{t}))} h_{u_{t}} (u_{t} e_{1})) \\ ⋮ \\ ξ (\partial_{h_{u_{t}} (π_{*} ({\dot{u}}_{t}))} h_{u_{t}} (u_{t} e_{d})) \end{matrix}) \\ = (\begin{matrix} ξ (ψ (u_{t}, R_{u_{t}} (u_{t} e_{1}, π_{*} ({\dot{u}}_{t})))) \\ ⋮ \\ ξ (ψ (u_{t}, R_{u_{t}} (u_{t} e_{d}, π_{*} ({\dot{u}}_{t})))) \end{matrix}) . \end{matrix}

(16)

The fact that the covariant derivative vanishes for classical geodesic leads to a definition of higher-order polynomials through the covariant derivative by requiring

{(\nabla_{{\dot{x}}_{t}})}^{k} {\dot{x}}_{t} = 0

for a kth order polynomial (e.g., [26,27]). As discussed above, compared to classical geodesics, curves satisfying the MPP equations have extra

d^{2}

degrees of freedom, allowing the curves to twist and deviate from being geodesic with respect to

C

while still satisfying the horizontality constraint on

F M

. This makes it natural to ask if normal MPPs relate to polynomials defined using

C

. For curves satisfying the MPP equations, using (16) and (15), we have

\begin{matrix} u_{t}^{- 1} {(\nabla_{{\dot{x}}_{t}})}^{2} {\dot{x}}_{t} & = \frac{d}{d t} (\begin{matrix} ξ (ψ (u_{t}, R_{u_{t}} (u_{t} e_{1}, π_{*} ({\dot{u}}_{t})))) \\ ⋮ \\ ξ (ψ (u_{t}, R_{u_{t}} (u_{t} e_{d}, π_{*} ({\dot{u}}_{t})))) \end{matrix}) = (\begin{matrix} ξ (ψ (u_{t}, \frac{d}{d t} R_{u_{t}} (u_{t} e_{1}, π_{*} ({\dot{u}}_{t})))) \\ ⋮ \\ ξ (ψ (u_{t}, \frac{d}{d t} R_{u_{t}} (u_{t} e_{d}, π_{*} ({\dot{u}}_{t})))) \end{matrix}) . \end{matrix}

Thus, in general, normal MPPs are not second order polynomials in the sense

{(\nabla_{{\dot{x}}_{t}})}^{2} {\dot{x}}_{t} = 0

unless the curvature

R_{u_{t}} (u_{t} e_{i}, π_{*} ({\dot{u}}_{t}))

is constant in t.

For comparison, in the Riemannian case, a variational formulation placing a cost on covariant acceleration [28,29] leads to cubic splines

{(\nabla_{{\dot{x}}_{t}})}^{2} {\dot{x}}_{t} = - R (\nabla_{{\dot{x}}_{t}} {\dot{x}}_{t}, x_{t},) {\dot{x}}_{t} .

In (16), the curvature terms appear in the covariant acceleration for normal MPPs, while cubic splines leads to the curvature term appearing in the third order derivative.

5. Cometric Formulation and Low-Rank Generator

We now investigate a cometric

g_{F^{k} M} + λ g_{R}

, where

g_{R}

is Riemannian,

g_{F^{k} M}

is a rank k positive semi-definite inner product arising from k linearly independent tangent vectors, and

λ > 0

a weight. We assume that

g_{F^{k} M}

is chosen so that

g_{F^{k} M} + λ g_{R}

is invertible, even though

g_{F^{k} M}

is rank-deficient. The situation corresponds to extracting the first k eigenvectors in Euclidean space PCA. If the eigenvectors are estimated statistically from observed data, this allows the estimation to be restricted to only the first k eigenvectors. In addition, an important practical implication of the construction is that a numerical implementation need not transport a full

d \times d

matrix for the frame, but a potentially much lower dimensional

d \times k

matrix. This point is essential when dealing with high-dimensional data, examples of which are landmark manifolds as discussed in Section 6.

When using the frame bundle to model covariances, the sum formulation is natural to express as a cometric compared to a metric because, with the cometric formulation,

g_{F^{k} M} + λ g_{R}

represents a sum of covariance matrices instead of a sum of precision matrices. Thus,

g_{F^{k} M} + λ g_{R}

can be intuitively thought of as adding isotropic noise of variance λ to the covariance represented by

g_{F^{k} M}

.

To pursue this, let

F^{k} M

denote the bundle of rank k linear maps

R^{k} \to T_{x} M

. We define a cometric by

⟨ξ, \tilde{ξ}⟩ = δ^{α β} (ξ | h_{u} (u_{α})) (\tilde{ξ} | h_{u} (u_{β})) + λ {⟨ξ, \tilde{ξ}⟩}_{g_{R}}

for

ξ, \tilde{ξ} \in T_{u}^{*} F^{k} M

. The sum over

α, β

is for

α, β = 1, \dots, k

. The first term is equivalent to the lift (9) of the cometric

⟨ξ, \tilde{ξ}⟩ = (ξ | g_{u} (\hat{ξ}))

given

u : R^{k} \to T_{x} M

. Note that in the definition (6) of

g_{u}

, the map u is not inverted; thus, the definition of the metric immediately carries over to the rank-deficient case.

Let

(x^{i}, u_{α}^{i})

,

α = 1, \dots, k

be a coordinate system on

F^{k} M

. The vertical distribution is in this case spanned by the

d k

vector fields

D_{j_{β}} = \partial_{u_{β}^{j}}

. Except for index sums being over k instead of d terms, the situation is thus similar to the full-rank case. Note that

(ξ | π_{*}^{- 1} w) = (ξ | w^{j} D_{j}) = w^{i} ξ_{i}

. The cometric in coordinates is

⟨ξ, \tilde{ξ}⟩ = δ^{α β} u_{α}^{i} ξ_{i} u_{β}^{j} {\tilde{ξ}}_{j} + λ g_{R}^{i j} ξ_{i} {\tilde{ξ}}_{j} = ξ_{i} (δ^{α β} u_{α}^{i} u_{β}^{j} + λ g_{R}^{i j}) {\tilde{ξ}}_{j} = ξ_{i} W^{i j} {\tilde{ξ}}_{j}

with

W^{i j} = δ^{α β} u_{α}^{i} u_{β}^{j} + λ g_{R}^{i j}

. We can then write the corresponding sub-Riemannian metric

g_{F^{k} M}

in terms of the adapted frame D

g_{F^{k} M} (ξ_{h} D^{h} + ξ_{h_{γ}} D^{h_{γ}}) = W^{i h} ξ_{h} D_{i}

(17)

because

(ξ | g_{F^{k} M} (\tilde{ξ})) = ⟨ξ, \tilde{ξ}⟩ = ξ_{i} W^{i j} {\tilde{ξ}}_{j}

. That is, the situation is analogous to (11), except the term

λ g_{R}^{i j}

is added to

W^{i j}

.

The geodesic system is again given by the Hamilton–Jacobi equations. As in the full-rank case, the system is specified by the derivatives of

g_{F^{k} M}

:

\begin{matrix} \partial_{y^{l}} g_{F^{k} M}^{i j} = {W^{i j}}_{, l}, \partial_{y^{l}} g_{F^{k} M}^{i j_{β}} = - {W^{i h}}_{, l} Γ_{h}^{j_{β}} - W^{i h} Γ_{h, l}^{j_{β}}, \partial_{y^{l}} g_{F^{k} M}^{i_{α} j} = - Γ_{h, l}^{i_{α}} W^{h j} - Γ_{h}^{i_{α}} {W^{h j}}_{, l}, \\ \partial_{y^{l}} g_{F^{k} M}^{i_{α} j_{β}} = Γ_{k, l}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} {W^{k h}}_{, l} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} W^{k h} Γ_{h, l}^{j_{β}}, \\ \partial_{y^{l_{ζ}}} g_{F^{k} M}^{i j} = {W^{i j}}_{, l ζ}, \partial_{y^{l_{ζ}}} g_{F^{k} M}^{i j_{β}} = - {W^{i h}}_{, l ζ} Γ_{h}^{j_{β}} - W^{i h} Γ_{h, l_{ζ}}^{j_{β}}, \partial_{y^{l_{ζ}}} g_{F^{k} M}^{i_{α} j} = - Γ_{h}^{i_{α}} {W^{h j}}_{, l ζ} - Γ_{h, l_{ζ}}^{i_{α}} W^{h j}, \\ \partial_{y^{l_{ζ}}} g_{F^{k} M}^{i_{α} j_{β}} = Γ_{k, l_{ζ}}^{i_{α}} W^{k h} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} W_{, l_{ζ}}^{k h} Γ_{h}^{j_{β}} + Γ_{k}^{i_{α}} W^{k h} Γ_{h, l_{ζ}}^{j_{β}}, \\ Γ_{h, l_{ζ}}^{i_{α}} = \partial_{y^{l_{ζ}}} (Γ_{h k}^{i} u_{α}^{k}) = δ^{ζ α} Γ_{h l}^{i}, {W^{i j}}_{, l} = λ g_{R}^{i j}, l, {W^{i j}}_{, l ζ} = δ^{i l} u_{ζ}^{j} + δ^{j l} u_{ζ}^{i} . \end{matrix}

Note that the introduction of the Riemannian metric

g_{R}

implies that

W^{i j}

are now dependent on the manifold coordinates

x^{i}

.

6. Numerical Experiments

We aim at visualizing most probable paths for the driving process and projections of curves satisfying the MPP Equation (13) in two cases: On 2D surfaces embedded in

R^{3}

and on finite dimensional landmark manifolds that arise from equipping a subset of the diffeomorphism group with a right-invariant metric and letting the action descend to the landmarks by a left action. The surface examples are implemented in Python using the Theano [30] framework for symbolic operations, automatic differentiation, and numerical evaluation. The landmark equations are detailed below and implemented in Numpy using Numpy’s standard ODE integrators. The code for running the experiments is available at http://bitbucket.com/stefansommer/mpps/.

6.1. Embedded Surfaces

We visualize normal MPPs and projections of curves satisfying the MPP Equation (13) on surfaces embedded in

R^{3}

in three cases: The sphere

S^{2}

, on an ellipsoid, and on a hyperbolic surface. The surfaces are chosen in order to have both positive and negative curvature, and to have varying degree of symmetry. In all cases, an open subset of the surfaces are represented in a single chart by a map

F : R^{2} \to R^{3}

. For the sphere and ellipsoid, this gives a representation of the surface, except for the south pole. The metric and Christoffel symbols are calculated using the symbolic differentiation features of Theano. The integration are performed by a simple Euler integrator.

Figure 4, Figure 5, Figure 6 show families of curves satisfying the MPP equations in three cases: (1) With fixed starting point

x_{0} \in M

and initial velocity

{\dot{x}}_{0} \in T M

but varying anisotropy represented by changing frame u in the fiber above

x_{0}

; (2) minimizing normal MPPs with fixed starting point and endpoint

x_{0}, x_{1} \in M

but changing frame u above

x_{0}

; (3) fixed starting point

x_{0} \in M

and frame u but varying

V^{*} F M

vertical part of the initial momentum

ξ_{0} \in T^{*} F M

. The first and second cases thus show the effect of varying anisotropy, while the third case illustrates the effect of the “twist” that the

d^{2}

degrees in the vertical momentum allows. Note the displayed anti-developed curves in

R^{2}

that for classical

C

geodesics would always be straight lines.

Figure 4. Curves satisfying the MPP equations (top row) and corresponding anti-development (bottom row) on three surfaces embedded in

R^{3}

: (a) An ellipsoid; (b) a sphere; (c) a hyperbolic surface. The family of curves is generated by rotating by

π / 2

radians the anisotropic covariance represented in the initial frame

u_{0}

and displayed in the gray ellipse.

Figure 5. Minimizing normal MPPs between two fixed points (red/cyan). From isotropic covariance (top row, (a)) to anisotropic (top row, (c)) on

S^{2}

. Compare with minimizing Riemannian geodesic (black curve). The MPP travels longer in the directions of high variance. Families of curves (middle row, (d–f)) and corresponding anti-development (bottom row, (g–i)) on the three surfaces in Figure 4. The family of curves is generated by rotating the covariance matrix as in Figure 4. Notice how the varying anisotropy affects the resulting minimizing curves, and how the anti-developed curves end at different points in

R^{2}

.

Figure 6. (a–l) With the setup of Figure 4 and Figure 5, generated families of curves by varying the vertical

V^{*} F M

part of the initial momentum

ξ_{0} \in T^{*} F M

but keeping the base point and frame

u_{0}

fixed. The vertical part allows varying degree of “twisting” of the curve.

6.2. LDDMM Landmark Equations

We here give a example of the MPP equations using the finite dimensional landmark manifolds that arise from right invariant metrics on subsets of the diffeomorphism group in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework [8]. The LDDMM metric can be conveniently expressed as a cometric, and, using a rank-deficient inner product

g_{F^{k} M}

as discussed in Section 5, we can obtain a reduction of the system of equations to

2 (2 N + 2 N k)

compared to

2 (2 N + {(2 N)}^{2})

with N landmarks in

R^{2}

.

Let

{p_{1}, \dots, p_{N}}

be landmarks in a subset

Ω \subset R^{d}

. The diffeomorphism group

Diff (Ω)

acts on the left on landmarks with the action

φ . {p_{1}, \dots, p_{N}} = {φ (p_{1}), \dots, φ (p_{N})}

. In LDDMM, a Hilbert space structure is imposed on a linear subspace V of

L^{2} (Ω, R^{d})

using a self-adjoint operator

L : V \to V^{*} \subset L^{2} (Ω, R^{d})

and defining the inner product

{⟨\cdot, \cdot⟩}_{V}

by

{⟨v, w⟩}_{V} = {⟨L v, w⟩}_{L^{2}} .

Under sufficient conditions on L, V is reproducing and admits a kernel K inverse to L. K is a Green’s kernel when L is a differential operator, or K can be a Gaussian kernel. The Hilbert structure on V gives a Riemannian metric on a subset

G_{V} \subset Diff (Ω)

by setting

{∥ v ∥}_{φ}^{2} = {∥ v \circ φ^{- 1} ∥}_{V}^{2}

; i.e., regarding

{⟨\cdot, \cdot⟩}_{V}

an inner product on

T_{Id} G_{V}

and extending the metric to

G_{V}

by right-invariance. This Riemannian metric descends to a Riemannian metric on the landmark space.

Let M be the manifold

M = {(p_{1}^{1}, \dots, p_{1}^{d}, \dots, p_{N}^{1}, \dots, p_{N}^{d}) | (p_{i}^{1}, \dots, p_{i}^{d}) \in R^{d}}

. The LDDMM metric on the landmark manifold M is directly related to the kernel K when written as a cometric

g_{p} (ξ, η) = \sum_{i, j = 1}^{N} ξ^{i} K (p_{i}, p_{j}) η^{j}

. Letting

i^{k}

denote the index of the kth component of the ith landmark, the cometric is in coordinates

g_{p}^{i^{k} j^{l}} = K {(p_{i}, p_{j})}_{k}^{l}

. The Christoffel symbols can be written in terms of derivatives of the cometric

g^{i j}

[31] (recall that

δ_{j}^{i} = g^{i k} g_{k j} = g_{j k} g^{k i}

)

{Γ^{k}}_{i j} = \frac{1}{2} g_{i r} (g^{k l} {g^{r s}}_{, l} - g^{s l} {g^{r k}}_{, l} - g^{r l} {g^{k s}}_{, l}) g_{s j} .

(18)

This relation comes from the fact that

g_{j m, k} = - g_{j r} {g^{r s}}_{, k} g_{s m}

gives the derivative of the metric. The derivatives of the cometric is simply

{g^{i^{k} j^{l}}}_{, r^{q}} = (δ_{r}^{i} + δ_{r}^{j}) \partial_{p_{r}^{q}} K {(p_{i}, p_{j})}_{k}^{l}

. Using (18), derivatives of the Christoffel symbols can be computed

\begin{matrix} {Γ^{k}}_{i j, ξ} & = \frac{1}{2} g_{i r, ξ} (g^{k l} {g^{r s}}_{, l} - g^{s l} {g^{r k}}_{, l} - g^{r l} {g^{k s}}_{, l}) g_{s j} + \frac{1}{2} g_{i r} (g^{k l} {g^{r s}}_{, l} - g^{s l} {g^{r k}}_{, l} - g^{r l} {g^{k s}}_{, l}) g_{s j, ξ} \\ + \frac{1}{2} g_{i r} ({g^{k l}}_{ξ} {g^{r s}}_{, l} + g^{k l} {g^{r s}}_{, l ξ} - {g^{s l}}_{, ξ} {g^{r k}}_{, l} - g^{s l} {g^{r k}}_{, l ξ} - {g^{r l}}_{, ξ} {g^{k s}}_{, l} - g^{r l} {g^{k s}}_{, l ξ}) g_{s j} . \end{matrix}

This provides the full data for numerical integration of the evolution equations on

F^{k} M

.

In Figure 7 (top row), we plot minimizing normal MPPs on the landmark manifold with two landmarks and varying covariance in the

R^{2}

horizontal and vertical direction. The plot shows the landmark equivalent of the experiment in Figure 5. Note how adding covariance in the horizontal and vertical direction, respectively, allows the minimizing normal MPP to vary more in these directions because the anisotropically-weighted metric penalizes high-covariance directions less.

Figure 7. (Top row) Matching of two landmarks (green) to two landmarks (red) by (a) computing a minimizing Riemannian geodesic on the landmark manifold, and (b–e) minimizing MPPs with added covariance (arrows) in

R^{2}

horizontal direction (b,c) and vertical (d,e). The action of the corresponding diffeomorphisms on a regular grid is visualized by the deformed grid which is colored by the warp strain. The added covariance allows the paths to have more movement in the horizontal and vertical direction, respectively, because the anisotropically weighted metric penalizes high-covariance directions less. (bottom row, (f–j)) Five landmark trajectories with fixed initial velocity and anisotropic covariance but varying

V^{*} F M

vertical initial momentum

ξ_{0}

. Changing the vertical momentum “twists” the paths.

Figure 7 (bottom row) shows five curves satisfying the MPP equations with varying vertical

V^{*} F M

initial momentum similarly to the plots in Figure 6. Again, we see how the extra degrees of freedom allows the paths to twist, generating a higher-dimensional family than classical geodesics with respect to

C

.

7. Discussion and Concluding Remarks

Incorporating anisotropy in models for data in non-linear spaces via the frame bundle as pursued in this paper leads to a sub-Riemannian structure and metric. A direct implication is that most probable paths to observed data in the sense of sequences of stochastic steps of a driving semi-martingale are not related to geodesics in the classical sense. Instead, a best estimate of the sequence of steps

w_{t} \in R^{d}

that leads to an observation

x = φ_{u} (w_{t}) |_{t = 1}

is an MPP in the sense of Definition 1. As shown in the paper, these paths are generally not geodesics or polynomials with respect to the connection on the manifold. In particular, if M has a Riemannian structure, the MPPs are generally neither Riemannian geodesics nor Riemannian polynomials. Below, we discuss the statistical implications of this result.

7.1. Statistical Estimators

Metric distances and Riemannian geodesics have been the traditional vehicle for representing observed data in non-linear spaces. Most fundamentally, the sample Frechét mean

\hat{x} = {argmin}_{x \in M} \sum_{i = 1}^{N} d_{g_{R}} {(x, x_{i})}^{2}

(19)

of observed data

x_{1}, \dots, x_{N} \in M

relies crucially on the Riemannian distance

d_{g_{R}}

connected to the metric

g_{R}

. Many PCA constructs (e.g., Principal Geodesics Analysis [6]) use the Riemannian

Exp .

and

Log

maps to map between linear tangent spaces and the manifold. These maps are defined from the Riemannian metric and Riemannian geodesics. Distributions modelled as in the random orbit model [32] or Bayesian models [15,33] again rely on geodesics with random initial conditions.

Using the frame bundle sub-Riemannian metric

g_{F M}

, we can define an estimator analogous to the Riemannian Frechét mean estimator. Assuming the covariance is a priori known, the estimator

\hat{x} = {argmin}_{u \in s (M)} \sum_{i = 1}^{N} d_{F M} {(u, π^{- 1} (x_{i}))}^{2}

(20)

acts correspondingly to the Frechét mean estimator (19). Here

s \in Γ (F M)

is a (local) section of

F M

that to

x \in M

connects the known covariance represented by

s (x) \in F M

. The distances

d_{F M} (u, π^{- 1} (x_{i}))

,

u = s (x)

are realized by MPPs from the mean candidate x to the fibers

π^{- 1} (x_{i})

. The Frechét mean problem is thus lifted to the frame bundle with the anisotropic weighting incorporated in the metric

g_{F M}

. This metric is not related to

g_{R}

, except for its dependence on the connection

C

that can be defined as the Levi–Civita connection of

g_{R}

. The fundamental role of the distance

d_{g_{R}}

and

g_{R}

geodesics in (19) is thus removed.

Because covariance is an integral part of the model, sample covariance can also be estimated directly along with the sample mean. In [3], the estimator

\hat{u} = {argmin}_{u \in F M} \sum_{i = 1}^{N} d_{F M} {(u, π^{- 1} (x_{i}))}^{2} - N log (\det_{g_{R}} u)

(21)

is suggested. The normalizing term

- N log (\det_{g_{R}} u)

is derived such that the estimator exactly corresponds to the maximum likelihood estimator of mean and covariance for Euclidean Gaussian distributions. The determinant is defined via

g_{R}

, and the term acts to prevent the covariance from approaching infinity. Maximum likelihood estimators of mean and covariance for normally distributed Euclidean data have unique solutions in the sample mean and sample covariance matrix, respectively. Uniqueness of the Frechét mean (19) is only ensured for sufficiently concentrated data. For the estimator (21), existence and uniqueness properties are not immediate, and more work is needed in order to find necessary and sufficient conditions.

7.2. Priors and Low-Rank Estimation

The low-rank cometric formulation pursued in Section 5 gives a natural restriction of (21) to

u \in F^{k} M

,

1 \leq k \leq d

. As for Euclidean PCA, most variance is often captured in the span of the first k eigenvectors with

k ≪ d

. Estimates of the remaining eigenvectors are generally ignored, as the variance of the eigenvector estimates increases as the noise captured in the span of the last eigenvectors becomes increasingly uniform. The low-rank cometric restricts the estimation to only the first k eigenvectors, and thus builds the construction directly into the model. In addition, it makes numerical implementation feasible, because a numerical representation need only store and evolve

d \times k

matrices. As a different approach for regularizing the estimator (21), the normalizing term

- N log (\det_{g_{R}} u)

can be extended with other priors (e.g., an

L^{1}

-type penalizing term). Such priors can potentially partly remove existence and uniqueness issues, and result in additional sparsity properties that can benefit numerical implementations. The effects of such priors have yet to be investigated.

In the

k = d

case, the number of degrees of freedom for the MPPs grows quadratically in the dimension d. This naturally increases the variance of any MPP estimate given only one sample from its trajectory. The low-rank cometric formulation reduces the growth to linear in d. The number of degrees of freedom is however still k times larger than for Riemannian geodesics. With longitudinal data, more samples per trajectory can be obtained, reducing the variance and allowing a better estimate of the MPP. However, for the estimators (20) and (21) above, estimates of the actual optimal MPPs are not needed—only their squared length. It can be hypothesized that the variance of the length estimates is lower than the variance of the estimates of the corresponding MPPs. Further investigation regarding this will be the subject of future work.

7.3. Conclusions

The underlying model of anisotropy used in this paper originates from the anisotropic normal distributions formulated in [2] and the diffusion PCA framework [1]. Because many statistical models are defined using normal distributions, this approach to incorporating anisotropy extends to models such as linear regression. We expect that finding most probable paths in other statistical models such as regressions models can be carried out with a program similar to the program presented in this paper.

The difference between MPPs and geodesics shows that the geometric and metric properties of geodesics, zero acceleration, and local distance minimization are not directly related to statistical properties such as maximizing path probability. Whereas the concrete application and model determines if metric or statistical properties are fundamental, most statistical models are formulated without referring to metric properties of the underlying space. It can therefore be argued that the direct incorporation of anisotropy and the resulting MPPs are natural in the context of many models of data variation in non-liner spaces.

Acknowledgments

The author wishes to thank Peter W. Michor and Sarang Joshi for suggestions for the geometric interpretation of the sub-Riemannian metric on

F M

and discussions on diffusion processes on manifolds. The work was supported by the Danish Council for Independent Research, the CSGB Centre for Stochastic Geometry and Advanced Bioimaging funded by a grant from the Villum foundation, and the Erwin Schrödinger Institute in Vienna.

Conflicts of Interest

The author declares no conflict of interest.

References

Sommer, S. Diffusion Processes and PCA on Manifolds. Available online: https://www.mfo.de/document/1440a/OWR_2014_44.pdf (accessed on 24 November 2016).
Sommer, S. Anisotropic distributions on manifolds: Template estimation and most probable paths. In Information Processing in Medical Imaging; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9123, pp. 193–204. [Google Scholar]
Sommer, S.; Svane, A.M. Modelling anisotropic covariance using stochastic development and sub-riemannian frame bundle geometry. J. Geom. Mech. 2016, in press. [Google Scholar]
Hsu, E.P. Stochastic Analysis on Manifolds; American Mathematical Society: Providence, RI, USA, 2002. [Google Scholar]
Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancie. Annales de l’Institut Henri Poincaré 1948, 10, 215–310. [Google Scholar]
Fletcher, P.; Lu, C.; Pizer, S.; Joshi, S. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging 2004, 23, 995–1005. [Google Scholar] [CrossRef] [PubMed]
Vaillant, M.; Miller, M.; Younes, L.; Trouvé, A. Statistics on diffeomorphisms via tangent space representations. NeuroImage 2004, 23, S161–S169. [Google Scholar] [CrossRef] [PubMed]
Younes, L. Shapes and Diffeomorphisms; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
Pennec, X. Intrinsic statistics on riemannian manifolds: Basic tools for geometric measurements. J. Math. Imaging Vis. 2006, 25, 127–154. [Google Scholar] [CrossRef]
Karcher, H. Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541. [Google Scholar] [CrossRef]
Huckemann, S.; Hotz, T.; Munk, A. Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric lie group actions. Stat. Sin. 2010, 20, 1–100. [Google Scholar]
Jung, S.; Dryden, I.L.; Marron, J.S. Analysis of principal nested spheres. Biometrika 2012, 99, 551–568. [Google Scholar] [CrossRef] [PubMed]
Pennec, X. Barycentric subspaces and affine spans in manifolds. In Proceedings of the Second International Conference on Geometric Science of Information, Paris, France, 28–30 October 2015; Nielsen, F., Barbaresco, F., Eds.; Lecture Notes in Computer Science. pp. 12–21.
Sommer, S. Horizontal dimensionality reduction and iterated frame bundle development. In Geometric Science of Information; Springer: Berlin/Heidelberg, Germany, 2013; pp. 76–83. [Google Scholar]
Zhang, M.; Fletcher, P. Probabilistic principal geodesic analysis. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, 5–10 December 2013; pp. 1178–1186.
Tipping, M.E.; Bishop, C.M. Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B 1999, 61, 611–622. [Google Scholar] [CrossRef]
Elworthy, D. Geometric aspects of diffusions on manifolds. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–1987; Hennequin, P.L., Ed.; Number 1362 in Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 1988; pp. 277–425. [Google Scholar]
Mok, K.P. On the differential geometry of frame bundles of Riemannian manifolds. J. Reine Angew. Math. 1978, 1978, 16–31. [Google Scholar]
Taubes, C.H. Differential Geometry: Bundles, Connections, Metrics and Curvature, 1st ed.; Oxford University Press: Oxford, UK; New York, NY, USA, 2011. [Google Scholar]
Kolář, I.; Slovák, J.; Michor, P.W. Natural Operations in Differential Geometry; Springer: Berlin/Heidelberg, Germany, 1993. [Google Scholar]
Andersson, L.; Driver, B.K. Finite dimensional approximations to wiener measure and path integral formulas on manifolds. J. Funct. Anal. 1999, 165, 430–498. [Google Scholar] [CrossRef]
Fujita, T.; Kotani, S.i. The Onsager-Machlup function for diffusion processes. J. Math. Kyoto Univ. 1982, 22, 115–130. [Google Scholar]
Strichartz, R.S. Sub-Riemannian geometry. J. Differ. Geom. 1986, 24, 221–263. [Google Scholar]
Bloch, A.M. Nonholonomic mechanics and control. In Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2003; Volume 24. [Google Scholar]
Marsden, J.E.; Ratiu, T.S. Introduction to mechanics and symmetry. In Texts in Applied Mathematics; Springer: New York, NY, USA, 1999; Volume 17. [Google Scholar]
Leite, F.S.; Krakowski, K.A. Covariant Differentiation under Rolling Maps; Centro de Matemática da Universidade de Coimbra: Coimbra, Portugal, 2008. [Google Scholar]
Hinkle, J.; Fletcher, P.T.; Joshi, S. Intrinsic polynomials for regression on riemannian manifolds. J. Math. Imaging Vis. 2014, 50, 32–52. [Google Scholar] [CrossRef]
Noakes, L.; Heinzinger, G.; Paden, B. Cubic splines on curved spaces. IMA J. Math. Control Inf. 1989, 6, 465–473. [Google Scholar] [CrossRef]
Camarinha, M.; Silva Leite, F.; Crouch, P. On the geometry of Riemannian cubic polynomials. Differ. Geom. Appl. 2001, 15, 107–135. [Google Scholar] [CrossRef]
Team, T.T.D.; Al-Rfou, R.; Alain, G.; Almahairi, A.; Angermueller, C.; Bahdanau, D.; Ballas, N.; Bastien, F.; Bayer, J.; Belikov, A.; et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv, 2016; arXiv:1605.02688. [Google Scholar]
Micheli, M. The Differential Geometry of Landmark Shape Manifolds: Metrics, Geodesics, and Curvature. Ph.D. Thesis, Brown University, Providence, RI, USA, 2008. [Google Scholar]
Miller, M.; Banerjee, A.; Christensen, G.; Joshi, S.; Khaneja, N.; Grenander, U.; Matejic, L. Statistical methods in computational anatomy. Stat. Methods Med. Res. 1997, 6, 267–299. [Google Scholar] [CrossRef] [PubMed]
Zhang, M.; Singh, N.; Fletcher, P.T. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In Information Processing for Medical Imaging (IPMI); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; pp. 37–48. [Google Scholar]

Figure 1. (a) A most probable path (MPP) for a driving Euclidean Brownian motion on an ellipsoid. The gray ellipsis over the starting point (red dot) indicates the covariance of the anisotropic diffusion. A frame

u_{t}

(black/gray vectors) representing the square root covariance is parallel transported along the curve, enabling the anisotropic weighting with the precision matrix in the action functional. With isotropic covariance, normal MPPs are Riemannian geodesics. In general situations, such as the displayed anisotropic case, the family of MPPs is much larger; (b) The corresponding anti-development in

R^{2}

(red line) of the MPP. Compare with the anti-development of a Riemannian geodesic with same initial velocity (blue dotted line). The frames

u_{t} \in GL (R^{2}, T_{x_{t}} M)

provide local frame coordinates for each time t.

Figure 1. (a) A most probable path (MPP) for a driving Euclidean Brownian motion on an ellipsoid. The gray ellipsis over the starting point (red dot) indicates the covariance of the anisotropic diffusion. A frame

u_{t}

(black/gray vectors) representing the square root covariance is parallel transported along the curve, enabling the anisotropic weighting with the precision matrix in the action functional. With isotropic covariance, normal MPPs are Riemannian geodesics. In general situations, such as the displayed anisotropic case, the family of MPPs is much larger; (b) The corresponding anti-development in

R^{2}

(red line) of the MPP. Compare with the anti-development of a Riemannian geodesic with same initial velocity (blue dotted line). The frames

u_{t} \in GL (R^{2}, T_{x_{t}} M)

provide local frame coordinates for each time t.

Figure 2. (a) Normal distributions

u N (0, Id)

in the tangent space

T_{x_{0}} M

with covariance

u u^{T}

(blue ellipsis) can be mapped to the manifold by applying the exponential map

{Exp}_{x_{0}}

to sampled vectors

v \in T_{x_{0}} M

(red vectors). This effectively linearises the geometry around

x_{0}

; (b) The stochastic development map

φ_{u}

maps

R^{d}

valued paths

w_{t}

to M by transporting the covariance in each step (blue ellipses) giving a covariance

u_{t}

along the entire sample path. The approach does not linearise around a single point. Holonomy of the connection implies that the covariance “rotates” around closed loops—an effect which can be illustrated by continuing the transport along the loop created by the dashed path. The anisotropic metric

g_{F M}

weights step lengths by the transported covariance at each time t.

Figure 2. (a) Normal distributions

u N (0, Id)

in the tangent space

T_{x_{0}} M

with covariance

u u^{T}

(blue ellipsis) can be mapped to the manifold by applying the exponential map

{Exp}_{x_{0}}

to sampled vectors

v \in T_{x_{0}} M

(red vectors). This effectively linearises the geometry around

x_{0}

; (b) The stochastic development map

φ_{u}

maps

R^{d}

valued paths

w_{t}

to M by transporting the covariance in each step (blue ellipses) giving a covariance

u_{t}

along the entire sample path. The approach does not linearise around a single point. Holonomy of the connection implies that the covariance “rotates” around closed loops—an effect which can be illustrated by continuing the transport along the loop created by the dashed path. The anisotropic metric

g_{F M}

weights step lengths by the transported covariance at each time t.

Figure 3. Relations between the manifold, frame bundle, the horizontal distribution

H F M

, the vertical bundle

V F M

, a Riemannian metric

g_{R}

, and the sub-Riemannian metric

g_{F M}

, defined below. The connection

C

provides the splitting

T F M = H F M \oplus V F M

. The restrictions

π_{*} |_{H_{u} M}

are invertible maps

H_{u} M \to T_{π (u)} M

with inverse

h_{u}

, the horizontal lift. Correspondingly, the vertical bundle

V F M

is isomorphic to the trivial bundle

F M \times gl (n)

. The metric

g_{F M} : T^{*} F M \to T F M

has an image in the subspace

H F M

.

Figure 3. Relations between the manifold, frame bundle, the horizontal distribution

H F M

, the vertical bundle

V F M

, a Riemannian metric

g_{R}

, and the sub-Riemannian metric

g_{F M}

, defined below. The connection

C

provides the splitting

T F M = H F M \oplus V F M

. The restrictions

π_{*} |_{H_{u} M}

are invertible maps

H_{u} M \to T_{π (u)} M

with inverse

h_{u}

, the horizontal lift. Correspondingly, the vertical bundle

V F M

is isomorphic to the trivial bundle

F M \times gl (n)

. The metric

g_{F M} : T^{*} F M \to T F M

has an image in the subspace

H F M

.

Figure 4. Curves satisfying the MPP equations (top row) and corresponding anti-development (bottom row) on three surfaces embedded in

R^{3}

: (a) An ellipsoid; (b) a sphere; (c) a hyperbolic surface. The family of curves is generated by rotating by

π / 2

radians the anisotropic covariance represented in the initial frame

u_{0}

and displayed in the gray ellipse.

Figure 4. Curves satisfying the MPP equations (top row) and corresponding anti-development (bottom row) on three surfaces embedded in

R^{3}

: (a) An ellipsoid; (b) a sphere; (c) a hyperbolic surface. The family of curves is generated by rotating by

π / 2

radians the anisotropic covariance represented in the initial frame

u_{0}

and displayed in the gray ellipse.

Figure 5. Minimizing normal MPPs between two fixed points (red/cyan). From isotropic covariance (top row, (a)) to anisotropic (top row, (c)) on

S^{2}

. Compare with minimizing Riemannian geodesic (black curve). The MPP travels longer in the directions of high variance. Families of curves (middle row, (d–f)) and corresponding anti-development (bottom row, (g–i)) on the three surfaces in Figure 4. The family of curves is generated by rotating the covariance matrix as in Figure 4. Notice how the varying anisotropy affects the resulting minimizing curves, and how the anti-developed curves end at different points in

R^{2}

.

Figure 5. Minimizing normal MPPs between two fixed points (red/cyan). From isotropic covariance (top row, (a)) to anisotropic (top row, (c)) on

S^{2}

. Compare with minimizing Riemannian geodesic (black curve). The MPP travels longer in the directions of high variance. Families of curves (middle row, (d–f)) and corresponding anti-development (bottom row, (g–i)) on the three surfaces in Figure 4. The family of curves is generated by rotating the covariance matrix as in Figure 4. Notice how the varying anisotropy affects the resulting minimizing curves, and how the anti-developed curves end at different points in

R^{2}

.

Figure 6. (a–l) With the setup of Figure 4 and Figure 5, generated families of curves by varying the vertical

V^{*} F M

part of the initial momentum

ξ_{0} \in T^{*} F M

but keeping the base point and frame

u_{0}

fixed. The vertical part allows varying degree of “twisting” of the curve.

Figure 6. (a–l) With the setup of Figure 4 and Figure 5, generated families of curves by varying the vertical

V^{*} F M

part of the initial momentum

ξ_{0} \in T^{*} F M

but keeping the base point and frame

u_{0}

fixed. The vertical part allows varying degree of “twisting” of the curve.

Figure 7. (Top row) Matching of two landmarks (green) to two landmarks (red) by (a) computing a minimizing Riemannian geodesic on the landmark manifold, and (b–e) minimizing MPPs with added covariance (arrows) in

R^{2}

horizontal direction (b,c) and vertical (d,e). The action of the corresponding diffeomorphisms on a regular grid is visualized by the deformed grid which is colored by the warp strain. The added covariance allows the paths to have more movement in the horizontal and vertical direction, respectively, because the anisotropically weighted metric penalizes high-covariance directions less. (bottom row, (f–j)) Five landmark trajectories with fixed initial velocity and anisotropic covariance but varying

V^{*} F M

vertical initial momentum

ξ_{0}

. Changing the vertical momentum “twists” the paths.

Figure 7. (Top row) Matching of two landmarks (green) to two landmarks (red) by (a) computing a minimizing Riemannian geodesic on the landmark manifold, and (b–e) minimizing MPPs with added covariance (arrows) in

R^{2}

horizontal direction (b,c) and vertical (d,e). The action of the corresponding diffeomorphisms on a regular grid is visualized by the deformed grid which is colored by the warp strain. The added covariance allows the paths to have more movement in the horizontal and vertical direction, respectively, because the anisotropically weighted metric penalizes high-covariance directions less. (bottom row, (f–j)) Five landmark trajectories with fixed initial velocity and anisotropic covariance but varying

V^{*} F M

vertical initial momentum

ξ_{0}

. Changing the vertical momentum “twists” the paths.

© 2016 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.

Anisotropically Weighted and Nonholonomically Constrained Evolutions on Manifolds †

Abstract

1. Introduction

Background

2. Frame Bundles, Stochastic Development, and Anisotropic Diffusions

2.1. The Frame Bundle

2.2. Development and Stochastic Development

2.3. Adapted Coordinates

2.4. Connection and Curvature

3. The Anisotropically Weighted Metric

3.1. Sub-Riemannian Metric on the Horizontal Distribution

3.2. Covariance and Nonholonomicity

3.3. Riemannian Metrics on F M

4. Constrained Evolutions

4.1. Normal Geodesics for g F M

4.2. Evolution in Coordinates

4.3. Acceleration and Polynomials for C

5. Cometric Formulation and Low-Rank Generator

6. Numerical Experiments

6.1. Embedded Surfaces

6.2. LDDMM Landmark Equations

7. Discussion and Concluding Remarks

7.1. Statistical Estimators

7.2. Priors and Low-Rank Estimation

7.3. Conclusions

Acknowledgments

Conflicts of Interest

References

Article Metrics

Article Access Statistics

Anisotropically Weighted and Nonholonomically Constrained Evolutions on Manifolds^†

3.3. Riemannian Metrics on $F M$

4.1. Normal Geodesics for $g_{F M}$

4.3. Acceleration and Polynomials for $C$