Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems

Fiori, Simone

doi:10.3390/sym13112092

Open AccessTutorial

Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems

by

Simone Fiori

Dipartimento di Ingegneria dell’Informazione, Università Politecnica delle Marche, Via Brecce Bianche, I-60131 Ancona, Italy

Symmetry 2021, 13(11), 2092; https://doi.org/10.3390/sym13112092

Submission received: 23 August 2021 / Revised: 12 October 2021 / Accepted: 26 October 2021 / Published: 4 November 2021

Download Versions Notes

Abstract

:

The aim of the present tutorial paper is to recall notions from manifold calculus and to illustrate how these tools prove useful in describing system-theoretic properties. Special emphasis is put on embedded manifold calculus (which is coordinate-free and relies on the embedding of a manifold into a larger ambient space). In addition, we also consider the control of non-linear systems whose states belong to curved manifolds. As a case study, synchronization of non-linear systems by feedback control on smooth manifolds (including Lie groups) is surveyed. Special emphasis is also put on numerical methods to simulate non-linear control systems on curved manifolds. The present tutorial is meant to cover a portion of the mentioned topics, such as first-order systems, but it does not cover topics such as covariant derivation and second-order dynamical systems, which will be covered in a subsequent tutorial paper.

Keywords:

first-order and second-order abstract systems; feedback control system; smooth manifolds

1. Introduction

The theory of dynamical systems whose state spaces possess the structures of curved manifolds has been applied primarily in physics (especially to mathematically describe the theory of general relativity at the beginning of the 20th century). More recently, dynamical systems on manifolds have proven their relevance in a number of subjects in engineering and applied sciences such as in robotics and in biomedical engineering [1,2,3,4,5] to name a few. The observation at the core of such applications is that those dynamical systems whose descriptive variables are bound to one another by non-linear holonomic constraints may be studied by means of the rich variety of mathematical tools provided by manifold calculus (a shortened terminology to denote ’calculus on manifold’) and may be framed in the class of dynamical systems on manifold.

The structures of the state manifolds of such dynamical systems depend on the application at hand. As a concrete example, whenever the dynamics of a rigid body is concerned, the state manifold of choice is the special orthogonal group

SO (3)

, because it encodes the attitude of a flying drone or a submarine robot. Most applications of interest (such as the computation of the dynamics of flying bodies) concern well-studied and well-understood curved manifolds, such as the special orthogonal group, the unit hypersphere and the group of symmetric, positive-definite matrices.

From the perspective of performing numerical simulations of dynamical systems on a computing platform, it is necessary to design adequate numerical methods to compute approximate solutions which still meet the structures of the state manifolds. Classical numerical methods, such as those in the Euler class or in the Runge–Kutta class, will fail when applied directly to such dynamical systems because they were designed to work on flat spaces and cannot cope with non-flat manifolds [6].

As a specific applied field, the time synchronization of first-order dynamical systems on curved state manifolds by non-linear control will be surveyed. The time synchronization of non-autonomous dynamical systems has been applied in physiology [7,8], ecology [9], atmosphere physics [10], neurology [11] and many more applied fields [12,13,14]. Synchronization theory is a topic that appears to be very interesting from an abstract point of view and, at the same time, very useful in a number of applications. Indeed, synchronization theory appears to be an exciting and interdisciplinary research topic that combines key ideas from system theory, control theory and manifold calculus. In particular, non-linear control theory on manifolds may supply tools to design control fields to force a pair of dynamical systems into synchronizing their dynamics over time [15,16].

The present tutorial paper is devoted to recalling fundamental notions from manifold calculus and to explaining how these concepts apply to system theory and to non-linear control theory, taking time synchronization as a representative case of study in non-linear control. The present contribution is oriented to readers who either possess a good command of calculus but not of manifold calculus, or to readers who claim an understanding of theoretical manifold calculus, but do not have enough insight in the application/computation aspects of this field. In addition, a basic understanding of system theory and control theory will be assumed. The content of the present tutorial paper may be summarized as follows:

It provides a clear and well-motivated introduction to manifold calculus, the basis of system and control theories on manifolds, with special emphasis on computational and applicational aspects. The present contribution provides practical formulas to deal with those real-valued manifolds that, in the author’s experience, are the most accessed in engineering and applied science problems. As a matter of fact, complex-valued manifolds are not treated at all.
It clearly states and illustrates the idea that, when one wishes to perform a simulation, by a computing platform, of dynamical systems on manifolds described in terms of differential equations, it is necessary to time-discretize such differential equations in a suitable way. In order to achieve such a discretization, it is not safe to invoke standard discretization methods (such as the ones based on Euler forward–backward discretization), which do not work as they stand on curved manifolds. One should therefore resort to more sophisticated numerical integration techniques.
By the author’s choice, the present tutorial paper does not carry any graphical illustrations nor any numerical simulation results. Readers who are interested in deepening their understanding of this topic are invited to to sketch graphs autonomously and to code examples in their favorite programming language.

The present paper is organized as follows. Section 2 lays out some introductory material, such as the motivation behind manifold calculus and a list of manifolds mostly accessed in applications. Section 3 introduces the notion of curves and tangent bundles and allied topics, such as normal spaces. Section 4 provides a first introduction, with examples, to first-order dynamical systems on manifolds. Section 5 presents a special kind of derivative of a function having a manifold as a domain and a manifold as a co-domain, termed a pushforward map. Section 6 introduces a class of manifolds that have peculiar features: the Lie groups. Section 7, in detail, treats the fundamental concept of metrization of a curved space through the more familiar notion of metrization of vector spaces. Section 8 introduces notions such as geodesic lines, Riemannian distance and exponential maps. Section 9 surveys the notion of Riemannian gradient and illustrates such a concept via examples. Section 10 introduces and exemplifies the notion of parallel transport along a curve, which is of paramount importance in manifold calculus and its computer-based implementations. Section 11 outlines the concept of manifold retraction and vector transport, which are computationally convenient approximations of exponential maps and parallel transport, respectively. Section 12 illustrates a feedback control theory suitable for application to first-order systems insisting on state manifolds. Section 13 introduces the notion of Riemannian Hessian, which stems from a second-order approximation of a manifold-to-scalar function, and recalls optimization algorithms that extend the Newton method to look for a zero of a vector field. Section 14 concludes this tutorial paper.

As a distinctive aspect of the present tutorial paper, the main flow of discussion is based on coordinate-free (or component-free) expressions, which facilitates the implementation of the main equations by a matrix-friendly computational language (such as MATLAB

^{©}

). Coordinate-free manifold calculus is introduced by the technical tool of embedded calculus, which stems from embedding a manifold into a larger ambient space where the calculation rules are simpler and more familiar to readers. The starred subsections, namely subsections marked by an asterisk (*), address some specific arguments, related to coordinate-prone manifold calculus, which may be skipped by the uninterested readers without detriment to the comprehension of the main flow of this presentation.

The present tutorial paper does not cover a number of subjects, such as the covariant derivation of vector fields, continuous-time second-order dynamical systems arising from a Lagrangian framework nor higher-order discrete-time dynamical systems, nor the key topics related to manifold curvature. These topics will be the subject of a forthcoming tutorial paper.

2. Coordinate-Free Embedded Manifold Calculus

In this section, we recall the relevant notation and fundamental properties of matrix calculus, as well as several examples of manifolds of interest in engineering and applied sciences.

In manifold calculus, both embedded (or extrinsic) and intrinsic coordinates may be accessed. With the aim of elucidating the difference between extrinsic and intrinsic coordinates for manifold elements, let us mention the case of the space

SO (2)

of planar rotations. The space

SO (2)

is a mono-dimensional manifold; therefore, any element

X \in SO (2)

may be pointed to by one intrinsic coordinate: a matrix

X \in SO (2)

may be represented as

X (θ) = [\begin{matrix} cos θ & sin θ \\ - sin θ & cos θ \end{matrix}],

(1)

with

θ \in [0, 2 π)

. On the other hand, by embedding the space

SO (2)

into the space

R^{2 \times 2}

of

2 \times 2

real-valued matrices, it turns out that any element of

SO (2)

may be regarded as a two-by-two real-valued matrix

X = [\begin{matrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{matrix}],

(2)

whose entries must satisfy the four non-linear constrains:

\{\begin{matrix} X_{11}^{2} + X_{21}^{2} = 1, \\ X_{22}^{2} + X_{12}^{2} = 1, \\ X_{11} X_{12} + X_{21} X_{22} = 0, \\ X_{11} X_{22} - X_{12} X_{21} = 1 . \end{matrix}

(3)

The four parameters

X_{11}

,

X_{12}

,

X_{21}

,

X_{22}

denote embedded or extrinsic coordinates. The orthogonality and normality constraints of the columns of the matrix X may be represented through two coordinate-free constraints in a compact way as

X^{⊤} X = I_{2} and det (X) = 1,

(4)

where a superscript

^{⊤}

denotes matrix transposition and

I_{2}

denotes a

2 \times 2

identity matrix.

Another illustrative example is the ordinary sphere

S^{2}

that may be parametrized as follows:

\{\begin{matrix} x = cos θ sin φ \\ y = sin θ sin φ \\ z = cos φ \end{matrix}

(5)

with parameters

0 \leq θ < 2 π

and

0 \leq φ \leq π

(in cartography, the coordinate

θ

is termed ‘latitude’, while the coordinate

φ

is termed ‘longitude’.) In general, the dimension of a manifold denotes the minimal number of parameters that are required to individuate a point on a manifold. While in common speech the ordinary sphere is termed ‘three-dimensional’, it is indeed a bi-dimensional manifold embedded in a three-dimensional ambient space. The dimension of the manifold is 2; the dimension of the embedding space, instead, is 3.

In general, in engineering and applied mathematics, using intrinsic coordinates is deprecated. As we have just seen, representing an ordinary sphere through intrinsic coordinates is fairly easy, but what about, for example, a hypersphere

S^{9}

? Any intrinsic representation would require nine angles and a series of complicated trigonometric expressions. Assuming that a manifold of interest may be embedded into a larger ambient space

A

, it is way easier to represent one such manifold as a subset of points of an ambient space that meet certain constraints. For example, the sphere

S^{9}

may be embedded in

R^{10}

and every point of such a sphere may be represented by an array

x \in R^{10}

such that

\sum_{i = 1}^{10} x_{i}^{2} = x^{⊤} x = 1 .

(6)

Clearly, the ten coordinates

x_{i}

, termed embedded coordinates, turn out to be unnecessary (nine coordinates would be enough). The existence of an embedding may be evaluated through classical results of manifold calculus, which also state the best dimension of the ambient space

A

on the basis of the structural properties of the embedding. Two such results are the strong Whitney embedding theorem and the Nash isometric embedding theorem for Riemannian manifolds. In the present tutorial, we are concerned exclusively with manifolds that may be embedded into a linear ambient space.

The number of extrinsic coordinates is, in general, far larger than the number of intrinsic coordinates; hence, a coordinate-free embedded representation is, in general, redundant. However, as long as computer implementation is concerned, a coordinate-free representation is advantageous because computing languages, such as MATLAB

^{©}

and Python, generally deal seamlessly with bi-dimensional arrays. With reference to the sphere, let us examine a simple example that tells us why manifolds need their own calculus.

Example 1.

Take a point

x \in S^{2}

and a direction

v \in R^{3} - {0}

, such that

x^{⊤} v = 0

, and let us write the parametric equation of a line departing from x in the direction v as

ℓ (t) : = x + t v

, with

0 \leq t \leq 1

. It is important to point out that none of the points over such a line belong to the sphere, except for the point x. In fact, we have

ℓ^{⊤} (t) ℓ (t) = {(x + t v)}^{⊤} (x + t v) = 1 + t^{2} {∥ v ∥}^{2},

(7)

since v is orthogonal to x. It turns out that

ℓ^{⊤} (t) ℓ (t) \neq 1

if

t \neq 0

, hence the assertion. ■

This simple example tells us that standard constructs, such as straight lines, do not work on curved manifolds, hence the need of a specific calculus to be consistently developed.

2.1. General Notation and Properties

In general, column arrays are denoted by a lower-case letter, while matrices are denoted by an upper-case letter. Manifold elements, for a generic manifold, are denoted as lower-case letters.

A number of matrix functions and factorizations will be invoked in this paper, namely:

Matrix trace: The trace of a square matrix $M \in R^{p \times p}$ (namely the sum of its principal-diagonal entries) is denoted by $tr (M)$ . Matrix trace has a cyclic permutation invariance property. For example, given three conformable (i.e., mutually multipliable) matrices $A, B, C$ , it holds that:

$tr (A B C) = tr (B C A) = tr (C A B) .$

(8)
Matrix square root: Given a matrix $M \in R^{p \times p}$ , its square root R is the unique matrix such that $R^{2} = P$ . Not every matrix admits a square root. Special square roots (such as a symmetric square root) will be defined later.
Spectral factorization: Given a matrix $M \in R^{p \times p}$ , let us assume that there exists an orthogonal matrix X (i.e., one such that $X^{⊤} X = I_{p}$ ) and a real diagonal matrix D such that $M = X D X^{⊤}$ . The expression on the right-hand side denotes the spectral factorization of the matrix M. Such a factorization turns out to be very useful in evaluating matrix polynomials. For example, it holds that

$\begin{matrix} M^{4} = {(X D X^{⊤})}^{4} = (X D X^{⊤}) (X D X^{⊤}) (X D X^{⊤}) (X D X^{⊤}) = \\ X D (X^{⊤} X) D (X^{⊤} X) D (X^{⊤} X) D X^{⊤} = X D D D D X^{⊤} = X D^{4} X^{⊤} . \end{matrix}$

(9)

Exponentiating full matrices is cumbersome, while exponentiating diagonal matrices is amiable.
Thin QR factorization: Any matrix $M \in R^{p \times q}$ , with $p \geq q$ , may be factored as the product of a $p \times p$ orthogonal matrix Q and a $p \times q$ upper triangular matrix R, namely $M = Q R$ . In general, a QR factorization is not unique [17]. To remove such kind of indeterminacy, the R-factor may be chosen with strictly positive entries on its main diagonal, so that the factorization is unique.
Compact singular value factorization (SVD): Compact SVD of a matrix $M \in R^{p \times q}$ is a matrix factorization of the type $M = A D B^{⊤}$ in which D is square diagonal of size $r \times r$ , where $r \leq min {p, q}$ is the rank of M, and has only nonzero singular values. In this variant, A denotes a $p \times r$ matrix and B denotes a $q \times r$ matrix, such that $A^{⊤} A = B^{⊤} B = I_{r}$ [18].
Polar factorization: Given a real-valued $p \times n$ matrix M, its polar factorization is written as $M = X S$ , where X denotes a $p \times n$ matrix such that $X^{⊤} X = I_{n}$ , termed polar factor, and S denotes a symmetric positive semidefinite $n \times n$ matrix [19]. The polar factorization of a matrix always exists and, if the matrix is full rank, its polar factor is unique.
Matrix exponential: Given a matrix $M \in R^{p \times p}$ , its matrix exponential is denoted as $Exp (M)$ . Matrix exponential is defined via a series as

$Exp (M) : = \sum_{k = 0}^{\infty} \frac{M^{k}}{k!} .$

(10)

There exist special formulas to compute the matrix exponential via a finite number of operations for special matrices (see, for example, [20] and references therein). For example, for a symmetric matrix $M \in R^{p \times p}$ , that admits a spectral factorization $M = X D X^{⊤}$ , it holds that

$Exp (M) = X Exp (D) X^{⊤},$

(11)

where $Exp (diag (λ_{1}, λ_{2}, λ_{3}, \dots, λ_{p})) = diag (e^{λ_{1}}, e^{λ_{2}}, e^{λ_{3}}, \dots, e^{λ_{p}})$ .
Principal matrix logarithm: Given a matrix $M \in R^{p \times p}$ , its principal matrix logarithm is denoted as $Log (M)$ . Matrix logarithm is defined via a series as

$Log (M) : = - \sum_{k = 1}^{\infty} \frac{{(I_{p} - M)}^{k}}{k}, deﬁned for ∥ I_{p} - M ∥ < 1 .$

(12)

In the specialized literaure it is possible to find special formulas to compute the principal matrix logarithm via a finite number of operations for special matrices. For example, for a positive-definite matrix $M \in R^{p \times p}$ , which admits a spectral factorization $M = X D X^{⊤}$ , it holds that

$Log (M) = X Log (D) X^{⊤},$

(13)

where $Log (diag (λ_{1}, λ_{2}, λ_{3}, \dots, λ_{p})) = diag (log λ_{1}, log λ_{2}, log λ_{3}, \dots, log λ_{p})$ . Recall that any symmetric, positive-definite matrix has only positive eigenvalues; hence, $Log (D)$ is a real-valued matrix.

The other notation used within this paper is defined at its earliest occurrence. We just recall that the symbol

: =

denotes a definition and that the derivative of a function

F (t)

with respect to its scalar parameter t is denoted as

\dot{F}

.

2.2. Manifolds and Embedded Manifolds (or Submanifolds)

The formal definition of a smooth manifold is quite convoluted, as it requires notions from mathematical topology [21]. More practically, a manifold may be essentially regarded as a generalization of surfaces in higher dimensions that is endowed with the noticeable property of being locally similar to a flat space. In addition, a manifold may generally be regarded as an abstract mathematical object, not necessarily ready for computation, whereas, in practice, manifolds of interest in applied science and engineering are essentially matrix manifolds.

Let us consider a smooth manifold

M

and a point x on it. From an abstract point of view, x is an element of a set

M

and does not necessarily carry any intrinsic numerical features. In order to be able to develop calculus on manifolds, to, for instance, compute the directional derivative of a function

f : M \to R

, it is convenient to ‘coordinatize’ a manifold. To this aim, let us take a neighborhood (open set)

U \subset M

that contains the point x and a coordinate map

ψ : U \to R^{p}

. The size p denotes the minimal number of coordinates that is necessary to specify the location of the point x unequivocally and is taken as the dimension of the manifold. The map

ψ

needs to be a one-to-one map. In this way, we attach a set of coordinates

ψ (x)

to the point x. Such a construction establishes a smooth one-to-one correspondence between a point on a manifold and its coordinate point; therefore, the two concepts may be confused and we may safely speak of a point

x \in M

when actually speaking of its coordinates.

The above theoretical construction carries a number of practical drawbacks. For instance, notice that in general a manifold cannot be covered by a single coordinate map. Indeed, in general a manifold needs to be covered by a number of neighborhoods

U_{k}

, each of which is equipped with a coordinate map

ψ_{k} : U_{k} \to R^{p}

. The set

{U_{k}}

thus forms a basis for the manifold. Such a basis need not be finite, although it needs to be countable and is termed ‘atlas’. In general, the basis neighborhoods

U_{k}

may happen to be overlapping one another; hence, the coordinate maps

ψ_{k}

need to satisfy the compatibility conditions. Such conditions formalize the natural requirement that there need to be a one-to-one smooth correspondence between any two different coordinate systems insisting on regions of the manifolds belonging to more than one neighborhood. In formal terms, if

U_{k} \cap U_{h} \neq \emptyset

, then the ‘transition functions’

ψ_{k}^{- 1} \circ ψ_{h}

and

ψ_{h}^{- 1} \circ ψ_{k}

should possess the structure of diffeomorphisms, namely

C^{\infty}

functions endowed with

C^{\infty}

inverses.

As mentioned in the introduction, in the present tutorial we are neglecting coordinates in favor of embeddings, except in starred sections that cover coordinate-prone calculations and that may be skipped by uninterested readers. A smooth manifold is by nature a continuous object. Manifolds of interest in applications, described in embedded terms, may be summarized as follows:

Hypercube: The simplest manifold of interest is perhaps the hypercube $R^{p}$ , which is essentially the set spanned by p real-valued variables (or p-tuples).
Hypersphere, oblique manifold, hyperellipsoid: A hypersphere is represented as $S^{p - 1} : = {x \in R^{p} ∣ x^{⊤} x = 1}$ and is the subset of points of the hypercube with unit Euclidean distance from point 0. This is a smooth manifold of dimension $p - 1$ embedded in the hypercube $R^{p}$ ; in fact, with only $p - 1$ coordinates, we can identify unequivocally any point on a sphere. In [21], it is shown how to ‘coordinatize’ such a manifold through, e.g., the stereographic projection, which requires two coordinate maps applied to two convenient neighborhoods (each including only one ‘pole’) on the sphere. The special cases are $S^{1}$ , the unit circle, and $S^{2}$ , the ordinary sphere. There exist a number of applications insisting on the hyperspheres $S^{p - 1}$ such as, for instance, blind deconvolution [22,23], data classification [24], adaptive pattern recognition [25] and motion planning, optimization, and verification in robotics and in computational biology [26]. A smooth manifold closely related to the unit hypersphere is the oblique manifold [27], defined as:

$OB (p) : = {X \in R^{p \times p} ∣ diag (X^{⊤} X) = I_{p}},$

(14)

where the operator $diag (\cdot)$ returns the zero matrix except for the main diagonal that is copied from the main diagonal of its argument. The structure of the oblique manifold may be easily studied on the basis of the unit hypersphere; in fact, the following identification holds true:

$OB (p) ≅ \underset{p times}{\underset{︸}{S^{p - 1} \times S^{p - 1} \times \dots \times S^{p - 1}}},$

(15)

hence, each column of a $OB (p)$ -matrix may be treated as a $S^{p - 1}$ -column array. The hyperellipsoid $L^{p - 1}$ is defined as:

$L^{p - 1} : = {x \in R^{p} ∣ {∥ D R x ∥}^{2} = 1},$

(16)

with $D \in R^{p \times p}$ being diagonal and positive-definite and $R \in R^{p \times p}$ being a hyper-rotation such that $R^{⊤} R = I_{p}$ and $det (R) = 1$ . The mathematical structure of the hyperellipsoid $L^{p - 1}$ may be studied on the basis of the manifold structure of the hypersphere. The hyperellipsoid $L^{p - 1}$ is used, e.g., in the calibration of magnetometers [28].
General linear group and special linear group: The general linear group is defined as $GL (p) : = {X \in R^{p \times p} ∣ det (X) \neq 0$ }. This is the subset of the space of $p \times p$ matrices $R^{p \times p}$ which are invertible. The special linear group is defined as $Sl (p) : = {X \in R^{p \times p} ∣ det (X) = 1$ }. This is the subset of the general linear group made by all matrices with a unitary determinant.
Orthogonal group, special orthogonal group, special Euclidean group: An orthogonal group of size p is defined by $O (p) : = {X \in R^{p \times p} ∣ X^{⊤} X = I_{p}}$ . The manifold $O (p)$ has dimension $2^{- 1} p (p - 1)$ . In fact, every matrix in $O (p)$ possesses $p^{2}$ entries which are constrained by $2^{- 1} p (p + 1)$ orthogonality/normality restrictions. The manifold of special orthogonal matrices is defined as $SO (p) : = \{X \in R^{p \times p} ∣ X^{⊤} X = I_{p}, det (X) = 1\}$ . A smooth manifold closely related to the special orthogonal group is the special Euclidean group, denoted as $SE (p)$ , that finds applications in robotics (see, e.g., [29]). The special Euclidean group is a set of $(p + 1) \times (p + 1)$ matrices defined as:

$SE (p) : = \{[\begin{matrix} X & δ \\ 0 & 1 \end{matrix}]| X \in SO (p), δ \in R^{p}\} .$

(17)
Stiefel manifold: The (compact) Stiefel manifold is defined as:

$St (n, p) : = {X \in R^{n \times p} ∣ X^{⊤} X = I_{p}},$

(18)

where $p \leq n$ . Avery Stiefel matrix has $n p$ entries, but its elements are constrained by $2^{- 1} p (p - 1)$ non-linear constraints, and hence the dimension of a Stiefel manifold $St (n, p)$ is $n p - 2^{- 1} p (p - 1)$ . Exemplary applications are blind source separation [30], non-negative matrix factorization [31], best basis search/selection [32], electronic structures computation [33] and factor analysis in psychometrics [34]. A generalization of the Stiefel manifold, to be applied to principal subspace tracking, was studied in the contribution [35]. Such a generalized Stiefel manifold is defined as:

${St}_{B} (n, p) = {X \in R^{n \times p} ∣ X^{⊤} B X = I_{p}},$

(19)

where B denotes any symmetric, positive-definite $n \times n$ matrix. The contribution [35] studied the structure of the tangent bundle of the generalized Stiefel manifold and suggested a computationally convenient calculus for this manifold.
Real symplectic group: The real symplectic group is defined as

$Sp (2 n) : = {Q \in R^{2 n \times 2 n} | Q^{⊤} J Q = J}, J : = [\begin{matrix} 0_{n} & I_{n} \\ - I_{n} & 0_{n} \end{matrix}],$

(20)

where the symbol $I_{n}$ denotes again a $n \times n$ identity matrix and the symbol $0_{n}$ denotes a whole-zero $n \times n$ matrix. The skew-symmetric matrix J enjoys the following properties: $J^{2} = - I_{2 n}$ , $J^{- 1} = J^{⊤} = - J$ .
Manifold of symmetric, positive-definite (SPD) matrices: The main features of the space

$S^{+} (p) : = {P \in R^{p \times p} | P = P^{⊤}, P > 0}$

(21)

of symmetric, positive-definite matrices was surveyed, e.g., in [36,37]. We recall that a matrix $P \in R^{p \times p}$ is termed positive definite if for every column array $z \in R^{p} - {0}$ it holds that $z^{⊤} P z > 0$ . A useful property is that the (real) eigenvalues of any SPD matrix are all strictly positive. A related manifold is the space $S^{+} (n, p)$ of symmetric fixed-rank positive-semidefinite matrices, that may be defined as:

$S^{+} (n, p) : = {X X^{⊤} ∣ X \in R^{n \times p}, rank (X) = p} .$

(22)

The growing use of low-rank matrix approximations to retain tractability in large-scale applications boosted extensions of the calculus of positive-definite matrices to their low-rank counterparts [38].
Grassmann manifold: A Grassmann manifold $Gr (n, p)$ is a set of subspaces of $R^{n}$ spanned by p-independent vectors, namely

$Gr (n, p) = {span (w_{1}, w_{2}, w_{3}, \dots, w_{p})},$

(23)

with $w_{1}, w_{2}, w_{3}, \dots, w_{p} \in R^{n}$ being a p-tuple of arbitrary and linearly independent n-dimensional arrays. Grassmann manifolds are compact, smooth manifolds and are special cases of more general objects termed flag manifolds [39]. A representation of any of such subspaces may be assumed as the equivalence class $[X] = {X R ∣ X \in St (n, p), R \in O (p)}$ [33]. In practice, an element $[X]$ of the Grassmann manifold $Gr (n, p)$ is represented by a matrix in $St (n, p)$ , whose columns span the subspace $[X]$ .

It is easy to see that manifolds of interest in applications may exhibit large dimensions. Coordinatizing these manifolds may be inconvenient for practical purposes when their sizes are larger than some units. For this reason, matrix manifolds are treated as submanifolds of

R^{p}

or

R^{p \times n}

and their elements are represented by solid arrays.

In general, we shall denote as

A

an ambient space that a manifold is embedded into. We shall assume that any ambient space

A

of interest in the present tutorial paper is a Euclidean vector space, namely, a finite-dimensional vector space over the real numbers endowed with an inner product, that we shall denote as

{〈 \cdot, \cdot 〉}^{A}

.

Example 2.

An elementary example of manifold calculus involving the circle

S^{1}

is the Borsuk–Ulam theorem.

Theorem 1.

Let

f : S^{1} \to R

denote a continuous function. There exist two antipodal points

x, - x \in S^{1}

such that

f (x) = f (- x)

.

Proof.

Let

g (y) : = f (y) - f (- y)

with

y \in S^{1}

. Choose an arbitrary point

x_{0} \in S^{1}

:

If $g (x_{0}) = 0$ , then $x = x_{0}$ ;
If $g (x_{0}) \neq 0$ , notice that $g (- x_{0}) = f (- x_{0}) - f (- (- x_{0}))$ = $- (f (x_{0}) - f (- x_{0})) = - g (x_{0})$ . Since g is continuous and takes a different sign in two different points of its domain, there must exist at least a point $x \in S^{1}$ such that $g (x) = 0$ .

The above cases prove the assertion by exhaustion. □

The Borsuk–Ulam theorem has an interesting consequence for the temperature of the Earth. In fact, let us identify the equator with

S^{1}

and the temperature over the equator with the function f, then the Borsuk–Ulam theorem implies that there exist two antipodal points over the equator where the temperature is the same. Indeed, the Borsuk–Ulam theorem holds in every dimension, which implies, for example, that if

S^{2}

represents the Earth’s surface and

f : S^{2} \to R^{2}

returns the atmospheric temperature and the Earth’s pressure in a point of the Earth surface, then there exist two antipodal points on Earth that exhibit the same temperature-pressure value pair! ■

Matrix factorizations are characterized by a paramount importance in applied calculations, as illustrated in the following.

Example 3.

Let us recall that every symmetric, positive-definite matrix may be factorized as:

P = R D R^{⊤},

(24)

where

P \in S^{+} (n)

,

R \in SO (n)

and D denotes a diagonal

n \times n

matrix. In fact, D is the matrix whose in-diagonal entries coincide with the eigenvalues and R is the matrix whose columns coincide with the eigenvectors of P. The factorization (24) simplifies some calculations, for example, it holds that:

\begin{matrix} P^{3} = P P P = (R D R^{⊤}) (R D R^{⊤}) (R D R^{⊤}) = R D (R^{⊤} R) D (R^{⊤} R) D R^{⊤} \\ = R D D D R^{⊤} = R D^{3} R^{⊤}, \\ \sqrt{P} = R \sqrt{D} R^{⊤}, \\ Exp (P) = \sum_{i = 0}^{\infty} \frac{P^{i}}{i!} = \sum_{i = 0}^{\infty} \frac{{(R D R^{⊤})}^{i}}{i!} = \sum_{i = 0}^{\infty} \frac{R D^{i} R^{⊤}}{i!} = R Exp (D) R^{⊤}, \\ P^{- 1} = R D^{- 1} R^{⊤} . \end{matrix}

(25)

The net result is that convoluted matrix operations may be turned into fairly simple computations that require a finite number of elementary operations. ■

Manifolds may be classified as compact or non-compact:

The manifolds $S^{n - 1} \subset A$ , with $A : = R^{n}$ , and $SO (n) \subset A$ , with $A : = R^{n \times n}$ , are compact since there exists a ball $B (0, r) \subset A$ , with $r < \infty$ , that contains them.
The manifold $S^{+} (n)$ is non-compact since no t finite-radius ball exists that contains it.

In practical terms, the elements of a compact manifold are necessarily limited in value, while the elements of a non-compact manifold may take arbitrarily large values.

From a computational point of view, dynamical systems on compact manifolds, even if they turn out to be unstable, do not pose serious implementation problems, while dynamical systems on non-compact manifolds may result in implementational difficulties (and easily cause runtime errors).

3. Smooth Curves, Tangent Vector Fields, Tangent Spaces and Bundle, Normal Spaces

An interesting object we may think of on a smooth manifold

M

is a smooth curve

γ : [- ϵ, ϵ] \to M

, with

ϵ > 0

. It is worth remarking that a curve may cross different coordinate charts

(U_{k}, ψ_{k})

; therefore, it is generally necessary to split a curve in as many branches (or segments) as coordinate charts it crosses. The function

γ

describes a curve on the manifold

M

delimited by the endpoints

γ (- ϵ)

and

γ (ϵ)

.

3.1. Curves and Bundles for Embedded Manifolds

Let us assume that a manifold

M

is embedded into an ambient space

A

of suitable dimensions (for instance, the sphere

S^{p - 1}

is embedded in the ambient space

A : = R^{p}

). In the following, a number of examples, and a counterexample, are discussed to clarify the important notion of a smooth curve on a manifold.

Example 4.

On a hypersphere

S^{n - 1}

, consider the following function:

γ (t) : = \frac{x + t v}{\sqrt{{(x + t v)}^{⊤} (x + t v)}}

(26)

with

t \in [- ϵ, ϵ]

and

x, v \in R^{n}

being arbitrary (but let us avoid the singularity

x + t v = 0

). Notice that

γ (0) = x / ∥ x ∥

. To verify that such a function γ traces indeed a curve on the hyersphere it suffices to show that, for every

t \in [- ϵ, ϵ]

and

x, v \in R^{n}

, it holds that

γ (t) \in S^{n - 1}

. Indeed,

γ^{⊤} (t) γ (t) = \frac{{(x + t v)}^{⊤} (x + t v)}{{[\sqrt{{(x + t v)}^{⊤} (x + t v)}]}^{2}} = 1 .

(27)

In addition, consider the following function

γ (t) : = x cos (\sqrt{v^{⊤} v} t) + \frac{v}{\sqrt{v^{⊤} v}} sin (\sqrt{v^{⊤} v} t),

(28)

with

t \in [- ϵ, ϵ]

and with

x \in S^{n - 1}

and

v \in R^{n} - {0}

such that

v^{⊤} x = 0

. (In this case, it turns out that

γ (0) = x

.) Similarly to the previous example, in order to prove that γ is a valid curve on the hypersphere, it suffices to show that

γ (t) \in S^{n - 1}

for every

t \in [- ϵ, ϵ]

:

\begin{matrix} γ^{⊤} (t) γ (t) = & [x cos (∥ v ∥ t) + \frac{v}{∥ v ∥} {sin (∥ v ∥ t)]}^{⊤} [x cos (∥ v ∥ t) + \frac{v}{∥ v ∥} sin (∥ v ∥ t)] \\ = & (x^{⊤} x) {cos}^{2} (∥ v ∥ t) + 2 \frac{x^{⊤} v}{∥ v ∥} sin (∥ v ∥ t) cos (∥ v ∥ t) + \frac{v^{⊤} v}{{∥ v ∥}^{2}} {sin}^{2} (∥ v ∥ t) \\ = & (1) {cos}^{2} (∥ v ∥ t) + 2 \frac{(0)}{∥ v ∥} sin (∥ v ∥ t) cos (∥ v ∥ t) + (1) {sin}^{2} (∥ v ∥ t) = 1 . \end{matrix}

(29)

It is not difficult to evaluate

γ^{'} (t)

and, in particular, to show that

γ^{'} (0) = v

.

Let us now consider an example of a curve over the manifold

SO (n)

of the hyper-rotations. In particular, let us examine the following curve on the manifold

SO (2)

:

γ (t) : = [\begin{matrix} cos (b t) & sin (b t) \\ - sin (b t) & cos (b t) \end{matrix}]

(30)

with

t \in [- ϵ, ϵ]

and

b \in R

being arbitrarily chosen. (Notice that

γ (0) = I_{2}

.) In order to show that function γ represents a curve over the space of planar rotations, it suffices to compute the product

γ^{⊤} (t) γ (t)

and to show that it keeps equal to the identity

I_{2}

and that

det (γ (t)) = 1

for every value of t in its range:

\begin{matrix} γ^{⊤} (t) γ (t) = & {[\begin{matrix} cos (b t) & sin (b t) \\ - sin (b t) & cos (b t) \end{matrix}]}^{⊤} [\begin{matrix} cos (b t) & sin (b t) \\ - sin (b t) & cos (b t) \end{matrix}] \\ = & [\begin{matrix} cos (b t) & - sin (b t) \\ sin (b t) & cos (b t) \end{matrix}] [\begin{matrix} cos (b t) & sin (b t) \\ - sin (b t) & cos (b t) \end{matrix}] \\ = & [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}], \\ det (γ (t)) = & {cos}^{2} (b t) - (- {sin}^{2} (b t)) = 1 . \end{matrix}

(31)

Further to this, let us prove that the following function represents a curve over the manifold

SO (n)

:

γ (t) : = X (I_{n} + t H) {(I_{n} - t H)}^{- 1}

(32)

with

t \in [- ϵ, ϵ]

,

X \in SO (n)

and

H^{⊤} = - H \in R^{n \times n}

. (Even in this case, it holds that

γ (0) = X

.) In order to prove such a statement, it is necessary to compute

γ^{⊤} (t) γ (t)

and to show that it equals

I_{n}

and that

det (γ (t)) = 1

at any time:

\begin{matrix} γ^{⊤} (t) γ (t) = & {[X (I_{n} + t H) {(I_{n} - t H)}^{- 1}]}^{⊤} X (I_{n} + t H) {(I_{n} - t H)}^{- 1} \\ = & {(I_{n} - t H)}^{- ⊤} {(I_{n} + t H)}^{⊤} (X^{⊤} X) (I_{n} + t H) {(I_{n} - t H)}^{- 1} \\ = & {(I_{n} + t H)}^{- 1} {(I_{n} + t H)}^{⊤} (I_{n} + t H) {(I_{n} - t H)}^{- 1} \\ = & {(I_{n} + t H)}^{⊤} [{(I_{n} + t H)}^{- 1} (I_{n} + t H)] {(I_{n} - t H)}^{- 1} \\ = & {(I_{n} + t H)}^{⊤} {(I_{n} - t H)}^{- 1} \\ = & (I_{n} - t H) {(I_{n} - t H)}^{- 1} = I_{n}, \\ det (γ (t)) = & det (X) det (I_{n} + t H) {det}^{- 1} (I_{n} - t H) \\ = & det (X) det (I_{n} + t H) {det}^{- 1} ({(I_{n} + t H)}^{⊤}) = 1, \end{matrix}

(33)

where the superscript

^{- ⊤}

denotes the inverse of the transposed matrix.

Notice that, in the proof, we have made use of some matrix identities, including the commutativity property

(I_{n} + t H) {(I_{n} - t H)}^{- 1} = {(I_{n} - t H)}^{- 1} (I_{n} + t H)

and

det (A) = det (A^{⊤})

.

As a further example, let us consider the following function, which we shall prove to represent a curve over the manifold of the

2 \times 2

symmetric, positive-definite matrices:

γ (t) : = [\begin{matrix} a {(t + 1)}^{2} & b {(t + 1)}^{2} \\ b {(t + 1)}^{2} & c {(t + 1)}^{2} \end{matrix}]

(34)

with

t \in [- ϵ, ϵ]

,

0 < ϵ < 1

,

a > 0

and

a c - b^{2} > 0

. (Notice that

γ (0) = [\begin{matrix} a & b \\ b & c \end{matrix}]

.) To prove the assertion, it suffices to show that

γ^{⊤} (t) = γ (t)

and that the eigenvalues of

γ (t)

are strictly positive for any t in its range. Symmetry is immediately verified. The eigenvalues are

a {(t + 1)}^{2} > 0

and

(a c - b^{2}) {(t + 1)}^{4} > 0

.

Let us now consider a counterexample. Define the following function:

γ (t) : = P + t Q,

(35)

with

t \in [- ϵ, ϵ]

and

P, Q \in S^{+} (n)

arbitrary. In general, the above function does not represent a curve in

S^{+} (n)

. To prove such an assertion, it suffices to recall that, in general, a linear combination of two positive-definite matrices does not result in a positive-definite matrix. As a numerical example, consider the following:

P : = [\begin{matrix} 4 & 1 \\ 1 & 1 \end{matrix}]

and

Q : = [\begin{matrix} 2 & - 1 \\ - 1 & 1 \end{matrix}]

. Then,

γ (t) = [\begin{matrix} 4 + 2 t & 1 - t \\ 1 - t & 1 + t \end{matrix}]

. It is readily verified that if

t \in [- 4, 4]

, then

γ (t)

is not necessarily positive-definite for every value of t in the range (for example,

γ (- 3)

is not positive-definite). From this counterexample, we can verify that the manifold

S^{+} (n)

, as all manifolds just exemplified, is not a linear space with respect to standard matrix operations. ■

From a system-theoretic perspective, a curve may be thought of as a trajectory generated by a dynamical system whose state space is a smooth manifold. Take, for instance, the case of a little magnetized ball rolling over a metallic spherical surface: no matter how the little ball moves over the surface of the sphere, during its motion the contact point will describe a curve

γ (t)

over the larger sphere from the initial time

t = - ϵ

to

t = + ϵ

. The parameter t may be thought of as a time index, whose progression corresponds to the flowing of time. Thinking of physical objects moving over curved surfaces helps in understanding the main concepts in manifold calculus, even though a certain degree of abstraction is still necessary, not only because the dimensions of manifolds of interest may be far larger than 3, but also because most manifolds are difficult to visualize in practice.

A noteworthy property of curves is that they are equivalent up to a regular reparameterization.

Comparing the position over a smooth curve in two close-by instants provides the notion of speed, namely, how quickly an object is moving along a trajectory and in which direction. In particular, denoting by

γ_{x} : [- ϵ, ϵ] \to M

a curve such that

γ_{x} (0) = x

, the quantity

v_{x} : = lim_{h \to 0} \frac{γ_{x} (h) - γ_{x} (0)}{h} = {\frac{d γ_{x} (t)}{d t}|}_{t = 0} = {\dot{γ}}_{x} (0) \in A,

(36)

denotes such a speed, which is represented as a tangent vector

v_{x}

at the point x on the manifold. Clearly, the vector

v_{x}

does not belong to the curved manifold

M

, whereas it is tangent to it in the point x. (Notice that in this tutorial the word ‘vector’ is reserved to tangent vectors, while a column-type array—which may represent a location—is termed ‘array’.)

Consider every possible smooth curve on a manifold of dimension p passing through the point x and compute the tangent vectors to all these curves in the point x at once. The collection of these tangent vectors span a linear space of dimension p, which is referred to as tangent space (or ‘tangent plane’) to the manifold

M

at the point x, and is denoted with

T_{x} M : = {{\dot{γ}}_{x} (0) ∣ \forall γ_{x}} \subset A .

(37)

Since there exist infinitely many curves passing through a given point with the same velocity, one should define a tangent vector as an equivalence class of curves passing through such a given point, while being tangent to each other at that point. A tangent plane is hence a collection of such vectors. In addition, we should underline that each tangent space is a subset of the ambient space due to the special nature of

A

, being both a point set and a vector space.

Taking a point

x \in M

and a tangent vector

v_{x} \in M

, any pair

(x, v_{x})

is thought of as belonging to an abstract set termed tangent bundle, defined as

T M : = {(x, v) \in A^{2} ∣ x \in M, v \in T_{x} M} .

(38)

In order to facilitate computation, it is useful to introduce the concept of the normal space of an embedded manifold in a given point under a chosen metric:

N_{x} M : = {z \in A ∣ {〈 z, v 〉}^{A} = 0, \forall v \in T_{x} M} .

(39)

The normal space

N_{x} M

represents the orthogonal complement of the tangent space with respect to a Euclidean ambient space

A

that the manifold

M

is embedded into (in fact, some authors denote normal spaces as

T_{x}^{⊥} M

).

Let us examine the structure of the tangent/normal spaces of a number of smooth manifolds:

Hypercube: Since the space $R^{p}$ is linear, when it is embedded into itself, each tangent space coincides to the whole space, namely, for every, $x \in R^{p}$ it holds that $T_{x} R^{p} \equiv R^{p}$ . Since a normal space is the orthogonal complement to a tangent space, we must conclude that $N_{x} R^{p} = {0}$ .
Hypersphere: At every point $x \in S^{p - 1}$ , the tangent space has the structure

$T_{x} S^{p - 1} : = {v \in R^{p} ∣ v^{⊤} x = 0} .$

(40)

The normal space $N_{x} S^{p - 1}$ at every point of the hypersphere, which is the orthogonal complement of the tangent space with respect to the ambient space $A : = R^{p}$ that the manifold $S^{p - 1}$ is embedded in, has the structure

$N_{x} S^{p - 1} : = {λ x ∣ λ \in R}$

(41)

in the case that the ambient space is equipped with the Euclidean inner product ${〈 v, w 〉}^{A} : = v^{⊤} w$ .
Special orthogonal group: The tangent space of the manifold $SO (p)$ has the structure

$T_{X} SO (p) = {V \in R^{p \times p} ∣ V^{⊤} X + X^{⊤} V = 0_{p}} .$

(42)

This may be proven by differentiating a generic curve $γ (t) \in SO (p)$ passing through X at $t = 0$ . Every such curve satisfies the orthogonal-group characteristic equation $γ^{⊤} (t) γ (t) = I_{p}$ ; therefore, after differentiation, one obtains

${\dot{γ}}^{⊤} (0) γ (0) + γ^{⊤} (0) \dot{γ} (0) = 0_{p} .$

(43)

By recalling that the tangent space is formed by velocity vectors $\dot{γ} (0)$ , the above-mentioned result is readily achieved. Provided the ambient space $A : = R^{p \times p}$ is endowed with the canonical Euclidean metric ${〈 V, W 〉}^{A} : = tr (V^{⊤} W)$ , the normal space at a point X may be defined as

$N_{X} SO (p) = {N \in R^{p \times p} ∣ tr (N^{⊤} V) = 0, \forall V \in T_{X} SO (p)} .$

(44)

It is easy to convince oneself that every tangent vector $V \in T_{X} SO (p)$ may be written as $V = X H$ , with H skew-symmetric (i.e., such that $H^{⊤} = - H$ ); then, any element $N \in N_{X} SO (p)$ may be written as $N = X S$ , with $S = S^{⊤}$ . In fact, the normality condition implies $0 = tr (V^{⊤} (X S)) = tr (S V^{⊤} X) = tr (X^{⊤} V S^{⊤})$ , which is equivalent to $- tr ((V^{⊤} X) S^{⊤})$ ; therefore, the normality condition may be recast as $tr ((V^{⊤} X) (S - S^{⊤})) = 0$ . It is hence necessary and sufficient that $S = S^{⊤}$ ; therefore,

$N_{X} SO (p) = {X S ∣ S^{⊤} = S \in R^{p \times p}} .$

(45)
Stiefel manifold: Given a trajectory $[- ϵ, ϵ] ∋ t \mapsto X (t) \in St (n, p)$ , derivation with respect to the parameter t yields ${\dot{X}}^{⊤} X + X^{⊤} \dot{X} = 0$ , which means that the tangent space to the manifold $St (n, p)$ in a point $X \in St (n, p)$ has the structure:

$T_{X} St (n, p) = {V \in R^{n \times p} ∣ V^{⊤} X + X^{⊤} V = 0} .$

(46)

The normal space has the structure:

$N_{X} St (n, p) = {X S ∣ S \in R^{p \times p}, S^{⊤} - S = 0} .$

(47)
Real symplectic group: The tangent space associated with the real symplectic group has the structure:

$T_{Q} Sp (2 n) = {V \in R^{2 n \times 2 n} ∣ V^{⊤} J Q + Q^{⊤} J V = 0_{2 n}} .$

(48)

The tangent spaces and the normal space associated with the real symplectic group may be characterized as follows:

$\{\begin{matrix} T_{Q} Sp (2 n) = {V = Q J S ∣ S \in R^{2 n \times 2 n}, S^{⊤} = S}, \\ N_{Q} Sp (2 n) = {N = J Q H ∣ H \in R^{2 n \times 2 n}, H^{⊤} = - H} . \end{matrix}$

(49)
Space of symmetric, positive-definite matrices: Given a point $P \in S^{+} (p)$ , its tangent bundle may be characterized simply by observing that every curve $γ : [- ϵ, ϵ] \to S^{+} (p)$ satisfies $γ (t) = γ {(t)}^{⊤}$ and $γ (t) > 0$ . Only the equality constraint influences the structure of the tangent space; therefore,

$T_{P} S^{+} (p) = {S \in R^{p \times p} ∣ S^{⊤} = S} .$

(50)

Notice that every tangent space is identical to each other as it does not depend on the base point P.
Grassmann manifold: For every element $[X] \in Gr (n, p)$ , the tangent space may be represented as:

$T_{[X]} Gr (n, p) = {V \in R^{n \times p} | X^{⊤} V = 0} .$

(51)

A tangent space $T_{[X]} Gr (n, p)$ may be decomposed as the direct sum of a horizontal space and of a vertical space at $[X] \in Gr (n, p)$ [40]. Starting from a point $[X] \in Gr (n, p)$ , moving along a horizontal direction causes a change in subspace, while moving along a vertical direction does not change the subspace $[X]$ .

3.2. Vector Fields

Let us now consider a map

v_{x} : M \to T M

that associates a tangent vector

v_{x}

with every point x of a manifold. One such map is termed tangent vector field (or simply vector field, for short). The set of all tangent vector fields of a manifold

M

is denoted as

Γ (M)

. Vector fields are objects of prime importance in manifold calculus. In physical system dynamics, vector fields are associated with, e.g., speed and acceleration.

Example 5.

A vector field on a manifold

M

is a function

f : M \to T M

that assigns to every point

x \in M

a tangent vector

f (x) \in T_{x} M

. A vector field may even depend on the time parameter, in which case it will be denoted as

f (t, x)

with

f : R \times M \to T M

.

Let us now consider the hypersphere

S^{7}

embedded in

A : = R^{8}

and the function

f : S^{7} \to R^{8}

defined as:

f (x) : = (I_{8} - x x^{⊤}) A x,

(52)

where x indicates a column array of 8 elements and

A \in R^{8 \times 8}

is an arbitrary constant matrix. Such a function describes a vector field in

Γ (S^{7})

because it holds that

f (x) \in T_{x} S^{7}

for every x. In fact,

x^{⊤} f (x) = x^{⊤} (A x - x x^{⊤} A x) = x^{⊤} A x - (x^{⊤} x) (x^{⊤} A x) = 0 .

(53)

Notice that the tangent vector

f (x)

is inextricably tied to the point x on the manifold. In fact, in general, taking two distinct points

x, y \in M

, it turns out that

f (x) \notin T_{y} M

. It is then sensible to describe a vector field as a set of pairs:

(x, f (x)) \in T M .

(54)

As a further example, let us consider the manifold

SO (8)

embedded in

A : = R^{8 \times 8}

and a function

f : R \times SO (8) \to R^{8 \times 8}

:

f (t, R) : = R A t + R^{2} B R^{⊤},

(55)

where R is a

8 \times 8

orthogonal matrix variable, while A and B are

8 \times 8

skew-symmetric constant matrices (namely,

A^{⊤} = - A

and

B^{⊤} = - B

). Such a function represents a (time-varying) vector field on the manifold of hyper-rotations in that, for every

R \in SO (8)

and

t \in R

, it holds that

f (t, R) \in T_{R} SO (8)

. Such an assertion may be proven as follows

\begin{matrix} R^{⊤} f (t, R) + f^{⊤} (t, R) R = & R^{⊤} (R A t + R^{2} B R^{⊤}) + {(R A t + R^{2} B R^{⊤})}^{⊤} R \\ = & (R^{⊤} R) A t + (R^{⊤} R) R B R^{⊤} + A^{⊤} (R^{⊤} R) t + R B^{⊤} R^{⊤} (R^{⊤} R) \\ = & A t + R B R^{⊤} - A t - R B R^{⊤} \\ = & 0_{n} . \end{matrix}

(56)

As a last example, let us take the manifold

S^{+} (5)

embedded in

A : = R^{5 \times 5}

and a function

f : S^{+} (5) \to R^{5 \times 5}

defined as

f (P) : = P + P^{2} + 3 P^{- 1},

(57)

where P is a

5 \times 5

symmetric, positive-definite matrix variable (hence, even

P^{2} = P P

and

P^{- 1}

are symmetric, positive-definite). It turns out that the function f represents a vector field in

Γ (S^{+} (5))

; in fact, for every

P \in S^{+} (5)

, it holds that

f (P) \in T_{P} S^{+} (5)

:

\begin{matrix} f^{⊤} (P) - f (P) = & {(P + P^{2} + 3 P^{- 1})}^{⊤} - (P + P^{2} + 3 P^{- 1}) \\ = & P^{⊤} + P^{2 ⊤} + 3 P^{- ⊤} - P - P^{2} - 3 P^{- 1} \\ = & P + P^{2} + 3 P^{- 1} - P - P^{2} - 3 P^{- 1} \\ = & 0_{n} . \end{matrix}

(58)

(Notice that, in the proof, only the symmetry of the matrix P and of its powers

P^{2}

and

P^{- 1}

has been made use of.) ■

As a matter of fact, manifold calculus primarily deals with two kinds of objects:

Points on a manifold, denoted as $x \in M \subset A$ ;
Tangent vectors, denoted as $v \in T_{x} M \subset A$ .

and occasionally on normal arrays, which are instrumental in calculations. In the case of matrix manifolds, which are of prime importance in applications, the ambient space

A : = R^{p \times q}

; hence, points x and tangent vectors v are essentially arrays or matrices (either rectangular or square).

3.3. Canonical Curves, Canonical Basis of a Tangent Space*

In intrinsic manifold calculus, the way to regard, e.g., tangent spaces and vector fields is based on differential operators [21]. Formally, if

F

denotes a smooth function space, a tangent vector

v \in T_{x} M

is defined in such a way that

v : F \to R

; namely,

v (f)

denotes the directional derivative of the function

f \in F

along the direction v. Recall that a differential operator in

T_{x} M

may be written as a linear combination of elementary differential operators, namely the derivatives along the basis axes, through some coefficients.

Let

M

denote a smooth manifold of dimension p and let

γ_{x} : [- ϵ, ϵ] \to M

denote any smooth curve such that

γ_{x} (0) = x

. Let us assume, for simplicity, that the image of

γ_{x}

is entirely contained in a chart

(U, ψ)

. The map

ψ \circ γ_{x}

traces out a smooth curve in the parameter space

R^{p}

, which is differentiable in the usual multivariable calculus sense. Let us denote the intrinsic coordinates of the points over the curve as

(θ^{1} (t), θ^{2} (t), θ^{3} (t), \dots, θ^{p} (t)) = ψ (γ_{x} (t)) .

(59)

The superscript notation

θ^{i}

to denote the

i

th coordinate is standard in manifold calculus. This notation helps when checking expressions written in intrinsic coordinates. Since the chart

ψ

is invertible by definition, we may represent the curve

γ

in local coordinates by

γ_{x} (t) = ψ^{- 1} (θ^{1} (t), θ^{2} (t), θ^{3} (t), \dots, θ^{p} (t)) .

(60)

In particular, it holds that

x = ψ^{- 1} (θ^{1} (0), θ^{2} (0), θ^{3} (0), \dots, θ^{p} (0))

. On the basis of the representation (60), we may define as many as pcanonical curves as

γ_{x}^{i} (t) : = ψ^{- 1} (θ^{1} (0), θ^{2} (0), \dots, θ^{i} (0) + t, \dots, θ^{p} (0))

for

t \in [- ϵ, ϵ]

and

i = 1, 2, 3, \dots, p

. Namely, a canonical curve around a point is traced on a manifold by letting only one of the p coordinates vary at a time. Let us now consider the tangent vectors

\partial_{i}^{x} : = {\dot{γ}}_{x}^{i} (0) = {\frac{d γ_{x}^{i} (t)}{d t}|}_{t = 0} \in T_{x} M .

(61)

The set

{\partial_{1}^{x}, \partial_{2}^{x}, \dots, \partial_{p}^{x}}

is termed canonical basis of the vector space

T_{x} M

. Its property of being a basis is implied by the fact that such canonical vectors are linearly independent from one another. Therefore, any tangent vector may be written as a linear combination of canonical vectors, that is

v_{x} = v_{x}^{1} \partial_{1}^{x} + v_{x}^{2} \partial_{2}^{x} + \dots + v_{x}^{p} \partial_{p}^{x},

(62)

where the quantities

v_{x}^{1}

represent the coefficients of the linear combination. To shorten the equations, the above expression may be written simply as

v = v^{i} \partial_{i}

, where the information about what happens at the point x has been suppressed and the summation is implied by the presence of a repeated index (i), one in upper position (

^{i}

) and one in lower position (

_{i}

), which is commonly referred to as Einstein summation convention.

Example 6.

Let us consider the case of a sphere

S^{2}

embedded into

R^{3}

. A parametrization of

S^{2}

(excluding the South pole) is

x = ψ^{- 1} (θ^{1}, θ^{2}) : = [\begin{matrix} sin θ^{1} cos θ^{2} \\ sin θ^{1} sin θ^{2} \\ cos θ^{1} \end{matrix}]

(63)

for

0 \leq θ^{1} \leq π

(where

θ^{1}

is commonly referred to as ‘latitude’) and

0 \leq θ^{2} < 2 π

(where

θ^{2}

is commonly referred to as ‘longitude’). For example, the point

ψ^{- 1} (0, 0)

coincides with the North pole, while the point

ψ^{- 1} (\frac{π}{2}, 0)

lays on the equator of the sphere. Given a point

x = ψ^{- 1} (θ_{0}^{1}, θ_{0}^{2})

, the curve

γ_{x}^{1} (t) = [\begin{matrix} sin (θ_{0}^{1} + t) cos θ_{0}^{2} \\ sin (θ_{0}^{1} + t) sin θ_{0}^{2} \\ cos (θ_{0}^{1} + t) \end{matrix}]

(64)

traces a ‘meridian’ through x, while the curve

γ_{x}^{2} (t) = [\begin{matrix} sin θ_{0}^{1} cos (θ_{0}^{2} + t) \\ sin θ_{0}^{1} sin (θ_{0}^{2} + t) \\ cos θ_{0}^{1} \end{matrix}]

(65)

traces a ‘parallel’ through x. Hence, by definition, the tangent space

T_{x} S^{2}

is spanned by the canonical basis vectors

\partial_{1}^{x} = {\dot{γ}}_{x}^{1} (0) = {\frac{d γ_{x}^{1} (t)}{d t}|}_{t = 0} = [\begin{matrix} cos θ_{0}^{1} cos θ_{0}^{2} \\ cos θ_{0}^{1} sin θ_{0}^{2} \\ - sin θ_{0}^{1} \end{matrix}]

(66)

and

\partial_{2}^{x} = {\dot{γ}}_{x}^{2} (0) = {\frac{d γ_{x}^{2} (t)}{d t}|}_{t = 0} = [\begin{matrix} - sin θ_{0}^{1} sin θ_{0}^{2} \\ sin θ_{0}^{1} cos θ_{0}^{2} \\ 0 \end{matrix}] .

(67)

For example, it is readily seen that

\partial_{1}^{(\frac{π}{2}, 0)} = [\begin{matrix} 0 \\ 0 \\ - 1 \end{matrix}], \partial_{2}^{(\frac{π}{2}, 0)} = [\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}],

(68)

represents tangent vectors at

{[1 0 0]}^{⊤}

(i.e., the North pole). ■

The dual space to a tangent space

T_{x} M

, denoted as

T_{x}^{★} M

, is termed cotangent space. Elements of a cotangent space are termed covectors

ω \in T_{x}^{*} M

. Covectors combine with vectors by ‘annihilation’ to a scalar. Given a canonical basis

{\partial_{1}^{x}, \partial_{2}^{x}, \dots, \partial_{p}^{x}}

of

T_{x} M

, it is possible to define a canonical basis

{d x^{1}, d x^{2}, \dots, d x^{p}}

of

T_{x}^{*} M

by the conditions

d x^{i} (\partial_{j}) = δ_{j}^{i}

. The union of all cotangent spaces form a cotangent bundle

T^{*} M : = {(x, ω) ∣ x \in M, ω \in T_{x}^{*} M}

.

4. First-Order Dynamical Systems on Manifolds

A number of physical phenomena are described in system theory as systems of coupled differential equations in several variables. A fairly general representation of such systems is

\dot{x} (t) = f (t, x (t)), x (0) = x_{0} \in R^{p}, t \in [0, t_{f}],

(69)

where

x \in R^{p}

denotes a state variable array and represents a set of descriptive variables, and

f : R \times R^{p} \to R^{p}

represents a state-transition function. The multi-variable array-type function f denotes a (possibly time-varying) vector field that represents the set of velocities of change of the state of the system. The solutions of the differential Equation (69) represent integral curves of the vector field f. In this case, the space

R^{p}

represents the flat state space and the velocity space of the model (69) at once.

In order to take into account further constraints on the state variables, termed invariants, it is convenient to introduce the notion of curved state space in the form of a smooth manifold. This logical process leads to a type of first-order dynamical system described by

\dot{x} (t) = f (t, x (t)), x (0) = x_{0} \in M, t \in [0, t_{f}],

(70)

where

f : R \times M \to T M

denotes the state-transition function of the mathematical model and

x \in M

denotes the state of the system at any given time t. Even in this case, the function f denotes the velocity field; however, in this case, the state space is the manifold

M

, while the velocity space is the tangent bundle

T M

.

Let us underline two aspects of the above equation:

Meaning of $\dot{x} (t)$ in the expression (70): Since we assumed the manifold $M$ to be embedded in an ambient space $A$ of type $R^{p \times q}$ , the quantity $\dot{x} (t)$ is an array or a matrix made of the derivatives of the entries of $x (t)$ with respect to the time parameter t. Let us recall, however, that even if such a specification helps one understand the subject, we shall never write any relation involving single components of such matrix-type objects as we shall treat any state variable x as a whole, (except in a low-dimensional example). The solution $x (t)$ of the differential Equation (70) is the trajectory of the system and is represented by a curve on the manifold $M$ ; hence, $\dot{x} (t)$ denotes the speed along the trajectory since, for every $t \in [0, t_{f}]$ , it holds that $(x (t), \dot{x} (t)) \in T M$ , namely $\dot{x} \in Γ (M)$ .
Separation (or non-equivalency) of first-order and second-order systems: When dealing with dynamical systems in $R^{p \times q}$ , the system (69) is actually fairly general since, upon introducing additional variables, it is possible to turn an n-th order system into a first-order system. We consider now how such a property stems from the legitimate ‘confusion’ between the state space and velocity space. In contrast to this, when dealing with dynamical systems on manifolds, such confusion is not legitimate, since $x \in M$ while $\dot{x} \in T_{x} M$ ; namely, the system state and the system velocity belong to very different spaces. It suffices to recall that $M$ is generally a curved space (i.e., non linear) while each $T_{x} M$ is a vector space (flat, linear). For this reason, second-order systems are not assimilable to first-order systems.

Example 7.

Let us reconsider the function (52). On the basis of such a vector field, we may define the following first-order system on the hypersphere

S^{n - 1}

:

\dot{x} (t) = (I_{n} - x (t) x^{⊤} (t)) A x (t), x (0) = x_{0} \in S^{n - 1} .

(71)

Such a system may be rewritten equivalently as:

\dot{x} = A x - (x^{⊤} A x) x .

(72)

Let us examine the equilibrium points of such dynamical system. Any equilibrium point

x_{★}

must satisfy the equation

A x_{★} = (x_{★}^{⊤} A x_{★}) x_{★} .

(73)

Whenever A is symmetric and positive-definite, Equation (73) establishes that

x_{★}

is an eigenvector of matrix A. In effect, the state of the system (72) evolves towards an eigenvector of the matrix A. For this reason, the dynamical system (73) represents a prototypical example of a continuous-time calculation system.

The differential Equation (72) is commonly referred to as theOja equationand was studied by Prof. Erkki Oja from the Helsinki University of Technology (currently Aalto University). It was studied in several contexts, including automation and control [41], and constitutes an exemplification of the fact that even a fairly simple computing element might be able to perform a complex calculations, namely, extracting an eigenvector

x_{★}

(and the corresponding eigenvalue

λ_{★} : = x_{★}^{⊤} A x_{★}

) from an arbitrarily-sized matrix. ■

Let us now consider an example, drawn from circuit theory, that introduces the notion of ‘invariants’ for a dynamical system, which indeed motivates the study of dynamical systems on manifolds.

Example 8.

In order to emphasize the notion ofinvariantsin connection to dynamical systems, let us consider the simple model of an ideal DC-to-DC converter studied in [42]. The mathematical model of such converter reads:

\{\begin{matrix} C_{1} \frac{d v_{1}}{d t} = (1 - u) i_{3}, \\ C_{2} \frac{d v_{2}}{d t} = u i_{3}, \\ L_{3} \frac{d i_{3}}{d t} = - (1 - u) v_{1} - u v_{2}, \end{matrix}

(74)

where

v_{1}, v_{2}

are state voltages (across capacitors),

i_{3}

is a state current (across an inductor) and

u \in {0, 1}

denotes the control input (a switch). The following function is an invariant for such an electrical circuit

I : = \frac{1}{2} C_{1} v_{1}^{2} + \frac{1}{2} C_{2} v_{2}^{2} + \frac{1}{2} L_{3} i_{3}^{2} .

(75)

Indeed, such a quantity represents the total energy across the electrical network. By invariant it is meant that

\dot{I} = 0

, namely that the time-function

I (t)

keeps the same value for every t. Notice, in fact, that

\begin{matrix} \frac{d I}{d t} = & v_{1} C_{1} \frac{d v_{1}}{d t} + v_{2} C_{2} \frac{d v_{2}}{d t} + i_{3} L_{3} \frac{d i_{3}}{d t} \\ = & v_{1} (1 - u) i_{3} + v_{2} u i_{3} + i_{3} (- (1 - u) v_{1} - u v_{2}) \\ = & 0 . \end{matrix}

(76)

In order to simplify the analysis of such a dynamical system, let us define the following abstract state variables:

x_{1} : = \sqrt{\frac{C_{1}}{2}} v_{1}, x_{2} : = \sqrt{\frac{C_{2}}{2}} v_{2}, x_{3} : = \sqrt{\frac{L_{3}}{2}} i_{3} .

(77)

Let us arrange the above state variables into the state array

x : = [\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}]

. The system (74) may thus be rewritten as

\dot{x} = f (t, x)

, with

f (t, [\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}]) : = \underset{= : H (t)}{\underset{︸}{[\begin{matrix} 0 & 0 & \sqrt{\frac{1}{C_{1} L_{3}}} (1 - u (t)) \\ 0 & 0 & \sqrt{\frac{1}{C_{2} L_{3}}} u (t) \\ - \sqrt{\frac{1}{C_{1} L_{3}}} (1 - u (t)) & - \sqrt{\frac{1}{C_{2} L_{3}}} u (t) & 0 \end{matrix}]}} [\begin{matrix} x_{1} \\ x_{2} \\ x_{3} \end{matrix}] .

(78)

In short,

f (t, x) = H (t) x

.

In terms of the new state variables, the invariant may be rewritten as:

I (x) : = x_{1}^{2} + x_{2}^{2} + x_{3}^{2} .

(79)

Assuming, for instance, that

I (x_{0}) = 1

, Equation (79) establishes that the state x belongs to the sphere

S^{2}

. Hence, we may observe that the state space of the DC-to-DC converter is the unit sphere

S^{2}

embedded in

R^{3}

.

As a last verification step, let us prove that

f (t, x) = H (t) x \in T_{x} S^{2}

for every t. It suffices to prove that

{(H x)}^{⊤} x = 0

, namely that

x^{⊤} H^{⊤} x = 0

. By definition of H, it is readily found to be a skew-symmetric matrix, namely that for every t, it holds that

H^{⊤} (t) = - H (t)

. By transposing the product

x^{⊤} H^{⊤} x

, we find the sought result; in fact,

{(x^{⊤} H^{⊤} x)}^{⊤} = x^{⊤} H x = - (x^{⊤} H^{⊤} x),

(80)

and since

x^{⊤} H^{⊤} x

is a scalar, it must hold that

{(x^{⊤} H^{⊤} x)}^{⊤} = x^{⊤} H^{⊤} x

, therefore it must be equal to zero. ■

First-order dynamical systems are characterized by a (possibly varying) vector field that determines its dynamics, as illustrated by the following example.

Example 9.

Let us consider the following first-order dynamical system on the manifold

SO (3)

of three-dimensional rotations in space:

\dot{R} (t) = A - R (t) A^{⊤} R (t), R (0) = I_{3} .

(81)

In this system, the matrix

R \in SO (3)

represents the orientation of a moving orthogonal frame attached to a rigid body with respect to an inertial reference frame, while the arbitrary constant matrix

A \in R^{3 \times 3}

determines its rotational speed (namely, the orientation of the axis of rotation and the rotational velocity).

The orientation matrix is often termedattitudefor rigid bodies such as drones and satellites:

The instance $R = I_{3}$ indicates that the object is horizontal with respect to the reference frame;
The instance $R \neq I_{3}$ indicates that it is necessary to rotate the body-fixed axes to align them to the inertial axes.

The dynamical system (81) is of the type

\dot{R} = f (R)

. Let us verify that

f (R) : = A - R A^{⊤} R

is a vector field of

T SO (3)

. To show such a property, it suffices to prove that

f^{⊤} (R) R + R^{⊤} f (R) = 0_{3}

. This is in fact true:

\begin{matrix} {(A - R A^{⊤} R)}^{⊤} R + R^{⊤} (A - R A^{⊤} R) = & A^{⊤} R - R^{⊤} A (R^{⊤} R) + R^{⊤} A - (R^{⊤} R) A^{⊤} R \\ = & A^{⊤} R - R^{⊤} A + R^{⊤} A - A^{⊤} R \\ = & 0_{3} . \end{matrix}

(82)

It is worth noticing that the property

f \in Γ (SO (3))

holds true even in the case that A is a time-varying matrix field, namely

A = A (t)

. ■

5. Tangent Maps: Pushforward and Pullback

Let us consider two Riemannian manifolds

M

and

N

. Any smooth function

f : M \to N

transforms a curve in

M

into a curve in

N

. Since both curves are associated with their velocity vector fields, which define their tangent spaces, one might wonder how the function f maps tangent spaces in

M

into tangent spaces in

N

. The answer is materialized as a pushforward map.

Let

x \in M

, then a pushforward map

d_{x} f : T_{x} M \to T_{f (x)} N

is defined such that for every smooth curve

γ_{x} : [- ϵ, ϵ] \to M

,

γ_{x} (0) = x

it holds that:

{\frac{d}{d t} f (γ_{x} (t))|}_{t = 0} = : d_{x} f ({\dot{γ}}_{x} (0)) .

(83)

In general, therefore, given a function

f : M \to N

that maps a point x from the manifold

M

to the point

f (x)

on the manifold

N

, the map

d_{x} f

associates to any tangent vector v belonging to the tangent space

T_{x} M

a tangent vector

d_{x} f (v)

belonging to the tangent space

T_{f (x)} N

by ‘pushing’ such vector to such space.

A pushforward map is indeed a linear approximation of a smooth map on tangent spaces. Any tangent map

d_{x} f (v)

is linear in the argument v. The map

d_{x} f

at a point x represents, in practical terms, the best linear approximation of the function f near x. A pushforward map may be regarded as a generalization of the total derivative of ordinary calculus. (Some authors would denote a pushforward map by an asterisk as in

f_{★}

; however, this notation takes up the space of the reference point x hence hindering it.)

Let us consider two special cases of interest.

Pushforward of a manifold-to-scalar function: The special case that $N : = R$ , namely that f is a manifold-to-scalar function, is particularly important in applications. Such a special case will be covered later since it involves the notion of Riemannian gradient.
Pushforward of a matrix-to-matrix function: This is the case that the smooth manifolds $M$ and $N$ are real matrix manifolds embedded in $R^{p \times p}$ . Any smooth function between any such pair of manifolds is of matrix-to-matrix type. Let us assume that the function f is analytic about a point $X_{0} \in M$ , namely, that it may be expressed as a polynomial series:

$f (X) = \sum_{k = 0}^{\infty} a_{k} {(X - X_{0})}^{k}, a_{k} \in R .$

(84)

Then, the pushforward map $d_{X} f (V)$ in a point $X \in M$ applied to the tangent direction $V \in T_{X} M$ may be expressed as:

$d_{X} f (V) = \sum_{k = 1}^{\infty} a_{k} \sum_{r = 0}^{k - 1} {(X - X_{0})}^{r} V {(X - X_{0})}^{k - r - 1} .$

(85)

It is easily recognized that the tangent map $d f_{X} (V)$ is linear in the argument V. As a reference for the readers, we recall the analytic expansion of three matrix-to-matrix functions, that may be used to compute the corresponding pushforward maps:
- Matrix exponential: For the map $f (X) = Exp (X)$ , it holds that $X_{0} = 0$ , $a_{k} = {(k!)}^{- 1}$ for $k \geq 0$ ;
- Principal matrix logarithm: For the map $f (X) = Log (X)$ , it holds that $X_{0} = I_{p}$ , $a_{0} = 0$ , $a_{k} = {(- 1)}^{k + 1} k^{- 1}$ for $k \geq 1$ ;
- Matrix inversion: For the map $f (X) = X^{- 1}$ , it holds that $X_{0} = I_{p}$ , $a_{k} = {(- 1)}^{k}$ for $k \geq 0$ .

The inverse of a pushforward map

d_{x} f

is termed pullback map and is denoted as

{(d_{x} f)}^{- 1}

. A map

{(d_{x} f)}^{- 1} (w)

‘pulls’ any tangent vector w from the tangent space

T_{f (x)} N

back to the tangent space

T_{x} M

.

6. Lie Groups, Lie Algebras, Lie Brackets

Lie groups are hybrid mathematical constructions sharing properties that characterize smooth manifolds and algebraic groups. Let us recall that an algebraic group is an algebraic structure

(G, m, i, e)

made of a set

G

, either discrete or continuous, endowed with an internal operation denoted as

m : G \times G \to G

, usually referred to as group multiplication (not necessarily a multiplication is the standard sense, though), an inversion operation denoted as

i : G \to G

, and an identity element with respect to group multiplication denoted as e. The multiplication, inversion and identity are related in such a way that, for every triple

x, y, z \in G

, it holds that:

\begin{matrix} m (x, e) = m (e, x) = x \\ m (x, i (x)) = m (i (x), x) = e, \\ m (x, m (y, z)) = m (m (x, y), z) . \end{matrix}

(86)

Note that, in general, group multiplication is not commutative, i.e.

m (x, y) \neq m (y, x)

.

Two instances of algebraic groups are

(Z, +, -, 0)

, a discrete group, and

(GL (p), \cdot,^{- 1}, I_{p}

), a continuous group. The structure

(Z, +, -, 0)

represents the set of integer numbers (either positive or negative, also including 0) with the standard addition as a group multiplication, which implies that the inverse is the standard subtraction while the identity is the 0 element. The second example, namely the structure

(GL (p), \cdot,^{- 1}, I_{p}

), represents the subset of non-singular matrices of size

p \times p

, endowed with the standard matrix-to-matrix multiplication ‘·’ as group multiplication. In such a case, the inverse operation coincides with the standard matrix inversion while the group identity coincides with the identity matrix

I_{p}

. It is straightforward to show that such group operations/identities satisfy the recalled group axioms. A counterexample of a structure that is not an algebraic group is given by the set of non-negative integer numbers

Z_{0}^{+}

, which does not form a group under standard addition/subtraction (in fact, the subtraction of two positive integers does not necessarily return a positive integer).

With the notions of algebraic groups and smooth manifolds, we may now define a well-known object of manifold calculus, namely a Lie group. A Lie group conjugates the properties of an algebraic group and of a smooth manifold, as it is a set endowed with group properties and a manifold structure. Paraphrasing Wirth:

Lie group := Manifold + Algebraic group.

Let us denote by

T_{x} G

the tangent space of a Lie group

G

at a point

x \in G

. The tangent space at the identity, namely

T_{e} G

, represents a special instance of tangent spaces. In fact, such a tangent space, upon being endowed with a binary operator termed Lie brackets, possesses the structure of a so-called Lie algebra and will be denoted as

g

.

Let us examine the structure of the Lie algebra of the following Lie groups:

Hypercube: The hypercube, also known as a translation group, is a Lie group under standard matrix sum (matrix subtraction and zero matrix complete the group structure). The Lie algebra of $R^{p}$ coincides with itself.
General linear group and the special linear group: Both the general linear group $GL (p)$ and the special linear group $Sl (p)$ are Lie groups under standard matrix multiplication and inversion. The Lie algebra of the general linear group, namely $gl (p)$ , coincides with $R^{p \times p}$ . The linear algebra associated with the special linear group is more interesting and its determination involves some clever matrix computations. Let us consider $Sl (p)$ endowed with standard matrix multiplication, inversion and $I_{p}$ as the group identity and a curve $γ : [- ϵ, ϵ] \to Sl (p)$ defined by $γ (t) : = Exp (t A)$ , where $Exp$ denotes the matrix exponential and $A \in R^{p \times p}$ . Clearly, $γ (0) = I_{p}$ ; hence, $\dot{γ} (0)$ represents any element of the Lie algebra $sl (p)$ . It is not hard to prove that

$\dot{γ} (t) = A Exp (t A),$

(87)

hence, $\dot{γ} (0) = A \in sl (p)$ . Now, let us recall a result from matrix calculus [43]:

${\frac{d}{d t} det (γ (t))|}_{t = 0} = tr (\dot{γ} (0)) .$

(88)

In the present case, since $det (γ (t)) \equiv 1$ , it follows from the above considerations that $tr (A) = 0$ . In conclusion, we found that

$sl (p) = {A \in R^{p \times p} ∣ tr (A) = 0},$

(89)

the space of traceless matrices. The algebra $sl (p)$ has dimension $p^{2} - 1$ .
Special orthogonal group: The $SO (n)$ manifold is a Lie group under standard matrix multiplication. The Lie algebra associated with the special orthogonal group is the set of skew-symmetric matrices $so (p) : = {H \in R^{p \times p} ∣ H + H^{⊤} = 0_{p}}$ . In fact, at the identity it holds that $T_{I_{p}} SO (p) = so (p)$ . The Lie algebra $so (p)$ is a vector space of dimension $2^{- 1} p (p - 1)$ .
Real symplectic group: The Lie algebra associated with the real symplectic group may be characterized as follows:

$sp (2 n) = {H = J S ∣ S \in R^{2 n \times 2 n}, S^{⊤} = S},$

(90)

where the quantity J denotes again the fundamental skew-symmetric matrix.
Manifold of symmetric, positive-definite matrices: The Lie algebra associated with the Lie group $S^{+} (p)$ is the set of $p \times p$ symmetric matrices, namely $s^{+} (p) : = {S \in R^{p \times p} ∣ S^{⊤} = S}$ . The space of symmetric, positive-definite matrices is not a group under standard matrix multiplication. We recall from [44] the following group structure $(S^{+} (p), m, i, e)$ :
-
Multiplication: $m (P, Q) : = Exp (Log P + Log Q)$ , (logarithmic multiplication), with $P, Q \in S^{+} (p)$ , where ‘ $Log$ ’ denotes the principal matrix logarithm;
-
Identity element: $e = I_{p}$ (notice that $Log I_{p} = 0_{p}$ );
-
Inverse: $i (P) = Exp (- Log P)$ (matrix inversion), with $P \in S^{+} (p)$ (any symmetric, positive-definite matrix is non-singular).
It is easy to verify that the proposed instances of $m, i, e$ satisfy the algebraic-group axioms in $S^{+} (p)$ . Additionally, the logarithmic multiplication on $S^{+} (p)$ is compatible with its smooth manifold structure, as the map $(P, Q) \mapsto m (P, i (Q)) = Exp (Log P - Log Q)$ is smooth.

An essential peculiarity of any Lie groups

(G, m, i, e)

is that the whole group may always be brought back to a convenient neighborhood of the identity e and the same holds true for every tangent space

T_{x} G

,

\forall x \in G

, which may be brought back to the algebra

g

. Let us consider, for instance, a curve

γ_{x} : [- ϵ, ϵ] \to G

such that

γ_{x} (0) = x

. We may define a new curve

{\tilde{γ}}_{x} (t) : = m (γ_{x} (t), i (x))

(91)

that has the property

{\tilde{γ}}_{x} (0) = e

. Conversely,

γ_{x} (t) = m ({\tilde{γ}}_{x} (t), x)

. This operation closely resembles a translation of a curve into a neighborhood of the group identity.

The above observation leads to defining two Lie-group functions:

Right translation: Defined as a function $R_{x} : G \to G$ by $R_{x} (y) : = m (y, x)$ for every pair $(x, y) \in G$ ;
Left-translation: Defined as a function $L_{x} : G \to G$ by $L_{x} (y) : = m (i (x), y)$ for every pair $(x, y) \in G$ ;

the distinction between the left translation and right translation is somewhat arbitrary: we chose a rather non-standard one.

Notice that

R_{x}

and

L_{x}

commute; in fact,

R_{x} (L_{x} (y)) = m (m (i (x), y), x) = m (i (x), m (y, x)) = L_{x} (R_{x} (y)) .

(92)

Let us consider a simple example to get acquainted with the notion of left/right translation.

Example 10.

Let us particularize the notion of left/right translation to the familiar hypercube. In

R^{p}

, considered as a group under standard array addition, these functions may be defined as

R_{x} (y) : = y + x

and

L_{x} (y) = - x + y

. The composition

R_{x} (L_{x} (y))

returns

R_{x} (L_{x} (y)) = L_{x} (y) + x = - x + y + x

, namely

R_{x} (L_{x} (y)) = y

; hence,

R_{x} \circ L_{x} = {id}_{x}

. Since

R_{x}

and

L_{x}

commute, we may say that, in this specific case, the functions

R_{x}

and

L_{x}

are the inverse of one another. In general, this is not true, though, and the failure of the composition

R_{x} \circ L_{x}

to equal an identity map is caused by the lack of commutativity of a group. ■

Now, let us consider the curve

L_{x} (γ_{x})

: it crosses the identity e at

t = 0

. Consequently, the pushforward

d_{x} L_{x}

maps every vector in

T_{x} G

to a vector in

T_{e} G

. Namely,

d_{x} L_{x} : T_{x} G \to g .

(93)

Likewise, taking an arbitrary smooth curve

γ_{e}

passing by the identity

e \in G

at

t = 0

and considering the curve

R_{x} (γ_{e})

, it is readily seen that the latter crosses the point x at

t = 0

. Consequently, the pushforward

d_{e} R_{x}

maps every vector in

T_{e} G

to a vector in

T_{x} G

. Namely,

d_{e} R_{x} : g \to T_{x} G .

(94)

Recall that

T_{x} G

and

g

are of equal dimension and that the pushforward map is linear; hence, the pushforward map

d_{x} L_{x}

allows us to translate a vector belonging to a tangent space of a group to a vector belonging to its algebra (and vice versa through its pullback). This is the reason for which the Lie algebra of a Lie group is sometimes termed the ‘generator’ of a group.

It is easy to see that, if the structure of

g

is known for a group

G

, it might be convenient to coordinatize a neighborhood of the identity of

G

through elements of the associated algebra with the help of a conveniently selected homeomorphism (namely, a continuous function between topological spaces that has a continuous inverse function). Such a homeomorphism is known in the literature as an exponential map and is denoted as

exp : g \to G

. We shall discuss the notion of exponential map for general manifolds in a later section.

As mentioned, a Lie algebra is endowed with a binary operator termed Lie bracket, denoted as

[\cdot, \cdot]

. Let us survey its derivation. Let us define the function

Ψ_{x} : G \to G

termed inner isomorphism as

Ψ_{x} (y) : = L_{x} (R_{x} (y)) .

(95)

The inner isomorphism provides a measure of non-commutativity of a Lie group. It is interesting to notice that the function

Ψ_{x}

preserves the inner product of

G

; in fact, taken two elements

y, z \in G

, it holds that

\begin{matrix} Ψ_{x} (m (y, z)) = & m (i (x), m (m (y, z), x)) \\ = & m (i (x), m (m (m (y, x), m (i (x), z)), x)) \\ = & m (m (i (x), m (y, x)), m (i (x), m (z, x))) \\ = & m (Ψ_{x} (y), Ψ_{x} (z)); \end{matrix}

(96)

hence, it represents a Lie-group automorphism.

Now, let us take the differential of

Ψ_{x} (y)

with respect to the variable y at

y = e

. To this aim, it suffices to consider a curve

y : [- ϵ, ϵ] \to G

such that

y (0) = e

and an element

x \in G

, and to compute the derivative of

Ψ_{x} (y (t))

with respect to t at

t = 0

. The calculation of the derivative gives

\frac{d}{d t} Ψ_{x} (y (t)) = (d_{y (t)} Ψ_{x}) (\dot{y} (t)) = (d_{R_{x} (y (t))} L_{x}) \circ (d_{y (t)} R_{x}) (\dot{y} (t)) .

(97)

Setting

y = e

and letting

v : = \dot{y} (0) \in g

and

{Ad}_{x} : = d_{e} Ψ_{x}

yields

{Ad}_{x} (v) : = (d_{x} L_{x}) \circ (d_{e} R_{x}) (v) .

(98)

By virtue of the properties (93) and (94),

{Ad}_{x}

is a linear map from

g

to itself, namely an endomorphism in

End (g)

. The map

{Ad}_{x}

is termed adjoint representation of the Lie algebra

g

.

In addition, let us now consider a smooth curve

x : [- ϵ, ϵ] \to G

such that

x (0) = e

and

\dot{x} (0) = u \in g

and let us evaluate the map

{ad}_{u} : = {\frac{d}{d t} (d_{x} L_{x}) \circ (d_{e} R_{x})|}_{t = 0} .

(99)

By the chain rule, one obtains

{ad}_{u} : = {\frac{d}{d t} (d_{x (t)} L_{x (t)})|}_{t = 0} \circ (d_{e} R_{e}) + (d_{e} L_{e}) \circ {\frac{d}{d t} (d_{e} R_{x (t)})|}_{t = 0} .

(100)

This is termed adjoint operator and coincides with the Lie bracket of two tangent vectors up to sign, namely

[u, v] : = - {ad}_{u} (v) .

(101)

The Lie bracket provides a measure of non-commutativity of the algebra

g

.

Example 11.

Let us consider the Lie group

GL (n)

endowed with standard matrix multiplication and let us write down explicitly the inner isomorphism, the adjoint representation and the Lie bracket.

Since

L_{X} (Y) = X^{- 1} Y

and

R_{X} (Y) = Y X

, the inner isomorphism reads

Ψ_{X} (Y) = X^{- 1} Y X

.

Taking a curve

Y (t) \in GL (n)

such that

Y (0) = I_{n}

and

\dot{Y} (0) = V \in R^{n \times n}

yields

{Ad}_{X} (V) = {\frac{d}{d t} (X^{- 1} Y (t) X)|}_{t = 0} = X^{- 1} V X .

(102)

To end with, taking a curve

X (t) \in GL (n)

such that

X (0) = I_{n}

and

\dot{X} (0) = U \in R^{n \times n}

yields

\begin{matrix} {ad}_{U} (V) = {\frac{d}{d t} (X^{- 1} (t) V X (t))|}_{t = 0} = (- U V) + V U = - [U, V] . \end{matrix}

(103)

In the above calculation, we have used a known matrix identity, namely

\frac{d}{d t} X^{- 1} (t) = - X^{- 1} (t) \dot{X} (t) X^{- 1} (t) .

(104)

In this case,

[A, B] : = A B - B A

denotes the matrix commutator. ■

7. Metrization, Riemannian Manifolds

A Riemannian manifold is a smooth manifold

M

whose tangent bundle is equipped with a smooth family of positive-definite inner products

M ∋ x \mapsto {〈 \cdot, \cdot 〉}_{x} \in R

. An inner product is locally defined at every point

x \in M

of a manifold as a bilinear function from

T_{x} M \times T_{x} M

to

R

. It is important to remark that the inner product acts on two elements of a tangent space to the manifold at a given point x; it therefore depends (smoothly) on the point x. Hence, such an inner product gives rise to a local metric. Whenever an inner product

{〈 \cdot, \cdot 〉}_{x}

does not depend explicitly on the point x, it is termed uniform. Given two tangent vectors

v, w \in T_{x} M

, their inner products are denoted as

{〈 v, w 〉}_{x}

.

7.1. Coordinate-Free Metrization by Inner Products and Metric Kernels

Let us recall some of the general properties of any inner product. In general, any vector space

V

may be endowed with an inner product (also termed scalar product) denoted by

〈 \cdot, \cdot 〉 : V \times V \to R

. Any inner product has a set of properties:

For every $u, v \in V$ , it holds that $〈 u, v 〉 = 〈 v, u 〉$ ,
For every $u, v, w \in V$ , it holds that $〈 u + w, v 〉 = 〈 u, v 〉 + 〈 w, v 〉$ ,
For every $u, v \in V$ and $c \in R$ , it holds that $〈 c u, v 〉 = c 〈 u, v 〉$ ,
The norm of a vector $v \in V$ is defined as $∥ v ∥ : = \sqrt{〈 v, v 〉}$ ,
An inner product is non-degenerate if and only if $〈 u, v 〉 = 0$ for every $v \in V$ implies $u = 0$ .

Let us consider a simple example about the above notions.

Example 12.

Let us consider the special case

V : = R^{p}

and the inner product:

〈 u, v 〉 = u^{⊤} S v,

(105)

with S being a symmetric,

p \times p

matrix. If S, in addition to being symmetric, is also positive-definite, then such an inner product is non-degenerate. ■

It pays to recall the notion of adjoint operator with respect to a metric on a vector space. Let us consider again the vector space

V

endowed with an inner product

〈 \cdot, \cdot 〉 : V \times V \to R

and a linear operator

ω : V \to V

. Ad adjoint operator

ω^{†} : V \to V

is one such that, for every

v, w \in V

, it holds that

〈 ω^{†} (v), w 〉 = 〈 v, ω (w) 〉 .

(106)

A self-adjoint operator is one such that

ω^{†} = ω

, while an antiadjoint operator is one such that

ω^{†} = - ω

. The following observation on self-adjoint operators in quadratic forms is in order.

Observation 1.

Every operator may be decomposed as the sum of a self-adjoint and of an antiadjoint operator; in fact, it holds that

ω = \frac{1}{2} (ω + ω^{†}) + \frac{1}{2} (ω - ω^{†}) .

(107)

As far as quadratic forms

〈 ω (v), v 〉

are concerned, only the self-adjoint component plays a role. In fact, it is not hard to show that

〈 ω (v), v 〉 = 〈 \frac{1}{2} (ω + ω^{†}) (v), v 〉 .

(108)

This is indeed true for any arbitrary inner product of the type

〈 ω (v), w 〉

, in fact, some straightforward algebraic work reveals that

〈 ω (v), w 〉 = \frac{1}{2} \{〈 ω (w + v), w + v 〉 - 〈 ω (w - v), w - v 〉\} .

(109)

Since the right-hand side is a combination of quadratic forms, the left-hand side depends only on the self-adjoint part of ω.

In the case of a manifold, at every point,

x \in M

corresponds a tangent space

T_{x} M

that may be endowed with an inner product. Therefore, to denote the inner product assigned to the vector space

T_{x} M

, we use the notation

{〈 \cdot, \cdot 〉}_{x}

.

Example 13.

Given a Riemannian manifold

(M, 〈 \cdot, \cdot 〉)

and two smooth vector fields

f, g \in Γ (M)

, we may define the function

h : M \to R

as

h (x) : = {〈 f (x), g (x) 〉}_{x} .

(110)

Notice that the inner product variessmoothlyfrom one tangent space to another; hence, the function h is regular in

M

. The function

h : M \to R

represents a scalar field on

M

.

As a further example, let us consider an arbitrary curve

γ : [- ϵ, ϵ] \to M

over a manifold

M

. On the basis of such curve, we can define a function

φ : [- ϵ, ϵ] \to R

as

φ (t) : = {〈 f (γ (t)), g (γ (t)) 〉}_{γ (t)} .

(111)

Notice that the inner product ‘accompanies’ the point

x = γ (t)

that travels along the curve and involves different tangent spaces. Even the function

φ : [- ϵ, ϵ] \to R

represents a scalar fieldrestrictedto the curve γ. ■

On a Riemannian manifold

M

, the norm of a vector

v \in T_{x} M

is defined as

{∥ v ∥}_{x} : = \sqrt{{〈 v, v 〉}_{x}}

. A specific property of Riemannian manifolds is that the norm is positive-definite; namely

{∥ v ∥}_{x}^{2} \geq 0

, where

{∥ v ∥}_{x} = 0

if an only if

v = 0

.

Choosing the ‘right’ metric for a given manifold in a given application is one of the most challenging aspects of metrization, especially because it is seldom unclear what ‘right’ practically means. When in doubt, it is possible to resort to canonical metrics. A number of canonical metrics for different manifolds of interest are summarized in the following:

Hypercube: For the space $R^{p}$ of (column) arrays, the canonical metric is the Euclidean metric ${〈 u, v 〉}_{x} : = u^{⊤} v$ , for every $u, v \in R^{p}$ , while for the space $R^{p \times q}$ of rectangular matrices, the canonical metric is the Euclidean metric ${〈 U, V 〉}_{X} : = tr (U^{⊤} V)$ , for every $U, V \in R^{p \times q}$ . Notice that both these metrics do not depend explicitly on the point that they are calculated at, and hence they are uniform.
Hypersphere: The hypersphere $S^{p - 1}$ embedded into the ambient space $R^{p}$ inherits its canonical metric; hence, we shall choose ${〈 u, v 〉}_{x} : = u^{⊤} v$ for every $u, v \in T_{x} S^{p - 1}$ . Even this metric is uniform.
General linear group and the special linear group: A metric for the general linear group $GL (p) ∋ A$ is:

${〈 U, V 〉}_{A} : = tr ({(A^{- 1} U)}^{⊤} (A^{- 1} V)), \forall U, V \in T_{A} GL (p) .$

(112)

Such a metric was popularized, for instance, in [45], in the context of machine learning.
Special orthogonal group: The canonical metric in $SO (p)$ is defined as ${〈 U, V 〉}_{X} : = tr (U^{⊤} V)$ , for any $X \in SO (p)$ and $U, V \in T_{X} SO (p)$ . Notice that the norm of a tangent vector $V \in T_{X} SO (p)$ is ${∥ V ∥}_{X} = \sqrt{〈 V, V 〉} = \sqrt{tr (V^{⊤} V)} = : {∥ V ∥}_{F}$ , known as the Frobenius norm [18].
Stiefel manifold: There are two well-known metrics for the Stiefel manifold, namely, the Euclidean metric and the canonical metric.
- Euclidean metric: A possible metric that the Stiefel manifold may be endowed with is the Euclidean metric, inherited from the embedding of $St (n, p)$ in $R^{n \times p}$ :
  
  ${〈 U, V 〉}_{X} : = tr (U^{⊤} V), U, V \in T_{X} St (n, p) .$
  
  (113)
- Canonical metric: The Stiefel manifold may be endowed with a second kind of metric, termed ‘canonical metric’. The associated inner product reads:
  
  ${〈 U, V 〉}_{X} : = tr (U^{⊤} (I_{n} - \frac{1}{2} X X^{⊤}) V), U, V \in T_{X} St (n, p),$
  
  (114)
  
  which, unlike the Euclidean metric (113), is not uniform over the Stiefel manifold.
Real symplectic group: There exist two known metrics in the scientific literature that were applied to the real symplectic group.
- Khvedelidze–Mladenov metric: A metric for the real symplectic group $Sp (2 n)$ is:
  
  ${〈 U, V 〉}_{Q} : = tr (Q^{- 1} U Q^{- 1} V), \forall U, V \in T_{Q} Sp (2 n) .$
  
  (115)
  
  It is referred to as Khvedelidze–Mladenov metric (or KM metric, for short, [46]). This is an indefinite metric; hence, a manifold endowed with this metric is not Riemannian (it is in fact referred to as a pseudo-Riemannian manifold).
- Euclidean metric: A further metric for the real symplectic group $Sp (2 n)$ is:
  
  ${〈 U, V 〉}_{Q} : = tr ({(Q^{- 1} U)}^{⊤} (Q^{- 1} V)), \forall U, V \in T_{Q} Sp (2 n) .$
  
  (116)
  
  Such a metric is inherited from the embedding of a real symplectic group $Sp (2 n)$ into the space of real invertible matrices $GL (2 n)$ that, in turn, is embedded into the real hypercube $R^{2 n \times 2 n}$ and hence inherits its canonical metric.
Space of symmetric, positive-definite matrices: The canonical metric in $S^{+} (p)$ is defined as ${〈 U, V 〉}_{P} : = tr (U P^{- 1} V P^{- 1})$ , for any $P \in S^{+} (p)$ and $U, V \in T_{P} S^{+} (p)$ . Clearly, this is not a uniform metric.
Grassmann manifold: The canonical metric on a Grassmann manifold is

${〈 U, V 〉}_{[X]} = tr (U^{⊤} V), \forall U, V \in T_{[X]} Gr (n, p),$

(117)

which corresponds to the Euclidean metric in the Stiefel manifold that is used to represent elements of the Grassmann manifold.

Example 14.

The inner product (115) gives rise to a metric which is not positive-definite on the space

Sp (2 n) \subset GL (2 n)

. To verify this property, it suffices to evaluate the structure of the squared norm

{∥ V ∥}_{Q}^{2} = tr ({(Q^{- 1} V)}^{2})

with

Q \in Sp (2 n)

and

V \in T_{Q} Sp (2 n)

. By the structure of the tangent space

T_{Q} Sp (2 n)

, it is known that

Q^{- 1} V = J S

with

S \in R^{2 n \times 2 n}

symmetric. It holds that:

J S = [\begin{matrix} 0_{n} & I_{n} \\ - I_{n} & 0_{n} \end{matrix}] [\begin{matrix} S_{1} & S_{2} \\ S_{2}^{T} & S_{3} \end{matrix}] = [\begin{matrix} S_{2}^{T} & S_{3} \\ - S_{1} & - S_{2} \end{matrix}],

with

S_{1}, S_{3} \in R^{n \times n}

being symmetric and

S_{2} \in R^{n \times n}

being arbitrary. Hence,

tr ({(J S)}^{2}) = 2 tr (S_{2}^{2}) - 2 tr (S_{1} S_{3})

, which has an indefinite sign. ■

In general, there exists a canonical metric inherited from the ambient space

A

that a manifold

M

is embedded in. We know that whenever a manifold is embedded in an ambient space

A

, the inner product between two tangent vectors may be written as

{〈 u, w 〉}_{x} = {〈 u, G_{x} (w) 〉}^{A},

(118)

where

{〈 \cdot, \cdot 〉}^{A}

denotes an inner product in

A

. The expression (118) is based on a metric kernel

G : T M \to T M

. Metric kernels play a prominent role in coordinate-free embedded manifold calculus since a kernel and its derivative determine most of the main functions and maps that we shall encounter in the next sections. We assume that the metric kernel has the following properties:

Linearity: $G_{x} (v)$ is linear in v, namely $G_{x} (v + α w) = G_{x} (v) + α G_{x} (w)$ , for every $α \in R$ , $x \in M$ and $v, w \in T_{x} M$ ;
Symmetry: $G_{x}$ is a self-adjoint operator, namely ${〈 u, G_{x} (w) 〉}^{A} = {〈 G_{x} (u), w 〉}^{A}$ ;
Closure with respect to $T M$ : $G_{x}$ is an endomorphism of $T_{x} M$ , namely $G_{x} (v) \in T_{x} M$ , for every $x \in M$ and $v \in T_{x} M$ ;
Invertibility: $G_{x}$ is invertible, namely, its inverse $G_{x}^{- 1}$ is well-defined for every $x \in M$ .

In general, a metric kernel is not well-defined in the whole ambient space

A

. We shall invoke the fact that it is well-defined in

T M

and that it might be extended to

M \times A

, in fact in the expression

G_{x} (a)

one must take

x \in M

but it is allowable to take

a \in A

, since the metric kernel is linear in the argument a. Occasionally, we might need to extend the metric kernel G by an operator

\bar{G} : A^{2} \to A

that is defined at least in a neighborhood of

M \times A

and such that

\bar{G} \equiv G

in

T M

. (In principle, such an ‘extendability’ requirement is not necessary, although it facilitates some computations and certainly clarifies some computational developments.)

Example 15.

Let us consider a hypersphere

S^{n - 1} \subset A

where

A : = R^{n}

endowed with a metric

{〈 u, w 〉}_{x} = {〈 u, w 〉}^{A} : = u^{⊤} w

. It is readily seen that, in this example,

G_{x} = {id}_{x}

; therefore, the metric kernel is linear, self-adjoint, invertible and an endomorphism of

T_{x} S^{n - 1}

.

Let us further consider the manifold

S^{+} (n) \subset A

endowed with the metric

{〈 U, W 〉}_{P} : = tr (U P^{- 1} W P^{- 1})

, where

A : = R^{n \times n}

is endowed with the metric

{〈 U, V 〉}^{A} : = tr (U^{⊤} V)

. In this case, the metric kernel does not coincide with the identity, rather:

G_{P} (W) : = P^{- 1} W P^{- 1} .

(119)

Let us verify that such metric kernel has the four mentioned properties:

1.: Linearity: $G_{P} (W)$ appears to be linear in W; in fact, it holds that $G_{P} (V + α W) = P^{- 1} (V + α W) P^{- 1} = P^{- 1} V P^{- 1} + α P^{- 1} W P^{- 1}$ , for every $α \in R$ , $V, W \in T_{P} A$ ,
2.: Symmetry: $G_{P}$ is self-adjoint; in fact, it holds that ${〈 U, G_{P} (W) 〉}^{A} = tr (U (P^{- 1} W P^{- 1})) = tr ((P^{- 1} U P^{- 1}) W) = {〈 G_{P} (U), W 〉}^{A}$ , by virtue of the cyclic permutation invariance of the trace operator,
3.: Closure: $G_{P}$ turns out to be an endomorphism of $T_{P} S^{+} (n)$ ; in fact, for every matrix $U \in T_{P} S^{+} (n)$ , it holds that $G_{P} (U) = P^{- 1} U P^{- 1}$ is a symmetric matrix. Hence, it belongs to $T_{P} S^{+} (n)$ (notice that ${(P^{- 1} U P^{- 1})}^{⊤} = P^{- ⊤} U^{⊤} P^{- ⊤} = P^{- 1} U P^{- 1}$ because both U and P are symmetric),
4.: Invertibility: $G_{P}$ is invertible; in fact, if $W = G_{P} (U) = P^{- 1} U P^{- 1}$ , then $U = P W P = : G_{P}^{- 1} (W)$ .

Notice that

G_{P}

is not well-defined in

R^{n \times n}

(in fact, if P is taken in

R^{n \times n}

,

P^{- 1}

does not necessarily exist).

As a last example, let us consider the Stiefel manifold

St (n, p) \subset A

is endowed with the metric

{〈 U, W 〉}_{X} : = tr (U^{⊤} (I_{n} - \frac{1}{2} X X^{⊤}) W)

, where

A : = R^{n \times p}

endowed with

{〈 U, V 〉}^{A} : = tr (U^{⊤} V)

. In this case, the metric kernel

G_{X} : R^{n \times p} \to R^{n \times p}

reads

G_{X} (W) : = (I_{n} - \frac{1}{2} X X^{⊤}) W .

(120)

Linearity and symmetry are readily proven. Closure may be proven as follows. Take

X \in St (n, p)

and

W \in T_{X} St (n, p)

, namely

W^{⊤} X + X^{⊤} W = 0_{p}

. Define

U : = (I_{n} - \frac{1}{2} X X^{⊤}) W

. It now suffices to show that

U \in T_{X} St (n, p)

, which holds true; in fact,

\begin{matrix} U^{⊤} X + X^{⊤} U = & W^{⊤} X - \frac{1}{2} W^{⊤} X X^{⊤} X + X^{⊤} W - \frac{1}{2} X^{⊤} X X^{⊤} W \\ = & W^{⊤} X + X^{⊤} W - \frac{1}{2} (W^{⊤} X + X^{⊤} W) \\ = & 0_{p} . \end{matrix}

(121)

The invertibility of the matrix kernel follows from the invertibility of the matrix

I_{n} - \frac{1}{2} X X^{⊤}

. From the Sherman–Morrison–Woodbury (matrix inversion) formula [47]

{(A + U C V)}^{- 1} = A^{- 1} - A^{- 1} U {(C^{- 1} + V A^{- 1} U)}^{- 1} V A^{- 1},

(122)

it follows, upon setting

A : = I_{n}

,

C : = - \frac{1}{2} I_{p}

,

U : = X

and

V : = X^{⊤}

, that

{(I_{n} - \frac{1}{2} X X^{⊤})}^{- 1} = I_{n} - X {(- 2 I_{p} + X^{⊤} X)}^{- 1} X^{⊤} = I_{n} + X X^{⊤},

(123)

that exists for every

X \in St (n, p)

. ■

Further, let us examine separately and concisely the structure of the metric kernel for the symplectic group endowed with its canonical metric.

Example 16.

Let us consider the symplectic group

Sp (2 n)

endowed with the canonical metric

{〈 V, W 〉}_{Q} : = tr ({(Q^{- 1} V)}^{⊤} (Q^{- 1} W))

. Assume the symplectic group

Sp (2 n)

to be embedded in the ambient space

A : = R^{2 n \times 2 n}

endowed with the Euclidean metric

{〈 U, V 〉}^{A} : = tr (U^{⊤} V)

. A simple expansion gives

{〈 V, W 〉}_{Q} : = tr (V^{⊤} Q^{- ⊤} Q^{- 1} W),

(124)

hence the metric kernel reads

G_{Q} (W) : = (Q^{- ⊤} Q^{- 1}) W

.

7.2. Covariancy, Contravariancy, Tensors*

The convention on index used so far comes from the nature of different objects in relation to their behavior upon coordinate changes.

Let a smooth manifold

M

be a submanifold of dimension p of a Euclidean space and let us fix a curve

γ : [- ε, ε] \to M

, described by:

γ_{x} (t) = (x^{1} (t), x^{2} (t), \dots, x^{p} (t)) .

(125)

The subscript

_{x}

tells that the components of the curve are expressed through coordinates x. The tangent vector

v : = {\dot{γ}}_{x} (0)

has the components:

v_{x} = ({\dot{x}}^{1} (0), {\dot{x}}^{2} (0), \dots, {\dot{x}}^{p} (0)) .

(126)

Now, let us introduce a change of variable

x^{i} : = κ^{i} (ξ^{1}, ξ^{2}, \dots, ξ^{p})

, for

i = 1, 2, \dots, p

, such that

\begin{matrix} γ_{ξ} (t) = (κ^{1} (ξ^{1} (t), ξ^{2} (t), \dots, ξ^{p} (t)), \dots, κ^{p} (ξ^{1} (t), ξ^{2} (t), \dots, ξ^{p} (t))) . \end{matrix}

(127)

In the new coordinates, the tangent vector at

t = 0

is expressed as

v_{ξ} = (\frac{\partial κ^{1}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0), \frac{\partial κ^{2}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0), \dots, \frac{\partial κ^{p}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0)) = (\frac{\partial x^{1}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0), \frac{\partial x^{2}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0), \dots, \frac{\partial x^{p}}{\partial ξ^{i}} {\dot{ξ}}^{i} (0)) .

(128)

At the heart of intrinsic manifold calculus is the requirement that one such tangent vector stays exactly the same, no matter which coordinates are used. Observe that the components

v_{x}^{j} : = {\dot{x}}^{j} (0)

and the components

v_{ξ}^{j} : = {\dot{ξ}}^{j} (0)

are related by

v_{x}^{j} = \frac{\partial x^{j}}{\partial ξ^{i}} v_{ξ}^{i} .

(129)

Such a set of equations prescribes how the components of a tangent vector vary upon a coordinate change. Every object that obeys the above law is termed contravariant. Contravariant objects are marked by an upper index.

The canonical basis vectors

\partial_{i}

do not obey the law (129). In fact, from the requirement that

v_{x} = v_{ξ}

, it follows that

v_{x}^{j} \partial_{j}^{x} = \frac{\partial x^{j}}{\partial ξ^{i}} v_{ξ}^{i} \partial_{j}^{x} = v_{ξ}^{i} \partial_{i}^{ξ},

(130)

which implies that

\frac{\partial x^{j}}{\partial ξ^{i}} \partial_{j}^{x} = \partial_{i}^{ξ}

. Multiplying both sides by

\frac{\partial ξ^{i}}{\partial x^{k}}

gives

\frac{\partial ξ^{i}}{\partial x^{k}} \frac{\partial x^{j}}{\partial ξ^{i}} \partial_{j}^{x} = \frac{\partial ξ^{i}}{\partial x^{k}} \partial_{i}^{ξ} .

(131)

Since

\frac{\partial ξ^{i}}{\partial x^{k}} \frac{\partial x^{j}}{\partial ξ^{i}} = \frac{\partial x^{j}}{\partial x^{k}} = δ_{k}^{j}

, we obtain the transformation law

\partial_{k}^{x} = \frac{\partial ξ^{i}}{\partial x^{k}} \partial_{i}^{ξ} .

(132)

Every object that obeys the above law is termed covariant. Covariant objects are marked by a lower index.

Concerning tangent vectors, we may summarize the above results saying that the components of a tangent vector

v \in T_{x} M

are contravariant, while the basis of a tangent space

T_{x} M

is covariant.

Given two tangent vectors

u, v \in T_{x} M

, their inner products may be written as

{〈 u, v 〉}_{x} = {〈 u^{i} \partial_{i}^{x}, v^{j} \partial_{j}^{x} 〉}_{x} = u^{i} v^{j} {〈 \partial_{i}^{x}, \partial_{j}^{x} 〉}_{x},

(133)

due to the bilinearity of the inner product. Now, define

g_{i j} (x) : = {〈 \partial_{i}^{x}, \partial_{j}^{x} 〉}_{x}

. Then,

{〈 u, v 〉}_{x} = g_{i j} (x) u^{i} v^{j}

. The functions

g_{i j}

are clearly symmetric in their indexes (hence, on a smooth manifold of dimension p, these functions are

p^{2}

, but only

2^{- 1} p (p - 1)

are independent). Moreover, these functions are covariant in both indexes.

Given the canonical basis

{d x^{1}, d x^{2}, \dots, d x^{p}}

of a cotangent space

T_{x}^{*} M

, every covector may be written as

ω = ω_{i} d x^{i}

. It is easy to show that the components of a covector are covariant, while the basis elements of a cotangent space are contravariant.

Vectors and covectors are special cases of more general objects termed tensors. A vector is a

(1, 0)

-tensor, while a covector is a

(0, 1)

-tensor. In general, it is possible to construct

(p, q)

-tensors by an operation termed tensor product denoted by ⊗. Each component of one such tensor has p upper indexes and q lower indexes. For example, one can construct the metric tensor

g = g_{i j} d x^{i} \otimes d x^{j}

, which is a

(0, 2)

-tensor. Its inverse is

g^{- 1} = g^{i j} \partial_{i} \otimes \partial_{j}

, which is a

(2, 0)

-tensor. Additionally,

δ_{j}^{i}

represents the mixed components of the so-termed fundamental tensor

δ = δ_{j}^{i} \partial_{i} \otimes d x^{j}

, which is a

(1, 1)

-tensor.

Example 17.

Let us show in what sense functions

g_{i j} (x)

are covariant components of a tensor. Given a variable change

x = κ (ξ)

, from property (132) it follows that

\begin{matrix} g_{i j} (κ (ξ)) = & {〈 \partial_{i}^{x}, \partial_{j}^{x} 〉}_{κ (ξ)} \\ = & {〈\frac{\partial ξ^{k}}{\partial x^{i}} \partial_{k}^{ξ}, \frac{\partial ξ^{h}}{\partial x^{j}} \partial_{h}^{ξ}〉}_{κ (ξ)} \\ = & \frac{\partial ξ^{k}}{\partial x^{i}} \frac{\partial ξ^{h}}{\partial x^{j}} {〈 \partial_{k}^{ξ}, \partial_{h}^{ξ} 〉}_{κ (ξ)} \\ = & \frac{\partial ξ^{k}}{\partial x^{i}} \frac{\partial ξ^{h}}{\partial x^{j}} g_{k h} (ξ) . \end{matrix}

(134)

This is the transformation law of a

(0, 2)

-tensor. The point is that the components

g_{i j} (x)

transform into components

g_{k h} (ξ)

through a linear expression, whose coefficients depend on the derivatives of one set of coordinates with respect to the other set. ■

8. Geodesic Arc, Riemannian Distance, Exponential and Logarithmic Map

A geodesic on a smooth manifold may be intuitively looked at in different ways:

On a general manifold, the concept of geodesic extends the concept of a straight line from a flat space to a curved space. In fact, let us consider a curved manifold $M$ embedded in an ambient space $R^{p}$ . Such ambient space contains straight lines in the usual meaning, but the manifold, being curved, hardly accommodates any straight lines. Geodesics are curves that resemble straight lines in that they copy some of their distinguishing features.
On a metrizable manifold, a geodesic connecting two points is locally defined as the shortest curve on the manifold connecting these endpoints. Therefore, once a metric is specified, the equation of the geodesic arises from the minimization of a length functional. Such a definition comes from the observation that any straight line in $R^{p}$ is indeed the shortest path connecting any two given points.
A further distinguishing feature of straight lines is that they are self-parallel, namely, sliding a straight lines infinitesimally along itself returns the same exact lines. Such a concept gives rise to a definition of geodesic which requires to specify the mathematical meaning of ‘sliding a piece of line infinitesimally along itself’. The technical argument to access such a definition is covariant derivation, which is not covered in the present tutorial (while it will be covered in a forthcoming review paper).
Another intuitive interpretation is based on the observation that a geodesic emanating from a point on a manifold coincides with the path followed by a particle sliding on the manifold at a constant speed. For a manifold embedded in a larger ambient space and in special circumstances, this is equivalent to the requirement that the naïve (or embedded) acceleration of the particle is either zero or perpendicular to the tangent space to the manifold at every point of its trajectory. (In the present tutorial paper, we use the term naïve acceleration to distinguish it from covariant acceleration, which will only be defined in a subsequent tutorial.)

Starting from the above informal description, we are going to examine in detail the notion of embedded geodesy. In particular, we shall treat the problem according to an energy-minimizing principle (which is in fact equivalent to a length-minimizing principle).

8.1. Coordinate-Free Embedded Geodesy

The length of a smooth curve

γ : [- ϵ, ϵ] \to M

may be evaluated though a rectification formula as:

L_{γ} : = \int_{- ϵ}^{ϵ} \sqrt{{〈 \dot{γ} (t), \dot{γ} (t) 〉}_{γ (t)}} d t .

(135)

The net result of this argument is that, through the definition of an inner product on the tangent bundle to a Riemannian manifold, we are able to measure the lengths of paths in the manifold itself, which turns the manifold into a metric space.

Example 18.

Let us consider the manifold described in intrinsic coordinates:

M : = {(θ^{1}, θ^{2}, sin (θ^{1}) + 1) \in R^{3} ∣ (θ^{1}, θ^{2}) \in R^{2}} .

(136)

Let us consider the curve

(θ^{1}, θ^{2}) = (t, t)

obtained by mapping a segment of the plane

R^{2}

to the manifold

M

, namely:

γ (t) : = (t, t, sin (t) + 1), t \in [a, b] .

(137)

Let us assume the manifold

M

to be endowed with the metric

{〈 v, w 〉}_{x} : = v^{⊤} w

. In order to evaluate the length of such a curve, it is necessary to evaluate the velocity field

\dot{γ}

:

\dot{γ} (t) = (1, 1, cos (t)), t \in [a, b] .

(138)

The length of the curve is now obtained through the rectification formula:

L (γ) : = \int_{a}^{b} \sqrt{1^{2} + 1^{2} + {cos}^{2} (t)} d t = \int_{a}^{b} \sqrt{2 + {cos}^{2} (t)} d t .

(139)

Such an integral may not be expressed in terms of elementary functions. ■

On the basis of the concepts of ‘curve’ and of ‘length of a curve’, it is possible to define the notion of ‘distance between two points on a Riemannian manifold’

d (\cdot, \cdot)

.

To define a notion of distance, let us consider what follows:

Given a manifold $(M, 〈 \cdot, \cdot 〉)$ , take two arbitrary points $p, q \in M$ ;
Choose a curve $γ : [0, 1] \to M$ that has p and q as endpoints, namely, such that $γ (0) = p$ and $γ (1) = q$ ;
Take as as distance $d (p, q)$ the length of such a curve, namely $L (γ)$ ; one such definition seems a good starting point, but needs to be perfected since there exist infinitely many curves joining two given points.

It is important to underline that the notion of distance is not univocally defined since it depends on the inner product that a manifold is endowed with. Since there exist infinitely many curves joining two points on a manifold, a distance between two points p and q is actually defined as the length of the shortest curve connecting such two points, namely

d (p, q) : = inf_{γ} \int_{0}^{1} \sqrt{{〈 \dot{γ} (t), \dot{γ} (t) 〉}_{γ (t)}} d t .

(140)

The problem of selecting the shortest path connecting two points is far from easy to solve. A possible method to look for it is the so-called variational method that we are going to introduce in the following.

The key point is to introduce an energy functional defined as

E (γ) : = \int_{0}^{1} {〈 \dot{γ} (t), \dot{γ} (t) 〉}_{γ (t)} d t,

(141)

whose minimum argument coincides to a geodesic. The variational method consists of looking for a curve that makes the energy functional stationary, namely, for a curve such that

δ E (γ) = 0

. A concept that facilitates the formulation of the variational method is that of normal space to a manifold in a point.

Clearly, the above definition introduces yet another degree of freedom, since the definition of normal space is based on a choice of inner product

{〈 \cdot, \cdot 〉}^{A}

to evaluate orthogonality. It is instructive to notice that, since a normal space is defined as the orthogonal complement to a tangent space with respect to the ambient space, then

T_{x} M \oplus N_{x} M = A .

(142)

In other terms, every element in

A

may be decomposed exactly as the sum of a tangent and a normal vector.

Example 19.

Let us evaluate the structure of the normal spaces to the hypersphere

S^{n - 1}

embedded in

A : = R^{n}

. Let us recall that

S^{n - 1} : = {x \in R^{n} ∣ x^{⊤} x = 1}

and that

T_{x} S^{n - 1} = {v \in R^{n} ∣ v^{⊤} x = 0}

. Let us select

{〈 y, z 〉}^{A} : = y^{⊤} z

for every

y, z \in A

. Since, by definition,

N_{x} S^{n - 1} : = {z \in R^{n} ∣ z^{⊤} v = 0, \forall v \in R^{n} s u c h t h a t v^{⊤} x = 0},

(143)

it is easily found that

N_{x} S^{n - 1} = {λ x ∣ λ \in R} .

(144)

Let us further determine the structure of the normal spaces for the manifold

SO (n)

. Let us recall that

SO (n) : = {R \in R^{n \times n} ∣ R^{⊤} R = I_{n}, det (R) = 1}

and that

T_{R} SO (n) = {V \in R^{n \times n} ∣ V^{⊤} R + R^{⊤} V = 0}

. Let us take

A : = R^{n \times n}

and

{〈 Y, Z 〉}^{A} : = tr (Y^{⊤} Z)

for

Y, Z \in A

. Let us also recall that

V^{⊤} R + R^{⊤} V = {(R^{⊤} V)}^{⊤} + R^{⊤} V = H^{⊤} + H = 0_{n},

(145)

where

H : = R^{⊤} V

. Hence, any tangent vector at R may be written as

V = R H

with H being skew-symmetric. By definition,

N_{R} SO (n) : = {Z \in R^{n \times n} ∣ tr (Z^{⊤} V) = 0, \forall V = R H, H^{⊤} + H = 0_{n}},

(146)

and hence it is readily found that

N_{R} SO (n) = {R S ∣ S \in R^{n \times n}, S^{⊤} = S} .

(147)

As a verification step, let us show that, for every

n \times n

special orthogonal matrix R, skew-symmetric matrix H and symmetric matrix S, it holds that

{〈 R H, R S 〉}^{A} = 0

. To this aim, let us show that:

{〈 R H, R S 〉}^{A} = tr ({(R H)}^{⊤} R S) = tr (H^{⊤} (R^{⊤} R) S) = - tr (H S) .

(148)

Now, it is not hard to prove that every skew-symmetric matrix H is ‘orthogonal’ to every symmetric matrix S; in fact:

\underset{B e c a u s e tr (X) = tr (X^{⊤})}{\underset{︸}{tr (H S) = tr ({(H S)}^{⊤})}} = tr (S^{⊤} H^{⊤}) = \underset{B e c a u s e tr (X Y) = tr (Y X)}{\underset{︸}{tr (S (- H)) = tr ((- H) S)}} = - tr (H S),

(149)

and therefore

tr (H S) = 0

. From this result it follows that

{〈 R H, R S 〉}^{A} = 0

.

As a last example, let us determine the structure of the normal space to the manifold

S^{+} (n)

. Recall that

S^{+} (n) : = {P \in R^{n \times n} ∣ P^{⊤} = P, P > 0}

and that

T_{P} S^{+} (n) = {V \in R^{n \times n} ∣ V^{⊤} - V = 0}

. In this case, choose

A : = R^{n \times n}

and

{〈 Y, Z 〉}^{A} : = tr (Y^{⊤} Z)

for every

Y, Z \in A

. By definition, it holds that

N_{P} S^{+} (n) : = {Z \in R^{n \times n} ∣ tr (Z^{⊤} V) = 0, \forall V s u c h t h a t V^{⊤} - V = 0},

(150)

from which it turns out that

N_{P} S^{+} (n) = {Z \in R^{n \times n} ∣ Z^{⊤} + Z = 0} .

(151)

In fact,

N_{P} S^{+} (n)

represents the set of

n \times n

matrices that are orthogonal to all

n \times n

symmetric matrices, which coincides with the set of all

n \times n

skew-symmetric matrices. Notice that

N_{P} S^{+} (n)

does not depend explicitly on the base point P. ■

The notion of normal space affords formulating the variational problem

δ E (γ) = 0

as a differential inclusion. For every specific manifold, it will then be possible to turn such a differential inclusion into a differential equation to be solved under appropriate initial conditions. Let us consider a curve

γ : [0, 1] \to M

and let us define the fundamental form

F : T M \to R

as:

F (x, v) : = {〈 v, v 〉}_{x} .

(152)

The following result holds.

Theorem 2.

For a manifold

M

, the variational equation

δ E (γ) = 0

is equivalent to the following differential inclusion:

\frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} \in N_{γ} M .

(153)

Notice that such a relation appears as a generalized form of the well-known Euler–Lagrange equation of rational mechanics [48].

Proof.

In order to prove such an important result, notice that the energy functional may be rewritten as:

E (γ) = \int_{0}^{1} F (γ, \dot{γ}) d t .

(154)

It is important to notice that, by virtue of the property (118), for every

(x, v) \in T M

, there exists a neighborhood of

(x, v)

in

A^{2}

, where the partial derivatives

\frac{\partial F (x, v)}{\partial x}

and

\frac{\partial F (x, v)}{\partial v}

exist (technically, one should use an extension

\bar{F}

in place of

F

). Such partial derivatives are intended in the Gateaux sense, namely

\begin{matrix} lim_{ϵ \to 0} \frac{F (x + ϵ u, v) - F (x, v)}{ϵ} = : {〈\frac{\partial F (x, v)}{\partial x}, u〉}^{A}, for every u \in A, \\ lim_{ϵ \to 0} \frac{F (x, v + ϵ u) - F (x, v)}{ϵ} = : {〈\frac{\partial F (x, v)}{\partial v}, u〉}^{A}, for every u \in A . \end{matrix}

(155)

Let us define an arbitrary smooth variation

η : [0, 1] \to T M

, namely a deviation from the geodesic

γ

, such that

η (0) = η (1) = 0

.

For a sufficiently small constant

ϵ > 0

, it is possible to evaluate the quantity

F (γ + ϵ η, \dot{γ} + ϵ \dot{η})

. Applying the Taylor series expansion in

A^{2}

gives:

F (γ + ϵ η, \dot{γ} + ϵ \dot{η}) = F (γ, \dot{γ}) + ϵ {〈\frac{\partial F}{\partial γ}, η〉}^{A} + ϵ {〈\frac{\partial F}{\partial \dot{γ}}, \dot{η}〉}^{A} + O (ϵ^{2}),

(156)

where

O

denotes a Landau symbol and it is used to compactly represent the remainder of the series (e.g., in Lagrange form). The variation of the energy of a curve may be concretely defined as:

\begin{matrix} δ E (γ) : = & lim_{ϵ \to 0} \frac{E (γ + ϵ η) - E (γ)}{ϵ} = lim_{ϵ \to 0} \frac{1}{ϵ} \int_{0}^{1} [F (γ + ϵ η, \dot{γ} + ϵ \dot{η}) - F (γ, \dot{γ})] d t \\ = & \int_{0}^{1} {〈\frac{\partial F}{\partial γ}, η〉}^{A} d t + \int_{0}^{1} {〈\frac{\partial F}{\partial \dot{γ}}, \dot{η}〉}^{A} d t + \int_{0}^{1} lim_{ϵ \to 0} \frac{O (ϵ^{2})}{ϵ} d t . \end{matrix}

(157)

The last integral is null. The second last integral may be rewritten, upon integration by parts, as:

\int_{0}^{1} {〈\frac{\partial F}{\partial \dot{γ}}, \dot{η}〉}^{A} d t = {{〈\frac{\partial F}{\partial \dot{γ}}, η〉}^{A}|}_{0}^{1} - \int_{0}^{1} {〈\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, η〉}^{A} d t .

(158)

The first addendum on the right-hand side is null because the variation

η

vanishes at the endpoints of the curve; therefore,

\begin{matrix} δ E (γ) = \int_{0}^{1} {〈\frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, η〉}^{A} d t . \end{matrix}

(159)

By imposing that

δ E (γ) = 0

it is readily obtained that, since the variation

η

is arbitrary, it must hold the relation (153). □

Let us apply the above theorem to a number of manifolds.

Example 20.

Let us determine the equation of geodesics on the hypersphere

S^{n - 1}

through Equation (153). Let us recall that

S^{n - 1} : = {x \in R^{n} ∣ x^{⊤} x = 1}

; assume that

{〈 v, w 〉}_{x} : = v^{⊤} w

(independently of x) and let us recall that

N_{x} S^{n - 1} = {λ x ∣ λ \in R}

. Now, the equation of the geodesic reads:

\frac{\partial ({\dot{γ}}^{⊤} \dot{γ})}{\partial γ} - \frac{d}{d t} \frac{\partial ({\dot{γ}}^{⊤} \dot{γ})}{\partial \dot{γ}} = 0 - 2 \frac{d \dot{γ}}{d t} = - 2 \ddot{γ} \in {λ γ ∣ λ \in R}

(160)

In summary, the equation for the geodesic is

\ddot{γ} (t) = λ γ (t) .

(161)

(Since λ is arbitrary, it absorbed the factor

- 2

.) The geodesic equation is then a second-order differential equation in γ that needs two conditions to be solved.

As a further example, let us derive the geodesic equation for the manifold

SO (n)

. Let us recall that

SO (n) : = {R \in R^{n \times n} ∣ R^{⊤} R = I_{n}, det (R) = 1}

; the canonical inner product in this space is

{〈 V, W 〉}_{R} : = tr (V^{⊤} W)

(independently of R) and that

N_{R} SO (n) = {R S ∣ S \in R^{n \times n}, S = S^{⊤}}

. According to the general principle, the geodesic equation reads as

\frac{\partial ({\dot{γ}}^{⊤} \dot{γ})}{\partial γ} - \frac{d}{d t} \frac{\partial ({\dot{γ}}^{⊤} \dot{γ})}{\partial \dot{γ}} = 0 - 2 \frac{d \dot{γ}}{d t} = - 2 \ddot{γ} \in {γ S ∣ S \in R^{n \times n}, S = S^{⊤}}

(162)

In summary, the geodesic equation on the space

SO (n)

reads as

\ddot{γ} = γ S,

(163)

with S denoting an arbitrary symmetric function. ■

Through the notion of geodesic arcs, one may define the notion of Riemannian (or geodesic) distance between two nearby points in a smooth manifold. In fact, let us denote by

x, y \in M

two nearby points and assume that there exists a unique geodesic arc

γ_{x}^{y} : [0, 1] \to M

such that

γ_{x}^{y} (0) = x

and

γ_{x}^{y} (1) = y

. By definition, the length of such geodesic arcs is taken as the distance between its two endpoints, namely:

d (x, y) : = \int_{0}^{1} \sqrt{{〈 {\dot{γ}}_{x}^{y} (t), {\dot{γ}}_{x}^{y} (t) 〉}_{γ_{x}^{y} (t)}} d t .

(164)

It is worth noticing, at this point, that the expression of the geodesic distance may be rewritten in terms of the fundamental form

F

, namely

d (x, y) = \int_{0}^{1} \sqrt{F (γ_{x}^{y}, {\dot{γ}}_{x}^{y})} d t .

(165)

A noteworthy (and extremely useful) result that we are going to prove is that, along a geodesic, the fundamental form keeps constant. We shall see after the proof that such a result noticeably simplifies the computation of the geodesic distance.

Theorem 3.

Along a geodesic arc

γ : [0, 1] \to M

, the function

F (γ (t), \dot{γ} (t))

stays constant with respect to t.

Proof.

Let us mention that, for any given pair

(x, v) \in T M

, it holds that

{〈\frac{\partial F (x, v)}{\partial v}, v〉}^{A} = 2 F (x, v),

(166)

which clearly stems from the fact that

F (x, v)

is quadratic in v. Such a property may be shown by recalling that

F (x, v) = {〈 v, G_{x} (v) 〉}^{A}

, where G denotes again a metric kernel, and by noticing that

\frac{\partial F (x, v)}{\partial v} = 2 G_{x} (v),

(167)

from which it follows that

{〈\frac{\partial F (x, v)}{\partial v}, v〉}^{A} = {〈2 G_{x} (v), v〉}^{A},

(168)

hence the property (166). Let us now show that the differential inclusion (153) and the property (166) imply the constancy of the fundamental form

F

along a geodesic, namely that:

F (γ (t), \dot{γ} (t)) = F (γ (0), \dot{γ} (0)) f o r e v e r y v a l u e t \in [0, 1]

(169)

on every geodesic. By multivariable calculus it is readily proven that:

\frac{d}{d t} F (γ, \dot{γ}) = {〈\frac{\partial F}{\partial γ}, \frac{d γ}{d t}〉}^{A} + {〈\frac{\partial F}{\partial \dot{γ}}, \frac{d \dot{γ}}{d t}〉}^{A} .

(170)

Integrating such an equation over the interval

[0, t]

gives:

\int_{0}^{t} d F = \int_{0}^{t} {〈\frac{\partial F}{\partial γ}, \frac{d γ}{d t}〉}^{A} d t + \int_{0}^{t} {〈\frac{\partial F}{\partial \dot{γ}}, \frac{d \dot{γ}}{d t}〉}^{A} d t .

(171)

The integral on the left-hand side is equal to

{F|}_{0}^{t}

, while the last integral on the right-hand side may be evaluated through integration by parts and equals

\begin{matrix} \int_{0}^{t} {〈\frac{\partial F}{\partial \dot{γ}}, \frac{d \dot{γ}}{d t}〉}^{A} d t = & {{〈\frac{\partial F}{\partial \dot{γ}}, \frac{d γ}{d t}〉}^{A}|}_{0}^{t} - \int_{0}^{t} {〈\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, \frac{d γ}{d t}〉}^{A} d t \\ = & {2 F|}_{0}^{t} - \int_{0}^{t} {〈\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, \frac{d γ}{d t}〉}^{A} d t, \end{matrix}

(172)

by Equation (166). From Equations (171) and (172), it follows that:

{F|}_{0}^{t} = \int_{0}^{t} {〈\frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, \frac{d γ}{d t}〉}^{A} d t + 2 {F|}_{0}^{t},

(173)

hence

- {F|}_{0}^{t} = \int_{0}^{t} {〈\frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}}, \frac{d γ}{d t}〉}^{A} d t .

(174)

From the differential inclusion (153), it follows that the integrand is null; therefore,

{F|}_{0}^{t} = 0

for every

t \in [0, 1]

. □

The above result means that any geodesic trajectory is traveled at a constant speed. Since, along a geodesic arc, the speed

\sqrt{{〈 {\dot{γ}}_{x}^{y} (t), {\dot{γ}}_{x}^{y} (t) 〉}_{γ_{x}^{y} (t)}}

is constant for any t, it holds that

d (x, y) = \sqrt{{〈 {\dot{γ}}_{x}^{y} (0), {\dot{γ}}_{x}^{y} (0) 〉}_{x}} .

(175)

It is worth underlining that, provided that one knows the functional expression of the geodesic connecting two points, their distance does not require integration, but just algebraic operations. (The difficulty is hidden by the fact that finding the exact expression of a geodesic connecting two points may be more troublesome than one might expect.)

We make an observation in order to justify an idea presented at the beginning of this subsection about normal naïve acceleration for a geodesic.

Observation 2.

Let us take a closer look at the differential inclusion (153), where

F (x, v) = 〈 v, G_{x} (v) 〉

, namely

\frac{\partial {〈 \dot{γ}, G_{γ} (\dot{γ}) 〉}^{A}}{\partial γ} - \frac{d}{d t} \frac{\partial {〈 \dot{γ}, G_{γ} (\dot{γ}) 〉}^{A}}{\partial \dot{γ}} \in N_{γ} M .

(176)

Under the hypothesis that the metric kernel

G_{x}

does not depend explicitly on the point x, the first term on the left-hand side vanishes. In this case, let us plainly assume that

G_{x} \equiv {id}_{x}

. In addition, we have already established that

\frac{\partial {〈 \dot{γ}, G_{γ} (\dot{γ}) 〉}^{A}}{\partial \dot{γ}} = 2 G_{γ} (\dot{γ})

. The above differential inclusion may hence be written as

(G_{x} \equiv {id}_{x} \Rightarrow) \ddot{γ} \in N_{γ} M,

(177)

namely, the naïve acceleration is perpendicular to the velocity. This is perhaps the simplest form of a geodesic equation on a manifold. This is, in fact, the case for the hypersphere surveyed in an example.

A geodesic arc may be expressed in terms of two pieces of information, in addition to its endpoints, such as the initial values

γ (0) = x \in M

and

\dot{γ} (0) = v \in T_{x} M

, which represent the point where a geodesic departs from and its initial speed, respectively. A geodesic arc determined by these two pieces of information will be denoted as

γ_{x, v} : [0, 1] \to M

.

Example 21.

Let us consider a geodesic

γ_{x, v} : [0, 1] \to M

, with

(x, v) \in T M

, such that

{\dot{γ}}_{x, v} (0) = v

. In this case, it holds that

L (γ_{x, v}) = \sqrt{{〈 v, v 〉}_{x}} = \sqrt{F (x, v)} .

(178)

■

8.2. Exponential and Logarithmic Maps

To a geodesic arc

γ_{x, v} : [0, 1] \to M

is associated a map

{exp}_{x} : T_{x} M \to M

defined as:

{exp}_{x} (v) : = γ_{x, v} (1) .

(179)

The function (179) is termed exponential map with pole

x \in M

and is of paramount importance in manifold calculus. The exponential map

exp : T M \to M

takes two arguments, the point

x \in M

over the manifold and a vector

v \in T_{x} M

tangent to the manifold at the pole x, and returns a point on the manifold itself. A practical reading of the exponential map is that it advances a point x toward a direction v, like vector addition moves a point along a straight line in flat spaces. In other words, the following two expressions are the counterparts of each other in a flat space

R^{n}

and in a manifold

M

:

\{\begin{matrix} γ_{x, v} (t) = x + t v i n R^{n}, t \in [0, 1], \\ γ_{x, v} (t) = {exp}_{x} (t v) i n M, t \in [0, 1] . \end{matrix}

(180)

Much like the first expression denotes the shortest/straightest line on a flat space (endowed with a Euclidean metric), the second expression denotes the shortest/straightest line over a curved manifold.

The exponential map depends on the pair

(x, v) \in T M

and is locally invertible around

v = 0

. This follows from the fact that

{exp}_{x} (0) = x

and by local invertibility results. In the flat manifold

R^{n}

, the exponential map may be inverted easily; in fact

from y = x + v it follows that v = y - x .

(181)

In the flat manifold

R^{n}

, we recover the classical and intuitive meaning of the exponential map and of its inverse:

Exponential map: Given a point x and a vector v in $R^{n}$ , the exponential map ${exp}_{x} (v) : = x + v$ moves the point from x to $x + v$ . (Indeed, the term ‘vector’ comes from the homonymus Latin term that means ‘transporter’.)
Inverse exponential map: Given two points x and y in $R^{n}$ , the inverse of the exponential map ${exp}_{x}^{- 1} (y)$ returns a vector v such that ${exp}_{x} (v) = {exp}_{x} ({exp}_{x}^{- 1} (y)) = x + (y - x) \equiv y$ . In other words, the inverse exponential map applied to two points $x, y$ determines the vector v that ‘transports’ x to y.

Let us observe that the inverse exponential map is non-symmetric in its arguments, namely the notation

{exp}_{x}^{- 1} (y)

makes it immediately clear that it is not allowed to swap x with y. Indeed, even in the flat space

R^{n}

, it holds that

{exp}_{x}^{- 1} (y) : = y - x = - (x - y) = - {exp}_{y}^{- 1} (x)

. (In the case of a curved manifold, such a reciprocity relation is more convoluted.)

Generally, the inverse exponential map is termed logarithmic map. On a manifold

M

where it is defined an exponential map

exp : T M \to M

a logarithmic map is denoted as

log : M^{2} \to T M

. A logarithmic map takes as arguments two points

x, y \in M

and returns a vector

v = {log}_{x} (y) \in T_{x} M

. It is important to underline that a logarithmic map is defined only locally, namely, only if

x, y \in M

lay sufficiently close to one another. The lack of a global logarithmic map may be understood as follows. A logarithmic map is defined by the following equation:

{exp}_{x} ({log}_{x} y) = y,

(182)

and therefore, the existence of

{log}_{x} y

is tied to the chance of determining one and only one geodesic arc

γ_{x}^{y}

that connects the given points

x, y \in M

, and then, by the equality

γ_{x}^{y} (t) = γ_{x, v} (t)

, it follows that

{log}_{x} y \equiv v

. However, not every pair of points may be connected by a unique geodesic, and hence a global logarithmic map, in general, fails to exist. A quick example of such an unavoidable problem is found on the sphere

S^{2}

; taking x as the North pole and y as the South pole, there exist infinitely many geodesic lines that connect the two poles. Hence,

{log}_{x} y

is undetermined.

Let us examine the expressions of geodesic arcs, geodesic distances and exponential maps for manifolds of interest in system theory and non-linear control.

Hypercube: The space $R^{p}$ endowed with the Euclidean metric admits straight lines as geodesics; in fact, since $T_{x} R^{p} = R^{p}$ it follows that $N_{x} R^{p} = {0}$ , and hence the geodesic equation is simply ${\ddot{γ}}_{x, v} = 0$ and its solution is $γ_{x, v} (t) = x + t v$ for every $x \in R^{p}$ , $v \in T_{x} R^{p}$ and $t \in [0, 1]$ . Now, take two points $x, y \in R^{p}$ and look for a geodesic arc connecting them with $t \in [0, 1]$ . It is necessary to find a vector $v \in T_{x} R^{p}$ such that $γ_{x, v} (1) = y$ . Such a vector is clearly $v = y - x$ ; hence, the unique geodesic arc connecting x to y is $γ_{x}^{y} (t) = x + t (y - x)$ . Since $\sqrt{{〈 {\dot{γ}}_{x}^{y} (0), {\dot{γ}}_{x}^{y} (0) 〉}_{x}} = ∥ v ∥$ , the Riemannian distance reads

$d (x, y) = ∥ y - x ∥,$

(183)

which is a well-known result from calculus and geometry.
Hypersphere: On the hypersphere $S^{p - 1}$ embedded in the Euclidean space $R^{p}$ , a geodesic line may be conceived as a curve on which a particle, departing from the point $x \in S^{p - 1}$ with velocity $v \in T_{x} S^{p - 1}$ , slides with constant speed $∥ v ∥$ , where $∥ \cdot ∥$ denotes the standard $L_{2}$ vector norm. On the hypersphere, we denote such a curve as $γ_{x, v} (t)$ , where the variable $t \in [0, 1]$ provides a parametrization of the curve. The differential equation characterizing geodesics on the hypersphere may be determined by observing that, with the given conditions, in this case the naïve acceleration of the particle must be either null or normal to the tangent space at any point of the hypersphere itself, namely ${\ddot{γ}}_{x, v} (t) \in N_{γ_{x, v} (t)} S^{p - 1}$ . Since the normal space to a hypersphere at a point x is radial along x, the geodesic equation reads as ${\ddot{γ}}_{x, v} (t) = λ γ_{x, v} (t)$ . In explicit form, the equation of the geodesic on the unit hypersphere may be written as [49]:

$γ_{x, v} (t) = x cos (∥ v ∥ t) + {v sin (∥ v ∥ t) ∥ v ∥}^{- 1}, t \in [0, 1],$

(184)

as it is easy to verify by substitution. Additionally, it is easy to verify that $γ_{x, v} (0) = x$ , ${\dot{γ}}_{x, v} (0) = v$ and that $γ_{x, v} {(t)}^{⊤} γ_{x, v} (t) = 1$ for every t. The exponential map associated with the above geodesic is

${exp}_{x} (v) : = x cos (∥ v ∥) + {v sin (∥ v ∥) ∥ v ∥}^{- 1} .$

(185)

The relationship (184) for the geodesic represents a ‘great circle’ on the hypersphere. Now let us take two points $x, y \in S^{p - 1}$ (non-antipodal, such that $x^{⊤} y \neq - 1$ ) and let us look for a geodesic arc of the form (184) connecting them. It is clearly necessary to find a vector $v \in T_{x} S^{p - 1}$ such that $γ_{x, v} (1) = y$ . Such an equation in the unknown v may be expressed explicitly as

$x cos (∥ v ∥) + v sinc (∥ v ∥) = y,$

(186)

where ‘ $sinc$ ’ denotes the cardinal sine function. Pre-multiplying the above equation by $x^{⊤}$ gives $cos (∥ v ∥) = x^{⊤} y$ , namely $∥ v ∥ = acos (x^{⊤} y)$ ; hence

$v = \frac{y - x (x^{⊤} y)}{sinc (acos (x^{⊤} y))} .$

(187)

This expression represents the inverse of the exponential map applied to points $x, y \in S^{p - 1}$ , namely

${log}_{x} y : = \frac{(I_{p} - x x^{⊤}) y}{sinc (acos (x^{⊤} y))} .$

(188)

Notice that such a logarithmic map is defined only when $x^{⊤} y \neq - 1$ , namely when the two points are antipodal. The unique geodesic arc connecting x to y is given by

$γ_{x}^{y} (t) = x cos (acos (x^{⊤} y) t) + \frac{y - x (x^{⊤} y)}{sinc (acos (x^{⊤} y))} sinc (acos (x^{⊤} y) t) .$

(189)

A noticeable consequence is that, since $\sqrt{{〈 {\dot{γ}}_{x}^{y} (0), {\dot{γ}}_{x}^{y} (0) 〉}_{x}} = ∥ v ∥$ , the Riemannian distance between the points x and y reads

$d (x, y) = acos (x^{⊤} y),$

(190)

where the inverse cosine function ‘ $acos$ ’ returns a value in $[0, π]$ .
Special orthogonal group: In general, it is not easy to obtain the expression of a geodesic arc on a given manifold in closed form. In the present case, with the assumptions considered, the geodesic on $SO (p)$ departing from the identity with velocity $H \in so (p)$ has expression $\tilde{γ} (t) = exp (t H)$ . (It is important to verify that $\tilde{γ} (0) = I_{p}$ and ${\frac{d \tilde{γ} (t)}{d t}|}_{t = 0} = H$ .) It might be useful to verify such an essential result by the help of the following arguments. A geodesic $\tilde{γ} (t)$ on the Riemannian manifold $SO (p)$ , embedded in the Euclidean ambient space $A : = R^{p \times p}$ and endowed with its canonical metric, departing from the identity $I_{p}$ , should satisfy $\ddot{\tilde{γ}} (t) \in N_{\tilde{γ} (t)} SO (p)$ , and therefore it should hold:

$\ddot{\tilde{γ}} (t) = \tilde{γ} (t) S (t), w i t h S^{⊤} (t) = S (t) .$

(191)

Additionally, we know that any geodesic arc belongs entirely to the base manifold; therefore ${\tilde{γ}}^{⊤} (t) \tilde{γ} (t) = I_{p}$ . By differentiating such an expression two times with respect to the parameter t, one obtains:

${\ddot{\tilde{γ}}}^{⊤} (t) \tilde{γ} (t) + 2 {\dot{\tilde{γ}}}^{⊤} (t) \dot{\tilde{γ}} (t) + {\tilde{γ}}^{⊤} (t) \ddot{\tilde{γ}} (t) = 0_{p} .$

(192)

By plugging Equation (191) into Equation (192), we find that $S (t) = - {\dot{\tilde{γ}}}^{⊤} (t) \dot{\tilde{γ}} (t)$ , which leads to the second-order differential equation on the orthogonal group:

$\ddot{\tilde{γ}} (t) = - \tilde{γ} (t) ({\dot{\tilde{γ}}}^{⊤} (t) \dot{\tilde{γ}} (t)),$

(193)

to be solved with the initial conditions $\tilde{γ} (0) = I_{p}$ and $\dot{\tilde{γ}} (0) = H$ . It is a straightforward task to verify that the solution to this second-order differential equation is given by the one-parameter curve $\tilde{γ} (t) = Exp (t H)$ , where ‘Exp’ denotes a matrix exponential.
The expression of the geodesic arc in the position of interest may be made explicit by taking advantage of the Lie-group structure of the orthogonal group endowed with the canonical metric. In fact, let us consider the pair $X \in SO (p)$ and $V \in T_{X} SO (p)$ as well as the geodesic $γ (t)$ that emanates from X, namely $γ (0) = X$ , with velocity V. We claim that the geodesic departing from $X \in SO (p)$ in the direction $V \in T_{X} SO (p)$ is:

$γ_{X, V} (t) = X Exp (t X^{⊤} V) .$

(194)

In fact, let us consider the left-translated curve $\tilde{γ} (t) = X^{⊤} γ_{X, V} (t)$ . It has the following properties:
The curve $γ_{X, V} (t)$ belongs to the orthogonal group at any time. This may be proven by computing the quantity $γ_{X, V}^{⊤} (t) γ_{X, V} (t)$ and taking into account that the identity ${Exp}^{⊤} (t H) = Exp (- t H)$ holds true. Therefore,

$\begin{matrix} γ_{X, V}^{⊤} (t) γ_{X, V} (t) = & {Exp}^{⊤} (t X^{⊤} V) X^{⊤} X Exp (t X^{⊤} V) \\ = & Exp (- t X^{⊤} V) Exp (t X^{⊤} V) = \\ = & I_{p} . \end{matrix}$

(2)
It satisfies Equation (193); hence, it is a geodesic. In fact, notice that $\tilde{γ} (t) = X^{⊤} X Exp (t X^{⊤} V) = Exp (t X^{⊤} V)$ ; hence, $\dot{\tilde{γ}} (t) = X^{⊤} V Exp (t X^{⊤} V)$ and $\ddot{\tilde{γ}} (t) = {(X^{⊤} V)}^{2} Exp (t X^{⊤} V)$ . Now, the right-hand side of Equation (193) has the expression $- Exp (t X^{⊤} V) {(X^{⊤} V Exp (t X^{⊤} V))}^{⊤} X^{⊤} V Exp (t X^{⊤} V) = - Exp (t X^{⊤} V)$ $Exp (- t X^{⊤} V) {(X^{⊤} V)}^{⊤} X^{⊤} V Exp (t X^{⊤} V) = {(X^{⊤} V)}^{2} Exp (t X^{⊤} V)$ because $X^{⊤} V \in so (p)$ , hence the claim.
(3)
It satisfies $\tilde{γ} (0) = X$ and $\dot{\tilde{γ}} (0) = X (X^{⊤} V) = V$ ; hence, it has the correct base point and direction.
Therefore, the exponential map associated with the above geodesic expression reads

${exp}_{X} (V) : = X Exp (X^{⊤} V) .$

(195)

Now, fixed two points $X, Y \in SO (p)$ , let us compute the geodesic arc $γ_{X}^{Y} : [0, 1] \to SO (p)$ that joins them. It all boils down in finding a tangent vector $V \in T_{X} SO (p)$ such that $Y = X Exp (X^{⊤} V)$ . Pre-multiplying this expression by $X^{⊤}$ gives $X^{⊤} Y = Exp (X^{⊤} V)$ ; hence, $Log (X^{⊤} Y) = X^{⊤} V$ . Pre-multiplying the last equality by X gives the sought tangent vector as $V = X Log (X^{⊤} Y)$ . The logarithmic map associated with the above exponential map therefore reads

${log}_{X} Y : = X Log (X^{⊤} Y) .$

(196)

The Riemannian distance between X and Y then takes the expression

$d (X, Y) = ∥ X Log (X^{⊤} Y) ∥_{F} = {∥ Log (X^{⊤} Y) ∥}_{F},$

(197)

where ‘Log’ denotes, as usual, the matrix logarithm.
Stiefel manifold: Let us consider the expression of geodesics corresponding to two metrics.
- Euclidean metric: The solution of the geodesic equation, with the initial conditions $γ_{X, V} (0) = X \in St (n, p)$ and ${\dot{γ}}_{X, V} (0) = V \in T_{X} St (n, p)$ , reads [33]:
  
  $γ_{X, V} (t) = [X V] Exp (t [\begin{matrix} X^{⊤} V & - V^{⊤} V \\ I_{p} & X^{⊤} V \end{matrix}]) I_{2 p, p} Exp (- t X^{⊤} V),$
  
  (198)
  
  for $t \in [0, 1]$ . The expression of the exponential map corresponding to the Euclidean metric reads, therefore,
  
  ${exp}_{X} (V) : = [X V] Exp ([\begin{matrix} X^{⊤} V & - V^{⊤} V \\ I_{p} & X^{⊤} V \end{matrix}]) I_{2 p, p} Exp (- X^{⊤} V),$
  
  (199)
  
  while the expression of the logarithmic map and of the Riemannian distance between two given points, in closed form, are unknown at present (to the best of the author’s knowledge). The pseudo-identity matrix $I_{2 p, p}$ , with $2 p$ rows and p columns, is just an identity matrix topping a zero matrix.
- Canonical metric: The geodesic arc $γ_{X, V}$ may be computed as follows. Let Q and R denote the factors of the thin QR factorization of the matrix V, then:
  
  $γ_{X, V} (t) = [X Q] Exp (t [\begin{matrix} X^{⊤} V & - R^{⊤} \\ R & 0_{p} \end{matrix}]) [\begin{matrix} I_{p} \\ 0_{p} \end{matrix}],$
  
  (200)
  
  for $t \in [0, 1]$ . The expression of the exponential map corresponding to the canonical metric is, therefore,
  
  ${exp}_{X} (V) : = [X Q] Exp ([\begin{matrix} X^{⊤} V & - R^{⊤} \\ R & 0_{p} \end{matrix}]) [\begin{matrix} I_{p} \\ 0_{p} \end{matrix}],$
  
  (201)
  
  while the expressions of the logarithmic map and of the Riemannian distance between two points are still unknown.
  In fact, neither the logarithmic map nor the geodesic distance are known in closed form for a Stiefel manifold.
Real symplectic group: According to the two considered metrics, we have:
- KM metric: Under the pseudo-Riemannian metric (115), it is indeed possible to solve the geodesic equation in closed form. The geodesic curve $γ_{Q, V} : [0, 1] \to Sp (2 n)$ with $Q \in Sp (2 n)$ and $V \in T_{Q} Sp (2 n)$ corresponding to the indefinite Khvedelidze–Mladenov metric (115) has the expression:
  
  $γ_{Q, V} (t) = Q Exp (t Q^{- 1} V) .$
  
  (202)
  
  In fact, the geodesic equation in variational form is:
  
  $δ \int_{0}^{1} tr (γ^{- 1} \dot{γ} γ^{- 1} \dot{γ}) d t = 0 .$
  
  (203)
  
  The calculation of this variation is facilitated by the following rules of the calculus of variations:
  
  $\begin{matrix} δ (Q^{- 1}) = - Q^{- 1} (δ Q) Q^{- 1}, \end{matrix}$
  
  (204)
  
  $\begin{matrix} δ (\frac{d Q}{d t}) = \frac{d}{d t} (δ Q), \end{matrix}$
  
  (205)
  
  $\begin{matrix} δ (Q Z) = (δ Q) Z + Q (δ Z), \end{matrix}$
  
  (206)
  
  for curves $Q, Z : [0, 1] \to Sp (2 n)$ . By computing the variations, integrating by parts and recalling that the variations vanish at endpoints, it is found that the geodesic equation in variational form reads:
  
  $\int_{0}^{1} tr (δ γ (γ^{- 1} \ddot{γ} γ^{- 1} - γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1})) d t = 0 .$
  
  (207)
  
  The variation $δ γ \in T_{γ} Sp (2 n)$ is arbitrary. By the structure of the normal space $N_{Q} Sp (2 n)$ , the equation $tr (P^{⊤} δ γ) = 0$ , with $δ γ \in T_{γ} Sp (2 n)$ , implies that $P^{⊤} = H J γ^{- 1}$ with $H \in so (2 n)$ . Therefore, Equation (207) is satisfied if and only if:
  
  $γ^{- 1} \ddot{γ} γ^{- 1} - γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} = H J γ^{- 1}, H \in so (2 n),$
  
  (208)
  
  or, equivalently,
  
  $\ddot{γ} - \dot{γ} γ^{- 1} \dot{γ} = γ H J,$
  
  (209)
  
  for some $H \in so (2 n)$ . In order to determine the value of matrix H, note that:
  
  $γ^{⊤} J γ - J = 0 \Rightarrow {\ddot{γ}}^{⊤} J γ + 2 {\dot{γ}}^{⊤} J \dot{γ} + γ^{⊤} \ddot{γ} = 0 .$
  
  Substituting the expression $\ddot{γ} = \dot{γ} γ^{- 1} \dot{γ} + γ H J$ into the above equation yields the condition $J H J = 0$ . Hence, $H = 0$ and the geodesic equation reads:
  
  $\ddot{γ} - \dot{γ} γ^{- 1} \dot{γ} = 0 .$
  
  (210)
  
  Its solution, with the initial conditions $γ (0) = Q \in Sp (2 n)$ and $\dot{γ} (0) = V \in T_{X} Sp (2 n)$ , is found to be of the form (202). By definition of matrix exponential, it follows that ${\dot{γ}}_{Q, V} (t) = V Exp (t Q^{- 1} V)$ .
- Euclidean metric: The expression of the geodesic corresponding to the Euclidean metric was derived in [50]. Let $γ : [0, 1] \to Sp (2 n)$ be a geodesic arc connecting the points $X, Y \in Sp (2 n)$ . Let us define $h (t) : = γ^{- 1} (t) \dot{γ} (t)$ . The geodesic that minimizes the following energy functional
  
  $\int_{0}^{1} tr (h^{⊤} (t) h (t)) d t,$
  
  (211)
  
  is the solution of the differential (Lax) equation:
  
  $\dot{h} (t) = h^{⊤} (t) h (t) - h (t) h^{⊤} (t) .$
  
  (212)
  
  Furthermore, for the initial conditions $γ (0) = X \in Sp (2 n)$ and $\dot{γ} (0) = V \in T_{X} Sp (2 n)$ , the geodesic on the real symplectic group is given by
  
  $γ_{X, V} (t) = X Exp (t {(X^{- 1} V)}^{⊤}) Exp (t [(X^{- 1} V) - {(X^{- 1} V)}^{⊤}]),$
  
  (213)
  
  in the case that the real symplectic group is equipped by a Euclidean metric.
Space of symmetric, positive-definite matrices: The geodesic arc, corresponding to the canonical metric, emanating from a point $P \in S^{+} (p)$ in the direction $V \in T_{P} S^{+} (p)$ has the expression:

$γ_{P, V} (t) = \sqrt[S]{P} Exp (t \sqrt[S]{P^{- 1}} V \sqrt[S]{P^{- 1}}) \sqrt[S]{P},$

(214)

where $\sqrt[S]{\cdot}$ denotes a symmetric matrix square root. The exponential and the logarithmic maps for $P, Q \in S^{+} (p)$ and $V \in T_{P} S^{+} (p)$ thus read:

$\begin{matrix} {exp}_{P} (V) = \sqrt[S]{P} Exp (\sqrt[S]{P^{- 1}} V \sqrt[S]{P^{- 1}}) \sqrt[S]{P}, \end{matrix}$

(215)

$\begin{matrix} {log}_{P} (Q) = \sqrt[S]{P} Log (\sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}}) \sqrt[S]{P} . \end{matrix}$

(216)

The symmetric matrix square root of a $S^{+} (p)$ matrix P may be computed by means of its eigenvalue factorization. In fact, if the matrix P is factored as $X diag (λ_{1}, λ_{2}, λ_{3}, \dots, λ_{p}) X^{⊤}$ , with $X \in O (p)$ and every $λ_{k} > 0$ , then it holds that $\sqrt[S]{P} = X diag (\sqrt{λ_{1}}, \sqrt{λ_{2}}, \sqrt{λ_{3}}, \dots,$ $\sqrt{λ_{p}}) X^{⊤}$ . The squared Riemannian distance between two points $P, Q \in S^{+} (p)$ is given by

$\begin{matrix} d^{2} (P, Q) & = {〈 {log}_{P} Q, {log}_{P} Q 〉}_{P} = tr (({log}_{P} Q) P^{- 1} ({log}_{P} Q) P^{- 1}) \\ = tr (\sqrt[S]{P} Log (\sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}}) \sqrt[S]{P} P^{- 1} \sqrt[S]{P} Log (\sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}}) \sqrt[S]{P} P^{- 1}) \\ = tr ({Log}^{2} (\sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}}) \sqrt[S]{P} P^{- 1} \sqrt[S]{P}) \\ = tr ({Log}^{2} (\sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}})) . \end{matrix}$

Notice that the following identity holds true:

$tr {(I_{p} - \sqrt[S]{P^{- 1}} Q \sqrt[S]{P^{- 1}})}^{k} = tr {(I_{p} - Q P^{- 1})}^{k},$

(217)

and hence the Riemannian distance between two SPD matrices may be written equivalently as

$d^{2} (P, Q) = tr ({Log}^{2} (Q P^{- 1})),$

(218)

as obtained by the series expansion of the matrix logarithm function (cf. Section 2.1).
Grassmann manifold: A geodesic arc on a Grassmann manifold emanating from $[X] \in Gr (n, p)$ with the velocity $V \in T_{[X]} Gr (n, p)$ may be written as:

$\begin{matrix} γ_{[X], V} (t) = & [X B A] [\begin{matrix} cos (D t) \\ sin (D t) \end{matrix}] B^{⊤}, \\ = & X B cos (D t) B^{⊤} + A sin (D t) B^{⊤} . \end{matrix}$

(219)

where $X \in St (n, p)$ denotes a Stiefel matrix whose columns form an orthonormal basis of the subspace $[X]$ and $A D B^{⊤}$ denotes the compact singular value factorization of the matrix V. The sin/cos functions applied to a diagonal matrix simply act, in terms of components, on the entries of the main diagonal. As a compact, smooth Riemannian manifold, a Grassmann manifold is geodesically complete, which implies that any two points on $Gr (n, p)$ may be connected by a geodesic arc.
The exponential map associated with the canonical metric reads:

${exp}_{[X]} (V) = X B cos (D) B^{⊤} + A sin (D) B^{⊤} .$

(220)

The logarithmic map ${log}_{[X]} [Y]$ of two subspaces $[X], [Y] \in Gr (n, p)$ is not easy to compute in general. In [40], it is shown that if their Stiefel representatives $X, Y \in St (n, p)$ are such that the product $X^{⊤} Y$ is symmetric, then ${log}_{[X]} [Y] = A D B^{⊤}$ , where $B (cos D) B^{⊤}$ denotes the spectral factorization of the matrix $X^{⊤} Y$ and $A : = (I - X X^{⊤}) Y B {(sin D)}^{- 1}$ .

Example 22.

Let us compute the length of a geodesic curve

γ_{x}^{y} : [0, 1] \to S^{n - 1}

that, by definition, represents the distance between points

x, y \in S^{n - 1}

:

d (x, y) : = L (γ_{x}^{y}) .

(221)

Let us start by showing that:

{\dot{γ}}_{x}^{y} (t) = \frac{1}{sin θ} [θ cos (θ t) y - θ cos (θ (1 - t)) x]

(222)

with

cos θ : = x^{⊤} y

; hence,

\begin{matrix} ∥ {\dot{γ}}_{x}^{y} {(t) ∥}^{2} = & \frac{θ^{2}}{{sin}^{2} θ} [{cos}^{2} (θ t) + {cos}^{2} (θ (1 - t)) - 2 cos (θ t) cos (θ (1 - t)) x^{⊤} y] \\ = & θ^{2} = {acos}^{2} (x^{⊤} y) . \end{matrix}

(223)

It is instructive to verify that

∥ {\dot{γ}}_{x}^{y} {(t) ∥}^{2}

does not actually depend on t, which facilitates the computation of the length. In fact, the length of

γ_{x}^{y} : [0, 1] \to S^{n - 1}

is given by:

L (γ_{x}^{y}) = \int_{0}^{1} ∥ {\dot{γ}}_{x}^{y} (t) ∥ d t = acos (x^{⊤} y) \int_{0}^{1} d t .

(224)

Therefore, the distance between

x, y \in S^{n - 1}

equals:

d (x, y) = acos (x^{⊤} y) .

(225)

As a further example, let us determine the length of the geodesic curve

γ_{R, V} : [0, 1] \to SO (n)

given by the relationship (194). By definition, it holds that

L (γ_{R, V}) = \int_{0}^{1} ∥ {\dot{γ}}_{R, V} (t) ∥ d t .

(226)

Let us show that

{\dot{γ}}_{R, V} (t) = (R R^{⊤}) V Exp (R^{⊤} V t) = V Exp (R^{⊤} V t),

(227)

therefore

\begin{matrix} L (γ_{R, V}) = & \int_{0}^{1} \sqrt{tr [{(V Exp (R^{⊤} V t))}^{⊤} V Exp (R^{⊤} V t)]} d t \\ = & \int_{0}^{1} \sqrt{tr [Exp (- R^{⊤} V t) V^{⊤} V Exp (R^{⊤} V t)]} d t \\ = & \int_{0}^{1} ∥ V ∥ d t = ∥ V ∥, \end{matrix}

(228)

as proven in the general theoretical developments. ■

The following observation clarifies a technical argument.

Observation 3.

The parameter t in the expressions of geodesic lines, which we may identify astime, indicates which point of a geodesic one is referring to. Normally, the value

t = 0

corresponds to the initial point x, namely

γ_{x, v} (0) \equiv x

. It is instructive to observe that, in all examined cases, the velocity v and the time parameter t appear to be multiplied to one another. For example, the equation of the geodesic

γ_{x, v}

on a hypersphere may be rewritten as

γ_{x, v} (t) = cos (∥ v ∥ t) x + sin (∥ v ∥ t) \frac{v t}{∥ v ∥ t}, t > 0 .

(229)

This means that it is always possible to re-scale the parameter t as

α t

as long as the velocity v is re-scaled as

v / α

, where

α > 0

. This is the reason why the time parameter t normally ranges in the interval

[0, 1]

.

8.3. Geodesic Interpolation

The notions of metric, distance, geodesics, are related to the notion of interpolation. In fact, the interpolation between two points

x, y \in M

may be defined through the optimization problem [51]:

ι_{x}^{y} (t) : = arg min_{z \in M} [(1 - t) d^{2} (x, z) + t d^{2} (z, y)],

(230)

with parameter

t \in [0, 1]

providing the degree of interpolation between the two points. The one-parameter curve

ι_{x}^{y} : [0, 1] \to M

describes a trajectory over the manifold

M

, having x and y as endpoints, which may be regarded as an interpolation curve. For example, the value

ι_{x}^{y} (\frac{1}{2})

is regarded as a midpoint between points x and y.

Example 23.

Let us consider two close points

X, Y \in SO (p)

and let us look for an interpolation of such points in

SO (p)

endowed with its canonical metric. We are looking for a minimizer of the criterion:

Z \mapsto (1 - t) ∥ Log (X^{⊤} Z) ∥_{F}^{2} + t {∥ Log (Y^{⊤} Z) ∥}_{F}^{2},

(231)

with

t \in [0, 1]

assigned. We may look for an explicit solution of the above minimization problem by reasoning as follows. The points X and Y may be connected by a geodesic arc

γ_{X, V} (τ) = X Exp (τ X^{⊤} V)

with

τ \in [0, 1]

and

V \in T_{X} SO (p)

such that

γ_{X, V} (1) = Y

, namely, with V satisfying condition

Exp (X^{⊤} V) = X^{⊤} Y

. As the interpolating point

ι_{X}^{Y} (t)

must be as close as possible to both X and Y, it should belong to the geodesic

γ_{X, V} (τ)

; therefore, we are left with the problem of minimizing the criterion function:

\begin{matrix} τ & \mapsto & (1 - t) ∥ Log (X^{⊤} X Exp (τ X^{⊤} V)) ∥_{F}^{2} + t {∥ Log (Y^{⊤} X Exp (τ X^{⊤} V)) ∥}_{F}^{2} \\ = (1 - t) ∥ τ X^{⊤} {V ∥}_{F}^{2} + t {∥ (τ - 1) X^{⊤} V ∥}_{F}^{2} \\ = ((1 - t) τ^{2} + t {(τ - 1)}^{2}) {∥ Log (X^{⊤} Y) ∥}_{F}^{2}, \end{matrix}

with respect to the variable τ. In the second line, we have used the identity

Y^{⊤} X = Exp (- X^{⊤} V)

. The minimum of the above criterion function in

[0, 1]

is readily seen to be achieved in

τ = t

; thus, the interpolating curve reads:

ι_{X}^{Y} (t) = X Exp (t Log (X^{⊤} Y)) = X {(X^{⊤} Y)}^{t}

(232)

for

0 \leq t \leq 1

. Notice that a matrix power

A^{α}

, with

A \in R^{n \times n}

and

α \in R

, is defined (inasmuch as it exists) in terms of matrix exponential/logarithm, hence the last identity. ■

Generalizing the notion of midpoint, one comes up with the notion of mean value (and of dispersion of a sample set around its mean value) in a metrizable manifold

M

. We may consider what follows:

The notion of ‘mean value’ of objects in a metrizable space should reflect the intuitive understanding that the mean value is an element of the space that locates ‘amidst’ the available objects. Therefore, a fundamental tool in the definition of mean value is a measure of ‘how far apart’ elements in the sample space lie to one another.
The notion of ‘metric variance’ of objects in a metrizable space should be defined in a way that accounts for the dispersion of these objects about their mean values and also depends on how the dissimilarity of such objects is measured.

A way of defining the mean value of a set of objects

x_{k} \in M

,

k = 1, \dots, K

, is provided by the notion of Fréchet sample mean. The Fréchet sample mean and associated sample metric variance [52] may be defined as:

\begin{matrix} μ : = arg min_{x \in M} \frac{1}{K} \sum_{k = 1}^{K} d^{2} (x, x_{k}), \end{matrix}

(233)

\begin{matrix} σ^{2} : = min_{x \in M} \frac{1}{K} \sum_{k = 1}^{K} d^{2} (x, x_{k}), \end{matrix}

(234)

where d denotes a distance function in the Riemannian manifold

M

. It is worth noting that there is no guarantee, in general, that the optimization problem (233) admits a unique solution. For example, consider the space

S^{p - 1}

and a sample set consisting of two antipodal points; then, every point over the equator is a mean value. (Notice that a mean is Fréchet only if it is a global minimizer, otherwise it is a Karcher mean.) If the sample points are close enough to each other, it is known that the mean value is unique [53].

The notion of mean value of a set of points belonging to a curved manifold is utterly important in system theory and non-liner control. In fact, the mean value

μ

is, by definition, close to all points in a collection of points. Therefore, the tangent space

T_{μ} M

associated with a cloud of data points may serve as a reference tangent space (see, for example, [54]).

8.4. Coordinate-Prone Geodesy, Christoffel Symbols*

In terms of the variation of an energy functional, a geodesic on a manifold

M

is defined as the unique curve

γ (t) \in M

,

t \in [0, 1]

that meets the following requirement

δ \int_{0}^{1} {〈 \dot{γ}, \dot{γ} 〉}_{γ} d t = 0,

(235)

where the symbol

δ

denotes the variation of the integral functional. The integral functional in (235) represents the total action associated with the parametrized smooth curve

γ : [0, 1] \to M

of local coordinates

x^{k} = γ^{k} (t)

and may be written explicitly as:

\int_{0}^{1} {〈 \dot{γ}, \dot{γ} 〉}_{γ} d t = \int_{0}^{1} g_{i j} (γ (t)) {\dot{γ}}^{i} (t) {\dot{γ}}^{j} (t) d t .

(236)

The variation

δ

in the expression (235) corresponds to a perturbation of the total action (236). Let

η : [0, 1] \to M

denote an arbitrary parametrized smooth curve of local coordinates

η^{k} (t)

such that

η^{k} (0) = η^{k} (1) = 0

. Define the perturbed action as

A_{η}^{*} (h) : = \int_{0}^{1} g_{i j} (γ + h η) ({\dot{γ}}^{i} + h {\dot{η}}^{i}) ({\dot{γ}}^{j} + h {\dot{η}}^{j}) d t,

(237)

with

h \in [- ϵ, ϵ]

,

ϵ > 0

. The condition (235) may be expressed explicitly in terms of the perturbed action as:

lim_{h \to 0} \frac{A_{η}^{*} (h) - A_{η}^{*} (0)}{h} = 0, \forall perturbation η .

(238)

The perturbed total action may be expanded around

h = 0

as follows:

\begin{matrix} A_{η}^{*} (h) = & \int_{0}^{1} {[g_{i j} (x) + h \frac{\partial g_{i j}}{\partial x^{k}} η^{k} + O (h^{2})]|}_{x = γ (t)} [{\dot{γ}}^{i} {\dot{γ}}^{j} + h ({\dot{γ}}^{i} {\dot{η}}^{j} + {\dot{γ}}^{j} {\dot{η}}^{i}) + O (h^{2})] d t \\ = & \int_{0}^{1} g_{i j} (γ) {\dot{γ}}^{i} {\dot{γ}}^{j} d t + h \int_{0}^{1} {[g_{i j} (x) ({\dot{γ}}^{i} {\dot{η}}^{j} + {\dot{γ}}^{j} {\dot{η}}^{i}) + η^{k} \frac{\partial g_{i j}}{\partial x^{k}} {\dot{γ}}^{i} {\dot{γ}}^{j}]|}_{x = γ (t)} d t + O (h^{2}) \\ = & A_{η}^{*} (0) + h \int_{0}^{1} {[g_{i k} (x) {\dot{γ}}^{i} {\dot{η}}^{k} + g_{k j} (x) {\dot{γ}}^{j} {\dot{η}}^{k} + \frac{\partial g_{i j}}{\partial x^{k}} {\dot{γ}}^{i} {\dot{γ}}^{j} η^{k}]|}_{x = γ (t)} d t + O (h^{2}), \end{matrix}

where

O (h^{2})

is such that

{lim}_{h \to 0} h^{- 1} O (h^{2}) = 0

. The first term in the integral on the right-hand side of the last line may be integrated by parts, namely:

\int_{0}^{1} g_{i k} (γ) {\dot{γ}}^{i} {\dot{η}}^{k} d t = {g_{i k} (γ) {\dot{γ}}^{i} η^{k}|}_{0}^{1} - \int_{0}^{1} \frac{d (g_{i k} (γ) {\dot{γ}}^{i})}{d t} η^{k} d t,

(239)

whose first term on the right-hand side vanishes to zero because

η^{k} (0) = η^{k} (1) = 0

, hence:

\int_{0}^{1} g_{i k} (γ) {\dot{γ}}^{i} {\dot{η}}^{k} d t = - \int_{0}^{1} {(\frac{\partial g_{i k}}{\partial x^{j}} {\dot{γ}}^{i} {\dot{γ}}^{j} + g_{i k} (x) {\ddot{γ}}^{i})|}_{x = γ (t)} η^{k} d t .

(240)

A similar result holds for the second term within the integral on the right-hand side. Therefore, it holds that:

\frac{A_{η}^{*} (h) - A_{η}^{*} (0)}{h} = \int_{0}^{1} {[(\frac{\partial g_{i j}}{\partial x^{k}} - \frac{\partial g_{i k}}{\partial x^{j}} - \frac{\partial g_{k j}}{\partial x^{i}}) {\dot{γ}}^{i} {\dot{γ}}^{j} - 2 g_{i k} (x) {\ddot{γ}}^{i}]|}_{x = γ (t)} η^{k} d t + \frac{O (h^{2})}{h} .

The condition (238), therefore, implies that:

g_{i k} {\ddot{γ}}^{i} + \frac{1}{2} (- \frac{\partial g_{i j}}{\partial x^{k}} + \frac{\partial g_{i k}}{\partial x^{j}} + \frac{\partial g_{k j}}{\partial x^{i}}) {\dot{γ}}^{i} {\dot{γ}}^{j} = 0 .

(241)

Let us now recall the notion of Christoffel symbols of the first kind (named after Elwin Bruno Christoffel) associated with a metric tensor of components

g_{i j}

. These coefficients are defined as:

Γ_{i j k} : = \frac{1}{2} (\frac{\partial g_{j k}}{\partial x^{i}} + \frac{\partial g_{i k}}{\partial x^{j}} - \frac{\partial g_{i j}}{\partial x^{k}}) .

(242)

On top of it, the Christoffel symbols of the second kind are defined as

Γ_{i j}^{h} : = g^{h k} Γ_{i j k}

. The Christoffel symbols of the second kind are symmetric in the covariant indices, namely,

Γ_{i j}^{h} = Γ_{j i}^{h}

.

On the basis of the Christoffel symbols, Equation (241) may be rewritten as

g_{i k} {\ddot{γ}}^{i} + Γ_{i j k} {\dot{γ}}^{i} {\dot{γ}}^{j} = 0 .

(243)

Multiplying by

g^{h k}

and introducing the Christoffel symbols of the second kind, Equation (243) may be written as:

{\ddot{γ}}^{i} + Γ_{j k}^{i} {\dot{γ}}^{j} {\dot{γ}}^{k} = 0 .

(244)

Such a system of second-order differential equations needs two initial conditions to be solved. Typical initial conditions are

γ^{i} (0) = x^{i}

and

{\dot{γ}}^{i} (0) = v^{i}

.

Example 24.

One might wonder if the Christoffel symbols defined in (242) are the components of a tensor. The answer is negative, because none of its lower index denote covariancy. ■

9. Riemannian Gradient of a Manifold-to-Scalar Function

The gradient of a scalar function in a point of its manifold-type domain represents a degree of variability of the function in a vicinity of such a point. In the present section, we are going to survey the notion of ‘Riemannian gradient’ starting from more familiar cases to obtain a full definition for manifold-to-scalar functions.

9.1. Riemannian Gradient: Motivation and Definition

In the simplest case of a scalar function of a scalar variable

f : R \to R

, its degree of variability is quantified by its slope

\frac{d f}{d x}

. In fact, let us recall that, given a point

x \in R

, moving slightly away from it of a little amount

h \in R

, the value of the function

f (x + h)

may be related to the value of

f (x)

through a Taylor series expansion. Such an expansion, truncated to a first-order term by a Lagrange-type remainder, reads

f (x + h) = f (x) + \frac{d f (x)}{d x} h + O (h^{2}),

(245)

and hence the first derivative of the function (that is, its gradient) truly quantifies the variability of the function around the point x. In the above expression,

O

again denotes a Landau symbol.

In the more involved case of a scalar function of several variables,

f : R^{p} \to R

, the gradient

\frac{\partial f}{\partial x}

quantifies again the degree of variability of the function around a point, although such information is spread along p axes. In fact, in this case, there exist multiple directions along which the variability of a function may be explored. After choosing a direction, one may move away along a straight line. Starting from a point

x \in R^{p}

and moving away along a direction

v \in R^{p}

of a small fraction t, the value of

f (x + t v)

may be related to the value of

f (x)

through a multivariable Taylor series expansion which, in the first order, reads

f (x + t v) = f (x) + {(\frac{\partial f (x)}{\partial x})}^{⊤} (t v) + O (t^{2}) .

(246)

Therefore, the gradient of the function f (which, in this case, may be termed ‘Euclidean gradient’ and coincides with the ‘Gateaux derivative’ and with the ‘Jacobian’ of the function) encodes the variability of the function and the quantity

{(\frac{\partial f (x)}{\partial x})}^{⊤} v

represents the directional derivative of the function f at x along the direction v. In the following, we shall utilize the shorter notation

\partial_{x} : = \frac{\partial}{\partial x}

.

In the case of a scalar function whose domain is a curved manifold

f : M \to R

, its gradient, which we shall denote as

{grad}_{x} f

, serves again to quantify the degree of variability of a function in the neighborhood of point

x \in M

along a given tangent direction. On the same line of reasoning as before, starting from a point

x \in M

one may think of moving away slightly along a given tangent direction

v \in T_{x} M

. After choosing a direction, the simplest way to move away from a point along a given direction is to travel along a geodesic line for a short time. Thus, taking a geodesic

γ_{x, v} : [- ϵ, ϵ] \to M

, the value of

f (γ_{x, v})

may be related to the value

f (x)

through a Taylor-type expansion, which is written as

f (γ_{x, v} (t)) = f (x) + {〈 {grad}_{x} f, t v 〉}_{x} + O (t^{2}) .

(247)

Therefore, the gradient of the function f quantifies the variability of a function around a point, while

{〈 {grad}_{x} f, v 〉}_{x}

may be interpreted as a directional derivative of the function f at x along a direction v. From the relationship (247), we see that

\frac{f (γ_{x, v} (t)) - f (x)}{t} = {〈 {grad}_{x} f, v 〉}_{x} + \frac{O (t^{2})}{t} .

(248)

Taking the limit as t approaches zero, the second addendum on the right-hand side vanishes to zero; hence, we obtain

{〈 {grad}_{x} f, v 〉}_{x} = lim_{t \to 0} \frac{f (γ_{x, v} (t)) - f (x)}{t} = {\frac{d f (γ_{x, v} (t))}{d t}|}_{t = 0} .

(249)

The right-hand side is resemblant of a Gateaux derivative of the function f in the direction v. The relationship (249) may be indeed taken as a definition of the gradient of the function with the proviso that it should hold for every

v \in T_{x} M

.

By taking a closer look at the expression (249), it is readily seen that it holds even if the geodesic line is replaced by any smooth curve

γ : [- ϵ, ϵ] \to M

as long as

γ (0) = x

and

\dot{γ} (0) = v

, in which case we can write

{〈 {grad}_{x} f, v 〉}_{x} = lim_{t \to 0} \frac{f (γ (t)) - f (x)}{t} .

(250)

Now let us underline an illusory contradiction in the above expression.

Observation 4.

The curve γ takes part only in closed proximity of the point x; therefore, the right-hand side of the above relation only depends on the point x and the tangent v, as well as the function f, and, in particular, it does not depend on the metric. In other words, the result of the expression

{lim}_{t \to 0} \frac{f (γ (t)) - f (x)}{t}

quantifies how much the function f varies upon moving away from the point x toward the direction v. As opposed to this, the left-hand side of the relation (250) seems to depend on the metric

〈 \cdot, \cdot 〉

. Since the metric may be chosen arbitrarily and independently of the function f and of the quantities x and v, the above relation looks like a conceptual clash.

The only possible way to fix such an illusory contradiction is to recognize that the gradient itself must depend on the metric in such a way that the directional derivative

{〈 {grad}_{x} f, v 〉}_{x}

does not. In other words, if we denote by

{〈 \cdot, \cdot 〉}^{(1)}

and

{〈 \cdot, \cdot 〉}^{(2)}

two metrics for the manifold

M

, then it must hold that

{〈 {grad}_{x}^{(1)} f, v 〉}_{x}^{(1)} = {〈 {grad}_{x}^{(2)} f, v 〉}_{x}^{(2)}

(251)

for every

v \in T_{x} M

, where

{grad}_{x}^{(1)} f

denotes the gradient of the function f at x corresponding to the metric

{〈 \cdot, \cdot 〉}^{(1)}

and

{grad}_{x}^{(2)} f

denotes the gradient of the function f at x corresponding to the metric

{〈 \cdot, \cdot 〉}^{(2)}

.

Let us formalize the above discussion with the aim of formalizing the notion of Riemannian gradient on a Riemannian manifold. Let us consider a Riemannian manifold

M

embedded in an ambient space

A

and, for every point x, the tangent space

T_{x} M

. Let us now consider a smooth function

f : M \to R

and an inner product

{〈 \cdot, \cdot 〉}^{A}

in

A

. We shall require that the function f is extendable to a neighborhood of x in

A

so as to be able to compute the Gateaux derivative

\partial_{x}

of the function f with respect to x. (As usual, we are supposed to use an extension

\bar{f}

in the following expressions.) Let us recall that

lim_{h \to 0} \frac{f (x + h v) - f (x)}{h} = : {〈 \partial_{x} f, v 〉}^{A} .

(252)

We denote the inner product that the tangent bundle

T M

is endowed with at

x \in M

as

{〈 \cdot, \cdot 〉}_{x}

. The Riemannian gradient

{grad}_{x}^{M} f

of the function f over the manifold

M

at the point x is uniquely defined by the following two conditions:

Tangency condition: For every $x \in M$ , ${grad}_{x}^{M} f \in T_{x} M$ .
Metric compatibility condition: For every $x \in M$ and every $v \in T_{x} M \subset A$ , it holds that ${〈 {grad}_{x}^{M} f, v 〉}_{x} = {〈 \partial_{x} f, v 〉}^{A}$ .

The tangency condition codifies the requirement that a Riemannian gradient must always be a tangent vector to the manifold, namely

(x, {grad}_{x} f) \in T M

, while the metric compatibility condition codifies the requirement that the inner product between a gradient vector and any other tangent vector belonging to the same tangent space takes a value that is invariant with respect to the chosen metric. The latter condition is better understood by considering that the linear component of the variation of a function when moving away from a point x of a quantity v is given by

d_{x} f (v) = {〈 {grad}_{x}^{M} f, v 〉}_{x} .

(253)

Clearly, the amount

d_{x} f (v)

depends only on x, f and the displacement v, certainly not on the metric. As a consequence, the gradient needs to depend on the metric. The ‘reference’ inner product is taken as the inner product that the ambient space is endowed with. In fact, the Riemannian gradient of a manifold-to-scalar function represents the rate of change of such function at a given point toward a given direction. The superscript

^{M}

may be removed for simplicity.

Let us survey the calculation of the Riemannian gradient in a number of spaces of interest in applications.

Hypercube: In the space $R^{p}$ endowed with the Euclidean metric, the Gateaux derivative of a regular function $f : R^{p} \to R$ is simply the column array of partial derivatives of function f with respect to the entries of the column array $x = {[x^{(1)} x^{(2)} x^{(3)} \dots x^{(p)}]}^{⊤}$ , namely

${grad}_{x}^{R^{p}} f = \partial_{x} f : = [\begin{matrix} \frac{\partial f}{\partial x^{(1)}} \\ \frac{\partial f}{\partial x^{(2)}} \\ ⋮ \\ \frac{\partial f}{\partial x^{(p)}} \end{matrix}] .$

(254)

Likewise, the Gateaux derivative of a regular function $f : R^{p \times q} \to R$ is the Jacobian matrix of partial derivatives of function f with respect to the entries of the matrix X, denoted by $X^{(i, j)}$ , namely

${grad}_{X}^{R^{p \times q}} f = \partial_{X} f : = [\begin{matrix} \frac{\partial f}{\partial X^{(1, 1)}} & \frac{\partial f}{\partial X^{(1, 2)}} & \dots & \frac{\partial f}{\partial X^{(1, q)}} \\ \frac{\partial f}{\partial X^{(2, 1)}} & \frac{\partial f}{\partial X^{(2, 2)}} & \dots & \frac{\partial f}{\partial X^{(2, q)}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{\partial f}{\partial X^{(p, 1)}} & \frac{\partial f}{\partial X^{(p, 2)}} & \dots & \frac{\partial f}{\partial X^{(p, q)}} \end{matrix}] .$

(255)
Hypersphere: Given a regular function $f : S^{p - 1} \to R$ , its Riemannian gradient at $x \in S^{p - 1}$ is denoted as ${grad}_{x}^{S^{p - 1}} f \in T_{x} S^{p - 1}$ . The Riemannian gradient associated with the canonical metric has the expression:

${grad}_{x}^{S^{p - 1}} f = (I_{p} - x x^{⊤}) \partial_{x} f .$

(256)

In fact, the metric compatibility condition prescribes that $〈 {grad}_{x}^{S^{p - 1}} f, v 〉 = 〈 \partial_{x} f, v 〉$ , for every $v \in T_{x} S^{p - 1}$ ; hence, $〈 {grad}_{x}^{S^{p - 1}} f - \partial_{x} f, v 〉 = 0$ for every $v \in T_{x} S^{p - 1}$ , and therefore ${grad}_{x}^{S^{p - 1}} f - \partial_{x} f \in N_{x} S^{p - 1}$ . It follows that

${grad}_{x}^{S^{p - 1}} f = \partial_{x} f + λ x,$

(257)

for some $λ \in R$ . The tangency condition then entails ${({grad}_{x}^{S^{p - 1}} f)}^{⊤} x = 0$ , hence that ${(\partial_{x} f + λ x)}^{⊤} x = 0$ , namely $λ = - x^{⊤} \partial_{x} f$ , which gives the stated result.
Special orthogonal group: Let us compute the Riemannian gradient of a regular function $f : SO (p) \to R$ . Let the manifold $SO (p)$ be equipped with its canonical metric. Let ${grad}_{X}^{SO (p)} f$ denote the gradient vector of a function f at $R \in SO (p)$ derived from the canonical metric. According to the compatibility condition for the Riemannian gradient it must hold that:

$〈 V, {grad}_{R}^{SO (p)} f 〉 = 〈 V, \partial_{R} f 〉, \forall V \in T_{R} SO (p)$

and therefore:

$〈 V, {grad}_{R}^{SO (p)} f - \partial_{R} f 〉 = 0, \forall V \in T_{R} SO (p) .$

This implies that the quantity ${grad}_{R}^{SO (p)} f - \partial_{R} f$ belongs to the normal space $N_{R} SO (p)$ , namely:

${grad}_{R}^{SO (p)} f = \partial_{R} f + R S, w i t h S b e i n g p \times p s y m m e t r i c .$

(258)

In order to determine the unknown matrix S, we may exploit the tangency condition, namely ${({grad}_{R}^{SO (p)} f)}^{⊤} R + R^{⊤} ({grad}_{R}^{SO (p)} f) = 0_{p}$ . Let us first pre-multiply both sides of Equation (258) by $R^{⊤}$ , which gives:

$R^{⊤} {grad}_{R}^{SO (p)} f = R^{⊤} \partial_{R} f + S .$

The above equation, transposed hand by hand, gives:

${({grad}_{R}^{SO (p)} f)}^{⊤} R = \partial_{R}^{⊤} f R + S .$

Hand-by-hand summation of the last two equations gives:

$R^{⊤} \partial_{R} f + \partial_{R}^{⊤} f R = - 2 S,$

that is:

$S = - \frac{1}{2} (R^{⊤} \partial_{R} f + \partial_{R}^{⊤} f R) .$

(259)

By plugging the expression (259) into the expression (258), we obtain the Riemannian gradient in the orthogonal group, corresponding to its canonical metric:

${grad}_{R}^{SO (p)} f = \frac{1}{2} (\partial_{R} f - R \partial_{R}^{⊤} f R) .$

(260)
Stiefel manifold: Let us compute the expression of the Riemannian gradient of a regular function $f : St (n, p) \to R$ at a point $X \in St (n, p)$ corresponding to the Euclidean and the canonical metrics. Recall that the Riemannian gradient ${grad}_{X}^{St (n, p)} f$ in a Stiefel manifold embedded in the Euclidean space $R^{n \times p}$ is the unique matrix in $T_{X} St (n, p)$ such that:

$tr (U^{⊤} \partial_{X} f) = {〈 U, {grad}_{X} f 〉}_{X}, \forall U \in T_{X} St (n, p) .$

(261)
- Euclidean metric: The metric compatibility condition becomes $tr (U^{⊤} (\partial_{X} f -$ ${grad}_{X}^{St (n, p)} f)) = 0$ , which implies that ${grad}_{X}^{St (n, p)} f = \partial_{X} f + X S$ , with S being $p \times p$ symmetric. Pre-multiplying both sides of this equation by the matrix $X^{⊤}$ yields $S + X^{⊤} \partial_{X} f = X^{⊤} {grad}_{X}^{St (n, p)} f$ . Transposing both hands of the above equation and summing hand by hand yields:
  
  $S = - \frac{1}{2} (\partial_{X}^{⊤} f X + X^{⊤} \partial_{X} f) + \frac{1}{2} ({({grad}_{X}^{St (n, p)} f)}^{⊤} X + X^{⊤} {grad}_{X}^{St (n, p)} f) .$
  
  (262)
  
  From the condition ${grad}_{X}^{St (n, p)} f \in T_{X} St (n, p)$ , according to Equation (46), it follows that ${({grad}_{X}^{St (n, p)} f)}^{⊤} X + X^{⊤} {grad}_{X}^{St (n, p)} f = 0$ . In conclusion, the sought Riemannian gradient reads:
  
  ${grad}_{X}^{St (n, p)} f = \partial_{X} f - \frac{1}{2} X (\partial_{X}^{⊤} f X + X^{⊤} \partial_{X} f) .$
  
  (263)
- Canonical metric: The metric compatibility condition prescribes that:
  
  $tr (U^{⊤} (\partial_{X} f - (I_{n} - \frac{1}{2} X X^{⊤}) {grad}_{X}^{St (n, p)} f)) = 0, \forall U \in T_{X} St (n, p),$
  
  (264)
  
  and therefore, invoking the tangency condition as well, it turns out that:
  
  $\{\begin{matrix} \partial_{X} f - (I_{n} - \frac{1}{2} X X^{⊤}) {grad}_{X}^{St (n, p)} f = X S, \\ S = \frac{1}{2} (X^{⊤} \partial_{X} f + \partial_{X}^{⊤} f X) . \end{matrix}$
  
  (265)
  
  Solving for the Riemannian gradient yields the final expression:
  
  ${grad}_{X}^{St (n, p)} f = \partial_{x} f - X \partial_{X}^{⊤} f X .$
  
  (266)
Real symplectic group: According to the two considered metrics, the expression of the gradient may be computed as outlined below.
- KM metric: The structure of the pseudo-Riemannian gradient of a regular function $f : Sp (2 n) \to R$ associated with the Khvedelidze–Mladenov metric (115) is given by
  
  $\nabla_{Q} f = \frac{1}{2} Q J (Q^{⊤} \partial_{Q} f J - J \partial_{Q}^{⊤} f Q) .$
  
  (267)
  
  In fact, the pseudo-Riemannian gradient of a regular function $f : Sp (2 n) \to R$ associated with the metric (115) is computed as the solution of the following system of equations:
  
  $\{\begin{matrix} tr (Q^{- 1} {grad}_{Q} f Q^{- 1} V) = tr (\partial_{Q}^{⊤} f V), \forall V \in T_{Q} Sp (2 n), \\ {grad}_{Q}^{⊤} f J Q + Q^{⊤} J {grad}_{Q} f = 0 . \end{matrix}$
  
  (268)
  
  The first constraint ensures the compatibility of the pseudo-Riemannian gradient with the chosen metric, while the second constraint enforces the requirement for the pseudo-Riemannian gradient to lay on a specific tangent space. In particular, the metric compatibility condition may be recast as:
  
  $tr (V^{⊤} (\partial_{Q} f - Q^{- ⊤} {grad}_{Q}^{⊤} f Q^{- ⊤})) = 0 .$
  
  (269)
  
  The above condition implies that $\partial_{Q} f - Q^{- ⊤} {grad}_{Q}^{⊤} f Q^{- ⊤} \in N_{Q} Sp (2 n)$ , and hence that $\partial_{Q} f - Q^{- ⊤} {grad}_{Q}^{⊤} f Q^{- ⊤} = J Q H$ with $H \in so (2 n)$ . Therefore, the pseudo-Riemannian gradient of the criterion function f has the expression:
  
  $\nabla_{Q} f = Q \partial_{Q}^{⊤} f Q - Q H J .$
  
  (270)
  
  In order to determine the value of the unknown skew-symmetric matrix H, it is sufficient to plug the expression (270) of the gradient within the tangency condition, which becomes:
  
  ${(Q \partial_{Q}^{⊤} f Q - Q H J)}^{⊤} J Q + Q^{⊤} J (Q \partial_{Q}^{⊤} f Q - Q H J) = 0 .$
  
  (271)
  
  Solving for H gives:
  
  $H = - \frac{1}{2} (J Q^{⊤} \partial_{Q} f + \partial_{Q}^{⊤} f Q J) .$
  
  (272)
  
  Plugging the above expression into (270) gives the result (267).
- Euclidean metric: The structure of the Riemannian gradient $\nabla_{Q} f$ of a regular function $f : Sp (2 n) \to R$ corresponding to the metric (116) reads:
  
  $\nabla_{Q} f = \frac{1}{2} Q J (\partial_{Q}^{⊤} f Q J - J Q^{⊤} \partial_{Q} f) .$
  
  (273)
  
  In fact, the Riemannian gradient $\nabla_{Q} f$ must satisfy the conditions:
  
  $\{\begin{matrix} \nabla_{Q} f = Q J (H - Q^{- 1} J \partial_{Q} f), \\ H = \frac{1}{2} (Q^{- 1} J \partial_{Q} f + \partial_{Q}^{⊤} f Q J), \end{matrix}$
  
  (274)
  
  from which the expression of the Riemannian gradient associated with the metric (116) follows.
Manifold of symmetric, positive-definite matrices: The Riemannian gradient ${grad}_{P} f$ of the function $f : S^{+} (p) \to R$ may be calculated as the unique vector in $T_{P} S^{+} (p)$ that satisfies the following equation:

$tr (V \partial_{P} f) = tr (V P^{- 1} ({grad}_{P}^{S^{+} (p)} f) P^{- 1}), \forall V \in T_{P} S^{+} (n) .$

(275)

The solution of the above equation satisfies:

$\partial_{P} f - P^{- 1} ({grad}_{P}^{S^{+} (p)} f) P^{- 1} = \frac{1}{2} (\partial_{P} f - \partial_{P}^{⊤} f),$

(276)

and hence the expression of the sought Riemannian gradient follows:

${grad}_{P}^{S^{+} (p)} f = \frac{1}{2} P (\partial_{P} f + \partial_{P}^{⊤} f) P .$

(277)
Grassmann manifold: The Riemannian gradient ${grad}_{[X]} f$ may be calculated by its definition and reads as:

$\nabla_{[X]}^{Gr (n, p)} f = (I_{n} - X X^{⊤}) \partial_{X} f .$

(278)

The above calculations may be conveniently unified by recalling the notion of metric kernel, which affords turning the metric compatibility condition into a projection. Let us recall that the inner product

{〈 v, u 〉}_{x}

may be written as

{〈 v, G_{x} (u) 〉}^{A}

. On the basis of such an equality, the metric compatibility condition may be rewritten as

{〈 G_{x} ({grad}_{x}^{M} f), v 〉}^{A} = {〈 \partial_{x} f, v 〉}^{A} f o r e v e r y v \in T_{x} M .

(279)

Equivalently,

{〈 G_{x} ({grad}_{x}^{M} f) - \partial_{x} f, v 〉}^{A} = 0

for every

v \in T_{x} M

. By the definition of a normal space, such a condition may be rewritten as

G_{x} ({grad}_{x}^{M} f) - \partial_{x} f \in N_{x} M .

(280)

In practical terms, the Gateaux derivative

\partial_{x} f \in A

may be uniquely decomposed into the sum of two terms, a tangent one (that it, the gradient) and a normal one. Upon discarding the normal component, the remainder is the sought gradient.

The latter observation may be expressed using the notion of orthogonal projection. Let us define an orthogonal projector

Π_{x} : A \to T_{x} M

as an operator that maps an element of the ambient space to a tangent vector in a specific tangent space. Let us observe that

Π_{x} (a) \in T_{x} M \Leftrightarrow a - Π_{x} (a) \in N_{x} M .

(281)

The projection operator

Π_{x} : A \to T_{x} M

is defined by the two conditions:

Tangency: $Π_{x} (a) \in T_{x} M$ for $x \in M$ and $a \in A$ .
Complementarity: ${〈 v, a - Π_{x} (a) 〉}^{A} = 0$ for all $v \in T_{x} M$ .

Notice that if

ν \in N_{x} M

then

Π_{x} (ν) = 0

, while if

v \in T_{x} M

then

Π_{x} (v) = v

.

Observation 5.

Orthogonal projection stems from a minimization problem, namely, given a point

a \in A

and a vector space

V

, the orthogonal projection of a on

V

is the shortest vector in

V

that approximates a, namely

min_{v \in V} {〈 a - v, a - v 〉}^{A} .

(282)

Applying orthogonal projection to both sides of the relationship (280) leads to

Π_{x} (G_{x} ({grad}_{x}^{M} f)) - Π_{x} (\partial_{x} f) = 0,

(283)

but since

Π_{x} (G_{x} ({grad}_{x}^{M} f)) = G_{x} ({grad}_{x}^{M} f)

, we may write the explicit formula

{grad}_{x}^{M} f = G_{x}^{- 1} (Π_{x} (\partial_{x} f)) .

(284)

The latter expression is quite suggestive, as it states that the Riemannian gradient of a function may be computed as the projection of its Gateaux derivative over a tangent space, to make it a tangent vector, further compensated by the metric kernel to take into account the effect of the metric (namely, to make it sure that the inner product of gradient with any tangent vector is independent of the metric). Notice that

{〈 {grad}_{x}^{M} f, v 〉}_{x} = {〈 G_{x}^{- 1} (Π_{x} (\partial_{x} f)), G_{x} (v) 〉}^{A}

. Since

G_{x}

is self-adjoint, such expression simplifies into

{〈 Π_{x} (\partial_{x} f), v 〉}^{A}

which is, in fact, independent of the metric kernel.

Let us summarize a few expressions of interest of orthogonal projection.

Hypersphere: An expression of an orthogonal projector $Π_{x} : R^{n} \to T_{x} S^{p - 1}$ , for $x \in S^{p - 1}$ , where the ambient space $R^{p}$ is endowed with a Euclidean metric, is:

$Π_{x} (a) : = (I_{p} - x x^{⊤}) a .$

(285)

Notice that $a - Π_{x} (a) = (x^{⊤} a) x$ is radial (that is, directed along x) and hence normal.
Stiefel manifold: It might be useful to define an orthogonal projection operator $Π_{X} : R^{n \times p} \to T_{X} St (n, p)$ , for $X \in St (n, p)$ . Let us assume the ambient space $A : = R^{n \times p}$ be endowed with a Euclidean metric ${〈 U, V 〉}^{A} : = tr (U^{⊤} V)$ . In this case, the orthogonal projection takes the expression

$Π_{X} (U) : = U - \frac{1}{2} X (X^{⊤} U + U^{⊤} X),$

(286)
Grassmann manifold: An expression of orthogonal projection $Π_{[X]} : R^{n \times p} \to T_{[X]} Gr$ $(n, p)$ , for $[X] \in Gr (n, p)$ is

$Π_{[X]} (V) : = (I_{n} - X X^{⊤}) V .$

(287)

Let us apply the above considerations to a quadratic function on the hypersphere.

Example 25.

Let us consider the function

f : S^{n - 1} \to R

defined as:

f (x) : = \frac{1}{2} x^{⊤} P_{0} x,

(288)

with

P_{0} \in S^{+} (n)

being constant. We assume the hypersphere to be embedded in the ambient space

R^{n}

endowed with a Euclidean metric. The Euclidean gradient (and also the Gateaux derivative) of such a function reads as

\frac{\partial f}{\partial x} = P_{0} x .

(289)

Since, on the hypersphere,

Π_{x} (a) = (I - x x^{⊤}) a

, the Riemannian gradient of the function f takes the expression

{grad}_{x} f = (I_{n} - x x^{⊤}) P_{0} x = P_{0} x - (x^{⊤} P_{0} x) x .

(290)

■

Let us verify that the expression (286) is indeed an orthogonal projection over the tangent bundle

T St (n, p)

.

Example 26.

The expression

Π_{X} (U) : = U - \frac{1}{2} X (X^{⊤} U + U^{⊤} X)

realizes an orthogonal projection over the tangent space

T_{X} St (n, p)

. To prove such a statement, it suffices to show that

Π_{X} (U) \in T_{X} St (n, p)

and that

{〈 U - Π_{X} (U), V 〉}^{A} = 0

for every

V \in T_{X} St (n, p)

, which represent the conditions of tangency and of orthogonality, respectively. To what concerns the first property:

\begin{matrix} X^{⊤} Π_{X} (U) + Π_{X}^{⊤} (U) X \\ = X^{⊤} U + U^{⊤} X - \frac{1}{2} (X^{⊤} X) (X^{⊤} U + U^{⊤} X) - \frac{1}{2} (X^{⊤} U + U^{⊤} X) (X^{⊤} X) \\ = 0 . \end{matrix}

(291)

To what concerns the second property, notice that

\begin{matrix} {〈 U - Π_{X} (U), V 〉}^{A} \\ = tr (V^{⊤} (U - U + \frac{1}{2} X (X^{⊤} U + U^{⊤} X)) \\ = \frac{1}{2} tr (V^{⊤} X (X^{⊤} U + U^{⊤} X)) . \end{matrix}

(292)

Now observe that

H : = V^{⊤} X

is skew-symmetric, while

S : = X^{⊤} U + U^{⊤} X

is symmetric, for every

X \in St (n, p)

,

U \in A

and

V \in T_{X} St (n, p)

. Since

tr (H S) \equiv 0

, the statement holds true. ■

A further interesting counterexample clarifies the notion of orthogonality in orthogonal projection.

Example 27.

Let us consider the function

Π_{X} (U) : = U - X U^{⊤} X

which realizes a projection over the tangent bundle

T St (n, p)

but not an orthogonal one as long as the Euclidean metric is concerned. Showing that it indeed realizes a projection is fairly easy; in fact,

\begin{matrix} X^{⊤} Π_{X} (U) + Π_{X}^{⊤} (U) X \\ = X^{⊤} U - U^{⊤} X + U^{⊤} X - X^{⊤} U \\ = 0 . \end{matrix}

(293)

Likewise, we can show that such a projection isnotorthogonal in the Euclidean metric; in fact,

\begin{matrix} {〈 U - Π_{X} (U), V 〉}^{A} \\ = tr (V^{⊤} (U - U + X U^{⊤} X)) \\ = tr ((V^{⊤} X) (U^{⊤} X)) . \end{matrix}

(294)

Since the matrix product

U^{⊤} X

is arbitrary, the latter expression is not guaranteed to be null. ■

As a last example, let us consider the case of the symplectic group endowed with its canonical metric. The orthogonal projection is not easy to express in closed form.

Example 28.

Let us consider the symplectic group

Sp (2 n)

embedded into the ambient space

A : = R^{2 n \times 2 n}

endowed with a Euclidean metric

{〈 A, B 〉}^{A} : = tr (A^{⊤} B)

. By definition of orthogonal projection,

Π_{Q} (A)

must be an element of the tangent space

T_{Q} Sp (2 n)

, while

A - Π_{Q} (A)

must be an element of the normal space

N_{Q} Sp (2 n)

. According to the characterization (49), it should then hold

A - Π_{Q} (A) = J Q H,

(295)

namely

Π_{Q} (A) = A - J Q H

, with H skew-symmetric to be determined. From the tangency condition, it follows that

{(A - J Q H)}^{⊤} J Q + Q^{⊤} J (A - J Q H) = 0 .

(296)

Through some manipulations, the above equation may be rewritten as

H Q^{⊤} Q + Q^{⊤} Q H = - A^{⊤} J Q + Q^{⊤} J^{⊤} A .

(297)

All quantities in the above equation are known except for the unknown skew-symmetric matrix H. The above equation is an instance of the more general Sylvester equation (for a review, see, e.g., [55]).

According to [55], there exist a number of expressions to write the solution to such an equation. One solution is based on the Kronecker operations. In particular, in this case we can use the Kronecker sum

I_{2 n} \oplus Q^{⊤} Q

and the vectorization operator ‘

vec

’ to rewrite Equation (297) as

(I_{2 n} \oplus Q^{⊤} Q) vec (H) = vec (- A^{⊤} J Q + Q^{⊤} J^{⊤} A) .

(298)

The orthogonal projection operation may therefore be expressed as

Π_{Q} (A) = A - J Q {vec}^{- 1} {{(I_{2 n} \oplus Q^{⊤} Q)}^{- 1} vec (- A^{⊤} J Q + Q^{⊤} J^{⊤} A)} .

(299)

A more elegant expression stems from an integral representation of the solution of the Sylvester Equation (297), namely

Π_{Q} (A) = A + J Q \int_{0}^{+ \infty} Exp (Q^{⊤} Q t) (- A^{⊤} J Q + Q^{⊤} J^{⊤} A) Exp (Q^{⊤} Q t) d t .

(300)

For the integral representation of the solution of a Sylvester equation, we refer the readers to the aforementioned review paper [55]. We mention that none of these solutions are efficient from a computational point of view.

9.2. Application of Riemannian Gradient to Optimization on Manifold

An important application of the Riemannian gradient, which is of specific interest in system theory and non-linear control, is gradient-based mathematical optimization, which we are going to briefly survey in the remainder of the present section.

Let us preliminarily observe that the relationship (250) that defines the notion of Riemannian gradient possess a further interpretation. Let us take a regular function

f : M \to R

, a regular curve

γ : [- ϵ, ϵ] \to M

and a metric

〈 \cdot, \cdot 〉

in

T M

. The functions f and

γ

may be composed as

f \circ γ : [- ϵ, ϵ] \to R

. Therefore, from the relation (250), it follows that:

\frac{d}{d t} f (γ (t)) = {〈 {grad}_{γ (t)} f, \dot{γ} (t) 〉}_{γ (t)}

(301)

along the curve

γ

. As a matter of fact, such a relation is a counterpart of the familiar rule of derivation for composite functions

\frac{d}{d t} f (g (t)) = \frac{d f}{d g} (g (t)) \frac{d g}{d t} (t) .

(302)

To determine the maximum of a function on a manifold it is possible to make use of a gradient method. Such a method is based on the following important property: given a regular function

f : M \to R

, the solution

x (t)

of the following initial value problem

\dot{x} (t) = \pm {grad}_{x (t)} f, x (0) = x_{0} \in M,

(303)

is such that

lim_{t \to + \infty} x (t) = x_{★}

(304)

where

x_{★}

denotes a point of local minimum of the function f near

x_{0}

if one chooses the sign ‘−’ in front of the gradient; otherwise,

x_{★}

denotes a local maximum of the function f near

x_{0}

if one chooses the sign ‘+’ in front of the gradient.

For a function with domain

R^{n}

, the extremal points are among those that make its partial derivatives vanish to zero. A similar result holds for functions whose domain is a smooth manifold. Given a differentiable function

f : M \to R

, the extremal points are those

x \in M

for which:

{grad}_{x} f = 0 .

(305)

This observation may be exploited

To determine those points $x_{★}$ to which the solutions of the differential Equation (303) will tend toward;
Considering that Equation (305) is often non-linear and difficult to solve in closed form, while the differential Equation (303) may be solved (albeit approximately) by numerical recipes, the differential Equation (303) may be considered as a way to solve an equation such as (305).

On the basis of the relation (301), it is quite straightforward to show the following fundamental result.

Theorem 4.

The differential equation

\dot{x} (t) = - {grad}_{x (t)} f, x (0) = x_{0} \in M,

(306)

generates trajectories toward points of the domain corresponding to ever-decreasing values of the function f.

Proof.

Let us observe that

\dot{f} = {〈 {grad}_{x} f, \dot{x} 〉}_{x} = {〈 {grad}_{x} f, - {grad}_{x} f 〉}_{x} = - {∥ {grad}_{x} f ∥}_{x}^{2} \leq 0,

(307)

where equality holds if and only if

{grad}_{x} f = 0

. □

In a fully analogous way, upon choosing the sign + in front of the gradient, the corresponding differential equation generates trajectories tending toward ever-increasing values of the function f. Notice that the above facts are completely independent of the initial value

x_{0}

and from the chosen metric.

In practical terms, it is safe to assume that the function f admits maxima and minima. For example, in the case of a continuous function f defined on a compact manifold, such as the hypersphere

S^{n - 1}

, a theorem by Weierstrass ensures that f admits at least a maximum and a minimum. (In fact, the Weierstrass extreme value theorem holds for topological spaces which any manifold is an instance of.)

To summarize, the differential Equation (303) may be exploited to look for the extremal points of a function:

The dynamical system $\dot{x} (t) = + {grad}_{x (t)} f, x (0) = x_{0} \in M$ generates a trajectory in the state space $M$ that tends toward a point of maximum of the function f located near the initial state $x_{0}$ ,
Conversely, the dynamical system $\dot{x} (t) = - {grad}_{x (t)} f, x (0) = x_{0} \in M$ generates a trajectory in the space $M$ that tends toward a point of minimum of the function f located near the initial state $x_{0}$ .

The locality property of such a gradient-based extremum search is due to the monotonicity property (307), which essentially shows that the trajectory of such systems fall in the basin of attraction of the extremal point that the initial condition also belongs to. In practice, the choice of an initial condition (also termed, in this context, initial guess), is a sensitive step that determines the success or the failure of extremal-point searching. In fact, a function f might possess several extremal points (namely, local maxima and minima). It pays to keep in mind that the extremal-point searching method based on the first-order dynamical systems (303) is able to determine only an extremal point at a time and, in particular, only the one nearest to the initial guess

x_{0}

. It is therefore important to put into effect some pre-estimation technique to initialize the search procedure correctly.

From an algorithmic point of view, as opposed to what is required in numerical simulation where precision is a common demand, in the case of optimization by a gradient method, only the last point of the trajectory, namely

x_{★}

, needs to be approximated with good precision, and hence the employed numerical method does not need to be of a high order.

Example 29.

Let us consider Oja’s equation [56,57]

\dot{x} (t) = (I_{n} - x (t) x^{⊤} (t)) P_{0} x (t), x (0) = x_{0} \in S^{n - 1},

(308)

with

P_{0} \in S^{+} (n)

. As we have seen in the previous example, the right-hand side of Oja’s equation coincides with the Riemannian gradient of the function (often termed ‘Rayleigh quotient’)

f (x) : = \frac{1}{2} x^{⊤} P_{0} x

over the unit hypersphere endowed with its canonical metric. Such a function is quadratic, continuous and differentiable and is defined over a compact space; hence, it admits a maximum over the space

S^{n - 1}

. Indeed, it admits n local maxima. The differential Equation (308) hence affords determining one of its local maxima depending on the initial condition.

The extremal points are those that make the gradient

{grad}_{x} f = P_{0} x - (x^{⊤} P_{0} x) x

vanish to zero, namely the solutions

x \in S^{n - 1}

to the equation

P_{0} x = (x^{⊤} P_{0} x) x .

(309)

Clearly, the solutions

x_{★}

coincide with the eigenvectors of the symmetric, positive-definite matrix

P_{0}

. Let us observe that the eigenvalue associated with the eigenvector

x_{★}

is

λ_{★} : = x_{★}^{⊤} P_{0} x_{★}

; hence,

f (x_{★}) = \frac{1}{2} λ_{★}

. Since the dynamical system (308) looks for the maximal value of the function f, Oja’s equation generates a trajectory that tends toward the eigenvector of

P_{0}

corresponding to its maximal eigenvalue. ■

The differential equations of the kind

\dot{x} (t) = {grad}_{x (t)} f,

(310)

represent an instance of dynamical systems of the first order, on a manifold, of the gradient type. (Notice that to switch the sign in front of the gradient, it suffices to take the function

- f

instead of f. Likewise, to rescale the gradient of a factor

η > 0

, it suffices to take

η f

instead of f.)

9.3. A Golden Gradient Rule: Gradient of Squared Distance

A golden calculation rule involving the notion of Riemannian gradient and that of Riemannian distance is as follows:

{grad}_{x} d^{2} (x, y) = - 2 {log}_{x} y .

(311)

The formal proof of this result which, of course, holds under technical conditions, traces back to [53]. It is instructive to verify this property in a couple of cases.

Example 30.

Let us verify the calculation rule (311) for the familiar case that

M = R^{p}

. We have already shown that

d^{2} (x, y) = {∥ {log}_{x} y ∥}^{2}

. In the case of the hypercube

R^{p}

, endowed with a Euclidean metric, it holds that

{log}_{x} y = y - x

; therefore,

d^{2} (x, y) = {∥ x - y ∥}^{2}

, the familiar Euclidean distance in

R^{p}

. Now, let us observe that

\frac{{\partial ∥ x - y ∥}^{2}}{\partial x} = 2 (x - y),

(312)

and henceforth

\frac{\partial d^{2} (x, y)}{\partial x} = - 2 (y - x) = - 2 {log}_{x} y .

(313)

In addition, we can verify such a property for the hypersphere

S^{p - 1}

. Let us recall that

d^{2} (x, y) = \arccos^{2} (x^{⊤} y)

; therefore, it holds that

\frac{\partial d^{2} (x, y)}{\partial x} = 2 d (x, y) (- \frac{1}{\sqrt{1 - {(x^{⊤} y)}^{2}}}) y .

(314)

By the calculation rule for the gradient on the unit hypersphere, we obtain

{grad}_{x} d^{2} (x, y) = - \frac{2 d (x, y)}{sin d (x, y)} (I_{p} - x x^{⊤}) y = - 2 \frac{(I_{p} - x x^{⊤}) y}{sinc d (x, y)}

(315)

which coincides precisely with

- 2 {log}_{x} y

on the unit hypersphere. ■

For those who are mathematically minded, it is interesting to read a sketch of proof of the golden rule (311) based on properties that we have already surveyed.

Theorem 5.

Let us consider a Riemannian manifold

(M, 〈 \cdot, \cdot 〉

) embedded in an ambient space

(A, {〈 \cdot, \cdot 〉}^{A})

and a pair of points

x, y \in M

such that

d (x, y)

and

{log}_{x} y

are well defined. The Riemannian gradient of the squared distance

d^{2} (x, y)

with respect to the variable x equals

- 2 {log}_{x} y

.

Proof.

Let us define the following functions:

An arbitrary sufficiently smooth curve $α_{x} (t)$ such that $α_{x} (0) = x$ , while ${\dot{α}}_{x} (t)$ is arbitrary;
The fundamental form $F (x, v) : = {〈 v, v 〉}_{x}$ ;
A geodesic curve $c (t, s)$ connecting the point y to the point $α_{x} (t)$ , emanating from the former, with parameter $s \in [0, 1]$ ;
The partial derivatives $\dot{c} (t, s) : = \frac{\partial c}{\partial t} (t, s)$ and $c^{'} (t, s) : = \frac{\partial c}{\partial s} (t, s)$ .

In addition, notice that the following properties hold:

(P₁): $c (t, 0) = y$ , therefore $\dot{c} (t, 0) = 0$ ;
(P₂): $c (t, 1) = α_{x} (t)$ , therefore $\dot{c} (t, 1) = {\dot{α}}_{x} (t)$ ;
(P₃): $c^{'} (t, 1) = - {log}_{α_{x} (t)} y$ . (In fact, notice that if $γ_{y}^{x} : [0, 1] \to M$ denotes a geodesic from x to y then it holds that ${\dot{γ}}_{x}^{y} (0) = {log}_{y} x$ and $γ_{y}^{x} (s) = γ_{x}^{y} (1 - s)$ ; therefore, ${\dot{γ}}_{y}^{x} (1) = - {\dot{γ}}_{x}^{y} (0) = - {log}_{y} x$ .)

Let us recall that the squared Riemannian distance between two points may be written in terms of the fundamental form, namely

d^{2} (α_{x} (t), y) = F (c (t, s), c^{'} (t, s)),

(316)

where the right-hand side of the above equation is independent of s, as we have shown in Section 8. Let us integrate both sides of the above equation with respect to s in the interval

[0, 1]

. This gives:

d^{2} (α_{x} (t), y) = \int_{0}^{1} F (c (t, s), c^{'} (t, s)) d s .

(317)

Now, let us compute the derivative of both sides with respect to the parameter t:

\begin{matrix} \frac{d}{d t} d^{2} (α_{x} (t), y) = & \int_{0}^{1} \frac{d}{d t} F (c (t, s), c^{'} (t, s)) d s \\ = & \int_{0}^{1} {〈\frac{\partial F}{\partial c} (t, s), \frac{\partial c}{\partial t} (t, s)〉}^{A} d s + \int_{0}^{1} {〈\frac{\partial F}{\partial c^{'}} (t, s), \frac{\partial c^{'}}{\partial t} (t, s)〉}^{A} d s . \end{matrix}

(318)

The second integral may be rewritten through the rule of integration by parts upon observing that

\frac{\partial c^{'}}{\partial t} = \frac{\partial \dot{c}}{\partial s}

(Schwarz) and recalling that

\frac{\partial F (x, v)}{\partial v} = 2 G_{x} (v)

, where G denotes the metric kernel associated with the metrics in

M

and

A

. The application of such rule gives:

\begin{matrix} \int_{0}^{1} {〈\frac{\partial F}{\partial c^{'}} (t, s), \frac{\partial c^{'}}{\partial t} (t, s)〉}^{A} d s = & {{〈 2 G_{x} (c^{'} (t, s)), \dot{c} (t, s) 〉}^{A}|}_{s = 0}^{1} \\ - \int_{0}^{1} {〈\frac{d}{d t} \frac{\partial F}{\partial c^{'}} (t, s), \dot{c} (t, s)〉}^{A} d s . \end{matrix}

(319)

Putting the pieces together gives:

\begin{matrix} \frac{d}{d t} d^{2} (α_{x} (t), y) = & \int_{0}^{1} {〈\frac{\partial F}{\partial c} (t, s) - \frac{d}{d t} \frac{\partial F}{\partial c^{'}} (t, s), \dot{c} (t, s)〉}^{A} d s \\ + {{〈 2 c^{'} (t, s), \dot{c} (t, s) 〉}_{c (t, s)}|}_{s = 0}^{1} \end{matrix}

(320)

Since

c (\cdot, s)

is a geodesic, the first integral is null due to the differential inclusion (153); therefore, we rest with

\frac{d}{d t} d^{2} (α_{x} (t), y) = {〈 2 c^{'} (t, 1), \dot{c} (t, 1) 〉}_{c (t, 1)} - {〈 2 c^{'} (t, 0), \dot{c} (t, 0) 〉}_{c (t, 0)} .

(321)

Now, due to property

P_{1}

, the second term on the right-hand side of the above equality is null; hence, thanks to properties

P_{2}

–

P_{3}

, the above equality simplifies to

\frac{d}{d t} d^{2} (α_{x} (t), y) = 2 {〈 - {log}_{α_{x} (t)} y, {\dot{α}}_{x} (t) 〉}_{α_{x} (t)} .

(322)

Setting

t = 0

and recalling the definition of gradient (249) proves the golden rule. □

Let us emphasize how useful such a result is, since in non-linear control a number of functions to differentiate are based on distances. Let us apply the golden calculation rule to a non-linear control problem known as consensus optimization.

Example 31.

A category of dynamical systems of gradient type on manifolds are those associated withconsensus optimizationamong a set of agents [58,59]. Let us consider, as an example, a dynamical system that affords reaching attitudinal consensus in a fleet of flying objects (such as drones).

The attitude of a flying agent (namely, its orientation in space) may be represented by a rotation matrix

R \in SO (3)

that summarizes the three attitudinal coordinatesroll,pitchandyaw. A fleet of N flying agents may be described as a set of N rotation matrices

R_{i} \in SO (3)

with

i = 1, 2, 3, \dots, N

.

The attitude of each agent in the fleet is then described by a time-function

R_{i} (t)

. Consensus optimization may be formulated as follows:

Given a metric for the manifold $SO (3)$ , it is possible to define a distance function $d : SO (3) \times SO (3) \to R_{0}^{+}$ between any pair of attitude matrices in $SO (3)$ ;
A hierarchy is established among the agents in a fleet through a set of weights $w_{i j} \geq 0$ ;
Then, consensus optimization consists of determining an evolution law for each agent to minimize the distance between any pair of attitude matrices weighted according to the assigned hierarchy, namely, to minimize the function

$f_{i} : = \frac{1}{2} \sum_{j = 1}^{N} w_{i j} d^{2} (R_{i}, R_{j}), w i t h i = 1, 2, 3, \dots, N .$

(323)

Notice that $d (R_{i}, R_{i}) = 0$ ; hence, the values assigned to the diagonal coefficients $w_{i i}$ are unimportant.

On the basis of the calculation rule (311), it turns out that

{grad}_{R_{i}} f_{i} = - \sum_{j = 1}^{N} w_{i j} {log}_{R_{i}} R_{j} .

(324)

The set of gradient-type control systems to achieve consensus optimization within the fleet reads:

{\dot{R}}_{i} (t) = \sum_{j = 1}^{N} w_{i j} {log}_{R_{i} (t)} R_{j} (t), R_{i} (0) = R_{i}^{0} \in SO (3),

(325)

with

i = 1, 2, 3, \dots, N

and

t \geq 0

. (Notice that

{log}_{R} R = 0

.) ■

9.4. Riemannian Gradient in Coordinates*

Let

f : M \to R

denote a smooth manifold-to-real function, whose gradient at a point

x \in M

is sought. Let us fix a coordinate system on a local chart that includes the point x whose local coordinates are

x^{1}, x^{2}, \dots x^{p}

.

In order to determine the gradient

{grad}_{x} f

, let us make use of the metric compatibility condition, recalling that

The Euclidean inner product reads $〈 \partial_{x} f, v 〉 = \frac{\partial f}{\partial x^{k}} v^{k}$ .
The inner product reads ${〈 {grad}_{x} f, v 〉}_{x} = g_{i k} {({grad}_{x} f)}^{i} v^{k}$ .

The metric compatibility condition requires that

\frac{\partial f_{x}}{\partial x^{k}} v^{k} = g_{i k} {({grad}_{x} f)}^{i} v^{k} for every choice of v,

(326)

hence,

g_{i k} {({grad}_{x} f)}^{i} = \frac{\partial f_{x}}{\partial x^{k}}

. Multiplying both sides by

g^{h k}

one obtains

g^{h k} g_{i k} {({grad}_{x} f)}^{i} = g^{h k} \frac{\partial f_{x}}{\partial x^{k}}

. Since

g^{h k} g_{i k} = δ_{i}^{h}

, we obtain

{({grad}_{x} f)}^{h} = g^{h k} \frac{\partial f_{x}}{\partial x^{k}} .

(327)

The Riemannian gradient in intrinsic coordinates is therefore expressed as

{grad}_{x} f = g^{h k} \frac{\partial f_{x}}{\partial x^{k}} \partial_{h}^{x} .

(328)

In manifold calculus, the Riemannian gradient has yet another interpretation in relation to the differential of a manifold-to-real function and to so-called musical isomorphisms. In fact, there exist canonical operators to convert contravariant components to covariant (also referred to as lowering an index) or vice versa (raising an index).

Given a regular function

f : M \to R

, its differential

d f : T M \to T^{*} M

is expressed by:

d f_{x} (v) = {〈 {grad}_{x} f, v 〉}_{x}, v \in T_{x} M,

(329)

where

{grad}_{x} f : M \to T_{x} M

denotes its Riemannian gradient. Namely, the differential represents the linear part of the change of a function when moving away from a point x in the direction v. In local coordinates:

d f_{x} (v) = g_{i j} (x) {(\nabla_{x} f)}^{i} v^{j} .

(330)

Since the differential does not depend on the choice of coordinates, it must hold that

d f_{x} (v) = {(\partial_{i} f)}_{x} v^{i}

. It follows the metric-compatibility condition

g_{i j} (x) {(\nabla_{x} f)}^{j} = {(\partial_{i} f)}_{x}

. Such expressions may be interpreted through ‘musical’ isomorphisms:

Sharp isomorphism ( $^{♯}$ ): The ‘sharp’ isomorphism $^{♯} : T^{*} M \to T M$ takes a cotangent vector and returns a tangent vector. Namely, given a cotangent vector $ω = ω_{i} d x^{i}$ , the sharp isomorphism acts like $v : = ω^{♯} = g^{i j} ω_{i} \partial_{j}$ . For example, in the case of gradient, one may say that ${grad}_{x} f = {(d f_{x})}^{♯}$ .
Flat isomorphism ( $^{♭}$ ): The ‘flat’ isomorphism $^{♭} : T M \to T^{*} M$ takes a tangent vector and returns a cotangent vector. Namely, given a tangent vector $v = v^{i} \partial_{i}^{x}$ , the flat isomorphism acts like $ω : = v^{♭} = g_{i j} v^{i} d x^{j}$ . Then, the differential is the dual of gradient via a flat isomorphism.

10. Parallelism and Parallel Transport along a Curve

Parallel transport along a curve on a manifold is a fundamental concept that emerges from the curved nature of manifolds and is deeply interwoven in the fabric of manifold calculus.

10.1. Properties and Definition of Parallel Transport

Given a curve on a Riemannian manifold

M

embedded into a Euclidean space

A

, a point

x \in M

on the curve and a tangent vector

v \in T_{x} M

to the curve, if such a vector is moved to another point on the curve by ordinary rigid translation (which is available in

A

and may be referred to as ‘parallel translation’), in general it will not be tangent to the curve anymore. This observation suggests that it is necessary to define a notion of transport that is compatible with the structure of the manifold.

Another reason to motivate the notion of parallel transport is the lack of existence of ‘uniform’ vector fields on curved manifolds. On a flat space, such as

R^{p}

, a uniform vector field

u : R^{p} \to R^{p}

is a field

x \mapsto u_{x}

such that

u_{x}

is constant (same direction, length and orientation) for every

x \in R^{p}

. On a curved manifold embedded in an ambient space

A

, however, it is unlikely that a tangent vector field may take the same value in two points, because tangency at one point would most likely imply lack of tangency in another point. The closest notion to uniformity would be parallelism: Informally speaking, a vector field is said to be parallel along a curve if it keeps the same orientation with respect to such a curve’s own velocity field. Notice that this property is expressed in quite vague terms: such vagueness is indeed intended and implies that there exist infinitely many ways to interpret the notion of parallel transport.

Example 32.

Let

M

denote a Riemannian manifold and

x \mapsto w_{x}

a tangent vector field on the tangent bundle

T M

; namely, the function w assigns a tangent vector to each point x on the manifold. In the familiar case that

M = R^{3}

, there exists a notion ofuniformvector field; namely, a vector field may take the same amplitude and direction in any point of the base-space

M

. This is the case, for example, of an electric field within an ideal capacitor or an induction field within an ideal electrical coil. A distinguishing feature of a uniform vector field in

R^{3}

is that its directional derivative is zero everywhere; namely, given any point

x \in R^{3}

and any direction

v \in T_{x} R^{3}

, it holds that

lim_{h \to 0} \frac{w_{x + h v} - w_{x}}{h} = 0,

(331)

where the left-hand side represents the directional derivative of the vector field

w \in Γ (M)

along a direction v at the point x.

On a curved manifold

M

, uniform vector fields hardly exist, as the rigid translation of a tangent vector to a different point of a manifold will most likely result in a non-tangent vector. The closest notion to uniformity that may be recovered on a curved manifold is that of parallelism. A parallel vector field needs to satisfy a generalized version of the condition (331) (which we shall not see in this survey since it requires covariant derivation, which will be covered in a separate survey). ■

The notion of parallelism of a vector field gives rise to an important canonical operator in manifold calculus, namely, parallel transport along a curve. Parallel transport allows one to, e.g., compare vectors belonging to different tangent spaces. On a smooth Riemannian manifold

M

, fix a smooth curve

γ : [0, 1] \to M

. The parallel transport map associated with the curve

γ

is denoted as

P_{γ}^{s \to t} : T_{γ (s)} M \to T_{γ (t)} M

, which is a linear map for every

s, t \in [0, 1]

, namely

P_{γ}^{s \to t} (α w + β v) = α P_{γ}^{s \to t} (w) + β P_{γ}^{s \to t} (v)

(332)

for every

α, β \in R

and

u, v \in T_{γ (s)} M

. The parallel transport map depends smoothly on its arguments and is such that

P_{γ}^{s \to s}

coincides with the identity map and

P_{γ}^{u \to t} \circ P_{γ}^{s \to u} = P_{γ}^{s \to t}

for every

s, u, t \in [0, 1]

. (Notice that the last property holds on the same curve only, while it does not hold for different arcs.)

In most cases of importance in system theory and control, the main interest lies in the parallel transport of a tangent vector along a geodesic curve on a Riemannian manifold, rather than along a generic smooth curve. Parallel transport along a geodesic arc may be conjugated in at least two ways:

Parallel transport along a geodesic arc joining two given points: Given two (sufficiently close) points $x, y \in M$ , and a tangent vector $w \in T_{x} M$ , the parallel transport of w from x to y along the geodesic arc connecting x to y is denoted by $P^{x \to y} (w)$ , since this notation shows all relevant information. In fact, the notation $P^{x \to y}$ for the parallel transport operator is a shortened version of $P_{γ_{x}^{y}}^{0 \to 1} : T_{x} M \to T_{y} M$ , where $γ_{x}^{y} : [0, 1] \to M$ would denote the geodesic arc such that $γ_{x}^{y} (0) = x$ and $γ_{x}^{y} (1) = y$ .
Parallel transport along a geodesic specified by a point and a tangent direction: Given a point $x \in M$ , and a tangent direction $v \in T_{x} M$ , the parallel transport of a vector $w \in T_{x} M$ along the geodesic arc departing from the point x toward the direction v is denoted by $P_{γ_{x, v}}^{0 \to t} (w)$ , where $γ_{x, v} : [- ϵ, ϵ] \to M$ , with $ϵ > 0$ and $t \in [- ϵ, ϵ]$ , would denote the geodesic curve such that $γ_{x, v} (0) = x$ and ${\dot{γ}}_{x, v} (0) = v$ .

There exist other special cases of interest in control and automation. One such special case supplies a partial answer to the problem of effecting parallel transport on a manifold for which the structure of the parallel transport operator is unknown. Another special case is parallel transport along a closed loop, which may arise when dealing with periodic trajectories or self-intersecting trajectories.

Self parallel transport along a geodesic arc: A distinguishing feature of a geodesic arc is that it parallel-transports its own initial slope. On a setting where the parallel transport operator is unknown in closed form, there exists no known way to transport a given vector w along a geodesic curve $γ_{x, v}$ . However, it is possible to parallel-transport a given vector v along a geodesic $γ_{x, v}$ , namely to compute $P_{γ_{x, v}}^{0 \to t} (v)$ : it coincides exactly with ${\dot{γ}}_{x, v} (t)$ . Such a numerical trick was invoked, for instance, in [39], Subsection III.A.
Parallel transport along a closed loop: Parallel transport of a vector along a (piece-wise continuous) loopℓ is denoted as $P_{ℓ} : T_{x} M \to T_{x} M$ , where x is termed base of the loop. In general, $P_{ℓ} (v) \neq v$ , for $v \in T_{x} M$ . This phenomenon is referred to as anholonomy. Whenever $P_{ℓ}$ realizes an isometry, since $∥ P_{ℓ} {(v) ∥}_{x} = {∥ v ∥}_{x}$ , the operator $P_{ℓ}$ changes only the orientation of v. For example, if $M \subset R^{3}$ , then we may say that $P_{ℓ} (v)$ is a rotated version of a tangent vector v laying in the same tangent space, namely, $P_{ℓ}$ may be represented as an element of the orthogonal group $O (3)$ . Intuitively, holonomy is specifically related to curved spaces, and therefore holonomy must be related to curvature. This conjecture is made precise by the Ambrose–Singer theorem [60].

In the following, we shall go through a detailed derivation of the notion of parallel transport along a curve on the tangent bundle of a manifold embedded in an ambient space. In general, the result of parallel transport depends on the curve; moreover, parallel transport may be conceived in a way that preserves the angle between transported vectors but not their lengths, or to preserve both.

10.2. Coordinate-Free Derivation of Parallel Transport

Let us develop a notion of parallel transport

P

such that, in every point of the curve

γ

, it holds that

P_{γ}^{0 \to t} (w) \in T_{γ} M

. In addition, a fundamental requirement that parallel transport should fulfill is that, given two tangent vectors

u, w \in T_{x} M

that form an angle

α

, parallel transport should preserve the value of such angle along the whole curve, namely the angle between

P_{γ}^{0 \to t} (u)

and

P_{γ}^{0 \to t} (w)

. Let us recall that the angle

α

between two vectors

u, w

in

T_{x} M

is defined by

cos (α) : = \frac{{〈 u, w 〉}_{x}}{{∥ u ∥}_{x} {∥ w ∥}_{x}} .

(333)

We may thus require parallel transport to fulfill:

Tangency condition: for every $t \in [0, 1]$ it should hold that $P_{γ}^{0 \to t} (w) \in T_{γ (t)} M$ ,
Conformal isometry condition: for every value of the parameter $t \in [0, 1]$ it should hold that ${〈 P_{γ}^{0 \to t} (u), P_{γ}^{0 \to t} (w) 〉}_{γ (t)}$ keeps constant to ${〈 u, w 〉}_{x}$ along the curve.

Let us verify that the second condition implies both conformality and isometry. To what concerns isometry, let us show that taking

w = u

gives:

{〈 P_{γ}^{0 \to t} (u), P_{γ}^{0 \to t} (u) 〉}_{γ (t)} = {∥ P_{γ}^{0 \to t} (u) ∥}_{γ (t)}^{2},

(334)

from which it follows that

∥ P_{γ}^{0 \to t} {(u) ∥}_{γ (t)}

keeps constant along the curve

γ

to

{∥ u ∥}_{x}

. It is straightforward to prove conformality; in fact,

\frac{{〈 P_{γ}^{0 \to t} (u), P_{γ}^{0 \to t} (w) 〉}_{γ (t)}}{∥ P_{γ}^{0 \to t} {(u) ∥}_{γ (t)} {∥ P_{γ}^{0 \to t} (w) ∥}_{γ (t)}} = \frac{{〈 u, w 〉}_{x}}{{∥ u ∥}_{x} {∥ w ∥}_{x}}, \forall t \in [0, 1] .

(335)

(Let us recall that a linear transformation that preserves angles is termed conformal.) It is important to underline that the above conditions do not define parallel transport univocally, but they represent minimal requirements for an operator to be qualified as parallel transport. In other words, parallel transport may be defined in several ways as long as it meets the above conditions, which turn out to be of great importance in non-linear control.

To ease notation, let us define the following fields:

$\bar{u} (t) : = P_{γ}^{0 \to t} (u)$ : a vector field $(γ (t), \bar{u} (t))$ that represents the evolution, over the curve $γ$ , of the vector u that is tangent to the manifold at $γ (0)$ and after transport it will be tangent at $γ (1)$ ,
$\bar{w} (t) : = P_{γ (t)}^{0 \to t} (w)$ : analogous to $\bar{u} (t)$ , both are transport fields,
$Q (t) : = {〈 \bar{u} (t), \bar{w} (t) 〉}_{γ (t)}$ : a scalar field that represents the inner product between the above two transport fields along the curve $γ$ .

For a manifold

M

embedded in an ambient space

A

endowed with a metric

{〈 \cdot, \cdot 〉}^{A}

, the condition of conformality and isometry may be written in a more detailed way. Let us recall from Section 7 that the inner product of two tangent vectors may be written as

{〈 u, w 〉}_{x} = {〈 u, G_{x} (w) 〉}^{A}

, where the metric kernel has key properties, namely

G_{x} (v)

is linear in v,

G_{x}

is self-adjoint (namely,

{〈 u, G_{x} (w) 〉}^{A} = {〈 G_{x} (u), w 〉}^{A}

),

G_{x}

is invertible and its inverse is denoted as

G_{x}^{- 1}

, and

G_{x}

maps

T_{x} M

to itself.

In order to perform the following calculations seamlessly, it pays to highlight a few calculation rules about the metric kernel G and the fundamental form

F

:

Taylor series expansion of the kernel G: Given $G_{x} (v)$ with $x \in M$ , $v \in T_{x} M$ , let us write

$G_{γ_{x, w} (t)} (v) = G_{x} (v) + G_{x}^{•} (v, t w) + O (t^{2}),$

(336)

where $γ_{x, w} : [- ϵ, ϵ] \to M$ denotes any smooth curve such that $γ_{x, w} (0) = x$ and ${\dot{γ}}_{x, w} (0) = w$ , with $w \in T_{x} M$ denoting a direction along which a variation of the metric kernel is sought. The quantity $G_{x}^{•} (v, w)$ represents a first-order derivative of the kernel $G_{x}$ calculated with respect to the variable x. (Notation-wise, the ‘bullet’ derivative is a convenient way to denote a derivative that cannot be expressed in more specific terms. For example, if $f, g$ are real functions of a real variable, one might denote $\frac{d}{d x} f (g (x)) = \dot{f} (g (x)) \dot{g} (x)$ as $f^{•} (g (x), \dot{g} (x))$ .) The formal definition goes like

$G_{x}^{•} (v, w) : = {\frac{d}{d t} G_{γ_{x, w} (t)} (v)|}_{t = 0},$

(337)

where again $γ_{x, w} : [- ϵ, ϵ] \to M$ denotes any smooth curve such that $γ_{x, w} (0) = x$ and ${\dot{γ}}_{x, w} (0) = w$ . The function $G_{x}^{•} (v, w)$ is linear in the arguments v and w and, in general, non-linear in x. Notice that the extendability property of $G_{x}$ from $T_{x} M$ to $A$ holds also for the first argument of the derivative $G_{x}^{•}$ .
Commutativity with the inner product in the ambient space: In the expression ${〈 u, G_{x}^{•} (v, w) 〉}^{A}$ it is allowed to swap the arguments u and v. In fact, notice that

$\begin{matrix} {〈 u, G_{x}^{•} (v, w) 〉}^{A} = & {〈 u, {\frac{d}{d t} G_{γ_{x, w} (t)} (v)|}_{t = 0} 〉}^{A} = {\frac{d}{d t} {〈 u, G_{γ_{x, w} (t)} (v) 〉}^{A}|}_{t = 0} \\ = & {\frac{d}{d t} {〈 G_{γ_{x, w} (t)} (u), v 〉}^{A}|}_{t = 0} = {〈 {\frac{d}{d t} G_{γ_{x, w} (t)} (u)|}_{t = 0}, v 〉}^{A} \\ = & {〈 v, G_{x}^{•} (u, w) 〉}^{A} . \end{matrix}$

(338)

This property of $G_{x}^{•}$ holds even if the first argument of the derivative belongs to $A$ .
Partial derivatives of the fundamental form: Let us recall the definition of the fundamental form $F (x, v) : = {〈 v, G_{x} (v) 〉}^{A}$ . Its partial derivatives read

$\frac{\partial F}{\partial x} (x, v) = G_{x}^{•} (v, v), \frac{\partial F}{\partial v} (x, v) = 2 G_{x} (v) .$

(339)

Let us see a few examples to clarify the above definitions and properties.

Example 33.

Let us start considering the case of the manifold

S^{+} (n)

endowed with its canonical metric, embedded in the ambient space

A : = R^{n \times n}

endowed with a Euclidean metric. In this setting, we have seen that

G_{P} (V) = P^{- 1} V P^{- 1}

. By definition, its ‘bullet derivative’ reads

G_{P}^{•} (V, W) = {\frac{d}{d t} (γ^{- 1} V γ^{- 1})|}_{t = 0},

(340)

where

γ : [- ϵ, ϵ] \to S^{+} (n)

denotes any smooth curve such that

γ (0) = P \in S^{+} (n)

and

\dot{γ} (0) = W \in T_{P} S^{+} (n)

. By using the noticeable matrix-flow derivation rule (104), we obtain

\frac{d}{d t} (γ^{- 1} V γ^{- 1}) = (- γ^{- 1} \dot{γ} γ^{- 1}) V γ^{- 1} + γ^{- 1} V (- γ^{- 1} \dot{γ} γ^{- 1}) .

(341)

Setting

t = 0

leads to the sought expression

G_{P}^{•} (V, W) = - P^{- 1} W P^{- 1} V P^{- 1} - P^{- 1} V P^{- 1} W P^{- 1} .

(342)

Let us verify the property of commutativity with the inner product. We may calculate that

\begin{matrix} {〈 G_{P}^{•} (V, W), U 〉}^{A} = & - tr (P^{- 1} W P^{- 1} V P^{- 1} U) - tr (P^{- 1} V P^{- 1} W P^{- 1} U), \\ {〈 G_{P}^{•} (U, W), V 〉}^{A} = & - tr (P^{- 1} U P^{- 1} W P^{- 1} V) - tr (P^{- 1} W P^{- 1} U P^{- 1} V) . \end{matrix}

(343)

The two quantities may be proven equal by the cyclic permutation property of the trace.

Another example that we are surveying concerns the manifold

Sp (2 n)

endowed with its canonical metric again embedded in the ambient space

A : = R^{n \times n}

endowed with a Euclidean metric. In this setting, we have seen that

G_{Q} (V) = Q^{- ⊤} Q^{- 1} V

. Its ‘bullet derivative’ reads

G_{Q}^{•} (V, W) = {\frac{d}{d t} (γ^{- ⊤} γ^{- 1} V)|}_{t = 0},

(344)

where

γ : [- ϵ, ϵ] \to Sp (2 n)

denotes any smooth curve such that

γ (0) = Q \in Sp (2 n)

and

\dot{γ} (0) = W \in T_{Q} Sp (2 n)

.

By invoking the identity (104) once again, we obtain

\frac{d}{d t} (γ^{- ⊤} γ^{- 1}) = {(- γ^{- 1} \dot{γ} γ^{- 1})}^{⊤} γ^{- 1} + γ^{- ⊤} (- γ^{- 1} \dot{γ} γ^{- 1}) .

(345)

Setting

t = 0

leads to the expression

G_{Q}^{•} (V, W) = - Q^{- ⊤} (W^{⊤} Q^{- ⊤} + Q^{- 1} W) Q^{- 1} V .

(346)

Let us verify the commutativity property with the inner product. We may calculate that

\begin{matrix} {〈 U, G_{P}^{•} (V, W) 〉}^{A} = & - tr (U^{⊤} Q^{- ⊤} (W^{⊤} Q^{- ⊤} + Q^{- 1} W) Q^{- 1} V), \\ {〈 V, G_{P}^{•} (U, W) 〉}^{A} = & - tr (V^{⊤} Q^{- ⊤} (W^{⊤} Q^{- ⊤} + Q^{- 1} W) Q^{- 1} U) . \end{matrix}

(347)

These two quantities may be proven equal by transposing the arguments of the traces. ■

The condition of conformality and isometry may be written very compactly as

\dot{Q} = 0

. By multivariable calculus and the above definition and properties, we obtain:

\dot{Q} = {〈 \dot{\bar{u}}, G_{γ} (\bar{w}) 〉}^{A} + {〈 \bar{u}, G_{γ}^{•} (\bar{w}, \dot{γ}) 〉}^{A} + {〈 \bar{u}, G_{γ} (\dot{\bar{w}}) 〉}^{A} .

(348)

(This is one situation in which we need to invoke the ‘extendability’ property of the metric kernel; in fact, notice that

\dot{\bar{w}}

does not necessarily belong to

T_{γ} M

.) In the second term on the right-hand side, the vector fields

\bar{u}

and

\bar{w}

may be swapped, and hence we may write:

\dot{Q} = {〈 \dot{\bar{u}}, G_{γ} (\bar{w}) 〉}^{A} + {〈 \bar{w}, G_{γ}^{•} (\bar{u}, \dot{γ}) 〉}^{A} + {〈 \bar{u}, G_{γ} (\dot{\bar{w}}) 〉}^{A} .

(349)

Since the above development is still too general to lead to a computable notion of parallel transport, let us refer parallelism to the velocity field associated with the curve

γ

, namely, let us set

\bar{w} : = \dot{γ}

. This means that the parallel transport operation will keep fixed the angle between the transported vector and the velocity vector associated with the curve. The expression of

\dot{Q}

becomes then:

\dot{Q} = {〈 \dot{\bar{u}}, G_{γ} (\dot{γ}) 〉}^{A} + {〈 \dot{γ}, G_{γ}^{•} (\bar{u}, \dot{γ}) 〉}^{A} + {〈 \bar{u}, G_{γ} (\ddot{γ}) 〉}^{A} .

(350)

In the above relation, all quantities are known except for the transport field

\bar{u}

of which we are seeking a temporal evolution law.

A special case, of sure interest in applications since it dispenses us from choosing a curve

γ

, is parallel transport along a geodesic line. Let thus

γ

denote a geodesic and let us recall that, in this case, the expression

G_{γ} (\ddot{γ})

may be rewritten in terms of the velocity

\dot{γ}

through the differential inclusion that characterizes a geodesic line. Let us then revamp the geodesic equation with the aim of writing it in a Christoffel normal form. Recall that, given the fundamental form for a line

F (γ, \dot{γ}) = {〈 \dot{γ}, G_{γ} (\dot{γ}) 〉}^{A}

, whenever it is a geodesic line it holds that:

\frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} \in N_{γ} M .

(351)

According to the relationships in (339)

\frac{\partial F}{\partial γ} = G_{γ}^{•} (\dot{γ}, \dot{γ}), \frac{\partial F}{\partial \dot{γ}} = 2 G_{γ} (\dot{γ}),

(352)

and then

\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = 2 G_{γ}^{•} (\dot{γ}, \dot{γ}) + 2 G_{γ} (\ddot{γ}),

(353)

therefore the differential inclusion (351) that defines a geodesic line becomes

- G_{γ}^{•} (\dot{γ}, \dot{γ}) - 2 G_{γ} (\ddot{γ}) = B_{γ} (\dot{γ}, \dot{γ}),

(354)

where

B_{x} : {(T_{x} M)}^{2} \to N_{x} M

is a bilinear function of its vector arguments and returns an element of the normal space

N_{x} M

. Ultimately, we obtained

G_{γ} (\ddot{γ}) = - \frac{1}{2} G_{γ}^{•} (\dot{γ}, \dot{γ}) - \frac{1}{2} B_{γ} (\dot{γ}, \dot{γ}) .

(355)

This same expression may be rewritten compactly in one of the following normal Christoffel forms:

\{\begin{matrix} G_{γ} (\ddot{γ}) + {\bar{Γ}}_{γ} (\dot{γ}, \dot{γ}) = 0, o r \\ \ddot{γ} + {\bar{\bar{Γ}}}_{γ} (\dot{γ}, \dot{γ}) = 0, \end{matrix}

(356)

where the function

{\bar{Γ}}_{x}

is termed Christoffel form of the

1 st

kind, while the quantity

{\bar{\bar{Γ}}}_{x} : = G_{x}^{- 1} \circ {\bar{Γ}}_{x}

is termed Christoffel form of the

2 nd

kind. Notice that only the restricted Christoffel form

{\bar{Γ}}_{x} (v, v)

with

x \in M

and

v \in T_{x} M

has been defined so far, namely

{\bar{Γ}}_{x} (v, v) : = \frac{1}{2} G_{x}^{•} (v, v) + \frac{1}{2} B_{x} (v, v) .

(357)

Since both terms on the right-hand side are bilinear in their vector arguments, the Christoffel form is quadratic. It is now necessary to define the full Christoffel form

Γ_{x} (u, w)

, with

x \in M

and

u, w \in T_{x} M

on the basis of its restricted version. Such a result may be achieved through the following polarization formula:

{\bar{Γ}}_{x} (u, w) : = \frac{1}{4} \{{\bar{Γ}}_{x} (u + w, u + w) - {\bar{Γ}}_{x} (u - w, u - w)\},

(358)

which stems from the requirements that the full Christoffel form is bilinear and symmetric in its vector argument, namely

{\bar{Γ}}_{x} (u, w) = {\bar{Γ}}_{x} (w, u)

.

Employing the normal Christoffel form in the latest expression of

\dot{Q}

gives:

\dot{Q} = {〈 \dot{\bar{u}}, G_{γ} (\dot{γ}) 〉}^{A} + {〈 \dot{γ}, G_{γ}^{•} (\bar{u}, \dot{γ}) 〉}^{A} - {〈 \bar{u}, {\bar{Γ}}_{γ} (\dot{γ}, \dot{γ}) 〉}^{A} .

(359)

Since

\bar{Γ}

is derived by polarization from a restricted bilinear function, it is self-adjoint in both its arguments; hence, it commutes with the inner product

{〈 \cdot, \cdot 〉}^{A}

. Therefore, in the third term on the right hand side it is possible to swap the transport field

\bar{u}

with any of the two fields

\dot{γ}

, which leads to:

\begin{matrix} \dot{Q} = & {〈 \dot{\bar{u}}, G_{γ} (\dot{γ}) 〉}^{A} + {〈 \dot{γ}, G_{γ}^{•} (\bar{u}, \dot{γ}) 〉}^{A} - {〈 \dot{γ}, {\bar{Γ}}_{γ} (\bar{u}, \dot{γ}) 〉}^{A} \\ = & {〈 \dot{\bar{u}}, G_{γ} (\dot{γ}) 〉}^{A} + {〈 \dot{γ}, {\bar{Γ}}_{γ} (\bar{u}, \dot{γ}) 〉}^{A} \end{matrix}

(360)

where we have used the fact that

\bar{Γ}

and

\frac{1}{2} G^{•}

differ by a purely normal component.

In order to make the above addenda uniform to one another, let us introduce in the above expression twice the identity

G_{γ} \circ G_{γ}^{- 1} (= {id}_{γ})

obtaining:

\dot{Q} = {〈 G_{γ} (\dot{γ}), \dot{\bar{u}} 〉}^{A} + {〈 G_{γ} (\dot{γ}), G_{γ}^{- 1} \circ {\bar{Γ}}_{γ} (\bar{u}, \dot{γ}) 〉}^{A} .

Now it is possible to gather the terms in

G_{γ} (\dot{γ})

and to write the above sum as a single inner product:

\dot{Q} = {〈 G_{γ} (\dot{γ}), \dot{\bar{u}} + G_{γ}^{- 1} \circ {\bar{Γ}}_{γ} (\bar{u}, \dot{γ}) 〉}^{A} = 0 .

(361)

Ultimately, the last relationship still expresses the basic property of parallel transport to realize a conformal isometry, which still leaves much room open for a precise definition of parallel transport. The condition

\dot{Q} = 0

alone would give rise to infinitely many equations to compute a transport field, and hence parallel transport. Among these infinitely many, one arises simply by requiring the right-hand side of the above inner product to vanish to zero. Namely:

\dot{\bar{u}} + {\bar{\bar{Γ}}}_{γ} (\bar{u}, \dot{γ}) = 0 .

(362)

The solution of such equation supplies an expression for the transport field which, in turn, supplies a notion of parallel transport along a geodesic line, namely

P^{x \to y} (u) : = \bar{u} (1),

(363)

where

\bar{u} (t)

denotes the solution of the differential Equation (363) with the initial condition

\bar{u} (0) = u

. Such a differential equation in

A

is first-order and linear in the unknown transport field. Since the set of solutions to a linear first-order differential equation is a linear space, parallel transport is a linear isomorphism of the tangent bundle

T M

.

Example 34.

As anticipated, a noteworthy property of any geodesic line is that it self-transports its own velocity along. In other words, given a geodesic

γ_{x}^{y} : [0, 1] \to M

that connects two points

x, y \in M

, the following function turns out to realize parallel transport:

P_{γ_{x}^{y}}^{0 \to t} ({\dot{γ}}_{x}^{y} (0)) : = {\dot{γ}}_{x}^{y} (t) .

(364)

Let us verify that it meets the fundamental properties any parallel transport should meet:

Tangency: for every $t \in [0, 1]$ it holds that $P_{γ_{x}^{y}}^{0 \to t} ({\dot{γ}}_{x}^{y} (0)) \in T_{γ_{x}^{y} (t)} M$ ; in fact, $P_{γ_{x}^{y}}^{0 \to t} ({\dot{γ}}_{x}^{y} (0)) = {\dot{γ}}_{x}^{y} (t)$ is tangent to the curve at every point $γ_{x}^{y} (t)$ .
Conformal isometry: for every $t \in [0, 1]$ it holds that ${〈 P_{γ_{x}^{y}}^{0 \to t} ({\dot{γ}}_{x}^{y} (0)), P_{γ_{x}^{y}}^{0 \to t} ({\dot{γ}}_{x}^{y} (0)) 〉}_{γ_{x}^{y} (t)} = {∥ {\dot{γ}}_{x}^{y} (t) ∥}_{γ_{x}^{y} (t)}^{2}$ which is constant along a geodesic,

as claimed. ■

As a special case, it is worth considering a manifold

M

endowed with a uniform metric. In this instance, the metric kernel

G_{x}

is independent of the point x, and therefore it holds that

G^{•} \equiv 0

along every (geodesic) line. The transport field equation hence simplifies into

\dot{\bar{u}} = - \frac{1}{2} G^{- 1} \circ B_{γ} (\bar{u}, \dot{γ}) .

(365)

In the ever simpler instance where the metric is Euclidean, the metric kernel coincides with the identity (namely

G_{x} \equiv {id}_{x}

), and hence it turns out that

\dot{\bar{u}} = - \frac{1}{2} B_{γ} (\bar{u}, \dot{γ}) .

(366)

Recalling that the function B maps the double tangent bundle

{(T M)}^{2}

to the normal bundle

N M

, it is quite straightforward to figure out a practical interpretation of the differential Equation (366). It basically prescribes that, when traveling from a point over the curve

γ

to another infinitesimally close point, the transport field looses its normal component to stay tangent.

The following two examples aim at clarifying the above theoretical developments.

Example 35.

Let us apply the theoretical development just analyzed to determine a transport field equation on a hypersphere

S^{n - 1} \subset A

with

A : = R^{n}

endowed with the metric

{〈 u, w 〉}_{x} : = {〈 u, w 〉}^{A} = u^{⊤} w

.

Let us start by determining the Christoffel form (of the first or second kind is unessential, in this case, because the metric kernel coincides with the identity). The easiest way to determine Christoffel forms is via a geodesic equation. The fundamental form, in this case, reads

F : = {\dot{γ}}^{⊤} \dot{γ}

. From the fundamental form we find that

\frac{\partial F}{\partial γ} = 0 a n d \frac{\partial F}{\partial \dot{γ}} = 2 \dot{γ},

(367)

therefore

\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = 2 \ddot{γ} .

(368)

The geodesic equation then reads

\ddot{γ} = λ γ

; hence,

λ = {\ddot{γ}}^{⊤} γ

. From the tangency of the velocity

\dot{γ}

, it follows that:

{\dot{γ}}^{⊤} γ = 0 \overset{d / d t}{\Rightarrow} {\ddot{γ}}^{⊤} γ + {\dot{γ}}^{⊤} \dot{γ} = 0,

(369)

and henceforth

{\ddot{γ}}^{⊤} γ = - {∥ \dot{γ} ∥}^{2}

. As a consequence,

λ = - ∥ \dot{γ} ∥^{2}

and:

{\bar{Γ}}_{x} (u, w) = {\bar{\bar{Γ}}}_{x} (u, w) = - (u^{⊤} w) x .

(370)

We may now verify that

{〈 {\bar{Γ}}_{x} (u, w), v 〉}^{A} = {〈 {\bar{Γ}}_{x} (v, w), u 〉}^{A}

. It holds that

\begin{matrix} {〈 {\bar{Γ}}_{x} (u, w), v 〉}^{A} = & - (u^{⊤} w) (v^{⊤} x) = 0, \\ {〈 {\bar{Γ}}_{x} (v, w), u 〉}^{A} = & - (v^{⊤} w) (u^{⊤} x) = 0, \end{matrix}

(371)

hence the assertion.

The equation of the transport field, in this case, reads as:

\dot{\bar{u}} + ({\bar{u}}^{⊤} \dot{γ}) γ = 0 .

(372)

Such differential equation may be solved and returns an expression for the transport field

\bar{u}

. The definition (363) returns then the expression for the parallel transport from the point

x \in S^{n - 1}

of a vector

w \in T_{x} S^{n - 1}

to the point

y \in S^{n - 1}

:

P^{x \to y} (w) : = [I_{n} - \frac{(x + y) y^{⊤}}{1 + x^{⊤} y}] w .

(373)

Let us verify some characteristic features of the above map:

Linearity: The linearity of $P^{x \to y} (w)$ with respect to w (but not with respect to x ad y) is quite apparent.
Identity: Letting $y \equiv x$ leads to an identity map. In fact, it holds that $P^{x \to x} (w) = [I_{n} - \frac{(x + x) x^{⊤}}{1 + x^{⊤} x}] w = [I_{n} - \frac{2 x x^{⊤}}{2}] w = (I_{n} - x x^{⊤}) w = w - x (x^{⊤} w) = w - 0 = w$ .
Tangency: It should hold that $y^{⊤} P^{x \to y} (w) = 0$ . In fact: $y^{⊤} [I_{n} - \frac{(x + y) y^{⊤}}{1 + x^{⊤} y}] w = y^{⊤} w - \frac{y^{⊤} (x + y) y^{⊤} w}{1 + x^{⊤} y} = y^{⊤} w - \frac{(y^{⊤} x + 1) y^{⊤} w}{1 + x^{⊤} y} = y^{⊤} w - y^{⊤} w = 0$ .

Important properties of the map (373) have been hence verified. ■

Example 36.

Let us consider the case of the manifold

S^{+} (n) \subset A

endowed with the metric

{〈 U, W 〉}_{P} : = tr (U P^{- 1} W P^{- 1})

embedded in the ambient space

A : = R^{n \times n}

endowed with the metric

{〈 U, V 〉}^{A} : = tr (U V)

. Recall from Section 7 that the corresponding metric kernel takes the expression

G_{P} (W) : = P^{- 1} W P^{- 1}

(which, as a function of W, is well-defined in the whole ambient space

A

).

Let us determine the Christoffel form of the second kind through calculations related to geodesy. Define the fundamental form on a curve as

F (γ, \dot{γ}) : = tr (\dot{γ} γ^{- 1} \dot{γ} γ^{- 1})

and observe that:

\begin{matrix} \frac{\partial F}{\partial \dot{γ}} & = & 2 γ^{- 1} \dot{γ} γ^{- 1} [\equiv 2 G_{γ} (\dot{γ})], \end{matrix}

(374)

\begin{matrix} \frac{\partial F}{\partial γ} & = & - 2 γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} [\equiv G_{γ}^{•} (\dot{γ}, \dot{γ})], \end{matrix}

(375)

from which it follows that:

\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = 2 \frac{d}{d t} (γ^{- 1}) \dot{γ} γ^{- 1} + 2 γ^{- 1} \ddot{γ} γ^{- 1} + 2 γ^{- 1} \dot{γ} \frac{d}{d t} (γ^{- 1}) .

(376)

Recalling the matrix identity

\frac{d}{d t} (γ^{- 1}) = - γ^{- 1} \dot{γ} γ^{- 1},

leads to

\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = - 4 γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} + 2 γ^{- 1} \ddot{γ} γ^{- 1},

(377)

which, in turn, implies that:

\begin{matrix} \frac{\partial F}{\partial γ} - \frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = & - 2 γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} + 4 γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} - 2 γ^{- 1} \ddot{γ} γ^{- 1}, \\ = & 2 γ^{- 1} \dot{γ} γ^{- 1} \dot{γ} γ^{- 1} - 2 γ^{- 1} \ddot{γ} γ^{- 1}, \\ = & 2 γ^{- 1} (\dot{γ} γ^{- 1} \dot{γ} - \ddot{γ}) γ^{- 1} . \end{matrix}

(378)

The transport equation, in normal Christoffel form, hence reads:

\ddot{γ} - \dot{γ} γ^{- 1} \dot{γ} - ω γ = 0,

(379)

where ω denotes a skew-symmetric matrix function (namely

ω^{⊤} = - ω

) such that the product

ω γ

belongs to the normal space at γ.

In order to determine a function ω, let us invoke the condition of tangency of the velocity field

\dot{γ}

that, in this case, is expressed as

{\dot{γ}}^{⊤} = \dot{γ}

. From such a condition, it follows that

{\ddot{γ}}^{⊤} = \ddot{γ}

; hence:

{(\ddot{γ} - \dot{γ} γ^{- 1} \dot{γ} - ω γ)}^{⊤} = \ddot{γ} - \dot{γ} γ^{- 1} \dot{γ} - γ ω^{⊤} = 0,

(380)

from which it follows that

ω = 0

. The restricted Christoffel form of the second kind, in this case, hence reads as

{\bar{\bar{Γ}}}_{P} (V, V) : = - V P^{- 1} V .

(381)

To determine the complete Christoffel form of the second kind, let us make use of the polarization formula (358):

\begin{matrix} {\bar{\bar{Γ}}}_{P} (V, W) & : = \frac{1}{4} \{{\bar{\bar{Γ}}}_{P} (V + W, V + W) + {\bar{\bar{Γ}}}_{P} (V - W, V - W)\} \\ = \frac{1}{4} \{- (V + W) P^{- 1} (V + W) + (V - W) P^{- 1} (V - W)\} \\ = - \frac{1}{2} V P^{- 1} W - \frac{1}{2} W P^{- 1} V . \end{matrix}

(382)

Let us now verify that the form

\bar{Γ}

commutes with the ambient inner product. We have

\bar{Γ} = G \circ \bar{\bar{Γ}}

; therefore,

{\bar{Γ}}_{P} (V, W) = - \frac{1}{2} P^{- 1} (V P^{- 1} W + W P^{- 1} V) P^{- 1} .

(383)

As a consequence, we have

\begin{matrix} {〈 {\bar{Γ}}_{P} (V, W), U 〉}^{A} = & - \frac{1}{2} tr (U P^{- 1} V P^{- 1} W P^{- 1} + U P^{- 1} W P^{- 1} V P^{- 1}), \\ {〈 {\bar{Γ}}_{P} (U, W), V 〉}^{A} = & - \frac{1}{2} tr (V P^{- 1} U P^{- 1} W P^{- 1} + V P^{- 1} W P^{- 1} U P^{- 1}), \end{matrix}

(384)

which may be easily proven equal by the cyclic commutation property of matrix trace.

As a consequence, and employing the relation (342), the parallel transport equation is found to be:

\dot{\bar{U}} + γ (- γ^{- 1} \dot{γ} γ^{- 1} \bar{U} γ^{- 1} - γ^{- 1} \bar{U} γ^{- 1} \dot{γ} γ^{- 1}) γ - (- \frac{1}{2} \dot{γ} γ^{- 1} \bar{U} - \frac{1}{2} \bar{U} γ^{- 1} \dot{γ}) = 0,

(385)

or, after a few straightforward simplifications,

\dot{\bar{U}} - \frac{1}{2} (\dot{γ} γ^{- 1} \bar{U} + \bar{U} γ^{- 1} \dot{γ}) = 0 .

(386)

Through the definition (363), we may obtain the expression of the parallel transport from a point

P \in S^{+} (n)

of a tangent vector

W \in T_{P} S^{+} (n)

to a point

Q \in S^{+} (n)

:

P^{P \to Q} (W) : = \sqrt{Q P^{- 1}} W \sqrt{P^{- 1} Q},

(387)

where the matrix square root

\sqrt{\cdot}

returns a symmetric matrix. Let us verify two properties of the found operator:

Identity: Setting $Q \equiv P$ leads to parallel transport collapse to an identity map. Indeed, it holds that $P^{P \to P} (W) : = \sqrt{P P^{- 1}} W \sqrt{P^{- 1} P} = W$ .
Tangency: It holds that ${(P^{P \to Q} (W))}^{⊤} = P^{P \to Q} (W)$ . In fact, we have ${(P^{P \to Q} (W))}^{⊤} = {(P^{- 1} Q)}^{\frac{⊤}{2}} W^{⊤} {(Q P^{- 1})}^{\frac{⊤}{2}} = {(Q^{⊤} P^{- ⊤})}^{\frac{1}{2}} W {(P^{- ⊤} Q^{⊤})}^{\frac{1}{2}} = \sqrt{Q P^{- 1}} W \sqrt{P^{- 1} Q} = P^{P \to Q} (W)$ .

Linearity is apparent as well. ■

The calculations illustrated in the previous example requires the evaluation of the partial derivatives of the fundamental form

F

, determined in (374) and (375). It is quite instructive to sketch such calculations.

Example 37.

Let us survey in detail the calculations that led to relations (374) and (375). Such calculations may be carried out through calculus (exact way) or by analytic approximations (which is more clear from a computational viewpoint).

To begin with, it is useful to analytically justify a matrix approximation, namely

{(P + E)}^{- 1} \approx P^{- 1} - P^{- 1} E P^{- 1},

(388)

with

P \in S^{+} (n)

and

E \in R^{n \times n}

with

∥ E ∥

small. Such approximation may be proven through the notion ofanalytic matrix function. By definition, a matrix-to-matrix function

f : R^{n \times n} \to R^{n \times n}

is analytic in a point

X_{0} \in R^{n \times n}

if it admits, in a neighborhood of such a point, a polynomial series expansion, namely:

f (X) = \sum_{n = 0}^{\infty} a_{n} {(X - X_{0})}^{n}, f o r ∥ X - X_{0} ∥ < ρ

(389)

where

a_{n} \in R

are coefficients of the series and

ρ > 0

denotes the radius of convergence of the polynomial series.

Let us now consider the function

f (X) : = {(I_{n} + X)}^{- 1}

. It is analytic in a neighborhood of the point

X = 0_{n}

and the associated polynomial series, truncated to the second term, reads:

{(I_{n} + X)}^{- 1} \approx I_{n} - X,

(390)

from which it readily follows that

\begin{matrix} {(P + E)}^{- 1} = & {(P (I_{n} + P^{- 1} E))}^{- 1} \\ = & {(I_{n} + P^{- 1} E)}^{- 1} P^{- 1} \\ \approx & (I_{n} - P^{- 1} E) P^{- 1} \\ = & P^{- 1} - P^{- 1} E P^{- 1} . \end{matrix}

(391)

To justify the partial derivatives (374) and (375), let us define

F (P, V) : = tr (V P^{- 1} V P^{- 1})

and evaluate its partial derivatives with respect to P and V. The partial derivative of a matrix-to-scalar function may be thought of as the ‘coefficient’ of the linear term arising from an additive perturbation.

Applying such a practical idea to evaluate the partial derivative with respect to the first argument gives:

F (P + E, V) = F (P, V) + {〈\frac{\partial F}{\partial P}, E〉}^{A} + higher order terms,

(392)

where E denotes a matrix (termedperturbation) whose entries are small numbers (such that

∥ E ∥

is very small). In the present case, we have:

\begin{matrix} F (P + E, V) = & tr (V {(P + E)}^{- 1} V {(P + E)}^{- 1}), \\ \approx & tr (V (P^{- 1} - P^{- 1} E P^{- 1}) V (P^{- 1} - P^{- 1} E P^{- 1})) \\ \approx & tr (V P^{- 1} V P^{- 1}) - tr (V P^{- 1} E P^{- 1} V P^{- 1}) - tr (V P^{- 1} V P^{- 1} E P^{- 1}) \\ + higher order terms \\ = & F (P, V) - tr (P^{- 1} V P^{- 1} V P^{- 1} E) - tr (P^{- 1} V P^{- 1} V P^{- 1} E) \\ + higher order terms \\ = & F (P, V) - 2 tr (P^{- 1} V P^{- 1} V P^{- 1} E) + h i g h e r o r d e r t e r m s . \end{matrix}

(393)

Comparing this expression with (392), one obtains the sought result, namely:

\frac{\partial F}{\partial P} = - 2 P^{- 1} V P^{- 1} V P^{- 1} .

(394)

To what concerns the second argument, we may write that:

F (P, V + E) = F (P, V) + {〈\frac{\partial F}{\partial V}, E〉}^{A} + higher order terms,

(395)

where E again denotes a perturbation. In this case, we have:

\begin{matrix} F (P, V + E) = & tr ((V + E) P^{- 1} (V + E) P^{- 1}), \\ = & tr (V P^{- 1} V P^{- 1}) + tr (E P^{- 1} V P^{- 1}) + tr (V P^{- 1} E P^{- 1}) \\ + higher order terms \\ = & F (P, V) + tr (P^{- 1} V P^{- 1} E) + tr (P^{- 1} V P^{- 1} E) \\ + higher order terms \\ = & F (P, V) + 2 tr (P^{- 1} V P^{- 1} E) + higher order terms . \end{matrix}

(396)

Comparing such an expression with (395) yields:

\frac{\partial F}{\partial V} = 2 P^{- 1} V P^{- 1},

(397)

which is the sought result. ■

Let us summarizes a few known formulas about parallel transport. All the formulas given below hold under the provision that the manifolds are endowed with their canonical metrics.

Hypercube: The hypercube endowed with the standard Euclidean metric is a flat space; hence, parallel transport may be realized as a rigid translation. Namely parallel transport is an identity map.
Hypersphere: Parallel transport on the hypersphere $S^{p - 1}$ of the tangent vector $w \in T_{x} S^{p - 1}$ along the geodesic arc $γ_{x, v}$ of an extent t may be computed by [39]:

$P_{γ_{x, v}}^{0 \to t} (w) = [I_{p} + \frac{cos (∥ v ∥ t) - 1}{{∥ v ∥}^{2}} v v^{⊤} - sin (∥ v ∥ t) x v^{⊤}] w .$

(398)

For a reference, readers might want to consult [61]. Starting from this, some mathematical work leads to the following result: given two points $x, y \in S^{p - 1}$ and a vector $w \in T_{x} S^{n - 1}$ , the parallel transport operator $P^{x \to y} : T_{x} S^{p - 1} \to T_{y} S^{p - 1}$ has the following structure:

$P^{x \to y} (w) = [I_{p} - \frac{(x + y) y^{⊤}}{1 + x^{⊤} y}] w,$

(399)

provided that $x^{⊤} y \neq - 1$ (namely, the points x and y are not antipodal one to another).
Special orthogonal group: Parallel transport along a geodesic curve on the special orthogonal group $SO (p)$ may be implemented through the following formula:

$P^{X \to Y} (W) = X \sqrt{X^{⊤} Y} X^{⊤} W \sqrt{X^{⊤} Y} .$

(400)
Stiefel manifold: To the best of this author’s knowledge, there appear to be no closed formulas about parallel transport on a Stiefel manifold $St (n, p)$ . This is one of those cases in which self-parallel transport might be invoked. A self-parallel transport expression, corresponding to the canonical metric, is

$P_{γ_{X, V}}^{0 \to t} (V) = [V - X R^{⊤}] Exp (t [\begin{matrix} X^{⊤} V & - R^{⊤} \\ R & 0_{p} \end{matrix}]) [\begin{matrix} I_{p} \\ 0_{p} \end{matrix}],$

(401)

where R denotes the R-factor of the thin QR factorization of the matrix $(I_{n} - X X^{⊤}) V$ .
Manifold of symmetric, positive-definite (SPD) matrices: Given a geodesic arc $γ_{X, V} : [0, 1] \to S^{+} (p)$ and a tangent vector $W \in T_{X} S^{+} (p)$ , the expression of the parallel transport operator that shifts the tangent vector W along the geodesic arc $γ_{X, V} (t)$ toward the endpoint $t = 1$ reads:

$P_{γ_{X, V}}^{0 \to 1} (W) = Exp (\frac{V X^{- 1}}{2}) W Exp (\frac{X^{- 1} V}{2}) .$

(402)

Some mathematical work leads to the following result: given two points $X, Y \in S^{+} (p)$ and a tangent vector $W \in T_{X} S^{+} (p)$ , the parallel transport operator $P^{X \to Y} : T_{X} S^{+} (p) \to T_{Y} S^{+} (p)$ has the following structure:

$P^{X \to Y} (W) = \sqrt{Y X^{- 1}} W \sqrt{X^{- 1} Y} .$

(403)
Grassmann manifold: Parallel transport of a vector $U \in T_{[X]} Gr (n, p)$ along the geodesic $γ_{[X], V}$ with $V \in T_{[X]} Gr (n, p)$ is given by [33]:

$P_{γ_{[X], V}}^{0 \to t} (U) = ([X B A] [\begin{matrix} - sin (D t) \\ cos (D t) \end{matrix}] A^{⊤} + I_{n} - A A^{⊤}) U,$

(404)

where $A D B^{⊤}$ denotes the compact singular value decomposition of the matrix V.

Let us derive the expression of the Christoffel form of the second kind for the special orthogonal group.

Example 38.

According to the canonical metric for the special orthogonal group

SO (n)

, the fundamental form is

F (R, V) : = tr (V^{⊤} V)

; hence, the geodesic equation stemming from the differential inclusion reads as

\ddot{γ} = γ S

, where S is a symmetric function. Recalling that any geodesic line must satisfy the constraint

γ^{⊤} γ = I_{n}

, deriving this twice with respect to time gives

{\ddot{γ}}^{⊤} γ + 2 {\dot{γ}}^{⊤} \dot{γ} + γ^{⊤} \ddot{γ} = 0 .

(405)

Plugging in the geodesic equation leads to

S = - {\dot{γ}}^{⊤} \dot{γ}

, and hence the complete geodesic equation reads as

\ddot{γ} = - γ {\dot{γ}}^{⊤} \dot{γ}

. Consequently, the restricted Christoffel form of the second kind associated with the special orthogonal group endowed with the canonical metric takes the expression

{\bar{\bar{Γ}}}_{R} (V, V) : = R (V^{⊤} V) .

(406)

The full Christoffel form descends from the polarization formula (358), which leads to

{\bar{\bar{Γ}}}_{R} (V, W) = \frac{1}{2} R (V^{⊤} W + W^{⊤} V) .

(407)

It is interesting to notice that

{\bar{\bar{Γ}}}_{R} (V, W) \in N_{R} SO (n)

. ■

Let us derive the expression of the Christoffel form of the second kind for the Stiefel manifold.

Example 39.

As we have seen, the Stiefel manifold

St (n, p)

may be endowed both with a Euclidean metric and its canonical metric.

In the Euclidean metric, the fundamental form reads as

F (X, V) : = tr (V^{⊤} V)

; hence, the geodesic equation stemming from the differential inclusion reads as

\ddot{γ} = γ S

, where S is a symmetric function. Recalling that any geodesic must satisfy the constraint

γ^{⊤} γ = I_{p}

, similarly to the calculations shown in the previous example, we find that

{\bar{\bar{Γ}}}_{X} (V, V) : = X (V^{⊤} V) .

(408)

The full Christoffel form descends from the polarization formula (358), which leads to

{\bar{\bar{Γ}}}_{X} (V, W) = \frac{1}{2} X (V^{⊤} W + W^{⊤} V) .

(409)

In the canonical metric, the fundamental form reads

F (X, V) : = tr (V^{⊤} (I_{p} - \frac{1}{2} X X^{⊤}) V)

. Calculations of the geodesic equation show that

\begin{matrix} \frac{\partial F}{\partial \dot{γ}} = & 2 (I_{p} - \frac{1}{2} γ γ^{⊤}) \dot{γ}, \\ \frac{\partial F}{\partial γ} = & - \dot{γ} {\dot{γ}}^{⊤} γ . \end{matrix}

(410)

From the first derivative, it follows that

\frac{d}{d t} \frac{\partial F}{\partial \dot{γ}} = 2 (I_{p} - \frac{1}{2} γ γ^{⊤}) \ddot{γ} - (\dot{γ} γ^{⊤} + γ {\dot{γ}}^{⊤}) \dot{γ} .

(411)

From the differential inclusion (153), it hence follows the equation

2 (I_{p} - \frac{1}{2} γ γ^{⊤}) \ddot{γ} = - \dot{γ} {\dot{γ}}^{⊤} γ + \dot{γ} γ^{⊤} \dot{γ} + γ {\dot{γ}}^{⊤} \dot{γ} - γ S,

(412)

where S is a symmetric matrix function to be determined. To solve for the naive derivative

\ddot{γ}

, let us recall that

{(I_{p} - \frac{1}{2} γ γ^{⊤})}^{- 1} = I_{p} + γ γ^{⊤}

. Hence, the above equation may be rewritten as

\ddot{γ} = \dot{γ} γ^{⊤} \dot{γ} + γ {(γ^{⊤} \dot{γ})}^{2} - γ {\dot{γ}}^{⊤} \dot{γ} - γ S,

(413)

where we have repeatedly used the fact that

{\dot{γ}}^{⊤} γ = - γ^{⊤} \dot{γ}

. Plugging the above relationship into the condition (405) leads to the expression

S = 2 {(γ^{⊤} \dot{γ})}^{2}

. Plugging back such an expression into the relationship (413) leads to the geodesic equation

\ddot{γ} + \dot{γ} γ^{⊤} \dot{γ} + γ ({(γ^{⊤} \dot{γ})}^{2} + {\dot{γ}}^{⊤} \dot{γ}) = 0,

(414)

from which stems the restricted Christoffel form of the second kind

{\bar{\bar{Γ}}}_{X} (V, V) = V V^{⊤} X + X V^{⊤} (I_{n} - X X^{⊤}) V .

(415)

Calculations to obtain the full Christoffel form of the second kind lead to the expression

{\bar{\bar{Γ}}}_{X} (V, W) = \frac{1}{2} \{(V W^{⊤} + W V^{⊤}) X + X W^{⊤} (I_{n} - X X^{⊤}) V + X V^{⊤} (I_{n} - X X^{⊤}) W\}

(416)

through the polarization formula. ■

To conclude this subsection, let us prove an interesting result that concerns the parallel transport of a manifold logarithm, which will be invoked later in connection to non-linear control.

Theorem 6.

Given two points

x, y \in M

, provided the manifold logarithms

{log}_{x} y

and

{log}_{y} x

exist, it holds that

P^{x \to y} ({log}_{x} y) = - {log}_{y} x .

(417)

Proof.

Let

γ_{x}^{y} : [0, 1] \to M

denote the geodesic arc connecting x to y and let

{\tilde{γ}}_{y}^{x} (t) : = γ_{x}^{y} (1 - t)

denote the associated reversed geodesic. By definition of the logarithmic map, it holds that

{\dot{γ}}_{x}^{y} (0) = {log}_{x} y

. Since

{\tilde{γ}}_{y}^{x}

is a geodesic, it also holds that

{\dot{\tilde{γ}}}_{y}^{x} (0) = {log}_{y} x

. In addition, by the definition of a reversed geodesic, it holds that

{\dot{\tilde{γ}}}_{y}^{x} (0) = - {\dot{γ}}_{x}^{y} (1)

. By the self-parallel transport property of any geodesic, it also follows that

{\dot{γ}}_{x}^{y} (1) = P^{x \to y} ({\dot{γ}}_{x}^{y} (0))

. The assertion is hence proven. □

10.3. Coordinate-Prone Derivation of Parallel Transport*

The basic requirement of parallel transport along a curve is that it ensures the transported vector to be tangent to the curve at any given point. Clearly, this condition by itself is too weak to give rise to a unique solution to the problem of parallel transport. In Riemannian manifold calculus, it is additionally required that parallel transport along a geodesic preserves the inner product between the transported tangent vector and the tangent to the geodesic curve.

Under such a proviso, a parallel-transport rule may be constructed as follows. Let

γ : [0, 1] \to M

denote a geodesic curve on a p-dimensional Riemannian manifold

M

endowed with a metric tensor of components

g_{i j}

. Upon taking a vector

v \in T_{γ (0)} M

, define a vector field

v^{γ} \in Γ (M)

(which might be well-defined only along the curve

γ

and not necessarily outside of it) that denotes the transport of v along the curve

γ

. The tangency requirement is expressed by

v^{γ} (t) \in T_{γ (t)} M

, for every

t \in [0, 1]

. The inner-product preservation property of parallel transport is expressed by requiring that the sought vector field

v^{γ}

meets the formal condition:

{〈 v^{γ} (t), \dot{γ} (t) 〉}_{γ (t)} = constant over the curve γ (t) .

(418)

Using local coordinates

v^{γ} = (v^{1}, v^{2}, \dots, v^{p})

and

γ = (x^{1}, x^{2}, \dots, x^{p})

and differentiating the above equation with respect to the parameter t yields the condition:

\frac{d}{d t} (g_{i j} v^{i} {\dot{x}}^{j}) = 0,

(419)

namely:

\frac{\partial g_{i j}}{\partial x^{k}} {\dot{x}}^{k} {\dot{x}}^{j} v^{i} + g_{i j} {\dot{x}}^{j} {\dot{v}}^{i} + g_{i h} {\ddot{x}}^{h} v^{i} = 0 .

(420)

Now, let us exploit the fact that the components

x^{h}

describe a geodesic arc (namely, a self-parallel curve) on the Riemannian manifold

M

. Accordingly, it holds that

{\ddot{x}}^{h} = - Γ_{k j}^{h} {\dot{x}}^{k} {\dot{x}}^{j}

. Making use of such identity in Equation (420) yields:

((\frac{\partial g_{i j}}{\partial x^{k}} - g_{i h} Γ_{k j}^{h}) {\dot{x}}^{k} v^{i} + g_{i j} {\dot{v}}^{i}) {\dot{x}}^{j} = 0 .

(421)

Calculations with the Christoffel symbols show that the above equation may be rewritten as:

g_{h j} (Γ_{i k}^{h} v^{i} {\dot{x}}^{k} + {\dot{v}}^{h}) {\dot{x}}^{j} = 0 .

(422)

from such relation, one may derive the set of differential equations

{\dot{v}}^{h} = - Γ_{i k}^{h} v^{i} {\dot{x}}^{k} .

(423)

which describe the evolution of the components of the transport field. Solving the above set of non-linear equations under appropriate initial conditions yields a rule to perform parallel transport.

11. Manifold Retraction and Vector Transport

Since the computation of an exponential map on a given manifold may be cumbersome, the notion of retraction is sometimes invoked. A manifold retraction map

R : T M \to M

is a function that satisfies the following requirements [62] and that is easier to compute than an exponential map:

Any restriction $R_{x}$ is defined in some open ball $B (0, ρ_{x})$ of radius $ρ_{x} > 0$ about $0 \in T_{x} M$ and is continuously differentiable;
It holds that $R_{x} (v) = x$ if $v = 0$ ;
Let $v (t) \in B (0, ρ_{x}) \subset T_{x} M$ denote any smooth function on the tangent space $T_{x} M$ , with $t \in [- ϵ, ϵ]$ and $v (0) = 0$ . The curve $y (t) = R_{x} (v (t))$ , for $t \in [- ϵ, ϵ]$ , lies in a neighborhood of $x = y (0)$ . It holds that $\dot{y} = d R_{x} |_{v} (\dot{v})$ where $d R_{x} |_{v} : T_{v} T_{x} M \to T_{R_{x} (v)} M$ denotes a tangent map. Notice that $v (t)$ lies in $T_{x} M$ for every t; hence, the derivative $\dot{v}$ is defined simply as

$\dot{v} (t) : = lim_{h \to 0} \frac{v (t + h) - v (t)}{h} .$

(424)

For $t = 0$ , it holds that $d R_{x} |_{0} : T_{0} T_{x} M \to T_{R_{x} (0)} M$ . Let us identify $T_{0} T_{x} M ≅ T_{x} M$ and let us recall that $T_{R_{x} (0)} M = T_{x} M$ . In order for $R_{x}$ to be a retraction, the map $d R_{x} |_{0}$ must equate the identity map in $T_{x} M$ .

In practice, a retraction

R_{x} (v)

sends a tangent vector

v \in T_{x} M

to a manifold

M ∋ x

into a neighbor of the point x. Any exponential map of a Riemannian manifold is a retraction. Another class of retractions was surveyed in [63].

Manifold retraction is a computationally convenient replacement of exponential map and, according to the definition given above, it behaves as a manifold exponential up to first order. As manifold logarithm is a local inverse of manifold exponential, one might wonder if a local inverse of retraction exists. Such problems have been studied in a number of papers and a possible solution has been codified under the name of a manifold lifting map [64].

Let us examine a few formulas of interest about manifold retraction.

Hypercube: Not surprisingly, the simplest manifold retraction on the hypercube is realized through array addition, namely $R_{x} (v) : = x + v$ .
Hypersphere: Let $x \in S^{p - 1}$ and $v \in T_{x} S^{p - 1}$ . A simple retraction map on the unit hypersphere is

$R_{x} (v) : = \frac{x + v}{∥ x + v ∥},$

(425)

where $∥ \cdot ∥$ denotes a 2-norm. Let us verify that this map meets the three mentioned requirements to be a retraction:
-
Requirements 1 and 2: Easily verified by inspection.
-
Requirement 3: Let $y (t) = \frac{x + v (t)}{{[{(x + v (t))}^{⊤} (x + v (t))]}^{1 / 2}}$ . Deriving both sides with respect to t yields:

$\dot{y} = \frac{\dot{v} ∥ x + v ∥ + (x + v) \frac{1}{2} \frac{\dot{v} (x + v) + {(x + v)}^{⊤} \dot{v}}{∥ x + v ∥}}{{∥ x + v ∥}^{2}} = \frac{\dot{v} {∥ x + v ∥}^{2} + (x + v) {\dot{v}}^{⊤} (x + v)}{{∥ x + v ∥}^{3}} .$

Setting $v = 0$ leads to

$\dot{y} (0) = \frac{\dot{v} (0) {∥ x ∥}^{2} + x {\dot{v}}^{⊤} x}{{∥ x ∥}^{3}} .$

Since $∥ x ∥ = 1$ and $\dot{v} \in T_{x} S^{p - 1}$ , it follows that ${\dot{v}}^{⊤} x = 0$ , and hence the result is proven.
Stiefel manifold: There exist a number of retractions on the Stiefel manifold, which are briefly outlined below.
- Retraction based on QR factorization: In [64], it was shown that one of the retractions that map a tangent vector of $T_{X} St (n, p)$ to $St (n, p)$ is given by:
  
  $R_{X}^{qr} (V) : = qf (X + V),$
  
  (426)
  
  where $qf (\cdot)$ denotes the Q-factor of the thin QR factorization of its $R^{p \times n}$ matrix argument.
- Retraction based on polar factorization: Given a point $X \in St (n, p)$ and a vector $V \in T_{X} St (n, p)$ , the polar-factorization-based retraction may be written as [64]:
  
  $R_{X}^{pol} (V) : = pf (X + V),$
  
  (427)
  
  where $pf (\cdot)$ denotes the polar factor of a given matrix. Such a retraction may be written in closed form. In fact, write $X + V = Q S$ . From the conditions $Q^{⊤} Q = I_{p}$ and $S^{⊤} = S$ , it follows that:
  
  $\begin{matrix} {(X + V)}^{⊤} (X + V) = S^{⊤} Q^{⊤} Q S \Rightarrow X^{⊤} X + X^{⊤} V + V^{⊤} X + V^{⊤} V = S^{2} . \end{matrix}$
  
  Since it holds that $X^{⊤} X = I_{p}$ and $X^{⊤} V + V^{⊤} X = 0$ , and the matrix $I_{p} + V^{⊤} V$ is positive-definite, one obtains $S = {(I_{p} + V^{⊤} V)}^{\frac{1}{2}}$ . From the equality $X + V = Q S$ , the following closed-form expression for the polar factorization-based retraction is obtained:
  
  $R_{X}^{pol} (V) = (X + V) {(I_{p} + V^{⊤} V)}^{- \frac{1}{2}} .$
  
  (428)
- Orthographic retraction map: The paper [65] studies orthographic retractions on submanifolds of Euclidean spaces. In the paper [65], it is proven that, given a pair $(X, V) \in T St (n, p)$ , under the proviso that V is sufficiently close to $0 \in T_{X} St (n, p)$ , there exists a normal array $Z \in N_{X} St (n, p)$ such that the function
  
  $R_{X}^{orth} (V) : = X + V + Z$
  
  (429)
  
  is a retraction on $St (n, p)$ . By the structure of the normal spaces, the orthographic retraction map on the Stiefel manifold $St (n, p)$ reads as:
  
  $R_{X}^{orth} (V) = X + V + X S,$
  
  (430)
  
  provided that there exists a $p \times p$ symmetric matrix S such that $V + X (I_{p} + S) \in St (n, p)$ , namely, such that:
  
  ${(X + V + X S)}^{T} (X + V + X S) = I_{p} .$
  
  (431)
  
  The above equation in the unknown matrix S may be written in plain form as:
  
  $- S^{2} - S (X^{T} V + I_{p}) - (V^{T} X + I_{p}) S - V^{T} V = 0 .$
  
  (432)
  
  The Equation (432) represents an instance of Continuous-time Algebraic Riccati Equation (CARE). The orthographic retraction map (430) may be computed numerically as shown in [64].
Real symplectic group: Possible retractions on a real symplectic group $Sp (2 n)$ are

$R_{Q} (V) : = Q Exp (Q^{- 1} V),$

(433)

whose properties were studied in the contributions [66,67], and the one based on the Cayley map, as explained in [2].
Grassmann manifold: There exist a number of retractions on the Grassmann manifold, which are briefly outlined below.
- Retraction based on QR-factorization: In [40], one of the retractions that map a tangent vector $V \in T_{[X]} Gr (n, p)$ onto $Gr (n, p)$ is given by:
  
  $R_{X}^{qr} (V) : = qf (X + V),$
  
  (434)
  
  where $qf (\cdot)$ denotes again the Q-factor of the thin QR factorization of its $R^{p \times n}$ matrix argument and $X \in St (n, p)$ is a Stiefel representative of the subspace $[X]$ .
- Retraction based on polar factorization: Given a subspace $[X] \in Gr (n, p)$ and a tangent vector $V \in T_{[X]} Gr (n, p)$ , the polar-factorization-based retraction may be written as [40]:
  
  $R_{X}^{pol} (V) : = (X + V) {(I_{p} + V^{⊤} V)}^{- \frac{1}{2}},$
  
  (435)
  
  where $X \in St (n, p)$ is a Stiefel representative of the subspace $[X]$ .

The parallel transport Equation (362) is, in general, difficult to solve, and hence the parallel transport operator might not be available in closed form for manifolds of interest. Approximations of the exact parallel transport are available, as the ‘Schild’s ladder’ construction [68] and the vector transport method [62]. In order to define the notion of vector transport, a smooth manifold

M

is supposed to be embedded into a (possibly large) linear ambient space

A

of appropriate dimension. Upon endowing the linear space

A

with an inner product, it is possible to define an orthogonal projection operator

Π_{x} : A \to T_{x} M

, with

x \in M

. Define the Whitney sum

T M \oplus T M : = {(x, (v, w)) ∣ x \in M, v \in T_{x} M, w \in T_{x} M} .

(436)

Then, the vector transport operator

vt : (x, (v, u)) \in T M \oplus T M \mapsto w \in T_{{exp}_{x} (v)} M

(437)

associated to an exponential map

exp : T M \to M

is defined as:

{vt}_{x, v} (u) : = Π_{{exp}_{x} (v)} (u) .

(438)

In practice, instead of parallel-translating the tangent vector

u \in T_{x} M

along a geodesic arc emanating from the point

x \in M

in the direction

v \in T_{x} M

as

P_{γ_{x, v}}^{0 \to 1} (u)

, vector transport moves rigidly the vector u within

A

up to the point y and then orthogonally projects the vector u over the tangent space

T_{y} M

where

y = {exp}_{x} (v)

.

With a slight abuse of exact terminology, one may refer to vector transport of a vector

u \in T_{x} M

to a tangent space

T_{y} M

by

Π_{y} (u)

for a fixed

y \in M

. In practice, vector transport is based on the following procedure:

Embed the manifold $M$ into a metric ambient space $A$ .
Translate rigidly the vector $u \in T_{x} M$ across the ambient space $A \supset M$ to a point $y \in M$ .
Project the vector u into the tangent space $T_{y} M$ by means of a suitable orthogonal projector $Π_{y} : A \to T_{y} M$ .

Vector transport associated with the above procedure is

Π_{y} (u)

, which moves the vector u from

T_{x} M \subset A

to

T_{y} M

.

Depending on the manifold structure, vector transport might result to be much less expensive to compute than exact parallel transport. As a drawback, vector transport does not enjoy some of the fundamental properties of parallel transport; for instance, vector transport does not realize an isometry nor a conformal transformation. Isometry may be recovered through an appropriate normalization, though.

12. Control Systems on Manifolds and Numerical Implementation

This section presents an instance of error feedback control of first-order systems on manifold and their numerical implementation, with special emphasis to system synchronization. In the present research, we will regard synchronization as a goal to be achieved by non-linear proportional-type control. We shall see that the design of a synchronizing controller may be effected through the classical notion of Lyapunov function and we shall further introduce the notion of control effort to quantify the energy consumption associated with a control action.

12.1. Synchronization of First-Order Dynamical Systems via Feedback Control

Synchronization is meant primarily to make two identical (or twin) dynamical systems synchronize their dynamics overt time, provided that their initial states differ from one another and that one of them is able to access the state of the other. In order not to restrict how far such two initial states may lay apart, the state space

M

is assumed to be a geodesically complete path-connected manifold. In principle, the restriction that the two systems to synchronize must be identical is not necessary, as long as their mathematical models insist on the same state manifold.

In a system pair, the independent dynamical system will be referred to as the leader, while the controlled dynamical system will be referred to as the follower. In this section, we shall assume that the leader and the follower differ from one another, whereby the case of identical systems will follow as a special (although most meaningful) case.

We shall cover in this paper only the case that the leader is represented by a first order, non-autonomous dynamical system on manifold, described by the tangent-bundle differential equation

{\dot{x}}^{ℓ} (t) = f^{ℓ} (t, x^{ℓ} (t)), with x^{ℓ} (0) = x_{0}^{ℓ} \in M, t \geq 0,

(439)

where

x^{ℓ} \in M

denotes the leader’s state variable and

f^{ℓ} : R \times M \to T M

denotes a possibly time-dependent state-transition operator. Likewise, the controlled follower is described by

\dot{x} (t) = f (t, x (t)) + u (t), with x (0) = x_{0} \in M, t \geq 0,

(440)

where

x \in M

denotes the follower’s state variable,

f : R \times M \to T M

denotes a state-transition operator of the follower,

u \in T M

denotes a tangent-bundle control field (in particular,

u (t) \in T_{x (t)} M

) and, in general, the initial state

x_{0} \in M

differs from the initial state

x_{0}^{ℓ} \in M

.

The control field that will drive the follower to synchronize to the leader is defined on the basis of an instantaneous (non-delayed) distance-minimization design. Assume the state manifold

M

to be endowed with a specific metric and hence a distance function

d : M \times M \to R

and a logarithmic map

log : M \times M \to T M

. Let us define the function

D (t) : = \frac{1}{2} d^{2} (x^{ℓ} (t), x (t)) = \frac{1}{2} {〈 {log}_{x^{ℓ} (t)} (x (t)), {log}_{x^{ℓ} (t)} (x (t)) 〉}_{x^{ℓ} (t)},

(441)

which is proportional to the squared Riemannian distance between the state of the leader and the state of the follower. The following result shows how to define a distance minimizing control action.

Theorem 7.

The control field

u : = P^{x^{ℓ} \to x} (f^{ℓ} (t, x^{ℓ})) - f (t, x) + c {log}_{x} (x^{ℓ}),

(442)

with

c > 0

, minimizes the function

D (t)

asymptotically.

Proof.

The derivative of the function

D

with respect to the time-parameter t reads

\begin{matrix} \dot{D} = & {〈 - {log}_{x^{ℓ}} (x), {\dot{x}}^{ℓ} 〉}_{x^{ℓ}} + {〈 - {log}_{x} (x^{ℓ}), \dot{x} 〉}_{x} \\ = & - {〈 {log}_{x^{ℓ}} (x), f^{ℓ} (t, x^{ℓ}) 〉}_{x^{ℓ}} - {〈 {log}_{x} (x^{ℓ}), f (t, x) + u 〉}_{x} . \end{matrix}

(443)

The above expression appears as a sum of two terms referring to two different tangent spaces. It is convenient to move calculations to only one tangent space. We shall take, as the tangent space of reference, the one attached to the state of the follower. Assuming that parallel transport realizes a conformal isometry (see Section 10), it holds that

{〈 {log}_{x^{ℓ}} (x), f^{ℓ} (t, x^{ℓ}) 〉}_{x^{ℓ}} = {〈 P^{x^{ℓ} \to x} ({log}_{x^{ℓ}} (x)), P^{x^{ℓ} \to x} (f^{ℓ} (t, x^{ℓ})) 〉}_{x} .

(444)

In addition, by Theorem 6, it holds that

P^{x^{ℓ} \to x} ({log}_{x^{ℓ}} (x)) = - {log}_{x} (x^{ℓ})

because parallel transport and manifold logarithm are referred to the same geodesic line. Therefore, the expression (443) may be recast as

\begin{matrix} \dot{D} = & {〈 {log}_{x} (x^{ℓ}), P^{x^{ℓ} \to x} (f^{ℓ} (t, x^{ℓ})) 〉}_{x} - {〈 {log}_{x} (x^{ℓ}), f (t, x) + u 〉}_{x} \\ = & {〈 {log}_{x} (x^{ℓ}), P^{x^{ℓ} \to x} (f^{ℓ} (t, x^{ℓ})) - f (t, x) - u 〉}_{x}, \end{matrix}

(445)

thanks to the linearity of the inner product. Let us set the control field u so that the sum

P^{x^{ℓ} \to x} (f^{ℓ} (t, x^{ℓ})) - f (t, x) - u

in (445) is proportional to

- {log}_{x} (x^{ℓ})

, which leads to the expression (442). The choice that led to the control field (442) implies that the function

D

satisfies

\dot{D} = - c {〈 {log}_{x} (x^{ℓ}), {log}_{x} (x^{ℓ}) 〉}_{x} = - c {∥ {log}_{x} (x^{ℓ}) ∥}_{x}^{2} = - 2 c D .

(446)

Since the inner product defines a positive-definite local norm, the above equation entails the inequality

\dot{D} \leq 0

at any time, which implies asymptotic synchronization. Equality may hold only when

{log}_{x} (x^{ℓ}) = 0

, which happens when the follower and the leader are perfectly synchronized. □

In the expression (442), the constant c takes the meaning of a communication strength between the leader and the follower. In addition, one might notice that the differential Equation (446) may be solved for

D

and gives

D (t) \propto e^{- 2 c t}

; hence, synchronization happens with exponential time speed (fast at the beginning, then slower). Such a result is completely independent of the metric that the state manifold

M

was endowed with and the speed of synchronization depends only on the constant c.

Let us consider, as an example, a simpler and more familiar case.

Example 40.

As a special case, when

M = R^{3}

endowed with the standard Euclidean metric, the parallel transport operator is simply the identity map, while

{log}_{x} y = y - x

. Therefore, the criterion function

D

takes the shape of a quadratic error, namely

D = \frac{1}{2} e^{⊤} e

, where

e : = x - x^{ℓ}

, discussed in [69], and the corresponding control field takes the expression

u = f^{ℓ} (t, x^{ℓ}) - f (t, x) + c (x^{ℓ} - x)

, which coincides with the control field discussed in the paper [69].

The freedom in the choice of an inner product accounts for the generalization considered in [69] that consists of considering a Lyapunov function

D : = \frac{1}{2} e^{⊤} P e

with P being a symmetric, positive-definite weighting matrix. Such a weighting matrix seems to be absent from our formulation of a quadratic error (441), yet it is ‘hidden’ in such an expression. In fact, invoking once again an ambient space

A

that the state space is embedded within and a metric

{〈 \cdot, \cdot 〉}^{A}

, we may rewrite the quadratic error (441) as

D = \frac{1}{2} 〈 {log}_{x} (x^{ℓ}), G_{x} {({log}_{x} (x^{ℓ}) 〉}^{A},

(447)

and hence, by comparison, we see that the synchronization error reads

e : = {log}_{x} (x^{ℓ})

and that the role of the weighting matrix P is played by the metric kernel G. ■

In control theory, it is customary to associate a scalar index, the control effort, to the control field [70]. In the context of systems on manifold, we define the control effort

σ : R \to R

as

σ (t) : = {〈 u (t), u (t) 〉}_{x (t)} = {∥ u (t) ∥}_{x (t)}^{2} .

(448)

Whenever the follower system and the leader system are twins, namely their state-transition functions are identical

f^{ℓ} \equiv f

, under mild continuity conditions on such state-transition functions the control effort vanishes asymptotically to zero. In fact, by the continuity of the state-transition function, it follows that

P^{x^{ℓ} \to x} (f (t, x^{ℓ})) - f (t, x)

approaches 0 as

x (t)

approaches

x^{ℓ} (t)

. Such a conclusion holds no longer true if the follower and leader systems’ state-transition functions differ from one another (namely, whenever

f^{ℓ} \neq f

). It is worth underlining that the state-transition functions of the leader and of the follower systems, namely

f^{ℓ}

and f, may differ to one another even when these systems are identical since the information on the leader’s state as acquired by the follower may be affected by measurement noise, namely, it might hold that

f (t, \cdot) = f^{ℓ} (t, \cdot) + ν (t)

, where

ν (t)

denotes a measurement disturbance.

Another interesting observation concerns the speed of synchronization in connection to control effort saving. The constant c in the expression (442) influences the speed of synchronization, namely, the larger c, the speedier the synchronization. However, it is easy to see from the definition (448) that the constant c also affects the control effort, namely, the larger c, the more expensive the control action. Clearly, the proportional control constant c should be chosen as a tradeoff between speed and cost.

12.2. Numerical Methods to Simulate First-Order Systems

In the present subsection we shall recall from the specialized literature the notions of forward Euler method and of explicit

4 th

-order Runge–Kutta method to simulate numerically classical first-order, non-linear, non-autonomous systems. In addition, in the body of the present section we shall lay out extensions of such numerical methods to implement on a computing platform those mathematical dynamical systems insisting on curved manifolds. The main ideas to achieve such an extension may be summarized as follows:

Extension of straight stepping: Classical numerical methods tailored to

R^{n}

advance the solution from one step to the next by moving a system’s state along straight segments. On curved state manifolds, the notion of ‘straight segment’ needs to be replaced by the notion of ‘geodesic arc’, and henceforth additive stepping needs to be replaced by exponential-map-based stepping.

Extension of linear stages: High-order methods in

R^{n}

select a stepping direction as a linear combination of a number of estimations of a vector field (in the Runge–Kutta methods, these estimations are called ‘stages’). On curved spaces, moving directions are tangent vectors that cannot be combined together directly because they belong to different tangent spaces. Such tangent directions need to be ‘aligned’ together in a given tangent space by means of parallel transport and then combined together.

The forward Euler scheme (fEul) is perhaps the simplest numerical scheme known in the scientific literature to tackle an initial value problem. On a Euclidean space

R^{n}

, the forward Euler method to simulate numerically a dynamical system

\dot{x} (t) = f (t, x (t))

reads as

x_{k + 1} = x_{k} + T f (k T, x_{k}), k = 0, 1, 2, 3, \dots, K,

(449)

where k denotes a step counter ranging from 0 to a given integer

K > 0

, T denotes a time-discretization interval (generally,

T ≪ 1

) and

x_{k}

denotes an approximation to the true state

x (k T)

. The accuracy and numerical stability of this method turn out to be reasonable as long as the state-transition function meets certain conditions [71].

This Euler method moves forward the current state

x_{k}

to the next state

x_{k + 1}

along a straight line directed toward

f (k T, x_{k})

of a fraction specified by T. Since curved manifolds admits no straight lines in the sense of Euclidean geometry, a plain forward Euler method is inherently unsuitable to cope with a tangent-bundle differential equation, as exemplified in the following.

Example 41.

To tackle the problem that arises about the numerical implementation of dynamical systems on manifolds, it is worth examining an explanatory example based on the low-dimensional manifold

SO (2)

. As already outlined, by embedding the space

SO (2)

into the space

A : = R^{2 \times 2}

, any element of

SO (2)

may be regarded as a 2-by-2 real-valued matrix

X = [\begin{matrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{matrix}]

whose entries must satisfy the constrains:

X_{11}^{2} + X_{21}^{2} = 1

,

X_{22}^{2} + X_{12}^{2} = 1

,

X_{11} X_{12} + X_{21} X_{22} = 0

and

X_{11} X_{22} - X_{12} X_{21} = 1

.

Let us consider the first-order dynamical system

\dot{X} = f (X)

on the manifold

SO (2)

, with

f : X \in SO (2) \mapsto f (X) \in T_{X} SO (2)

. Such a dynamical system may be written as a set of four differential equations of the type

{\dot{X}}_{i j} (t) = f_{i j} (X_{11} (t), X_{12} (t), X_{21} (t), X_{22} (t))

. In the present case, it holds that

f (X) = X Ω

with

Ω = [\begin{matrix} 0 & ω \\ - ω & 0 \end{matrix}]

, where

ω = ω (X (t)) \in R

. The forward Euler stepping technique of numerical calculus to solve the above system of differential equations would read as

X_{k + 1} = X_{k} + T f (X_{k})

, with

T > 0

denoting a step size and

k \geq 0

a step counter. Such a numerical stepping method does not take into account the constraints on the entries of matrix X, namely, it generates a trajectory

k \mapsto X_{k}

in the ambient space

R^{2 \times 2}

rather than in the feasible manifold

SO (2)

. Namely, starting from a point

X_{k} \in SO (2)

, it would yield a new point

X_{k + 1} \notin SO (2)

. The reason of such behavior is that Euler techniques insist on flat spaces and do not cope automatically with curved manifolds.

It is instructive to investigate in detail the effect of an Euler stepping method in the solution of the differential equation

\dot{X} = X Ω

. In such a context, the Euler stepping equation reads:

X_{k + 1} = X_{k} (I_{2} + T Ω_{k}), w i t h Ω_{k} = [\begin{matrix} 0 & ω_{k} \\ - ω_{k} & 0 \end{matrix}], k \geq 0 .

(450)

As the starting point,

X_{0}

satisfies

X_{0}^{⊤} X_{0} = I_{2}

, and it holds that:

X_{1}^{⊤} X_{1} = {(I_{2} + T Ω_{0})}^{⊤} X_{0}^{⊤} X_{0} (I_{2} + T Ω_{0}) = I_{2} - T^{2} Ω_{0}^{2} .

(451)

Notice that

Ω_{k}^{2} = - ω_{k}^{2} I_{2}

, and hence

X_{1}^{⊤} X_{1} = (1 + T^{2} ω_{0}^{2}) I_{2}

. The result of the first step

X_{1}

already lost normality of its columns of an additive amount

T^{2} ω_{0}^{2}

and changed its determinant from 1 to

\sqrt{1 + T^{2} ω_{0}^{2}}

. Since T is generally far smaller than 1, and since the deviation is proportional to

T^{2}

, such a deviation may not be apparent from the first steps, yet progressively detrimental. However, the first step keeps the orthogonality of the columns of the matrix X (such a result is peculiar of the case

SO (2)

only and does not copy to the general case

SO (n)

with

n > 2

). For the next step, it holds that:

\begin{matrix} X_{2}^{⊤} X_{2} = & {(I_{2} + T Ω_{1})}^{⊤} X_{1}^{⊤} X_{1} (I_{2} + T Ω_{1}) \\ = & (1 + T^{2} ω_{0}^{2}) (I_{2} - T^{2} Ω_{1}^{2}) \\ = & (1 + T^{2} ω_{0}^{2}) (1 + T^{2} ω_{1}^{2}) I_{2} . \end{matrix}

(452)

By induction, it is readily verified that the matrix

X_{k}

keeps monotonically loosing the normality of its two columns of an identical amount. ■

The forward Euler method (449) can be extended to a smooth manifold by replacing the notion of straight line with the notion of geodesic line, to give

x_{k + 1} = {exp}_{x_{k}} (T f (k T, x_{k})), k = 0, 1, 2, 3, \dots, K .

(453)

The generalization from (449) to (453) is conceived as follows. On a curved state manifold

M

, each point

x_{k}

belongs to

M

while each quantity

f (k T, x_{k})

is a tangent vector in

T_{x_{k}} M

: these two quantities cannot be combined in an additive way, because

x_{k} + T f (k T, x_{k}) \notin M

, rather, such quantities are combined by the help of the exponential map, which describes a geodesic arc departing from the state

x_{k}

in the tangent direction

f (k T, x_{k})

. The above stepping method allows one to compute the first K discrete points of the trajectory generated by the dynamical system

\dot{x} = f (t, x)

, with

x \in M

, given the initial state

x_{0} \in M

. For a Euclidean state space

M = R^{n}

, it holds that

{exp}_{x} (v) = x + v

; therefore, the scheme (453) is apparently a generalization of the well-known Euler scheme (449) from Euclidean to curved spaces.

The set of equations to simulate on a computing platform a leader system and a controlled follower system by the fEul method on a manifold are hence laid out as follows

\{\begin{matrix} x_{k + 1}^{ℓ} = {exp}_{x_{k}^{ℓ}} (T f^{ℓ} (k T, x_{k}^{ℓ})), & Free leader \\ u_{k} = P^{x_{k}^{ℓ} \to x_{k}} (f^{ℓ} (k T, x_{k}^{ℓ})) - f (k T, x_{k}) + c {log}_{x_{k}} (x_{k}^{ℓ}), & Controller \\ x_{k + 1} = {exp}_{x_{k}} (T (f (k T, x_{k}) + u_{k})), & Controlled follower \\ k = 0, 1, 2, 3, \dots, K . \end{matrix}

(454)

For the sake of comparison, we notice that, in the case of a leader-follower pair evolving on a Euclidean state space

M = R^{n}

, the above Equation (454) would simplify into

\{\begin{matrix} x_{k + 1}^{ℓ} = x_{k}^{ℓ} + T f^{ℓ} (k T, x_{k}^{ℓ}), & Free leader \\ u_{k} = f^{ℓ} (k T, x_{k}^{ℓ}) - f (k T, x_{k}) + c (x_{k}^{ℓ} - x_{k}), & Controller \\ x_{k + 1} = x_{k} + T (f (k T, x_{k}) + u_{k}), & Controlled follower \\ k = 0, 1, 2, 3, \dots, K, \end{matrix}

(455)

because

x_{k}^{ℓ}, x_{k} \in R^{n}

and

f, f^{ℓ} : R \times R^{n} \to R^{n}

.

A second class of methods that we would like to recall is the explicit, fourth-order Runge–Kutta algorithm. The family of Runge–Kutta numerical integration methods on Euclidean spaces

R^{n}

was conceived in order to increase the precision of lower-order methods, such as the Euler method [72]. The explicit 4th-order Runge–Kutta method (eRK4), is based on four partial increments (stages) that, combined together, lead to a complete step:

\{\begin{matrix} h_{1, k} : = f (k T, x_{k}), \\ h_{2, k} : = f (k T + \frac{1}{2} T, x_{k} + \frac{1}{2} T h_{1, k}), \\ h_{3, k} : = f (k T + \frac{1}{2} T, x_{k} + \frac{1}{2} T h_{2, k}), \\ h_{4, k} : = f ((k + 1) T, x_{k} + T h_{3, k}), \\ x_{k + 1} = x_{k} + T (\frac{h_{1, k}}{6} + \frac{h_{2, k} + h_{3, k}}{3} + \frac{h_{4, k}}{6}), \\ k = 0, 1, 2, 3, \dots, K . \end{matrix}

(456)

Similar to the Euler method, the eRK4 method moves forward the current state

x_{k}

to the next state

x_{k + 1}

along a straight line toward a specific direction, except that such a direction is computed in a more convoluted way.

The eRK4 method may be extended to a curved manifold by appropriately converting each of the equations in (456). Such a conversion needs to take into account that the state space

M

is now a curved manifold:

\{\begin{matrix} h_{1, k} : = f (k T, x_{k}), \\ x_{k + 1 / 4} = {exp}_{x_{k}} (\frac{1}{2} T h_{1, k}), \\ h_{2, k} : = f (k T + \frac{1}{2} T, x_{k + 1 / 4}), {\tilde{h}}_{2, k} = P^{x_{k + 1 / 4} \to x_{k}} (h_{2, k}), \\ x_{k + 1 / 2} = {exp}_{x_{k}} (\frac{1}{2} T {\tilde{h}}_{2, k}), \\ h_{3, k} : = f (k T + \frac{1}{2} T, x_{k + 1 / 2}), {\tilde{h}}_{3, k} = P^{x_{k + 1 / 2} \to x_{k}} (h_{3, k}), \\ x_{k + 3 / 4} = {exp}_{x_{k}} (T {\tilde{h}}_{3, k}), \\ h_{4, k} : = f ((k + 1) T, x_{k + 3 / 4}), {\tilde{h}}_{4, k} = P^{x_{k + 3 / 4} \to x_{k}} (h_{4, k}), \\ x_{k + 1} = {exp}_{x_{k}} (\frac{1}{6} T h_{1, k} + \frac{1}{3} T ({\tilde{h}}_{2, k} + {\tilde{h}}_{3, k}) + \frac{1}{6} T {\tilde{h}}_{4, k}), \\ k = 0, 1, 2, 3, \dots, K . \end{matrix}

(457)

(The notation

x_{k + 1 / 4}

,

x_{k + 1 / 2}

, …, used to indicate intermediate steps, is standard in numerical calculus.) The reasoning that led to this numerical scheme is outlined as follows: Once a direction

h_{1, k}

is obtained in the first stage, it is used to determine the direction

h_{2, k}

in the second stage; the formula to compute

h_{2, k}

in (456) prescribes to evaluate the vector field

f (\cdot, x)

in a point

x_{k} + \frac{1}{2} T h_{1, k}

; this last expression, however, cannot be applied directly on a curved manifold and needs to be replaced by

{exp}_{x_{k}} (\frac{1}{2} T h_{1, k})

, which gives the expression

h_{2, k}

in (457); even so, the obtained result cannot be used directly, since the update rule in (456), which prescribes to advance the value

x_{k}

by

x_{k + 1} = x_{k} + T (\frac{1}{6} h_{1, k} + \frac{1}{3} h_{2, k} + \dots)

, needs to be translated into

x_{k + 1} = {exp}_{x_{k}} (\frac{1}{6} T h_{1, k} + \frac{1}{3} T h_{2, k} + \dots)

which, in turn, requires the vector

h_{2, k}

to belong to

T_{x_{k}} M

, while it belongs to

T_{x_{k + 1 / 4}} M

; for this reason, it is necessary to make a further modification and to parallel transport the vector

h_{2, k}

to

T_{x_{k}} M

, namely, to compute

{\tilde{h}}_{2, k} = P^{x_{k + 1 / 4} \to x_{k}} (h_{2, k})

; the same holds for the subsequent stages.

The complete set of equations required to implement numerically a leader and a controlled follower by the eRK4 method on a manifold are

\{\begin{matrix} \begin{matrix} h_{1, k}^{ℓ} : = f^{ℓ} (k T, x_{k}^{ℓ}), \\ x_{k + 1 / 4}^{ℓ} = {exp}_{x_{k}^{ℓ}} (\frac{1}{2} T h_{1, k}^{ℓ}), \\ h_{2, k}^{ℓ} : = f^{ℓ} (k T + \frac{1}{2} T, x_{k + 1 / 4}^{ℓ}), {\tilde{h}}_{2, k}^{ℓ} = P^{x_{k + 1 / 4}^{ℓ} \to x_{k}^{ℓ}} (h_{2, k}^{ℓ}), \\ x_{k + 1 / 2}^{ℓ} = {exp}_{x_{k}^{ℓ}} (\frac{1}{2} T {\tilde{h}}_{2, k}^{ℓ}), \\ h_{3, k}^{ℓ} : = f^{ℓ} (k T + \frac{1}{2} T, x_{k + 1 / 2}^{ℓ}), {\tilde{h}}_{3, k}^{ℓ} = P^{x_{k + 1 / 2}^{ℓ} \to x_{k}^{ℓ}} (h_{3, k}^{ℓ}), \\ x_{k + 3 / 4}^{ℓ} = {exp}_{x_{k}^{ℓ}} (T {\tilde{h}}_{3, k}^{ℓ}), \\ h_{4, k}^{ℓ} : = f^{ℓ} ((k + 1) T, x_{k + 3 / 4}^{ℓ}), {\tilde{h}}_{4, k}^{ℓ} = P^{x_{k + 3 / 4}^{ℓ} \to x_{k}^{ℓ}} (h_{4, k}^{ℓ}), \\ x_{k + 1}^{ℓ} = {exp}_{x_{k}^{ℓ}} (\frac{1}{6} T h_{1, k}^{ℓ} + \frac{1}{3} T ({\tilde{h}}_{2, k}^{ℓ} + {\tilde{h}}_{3, k}^{ℓ}) + \frac{1}{6} T {\tilde{h}}_{4, k}^{ℓ}) \end{matrix}\} & Leader \\ u_{k} = P^{x_{k}^{ℓ} \to x_{k}} (f^{ℓ} (k T, x_{k}^{ℓ})) - f (k T, x_{k}) + c {log}_{x_{k}} (x_{k}^{ℓ}), & Controller \\ \begin{array}{l} h_{1, k} : = f (k T, x_{k}), \\ x_{k + 1 / 4} = {exp}_{x_{k}} (\frac{1}{2} T h_{1, k}), \\ h_{2, k} : = f (k T + \frac{1}{2} T, x_{k + 1 / 4}), {\tilde{h}}_{2, k} = P^{x_{k + 1 / 4} \to x_{k}} (h_{2, k}), \\ x_{k + 1 / 2} = {exp}_{x_{k}} (\frac{1}{2} T {\tilde{h}}_{2, k}), \\ h_{3, k} : = f (k T + \frac{1}{2} T, x_{k + 1 / 2}), {\tilde{h}}_{3, k} = P^{x_{k + 1 / 2} \to x_{k}} (h_{3, k}), \\ x_{k + 3 / 4} = {exp}_{x_{k}} (T {\tilde{h}}_{3, k}), \\ h_{4, k} : = f ((k + 1) T, x_{k + 3 / 4}), {\tilde{h}}_{4, k} = P^{x_{k + 3 / 4} \to x_{k}} (h_{4, k}), \\ x_{k + 1} = {exp}_{x_{k}} (\frac{1}{6} T h_{1, k} + \frac{1}{3} T ({\tilde{h}}_{2, k} + {\tilde{h}}_{3, k}) + \frac{1}{6} T {\tilde{h}}_{4, k} + T u_{k}), \end{array}\} & Follower \\ k = 0, 1, 2, 3, \dots, K . \end{matrix}

(458)

The implementation of the above equations on a computing platform does not pose serious concerns as long as the chosen development language possesses adequate commands to deal seamlessly with arrays and array-type functions.

13. Riemannian Hessian of a Manifold-to-Scalar Function

The next step in Taylor series approximation to a manifold-to-scalar function beyond the first-order term (through gradient) is the second-order termed based on the notion of ‘Hessian’.

13.1. Definition, Properties and Coordinate-Free Calculation of Riemannian Hessian

Given a differentiable function

f : M \to R

, its Riemannian Hessian at a point

x \in M

is a linear operator

H_{x} f : T_{x} M \to T_{x} M

that appears in the quadratic term of the Taylor approximation

f (γ_{x, v} (t)) = f (x) + {〈 {grad}_{x} f, v 〉}_{x} t + \frac{1}{2} {〈 H_{x} f (v), v 〉}_{x} t^{2} + O (t^{3}),

(459)

where

γ_{x, v} : [- ϵ, ϵ] \to M

denotes, in principle, any smooth curve such that

γ_{x, v} (0) = x

and

{\dot{γ}}_{x, v} (0) = v

. A major problem with such liberality in the choice of a curve is that, since clearly

{〈 H_{x} (v), v 〉}_{x} = {\frac{d^{2} f (γ_{x, v} (t))}{d t^{2}}|}_{t = 0},

(460)

the Hessian operator would not just depend on the pair

(x, v)

but also on

{\ddot{γ}}_{x, v} (0)

, which is not prescribed. A Hessian that depends on the shape of the curve

γ

which, after all, should be instrumental—not determinant—is hardly acceptable in practice, which is why the notion of a Hessian operator is generally defined with respect to a geodesic

γ_{x, v}

. It is worth pointing out that, according to the ‘algebraic’ argument recalled in Observation 1, since the Hessian stems from a quadratic form, only its self-adjoint part plays a role. Indeed, since its very definition stems from a quadratic form, only its self-adjoint component may be determined.

Not surprisingly, the Riemannian Hessian on a manifold

M

embedded in an ambient space

A

is related to the ambient Hessian in

A

. Still, there is a perhaps surprising outcome in its calculation, namely, it depends on the ambient gradient of f (while the standard Hessian does not). Let us recall what is meant by ambient Hessian

\partial_{x}^{2} f

of a function

f : A \to R

, that is

\partial_{x}^{2} f (v) : = lim_{h \to 0} \frac{\partial_{ρ_{x, v} (h)} f - \partial_{x} f}{h} = {\frac{d}{d t} \partial_{ρ_{x, v} (t)} f|}_{t = 0},

(461)

where

ρ_{x, v} : [- ϵ, ϵ] \to M

denotes any smooth curve such that

ρ_{x, v} (0) = x \in M

and

{\dot{ρ}}_{x, v} (0) = v \in T_{x} M

—for instance, if

f : R^{p} \to R

, then

\partial_{x}^{2} f

takes the form of a

p \times p

matrix of partial derivatives and

\partial_{x}^{2} f (v)

just denotes the matrix-to-column-array product

(\partial_{x}^{2} f) v

.

To carry out the calculation of the Riemannian Hessian, let us recall the relationship (301) that may be written as

\frac{d}{d t} f (γ_{x, v} (t)) = {〈 {grad}_{γ_{x, v} (t)} f, {\dot{γ}}_{x, v} (t) 〉}_{γ_{x, v} (t)} .

(462)

Introducing the metric kernel and the inner product on the ambient space gives:

\frac{d}{d t} f (γ_{x, v} (t)) = {〈 G_{γ_{x, v} (t)} ({grad}_{γ_{x, v} (t)} f), {\dot{γ}}_{x, v} (t) 〉}^{A} .

(463)

Recalling the explicit expression (284) of the Riemannian gradient based on an orthogonal projector

Π

gives

\frac{d}{d t} f (γ_{x, v} (t)) = {〈 Π_{γ_{x, v} (t)} (\partial_{γ_{x, v} (t)} f), {\dot{γ}}_{x, v} (t) 〉}^{A} .

(464)

The right-hand side in the above expression has been written in a way that makes it easier to take a second derivative with respect to the parameter t, which reads as

\frac{d^{2}}{d t^{2}} f (γ_{x, v} (t)) = {〈 \frac{d}{d t} Π_{γ_{x, v} (t)} (\partial_{γ_{x, v} (t)} f), {\dot{γ}}_{x, v} (t) 〉}^{A} + {〈 Π_{γ_{x, v} (t)} (\partial_{γ_{x, v} (t)} f), {\ddot{γ}}_{x, v} (t) 〉}^{A} .

(465)

The first expression on the right-hand side may be written explicitly by introducing a linear operator

Π^{•}

that represents a derivative of the orthogonal projector, while the second term on the right-hand side may be made more explicit by recalling, from the relationships in (356), the relation between the naïve acceleration over a geodesic and the Christoffel form of second kind. Let us define the linear operator

Π_{x}^{•} (a, v) : = {\frac{d}{d t} Π_{ρ_{x, v} (t)} (a)|}_{t = 0},

(466)

where

x \in M

denotes a point where projection is carried out,

a \in A

is an element of the ambient space whose projection is sought,

v \in T_{x} M

is a tangent direction along which a variation of the projected vector

Π_{x} (a)

is sought to be estimated and

ρ_{x, v} : [- ϵ, ϵ] \to M

is any smooth curve such that

ρ_{x, v} (0) = x

and

{\dot{ρ}}_{x, v} (0) = v

.

To clarify the above definition, let us examine a useful computation rule.

Example 42.

Let us define

g : M \to A

as an ambient-valued function of a manifold-valued argument and a smooth curve

γ : R \to M

. Let us now consider the composition

Π_{γ (t)} (g (γ (t)))

. We aim at calculating its derivative with respect to the parameter t. It holds that

\frac{d}{d t} Π_{γ} (g (γ)) = Π_{γ}^{•} (g (γ), \dot{γ}) + Π_{γ} (\partial_{γ} g (\dot{γ})),

(467)

which is a linear function of

\dot{γ}

. ■

On the basis of the above calculation rule, of interest in the following developments, we can prove a collateral property of the map

Π^{•}

.

Example 43.

The map

Π^{•}

applied totwotangent vectors returns a normal element. In fact, let us recall that

Π_{x} (Π_{x} (a)) = Π_{x} (a)

for every

x \in M

and

a \in A

. On any smooth curve

γ : [- ϵ, ϵ] \to M

such that

γ (0) = x

and

\dot{γ} (0) = v

it holds that

Π_{γ} (Π_{γ} (a)) = Π_{γ} (a) .

(468)

Deriving with respect to the parameter t and applying the computation rule (467), yields

Π_{γ}^{•} (Π_{γ} (a), \dot{γ}) + Π_{γ} (Π_{γ}^{•} (a, \dot{γ})) = Π_{γ}^{•} (a, \dot{γ}) .

(469)

Setting

t = 0

yields:

Π_{x}^{•} (Π_{x} (a), v) + Π_{x} (Π_{x}^{•} (a, v)) = Π_{x}^{•} (a, v) .

(470)

Now we can set

w : = Π_{x} (a) \in T_{x} M

and write

Π_{x}^{•} (w, v) = Π_{x}^{•} (a, v) - Π_{x} (Π_{x}^{•} (a, v)) .

(471)

The term on the right-hand side represents the normal component of

Π_{x}^{•} (a, v)

, and hence

Π_{x} (Π_{x}^{•} (w, v)) = 0

. ■

On the basis of the definition (337), the second derivative (465) may be rewritten as

\begin{matrix} \frac{d^{2}}{d t^{2}} f (γ_{x, v} (t)) = & {〈 Π_{γ_{x, v} (t)}^{•} (\partial_{γ_{x, v} (t)} f, {\dot{γ}}_{x, v} (t)), {\dot{γ}}_{x, v} (t) 〉}^{A} \\ + {〈 Π_{γ_{x, v} (t)} (\frac{d}{d t} \partial_{γ_{x, v} (t)} f), {\dot{γ}}_{x, v} (t) 〉}^{A} \\ - {〈 Π_{γ_{x, v} (t)} (\partial_{γ_{x, v} (t)} f), {\bar{\bar{Γ}}}_{γ_{x, v} (t)} ({\dot{γ}}_{x, v} (t), {\dot{γ}}_{x, v} (t)) 〉}^{A} . \end{matrix}

(472)

Setting

t = 0

leads to

{〈 H_{x} f (v), v 〉}_{x} = {〈 Π_{x}^{•} (\partial_{x} f, v), v 〉}^{A} + {〈 Π_{x} (\partial_{x}^{2} f (v)), v 〉}^{A} - {〈 Π_{x} (\partial_{x} f), {\bar{\bar{Γ}}}_{x} (v, v) 〉}^{A} .

(473)

As a last step, recall that, in the rightmost term, it is possible to swap one instance of v with the term

Π_{x} (\partial_{x} f)

; hence, we write

{〈 G_{x} (H_{x} f (v)), v 〉}^{A} = {〈 Π_{x}^{•} (\partial_{x} f, v), v 〉}^{A} + {〈 Π_{x} (\partial_{x}^{2} f (v)), v 〉}^{A} - {〈 {\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v), v 〉}^{A},

(474)

which must hold for every

v \in T_{x} M

. The last expression is equivalent to

G_{x} (H_{x} f (v)) - Π_{x}^{•} (\partial_{x} f, v) - Π_{x} (\partial_{x}^{2} f (v)) + {\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v) \in N_{x} M .

(475)

An explicit expression of the Riemannian Hessian may now be obtained by applying the orthogonal projector

Π_{x}

to both sides and then the inverse metric kernel to the result, which ultimately gives

H_{x} f (v) = G_{x}^{- 1} (Π_{x} (\partial_{x}^{2} f (v))) + G_{x}^{- 1} (Π_{x} (Π_{x}^{•} (\partial_{x} f, v))) - G_{x}^{- 1} (Π_{x} ({\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v))) .

(476)

Notice that, unless the difference between the last two terms is zero, the Riemannian Hessian depends on the ambient gradient

\partial_{x} f

.

It is worth examining separately a few special cases in which the expression of the Riemannian Hessian simplifies:

Case that $Π_{x}$ is independent of x: This case occurs when tangent spaces coincide to one another. In this case, the projection operator may simply be denoted as $Π$ and it holds that $Π^{•} \equiv 0$ ; therefore,

$H_{x} f (v) = G_{x}^{- 1} (Π (\partial_{x}^{2} f (v))) - G_{x}^{- 1} (Π ({\bar{\bar{Γ}}}_{x} (Π (\partial_{x} f), v))) .$

(477)
Case that the metric kernel coincides with the identity: This is the case most generally covered in the literature that occurs when the ambient space $A$ is a Euclidean space. In this case

$H_{x} f (v) = Π_{x} (Π_{x}^{•} (\partial_{x} f, v)) + Π_{x} (\partial_{x}^{2} f (v)) - Π_{x} ({\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v)) .$

(478)

This case was explicitly covered in [73], in which the expression of the Hessian was given in terms of Weingarten map. The importance of the Weingarten map in applied sciences was further highlighted in [74].
Case that the gradient of the function f is null: In a point $x \in M$ where ${grad}_{x} f = 0$ the expression of the Hessian simplifies noticeably. In fact the last two terms in (476) vanish to zero; therefore, it holds that

$H_{x} f (v) = (G_{x}^{- 1} \circ Π_{x}) (\partial_{x}^{2} f (v)) .$

(479)

This is in fact the only case in which the Hessian does not depend on the Christoffel form.
Case of the manifold $R^{p}$ endowed with a Euclidean metric: This is the reference case that we may look at for familiarity. In this case, $G_{x} = {id}_{x}$ , $Π_{x} = {id}_{x}$ , ${\bar{Γ}}_{x} = 0$ ; hence,

$H_{x} f (v) = \partial_{x}^{2} f (v),$

(480)

which looks exactly as one expects.

To what concerns the operator

Π^{•}

, we are going to assume that the following commutation property with the ambient metric holds: For every

x \in M

,

a \in A

,

u, v \in T_{x} M

it is true that

{〈 Π_{x}^{•} (a, v), w 〉}^{A} = {〈 Π_{x}^{•} (a, w), v 〉}^{A} .

(481)

Let us go into some detail about the practical computation of the operator

Π^{•}

by surveying an example related to the Stiefel manifold.

Example 44.

The Stiefel manifold embedded in its canonical ambient space endowed with the Euclidean metric has an orthogonal projector defined by the relation (286), namely [33]:

Π_{X} (A) : = A - \frac{1}{2} X (X^{⊤} A + A^{⊤} X) .

(482)

In order to apply the definition (466), let us first take any smooth curve

ρ_{X, V} (t)

, with

X \in St (n, p)

and

V \in T_{X} St (n, p)

, and write

Π_{ρ_{X, V} (t)} (A) : = A - \frac{1}{2} ρ_{X, V} (t) (ρ_{X, V}^{⊤} (t) A + A^{⊤} ρ_{X, V} (t)) .

(483)

Taking the first derivative, side by side, with respect to the parameter t gives:

\begin{matrix} \frac{d}{d t} Π_{ρ_{X, V} (t)} (A) : = & - \frac{1}{2} {\dot{ρ}}_{X, V} (t) (ρ_{X, V}^{⊤} (t) A + A^{⊤} ρ_{X, V} (t)) \\ - \frac{1}{2} ρ_{X, V} (t) ({\dot{ρ}}_{X, V}^{⊤} (t) A + A^{⊤} {\dot{ρ}}_{X, V} (t)) . \end{matrix}

(484)

Now, setting

t = 0

and recalling that

ρ_{X, V} (0) = X

and

{\dot{ρ}}_{X, V} (0) = V

yields

Π_{X}^{•} (A, V) = - \frac{1}{2} X (V^{⊤} A + A^{⊤} V) - \frac{1}{2} V (X^{⊤} A + A^{⊤} X) .

(485)

Let us verify the commutativity property (481). We have

\begin{matrix} {〈 Π_{X}^{•} (A, V), W 〉}^{A} = - \frac{1}{2} tr (W^{⊤} X (V^{⊤} A + A^{⊤} V)) - \frac{1}{2} tr (W^{⊤} V (X^{⊤} A + A^{⊤} X)), \\ {〈 Π_{X}^{•} (A, W), V 〉}^{A} = - \frac{1}{2} tr (V^{⊤} X (W^{⊤} A + A^{⊤} W)) - \frac{1}{2} tr (V^{⊤} W (X^{⊤} A + A^{⊤} X)) . \end{matrix}

(486)

Notice that the first terms on the right-hand sides of both inner products are null, because

W^{⊤} X

and

V^{⊤} X

are skew-symmetric, while the remaining terms may be proven equal by the trace cyclic permutation property and by symmetry. ■

The case of the special orthogonal group is relevant and instructive as well.

Example 45.

The special orthogonal group

SO (n)

embedded into the ambient space

A : = R^{n \times n}

, endowed with the Euclidean metric, admits the orthogonal projection

Π_{R} (A) : = \frac{1}{2} (A - R A^{⊤} R) .

(487)

Let us verify that

Π_{R}

maps any matrix from

A

to the tangent space

T_{R} SO (n)

:

R^{⊤} Π_{R} (A) + Π_{R}^{⊤} (A) R = \frac{1}{2} R^{⊤} (A - R A^{⊤} R) + \frac{1}{2} {(A - R A^{⊤} R)}^{⊤} R = 0,

(488)

by easy matrix calculations. In addition, let us verify that

A - Π_{R} (A) \in N_{R} SO (n)

:

\begin{matrix} {〈 A - Π_{R} (A), V 〉}^{A} = & tr (V^{⊤} (A - \frac{1}{2} (A - R A^{⊤} R))) \\ = & \frac{1}{2} tr (V^{⊤} A + (V^{⊤} R) A^{⊤} R) \\ = & \frac{1}{2} tr (V^{⊤} A + (- R^{⊤} V) A^{⊤} R) \\ = & \frac{1}{2} tr (V^{⊤} A - A^{⊤} (R R^{⊤}) V)) \\ = & \frac{1}{2} tr (V^{⊤} A - A^{⊤} V) \\ = & 0 . \end{matrix}

(489)

To apply the definition (466), let us first take a smooth curve

ρ_{R, V} (t)

, with

R \in SO (n)

and

V \in T_{X} SO (n)

, and evaluate

Π_{ρ_{R, V} (t)} (A) : = \frac{1}{2} (A - ρ_{R, V} (t) A^{⊤} ρ_{R, V} (t)) .

(490)

Taking the first derivative, side by side, with respect to the parameter t gives:

\frac{d}{d t} Π_{ρ_{R, V} (t)} (A) : = - \frac{1}{2} {\dot{ρ}}_{R, V} (t) A^{⊤} ρ_{R, V} (t) - \frac{1}{2} ρ_{R, V} (t) A^{⊤} {\dot{ρ}}_{R, V} (t) .

(491)

Setting

t = 0

and recalling that

ρ_{R, V} (0) = R

and

{\dot{ρ}}_{R, V} (0) = R

yields

Π_{R}^{•} (A, V) = - \frac{1}{2} (V A^{⊤} R + R A^{⊤} V) .

(492)

To end this example, let us verify the commutativity property (481). Let us observe that

\begin{matrix} {〈 Π_{R}^{•} (A, V), W 〉}^{A} = - \frac{1}{2} tr (W^{⊤} V A^{⊤} R) - \frac{1}{2} tr (W^{⊤} R A^{⊤} V), \\ {〈 Π_{R}^{•} (A, W), V 〉}^{A} = - \frac{1}{2} tr (V^{⊤} W A^{⊤} R) - \frac{1}{2} tr (V^{⊤} R A^{⊤} W) . \end{matrix}

(493)

The above expressions may be proven equal by some clever matrix manipulation. For example, by noticing that

W^{⊤} = - R^{⊤} W R^{⊤}

and

V = - R V^{⊤} R

, one may show that

\begin{matrix} tr (W^{⊤} R A^{⊤} V) = & tr ((- R^{⊤} W R^{⊤}) R A^{⊤} (- R V^{⊤} R)) \\ = & tr (W A^{⊤} R V^{⊤}) \\ = & tr (V^{⊤} W A^{⊤} R), \end{matrix}

(494)

by repeated application of cyclic permutation invariance.

Let us complete the calculation of the Riemannian Hessian for the special orthogonal group. According to the Hessian formula (478), we preliminarily need to evaluate

\begin{matrix} Π_{R} (Π_{R}^{•} (A, V)) = & \frac{1}{4} \{(A V^{⊤} - V A^{⊤}) R - R (A^{⊤} V - V^{⊤} A)\}, \\ Π_{R} ({\bar{\bar{Γ}}}_{R} (V, W)) = & 0, \end{matrix}

(495)

therefore

\begin{matrix} H_{R} f (V) = & \frac{1}{2} (\partial_{R}^{2} f (V) - R {(\partial_{R}^{2} f (V))}^{⊤} R) \\ - \frac{1}{4} \{(\partial_{R} f V^{⊤} - V \partial_{R}^{⊤} f) R - R (\partial_{R}^{⊤} f V - V^{⊤} \partial_{R} f)\} . \end{matrix}

(496)

It is easy to verify, by direct calculation, that

H_{R} (V) \in T_{R} SO (n)

. ■

Let us survey the computation of the Riemannian Hessian on the hypersphere.

Example 46.

Let us compute the Riemannian Hessian associated with the hypersphere

M : = S^{p - 1}

endowed with the canonical metric

{〈 u, v 〉}_{x} : = u^{⊤} v

, embedded in the ambient space

A : = R^{p}

endowed with the metric

{〈 u, v 〉}^{A} : = u^{⊤} v

. Recall that it holds

Π_{x} (a) : = (I_{p} - x x^{⊤}) a

and

{\bar{Γ}}_{x} (u, v) = - x (u^{⊤} v)

.

It is just necessary to compute the derivative

Π_{x}^{•} (a, v)

and then to make use of the relationship (478). By definition, we have

Π_{x}^{•} (a, v) : = {\frac{d}{d t} Π_{ρ_{x, v} (t)} (a)|}_{t = 0} = - (x v^{⊤} + v x^{⊤}) a .

(497)

Ultimately, we obtain

H_{x} f (v) = (I_{p} - x x^{⊤}) (\partial_{x}^{2} f) v - (x^{⊤} \partial_{x} f) v,

(498)

which represents the Riemannian Hessian utilized in [61].

It is instrumental to notice that, for

a \in R^{p}

and

v \in T_{x} S^{p - 1}

, the quantity

Π_{x}^{•} (a, v)

possesses a radial (normal) component

x (v^{⊤} a)

as well as a tangential component

v (x^{⊤} a)

. Let us verify the property (481). We have

\begin{matrix} {〈 Π_{x}^{•} (a, v), w 〉}^{A} = - (w^{⊤} x) (v^{⊤} a) + (w^{⊤} v) (x^{⊤} a), \\ {〈 Π_{x}^{•} (a, w), v 〉}^{A} = - (v^{⊤} x) (w^{⊤} a) + (v^{⊤} w) (x^{⊤} a) . \end{matrix}

(499)

The first terms on the right-hand sides are null because

v, w \in T_{x} S^{p - 1}

, while the remaining terms are to one another equal. ■

The space of symmetric, positive-definite matrices is interesting to what concerns Riemannian Hessian.

Example 47.

Let us recall that, for the space

S^{+} (n)

, we have chosen

G_{P} (W) : = P^{- 1} W P^{- 1}

, also it holds that

{\bar{\bar{Γ}}}_{P} (V, W) : = - \frac{1}{2} V P^{- 1} W - \frac{1}{2} W P^{- 1} V

,

Π_{P} (A) = \frac{1}{2} (A + A^{⊤})

for any

A \in GL (n)

. Since

Π_{P}

does not explicitly depend on the point P, we may utilize Formula (477) to evaluate the Hessian. ■

As a last example, let us compute the Riemannian Hessian of a linear function. In the familiar

R^{n}

case, we should obtain a null Hessian, while in a manifold setting this is easily guessed not to be the case.

Example 48.

Let a smooth manifold

M

be embedded in an ambient space

A

and let

ℓ : M \to R

denote a linear function such that

\partial_{x} ℓ = a \in A

constant and

\partial_{x}^{2} ℓ = 0

. Not having assumed anything else, the Hessian of the function ℓ should be calculated by means of the expression (476), namely

H_{x} ℓ (v) = G_{x}^{- 1} \{Π_{x} (\partial_{x}^{2} ℓ (v) + Π_{x}^{•} (\partial_{x} ℓ, v) - {\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} ℓ), v))\} .

(500)

In this case, the Riemannian Hessian reads as

H_{x} ℓ (v) = G_{x}^{- 1} \{Π_{x} (Π_{x}^{•} (a, v) - {\bar{\bar{Γ}}}_{x} (Π_{x} (a), v))\},

(501)

which clearly varies from point to point on

M

. ■

The property (481) implies that in an inner product such as

{〈 H_{x} (v), w 〉}_{x}

the tangent vector arguments may be swapped, which implies the Riemannian Hessian operator is self-adjoint, namely

{〈 H_{x} (v), w 〉}_{x} = {〈 v, H_{x} (w) 〉}_{x} .

(502)

This property may be proven directly by the relationship (476); in fact,

\begin{matrix} {〈 H_{x} (v), w 〉}_{x} = & {〈 G_{x}^{- 1} (Π_{x} (\partial_{x}^{2} f (v))), G_{x} (w) 〉}^{A} \\ + {〈 G_{x}^{- 1} (Π_{x} (Π_{x}^{•} (\partial_{x} f, v))), G_{x} (w) 〉}^{A} \\ - {〈 G_{x}^{- 1} (Π_{x} ({\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v))), G_{x} (w) 〉}^{A} \\ = & {〈 Π_{x} (\partial_{x}^{2} f (v)), w 〉}^{A} \\ + {〈 Π_{x} (Π_{x}^{•} (\partial_{x} f, v)), w 〉}^{A} \\ - {〈 Π_{x} ({\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v)), w 〉}^{A} \\ = & {〈 \partial_{x}^{2} f (v), w 〉}^{A} + {〈 Π_{x}^{•} (\partial_{x} f, v), w 〉}^{A} - {〈 {\bar{\bar{Γ}}}_{x} (Π_{x} (\partial_{x} f), v), w 〉}^{A} . \end{matrix}

(503)

The first inner product is symmetric in v and w because the ambient Hessian is self-adjoint, the second term is self-adjoint by the property (481) and the last term is symmetric in v and w because the Christoffel form commutes with the inner product.

13.2. A Newton-like Optimization Algorithm

Given a function

f : M \to R

, we may define a quadratic approximation at a point

x \in M

in the direction

v \in T_{x} M

as

Q_{f} (x, v) : = f (x) + {〈 {grad}_{x} f, v 〉}_{x} + \frac{1}{2} {〈 H_{x} f (v), v 〉}_{x} .

(504)

As often recalled, the inner products may be evaluated in terms of ambient metrics and metric kernel. Therefore, the change in the value of the function f may be evaluated as

Q_{f} (x, v) - f (x) = {〈 G_{x} ({grad}_{x} f), v 〉}^{A} + \frac{1}{2} {〈 G_{x} (H_{x} f (v)), v 〉}^{A} .

(505)

In optimization, it is fundamental to find a direction of maximal change from a given point, which may be determined by solving the following problem in v:

\frac{\partial}{\partial v} (Q_{f} (x, v) - f (x)) = 0 .

(506)

A consequence of a property shown in the previous subsection, namely that the Riemannian Hessian is self-adjoint, is that

\partial_{v} {〈 H_{x} f (v), v 〉}_{x} = 2 H_{x} f (v) .

(507)

Recalling that

G_{x} \circ H_{x} f

is linear, the Equation (506) hence reads

G_{x} ({grad}_{x} f) + G_{x} (H_{x} f (v)) = 0 .

(508)

Recalling that

G_{x}

is invertible, the above relationship simplifies to

{grad}_{x} f + H_{x} f (v) = 0

, which leads to the optimal direction

v = - {(H_{x} f)}^{- 1} ({grad}_{x} f) .

(509)

Notice that

v \in T_{x} M

, and hence the result is consistent to what was expected. It is the case to notice that the inverse of the Hessian may be hard to express in closed form; hence, it is sometimes easier to just set up the (linear) equation

H_{x} (v) = - {grad}_{x} f

and solve it by any linear-algebra tool.

The optimal direction may be exploited in a Newton-like optimization algorithm as follows:

\{\begin{matrix} v_{k} = - {(H_{x_{k}} f)}^{- 1} ({grad}_{x_{k}} f), \\ x_{k + 1} = {exp}_{x_{k}} (h v_{k}), \end{matrix}

(510)

where

k \geq 0

is an integer step-counter,

h > 0

is a step-size and

x_{0}

denotes an initial guess.

14. Conclusions

The present paper focused on manifold calculus to describe the system-theoretic properties of non-linear systems whose states belong to curved manifolds and was conceived as the first part of a longer tutorial in manifold calculus.

In particular, the present tutorial paper focuses mainly on mathematical definitions and calculation rules, expressed in the language of embedded manifold calculus, that form a knowledge base to develop further concepts and applications. A number of manifolds of interest in applications were covered and a number of examples clarified some collateral, yet interesting, aspects.

A section of the present paper focuses on the design of a manifold-type system synchronization algorithm based on feedback control and on developing numerical methods tailored to curved manifolds to implement such systems and algorithms.

Since the present contribution aimed to lay out the basic concepts in manifold calculus and Lie group theory, it only covers the basis of first-order dynamical systems and proportional type control. Second-order systems and their control require advanced notions, such as covariant derivation. Such advanced notions will be covered in a forthcoming contribution.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

Abraham, R.; Marsden, J.; Ratiu, T. Manifolds, Tensor Analysis, and Applications; Springer: Berlin/Heidelberg, Germany, 1988. [Google Scholar]
Arnol’d, V.; Givental’, A. Symplectic geometry. In Dynamical Systems IV: Symplectic Geometry &Its Applications, 2nd ed.; Arnol’d, V., Novikov, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2001; Volume 4, pp. 1–138. [Google Scholar]
Bloch, A. An Introduction to Aspects of Geometric Control Theory. In Nonholonomic Mechanics and Control; Interdisciplinary Applied Mathematics; Krishnaprasad, P., Murray, R., Eds.; Springer: New York, NY, USA, 2015; Volume 24. [Google Scholar]
Bullo, F.; Lewis, A. Geometric Control of Mechanical Systems: Modeling, Analysis, and Design for Mechanical Control Systems; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
Sastry, S. Geometric Nonlinear Control. In Nonlinear Systems—Analysis, Stability, and Control; Sastry, S., Ed.; Springer Science + Business Media: New York, NY, USA, 1999; pp. 510–573. [Google Scholar]
Hairer, E.; Lubich, C.; Wanner, G. Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, 2nd ed.; Springer: Berlin, Germany; New York, NY, USA, 2006. [Google Scholar]
Ermentrout, G.; Rinzel, J. Beyond a pacemaker’s entrainment limit-phase walk-through. Am. J. Physiol. 1984, 246, R102–R106. [Google Scholar] [CrossRef] [PubMed]
Schäfer, C.; Rosenblum, M.; Kurths, J.; Abel, H. Heartbeat synchronised with ventilation. Nature 1998, 392, 239–240. [Google Scholar] [CrossRef] [PubMed]
Blasius, B.; Huppert, A.; Stone, L. Complex dynamics and phase synchronization in spatially extended ecological systems. Nature 1999, 399, 354–359. [Google Scholar] [CrossRef] [PubMed]
Castrejón-Pita, A.; Read, P. Synchronization in a pair of thermally coupled rotating baroclinic annuli: Understanding atmospheric teleconnections in the laboratory. Phys. Rev. Lett. 2010, 104, 204501. [Google Scholar] [CrossRef]
Ermentrout, B.; Wechselberger, M. Canards, clusters, and synchronization in a weakly coupled interneuron model. SIAM J. Appl. Dyn. Syst. 2009, 8, 253–278. [Google Scholar] [CrossRef]
Kloeden, P. Synchronization of nonautonomous dynamical systems. Electron. J. Differ. Equ. 2003, 1, 1–10. [Google Scholar]
Pikovsky, A.; Rosenblum, M.; Kurths, J. Synchronization—A Universal Concept in Nonlinear Sciences; Cambridge University Press: Cambridge, UK, 2001. [Google Scholar]
Stankovski, T. Tackling the Inverse Problem for Non-Autonomous Systems: Application to the Life Sciences; Springer Theses; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
Blekhman, I.; Landa, P.; Rosenblum, M. Synchronization and chaotization in interacting dynamical systems. Appl. Mech. Rev. 1995, 48, 733–752. [Google Scholar] [CrossRef]
Luo, A. A theory for synchronization of dynamical systems. Commun. Nonlinear Sci. Numer. Simul. 2009, 14, 1901–1951. [Google Scholar] [CrossRef]
Stoer, J.; Bulirsch, R. Introduction to Numerical Analysis, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar]
Golub, G.; Van Loan, C. Matrix Computations, 3rd ed.; Johns Hopkins: Baltimore, MD, USA, 1996. [Google Scholar]
Higham, N. Computing the polar decomposition—With applications. SIAM J. Sci. Stat. Comput. 1986, 7, 1160–1174. [Google Scholar] [CrossRef] [Green Version]
Fiori, S.; Prifti, S. Exact low-order polynomial expressions to compute the Kolmogoroff-Nagumo mean in the affine symplectic group of optical transference matrices. Linear Multilinear Algebra 2017, 65, 840–856. [Google Scholar] [CrossRef]
Olver, P. Applications of Lie Groups to Differential Equations, 2nd ed.; Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 2003; Volume 107. [Google Scholar]
Fiori, S. A fast fixed-point neural blind deconvolution algorithm. IEEE Trans. Neural Netw. 2004, 15, 455–459. [Google Scholar] [CrossRef]
Fiori, S. Geodesic-based and projection-based neural blind deconvolution algorithms. Signal Process. 2008, 88, 521–538. [Google Scholar] [CrossRef]
Pajunen, P.; Girolami, M. Implementing decisions in binary decision trees using independent component analysis. In Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation, Helsinki, Finland, 19–22 June 2000; pp. 477–481. [Google Scholar]
Zhu, Y.; Mio, W.; Liu, X. Optimal dimension reduction for image retrieval with correlation metrics. In Proceedings of the International Conference on Neural Networks (IJCNN 2009), Atlanta, GA, USA, 14–19 June 2009; pp. 3565–3570. [Google Scholar]
Yershova, A.; LaValle, S. Deterministic sampling methods for spheres and SO(3). In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2004), New Orleans, LA, USA, 26 April–1 May 2004; Volume 4, pp. 3974–3980. [Google Scholar]
Trendafilov, N.; Lippert, R. The multimode Procrustes problem. Linear Algebra Its Appl. 2002, 349, 245–264. [Google Scholar] [CrossRef] [Green Version]
Vasconcelos, J.; Elkaim, G.; Silvestre, C.; Oliveira, P.; Cardeira, B. Geometric approach to strapdown magnetometer calibration in sensor frame. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 1293–1306. [Google Scholar] [CrossRef] [Green Version]
Viswanathan, S.; Sanyal, A.; Samiei, E. Integrated Guidance and Feedback Control of Underactuated Robotics System in SE(3). J. Intell. Robot Syst. 2018, 89, 251–263. [Google Scholar] [CrossRef]
Celledoni, E.; Fiori, S. Neural learning by geometric integration of reduced ‘rigid-body’ equations. J. Comput. Appl. Math. 2004, 172, 247–269. [Google Scholar] [CrossRef] [Green Version]
Yoo, J.; Choi, S. Orthogonal nonnegative matrix factorization: Multiplicative updates on Stiefel manifolds. In Proceedings of the Intelligent Data Engineering and Automated Learning (IDEAL 2008), Daejeon, Korea, 2–5 November 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 140–147. [Google Scholar]
Amari, S.I. Natural gradient learning for over- and under-complete bases in ICA. Neural Comput. 1999, 11, 1875–1883. [Google Scholar] [CrossRef]
Edelman, A.; Arias, T.; Smith, S. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 1998, 20, 303–353. [Google Scholar] [CrossRef]
Eldén, L.; Park, H. A Procrustes problem on the Stiefel manifold. Numer. Math. 1999, 82, 599–619. [Google Scholar] [CrossRef]
Yger, F.; Berar, M.; Gasso, G.; Rakotomamonjy, A. Oblique principal subspace tracking on manifold. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 2429–2432. [Google Scholar]
Moakher, M. A differential geometry approach to the geometric mean of symmetric positive definite matrices. SIAM J. Matrix Anal. Appl. 2005, 26, 735–747. [Google Scholar] [CrossRef]
Smith, S. Covariance, subspace, and instrinsic Cramér-Rao bounds. IEEE Trans. Signal Process. 2005, 53, 1610–1630. [Google Scholar] [CrossRef] [Green Version]
Bonnabel, S.; Sepulchre, R. Riemannian metric and geometric mean for positive semidefinite matrices of fixed rank. SIAM J. Matrix Anal. Appl. 2009, 31, 1055–1070. [Google Scholar] [CrossRef] [Green Version]
Fiori, S. Extended Hamiltonian learning on Riemannian manifolds: Numerical aspects. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 7–21. [Google Scholar] [CrossRef]
Fiori, S.; Kaneko, T.; Tanaka, T. Tangent-bundle maps on the Grassmann manifold: Application to empirical arithmetic averaging. IEEE Trans. Signal Process. 2015, 63, 155–168. [Google Scholar] [CrossRef]
Stratmann, P.; Lakatos, D.; Albu-Schäffer, A. Neuromodulation and synaptic plasticity for the control of fast periodic movement: Energy efficiency in coupled compliant joints via PCA. Front. Neurorobot. 2016, 10, 2. [Google Scholar] [CrossRef]
Leonard, N.; Krishnaprasad, P. Control of switched electrical networks using averaging on Lie groups. In Proceedings of the 33rd IEEE Conference on Decision and Control, Lake Buena Vista, FL, USA, 14–16 December 1994; pp. 1919–1924. [Google Scholar]
Tapp, K. Matrix Groups for Undergraduates, 2nd ed.; Student Mathematical Library; American Mathematical Society: Providence, RI, USA, 2016; Volume 79. [Google Scholar]
Arsigny, V.; Pennec, X.; Ayache, N.; Bi-Invariant Means in Lie Groups. Application to Left-Invariant Polyaffine Transformations. 2006. Available online: https://hal.inria.fr/inria-00071383 (accessed on 26 October 2021).
Amari, S.I. Natural gradient works efficiently in learning. Neural Comput. 1998, 10, 251–276. [Google Scholar] [CrossRef]
Khvedelidze, A.; Mladenov, D. Generalized Calogero-Moser-Sutherland models from geodesic motion on GL⁺(n,R) group manifold. Phys. Lett. A 2002, 299, 522–530. [Google Scholar] [CrossRef] [Green Version]
Henderson, H.V.; Searle, S.R. On deriving the inverse of a sum of matrices. SIAM Rev. 1981, 23, 53–60. [Google Scholar] [CrossRef]
Courant, R.; Hilbert, D. Methods of Mathematical Physics, 1st ed.; Interscience Publishers: New York, NY, USA, 1953; Volume I. [Google Scholar]
Del Buono, N.; Lopez, L. Runge-Kutta type methods based on geodesics for systems of ODEs on the Stiefel manifold. BIT Numer. Math. 2001, 41, 912–923. [Google Scholar] [CrossRef]
Wang, J.; Sun, H.; Fiori, S. Empirical means on pseudo-orthogonal groups. Mathematics 2019, 7, 940. [Google Scholar] [CrossRef] [Green Version]
McGraw, T.; Vemuri, B.; Yezierski, B.; Mareci, T. Von Mises-Fisher mixture model of the diffusion ODF. In Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging: Macro to Nano (ISBI 2006), Arlington, VA, USA, 6–9 April 2006; pp. 65–68. [Google Scholar]
Fréchet, M. Les élements aléatoires de nature quelconque dans un espace distancié. Ann. De l’Institut Henri PoincarÉ 1948, 10, 215–310. [Google Scholar]
Karcher, H. Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541. [Google Scholar] [CrossRef]
Tyagi, A.; Davis, J. A recursive filter for linear systems on Riemannian manifolds. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar] [CrossRef] [Green Version]
Simoncini, V. Computational methods for linear matrix equations. SIAM Rev. 2016, 58, 377–441. [Google Scholar] [CrossRef] [Green Version]
Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biol. 1982, 15, 267–273. [Google Scholar] [CrossRef]
Oja, E. Neural networks, principal components, and subspaces. Int. J. Neural Syst. 1989, 1, 61–68. [Google Scholar] [CrossRef]
Nedić, A.; Olshevsky, A.; Shi, W. Decentralized consensus optimization and resource allocation. In Large-Scale and Distributed Optimization; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2018; Volume 2227. [Google Scholar]
Nedić, A.; Ozdaglar, A.; Parrilo, P. Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 2010, 55, 922–938. [Google Scholar] [CrossRef]
Ambrose, W.; Singer, I. A theorem on holonomy. Trans. Am. Math. Soc. 1953, 75, 428–443. [Google Scholar] [CrossRef]
Fiori, S. Blind deconvolution by a Newton method on the non-unitary hypersphere. Int. J. Adapt. Control Signal Process. 2013, 27, 488–518. [Google Scholar] [CrossRef]
Qi, C.H.; Gallivan, K.; Absil, P.A. Riemannian BFGS Algorithm with Applications. In Recent Advances in Optimization and Its Applications in Engineering, Part 3; Diehl, M., Diehl, M., Glineur, F., Jarlebring, E., Michiels, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 183–192. [Google Scholar]
Vandereycken, B.; Vandewalle, S. A Riemannian optimization approach for computing low-rank solutions of Lyapunov equations. SIAM J. Matrix Anal. Appl. 2010, 31, 2553–2579. [Google Scholar] [CrossRef] [Green Version]
Kaneko, T.; Fiori, S.; Tanaka, T. Empirical Arithmetic Averaging over the Compact Stiefel Manifold. IEEE Trans. Signal Process. 2013, 61, 883–894. [Google Scholar] [CrossRef]
Absil, P.A.; Malick, J. Projection-like retractions on matrix manifolds. SIAM J. Optim. 2012, 22, 135–158. [Google Scholar] [CrossRef]
Fiori, S. Averaging over the Lie group of optical systems transference matrices. Front. Electr. Electron. Eng. China 2011, 6, 137–145. [Google Scholar] [CrossRef]
Harris, W.; Cardoso, J. The exponential-mean-log-transference as a possible representation of the optical character of an average eye. Ophthalmic Physiol. Opt. 2006, 26, 380–383. [Google Scholar] [CrossRef]
Misner, C.; Thorne, K.; Wheeler, J. Gravitation; W.H. Freeman Publisher: New York, NY, USA, 1973. [Google Scholar]
Ding, K.; Han, Q.L. Master-slave synchronization of nonautonomous chaotic systems and its application to rotating pendulums. Int. J. Bifurc. Chaos 2012, 22, 1250147. [Google Scholar] [CrossRef]
Magdy, M.; Ng, T. Regulation and control effort in self-tuning controllers. IEE Proc. D—Control Theory Appl. 1986, 133, 289–292. [Google Scholar] [CrossRef]
Courant, R.; Friedrichs, K.; Lewy, H. On the partial difference equations of mathematical physics. IBM J. 1967, 11, 215–234. [Google Scholar] [CrossRef]
Lambert, J. Numerical Methods for Ordinary Differential Systems: The Initial Value Problem; Wiley: New York, NY, USA, 1992. [Google Scholar]
Absil, P.; Mahony, R.; Trumpf, J. An extrinsic look at the Riemannian Hessian. In Geometric Science of Information. GSI 2013; Lecture Notes in Computer Science; Nielsen, F., Barbaresco, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8085. [Google Scholar] [CrossRef] [Green Version]
Cao, Y.; Li, D.; Sun, H.; Assadi, A.H.; Zhang, S. Efficient Weingarten map and curvature estimation on manifolds. Mach. Learn. 2021, 110, 1319–1344. [Google Scholar] [CrossRef]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fiori, S. Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems. Symmetry 2021, 13, 2092. https://doi.org/10.3390/sym13112092

AMA Style

Fiori S. Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems. Symmetry. 2021; 13(11):2092. https://doi.org/10.3390/sym13112092

Chicago/Turabian Style

Fiori, Simone. 2021. "Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems" Symmetry 13, no. 11: 2092. https://doi.org/10.3390/sym13112092

APA Style

Fiori, S. (2021). Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems. Symmetry, 13(11), 2092. https://doi.org/10.3390/sym13112092

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems

Abstract

1. Introduction

2. Coordinate-Free Embedded Manifold Calculus

2.1. General Notation and Properties

2.2. Manifolds and Embedded Manifolds (or Submanifolds)

3. Smooth Curves, Tangent Vector Fields, Tangent Spaces and Bundle, Normal Spaces

3.1. Curves and Bundles for Embedded Manifolds

3.2. Vector Fields

3.3. Canonical Curves, Canonical Basis of a Tangent Space*

4. First-Order Dynamical Systems on Manifolds

5. Tangent Maps: Pushforward and Pullback

6. Lie Groups, Lie Algebras, Lie Brackets

7. Metrization, Riemannian Manifolds

7.1. Coordinate-Free Metrization by Inner Products and Metric Kernels

7.2. Covariancy, Contravariancy, Tensors*

8. Geodesic Arc, Riemannian Distance, Exponential and Logarithmic Map

8.1. Coordinate-Free Embedded Geodesy

8.2. Exponential and Logarithmic Maps

8.3. Geodesic Interpolation

8.4. Coordinate-Prone Geodesy, Christoffel Symbols*

9. Riemannian Gradient of a Manifold-to-Scalar Function

9.1. Riemannian Gradient: Motivation and Definition

9.2. Application of Riemannian Gradient to Optimization on Manifold

9.3. A Golden Gradient Rule: Gradient of Squared Distance

9.4. Riemannian Gradient in Coordinates*

10. Parallelism and Parallel Transport along a Curve

10.1. Properties and Definition of Parallel Transport

10.2. Coordinate-Free Derivation of Parallel Transport

10.3. Coordinate-Prone Derivation of Parallel Transport*

11. Manifold Retraction and Vector Transport

12. Control Systems on Manifolds and Numerical Implementation

12.1. Synchronization of First-Order Dynamical Systems via Feedback Control

12.2. Numerical Methods to Simulate First-Order Systems

13. Riemannian Hessian of a Manifold-to-Scalar Function

13.1. Definition, Properties and Coordinate-Free Calculation of Riemannian Hessian

13.2. A Newton-like Optimization Algorithm

14. Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI