The aim of the present tutorial paper is to recall notions from manifold calculus and to illustrate how these tools prove useful in describing system-theoretic properties. Special emphasis is put on embedded manifold calculus (which is coordinate-free and relies on the embedding of a manifold into a larger ambient space). In addition, we also consider the control of non-linear systems whose states belong to curved manifolds. As a case study, synchronization of non-linear systems by feedback control on smooth manifolds (including Lie groups) is surveyed. Special emphasis is also put on numerical methods to simulate non-linear control systems on curved manifolds. The present tutorial is meant to cover a portion of the mentioned topics, such as first-order systems, but it does not cover topics such as covariant derivation and second-order dynamical systems, which will be covered in a subsequent tutorial paper.
The theory of dynamical systems whose state spaces possess the structures of curved manifolds has been applied primarily in physics (especially to mathematically describe the theory of general relativity at the beginning of the 20th century). More recently, dynamical systems on manifolds have proven their relevance in a number of subjects in engineering and applied sciences such as in robotics and in biomedical engineering [1,2,3,4,5] to name a few. The observation at the core of such applications is that those dynamical systems whose descriptive variables are bound to one another by non-linear holonomic constraints may be studied by means of the rich variety of mathematical tools provided by manifold calculus (a shortened terminology for ‘calculus on manifolds’) and may be framed in the class of dynamical systems on manifolds.
The structures of the state manifolds of such dynamical systems depend on the application at hand. As a concrete example, whenever the dynamics of a rigid body is concerned, the state manifold of choice is the special orthogonal group $\mathrm{SO}(3)$, because it encodes the attitude of a flying drone or a submarine robot. Most applications of interest (such as the computation of the dynamics of flying bodies) concern well-studied and well-understood curved manifolds, such as the special orthogonal group, the unit hypersphere and the group of symmetric, positive-definite matrices.
From the perspective of performing numerical simulations of dynamical systems on a computing platform, it is necessary to design adequate numerical methods to compute approximate solutions which still meet the structures of the state manifolds. Classical numerical methods, such as those in the Euler class or in the Runge–Kutta class, will fail when applied directly to such dynamical systems because they were designed to work on flat spaces and cannot cope with non-flat manifolds [6].
As a specific applied field, the time synchronization of first-order dynamical systems on curved state manifolds by non-linear control will be surveyed. The time synchronization of non-autonomous dynamical systems has been applied in physiology [7,8], ecology [9], atmosphere physics [10], neurology [11] and many more applied fields [12,13,14]. Synchronization theory is interesting from an abstract point of view and, at the same time, useful in a number of applications: it is an interdisciplinary research topic that combines key ideas from system theory, control theory and manifold calculus. In particular, non-linear control theory on manifolds may supply tools to design control fields that force a pair of dynamical systems to synchronize their dynamics over time [15,16].
The present tutorial paper is devoted to recalling fundamental notions from manifold calculus and to explaining how these concepts apply to system theory and to non-linear control theory, taking time synchronization as a representative case study in non-linear control. The present contribution is oriented either to readers who possess a good command of calculus but not of manifold calculus, or to readers who possess an understanding of theoretical manifold calculus but lack insight into the applied and computational aspects of this field. In addition, a basic understanding of system theory and control theory will be assumed. The content of the present tutorial paper may be summarized as follows:
It provides a clear and well-motivated introduction to manifold calculus, the basis of system and control theories on manifolds, with special emphasis on computational and applicational aspects. The present contribution provides practical formulas to deal with those real-valued manifolds that, in the author’s experience, are the most accessed in engineering and applied science problems. As a matter of fact, complex-valued manifolds are not treated at all.
It clearly states and illustrates the idea that, when one wishes to perform a simulation, by a computing platform, of dynamical systems on manifolds described in terms of differential equations, it is necessary to time-discretize such differential equations in a suitable way. In order to achieve such a discretization, it is not safe to invoke standard discretization methods (such as the ones based on Euler forward–backward discretization), which do not work as they stand on curved manifolds. One should therefore resort to more sophisticated numerical integration techniques.
By the author’s choice, the present tutorial paper does not carry any graphical illustrations nor any numerical simulation results. Readers who are interested in deepening their understanding of this topic are invited to sketch graphs autonomously and to code examples in their favorite programming language.
The present paper is organized as follows. Section 2 lays out some introductory material, such as the motivation behind manifold calculus and a list of manifolds mostly accessed in applications. Section 3 introduces the notion of curves and tangent bundles and allied topics, such as normal spaces. Section 4 provides a first introduction, with examples, to first-order dynamical systems on manifolds. Section 5 presents a special kind of derivative of a function having a manifold as a domain and a manifold as a co-domain, termed a pushforward map. Section 6 introduces a class of manifolds that have peculiar features: the Lie groups. Section 7, in detail, treats the fundamental concept of metrization of a curved space through the more familiar notion of metrization of vector spaces. Section 8 introduces notions such as geodesic lines, Riemannian distance and exponential maps. Section 9 surveys the notion of Riemannian gradient and illustrates such a concept via examples. Section 10 introduces and exemplifies the notion of parallel transport along a curve, which is of paramount importance in manifold calculus and its computer-based implementations. Section 11 outlines the concept of manifold retraction and vector transport, which are computationally convenient approximations of exponential maps and parallel transport, respectively. Section 12 illustrates a feedback control theory suitable for application to first-order systems insisting on state manifolds. Section 13 introduces the notion of Riemannian Hessian, which stems from a second-order approximation of a manifold-to-scalar function, and recalls optimization algorithms that extend the Newton method to look for a zero of a vector field. Section 14 concludes this tutorial paper.
As a distinctive aspect of the present tutorial paper, the main flow of discussion is based on coordinate-free (or component-free) expressions, which facilitates the implementation of the main equations by a matrix-friendly computational language (such as MATLAB). Coordinate-free manifold calculus is introduced by the technical tool of embedded calculus, which stems from embedding a manifold into a larger ambient space where the calculation rules are simpler and more familiar to readers. The starred subsections, namely subsections marked by an asterisk (*), address some specific arguments, related to coordinate-prone manifold calculus, which may be skipped by the uninterested readers without detriment to the comprehension of the main flow of this presentation.
The present tutorial paper does not cover a number of subjects, such as the covariant derivation of vector fields, continuous-time second-order dynamical systems arising from a Lagrangian framework nor higher-order discrete-time dynamical systems, nor the key topics related to manifold curvature. These topics will be the subject of a forthcoming tutorial paper.
2. Coordinate-Free Embedded Manifold Calculus
In this section, we recall the relevant notation and fundamental properties of matrix calculus, as well as several examples of manifolds of interest in engineering and applied sciences.
In manifold calculus, both embedded (or extrinsic) and intrinsic coordinates may be accessed. With the aim of elucidating the difference between extrinsic and intrinsic coordinates for manifold elements, let us mention the case of the space $\mathrm{SO}(2)$ of planar rotations. The space $\mathrm{SO}(2)$ is a mono-dimensional manifold; therefore, any element may be pointed to by one intrinsic coordinate: a matrix $X \in \mathrm{SO}(2)$ may be represented as
$$X = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$
with $\theta \in [0, 2\pi)$. On the other hand, by embedding the space $\mathrm{SO}(2)$ into the space $\mathbb{R}^{2\times 2}$ of real-valued matrices, it turns out that any element of $\mathrm{SO}(2)$ may be regarded as a two-by-two real-valued matrix
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix},$$
whose entries must satisfy the four non-linear constraints:
$$x_{11}^2 + x_{21}^2 = 1,\quad x_{12}^2 + x_{22}^2 = 1,\quad x_{11}x_{12} + x_{21}x_{22} = 0,\quad x_{11}x_{22} - x_{12}x_{21} = 1.$$
The four parameters $x_{11}$, $x_{12}$, $x_{21}$, $x_{22}$ denote embedded or extrinsic coordinates. The orthogonality and normality constraints of the columns of the matrix X may be represented through two coordinate-free constraints in a compact way as
$$X^\top X = I_2, \quad \det X = 1,$$
where a superscript $\top$ denotes matrix transposition and $I_2$ denotes a $2 \times 2$ identity matrix.
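The two coordinate-free constraints are easy to verify numerically. The following sketch (Python with NumPy; the helper name `planar_rotation` is chosen here purely for illustration) builds an element of SO(2) from its single intrinsic coordinate and checks the compact extrinsic constraints:

```python
import numpy as np

def planar_rotation(theta):
    """Map the single intrinsic coordinate theta to an element of SO(2)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

X = planar_rotation(0.7)                       # one intrinsic coordinate
# The four extrinsic coordinates (the entries of X) satisfy the two
# coordinate-free constraints X^T X = I_2 and det X = 1:
ortho_err = np.linalg.norm(X.T @ X - np.eye(2))
det_err = abs(np.linalg.det(X) - 1.0)
print(ortho_err, det_err)                      # both numerically zero
```

Composing two planar rotations adds their intrinsic coordinates, a fact that anticipates the Lie-group structure of SO(2) discussed later.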
Another illustrative example is the ordinary sphere $\mathbb{S}^2$ that may be parametrized as follows:
$$x_1 = \cos\theta\cos\phi,\quad x_2 = \cos\theta\sin\phi,\quad x_3 = \sin\theta,$$
with parameters $\theta \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ and $\phi \in [0, 2\pi)$ (in cartography, the coordinate $\theta$ is termed ‘latitude’, while the coordinate $\phi$ is termed ‘longitude’). In general, the dimension of a manifold denotes the minimal number of parameters that are required to individuate a point on a manifold. While in common speech the ordinary sphere is termed ‘three-dimensional’, it is indeed a bi-dimensional manifold embedded in a three-dimensional ambient space. The dimension of the manifold $\mathbb{S}^2$ is 2; the dimension of the embedding space, instead, is 3.
In general, in engineering and applied mathematics, using intrinsic coordinates is deprecated. As we have just seen, representing an ordinary sphere through intrinsic coordinates is fairly easy, but what about, for example, a hypersphere $\mathbb{S}^9$? Any intrinsic representation would require nine angles and a series of complicated trigonometric expressions. Assuming that a manifold of interest $\mathcal{M}$ may be embedded into a larger ambient space $\mathcal{A}$, it is way easier to represent one such manifold as a subset of points of an ambient space that meet certain constraints. For example, the sphere $\mathbb{S}^9$ may be embedded in $\mathbb{R}^{10}$ and every point of such a sphere may be represented by an array $x \in \mathbb{R}^{10}$ such that
$$x_1^2 + x_2^2 + \cdots + x_{10}^2 = 1.$$
Clearly, the ten coordinates $x_1, x_2, \ldots, x_{10}$, termed embedded coordinates, turn out to be unnecessary (nine coordinates would be enough). The existence of an embedding may be evaluated through classical results of manifold calculus, which also state the best dimension of the ambient space on the basis of the structural properties of the embedding. Two such results are the strong Whitney embedding theorem and the Nash isometric embedding theorem for Riemannian manifolds. In the present tutorial, we are concerned exclusively with manifolds that may be embedded into a linear ambient space.
The number of extrinsic coordinates is, in general, far larger than the number of intrinsic coordinates; hence, a coordinate-free embedded representation is, in general, redundant. However, as far as computer implementation is concerned, a coordinate-free representation is advantageous because computing languages, such as MATLAB and Python, generally deal seamlessly with bi-dimensional arrays. With reference to the sphere, let us examine a simple example that tells us why manifolds need their own calculus.
Example1.
Take a point $x \in \mathbb{S}^{p-1}$ and a direction $v \in \mathbb{R}^p$, with $v \neq 0$, such that $v^\top x = 0$, and let us write the parametric equation of a line departing from x in the direction v as $\gamma(t) \triangleq x + tv$, with $t \in \mathbb{R}$. It is important to point out that none of the points over such a line belong to the sphere, except for the point x. In fact, we have
$$\|x + tv\|^2 = \|x\|^2 + 2t\,v^\top x + t^2\|v\|^2 = 1 + t^2\|v\|^2,$$
since v is orthogonal to x. It turns out that $\|\gamma(t)\| > 1$ if $t \neq 0$, hence the assertion. ■
This simple example tells us that standard constructs, such as straight lines, do not work on curved manifolds, hence the need for a specific calculus to be consistently developed.
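A quick numerical confirmation of Example 1 (a sketch in Python with NumPy; the specific point and direction are picked here merely for illustration):

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0])      # a point on the sphere S^2
v = np.array([1.0, 0.0, 0.0])      # a unit direction with v^T x = 0
assert np.isclose(x @ v, 0.0)

for t in [0.1, 1.0, 10.0]:
    # ||x + t v||^2 = 1 + t^2 ||v||^2, so the line leaves the sphere
    # as soon as t != 0
    norm_sq = np.linalg.norm(x + t * v) ** 2
    assert np.isclose(norm_sq, 1.0 + t ** 2)
```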
2.1. General Notation and Properties
In general, column arrays are denoted by a lower-case letter, while matrices are denoted by an upper-case letter. Manifold elements, for a generic manifold, are denoted as lower-case letters.
A number of matrix functions and factorizations will be invoked in this paper, namely:
Matrix trace: The trace of a square matrix M (namely the sum of its principal-diagonal entries) is denoted by $\mathrm{tr}(M)$. The matrix trace has a cyclic permutation invariance property. For example, given three conformable (i.e., mutually multipliable) matrices A, B, C, it holds that:
$$\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA).$$
Matrix square root: Given a matrix $M \in \mathbb{R}^{p \times p}$, a square root R of M is a matrix such that $R^2 = M$, often denoted as $R = M^{1/2}$. Not every matrix admits a square root. Special square roots (such as a symmetric square root) will be defined later.
Spectral factorization: Given a matrix $M \in \mathbb{R}^{p \times p}$, let us assume that there exists an orthogonal matrix X (i.e., one such that $X^\top X = I_p$) and a real diagonal matrix D such that $M = XDX^\top$. The expression on the right-hand side denotes the spectral factorization of the matrix M. Such a factorization turns out to be very useful in evaluating matrix polynomials. For example, it holds that
$$M^k = XD^kX^\top \quad \text{for every integer } k \ge 0.$$
Exponentiating full matrices is cumbersome, while exponentiating diagonal matrices is amiable.
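This computational shortcut can be sketched as follows (Python with NumPy; the symmetric matrix is arbitrary). The eigendecomposition reduces matrix polynomials to scalar operations on the eigenvalues:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])                # a symmetric matrix
d, X = np.linalg.eigh(M)                  # spectral factorization M = X D X^T
assert np.allclose(X @ np.diag(d) @ X.T, M)
assert np.allclose(X.T @ X, np.eye(2))    # X is orthogonal

# A matrix power reduces to an entrywise power of the eigenvalues:
M3 = X @ np.diag(d ** 3) @ X.T
assert np.allclose(M3, M @ M @ M)
```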
Thin QR factorization: Any matrix $M \in \mathbb{R}^{n \times p}$, with $n \ge p$, may be factored as the product of a $n \times p$ orthogonal matrix Q (namely, such that $Q^\top Q = I_p$) and a $p \times p$ upper triangular matrix R, namely $M = QR$. In general, a QR factorization is not unique [17]. To remove such kind of indeterminacy, the R-factor may be chosen with strictly positive entries on its main diagonal, so that the factorization is unique.
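The normalization just described can be sketched in Python with NumPy; note that `numpy.linalg.qr` does not itself guarantee a positive diagonal of the R-factor, so the signs must be fixed explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 3))
Q, R = np.linalg.qr(M, mode='reduced')    # thin QR: Q is 5x3, R is 3x3

# Enforce strictly positive diagonal entries of R to make the factorization
# unique: flip the signs of matching columns of Q and rows of R.
s = np.sign(np.diag(R))
s[s == 0] = 1.0
Q, R = Q * s, s[:, None] * R

assert np.allclose(Q @ R, M)              # still a factorization of M
assert np.allclose(Q.T @ Q, np.eye(3))    # Q keeps orthonormal columns
assert np.all(np.diag(R) > 0)             # now the factorization is unique
```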
Compact singular value factorization (SVD): The compact SVD of a matrix $M \in \mathbb{R}^{n \times p}$ is a matrix factorization of the type $M = ADB^\top$ in which D is square diagonal of size $r \times r$, where $r \le \min(n, p)$ is the rank of M, and has only nonzero singular values. In this variant, A denotes a $n \times r$ matrix and B denotes a $p \times r$ matrix, such that $A^\top A = B^\top B = I_r$ [18].
Polar factorization: Given a real-valued matrix M, its polar factorization is written as $M = XS$, where X denotes a matrix such that $X^\top X = I$, termed polar factor, and S denotes a symmetric positive semidefinite matrix [19]. The polar factorization of a matrix always exists and, if the matrix is full rank, its polar factor is unique.
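In practice, a polar factorization may be obtained from a singular value factorization $M = U\Sigma V^\top$ by setting $X = UV^\top$ and $S = V\Sigma V^\top$; a sketch in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
U, sigma, Vt = np.linalg.svd(M)

X = U @ Vt                                # polar factor: X^T X = I
S = Vt.T @ np.diag(sigma) @ Vt            # symmetric positive semidefinite

assert np.allclose(X @ S, M)              # M = X S
assert np.allclose(X.T @ X, np.eye(4))
assert np.allclose(S, S.T)
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)
```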
Matrix exponential: Given a matrix $M \in \mathbb{R}^{p \times p}$, its matrix exponential is denoted as $\exp(M)$. The matrix exponential is defined via a series as
$$\exp(M) \triangleq \sum_{k=0}^{\infty} \frac{M^k}{k!}.$$
There exist special formulas to compute the matrix exponential via a finite number of operations for special matrices (see, for example, [20] and references therein). For example, for a symmetric matrix $M \in \mathbb{R}^{p \times p}$ that admits a spectral factorization $M = XDX^\top$, it holds that
$$\exp(M) = X\exp(D)X^\top,$$
where $\exp(D) = \mathrm{diag}(e^{D_{11}}, e^{D_{22}}, \ldots, e^{D_{pp}})$.
Principal matrix logarithm: Given a matrix $M \in \mathbb{R}^{p \times p}$, its principal matrix logarithm is denoted as $\log(M)$. The matrix logarithm is defined via a series as
$$\log(M) \triangleq \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}(M - I_p)^k.$$
In the specialized literature, it is possible to find special formulas to compute the principal matrix logarithm via a finite number of operations for special matrices. For example, for a positive-definite matrix $M \in \mathbb{R}^{p \times p}$, which admits a spectral factorization $M = XDX^\top$, it holds that
$$\log(M) = X\log(D)X^\top,$$
where $\log(D) = \mathrm{diag}(\log D_{11}, \log D_{22}, \ldots, \log D_{pp})$. Recall that any symmetric, positive-definite matrix has only positive eigenvalues; hence, $\log(M)$ is a real-valued matrix.
The other notation used within this paper is defined at its earliest occurrence. We just recall that the symbol $\triangleq$ denotes a definition and that the derivative of a function $\gamma$ with respect to its scalar parameter t is denoted as $\dot{\gamma}$.
2.2. Manifolds and Embedded Manifolds (or Submanifolds)
The formal definition of a smooth manifold is quite convoluted, as it requires notions from mathematical topology [21]. More practically, a manifold may be essentially regarded as a generalization of surfaces in higher dimensions that is endowed with the noticeable property of being locally similar to a flat space. In addition, a manifold may generally be regarded as an abstract mathematical object, not necessarily ready for computation, whereas, in practice, manifolds of interest in applied science and engineering are essentially matrix manifolds.
Let us consider a smooth manifold $\mathcal{M}$ and a point x on it. From an abstract point of view, x is an element of a set and does not necessarily carry any intrinsic numerical features. In order to be able to develop calculus on manifolds, to, for instance, compute the directional derivative of a function $f : \mathcal{M} \to \mathbb{R}$, it is convenient to ‘coordinatize’ a manifold. To this aim, let us take a neighborhood (open set) $U \subseteq \mathcal{M}$ that contains the point x and a coordinate map $\varphi : U \to \mathbb{R}^p$. The size p denotes the minimal number of coordinates that is necessary to specify the location of the point x unequivocally and is taken as the dimension of the manifold. The map $\varphi$ needs to be a one-to-one map. In this way, we attach a set of coordinates $\varphi(x)$ to the point x. Such a construction establishes a smooth one-to-one correspondence between a point on a manifold and its coordinate point; therefore, the two concepts may be confused and we may safely speak of a point when actually speaking of its coordinates.
The above theoretical construction carries a number of practical drawbacks. For instance, notice that in general a manifold cannot be covered by a single coordinate map. Indeed, in general a manifold needs to be covered by a number of neighborhoods $U_i$, each of which is equipped with a coordinate map $\varphi_i : U_i \to \mathbb{R}^p$. The set $\{(U_i, \varphi_i)\}$ thus forms a basis for the manifold. Such a basis need not be finite, although it needs to be countable, and is termed an ‘atlas’. In general, the basis neighborhoods may happen to overlap one another; hence, the coordinate maps need to satisfy compatibility conditions. Such conditions formalize the natural requirement that there needs to be a one-to-one smooth correspondence between any two different coordinate systems insisting on regions of the manifold belonging to more than one neighborhood. In formal terms, if $U_i \cap U_j \neq \emptyset$, then the ‘transition functions’ $\varphi_j \circ \varphi_i^{-1}$ and $\varphi_i \circ \varphi_j^{-1}$ should possess the structure of diffeomorphisms, namely smooth functions endowed with smooth inverses.
As mentioned in the introduction, in the present tutorial we are neglecting coordinates in favor of embeddings, except in starred sections that cover coordinate-prone calculations and that may be skipped by uninterested readers. A smooth manifold is by nature a continuous object. Manifolds of interest in applications, described in embedded terms, may be summarized as follows:
Hypercube: The simplest manifold of interest is perhaps the hypercube $\mathbb{R}^p$, which is essentially the set spanned by p real-valued variables (or p-tuples).
Hypersphere, oblique manifold, hyperellipsoid: A hypersphere is represented as $\mathbb{S}^{p-1} \triangleq \{x \in \mathbb{R}^p \mid x^\top x = 1\}$ and is the subset of points of the hypercube $\mathbb{R}^p$ with unit Euclidean distance from the point 0. This is a smooth manifold of dimension $p-1$ embedded in the hypercube $\mathbb{R}^p$; in fact, with only $p-1$ coordinates, we can identify unequivocally any point on a sphere. In [21], it is shown how to ‘coordinatize’ such a manifold through, e.g., the stereographic projection, which requires two coordinate maps applied to two convenient neighborhoods (each including only one ‘pole’) on the sphere. The special cases are $\mathbb{S}^1$, the unit circle, and $\mathbb{S}^2$, the ordinary sphere. There exist a number of applications insisting on the hyperspheres such as, for instance, blind deconvolution [22,23], data classification [24], adaptive pattern recognition [25] and motion planning, optimization, and verification in robotics and in computational biology [26]. A smooth manifold closely related to the unit hypersphere is the oblique manifold [27], defined as:
$$\mathcal{OB}(n,p) \triangleq \{X \in \mathbb{R}^{n \times p} \mid \mathrm{ddiag}(X^\top X) = I_p\},$$
where the operator $\mathrm{ddiag}$ returns the zero matrix except for the main diagonal, which is copied from the main diagonal of its argument. The structure of the oblique manifold may be easily studied on the basis of the unit hypersphere; in fact, the following identification holds true:
$$\mathcal{OB}(n,p) \cong \underbrace{\mathbb{S}^{n-1} \times \cdots \times \mathbb{S}^{n-1}}_{p \text{ times}};$$
hence, each column of a $\mathcal{OB}(n,p)$-matrix may be treated as a $\mathbb{S}^{n-1}$-column array. The hyperellipsoid is defined as:
$$\mathbb{E}^{p-1} \triangleq \{x \in \mathbb{R}^p \mid x^\top E x = 1\},$$
with $E = XDX^\top$, D being diagonal and positive-definite and X being a hyper-rotation such that $X^\top X = I_p$ and $\det X = 1$. The mathematical structure of the hyperellipsoid may be studied on the basis of the manifold structure of the hypersphere. The hyperellipsoid is used, e.g., in the calibration of magnetometers [28].
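The membership condition of the oblique manifold is easy to check numerically; a sketch in Python with NumPy, where the helper `ddiag` implements the diagonal-extraction operator described above:

```python
import numpy as np

def ddiag(M):
    """Return the zero matrix except for the main diagonal, copied from M."""
    return np.diag(np.diag(M))

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
X = A / np.linalg.norm(A, axis=0)    # normalize every column to unit length

# Each column of X is a point of the unit hypersphere S^4, hence X belongs
# to the oblique manifold: ddiag(X^T X) = I_3.
assert np.allclose(ddiag(X.T @ X), np.eye(3))
```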
General linear group and special linear group: The general linear group is defined as $\mathrm{Gl}(p) \triangleq \{X \in \mathbb{R}^{p \times p} \mid \det X \neq 0\}$. This is the subset of the space of $p \times p$ matrices which are invertible. The special linear group is defined as $\mathrm{Sl}(p) \triangleq \{X \in \mathbb{R}^{p \times p} \mid \det X = 1\}$. This is the subset of the general linear group made by all matrices with a unitary determinant.
Orthogonal group, special orthogonal group, special Euclidean group: An orthogonal group of size p is defined by $\mathrm{O}(p) \triangleq \{X \in \mathbb{R}^{p \times p} \mid X^\top X = I_p\}$. The manifold $\mathrm{O}(p)$ has dimension $\tfrac{1}{2}p(p-1)$. In fact, every matrix in $\mathrm{O}(p)$ possesses $p^2$ entries which are constrained by $\tfrac{1}{2}p(p+1)$ orthogonality/normality restrictions. The manifold of special orthogonal matrices is defined as $\mathrm{SO}(p) \triangleq \{X \in \mathrm{O}(p) \mid \det X = 1\}$. A smooth manifold closely related to the special orthogonal group is the special Euclidean group, denoted as $\mathrm{SE}(p)$, that finds applications in robotics (see, e.g., [29]). The special Euclidean group is a set of $(p+1) \times (p+1)$ matrices defined as:
$$\mathrm{SE}(p) \triangleq \left\{\begin{bmatrix} X & t \\ 0_p^\top & 1 \end{bmatrix} \,\middle|\, X \in \mathrm{SO}(p),\ t \in \mathbb{R}^p\right\}.$$
Stiefel manifold: The (compact) Stiefel manifold is defined as:
$$\mathrm{St}(n,p) \triangleq \{X \in \mathbb{R}^{n \times p} \mid X^\top X = I_p\},$$
where $n \ge p$. Every Stiefel matrix has $np$ entries, but its elements are constrained by $\tfrac{1}{2}p(p+1)$ non-linear constraints, and hence the dimension of a Stiefel manifold is $np - \tfrac{1}{2}p(p+1)$. Exemplary applications are blind source separation [30], non-negative matrix factorization [31], best basis search/selection [32], electronic structures computation [33] and factor analysis in psychometrics [34]. A generalization of the Stiefel manifold, to be applied to principal subspace tracking, was studied in the contribution [35]. Such a generalized Stiefel manifold is defined as:
$$\mathrm{St}_B(n,p) \triangleq \{X \in \mathbb{R}^{n \times p} \mid X^\top B X = I_p\},$$
where B denotes any symmetric, positive-definite matrix. The contribution [35] studied the structure of the tangent bundle of the generalized Stiefel manifold and suggested a computationally convenient calculus for this manifold.
Real symplectic group: The real symplectic group is defined as
$$\mathrm{Sp}(2n) \triangleq \{X \in \mathbb{R}^{2n \times 2n} \mid X^\top J X = J\},\qquad J \triangleq \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix},$$
where the symbol $I_n$ denotes again a $n \times n$ identity matrix and the symbol $0_n$ denotes a $n \times n$ whole-zero matrix. The skew-symmetric matrix J enjoys the following properties: $J^\top = -J$, $J^2 = -I_{2n}$.
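The stated properties of J, together with one simple family of symplectic matrices, can be verified numerically. A sketch in Python with NumPy; the block-diagonal family $\mathrm{blkdiag}(A, A^{-\top})$, for any invertible A, is a standard example of matrices satisfying the symplectic constraint:

```python
import numpy as np

n = 3
Z, I = np.zeros((n, n)), np.eye(n)
J = np.block([[Z, I], [-I, Z]])              # the matrix J of the definition
assert np.allclose(J.T, -J)                  # J is skew-symmetric: J^T = -J
assert np.allclose(J @ J, -np.eye(2 * n))    # J^2 = -I_{2n}

# A simple family of symplectic matrices: X = blkdiag(A, A^{-T}), A invertible
A = np.triu(np.ones((n, n))) + np.eye(n)     # a well-conditioned invertible A
X = np.block([[A, Z], [Z, np.linalg.inv(A).T]])
assert np.allclose(X.T @ J @ X, J)           # X satisfies X^T J X = J
```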
Manifold of symmetric, positive-definite (SPD) matrices: The main features of the space
$$\mathrm{S}^+(p) \triangleq \{P \in \mathbb{R}^{p \times p} \mid P = P^\top,\ x^\top P x > 0 \text{ for every } x \in \mathbb{R}^p,\ x \neq 0\}$$
of symmetric, positive-definite matrices were surveyed, e.g., in [36,37]. We recall that a matrix P is termed positive-definite if for every column array $x \neq 0$ it holds that $x^\top P x > 0$. A useful property is that the (real) eigenvalues of any SPD matrix are all strictly positive. A related manifold is the space of symmetric fixed-rank positive-semidefinite matrices, that may be defined as:
$$\mathrm{S}^+(p, r) \triangleq \{P \in \mathbb{R}^{p \times p} \mid P = P^\top,\ x^\top P x \ge 0 \text{ for every } x \in \mathbb{R}^p,\ \mathrm{rank}(P) = r\}.$$
The growing use of low-rank matrix approximations to retain tractability in large-scale applications boosted extensions of the calculus of positive-definite matrices to their low-rank counterparts [38].
Grassmann manifold: A Grassmann manifold $\mathrm{Gr}(n,p)$ is a set of subspaces of $\mathbb{R}^n$ spanned by p independent vectors, namely
$$\mathrm{Gr}(n,p) \triangleq \{\mathrm{span}(x_1, x_2, \ldots, x_p) \mid x_1, \ldots, x_p \in \mathbb{R}^n \text{ linearly independent}\},$$
with $(x_1, \ldots, x_p)$ being a p-tuple of arbitrary and linearly independent n-dimensional arrays. Grassmann manifolds are compact, smooth manifolds and are special cases of more general objects termed flag manifolds [39]. A representation of any of such subspaces may be assumed as the equivalence class $[X] \triangleq \{XR \mid R \in \mathrm{O}(p)\}$ [33]. In practice, an element of the Grassmann manifold is represented by a matrix in $\mathrm{St}(n,p)$, whose columns span the subspace.
It is easy to see that manifolds of interest in applications may exhibit large dimensions. Coordinatizing these manifolds may be inconvenient for practical purposes when their sizes exceed a few units. For this reason, matrix manifolds are treated as submanifolds of $\mathbb{R}^p$ or $\mathbb{R}^{n \times p}$ and their elements are represented by solid arrays.
In general, we shall denote as $\mathcal{A}$ an ambient space that a manifold $\mathcal{M}$ is embedded into. We shall assume that any ambient space of interest in the present tutorial paper is a Euclidean vector space, namely, a finite-dimensional vector space over the real numbers endowed with an inner product, that we shall denote as $\langle \cdot, \cdot \rangle$.
Example2.
An elementary example of manifold calculus involving the circle $\mathbb{S}^1$ is the Borsuk–Ulam theorem.
Theorem1.
Let $f : \mathbb{S}^1 \to \mathbb{R}$ denote a continuous function. There exist two antipodal points $x, -x \in \mathbb{S}^1$ such that $f(x) = f(-x)$.
Proof.
Let $g(x) \triangleq f(x) - f(-x)$ with $x \in \mathbb{S}^1$. Choose an arbitrary point $x_0 \in \mathbb{S}^1$:
If $g(x_0) = 0$, then $f(x_0) = f(-x_0)$;
If $g(x_0) \neq 0$, notice that $g(-x_0) = -g(x_0)$. Since g is continuous and takes different signs at two different points of its domain, there must exist at least a point $x^\star \in \mathbb{S}^1$ such that $g(x^\star) = 0$.
The above cases prove the assertion by exhaustion. □
The Borsuk–Ulam theorem has an interesting consequence for the temperature of the Earth. In fact, let us identify the equator with and the temperature over the equator with the function f, then the Borsuk–Ulam theorem implies that there exist two antipodal points over the equator where the temperature is the same. Indeed, the Borsuk–Ulam theorem holds in every dimension, which implies, for example, that if represents the Earth’s surface and returns the atmospheric temperature and the Earth’s pressure in a point of the Earth surface, then there exist two antipodal points on Earth that exhibit the same temperature-pressure value pair! ■
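The proof is constructive enough to be turned into a numerical procedure: since $g(-x_0) = -g(x_0)$, a sign change of g is guaranteed, and a zero can be located by bisection. A sketch in Python with NumPy, where the ‘temperature’ profile f is a hypothetical continuous function chosen only for illustration:

```python
import numpy as np

def f(theta):
    """A hypothetical continuous 'temperature' over the circle."""
    return np.sin(theta) + 0.5 * np.cos(3 * theta)

def g(theta):
    """g(theta) = f(theta) - f(antipode of theta); g changes sign."""
    return f(theta) - f(theta + np.pi)

lo, hi = 0.0, np.pi            # g(hi) = -g(lo): a sign change is guaranteed
assert g(lo) * g(hi) < 0

for _ in range(60):            # plain bisection on [lo, hi]
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta_star = 0.5 * (lo + hi)

# The two antipodal points share the same value of f
assert abs(f(theta_star) - f(theta_star + np.pi)) < 1e-9
```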
Matrix factorizations are of paramount importance in applied calculations, as illustrated in the following.
Example3.
Let us recall that every symmetric, positive-definite matrix $P \in \mathrm{S}^+(p)$ may be factorized as:
$$P = RDR^\top,$$
where $R^\top R = I_p$ and D denotes a diagonal matrix. In fact, D is the matrix whose in-diagonal entries coincide with the eigenvalues of P and R is the matrix whose columns coincide with the eigenvectors of P. The factorization (24) simplifies some calculations; for example, it holds that:
$$P^{-1} = RD^{-1}R^\top.$$
The net result is that convoluted matrix operations may be turned into fairly simple computations that require a finite number of elementary operations. ■
Manifolds may be classified as compact or non-compact:
The manifolds $\mathbb{S}^{p-1}$, with $p \ge 2$, and $\mathrm{SO}(p)$, with $p \ge 2$, are compact since there exists a ball $B_\rho$, with $\rho < \infty$, that contains them.
The manifold $\mathrm{S}^+(p)$ is non-compact since no finite-radius ball exists that contains it.
In practical terms, the elements of a compact manifold are necessarily limited in value, while the elements of a non-compact manifold may take arbitrarily large values.
From a computational point of view, dynamical systems on compact manifolds, even if they turn out to be unstable, do not pose serious implementation problems, while dynamical systems on non-compact manifolds may result in implementational difficulties (and easily cause runtime errors).
3. Smooth Curves, Tangent Vector Fields, Tangent Spaces and Bundle, Normal Spaces
An interesting object we may think of on a smooth manifold $\mathcal{M}$ is a smooth curve $\gamma : [t_0, t_1] \to \mathcal{M}$, with $t_0 < t_1$. It is worth remarking that a curve may cross different coordinate charts $(U_i, \varphi_i)$; therefore, it is generally necessary to split a curve into as many branches (or segments) as coordinate charts it crosses. The function $\gamma$ describes a curve on the manifold delimited by the endpoints $\gamma(t_0)$ and $\gamma(t_1)$.
3.1. Curves and Bundles for Embedded Manifolds
Let us assume that a manifold $\mathcal{M}$ is embedded into an ambient space $\mathcal{A}$ of suitable dimensions (for instance, the sphere $\mathbb{S}^2$ is embedded in the ambient space $\mathbb{R}^3$). In the following, a number of examples, and a counterexample, are discussed to clarify the important notion of a smooth curve on a manifold.
Example4.
On a hypersphere , consider the following function:
with and being arbitrary (but let us avoid the singularity ). Notice that . To verify that such a function γ traces indeed a curve on the hypersphere it suffices to show that, for every and , it holds that . Indeed,
In addition, consider the following function
with and with and such that . (In this case, it turns out that .) Similarly to the previous example, in order to prove that γ is a valid curve on the hypersphere, it suffices to show that for every :
It is not difficult to evaluate and, in particular, to show that .
Let us now consider an example of a curve over the manifold of the hyper-rotations. In particular, let us examine the following curve on the manifold :
with and being arbitrarily chosen. (Notice that .) In order to show that function γ represents a curve over the space of planar rotations, it suffices to compute the product and to show that it keeps equal to the identity and that for every value of t in its range:
Further to this, let us prove that the following function represents a curve over the manifold :
with , and . (Even in this case, it holds that .) In order to prove such a statement, it is necessary to compute and to show that it equals and that at any time:
where the superscript denotes the inverse of the transposed matrix.
Notice that, in the proof, we have made use of some matrix identities, including the commutativity property and .
As a further example, let us consider the following function, which we shall prove to represent a curve over the manifold of the symmetric, positive-definite matrices:
with , , and . (Notice that .) To prove the assertion, it suffices to show that and that the eigenvalues of are strictly positive for any t in its range. Symmetry is immediately verified. The eigenvalues are and .
Let us now consider a counterexample. Define the following function:
with and arbitrary. In general, the above function does not represent a curve in . To prove such an assertion, it suffices to recall that, in general, a linear combination of two positive-definite matrices does not result in a positive-definite matrix. As a numerical example, consider the following: and . Then, . It is readily verified that if , then is not necessarily positive-definite for every value of t in the range (for example, is not positive-definite). From this counterexample, we can verify that the manifold , as all manifolds just exemplified, is not a linear space with respect to standard matrix operations. ■
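The counterexample can be illustrated numerically. The sketch below (Python with NumPy) assumes, for the sake of illustration, a linear parameterization of the form $\gamma(t) = P_1 + tP_2$ with two specific SPD matrices; at $t = -1$ the combination develops a negative eigenvalue and hence leaves the SPD manifold:

```python
import numpy as np

P1 = np.eye(2)
P2 = np.array([[3.0, 0.0],
               [0.0, 1.0]])
# Both matrices are symmetric positive-definite...
assert np.all(np.linalg.eigvalsh(P1) > 0)
assert np.all(np.linalg.eigvalsh(P2) > 0)

# ...yet the linear combination P1 + t P2 is not SPD for every t:
G = P1 + (-1.0) * P2
eigs = np.linalg.eigvalsh(G)
assert np.min(eigs) < 0      # a negative eigenvalue: G left the SPD manifold
```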
From a system-theoretic perspective, a curve may be thought of as a trajectory generated by a dynamical system whose state space is a smooth manifold. Take, for instance, the case of a little magnetized ball rolling over a metallic spherical surface: no matter how the little ball moves over the surface of the sphere, during its motion the contact point will describe a curve over the larger sphere from the initial time to . The parameter t may be thought of as a time index, whose progression corresponds to the flowing of time. Thinking of physical objects moving over curved surfaces helps in understanding the main concepts in manifold calculus, even though a certain degree of abstraction is still necessary, not only because the dimensions of manifolds of interest may be far larger than 3, but also because most manifolds are difficult to visualize in practice.
A noteworthy property of curves is that they are equivalent up to a regular reparameterization.
Comparing the position over a smooth curve in two close-by instants provides the notion of speed, namely, how quickly an object is moving along a trajectory and in which direction. In particular, denoting by a curve such that , the quantity
denotes such a speed, which is represented as a tangent vector at the point x on the manifold. Clearly, the vector does not belong to the curved manifold , whereas it is tangent to it at the point x. (Notice that in this tutorial the word ‘vector’ is reserved for tangent vectors, while a column-type array—which may represent a location—is termed an ‘array’.)
Consider every possible smooth curve on a manifold of dimension p passing through the point x and compute the tangent vectors to all these curves at the point x at once. The collection of these tangent vectors spans a linear space of dimension p, which is referred to as the tangent space (or ‘tangent plane’) to the manifold at the point x, and is denoted by
Since there exist infinitely many curves passing through a given point with the same velocity, one should define a tangent vector as an equivalence class of curves passing through such a given point, while being tangent to each other at that point. A tangent plane is hence a collection of such vectors. In addition, we should underline that each tangent space is a subset of the ambient space due to the special nature of , being both a point set and a vector space.
Taking a point and a tangent vector , any pair is thought of as belonging to an abstract set termed tangent bundle, defined as
In order to facilitate computation, it is useful to introduce the concept of the normal space of an embedded manifold in a given point under a chosen metric:
The normal space represents the orthogonal complement of the tangent space with respect to a Euclidean ambient space that the manifold is embedded into (in fact, some authors denote normal spaces as ).
Let us examine the structure of the tangent/normal spaces of a number of smooth manifolds:
Hypercube: Since the space is linear, when it is embedded into itself, each tangent space coincides with the whole space, namely, for every , it holds that . Since a normal space is the orthogonal complement of a tangent space, we must conclude that .
Hypersphere: At every point , the tangent space has the structure
The normal space at every point of the hypersphere, which is the orthogonal complement of the tangent space with respect to the ambient space that the manifold is embedded in, has the structure
in the case that the ambient space is equipped with the Euclidean inner product .
Special orthogonal group: The tangent space of the manifold has the structure
This may be proven by differentiating a generic curve passing through X at . Every such curve satisfies the orthogonal-group characteristic equation ; therefore, after differentiation, one obtains
By recalling that the tangent space is formed by velocity vectors , the above-mentioned result is readily achieved. Provided the ambient space is endowed with the canonical Euclidean metric , the normal space at a point X may be defined as
It is easy to convince oneself that every tangent vector may be written as , with H skew-symmetric (i.e., such that ); then, any element may be written as , with . In fact, the normality condition implies , which is equivalent to ; therefore, the normality condition may be recast as . It is hence necessary and sufficient that ; therefore,
Stiefel manifold: Given a trajectory , differentiation with respect to the parameter t yields , which means that the tangent space to the manifold at a point has the structure:
The normal space has the structure:
Real symplectic group: The tangent space associated with the real symplectic group has the structure:
The tangent spaces and the normal space associated with the real symplectic group may be characterized as follows:
Space of symmetric, positive-definite matrices: Given a point , its tangent space may be characterized simply by observing that every curve satisfies and . Only the equality constraint influences the structure of the tangent space; therefore,
Notice that all tangent spaces are identical to one another, as they do not depend on the base point P.
Grassmann manifold: For every element , the tangent space may be represented as:
A tangent space may be decomposed as the direct sum of a horizontal space and of a vertical space at [40]. Starting from a point , moving along a horizontal direction causes a change in subspace, while moving along a vertical direction does not change the subspace .
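The tangency condition for the special orthogonal group derived above can be verified numerically. The sketch below (a hedged illustration; the random matrices are arbitrary) builds a rotation X, a skew-symmetric H, and checks that V = XH satisfies the condition obtained by differentiating the orthogonality constraint:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random rotation X in SO(3) obtained via QR factorization,
# with the sign flipped if necessary to enforce det(X) = +1.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
X = Q if np.linalg.det(Q) > 0 else -Q

# A random skew-symmetric matrix H (H^T = -H) and the candidate
# tangent vector V = X H.
M = rng.standard_normal((3, 3))
H = M - M.T
V = X @ H

# Differentiating X(t)^T X(t) = I gives X^T V + V^T X = 0.
residual = np.linalg.norm(X.T @ V + V.T @ X)
```

The residual vanishes up to rounding error, confirming that vectors of the form XH with H skew-symmetric are tangent to the special orthogonal group at X.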
3.2. Vector Fields
Let us now consider a map that associates a tangent vector with every point x of a manifold. One such map is termed tangent vector field (or simply vector field, for short). The set of all tangent vector fields of a manifold is denoted as . Vector fields are objects of prime importance in manifold calculus. In physical system dynamics, vector fields are associated with, e.g., speed and acceleration.
Example 5.
A vector field on a manifold is a function that assigns to every point a tangent vector . A vector field may even depend on the time parameter, in which case it will be denoted as with .
Let us now consider the hypersphere embedded in and the function defined as:
where x indicates a column array of 8 elements and is an arbitrary constant matrix. Such a function describes a vector field in because it holds that for every x. In fact,
Notice that the tangent vector is inextricably tied to the point x on the manifold. In fact, in general, taking two distinct points , it turns out that . It is then sensible to describe a vector field as a set of pairs:
As a further example, let us consider the manifold embedded in and a function :
where R is a orthogonal matrix variable, while A and B are skew-symmetric constant matrices (namely, and ). Such a function represents a (time-varying) vector field on the manifold of hyper-rotations in that, for every and , it holds that . Such an assertion may be proven as follows
As a last example, let us take the manifold embedded in and a function defined as
where P is a symmetric, positive-definite matrix variable (hence, and are symmetric, positive-definite as well). It turns out that the function f represents a vector field in ; in fact, for every , it holds that :
(Notice that, in the proof, we have made use only of the symmetry of the matrix P and of its powers and .) ■
As a matter of fact, manifold calculus primarily deals with two kinds of objects:
Points on a manifold, denoted as ;
Tangent vectors, denoted as .
and, occasionally, with normal arrays, which are instrumental in calculations. In the case of matrix manifolds, which are of prime importance in applications, the ambient space ; hence, points x and tangent vectors v are essentially arrays or matrices (either rectangular or square).
3.3. Canonical Curves, Canonical Basis of a Tangent Space*
In intrinsic manifold calculus, the way to regard, e.g., tangent spaces and vector fields is based on differential operators [21]. Formally, if denotes a smooth function space, a tangent vector is defined in such a way that ; namely, denotes the directional derivative of the function along the direction v. Recall that a differential operator in may be written as a linear combination of elementary differential operators, namely the derivatives along the basis axes, through some coefficients.
Let denote a smooth manifold of dimension p and let denote any smooth curve such that . Let us assume, for simplicity, that the image of is entirely contained in a chart . The map traces out a smooth curve in the parameter space , which is differentiable in the usual multivariable calculus sense. Let us denote the intrinsic coordinates of the points over the curve as
The superscript notation to denote the th coordinate is standard in manifold calculus. This notation helps when checking expressions written in intrinsic coordinates. Since the chart is invertible by definition, we may represent the curve in local coordinates by
In particular, it holds that . On the basis of the representation (60), we may define as many as p canonical curves as for and . Namely, a canonical curve around a point is traced on a manifold by letting only one of the p coordinates vary at a time. Let us now consider the tangent vectors
The set is termed canonical basis of the vector space . Its property of being a basis is implied by the fact that such canonical vectors are linearly independent from one another. Therefore, any tangent vector may be written as a linear combination of canonical vectors, that is
where the quantities represent the coefficients of the linear combination. To shorten the equations, the above expression may be written simply as , where the information about what happens at the point x has been suppressed and the summation is implied by the presence of a repeated index (i), one in the upper position () and one in the lower position (); this is commonly referred to as the Einstein summation convention.
Example 6.
Let us consider the case of a sphere embedded into . A parametrization of (excluding the South pole) is
for (where is commonly referred to as ‘latitude’) and (where is commonly referred to as ‘longitude’). For example, the point coincides with the North pole, while the point lies on the equator of the sphere. Given a point , the curve
traces a ‘meridian’ through x, while the curve
traces a ‘parallel’ through x. Hence, by definition, the tangent space is spanned by the canonical basis vectors
and
For example, it is readily seen that
represents tangent vectors at (i.e., the North pole). ■
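The canonical basis vectors of this example may be computed explicitly as partial derivatives of the parametrization. A brief sketch, assuming the common latitude/longitude chart x(theta, phi) = (cos theta cos phi, cos theta sin phi, sin theta) (an assumption consistent with the example):

```python
import numpy as np

def chart(theta, phi):
    """Latitude/longitude parametrization of the unit sphere S^2."""
    return np.array([np.cos(theta) * np.cos(phi),
                     np.cos(theta) * np.sin(phi),
                     np.sin(theta)])

theta, phi = 0.4, 1.1                    # an arbitrary chart point
x = chart(theta, phi)

# Canonical basis vectors: partial derivatives of the chart.
e_theta = np.array([-np.sin(theta) * np.cos(phi),
                    -np.sin(theta) * np.sin(phi),
                     np.cos(theta)])     # 'meridian' direction
e_phi = np.array([-np.cos(theta) * np.sin(phi),
                   np.cos(theta) * np.cos(phi),
                   0.0])                 # 'parallel' direction

# Both basis vectors are orthogonal to x, hence tangent to the sphere.
orth1, orth2 = abs(x @ e_theta), abs(x @ e_phi)
```

Both inner products vanish, confirming that the meridian and parallel velocity vectors span the tangent plane at x.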
The dual space of a tangent space , denoted as , is termed the cotangent space. Elements of a cotangent space are termed covectors. Covectors combine with vectors by ‘annihilation’ to a scalar. Given a canonical basis of , it is possible to define a canonical basis of by the conditions . The union of all cotangent spaces forms the cotangent bundle.
4. First-Order Dynamical Systems on Manifolds
A number of physical phenomena are described in system theory as systems of coupled differential equations in several variables. A fairly general representation of such systems is
where denotes a state variable array and represents a set of descriptive variables, and represents a state-transition function. The multi-variable array-type function f denotes a (possibly time-varying) vector field that represents the set of velocities of change of the state of the system. The solutions of the differential Equation (69) represent integral curves of the vector field f. In this case, the space represents the flat state space and the velocity space of the model (69) at once.
In order to take into account further constraints on the state variables, termed invariants, it is convenient to introduce the notion of curved state space in the form of a smooth manifold. This logical process leads to a type of first-order dynamical system described by
where denotes the state-transition function of the mathematical model and denotes the state of the system at any given time t. Even in this case, the function f denotes the velocity field; however, in this case, the state space is the manifold , while the velocity space is the tangent bundle .
Let us underline two aspects of the above equation:
Meaning of in the expression (70): Since we assumed the manifold to be embedded in an ambient space of type , the quantity is an array or a matrix made of the derivatives of the entries of with respect to the time parameter t. Let us recall, however, that even if such a specification helps one understand the subject, we shall never write any relation involving single components of such matrix-type objects, as we shall treat any state variable x as a whole (except in a low-dimensional example). The solution of the differential Equation (70) is the trajectory of the system and is represented by a curve on the manifold ; hence, denotes the speed along the trajectory since, for every , it holds that , namely .
Separation (or non-equivalency) of first-order and second-order systems: When dealing with dynamical systems in , the system (69) is actually fairly general since, upon introducing additional variables, it is possible to turn an n-th order system into a first-order system. Such a property stems from the legitimate ‘confusion’ between the state space and the velocity space. In contrast, when dealing with dynamical systems on manifolds, such confusion is not legitimate, since while ; namely, the system state and the system velocity belong to very different spaces. It suffices to recall that is generally a curved (i.e., non-linear) space, while each is a flat, linear vector space. For this reason, second-order systems cannot be reduced to first-order systems.
Example 7.
Let us reconsider the function (52). On the basis of such a vector field, we may define the following first-order system on the hypersphere :
Such a system may be rewritten equivalently as:
Let us examine the equilibrium points of such dynamical system. Any equilibrium point must satisfy the equation
Whenever A is symmetric and positive-definite, Equation (73) establishes that is an eigenvector of the matrix A. In effect, the state of the system (72) evolves towards an eigenvector of the matrix A. For this reason, the dynamical system (72) represents a prototypical example of a continuous-time calculation system.
The differential Equation (72) is commonly referred to as the Oja equation and was studied by Prof. Erkki Oja from the Helsinki University of Technology (currently Aalto University). It was studied in several contexts, including automation and control [41], and constitutes an exemplification of the fact that even a fairly simple computing element may be able to perform a complex calculation, namely, extracting an eigenvector (and the corresponding eigenvalue ) from an arbitrarily-sized matrix. ■
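A quick numerical experiment illustrates the behaviour described above. The sketch below integrates an Oja-type equation dx/dt = Ax − (x^T A x)x (an assumed standard form of the equation, with an arbitrary symmetric positive-definite A) by the Euler method, with re-projection onto the sphere used purely as a simple numerical safeguard:

```python
import numpy as np

rng = np.random.default_rng(2)

# A symmetric, positive-definite matrix (illustrative choice).
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)

# Random initial state on the unit sphere.
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

# Euler integration of the Oja-type flow with re-normalization.
dt = 1e-3
for _ in range(20000):
    x = x + dt * (A @ x - (x @ A @ x) * x)
    x /= np.linalg.norm(x)

# The state should align with the dominant eigenvector of A.
eigvals, eigvecs = np.linalg.eigh(A)
v_max = eigvecs[:, -1]
alignment = abs(x @ v_max)
```

Upon convergence, the alignment approaches 1, i.e., the state extracts the eigenvector associated with the largest eigenvalue of A.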
Let us now consider an example, drawn from circuit theory, that introduces the notion of ‘invariants’ for a dynamical system, which indeed motivates the study of dynamical systems on manifolds.
Example 8.
In order to emphasize the notion of invariants in connection to dynamical systems, let us consider the simple model of an ideal DC-to-DC converter studied in [42]. The mathematical model of such a converter reads:
where are state voltages (across capacitors), is a state current (through an inductor) and denotes the control input (a switch). The following function is an invariant for such an electrical circuit
Indeed, such a quantity represents the total energy across the electrical network. By invariant it is meant that , namely that the time-function keeps the same value for every t. Notice, in fact, that
In order to simplify the analysis of such a dynamical system, let us define the following abstract state variables:
Let us arrange the above state variables into the state array . The system (74) may thus be rewritten as , with
In short, .
In terms of the new state variables, the invariant may be rewritten as:
Assuming, for instance, that , Equation (79) establishes that the state x belongs to the sphere . Hence, we may observe that the state space of the DC-to-DC converter is the unit sphere embedded in .
As a last verification step, let us prove that for every t. It suffices to prove that , namely that . By definition of H, it is readily found to be a skew-symmetric matrix, namely that for every t, it holds that . By transposing the product , we find the sought result; in fact,
and since is a scalar, it must hold that , therefore it must be equal to zero. ■
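The invariance argument may be replayed in code. The sketch below uses an arbitrary skew-symmetric matrix H (standing in for the matrix of the example, whose entries are not reproduced here) and checks both the algebraic identity x^T H x = 0 and the numerical conservation of the norm along a Runge-Kutta integration of dx/dt = Hx:

```python
import numpy as np

rng = np.random.default_rng(3)

# An arbitrary skew-symmetric matrix H (H^T = -H).
n = 3
M = rng.standard_normal((n, n))
H = M - M.T

x = rng.standard_normal(n)

# The quadratic form x^T H x equals its own negative, hence is zero;
# therefore d/dt ||x||^2 = 2 x^T H x = 0 along dx/dt = H x.
quad = x @ H @ x

# Classical RK4 integration as a numerical cross-check of the invariant.
def rhs(y):
    return H @ y

dt, steps = 1e-3, 1000
y = x.copy()
for _ in range(steps):
    k1 = rhs(y)
    k2 = rhs(y + 0.5 * dt * k1)
    k3 = rhs(y + 0.5 * dt * k2)
    k4 = rhs(y + dt * k3)
    y = y + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

norm_drift = abs(np.linalg.norm(y) - np.linalg.norm(x))
```

The norm of the state (i.e., the energy invariant, in the abstract coordinates) stays constant up to the small integration error of the scheme.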
First-order dynamical systems are characterized by a (possibly time-varying) vector field that determines their dynamics, as illustrated by the following example.
Example 9.
Let us consider the following first-order dynamical system on the manifold of three-dimensional rotations in space:
In this system, the matrix represents the orientation of a moving orthogonal frame attached to a rigid body with respect to an inertial reference frame, while the arbitrary constant matrix determines its rotational speed (namely, the orientation of the axis of rotation and the rotational velocity).
The orientation matrix is often termed attitude for rigid bodies such as drones and satellites:
The instance indicates that the object is horizontal with respect to the reference frame;
The instance indicates that it is necessary to rotate the body-fixed axes to align them to the inertial axes.
The dynamical system (81) is of the type . Let us verify that is a vector field of . To show such a property, it suffices to prove that . This is in fact true:
It is worth noticing that the property holds true even in the case that A is a time-varying matrix field, namely . ■
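For the three-dimensional case with a constant skew-symmetric A, the solution R(t) = R(0) exp(tA) may be evaluated in closed form through the Rodrigues formula, which allows a direct check that the trajectory stays on the manifold of rotations. A minimal sketch (the angular-velocity vector is an arbitrary illustrative choice):

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix associated with a 3-array w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(w):
    """Exponential of skew(w) via the Rodrigues formula."""
    th = np.linalg.norm(w)
    if th == 0.0:
        return np.eye(3)
    K = skew(w / th)
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

w = np.array([0.3, -1.2, 0.5])           # rotation axis and speed (illustrative)
R0 = np.eye(3)                           # 'horizontal' initial attitude
R = R0 @ expm_so3(2.0 * w)               # attitude at t = 2

# Membership in SO(3): orthogonality and unit determinant.
orthogonality = np.linalg.norm(R.T @ R - np.eye(3))
det_R = np.linalg.det(R)
```

At every time t, the attitude matrix remains orthogonal with unit determinant, as predicted by the invariance argument of the example.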
5. Tangent Maps: Pushforward and Pullback
Let us consider two Riemannian manifolds and . Any smooth function transforms a curve in into a curve in . Since both curves are associated with their velocity vector fields, which define their tangent spaces, one might wonder how the function f maps tangent spaces of into tangent spaces of . The answer is embodied by the notion of a pushforward map.
Let , then a pushforward map is defined such that for every smooth curve , it holds that:
In general, therefore, given a function that maps a point x from the manifold to the point on the manifold , the map associates to any tangent vector v belonging to the tangent space a tangent vector belonging to the tangent space by ‘pushing’ such vector to such space.
A pushforward map is indeed a linear approximation of a smooth map on tangent spaces. Any tangent map is linear in the argument v. The map at a point x represents, in practical terms, the best linear approximation of the function f near x. A pushforward map may be regarded as a generalization of the total derivative of ordinary calculus. (Some authors would denote a pushforward map by an asterisk as in ; however, this notation takes up the space of the reference point x hence hindering it.)
Let us consider two special cases of interest.
Pushforward of a manifold-to-scalar function: The special case that , namely that f is a manifold-to-scalar function, is particularly important in applications. Such a special case will be covered later since it involves the notion of Riemannian gradient.
Pushforward of a matrix-to-matrix function: This is the case that the smooth manifolds and are real matrix manifolds embedded in . Any smooth function between any such pair of manifolds is of matrix-to-matrix type. Let us assume that the function f is analytic about a point , namely, that it may be expressed as a polynomial series:
Then, the pushforward map in a point applied to the tangent direction may be expressed as:
It is easily recognized that the tangent map is linear in the argument V. As a reference for the reader, we recall the analytic expansions of three matrix-to-matrix functions, which may be used to compute the corresponding pushforward maps:
Matrix exponential: For the map , it holds that , for ;
Principal matrix logarithm: For the map , it holds that , , for ;
Matrix inversion: For the map , it holds that , for .
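As an illustration of the matrix-inversion case, the pushforward of f(X) = X^{-1} in the direction V is the standard expression -X^{-1} V X^{-1}, which may be checked against a central finite difference. A hedged sketch with arbitrary test matrices:

```python
import numpy as np

rng = np.random.default_rng(4)

# An invertible base point X (a diagonal shift keeps it well-conditioned)
# and an arbitrary direction V.
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)
V = rng.standard_normal((n, n))

# Analytic pushforward of the inversion map.
push = -np.linalg.inv(X) @ V @ np.linalg.inv(X)

# Central finite-difference approximation of the same directional derivative.
eps = 1e-6
fd = (np.linalg.inv(X + eps * V) - np.linalg.inv(X - eps * V)) / (2 * eps)

err = np.linalg.norm(push - fd)
```

The two expressions agree up to the truncation and rounding errors of the finite-difference scheme, confirming the linearity of the tangent map in V.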
The inverse of a pushforward map is termed pullback map and is denoted as . A map ‘pulls’ any tangent vector w from the tangent space back to the tangent space .
6. Lie Groups, Lie Algebras, Lie Brackets
Lie groups are hybrid mathematical constructions sharing properties that characterize smooth manifolds and algebraic groups. Let us recall that an algebraic group is an algebraic structure made of a set , either discrete or continuous, endowed with an internal operation denoted as , usually referred to as group multiplication (not necessarily a multiplication in the standard sense, though), an inversion operation denoted as , and an identity element with respect to group multiplication denoted as e. The multiplication, inversion and identity are related in such a way that, for every triple , it holds that:
Note that, in general, group multiplication is not commutative.
Two instances of algebraic groups are , a discrete group, and , a continuous group. The structure represents the set of integer numbers (either positive or negative, also including 0) with the standard addition as group multiplication, which implies that the inverse is the standard subtraction, while the identity is the 0 element. The second example, namely the structure , represents the subset of non-singular matrices of size , endowed with the standard matrix-to-matrix multiplication ‘·’ as group multiplication. In such a case, the inverse operation coincides with standard matrix inversion, while the group identity coincides with the identity matrix . It is straightforward to show that such group operations/identities satisfy the recalled group axioms. A counterexample of a structure that is not an algebraic group is given by the set of non-negative integer numbers , which does not form a group under standard addition/subtraction (in fact, the subtraction of two non-negative integers does not necessarily return a non-negative integer).
With the notions of algebraic groups and smooth manifolds at hand, we may now define a well-known object of manifold calculus, namely a Lie group. A Lie group combines the properties of an algebraic group and of a smooth manifold, as it is a set endowed with both a group structure and a manifold structure. Paraphrasing Wirth:
Lie group := Manifold + Algebraic group.
Let us denote by the tangent space of a Lie group at a point . The tangent space at the identity, namely , represents a special instance of tangent spaces. In fact, such a tangent space, upon being endowed with a binary operator termed Lie brackets, possesses the structure of a so-called Lie algebra and will be denoted as .
Let us examine the structure of the Lie algebra of the following Lie groups:
Hypercube: The hypercube, also known as a translation group, is a Lie group under standard matrix sum (matrix subtraction and zero matrix complete the group structure). The Lie algebra of coincides with itself.
General linear group and the special linear group: Both the general linear group and the special linear group are Lie groups under standard matrix multiplication and inversion. The Lie algebra of the general linear group, namely , coincides with . The Lie algebra associated with the special linear group is more interesting and its determination involves some clever matrix computations. Let us consider endowed with standard matrix multiplication, inversion and as the group identity and a curve defined by , where denotes the matrix exponential and . Clearly, ; hence, represents any element of the Lie algebra . It is not hard to prove that
hence, . Now, let us recall a result from matrix calculus [43]:
In the present case, since , it follows from the above considerations that . In conclusion, we found that
the space of traceless matrices. The algebra has dimension .
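The characterization of the special linear algebra rests on the identity det(exp(A)) = exp(tr(A)): a traceless matrix exponentiates to a unit-determinant matrix. A small numerical check, using a plain Taylor-series matrix exponential (adequate for matrices of moderate norm; an implementation choice, not a recommended production algorithm):

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Plain Taylor-series matrix exponential (fine for small ||A||)."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

rng = np.random.default_rng(5)

# An arbitrary matrix projected onto the traceless subspace.
A = rng.standard_normal((3, 3))
A -= (np.trace(A) / 3) * np.eye(3)

# det(exp(A)) = exp(tr(A)) = exp(0) = 1: the exponential lands in SL(3).
detE = np.linalg.det(expm_taylor(A))
```

The determinant of the exponential equals 1 up to rounding, as predicted by the trace identity.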
Special orthogonal group: The manifold is a Lie group under standard matrix multiplication. The Lie algebra associated with the special orthogonal group is the set of skew-symmetric matrices . In fact, at the identity it holds that . The Lie algebra is a vector space of dimension .
Real symplectic group: The Lie algebra associated with the real symplectic group may be characterized as follows:
where the quantity J denotes again the fundamental skew-symmetric matrix.
Manifold of symmetric, positive-definite matrices: The Lie algebra associated with the Lie group is the set of symmetric matrices, namely . The space of symmetric, positive-definite matrices is not a group under standard matrix multiplication. We recall from [44] the following group structure :
-
Multiplication: , (logarithmic multiplication), with , where ‘’ denotes the principal matrix logarithm;
-
Identity element: (notice that );
-
Inverse: (matrix inversion), with (any symmetric, positive-definite matrix is non-singular).
It is easy to verify that the proposed instances of satisfy the algebraic-group axioms in . Additionally, the logarithmic multiplication on is compatible with its smooth manifold structure, as the map is smooth.
An essential peculiarity of any Lie group is that the whole group may always be brought back to a convenient neighborhood of the identity e, and the same holds true for every tangent space , , which may be brought back to the algebra . Let us consider, for instance, a curve such that . We may define a new curve
that has the property . Conversely, . This operation closely resembles a translation of a curve into a neighborhood of the group identity.
The above observation leads to defining two Lie-group functions:
Right translation: Defined as a function by for every pair ;
Left-translation: Defined as a function by for every pair ;
the distinction between the left translation and right translation is somewhat arbitrary: we chose a rather non-standard one.
Notice that and commute; in fact,
Let us consider a simple example to get acquainted with the notion of left/right translation.
Example 10.
Let us particularize the notion of left/right translation to the familiar hypercube. In , considered as a group under standard array addition, these functions may be defined as and . The composition returns , namely ; hence, . Since and commute, we may say that, in this specific case, the functions and are the inverse of one another. In general, this is not true, though, and the failure of the composition to equal an identity map is caused by the lack of commutativity of a group. ■
Now, let us consider the curve : it crosses the identity e at . Consequently, the pushforward maps every vector in to a vector in . Namely,
Likewise, taking an arbitrary smooth curve passing by the identity at and considering the curve , it is readily seen that the latter crosses the point x at . Consequently, the pushforward maps every vector in to a vector in . Namely,
Recall that and are of equal dimension and that the pushforward map is linear; hence, the pushforward map allows us to translate a vector belonging to a tangent space of a group into a vector belonging to its algebra (and vice versa, through its pullback). This is the reason why the Lie algebra of a Lie group is sometimes termed the ‘generator’ of the group.
If the structure of is known for a group , it might be convenient to coordinatize a neighborhood of the identity of through elements of the associated algebra with the help of a conveniently selected homeomorphism (namely, a continuous function between topological spaces that has a continuous inverse). Such a homeomorphism is known in the literature as an exponential map and is denoted as . We shall discuss the notion of exponential map for general manifolds in a later section.
As mentioned, a Lie algebra is endowed with a binary operator termed Lie bracket, denoted as . Let us survey its derivation. Let us define the function termed inner isomorphism as
The inner isomorphism provides a measure of the non-commutativity of a Lie group. It is interesting to notice that the function preserves the inner product of ; in fact, taking two elements , it holds that
hence, it represents a Lie-group automorphism.
Now, let us take the differential of with respect to the variable y at . To this aim, it suffices to consider a curve such that and an element , and to compute the derivative of with respect to t at . The calculation of the derivative gives
Setting and letting and yields
By virtue of the properties (93) and (94), is a linear map from to itself, namely an endomorphism in . The map is termed adjoint representation of the Lie algebra .
In addition, let us now consider a smooth curve such that and and let us evaluate the map
By the chain rule, one obtains
This is termed adjoint operator and coincides with the Lie bracket of two tangent vectors up to sign, namely
The Lie bracket provides a measure of non-commutativity of the algebra .
Example 11.
Let us consider the Lie group endowed with standard matrix multiplication and let us write down explicitly the inner isomorphism, the adjoint representation and the Lie bracket.
Since and , the inner isomorphism reads .
Taking a curve such that and yields
To end with, taking a curve such that and yields
In the above calculation, we have used a known matrix identity, namely
In this case, denotes the matrix commutator. ■
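The computation of the example may be mirrored numerically: differentiating t → exp(tU) V exp(−tU) at t = 0 by finite differences should reproduce the matrix commutator UV − VU. A hedged sketch with arbitrary matrices:

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Plain Taylor-series matrix exponential (fine for small ||A||)."""
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    return E

rng = np.random.default_rng(6)

n = 3
U = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))

# Central finite difference of t -> Ad_{exp(tU)}(V) = exp(tU) V exp(tU)^{-1}.
eps = 1e-6
Xp = expm_taylor(eps * U)
Xm = expm_taylor(-eps * U)
fd = (Xp @ V @ np.linalg.inv(Xp) - Xm @ V @ np.linalg.inv(Xm)) / (2 * eps)

# The analytic derivative is the commutator [U, V] = U V - V U.
bracket = U @ V - V @ U
err = np.linalg.norm(fd - bracket)
```

The finite-difference derivative matches the commutator, confirming that the Lie bracket of the matrix general linear group is the matrix commutator.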
7. Metrization, Riemannian Manifolds
A Riemannian manifold is a smooth manifold whose tangent bundle is equipped with a smooth family of positive-definite inner products . An inner product is locally defined at every point of a manifold as a bilinear function from to . It is important to remark that the inner product acts on two elements of the tangent space to the manifold at a given point x; it therefore depends (smoothly) on the point x. Hence, such an inner product gives rise to a local metric. Whenever an inner product does not depend explicitly on the point x, it is termed uniform. Given two tangent vectors , their inner product is denoted as .
7.1. Coordinate-Free Metrization by Inner Products and Metric Kernels
Let us recall some of the general properties of any inner product. In general, any vector space may be endowed with an inner product (also termed scalar product) denoted by . Any inner product has a set of properties:
For every , it holds that ,
For every , it holds that ,
For every and , it holds that ,
The norm of a vector is defined as ,
An inner product is non-degenerate if and only if for every implies .
Let us consider a simple example about the above notions.
Example 12.
Let us consider the special case and the inner product:
with S being a symmetric matrix. If S, in addition to being symmetric, is also positive-definite, then such an inner product is non-degenerate. ■
It pays to recall the notion of adjoint operator with respect to a metric on a vector space. Let us consider again the vector space endowed with an inner product and a linear operator . An adjoint operator is one such that, for every , it holds that
A self-adjoint operator is one such that , while an anti-adjoint operator is one such that . The following observation on the role of self-adjoint operators in quadratic forms is in order.
Observation 1.
Every operator may be decomposed as the sum of a self-adjoint and of an antiadjoint operator; in fact, it holds that
As far as quadratic forms are concerned, only the self-adjoint component plays a role. In fact, it is not hard to show that
This is indeed true for any arbitrary inner product of the type , in fact, some straightforward algebraic work reveals that
Since the right-hand side is a combination of quadratic forms, the left-hand side depends only on the self-adjoint part of ω.
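The observation is easily confirmed numerically for the Euclidean inner product (the case S = I): the quadratic form ignores the anti-adjoint (here: skew-symmetric) part of the operator. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(8)

# An arbitrary operator split into self-adjoint (symmetric) and
# anti-adjoint (skew-symmetric) parts.
n = 5
Omega = rng.standard_normal((n, n))
sym = 0.5 * (Omega + Omega.T)
skw = 0.5 * (Omega - Omega.T)

x = rng.standard_normal(n)

full = x @ Omega @ x        # quadratic form of the full operator
sym_only = x @ sym @ x      # quadratic form of the symmetric part
skew_part = x @ skw @ x     # contribution of the skew part: zero
```

The quadratic form of the full operator coincides with that of its symmetric part, while the skew part contributes nothing.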
In the case of a manifold, to every point there corresponds a tangent space that may be endowed with an inner product. Therefore, to denote the inner product assigned to the vector space , we use the notation .
Example 13.
Given a Riemannian manifold and two smooth vector fields , we may define the function as
Notice that the inner product varies smoothly from one tangent space to another; hence, the function h is regular in . The function represents a scalar field on .
As a further example, let us consider an arbitrary curve over a manifold . On the basis of such curve, we can define a function as
Notice that the inner product ‘accompanies’ the point that travels along the curve and involves different tangent spaces. The function , too, represents a scalar field restricted to the curve γ. ■
On a Riemannian manifold , the norm of a vector is defined as . A specific property of Riemannian manifolds is that the norm is positive-definite; namely, , where if and only if .
Choosing the ‘right’ metric for a given manifold in a given application is one of the most challenging aspects of metrization, especially because it is seldom clear what ‘right’ means in practice. When in doubt, it is possible to resort to canonical metrics. A number of canonical metrics for different manifolds of interest are summarized in the following:
Hypercube: For the space of (column) arrays, the canonical metric is the Euclidean metric , for every , while for the space of rectangular matrices, the canonical metric is the Euclidean metric , for every . Notice that neither of these metrics depends explicitly on the point at which it is calculated; hence, they are uniform.
Hypersphere: The hypersphere embedded into the ambient space inherits its canonical metric; hence, we shall choose for every . This metric is also uniform.
General linear group and the special linear group: A metric for the general linear group is:
Such a metric was popularized, for instance, in [45], in the context of machine learning.
Special orthogonal group: The canonical metric in is defined as , for any and . Notice that the norm of a tangent vector is , known as the Frobenius norm [18].
Stiefel manifold: There are two well-known metrics for the Stiefel manifold, namely, the Euclidean metric and the canonical metric.
Euclidean metric: A possible metric that the Stiefel manifold may be endowed with is the Euclidean metric, inherited from the embedding of in :
Canonical metric: The Stiefel manifold may be endowed with a second kind of metric, termed ‘canonical metric’. The associated inner product reads:
which, unlike the Euclidean metric (113), is not uniform over the Stiefel manifold.
Real symplectic group: There exist two known metrics in the scientific literature that were applied to the real symplectic group.
Khvedelidze–Mladenov metric: A metric for the real symplectic group is:
It is referred to as Khvedelidze–Mladenov metric (or KM metric, for short, [46]). This is an indefinite metric; hence, a manifold endowed with this metric is not Riemannian (it is in fact referred to as a pseudo-Riemannian manifold).
Euclidean metric: A further metric for the real symplectic group is:
Such a metric is inherited from the embedding of a real symplectic group into the space of real invertible matrices that, in turn, is embedded into the real hypercube and hence inherits its canonical metric.
Space of symmetric, positive-definite matrices: The canonical metric in is defined as , for any and . Clearly, this is not a uniform metric.
Grassmann manifold: The canonical metric on a Grassmann manifold is
which corresponds to the Euclidean metric in the Stiefel manifold that is used to represent elements of the Grassmann manifold.
Example14.
The inner product (115) gives rise to a metric which is not positive-definite on the space . To verify this property, it suffices to evaluate the structure of the squared norm with and . By the structure of the tangent space , it is known that with symmetric. It holds that:
with being symmetric and being arbitrary. Hence, , which has an indefinite sign. ■
In general, there exists a canonical metric inherited from the ambient space that a manifold is embedded in. We know that whenever a manifold is embedded in an ambient space , the inner product between two tangent vectors may be written as
where denotes an inner product in . The expression (118) is based on a metric kernel . Metric kernels play a prominent role in coordinate-free embedded manifold calculus since a kernel and its derivative determine most of the main functions and maps that we shall encounter in the next sections. We assume that the metric kernel has the following properties:
Linearity: is linear in v, namely , for every , and ;
Symmetry: is a self-adjoint operator, namely ;
Closure with respect to : is an endomorphism of , namely , for every and ;
Invertibility: is invertible, namely, its inverse is well-defined for every .
In general, a metric kernel is not well-defined in the whole ambient space . We shall invoke the fact that it is well-defined in and that it might be extended to ; in fact, in the expression one must take , but it is allowable to take , since the metric kernel is linear in the argument a. Occasionally, we might need to extend the metric kernel G by an operator that is defined at least in a neighborhood of and such that in . (In principle, such an ‘extendability’ requirement is not necessary, although it facilitates some computations and certainly clarifies some computational developments.)
Example15.
Let us consider a hypersphere where endowed with a metric . It is readily seen that, in this example, ; therefore, the metric kernel is linear, self-adjoint, invertible and an endomorphism of .
Let us further consider the manifold endowed with the metric , where is endowed with the metric . In this case, the metric kernel does not coincide with the identity, rather:
Let us verify that such metric kernel has the four mentioned properties:
1.
Linearity: appears to be linear in W; in fact, it holds that , for every , ,
2.
Symmetry: is self-adjoint; in fact, it holds that , by virtue of the cyclic permutation invariance of the trace operator,
3.
Closure: turns out to be an endomorphism of ; in fact, for every matrix , it holds that is a symmetric matrix. Hence, it belongs to (notice that because both U and P are symmetric),
4.
Invertibility: is invertible; in fact, if , then .
Notice that is not well-defined in (in fact, if P is taken in , does not necessarily exist).
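As a numerical sanity check of the properties above, the sketch below (not in the original text) uses the explicit kernel G_P(W) = P⁻¹ W P⁻¹, which is the standard expression associated with the canonical metric of the space of symmetric positive-definite matrices and is assumed here:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3
A = rng.standard_normal((n, n))
P = A @ A.T + n * np.eye(n)                   # a symmetric positive-definite point

Pinv = np.linalg.inv(P)
G = lambda W: Pinv @ W @ Pinv                 # assumed kernel: G_P(W) = P^{-1} W P^{-1}

U = rng.standard_normal((n, n)); U = U + U.T  # tangent vectors are symmetric matrices
V = rng.standard_normal((n, n)); V = V + V.T

# Symmetry: <U, G(V)> = <G(U), V> under the Euclidean (trace) inner product
assert np.isclose(np.trace(U.T @ G(V)), np.trace(G(U).T @ V))
# Closure: G(U) is again a symmetric matrix
assert np.allclose(G(U), G(U).T)
# Invertibility: the inverse kernel is W -> P W P
assert np.allclose(P @ G(U) @ P, U)
```

Linearity holds trivially since matrix multiplication is linear in W; the remaining three properties are checked by the assertions.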
As a last example, let us consider the Stiefel manifold endowed with the metric , where the ambient space is endowed with . In this case, the metric kernel reads
Linearity and symmetry are readily proven. Closure may be proven as follows. Take and , namely . Define . It now suffices to show that , which holds true; in fact,
The invertibility of the matrix kernel follows from the invertibility of the matrix . From the Sherman–Morrison–Woodbury (matrix inversion) formula [47]
it follows, upon setting , , and , that
that exists for every . ■
Further, let us examine separately and concisely the structure of the metric kernel for the symplectic group endowed with its canonical metric.
Example16.
Let us consider the symplectic group endowed with the canonical metric . Assume the symplectic group to be embedded in the ambient space endowed with the Euclidean metric . A simple expansion gives
hence the metric kernel reads .
7.2. Covariance, Contravariance, Tensors*
The index convention used so far comes from how different objects behave under coordinate changes.
Let a smooth manifold be a submanifold of dimension p of a Euclidean space and let us fix a curve , described by:
The subscript tells that the components of the curve are expressed through coordinates x. The tangent vector has the components:
Now, let us introduce a change of variable , for , such that
In the new coordinates, the tangent vector at is expressed as
At the heart of intrinsic manifold calculus is the requirement that one such tangent vector stays exactly the same, no matter which coordinates are used. Observe that the components and the components are related by
Such a set of equations prescribes how the components of a tangent vector vary upon a coordinate change. Every object that obeys the above law is termed contravariant. Contravariant objects are marked by an upper index.
The canonical basis vectors do not obey the law (129). In fact, from the requirement that , it follows that
which implies that . Multiplying both sides by gives
Since , we obtain the transformation law
Every object that obeys the above law is termed covariant. Covariant objects are marked by a lower index.
Concerning tangent vectors, we may summarize the above results saying that the components of a tangent vector are contravariant, while the basis of a tangent space is covariant.
Given two tangent vectors , their inner products may be written as
due to the bilinearity of the inner product. Now, define . Then, . The functions are clearly symmetric in their indexes (hence, on a smooth manifold of dimension p, these functions are , but only are independent). Moreover, these functions are covariant in both indexes.
Given the canonical basis of a cotangent space , every covector may be written as . It is easy to show that the components of a covector are covariant, while the basis elements of a cotangent space are contravariant.
Vectors and covectors are special cases of more general objects termed tensors. A vector is a -tensor, while a covector is a -tensor. In general, it is possible to construct -tensors by an operation termed tensor product denoted by ⊗. Each component of one such tensor has p upper indexes and q lower indexes. For example, one can construct the metric tensor , which is a -tensor. Its inverse is , which is a -tensor. Additionally, represents the mixed components of the so-termed fundamental tensor, which is a -tensor.
Example17.
Let us show in what sense functions are covariant components of a tensor. Given a variable change , from property (132) it follows that
This is the transformation law of a -tensor. The point is that the components transform into components through a linear expression, whose coefficients depend on the derivatives of one set of coordinates with respect to the other set. ■
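For concreteness, the transformation law discussed in this example can be written out explicitly in standard index notation (the generic symbols below are an assumption, since the paper's own symbols appear in the elided formulas; Einstein summation over repeated indices is understood):

```latex
g'_{ij} \;=\; \frac{\partial x^{k}}{\partial x'^{\,i}}\,
              \frac{\partial x^{l}}{\partial x'^{\,j}}\; g_{kl} ,
```

namely, each lower index of the metric components picks up one Jacobian factor of the old coordinates with respect to the new ones, which is exactly the covariant behavior of a (0,2)-tensor.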
8. Geodesic Arc, Riemannian Distance, Exponential and Logarithmic Map
A geodesic on a smooth manifold may be intuitively looked at in different ways:
On a general manifold, the concept of geodesic extends the concept of a straight line from a flat space to a curved space. In fact, let us consider a curved manifold embedded in an ambient space . Such ambient space contains straight lines in the usual meaning, but the manifold, being curved, hardly accommodates any straight lines. Geodesics are curves that resemble straight lines in that they copy some of their distinguishing features.
On a metrizable manifold, a geodesic connecting two points is locally defined as the shortest curve on the manifold connecting these endpoints. Therefore, once a metric is specified, the equation of the geodesic arises from the minimization of a length functional. Such a definition comes from the observation that any straight line in is indeed the shortest path connecting any two given points.
A further distinguishing feature of straight lines is that they are self-parallel, namely, sliding a straight line infinitesimally along itself returns the same exact line. Such a concept gives rise to a definition of geodesic which requires specifying the mathematical meaning of ‘sliding a piece of line infinitesimally along itself’. The technical argument to access such a definition is covariant derivation, which is not covered in the present tutorial (while it will be covered in a forthcoming review paper).
Another intuitive interpretation is based on the observation that a geodesic emanating from a point on a manifold coincides with the path followed by a particle sliding on the manifold at a constant speed. For a manifold embedded in a larger ambient space and in special circumstances, this is equivalent to the requirement that the naïve (or embedded) acceleration of the particle is either zero or perpendicular to the tangent space to the manifold at every point of its trajectory. (In the present tutorial paper, we use the term naïve acceleration to distinguish it from covariant acceleration, which will only be defined in a subsequent tutorial.)
Starting from the above informal description, we are going to examine in detail the notion of embedded geodesy. In particular, we shall treat the problem according to an energy-minimizing principle (which is in fact equivalent to a length-minimizing principle).
8.1. Coordinate-Free Embedded Geodesy
The length of a smooth curve may be evaluated through a rectification formula as:
The net result of this argument is that, through the definition of an inner product on the tangent bundle to a Riemannian manifold, we are able to measure the lengths of paths in the manifold itself, which turns the manifold into a metric space.
Example18.
Let us consider the manifold described in intrinsic coordinates:
Let us consider the curve obtained by mapping a segment of the plane to the manifold , namely:
Let us assume the manifold to be endowed with the metric . In order to evaluate the length of such a curve, it is necessary to evaluate the velocity field :
The length of the curve is now obtained through the rectification formula:
Such an integral may not be expressed in terms of elementary functions. ■
On the basis of the concepts of ‘curve’ and of ‘length of a curve’, it is possible to define the notion of ‘distance between two points on a Riemannian manifold’ .
To define a notion of distance, let us consider what follows:
Given a manifold , take two arbitrary points ;
Choose a curve that has p and q as endpoints, namely, such that and ;
Take as distance the length of such a curve, namely ; such a definition seems a good starting point, but it needs to be refined since there exist infinitely many curves joining two given points.
It is important to underline that the notion of distance is not univocally defined since it depends on the inner product that a manifold is endowed with. Since there exist infinitely many curves joining two points on a manifold, a distance between two points p and q is actually defined as the length of the shortest curve connecting such two points, namely
The problem of selecting the shortest path connecting two points is far from easy to solve. A possible method to look for it is the so-called variational method that we are going to introduce in the following.
The key point is to introduce an energy functional defined as
whose minimizing argument coincides with a geodesic. The variational method consists of looking for a curve that makes the energy functional stationary, namely, for a curve such that . A concept that facilitates the formulation of the variational method is that of normal space to a manifold at a point.
Clearly, the above definition introduces yet another degree of freedom, since the definition of normal space is based on a choice of inner product to evaluate orthogonality. It is instructive to notice that, since a normal space is defined as the orthogonal complement to a tangent space with respect to the ambient space, then
In other terms, every element in may be decomposed exactly as the sum of a tangent and a normal vector.
Example19.
Let us evaluate the structure of the normal spaces to the hypersphere embedded in . Let us recall that and that . Let us select for every . Since, by definition,
it is easily found that
Let us further determine the structure of the normal spaces for the manifold . Let us recall that and that . Let us take and for . Let us also recall that
where . Hence, any tangent vector at R may be written as with H being skew-symmetric. By definition,
and hence it is readily found that
As a verification step, let us show that, for every special orthogonal matrix R, skew-symmetric matrix H and symmetric matrix S, it holds that . To this aim, let us show that:
Now, it is not hard to prove that every skew-symmetric matrix H is ‘orthogonal’ to every symmetric matrix S; in fact:
and therefore . From this result it follows that .
As a last example, let us determine the structure of the normal space to the manifold . Recall that and that . In this case, choose and for every . By definition, it holds that
from which it turns out that
In fact, represents the set of matrices that are orthogonal to all symmetric matrices, which coincides with the set of all skew-symmetric matrices. Notice that does not depend explicitly on the base point P. ■
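The tangent/normal decomposition of the ambient space is easy to illustrate numerically for the hypersphere, where the normal space at a point is radial. The sketch below (not in the original text) splits an arbitrary ambient vector into its tangential and normal components:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)               # a point on the unit hypersphere

a = rng.standard_normal(5)           # an arbitrary ambient vector
n_ = (x @ a) * x                     # normal (radial) component along x
v = a - n_                           # tangential component

assert np.isclose(x @ v, 0.0)        # v belongs to the tangent space at x
assert np.allclose(v + n_, a)        # ambient vector = tangent + normal
```

Every ambient vector is thus recovered exactly as the sum of a tangent and a normal vector, as stated above.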
The notion of normal space makes it possible to formulate the variational problem as a differential inclusion. For every specific manifold, it will then be possible to turn such a differential inclusion into a differential equation to be solved under appropriate initial conditions. Let us consider a curve and let us define the fundamental form as:
The following result holds.
Theorem2.
For a manifold , the variational equation is equivalent to the following differential inclusion:
Notice that such a relation appears as a generalized form of the well-known Euler–Lagrange equation of rational mechanics [48].
Proof.
In order to prove such an important result, notice that the energy functional may be rewritten as:
It is important to notice that, by virtue of the property (118), for every , there exists a neighborhood of in , where the partial derivatives and exist (technically, one should use an extension in place of ). Such partial derivatives are intended in the Gâteaux sense, namely
Let us define an arbitrary smooth variation , namely a deviation from the geodesic , such that .
For a sufficiently small constant , it is possible to evaluate the quantity . Applying the Taylor series expansion in gives:
where denotes a Landau symbol and it is used to compactly represent the remainder of the series (e.g., in Lagrange form). The variation of the energy of a curve may be concretely defined as:
The last integral is null. The next-to-last integral may be rewritten, upon integration by parts, as:
The first term on the right-hand side is null because the variation vanishes at the endpoints of the curve; therefore,
By imposing that , it readily follows that, since the variation is arbitrary, the relation (153) must hold. □
Let us apply the above theorem to a number of manifolds.
Example20.
Let us determine the equation of geodesics on the hypersphere through Equation (153). Let us recall that ; assume that (independently of x) and let us recall that . Now, the equation of the geodesic reads:
In summary, the equation for the geodesic is
(Since λ is arbitrary, it absorbed the factor .) The geodesic equation is then a second-order differential equation in γ that needs two conditions to be solved.
As a further example, let us derive the geodesic equation for the manifold . Let us recall that , that the canonical inner product in this space is (independently of R), and that . According to the general principle, the geodesic equation reads as
In summary, the geodesic equation on the space reads as
with S denoting an arbitrary symmetric function. ■
Through the notion of geodesic arcs, one may define the notion of Riemannian (or geodesic) distance between two nearby points in a smooth manifold. In fact, let us denote by two nearby points and assume that there exists a unique geodesic arc such that and . By definition, the length of such a geodesic arc is taken as the distance between its two endpoints, namely:
It is worth noticing, at this point, that the expression of the geodesic distance may be rewritten in terms of the fundamental form , namely
A noteworthy (and extremely useful) result that we are going to prove is that, along a geodesic, the fundamental form stays constant. We shall see after the proof that such a result noticeably simplifies the computation of the geodesic distance.
Theorem3.
Along a geodesic arc , the function stays constant with respect to t.
Proof.
Let us mention that, for any given pair , it holds that
which clearly stems from the fact that is quadratic in v. Such a property may be shown by recalling that , where G denotes again a metric kernel, and by noticing that
from which it follows that
hence the property (166). Let us now show that the differential inclusion (153) and the property (166) imply the constancy of the fundamental form along a geodesic, namely that:
on every geodesic. By multivariable calculus it is readily proven that:
Integrating such an equation over the interval gives:
The integral on the left-hand side is equal to , while the last integral on the right-hand side may be evaluated through integration by parts and equals
by Equation (166). From Equations (171) and (172), it follows that:
hence
From the differential inclusion (153), it follows that the integrand is null; therefore, for every . □
The above result means that any geodesic trajectory is traveled at a constant speed. Since, along a geodesic arc, the speed is constant for any t, it holds that
It is worth underlining that, provided that one knows the functional expression of the geodesic connecting two points, their distance does not require integration, but just algebraic operations. (The difficulty is hidden by the fact that finding the exact expression of a geodesic connecting two points may be more troublesome than one might expect.)
We make an observation in order to justify an idea presented at the beginning of this subsection about normal naïve acceleration for a geodesic.
Observation2.
Let us take a closer look at the differential inclusion (153), where , namely
Under the hypothesis that the metric kernel does not depend explicitly on the point x, the first term on the left-hand side vanishes. In this case, let us plainly assume that . In addition, we have already established that . The above differential inclusion may hence be written as
namely, the naïve acceleration is perpendicular to the tangent space. This is perhaps the simplest form of a geodesic equation on a manifold. This is, in fact, the case for the hypersphere surveyed in an earlier example.
A geodesic arc may be expressed in terms of two pieces of information, in addition to its endpoints, such as the initial values and , which represent the point where a geodesic departs from and its initial speed, respectively. A geodesic arc determined by these two pieces of information will be denoted as .
Example21.
Let us consider a geodesic , with , such that . In this case, it holds that
■
8.2. Exponential and Logarithmic Maps
To a geodesic arc is associated a map defined as:
The function (179) is termed exponential map with pole and is of paramount importance in manifold calculus. The exponential map takes two arguments, the point over the manifold and a vector tangent to the manifold at the pole x, and returns a point on the manifold itself. A practical reading of the exponential map is that it advances a point x toward a direction v, like vector addition moves a point along a straight line in flat spaces. In other words, the following two expressions are the counterparts of each other in a flat space and in a manifold :
Much like the first expression denotes the shortest/straightest line on a flat space (endowed with a Euclidean metric), the second expression denotes the shortest/straightest line over a curved manifold.
The exponential map depends on the pair and is locally invertible around . This follows from the fact that and by local invertibility results. In the flat manifold , the exponential map may be inverted easily; in fact
In the flat manifold , we recover the classical and intuitive meaning of the exponential map and of its inverse:
Exponential map: Given a point x and a vector v in , the exponential map moves the point from x to . (Indeed, the term ‘vector’ comes from the homonymous Latin term that means ‘transporter’.)
Inverse exponential map: Given two points x and y in , the inverse of the exponential map returns a vector v such that . In other words, the inverse exponential map applied to two points determines the vector v that ‘transports’ x to y.
Let us observe that the inverse exponential map is non-symmetric in its arguments, namely the notation makes it immediately clear that it is not allowed to swap x with y. Indeed, even in the flat space , it holds that . (In the case of a curved manifold, such a reciprocity relation is more convoluted.)
Generally, the inverse exponential map is termed logarithmic map. On a manifold where an exponential map is defined, the logarithmic map is denoted as . A logarithmic map takes as arguments two points and returns a vector. It is important to underline that a logarithmic map is defined only locally, namely, only if lie sufficiently close to one another. The lack of a global logarithmic map may be understood as follows. A logarithmic map is defined by the following equation:
and therefore, the existence of is tied to the possibility of determining one and only one geodesic arc that connects the given points , and then, by the equality , it follows that . However, not every pair of points may be connected by a unique geodesic; hence, a global logarithmic map, in general, fails to exist. A quick example of such an unavoidable problem is found on the sphere : taking x as the North pole and y as the South pole, there exist infinitely many geodesic lines that connect the two poles. Hence, is undetermined.
Let us examine the expressions of geodesic arcs, geodesic distances and exponential maps for manifolds of interest in system theory and non-linear control.
Hypercube: The space endowed with the Euclidean metric admits straight lines as geodesics; in fact, since it follows that , and hence the geodesic equation is simply and its solution is for every , and . Now, take two points and look for a geodesic arc connecting them with . It is necessary to find a vector such that . Such a vector is clearly ; hence, the unique geodesic arc connecting x to y is . Since , the Riemannian distance reads
which is a well-known result from calculus and geometry.
Hypersphere: On the hypersphere embedded in the Euclidean space , a geodesic line may be conceived as a curve on which a particle, departing from the point with velocity , slides with constant speed , where denotes the standard vector norm. On the hypersphere, we denote such a curve as , where the variable provides a parametrization of the curve. The differential equation characterizing geodesics on the hypersphere may be determined by observing that, with the given conditions, in this case the naïve acceleration of the particle must be either null or normal to the tangent space at any point of the hypersphere itself, namely . Since the normal space to a hypersphere at a point x is radial along x, the geodesic equation reads as . In explicit form, the equation of the geodesic on the unit hypersphere may be written as [49]:
as it is easy to verify by substitution. Additionally, it is easy to verify that , and that for every t. The exponential map associated with the above geodesic is
The relationship (184) for the geodesic represents a ‘great circle’ on the hypersphere. Now let us take two points (non-antipodal, such that ) and let us look for a geodesic arc of the form (184) connecting them. It is clearly necessary to find a vector such that . Such an equation in the unknown v may be expressed explicitly as
where ‘’ denotes the cardinal sine function. Pre-multiplying the above equation by gives , namely ; hence
This expression represents the inverse of the exponential map applied to points , namely
Notice that such a logarithmic map is defined only when , namely, when the two points are not antipodal. The unique geodesic arc connecting x to y is given by
A noticeable consequence is that, since , the Riemannian distance between the points x and y reads
where the inverse cosine function ‘’ returns a value in .
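The exponential map, logarithmic map, and Riemannian distance on the hypersphere can be put together in a short sketch (not in the original text; it uses the great-circle formulas just derived) and cross-checked: the log followed by the exp recovers the target point, and the norm of the log equals the geodesic distance.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit hypersphere (great-circle geodesic)."""
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x.copy()
    return np.cos(nv) * x + np.sin(nv) * v / nv

def sphere_log(x, y):
    """Logarithmic map; defined only for non-antipodal points x, y."""
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    if theta < 1e-15:
        return np.zeros_like(x)
    return theta * (y - np.cos(theta) * x) / np.sin(theta)

rng = np.random.default_rng(3)
x = rng.standard_normal(4); x /= np.linalg.norm(x)
y = rng.standard_normal(4); y /= np.linalg.norm(y)

v = sphere_log(x, y)
assert np.allclose(sphere_exp(x, v), y)      # exp inverts log
# The Riemannian distance equals the norm of the log (arccos of the inner product)
assert np.isclose(np.linalg.norm(v), np.arccos(np.clip(x @ y, -1.0, 1.0)))
```

The clipping guards against round-off pushing the inner product slightly outside the interval on which the inverse cosine is defined.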
Special orthogonal group: In general, it is not easy to obtain the expression of a geodesic arc on a given manifold in closed form. In the present case, with the assumptions considered, the geodesic on departing from the identity with velocity has expression . (It is important to verify that and .) It might be useful to verify such an essential result with the help of the following arguments. A geodesic on the Riemannian manifold , embedded in the Euclidean ambient space and endowed with its canonical metric, departing from the identity , should satisfy , and therefore it should hold:
Additionally, we know that any geodesic arc belongs entirely to the base manifold; therefore . By differentiating such an expression two times with respect to the parameter t, one obtains:
By plugging Equation (191) into Equation (192), we find that , which leads to the second-order differential equation on the orthogonal group:
to be solved with the initial conditions and . It is a straightforward task to verify that the solution to this second-order differential equation is given by the one-parameter curve , where ‘Exp’ denotes a matrix exponential.
The expression of the geodesic arc at an arbitrary point may be made explicit by taking advantage of the Lie-group structure of the orthogonal group endowed with the canonical metric. In fact, let us consider the pair and as well as the geodesic that emanates from X, namely , with velocity V. We claim that the geodesic departing from in the direction is:
In fact, let us consider the left-translated curve . It has the following properties:
(1)
The curve belongs to the orthogonal group at any time. This may be proven by computing the quantity and taking into account that the identity holds true. Therefore,
(2)
It satisfies Equation (193); hence, it is a geodesic. In fact, notice that ; hence, and . Now, the right-hand side of Equation (193) has the expression because , hence the claim.
(3)
It satisfies and ; hence, it has the correct base point and direction.
Therefore, the exponential map associated with the above geodesic expression reads
Now, given two points , let us compute the geodesic arc that joins them. It all boils down to finding a tangent vector such that . Pre-multiplying this expression by gives ; hence, . Pre-multiplying the last equality by X gives the sought tangent vector as . The logarithmic map associated with the above exponential map therefore reads
The Riemannian distance between X and Y then takes the expression
where ‘Log’ denotes, as usual, the matrix logarithm.
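These closed-form expressions can be exercised numerically. The sketch below (not in the original text; it assumes SciPy's matrix exponential and logarithm, and keeps the tangent direction small so that the principal matrix logarithm applies) builds a geodesic on SO(3), then recovers the velocity and distance:

```python
import numpy as np
from scipy.linalg import expm, logm

def skew(A):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (A - A.T)

rng = np.random.default_rng(4)
X = expm(skew(rng.standard_normal((3, 3))))   # a point of SO(3)
H = 0.3 * skew(rng.standard_normal((3, 3)))   # a small skew-symmetric matrix

# Geodesic from X with velocity V = X H:  gamma(t) = X Exp(t H); here t = 1
Y = X @ expm(H)
assert np.allclose(Y.T @ Y, np.eye(3))        # Y stays in the orthogonal group

# Logarithmic map and Riemannian distance (valid since H is small)
V = X @ np.real(logm(X.T @ Y))
d = np.linalg.norm(np.real(logm(X.T @ Y)), 'fro')
assert np.allclose(V, X @ H)
assert np.isclose(d, np.linalg.norm(H, 'fro'))
```

The distance computation requires no integration, only the matrix logarithm of X.T @ Y, in line with the constant-speed result proven earlier.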
Stiefel manifold: Let us consider the expression of geodesics corresponding to two metrics.
Euclidean metric: The solution of the geodesic equation, with the initial conditions and , reads [33]:
for . The expression of the exponential map corresponding to the Euclidean metric reads, therefore,
while the expression of the logarithmic map and of the Riemannian distance between two given points, in closed form, are unknown at present (to the best of the author’s knowledge). The pseudo-identity matrix , with rows and p columns, is just an identity matrix topping a zero matrix.
Canonical metric: The geodesic arc may be computed as follows. Let Q and R denote the factors of the thin QR factorization of the matrix V, then:
for . The expression of the exponential map corresponding to the canonical metric is, therefore,
while the expressions of the logarithmic map and of the Riemannian distance between two points are still unknown.
In fact, neither the logarithmic map nor the geodesic distance is known in closed form for a Stiefel manifold.
Real symplectic group: According to the two considered metrics, we have:
KM metric: Under the pseudo-Riemannian metric (115), it is indeed possible to solve the geodesic equation in closed form. The geodesic curve with and corresponding to the indefinite Khvedelidze–Mladenov metric (115) has the expression:
In fact, the geodesic equation in variational form is:
The calculation of this variation is facilitated by the following rules of the calculus of variations:
for curves . By computing the variations, integrating by parts and recalling that the variations vanish at endpoints, it is found that the geodesic equation in variational form reads:
The variation is arbitrary. By the structure of the normal space , the equation , with , implies that with . Therefore, Equation (207) is satisfied if and only if:
or, equivalently,
for some . In order to determine the value of matrix H, note that:
Substituting the expression into the above equation yields the condition . Hence, and the geodesic equation reads:
Its solution, with the initial conditions and , is found to be of the form (202). By definition of matrix exponential, it follows that .
Euclidean metric: The expression of the geodesic corresponding to the Euclidean metric was derived in [50]. Let be a geodesic arc connecting the points . Let us define . The geodesic that minimizes the following energy functional
is the solution of the differential (Lax) equation:
Furthermore, for the initial conditions and , the geodesic on the real symplectic group is given by
in the case that the real symplectic group is equipped with a Euclidean metric.
Space of symmetric, positive-definite matrices: The geodesic arc, corresponding to the canonical metric, emanating from a point in the direction has the expression:
where denotes a symmetric matrix square root. The exponential and the logarithmic maps for and thus read:
The symmetric matrix square root of a matrix P may be computed by means of its eigenvalue factorization. In fact, if the matrix P is factored as , with and every , then it holds that . The squared Riemannian distance between two points is given by
Notice that the following identity holds true:
and hence the Riemannian distance between two SPD matrices may be written equivalently as
as obtained by the series expansion of the matrix logarithm function (cf. Section 2.1).
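These formulas translate directly into code. Below is a minimal sketch (Python with NumPy/SciPy; the function names are ours) of the exponential map, the logarithmic map and the Riemannian distance on the manifold of SPD matrices, with the symmetric square root computed through the eigenvalue factorization, as described above.

```python
import numpy as np
from scipy.linalg import expm, logm

def spd_sqrt(P):
    """Symmetric square root via the eigenvalue factorization P = U diag(l) U^T."""
    l, U = np.linalg.eigh(P)
    return U @ np.diag(np.sqrt(l)) @ U.T

def spd_exp(P, V):
    """Exponential map on SPD(n): P^{1/2} expm(P^{-1/2} V P^{-1/2}) P^{1/2}."""
    R = spd_sqrt(P); Ri = np.linalg.inv(R)
    return R @ expm(Ri @ V @ Ri) @ R

def spd_log(P, Q):
    """Logarithmic map on SPD(n), the inverse of spd_exp."""
    R = spd_sqrt(P); Ri = np.linalg.inv(R)
    return R @ logm(Ri @ Q @ Ri) @ R

def spd_dist(P, Q):
    """Riemannian distance: Frobenius norm of logm(P^{-1/2} Q P^{-1/2})."""
    R = spd_sqrt(P); Ri = np.linalg.inv(R)
    return np.linalg.norm(logm(Ri @ Q @ Ri), 'fro')
```

As a sanity check, spd_exp and spd_log are mutually inverse, and the distance from the identity to diag(e, 1) equals 1.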
Grassmann manifold: A geodesic arc on a Grassmann manifold emanating from with the velocity may be written as:
where denotes a Stiefel matrix whose columns form an orthonormal basis of the subspace and denotes the compact singular value factorization of the matrix V. The sin/cos functions applied to a diagonal matrix simply act, in terms of components, on the entries of the main diagonal. As a compact, smooth Riemannian manifold, a Grassmann manifold is geodesically complete, which implies that any two points on may be connected by a geodesic arc.
The exponential map associated with the canonical metric reads:
The logarithmic map of two subspaces is not easy to compute in general. In [40], it is shown that if their Stiefel representatives are such that the product is symmetric, then , where denotes the spectral factorization of the matrix and .
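As a numerical sketch (Python with NumPy; function names are ours): the geodesic is computed through the compact SVD of the velocity, while for the logarithm we use a commonly employed general closed form, different from the special symmetric case of [40] recalled above, which assumes that XᵀY is invertible (no principal angle equal to π/2).

```python
import numpy as np

def grass_exp(X, H, t=1.0):
    """Grassmann geodesic from span(X) with tangent H (X^T H = 0):
    with the compact SVD H = U diag(s) V^T,
    gamma(t) = X V cos(s t) V^T + U sin(s t) V^T."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return X @ Vt.T @ np.diag(np.cos(s * t)) @ Vt + U @ np.diag(np.sin(s * t)) @ Vt

def grass_log(X, Y):
    """A commonly used closed form of the Grassmann logarithm:
    with M = (I - X X^T) Y (X^T Y)^{-1} and SVD M = U diag(s) V^T,
    log_X(Y) = U diag(arctan(s)) V^T  (assumes X^T Y invertible)."""
    n = X.shape[0]
    M = (np.eye(n) - X @ X.T) @ Y @ np.linalg.inv(X.T @ Y)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt
```

Exponentiating the logarithm recovers the target subspace, which may be verified by comparing the orthogonal projectors XXᵀ of the two representatives.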
Example 22.
Let us compute the length of a geodesic curve that, by definition, represents the distance between points :
Let us start by showing that:
with ; hence,
It is instructive to verify that does not actually depend on t, which facilitates the computation of the length. In fact, the length of is given by:
Therefore, the distance between equals:
As a further example, let us determine the length of the geodesic curve given by the relationship (194). By definition, it holds that
Let us show that
therefore
as proven in the general theoretical developments. ■
The following observation clarifies a technical argument.
Observation 3.
The parameter t in the expressions of geodesic lines, which we may identify as time, indicates which point of a geodesic one is referring to. Normally, the value corresponds to the initial point x, namely . It is instructive to observe that, in all examined cases, the velocity v and the time parameter t appear multiplied by one another. For example, the equation of the geodesic on a hypersphere may be rewritten as
This means that it is always possible to re-scale the parameter t as as long as the velocity v is re-scaled as , where . This is the reason why the time parameter t normally ranges in the interval .
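The re-scaling invariance may be illustrated numerically on the unit hypersphere, whose geodesic from x with tangent v reads γ(t) = x cos(‖v‖t) + (v/‖v‖) sin(‖v‖t) (a sketch in Python with NumPy; the function name is ours):

```python
import numpy as np

def sphere_geodesic(x, v, t):
    """Geodesic on the unit hypersphere from x with tangent v (x . v = 0):
    gamma(t) = x cos(||v|| t) + (v/||v||) sin(||v|| t)."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return x.copy()
    return x * np.cos(nv * t) + (v / nv) * np.sin(nv * t)
```

Since v and t enter only through the product tv, one has γ(t/α; x, αv) = γ(t; x, v) for any α > 0, as stated above.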
8.3. Geodesic Interpolation
The notions of metric, distance and geodesics are related to the notion of interpolation. In fact, the interpolation between two points may be defined through the optimization problem [51]:
with parameter providing the degree of interpolation between the two points. The one-parameter curve describes a trajectory over the manifold , having x and y as endpoints, which may be regarded as an interpolation curve. For example, the value is regarded as a midpoint between points x and y.
Example 23.
Let us consider two close points and let us look for an interpolation of such points in endowed with its canonical metric. We are looking for a minimizer of the criterion:
with assigned. We may look for an explicit solution of the above minimization problem by reasoning as follows. The points X and Y may be connected by a geodesic arc with and such that , namely, with V satisfying condition . As the interpolating point must be as close as possible to both X and Y, it should belong to the geodesic ; therefore, we are left with the problem of minimizing the criterion function:
with respect to the variable τ. In the second line, we have used the identity . The minimum of the above criterion function in is readily seen to be achieved in ; thus, the interpolating curve reads:
for . Notice that a matrix power , with and , is defined (inasmuch as it exists) in terms of matrix exponential/logarithm, hence the last identity. ■
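The interpolation formula derived in the example may be sketched as follows (Python with SciPy's expm/logm; the function names are ours), under the assumption that XᵀY admits a real principal logarithm (no relative rotation by π):

```python
import numpy as np
from scipy.linalg import expm, logm

def so_interp(X, Y, tau):
    """Geodesic interpolation on SO(n): gamma(tau) = X expm(tau logm(X^T Y)),
    i.e. X (X^T Y)^tau; gamma(0) = X, gamma(1) = Y, gamma(1/2) is the midpoint."""
    return X @ expm(tau * logm(X.T @ Y))
```

For instance, the midpoint of two planar rotations by angles a and b is the rotation by (a + b)/2.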
Generalizing the notion of midpoint, one comes up with the notion of mean value (and of dispersion of a sample set around its mean value) in a metrizable manifold . We may consider what follows:
The notion of ‘mean value’ of objects in a metrizable space should reflect the intuitive understanding that the mean value is an element of the space that locates ‘amidst’ the available objects. Therefore, a fundamental tool in the definition of mean value is a measure of ‘how far apart’ elements in the sample space lie to one another.
The notion of ‘metric variance’ of objects in a metrizable space should be defined in a way that accounts for the dispersion of these objects about their mean values and also depends on how the dissimilarity of such objects is measured.
A way of defining the mean value of a set of objects , , is provided by the notion of Fréchet sample mean. The Fréchet sample mean and associated sample metric variance [52] may be defined as:
where d denotes a distance function in the Riemannian manifold . It is worth noting that there is no guarantee, in general, that the optimization problem (233) admits a unique solution. For example, consider the space and a sample set consisting of two antipodal points; then, every point over the equator is a mean value. (Notice that a mean is Fréchet only if it is a global minimizer, otherwise it is a Karcher mean.) If the sample points are close enough to each other, it is known that the mean value is unique [53].
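As a minimal numerical sketch of the Fréchet sample mean on the unit hypersphere (Python with NumPy; the function names and the fixed-point iteration scheme are ours, not part of the original development): the stationarity condition of the minimization (233) is that the averaged logarithmic maps vanish, which suggests the iteration x ← exp_x(average of log_x(x_i)).

```python
import numpy as np

def sph_log(x, y):
    """Logarithmic map on the unit hypersphere."""
    c = np.clip(x @ y, -1.0, 1.0)
    th = np.arccos(c)
    if th < 1e-12:
        return np.zeros_like(x)
    return th * (y - c * x) / np.linalg.norm(y - c * x)

def sph_exp(x, v):
    """Exponential map on the unit hypersphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x.copy()
    return x * np.cos(nv) + (v / nv) * np.sin(nv)

def frechet_mean(points, iters=100, step=1.0):
    """Fixed-point iteration for the Frechet sample mean:
    x <- exp_x(step * average of log_x(x_i))."""
    x = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        g = np.mean([sph_log(x, p) for p in points], axis=0)
        x = sph_exp(x, step * g)
    return x
```

For two points placed symmetrically about a pole, the iteration returns the pole, as intuition dictates; for well-separated antipodal configurations, in line with the remark above, uniqueness fails and the result depends on the initial guess.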
The notion of mean value of a set of points belonging to a curved manifold is of utmost importance in system theory and non-linear control. In fact, the mean value is, by definition, close to all points in a collection of points. Therefore, the tangent space associated with a cloud of data points may serve as a reference tangent space (see, for example, [54]).
In terms of the variation of an energy functional, a geodesic on a manifold is defined as the unique curve , that meets the following requirement
where the symbol denotes the variation of the integral functional. The integral functional in (235) represents the total action associated with the parametrized smooth curve of local coordinates and may be written explicitly as:
The variation in the expression (235) corresponds to a perturbation of the total action (236). Let denote an arbitrary parametrized smooth curve of local coordinates such that . Define the perturbed action as
with , . The condition (235) may be expressed explicitly in terms of the perturbed action as:
The perturbed total action may be expanded around as follows:
where is such that . The first term in the integral on the right-hand side of the last line may be integrated by parts, namely:
whose first term on the right-hand side vanishes to zero because , hence:
A similar result holds for the second term within the integral on the right-hand side. Therefore, it holds that:
Let us now recall the notion of Christoffel symbols of the first kind (named after Elwin Bruno Christoffel) associated with a metric tensor of components . These coefficients are defined as:
In turn, the Christoffel symbols of the second kind are defined as . The Christoffel symbols of the second kind are symmetric in the covariant indices, namely, .
On the basis of the Christoffel symbols, Equation (241) may be rewritten as
Multiplying by and introducing the Christoffel symbols of the second kind, Equation (243) may be written as:
Such a system of second-order differential equations needs two initial conditions to be solved. Typical initial conditions are and .
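To make the construction concrete, consider the unit sphere with coordinates (θ, φ) and metric ds² = dθ² + sin²θ dφ², whose nonzero Christoffel symbols of the second kind are Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = Γ^φ_{φθ} = cot θ (a standard computation, stated here without derivation). The geodesic system with the stated initial conditions may then be integrated numerically (Python sketch; the function name and the Runge–Kutta scheme are our choices):

```python
import numpy as np

def sphere_geodesic_ode(theta0, phi0, dtheta0, dphi0, T, steps=1000):
    """Integrate the geodesic equations on the unit sphere in (theta, phi):
        theta'' = sin(theta) cos(theta) phi'^2
        phi''   = -2 cot(theta) theta' phi'
    with a classical fourth-order Runge-Kutta scheme."""
    h = T / steps
    y = np.array([theta0, phi0, dtheta0, dphi0], float)

    def f(y):
        th, ph, dth, dph = y
        return np.array([dth, dph,
                         np.sin(th) * np.cos(th) * dph**2,
                         -2.0 / np.tan(th) * dth * dph])

    for _ in range(steps):
        k1 = f(y); k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2); k4 = f(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[0], y[1]
```

Starting on the equator (θ = π/2) with purely azimuthal velocity, the numerical geodesic stays on the equator and φ grows linearly, as expected of a great circle.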
Example 24.
One might wonder if the Christoffel symbols defined in (242) are the components of a tensor. The answer is negative, because their lower indices do not transform covariantly under a change of coordinates. ■
9. Riemannian Gradient of a Manifold-to-Scalar Function
The gradient of a scalar function in a point of its manifold-type domain represents a degree of variability of the function in a vicinity of such a point. In the present section, we are going to survey the notion of ‘Riemannian gradient’ starting from more familiar cases to obtain a full definition for manifold-to-scalar functions.
9.1. Riemannian Gradient: Motivation and Definition
In the simplest case of a scalar function of a scalar variable , its degree of variability is quantified by its slope . In fact, let us recall that, given a point , moving slightly away from it by a small amount , the value of the function may be related to the value of through a Taylor series expansion. Such an expansion, truncated to a first-order term by a Lagrange-type remainder, reads
and hence the first derivative of the function (that is, its gradient) truly quantifies the variability of the function around the point x. In the above expression, again denotes a Landau symbol.
In the more involved case of a scalar function of several variables, , the gradient again quantifies the degree of variability of the function around a point, although such information is spread along p axes. In fact, in this case, there exist multiple directions along which the variability of a function may be explored. After choosing a direction, one may move away along a straight line. Starting from a point and moving away along a direction by a small amount t, the value of may be related to the value of through a multivariable Taylor series expansion which, to the first order, reads
Therefore, the gradient of the function f (which, in this case, may be termed ‘Euclidean gradient’ and coincides with the ‘Gateaux derivative’ and with the ‘Jacobian’ of the function) encodes the variability of the function and the quantity represents the directional derivative of the function f at x along the direction v. In the following, we shall utilize the shorter notation .
In the case of a scalar function whose domain is a curved manifold , its gradient, which we shall denote as , serves again to quantify the degree of variability of a function in the neighborhood of point along a given tangent direction. On the same line of reasoning as before, starting from a point one may think of moving away slightly along a given tangent direction . After choosing a direction, the simplest way to move away from a point along a given direction is to travel along a geodesic line for a short time. Thus, taking a geodesic , the value of may be related to the value through a Taylor-type expansion, which is written as
Therefore, the gradient of the function f quantifies the variability of a function around a point, while may be interpreted as a directional derivative of the function f at x along a direction v. From the relationship (247), we see that
Taking the limit as t approaches zero, the second term on the right-hand side vanishes; hence, we obtain
The right-hand side resembles a Gateaux derivative of the function f in the direction v. The relationship (249) may indeed be taken as a definition of the gradient of the function with the proviso that it should hold for every .
By taking a closer look at the expression (249), it is readily seen that it holds even if the geodesic line is replaced by any smooth curve as long as and , in which case we can write
Now let us underline an illusory contradiction in the above expression.
Observation 4.
The curve γ enters the expression only in close proximity of the point x; therefore, the right-hand side of the above relation only depends on the point x and the tangent v, as well as the function f, and, in particular, it does not depend on the metric. In other words, the result of the expression quantifies how much the function f varies upon moving away from the point x toward the direction v. As opposed to this, the left-hand side of the relation (250) seems to depend on the metric . Since the metric may be chosen arbitrarily and independently of the function f and of the quantities x and v, the above relation looks like a conceptual clash.
The only possible way to fix such an illusory contradiction is to recognize that the gradient itself must depend on the metric in such a way that the directional derivative does not. In other words, if we denote by and two metrics for the manifold , then it must hold that
for every , where denotes the gradient of the function f at x corresponding to the metric and denotes the gradient of the function f at x corresponding to the metric .
Let us formalize the above discussion with the aim of formalizing the notion of Riemannian gradient on a Riemannian manifold. Let us consider a Riemannian manifold embedded in an ambient space and, for every point x, the tangent space . Let us now consider a smooth function and an inner product in . We shall require that the function f is extendable to a neighborhood of x in so as to be able to compute the Gateaux derivative of the function f with respect to x. (As usual, we are supposed to use an extension in the following expressions.) Let us recall that
We denote the inner product that the tangent bundle is endowed with at as . The Riemannian gradient of the function f over the manifold at the point x is uniquely defined by the following two conditions:
Tangency condition: For every , .
Metric compatibility condition: For every and every , it holds that .
The tangency condition codifies the requirement that a Riemannian gradient must always be a tangent vector to the manifold, namely , while the metric compatibility condition codifies the requirement that the inner product between a gradient vector and any other tangent vector belonging to the same tangent space takes a value that is invariant with respect to the chosen metric. The latter condition is better understood by considering that the linear component of the variation of a function when moving away from a point x of a quantity v is given by
Clearly, the amount depends only on x, f and the displacement v, certainly not on the metric. As a consequence, the gradient needs to depend on the metric. The ‘reference’ inner product is taken as the inner product that the ambient space is endowed with. In fact, the Riemannian gradient of a manifold-to-scalar function represents the rate of change of such function at a given point toward a given direction. The superscript may be removed for simplicity.
Let us survey the calculation of the Riemannian gradient in a number of spaces of interest in applications.
Hypercube: In the space endowed with the Euclidean metric, the Gateaux derivative of a regular function is simply the column array of partial derivatives of function f with respect to the entries of the column array , namely
Likewise, the Gateaux derivative of a regular function is the Jacobian matrix of partial derivatives of function f with respect to the entries of the matrix X, denoted by , namely
Hypersphere: Given a regular function , its Riemannian gradient at is denoted as . The Riemannian gradient associated with the canonical metric has the expression:
In fact, the metric compatibility condition prescribes that , for every ; hence, for every , and therefore . It follows that
for some . The tangency condition then entails , hence that , namely , which gives the stated result.
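The derivation above boils down to projecting the Euclidean gradient onto the tangent space at x. A minimal sketch (Python with NumPy; the function name is ours), illustrated on a linear function f(x) = cᵀx, whose Euclidean gradient is c:

```python
import numpy as np

def sphere_grad(egrad_x, x):
    """Riemannian gradient on the unit hypersphere (canonical metric):
    the projection (I - x x^T) applied to the Euclidean gradient at x."""
    return egrad_x - (x @ egrad_x) * x
```

The tangency condition ⟨x, grad f(x)⟩ = 0 holds by construction and may be verified numerically.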
Special orthogonal group: Let us compute the Riemannian gradient of a regular function . Let the manifold be equipped with its canonical metric. Let denote the gradient vector of a function f at derived from the canonical metric. According to the compatibility condition for the Riemannian gradient it must hold that:
and therefore:
This implies that the quantity belongs to the normal space , namely:
In order to determine the unknown matrix S, we may exploit the tangency condition, namely . Let us first pre-multiply both sides of Equation (258) by , which gives:
Transposing both sides of the above equation gives:
Summing the last two equations side by side gives:
that is:
By plugging the expression (259) into the expression (258), we obtain the Riemannian gradient in the orthogonal group, corresponding to its canonical metric:
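The resulting projection may be sketched as follows (Python with NumPy; the function name is ours), assuming the closed form grad f(X) = (D − X Dᵀ X)/2, where D denotes the Euclidean gradient; the tangency condition states that Xᵀ grad f(X) is skew-symmetric, which can be checked numerically.

```python
import numpy as np

def so_grad(D, X):
    """Riemannian gradient on the orthogonal group (canonical metric):
    grad f(X) = (D - X D^T X)/2, i.e. the projection of the Euclidean
    gradient D onto the tangent space {X A : A skew-symmetric}."""
    return 0.5 * (D - X @ D.T @ X)
```

Indeed, Xᵀ grad f(X) = (XᵀD − DᵀX)/2 is skew-symmetric whenever X is orthogonal.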
Stiefel manifold: Let us compute the expression of the Riemannian gradient of a regular function at a point corresponding to the Euclidean and the canonical metrics. Recall that the Riemannian gradient in a Stiefel manifold embedded in the Euclidean space is the unique matrix in such that:
Euclidean metric: The metric compatibility condition becomes , which implies that , with S being symmetric. Pre-multiplying both sides of this equation by the matrix yields . Transposing both sides of the above equation and summing side by side yields:
From the condition , according to Equation (46), it follows that . In conclusion, the sought Riemannian gradient reads:
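In code (Python with NumPy; the function name is ours), assuming the familiar closed form grad f(X) = D − X sym(XᵀD) with sym(M) = (M + Mᵀ)/2, the tangency condition Xᵀ G + Gᵀ X = 0 may be checked directly:

```python
import numpy as np

def stiefel_grad_euclid(D, X):
    """Riemannian gradient on the Stiefel manifold (Euclidean metric):
    grad f(X) = D - X sym(X^T D), with sym(M) = (M + M^T)/2."""
    M = X.T @ D
    return D - X @ (M + M.T) / 2.0
```

In fact, Xᵀ G = (M − Mᵀ)/2 is skew-symmetric, so Xᵀ G + Gᵀ X vanishes identically.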
Canonical metric: The metric compatibility condition prescribes that:
and therefore, invoking the tangency condition as well, it turns out that:
Solving for the Riemannian gradient yields the final expression:
Real symplectic group: According to the two considered metrics, the expression of the gradient may be computed as outlined below.
KM metric: The structure of the pseudo-Riemannian gradient of a regular function associated with the Khvedelidze–Mladenov metric (115) is given by
In fact, the pseudo-Riemannian gradient of a regular function associated with the metric (115) is computed as the solution of the following system of equations:
The first constraint ensures the compatibility of the pseudo-Riemannian gradient with the chosen metric, while the second constraint enforces the requirement for the pseudo-Riemannian gradient to lie in a specific tangent space. In particular, the metric compatibility condition may be recast as:
The above condition implies that , and hence that with . Therefore, the pseudo-Riemannian gradient of the criterion function f has the expression:
In order to determine the value of the unknown skew-symmetric matrix H, it is sufficient to plug the expression (270) of the gradient within the tangency condition, which becomes:
Solving for H gives:
Plugging the above expression into (270) gives the result (267).
Euclidean metric: The structure of the Riemannian gradient of a regular function corresponding to the metric (116) reads:
In fact, the Riemannian gradient must satisfy the conditions:
from which the expression of the Riemannian gradient associated with the metric (116) follows.
Manifold of symmetric, positive-definite matrices: The Riemannian gradient of the function may be calculated as the unique vector in that satisfies the following equation:
The solution of the above equation satisfies:
and hence the expression of the sought Riemannian gradient follows:
Grassmann manifold: The Riemannian gradient may be calculated by its definition and reads as:
The above calculations may be conveniently unified by recalling the notion of metric kernel, which affords turning the metric compatibility condition into a projection. Let us recall that the inner product may be written as . On the basis of such an equality, the metric compatibility condition may be rewritten as
Equivalently, for every . By the definition of a normal space, such a condition may be rewritten as
In practical terms, the Gateaux derivative may be uniquely decomposed into the sum of two terms, a tangent one (that is, the gradient) and a normal one. Upon discarding the normal component, the remainder is the sought gradient.
The latter observation may be expressed using the notion of orthogonal projection. Let us define an orthogonal projector as an operator that maps an element of the ambient space to a tangent vector in a specific tangent space. Let us observe that
The projection operator is defined by the two conditions:
Tangency: for and .
Complementarity: for all .
Notice that if then , while if then .
Observation 5.
Orthogonal projection stems from a minimization problem, namely, given a point and a vector space , the orthogonal projection of a is the vector in the subspace that best approximates a, namely
Applying orthogonal projection to both sides of the relationship (280) leads to
but since , we may write the explicit formula
The latter expression is quite suggestive, as it states that the Riemannian gradient of a function may be computed as the projection of its Gateaux derivative over a tangent space, to make it a tangent vector, further compensated by the metric kernel to take into account the effect of the metric (namely, to ensure that the inner product of the gradient with any tangent vector is independent of the metric). Notice that . Since is self-adjoint, such an expression simplifies into which is, in fact, independent of the metric kernel.
Let us summarize a few expressions of interest of orthogonal projection.
Hypersphere: An expression of an orthogonal projector , for , where the ambient space is endowed with a Euclidean metric, is:
Notice that is radial (that is, directed along x) and hence normal.
Stiefel manifold: It might be useful to define an orthogonal projection operator , for . Let us assume the ambient space to be endowed with a Euclidean metric . In this case, the orthogonal projection takes the expression
Grassmann manifold: An expression of orthogonal projection , for is
Let us apply the above considerations to a quadratic function on the hypersphere.
Example 25.
Let us consider the function defined as:
with being constant. We assume the hypersphere to be embedded in the ambient space endowed with a Euclidean metric. The Euclidean gradient (and also the Gateaux derivative) of such a function reads as
Since, on the hypersphere, , the Riemannian gradient of the function f takes the expression
■
Let us verify that the expression (286) is indeed an orthogonal projection over the tangent bundle .
Example 26.
The expression realizes an orthogonal projection over the tangent space . To prove such a statement, it suffices to show that and that for every , which represent the conditions of tangency and of orthogonality, respectively. Concerning the first property:
Concerning the second property, notice that
Now observe that is skew-symmetric, while is symmetric, for every , and . Since , the statement holds true. ■
A further interesting counterexample clarifies the notion of orthogonality in orthogonal projection.
Example 27.
Let us consider the function which realizes a projection over the tangent bundle but not an orthogonal one as far as the Euclidean metric is concerned. Showing that it indeed realizes a projection is fairly easy; in fact,
Likewise, we can show that such a projection is not orthogonal in the Euclidean metric; in fact,
Since the matrix product is arbitrary, the latter expression is not guaranteed to be null. ■
As a last example, let us consider the case of the symplectic group endowed with its canonical metric. The orthogonal projection is not easy to express in closed form.
Example 28.
Let us consider the symplectic group embedded into the ambient space endowed with a Euclidean metric . By definition of orthogonal projection, must be an element of the tangent space , while must be an element of the normal space . According to the characterization (49), it should then hold
namely , with H skew-symmetric to be determined. From the tangency condition, it follows that
Through some manipulations, the above equation may be rewritten as
All quantities in the above equation are known except for the unknown skew-symmetric matrix H. The above equation is an instance of the more general Sylvester equation (for a review, see, e.g., [55]).
According to [55], there exist several expressions for the solution of such an equation. One is based on the Kronecker operations. In particular, in this case we can use the Kronecker sum and the vectorization operator ‘’ to rewrite Equation (297) as
The orthogonal projection operation may therefore be expressed as
A more elegant expression stems from an integral representation of the solution of the Sylvester Equation (297), namely
For the integral representation of the solution of a Sylvester equation, we refer the readers to the aforementioned review paper [55]. We mention that none of these solutions are efficient from a computational point of view.
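The Kronecker-based solution may be sketched as follows (Python with NumPy; the function name is ours): the Sylvester equation AX + XB = C is vectorized, using the column-stacking operator vec, into (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C). As noted above, this route is simple but computationally expensive, since the linear system involves nm unknowns.

```python
import numpy as np

def sylvester_kron(A, B, C):
    """Solve the Sylvester equation A X + X B = C via the Kronecker-sum
    form (I (x) A + B^T (x) I) vec(X) = vec(C), with vec = column stacking."""
    n = A.shape[0]
    m = B.shape[0]
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten('F'))   # 'F' order implements column stacking
    return x.reshape((n, m), order='F')
```

A quick check: building C from a known X recovers X (the equation is uniquely solvable whenever A and −B share no eigenvalue).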
9.2. Application of Riemannian Gradient to Optimization on Manifold
An important application of the Riemannian gradient, which is of specific interest in system theory and non-linear control, is gradient-based mathematical optimization, which we are going to briefly survey in the remainder of the present section.
Let us preliminarily observe that the relationship (250) that defines the notion of Riemannian gradient possesses a further interpretation. Let us take a regular function , a regular curve and a metric in . The functions f and may be composed as . Therefore, from the relation (250), it follows that:
along the curve . As a matter of fact, such a relation is a counterpart of the familiar rule of derivation for composite functions
To determine the maximum of a function on a manifold it is possible to make use of a gradient method. Such a method is based on the following important property: given a regular function , the solution of the following initial value problem
is such that
where denotes a point of local minimum of the function f near if one chooses the sign ‘−’ in front of the gradient; otherwise, denotes a local maximum of the function f near if one chooses the sign ‘+’ in front of the gradient.
For a function with domain , the extremal points are among those that make its partial derivatives vanish to zero. A similar result holds for functions whose domain is a smooth manifold. Given a differentiable function , the extremal points are those for which:
This observation may be exploited:
To determine those points to which the solutions of the differential Equation (303) will tend toward;
Considering that Equation (305) is often non-linear and difficult to solve in closed form, while the differential Equation (303) may be solved (albeit approximately) by numerical recipes, the differential Equation (303) may be considered as a way to solve an equation such as (305).
On the basis of the relation (301), it is quite straightforward to show the following fundamental result.
Theorem 4.
The differential equation
generates trajectories toward points of the domain corresponding to ever-decreasing values of the function f.
Proof.
Let us observe that
where equality holds if and only if . □
In a fully analogous way, upon choosing the sign + in front of the gradient, the corresponding differential equation generates trajectories tending toward ever-increasing values of the function f. Notice that the above facts are completely independent of the initial value and of the chosen metric.
In practical terms, it is safe to assume that the function f admits maxima and minima. For example, in the case of a continuous function f defined on a compact manifold, such as the hypersphere , a theorem by Weierstrass ensures that f admits at least a maximum and a minimum. (In fact, the Weierstrass extreme value theorem holds for continuous functions on compact topological spaces, of which any compact manifold is an instance.)
To summarize, the differential Equation (303) may be exploited to look for the extremal points of a function:
The dynamical system generates a trajectory in the state space that tends toward a point of maximum of the function f located near the initial state ,
Conversely, the dynamical system generates a trajectory in the space that tends toward a point of minimum of the function f located near the initial state .
The locality property of such a gradient-based extremum search is due to the monotonicity property (307), which essentially shows that the trajectories of such systems fall in the basin of attraction of the extremal point that the initial condition also belongs to. In practice, the choice of an initial condition (also termed, in this context, an initial guess) is a sensitive step that determines the success or the failure of extremal-point searching. In fact, a function f might possess several extremal points (namely, local maxima and minima). It pays to keep in mind that the extremal-point searching method based on the first-order dynamical systems (303) is able to determine only one extremal point at a time and, in particular, only the one nearest to the initial guess . It is therefore important to put into effect some pre-estimation technique to initialize the search procedure correctly.
From an algorithmic point of view, as opposed to what is required in numerical simulation where precision is a common demand, in the case of optimization by a gradient method, only the last point of the trajectory, namely , needs to be approximated with good precision, and hence the employed numerical method does not need to be of a high order.
with . As we have seen in the previous example, the right-hand side of Oja’s equation coincides with the Riemannian gradient of the function (often termed ‘Rayleigh quotient’) over the unit hypersphere endowed with its canonical metric. Such a function is quadratic, continuous and differentiable and is defined over a compact space; hence, it admits a maximum over the space . Indeed, it admits n local maxima. The differential Equation (308) hence affords determining one of its local maxima depending on the initial condition.
The extremal points are those that make the gradient vanish to zero, namely the solutions to the equation
Clearly, the solutions coincide with the eigenvectors of the symmetric, positive-definite matrix . Let us observe that the eigenvalue associated with the eigenvector is ; hence, . Since the dynamical system (308) looks for the maximal value of the function f, Oja’s equation generates a trajectory that tends toward the eigenvector of corresponding to its maximal eigenvalue. ■
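A numerical sketch of Oja's flow ẋ = Cx − (xᵀCx)x (Python with NumPy; the function name and the explicit Euler discretization with renormalization are our choices): starting from a generic initial condition, the trajectory tends toward the dominant eigenvector of C, as discussed above.

```python
import numpy as np

def oja_flow(C, x0, dt=0.01, steps=5000):
    """Explicit Euler integration of Oja's equation x' = C x - (x^T C x) x,
    the Riemannian gradient ascent flow of the Rayleigh quotient on the
    unit hypersphere; the state is renormalized to counteract drift."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        x = x + dt * (C @ x - (x @ C @ x) * x)
        x = x / np.linalg.norm(x)
    return x
```

For C = diag(3, 1, 0.5), the trajectory converges (up to sign) to the first coordinate axis, the eigenvector of the maximal eigenvalue.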
The differential equations of the kind
represent an instance of dynamical systems of the first order, on a manifold, of the gradient type. (Notice that to switch the sign in front of the gradient, it suffices to take the function instead of f. Likewise, to rescale the gradient of a factor , it suffices to take instead of f.)
9.3. A Golden Gradient Rule: Gradient of Squared Distance
A golden calculation rule involving the notion of Riemannian gradient and that of Riemannian distance is as follows:
The formal proof of this result, which, of course, holds under technical conditions, traces back to [53]. It is instructive to verify this property in a couple of cases.
Example 30.
Let us verify the calculation rule (311) for the familiar case that . We have already shown that . In the case of the hypercube , endowed with a Euclidean metric, it holds that ; therefore, , the familiar Euclidean distance in . Now, let us observe that
and hence
In addition, we can verify such a property for the hypersphere . Let us recall that ; therefore, it holds that
By the calculation rule for the gradient on the unit hypersphere, we obtain
which coincides precisely with on the unit hypersphere. ■
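A finite-difference check of the golden rule on the unit hypersphere (Python with NumPy; function names are ours): the directional derivative of d²(·, y) at x along a unit tangent v should equal ⟨−2 log_x(y), v⟩, where log denotes the spherical logarithmic map.

```python
import numpy as np

def sph_log(x, y):
    """Logarithmic map on the unit hypersphere."""
    c = np.clip(x @ y, -1.0, 1.0)
    w = y - c * x
    return np.arccos(c) * w / np.linalg.norm(w)

def sph_d2(x, y):
    """Squared geodesic (great-circle) distance."""
    return np.arccos(np.clip(x @ y, -1.0, 1.0)) ** 2

def directional_fd(x, y, v, h=1e-6):
    """Central finite difference of d^2(., y) along the geodesic through x
    with unit tangent v: gamma(t) = x cos(t) + v sin(t)."""
    gp = x * np.cos(h) + v * np.sin(h)
    gm = x * np.cos(-h) + v * np.sin(-h)
    return (sph_d2(gp, y) - sph_d2(gm, y)) / (2 * h)
```

For x and y separated by an angle of 1 radian along the direction v, both sides evaluate to −2, matching the golden rule grad d²(x, y) = −2 log_x(y).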
For those who are mathematically minded, it is interesting to read a sketch of proof of the golden rule (311) based on properties that we have already surveyed.
Theorem 5.
Let us consider a Riemannian manifold embedded in an ambient space and a pair of points such that and are well defined. The Riemannian gradient of the squared distance with respect to the variable x equals .
Proof.
Let us define the following functions:
An arbitrary sufficiently smooth curve such that , while is arbitrary;
The fundamental form ;
A geodesic curve connecting the point y to the point , emanating from the former, with parameter ;
The partial derivatives and .
In addition, notice that the following properties hold:
(P1)
, therefore ;
(P2)
, therefore ;
(P3)
. (In fact, notice that if denotes a geodesic from x to y then it holds that and ; therefore, .)
Let us recall that the squared Riemannian distance between two points may be written in terms of the fundamental form, namely
where the right-hand side of the above equation is independent of s, as we have shown in Section 8. Let us integrate both sides of the above equation with respect to s in the interval . This gives:
Now, let us compute the derivative of both sides with respect to the parameter t:
The second integral may be rewritten through the rule of integration by parts, upon observing that (Schwarz) and recalling that , where G denotes the metric kernel associated with the metrics in and . The application of such a rule gives:
Putting the pieces together gives:
Since is a geodesic, the first integral is null due to the differential inclusion (153); therefore, we are left with
Now, due to property , the second term on the right-hand side of the above equality is null; hence, thanks to properties –, the above equality simplifies to
Setting and recalling the definition of gradient (249) proves the golden rule. □
Let us emphasize how useful such a result is, since in non-linear control a number of functions to differentiate are based on distances. Let us apply the golden calculation rule to a non-linear control problem known as consensus optimization.
Example 31.
A category of dynamical systems of gradient type on manifolds is that associated with consensus optimization among a set of agents [58,59]. Let us consider, as an example, a dynamical system that affords reaching attitudinal consensus in a fleet of flying objects (such as drones).
The attitude of a flying agent (namely, its orientation in space) may be represented by a rotation matrix that summarizes the three attitudinal coordinates roll, pitch and yaw. A fleet of N flying agents may be described as a set of N rotation matrices with .
The attitude of each agent in the fleet is then described by a time-function . Consensus optimization may be formulated as follows:
Given a metric for the manifold , it is possible to define a distance function between any pair of attitude matrices in ;
A hierarchy is established among the agents in a fleet through a set of weights ;
Then, consensus optimization consists of determining an evolution law for each agent to minimize the distance between any pair of attitude matrices weighted according to the assigned hierarchy, namely, to minimize the function
Notice that ; hence, the values assigned to the diagonal coefficients are unimportant.
On the basis of the calculation rule (311), it turns out that
The set of gradient-type control systems to achieve consensus optimization within the fleet reads:
with and . (Notice that .) ■
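As an illustration, the consensus flow may be simulated by discretizing the gradient system on the rotation group. The sketch below is not the scheme of the paper: it assumes uniform weights, three agents and a plain Euler step on SO(3) through the matrix exponential and logarithm (step size and function names are our own choices):

```python
import numpy as np
from scipy.linalg import expm, logm

def rand_rotation(rng, scale=0.15):
    A = scale * rng.standard_normal((3, 3))
    return expm(A - A.T)                 # exponential of a skew matrix: a rotation

def disagreement(Rs):
    # sum of squared Riemannian distances d(R_i, R_j)^2 = ||logm(R_i^T R_j)||_F^2
    return sum(np.linalg.norm(logm(Ri.T @ Rj), 'fro')**2
               for Ri in Rs for Rj in Rs)

rng = np.random.default_rng(0)
Rs = [rand_rotation(rng) for _ in range(3)]
d0 = disagreement(Rs)

eps = 0.1                                # Euler step size (our choice)
for _ in range(50):
    # each agent moves along the sum of logarithms pointing toward the others
    steps = [sum(logm(Ri.T @ Rj).real for Rj in Rs) for Ri in Rs]
    Rs = [Ri @ expm(eps * S) for Ri, S in zip(Rs, steps)]

print(disagreement(Rs) < 1e-3 * d0)      # the disagreement has collapsed
```

With the agents initialized within a small geodesic ball, the disagreement function decreases monotonically along the discretized flow.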
9.4. Riemannian Gradient in Coordinates*
Let denote a smooth manifold-to-real function, whose gradient at a point is sought. Let us fix a coordinate system on a local chart that includes the point x whose local coordinates are .
In order to determine the gradient , let us make use of the metric compatibility condition, recalling that
The Euclidean inner product reads .
The inner product reads .
The metric compatibility condition requires that
hence, . Multiplying both sides by one obtains . Since , we obtain
The Riemannian gradient in intrinsic coordinates is therefore expressed as
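A minimal numerical illustration of the coordinate expression (the gradient components obtained by applying the inverse metric kernel to the array of partial derivatives) may be given in polar coordinates on the plane, where the metric kernel is diag(1, r²); the squared Riemannian norm of the gradient must then agree with the squared Euclidean norm computed in Cartesian coordinates:

```python
import numpy as np

# f expressed in polar coordinates (r, phi): f(r, phi) = r^2 * cos(phi)
r, phi = 2.0, 0.7
df = np.array([2*r*np.cos(phi), -r**2*np.sin(phi)])  # (df/dr, df/dphi)

G = np.diag([1.0, r**2])          # metric kernel of the plane in polar coordinates
grad = np.linalg.solve(G, df)     # Riemannian gradient components: G^{-1} df

# the squared Riemannian norm of the gradient must equal the squared Euclidean
# norm of the ordinary gradient computed in Cartesian coordinates
sq_norm_polar = grad @ G @ grad

def f_cart(x, y):
    return (x**2 + y**2) * np.cos(np.arctan2(y, x))

x, y, h = r*np.cos(phi), r*np.sin(phi), 1e-6
eucl = np.array([(f_cart(x+h, y) - f_cart(x-h, y)) / (2*h),
                 (f_cart(x, y+h) - f_cart(x, y-h)) / (2*h)])
print(np.isclose(sq_norm_polar, eucl @ eucl, rtol=1e-4))
```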
In manifold calculus, the Riemannian gradient has yet another interpretation in relation to the differential of a manifold-to-real function and to so-called musical isomorphisms. In fact, there exist canonical operators to convert contravariant components to covariant (also referred to as lowering an index) or vice versa (raising an index).
Given a regular function , its differential is expressed by:
where denotes its Riemannian gradient. Namely, the differential represents the linear part of the change of a function when moving away from a point x in the direction v. In local coordinates:
Since the differential does not depend on the choice of coordinates, it must hold that . From this follows the metric-compatibility condition . Such expressions may be interpreted through ‘musical’ isomorphisms:
Sharp isomorphism (): The ‘sharp’ isomorphism takes a cotangent vector and returns a tangent vector. Namely, given a cotangent vector , the sharp isomorphism acts like . For example, in the case of gradient, one may say that .
Flat isomorphism (): The ‘flat’ isomorphism takes a tangent vector and returns a cotangent vector. Namely, given a tangent vector , the flat isomorphism acts like . Then, the differential is the dual of gradient via a flat isomorphism.
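In coordinates, the two isomorphisms amount to multiplying by the metric kernel or by its inverse. A small sketch (with an arbitrarily chosen kernel) illustrates that they are mutually inverse and that the flat of v evaluated on v returns the squared norm:

```python
import numpy as np

G = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # a metric kernel in some local chart (our choice)
v = np.array([1.0, -2.0])        # a tangent vector (contravariant components)

flat = G @ v                     # 'flat': lower the index, giving a cotangent vector
sharp = np.linalg.solve(G, flat) # 'sharp': raise it back with the inverse kernel

print(np.allclose(sharp, v))           # flat and sharp are mutually inverse
print(np.isclose(flat @ v, v @ G @ v)) # the flat of v, applied to v, gives |v|^2
```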
10. Parallelism and Parallel Transport along a Curve
Parallel transport along a curve on a manifold is a fundamental concept that emerges from the curved nature of manifolds and is deeply interwoven in the fabric of manifold calculus.
10.1. Properties and Definition of Parallel Transport
Given a curve on a Riemannian manifold embedded into a Euclidean space , a point on the curve and a tangent vector to the curve, if such a vector is moved to another point on the curve by ordinary rigid translation (which is available in and may be referred to as ‘parallel translation’), in general it will not be tangent to the curve anymore. This observation suggests that it is necessary to define a notion of transport that is compatible with the structure of the manifold.
Another reason motivating the notion of parallel transport is the lack of ‘uniform’ vector fields on curved manifolds. On a flat space, such as , a uniform vector field is a field such that is constant (same direction, length and orientation) for every . On a curved manifold embedded in an ambient space , however, it is unlikely that a tangent vector field takes the same value at two points, because tangency at one point would most likely imply lack of tangency at another. The closest notion to uniformity is parallelism: informally speaking, a vector field is said to be parallel along a curve if it keeps the same orientation with respect to such a curve’s own velocity field. Notice that this property is expressed in quite vague terms: such vagueness is indeed intended and implies that there exist infinitely many ways to interpret the notion of parallel transport.
Example 32.
Let denote a Riemannian manifold and a tangent vector field on the tangent bundle ; namely, the function w assigns a tangent vector to each point x on the manifold. In the familiar case that , there exists a notion of uniform vector field; namely, a vector field may take the same amplitude and direction at every point of the base-space . This is the case, for example, of an electric field within an ideal capacitor or an induction field within an ideal electrical coil. A distinguishing feature of a uniform vector field in is that its directional derivative is zero everywhere; namely, given any point and any direction , it holds that
where the left-hand side represents the directional derivative of the vector field along a direction v at the point x.
On a curved manifold , uniform vector fields hardly exist, as the rigid translation of a tangent vector to a different point of a manifold will most likely result in a non-tangent vector. The closest notion to uniformity that may be recovered on a curved manifold is that of parallelism. A parallel vector field needs to satisfy a generalized version of the condition (331) (which we shall not see in this survey since it requires covariant derivation, which will be covered in a separate survey). ■
The notion of parallelism of a vector field gives rise to an important canonical operator in manifold calculus, namely, parallel transport along a curve. Parallel transport allows one to, e.g., compare vectors belonging to different tangent spaces. On a smooth Riemannian manifold , fix a smooth curve . The parallel transport map associated with the curve is denoted as , which is a linear map for every , namely
for every and . The parallel transport map depends smoothly on its arguments and is such that coincides with the identity map and for every . (Notice that the last property holds on the same curve only, while it does not hold for different arcs.)
In most cases of importance in system theory and control, the main interest lies in the parallel transport of a tangent vector along a geodesic curve on a Riemannian manifold, rather than along a generic smooth curve. Parallel transport along a geodesic arc may be formulated in at least two ways:
Parallel transport along a geodesic arc joining two given points: Given two (sufficiently close) points , and a tangent vector , the parallel transport of w from x to y along the geodesic arc connecting x to y is denoted by , since this notation shows all relevant information. In fact, the notation for the parallel transport operator is a shortened version of , where would denote the geodesic arc such that and .
Parallel transport along a geodesic specified by a point and a tangent direction: Given a point , and a tangent direction , the parallel transport of a vector along the geodesic arc departing from the point x toward the direction v is denoted by , where , with and , would denote the geodesic curve such that and .
There exist other special cases of interest in control and automation. One such special case supplies a partial answer to the problem of effecting parallel transport on a manifold for which the structure of the parallel transport operator is unknown. Another special case is parallel transport along a closed loop, which may arise when dealing with periodic trajectories or self-intersecting trajectories.
Self parallel transport along a geodesic arc: A distinguishing feature of a geodesic arc is that it parallel-transports its own initial slope. In a setting where the parallel transport operator is unknown in closed form, there exists no known way to transport a given vector w along a geodesic curve . However, it is possible to parallel-transport a given vector v along a geodesic , namely to compute : it coincides exactly with . Such a numerical trick was invoked, for instance, in [39], Subsection III.A.
Parallel transport along a closed loop: Parallel transport of a vector along a (piece-wise continuous) loop ℓ is denoted as , where x is termed the base of the loop. In general, , for . This phenomenon is referred to as anholonomy. Whenever realizes an isometry, since , the operator changes only the orientation of v. For example, if , then we may say that is a rotated version of a tangent vector v lying in the same tangent space, namely, may be represented as an element of the orthogonal group . Intuitively, holonomy is specifically related to curved spaces, and therefore holonomy must be related to curvature. This conjecture is made precise by the Ambrose–Singer theorem [60].
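Anholonomy can be exhibited concretely on the unit sphere, whose closed-form parallel transport operator is recalled later in this section. Transporting a tangent vector around the geodesic triangle with vertices on the three coordinate axes (a loop enclosing one octant, of area π/2) rotates it by exactly π/2:

```python
import numpy as np

def transport(x, y, w):
    """Parallel transport of w from T_x to T_y on the unit sphere, along the
    minimizing geodesic (standard closed form; x, y non-antipodal)."""
    return w - (w @ y) / (1.0 + x @ y) * (x + y)

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
z = np.array([0.0, 0.0, 1.0])

w = np.array([0.0, 1.0, 0.0])    # a unit tangent vector at x
v = transport(z, x, transport(y, z, transport(x, y, w)))  # loop x -> y -> z -> x

# the loop bounds one octant, of area pi/2; the anholonomy angle equals that area
angle = np.arccos(np.clip(w @ v, -1.0, 1.0))
print(np.isclose(angle, np.pi / 2))
```

The transported vector returns to the same tangent space rotated, in agreement with the observation that the loop-transport operator may be represented as an element of the orthogonal group of the tangent space.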
In the following, we shall go through a detailed derivation of the notion of parallel transport along a curve on the tangent bundle of a manifold embedded in an ambient space. In general, the result of parallel transport depends on the curve; moreover, parallel transport may be conceived in a way that preserves the angle between transported vectors but not their lengths, or to preserve both.
10.2. Coordinate-Free Derivation of Parallel Transport
Let us develop a notion of parallel transport such that, in every point of the curve , it holds that . In addition, a fundamental requirement that parallel transport should fulfill is that, given two tangent vectors that form an angle , parallel transport should preserve the value of such angle along the whole curve, namely the angle between and . Let us recall that the angle between two vectors in is defined by
We may thus require parallel transport to fulfill:
Tangency condition: for every it should hold that ,
Conformal isometry condition: for every value of the parameter it should hold that keeps constant to along the curve.
Let us verify that the second condition implies both conformality and isometry. As concerns isometry, let us show that taking gives:
from which it follows that keeps constant along the curve to . It is straightforward to prove conformality; in fact,
(Let us recall that a linear transformation that preserves angles is termed conformal.) It is important to underline that the above conditions do not define parallel transport univocally, but they represent minimal requirements for an operator to be qualified as parallel transport. In other words, parallel transport may be defined in several ways as long as it meets the above conditions, which turn out to be of great importance in non-linear control.
To ease notation, let us define the following fields:
: a vector field that represents the evolution, over the curve , of the vector u, which is tangent to the manifold at and, after transport, becomes tangent at ,
: analogous to , both are transport fields,
: a scalar field that represents the inner product between the above two transport fields along the curve .
For a manifold embedded in an ambient space endowed with a metric , the condition of conformality and isometry may be written in a more detailed way. Let us recall from Section 7 that the inner product of two tangent vectors may be written as , where the metric kernel has key properties, namely is linear in v, is self-adjoint (namely, ), is invertible and its inverse is denoted as , and maps to itself.
In order to perform the following calculations seamlessly, it pays to highlight a few calculation rules about the metric kernel G and the fundamental form :
Taylor series expansion of the kernel G: Given with , , let us write
where denotes any smooth curve such that and , with denoting a direction along which a variation of the metric kernel is sought. The quantity represents a first-order derivative of the kernel calculated with respect to the variable x. (Notation-wise, the ‘bullet’ derivative is a convenient way to denote a derivative that cannot be expressed in more specific terms. For example, if are real functions of a real variable, one might denote as .) The formal definition reads
where again denotes any smooth curve such that and . The function is linear in the arguments v and w and, in general, non-linear in x. Notice that the extendability property of from to holds also for the first argument of the derivative .
Commutativity with the inner product in the ambient space: In the expression it is allowed to swap the arguments u and v. In fact, notice that
This property of holds even if the first argument of the derivative belongs to .
Partial derivatives of the fundamental form: Let us recall the definition of the fundamental form . Its partial derivatives read
Let us see a few examples to clarify the above definitions and properties.
Example 33.
Let us start by considering the case of the manifold endowed with its canonical metric, embedded in the ambient space endowed with a Euclidean metric. In this setting, we have seen that . By definition, its ‘bullet derivative’ reads
where denotes any smooth curve such that and . By using the noticeable matrix-flow derivation rule (104), we obtain
Setting leads to the sought expression
Let us verify the property of commutativity with the inner product. We may calculate that
The two quantities may be proven equal by the cyclic permutation property of the trace.
Another example that we are surveying concerns the manifold endowed with its canonical metric again embedded in the ambient space endowed with a Euclidean metric. In this setting, we have seen that . Its ‘bullet derivative’ reads
where denotes any smooth curve such that and .
By invoking the identity (104) once again, we obtain
Setting leads to the expression
Let us verify the commutativity property with the inner product. We may calculate that
These two quantities may be proven equal by transposing the arguments of the traces. ■
The condition of conformality and isometry may be written very compactly as . By multivariable calculus and the above definition and properties, we obtain:
(This is one situation in which we need to invoke the ‘extendability’ property of the metric kernel; in fact, notice that does not necessarily belong to .) In the second term on the right-hand side, the vector fields and may be swapped, and hence we may write:
Since the above development is still too general to lead to a computable notion of parallel transport, let us refer parallelism to the velocity field associated with the curve , namely, let us set . This means that the parallel transport operation will keep fixed the angle between the transported vector and the velocity vector associated with the curve. The expression of becomes then:
In the above relation, all quantities are known except for the transport field of which we are seeking a temporal evolution law.
A special case, of sure interest in applications since it dispenses us from choosing a curve , is parallel transport along a geodesic line. Let thus denote a geodesic and let us recall that, in this case, the expression may be rewritten in terms of the velocity through the differential inclusion that characterizes a geodesic line. Let us then revisit the geodesic equation with the aim of writing it in a Christoffel normal form. Recall that, given the fundamental form for a line , whenever it is a geodesic line it holds that:
therefore the differential inclusion (351) that defines a geodesic line becomes
where is a bilinear function of its vector arguments and returns an element of the normal space . Ultimately, we obtained
This same expression may be rewritten compactly in one of the following normal Christoffel forms:
where the function is termed Christoffel form of the kind, while the quantity is termed Christoffel form of the kind. Notice that only the restricted Christoffel form with and has been defined so far, namely
Since both terms on the right-hand side are bilinear in their vector arguments, the Christoffel form is quadratic. It is now necessary to define the full Christoffel form , with and on the basis of its restricted version. Such a result may be achieved through the following polarization formula:
which stems from the requirement that the full Christoffel form be bilinear and symmetric in its vector arguments, namely .
Employing the normal Christoffel form in the latest expression of gives:
Since is derived by polarization from a restricted bilinear function, it is self-adjoint in both its arguments; hence, it commutes with the inner product . Therefore, in the third term on the right hand side it is possible to swap the transport field with any of the two fields , which leads to:
where we have used the fact that and differ by a purely normal component.
In order to make the above summands uniform with one another, let us introduce the identity twice into the above expression, obtaining:
Now it is possible to gather the terms in and to write the above sum as a single inner product:
Ultimately, the last relationship still expresses the basic property of parallel transport, namely that it realizes a conformal isometry, which still leaves much room for a precise definition of parallel transport. The condition alone would give rise to infinitely many equations to compute a transport field, and hence a parallel transport. Among these infinitely many possibilities, one arises simply by requiring the right-hand side of the above inner product to vanish. Namely:
The solution of such equation supplies an expression for the transport field which, in turn, supplies a notion of parallel transport along a geodesic line, namely
where denotes the solution of the differential Equation (363) with the initial condition . Such a differential equation in is first-order and linear in the unknown transport field. Since the set of solutions to a linear first-order differential equation is a linear space, parallel transport is a linear isomorphism of the tangent bundle .
Example 34.
As anticipated, a noteworthy property of any geodesic line is that it parallel-transports its own velocity. In other words, given a geodesic that connects two points , the following function turns out to realize parallel transport:
Let us verify that it meets the fundamental properties any parallel transport should meet:
Tangency: for every it holds that ; in fact, is tangent to the curve at every point .
Conformal isometry: for every it holds that which is constant along a geodesic,
as claimed. ■
As a special case, it is worth considering a manifold endowed with a uniform metric. In this instance, the metric kernel is independent of the point x, and therefore it holds that along every (geodesic) line. The transport field equation hence simplifies into
In the even simpler instance where the metric is Euclidean, the metric kernel coincides with the identity (namely ), and hence it turns out that
Recalling that the function B maps the double tangent bundle to the normal bundle , it is quite straightforward to figure out a practical interpretation of the differential Equation (366). It basically prescribes that, when traveling from a point on the curve to another infinitesimally close point, the transport field loses its normal component to stay tangent.
The following two examples aim at clarifying the above theoretical developments.
Example 35.
Let us apply the theoretical development just analyzed to determine a transport field equation on a hypersphere with endowed with the metric .
Let us start by determining the Christoffel form (whether of the first or second kind is immaterial in this case, because the metric kernel coincides with the identity). The easiest way to determine Christoffel forms is via a geodesic equation. The fundamental form, in this case, reads . From the fundamental form, we find that
therefore
The geodesic equation then reads ; hence, . From the tangency of the velocity , it follows that:
and hence . As a consequence, and:
We may now verify that . It holds that
hence the assertion.
The equation of the transport field, in this case, reads as:
Such a differential equation may be solved and returns an expression for the transport field . The definition (363) then returns the expression for the parallel transport from the point of a vector to the point :
Let us verify some characteristic features of the above map:
Linearity: The linearity of with respect to w (but not with respect to x and y) is quite apparent.
Identity: Letting leads to an identity map. In fact, it holds that .
Tangency: It should hold that . In fact: .
The important properties of the map (373) have hence been verified. ■
Example 36.
Let us consider the case of the manifold endowed with the metric embedded in the ambient space endowed with the metric . Recall from Section 7 that the corresponding metric kernel takes the expression (which, as a function of W, is well-defined in the whole ambient space ).
Let us determine the Christoffel form of the second kind through calculations related to geodesy. Define the fundamental form on a curve as and observe that:
from which it follows that:
Recalling the matrix identity
leads to
which, in turn, implies that:
The transport equation, in normal Christoffel form, hence reads:
where ω denotes a skew-symmetric matrix function (namely ) such that the product belongs to the normal space at γ.
In order to determine a function ω, let us invoke the condition of tangency of the velocity field that, in this case, is expressed as . From such a condition, it follows that ; hence:
from which it follows that . The restricted Christoffel form of the second kind, in this case, hence reads as
To determine the complete Christoffel form of the second kind, let us make use of the polarization formula (358):
Let us now verify that the form commutes with the ambient inner product. We have ; therefore,
As a consequence, we have
which may be easily proven equal by the cyclic permutation property of the matrix trace.
As a consequence, and employing the relation (342), the parallel transport equation is found to be:
or, after a few straightforward simplifications,
Through the definition (363), we may obtain the expression of the parallel transport from a point of a tangent vector to a point :
where the matrix square root returns a symmetric matrix. Let us verify two properties of the found operator:
Identity: Setting leads to parallel transport collapse to an identity map. Indeed, it holds that .
Tangency: It holds that . In fact, we have .
Linearity is apparent as well. ■
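The transport operator just derived may be checked numerically. The sketch below assumes the usual closed form for the canonical (affine-invariant) metric of the SPD manifold, W ↦ E W Eᵀ with E = (Q P⁻¹)^{1/2}, and verifies that the transported vector stays symmetric and that the metric inner product is preserved:

```python
import numpy as np
from scipy.linalg import sqrtm

def spd_transport(P, Q, W):
    """Parallel transport of W from T_P to T_Q on the SPD manifold with the
    affine-invariant metric: W -> E W E^T, where E = (Q P^{-1})^{1/2}."""
    E = sqrtm(Q @ np.linalg.inv(P)).real
    return E @ W @ E.T

def inner(P, U, V):
    """Affine-invariant metric on SPD matrices: <U, V>_P = tr(P^-1 U P^-1 V)."""
    Pinv = np.linalg.inv(P)
    return np.trace(Pinv @ U @ Pinv @ V)

rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 4, 4))
P, Q = A @ A.T + 4*np.eye(4), B @ B.T + 4*np.eye(4)   # two SPD points
W = rng.standard_normal((4, 4)); W = W + W.T           # a symmetric tangent vector

V = spd_transport(P, Q, W)
print(np.allclose(V, V.T))                          # stays in the tangent space
print(np.isclose(inner(P, W, W), inner(Q, V, V)))   # isometry
```

The matrix Q P⁻¹ is similar to the SPD matrix P^{-1/2} Q P^{-1/2}, so its square root is well defined and real.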
The calculations illustrated in the previous example require the evaluation of the partial derivatives of the fundamental form , determined in (374) and (375). It is quite instructive to sketch such calculations.
Example 37.
Let us survey in detail the calculations that led to relations (374) and (375). Such calculations may be carried out exactly, through calculus, or by analytic approximations (which are clearer from a computational viewpoint).
To begin with, it is useful to analytically justify a matrix approximation, namely
with and with small. Such an approximation may be proven through the notion of analytic matrix function. By definition, a matrix-to-matrix function is analytic at a point if it admits, in a neighborhood of such a point, a polynomial series expansion, namely:
where are coefficients of the series and denotes the radius of convergence of the polynomial series.
Let us now consider the function . It is analytic in a neighborhood of the point and the associated polynomial series, truncated to the second term, reads:
from which it readily follows that
To justify the partial derivatives (374) and (375), let us define and evaluate its partial derivatives with respect to P and V. The partial derivative of a matrix-to-scalar function may be thought of as the ‘coefficient’ of the linear term arising from an additive perturbation.
Applying such a practical idea to evaluate the partial derivative with respect to the first argument gives:
where E denotes a matrix (termed perturbation) whose entries are small numbers (such that is very small). In the present case, we have:
Comparing this expression with (392), one obtains the sought result, namely:
As concerns the second argument, we may write that:
where E again denotes a perturbation. In this case, we have:
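The first-order expansion of the matrix square root invoked above, (I + E)^{1/2} ≈ I + E/2, may be checked numerically by verifying that the residual is of second order in the perturbation:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
E = 1e-4 * rng.standard_normal((5, 5))   # a small perturbation matrix

lhs = sqrtm(np.eye(5) + E).real          # exact matrix square root
rhs = np.eye(5) + E / 2                  # first-order approximation
err = np.linalg.norm(lhs - rhs)

print(err < np.linalg.norm(E)**2)        # the residual is second order in E
```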
Let us summarize a few known formulas about parallel transport. All the formulas given below hold under the proviso that the manifolds are endowed with their canonical metrics.
Hypercube: The hypercube endowed with the standard Euclidean metric is a flat space; hence, parallel transport may be realized as a rigid translation. Namely, parallel transport is the identity map.
Hypersphere: Parallel transport on the hypersphere of the tangent vector along the geodesic arc of an extent t may be computed by [39]:
For a reference, readers might want to consult [61]. Starting from this, some mathematical work leads to the following result: given two points and a vector , the parallel transport operator has the following structure:
provided that (namely, the points x and y are not antipodal to one another).
Special orthogonal group: Parallel transport along a geodesic curve on the special orthogonal group may be implemented through the following formula:
Stiefel manifold: To the best of this author’s knowledge, there appear to be no closed-form expressions for parallel transport on a Stiefel manifold . This is one of those cases in which self-parallel transport might be invoked. A self-parallel transport expression, corresponding to the canonical metric, is
where R denotes the R-factor of the thin QR factorization of the matrix .
Manifold of symmetric, positive-definite (SPD) matrices: Given a geodesic arc and a tangent vector , the expression of the parallel transport operator that shifts the tangent vector W along the geodesic arc toward the endpoint reads:
Some mathematical work leads to the following result: given two points and a tangent vector , the parallel transport operator has the following structure:
Grassmann manifold: Parallel transport of a vector along the geodesic with is given by [33]:
where denotes the compact singular value decomposition of the matrix V.
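The formula for the special orthogonal group may be spelled out in code. The sketch below assumes the standard form for the bi-invariant metric, namely that the tangent vector x Ω_w travels along the geodesic γ(t) = x expm(tΩ) as x expm(tΩ/2) Ω_w expm(tΩ/2); the checks cover tangency, isometry and the self-transport property of the velocity:

```python
import numpy as np
from scipy.linalg import expm

def so_transport(x, Omega, Omega_w, t):
    """Transport the tangent vector x @ Omega_w along the geodesic
    gamma(t) = x @ expm(t * Omega) of SO(n), bi-invariant metric."""
    H = expm(t * Omega / 2.0)
    return x @ H @ Omega_w @ H

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 4, 4))
Omega, Omega_w = A - A.T, B - B.T        # skew-symmetric generators
x = expm(0.2 * (A - A.T))                # a point of SO(4)
t = 0.7

gamma = x @ expm(t * Omega)
V = so_transport(x, Omega, Omega_w, t)

print(np.allclose(gamma.T @ V, -(gamma.T @ V).T))                  # tangency
print(np.isclose(np.linalg.norm(V), np.linalg.norm(x @ Omega_w)))  # isometry
print(np.allclose(so_transport(x, Omega, Omega, t), gamma @ Omega))  # self-transport
```

Since Ω commutes with expm(tΩ/2), transporting the velocity generator Ω itself returns γ(t) Ω, the velocity of the geodesic, in agreement with the self-parallel transport property discussed above.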
Let us derive the expression of the Christoffel form of the second kind for the special orthogonal group.
Example 38.
According to the canonical metric for the special orthogonal group , the fundamental form is ; hence, the geodesic equation stemming from the differential inclusion reads as , where S is a symmetric function. Recalling that any geodesic line must satisfy the constraint , differentiating this twice with respect to time gives
Plugging in the geodesic equation leads to , and hence the complete geodesic equation reads as . Consequently, the restricted Christoffel form of the second kind associated with the special orthogonal group endowed with the canonical metric takes the expression
The full Christoffel form descends from the polarization formula (358), which leads to
It is interesting to notice that . ■
Let us derive the expression of the Christoffel form of the second kind for the Stiefel manifold.
Example 39.
As we have seen, the Stiefel manifold may be endowed both with a Euclidean metric and its canonical metric.
In the Euclidean metric, the fundamental form reads as ; hence, the geodesic equation stemming from the differential inclusion reads as , where S is a symmetric function. Recalling that any geodesic must satisfy the constraint , similarly to the calculations shown in the previous example, we find that
The full Christoffel form descends from the polarization formula (358), which leads to
In the canonical metric, the fundamental form reads . Calculations of the geodesic equation show that
From the first derivative, it follows that
From the differential inclusion (153), it hence follows the equation
where S is a symmetric matrix function to be determined. To solve for the naive derivative , let us recall that . Hence, the above equation may be rewritten as
where we have repeatedly used the fact that . Plugging the above relationship into the condition (405) leads to the expression . Plugging back such an expression into the relationship (413) leads to the geodesic equation
from which stems the restricted Christoffel form of the second kind
Calculations to obtain the full Christoffel form of the second kind lead to the expression
through the polarization formula. ■
To conclude this subsection, let us prove an interesting result that concerns the parallel transport of a manifold logarithm, which will be invoked later in connection to non-linear control.
Theorem 6.
Given two points , provided the manifold logarithms and exist, it holds that
Proof.
Let denote the geodesic arc connecting x to y and let denote the associated reversed geodesic. By definition of the logarithmic map, it holds that . Since is a geodesic, it also holds that . In addition, by the definition of a reversed geodesic, it holds that . By the self-parallel transport property of any geodesic, it also follows that . The assertion is hence proven. □
10.3. Coordinate-Prone Derivation of Parallel Transport*
The basic requirement of parallel transport along a curve is that it ensures the transported vector to be tangent to the curve at any given point. Clearly, this condition by itself is too weak to give rise to a unique solution to the problem of parallel transport. In Riemannian manifold calculus, it is additionally required that parallel transport along a geodesic preserves the inner product between the transported tangent vector and the tangent to the geodesic curve.
Under such a proviso, a parallel-transport rule may be constructed as follows. Let denote a geodesic curve on a p-dimensional Riemannian manifold endowed with a metric tensor of components . Upon taking a vector , define a vector field (which might be well-defined only along the curve and not necessarily outside of it) that denotes the transport of v along the curve . The tangency requirement is expressed by , for every . The inner-product preservation property of parallel transport is expressed by requiring that the sought vector field meets the formal condition:
Using local coordinates and and differentiating the above equation with respect to the parameter t yields the condition:
namely:
Now, let us exploit the fact that the components describe a geodesic arc (namely, a self-parallel curve) on the Riemannian manifold . Accordingly, it holds that . Making use of such an identity in Equation (420) yields:
Calculations with the Christoffel symbols show that the above equation may be rewritten as:
from such a relation, one may derive the set of differential equations
which describe the evolution of the components of the transport field. Solving the above set of non-linear equations under appropriate initial conditions yields a rule to perform parallel transport.
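As a numerical illustration of the above construction, the following sketch integrates the parallel-transport equations on the 2-sphere in spherical coordinates. The curve (a latitude circle), the step size and all function names are illustrative choices; the check exploits the fact that metric compatibility makes the transport preserve the Riemannian norm of the transported vector.

```python
import numpy as np

# Parallel-transport ODE dv^k/dt + Gamma^k_ij (dx^i/dt) v^j = 0 on the 2-sphere in
# spherical coordinates (theta, phi). Nonzero Christoffel symbols of the round metric
# g = diag(1, sin^2 theta):
#   Gamma^theta_{phi,phi} = -sin(theta)cos(theta),  Gamma^phi_{theta,phi} = cot(theta).

def transport_rhs(theta, dtheta, dphi, v):
    """Right-hand side of dv/dt = -Gamma(x)[dx/dt, v]; v = [v_theta, v_phi]."""
    v_th, v_ph = v
    dv_th = np.sin(theta) * np.cos(theta) * dphi * v_ph
    dv_ph = -(np.cos(theta) / np.sin(theta)) * (dtheta * v_ph + dphi * v_th)
    return np.array([dv_th, dv_ph])

# Transport v along the illustrative curve theta = const, phi = t, with small
# forward-Euler steps over one full revolution.
theta0 = 1.0
v0 = np.array([0.3, 0.2])
v = v0.copy()
n = 20000
dt = 2 * np.pi / n
for _ in range(n):
    v = v + dt * transport_rhs(theta0, 0.0, 1.0, v)

def g_norm(theta, v):
    """Norm induced by the metric kernel diag(1, sin^2 theta)."""
    return np.sqrt(v[0] ** 2 + (np.sin(theta) ** 2) * v[1] ** 2)

# Parallel transport preserves the Riemannian norm (up to integration error).
print(abs(g_norm(theta0, v) - g_norm(theta0, v0)))
```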
11. Manifold Retraction and Vector Transport
Since the computation of an exponential map on a given manifold may be cumbersome, the notion of retraction is sometimes invoked. A manifold retraction map is a function that satisfies the following requirements [62] and that is easier to compute than an exponential map:
Any restriction is defined in some open ball of radius about and is continuously differentiable;
It holds that if ;
Let denote any smooth function on the tangent space , with and . The curve , for , lies in a neighborhood of . It holds that where denotes a tangent map. Notice that lies in for every t; hence, the derivative is defined simply as
For , it holds that . Let us identify and let us recall that . In order for to be a retraction, the map must equal the identity map on .
In practice, a retraction sends a tangent vector to a point of the manifold lying in a neighborhood of x. Any exponential map of a Riemannian manifold is a retraction. Another class of retractions was surveyed in [63].
Manifold retraction is a computationally convenient replacement of exponential map and, according to the definition given above, it behaves as a manifold exponential up to first order. As manifold logarithm is a local inverse of manifold exponential, one might wonder if a local inverse of retraction exists. Such problems have been studied in a number of papers and a possible solution has been codified under the name of a manifold lifting map [64].
Let us examine a few formulas of interest about manifold retraction.
Hypercube: Not surprisingly, the simplest manifold retraction on the hypercube is realized through array addition, namely .
Hypersphere: Let and . A simple retraction map on the unit hypersphere is
where denotes a 2-norm. Let us verify that this map meets the three mentioned requirements to be a retraction:
-
Requirements 1 and 2: Easily verified by inspection.
-
Requirement 3: Let . Differentiating both sides with respect to t yields:
Setting leads to
Since and , it follows that , and hence the result is proven.
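A minimal numerical check of the hypersphere retraction just discussed, written as R_x(v) = (x + v)/‖x + v‖; the finite-difference step h and the random test vectors are illustrative choices.

```python
import numpy as np

# Sketch: verify Requirements 2 and 3 of the hypersphere retraction numerically.
def retract(x, v):
    return (x + v) / np.linalg.norm(x + v)

rng = np.random.default_rng(0)
x = rng.standard_normal(5); x /= np.linalg.norm(x)
u = rng.standard_normal(5); u -= (x @ u) * x      # make u tangent at x: x^T u = 0

# Requirement 2: R_x(0) = x.
print(np.allclose(retract(x, np.zeros_like(x)), x))

# Requirement 3: the curve t -> R_x(t u) has derivative u at t = 0 (central differences).
h = 1e-6
deriv = (retract(x, h * u) - retract(x, -h * u)) / (2 * h)
print(np.linalg.norm(deriv - u))   # close to zero
```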
Stiefel manifold: There exist a number of retractions on the Stiefel manifold, which are briefly outlined below.
Retraction based on QR factorization: In [64], it was shown that one of the retractions that map a tangent vector of to is given by:
where denotes the Q-factor of the thin QR factorization of its matrix argument.
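Assuming the standard thin-QR conventions, the QR-based retraction can be sketched as follows; `qf()` below denotes the Q-factor with the sign convention diag(R) > 0, which makes the factorization unique (an implementation choice, not prescribed by the text).

```python
import numpy as np

# Sketch of the QR-based Stiefel retraction R_X(V) = qf(X + V).
def qf(A):
    """Q-factor of the thin QR factorization, with diag(R) > 0 for uniqueness."""
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R)); s[s == 0] = 1
    return Q * s                          # flip column signs so diag(R) > 0

rng = np.random.default_rng(1)
n, p = 7, 3
X = qf(rng.standard_normal((n, p)))       # a point on St(n, p)
W = rng.standard_normal((n, p))
V = W - X @ (X.T @ W + W.T @ X) / 2       # tangent projection: X^T V + V^T X = 0

Y = qf(X + V)
print(np.linalg.norm(Y.T @ Y - np.eye(p)))   # Y is again a Stiefel point
```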
Retraction based on polar factorization: Given a point and a vector , the polar-factorization-based retraction may be written as [64]:
where denotes the polar factor of a given matrix. Such a retraction may be written in closed form. In fact, write . From the conditions and , it follows that:
Since it holds that and , and the matrix is positive-definite, one obtains . From the equality , the following closed-form expression for the polar factorization-based retraction is obtained:
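The closed form just derived can be checked numerically. The sketch below evaluates R_X(V) = (X + V)(I + VᵀV)^(−1/2) and compares it against the polar factor of X + V computed directly; the inverse matrix square root is obtained via an eigendecomposition (an implementation choice).

```python
import numpy as np

# Sketch of the polar-factorization retraction on the Stiefel manifold in closed form.
def inv_sqrtm(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(S)
    return (Q / np.sqrt(w)) @ Q.T

rng = np.random.default_rng(2)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
W = rng.standard_normal((n, p))
V = W - X @ (X.T @ W + W.T @ X) / 2            # tangent at X: X^T V + V^T X = 0

Y = (X + V) @ inv_sqrtm(np.eye(p) + V.T @ V)   # closed-form retraction

# The polar factor of A = X + V is U = A (A^T A)^{-1/2}; since X^T V is skew-symmetric,
# A^T A = I + V^T V, so the two expressions coincide.
A = X + V
U = A @ inv_sqrtm(A.T @ A)
print(np.linalg.norm(Y - U), np.linalg.norm(Y.T @ Y - np.eye(p)))
```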
Orthographic retraction map: The paper [65] studies orthographic retractions on submanifolds of Euclidean spaces, where it is proven that, given a pair , under the proviso that V is sufficiently close to , there exists a normal array such that the function
is a retraction on . By the structure of the normal spaces, the orthographic retraction map on the Stiefel manifold reads as:
provided that there exists a symmetric matrix S such that , namely, such that:
The above equation in the unknown matrix S may be written in plain form as:
Equation (432) represents an instance of a Continuous-time Algebraic Riccati Equation (CARE). The orthographic retraction map (430) may be computed numerically as shown in [64].
Real symplectic group: Possible retractions on a real symplectic group are
whose properties were studied in the contributions [66,67], and the one based on the Cayley map, as explained in [2].
Grassmann manifold: There exist a number of retractions on the Grassmann manifold, which are briefly outlined below.
Retraction based on QR-factorization: In [40], one of the retractions that map a tangent vector onto is given by:
where denotes again the Q-factor of the thin QR factorization of its matrix argument and is a Stiefel representative of the subspace .
Retraction based on polar factorization: Given a subspace and a tangent vector , the polar-factorization-based retraction may be written as [40]:
where is a Stiefel representative of the subspace .
The parallel transport Equation (362) is, in general, difficult to solve, and hence the parallel transport operator might not be available in closed form for manifolds of interest. Approximations of the exact parallel transport are available, such as the ‘Schild’s ladder’ construction [68] and the vector transport method [62]. In order to define the notion of vector transport, a smooth manifold is supposed to be embedded into a (possibly large) linear ambient space of appropriate dimension. Upon endowing the linear space with an inner product, it is possible to define an orthogonal projection operator , with . Define the Whitney sum
Then, the vector transport operator
associated to an exponential map is defined as:
In practice, instead of parallel-translating the tangent vector along a geodesic arc emanating from the point in the direction as , vector transport rigidly moves the vector u within up to the point y and then orthogonally projects the vector u onto the tangent space where .
With a slight abuse of exact terminology, one may refer to vector transport of a vector to a tangent space by for a fixed . In practice, vector transport is based on the following procedure:
Embed the manifold into a metric ambient space .
Rigidly translate the vector across the ambient space to a point .
Project the vector u into the tangent space by means of a suitable orthogonal projector .
Vector transport associated with the above procedure is , which moves the vector u from to .
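The three-step procedure above can be sketched on the unit hypersphere, where the orthogonal projector onto the tangent space at y is I − y yᵀ; the points and vectors below are random illustrative data.

```python
import numpy as np

# Sketch of vector transport on the unit hypersphere: rigid ambient translation
# followed by orthogonal projection onto the destination tangent space.
rng = np.random.default_rng(3)
x = rng.standard_normal(4); x /= np.linalg.norm(x)   # departure point on the sphere
y = rng.standard_normal(4); y /= np.linalg.norm(y)   # destination point
u = rng.standard_normal(4); u -= (x @ u) * x         # tangent vector at x

t_u = u - (y @ u) * y        # rigid translation followed by projection onto T_y
print(y @ t_u)               # ~0: the transported vector is tangent at y
print(np.linalg.norm(t_u) <= np.linalg.norm(u))      # projection never lengthens u
```

The second check illustrates the drawback mentioned below: projection can only shrink a vector, so this transport is not an isometry in general.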
Depending on the manifold structure, vector transport might turn out to be much less expensive to compute than exact parallel transport. As a drawback, vector transport does not enjoy some of the fundamental properties of parallel transport; for instance, vector transport realizes neither an isometry nor a conformal transformation. Isometry may be recovered through an appropriate normalization, though.
12. Control Systems on Manifolds and Numerical Implementation
This section presents an instance of error-feedback control of first-order systems on manifolds and of their numerical implementation, with special emphasis on system synchronization. In the present research, we will regard synchronization as a goal to be achieved by non-linear proportional-type control. We shall see that the design of a synchronizing controller may be effected through the classical notion of a Lyapunov function, and we shall further introduce the notion of control effort to quantify the energy consumption associated with a control action.
12.1. Synchronization of First-Order Dynamical Systems via Feedback Control
Synchronization is primarily meant to make two identical (or twin) dynamical systems synchronize their dynamics over time, provided that their initial states differ from one another and that one of them is able to access the state of the other. In order not to restrict how far apart such two initial states may lie, the state space is assumed to be a geodesically complete, path-connected manifold. In principle, the restriction that the two systems to be synchronized must be identical is not necessary, as long as their mathematical models insist on the same state manifold.
In a system pair, the independent dynamical system will be referred to as the leader, while the controlled dynamical system will be referred to as the follower. In this section, we shall assume that the leader and the follower differ from one another, so that the case of identical systems will follow as a special (although most meaningful) case.
We shall cover in this paper only the case where the leader is represented by a first-order, non-autonomous dynamical system on a manifold, described by the tangent-bundle differential equation
where denotes the leader’s state variable and denotes a possibly time-dependent state-transition operator. Likewise, the controlled follower is described by
where denotes the follower’s state variable, denotes a state-transition operator of the follower, denotes a tangent-bundle control field (in particular, ) and, in general, the initial state differs from the initial state .
The control field that will drive the follower to synchronize to the leader is defined on the basis of an instantaneous (non-delayed) distance-minimization design. Assume the state manifold to be endowed with a specific metric and hence a distance function and a logarithmic map . Let us define the function
which is proportional to the squared Riemannian distance between the state of the leader and the state of the follower. The following result shows how to define a distance-minimizing control action.
Theorem 7.
The control field
with , minimizes the function asymptotically.
Proof.
The derivative of the function with respect to the time-parameter t reads
The above expression appears as a sum of two terms referring to two different tangent spaces. It is convenient to move calculations to only one tangent space. We shall take, as the tangent space of reference, the one attached to the state of the follower. Assuming that parallel transport realizes a conformal isometry (see Section 10), it holds that
In addition, by Theorem 6, it holds that because parallel transport and manifold logarithm are referred to the same geodesic line. Therefore, the expression (443) may be recast as
thanks to the linearity of the inner product. Let us set the control field u so that the sum in (445) is proportional to , which leads to the expression (442). The choice that led to the control field (442) implies that the function satisfies
Since the inner product defines a positive-definite local norm, the above equation entails the inequality at any time, which implies asymptotic synchronization. Equality may hold only when , which happens when the follower and the leader are perfectly synchronized. □
In the expression (442), the constant c takes the meaning of a communication strength between the leader and the follower. In addition, one might notice that the differential Equation (446) may be solved for and gives ; hence, synchronization occurs at an exponential rate (fast at the beginning, then slower). Such a result is completely independent of the metric that the state manifold was endowed with, and the speed of synchronization depends only on the constant c.
Let us consider, as an example, a simpler and more familiar case.
Example40.
As a special case, when endowed with the standard Euclidean metric, the parallel transport operator is simply the identity map, while . Therefore, the criterion function takes the shape of a quadratic error, namely , where , discussed in [69], and the corresponding control field takes the expression , which coincides with the control field discussed in the paper [69].
The freedom in the choice of an inner product accounts for the generalization considered in [69] that consists of considering a Lyapunov function with P being a symmetric, positive-definite weighting matrix. Such a weighting matrix seems to be absent from our formulation of a quadratic error (441), yet it is ‘hidden’ in such an expression. In fact, invoking once again an ambient space that the state space is embedded within and a metric , we may rewrite the quadratic error (441) as
and hence, by comparison, we see that the synchronization error reads and that the role of the weighting matrix P is played by the metric kernel G. ■
In control theory, it is customary to associate a scalar index, the control effort, to the control field [70]. In the context of systems on manifolds, we define the control effort as
Whenever the follower system and the leader system are twins, namely, their state-transition functions are identical , under mild continuity conditions on such state-transition functions, the control effort vanishes asymptotically to zero. In fact, by the continuity of the state-transition function, it follows that approaches 0 as approaches . Such a conclusion no longer holds true if the follower and leader systems’ state-transition functions differ from one another (namely, whenever ). It is worth underlining that the state-transition functions of the leader and of the follower systems, namely and f, may differ from one another even when these systems are identical, since the information on the leader’s state as acquired by the follower may be affected by measurement noise, namely, it might hold that , where denotes a measurement disturbance.
Another interesting observation concerns the speed of synchronization in connection to control effort saving. The constant c in the expression (442) influences the speed of synchronization, namely, the larger c, the speedier the synchronization. However, it is easy to see from the definition (448) that the constant c also affects the control effort, namely, the larger c, the more expensive the control action. Clearly, the proportional control constant c should be chosen as a tradeoff between speed and cost.
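To make the tradeoff concrete, the following sketch simulates the Euclidean special case of Example 40, with an illustrative skew-symmetric drift and the proportional control u = c(x − y); all numerical choices (gain, step size, initial states) are assumptions for illustration, and the exponential decay predicted by (446) is visible in the final synchronization error.

```python
import numpy as np

# Leader dx/dt = A x, follower dy/dt = A y + u with u = c (x - y), forward-Euler steps.
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # illustrative skew-symmetric drift: f(x) = A x
c, T, K = 2.0, 1e-4, 50000                # control gain, step size, number of steps

x = np.array([1.0, 0.0])                  # leader's initial state
y = np.array([-0.5, 0.8])                 # follower's initial state
err0 = np.linalg.norm(x - y)
effort = 0.0                              # discretized control effort: integral of ||u||^2
for _ in range(K):
    u = c * (x - y)                       # proportional control field (Euclidean case)
    effort += T * (u @ u)
    x = x + T * (A @ x)                   # leader step (forward Euler)
    y = y + T * (A @ y + u)               # follower step (forward Euler)

# Since A is skew-symmetric, the error e = x - y obeys de/dt = (A - cI)e, so its norm
# decays as err0 * exp(-c t): the exponential synchronization predicted by (446).
t = K * T
print(np.linalg.norm(x - y), err0 * np.exp(-c * t))
```

A larger c makes the final error smaller but, as noted above, also inflates the accumulated control effort.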
12.2. Numerical Methods to Simulate First-Order Systems
In the present subsection, we shall recall from the specialized literature the notions of the forward Euler method and of the explicit fourth-order Runge–Kutta method to simulate numerically classical first-order, non-linear, non-autonomous systems. In addition, in the body of the present section, we shall lay out extensions of such numerical methods to implement on a computing platform those mathematical dynamical systems insisting on curved manifolds. The main ideas to achieve such an extension may be summarized as follows:
Extension of straight stepping: Classical numerical methods are tailored to advance the solution from one step to the next by moving a system’s state along straight segments. On curved state manifolds, the notion of ‘straight segment’ needs to be replaced by the notion of ‘geodesic arc’, and therefore additive stepping needs to be replaced by exponential-map-based stepping.
Extension of linear stages: High-order methods in select a stepping direction as a linear combination of a number of estimations of a vector field (in the Runge–Kutta methods, these estimations are called ‘stages’). On curved spaces, moving directions are tangent vectors that cannot be combined together directly because they belong to different tangent spaces. Such tangent directions need to be ‘aligned’ together in a given tangent space by means of parallel transport and then combined together.
The forward Euler scheme (fEul) is perhaps the simplest numerical scheme known in the scientific literature to tackle an initial value problem. On a Euclidean space , the forward Euler method to simulate numerically a dynamical system reads as
where k denotes a step counter ranging from 0 to a given integer , T denotes a time-discretization interval (generally, ) and denotes an approximation to the true state . The accuracy and numerical stability of this method turn out to be reasonable as long as the state-transition function meets certain conditions [71].
This Euler method moves the current state forward to the next state along a straight line directed toward by a fraction specified by T. Since curved manifolds admit no straight lines in the sense of Euclidean geometry, a plain forward Euler method is inherently unsuitable to cope with a tangent-bundle differential equation, as exemplified in the following.
Example41.
To tackle the problem that arises in the numerical implementation of dynamical systems on manifolds, it is worth examining an explanatory example based on the low-dimensional manifold . As already outlined, by embedding the space into the space , any element of may be regarded as a 2-by-2 real-valued matrix whose entries must satisfy the constraints: , , and .
Let us consider the first-order dynamical system on the manifold , with . Such a dynamical system may be written as a set of four differential equations of the type . In the present case, it holds that with , where . The forward Euler stepping technique of numerical calculus to solve the above system of differential equations would read as , with denoting a step size and a step counter. Such a numerical stepping method does not take into account the constraints on the entries of the matrix X, namely, it generates a trajectory in the ambient space rather than in the feasible manifold . Namely, starting from a point , it would yield a new point . The reason for such behavior is that Euler techniques insist on flat spaces and do not cope automatically with curved manifolds.
It is instructive to investigate in detail the effect of an Euler stepping method in the solution of the differential equation . In such a context, the Euler stepping equation reads:
As the starting point, satisfies , and it holds that:
Notice that , and hence . The result of the first step has already lost the normality of its columns by an additive amount and has changed its determinant from 1 to . Since T is generally far smaller than 1, and since the deviation is proportional to , such a deviation may not be apparent during the first steps, yet it is progressively detrimental. However, the first step keeps the orthogonality of the columns of the matrix X (such a result is peculiar to the case only and does not carry over to the general case with ). For the next step, it holds that:
By induction, it is readily verified that the matrix keeps monotonically losing the normality of its two columns by an identical amount. ■
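The drift described in this example can be reproduced numerically; the generator Ω below is an illustrative skew-symmetric choice standing in for the example's vector field (whose exact expression is not repeated here), so the quoted determinant value 1 + T² reflects this specific choice.

```python
import numpy as np

# Forward-Euler stepping of dX/dt = X*Omega in the ambient space of SO(2).
Omega = np.array([[0.0, -1.0], [1.0, 0.0]])   # illustrative skew-symmetric generator
T = 0.01
X0 = np.eye(2)                                # a point of SO(2)

X1 = X0 + T * (X0 @ Omega)                    # one forward-Euler step in the ambient space
print(np.linalg.det(X1))                      # 1 + T^2: the determinant drifts from 1
print(X1[:, 0] @ X1[:, 1])                    # 0: columns remain orthogonal (SO(2) only)
print(np.linalg.norm(X1[:, 0]))               # sqrt(1 + T^2): column normality is lost

# Iterating, det(X_k) = (1 + T^2)^k grows monotonically: the trajectory leaves SO(2).
Xk = X1.copy()
for _ in range(999):
    Xk = Xk + T * (Xk @ Omega)
print(np.linalg.det(Xk))
```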
The forward Euler method (449) can be extended to a smooth manifold by replacing the notion of straight line with the notion of geodesic line, to give
The generalization from (449) to (453) is conceived as follows. On a curved state manifold , each point belongs to , while each quantity is a tangent vector in : these two quantities cannot be combined in an additive way, because ; rather, such quantities are combined with the help of the exponential map, which describes a geodesic arc departing from the state in the tangent direction . The above stepping method allows one to compute the first K discrete points of the trajectory generated by the dynamical system , with , given the initial state . For a Euclidean state space , it holds that ; therefore, the scheme (453) is apparently a generalization of the well-known Euler scheme (449) from Euclidean to curved spaces.
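The scheme (453) can be sketched on the unit hypersphere, whose exponential map admits the closed form exp_x(v) = cos(‖v‖)x + sin(‖v‖)v/‖v‖; the vector field f below is an illustrative choice (the tangential part of a constant ambient field), not one taken from the text.

```python
import numpy as np

# Geodesic forward-Euler stepping x_{k+1} = exp_{x_k}(T f(x_k)) on the unit sphere.
def sphere_exp(x, v):
    """Closed-form exponential map of the unit hypersphere (v tangent at x)."""
    nv = np.linalg.norm(v)
    return x if nv < 1e-15 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def f(x):
    """Illustrative tangent vector field: tangential part of a constant ambient field."""
    a = np.array([0.0, 0.0, 1.0])
    return a - (x @ a) * x

T, K = 0.01, 1000
x = np.array([1.0, 0.0, 0.0])
for _ in range(K):
    x = sphere_exp(x, T * f(x))           # geodesic step replacing x + T f(x)

print(np.linalg.norm(x))                  # the iterate never leaves the manifold
```

In contrast to the ambient-space Euler stepping of Example 41, the state norm stays exactly 1 (up to rounding) at every step.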
The set of equations to simulate on a computing platform a leader system and a controlled follower system by the fEul method on a manifold are hence laid out as follows
For the sake of comparison, we notice that, in the case of a leader-follower pair evolving on a Euclidean state space , the above Equation (454) would simplify into
because and .
A second class of methods that we would like to recall is the explicit, fourth-order Runge–Kutta algorithm. The family of Runge–Kutta numerical integration methods on Euclidean spaces was conceived in order to increase the precision of lower-order methods, such as the Euler method [72]. The explicit 4th-order Runge–Kutta method (eRK4) is based on four partial increments (stages) that, combined together, lead to a complete step:
Similarly to the Euler method, the eRK4 method moves the current state forward to the next state along a straight line toward a specific direction, except that such a direction is computed in a more elaborate way.
The eRK4 method may be extended to a curved manifold by appropriately converting each of the equations in (456). Such a conversion needs to take into account that the state space is now a curved manifold:
(The notation , , …, used to indicate intermediate steps, is standard in numerical calculus.) The reasoning that led to this numerical scheme is outlined as follows. Once a direction is obtained in the first stage, it is used to determine the direction in the second stage. The formula to compute in (456) prescribes to evaluate the vector field at a point ; this last expression, however, cannot be applied directly on a curved manifold and needs to be replaced by , which gives the expression in (457). Even so, the obtained result cannot be used directly: the update rule in (456), which prescribes to advance the value by , needs to be translated into , which, in turn, requires the vector to belong to , while it belongs to . For this reason, it is necessary to make a further modification and to parallel transport the vector to , namely, to compute . The same holds for the subsequent stages.
The complete set of equations required to implement numerically a leader and a controlled follower by the eRK4 method on a manifold is as follows
The implementation of the above equations on a computing platform does not pose serious concerns as long as the chosen development language possesses adequate commands to deal seamlessly with arrays and array-type functions.
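A sketch of the manifold eRK4 step on the unit hypersphere follows. For simplicity, exact parallel transport is replaced here by vector transport (orthogonal projection onto the base tangent space), the approximation discussed in Section 11; the vector field and all numerical choices are illustrative assumptions.

```python
import numpy as np

# Manifold eRK4 on the unit sphere, with vector transport in place of parallel transport.
def sphere_exp(x, v):
    nv = np.linalg.norm(v)
    return x if nv < 1e-15 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def transport(y, u):
    """Vector transport of u to T_y by orthogonal projection."""
    return u - (y @ u) * y

def f(x):
    a = np.array([0.0, 0.0, 1.0])
    return a - (x @ a) * x                # illustrative tangent vector field

def erk4_step(x, T):
    k1 = f(x)
    x2 = sphere_exp(x, 0.5 * T * k1)
    k2 = transport(x, f(x2))              # align the stage with T_x before combining
    x3 = sphere_exp(x, 0.5 * T * k2)
    k3 = transport(x, f(x3))
    x4 = sphere_exp(x, T * k3)
    k4 = transport(x, f(x4))
    return sphere_exp(x, (T / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))

x = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    x = erk4_step(x, 0.01)
print(np.linalg.norm(x))                  # the trajectory stays on the sphere
```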
13. Riemannian Hessian of a Manifold-to-Scalar Function
The next step in the Taylor series approximation of a manifold-to-scalar function, beyond the first-order term (through the gradient), is the second-order term, based on the notion of ‘Hessian’.
13.1. Definition, Properties and Coordinate-Free Calculation of Riemannian Hessian
Given a differentiable function , its Riemannian Hessian at a point is a linear operator that appears in the quadratic term of the Taylor approximation
where denotes, in principle, any smooth curve such that and . A major problem with such liberality in the choice of a curve is that, since clearly
the Hessian operator would not just depend on the pair but also on , which is not prescribed. A Hessian that depends on the shape of the curve (which, after all, should be instrumental rather than determining) is hardly acceptable in practice, which is why the notion of a Hessian operator is generally defined with respect to a geodesic. It is worth pointing out that, according to the ‘algebraic’ argument recalled in Observation 1, since the very definition of the Hessian stems from a quadratic form, only its self-adjoint component plays a role and, indeed, only such a component may be determined.
Not surprisingly, the Riemannian Hessian on a manifold embedded in an ambient space is related to the ambient Hessian in . Still, there is a perhaps surprising outcome in its calculation, namely, it depends on the ambient gradient of f (while the standard Hessian does not). Let us recall what is meant by ambient Hessian of a function , that is
where denotes any smooth curve such that and —for instance, if , then takes the form of a matrix of partial derivatives and just denotes the matrix-to-column-array product .
To carry out the calculation of the Riemannian Hessian, let us recall the relationship (301) that may be written as
Introducing the metric kernel and the inner product on the ambient space gives:
Recalling the explicit expression (284) of the Riemannian gradient based on an orthogonal projector gives
The right-hand side in the above expression has been written in a way that makes it easier to take a second derivative with respect to the parameter t, which reads as
The first expression on the right-hand side may be written explicitly by introducing a linear operator that represents a derivative of the orthogonal projector, while the second term on the right-hand side may be made more explicit by recalling, from the relationships in (356), the relation between the naïve acceleration over a geodesic and the Christoffel form of second kind. Let us define the linear operator
where denotes a point where projection is carried out, is an element of the ambient space whose projection is sought, is a tangent direction along which a variation of the projected vector is sought to be estimated and is any smooth curve such that and .
To clarify the above definition, let us examine a useful computation rule.
Example42.
Let us define as an ambient-valued function of a manifold-valued argument and a smooth curve . Let us now consider the composition . We aim at calculating its derivative with respect to the parameter t. It holds that
which is a linear function of . ■
On the basis of the above calculation rule, of interest in the following developments, we can prove a collateral property of the map .
Example43.
The map applied to two tangent vectors returns a normal element. In fact, let us recall that for every and . On any smooth curve such that and , it holds that
Differentiating with respect to the parameter t and applying the computation rule (467) yields
Setting yields:
Now we can set and write
The term on the right-hand side represents the normal component of , and hence . ■
On the basis of the definition (337), the second derivative (465) may be rewritten as
Setting leads to
As a last step, recall that, in the rightmost term, it is possible to swap one instance of v with the term ; hence, we write
which must hold for every . The last expression is equivalent to
An explicit expression of the Riemannian Hessian may now be obtained by applying the orthogonal projector to both sides and then the inverse metric kernel to the result, which ultimately gives
Notice that, unless the difference between the last two terms is zero, the Riemannian Hessian depends on the ambient gradient .
It is worth examining separately a few special cases in which the expression of the Riemannian Hessian simplifies:
Case that is independent of x: This case occurs when the tangent spaces coincide with one another. In this case, the projection operator may simply be denoted as and it holds that ; therefore,
Case that the metric kernel coincides with the identity: This is the case most commonly covered in the literature, which occurs when the ambient space is a Euclidean space. In this case
This case was explicitly covered in [73], in which the expression of the Hessian was given in terms of Weingarten map. The importance of the Weingarten map in applied sciences was further highlighted in [74].
Case that the gradient of the function f is null: At a point where , the expression of the Hessian simplifies noticeably. In fact, the last two terms in (476) vanish to zero; therefore, it holds that
This is in fact the only case in which the Hessian does not depend on the Christoffel form.
Case of the manifold endowed with a Euclidean metric: This is the reference case that we may look at for familiarity. In this case, , , ; hence,
which looks exactly as one expects.
As concerns the operator , we are going to assume that the following commutation property with the ambient metric holds: for every , , it is true that
Let us go into some detail about the practical computation of the operator by surveying an example related to the Stiefel manifold.
Example44.
The Stiefel manifold embedded in its canonical ambient space endowed with the Euclidean metric has an orthogonal projector defined by the relation (286), namely [33]:
In order to apply the definition (466), let us first take any smooth curve , with and , and write
Taking the first derivative of both sides with respect to the parameter t gives:
Now, setting and recalling that and yields
Let us verify the commutativity property (481). We have
Notice that the first terms on the right-hand sides of both inner products are null, because and are skew-symmetric, while the remaining terms may be proven equal by the trace cyclic permutation property and by symmetry. ■
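The projector of this example can be checked numerically. The expression below, π_X(V) = V − X·sym(XᵀV), is the standard Stiefel orthogonal projector for the Euclidean ambient metric and is assumed here to coincide with the text's relation (286); the test data are random illustrative choices.

```python
import numpy as np

# Numerical check of the Stiefel orthogonal projector pi_X(V) = V - X sym(X^T V).
def proj(X, V):
    S = (X.T @ V + V.T @ X) / 2           # sym(X^T V)
    return V - X @ S

rng = np.random.default_rng(4)
n, p = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))   # a point on St(n, p)
V = rng.standard_normal((n, p))                     # an arbitrary ambient matrix

P = proj(X, V)
print(np.linalg.norm(X.T @ P + P.T @ X))   # ~0: the projection is tangent at X
print(np.linalg.norm(proj(X, P) - P))      # ~0: the projector is idempotent
```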
The case of the special orthogonal group is relevant and instructive as well.
Example45.
The special orthogonal group embedded into the ambient space , endowed with the Euclidean metric, admits the orthogonal projection
Let us verify that maps any matrix from to the tangent space :
by easy matrix calculations. In addition, let us verify that :
To apply the definition (466), let us first take a smooth curve , with and , and evaluate
Taking the first derivative of both sides with respect to the parameter t gives:
Setting and recalling that and yields
To end this example, let us verify the commutativity property (481). Let us observe that
The above expressions may be proven equal by some clever matrix manipulation. For example, by noticing that and , one may show that
by repeated application of cyclic permutation invariance.
Let us complete the calculation of the Riemannian Hessian for the special orthogonal group. According to the Hessian formula (478), we preliminarily need to evaluate
therefore
It is easy to verify, by direct calculation, that . ■
Let us survey the computation of the Riemannian Hessian on the hypersphere.
Example46.
Let us compute the Riemannian Hessian associated with the hypersphere endowed with the canonical metric , embedded in the ambient space endowed with the metric . Recall that it holds and .
It is just necessary to compute the derivative and then to make use of the relationship (478). By definition, we have
Ultimately, we obtain
which represents the Riemannian Hessian utilized in [61].
It is instructive to notice that, for and , the quantity possesses a radial (normal) component as well as a tangential component . Let us verify the property (481). We have
The first terms on the right-hand sides are null because , while the remaining terms are equal to one another. ■
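The hypersphere Hessian of this example can be checked numerically. The formula is assumed here to read Hess f(x)[v] = (I − x xᵀ)∇²f̄(x)v − (xᵀ∇f̄(x))v, with ∇f̄ and ∇²f̄ the ambient gradient and Hessian (the standard Euclidean-ambient expression); the test function f(x) = xᵀAx and all data are illustrative.

```python
import numpy as np

# Check: along the unit-speed geodesic gamma(t) = cos(t) x + sin(t) v, the second
# derivative of f(gamma(t)) at t = 0 must equal <v, Hess f(x)[v]>.
rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4)); A = (A + A.T) / 2   # symmetric test matrix

x = rng.standard_normal(4); x /= np.linalg.norm(x)
v = rng.standard_normal(4); v -= (x @ v) * x; v /= np.linalg.norm(v)

def hess(x, v):
    gbar = 2 * A @ x                        # ambient gradient of f(x) = x^T A x
    Hv = 2 * A @ v                          # ambient Hessian applied to v
    return (Hv - x * (x @ Hv)) - (x @ gbar) * v

h = 1e-4
fg = lambda t: (np.cos(t) * x + np.sin(t) * v) @ A @ (np.cos(t) * x + np.sin(t) * v)
second = (fg(h) - 2 * fg(0.0) + fg(-h)) / h ** 2     # central second difference
print(abs(second - v @ hess(x, v)))                  # ~0
```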
The space of symmetric, positive-definite matrices is interesting as far as the Riemannian Hessian is concerned.
Example47.
Let us recall that, for the space , we have chosen ; moreover, it holds that , for any . Since does not explicitly depend on the point P, we may utilize Formula (477) to evaluate the Hessian. ■
As a last example, let us compute the Riemannian Hessian of a linear function. In the familiar case, we should obtain a null Hessian, while in a manifold setting this is easily guessed not to be the case.
Example48.
Let a smooth manifold be embedded in an ambient space and let denote a linear function such that constant and . Not having assumed anything else, the Hessian of the function ℓ should be calculated by means of the expression (476), namely
In this case, the Riemannian Hessian reads as
which clearly varies from point to point on . ■
The property (481) implies that in an inner product such as the tangent vector arguments may be swapped, which implies the Riemannian Hessian operator is self-adjoint, namely
This property may be proven directly by the relationship (476); in fact,
The first inner product is symmetric in v and w because the ambient Hessian is self-adjoint, the second term is self-adjoint by the property (481) and the last term is symmetric in v and w because the Christoffel form commutes with the inner product.
13.2. A Newton-like Optimization Algorithm
Given a function , we may define a quadratic approximation at a point in the direction as
As often recalled, the inner products may be evaluated in terms of ambient metrics and metric kernel. Therefore, the change in the value of the function f may be evaluated as
In optimization, it is fundamental to find a direction of maximal change from a given point, which may be determined by solving the following problem in v:
A consequence of a property shown in the previous subsection, namely that the Riemannian Hessian is self-adjoint, is that
Recalling that is linear, Equation (506) hence reads
Since is invertible, the above relationship simplifies to , which leads to the optimal direction
Notice that , and hence the result is consistent with what was expected. It is worth noticing that the inverse of the Hessian may be hard to express in closed form; hence, it is often easier to just set up the (linear) equation and solve it through any linear-algebra tool.
The optimal direction may be exploited in a Newton-like optimization algorithm as follows:
where is an integer step-counter, is a step-size and denotes an initial guess.
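The iteration above may be sketched in code. The following Python example is an illustration under stated assumptions: the manifold is the unit hypersphere, the cost is the Rayleigh-type function f(x) = xᵀAx/2, the step-size is unitary, and normalization plays the role of the retraction; the helper names `grad` and `hess_mat` are illustrative. Rather than inverting the Hessian, the Newton equation is set up and solved at each step by a least-squares linear-algebra routine, in the spirit of the remark above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric cost matrix

def grad(x):
    # Riemannian gradient of f(x) = x^T A x / 2 on the unit sphere
    return A @ x - (x @ A @ x) * x

def hess_mat(x):
    # matrix of the Riemannian Hessian, acting on tangent vectors; it is
    # singular along x itself, hence the Newton equation is solved in the
    # least-squares sense instead of by matrix inversion
    P = np.eye(n) - np.outer(x, x)
    return P @ A @ P - (x @ A @ x) * P

x = rng.standard_normal(n); x /= np.linalg.norm(x)   # initial guess
alpha = 1.0                                          # step-size
for k in range(100):
    # Newton equation: Hess(x)[v] = -grad(x), solved on the tangent space
    v = np.linalg.lstsq(hess_mat(x), -grad(x), rcond=None)[0]
    x = x + alpha * v
    x /= np.linalg.norm(x)            # retraction back onto the sphere
    if np.linalg.norm(grad(x)) < 1e-12:
        break

lam = x @ A @ x
print(np.linalg.norm(A @ x - lam * x))   # eigenvector residual, small at convergence
```

The critical points of this cost on the sphere are the unit eigenvectors of A, so the iterate settles on an eigenvector; which one is reached depends on the initial guess, as is typical of Newton-like schemes.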
14. Conclusions
The present paper focused on manifold calculus to describe the system-theoretic properties of non-linear systems whose states belong to curved manifolds and was conceived as the first part of a longer tutorial in manifold calculus.
In particular, the present tutorial paper focuses mainly on mathematical definitions and calculation rules, expressed in the language of embedded manifold calculus, that form a knowledge base to develop further concepts and applications. A number of manifolds of interest in applications were covered and a number of examples clarified some collateral, yet interesting, aspects.
A section of the present paper focuses on the design of a manifold-type system synchronization algorithm based on feedback control and on developing numerical methods tailored to curved manifolds to implement such systems and algorithms.
Since the present contribution aimed to lay out the basic concepts in manifold calculus and Lie group theory, it only covers the basics of first-order dynamical systems and proportional-type control. Second-order systems and their control require advanced notions, such as covariant derivation. Such advanced notions will be covered in a forthcoming contribution.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
Fiori, S. Manifold Calculus in System Theory and Control—Fundamentals and First-Order Systems. Symmetry 2021, 13, 2092. https://doi.org/10.3390/sym13112092