La thèse de Kirillov, parue en 1962, a suscité immédiatement beaucoup d’intérêt…En outre, quantité de notions naturelles concernant les représentations s’interprètent géométriquement en terme d’orbites coadjointes: restriction à un sous-groupe, induction unitaire, produit tensoriel, mesure de Plancherel, la topologie de l’ensemble représentations unitaires irréductibles… Kirillov s’est vite convaincu, et il a convaincu la communauté mathématique que cette « méthode des orbites » devait être applicable à des groupes bien plus généraux que les groupes nilpotents. Il n’a pas hésité à aborder le cas des groupes de Lie connexes quelconques. Evidemment, des difficultés considérables ont surgi immédiatement. Néanmoins, Kirillov a indiqué une voie d’accès, qui ensuite a été largement utilisée.
- Jacques Dixmier, Brèves remarques sur l’œuvre de A.A. Kirillov
On comprend ainsi comment Lagrange a pu développer les lois de la Mécanique des systèmes formés de solides sans s’occuper des variations de la température de ces corps et Fourier traiter des variations de la température de ces mêmes corps solides sans s’occuper de leur mouvement; comment on peut étudier le mouvement de la Terre, assimilée à un solide rigide, sans se préoccuper de la température de cet astre et étudier le refroidissement du globe terrestre sans se préoccuper de son mouvement. Une telle indépendance entre les problèmes qui ressortissent à la Mécanique et les problèmes qui ressortissent à la Théorie de la chaleur n’existe plus lorsque les systèmes auxquels on a affaire ne sont plus des systèmes classiques; si, par exemple, au lieu de regarder la Terre comme un solide rigide, d’état invariable, on tient compte des changements de volume, de forme, d’état physique et chimique qui accompagnent son refroidissement, on ne peut plus séparer le problème du mouvement de la Terre et le problème du refroidissement terrestre. … On sait que cette forme de relations supplémentaires avait été introduite par Newton et les géomètres du XVIIIème siècle dans la théorie du son. Ces considérations montrent que les questions qui ressortissent à la Thermodynamique ont dû solliciter l’attention des physiciens dès qu’on a voulu aborder l’étude des systèmes autres que des systèmes classiques; et, en fait, c’est la théorie de la propagation du son dans l’air qui a provoqué Laplace à créer la Thermodynamique.
- P. Duhem, L’intégrale des forces vives en thermodynamique, JMPA 4:5-19, 1898 [1,2,3,4]
Sous cette aspiration, la physique qui était d’abord une science des “agents” doit devenir une science des “milieux”. C’est en s’adressant à des milieux nouveaux que l’on peut espérer pousser la diversification et l’analyse des phénomènes jusqu’à en provoquer la géométrisation fine et complexe, vraiment intrinsèque…Sans doute, la réalité ne nous a pas encore livré tous ses modèles, mais nous savons déjà qu’elle ne peut en posséder un plus grand nombre que celui qui lui est assigné par la théorie mathématique des groupes
- Gaston Bachelard, Etude sur l’Evolution d’un problème de Physique –La propagation thermique dans les solides, 1928
1. Introduction
The preceding French quotations from the mathematician Jacques Dixmier, the physicist Pierre Duhem, and the philosopher Gaston Bachelard introduce the epistemological context of the models developed in this paper. Jacques Dixmier refers to Alexander Kirillov’s seminal idea of the coadjoint orbit method for modeling Lie group representations. Pierre Duhem comments on the origin of the gap between the theory of heat and the theory of mechanics. Finally, Gaston Bachelard predicts that new foundations of Thermodynamics will be given by groups. In this paper, we try to show that these ideas can be reconciled by the Souriau model of Lie groups Thermodynamics through the mathematical structure of Lie algebra cohomology.
After reviewing the state of the art and trends in Machine Learning based on Information Geometry, we present in this introduction the main objective of this paper: to jointly apply models from geometric statistical mechanics and tools from Information Geometry to solve the problem of defining a “Gauss density” for statistics on Lie groups and homogeneous manifolds. We also present motivating use cases for Lie group machine learning: Doppler statistics analysis with SU(1,1) statistics, and kinematics data analysis with SE(2) statistics.
1.1. State of the Art and Trends in Machine Learning Based on Information Geometry
The classical simple gradient descent used in Deep Learning has two drawbacks: it uses the same non-adaptive learning rate for all parameter components, and it is not invariant with respect to parameter re-encoding, which induces different learning rates. As the parameter space of multilayer networks forms a Riemannian space equipped with the Fisher information metric, the natural gradient (or Riemannian gradient) method, which takes account of the geometric structure of this space, is more effective for learning than the usual gradient descent. The natural gradient preserves invariance under re-encoding and is insensitive to the characteristic scale of each parameter direction. The Fisher metric defines a Riemannian metric as the Hessian of two dual potential functions (the Entropy and the log-partition function). In 2016, Yann Ollivier and Gaétan Marceau-Caron provided [5] the first experimental results on non-synthetic data sets (MNIST, SVHN, and FACE) for the quasi-diagonal Riemannian natural gradient descents for neural networks previously introduced by Yann Ollivier in [6]. The quasi-diagonal Riemannian algorithms consistently beat simple stochastic gradient descent by a varying margin. Their computational overhead with respect to simple backpropagation is around a factor of 2, and they reach their final performance quickly, thus requiring fewer training epochs and a smaller total computation time. The main goal of the natural gradient is to obtain invariance properties such as, for a neural network, insensitivity of the training algorithm to whether a logistic or tanh activation function is used, or insensitivity to simple changes of variables in the parameters, such as scaling some parameters. In 2017, the same authors introduced the resulting natural Langevin dynamics [7], combining the advantages of natural gradient descent and Fisher-preconditioned Langevin dynamics for large neural networks, validated on MNIST with Fisher matrix preconditioning. With all the invariance properties of the natural gradient, this Langevin dynamics avoids overfitting, acting as a regularization method, and replaces classical methods by adding a controlled amount of noise to stochastic gradient descent, which ensures convergence to the Bayesian posterior on the model parameters. The theoretically optimal covariance of the noise is the inverse Fisher metric, and Y. Ollivier and G. Marceau-Caron have shown how to implement this in practice for neural networks using efficient Fisher metric approximations. In 2017, Yann Ollivier also introduced the TANGO algorithm (True Asymptotic Natural Gradient Optimization) [8], which converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation; in large dimension, small learning rates are required to approximate the natural gradient well. Y. Ollivier has also shown that it is possible to get arbitrarily close to exact natural gradient descent with a lightweight algorithm. For the natural gradient in Deep Learning, we refer to [9,10]. Recently, Shun-ichi Amari [11] has given an elementary geometrical proof that any target function is realized in a sufficiently small neighborhood of any randomly connected deep network, provided the width (the number of neurons in a layer) is sufficiently large.
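To make the difference between plain and natural gradient concrete, the following minimal sketch fits a univariate Gaussian by natural gradient descent, preconditioning the Euclidean gradient of the mean negative log-likelihood by the inverse of the analytic Fisher matrix. It is only an illustration under simple assumptions (a two-parameter model, exact Fisher matrix); the function name `fit_gaussian_natural_gradient` and its parameters are hypothetical and are not the quasi-diagonal algorithms of [5,6].

```python
import numpy as np

def fit_gaussian_natural_gradient(x, mu=0.0, sigma=1.0, lr=0.5, steps=100):
    """Fit N(mu, sigma^2) to samples x by natural gradient descent on the mean NLL."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        m = x.mean()
        s2 = np.mean((x - mu) ** 2)
        # Euclidean gradient of the mean negative log-likelihood in (mu, sigma)
        grad = np.array([-(m - mu) / sigma**2, 1.0 / sigma - s2 / sigma**3])
        # Fisher information matrix of N(mu, sigma^2) in (mu, sigma) coordinates
        fisher = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
        # Natural gradient: inverse Fisher metric applied to the Euclidean gradient
        nat_grad = np.linalg.solve(fisher, grad)
        mu, sigma = mu - lr * nat_grad[0], sigma - lr * nat_grad[1]
    return mu, sigma

if __name__ == "__main__":
    data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    print(fit_gaussian_natural_gradient(data))
```

In this toy model the preconditioned step no longer depends on the scale of sigma in the way the raw gradient does, which is the scale insensitivity discussed above.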
In this paper, we introduce how to extend these approaches to data that are elements of a Lie group, or data lying on a homogeneous manifold on which a Lie group acts transitively. This extension is considered in the framework, and at the interconnection, of Souriau “Lie groups Thermodynamics”, Information Geometry and Kirillov representation theory [12], to define probability densities for Lie groups as Souriau covariant Gibbs densities (densities of Maximum Entropy). We develop this case for the matrix Lie group SU(1,1) (case with null cohomology) through the computation of Souriau’s moment map and Kirillov’s orbit method. We also develop the method for the SE(2) Lie group (case with non-null cohomology), where a Souriau cocycle must be taken into account due to the defect of equivariance of the coadjoint action on the moment map.
Supervised learning approaches are based on neural networks whose parameters are estimated by natural gradient algorithms. Unsupervised algorithms are based on clustering, using techniques such as “k-means” or “Mean-shift” that rely on a distance between elements of the dataset. In both cases, if we want to extend these approaches to datasets of Lie group elements, we have to extend the notions of Gaussian density and of distance between elements. We propose to use the Geometric Statistical Model coming from Geometric Statistical Mechanics to introduce a “Gauss density” for Lie group elements. Jointly, we can associate a natural distance between these Lie group elements on the symplectic manifold by means of the KKS 2-form, introducing a natural Riemannian metric associated with the Fisher metric from Information Geometry. The objective of this paper is to explain how to use Geometric Statistical Mechanics tools in this context.
1.2. Objectives of this Paper
This article has several purposes. The work of Professor Jean-Marie Souriau is well known in the field of “Geometric Mechanics”, of which he is one of the founders with his book “Structure of Dynamical Systems”, published in 1969, in which he introduced the foundations of Symplectic Geometry. In this book, chapter IV, which deals with the extension of Geometric Mechanics to Statistical Mechanics, has been little read or has been misunderstood by this community. We have discovered that this model is part of, and generalizes, another discipline called Information Geometry. We have demonstrated in previous articles that, with this model, one can generalize the Fisher metric (the invariant metric used in Information Geometry) to Lie groups. The first goal is therefore to rehabilitate the work of Jean-Marie Souriau in a broader framework, which concerns statistics and machine learning extended to objects considered as elements of a Lie group or of a homogeneous manifold.
The second goal is to solve, with these new tools, problems that were still unsolved in statistics and machine learning. These unresolved problems concern the definition and calculation of probability densities playing the role of the Gaussian density for elements of a Lie group or of a homogeneous manifold. In this article, we completely solve the problem for two Lie groups that are very useful in machine learning and also in physics: SU(1,1) and SE(2). The calculation is not a simple application of the Souriau model, because it is necessary to establish the “moment map” associated with these groups and to define a Laplace transform on their coadjoint orbits (orbits of the action of the group on the dual space of the Lie algebra). In a second step, we must use Information Geometry to write these covariant Gibbs densities in the correct parametrization, which parametrizes the generalized Gaussian law by statistical moments on the homogeneous symplectic manifold associated with the coadjoint orbits. In the case of SU(1,1), which corresponds to a case of null cohomology (equivariance of the coadjoint operator on the moment map), the homogeneous symplectic manifold associated with the coadjoint orbit is the Poincaré unit disk, so we jointly solve the open problem of defining mathematically the notion of Gaussian density in this disk in hyperbolic geometry. This density is by construction invariant under the action of the group SU(1,1), which is the sine qua non condition for preserving the symmetries and the invariance of the associated Fisher metric. We show that this model achieves a breakthrough in machine learning, because we obtain a Gibbs density and a Fisher metric that are invariant under change of parametrization and under the action of symmetries. The Gibbs density allows us to extend the classical supervised statistical machine learning algorithms, and the Fisher metric allows us to address unsupervised learning problems such as k-means in a metric space. The model opens the way to machine learning for Lie groups, with multiple applications in robotics, sensor signal processing, and image processing.
In the last part of the article, based on this model, we give a new “geometric” definition of Entropy by showing that Entropy is an invariant Casimir function in coadjoint representation. Casimir functions have been widely studied within the framework of Poisson structures and manifolds [13,14,15,16]. This characterization of Entropy is new, because previously Entropy was defined axiomatically. Using this Casimir function property, we show that it is possible to use fully geometric approaches to construct the Entropy function from only the structure coefficients of the Lie group associated with the symmetries involved. We show that we can also introduce an Euler-Poincaré equation and its stochastic variant to study other open problems in statistics and thermodynamics. Applications of this Casimir characterization, which is demonstrated in this article, are developed in a twin article written with François Gay-Balmaz and published in the same special issue [17].
1.3. Motivation of Lie Group Machine Learning with Use-Cases
Machine learning is a field of study of artificial intelligence, based on statistical approaches, that gives computers the ability to “learn” from data, that is, to classify data from observations in a supervised or unsupervised way. Machine learning generally has two phases. The first consists of estimating a model from data, called observations. This so-called “training” phase is generally carried out before the practical use of the model. The second phase corresponds to deployment in production: once the model is determined, new data can be classified. Depending on the information available during the learning phase, learning is qualified in different ways. If the data are labeled (that is, the task response is known for these data), it is supervised learning. We speak of classification if the labels are discrete, or of regression if they are continuous. In the most general case, without labels, we seek to determine the underlying structure of the data (which can be a probability density), and it is then unsupervised learning. Machine learning can be applied to different types of data, such as graphs, trees, curves, or more simply feature vectors, which can be continuous or discrete. We propose to extend the approach to datasets whose elements belong to matrix Lie groups.
Learning algorithms can be categorized according to their learning mode. In supervised learning, the classes are predetermined and the examples are known; the system then learns to classify according to a classification model. An expert must label the examples beforehand. The process takes place in two phases. When the system or operator has only examples but no labels, and the number of classes and their nature have not been predetermined, we speak of unsupervised learning or clustering. No expert is required. The algorithm must discover for itself the more or less hidden structure of the data, by data partitioning and data clustering. The system must cluster the data according to their available attributes, to classify them into homogeneous groups of examples. Similarity is generally calculated according to a distance function between pairs of examples.
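As an illustration of clustering driven only by a pairwise distance, the sketch below runs a small k-medoids loop with the hyperbolic distance of the Poincaré unit disk. It deliberately uses k-medoids rather than k-means, because medoids need only pairwise distances and avoid computing a mean in a curved space. The distance formula assumes the curvature -1 normalization of the disk metric, and the helper names `poincare_distance` and `k_medoids` are hypothetical; this is not the Fisher-metric-based distance developed in the paper.

```python
import numpy as np

def poincare_distance(z, w):
    """Hyperbolic distance between two points of the open unit disk (curvature -1 assumed)."""
    return 2.0 * np.arctanh(abs((z - w) / (1.0 - np.conj(w) * z)))

def k_medoids(points, init_idx, dist, n_iter=20):
    """Cluster points using only pairwise distances; returns (labels, medoid indices)."""
    points = list(points)
    medoids = list(init_idx)
    # Precompute the full pairwise distance matrix
    D = np.array([[dist(p, q) for q in points] for p in points])
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                new_medoids.append(old)  # keep the old medoid for an empty cluster
                continue
            # New medoid: the member with minimal total distance to its cluster
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(costs)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids

if __name__ == "__main__":
    pts = [0.10 + 0.10j, 0.12 + 0.08j, 0.09 + 0.11j,
           -0.80 + 0.10j, -0.82 + 0.05j, -0.79 + 0.12j]
    print(k_medoids(pts, init_idx=[0, 3], dist=poincare_distance))
```

Any other distance, for instance one induced by a Fisher-type metric, can be passed through the `dist` argument without changing the clustering loop.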
We illustrate two problems of Machine Learning on Lie groups coming from the Radar industry. Target recognition on Radar micro-Doppler data can be modeled as a classification problem on a dataset whose elements belong to the SU(1,1) Lie group (see Figure 1). Radar complex time series of micro-Doppler observations are classically processed on a sliding time window to estimate their associated covariance matrices, which have a Toeplitz Hermitian Positive Definite structure. Using the well-known Verblunsky/Trench theorem, we can parametrize all Toeplitz Hermitian Positive Definite covariance matrices of stationary Radar time series in a product space with a real positive axis (for the signal power) and a Poincaré polydisk (for the Doppler spectrum shape). We consider the Poincaré unit disk as a homogeneous space on which the SU(1,1) Lie group acts transitively, so that each datum in a Poincaré unit disk of this polydisk can be coded by an element of the SU(1,1) matrix Lie group. We have thus transformed the problem into a statistical learning challenge on data of the SU(1,1) matrix Lie group. Another example considers recognition of flying objects from their kinematics, coded in the SE(2) or SE(3) Lie groups. 3D (or 2D) trajectories can be coded by SE(3) (or SE(2)) Lie group time series provided by an Invariant Extended Kalman Filter (IEKF) Radar tracker, which locally estimates the displacement of the Frenet-Serret frame. Object kinematics are then coded by time series of SE(3) (or SE(2)) matrix Lie group elements characterizing the local rotation/translation of the Frenet frame along the 3D (or 2D) trajectory of the object. Statistics of these SE(3) (or SE(2)) Lie group elements characterize the flight mechanics of different kinds of objects (birds, drones, …).
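The parametrization of a Toeplitz Hermitian Positive Definite covariance matrix by a power and a set of coefficients in the unit disk can be sketched with the classical Levinson-Durbin recursion, which maps the autocorrelation lags to reflection coefficients. This is a minimal sketch: the function name `reflection_coefficients` is hypothetical, and the sign and conjugation convention of the coefficients is an assumption (conventions differ between references).

```python
import numpy as np

def reflection_coefficients(r):
    """Levinson-Durbin recursion: lags r[0..n-1] -> (P0, [mu_1, ..., mu_{n-1}])."""
    r = np.asarray(r, dtype=complex)
    n = len(r)
    a = np.array([1.0 + 0j])   # prediction polynomial, a[0] = 1
    err = r[0].real            # prediction error power
    mus = []
    for m in range(1, n):
        # Correlation between the current prediction error and lag m
        acc = r[m] + np.dot(a[1:], r[m - 1:0:-1])
        mu = -acc / err
        # Order update: a_new[i] = a[i] + mu * conj(a[m - i])
        a_ext = np.concatenate([a, [0.0]])
        a = a_ext + mu * np.conj(a_ext[::-1])
        err *= 1.0 - abs(mu) ** 2
        mus.append(mu)
    return r[0].real, np.array(mus)

if __name__ == "__main__":
    rho = 0.5 + 0.3j
    lags = np.array([rho**k for k in range(4)])  # lags of a first-order autoregressive model
    print(reflection_coefficients(lags))
```

For a positive definite Toeplitz matrix every coefficient has modulus strictly below one, so each of them is a point of the Poincaré unit disk.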
SU(1,1) and SE(2) are also fundamental tools in Image Processing (sub-Riemannian geometry of vision with SE(2)), in robotics (statistical analysis of rigid bodies with SE(2)), in Natural Language Processing (methods of graph embedding in the Poincaré disk with SU(1,1)), …. For instance, the SU(1,1) Lie group, which acts on the Poincaré unit disk, is widely studied for embedding a graph isometrically in a hyperbolic space. It is used by GAFAM (Google, Facebook, …) for Natural Language Processing, by reducing graph analysis to a Machine Learning problem in the hyperbolic Poincaré unit disk. Hyperbolic neural networks [18] have been developed in this framework. The SU(1,1) Lie group is also fundamental in quantum physics, for instance to describe coherent states of an electron in a magnetic field [19] and coherent states in quantum optics [20] (where some statistical photon-counting aspects of SU(1,1) coherent states are emphasized). The SE(2) Lie group is especially fundamental for the geometry of vision, considering sub-Riemannian approaches of the Citti-Petitot-Sarti model [21], but also in neuroimaging [22].
1.3.1. SU(1,1) Lie Group Machine Learning for Doppler Data Statistics Analysis
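Lie group structure appears naturally in Doppler data if we consider time series of a locally stationary signal and their associated covariance matrix, which is Toeplitz Hermitian Positive Definite (THPD). Based on a theorem due to Verblunsky [23,24] and Trench [25], we can parametrize a Toeplitz Hermitian Positive Definite matrix in a product space involving the Poincaré unit polydisk:
$$\varphi: \mathrm{THPD}_n \to \mathbb{R}^{+*} \times D^{n-1}, \quad R_n \mapsto (P_0, \mu_1, \ldots, \mu_{n-1})$$
where $D$ is the Poincaré unit disk:
$$D = \{ z = x + iy \in \mathbb{C} \; / \; |z| < 1 \}$$
The Poincaré unit disk is a homogeneous bounded domain on which the Lie group SU(1,1) acts transitively. This matrix group is given by:
$$SU(1,1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \beta^* & \alpha^* \end{pmatrix} \; / \; |\alpha|^2 - |\beta|^2 = 1, \; \alpha, \beta \in \mathbb{C} \right\}$$
where SU(1,1) acts on the Poincaré unit disk by:
$$g.z = \frac{\alpha z + \beta}{\beta^* z + \alpha^*}, \quad g \in SU(1,1), \; z \in D$$
with the Cartan decomposition of SU(1,1):
$$\begin{pmatrix} \alpha & \beta \\ \beta^* & \alpha^* \end{pmatrix} = |\alpha| \begin{pmatrix} 1 & z \\ z^* & 1 \end{pmatrix} \begin{pmatrix} \alpha / |\alpha| & 0 \\ 0 & \alpha^* / |\alpha| \end{pmatrix}, \quad z = \beta (\alpha^*)^{-1}, \quad |\alpha| = (1 - |z|^2)^{-1/2}$$
We can observe that $z = \beta (\alpha^*)^{-1}$ can be considered as the action of $g \in SU(1,1)$ on the centre of the unit disk: $z = g.0 = \beta (\alpha^*)^{-1}$. The principal idea is that we can code any point $z = \beta (\alpha^*)^{-1}$ in the unit disk by an element of the Lie group SU(1,1). The main advantage is that the point position is no longer coded by coordinates but intrinsically, by the transformation from the origin 0 to this point. Finally, a covariance matrix of a stationary signal can be coded by $(n-1)$ matrix SU(1,1) Lie group elements:
$$(P_0, \mu_1, \ldots, \mu_{n-1}) \mapsto \left( P_0, \begin{pmatrix} \alpha_1 & \beta_1 \\ \beta_1^* & \alpha_1^* \end{pmatrix}, \ldots, \begin{pmatrix} \alpha_{n-1} & \beta_{n-1} \\ \beta_{n-1}^* & \alpha_{n-1}^* \end{pmatrix} \right), \quad \mu_k = \beta_k (\alpha_k^*)^{-1}$$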
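The coding of a disk point by a group element can be checked numerically. The sketch below builds one particular element of SU(1,1) that sends the origin to a given point (choosing a real positive diagonal entry, which is an assumption that fixes the rotation part of the Cartan decomposition) and applies the homographic action of the group on the disk. The function names `su11_from_disk_point` and `act` are hypothetical.

```python
import numpy as np

def su11_from_disk_point(z):
    """Return g in SU(1,1) with g.0 = z, choosing alpha real and positive."""
    if abs(z) >= 1:
        raise ValueError("z must lie in the open unit disk")
    alpha = 1.0 / np.sqrt(1.0 - abs(z) ** 2)
    beta = z * alpha  # so that z = beta / conj(alpha)
    return np.array([[alpha, beta], [np.conj(beta), np.conj(alpha)]], dtype=complex)

def act(g, z):
    """Homographic action of a 2x2 matrix g on a point z of the disk."""
    (a, b), (c, d) = g
    return (a * z + b) / (c * z + d)

if __name__ == "__main__":
    z = 0.3 - 0.4j
    g = su11_from_disk_point(z)
    print(act(g, 0.0))  # recovers z up to rounding
    print(abs(g[0, 0]) ** 2 - abs(g[0, 1]) ** 2)  # group constraint, equal to 1 up to rounding
```

Since the action is compatible with matrix multiplication, composing group elements corresponds to composing the transformations of the disk.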
1.3.2. SE(2) and SE(3) Lie Groups Machine Learning for Kinematics Data Statistics Analysis
When we consider the 3D trajectory of a mobile target, we can describe this curve by the time evolution of the local Frenet-Serret frame (local frame with tangent vector, normal vector and binormal vector), as illustrated in Figure 2. This frame evolution is described by the Frenet-Serret formulas, which give the kinematic properties of the target moving along a continuous, differentiable curve in 3D Euclidean space ℝ³. More specifically, the formulas describe the derivatives of the so-called tangent, normal, and binormal unit vectors in terms of each other.
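We will consider motions determined by exponentials of paths in the Lie algebra. Such a motion is determined by a unit-speed space curve $\gamma(t)$. In a Frenet-Serret motion, a point in the moving body moves along the curve and the coordinate frame in the moving body remains aligned with the tangent $\mathbf{t}$, normal $\mathbf{n}$, and binormal $\mathbf{b}$ of the curve. Using the 4-dimensional representation of the Lie group SE(3), the motion can be specified as:
$$G(t) = \begin{pmatrix} R(t) & \gamma(t) \\ 0 & 1 \end{pmatrix}$$
where $\gamma(t)$ is the curve and the rotation matrix $R(t)$ has the unit vectors $\mathbf{t}$, $\mathbf{n}$, and $\mathbf{b}$ as columns:
$$R(t) = \left( \mathbf{t} \; | \; \mathbf{n} \; | \; \mathbf{b} \right)$$
If we introduce the Darboux vector $\omega = \tau \mathbf{t} + \kappa \mathbf{b}$, with $\kappa$ the curvature and $\tau$ the torsion of the curve, we can rewrite the Frenet-Serret formulas as:
$$\frac{d\mathbf{t}}{dt} = \omega \times \mathbf{t}, \quad \frac{d\mathbf{n}}{dt} = \omega \times \mathbf{n}, \quad \frac{d\mathbf{b}}{dt} = \omega \times \mathbf{b}$$
Then, with $\Omega$ the 3 × 3 anti-symmetric matrix corresponding to $\omega$, we can write:
$$\frac{dR(t)}{dt} = \Omega R(t)$$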
We note that
and
.
The instantaneous twist of the motion $G(t)$ is given by:
$$S_d = \frac{dG(t)}{dt} G^{-1}(t) = \begin{pmatrix} \Omega & \frac{d\gamma}{dt} - \Omega \gamma \\ 0 & 0 \end{pmatrix}$$
This is the Lie algebra element corresponding to the tangent vector to the curve $G(t)$. It is well known that elements of the Lie algebra $\mathfrak{se}(3)$ can be described as lines with a pitch. The fixed axode of a motion $G(t)$ is given by the axis of $S_d$ as $t$ varies. The instantaneous twist in the moving reference frame is given by $S_b = G^{-1} S_d G$, that is, by the adjoint action on the twist in the fixed frame. The instantaneous twist $S_b$ can also be found from the relation:
$$S_b = G^{-1}(t) \frac{dG(t)}{dt}$$
We can observe that we could describe a 3D trajectory by a time series of
SE(3) Lie group elements:
with
Then, the trajectory will be given by the following time series of
SE(3) elements:
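The construction of a Frenet-Serret motion as a curve in SE(3) can be illustrated on a unit-speed circular helix, for which tangent, normal, binormal, curvature and torsion are known in closed form. The sketch below builds the 4 × 4 matrix of the motion and estimates the twist in the fixed frame, taken here as the derivative of the motion multiplied on the right by its inverse, using finite differences. The helix, the function names (`hat`, `helix_frame`, `twist_fixed_frame`) and the default radius and pitch are assumptions chosen for illustration only.

```python
import numpy as np

def hat(w):
    """3x3 anti-symmetric matrix such that hat(w) @ v equals the cross product w x v."""
    return np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]], dtype=float)

def helix_frame(s, a=1.0, b=0.5):
    """Frenet-Serret motion G(s) in SE(3) for a unit-speed helix, with curvature and torsion."""
    c = np.hypot(a, b)
    u = s / c
    gamma = np.array([a * np.cos(u), a * np.sin(u), b * u])
    t = np.array([-a * np.sin(u), a * np.cos(u), b]) / c
    n = np.array([-np.cos(u), -np.sin(u), 0.0])
    bn = np.cross(t, n)
    G = np.eye(4)
    G[:3, :3] = np.column_stack([t, n, bn])  # rotation with columns (t | n | b)
    G[:3, 3] = gamma
    return G, a / c**2, b / c**2

def twist_fixed_frame(s, h=1e-6):
    """Central finite-difference estimate of (dG/ds) G^{-1} at parameter s."""
    Gp, _, _ = helix_frame(s + h)
    Gm, _, _ = helix_frame(s - h)
    G, _, _ = helix_frame(s)
    return (Gp - Gm) / (2 * h) @ np.linalg.inv(G)

if __name__ == "__main__":
    G, kappa, tau = helix_frame(0.7)
    R, gamma = G[:3, :3], G[:3, 3]
    omega = tau * R[:, 0] + kappa * R[:, 2]  # Darboux vector
    print(np.round(twist_fixed_frame(0.7), 6))
    print(np.round(hat(omega), 6))
```

For this helix the Darboux vector is constant and points along the helix axis, so the rotational block of the twist does not depend on the parameter.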
2. New Results Introduced in the Paper
The paper is structured in two parts:
- 1st part on “Gauss Density on Lie groups”: This part is new in machine learning, with an extension of “Gauss densities” (defined as Maximum Entropy models) to Lie groups, coupling the Souriau model (introduced in the domain of statistical physics) with Information Geometry in the domain of geometric machine learning. We illustrate it with two use cases, SU(1,1) and SE(2), which are among the most useful Lie groups in Image Processing (sub-Riemannian geometry of vision with SE(2)), in robotics (statistical analysis of rigid bodies with SE(2)), in Natural Language Processing (SU(1,1) with methods of graph embedding in the Poincaré disk), …. Some attempts have been made to define noise on Lie groups by adding Gaussian components on elements of the Lie algebra [26,27,28,29], but these models are not mathematically correct because they do not preserve the symmetries and the moment map associated with these symmetries by the Noether theorem.
- 2nd part on “Entropy definition extension as Casimir Function”: This part gives a new geometric definition of Entropy as an invariant Casimir function in coadjoint representation, explaining the invariance of Entropy under the affine coadjoint action on the moment map in the dual space of the Lie algebra. This definition was not in Souriau’s paper. With this new definition, we can compute the Entropy from only the structure constraints given by the Lie group. It opens the door to new generalizations of the Maximum Entropy method and, first of all, to the computation of “Gaussian densities” for any Lie group. Applications of this new property are not developed in this paper but in a twin paper in the same special issue [17]. We refer to M. Gromov’s papers for more geometric structures of Entropy [30,31].
The main new results of this paper are the introduction of a “Gauss density” for Lie groups, or for data on a homogeneous space on which a Lie group acts transitively, and the full computation for the SU(1,1) Lie group. This group acts transitively on the Poincaré unit disk, so we have also solved an open problem related to the Gauss density on this homogeneous space. For this purpose, the main approach considers an extended definition of the classical “Gauss density”, as introduced by Jaynes, in terms of the density of Maximum Entropy. In this way, the initial problem is transferred to a new one: the proper definition of Entropy for Lie groups. To address this problem, we have first recalled the classical Euclidean case, where the Entropy can be defined as the Legendre transform of minus the log-partition function (itself defined by a Laplace transform). The next step was to explain how to extend the log-partition function to Lie groups. We have then considered the Laplace transform in the framework of Lie group representation theory, as introduced by Alexander Kirillov, and of Geometric Statistical Mechanics, as modeled by Jean-Marie Souriau. We have preserved the same Legendre structure, and have defined the Entropy $S(Q)$, parametrized by $Q$ in the dual space of the Lie algebra (called geometric heat), as the Legendre transform of minus the log-partition function, $\Phi(\beta)$, parametrized by $\beta$ in the Lie algebra (called geometric Planck temperature), from a Laplace transform defined on the homogeneous symplectic manifold (associated with the Lie group by the Kirillov-Kostant-Souriau 2-form, called the KKS 2-form in the literature). By introducing the moment map $U$, a fundamental tool of representation theory introduced by Souriau, we were able to define the log-partition function on the coadjoint orbit of the Lie group, $\Phi(\beta) = -\log \int_M e^{-\langle U(\xi), \beta \rangle} d\lambda(\xi)$. The entropy is then given by the Legendre transform $S(Q) = \langle Q, \beta \rangle - \Phi(\beta)$.
We have then defined the Gauss density for Lie groups as the density that maximizes this Entropy under the constraint of its associated first moment $Q$. The Gauss density is then established, by analogy with thermodynamics, as the Gibbs density $p_{Gibbs}(\xi) = e^{\Phi(\beta) - \langle U(\xi), \beta \rangle}$. But this is not enough, because this density is not expressed in the right parametrization. We have proposed to express the Gibbs density with respect to the first statistical moment $Q$ (statistical mean of the moment map) by inverting the relation $Q = \frac{\partial \Phi(\beta)}{\partial \beta}$. The Gibbs density with $\beta$ expressed as a function of $Q$ then provides the extended definition of the Gauss density in its final parametrization.
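The Legendre structure described above can be checked by hand on the simplest possible case. The following worked example is only an illustrative assumption, not one of the groups treated in this paper: the manifold is the half-line $[0, +\infty)$ with the Lebesgue measure, the role of the moment map is played by $U(\xi) = \xi$, and $\beta > 0$. The symbols $\Phi$, $\beta$, $Q$, $S$ and $U$ denote, respectively, minus the log-partition function, the temperature parameter, the first moment, the Entropy, and the moment map.

```latex
\begin{align*}
\Phi(\beta) &= -\log \int_0^{+\infty} e^{-\beta \xi}\, d\xi = -\log \frac{1}{\beta} = \log \beta \\
Q &= \frac{\partial \Phi(\beta)}{\partial \beta} = \frac{1}{\beta}
   \quad\Longleftrightarrow\quad \beta = \frac{1}{Q} \\
S(Q) &= \langle Q, \beta \rangle - \Phi(\beta) = 1 - \log \beta = 1 + \log Q \\
p_{Gibbs}(\xi) &= e^{\Phi(\beta) - \beta \xi} = \beta e^{-\beta \xi}
   = \frac{1}{Q}\, e^{-\xi / Q} \\
-\frac{\partial^2 \Phi(\beta)}{\partial \beta^2} &= \frac{1}{\beta^2}
\end{align*}
```

The result is the exponential law with mean $Q$: its differential entropy is indeed $1 + \log Q$, one recovers $\beta = \partial S / \partial Q$, and the Hessian of $-\Phi$ gives the Fisher information $1/\beta^2$ of the rate parameter. The last-but-one line shows the density rewritten with respect to its first moment, which is the reparametrization step described above.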
Until now, no “Gaussian density” had been defined on the Poincaré unit disk with the mandatory property of being covariant under the action of the SU(1,1) Lie group, which acts transitively on this homogeneous bounded domain. We have applied the previous model via computation of the moment map and developed the full computation of this extended Gauss density for the SU(1,1) Lie group, and then deduced as a consequence the Gauss density for the Poincaré unit disk, considered as the homogeneous symplectic manifold associated with the coadjoint orbit of the SU(1,1) Lie group via the KKS 2-form. Considering the Lie algebra $\mathfrak{su}(1,1)$ and its dual space $\mathfrak{su}^*(1,1)$, we have computed the moment map that maps the Poincaré unit disk $D$ into a coadjoint orbit in $\mathfrak{su}^*(1,1)$. The moment map is a diffeomorphism of $D$ onto one sheet of a two-sheeted hyperboloid in $\mathfrak{su}^*(1,1)$. But the full SU(1,1) Lie group is not related to any equilibrium Gibbs state (the open subset of the Lie algebra associated with this Gibbs state is empty). We have then considered one-parameter subgroups of the Lie group for which this open subset is not empty. In the neighborhood of the identity element, the elements of SU(1,1) can be written as the exponential of an element of its Lie algebra, and the exponential map can be developed by a Taylor expansion of the exponential function.
We can observe that one condition is that then the subset to consider is given by the subset such that . Finally, we have computed the covariant Gibbs density in the unit disk given by and by the moment map of the Lie group , that could be expressed in the following equation: . To write the final Gibbs density with respect to its statistical moment, we rewrite the density with , by where and .
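The development of the exponential map by a Taylor expansion can be illustrated numerically. The sketch below relies on a general fact that is an assumption with respect to the text (it is not claimed to be the exact relation used by the author): for any traceless 2 × 2 matrix, the Cayley-Hamilton theorem gives a square proportional to the identity, so the exponential series sums to a closed form. The parametrization of the Lie algebra elements and the function names (`su11_algebra_element`, `exp_closed_form`, `exp_taylor`) are hypothetical.

```python
import numpy as np

def su11_algebra_element(a, b):
    """Element [[i a, b], [conj(b), -i a]] of su(1,1), with a real and b complex."""
    return np.array([[1j * a, b], [np.conj(b), -1j * a]], dtype=complex)

def exp_closed_form(X):
    """exp(X) for a traceless 2x2 matrix, using X @ X = -det(X) * I."""
    r = np.sqrt(complex(-np.linalg.det(X)))
    if abs(r) < 1e-12:
        return np.eye(2, dtype=complex) + X  # degenerate case: X @ X vanishes
    return np.cosh(r) * np.eye(2) + (np.sinh(r) / r) * X

def exp_taylor(X, terms=40):
    """Truncated Taylor series of the matrix exponential, used as a reference."""
    out = np.eye(2, dtype=complex)
    term = np.eye(2, dtype=complex)
    for k in range(1, terms):
        term = term @ X / k
        out = out + term
    return out

if __name__ == "__main__":
    X = su11_algebra_element(0.3, 0.8 - 0.5j)
    g = exp_closed_form(X)
    print(np.allclose(g, exp_taylor(X)))
    print(abs(g[0, 0]) ** 2 - abs(g[0, 1]) ** 2)  # equals 1 up to rounding
```

The same closed form covers both the hyperbolic case (real square root) and the elliptic case (purely imaginary square root), because the hyperbolic functions are evaluated on a complex argument.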
To extend this approach for covariant Gibbs density on Siegel Unit Disk , that is a classical matrix extension of Poincaré unit Disk, we have proposed to consider unitary group and the homogeneous space and the moment map given by .
After Lie group (case with null cohomology), we have considered the same model for Lie group with non-null cohomology that needs the use of symplectic one-cocycle to manage the defect of cohomology. We have considered the special Euclidean group with , and the Lie algebra of with , to define the moment map that is given by the expression with . Then, the Gibbs density is deduced for generalized temperature by , with the log-partition function given by the following expression with and where . To obtain the good parametrization related to statical moments, we have inverted the relation , to provide the covariant Gibbs density parametrized by . The final Gauss density for SE(2) is then .
We conclude the paper with a deeper study of the structure of the Souriau model. We observe that the Souriau Entropy $S(Q)$ defined on the coadjoint orbit of the group has a property of invariance, $S(Ad_g^{\#}(Q)) = S(Q)$, with respect to Souriau’s affine definition of the coadjoint action, $Ad_g^{\#}(Q) = Ad_g^{*}(Q) + \theta(g)$, where $\theta(g)$ is called the Souriau cocycle. In the framework of Souriau Lie groups Thermodynamics, we can then characterize the Entropy as a generalized Casimir invariant function in coadjoint representation, and the Massieu characteristic function (or log-partition function), dual of the Entropy by Legendre transform, as a generalized Casimir function in adjoint representation. When
$M$ is a Poisson manifold, a function on $M$ is a Casimir function if and only if this function is constant on each symplectic leaf (the non-empty open subsets of the symplectic leaves are the smallest embedded manifolds of $M$ which are Poisson submanifolds) [15]. Classically, the Entropy is defined axiomatically, as Shannon or von Neumann Entropies, without any geometric structure constraints. In this paper, the Entropy is also presented as the solution of the Casimir equation
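$ad^{*}_{\frac{\partial S}{\partial Q}} Q + \Theta\left(\frac{\partial S}{\partial Q}\right) = 0$ with $\tilde{\Theta}(X, Y) = \langle \Theta(X), Y \rangle$, where $\Theta = T_e \theta$ appears in the case of non-null cohomology (non-equivariance of the coadjoint operator on the moment map), with $\theta(g)$ the Souriau symplectic cocycle. The dual space of the Lie algebra foliates into coadjoint orbits that are also the level sets of the entropy. The KKS (Kostant-Kirillov-Souriau) 2-form and the Souriau-Koszul-Fisher metric turn each orbit into a homogeneous symplectic manifold. The information manifold foliates into level sets of the entropy, which can be interpreted in the framework of Thermodynamics by the fact that motion remaining on these surfaces is non-dissipative, whereas motion transversal to these surfaces is dissipative, where the dynamics is given by $\frac{dQ}{dt} = ad^{*}_{\frac{\partial H}{\partial Q}} Q + \Theta\left(\frac{\partial H}{\partial Q}\right)$ with stable equilibrium when $H = S$. We have finally also observed that $\frac{dS}{dt} = \tilde{\Theta}_{\beta}\left(\frac{\partial H}{\partial Q}, \beta\right)$ where $\tilde{\Theta}_{\beta}\left(\frac{\partial H}{\partial Q}, \beta\right) = \tilde{\Theta}\left(\frac{\partial H}{\partial Q}, \beta\right) + \left\langle Q, \left[\frac{\partial H}{\partial Q}, \beta\right] \right\rangle$, showing that Entropy production is linked with the Souriau tensor related to the Fisher metric.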
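The two characterizations of a Casimir function used above (invariance under the coadjoint action, and vanishing of the infinitesimal coadjoint action taken along its own gradient) can be checked numerically on a textbook case. The sketch below uses the rotation group SO(3), which is an assumption chosen for illustration and not one of the groups treated in this paper: its cohomology is null, so no cocycle term appears, and the dual of its Lie algebra is identified with three-dimensional space, where the coadjoint action is the usual rotation of vectors. The function names and the sign convention of the infinitesimal action are hypothetical.

```python
import numpy as np

def rotation(axis, angle):
    """Rodrigues formula for a rotation matrix in SO(3)."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K

def casimir(Q):
    """Quadratic Casimir function on the dual of so(3), identified with R^3."""
    return 0.5 * np.dot(Q, Q)

def grad_casimir(Q):
    """Gradient of the quadratic Casimir function."""
    return np.asarray(Q, dtype=float)

def coad_infinitesimal(X, Q):
    """Infinitesimal coadjoint action of X on Q for so(3) (sign convention assumed)."""
    return np.cross(Q, X)

if __name__ == "__main__":
    Q = np.array([1.0, 2.0, 3.0])
    R = rotation([1.0, 1.0, 0.0], 0.8)
    print(casimir(R @ Q) - casimir(Q))              # zero up to rounding: invariance on the orbit
    print(coad_infinitesimal(grad_casimir(Q), Q))   # zero vector: Casimir equation without cocycle
```

Any smooth function of this quadratic invariant passes the same two checks, which is the sense in which a function constant on coadjoint orbits solves the Casimir equation; in the non-null cohomology case the cocycle term would have to be added to the infinitesimal action.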
The Casimir equations that we have introduced in the non-zero cohomology case are consequences of the constancy of the entropy on adjoint orbits of the Lie algebra and of the equivariance of the map between the set of generalized temperatures and the dual space of the Lie algebra, as introduced by Jean-Marie Souriau in his 1974 paper. We explain this fact in the paper by deriving the Casimir equations from the Souriau equation. The Casimir equations are then presented, in this context, as a fully equivalent form written in a new way, especially in the framework of Souriau Lie groups Thermodynamics. Souriau did not explicitly observe that the Entropy is an invariant Casimir function in coadjoint representation, but we can assume that he was fully aware of this invariant structure.
From the Souriau equation $\langle Q, [\beta, Z] \rangle + \tilde{\Theta}(\beta, Z) = 0$, published in 1974, we have rewritten this equation, as a direct consequence, in a Casimir form $ad^{*}_{\frac{\partial S}{\partial Q}} Q + \Theta\left(\frac{\partial S}{\partial Q}\right) = 0$. This equation preserves the geometric structures included in the Souriau equation but allows us to consider the Entropy from the point of view of a Casimir invariant function. The concept of Entropy and the concept of Casimir function were, until now, two disjoint concepts that had been developed independently. There is a large literature on Casimir functions, especially the Russian literature, which has characterized their properties. We refer to Igor V. Shirokov, who has proposed a method for constructing invariants of the coadjoint representation of Lie groups of arbitrary dimension and structure, based on local symplectic coordinates on the coadjoint orbits. With Oleg L. Kurnyavko, Igor V. Shirokov has also proposed a general method for constructing invariant Casimir functions. The second reference is to A.T. Fomenko and V.V. Trofimov, who have also deeply studied Casimir functions (but in the case of null cohomology) and have developed the following equation, which we can write for the Entropy in the null cohomology case
with
a representation of Lie algebras defined on basis
in
. We refer to a twin paper [
17] developing consequences of this new definition of Entropy as an invariant Casimir function. In this twin paper, we study the associated Euler-Poincaré equation
and the stochastic extension based on a new Stratonovich differential equation for the stochastic process, given by the following relation by means of Souriau’s symplectic cocycle
. This kind of stochastic equation has also been studied by Alexis Arnaudon and Daryl Holm, but only in the restricted case of null cohomology [
32].
We give references ranging from classical textbooks (such as Souriau’s book and papers) to preprints, because different approaches have been developed in parallel to address Lie group statistics since as early as the middle of the last century, but without bridges between these disciplines, each of which has developed specific tools to address this problem. We have limited these references to the main documents, those that are seminal or serve as tutorials in their domains. We have preserved references in French, because some works, such as Souriau’s Lie groups Thermodynamics model, have not yet been widely disseminated among the different communities.
3. Learning Inference Lie Groups Thermodynamics and Covariant Gibbs Density
We identify the Riemannian metric introduced by Souriau on the basis of cohomology, in the framework of “Lie groups thermodynamics”, as an extension of the classical Fisher metric introduced in information geometry. We have observed that the Souriau metric preserves the Fisher metric structure as the Hessian of the minus logarithm of a partition function, where the partition function is defined as a generalized Laplace transform on a sharp convex cone. Souriau’s definition of the Fisher metric extends the classical one to the case of Lie groups or homogeneous manifolds. Souriau developed this “Lie groups thermodynamics” theory in the framework of homogeneous symplectic manifolds in geometric statistical mechanics for dynamical systems but, as he observed, the model equations are no longer tied to the symplectic manifold: they depend only on the Lie group and the associated cocycle [
33,
34]. This analogy with the Fisher metric opens potential applications in machine learning, where the Fisher metric is used in the framework of information geometry to define the “natural gradient”, a tool that reduces the sensitivity of ordinary stochastic gradient descent to rescaling or changes of variable in parameter space. In machine learning revised by the natural gradient of information geometry, the ordinary gradient is modified to integrate the Fisher matrix. Amari has theoretically proved the asymptotic optimality of the natural gradient compared to the classical gradient. With the Souriau approach, the Fisher metric could be extended, through the Souriau-Fisher metric, to design natural gradients for data on homogeneous manifolds. Information geometry has been derived from the invariant geometrical structure involved in statistical inference. The Fisher metric defines a Riemannian metric as the Hessian of two dual potential functions, linked to dually coupled affine connections in a manifold of probability distributions. With the Souriau model, this structure is extended, preserving the Legendre transform between two dual potential functions parametrized in the Lie algebra of the group acting transitively on the homogeneous manifold.
3.1. Inference by Natural Gradient and Legendre Structure
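Classically, the parameter $\theta$ of a probabilistic model is optimized from a sequence of observations $y_t$ by online gradient descent:
$$\theta_t \leftarrow \theta_{t-1} - \eta_t \frac{\partial l_t(y_t)^T}{\partial \theta}$$
with learning rate $\eta_t$ and loss function $l_t = -\log p(y_t \mid \hat{y}_t)$. This simple gradient descent has two drawbacks: it uses the same non-adaptive learning rate for all parameter components, and it is not invariant with respect to parameter re-encoding, which induces different learning rates. Amari introduced the natural gradient to restore this invariance, making the update insensitive to the characteristic scale of each parameter direction. The gradient descent can be corrected by $I(\theta)^{-1}$, where $I(\theta)$ is the Fisher information matrix with respect to the parameter $\theta$, given by:
$$I(\theta) = [g_{ij}], \quad g_{ij} = -E_{y \sim p(y \mid \theta)}\left[\frac{\partial^2 \log p(y \mid \theta)}{\partial \theta_i \partial \theta_j}\right]$$
with natural gradient:
$$\theta_t \leftarrow \theta_{t-1} - \eta_t I(\theta)^{-1} \frac{\partial l_t(y_t)^T}{\partial \theta}$$
Amari proved that the Riemannian metric in an exponential family is the Fisher information matrix defined by:
$$g_{ij} = -\left[\frac{\partial^2 \Phi}{\partial \theta_i \partial \theta_j}\right]_{ij}, \quad \Phi(\theta) = -\log \int e^{-\langle \theta, y \rangle} dy$$
and the dual potential, the Shannon entropy, is given by the Legendre transform:
$$S(\eta) = \langle \theta, \eta \rangle - \Phi(\theta), \quad \eta_i = \frac{\partial \Phi(\theta)}{\partial \theta_i}, \quad \theta_i = \frac{\partial S(\eta)}{\partial \eta_i}$$
We can observe that $\Phi(\theta) = -\log \psi(\theta)$, with $\psi(\theta) = \int e^{-\langle \theta, y \rangle} dy$, is linked with the cumulant generating function.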
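The invariance argument for the natural gradient can be checked on a toy case. The sketch below is only an illustration under assumptions that are not in the text: a scalar Gaussian model $y \sim N(\theta, \sigma^2)$ with known $\sigma$, for which the Fisher information is $1/\sigma^2$, and a hypothetical re-encoding $\phi = c\,\theta$ of the parameter. All function names are illustrative.

```python
def loss_grad_theta(theta, y, sigma):
    """d/dtheta of l = (y - theta)^2 / (2 sigma^2), i.e. -log p(y | theta) up to a constant."""
    return -(y - theta) / sigma ** 2


def plain_step(phi, y, lr, scale, sigma):
    """Ordinary gradient step written in the re-encoded coordinate phi = scale * theta."""
    theta = phi / scale
    grad_phi = loss_grad_theta(theta, y, sigma) / scale  # chain rule
    return phi - lr * grad_phi


def natural_step(phi, y, lr, scale, sigma):
    """Natural gradient step: the gradient is multiplied by the inverse Fisher information."""
    theta = phi / scale
    grad_phi = loss_grad_theta(theta, y, sigma) / scale
    fisher_phi = 1.0 / (sigma ** 2 * scale ** 2)  # I(phi) = I(theta) / scale^2
    return phi - lr * grad_phi / fisher_phi


def run(step, observations, scale, lr=0.1, sigma=2.0, theta0=0.0):
    """Run the online update in the phi coordinate and report the result as theta."""
    phi = scale * theta0
    for y in observations:
        phi = step(phi, y, lr, scale, sigma)
    return phi / scale


OBSERVATIONS = [1.0, 2.5, 0.5, 3.0, 1.5]
```

Calling `run(natural_step, OBSERVATIONS, 1.0)` and `run(natural_step, OBSERVATIONS, 10.0)` gives the same estimate of $\theta$ up to rounding, whereas the two corresponding `plain_step` runs differ, which is the re-encoding sensitivity described above.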
J.L. Koszul and E. Vinberg have introduced an affinely invariant Hessian metric on a sharp convex cone through its characteristic function:
Jean-Louis Koszul has introduced the following forms
2nd Koszul form:
with the following property of positive definiteness:
Koszul has defined the following diffeomorphism:
with preservation of Legendre transform:
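As a one-dimensional illustration of these structures (a worked example added here, with the cone $\Omega = (0, +\infty)$ and the notation $\psi_\Omega$, $\alpha$, $\gamma$, $x^*$ chosen as assumptions for the sketch), every object can be computed in closed form:

```latex
% Sharp convex cone \Omega = (0,+\infty), dual cone \Omega^* = (0,+\infty)
\psi_\Omega(x) = \int_0^{+\infty} e^{-x\xi}\, d\xi = \frac{1}{x}, \qquad x \in \Omega
% First form and Hessian (second) form of \log \psi_\Omega
\alpha = d \log \psi_\Omega(x) = -\frac{dx}{x}, \qquad
\gamma = D\alpha = \frac{dx^2}{x^2} > 0
% Diffeomorphism of \Omega onto \Omega^*
x^* = -d \log \psi_\Omega(x) = \frac{1}{x}
% Legendre duality with \Phi(x) = -\log \psi_\Omega(x) = \log x
\Phi^*(x^*) = \langle x, x^* \rangle - \Phi(x) = 1 + \log x^*, \qquad
\frac{d\Phi^*}{dx^*} = \frac{1}{x^*} = x
```

The Hessian form is positive on the whole cone, and the map $x \mapsto x^*$ is its own inverse, so the Legendre pair $(\Phi, \Phi^*)$ exchanges the two variables as expected.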
3.2. Souriau Lie Groups Thermodynamics and Souriau-Koszul-Fisher Metric
These relations have been extended by Jean-Marie Souriau in geometric statistical mechanics, where he developed a “Lie groups thermodynamics” of dynamical systems in which the (maximum entropy) Gibbs density is covariant with respect to the action of the Lie group. In the Souriau model, the previous structures of information geometry are preserved:
In the Souriau Lie groups thermodynamics model,
is a “geometric” (Planck) temperature, element of Lie algebra
of the group, and
is a “geometric” heat, element of the dual space of the Lie algebra
of the group. Souriau has proposed a Riemannian metric that we have identified as a generalization of the Fisher metric:
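Souriau proved that every coadjoint orbit of a Lie group, given by $O_F = \{Ad_g^* F, g \in G\} \subset \mathfrak{g}^*$ with $F \in \mathfrak{g}^*$, carries a natural homogeneous symplectic structure defined by a closed G-invariant 2-form. If we define $K = Ad_g^* = (Ad_{g^{-1}})^*$ and $K_*(X) = -(ad_X)^*$ with $\langle Ad_g^* F, Y \rangle = \langle F, Ad_{g^{-1}} Y \rangle$, $\forall g \in G, Y \in \mathfrak{g}, F \in \mathfrak{g}^*$, where, if $X \in \mathfrak{g}$, $Ad_g(X) = gXg^{-1} \in \mathfrak{g}$, the G-invariant 2-form is given by the expression $\sigma_\Omega(ad_X F, ad_Y F) = B_F(X,Y) = \langle F, [X,Y] \rangle$, $X, Y \in \mathfrak{g}$. Souriau’s Fundamental Theorem states that « Every symplectic manifold on which a Lie group acts transitively by a Hamiltonian action is a covering space of a coadjoint orbit ». We can observe that, in the Souriau model, the Fisher metric is an extension of this 2-form to the non-equivariant case: $g_\beta([\beta,Z_1],[\beta,Z_2]) = \tilde{\Theta}(Z_1,[\beta,Z_2]) + \langle Q, [Z_1,[\beta,Z_2]] \rangle$.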
The Souriau additional term
is generated by non-equivariance through Symplectic cocycle. The tensor
used to define this extended Fisher metric is defined by the moment map
, application from
(homogeneous symplectic manifold) to the dual space of the Lie algebra
, given by:
This tensor
is also defined in tangent space of the cocycle
(this cocycle appears due to the non-equivariance of the coadjoint operator
, action of the group on the dual space of the Lie algebra; the action of the group on the dual space of the Lie algebra is modified with a cocycle so that the momentum map becomes equivariant relative to this new affine action):
is called nonequivariance one-cocycle, and it is a measure of the lack of equivariance of the moment map.
The cocycle should verify:
We can also compute tangent of one-cocycle
at neutral element, to compute 2-cocycle
:
We can also write:
By differentiating the equation on affine action, we have:
It can be then deduced that the tensor could be also written:
with the cocycle property:
By noting the action of the group on the dual space of the Lie algebra:
Associativity is also derived:
This study of the moment map equivariance, and the existence of an affine action of G on $\mathfrak{g}^*$, whose linear part is the coadjoint action and for which the moment map is equivariant, is the cornerstone of Souriau’s theory of geometric mechanics and Lie groups thermodynamics.
3.3. Souriau Entropy and Souriau-Fisher-Koszul Metric Invariance under the Action of the Group and Covariant Souriau Gibbs Density
In Souriau’s Lie groups thermodynamics, the invariance by re-parameterization in information geometry has been replaced by invariance with respect to the action of the group. An element of the group
acts on the element
of the Lie algebra, given by adjoint operator
. Under the action of the group
, the entropy
and the Fisher metric
are invariant:
In the framework of Lie group action on a symplectic manifold, equivariance of moment map could be studied to prove that there is a unique action
a(.,.) of the Lie group
on the dual
of its Lie algebra for which the moment map
is equivariant, that means for each
:
When the coadjoint action is not equivariant, the symmetry is broken, and new “cohomological” relations should be verified in the Lie algebra of the group. A natural equilibrium state will thus be characterized by an element of the Lie algebra of the Lie group, determining the equilibrium temperature $\beta$. The entropy $s(Q)$, parametrized by the geometric heat $Q$ (mean of the energy $U$, element of the dual space of the Lie algebra), is defined by the Legendre transform of the Massieu potential $\Phi(\beta)$ parametrized by $\beta$ ($\Phi(\beta)$ is the minus logarithm of the partition function $\psi_\Omega(\beta)$).
A Gibbs state, in the usual sense, is a statistical state at which the entropy is stationary with respect to all infinitesimal variations of the statistical state for which the mean value of the energy remains constant. In the sense of Souriau, a generalized Gibbs state is a statistical state at which the entropy is stationary with respect to all infinitesimal variations of the statistical state for which the mean value of the moment map remains constant. This generalization is very natural, since the energy can be considered as the moment map of the Hamiltonian action of the one-dimensional Lie group of time translations. Furthermore, each generalized Gibbs state is associated to an element of the Lie algebra of the group, called by Souriau a generalized temperature. The set of possible generalized temperatures is not, in general, the whole Lie algebra, but an open convex subset of the Lie algebra, possibly empty, on which some integrals encountered in the expression of the generalized Gibbs state are normally convergent. So, for some Lie groups, generalized Gibbs states do not exist, and there is no Souriau Lie groups thermodynamics.
Souriau has then defined a Gibbs density that is covariant under the action of the group:
We can express the Gibbs density with respect to
by inverting the relation
. Then
with
. All Souriau equations of Lie groups Thermodynamics are illustrated in
Figure 3 and
Figure 4.
Souriau completed his “geometric heat theory” by introducing a 2-form in the Lie algebra, that is a Riemannian metric tensor in the values of adjoint orbit of
,
with
an element of the Lie algebra. This metric is given for
:
where
is a cocycle of the Lie algebra, defined by
with
a cocycle of the Lie group defined by
.
We observe that Souriau Riemannian metric, introduced with symplectic cocycle, is a generalization of the Fisher metric, that we call the Souriau-Fisher metric, that preserves the property to be defined as a Hessian of the partition function logarithm
as in classical information geometry. We will establish the equality of two terms, between Souriau definition based on Lie group cocycle
and parameterized by “geometric heat”
Q (element of the dual space of the Lie algebra) and “geometric temperature”
β (element of Lie algebra) and hessian of characteristic function
with respect to the variable
β (as illustrated in
Figure 5):
If we differentiate this relation of Souriau theorem
, this relation occurs:
As the entropy is defined by the Legendre transform of the characteristic function, a dual metric of the Fisher metric is also given by the hessian of “geometric entropy” with respect to the dual variable given by Q: .
For the maximum entropy density (Gibbs density), the following three terms coincide:
that describes the convexity of the log-likelihood function,
the Fisher metric that describes the covariance of the log-likelihood gradient, whereas
that describes the covariance of the observables. We can also observe that the Fisher metric
is exactly the Souriau metric defined through symplectic cocycle:
The Fisher metric has been considered by Souriau as a generalization of “heat capacity”. Souriau called it the “geometric capacity”.
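The coincidence between the Hessian of the Massieu potential and the covariance of the observable can be checked numerically in the simplest scalar setting. The sketch below is an illustration under assumptions that are not in the text: a finite set of hypothetical energy levels, a scalar temperature $\beta$, the partition function $\psi(\beta) = \sum_i e^{-\beta E_i}$ and $\Phi(\beta) = -\log \psi(\beta)$.

```python
import math


def partition(beta, energies):
    """psi(beta) = sum_i exp(-beta * E_i)."""
    return sum(math.exp(-beta * e) for e in energies)


def massieu(beta, energies):
    """Massieu potential Phi(beta) = -log psi(beta)."""
    return -math.log(partition(beta, energies))


def gibbs(beta, energies):
    """Maximum entropy (Gibbs) probabilities p_i = exp(-beta * E_i) / psi(beta)."""
    z = partition(beta, energies)
    return [math.exp(-beta * e) / z for e in energies]


def heat(beta, energies):
    """Q = mean energy, which equals dPhi/dbeta."""
    return sum(p * e for p, e in zip(gibbs(beta, energies), energies))


def fisher_as_variance(beta, energies):
    """Covariance of the observable: Var(E) under the Gibbs density."""
    q = heat(beta, energies)
    return sum(p * (e - q) ** 2 for p, e in zip(gibbs(beta, energies), energies))


def fisher_as_hessian(beta, energies, h=1e-4):
    """I(beta) = -d^2 Phi / d beta^2, estimated by central differences."""
    return -(massieu(beta + h, energies) - 2.0 * massieu(beta, energies)
             + massieu(beta - h, energies)) / h ** 2


def entropy(beta, energies):
    """Legendre transform s(Q) = <beta, Q> - Phi(beta)."""
    return beta * heat(beta, energies) - massieu(beta, energies)


def shannon(beta, energies):
    """Shannon entropy -sum p log p of the Gibbs density, for comparison."""
    return -sum(p * math.log(p) for p in gibbs(beta, energies))
```

In this scalar case the "geometric capacity" reduces to the variance of the energy, and the Legendre transform of $\Phi$ reproduces the Shannon entropy of the Gibbs density.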
3.4. Covariant Souriau Gibbs Density and Information Manifold Foliation
R.F. Streater studied in 1999 Information Geometry for some Lie algebras: for certain unitary representations of a Lie algebra, he defined the statistical manifold of states as the convex cone for which the partition function is finite, making reference to the Bogoliubov-Kubo-Mori metric. However, Streater only developed the case with null cohomology, for the so(3) and sl(2,R) Lie algebras. Nevertheless, as observed by R.F. Streater in his paper “
Information Geometry for some Lie algebras” [
35], referring to Kirillov work and Roger Balian paper, “
We can expect further natural structures to arise in this case. Indeed, it is known (*) that the dual to the Lie algebra, which parametrizes the state-space in this case, foliates into coadjoint orbits; there are also the level sets on the entropy; Kirillov form, and the BKM (Bogoliubov-Kubo-Mori) metric, together make each orbit into kähler space, along the lines proposed by Kostant. Motion along these holomorphic directions is nondissipative. The transversal to the orbits is a real half-line, which represents the dissipative direction…We study the case of sl (2,R) in the discrete series of representations. We show the information manifold foliates into level sets of the entropy, each being isometric to H, the Poincaré upper half-plane… The states of constant entropy are the hyperboloids and is the dissipative coordinate… For an integrable system described by a Lie algebra in a traceable representation, we find that the information manifold foliates into complex spaces; the level sets of entropy can be given a complex structure by the method of Kostant. Motion remaining on the complex surfaces is nondissipative, whereas motion transversal to these surfaces is dissipative. In information geometry, the state is parametrized by the canonical coordinates. Which function of them is measured by a thermometer? In our models, it is reasonable to designate to be the temperature; it is a dissipative coordinate, and it increases with time, showing that the system is thermalizing”.
4. Mathematical Definition of Souriau Moment Map
Previously, we have introduced the concept of Souriau’s moment map. In this section, we give a mathematical definition of this tool, as defined in Souriau’s book [
36] with modern notations [
37,
38,
39,
40,
41]. Other details on moment map are also given in Jean-Louis Koszul’s Book [
42].
4.1. Operations on Vector Fields
Consider a map
,
, the derivative of
at
,
is given by:
Second derivative is given by the linear map
:
Consider a vector Field
on
defined by:
, operations on vector fields are given by adjoint action and Lie bracket:
A 0-form is a scalar; 1-forms are row
in dual space. 2-forms can be regarded as antisymmetric matrices
with
. m-forms are all scalar multiples of the standard volume form vol, defined by
.
4.2. Derivative Rules by Sophus Lie, Elie Cartan and Henri Cartan
With the following classical definitions:
Pull back:
is a p-form on
Interior product:
is the (p−1)form on
obtained by inserting
as the first argument of
Exterior product: is the (p + 1)-form on
where
is a p-form and
is a 1-form on
(where the hat indicates a term to be omitted):
Lie derivative:
is a p-form on
, and
if the flow of
consists of symmetries of
:
is the (p+1)-form on
defined by taking the ordinary derivative of
and then antisymmetrizing:
From these definitions, the properties of the exterior and Lie Derivative were established by Sophus Lie, Elie Cartan, and Henri Cartan:
(Sophus Lie equation)
4.3. Souriau Moment Map
Considering manifolds and Lie groups, we define the tangent bundle
of
as the disjoint union of the
, or the set of all pairs
with
and
. If
is a smooth map between manifolds, its tangent map is the map:
A Lie group is a group
with a manifold structure such that the product
and the inversion
are smooth maps from
(resp. G) to
. Its Lie algebra is the tangent space
at the identity element. A smooth action of
on a manifold
is a group morphism:
The orbit of is .
The tangent space to an orbit at :
with
and where
Let
be a connected symplectic manifold. A vector field
on
is called symplectic if its flow preserves the 2-form:
. If we use Elie Cartan’s formula, we can deduce that
but as
then
. We observe that the 1-form
is closed. When this 1-form is exact, there is a smooth function
on
with:
This vector field is called Hamiltonian and can be defined as the symplectic gradient of this function.
Let $G$ be a Lie group that acts on $M$ and preserves $\omega$. A moment map exists if the infinitesimal generators of this action are actually Hamiltonian, so that a map from $M$ to the dual space of the Lie algebra exists with:
The Poisson bracket of two functions
,
is defined by:
If is connected, then the moment map is G-equivariant if and only if it satisfies .
Souriau proved that every coadjoint orbit of a Lie group is a homogeneous symplectic manifold when endowed with the KKS 2-form, and conversely, every homogeneous symplectic manifold of a connected Lie group G is, up to a possible covering, a coadjoint orbit of some central extension of G. The KKS 2-form is G-invariant.
5. Poincaré Unit Disk, SU(1,1) Lie Group and Souriau Moment Map
We will introduce the Souriau moment map for the SU(1,1) Lie group, which acts transitively on the Poincaré unit disk, identified with SU(1,1)/K. More details on the computation of this moment map are given in
Appendix A of this document.
5.1. Poincaré Unit Disk and SU(1,1) Lie Group
The group of complex unimodular pseudo-unitary matrices
, is the set of elements
such that [
43,
44,
45,
46,
47,
48,
49,
50,
51,
52]:
We can show that the most general matrix
belongs to the Lie group given by:
Its Cartan decomposition is given by:
is associated to group of holomorphic automorphisms of the Poincaré unit disk
in the complex plane, by considering its action on the disk as
. The following measure on Unit disk:
is invariant under the action of
captured by the fractional holomorphic transformation:
The complex unit disk admits a Kähler structure determined by potential function:
The invariant 2-form is:
which is closed
. This group
is isomorphic to the group
as a real Lie group, and the Lie algebra
is given by:
with the bases
:
with the commutation relation:
Dual base on the dual space of the Lie algebra is named
. The dual vector space
can be identified with the subspace of
of the form:
Coadjoint action of on dual space of the Lie algebra is written .
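The action of SU(1,1) on the unit disk by fractional transformations can be explored numerically. The sketch below is an illustration under stated assumptions: group elements are taken in the form $\begin{pmatrix} a & b \\ \bar{b} & \bar{a} \end{pmatrix}$ with $|a|^2 - |b|^2 = 1$ and stored as the pair `(a, b)`, the action is $z \mapsto (az + b)/(\bar{b}z + \bar{a})$, and the parametrization by `t`, `phi`, `psi` is a hypothetical convenience, not the paper's notation.

```python
import cmath
import math


def su11_element(t, phi, psi):
    """Build (a, b) with |a|^2 - |b|^2 = cosh(t)^2 - sinh(t)^2 = 1."""
    a = math.cosh(t) * cmath.exp(1j * phi)
    b = math.sinh(t) * cmath.exp(1j * psi)
    return (a, b)


def compose(g, h):
    """Matrix product of [[a1, b1], [conj b1, conj a1]] and [[a2, b2], [conj b2, conj a2]]."""
    a1, b1 = g
    a2, b2 = h
    return (a1 * a2 + b1 * b2.conjugate(), a1 * b2 + b1 * a2.conjugate())


def act(g, z):
    """Fractional transformation of the disk: z -> (a z + b) / (conj(b) z + conj(a))."""
    a, b = g
    return (a * z + b) / (b.conjugate() * z + a.conjugate())


def scaling_factor(g, z):
    """|conj(b) z + conj(a)|^2, which relates 1 - |g.z|^2 to 1 - |z|^2."""
    a, b = g
    return abs(b.conjugate() * z + a.conjugate()) ** 2
```

For any `g` built this way and any `z` with `abs(z) < 1`, the image `act(g, z)` stays in the disk and satisfies `1 - abs(act(g, z))**2 == (1 - abs(z)**2) / scaling_factor(g, z)` up to rounding, which is the identity behind the invariance of the disk measure; `act(compose(g, h), z)` agrees with `act(g, act(h, z))`.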
5.2. Coadjoint Orbit of SU(1,1) and Souriau Moment Map
We will use results of C. Cishahayo and S. de Bièvre [
53] and B. Cahen [
54,
55] for computation of moment map of
. Let
, orbit
of
for the coadjoint action of
could be identified with the upper half sheet
of
, the two-sheet hyperboloid. The stabilizer of
for the coadjoint action of
is torus
.
K induces rotations of the unit disk, and leaves 0 invariant. The stabilizer for the origin 0 of unit disk is maximal compact subgroup
K of
SU(1,1). We can observe [
54] that
. On the other hand
is diffeomorphic to the unit disk
, then by composition, the Souriau moment map is given by:
is linked to the natural action of
on
(by fractional linear transforms) but also the coadjoint action of
on
.
could be interpreted as the stereographic projection from the two-sphere
onto
[
56]. In case
where
then the coadjoint orbit is given by
with
, with stabilizer of
for coadjoint action the torus
with Lie algebra
.
is associated with a holomorphic discrete series representation
of
by the KKS (Kirillov-Kostant-Souriau) method of orbits.
Group
acts on
by homography
.
This action corresponds with coadjoint action of on . The Kirillov-Kostant-Souriau 2-form of
is given by:
and is associated in the frame by
with:
with the corresponding Poisson Bracket:
It has been also observed that there are 3 basic observables generating the
symmetry on classical level:
with the Poisson commutation rule:
vector points to the upper sheet of the two-sheeted hyperboloid in
given by
, whose stereographic projection onto the open unit disk is:
Under the action of :
is transformed into:
This transform can be viewed as the co-adjoint action of on the coadjoint orbit identified with . We can also observe that the quotient
is isomorphic to the upper sheet of the hyperboloid described by
, by the following parametrization
, given by
, and its stereographic projection onto the inside of the unit disk, parametrized by
.
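The correspondence between the upper sheet of the hyperboloid and the unit disk can be sketched as follows. The normalization $x_0^2 - x_1^2 - x_2^2 = 1$, the parametrization by `tau` and `phi`, and the projection point $(-1, 0, 0)$ are assumptions made for this illustration and may differ from the paper's conventions by a scale factor.

```python
import math


def hyperboloid_point(tau, phi):
    """Point of the upper sheet x0^2 - x1^2 - x2^2 = 1, x0 > 0."""
    return (math.cosh(tau),
            math.sinh(tau) * math.cos(phi),
            math.sinh(tau) * math.sin(phi))


def to_disk(x):
    """Stereographic projection from (-1, 0, 0) onto the open unit disk."""
    x0, x1, x2 = x
    return complex(x1, x2) / (1.0 + x0)


def from_disk(z):
    """Inverse projection: unit disk -> upper sheet of the hyperboloid."""
    r2 = abs(z) ** 2
    return ((1.0 + r2) / (1.0 - r2),
            2.0 * z.real / (1.0 - r2),
            2.0 * z.imag / (1.0 - r2))
```

With this normalization the projected point is `tanh(tau / 2) * exp(1j * phi)`, so the whole sheet is mapped inside the disk and the boundary circle corresponds to `tau` going to infinity.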
6. Covariant Gibbs Density by Souriau Thermodynamics for Poincaré Unit Disk
6.1. Fourier Transform, Laplace Transform and Lie Group Representation Theory
In Souriau Lie Groups Thermodynamics, we have to consider the Laplace transform defined on coadjoint orbits to define the Massieu potential function and the Gibbs density. This problem has been solved in the domain of Kirillov representation theory. Representation theory studies abstract algebraic structures by representing their elements as linear transformations of vector spaces, and algebraic objects (Lie groups, Lie algebras) by describing their elements by matrices and the algebraic operations in terms of matrix addition and matrix multiplication, reducing problems of abstract algebra to problems in linear algebra. Representation theory generalizes Fourier analysis via harmonic analysis. The modern development of Fourier analysis during the 20th century explored the generalization of the Fourier and Fourier-Plancherel formulas to non-commutative harmonic analysis, applied to locally compact non-Abelian groups. This has been solved by geometric approaches based on “orbit methods” (the Fourier-Plancherel formula for
G is given by coadjoint representation of
G in dual vector space of its Lie algebra) with many contributors (Dixmier, Kirillov, Bernat, Arnold, Berezin, Kostant, Souriau, Duflo, Guichardet, Torasso, Vergne, Paradan, etc.) [
57,
58,
59,
60,
61,
62,
63,
64,
65,
66,
67,
68].
For classical commutative harmonic analysis, we consider the following groups:
For non-commutative harmonic analysis, the unitary irreducible representation of the group is
with H Hilbert space and character by
. Fourier transform for non-commutative group is
with character
. If we describe group element with exponential map
, we have:
where
Kirillov Character formula is:
We will use Kirillov representation theory and his character formula to compute Souriau covariant Gibbs density in the unit Poincaré disk. For any Lie group
, a coadjoint orbit
has a canonical symplectic form
given by KKS 2-form. As seen, if
is finite dimensional, the corresponding volume element defines a
-invariant measure supported on
, which can be interpreted as a tempered distribution. The Fourier transform (where
d is the half of the dimension of the orbit O):
is Ad
-invariant. When
is an integral coadjoint orbit, Kirillov formula, given previously, expresses Fourier transform
by Kirillov character
:
is, as defined previously, the “
Kirillov character” of a unitary representation associated to the orbit.
6.2. Souriau Covariant Gibbs Density in Poincaré Unit Disk for SU(1,1) Lie Group
In the following, we give the full development to compute the Souriau covariant Gibbs density. As the Gibbs density is not defined for all geometric temperatures, as observed by Souriau, we have used his approach by considering a one-parameter subgroup of the Lie group, generated by the exponential map from one element of the Lie algebra given by the geometric temperature. The subset of the Lie algebra where the Gibbs density is defined is deduced from the constraints related to the generation of this one-parameter subgroup.
Considering the Lie group and its Lie algebra given by elements . A basis for this Lie algebra is with with .
The compact subgroup is generated by , while and generate a hyperbolic subgroup. The dual space of the Lie algebra is given by with the basis with .
Let consider be the open unit disk of Poincaré. For each , the pair is a symplectic homogeneous manifold with , where is invariant under the action: .
This action is transitive and is globally and strongly Hamiltonian. Its generators are the Hamiltonian vector fields associated to the functions:
The associated moment map
defined by
, maps
into a coadjoint orbit in
. Then, we can write the moment map as a matrix element of
:
The moment map is a diffeomorphism of onto one sheet of the two-sheeted hyperboloid in , determined by . We note the coadjoint orbit of , given by the upper sheet of the two-sheeted hyperboloid given by previous equation. The orbit method of Kostant-Kirillov-Souriau associates to each of these coadjoint orbits a representation of the discrete series of , provided that is a half integer greater or equal than 1 (). When explicitly executing the Kostant-Kirillov construction, the representation Hilbert spaces are realized as closed reproducing kernel subspaces of . The Kostant-Kirillov-Souriau orbit method shows that to each coadjoint orbit of a connected Lie group is associated a unitary irreducible representation of G acting in a Hilbert space H.
Souriau observed that the action of the full Galilean group on the space of motions of an isolated mechanical system is not related to any equilibrium Gibbs state (the open subset of the Lie algebra associated to this Gibbs state is empty). Souriau’s main idea was to define the Gibbs states for one-parameter subgroups of the Galilean group. We will use the same approach. In this case, we assume that the action of the Lie group
on the symplectic manifold (
M,
ω) (Poincaré unit disk) and its momentum map
are such that the open subset
is not empty. This condition is not always satisfied when (
M, ω) is a cotangent bundle, but of course it is satisfied when it is a compact manifold. The idea of Souriau is to consider a one parameter subgroup of
. A natural way to parametrize elements of
is through its Lie algebra. In the neighborhood of the identity element, the elements of
can be written as the exponential of an element
of its Lie algebra:
The condition
can be expanded for
and is equivalent to
which then implies
. We can observe that
and
contain 3 degrees of freedom, as required. Also because
, we get
. We can then exponentiate
with exponential map to get:
If we make the remark that
, we can develop the exponential map:
We can observe that one condition is that
then the subset to consider is
such that
. The generalized Gibbs states of the full
group do not exist. However, generalized Gibbs states for the one-parameter subgroups
,
, of the
group do exist. The generalized Gibbs state associated to
remains invariant under the restriction of the action to the one-parameter subgroup of
generated by
.
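The exponentiation of a Lie algebra element into a one-parameter subgroup can be checked numerically. The sketch below rests on assumptions that are not taken from the text: elements of the Lie algebra of SU(1,1) are written $\varepsilon = \begin{pmatrix} ir & \eta \\ \bar{\eta} & -ir \end{pmatrix}$ with $r$ real and $\eta$ complex, so that $\varepsilon^2 = (|\eta|^2 - r^2) I$ and the exponential has the closed form $\cosh(R) I + \frac{\sinh R}{R} \varepsilon$ with $R^2 = |\eta|^2 - r^2$. The function names are illustrative.

```python
import cmath


def algebra_element(r, eta):
    """Matrix [[i r, eta], [conj(eta), -i r]] with r real and eta complex."""
    eta = complex(eta)
    return [[1j * r, eta], [eta.conjugate(), -1j * r]]


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_closed_form(r, eta):
    """exp(eps) = cosh(R) I + (sinh(R) / R) eps, with R^2 = |eta|^2 - r^2."""
    eps = algebra_element(r, eta)
    big_r = cmath.sqrt(abs(complex(eta)) ** 2 - r ** 2)  # real or purely imaginary
    c = cmath.cosh(big_r)
    s = cmath.sinh(big_r) / big_r if abs(big_r) > 1e-12 else 1.0
    ident = [[1.0, 0.0], [0.0, 1.0]]
    return [[c * ident[i][j] + s * eps[i][j] for j in range(2)] for i in range(2)]


def exp_series(m, terms=60):
    """Truncated power series of the matrix exponential, used as a cross-check."""
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    term = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def is_in_su11(g, tol=1e-9):
    """Check the form [[a, b], [conj b, conj a]] with |a|^2 - |b|^2 = 1."""
    a, b = g[0][0], g[0][1]
    return (abs(g[1][0] - b.conjugate()) < tol
            and abs(g[1][1] - a.conjugate()) < tol
            and abs(abs(a) ** 2 - abs(b) ** 2 - 1.0) < tol)
```

The sign of $|\eta|^2 - r^2$ separates the hyperbolic case (real $R$, `cosh` and `sinh`) from the elliptic case (imaginary $R$, which turns them into `cos` and `sin`); both are handled by the complex square root.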
To go further, we will develop the Souriau Gibbs density from the Souriau moment map
and the Souriau temperature
. If we note
, we can write the moment map:
We can then write the covariant Gibbs density in the unit disk given by the moment map of the Lie group
and geometric temperature in its Lie algebra
:
To write the Gibbs density with respect to its statistical moments, we have to express the density with respect to
. Then, we have to invert the relation between
and
, to replace this last variable
by
where
with
, deduced from the Legendre transform. The mean moment map is given by:
This mean moment map can be obtained by Karcher mean computation on the sheet of the two-sheeted hyperboloid corresponding to the coadjoint orbit. For the dual pairing, we can observe that
with
and
with
.
The integral of normalization in Gibbs density could be computed through Kirillov character formula by
where
with following relation
.
Recently, Enrico De Micheli [
69] has introduced a Laplace-type transform (the so-called Spherical Laplace Transform) with a connection to the Non-Euclidean Fourier Transform in the sense of Helgason, and the principal series of the unitary representation of
SU(1,1).
6.3. Extension to SU (p,q) Unitary Group for Siegel Unit Disk
More details are given in
Appendix B, on parameterization of SU(1,1) and extension to SU (p,q). To address computation of covariant Gibbs density for Siegel Unit Disk, we will consider in this section
Unitary Group:
We can use the following decomposition for
:
and consider the action of
on Siegel Unit Disk
given by:
Benjamin Cahen has studied this case and introduced the moment map by identifying G-equivariantly
with
by means of the Killing form
on
:
The set of all elements of
fixed by
is
:
Then, the equivariant moment map is given by:
with:
7. Lie Groups Thermodynamics for SE(2) Lie Group
After
Lie group with null cohomology and then without Souriau one-cocycle, we will consider Souriau model for
Lie group with non-null cohomology and then with introduction of Souriau one-cocycle [
70].
We will consider first
Lie group:
A vector at the identity to
is given by:
We consider the special Euclidean group
.
The group operation is given by:
The Lie algebra
of
has underlying vector space
and Lie bracket:
Lie bracket is given by:
Adjoint action of
is given by:
Coadjoint action of
is given by:
The moment map
of
is defined by:
with the right action of
on
:
the infinitesimal generator of
has the expression:
Let
be the moment map of this action relative to the symplectic form, we can compute it from its definition:
We then compute the one-cocycle of
from the moment map:
Coadjoint orbit of
are generated by:
The Souriau Symplectic form in this case of non-null cohomology is given by:
With the expression of moment map, we can compute Souriau covariant Gibbs density of Maximum Entropy.
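The appearance of a one-cocycle for SE(2) can be verified numerically. The sketch below uses conventions chosen for the illustration, which may differ in sign or in left/right action from those of the paper: SE(2) acts on the plane on the left by $x \mapsto R_\varphi x + \tau$, the symplectic form is $dx_1 \wedge dx_2$, a Lie algebra element is a pair (rotation rate, translation velocity), and a dual element is stored as the triple `(m, p1, p2)`. The moment map `moment` and the helper names are assumptions of this sketch.

```python
import math


def rot(phi, v):
    """Rotate the 2-vector v by the angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def quarter_turn(v):
    """J v, rotation by +90 degrees."""
    return (-v[1], v[0])


def compose(g, h):
    """(phi_g, tau_g) . (phi_h, tau_h) = (phi_g + phi_h, R_g tau_h + tau_g)."""
    r = rot(g[0], h[1])
    return (g[0] + h[0], (r[0] + g[1][0], r[1] + g[1][1]))


def act(g, x):
    """Left action on the plane: x -> R_phi x + tau."""
    r = rot(g[0], x)
    return (r[0] + g[1][0], r[1] + g[1][1])


def coadjoint(g, mu):
    """Ad*_g (m, p) = (m + (R p).(J tau), R p), dual to Ad_{g^-1}."""
    rp = rot(g[0], (mu[1], mu[2]))
    jt = quarter_turn(g[1])
    return (mu[0] + rp[0] * jt[0] + rp[1] * jt[1], rp[0], rp[1])


def moment(x):
    """Moment map of the action for dx1 ^ dx2 (sketch convention): (-|x|^2 / 2, -J x)."""
    jx = quarter_turn(x)
    return (-0.5 * (x[0] ** 2 + x[1] ** 2), -jx[0], -jx[1])


def cocycle(g, x):
    """theta(g) = moment(g.x) - Ad*_g moment(x); it should not depend on x."""
    a = moment(act(g, x))
    b = coadjoint(g, moment(x))
    return tuple(ai - bi for ai, bi in zip(a, b))
```

With these conventions `cocycle(g, x)` returns the same triple for every `x`, namely $(-|\tau|^2/2, -J\tau)$, and it satisfies $\theta(gh) = \theta(g) + Ad^*_g\,\theta(h)$, which is the one-cocycle property measuring the lack of equivariance of the moment map.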
Considering the symplectic form on , we have seen that the action of SE(2) is symplectic and admits the momentum map, .
Souriau Gibbs density is defined for generalized temperature
and given by:
The Massieu Potential could be computed:
By derivation of Massieu potential, we can deduce expression of Heat:
We can then invert this relation to express the generalized temperature with respect to the heat:
We can then express the Gibbs density with respect to the Heat Q, which is the mean of the moment map:
So we can rewrite the Gibbs density:
We can also provide a Fisher metric in dual Lie algebra as hessian of the Entropy:
and as
, Fisher metric in dual space of Lie Algebra parameterization could be written:
8. New Entropy Definition as Generalized Casimir Invariant Functions for Coadjoint and Adjoint Representation
In his paper written in 1974, Jean-Marie Souriau has observed that if we consider the heat expression , that we can write . For each tangent to the orbit, and so generated by an element of the Lie algebra, if we consider the relation and we differentiate it at using the property that , we obtain . Souriau has stopped by this last equation, the characterization of Group action on . Souriau has also observed that . We propose to characterize more explicitly this invariance, by characterizing Entropy as an invariant Casimir function in coadjoint representation.
From last Souriau equation, if we use the identities , and , then we can deduce that . So, Entropy should verify , characterizes an invariant Casimir function in case of non-null cohomology, that we propose to write with Poisson brackets, where , .
In a Poisson manifold, Casimir functions, in the case of null cohomology, are the functions whose Poisson brackets with all functions vanish. In the dual $\mathfrak{g}^*$ of the Lie algebra of a connected Lie group $G$, the Casimir functions are the $Ad^*$-invariant functions, because the Lie-Poisson bracket of two functions at a point of $\mathfrak{g}^*$ is the pairing of that point with the Lie bracket of their differentials, and this bracket vanishes against all functions if and only if the infinitesimal coadjoint action generated by the differential of the Casimir function annihilates the point. A function on $\mathfrak{g}^*$ is $Ad^*$-invariant if it is unchanged when the Lie group $G$ acts on functions on $\mathfrak{g}^*$ through the coadjoint action, and the vanishing condition above is the infinitesimal characterization of $Ad^*$-invariant functions on $\mathfrak{g}^*$. The symplectic leaves of a Poisson manifold are contained in the connected components of the level sets of the Casimir functions, and a Casimir function is constant on a symplectic leaf. Coadjoint orbits lie on level sets of the Casimir functions, which are conserved quantities. Level sets of Casimir functions are symplectic manifolds. Coadjoint motion of the moment map for a solution curve takes place on the intersections of level sets of the Hamiltonian and the Casimir functions. Alexis Arnaudon has studied stochastic coadjoint processes whose solutions lie on coadjoint orbits.
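The defining property of a Casimir function can be illustrated in the simplest null-cohomology case. The sketch below is an illustration and not the construction of the paper: it assumes the dual of so(3) identified with $\mathbb{R}^3$, the Lie-Poisson bracket $\{F, G\}(\mu) = \langle \mu, \nabla F \times \nabla G \rangle$ (defined up to a sign convention), and the candidate Casimir function $C(\mu) = |\mu|^2 / 2$, whose gradient is $\mu$ itself.

```python
import math


def cross(u, v):
    """Cross product in R^3, the Lie bracket of so(3) under the usual identification."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lie_poisson_bracket(mu, grad_f, grad_g):
    """{F, G}(mu) = <mu, grad F x grad G>, evaluated from the two gradients at mu."""
    return dot(mu, cross(grad_f, grad_g))


def casimir(mu):
    """C(mu) = |mu|^2 / 2; its gradient at mu is mu."""
    return 0.5 * dot(mu, mu)


def rotate_z(mu, angle):
    """Coadjoint action of a rotation about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * mu[0] - s * mu[1], s * mu[0] + c * mu[1], mu[2])
```

For any gradient `grad_g`, `lie_poisson_bracket(mu, mu, grad_g)` vanishes because $\mu$ is orthogonal to $\mu \times \nabla G$, while the coordinate functions do not commute: `lie_poisson_bracket(mu, (1, 0, 0), (0, 1, 0))` returns `mu[2]`. The value of `casimir` is also unchanged along the orbit generated by `rotate_z`, so the orbits (spheres) sit inside its level sets.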
We have observed that , that shows that Souriau Entropy is a Casimir function in case with non-null cohomology when an additional cocycle should be taken into account. Indeed, infinitesimal variation is characterized by the following differentiation: . We recover extended Casimir equation in case of non-null cohomology verified by Entropy, , and then the generalized Casimir condition . Hamiltonian motion on these affine coadjoint orbits is given by the solutions of the Lie-Poisson equations with cocycle.
The identification of Entropy as an invariant Casimir function in the coadjoint representation is also important in Information Theory, where Entropy is classically introduced axiomatically. With this new approach, Entropy can be built by constructing the Casimir function associated with the Lie group, including in the case of non-null cohomology. Igor V. Shirokov [71,72,73,74,75] has proposed a method for constructing invariants of the coadjoint representation of Lie groups of arbitrary dimension and structure, based on local symplectic coordinates on the coadjoint orbits. The idea of the method is to construct the canonical transition to Darboux coordinates on the orbits of maximal dimension in the dual space $\mathfrak{g}^*$ of the Lie algebra $\mathfrak{g}$ of the Lie group $G$. These relations provide invariants of the coadjoint representation of the Lie group $G$.
This geometric framework unifies several earlier works on the subject, including Souriau's symplectic model of statistical mechanics and approaches developed in Information Geometry and Quantum Information Geometry. It helps to identify the common geometric structures appearing in various domains, from statistical mechanics to statistical learning. The emphasis is put on the role of affine equivariance with respect to Lie group actions, as an extension of the Fisher metric in the presence of equivariance, and on the associated Lie-Poisson equations with cocycle (affine Lie-Poisson equations). The entropy of the Souriau model, seen as a Casimir function, can be used to apply a geometric model for energy-preserving entropy production on Lie algebras. We can exploit the geometric framework of this new equation to build geometric numerical integration schemes for some of the equations associated with Souriau's model and its polysymplectic extension. This new equation is important because it introduces a new structure of differential equations in the case of non-null cohomology and for an arbitrary Hamiltonian $H$: $\frac{dQ}{dt} = ad^*_{\frac{\partial H}{\partial Q}} Q + \Theta\left(\frac{\partial H}{\partial Q}\right)$.
The equation $\frac{dQ}{dt} = ad^*_{\frac{\partial H}{\partial Q}} Q + \Theta\left(\frac{\partial H}{\partial Q}\right)$ is important because it allows the stochastic perturbation of the Lie-Poisson equation with cocycle to be formulated within the setting of stochastic Hamiltonian dynamics, which preserves the affine coadjoint orbits. We can thereby extend the model for stochastic geometric modeling in fluid dynamics via variational principles described in [32,76]. This extension results in a new Stratonovich differential equation for the stochastic process $Q(t)$.
This new equation is also very useful for geometric symplectic Lie group integrators for Lie-Poisson equations with cocycle, which preserve the affine coadjoint orbits for a general Hamiltonian. It is also relevant in the framework of dynamics with Casimir dissipation/production, to formulate a dynamical geometric model for the dissipation/production of this Casimir. This makes it possible to extend the general Lie algebraic approach developed in [77,78] for Casimir dissipation, to take a cocycle into account, and to cover a wider class of dissipation. Paper [17] will exploit this new Casimir equation in the case of non-null cohomology.
This equation could also be used to make the link with the second principle of Thermodynamics, which will be deduced from the positivity of the Souriau-Fisher metric. Entropy production is then linked with the Souriau-Fisher structure, with $\tilde{\Theta}_\beta$ the Souriau tensor related to the Fisher metric.
8.1. Casimir Invariant and Generalized Casimir Invariant
Hendrik Brugt Gerhard Casimir, a Dutch physicist, studied what are now called Casimir operators and Casimir invariants (H. Casimir and Van der Waerden studied the SU(2) group, the group of isospin/angular momentum, as the model for the algebraic approach to the study of the unitary representations of semi-simple compact Lie groups). Kirillov has explained that Casimir operators are in one-to-one correspondence with polynomial invariants characterizing orbits of the coadjoint representation. Solutions of the invariance equations are not necessarily polynomials, and the nonpolynomial solutions are called generalized Casimir invariants. For certain classes of Lie algebras, all invariants of the coadjoint representation are functions of polynomial ones. In physics, Hamiltonians and integrals of motion of classical integrable Hamiltonian systems are not necessarily polynomials in the momenta [71,72,73,74,75,79,80,81,82,83,84,85,86,87,88,89,90,91,92].
8.2. Souriau Entropy as Generalized Casimir Invariant in Coadjoint Representation
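In Souriau Lie groups Thermodynamics, we will see that coadjoint orbits lie on level sets of the Entropy, which can be considered as a Casimir invariant function.
We first consider the case of null cohomology. Entropy, as a Casimir invariant function, is a conserved quantity, because a Casimir function has a null Lie-Poisson bracket with every function [93,94]:
$$\{S, H\}(Q) = \left\langle Q, \left[\frac{\partial S}{\partial Q}, \frac{\partial H}{\partial Q}\right]\right\rangle = 0 \quad \text{for all } H.$$
We can observe that
$$\left\langle Q, \left[\frac{\partial S}{\partial Q}, \frac{\partial H}{\partial Q}\right]\right\rangle = \left\langle Q, ad_{\frac{\partial S}{\partial Q}} \frac{\partial H}{\partial Q}\right\rangle = \left\langle ad^*_{\frac{\partial S}{\partial Q}} Q, \frac{\partial H}{\partial Q}\right\rangle.$$
Since this vanishes for every $H$, it means that $ad^*_{\frac{\partial S}{\partial Q}} Q = 0$. If we write $[e_i, e_j] = C_{ij}^k e_k$ with $C_{ij}^k$ the structure tensor, this equation is in fact the Casimir condition for an invariant function in the coadjoint representation, as we will see hereafter. The restriction of the Lie-Poisson bracket to an orbit generates a symplectic structure on the orbit, called the KKS (Kirillov-Kostant-Souriau) structure, or the canonical symplectic structure. A Casimir function is characterized as a quantity that commutes with each linear functional on the Poisson manifold, and it is therefore conserved by the dynamics of any Hamiltonian.
Given a Hamiltonian $H$, the equation of motion for $Q$ is:
$$\frac{dQ}{dt} = ad^*_{\frac{\partial H}{\partial Q}} Q.$$
In the case of non-null cohomology, the Lie-Poisson bracket is given by:
$$\{S, H\}_{\tilde{\Theta}}(Q) = \left\langle Q, \left[\frac{\partial S}{\partial Q}, \frac{\partial H}{\partial Q}\right]\right\rangle + \tilde{\Theta}\left(\frac{\partial S}{\partial Q}, \frac{\partial H}{\partial Q}\right),$$
which, with $\tilde{\Theta}(X, Y) = \langle \Theta(X), Y\rangle$, we can develop as:
$$\{S, H\}_{\tilde{\Theta}}(Q) = \left\langle ad^*_{\frac{\partial S}{\partial Q}} Q + \Theta\left(\frac{\partial S}{\partial Q}\right), \frac{\partial H}{\partial Q}\right\rangle.$$
Requiring this bracket to vanish for every $H$ gives the generalized Casimir equation for Entropy in the non-null cohomology case:
$$ad^*_{\frac{\partial S}{\partial Q}} Q + \Theta\left(\frac{\partial S}{\partial Q}\right) = 0,$$
which, since $\beta = \frac{\partial S}{\partial Q}$, can also be written $ad^*_\beta Q + \Theta(\beta) = 0$.
This equation was observed by Souriau in his 1974 paper, where he wrote that the geometric temperature $\beta$ belongs to the kernel of $\tilde{\Theta}_\beta$, that is:
$$\tilde{\Theta}_\beta(\beta, Z) = \tilde{\Theta}(\beta, Z) + \langle Q, [\beta, Z]\rangle = 0 \quad \text{for all } Z \in \mathfrak{g}.$$
Developing this expression as $\langle \Theta(\beta) + ad^*_\beta Q, Z\rangle = 0$ for all $Z$ recovers the generalized Casimir equation in non-null cohomology given above.
Given a Hamiltonian $H$, the equation of motion for $Q$ is:
$$\frac{dQ}{dt} = ad^*_{\frac{\partial H}{\partial Q}} Q + \Theta\left(\frac{\partial H}{\partial Q}\right).$$
The coadjoint orbits, which lie on the level sets of the Casimir Entropy function, are symplectic manifolds.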
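To make the role of the cocycle term concrete, here is a minimal numerical sketch. Everything in it is an assumption made for illustration: the algebra is $\mathfrak{so}(3)$ (bracket given by the cross product), the cocycle is the coboundary $\tilde{\Theta}(X, Y) = \langle \mu_0, [X, Y]\rangle$ for a hypothetical constant `MU0` (so it is cohomologically trivial and only shows the mechanics), the Hamiltonian is a hypothetical rigid-body energy, and the overall sign of the flow is a convention. Under these assumptions the shifted quadratic function $|Q + \mu_0|^2/2$ plays the role of the generalized Casimir.

```python
MU0 = (0.5, -0.2, 0.1)       # hypothetical constant defining the coboundary cocycle
INERTIA = (1.0, 2.0, 3.0)    # hypothetical inertia values for the Hamiltonian

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def grad_h(q):
    # dH/dQ for H(Q) = sum_i Q_i^2 / (2 I_i)
    return tuple(qi / ii for qi, ii in zip(q, INERTIA))

def energy(q):
    return 0.5 * sum(qi * qi / ii for qi, ii in zip(q, INERTIA))

def generalized_casimir(q):
    shifted = add(q, MU0)
    return 0.5 * dot(shifted, shifted)

def rhs(q):
    # Flow of the bracket {f, g}(Q) = <Q + mu0, [df, dg]>: dQ/dt = dH x (Q + mu0)
    return cross(grad_h(q), add(q, MU0))

def rk4_step(q, dt):
    k1 = rhs(q)
    k2 = rhs(tuple(qi + 0.5 * dt * k for qi, k in zip(q, k1)))
    k3 = rhs(tuple(qi + 0.5 * dt * k for qi, k in zip(q, k2)))
    k4 = rhs(tuple(qi + dt * k for qi, k in zip(q, k3)))
    return tuple(qi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for qi, a, b, c, d in zip(q, k1, k2, k3, k4))

def integrate(q, dt, steps):
    for _ in range(steps):
        q = rk4_step(q, dt)
    return q

q0 = (0.4, -1.1, 2.5)
q_end = integrate(q0, 1e-3, 2000)
print(generalized_casimir(q0), generalized_casimir(q_end))
print(energy(q0), energy(q_end))
```

The plain quadratic function $|Q|^2/2$ is no longer conserved along this flow; only the function adapted to the cocycle is, which is the point the generalized Casimir condition makes.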
8.3. Souriau Entropy Invariance in Coadjoint Representation
An analytic function $f$ on the dual space $\mathfrak{g}^*$ of the Lie algebra is a Casimir invariant if, for any $g \in G$, we have $f(Ad^*_g Q) = f(Q)$. We have observed previously that Souriau's Entropy $S(Q)$, the analytic function defined on the dual space of the Lie algebra by Legendre transform of the Massieu characteristic function $\Phi(\beta)$ (minus the logarithm of the Laplace transform) defined on the Lie algebra, is invariant under the affine coadjoint action: $S(Ad^{\#}_g Q) = S(Q)$ with $Ad^{\#}_g Q = Ad^*_g Q + \theta(g)$. In the case of null cohomology, the Souriau cocycle vanishes, $\theta(g) = 0$, and we recover a Casimir invariant function in the coadjoint representation, $S(Ad^*_g Q) = S(Q)$.
We can then observe that Souriau Entropy is an extended Casimir invariant function in the case of non-null cohomology. This property could serve as a new characterization of Entropy. In Souriau Lie groups Thermodynamics, Entropy is a generalized Casimir invariant function for the coadjoint representation in the case of non-null cohomology, and the Massieu characteristic function, by Legendre duality, is a generalized Casimir function for the adjoint representation.
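We now explain how to prove that Souriau Entropy is invariant under the action of the group, starting from its definition:
$$S(Q) = \langle Q, \beta\rangle - \Phi(\beta) \quad \text{with} \quad Q = \frac{\partial \Phi}{\partial \beta}, \quad \Phi(\beta) = -\log \int_M e^{-\langle U(\xi), \beta\rangle}\, d\lambda.$$
Consider the Souriau Entropy $S(Q)$, where the heat $Q$, an element of the dual space of the Lie algebra, is parameterized by $\beta$, an element of the Lie algebra. The Lie group $G$ acts on $Q$ through $\beta$ by the adjoint operator $Ad_g$, and the entropy is given by
$$S[Q(Ad_g\beta)] = \langle Q(Ad_g\beta), Ad_g\beta\rangle - \Phi(Ad_g\beta),$$
with $Q(Ad_g\beta)$ given by the fundamental Souriau equation:
$$Q(Ad_g\beta) = Ad^*_g Q + \theta(g).$$
The invariance of Souriau Entropy is deduced from the following developments. Using the equivariance of the moment map, $U(g^{-1}\cdot\xi) = Ad^*_{g^{-1}} U(\xi) + \theta(g^{-1})$, and the invariance of the measure $\lambda$:
$$\Phi(Ad_g\beta) = -\log \int_M e^{-\langle Ad^*_{g^{-1}} U(\xi), \beta\rangle}\, d\lambda = \Phi(\beta) - \langle \theta(g^{-1}), \beta\rangle.$$
Based on this expression of the Massieu characteristic function transformed by the action of the group, we can use the Legendre transform to study how Souriau Entropy is changed. Using $\langle Ad^*_g Q, Ad_g\beta\rangle = \langle Q, \beta\rangle$ and the cocycle identity $\theta(g^{-1}) = -Ad^*_{g^{-1}}\theta(g)$:
$$S[Q(Ad_g\beta)] = \langle Ad^*_g Q + \theta(g), Ad_g\beta\rangle - \Phi(\beta) + \langle \theta(g^{-1}), \beta\rangle = \langle Q, \beta\rangle - \Phi(\beta).$$
We finally prove that Souriau Entropy is invariant in the coadjoint representation, $S[Q(Ad_g\beta)] = S[Q(\beta)]$, in the general case of non-null cohomology, which we can write $S(Ad^{\#}_g Q) = S(Q)$ if we denote the affine coadjoint action by $Ad^{\#}_g Q = Ad^*_g Q + \theta(g)$. This is also true in the case of null cohomology, when the Souriau cocycle vanishes, $\theta(g) = 0$: we then recover the classical generalized Casimir invariant function definition in the coadjoint representation for Entropy, $S(Ad^*_g Q) = S(Q)$, and the generalized Casimir invariant function definition in the adjoint representation for the Massieu characteristic function, $\Phi(Ad_g\beta) = \Phi(\beta)$.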
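The null-cohomology case of this invariance can be checked numerically on a closed-form example. The sketch below rests on assumptions that are not in the paper: the group is $SO(3)$ acting on the unit sphere $S^2$ with moment map $U(x) = x$ and uniform measure, so that $\int_{S^2} e^{-\langle x, \beta\rangle} d\sigma = 4\pi \sinh|\beta| / |\beta|$, the cocycle is zero, and $Ad_g$ and $Ad^*_g$ both act on $\mathbb{R}^3$ as the rotation matrix. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def massieu(beta):
    """Phi(beta) = -log of the Laplace transform of the uniform measure on S^2."""
    b = norm(beta)
    return -math.log(4.0 * math.pi * math.sinh(b) / b)

def heat(beta):
    """Q = dPhi/dbeta in closed form: -(coth b - 1/b) * beta / b."""
    b = norm(beta)
    radial = -(1.0 / math.tanh(b) - 1.0 / b)
    return tuple(radial * x / b for x in beta)

def entropy(beta):
    """Legendre transform S = <Q, beta> - Phi(beta), evaluated at Q = Q(beta)."""
    return dot(heat(beta), beta) - massieu(beta)

def rotation(angle_z, angle_x):
    """Rotation about z followed by rotation about x, as a 3x3 matrix."""
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    rz = ((cz, -sz, 0.0), (sz, cz, 0.0), (0.0, 0.0, 1.0))
    rx = ((1.0, 0.0, 0.0), (0.0, cx, -sx), (0.0, sx, cx))
    return tuple(tuple(sum(rx[i][k] * rz[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(matrix, v):
    return tuple(dot(row, v) for row in matrix)

beta = (0.8, -0.3, 1.1)
g = rotation(0.7, -1.2)
beta_g = apply(g, beta)          # Ad_g beta
q = heat(beta)
q_g = heat(beta_g)               # Q(Ad_g beta)
print(q_g, apply(g, q))          # equivariance: Q(Ad_g beta) = Ad*_g Q
print(entropy(beta), entropy(beta_g))
```

With a vanishing cocycle the two printed heats coincide and the entropy is unchanged, which is the Casimir invariance stated above; a non-trivial cocycle would require a group such as the Galilean group and is not covered by this sketch.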
8.4. Souriau Entropy Given by Casimir Invariant Functions Equations
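Based on the developments given in the following subsections, we can state that, as the Entropy $S(Q)$ is a generalized Casimir invariant function in the coadjoint representation, $S(Ad^*_g Q) = S(Q)$, it should be a solution of the following differential equation:
$$\sum_{j,k} C_{ij}^k\, Q_k\, \frac{\partial S}{\partial Q_j} = 0, \quad i = 1, \dots, n,$$
where $C_{ij}^k$ is the structure tensor of the Lie algebra $\mathfrak{g}$ in the basis $(e_1, \dots, e_n)$, while $Q_1, \dots, Q_n$ are the coordinates in $\mathfrak{g}^*$ in the dual basis $(e^1, \dots, e^n)$ defined by $\langle e^i, e_j\rangle = \delta^i_j$. The structure tensor is given by $[e_i, e_j] = \sum_k C_{ij}^k e_k$ with $i, j = 1, \dots, n$.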
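A differential equation of this type can be tested on a candidate function by evaluating its residuals from the structure constants. The sketch below is an illustration under assumptions of my own: the algebra is $\mathfrak{e}(2)$ with basis $(J, P_1, P_2)$ and brackets $[J, P_1] = P_2$, $[J, P_2] = -P_1$, the equation is taken in the form $\sum_{j,k} C_{ij}^k Q_k \, \partial S / \partial Q_j = 0$, and the candidate is the non-polynomial function $S(Q) = \log(Q_1^2 + Q_2^2)$.

```python
N = 3
# C[i][j][k] = C_ij^k for e(2); index 0 = J, 1 = P1, 2 = P2 (assumed basis)
C = [[[0.0] * N for _ in range(N)] for _ in range(N)]
C[0][1][2] = 1.0
C[1][0][2] = -1.0
C[0][2][1] = -1.0
C[2][0][1] = 1.0

def casimir_residuals(grad, q):
    """Return the n numbers sum_{j,k} C_ij^k q_k dS/dq_j, for i = 0..n-1."""
    g = grad(q)
    return [sum(C[i][j][k] * q[k] * g[j] for j in range(N) for k in range(N))
            for i in range(N)]

def grad_candidate(q):
    # S(q) = log(q1^2 + q2^2): a non-polynomial function of the quadratic invariant
    r2 = q[1] ** 2 + q[2] ** 2
    return (0.0, 2.0 * q[1] / r2, 2.0 * q[2] / r2)

def grad_non_invariant(q):
    # S(q) = q0 (the rotational coordinate), which is not a coadjoint invariant
    return (1.0, 0.0, 0.0)

q0 = (0.7, 1.5, -2.0)
residual_invariant = casimir_residuals(grad_candidate, q0)
residual_other = casimir_residuals(grad_non_invariant, q0)
print(residual_invariant)
print(residual_other)
```

All residuals vanish for the candidate, which is an example of a generalized (non-polynomial) invariant, while the second function fails the test in two of the three equations.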
8.5. Characterization of Generalized Casimir Invariant Functions in Coadjoint Representation
We will describe the recent characterization of generalized Casimir invariant functions by Oleg L. Kurnyavko and Igor V. Shirokov [72,73,75], who have proposed an algebraic method for the construction of Casimir invariants of Lie group coadjoint representations (see Appendix C). Modern invariant theory based on geometric methods has classically been regarded as non-constructive, but the construction of invariants of Lie group representations is an exception that admits a constructive solution.
Let
be a connected Lie group,
a representation of the group
in the linear space
,
the operators associated to the representation of the group
on the linear space
, then the invariants are given by the following equation:
With the properties that:
Solution is given by the following differential equation:
are elements of the matrices of the Lie algebra representation basis of
.
That we can write and .
If we consider the dual space
, the co-tangent representation is given by:
And the invariants of the co-representation are given by:
They have underlined the relationship between the invariants of a representation and those of its conjugate representation: the algebraic construction obtains the invariants of the conjugate representation from the invariants of the original representation.
Shirokov Theorem 1. Given a non-degenerate invariant of the representation, an invariant of the conjugate representation can be found by Legendre transform, and conversely an invariant of the representation can be recovered from a non-degenerate invariant of the conjugate representation. Shirokov has considered
the representation invariant
, and
the representation invariant
conjugate to
, with the conditions:
Invariant Casimir functions of the coadjoint representation have been studied for completely integrable Hamiltonian systems, as classical systems on the orbits of the coadjoint representation. Oleg L. Kurnyavko and Igor V. Shirokov have considered the relationship between invariants of representations of Lie groups and invariants of their conjugate (dual) representations.
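Considering the coadjoint action $Ad^*_g$ of $G$ on $\mathfrak{g}^*$, given by:
$$\langle Ad^*_g Q, X\rangle = \langle Q, Ad_{g^{-1}} X\rangle, \quad Q \in \mathfrak{g}^*, \ X \in \mathfrak{g},$$
invariants of the coadjoint representation are called Casimir functions, with the property:
$$f(Ad^*_g Q) = f(Q) \quad \text{for all } g \in G,$$
and the infinitesimal invariance is given by the equations:
$$\sum_{j,k} C_{ij}^k\, Q_k\, \frac{\partial f}{\partial Q_j} = 0, \quad i = 1, \dots, n.$$
The number of functionally independent invariants is the index of the Lie algebra $\mathfrak{g}$, that is, the dimension of $\mathfrak{g}$ minus the maximal rank of the matrix $\left(C_{ij}^k Q_k\right)$: $\mathrm{ind}\, \mathfrak{g} = \dim \mathfrak{g} - \sup_{Q \in \mathfrak{g}^*} \mathrm{rank}\left(C_{ij}^k Q_k\right)$.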
From these adjoint and coadjoint representations, Shirokov has introduced the following theorem:
Shirokov Theorem 2. Given a non-degenerate invariant of the adjoint representation, the conjugate invariant, which is an invariant of the coadjoint representation, can be found by an explicit formula; conversely, given a non-degenerate invariant of the coadjoint representation, an invariant of the adjoint representation is obtained in the same way.
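The count of functionally independent invariants discussed above (the index of the Lie algebra) can be estimated numerically as the dimension minus the rank of the matrix $\left(C_{ij}^k Q_k\right)$ at a generic point. The sketch below is a hedged illustration: the helper names, the dictionary format for brackets, and the choice of test algebras ($\mathfrak{e}(2)$, the Heisenberg algebra, and an abelian algebra) are assumptions, and a single sample point stands in for the supremum over $Q$.

```python
def structure_constants(n, brackets):
    """Build C[i][j][k] from {(i, j): {k: value}} given for i < j (antisymmetrized)."""
    c = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for (i, j), combo in brackets.items():
        for k, value in combo.items():
            c[i][j][k] = value
            c[j][i][k] = -value
    return c

def commutator_matrix(c, q):
    """M[i][j] = sum_k C_ij^k q_k."""
    n = len(q)
    return [[sum(c[i][j][k] * q[k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(matrix, tol=1e-10):
    """Rank by Gaussian elimination with partial pivoting."""
    rows = [list(r) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for col in range(n_cols):
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
        if r == n_rows:
            break
    return r

def index_at(c, q):
    """dim g - rank(C_ij^k q_k), evaluated at the sample point q."""
    return len(q) - rank(commutator_matrix(c, q))

euclidean = structure_constants(3, {(0, 1): {2: 1.0}, (0, 2): {1: -1.0}})
heisenberg = structure_constants(3, {(0, 1): {2: 1.0}})
abelian = structure_constants(3, {})

q_sample = (0.3, -0.8, 1.7)
print(index_at(euclidean, q_sample), index_at(heisenberg, q_sample),
      index_at(abelian, q_sample))
```

For $\mathfrak{e}(2)$ and the Heisenberg algebra one independent invariant is expected, and three for the abelian algebra, where every coordinate function is invariant.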
8.6. Constructing Generalized Casimir Invariant Functions in Coadjoint Representation
I. V. Shirokov has proposed a method for constructing invariants of the coadjoint representation of Lie groups with an arbitrary dimension and structure based on local symplectic coordinates on the coadjoint orbits. Oleg L. Kurnyavko and Igor V. Shirokov have also proposed a general method for constructing Casimir invariants.
We will give some other developments of Casimir Invariant Functions by A.T. Fomenko and V.V. Trofimov, related to Orbits of the coadjoint representation and the associated canonical symplectic structure.
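The coadjoint orbit $\mathcal{O}_Q$ passing through the point $Q \in \mathfrak{g}^*$ is given by
$$\mathcal{O}_Q = \{Ad^*_g Q \mid g \in G\}.$$
Kirillov, Kostant and Souriau have introduced the KKS 2-form on coadjoint orbits, which thereby inherit the structure of a homogeneous symplectic manifold:
$$\omega_Q\left(ad^*_X Q, ad^*_Y Q\right) = \langle Q, [X, Y]\rangle, \quad X, Y \in \mathfrak{g}.$$
This KKS 2-form $\omega$ is invariant with respect to the coadjoint action $Ad^*_g$:
$$\left(Ad^*_g\right)^* \omega = \omega.$$
The form is symplectic because it is closed, $d\omega = 0$, a property that can be proved by making the link with the Jacobi identity.
The Jacobi identity can be related to $d\omega$ by using the Elie Cartan formula $L_X = d\, i_X + i_X\, d$. If $X_f$ is a Hamiltonian vector field, then $i_{X_f}\omega$ is exact, so $d\, i_{X_f}\omega = 0$ and then $L_{X_f}\omega = i_{X_f}\, d\omega$. If $d\omega = 0$, then the Jacobi identity is satisfied, and conversely.
Let us consider the Berezin bracket:
$$\{f, g\}(Q) = \left\langle Q, \left[\frac{\partial f}{\partial Q}, \frac{\partial g}{\partial Q}\right]\right\rangle.$$
In coordinates, this Berezin bracket is given by:
$$\{f, g\}(Q) = \sum_{i,j,k} C_{ij}^k\, Q_k\, \frac{\partial f}{\partial Q_i} \frac{\partial g}{\partial Q_j}.$$
By developing the Berezin bracket, we can prove that it satisfies the Jacobi identity $\{f, \{g, h\}\} + \{g, \{h, f\}\} + \{h, \{f, g\}\} = 0$, and then $d\omega = 0$.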
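Both properties used above, invariance of the KKS form under the coadjoint action and the Jacobi identity, can be checked numerically in the simplest non-abelian case. The sketch below assumes $\mathfrak{so}(3)$ identified with $\mathbb{R}^3$, where $[X, Y] = X \times Y$, $Ad_g$ and $Ad^*_g$ act as the rotation matrix, and the KKS form reads $\omega_Q(ad^*_X Q, ad^*_Y Q) = Q \cdot (X \times Y)$; the function names and sample vectors are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotation(angle_z, angle_x):
    """Rotation about z followed by rotation about x, as a 3x3 matrix."""
    cz, sz = math.cos(angle_z), math.sin(angle_z)
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    rz = ((cz, -sz, 0.0), (sz, cz, 0.0), (0.0, 0.0, 1.0))
    rx = ((1.0, 0.0, 0.0), (0.0, cx, -sx), (0.0, sx, cx))
    return tuple(tuple(sum(rx[i][k] * rz[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

def apply(matrix, v):
    return tuple(dot(row, v) for row in matrix)

def kks_form(q, x, y):
    """omega_Q(ad*_X Q, ad*_Y Q) = <Q, [X, Y]> on so(3)*."""
    return dot(q, cross(x, y))

def jacobiator(a, b, c):
    """[a, [b, c]] + [b, [c, a]] + [c, [a, b]], which must vanish."""
    terms = (cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))
    return tuple(sum(t[i] for t in terms) for i in range(3))

q = (0.4, -1.1, 2.5)
x = (0.3, -1.2, 0.7)
y = (1.0, 0.5, -0.4)
z = (-0.6, 0.2, 0.9)
g = rotation(0.7, -1.2)

omega_before = kks_form(q, x, y)
omega_after = kks_form(apply(g, q), apply(g, x), apply(g, y))
jac = jacobiator(x, y, z)
print(omega_before, omega_after, jac)
```

The two values of the form agree because a rotation preserves the triple product, and the vanishing Jacobiator is the algebraic fact behind the closedness of the form.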
We will see that differential equations for (semi-)invariants of the coadjoint representation can be established. Consider the space of analytic functions on the dual space $\mathfrak{g}^*$ of the Lie algebra. A function $f$ is an invariant if, for any $g \in G$, we have $f(Ad^*_g Q) = f(Q)$, and it is a semi-invariant if $f(Ad^*_g Q) = \chi(g) f(Q)$, where $\chi$ is a character of the Lie group $G$.
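We have a representation of Lie algebras $\rho: \mathfrak{g} \to \mathrm{Vect}(W)$, defined on the basis $(e_1, \dots, e_n)$ of $\mathfrak{g}$, where $\mathrm{Vect}(W)$ is the space of vector fields on $W$, an open subset of $\mathfrak{g}^*$, given by:
$$\rho(e_i) = \sum_{j,k} C_{ij}^k\, Q_k\, \frac{\partial}{\partial Q_j},$$
where $C_{ij}^k$ is the structure tensor of the Lie algebra $\mathfrak{g}$ in the basis $(e_1, \dots, e_n)$, while $Q_1, \dots, Q_n$ are the coordinates in $\mathfrak{g}^*$ in the dual basis $(e^1, \dots, e^n)$ defined by $\langle e^i, e_j\rangle = \delta^i_j$. The representation does not depend on the choice of the basis, with the property: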
.
We have the property, that:
This result is obtained by the following development:
We use then Taylor expansion of
given by:
We can observe that
is invariant if
and then
or
that could be written
.
A function $f$ is a semi-invariant of the coadjoint representation of the group if and only if:
9. Conclusion: Lie Groups Thermodynamics for Machine Learning
With Lie groups Thermodynamics, we have presented Souriau's tools to extend the Gibbs density to Lie groups [95,96,97,98,99,100,101,102,103,104,105,106,107]. We can also refer to other explorations of Lie group representation theory to build exponential families [108,109,110,111], or of Information Geometry in Quantum Physics [112,113,114,115,116,117,118,119,120,121,122,123]. Gibbs density estimation is a basic tool in statistical machine learning. Classically, we can associate to any posterior distribution an effective generalized geometric temperature, given by an element of the Lie algebra, relating it to the Gibbs prior distribution. Classification rules can be described by Gibbs measures defined on parameter sets and depending on the observed sample value. A Gibbs measure is a special kind of probability measure used in statistical mechanics to describe the state of a particle system driven by a given energy function at some given temperature. Gibbs measures are realized as minimizers of the average loss value under entropy constraints. In this extension to Lie groups, an important tool is the log-Laplace transform, related to the Massieu characteristic function in Thermodynamics (a re-parameterization of the free energy by the Planck temperature that preserves the Legendre transform with respect to Entropy). As we want to deal with Lie group data for Machine Learning, we consider tools very similar to those used in statistical mechanics to describe particle systems with many degrees of freedom. Comparing any posterior distribution with a Gibbs prior distribution makes it possible to build an estimator that can be proved to adaptively reach the best possible asymptotic error rate (by temperature selection of a Gibbs posterior distribution built within a single parametric model). Estimators derived from Gibbs posteriors show excellent performance in diverse tasks, such as classification, regression and ranking. The usual recommendation is to sample from a Gibbs posterior using MCMC (Markov chain Monte Carlo). With the covariant Souriau Gibbs density, it is possible to extend MCMC and the Gibbs sampler approach to Lie groups Machine Learning.
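The relation between the Gibbs density, the log-Laplace transform and the Entropy can be illustrated on a toy case. The sketch below is an assumption-laden simplification, not the Lie group construction: the state space is a finite set with hypothetical energy values, the temperature $\beta$ is a scalar (the one-dimensional group of time translations), and the Massieu function is taken as $\Phi(\beta) = -\log \sum_x e^{-\beta U(x)}$.

```python
import math

ENERGIES = (0.0, 0.5, 1.3, 2.0)   # hypothetical energy levels U(x)

def massieu(beta, energies=ENERGIES):
    """Phi(beta) = -log sum_x exp(-beta U(x)), minus the log-Laplace transform."""
    return -math.log(sum(math.exp(-beta * u) for u in energies))

def gibbs_density(beta, energies=ENERGIES):
    """Gibbs density p(x) = exp(-beta U(x)) / Z(beta) on the finite state space."""
    weights = [math.exp(-beta * u) for u in energies]
    z = sum(weights)
    return [w / z for w in weights]

def heat(beta, energies=ENERGIES):
    """Q = mean energy under the Gibbs density, equal to dPhi/dbeta."""
    return sum(p * u for p, u in zip(gibbs_density(beta, energies), energies))

def shannon_entropy(beta, energies=ENERGIES):
    return -sum(p * math.log(p) for p in gibbs_density(beta, energies))

beta = 0.8
h = 1e-6
finite_difference = (massieu(beta + h) - massieu(beta - h)) / (2.0 * h)
legendre_entropy = beta * heat(beta) - massieu(beta)   # S = <Q, beta> - Phi(beta)
print(finite_difference, heat(beta))
print(legendre_entropy, shannon_entropy(beta))
```

In this toy setting the derivative of the Massieu function reproduces the mean energy, and the Legendre transform reproduces the Shannon entropy of the Gibbs density, which are the two identities the Lie group model generalizes.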
More recently, perturbation techniques have been proposed as an alternative to MCMC techniques for sampling. These results have been extended to conditional random field losses, proving that the expected maximum under low-rank perturbations provides an upper bound on the log-partition function (what we call the Massieu characteristic function). New lower bounds on the partition function and a new unbiased sequential sampler for the Gibbs distribution based on low-rank perturbations have also been introduced. All these methods are based on sampling from the Gibbs distribution and upper-bounding the log-partition function. These results are summarized in [124], where the authors also propose a new general method, with connections to the recently proposed Fenchel-Young losses [125], using a doubly stochastic scheme for the minimization of these losses, for unsupervised and supervised learning. This is a generalization of the Gibbs distribution approach. Methods for learning the parameters of a Gibbs distribution from data are based on maximization of the likelihood, which is optimized by gradient methods using the empirical log-likelihood. For this method of moment matching, computing the expectation under the Gibbs distribution is a challenge in some cases. This approach has been replaced by computing the maximizer of the perturbed model, with a method called "perturb-and-MAP", to learn the parameters of the model as a proxy for the log-likelihood. This minimization is equivalent to maximizing the previous log-likelihood after substituting the expected perturbed maximum for the log-partition function.
This approach can be linked with the use of Fenchel-Young losses [125]. In the perturbed model, the Fenchel-Young loss is built from the expected perturbed maximum $F$ and its Fenchel dual $\Omega$; its gradient is the difference between the expected perturbed maximizer and the observation, and the loss is related to the Bregman divergence associated with $\Omega$. As $F$ generalizes the log-sum-exp function on the simplex, its dual $\Omega$ is a generalization of the negative entropy (which is the Fenchel dual of log-sum-exp). These connections have been studied in [126].
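The duality between log-sum-exp and negative entropy mentioned above can be made concrete in the unperturbed case. The sketch below is a minimal illustration under assumptions: it uses the plain log-sum-exp function in place of the expected perturbed maximum, the negative Shannon entropy on the simplex as its Fenchel dual, and the loss $L(\theta; y) = F(\theta) + \Omega(y) - \langle \theta, y\rangle$; it does not implement perturb-and-MAP, and all names are hypothetical.

```python
import math

def log_sum_exp(theta):
    """F(theta) = log sum_i exp(theta_i), computed stably."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))

def softmax(theta):
    """Gradient of log-sum-exp: the Gibbs distribution with parameter theta."""
    m = max(theta)
    weights = [math.exp(t - m) for t in theta]
    z = sum(weights)
    return [w / z for w in weights]

def negative_entropy(p):
    """Omega(p) = sum_i p_i log p_i on the simplex (0 log 0 taken as 0)."""
    return sum(pi * math.log(pi) for pi in p if pi > 0.0)

def fenchel_young_loss(theta, y):
    """L(theta; y) = F(theta) + Omega(y) - <theta, y>, non-negative by duality."""
    return log_sum_exp(theta) + negative_entropy(y) - sum(
        t * yi for t, yi in zip(theta, y))

def loss_gradient(theta, y):
    """Gradient in theta: softmax(theta) - y (moment matching)."""
    return [p - yi for p, yi in zip(softmax(theta), y)]

theta = (1.0, -0.5, 2.0)
one_hot = (0.0, 0.0, 1.0)
loss_at_target = fenchel_young_loss(theta, softmax(theta))
loss_one_hot = fenchel_young_loss(theta, one_hot)
cross_entropy = -math.log(softmax(theta)[2])
print(loss_at_target, loss_one_hot, cross_entropy)
```

The loss vanishes when the target equals the Gibbs distribution of the parameter, and for a one-hot target it reduces to the usual cross-entropy, which is why it can serve as a proxy for the negative log-likelihood.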
To conclude, we have seen that Lie group tools based on Representation Theory and the Orbit Method can be used with the Souriau-Fisher metric on coadjoint orbits, as an extension of the Fisher metric to Lie groups through homogeneous symplectic manifolds on Lie group coadjoint orbits.
We can then benefit from different tools based on Souriau Lie groups Thermodynamics and Kirillov Representation Theory, as illustrated in Figure 6, for:
- Geodesic Natural Gradient on Lie Algebra: Extension of Neural Network Natural Gradient from Information Geometry on Lie Algebra for Lie Groups Machine Learning.
- Souriau Maximum Entropy Density on Co-Adjoint Orbits: Covariant Maximum Entropy Probability Density for Lie groups defined with Souriau Moment Map, Co-Adjoint Orbits Method & Kirillov Representation Theory.
- Symplectic Integrator preserving Moment Map: Extension of Neural Network Natural Gradient to Geometric Integrators as Symplectic integrators that preserve moment map.
- Souriau Exponential Map on Lie Algebra: Exponential Map for Geodesic Natural Gradient on Lie Algebra based on Souriau Algorithm for Matrix Characteristic Polynomial.
- Fréchet Geodesic Barycenter by Hermann Karcher Flow: Extension of Mean/Median on Lie group by Fréchet Definition of Geodesic Barycenter on Souriau-Fisher Metric Space, solved by Karcher Flow.
- Mean-Shift on Lie groups with Souriau-Fisher Distance: Extension of Mean-Shift for Homogeneous Symplectic Manifold and Souriau-Fisher Metric Space.
"There is nothing more in physical theories than symmetry groups, except the mathematical construction which allows precisely to show that there is nothing more."
The classical notion of Gibbs' canonical ensemble is extended to the case of a symplectic manifold on which a Lie group has a symplectic action ("dynamic group"). The rigorous definition given here makes it possible to extend a certain number of classical thermodynamic properties (the temperature is here an element of the Lie algebra of the group, the heat an element of its dual), notably convexity inequalities. In the case of non-commutative groups, particular properties appear: the symmetry is spontaneously broken, and certain relations of cohomological type are verified in the Lie algebra of the group.
Jean-Marie Souriau, Mécanique Statistique, Groupes de Lie et Cosmologie, colloque CNRS n°237 – Géométrie Symplectique et physique mathématique