Article

A Geodesic-Based Riemannian Gradient Approach to Averaging on the Lorentz Group

1 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
2 Beijing Key Laboratory on MCAACI, Beijing Institute of Technology, Beijing 100081, China
3 Department of Mathematics, Duke University, Durham, NC 27708, USA
* Author to whom correspondence should be addressed.
Entropy 2017, 19(12), 698; https://doi.org/10.3390/e19120698
Submission received: 19 November 2017 / Revised: 16 December 2017 / Accepted: 17 December 2017 / Published: 20 December 2017

Abstract: In this paper, we propose an efficient algorithm to solve the averaging problem on the Lorentz group $O(n,k)$. Firstly, we introduce the geometric structure of $O(n,k)$ endowed with a Riemannian metric for which the geodesic can be written in closed form. Then, the algorithm is presented based on the Riemannian-steepest-descent approach. Finally, we compare this algorithm with the Euclidean gradient algorithm and the extended Hamiltonian algorithm. Numerical experiments show that the geodesic-based Riemannian-steepest-descent algorithm performs best in terms of convergence rate.

1. Introduction

Averaging measured data on matrix manifolds is a problem that arises frequently in many fields. For example, one may need to determine an average eye from a set of optical systems [1]. In [2], the authors are concerned with the statistics of covariance matrices for biomedical engineering applications. Current research mainly focuses on compact Lie groups, such as the special orthogonal group $SO(n)$ [3] and the unitary group $U(n)$ [4], or on other matrix manifolds, such as the special Euclidean group $SE(n,\mathbb{R})$ [5], the Grassmann manifold $Gr(n,k)$ [6,7], the Stiefel manifold $St(n,k)$ [8,9] and the manifold of symmetric positive-definite matrices $SPD(n)$ [10,11]. However, there is little research on averaging over the Lorentz group $O(n,k)$.
In [12], Clifford algebra, as a generalization of the quaternions, is applied to approximate the average on the special Lorentz group $SO(1,2)$. In engineering, the Lorentz group plays a fundamental role, e.g., in the analysis of the motion of charged particles in electromagnetism [13]. In physics, the Lorentz group forms a basis for the transformation of parabolic catadioptric image features [14]. According to [15], the Lorentz group $O(3,1)$ can be identified with the group of linear transformations of the projective plane and is used to characterize the catadioptric fundamental matrices that arise in the study of catadioptric cameras.
This paper investigates averaging on the Lorentz group equipped with a left-invariant Riemannian metric, under which the geodesic is given in closed form. The resulting averaging optimization problem can be solved numerically via a geodesic-based Riemannian-steepest-descent algorithm (RSDA). Furthermore, the devised RSDA is compared with a first-order line-search method, the Euclidean gradient algorithm (EGA), and with a second-order learning method, the extended Hamiltonian algorithm (EHA), proposed in [16,17].
The remainder of the paper is organized as follows. In Section 2, we summarize the geometric structures of the Lorentz group. The averaging optimization problem, the geodesic-based Riemannian-steepest-descent algorithm and the extended Hamiltonian algorithm on the Lorentz group are presented in Section 3. In Section 4, we show the numerical results of the RSDA and compare the convergence rate of the RSDA with the other two algorithms. Section 5 presents the conclusions of the paper.

2. Geometry of the Lorentz Group

In this section, we review the foundations of the Lorentz group and describe a Riemannian metric for which the geodesic can be written in closed form.
Let M ( n , R ) be the set of n × n real matrices and G L ( n , R ) be its subset containing only nonsingular matrices. It is well-known that G L ( n , R ) is a Lie group, i.e., a group which is also a differentiable manifold and for which the operations of group multiplication and inverse are smooth.
Definition 1.
The Lorentz group $O(n,k)$, as a Lie subgroup of $GL(n+k,\mathbb{R})$, is defined by
$$O(n,k) := \big\{ A \in GL(n+k,\mathbb{R}) \mid A^\top I_{n,k} A = I_{n,k} \big\}, \qquad I_{n,k} := \begin{pmatrix} I_n & 0_{n\times k} \\ 0_{k\times n} & -I_k \end{pmatrix},$$
where $I_n$ denotes the $n\times n$ identity matrix and $0_{n\times k}$ the $n\times k$ zero matrix.
For any $A\in O(n,k)$, the following identities hold:
$$\det(A) = \pm 1, \quad I_{n,k}^2 = I_{n+k}, \quad I_{n,k} = I_{n,k}^{-1} = I_{n,k}^\top, \quad A^\top = I_{n,k} A^{-1} I_{n,k}, \quad A^{-\top} = I_{n,k} A\, I_{n,k}.$$
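To make the definition concrete, the following short Python sketch (illustrative only; the helper names signature_matrix and is_lorentz are not from the paper) builds $I_{n,k}$ and checks the defining relation numerically for a hyperbolic rotation in $O(1,1)$.

```python
import numpy as np

def signature_matrix(n, k):
    """Return I_{n,k} = diag(I_n, -I_k)."""
    return np.diag(np.concatenate([np.ones(n), -np.ones(k)]))

def is_lorentz(A, n, k, tol=1e-10):
    """Check the defining relation A^T I_{n,k} A = I_{n,k} up to a tolerance."""
    J = signature_matrix(n, k)
    return np.linalg.norm(A.T @ J @ A - J) < tol

# Example: a hyperbolic rotation (boost) lies in O(1, 1).
t = 0.5
A = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])
print(is_lorentz(A, 1, 1))  # True
```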
The tangent space $T_AO(n,k)$, the Lie algebra $\mathfrak{o}(n,k)$ and the normal space $N_AO(n,k)$ associated with the Lorentz group $O(n,k)$ can be characterized as follows:
$$\begin{aligned} T_A O(n,k) &= \big\{ A\, I_{n,k}\,\Omega \mid \Omega\in\mathbb{R}^{(n+k)\times(n+k)},\ \Omega^\top = -\Omega \big\},\\ \mathfrak{o}(n,k) &= \big\{ I_{n,k}\,\Omega \mid \Omega\in\mathbb{R}^{(n+k)\times(n+k)},\ \Omega^\top = -\Omega \big\},\\ N_A O(n,k) &= \big\{ I_{n,k}\, A\, S \mid S\in\mathbb{R}^{(n+k)\times(n+k)},\ S^\top = S \big\}. \end{aligned}$$
We employ the left-invariant metric as the Riemannian metric on $T_AO(n,k)$ at $A\in O(n,k)$:
$$\langle X, Y\rangle_A := \langle A^{-1}X,\, A^{-1}Y\rangle_{I_{n+k}} = \mathrm{tr}\big((A^{-1}X)^\top (A^{-1}Y)\big), \qquad X, Y \in T_A O(n,k). \tag{1}$$
Let $\gamma : [0,1]\to O(n,k)$ be a smooth curve in $O(n,k)$; its length associated with the Riemannian metric (1) is defined as
$$L(\gamma) = \int_0^1 \sqrt{\langle\dot\gamma(t),\, \dot\gamma(t)\rangle_{\gamma(t)}}\; dt. \tag{2}$$
The geodesic distance between two points A 1 and A 2 in O ( n , k ) is the infimum of lengths of curves connecting them, namely,
$$d(A_1, A_2) := \inf\big\{ L(\gamma) \mid \gamma : [0,1]\to O(n,k),\ \gamma(0) = A_1,\ \gamma(1) = A_2 \big\}.$$
Applying the variational method to Equation (2), we find that the geodesic satisfies the equation
$$\ddot\gamma + \Gamma_\gamma(\dot\gamma, \dot\gamma) = 0,$$
where $\Gamma$ denotes the Christoffel symbol, and the over-dot and double over-dot denote first- and second-order differentiation with respect to the parameter $t$, respectively. When the initial values $\gamma(0) = A\in O(n,k)$ and $\dot\gamma(0) = X\in T_AO(n,k)$ are specified, we denote the solution by $\gamma_{A,X}(t)$, which is associated with the Riemannian exponential map $\exp_A : T_AO(n,k)\to O(n,k)$ defined as
$$\exp_A(X) = \gamma_{A,X}(1).$$
Let $f : O(n,k)\to\mathbb{R}$ be a differentiable function and let $\nabla_A f\in T_AO(n,k)$ denote the Riemannian gradient of $f$ with respect to the metric (1), defined by the compatibility condition
$$\langle \nabla_A f, X\rangle_A = \langle \partial_A f, X\rangle_E, \qquad \forall\, X\in T_A O(n,k), \tag{3}$$
where $\partial_A f$ denotes the Euclidean gradient of $f$ and $\langle\cdot,\cdot\rangle_E$ denotes the Euclidean metric.
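For later use in the proof of Theorem 1, note how the metric (1) turns the compatibility condition (3) into a Euclidean orthogonality statement:
$$\langle \nabla_A f, X\rangle_A = \mathrm{tr}\big((A^{-1}\nabla_A f)^\top A^{-1}X\big) = \big\langle A^{-\top}A^{-1}\nabla_A f,\; X\big\rangle_E,$$
so (3) says that $\partial_A f - A^{-\top}A^{-1}\nabla_A f$ is Euclidean-orthogonal to every tangent vector at $A$, i.e., it belongs to the normal space $N_AO(n,k)$.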
The exponential of a matrix A in M ( n , R ) is defined by
$$e^A = \sum_{k=0}^{\infty}\frac{1}{k!}A^k.$$
We remark that $e^{A+B} = e^A e^B$ when $AB = BA$, for $A, B\in M(n,\mathbb{R})$.
For the Frobenius norm $\|\cdot\|$, when $\|A - I_n\| < 1$, the logarithm of $A$ is given by
$$\log A = \sum_{k=1}^{\infty}(-1)^{k+1}\,\frac{(A - I_n)^k}{k}.$$
In general, $\log(AB) \neq \log A + \log B$. We here recall the important fact that
$$\log(B^{-1}AB) = B^{-1}\log(A)\, B, \qquad B\in GL(n,\mathbb{R}).$$
Lemma 1.
Suppose that $f(A)$ is a scalar function of the $n\times n$ matrix $A\in M(n,\mathbb{R})$ [18]. If
$$df(A) = \mathrm{tr}(W\, dA)$$
holds, then
$$\partial_A f = W^\top,$$
where $\partial_A f$ denotes the derivative of $f(A)$ with respect to $A$.
The structure of the Riemannian gradient associated with the Riemannian metric (1) is given by the following result.
Theorem 1.
The Riemannian gradient of a sufficiently regular function $f : O(n,k)\to\mathbb{R}$ associated with the Riemannian metric (1) satisfies
$$\nabla_A f = \frac{1}{2}\, A\, I_{n,k}\big( I_{n,k} A^\top \partial_A f - (\partial_A f)^\top A\, I_{n,k} \big). \tag{4}$$
Proof of Theorem 1.
According to equality (3), the gradient $\nabla_A f$ satisfies
$$\big\langle \partial_A f - A^{-\top}A^{-1}\nabla_A f,\; X\big\rangle_E = 0, \qquad \forall\, X\in T_AO(n,k);$$
as $X$ is arbitrary, the condition above implies that $\partial_A f - A^{-\top}A^{-1}\nabla_A f \in N_AO(n,k)$, hence $\partial_A f - A^{-\top}A^{-1}\nabla_A f = I_{n,k}A S$ with $S^\top = S$, and thus we have
$$\nabla_A f = A A^\top \partial_A f - A\, I_{n,k}\, S.$$
Since $\nabla_A f\in T_AO(n,k)$, we have
$$(\nabla_A f)^\top I_{n,k} A + A^\top I_{n,k}\,\nabla_A f = 0.$$
Inserting the expression for $\nabla_A f$ into the equation above, we have
$$S = \frac{1}{2}\big( I_{n,k} A^\top \partial_A f + (\partial_A f)^\top A\, I_{n,k} \big).$$
Substituting $S$ into the expression for $\nabla_A f$, we finish the proof. ☐
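The gradient conversion of Theorem 1 is easy to implement; a minimal numerical sketch follows (the helper name riemannian_gradient is hypothetical, not the authors' code), assuming the Euclidean gradient of $f$ at $A$ is already available.

```python
import numpy as np

def riemannian_gradient(A, dfA, J):
    """Equation (4): Riemannian gradient of f at A from its Euclidean gradient dfA.

    A   : point on O(n, k)
    dfA : Euclidean gradient of f at A
    J   : signature matrix I_{n,k}
    """
    G = 0.5 * A @ J @ (J @ A.T @ dfA - dfA.T @ A @ J)
    # By construction, G satisfies G^T J A + A^T J G = 0, i.e., G lies in T_A O(n,k).
    return G
```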
We would like to give credit to [19] for the explicit expression of the geodesic on the general linear group endowed with the natural left-invariant Riemannian metric (1). Under the same Riemannian metric, the geodesic on the Lorentz group can likewise be written in closed form (cf. Theorem 2.14 in [19]).
Proposition 1.
The geodesic $\gamma_{A,X} : [0,1]\to O(n,k)$, with $A\in O(n,k)$ and $X\in T_AO(n,k)$, corresponding to the Riemannian metric (1) has the expression
$$\gamma_{A,X}(t) = A\; e^{t\,(A^{-1}X)^\top}\; e^{t\left[A^{-1}X - (A^{-1}X)^\top\right]}. \tag{5}$$
Proof of Proposition 1.
In fact, by the variational formula $\delta\int_0^1\langle\dot\gamma,\dot\gamma\rangle_\gamma\, dt = 0$, the geodesic equation in Christoffel form reads
$$\ddot\gamma + \dot\gamma\,\dot\gamma^\top\gamma^{-\top} - \gamma\,\dot\gamma^\top\gamma^{-\top}\gamma^{-1}\dot\gamma - \dot\gamma\,\gamma^{-1}\dot\gamma = 0.$$
Setting $H(t) := \gamma^{-1}(t)\,\dot\gamma(t)$, we see that the geodesic satisfies the Euler–Lagrange equation
$$\dot H(t) = H^\top(t)\, H(t) - H(t)\, H^\top(t).$$
We can verify that the curve $\gamma(t) = A\, e^{t(A^{-1}X)^\top}e^{t[A^{-1}X - (A^{-1}X)^\top]}$, emanating from $A$ with velocity $X$, is the solution of the Euler–Lagrange equation above. ☐
Remark 1.
From the proof of Proposition 1 and the fact that $A^{-\top} = I_{n,k}A\,I_{n,k}$ for $A\in O(n,k)$, we conclude that, for $X\in T_AO(n,k)$,
$$\Gamma_A(X,X) = X X^\top A^{-\top} - A X^\top A^{-\top}A^{-1}X - X A^{-1}X = X X^\top I_{n,k}A I_{n,k} - A X^\top I_{n,k}A I_{n,k}A^{-1}X - X A^{-1}X.$$
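Since the geodesic (5) only involves matrix exponentials, it can be evaluated directly; the sketch below (hypothetical helper name, assuming scipy is available) is one way to do so.

```python
import numpy as np
from scipy.linalg import expm

def geodesic(A, X, t):
    """Equation (5): geodesic on O(n,k) through A with initial velocity X, metric (1)."""
    V = np.linalg.solve(A, X)                    # V = A^{-1} X
    return A @ expm(t * V.T) @ expm(t * (V - V.T))
```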

3. Optimization on the Lorentz Group

This section investigates the averaging optimization problem on the Lorentz group, namely finding the average value of a set of Lorentz matrices. Section 3.1 and Section 3.2 present the geodesic-based Riemannian-steepest-descent algorithm and the extended Hamiltonian algorithm on the Lorentz group, respectively.

3.1. Riemannian-Steepest-Descent Algorithm on the Lorentz Group

According to the expression of the geodesic (5), with $A = \gamma(0)$, $B = \gamma(1)\in O(n,k)$ and $X = \dot\gamma(0)\in T_AO(n,k)$, we have
$$A^{-1}B = e^{(A^{-1}X)^\top}\; e^{A^{-1}X - (A^{-1}X)^\top}, \tag{6}$$
and, furthermore, we have
$$\log(A^{-1}B) = \log\!\big[e^{(A^{-1}X)^\top}\; e^{A^{-1}X - (A^{-1}X)^\top}\big]. \tag{7}$$
Note that the geodesic distance is $d(A,B) = \sqrt{\langle X, X\rangle_A} = \|A^{-1}X\|$. From Equation (6), it is difficult to give an explicit expression of $X$ in terms of $A$ and $B$. In [20], the authors show that the geodesic distance connecting $A$ and $B$ can be obtained by solving the following problem:
$$\min_{U\in\mathfrak{o}(n,k)}\ \big\|e^{U^\top}e^{U - U^\top} - A^{-1}B\big\|^2. \tag{8}$$
Then, the geodesic distance between $A$ and $B$ is $d(A,B) = \|U^*\|$, where $U^*$ is a minimizer of Equation (8). In order to compute $U^*$, a descent procedure can be applied, so we first need the gradient. The differential with respect to $U$ of the function $J(U) = \|e^{U^\top}e^{U-U^\top} - A^{-1}B\|^2$ is given by
$$dJ(U) = d\,\mathrm{tr}\big((e^{U^\top}e^{U-U^\top} - A^{-1}B)^\top(e^{U^\top}e^{U-U^\top} - A^{-1}B)\big) = 2\,\mathrm{tr}\big((e^{U^\top}e^{U-U^\top} - A^{-1}B)^\top e^{U^\top}\, dU\, e^{U-U^\top}\big) = 2\,\mathrm{tr}\big(e^{U-U^\top}(e^{U^\top}e^{U-U^\top} - A^{-1}B)^\top e^{U^\top}\, dU\big),$$
and thus, according to Lemma 1, we can get the Euclidean gradient of $J(U)$:
$$\partial_U J = 2\, e^{U}\,\big(e^{U^\top}e^{U-U^\top} - A^{-1}B\big)\, e^{U^\top - U}. \tag{9}$$
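A minimal gradient-descent sketch for problem (8) is given below. The fixed step size, iteration count, the zero initialization and the projection of the iterate back onto $\mathfrak{o}(n,k)$ are illustrative choices (the paper uses a line-search strategy and does not spell out these details); the gradient used is the reconstruction (9) above.

```python
import numpy as np
from scipy.linalg import expm

def geodesic_distance(A, B, J, step=1e-2, iters=5000):
    """Approximate d(A, B) = ||U*||_F by descending J(U) of problem (8).

    J is the signature matrix I_{n,k}; after each step the iterate is projected
    onto o(n,k) = { W : J W^T J = -W } to keep the constraint U in o(n,k).
    """
    C = np.linalg.solve(A, B)                      # A^{-1} B
    U = np.zeros_like(C)
    for _ in range(iters):
        E1, E2 = expm(U.T), expm(U - U.T)
        R = E1 @ E2 - C                            # residual e^{U^T} e^{U-U^T} - A^{-1} B
        grad = 2 * expm(U) @ R @ expm(U.T - U)     # Euclidean gradient (9)
        U = U - step * grad
        U = 0.5 * (U - J @ U.T @ J)                # orthogonal projection onto o(n,k)
    return np.linalg.norm(U)                       # Frobenius norm of the minimizer
```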
On the other hand, recall the Goldberg exponential formula [21]:
$$\log(e^X e^Y) = X + Y + \sum_{n=2}^{\infty}\sum_{|\omega| = n} g_\omega\, \omega, \tag{10}$$
where $\omega$ denotes a word in the letters $X$ and $Y$, and the inner sum is over all words $\omega$ of length $|\omega| = n$. The symbol $g_\omega$ denotes the Goldberg coefficient of the word $\omega$, a rational number. If the word $\omega$ begins with $X$, say
$$\omega = \omega_{X,Y} = X^{r_1}\, Y^{r_2}\, X^{r_3}\cdots (XY)^{r_m},$$
where $r_1,\ldots,r_m$ are positive integers satisfying $r_1+\cdots+r_m = n$, and $(XY)^{r_m}$ denotes $X^{r_m}$ if $m$ is odd and $Y^{r_m}$ if $m$ is even; $\omega_{Y,X}$ is defined similarly.
The Goldberg coefficient $g_\omega$ is
$$g_\omega = \int_0^1 t^{m'}\,(t-1)^{m''}\, G_{r_1}(t)\cdots G_{r_m}(t)\, dt,$$
in which $m' = [m/2]$ and $m'' = [(m-1)/2]$, $[\,\cdot\,]$ denotes the greatest integer not exceeding the enclosed number, and the polynomials $G_r(t)$ are defined recursively by $G_1(t) = 1$ and $G_r(t) = r^{-1}(d/dt)\{t(t-1)\,G_{r-1}(t)\}$ for $r = 2, 3, \ldots$.
According to Theorem 1 in [22], we can obtain the following result.
Proposition 2.
If $\|U\| < \tfrac{1}{2}$, then
$$\big\|\log(A^{-1}B) - U\big\| \le \frac{4\|U\|^2}{1 - 2\|U\|}.$$
Proof of Proposition 2.
According to the Goldberg exponential formula (10), Equation (7) can be recast as
$$\log(A^{-1}B) = \log\!\big(e^{U^\top}e^{U - U^\top}\big) = U + \sum_{n=2}^\infty\sum_{|\omega| = n} g_\omega\,\omega,$$
in which $\omega$ denotes a word in the letters $U^\top$ and $U - U^\top$.
By using properties of the Goldberg coefficients $g_\omega$ (see [23]), we get
$$|g_\omega| \le \frac{2^{-(n-m)}}{m\, C_{m-1}^{m'}},$$
where $C_n^m = n!/\big(m!\,(n-m)!\big)$ is the usual binomial coefficient. Furthermore, the number of words $\omega_{U^\top,\,U-U^\top}$ of length $|\omega| = n$ with $m$ parts is just $C_{n-1}^{m-1}$. Thus, the sum of $|g_\omega|$ extended over all words $\omega$ of length $n$ with $m$ parts is at most
$$2\times C_{n-1}^{m-1}\times \frac{2^{-(n-m)}}{m\, C_{m-1}^{m'}}.$$
Therefore, since $\|U - U^\top\| \le 2\|U\|$ and consequently $\|\omega\| \le 2^{n-1}\|U\|^n$ for $n\ge 2$, we have
$$\big\|\log(A^{-1}B) - U\big\| \le \sum_{n=2}^\infty\sum_{|\omega|=n}|g_\omega|\,\|\omega\| \le \sum_{n=2}^\infty\sum_{m=1}^n 2\, C_{n-1}^{m-1}\,\frac{2^{-(n-m)}}{m\,C_{m-1}^{m'}}\, 2^{n-1}\|U\|^n \le 2\sum_{n=2}^\infty\sum_{m=1}^n C_{n-1}^{m-1}\,\|U\|^n = \sum_{n=2}^\infty 2^n\,\|U\|^n = \frac{4\|U\|^2}{1 - 2\|U\|}.$$
Here, in the last step, the geometric series converges (and the equality holds) if and only if $\|U\| < \tfrac{1}{2}$. ☐
Remark 2.
If, additionally, the condition $\|U - U^\top\| \le \epsilon\|U\|$ with $0 < \epsilon \le 1$ holds, then, when $\|U\| < 1$,
$$\big\|\log(A^{-1}B) - U\big\| \le \frac{2\epsilon\|U\|^2}{1 - \|U\|}.$$
Remark 3.
Proposition 2 implies that we can replace the geodesic distance $d(A,B) = \|U\|$ by $\|\log(A^{-1}B)\|$, where the latter is much easier to handle.
In order to compare different Lorentz matrices, the following discrepancy $D : O(n,k)\times O(n,k)\to\mathbb{R}$ is defined:
$$D(A,B) := \big\|\log(A^{-1}B)\big\| = \sqrt{\mathrm{tr}\big(\log^\top(A^{-1}B)\,\log(A^{-1}B)\big)}, \qquad A, B\in O(n,k).$$
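The discrepancy is straightforward to evaluate numerically; a minimal sketch (hypothetical helper name, assuming scipy is available) is shown below and reused in the experiments of Section 4.

```python
import numpy as np
from scipy.linalg import logm

def discrepancy(A, B):
    """D(A, B) = ||log(A^{-1} B)||_F; logm may return a negligible imaginary part."""
    L = logm(np.linalg.solve(A, B))
    return np.linalg.norm(L.real)      # Frobenius norm
```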
For the Lorentz group, the criterion function $f : O(n,k)\to\mathbb{R}$ that measures a collection $\{B_1, B_2, \ldots, B_N\}$ of randomly generated samples is defined by
$$f(A) := \frac{1}{N}\sum_q \big\|\log(B_q^{-1}A)\big\|^2, \qquad A, B_q\in O(n,k),\quad q = 1, 2, \ldots, N, \tag{14}$$
which appears as a specialization, to the Lorentz group, of the Karcher criterion [24].
Definition 2.
The average value of $\{B_1, B_2, \ldots, B_N\}$ is defined as
$$\bar{A} := \arg\min_{A\in O(n,k)} f(A).$$
Based on the Euclidean line-search method along the negative gradient, a geodesic-based Riemannian-steepest-descent algorithm to compute the average value can be expressed as [25]
$$A_{p+1} = \gamma_{A_p,\, -\nabla_{A_p} f}(t_p), \tag{16}$$
where p 0 is an iteration step-counter, the initial guess A 0 O ( n , k ) is arbitrarily chosen, and t p denotes a (possibly variable) stepsize schedule.
Note that the Riemannian gradient of the criterion function is crucial to solving the optimization problem by the geodesic-based Riemannian-steepest-descent algorithm (16).
Theorem 2.
The Riemannian gradient of the criterion function (14) is
$$\nabla_A f = \frac{2}{N}\sum_q A\,\log(B_q^{-1}A). \tag{17}$$
Proof of Theorem 2.
The differential with respect to A of the criterion function (14) is
$$df(A) = \frac{1}{N}\sum_q d\,\mathrm{tr}\big(\log^\top(B_q^{-1}A)\,\log(B_q^{-1}A)\big) = \frac{2}{N}\sum_q \mathrm{tr}\big(\log^\top(B_q^{-1}A)\; d\log(B_q^{-1}A)\big) = \frac{2}{N}\sum_q \mathrm{tr}\big(\log^\top(B_q^{-1}A)\, A^{-1}dA\big) = \frac{2}{N}\sum_q \big\langle A^{-\top}\log(B_q^{-1}A),\; dA\big\rangle_E,$$
and, furthermore, according to Lemma 1, we get the Euclidean gradient
$$\partial_A f = \frac{2}{N}\sum_q A^{-\top}\log(B_q^{-1}A). \tag{18}$$
Substituting the Euclidean gradient (18) into the expression (4) of the Riemannian gradient, we have
$$\nabla_A f = \frac{1}{N}\sum_q\Big( A\log(B_q^{-1}A) - A\, I_{n,k}\,\log^\top(B_q^{-1}A)\, I_{n,k}\Big) = \frac{1}{N}\sum_q\Big( A\log(B_q^{-1}A) - A\,\log\big(I_{n,k}(B_q^{-1}A)^\top I_{n,k}\big)\Big) = \frac{1}{N}\sum_q\Big( A\log(B_q^{-1}A) - A\,\log\big((B_q^{-1}A)^{-1}\big)\Big) = \frac{2}{N}\sum_q A\log(B_q^{-1}A).$$
 ☐
Combining the expression (5) of the geodesic with the Riemannian gradient (17) of the criterion function, the geodesic-based Riemannian-steepest-descent algorithm (16) is recast as
$$A_{p+1} = A_p\; e^{-t_p U_p^\top}\; e^{-t_p (U_p - U_p^\top)}, \tag{19}$$
where $U_p := A_p^{-1}\nabla_{A_p} f = \frac{2}{N}\sum_q \log(B_q^{-1}A_p)$.
In order to compute a nearly-optimal stepsize, the function $f(A_{p+1})$, considered as a function of the stepsize $t_p$, can be expanded in a Taylor series at $t_p = 0$ as
$$f(A_{p+1}) = f(A_p) + f_{1,p}\, t_p + \tfrac{1}{2} f_{2,p}\, t_p^2 + o(t_p^2),$$
where the coefficients $f_{1,p}$ and $f_{2,p}$ of the Taylor expansion are given by
$$f_{1,p} = \frac{d f(A_{p+1})}{d t_p}\bigg|_{t_p = 0}, \qquad f_{2,p} = \frac{d^2 f(A_{p+1})}{d t_p^2}\bigg|_{t_p = 0}.$$
Furthermore, the nearly-optimal stepsize that minimizes $f(A_{p+1})$ is $\hat{t}_p := -\frac{f_{1,p}}{f_{2,p}}$, provided that $f_{1,p}\le 0$ and $f_{2,p} > 0$; hence $\hat{t}_p\ge 0$.
Thus, to obtain $f_{1,p}$ and $f_{2,p}$, the function $f(A_{p+1})$ is recast as
$$f(A_{p+1}) = \frac{1}{N}\sum_q \mathrm{tr}\Big(\log^\top\!\big(B_q^{-1}A_p\, e^{-t_pU_p^\top}e^{-t_p(U_p - U_p^\top)}\big)\;\log\big(B_q^{-1}A_p\, e^{-t_pU_p^\top}e^{-t_p(U_p - U_p^\top)}\big)\Big).$$
Calculation leads to the first-order derivative of the function $f(A_{p+1})$:
$$\frac{d f(A_{p+1})}{d t_p} = -\frac{2}{N}\sum_q \mathrm{tr}\Big(\log^\top\!\big(B_q^{-1}A_p\, e^{-t_pU_p^\top}e^{-t_p(U_p - U_p^\top)}\big)\; e^{t_p(U_p - U_p^\top)}\, U_p\, e^{-t_p(U_p - U_p^\top)}\Big),$$
and setting $t_p = 0$ yields
$$f_{1,p} = -\frac{2}{N}\sum_q \mathrm{tr}\big(\log^\top(B_q^{-1}A_p)\, U_p\big) = -\mathrm{tr}(U_p^\top U_p) \le 0.$$
From the first-order derivative of $f(A_{p+1})$ above, direct calculations lead to the second-order derivative of $f(A_{p+1})$:
$$\frac{d^2 f(A_{p+1})}{d t_p^2} = \frac{2}{N}\sum_q \mathrm{tr}\Big( e^{t_p(U_p-U_p^\top)}\, U_p^\top U_p\, e^{-t_p(U_p-U_p^\top)} + \log^\top\!\big(B_q^{-1}A_p\, e^{-t_pU_p^\top}e^{-t_p(U_p-U_p^\top)}\big)\, e^{t_p(U_p-U_p^\top)}\big(U_p^\top U_p - U_pU_p^\top\big)\, e^{-t_p(U_p-U_p^\top)}\Big),$$
and, setting $t_p = 0$, we get
$$f_{2,p} = \frac{2}{N}\sum_q \mathrm{tr}\Big(U_p^\top U_p + \log^\top(B_q^{-1}A_p)\big(U_p^\top U_p - U_pU_p^\top\big)\Big) = 2\,\mathrm{tr}(U_p^\top U_p) + \mathrm{tr}\big(U_p^\top(U_p^\top U_p - U_pU_p^\top)\big) = 2\,\mathrm{tr}(U_p^\top U_p) \ge 0.$$
Hence, the nearly-optimal stepsize for the geodesic-based Riemannian-steepest-descent algorithm (19) is
$$\hat{t}_p = -\frac{f_{1,p}}{f_{2,p}} = \frac{1}{2}. \tag{20}$$
Therefore, the iteration (19) with the stepsize (20) may be recast explicitly as
$$A_{p+1} = A_p\; e^{\frac{1}{N}\sum_q \log^\top(A_p^{-1}B_q)}\; e^{\frac{1}{N}\sum_q\big(\log(A_p^{-1}B_q) - \log^\top(A_p^{-1}B_q)\big)}. \tag{21}$$
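Update (21) is directly implementable with matrix exponentials and logarithms; the following minimal sketch (hypothetical function name and iteration count, assuming scipy is available) runs the RSDA with the fixed stepsize $\hat t_p = 1/2$.

```python
import numpy as np
from scipy.linalg import expm, logm

def rsda_average(Bs, A0, iters=20):
    """Average the list Bs of Lorentz matrices by iterating update (21) from A0."""
    A = A0.copy()
    for _ in range(iters):
        # V = (1/N) * sum_q log(A^{-1} B_q); then A <- A expm(V^T) expm(V - V^T)
        V = sum(logm(np.linalg.solve(A, B)).real for B in Bs) / len(Bs)
        A = A @ expm(V.T) @ expm(V - V.T)
    return A
```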

3.2. Extended Hamiltonian Algorithm on the Lorentz Group

References [16,17] introduced a general theory of extended Hamiltonian (second-order) learning on Riemannian manifolds, in particular as an instance of learning by constrained criterion-function optimization on matrix manifolds. For the Lorentz group, the extended Hamiltonian algorithm can be expressed as
$$\dot A = X, \qquad \dot X = -\Gamma_A(X, X) - \nabla_A f - \mu X, \tag{22}$$
where $A\in O(n,k)$, $X\in T_AO(n,k)$ denotes the instantaneous learning velocity, $f : O(n,k)\to\mathbb{R}$ denotes a criterion function, and the constant $\mu > 0$ denotes a viscosity parameter.
Inserting the expressions of the Christoffel matrix function $\Gamma_A$ from Remark 1 and of the Riemannian gradient $\nabla_A f$ (4) into the general extended Hamiltonian system (22), and setting $H := A^{-1}X$, calculations lead to the following expressions:
$$\dot A = A H, \qquad \dot H = H^\top H - H H^\top + \tfrac{1}{2}\, I_{n,k}\big( (\partial_A f)^\top A\, I_{n,k} - I_{n,k} A^\top \partial_A f\big) - \mu H.$$
Hence, the second equation may be integrated numerically by an Euler stepping method, while the first one may be integrated via the geodesic. Namely, system (22) may be implemented as
$$A_{p+1} = A_p\, e^{\eta H_p^\top}\, e^{\eta (H_p - H_p^\top)}, \qquad H_{p+1} = \eta\big(H_p^\top H_p - H_p H_p^\top\big) + (1 - \eta\mu)\, H_p + \tfrac{\eta}{2}\, I_{n,k}\big( (\partial_{A_p} f)^\top A_p\, I_{n,k} - I_{n,k} A_p^\top\, \partial_{A_p} f\big), \tag{23}$$
where $\eta > 0$ denotes the learning rate. The effectiveness of the algorithm is ensured if and only if
$$2\lambda_{\max} < \mu < 1/\eta,$$
where $\lambda_{\max}$ denotes the largest eigenvalue of the Hessian matrix of the criterion function $f(A)$ (see [17]).
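A minimal sketch of scheme (23) follows. The function name, the zero initialization of $H$, the default parameters and the user-supplied euclidean_gradient callback (which should return the Euclidean gradient (18) at the current point) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm

def eha_average(Bs, A0, J, euclidean_gradient, eta=0.5, mu=1.5, iters=50):
    """Iterate the extended Hamiltonian scheme (23) on O(n,k); J is I_{n,k}."""
    A, H = A0.copy(), np.zeros_like(A0)
    for _ in range(iters):
        G = euclidean_gradient(A, Bs)                          # Euclidean gradient at A_p
        A_next = A @ expm(eta * H.T) @ expm(eta * (H - H.T))   # geodesic step
        H = (eta * (H.T @ H - H @ H.T) + (1.0 - eta * mu) * H
             + 0.5 * eta * J @ (G.T @ A @ J - J @ A.T @ G))    # H update of (23)
        A = A_next
    return A
```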

4. Numerical Experiments

In the present section, we give two numerical experiments on the Lorentz group $O(3,1)$ to demonstrate the effectiveness and performance of the proposed algorithms. The Lorentz group $O(3,1)$ has four connected components; here, we study the optimization on the connected component $SO_0(3,1)$ containing the identity matrix $I_4$.
In order to compare the behavior of the optimization methods (the RSDA, the EHA and the EGA), the parameters $\eta = 0.5$ and $\mu = 1.5$ are used in the following experiments.

4.1. Numerical Experiments on Averaging Two Lorentz Matrices

In Section 3.1, it was noted that the geodesic distance $d(A,B)$ and the discrepancy $D(A,B)$ between two points $A$ and $B$ are, in general, different quantities. Now, given $B_1, B_2\in O(3,1)$, numerical results are shown to compare $d(B_1,B_2)$ and $D(B_1,B_2)$. The following two points are chosen:
$$B_1 = \begin{pmatrix} \cosh(0.5) & 0 & 0 & \sinh(0.5)\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ \sinh(0.5) & 0 & 0 & \cosh(0.5) \end{pmatrix}, \qquad B_2 = \begin{pmatrix} \cosh(0.1) & 0 & 0 & \sinh(0.1)\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ \sinh(0.1) & 0 & 0 & \cosh(0.1) \end{pmatrix}.$$
It can be verified that $B_i^\top I_{3,1} B_i = I_{3,1}$, i.e., $B_i\in O(3,1)$, for $i = 1, 2$. The optimization problem of computing the geodesic distance $d(B_1,B_2)$ can be solved by means of a descent procedure along the negative Euclidean gradient (9) with a line-search strategy. In Figure 1, the results show that the Frobenius norm of $U$ decreases steadily while the criterion function $J(U)$ tends to 0 during the iteration. From the numerical experiment, we get
$$U^* = \begin{pmatrix} 0.0000 & 0.0000 & 0.0000 & -0.4000\\ 0.0000 & 0.0000 & 0.0000 & 0.0000\\ 0.0000 & 0.0000 & 0.0000 & 0.0000\\ -0.4000 & 0.0000 & 0.0000 & 0.0000 \end{pmatrix}.$$
It is interesting to observe that the geodesic distance between $B_1$ and $B_2$ is $d(B_1,B_2) = \|U^*\| \approx 0.5657$, while the discrepancy between $B_1$ and $B_2$ is $D(B_1,B_2) = \|\log(B_1^{-1}B_2)\| \approx 0.5657$.
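As an illustrative cross-check (using the hypothetical discrepancy helper sketched after the definition of $D$ in Section 3.1, and a hypothetical boost constructor), the reported value can be reproduced directly:

```python
import numpy as np

def boost(t):
    """4x4 hyperbolic rotation in the (1,4)-plane, as used for B_1 and B_2."""
    B = np.eye(4)
    B[0, 0] = B[3, 3] = np.cosh(t)
    B[0, 3] = B[3, 0] = np.sinh(t)
    return B

B1, B2 = boost(0.5), boost(0.1)
print(discrepancy(B1, B2))   # approx 0.5657 = sqrt(2) * 0.4
```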
In the following experiment, we consider the average value of the two points $B_1$ and $B_2$. Figure 2 displays the results of running the RSDA, the EHA and the EGA. Note that the iteration sequence $\{A_p\}$ is expected to converge to a matrix that lies amid the samples $\{B_q\}$, so that the condition $\|B_q^{-1}A_p - I_4\| < 1$ holds. The initial point is set as $A_0 = B_1$, which satisfies $\|B_2^{-1}A_0 - I_4\| \approx 0.5921 < 1$; more initial points can be chosen in a neighborhood of the matrix $B_1$ by rule (25) below. The averaging algorithms converge steadily, and the RSDA has the fastest convergence rate among the three algorithms, needing only one iteration to obtain the average value:
$$\bar A = \begin{pmatrix} 1.0453 & 0.0000 & 0.0000 & 0.3045\\ 0.0000 & 1.0000 & 0.0000 & 0.0000\\ 0.0000 & 0.0000 & 1.0000 & 0.0000\\ 0.3045 & 0.0000 & 0.0000 & 1.0453 \end{pmatrix}.$$
However, the EHA and the EGA need 13 and 18 iterations, respectively, to reach the same accuracy. It is interesting to compare the discrepancy $D(\bar A, B_1)\approx 0.2828$ with the discrepancy $D(\bar A, B_2)\approx 0.2828$, which confirms that the computed average value is truly a midpoint (i.e., it lies at approximately equal discrepancy from the two matrices).

4.2. Numerical Experiments on Averaging Several Lorentz Matrices

The numerical experiments rely on a way to generate pseudo-random samples on the Lorentz group. Given a point $B\in O(n,k)$, referred to as the "center of mass" or simply the center of the random distribution, it is possible to generate a random sample $B_q\in O(n,k)$ in a neighborhood of $B$ by the rule [26,27]
$$B_q = B\, e^{B^{-1}X}, \tag{24}$$
where $X\in T_BO(n,k)$ is a random tangent vector and the center matrix $B$ itself can be generated by a curve departing from the identity matrix, $B = e^V$, with $V$ a randomly generated matrix in $\mathfrak{o}(n,k)$.
According to the structure of the tangent space $T_BO(n,k)$, expression (24) can be written as
$$B_q = B\, e^{I_{n,k}(C - C^\top)}, \tag{25}$$
where $C\in\mathbb{R}^{(n+k)\times(n+k)}$ is any unstructured random matrix.
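A minimal sketch of rule (25) is given below; the scale parameter controlling the spread of the samples and the helper names are illustrative assumptions (the paper does not specify the distribution of $C$).

```python
import numpy as np
from scipy.linalg import expm

def signature_matrix(n, k):
    return np.diag(np.concatenate([np.ones(n), -np.ones(k)]))

def random_lorentz_sample(B, n, k, scale=0.1, rng=None):
    """Rule (25): B_q = B expm(I_{n,k}(C - C^T)) with C an unstructured random matrix."""
    rng = np.random.default_rng() if rng is None else rng
    J = signature_matrix(n, k)
    C = scale * rng.standard_normal((n + k, n + k))   # 'scale' controls the spread
    return B @ expm(J @ (C - C.T))                    # lies on O(n,k) by construction
```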
Figure 3 displays the results of the optimization over $O(3,1)$ with $N = 50$ Lorentz matrices. The initial point $A_0$ is randomly generated:
$$A_0 = \begin{pmatrix} 0.8412 & 0.2117 & 0.5487 & 0.2351\\ 0.0045 & 1.0783 & 0.2160 & 0.4577\\ 0.5410 & 0.2998 & 0.8456 & 0.3126\\ 0.0203 & 0.5454 & 0.2508 & 1.1665 \end{pmatrix}.$$
Figure 3a shows the values of the criterion function (14) for the RSDA, the EHA and the EGA. The numerical experiments show that the average value of the 50 Lorentz matrices is
$$\bar A = \begin{pmatrix} 0.9127 & 0.1134 & 0.3933 & 0.0235\\ 0.0317 & 0.9853 & 0.2035 & 0.1154\\ 0.4088 & 0.1675 & 0.8980 & 0.0407\\ 0.0342 & 0.1087 & 0.0504 & 1.0077 \end{pmatrix},$$
which is obtained after seven iterations of the RSDA, while the EHA and the EGA need 15 iterations. Figure 3b gives the norm $\|A^{-1}\nabla_A f\|$ of the Riemannian gradient (17) during the RSDA iteration, from which we see that the norm of the Riemannian gradient tends to 0 after iteration 6. Figure 3c,d give the values of the discrepancy $D(A, B_q)$ before and after the iteration, respectively. In particular, note that all the discrepancies between the samples $B_q$ and the matrix $A$ are decreased substantially. The results show that the convergence rate of the RSDA is again the fastest among the three algorithms.

5. Conclusions

In this paper, we consider the averaging optimization problem on the Lorentz group $O(n,k)$, namely, computing the average of a set of Lorentz matrices. In order to tackle the related optimization problem, a geodesic-based Riemannian-steepest-descent algorithm is presented on the Lorentz group endowed with a left-invariant (Riemannian) metric. The devised averaging algorithm is compared with the extended Hamiltonian algorithm and the Euclidean gradient algorithm. The results of numerical experiments on the Lorentz group $O(3,1)$ show that the convergence rate of the geodesic-based Riemannian-steepest-descent algorithm is the fastest among the three algorithms.

Acknowledgments

The present research is supported by the National Natural Science Foundation of China (No. 61179031).

Author Contributions

Jing Wang conceived and designed the experiments; Jing Wang performed the experiments; Jing Wang and Huafei Sun analyzed the data; Didong Li contributed reagents/materials/analysis tools; Jing Wang, Huafei Sun and Didong Li wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Harris, W.F. Paraxial ray tracing through noncoaxial astigmatic optical systems, and a 5 × 5 augmented system matrix. Optom. Vis. Sci. 1994, 71, 282–285. [Google Scholar] [CrossRef] [PubMed]
  2. Barachant, A.; Bonnet, S.; Congedo, M.; Jutten, C. Multi-class brain computer interface classification by Riemannian geometry. IEEE Trans. Bio. Med. Eng. 2012, 59, 920–928. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Moakher, M. Means and averaging in the group of rotations. SIAM J. Matrix Anal. A 2002, 24, 1–16. [Google Scholar] [CrossRef]
  4. Mello, P.A. Averages on the unitary group and applications to the problem of disordered conductors. J. Phys. A 1990, 23, 4061–4080. [Google Scholar] [CrossRef]
  5. Duan, X.M.; Sun, H.F.; Peng, L.Y. Riemannian means on special Euclidean group and unipotent matrices group. Sci. World J. 2013, 2013, 292787. [Google Scholar] [CrossRef] [PubMed]
  6. Chakraborty, R.; Vemuri, B.C. Recursive Fréchet mean computation on the Grassmannian and its applications to computer vision. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
  7. Fiori, S.; Kaneko, T.; Tanaka, T. Tangent-bundle maps on the Grassmann manifold: Application to empirical arithmetic averaging. IEEE Trans. Signal Process. 2015, 63, 155–168. [Google Scholar] [CrossRef]
  8. Kaneko, T.; Fiori, S.; Tanaka, T. Empirical arithmetic averaging over the compact Stiefel manifold. IEEE Trans. Signal Process. 2013, 61, 883–894. [Google Scholar] [CrossRef]
  9. Pölitz, C.; Duivesteijn, W.; Morik, K. Interpretable domain adaptation via optimization over the Stiefel manifold. Mach. Learn. 2016, 104, 315–336. [Google Scholar] [CrossRef]
  10. Fiori, S. Learning the Fréchet mean over the manifold of symmetric positive-definite matrices. Cogn. Comput. 2009, 1, 279–291. [Google Scholar] [CrossRef]
  11. Moakher, M. A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J. Matrix Anal. A 2005, 26, 735–747. [Google Scholar] [CrossRef]
  12. Buchholz, S.; Sommer, G. On averaging in Clifford groups. In Computer Algebra and Geometric Algebra with Applications; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3519, pp. 229–238. [Google Scholar]
  13. Kawaguchi, H. Evaluation of the Lorentz group Lie algebra map using the Baker-Campbell-Hausdorff formula. IEEE Trans. Magn. 1999, 35, 1490–1493. [Google Scholar] [CrossRef]
  14. Heine, V. Group Theory in Quantum Mechanics; Dover: New York, NY, USA, 1993. [Google Scholar]
  15. Geyer, C.M. Catadioptric Projective Geometry: Theory and Applications. Ph.D. Thesis, University of Pennsylvania, Philadelphia, PA, USA, 2002. [Google Scholar]
  16. Fiori, S. Extended Hamiltonian learning on Riemannian manifolds: Theoretical aspects. IEEE Trans. Neural Netw. 2011, 22, 687–700. [Google Scholar] [CrossRef] [PubMed]
  17. Fiori, S. Extended Hamiltonian learning on Riemannian manifolds: Numerical aspects. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 7–21. [Google Scholar] [CrossRef] [PubMed]
  18. Zhang, X. Matrix Analysis and Application; Springer: Beijing, China, 2004. [Google Scholar]
  19. Andruchow, E.; Larotonda, G.; Recht, L.; Varela, A. The left invariant metric in the general linear group. J. Geom. Phys. 2014, 86, 241–257. [Google Scholar] [CrossRef]
  20. Zacur, E.; Bossa, M.; Olmos, S. Multivariate tensor-based morphometry with a right-invariant Riemannian distance on GL+(n). J. Math. Imaging Vis. 2014, 50, 19–31. [Google Scholar] [CrossRef]
  21. Goldberg, K. The formal power series for log(e^x e^y). Duke Math. J. 1956, 23, 1–21. [Google Scholar] [CrossRef]
  22. Newman, M.; So, W.; Thompson, R.C. Convergence domains for the Campbell-Baker-Hausdorff formula. Linear Algebra Appl. 1989, 24, 301–310. [Google Scholar] [CrossRef]
  23. Thompson, R.C. Convergence proof for Goldberg's exponential series. Linear Algebra Appl. 1989, 121, 3–7. [Google Scholar] [CrossRef]
  24. Karcher, H. Riemannian center of mass and mollifier smoothing. Commun. Pure Appl. Math. 1977, 30, 509–541. [Google Scholar] [CrossRef]
  25. Gabay, D. Minimizing a differentiable function over a differentiable manifold. J. Optim. Theory App. 1982, 37, 177–219. [Google Scholar] [CrossRef]
  26. Fiori, S. Solving minimal-distance problems over the manifold of real symplectic matrices. SIAM J. Matrix Anal. A 2011, 32, 938–968. [Google Scholar] [CrossRef]
  27. Fiori, S. A Riemannian steepest descent approach over the inhomogeneous symplectic group: Application to the averaging of linear optical systems. Appl. Math. Comput. 2016, 283, 251–264. [Google Scholar] [CrossRef]
Figure 1. The criterion function $J(U)$ and the Frobenius norm of $U$ during the iteration, respectively.
Figure 2. Convergence comparison among the Riemannian-steepest-descent algorithm (RSDA), the extended Hamiltonian algorithm (EHA) and the Euclidean gradient algorithm (EGA) during the iteration.
Figure 3. (a) Convergence comparison among the RSDA, the EHA and the EGA during the iteration; (b) the norm $\|A^{-1}\nabla_A f\|$ of the Riemannian gradient (17) during the iteration of the Riemannian-steepest-descent algorithm (RSDA); (c,d) the discrepancy $D(A, B_q)$ before the iteration ($A = A_0$) and after the iteration ($A = \bar A$), respectively.
