Accelerating the Sinkhorn Algorithm for Sparse Multi-Marginal Optimal Transport via Fast Fourier Transforms

Ba, Fatima Antarou; Quellmalz, Michael

doi:10.3390/a15090311


by Fatima Antarou Ba^* and Michael Quellmalz

Institute of Mathematics, Technische Universität Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany

^* Author to whom correspondence should be addressed.

Algorithms 2022, 15(9), 311; https://doi.org/10.3390/a15090311

Submission received: 5 August 2022 / Revised: 25 August 2022 / Accepted: 26 August 2022 / Published: 30 August 2022

(This article belongs to the Section Randomized, Online, and Approximation Algorithms)


Abstract

We consider the numerical solution of the discrete multi-marginal optimal transport (MOT) problem by means of the Sinkhorn algorithm. In general, the Sinkhorn algorithm suffers from the curse of dimensionality with respect to the number of marginals. If the MOT cost function decouples according to a tree or circle, the complexity of the algorithm is linear in the number of marginal measures. In this case, we speed up the convolution with the radial kernel required in the Sinkhorn algorithm via non-uniform fast Fourier methods. Each step of the proposed accelerated Sinkhorn algorithm with a tree-structured cost function has a complexity of

O (K N)

instead of the classical

O (K N^{2})

for straightforward matrix–vector operations, where K is the number of marginals and each marginal measure is supported on, at most, N points. In the case of a circle-structured cost function, the complexity improves from

O (K N^{3})

to

O (K N^{2})

. This is confirmed through numerical experiments.

Keywords:

multi-marginal optimal transport; Sinkhorn algorithm; fast Fourier transform; image processing; optimal transport; Wasserstein barycenter

MSC:

49Q22; 65T50; 65D18; 49M20

1. Introduction

The optimal transport (OT) problem is an optimization problem that deals with the search for an optimal map (plan) that moves masses between two or more measures at low cost [1,2]. OT appears in a wide range of applications such as image and signal processing [3,4,5,6,7], economics [8,9], finance [10,11], and physics [12,13]. The OT problem was first introduced in 1781 by Monge. His objective was to find a map between two probability measures

μ^{1}, μ^{2}

on

R^{d}

that transports

μ^{1}

to

μ^{2}

with minimal cost, where the cost function describes the cost of transporting a mass between two points in

R^{d}

. However, such a map does not always exist, so Kantorovich [14] relaxed the problem in 1942 by looking for a transport plan with two prescribed marginals

μ^{1}

and

μ^{2}

that minimizes a certain cost functional.

Several authors have generalized the formulation to multi-marginal optimal transport (MOT) [15,16,17], where more than two marginal measures are given. For the given probability measures

μ^{k}

on

Ω^{k} \subset R^{d}

,

k = 1, \dots, K

, an optimal transport plan

π

is defined as a solution of the MOT problem

min_{π \in Π (μ^{1}, \dots, μ^{K})} \int_{Ω^{1} \times \dots \times Ω^{K}} c (x^{1}, \dots, x^{K}) d π (x^{1}, \dots, x^{K})

where

Π (μ^{1}, \dots, μ^{K})

is the convex set of all joint probability measures

π

whose marginals are

μ^{k}

, and

c : Ω^{1} \times \dots \times Ω^{K} \to R

is the cost function.

Since the numerical computation of a transport plan is difficult in general, a regularization term such as the entropy [13,18], Kullback–Leibler divergence [19], general f-divergence [20], or

L^{2}

-regularization [21,22] can be added to make the problem strictly convex. Different approaches such as the Sinkhorn algorithm [1,13], stochastic gradient descent [23], the Gauss–Seidel method [22], or proximal splitting [24] have been used to iteratively determine a minimizing sequence of the MOT problem.

However, the problem suffers from the curse of dimensionality as the complexity grows exponentially with the number K of marginal measures. One way to circumvent this lies in incorporating additional sparsity assumptions into the model. Polynomial-time algorithms to solve certain sparse MOT problems have been studied in [13,25,26]. We will assume that the cost function decouples according to a graph, where the nodes correspond to the marginals and the cost function is the sum of functions that depend only on two variables which correspond to two marginals connected by an edge of the graph. For example, the circle-structured Euclidean cost function reads as

c (x^{1}, \dots, x^{K}) = {∥ x^{1} - x^{2} ∥}_{2}^{2} + \dots + {∥ x^{K - 1} - x^{K} ∥}_{2}^{2} + {∥ x^{K} - x^{1} ∥}_{2}^{2}, x^{1}, \dots, x^{K} \in R^{d} .

MOT problems with graph-structured cost functions of constant tree-width can be solved with the Sinkhorn algorithm in polynomial time [27]. In [25], polynomial-time algorithms were presented for the MOT problem and its regularized counterpart for the cases of a graph structure and a set-optimization structure, as well as a low-rank and sparsely structured cost function. Another sparsity assumption is that the transport plan itself is thinly spread, which is the case, e.g., for the

L^{2}

-regularized problem [21,22].

1.1. Our Contributions

In the present paper, we study the discrete, entropy-regularized MOT problem with a tree-structured [3,13] or a circle-structured cost function, where all measures are supported on a finite number of points (atoms) in

R^{d}

. Then, the computational time of the Sinkhorn algorithm [28,29,30], which iteratively determines a sequence converging to the solution, depends linearly on the number K of input measures. If the number of atoms is large, however, the Sinkhorn algorithm still requires considerable computational time and memory, mainly for computing a discrete convolution, i.e., a matrix–vector product with a kernel matrix.

This is significantly improved by Fourier-based fast summation methods [31,32]. The key idea is the approximation of the kernel function using a Fourier series, which enables the application of the non-uniform fast Fourier transform (NFFT). Although such fast summation methods are frequently used in different applications such as electrostatic particle interaction [33], tomography [34], image segmentation [35], and, very recently, also OT with two marginals [36], they have not been utilized for MOT so far. Furthermore, a method for accelerating the Sinkhorn algorithm for Wasserstein barycenters via computing the convolution with a different kernel, namely the heat kernel, was discussed in [37].

Our main contribution is the combination of the fast summation method with the sparsity of the tree- or circle-structured cost function in the MOT problem for accelerating the Sinkhorn algorithm. Each iteration step has a complexity of

O (K N)

for a tree- and

O (K N^{2})

for a circle-structured cost function, compared to

O (K N^{2})

and

O (K N^{3})

, respectively, with the straightforward matrix–vector operations, where N is an upper bound of the number of atoms for each of the K marginal measures. Our numerical tests with both tree- and circle-structured cost functions confirm a considerable acceleration, while the accuracy stays almost the same. A different acceleration of the Sinkhorn algorithm via low-rank approximation for tree-structured cost yields the same asymptotic complexity [38]. We note that MOT problems with tree-structured cost functions are used for the computation of Wasserstein barycenters [4], and with circle-structured cost for computing Euler flows [39].

1.2. Outline of the Paper

Section 2 introduces the notation. In Section 3, we focus on the discrete MOT problem with squared Euclidean norm cost functions and the numerical solution of the corresponding entropy-regularized problem, using the Sinkhorn algorithm. We investigate sparsely structured cost functions that decouple according to a tree or circle in Section 4. Then, the complexity of the Sinkhorn algorithm depends only linearly on K. Section 5 describes a fast summation method for further accelerating the Sinkhorn algorithm. Finally, in Section 6, we verify the practical performance of the method by applying it to generalized Euler flows and to the computation of generalized Wasserstein barycenters. We compare the computational time of the proposed NFFT-based Sinkhorn algorithm with that of the algorithm based on direct matrix multiplication.

2. Notation

Let

K \in N

and

n = (n_{1}, \dots, n_{K}) \in N^{K}

. We set

[K] : = \{1, \dots, K\}

and consider K finite sets

Ω^{k} = \{x_{i_{k}}^{k} : i_{k} \in [n_{k}]\} \subset R^{d}, k \in [K],

consisting of points

x_{i_{k}}^{k} \in R^{d}

, which are called atoms. We set

Ω : = Ω^{1} \times \dots \times Ω^{K}

. Additionally, we define the index set

I : = \{i = (i_{1}, \dots, i_{K}) : i_{k} \in [n_{k}], k \in [K]\}

and the set of K-dimensional matrices (tensors)

R^{n} : = R^{n_{1} \times \dots \times n_{K}} .

Let

P (Ω^{k})

denote the set of probability measures on

Ω^{k}

. In this paper, we consider K discrete probability measures, also called marginal measures,

μ^{k} \in P (Ω^{k})

, given by

μ^{k} = \sum_{i_{k} = 1}^{n_{k}} μ_{i_{k}}^{k} δ_{x_{i_{k}}^{k}}, k \in [K],

where the probabilities satisfy

μ_{i_{k}}^{k} \geq 0, \sum_{i_{k} = 1}^{n_{k}} μ_{i_{k}}^{k} = 1 for all i_{k} \in [n_{k}], k \in [K],

and, for all

A \subset Ω^{k}

, the Dirac measure is given by

δ_{x_{i_{k}}^{k}} (A) : = \{\begin{matrix} 1 & if x_{i_{k}}^{k} \in A, \\ 0 & otherwise . \end{matrix}

For

G, H \in R^{n}

, we denote their component-wise (Hadamard) product by

G ⊙ H : = {(G_{i} H_{i})}_{i \in I} \in R^{n},

and similarly, their component-wise division by ⊘, as well as the Frobenius inner product

{〈 G, H 〉}_{F} : = \sum_{i \in I} G_{i} H_{i} \in R .

The tensor product (Kronecker product) of

u, v \in R^{m}

is denoted by

u \otimes v \in R^{m \times m}

. Analogues can be defined for tensors of different size.

3. Multi-Marginal Optimal Transport

In the following, we consider the discrete multi-marginal optimal transport (MOT) between K marginal measures

μ^{k} \in P (Ω^{k})

,

k \in [K]

. We define the set of admissible transport plans by

Π (μ^{1}, \dots, μ^{K}) : = \{Π \in R_{\geq 0}^{n} : P_{k} (Π) = μ^{k} for all k \in [K]\},

(1)

where the k-th marginal projection is defined as

P_{k} (Π) : = \sum_{ℓ \in [K] \setminus \{k\}} \sum_{i_{ℓ} \in [n_{ℓ}]} Π_{i_{1}, \dots, i_{K}} \in R^{n_{k}} .

(2)
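As a small illustration of the marginal projection (2), the following NumPy sketch sums a K-way tensor over all axes except the k-th. The function name `marginal`, the 0-based axis convention, and the toy measures are our own assumptions for illustration and do not come from the paper.

```python
import numpy as np

def marginal(plan, k):
    """Sum a K-way tensor over every axis except axis k (0-based), cf. (2)."""
    other_axes = tuple(a for a in range(plan.ndim) if a != k)
    return plan.sum(axis=other_axes)

# Product measure of three discrete marginals (hypothetical toy data):
# by construction, its k-th marginal is mu[k].
mu = [np.array([0.2, 0.8]), np.array([0.5, 0.3, 0.2]), np.array([0.1, 0.9])]
plan = np.einsum("i,j,l->ijl", *mu)
print(marginal(plan, 1))
```

For a full tensor, this projection touches all entries, which is the source of the exponential dependence on K discussed below.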

For a cost function

c : Ω \to R_{\geq 0}

and the samples

x_{i} = (x_{i_{1}}^{1}, \dots, x_{i_{K}}^{K})

,

i \in I

, we define the respective cost matrix

C : = {[C_{i}]}_{i \in I} = {[c (x_{i})]}_{i \in I} = {[c (x_{i_{1}}^{1}, \dots, x_{i_{K}}^{K})]}_{(i_{1}, \dots, i_{K}) \in I} \in R_{\geq 0}^{n} .

(3)

The discrete MOT problem reads

min_{Π \in Π (μ^{1}, \dots, μ^{K})} {〈 Π, C 〉}_{F},

(4)

whose solution

Π \in R_{\geq 0}^{n}

is called an optimal plan.

Entropy Regularization

As the MOT problem (4) is numerically unfeasible, we consider for

η > 0

the entropy-regularized multi-marginal optimal transport (

{MOT}_{η}

) problem

min_{Π \in Π (μ^{1}, \dots, μ^{K})} {〈 Π, C 〉}_{F} + η {〈 Π, log Π - 1_{n} 〉}_{F} = min_{Π \in Π (μ^{1}, \dots, μ^{K})} \sum_{i \in I} Π_{i} C_{i} + η \sum_{i \in I} Π_{i} (log Π_{i} - 1),

(5)

which is a convex optimization problem. It is possible to numerically deduce the optimal transport plan

\hat{Π}

of (5) from the solution of the corresponding Lagrangian dual problem. The following theorem is a special case of [3] for a constant entropy function.

Theorem 1.

The Lagrangian dual formulation of the discrete

{MOT}_{η}

problem (5) states

sup_{\begin{matrix} ϕ^{k} \in R_{\geq 0}^{n_{k}}, k \in [K] \end{matrix}} S (ϕ^{1}, \dots, ϕ^{K}) : = sup_{\begin{matrix} ϕ^{k} \in R_{\geq 0}^{n_{k}}, k \in [K] \end{matrix}} η \sum_{k \in [K]} \sum_{j \in [n_{k}]} μ_{j}^{k} log ϕ_{j}^{k} - η \sum_{i \in I} K_{i} Φ_{i},

(6)

where the kernel matrix

K \in R^{n}

is defined by

K_{i} : = exp (- \frac{C_{i}}{η}), i \in I,

and the dual tensor

Φ = ⨂_{k = 1}^{K} ϕ^{k}

by

Φ_{i} : = \prod_{k = 1}^{K} ϕ_{i_{k}}^{k}, i \in I .

The functional

S : R_{\geq 0}^{n} \to R

is called the Sinkhorn function.

The optimal values of the dual and the primal problem are not equal in general; the optimal value of (6) is a lower bound for that of the primal problem (5). Equality holds if the cost function c is lower semi-continuous; i.e.,

lim inf c (x) \geq c (x_{0})

as

x \to x_{0}

for every

x_{0} \in R^{n}

, or Borel measurable and bounded [40]. This is obviously the case for the squared Euclidean norm cost function

c : R^{n} \to [0, \infty), c (x) = \sum_{\begin{matrix} k_{1}, k_{2} \in [K] \\ k_{1} \neq k_{2} \end{matrix}} {∥ x_{i_{k_{1}}}^{k_{1}} - x_{i_{k_{2}}}^{k_{2}} ∥}_{2}^{2}

that we study here.

Proposition 1

([41]). An optimal plan of the

{MOT}_{η}

problem (5) is given by

\hat{Π} = K ⊙ \hat{Φ},

where

\hat{Φ} = ⨂_{k \in [K]} {\hat{ϕ}}^{k}

and

{\hat{ϕ}}^{k},

k \in [K]

are the optimal solutions of dual problem (6).

A sequence converging to the optimal dual vectors

{\hat{ϕ}}^{k},

k \in [K]

in (6) can be iteratively determined by the Sinkhorn algorithm [13,42] presented in Algorithm 1, where we note that line 6 is obtained by differentiating the Sinkhorn function

S

with respect to

ϕ^{k},

k \in [K] .

The complexity of the algorithm mainly comes from the computation of the marginal

P_{k} (K ⊙ Φ)

, where the projection

P_{k}

is defined in (2). In general, the number of operations depends exponentially on K.

Algorithm 1 Sinkhorn iterations for the

{MOT}_{η}

problem

1: Input: Initial values ${(ϕ^{k})}^{(0)} \in R^{n_{k}},$ $k \in [K],$ regularization parameter $η > 0,$ threshold $δ > 0$
2: Set $r \leftarrow 0$
3: do
4:   for $k = 1, \dots, K$ do
5:     Compute ${(\tilde{Φ})}_{k}^{(r + 1)} : = ⨂_{ℓ \in [k - 1]} {(ϕ^{ℓ})}^{(r + 1)} \otimes ⨂_{ℓ \in [K] \setminus [k - 1]} {(ϕ^{ℓ})}^{(r)}$
6:     Compute dual vectors ${(ϕ^{k})}^{(r + 1)} : = (μ^{k} ⊙ {(ϕ^{k})}^{(r)}) ⊘ P_{k} (K ⊙ {(\tilde{Φ})}_{k}^{(r + 1)})$
7:   end for
8:   Increment $r \leftarrow r + 1$
9: while $| S ({(ϕ^{1})}^{(r)}, \dots, {(ϕ^{K})}^{(r)}) - S ({(ϕ^{1})}^{(r - 1)}, \dots, {(ϕ^{K})}^{(r - 1)}) | \geq δ$
10: return optimal plan $\hat{Π} = K ⊙ \hat{Φ}$ , where $\hat{Φ} = ⨂_{k \in [K]} {(ϕ^{k})}^{(r)}$
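The update rule of Algorithm 1 can be sketched for a very small problem as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the full kernel tensor is stored explicitly, a fixed number of sweeps replaces the stopping threshold δ, and the names `sinkhorn_mot`, `dual_tensor`, and `marginal` as well as the toy data are hypothetical.

```python
import numpy as np
from functools import reduce

def marginal(tensor, k):
    """k-th marginal projection (0-based axis), cf. (2)."""
    return tensor.sum(axis=tuple(a for a in range(tensor.ndim) if a != k))

def dual_tensor(phis):
    """Rank-one tensor Phi with entries prod_k phi^k[i_k]."""
    return reduce(np.multiply.outer, phis)

def sinkhorn_mot(kernel, mus, sweeps=300):
    """Cyclic updates phi^k <- mu^k * phi^k / P_k(K * Phi) on the full tensor."""
    phis = [np.ones_like(mu) for mu in mus]
    for _ in range(sweeps):  # assumption: fixed sweep count instead of delta
        for k, mu in enumerate(mus):
            # Each update costs O(n_1 * ... * n_K): exponential in K.
            phis[k] = mu * phis[k] / marginal(kernel * dual_tensor(phis), k)
    return kernel * dual_tensor(phis), phis

# Toy problem: K = 3 uniform marginals on the real line,
# cost = sum of all pairwise squared distances.
rng = np.random.default_rng(0)
xs = [rng.random(n) for n in (3, 4, 5)]
mus = [np.full(len(x), 1.0 / len(x)) for x in xs]
a, b, c = xs[0][:, None, None], xs[1][None, :, None], xs[2][None, None, :]
cost = (a - b) ** 2 + (b - c) ** 2 + (c - a) ** 2
eta = 1.0
plan, phis = sinkhorn_mot(np.exp(-cost / eta), mus)
```

After the iteration, the marginals of `plan` can be compared with `mus` to check convergence.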

4. Sparse Cost Functions

In this section, we take a look at sparsely structured cost functions, for which the Sinkhorn algorithm becomes much faster and we overcome the curse of dimensionality. Let

G = (V, E)

be an undirected graph with vertices

V

and edges

E

. We say that the

{MOT}_{η}

problem has the structure of the graph G if

V = [K]

and the cost matrix (3) decouples according to

C_{i} = \sum_{\{k_{1}, k_{2}\} \in E} ∥ x_{i_{k_{1}}}^{k_{1}} - x_{i_{k_{2}}}^{k_{2}} ∥_{2}^{2}, i = (i_{1}, \dots, i_{K}) \in I .

This implies that the kernel

K \in R_{\geq 0}^{n}

satisfies

K_{i} = exp (- \frac{C_{i}}{η}) = \prod_{\{k_{1}, k_{2}\} \in E} K_{i_{k_{1}}, i_{k_{2}}}^{(k_{1}, k_{2})}, i \in I,

(7)

where the kernel matrix

K^{(k_{1}, k_{2})} \in R_{\geq 0}^{n_{k_{1}} \times n_{k_{2}}}

for

\{k_{1}, k_{2}\} \in E

is given by

K_{i_{k_{1}}, i_{k_{2}}}^{(k_{1}, k_{2})} : = exp (- \frac{1}{η} ∥ x_{i_{k_{1}}}^{k_{1}} - x_{i_{k_{2}}}^{k_{2}} ∥_{2}^{2}) .

(8)
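The factorization (7) can be checked numerically on a tiny example. The sketch below assumes a path graph with three nodes and random atoms in R^2; the helper `pairwise_sq_dist` and all variable names are hypothetical and only serve the illustration.

```python
import numpy as np

def pairwise_sq_dist(x, y):
    """Matrix of squared Euclidean distances between the rows of x and y."""
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

rng = np.random.default_rng(1)
eta = 0.5
x1, x2, x3 = (rng.random((n, 2)) for n in (3, 4, 5))  # atoms in R^2

# Path graph 1 - 2 - 3, i.e., E = {{1, 2}, {2, 3}}.
d12, d23 = pairwise_sq_dist(x1, x2), pairwise_sq_dist(x2, x3)
cost = d12[:, :, None] + d23[None, :, :]        # cost tensor C_i, shape (3, 4, 5)
kernel_full = np.exp(-cost / eta)               # K_i = exp(-C_i / eta)

k12, k23 = np.exp(-d12 / eta), np.exp(-d23 / eta)    # pairwise kernels (8)
kernel_factored = k12[:, :, None] * k23[None, :, :]  # product over edges (7)
```

Only the two small matrices `k12` and `k23` need to be stored in practice; the full tensor is formed here purely for comparison.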

We use the indices

k \in V = [K]

to identify the marginal measures

μ^{k}

in the rest of the paper.

The discrete, dual formulation (6) of the

{MOT}_{η}

problem has the same form independently of the structure of the graph G; only the marginals

P_{k} (K ⊙ Φ)

differ. If the graph G is complete, i.e., every two vertices in

V

are connected with an edge, then the computational complexity of the Sinkhorn Algorithm 1 depends exponentially on the number K of marginal measures (vertices). For larger values of K, it is practically impossible to numerically compute an optimal plan for the

{MOT}_{η}

problem. We consider two sparsity assumptions of the

{MOT}_{η}

problem, each of them yielding that the Sinkhorn algorithm has a linear complexity in the number K of the nodes. It was shown in [25] that

{MOT}_{η}

problems with graph-structured cost functions of constant tree-width can be solved in polynomial time. This is the case for the tree and circle, whose tree-widths are 1 and 2, respectively. In the following, we give an explicit scheme to efficiently compute the Sinkhorn iterations for the tree and circle structures.

4.1. Tree Structure

We consider the

{MOT}_{η}

problem with the structure of a tree

(V, E)

, which is a connected and circle-free graph with

| E | = | V | - 1 .

We define the (non-empty) neighbour set

N_{k}

of

k \in V

as the set of all nodes

ℓ \in V

, such that

\{k, ℓ\} \in E .

Furthermore, we denote by

L : = \{k \in V : | N_{k} | = 1\}

the set of all leaves of the tree.

We call

1 \in V

the root of the tree. For every

k \in V

, there is a unique path between k and the root. For

k \in V \setminus \{1\}

, we define the parent

p (k)

as the node in

N_{k}

such that

p (k)

lies in the path between k and the root 1. The root has no parent. Without loss of generality, we assume that

p (k) < k

holds for all

k \in V \setminus \{1\}

. We define the set of children

C_{k} : = \{ℓ \in N_{k} : ℓ > k\}

. Then, we can derive a recursive formula for the k-th marginal of the tensor

K ⊙ Φ \in R^{n} .

Theorem 2.

Let

(V, E)

be a tree with leaves

L

, and let c be the

{MOT}_{η}

cost function associated. Furthermore, let

k \in V

be an arbitrary node. Then, the k-th marginal of the transport plan

K ⊙ Φ

is given by

P_{k} (K ⊙ Φ) = ϕ^{k} ⊙ ⨀_{ℓ \in N_{k}} α^{(k, ℓ)},

where the vectors

α^{(k, ℓ)} \in R^{n_{k}}

are recursively defined as

α^{(k, ℓ)} = \{\begin{matrix} K^{(k, ℓ)} ϕ^{ℓ} & if ℓ \in L, \\ K^{(k, ℓ)} (ϕ^{ℓ} ⊙ ⨀_{t \in N_{ℓ} \setminus \{k\}} α^{(ℓ, t)}) & otherwise . \end{matrix}

A proof of Theorem 2 can be found in [13] (Theorem 3.2). The main idea is to split the rooted tree at the node k into

| N_{k} |

subtrees. Hence, the kernel tensor satisfies

K_{i} = \prod_{\{k_{1}, k_{2}\} \in E} K_{i_{k_{1}}, i_{k_{2}}}^{(k_{1}, k_{2})} = \prod_{ℓ \in N_{k}} \prod_{t \in D_{ℓ}} K_{i_{p (t)}, i_{t}}^{(p (t), t)}, i \in I,

where

D_{ℓ}

is the set of descendants of

ℓ,

i.e., the nodes

t \in V

such that ℓ lies in the path between k and

t .

Inserting

K_{i}

into

P_{k} (K ⊙ Φ)

and recalling the definition of

P_{k}

in (2) yields the result.

In order to efficiently perform the Sinkhorn algorithm, we compute iteratively for

ℓ = K, \dots, 2

the vectors

β_{ℓ} : = α^{(p (ℓ), ℓ)} \in R^{n_{p (ℓ)}} .

From Theorem 2, we obtain

\begin{matrix} β_{ℓ} = \{\begin{matrix} K^{(p (ℓ), ℓ)} ϕ^{ℓ} & if ℓ \in L, \\ K^{(p (ℓ), ℓ)} (ϕ^{ℓ} ⊙ ⨀_{t \in C_{ℓ}} β_{t}) & otherwise . \end{matrix} \end{matrix}

Since we assumed that

p (k) < k

, the computation of

β_{ℓ}

requires only

β_{t}

for

t > ℓ

. Similarly, the vectors

γ_{ℓ} : = α^{(ℓ, p (ℓ))} \in R^{n_{ℓ}}

for

ℓ > 1

and

γ_{1} : = 1_{n_{1}}

can be iteratively computed by

γ_{ℓ} = {(K^{(p (ℓ), ℓ)})}^{⊺} (ϕ^{p (ℓ)} ⊙ γ_{p (ℓ)} ⊙ ⨀_{t \in C_{p (ℓ)} \setminus \{ℓ\}} β_{t}), ℓ = 2, \dots, K,

where the vectors

β_{t} \in R^{n_{p (ℓ)}}

are assumed to be known from above. Then, the marginals are

P_{k} (K ⊙ Φ) = ϕ^{k} ⊙ γ_{k} ⊙ β^{⊙_{C_{k}}}, k \in [K],

where for any subset

U \subseteq C_{k},

we define

R^{n_{k}} ∋ β^{⊙_{U}} : = \{\begin{matrix} 1_{n_{k}} & if U = \emptyset, \\ ⨀_{t \in U} β_{t} & otherwise . \end{matrix}

The resulting procedure is summarized in Algorithm 2.

Algorithm 2 Sinkhorn algorithm for tree structure

1: Input: Tree $T = (V, E)$ with leaves $L$ and root $1 \in V = [K]$ , initialization ${(ϕ^{k})}^{(0)}$ , $k \in [K],$ parameters $η, δ > 0$
2: Initialize $r \leftarrow 0$
3: for $k = K, \dots, 2$ do
4:   $β_{k}^{(0)} : = \begin{cases} K^{(p (k), k)} {(ϕ^{k})}^{(0)} & if k \in L, \\ K^{(p (k), k)} ({(ϕ^{k})}^{(0)} ⊙ {(β^{⊙_{C_{k}}})}^{(0)}) & otherwise \end{cases}$
5: end for
6: do
7:   for $k = 1, \dots, K$ do
8:     $γ_{k}^{(r)} : = \begin{cases} 1_{n_{1}} & if k = 1, \\ {(K^{(p (k), k)})}^{⊺} ({(ϕ^{p (k)})}^{(r + 1)} ⊙ γ_{p (k)}^{(r)} ⊙ {(β^{⊙_{C_{p (k)} \setminus \{k\}}})}^{(r)}) & otherwise \end{cases}$
9:     Compute the dual vector ${(ϕ^{k})}^{(r + 1)} : = μ^{k} ⊘ (γ_{k}^{(r)} ⊙ {(β^{⊙_{C_{k}}})}^{(r)})$
10:   end for
11:   for $k = K, \dots, 2$ do
12:     Compute $β_{k}^{(r + 1)}$ according to step 4.
13:   end for
14:   Set $S^{(r)} : = η (\sum_{k = 1}^{K} {(μ^{k})}^{⊺} log {(ϕ^{k})}^{(r)} - {(μ^{1} ⊘ {(ϕ^{1})}^{(r + 1)})}^{⊺} {(ϕ^{1})}^{(r)})$
15:   Increment $r \leftarrow r + 1$
16: while $| S^{(r)} - S^{(r - 1)} | \geq δ$
17: return optimal plan $\hat{Π} = K ⊙ \hat{Φ}$ , where $\hat{Φ} = ⨂_{k \in [K]} {(ϕ^{k})}^{(r)}$
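The marginal formula behind Algorithm 2 can be illustrated with the following NumPy sketch, which evaluates the vectors β and γ for fixed dual vectors and returns all marginals of K ⊙ Φ. It covers only the marginal computation, not the full iteration, and rests on our own assumptions: nodes are 0-based with root 0, `parent[k] < k` plays the role of p(k) < k, `kernels[k]` stands for K^{(p(k), k)}, and the random positive matrices are placeholder data.

```python
import numpy as np

def tree_marginals(parent, kernels, phis):
    """All marginals P_k(K * Phi) for a tree with parent[k] < k and root 0."""
    K = len(phis)
    children = [[t for t in range(1, K) if parent[t] == k] for k in range(K)]
    beta = [None] * K
    for l in range(K - 1, 0, -1):            # bottom-up: children before parents
        v = phis[l].copy()
        for t in children[l]:
            v *= beta[t]
        beta[l] = kernels[l] @ v              # beta_l = alpha^{(p(l), l)}
    gamma = [np.ones_like(phis[0])] + [None] * (K - 1)
    for l in range(1, K):                     # top-down: parents before children
        p = parent[l]
        v = phis[p] * gamma[p]
        for t in children[p]:
            if t != l:                        # siblings of l only
                v = v * beta[t]
        gamma[l] = kernels[l].T @ v           # gamma_l = alpha^{(l, p(l))}
    marginals = []
    for k in range(K):
        m = phis[k] * gamma[k]
        for t in children[k]:
            m = m * beta[t]
        marginals.append(m)
    return marginals

# Hypothetical tree with edges {0,1}, {0,2}, {2,3} and random positive data.
rng = np.random.default_rng(2)
parent = [None, 0, 0, 2]
sizes = [3, 4, 2, 5]
phis = [rng.random(n) + 0.1 for n in sizes]
kernels = [None] + [rng.random((sizes[parent[k]], sizes[k])) + 0.1
                    for k in range(1, 4)]
marginals = tree_marginals(parent, kernels, phis)
```

Each β and γ requires one matrix–vector product, so one pass uses 2(K − 1) such products, in line with the operation count stated for Algorithm 2.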

4.2. Circle Structure

We consider the

{MOT}_{η}

problem where the graph

(V, E)

is a circle. We assume for each

k \in V = [K]

that

\{k, k + 1\} \in E

is an edge of the circle, where we set

k + 1 = 1

if

k = K

and

k - 1 = K,

if

k = 1 .

Thus, we can define the distance between two nodes

k_{1}, k_{2} \in V

as

d (k_{1}, k_{2}) : = \{\begin{matrix} k_{2} - k_{1} & if k_{2} \geq k_{1}, \\ K - k_{1} + k_{2} & otherwise . \end{matrix}

Theorem 3.

Let

(V, E)

be a circle and c be the MOT cost function associated. Furthermore, let

k \in V

be an arbitrary node. The k-th marginal of the transport plan

K ⊙ Φ

is given by

P_{k} (K ⊙ Φ) = ((ϕ^{k} ⊙ K^{(k, k + 1)}) ⊙ {(ϕ^{k + 1} ⊙ α^{(k + 1, k)})}^{⊺}) 1_{n_{k + 1}},

where the matrices

α^{(ℓ, t)} \in R^{n_{ℓ} \times n_{t}}

,

ℓ, t \in V

recursively satisfy

α^{(ℓ, t)} = \{\begin{matrix} K^{(ℓ, t)} & if d (ℓ, t) = 1, \\ K^{(ℓ, ℓ + 1)} (ϕ^{ℓ + 1} ⊙ α^{(ℓ + 1, t)}) & otherwise, \end{matrix}

(9)

and we set

k + 1 = 1

if

k = K

and

k - 1 = K,

if

k = 1 .

Proof.

The kernel

K

can be decomposed as (7). Let

k \in [K]

and

i_{k} \in [n_{k}]

. It holds then, that

\begin{array}{l} {[P_{k} (K ⊙ Φ)]}_{i_{k}} = \sum_{ℓ \in V \setminus \{k\}} \sum_{i_{ℓ} \in [n_{ℓ}]} K_{i} Φ_{i} = ϕ_{i_{k}}^{k} \sum_{ℓ \in V \setminus \{k\}} \sum_{i_{ℓ} \in [n_{ℓ}]} \prod_{\{k_{1}, k_{2}\} \in E} K_{i_{k_{1}}, i_{k_{2}}}^{(k_{1}, k_{2})} \prod_{j \in V \setminus \{k\}} ϕ_{i_{j}}^{j} \\ = ϕ_{i_{k}}^{k} \sum_{i_{1}} ϕ_{i_{1}}^{1} \sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} \dots \sum_{i_{k - 2}} K_{i_{k - 3}, i_{k - 2}}^{(k - 3, k - 2)} ϕ_{i_{k - 2}}^{k - 2} \sum_{i_{k - 1}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 1}}^{k - 1} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} \\ \sum_{i_{k + 1}} K_{i_{k}, i_{k + 1}}^{(k, k + 1)} ϕ_{i_{k + 1}}^{k + 1} \dots \sum_{i_{K}} K_{i_{K - 1}, i_{K}}^{(K - 1, K)} ϕ_{i_{K}}^{K} K_{i_{K}, i_{1}}^{(K, 1)} \\ = ϕ_{i_{k}}^{k} \sum_{i_{k + 1}} K_{i_{k}, i_{k + 1}}^{(k, k + 1)} ϕ_{i_{k + 1}}^{k + 1} \dots \sum_{i_{K}} K_{i_{K - 1}, i_{K}}^{(K - 1, K)} ϕ_{i_{K}}^{K} \sum_{i_{1}} K_{i_{K}, i_{1}}^{(K, 1)} ϕ_{i_{1}}^{1} \sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} \dots \\ \sum_{i_{k - 2}} K_{i_{k - 3}, i_{k - 2}}^{(k - 3, k - 2)} ϕ_{i_{k - 2}}^{k - 2} \sum_{i_{k - 1}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 1}}^{k - 1} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} . \end{array}

Setting

ψ^{ℓ} : = ϕ^{(k + ℓ - 1) \mod K}

and

{\tilde{K}}^{(ℓ, ℓ + 1)} : = K^{((k + ℓ - 1) \mod K, (k + ℓ) \mod K)}

for every

ℓ \in [K]

, we obtain

\begin{array}{l} {[P_{k} (K ⊙ Φ)]}_{i_{k}} & = ϕ_{i_{k}}^{k} \sum_{j_{2}} {\tilde{K}}_{i_{k}, j_{2}}^{(1, 2)} ψ_{j_{2}}^{2} \dots \sum_{j_{K - 1}} {\tilde{K}}_{j_{K - 2}, j_{K - 1}}^{(K - 2, K - 1)} ψ_{j_{K - 1}}^{K - 1} \sum_{j_{K}} {\tilde{K}}_{j_{K - 1}, j_{K}}^{(K - 1, K)} ψ_{j_{K}}^{K} {\tilde{K}}_{j_{K}, i_{k}}^{(K, 1)} \\ = {[P_{1} (\tilde{K} ⊙ \tilde{Φ})]}_{i_{k}} . \end{array}

We define

{\tilde{α}}^{(K, 1)} : = {\tilde{K}}^{(K, 1)} = α^{(k - 1, k)} \in R^{n_{k - 1} \times n_{k}}

and recursively for all

ℓ = K - 1, \dots, 1

, the matrix

{\tilde{α}}^{(ℓ, 1)} \in R^{n_{(k + ℓ - 1) \mod K} \times n_{k}}

by

{\tilde{α}}_{j_{ℓ}, i_{k}}^{(ℓ, 1)} : = {[{\tilde{K}}^{(ℓ, ℓ + 1)} (ψ^{ℓ + 1} ⊙ {\tilde{α}}^{(ℓ + 1, 1)})]}_{j_{ℓ}, i_{k}} = \sum_{j_{ℓ + 1}} {\tilde{K}}_{j_{ℓ}, j_{ℓ + 1}}^{(ℓ, ℓ + 1)} ψ_{j_{ℓ + 1}}^{ℓ + 1} {\tilde{α}}_{j_{ℓ + 1}, i_{k}}^{(ℓ + 1, 1)} .

Inserting this into the marginal yields

\begin{array}{l} {[P_{1} (\tilde{K} ⊙ \tilde{Φ})]}_{i_{k}} = ϕ_{i_{k}}^{k} \sum_{j_{2}} {\tilde{K}}_{i_{k}, j_{2}}^{(1, 2)} ψ_{j_{2}}^{2} \dots \sum_{j_{K - 2}} {\tilde{K}}_{j_{K - 3}, j_{K - 2}}^{(K - 3, K - 2)} ψ_{j_{K - 2}}^{K - 2} \sum_{j_{K - 1}} {\tilde{K}}_{j_{K - 2}, j_{K - 1}}^{(K - 2, K - 1)} ψ_{j_{K - 1}}^{K - 1} {\tilde{α}}_{j_{K - 1}, i_{k}}^{(K - 1, 1)} \\ = ψ_{i_{k}}^{1} \sum_{j_{2}} {\tilde{K}}_{i_{k}, j_{2}}^{(1, 2)} ψ_{j_{2}}^{2} {\tilde{α}}_{j_{2}, i_{k}}^{(2, 1)} = ψ_{i_{k}}^{1} {[{\tilde{K}}_{i_{k}, j_{2}}^{(1, 2)}]}_{j_{2} \in [n_{k + 1}]}^{⊺} (ψ^{2} ⊙ {[{\tilde{α}}_{j_{2}, i_{k}}^{(2, 1)}]}_{j_{2} \in [n_{k + 1}]}) . \end{array}

Hence,

P_{1} (\tilde{K} ⊙ \tilde{Φ}) = ((ψ^{1} ⊙ {\tilde{K}}^{(1, 2)}) ⊙ {(ψ^{2} ⊙ {\tilde{α}}^{(2, 1)})}^{⊺}) 1 .

Finally, defining

{\tilde{α}}^{(ℓ, 1)} = α^{((k + ℓ - 1) \mod K, k)}

yields the assertion. □

To compute the marginals as efficiently as for the tree structure, we choose

k = 1

as the starting point and decompose the marginal into two matrices, which can be computed recursively as follows. For matrices

G, H \in R^{n \times m},

we define the inner product with respect to the second dimension by

〈 G, H 〉 : = {(\sum_{j = 1}^{m} G_{i j} H_{i j})}_{i \in [n]} \in R^{n} .

Theorem 4.

Under the assumptions of Theorem 3, we have

P_{k} (K ⊙ Φ) = \{\begin{matrix} ϕ^{1} ⊙ 〈K^{(1, 2)}, {(ϕ^{2} ⊙ α^{(2, 1)})}^{⊺}〉, & k = 1, \\ ϕ^{k} ⊙ 〈α^{(k, 1)}, {(ϕ^{1} ⊙ λ^{(1, k)})}^{⊺}〉, & k = 2, \dots, K, \end{matrix}

where

α^{(k, 1)}

is given in (9) and

λ^{(1, k)} : = \{\begin{matrix} K^{(1, 2)}, & k = 2, \\ λ^{(1, k - 1)} (ϕ^{k - 1} ⊙ K^{(k - 1, k)}), & k = 3, \dots, K . \end{matrix}

(10)

Proof.

Let

k \in [K]

and

i_{k} \in [n_{k}]

. For

k = 1

, the assertion follows from Theorem 3. For

k > 2

, we have

\begin{matrix} {[P_{k} (K ⊙ Φ)]}_{i_{k}} & = ϕ_{i_{k}}^{k} \sum_{i_{1}} ϕ_{i_{1}}^{1} \sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} \dots \sum_{i_{k - 1}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 1}}^{k - 1} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} \end{matrix}

\begin{matrix} \sum_{i_{k + 1}} K_{i_{k}, i_{k + 1}}^{(k, k + 1)} ϕ_{i_{k + 1}}^{k + 1} \dots \sum_{i_{K}} K_{i_{K - 1}, i_{K}}^{(K - 1, K)} ϕ_{i_{K}}^{K} K_{i_{K}, i_{1}}^{(K, 1)} \end{matrix}

\begin{matrix} = ϕ_{i_{k}}^{k} \sum_{i_{1}} ϕ_{i_{1}}^{1} \underbrace{\sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} \dots \sum_{i_{k - 1}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 1}}^{k - 1} K_{i_{k - 1}, i_{k}}^{(k - 1, k)}}_{= : I} α_{i_{k}, i_{1}}^{(k, 1)} . \end{matrix}

Furthermore, the term

I

can be rewritten as

\begin{matrix} I = \sum_{i_{k - 1}} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} ϕ_{i_{k - 1}}^{k - 1} \sum_{i_{k - 2}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 2}}^{k - 2} \dots \sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} K_{i_{2}, i_{3}}^{(2, 3)} \end{matrix}

By (10), we have

λ_{i_{1}, i_{3}}^{(1, 3)} = \sum_{i_{2}} K_{i_{1}, i_{2}}^{(1, 2)} ϕ_{i_{2}}^{2} K_{i_{2}, i_{3}}^{(2, 3)}, i_{1} \in [n_{1}], i_{3} \in [n_{3}] .

This implies that

\begin{array}{l} I & = \sum_{i_{k - 1}} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} ϕ_{i_{k - 1}}^{k - 1} \sum_{i_{k - 2}} K_{i_{k - 2}, i_{k - 1}}^{(k - 2, k - 1)} ϕ_{i_{k - 2}}^{k - 2} \dots \sum_{i_{3}} K_{i_{3}, i_{4}}^{(3, 4)} ϕ_{i_{3}}^{3} λ_{i_{1}, i_{3}}^{(1, 3)}, \\ = \dots = \sum_{i_{k - 1}} K_{i_{k - 1}, i_{k}}^{(k - 1, k)} ϕ_{i_{k - 1}}^{k - 1} λ_{i_{1}, i_{k - 1}}^{(1, k - 1)} = {[λ^{(1, k - 1)} (ϕ^{k - 1} ⊙ K^{(k - 1, k)})]}_{i_{1}, i_{k}} = λ_{i_{1}, i_{k}}^{(1, k)} . \end{array}

Hence, the assertion holds for all

k > 2

. The case

k = 2

can be proven using the same procedure. □

In order to efficiently compute a maximizing sequence of the dual

{MOT}_{η}

problem (6), we set for

k \in [K]

the dual matrices

β_{k} : = α^{(k, 1)} \in R^{n_{k} \times n_{1}},

γ_{k} : = λ^{(1, k)} \in R^{n_{1} \times n_{k}},

given in Theorems 3 and 4, respectively. The method is shown in Algorithm 3.

Algorithm 3 Sinkhorn algorithm for circle structure

1: Input: Initialization ${(ϕ^{k})}^{(0)}, k \in [K],$ parameters $η, δ > 0$
2: Initialize $r \leftarrow 0$
3: for $k = K, \dots, 2$ do
4:   ${(β_{k})}^{(0)} : = \begin{cases} K^{(K, 1)} & if k = K, \\ K^{(k, k + 1)} ({(ϕ^{k + 1})}^{(0)} ⊙ {(β_{k + 1})}^{(0)}) & otherwise \end{cases}$
5: end for
6: do
7:   ${(ϕ^{1})}^{(r + 1)} : = μ^{1} ⊘ 〈K^{(1, 2)}, {({(ϕ^{2})}^{(r)} ⊙ {(β_{2})}^{(r)})}^{⊺}〉$
8:   for $k = 2, \dots, K$ do
9:     ${(γ_{k})}^{(r)} : = \begin{cases} K^{(1, 2)} & if k = 2, \\ {(γ_{k - 1})}^{(r)} ({(ϕ^{k - 1})}^{(r + 1)} ⊙ K^{(k - 1, k)}) & otherwise \end{cases}$
10:     Compute dual vector ${(ϕ^{k})}^{(r + 1)} : = μ^{k} ⊘ 〈{(β_{k})}^{(r)}, {({(ϕ^{1})}^{(r + 1)} ⊙ {(γ_{k})}^{(r)})}^{⊺}〉$
11:   end for
12:   for $k = K, \dots, 2$ do
13:     Compute ${(β_{k})}^{(r + 1)}$ according to step 4
14:   end for
15:   Compute $S^{(r)} : = η (\sum_{k = 1}^{K} {(μ^{k})}^{⊺} log {(ϕ^{k})}^{(r)} - {(μ^{1} ⊘ {(ϕ^{1})}^{(r + 1)})}^{⊺} {(ϕ^{1})}^{(r)})$
16:   Increment $r \leftarrow r + 1$
17: while $| S^{(r)} - S^{(r - 1)} | \geq δ$
18: return optimal plan $\hat{Π} = K ⊙ \hat{Φ}$ , where $\hat{Φ} = ⨂_{k \in [K]} {(ϕ^{k})}^{(r)}$
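The marginal formulas of Theorems 3 and 4 can be sketched in NumPy as follows. The code evaluates the matrices α^{(k, 1)} and λ^{(1, k)} for fixed dual vectors and returns all marginals; it does not perform the Sinkhorn updates themselves. The 0-based node numbering, the convention that `kernels[k]` stands for K^{(k, k+1)} (with the last entry closing the circle), and the random positive test data are assumptions made for this illustration.

```python
import numpy as np

def circle_marginals(kernels, phis):
    """All marginals P_k(K * Phi) for a circle 0 - 1 - ... - (K-1) - 0."""
    K = len(phis)
    beta = [None] * K                 # beta[k] = alpha^{(k, 0)}, shape (n_k, n_0)
    beta[K - 1] = kernels[K - 1]
    for k in range(K - 2, 0, -1):
        # phi^{k+1} scales the rows of alpha^{(k+1, 0)}, cf. (9)
        beta[k] = kernels[k] @ (phis[k + 1][:, None] * beta[k + 1])
    lam = [None] * K                  # lam[k] = lambda^{(0, k)}, shape (n_0, n_k)
    lam[1] = kernels[0]
    for k in range(2, K):
        lam[k] = lam[k - 1] @ (phis[k - 1][:, None] * kernels[k - 1])  # cf. (10)

    def row_inner(G, H):
        """Inner product with respect to the second dimension."""
        return np.sum(G * H, axis=1)

    out = [phis[0] * row_inner(kernels[0], (phis[1][:, None] * beta[1]).T)]
    for k in range(1, K):
        out.append(phis[k] * row_inner(beta[k], (phis[0][:, None] * lam[k]).T))
    return out

# Hypothetical circle with K = 4 nodes and random positive data.
rng = np.random.default_rng(4)
sizes = [3, 4, 2, 5]
phis = [rng.random(n) + 0.1 for n in sizes]
kernels = [rng.random((sizes[k], sizes[(k + 1) % 4])) + 0.1 for k in range(4)]
marginals = circle_marginals(kernels, phis)
```

In contrast to the tree case, the intermediate quantities are matrices, so each recursion step is a matrix–matrix product.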

The tree-structured and circle-structured

{MOT}_{η}

problems both have a sparse cost function that considerably improves the computational complexity of the Sinkhorn algorithm. In each iteration step, Algorithm 2 requires only

2 (K - 1)

matrix–vector products, which have a complexity of

O (N^{2})

, where

N : = {∥ n ∥}_{\infty}

, and Algorithm 3 requires

2 (K - 1)

matrix–matrix products, which have a complexity of

O (N^{3})

. This can be considerably improved by employing fast Fourier techniques, as we will see in the next section.

5. Non-Uniform Discrete Fourier Transforms

The main computational cost of the Sinkhorn algorithm comes from the matrix–vector product with the kernel matrix (8). Let

k, ℓ \in [K]

and

α \in R^{n_{ℓ}}

. We briefly describe a fast summation method for the computation of

β = K^{(k, ℓ)} α

, i.e.,

β_{i_{k}} = \sum_{i_{ℓ} = 1}^{n_{ℓ}} α_{i_{ℓ}} exp (- \frac{1}{η} {∥ x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ} ∥}_{2}^{2}), i_{k} \in [n_{k}] .

(11)

We refer to [32] for a detailed derivation and error estimates. The main idea is to approximate the kernel function

κ (x) : = exp (- \frac{1}{η} x^{2}), x \in R,

(12)

using a Fourier series. In order to ensure the fast convergence of the Fourier series, we extend

κ

to a periodic function of certain smoothness

p \in N

. Let

ε_{B} > 0

and

τ > ε_{B} + max \{∥ x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ} ∥_{2}^{2} : i_{k} \in [n_{k}], i_{ℓ} \in [n_{ℓ}]\}

. For

x \in R

, we define the regularized kernel

κ_{R} (x) : = \{\begin{matrix} κ (x), & | x | \leq τ - ε_{B}, \\ κ_{B} (x), & τ - ε_{B} < | x | \leq τ, \\ κ_{B} (τ), & | x | > τ, \end{matrix}

(13)

where

κ_{B}

is a polynomial of degree p that fulfills the Hermite interpolation conditions

κ_{B}^{(r)} (τ - ε_{B}) = κ^{(r)} (τ - ε_{B})

for

r = 0, \dots, p - 1,

and

κ_{B}^{(r)} (τ) = 0

for

r = 1, \dots, p - 1

; see Figure 1.

Then, we define a

2 τ

-periodic function on

R^{d}

by

\tilde{κ} (x) : = κ_{R} (∥ x ∥), x \in {[- τ, τ)}^{d} .

(14)

By construction,

\tilde{κ}

is p times continuously differentiable and we have

\tilde{κ} (x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ}) = exp (- \frac{1}{η} {∥ x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ} ∥}_{2}^{2})

for all

i_{k} \in [n_{k}], i_{ℓ} \in [n_{ℓ}]

.

Let

M \in N

. We approximate

\tilde{κ}

using the

2 τ

-periodic Fourier expansion of degree

2 M

,

\tilde{κ} (x) \approx \sum_{m \in \{- M, \dots, M - 1\}^{d}} \hat{κ} (m) exp (i \frac{π}{τ} m^{⊺} x), x \in R^{d},

with the discrete Fourier coefficients

\hat{κ} (m) \in C

, which can be efficiently approximated by the fast Fourier transform (FFT)

\hat{κ} (m) : = \frac{1}{{(2 M)}^{d}} \sum_{x \in \frac{τ}{M} {- M, \dots, M - 1}^{d}} \tilde{κ} (x) exp (- i \frac{π}{τ} m^{⊤} x), m \in {- M, \dots, M - 1}^{d} .

(15)
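As an illustration of (15) in the one-dimensional case d = 1, the following NumPy sketch evaluates the discrete Fourier coefficients with a single FFT. The function name and its arguments are hypothetical; `kernel_periodic` stands for any vectorized implementation of the periodized kernel on [−τ, τ).

```python
import numpy as np


def fourier_coefficients_1d(kernel_periodic, tau, M):
    """Approximate kappa_hat(m), m = -M, ..., M-1, as in (15) for d = 1."""
    j = np.arange(-M, M)
    samples = kernel_periodic(tau * j / M)  # samples on the grid (tau / M) * j
    # np.fft.fft expects the sample with j = 0 first, hence ifftshift;
    # fftshift then reorders the output frequencies to m = -M, ..., M-1.
    return np.fft.fftshift(np.fft.fft(np.fft.ifftshift(samples))) / (2 * M)
```

The two shifts are needed only because the grid and the frequencies in (15) are centered at zero, whereas the FFT routine indexes both from zero.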

This yields an approximation of (11) by

\begin{matrix} β_{i_{k}} = \sum_{i_{ℓ} = 1}^{n_{ℓ}} \tilde{κ} (x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ}) α_{i_{ℓ}} \approx \sum_{i_{ℓ} = 1}^{n_{ℓ}} \sum_{m \in {- M, \dots, M - 1}^{d}} \hat{κ} (m) exp (i \frac{π}{τ} m^{⊺} (x_{i_{k}}^{k} - x_{i_{ℓ}}^{ℓ})) α_{i_{ℓ}} \\ = \sum_{m \in {- M, \dots, M - 1}^{d}} \hat{κ} (m) (\sum_{i_{ℓ} = 1}^{n_{ℓ}} α_{i_{ℓ}} exp (- i \frac{π}{τ} m^{⊺} x_{i_{ℓ}}^{ℓ})) exp (i \frac{π}{τ} m^{⊺} x_{i_{k}}^{k}) . \end{matrix}

(16)

The non-uniform discrete Fourier transform (NDFT) of

\hat{κ} : = {[\hat{κ} (m)]}_{m \in {- M, \dots, M - 1}^{d}}

at the nodes

Ω^{k} \subset R^{d}

is defined by

{[F_{k} \hat{κ}]}_{i_{k}} : = \sum_{m \in {- M, \dots, M - 1}^{d}} \hat{κ} (m) exp (i \frac{π}{τ} m^{⊺} x_{i_{k}}^{k}), i_{k} \in [n_{k}],

(17)

and the adjoint NDFT of

α \in R^{n_{ℓ}}

on the set

Ω^{ℓ}

is given by

{[F_{ℓ}^{*} α]}_{m} : = \sum_{i_{ℓ} = 1}^{n_{ℓ}} α_{i_{ℓ}} exp (- i \frac{π}{τ} m^{⊺} x_{i_{ℓ}}^{ℓ}), m \in {- M, \dots, M - 1}^{d},

(18)

cf. [43] (Section 7). Therefore, the approximation (16) can be written as

β = K^{(k, ℓ)} α \approx F_{k} (\hat{κ} ⊙ F_{ℓ}^{*} α) .

(19)

The procedure is summarized in Algorithm 4.

Algorithm 4 NFFT-based fast summation

1: Input: Vector $α \in R^{n_{ℓ}}$, kernel function $κ$ in (12), parameters $ε_{B} > 0$, $M \in N$
2: precomputation
3: Compute the regularized kernel $κ_{R}$ by (13)
4: Compute the periodized kernel $\tilde{κ}$ by (14)
5: Compute the discrete Fourier coefficients $\hat{κ} (m)$, $m \in {- M, \dots, M - 1}^{d}$, by an FFT, see (15)
6: end
7: Compute the adjoint NDFT $F_{ℓ}^{*} α$, see (18)
8: Compute the pointwise product $\hat{β} : = \hat{κ} ⊙ F_{ℓ}^{*} α$
9: Compute the NDFT $β : = F_{k} \hat{β}$, see (17)
10: return $β \approx K^{(k, ℓ)} α$
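The steps of Algorithm 4 can be sketched in a few lines of NumPy for d = 1. This is a minimal illustration, not the implementation used for the experiments: the function name is hypothetical, the NDFT and its adjoint are applied as dense matrices (so the sketch does not have the complexity of an NFFT), and the boundary regularization (13) is skipped under the assumption that the Gaussian is already negligible at ±τ.

```python
import numpy as np


def fast_summation_1d(x_k, x_l, alpha, eta, tau, M):
    """Approximate beta = K alpha from (11) via (19), for d = 1.

    The NDFT and its adjoint are applied as dense matrices here, which costs
    O(M N); an NFFT library would replace these two products.
    Assumption: exp(-tau^2 / eta) is negligible, so (13) is not needed.
    """
    m = np.arange(-M, M)
    # Discrete Fourier coefficients (15) of the 2*tau-periodic kernel.
    grid = tau * m / M
    samples = np.exp(-grid**2 / eta)
    kappa_hat = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(samples))) / (2 * M)
    # Adjoint NDFT (18) at the nodes x_l.
    alpha_hat = np.exp(-1j * np.pi / tau * np.outer(m, x_l)) @ alpha
    # Pointwise product, then evaluation at the nodes x_k with the sign of (16).
    beta = np.exp(1j * np.pi / tau * np.outer(x_k, m)) @ (kappa_hat * alpha_hat)
    return beta.real


rng = np.random.default_rng(0)
x_k = rng.uniform(-0.25, 0.25, 40)
x_l = rng.uniform(-0.25, 0.25, 50)
alpha = rng.uniform(size=50)
eta, tau, M = 0.02, 1.0, 32

beta_direct = np.exp(-(x_k[:, None] - x_l[None, :]) ** 2 / eta) @ alpha
beta_fast = fast_summation_1d(x_k, x_l, alpha, eta, tau, M)
print(np.max(np.abs(beta_fast - beta_direct)))
```

The printed value is the maximal deviation from the direct evaluation of (11). For larger η, where the kernel has not decayed at ±τ, the samples would have to be taken from the regularized kernel instead of the plain Gaussian.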

There are fast algorithms, known as non-uniform fast Fourier transform (NFFT), allowing for the computation of an NDFT (17) and its adjoint (18) in

O (M^{d} log M + N)

steps up to arbitrary numeric precision; see e.g., [44,45] and [43] (Section 7), where

N = {∥ n ∥}_{\infty}

. Note that the direct implementation of (16) requires

O (M^{d} N)

operations. We call the Sinkhorn algorithm where the matrix–vector multiplication is performed via (19) the NFFT-Sinkhorn algorithm. If we fix the Fourier expansion degree M, which is possible because of the smoothness of

κ

, then we end up at a numerical complexity of

O (K N)

for each iteration step of the NFFT-Sinkhorn algorithm for trees. In the case of a circle (Algorithm 3), we can apply the fast summation column by column for the matrix–matrix product with

K^{(k, k + 1)}

, yielding a complexity of

O (K N^{2})

for each iteration step.
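To indicate how the fast summation enters the iteration, the following sketch shows the classical two-marginal Sinkhorn iteration, which corresponds to the simplest tree with K = 2, written so that the kernel is accessed only through matrix–vector products. It is not a reproduction of Algorithms 2 or 3; the function and argument names are hypothetical.

```python
import numpy as np


def sinkhorn_two_marginals(mu, nu, apply_kernel, apply_kernel_t, iterations=200):
    """Two-marginal Sinkhorn iteration with a matrix-free kernel.

    apply_kernel(v) should return K v and apply_kernel_t(u) should return K^T u.
    Both may be dense products or an approximation such as (19).
    """
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iterations):
        u = mu / apply_kernel(v)
        v = nu / apply_kernel_t(u)
    return u, v
```

With a dense kernel matrix `Kmat`, one would pass `lambda w: Kmat @ w` and `lambda w: Kmat.T @ w`; replacing these two callables by a fast summation routine changes the cost per iteration without touching the loop. The transposed product needs no separate treatment, because the transpose of K^{(k, ℓ)} is K^{(ℓ, k)}, i.e., the roles of the two node sets are swapped.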

6. Numerical Examples

We illustrate the results from Section 5 concerning the Sinkhorn algorithm and its accelerated version, the NFFT-Sinkhorn. First, we investigate the effect of parameter choices in some artificial examples. Then, we look at the one-dimensional Euler flow problem of incompressible fluids and the fixed support barycenter problem of images (the code for our examples is available at https://github.com/fatima0111/NFFT-Sinkhorn, accessed on 8 August 2022). All computations were performed on an eight-core Intel Core i7-10700 CPU with 32 GB memory. For computing the NFFT of Section 5, we rely on the implementation [46].

6.1. Uniformly Distributed Points

We consider the

{MOT}_{η}

problem for uniform measures

μ^{k}

of uniformly distributed points on

Ω = [- 1 / 2, 1 / 2]

. We chose the entropy regularization parameter

η = 0.1

, so that a boundary regularization (13) for the fast summation method is necessary. For the tree-structured cost function, we set the boundary regularization

ε_{B} = 1 / 16

, the Fourier expansion degree

M = 156,

and the smoothness parameter

p = 3

(see Section 5). In Figure 2 left, we see the linear dependence of the computational time on the number K of marginals. For a growing number N of points, the NFFT-Sinkhorn algorithm, which requires

O (K N)

steps, clearly outperforms the standard method, which requires

O (K N^{2})

steps (see Figure 2 right).

In Figure 3, we show the computation times for the circle-structured cost function, with the parameters

η = 0.1

,

ε_{B} = 3 / 32

,

M = 2000

, and

p = 3

. As the Sinkhorn iteration requires matrix–matrix products, it is more costly than for the tree-structured cost function, and so we used a lower number of points N. The advantage of the NFFT-Sinkhorn is smaller than for the tree, but still considerable. We point out here that the fast summation method is applied column by column to the matrix–matrix product, and the Fourier expansion degree M is larger.

Finally, we investigate how the approximation error between the Sinkhorn and NFFT-Sinkhorn algorithm,

| S^{(r)} - {\tilde{S}}^{(r)} |

, depends on the entropy regularization parameter

η

and the Fourier expansion degree M at a fixed iteration

r = 10

, where

S^{(r)}

denotes the evaluation with the Sinkhorn algorithm and

{\tilde{S}}^{(r)}

its evaluation with the NFFT-Sinkhorn algorithm. Since the time differences for different M in the one-dimensional case are very small, we consider the two-dimensional uniform marginal measures for the

{MOT}_{η}

problems with tree- or circle-structured cost functions. In Figure 4 and Figure 5, we see that for smaller

η

, we need a larger expansion degree M to achieve a good accuracy. The error stagnates at a certain level and does not decrease anymore for increasing M. This could be improved by increasing the approximation parameter p and the cutoff parameter of the NFFT, cf. [43] (Section 7). The parameter choice methods for the NFFT-based summation were discussed in [47]. For an appropriately chosen M, the NFFT-Sinkhorn is usually much faster than the Sinkhorn algorithm. However, for very small

η

, the kernel function (12) is concentrated on a small interval, and therefore, a simple truncation of the sum (11) might be preferable to the NFFT approximation.
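The truncation mentioned above could look as follows in one dimension. This is a hedged sketch rather than a method evaluated in this article: the function name and the cutoff parameter `radius` are assumptions, and the inner loop is kept in plain Python for readability.

```python
import numpy as np


def truncated_summation_1d(x_k, x_l, alpha, eta, radius):
    """Evaluate (11) for d = 1, keeping only terms with |x_k - x_l| <= radius."""
    order = np.argsort(x_l)
    xs, a = x_l[order], alpha[order]
    # For every target point, locate the window of source points within the radius.
    lo = np.searchsorted(xs, x_k - radius, side="left")
    hi = np.searchsorted(xs, x_k + radius, side="right")
    beta = np.empty(len(x_k))
    for i, (l, h) in enumerate(zip(lo, hi)):
        beta[i] = np.dot(a[l:h], np.exp(-(x_k[i] - xs[l:h]) ** 2 / eta))
    return beta
```

Each neglected term is at most exp(−radius²/η) times the corresponding entry of α, so a radius of a few multiples of √η already gives a small truncation error when η is small.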

6.2. Fixed-Support Wasserstein Barycenter for General Trees

Let

T = (V, E)

be a tree with set of leaves

L

; see Section 4.1. For

k \in L

, let measures

μ^{k}

and weights

{\tilde{w}}_{k} \in [0, 1]

be given that satisfy

\sum_{k \in L} {\tilde{w}}_{k} = 1

. For any edge

e \in E

, we set

w_{e} : = \{\begin{matrix} 1 & if | e \cap L | \neq 1, \\ {\tilde{w}}_{k} & if e \cap L = \{k\} . \end{matrix}
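As a small worked example of this definition, the following Python sketch assigns the edge weights for a hypothetical tree with five nodes. The tree, the function name, and the data layout (edges as tuples, leaf weights as a dictionary) are assumptions made for illustration and do not correspond to the tree of Figure 6.

```python
def edge_weights(edges, leaves, leaf_weights):
    """Return a dictionary mapping each edge to its weight w_e."""
    weights = {}
    for edge in edges:
        touching = set(edge) & set(leaves)
        if len(touching) == 1:
            # The edge contains exactly one leaf k and inherits its weight.
            (k,) = touching
            weights[edge] = leaf_weights[k]
        else:
            weights[edge] = 1.0
    return weights


# Hypothetical tree: leaves 1, 4, 5 and internal nodes 2, 3.
edges = [(1, 2), (2, 3), (3, 4), (3, 5)]
leaves = {1, 4, 5}
leaf_weights = {1: 0.5, 4: 0.25, 5: 0.25}
w = edge_weights(edges, leaves, leaf_weights)
```

Here the edge (2, 3) joins two internal nodes and therefore receives the weight 1, whereas every other edge receives the weight of the leaf it contains.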

The generalized barycenters are the minimizers

μ^{k}

,

k \in V \setminus L

, of

inf_{μ^{k} \in P (Ω^{k}), k \in V \setminus L} \sum_{e = \{e_{1}, e_{2}\} \in E} w_{e} W_{2}^{2} (μ^{e_{1}}, μ^{e_{2}}),

(20)

where

W_{2}^{2} (μ^{e_{1}}, μ^{e_{2}})

is the squared Wasserstein distance [48,49] between the measures

μ^{e_{1}}

and

μ^{e_{2}} .

The well-known Wasserstein barycenter problem [50,51] is a special case of (20), where the tree is star-shaped and the barycenter corresponds to the unique internal node. We consider the fixed-support barycenter problem [26,52], where the nodes

x^{k}

,

k \in V \setminus L

are also given, so that we need to optimize (20) only for

μ_{i_{k}}^{k}

,

k \in V \setminus L

. This yields an MOT problem with the tree-structured cost

C_{i} = \sum_{e = \{e_{1}, e_{2}\} \in E} w_{e} {∥ x_{i_{e_{1}}}^{e_{1}} - x_{i_{e_{2}}}^{e_{2}} ∥}_{2}^{2}, i \in I,

where the marginal constraints of (1) are only set for the known measures

μ^{k}

,

k \in L

(see [53]). This barycenter problem can be solved approximately using a modification of the Sinkhorn Algorithm 2, in which we replace line 12 by

{(ϕ^{k})}^{(r + 1)} = \{\begin{matrix} 1 & if k \in L, \\ μ^{k} ⊘ (γ_{k}^{(r)} ⊙ {(β^{⊙_{C_{k}}})}^{(r)}) & otherwise . \end{matrix}

We test our algorithm with a tree consisting of

K = 7

nodes; see Figure 6. The four given marginals

μ^{k}

,

k \in L

, are dithered images in

R^{2}

with uniform weights

μ_{i_{k}}^{k} = 1 / N

. As support points of the barycenters

μ^{k}

,

k \in V \setminus L

, we take the union over all support points

x_{i_{k}}^{k}

of all four input measures

μ^{k},

k \in L .

Furthermore, we use the barycenter weights

{\tilde{w}}_{k} = 1 / 4

. The given images and the computed barycenters are shown in Figure 7, where we executed

r = 150

iterations of the Sinkhorn algorithm and its accelerated version. We chose the regularization parameter

η = 5 \cdot 10^{- 3}

and the fast summation parameters

M = 156

,

p = 3 .

6.3. Generalized Euler Flows

We consider the motion of N particles of an incompressible fluid, e.g., water, in a bounded domain

Ω

in discrete time steps

t_{k} = (k - 1) / (K - 1)

,

k \in [K]

. We assume that we know the function

σ : Ω \to Ω

, which connects N initial positions

x_{i_{1}} \in Ω

of particles with their final positions

x_{i_{K}} \in Ω

. At each time step

t_{k}

, we know an image of the particle distribution, which is described by the discrete marginal measure

μ^{k}

,

k \in [K]

, with uniform weights

μ_{i_{k}}^{k} = 1 / N

. We want to find out how the single particles move, i.e., their trajectories. Due to the least-action principle, this problem can be formulated as the MOT problem (4) with the circle-structured cost

C_{i} = {∥ x_{i_{K}}^{K} - σ (x_{i_{1}}^{1}) ∥}_{2}^{2} + \sum_{k = 1}^{K - 1} {∥ x_{i_{k + 1}}^{k + 1} - x_{i_{k}}^{k} ∥}_{2}^{2}, i \in I,

see [55,56,57]. Then, the pair marginal

Π_{1, k} : = \sum_{ℓ \in [K] \setminus \{1, k\}} \sum_{i_{ℓ} \in [n_{ℓ}]} Π_{i_{1}, \dots, i_{K}}

of the optimal plan

Π

provides the (discrete) probability that a particle that was initially at position

x_{i_{1}} \in Ω

is in position

x_{i_{k}} \in Ω

at time

t_{k}

,

k = 2, \dots, K - 1

. The one-dimensional problem has been studied by several authors [25,26,39], where the particles are assumed to be on a grid. Here, we consider the case where the positions are uniformly distributed. We draw

N = 400

uniformly distributed points on

Ω = [0, 1]

. We use

K = 5

marginal constraints and the entropy regularization parameter

η = 0.05

. Figure 8 and Figure 9 display the probability matrix

Π_{1, k}

describing the motion of the particles from initial time

t_{1} = 0

to time

t_{k}

, where we use

r = 50

iterations for both the NFFT-Sinkhorn and the Sinkhorn algorithms, and two different connection functions

σ

.
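For very small problem sizes, the circle-structured cost of this section and the pair marginals can be written down explicitly, which may help in reading the formulas above. The following NumPy sketch is an illustration only: it assumes the same N points at every time step, it forms the full tensor with N^K entries (which the algorithms of this article avoid), and the function names are hypothetical.

```python
import numpy as np


def circle_cost_tensor(x, sigma, K):
    """Cost C_i of the Euler-flow example, same points x at every time step."""
    N = len(x)
    pair = (x[:, None] - x[None, :]) ** 2            # pair[a, b] = |x_a - x_b|^2
    closing = (x[None, :] - sigma(x)[:, None]) ** 2  # closing[i_1, i_K]
    C = np.zeros((N,) * K)
    for k in range(K - 1):
        shape = [1] * K
        shape[k], shape[k + 1] = N, N
        C = C + pair.reshape(shape)                  # term between steps k+1 and k+2
    shape = [1] * K
    shape[0], shape[K - 1] = N, N
    return C + closing.reshape(shape)                # term closing the circle


def pair_marginal(plan, k):
    """Sum out all axes except the first and the k-th (k is 1-based)."""
    axes = tuple(a for a in range(plan.ndim) if a not in (0, k - 1))
    return plan.sum(axis=axes)


x = np.array([0.0, 0.5, 1.0])
C = circle_cost_tensor(x, lambda t: 1.0 - t, K=4)
# Stand-in tensor for demonstration; this is NOT the optimal plan,
# which would be the output of the Sinkhorn iteration.
plan = np.exp(-C / 0.05)
plan /= plan.sum()
P13 = pair_marginal(plan, 3)
```

The matrix `P13` has one row per initial position and one column per position at the third time step, in the same layout as the matrices shown in Figure 8 and Figure 9.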

7. Conclusions

We have proposed the NFFT-Sinkhorn algorithm to solve the

{MOT}_{η}

problem efficiently. Assuming that the cost function of the multi-marginal optimal transport decouples according to a tree or a circle, we obtain a linear complexity in K. The complexity of the algorithm with respect to the numbers

n_{k},

k \in [K]

, of atoms of the discrete marginal measures is further improved using the non-uniform fast Fourier transform. This results in a considerable acceleration in our numerical experiments compared to the usual Sinkhorn algorithm. The tree-structured

{MOT}_{η}

problem gives a much better numerical complexity than the circle-structured

{MOT}_{η}

problem due to the fact that in the latter case, matrix–matrix products are required in Algorithm 3 instead of just the matrix–vector products of Algorithm 2.

Author Contributions

Both authors have contributed equally to this article. All authors have read and agreed to the published version of the manuscript.

Funding

We gratefully acknowledge funding by the German Federal Ministry of Education and Research BMBF 01|S20053B project SAℓE. Furthermore, we gratefully acknowledge funding by the German Research Foundation DFG (STE 571/19-1, project number 495365311).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available at https://github.com/fatima0111/NFFT-Sinkhorn, accessed on 4 August 2022.

Acknowledgments

We would like to thank Gabriele Steidl for insightful discussions about optimal transport. We also thank the anonymous reviewers for making valuable suggestions to improve the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Peyré, G.; Cuturi, M. Computational Optimal Transport: With Applications to Data Science. Found. Trends Mach. Learn. 2019, 11, 355–607.
2. Villani, C. Optimal Transport: Old and New; Springer: Berlin/Heidelberg, Germany, 2009.
3. Beier, F.; von Lindheim, J.; Neumayer, S.; Steidl, G. Unbalanced multi-marginal optimal transport. arXiv 2021, arXiv:2103.10854.
4. Bonneel, N.; Peyré, G.; Cuturi, M. Wasserstein barycentric coordinates: Histogram regression using optimal transport. ACM Trans. Graph. 2016, 35, 71.
5. Tartavel, G.; Peyré, G.; Gousseau, Y. Wasserstein loss for image synthesis and restoration. SIAM J. Imaging Sci. 2016, 9, 1726–1755.
6. Thorpe, M.; Park, S.; Kolouri, S.; Rohde, G.K.; Slepčev, D. A transportation L^p distance for signal analysis. J. Math. Imaging Vis. 2017, 59, 187–210.
7. Vogt, T.; Lellmann, J. Measure-valued variational models with applications to diffusion-weighted imaging. J. Math. Imaging Vis. 2018, 60, 1482–1502.
8. Carlier, G.; Oberman, A.; Oudet, E. Numerical methods for matching for teams and Wasserstein barycenters. ESAIM M2AN 2015, 49, 1621–1642.
9. Galichon, A. Optimal Transport Methods in Economics; Princeton University Press: Princeton, NJ, USA, 2016.
10. Dolinsky, Y.; Soner, H.M. Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields 2014, 160, 391–427.
11. Dolinsky, Y.; Soner, H.M. Robust hedging with proportional transaction costs. Financ. Stoch. 2014, 18, 327–347.
12. Frisch, U.; Matarrese, S.; Mohayaee, R.; Sobolevski, A.N. A reconstruction of the initial conditions of the universe by optimal mass transportation. Nature 2002, 417, 260–262.
13. Haasler, I.; Ringh, A.; Chen, Y.; Karlsson, J. Multimarginal optimal transport with a tree-structured cost and the Schrödinger bridge problem. SIAM J. Control Optim. 2021, 59, 2428–2453.
14. Kantorovich, L. On the translocation of masses. Manag. Sci. 1958, 5, 1–4.
15. Lin, T.; Ho, N.; Cuturi, M.; Jordan, M.I. On the complexity of approximating multimarginal optimal transport. J. Mach. Learn. Res. 2022, 23, 1–43.
16. Pass, B. Multi-marginal optimal transport and multi-agent matching problems: Uniqueness and structure of solutions. Discret. Contin. Dyn. Syst. 2014, 34, 1623–1639.
17. Pass, B. Multi-marginal optimal transport: Theory and applications. ESAIM M2AN 2015, 49, 1771–1790.
18. Benamou, J.-D.; Carlier, G.; Nenna, L. A numerical method to solve multi-marginal optimal transport problems with Coulomb cost. In Splitting Methods in Communication, Imaging, Science, and Engineering; Springer: Cham, Switzerland, 2016; pp. 577–601.
19. Neumayer, S.; Steidl, G. From optimal transport to discrepancy. In Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision; Chen, K., Schönlieb, C.-B., Tai, X.-C., Younes, L., Eds.; Springer: Cham, Switzerland, 2021; pp. 1–36.
20. Terjék, D.; González-Sánchez, D. Optimal transport with f-divergence regularization and generalized Sinkhorn algorithm. arXiv 2021, arXiv:2105.14337.
21. Blondel, M.; Seguy, V.; Rolet, A. Smooth and sparse optimal transport. In Proceedings of Machine Learning Research, Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, Playa Blanca, Spain, 9–11 April 2018; Volume 84, pp. 880–889.
22. Lorenz, D.A.; Manns, P.; Meyer, C. Quadratically regularized optimal transport. Appl. Math. Optim. 2021, 83, 1919–1949.
23. Genevay, A.; Cuturi, M.; Peyré, G.; Bach, F. Stochastic optimization for large-scale optimal transport. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 3440–3448.
24. Ammari, H.; Garnier, J.; Millien, P. Backpropagation imaging in nonlinear harmonic holography in the presence of measurement and medium noises. SIAM J. Imaging Sci. 2014, 7, 239–276.
25. Altschuler, J.M.; Boix-Adsera, E. Polynomial-time algorithms for multimarginal optimal transport problems with structure. Math. Program. 2022, in press.
26. Benamou, J.-D.; Carlier, G.; Cuturi, M.; Nenna, L.; Peyré, G. Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 2015, 37, A1111–A1138.
27. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; Adaptive Computation and Machine Learning; The MIT Press: Cambridge, MA, USA, 2009.
28. Alaya, M.Z.; Bérar, M.; Gasso, G.; Rakotomamonjy, A. Screening Sinkhorn algorithm for regularized optimal transport. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: Red Hook, NY, USA, 2019.
29. Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2013; Volume 26.
30. Knopp, P.; Sinkhorn, R. Concerning nonnegative matrices and doubly stochastic matrices. Pac. J. Math. 1967, 21, 343–348.
31. Potts, D.; Steidl, G. Fast summation at nonequispaced knots by NFFTs. SIAM J. Sci. Comput. 2003, 24, 2013–2037.
32. Potts, D.; Steidl, G.; Nieslony, A. Fast convolution with radial kernels at nonequispaced knots. Numer. Math. 2004, 98, 329–351.
33. Nestler, F.; Pippig, M.; Potts, D. Fast Ewald summation based on NFFT with mixed periodicity. J. Comput. Phys. 2015, 285, 280–315.
34. Hielscher, R.; Quellmalz, M. Optimal mollifiers for spherical deconvolution. Inverse Probl. 2015, 31, 085001.
35. Alfke, D.; Potts, D.; Stoll, M.; Volkmer, T. NFFT meets Krylov methods: Fast matrix-vector products for the graph Laplacian of fully connected networks. Front. Appl. Math. Stat. 2018, 4, 61.
36. Lakshmanan, R.; Pichler, A.; Potts, D. Fast Fourier transform boost for the Sinkhorn algorithm. arXiv 2022, arXiv:2201.07524.
37. Solomon, J.; de Goes, F.; Peyré, G.; Cuturi, M.; Butscher, A.; Nguyen, A.; Du, T.; Guibas, L. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Trans. Graph. 2015, 34, 1–11.
38. Strössner, C.; Kressner, D. Low-rank tensor approximations for solving multi-marginal optimal transport problems. arXiv 2022, arXiv:2202.07340.
39. Benamou, J.-D.; Carlier, G.; Nenna, L. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. Numer. Math. 2019, 142, 33–54.
40. Beiglböck, M.; Léonard, C.; Schachermayer, W. A general duality theorem for the Monge-Kantorovich transport problem. Studia Math. 2012, 209, 151–167.
41. Elvander, F.; Haasler, I.; Jakobsson, A.; Karlsson, J. Multi-marginal optimal transport using partial information with applications in robust localization and sensor fusion. Signal Process. 2020, 171, 107474.
42. Marino, S.D.; Gerolin, A. An optimal transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm. J. Sci. Comput. 2020, 85, 27.
43. Plonka, G.; Potts, D.; Steidl, G.; Tasche, M. Numerical Fourier Analysis; Applied and Numerical Harmonic Analysis; Birkhäuser: Basel, Switzerland, 2018.
44. Dutt, A.; Rokhlin, V. Fast Fourier transforms for nonequispaced data II. Appl. Comput. Harmon. Anal. 1995, 2, 85–100.
45. Beylkin, G. On the fast Fourier transform of functions with singularities. Appl. Comput. Harmon. Anal. 1995, 2, 363–381.
46. Keiner, J.; Kunis, S.; Potts, D. Using NFFT3—A software library for various nonequispaced fast Fourier transforms. ACM Trans. Math. Softw. 2009, 36, 19.
47. Nestler, F. Parameter tuning for the NFFT based fast Ewald summation. Front. Phys. 2016, 4, 28.
48. Bassetti, F.; Gualandi, S.; Veneroni, M. On the computation of Kantorovich-Wasserstein distances between 2d-histograms by uncapacitated minimum cost flows. arXiv 2018, arXiv:1804.00445.
49. Cuturi, M.; Doucet, A. Fast computation of Wasserstein barycenters. In Proceedings of the 31st International Conference on International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Volume 32, pp. II–685–II–693.
50. Rabin, J.; Peyré, G.; Delon, J.; Bernot, M. Wasserstein barycenter and its application to texture mixing. In Scale Space and Variational Methods in Computer Vision; Bruckstein, A.M., Romeny, B.M.t.H., Bronstein, A.M., Bronstein, M.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 435–446.
51. von Lindheim, J. Approximative algorithms for multi-marginal optimal transport and free-support Wasserstein barycenters. arXiv 2022, arXiv:2202.00954.
52. Takezawa, Y.; Sato, R.; Kozareva, Z.; Ravi, S.; Yamada, M. Fixed support tree-sliced Wasserstein barycenter. arXiv 2021, arXiv:2109.03431.
53. Agueh, M.; Carlier, G. Barycenters in the Wasserstein space. SIAM J. Math. Anal. 2011, 43, 904–924.
54. Flamary, R.; Courty, N.; Gramfort, A.; Alaya, M.Z.; Boisbunon, A.; Chambon, S.; Chapel, L.; Corenflos, A.; Fatras, K.; Fournier, N.; et al. POT: Python optimal transport. J. Mach. Learn. Res. 2021, 22, 1–8.
55. Brenier, Y. The least action principle and the related concept of generalized flows for incompressible perfect fluids. J. Amer. Math. Soc. 1989, 2, 225–255.
56. Brenier, Y. The dual least action problem for an ideal, incompressible fluid. Arch. Ration. Mech. Anal. 1993, 122, 323–351.
57. Brenier, Y. Minimal geodesics on groups of volume-preserving maps and generalized solutions of the Euler equations. Comm. Pure Appl. Math 1997, 52, 411–452.

Figure 1. Regularized kernel

κ_{R}

for

η = 1 / 4

, periodicity length

τ = 1

, boundary interval

ε_{B} = 0.2

, and smoothness

p = 1

.

Figure 2. Computation time in seconds of

{MOT}_{η}

with tree-structured cost function with regularization parameter

η = 0.1 .

Left: fixed

N = 10^{4} .

Right: fixed

K = 10

.

Figure 3. Computation time in seconds of

{MOT}_{η}

with a circle-structured cost function with regularization parameter

η = 0.1 .

Left: fixed

N = 700 .

Right: fixed

K = 3

.

Figure 4.

{MOT}_{η}

with tree-structured cost function, where

d = 2

,

K = 10,

N = 10000,

and

p = 3 .

Left: Approximation error of

S^{(10)}

between the Sinkhorn and the NFFT-Sinkhorn algorithm, depending on the number of Fourier coefficients M of the NFFT. Right: Computation time in seconds.

Figure 5.

{MOT}_{η}

with circle-structured cost function, where

d = 2

,

K = 3,

N = 1000,

and

p = 3 .

Left: Approximation error of

S^{(10)}

between Sinkhorn and NFFT-Sinkhorn algorithm depending on the number of Fourier coefficients M. Right: Computation time in seconds.

Figure 6. The tree graph of the barycenter problem, with leaves

L

marked in blue.

Figure 7. Given measures:

\{1, 4, 6, 7\}

, entropy regularization parameter

η = 5 \cdot 10^{- 3}

,

r = 150

,

\tilde{w} = \frac{1}{4} (1, 1, 1, 1) .

NFFT-Sinkhorn parameters:

M = 156

,

p = 3 .

The test images are adapted with permission (MIT License) from https://github.com/PythonOT/POT, see [54]. 2016, Rémi Flamary. (a) Sinkhorn. (b) NFFT-Sinkhorn.

Figure 8. Joint measures

Π_{1, k}

,

k = 2, \dots, 5

representing the movement of the particles from initial position

x \in [0, 1]

(x-axis) to position

x_{t_{k}} \in [0, 1]

(y-axis) at time

t_{k}

, where

σ (x) = 1 - x

. First row: Sinkhorn algorithm. Second row: NFFT-Sinkhorn algorithm.

Figure 9. Joint measures as in Figure 8, but with the function

σ (x) = min (2 x, 2 - 2 x)

.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ba, F.A.; Quellmalz, M. Accelerating the Sinkhorn Algorithm for Sparse Multi-Marginal Optimal Transport via Fast Fourier Transforms. Algorithms 2022, 15, 311. https://doi.org/10.3390/a15090311
