#### 2.2.1. Wasserstein Distance

The Wasserstein distance between probability measures on $\mathbb{R}^n$ describes the minimal energy required to transport one distribution into the other.

In particular, for multivariate normal distributions, the literature [13] gives an explicit expression.

**Proposition 1.** *The Wasserstein distance between* $P\_1, P\_2 \in \mathcal{N}^n$ *is*

$$D\_W^2(P\_1, P\_2) = \left\|\mu\_1 - \mu\_2\right\|^2 + \text{tr}\left(\Sigma\_1 + \Sigma\_2 - 2(\Sigma\_1 \Sigma\_2)^{\frac{1}{2}}\right),\tag{1}$$

*where* $(\mu\_1, \Sigma\_1)$ *and* $(\mu\_2, \Sigma\_2)$ *are the parameters of* $P\_1$ *and* $P\_2$*, respectively.*

Unfortunately, there is no simple explicit expression for the geometric mean under the Wasserstein distance; hence, this paper temporarily replaces the geometric mean with the arithmetic mean in the simulation experiments.
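As a quick numerical illustration, Equation (1) can be evaluated directly with NumPy and SciPy. This is a minimal sketch; the function name `gaussian_wasserstein2` and the test values are ours, not from the cited literature.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_wasserstein2(mu1, Sigma1, mu2, Sigma2):
    """Squared 2-Wasserstein distance between N(mu1, Sigma1) and
    N(mu2, Sigma2), following Equation (1)."""
    cross = sqrtm(Sigma1 @ Sigma2)
    # sqrtm may return a complex array with tiny imaginary parts;
    # keep the real part (Sigma1 @ Sigma2 has real nonnegative spectrum).
    cross = np.real(cross)
    return np.sum((mu1 - mu2) ** 2) + np.trace(Sigma1 + Sigma2 - 2.0 * cross)

mu1, Sigma1 = np.zeros(2), np.eye(2)
mu2, Sigma2 = np.array([3.0, 0.0]), 4.0 * np.eye(2)
d2 = gaussian_wasserstein2(mu1, Sigma1, mu2, Sigma2)
# For these commuting covariances, (1) reduces to
# ||mu1 - mu2||^2 + tr((Sigma1^{1/2} - Sigma2^{1/2})^2) = 9 + 2 = 11.
```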

#### 2.2.2. Kullback–Leibler Divergence

Kullback–Leibler (KL) divergence is a non-negative function which measures the difference between any two probability density functions. It is worth noting that KL divergence is not a distance function, since it satisfies neither symmetry nor the triangle inequality. In the following, we give its definition and the expression of its geometric mean.

**Definition 4.** *Let* $P\_1, P\_2$ *be two probability density functions. The KL divergence is defined as*

$$D\_{KL}(P\_1 \| P\_2) = \mathbb{E}\_{P\_1} \left[ \log \frac{P\_1}{P\_2} \right],\tag{2}$$

*and it can be shown that* $D\_{KL}(P\_1 \| P\_2) \geq 0$*; the equality holds if and only if* $P\_1 = P\_2$*.*

In particular, for any $P\_1, P\_2 \in \mathcal{N}^n$ with parameters $(\mu\_1, \Sigma\_1)$ and $(\mu\_2, \Sigma\_2)$, a direct calculation gives

$$D\_{KL}(P\_1 \| P\_2) = \frac{1}{2} \left\{ \log \frac{|\Sigma\_2|}{|\Sigma\_1|} - n + \text{tr} \left( \Sigma\_2^{-1} \Sigma\_1 \right) + (\mu\_2 - \mu\_1)^T \Sigma\_2^{-1} (\mu\_2 - \mu\_1) \right\}.\tag{3}$$
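Equation (3) is straightforward to evaluate numerically, and doing so also illustrates the asymmetry noted above. A minimal sketch with NumPy; the function name `gaussian_kl` and the test parameters are ours.

```python
import numpy as np

def gaussian_kl(mu1, Sigma1, mu2, Sigma2):
    """D_KL(P1 || P2) for Gaussians P1 = N(mu1, Sigma1), P2 = N(mu2, Sigma2),
    following Equation (3)."""
    n = mu1.shape[0]
    Sigma2_inv = np.linalg.inv(Sigma2)
    diff = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(Sigma2) / np.linalg.det(Sigma1))
                  - n
                  + np.trace(Sigma2_inv @ Sigma1)
                  + diff @ Sigma2_inv @ diff)

mu1, Sigma1 = np.zeros(2), np.eye(2)
mu2, Sigma2 = np.array([1.0, 0.0]), 2.0 * np.eye(2)
kl12 = gaussian_kl(mu1, Sigma1, mu2, Sigma2)
kl21 = gaussian_kl(mu2, Sigma2, mu1, Sigma1)
# kl12 != kl21 in general, illustrating that KL divergence is not symmetric.
```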

Under the parameter coordinates $(\mu, \Sigma)$, the expression of the geometric mean $c(C) = \operatorname{argmin}\_{P \in \mathcal{N}^n} \frac{1}{m} \sum\_{i=1}^{m} D\_{KL}(P\_i \| P)$ is very complicated and not convenient to use. To overcome this difficulty, throughout this paper we rewrite the probability density function of $P \in \mathcal{N}^n$ in the form of an exponential family. In fact, by setting $\mathbf{x}\_1 = \mathbf{x}$, $\mathbf{x}\_2 = -\frac{1}{2}\mathbf{x}\mathbf{x}^T$ and $\theta\_1 = \Sigma^{-1}\mu$, $\theta\_2 = \Sigma^{-1}$, we obtain the exponential form

$$P(\mathbf{x}; \mu, \Sigma) = P(\mathbf{x}\_1, \mathbf{x}\_2; \theta) = \exp\{\langle \overline{\mathbf{x}}, \theta \rangle - q(\theta)\},\tag{4}$$

where $\overline{\mathbf{x}} = (\mathbf{x}\_1, \mathbf{x}\_2)$, $\theta = (\theta\_1, \theta\_2)$ is called the natural parameter, $\langle \overline{\mathbf{x}}, \theta \rangle$ is the inner product of $\overline{\mathbf{x}}$ and $\theta$, and the function $\varphi(\theta) = \frac{1}{2}\left(\theta\_1^T \theta\_2^{-1} \theta\_1 - \log|\theta\_2| + n \log 2\pi\right)$ is called the potential function, which is a convex function.
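The change to natural parameters can be checked numerically: writing the log-density in the form (4) and comparing against SciPy's `multivariate_normal`. This is a sketch under the conventions above ($\mathbf{x}\_2 = -\frac{1}{2}\mathbf{x}\mathbf{x}^T$, potential with the $+\,n \log 2\pi$ constant); the helper names are ours.

```python
import numpy as np
from scipy.stats import multivariate_normal

def natural_params(mu, Sigma):
    """Natural parameters (theta1, theta2) = (Sigma^{-1} mu, Sigma^{-1})."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ mu, Sigma_inv

def log_density_exponential_form(x, mu, Sigma):
    """log P(x) via the exponential form (4): <x_bar, theta> - q(theta),
    with sufficient statistics x1 = x, x2 = -x x^T / 2."""
    theta1, theta2 = natural_params(mu, Sigma)
    n = x.shape[0]
    # Inner product: vector part plus Frobenius (trace) part for the matrix block.
    inner = x @ theta1 + np.sum((-0.5 * np.outer(x, x)) * theta2)
    # Potential (log-partition) function evaluated at theta.
    q = 0.5 * (theta1 @ np.linalg.inv(theta2) @ theta1
               - np.log(np.linalg.det(theta2))
               + n * np.log(2.0 * np.pi))
    return inner - q

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, 0.7])
lp = log_density_exponential_form(x, mu, Sigma)
lp_ref = multivariate_normal(mu, Sigma).logpdf(x)  # agrees up to rounding
```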

By using the potential function $\varphi$, we can define the generalized KL divergence, namely the Bregman divergence on $\mathcal{N}^n$, as

$$B\_{\varphi}(P\_2||P\_1) := \varphi(\theta\_2) - \varphi(\theta\_1) - \langle \nabla \varphi(\theta\_1), \theta\_2 - \theta\_1 \rangle,\tag{5}$$

where $\theta\_1, \theta\_2$ are the natural parameters of two distributions in $\mathcal{N}^n$.


**Remark 1.** *By means of the exponential form of the probability density functions* $P\_1, P\_2 \in \mathcal{N}^n$*, a direct calculation yields*

$$B\_{\varphi}(P\_2 \| P\_1) = D\_{KL}(P\_1 \| P\_2).$$
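Remark 1 can be verified numerically by computing the Bregman divergence (5) and comparing with the closed form (3). For the potential above, the gradient is $\nabla\varphi(\theta) = \big(\mu, -\frac{1}{2}(\mu\mu^T + \Sigma)\big)$, i.e. the expected sufficient statistics. A sketch with helper names of our choosing:

```python
import numpy as np

def natural_params(mu, Sigma):
    """Natural parameters (theta1, theta2) = (Sigma^{-1} mu, Sigma^{-1})."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv @ mu, Sigma_inv

def potential(theta1, theta2, n):
    """Potential function phi(theta) from the exponential form."""
    return 0.5 * (theta1 @ np.linalg.inv(theta2) @ theta1
                  - np.log(np.linalg.det(theta2))
                  + n * np.log(2.0 * np.pi))

def gaussian_kl(mu1, Sigma1, mu2, Sigma2):
    """D_KL(P1 || P2) via the closed form (3)."""
    n = mu1.shape[0]
    Si2 = np.linalg.inv(Sigma2)
    d = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(Sigma2) / np.linalg.det(Sigma1)) - n
                  + np.trace(Si2 @ Sigma1) + d @ Si2 @ d)

def bregman(mu1, Sigma1, mu2, Sigma2):
    """B_phi(theta_2 || theta_1) via Equation (5); the gradient of phi at
    theta_1 is (mu1, -(mu1 mu1^T + Sigma1)/2)."""
    n = mu1.shape[0]
    t1a, t2a = natural_params(mu1, Sigma1)   # theta_1
    t1b, t2b = natural_params(mu2, Sigma2)   # theta_2
    grad_vec = mu1                                     # d phi / d theta1
    grad_mat = -0.5 * (np.outer(mu1, mu1) + Sigma1)    # d phi / d theta2
    inner = grad_vec @ (t1b - t1a) + np.sum(grad_mat * (t2b - t2a))
    return (potential(t1b, t2b, n) - potential(t1a, t2a, n) - inner)

mu1, Sigma1 = np.array([1.0, -1.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
mu2, Sigma2 = np.zeros(2), np.eye(2)
b = bregman(mu1, Sigma1, mu2, Sigma2)
kl = gaussian_kl(mu1, Sigma1, mu2, Sigma2)
# b and kl coincide up to rounding, as stated in Remark 1.
```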
