**1. Introduction**

It is by now quite common in mainstream physics and mathematics [1,2] to characterize topological properties and anomalous behaviors of higher-dimensional topological spaces via notions of (local and global) curvature of these spaces; e.g., in general relativity, extreme variations of four-dimensional space-time curvature, via geodesic incompleteness, lead to characterizations of black holes [3]. It is therefore natural to try to extend these notions from the non-network domains (e.g., from continuous metric spaces or from higher-dimensional geometric objects) in a suitable way to the network science domain, so that non-trivial new topological characteristics of networks can be captured. There are several ways this can be achieved; we briefly mention two other approaches before proceeding with the approach that is the main topic of this paper. Note that such extensions need to overcome at least two key challenges, namely that (i) networks are discrete (non-continuous) objects, and that (ii) networks may not necessarily have an associated natural geometric embedding.

One notion of network curvature that has been well studied in the network theory literature, first suggested by Gromov in a non-network group-theoretic context [4], is the Gromov-hyperbolic curvature. First defined for infinite continuous metric spaces [2], the measure was later adopted for finite graphs. Usually the measure is defined via properties of geodesic triangles or via equivalent (in a sense that can be made precise) 4-node conditions, though Gromov originally defined the measure using Gromov-product nodes in [4]. Informally, an infinite metric space has a finite Gromov-hyperbolicity measure if it behaves metrically in the large scale as a negatively curved Riemannian manifold, and thus the value of this measure can be correlated to the standard scalar curvature of a hyperbolic manifold. Intuitively, for a finite network the measure is based on the properties of the set of exact and approximate geodesics of the network. There is a large body of research dealing with theoretical and empirical aspects of this measure, e.g., see [5–10] for theoretical aspects, and see [11–13] for empirical aspects with applications to real-world networks.

A second notion of curvature is obtained by applying Forman's discretization of Ricci curvature for (polyhedral or CW) complexes (the "Forman–Ricci curvature") [14] to networks. Informally, one applies the Forman–Ricci curvature to networks by topologically associating components (sub-graphs) of the given graphs with higher-dimensional objects. The topological association itself can be carried out in several ways. Although this type of curvature originated relatively recently, there are already a number of papers investigating properties of these measures and applying them to real-world networks, e.g., see [8,15–18].

The network curvature discussed in this paper is another discretization of Ricci curvature, namely Ollivier's discretization [19–22], henceforth dubbed the "Ollivier–Ricci curvature". Both the Ollivier–Ricci curvature and the Forman–Ricci curvature assign a number to each edge of the given network, but the numbers are calculated in quite different ways in the two cases since they capture different metric properties of a Riemannian manifold. The reader is referred to [15] for a comparative analysis of these two measures. In addition to the network curvature measures discussed above, researchers have also explored other notions of curvature, such as the one based on circle packings by Chow and Luo [23].

### *Basic Notations and Terminologies*

To simplify exposition, we assume in this paper that the given network (in this paper the terms "graph" and "network" are used interchangeably) *G* = (*V*, *E*) is an undirected unweighted connected graph; generalization of the corresponding definitions and concepts to the case of non-negative edge weights is mostly straightforward. The following notations will be used in the rest of this paper.


### **2. Ollivier–Ricci Curvature: Motivation, Definition and Illustration**

In this section, we provide the formal definition of the Ollivier–Ricci curvature. First, we need to define the so-called Earth Mover's Distance (EMD) (also known as the *L*1 transportation distance, the *L*1 Wasserstein distance and the Monge–Kantorovich–Rubinstein distance) [24–27]. For the purpose of this paper, it suffices to define the distance in the discrete setting of a network as follows. Suppose that we have two probability distributions P1 and P2 on a subset of nodes ∅ ⊂ *V*′ ⊆ *V*, i.e., two real numbers 0 ≤ P1(*v*), P2(*v*) ≤ 1 for every node *v* ∈ *V*′ with ∑*v*∈*V*′ P1(*v*) = ∑*v*∈*V*′ P2(*v*) = 1. We can think of the number P1(*v*) as the total amount of "earth" (dirt) at node *v* that can be moved to other nodes, and the number P2(*v*) as the total amount of earth that node *v* must end up storing. The cost of transporting one unit of earth from node *u* to node *v* is dist*G*(*u*, *v*), and the goal is to satisfy the storage requirement of all nodes by moving earth as needed while minimizing the total transportation cost. Letting the variable *z**u*,*v* ∈ [0, 1] denote the amount of shipment from node *u* to node *v* in an optimal solution, the EMD for the two probability distributions P1 and P2 on *V*′ can be formulated as the linear programming (LP) problem shown in Figure 1, which can be solved in polynomial time. One can also think of the EMD as the distance between the two probability distributions P1 and P2 on the set of nodes *V*′ based on the shortest-path metric on *G*. We will use the notation EMD(*V*′, P1, P2) to denote the value of the objective function in an optimal solution of the LP in Figure 1.

**variables**: *z**u*,*v* for every pair of nodes *u*, *v* ∈ *V*′

$$\begin{aligned} \text{minimize} \quad & \sum\_{u \in V'} \sum\_{v \in V'} \mathrm{dist}\_G(u, v)\, z\_{u,v} && (*\text{ minimize total transportation cost }*) \\ \text{subject to} \quad & \sum\_{v \in V'} z\_{u,v} = \mathbb{P}\_1(u), \;\; \text{for each } u \in V' && (*\text{ take from } u \text{ as much as it has }*) \\ & \sum\_{u \in V'} z\_{u,v} = \mathbb{P}\_2(v), \;\; \text{for each } v \in V' && (*\text{ ship to } v \text{ as much as it needs }*) \\ & z\_{u,v} \geq 0, \;\; \text{for all } u, v \in V' \end{aligned}$$

**Figure 1.** LP-formulation for EMD on the set of nodes *V*′ with |*V*′|<sup>2</sup> variables. Comments are enclosed by (\* and \*). Note that the constraints *z**u*,*v* ≤ 1 are unnecessary and therefore omitted.
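As a concrete (and deliberately tiny) illustration, the LP of Figure 1 can be handed to any off-the-shelf solver. The sketch below uses `scipy.optimize.linprog` on a hypothetical 3-node path with hand-coded distances; the function name and the instance are choices of this illustration, not part of the original formulation.

```python
import numpy as np
from scipy.optimize import linprog

def emd_lp(dist, p1, p2):
    """Solve the EMD linear program of Figure 1: minimize total transportation
    cost subject to the 'take from u' and 'ship to v' equality constraints."""
    n = len(p1)
    cost = np.asarray(dist, dtype=float).reshape(n * n)
    A_eq, b_eq = [], []
    for i in range(n):                    # take from u as much as it has
        row = np.zeros(n * n)
        row[i * n:(i + 1) * n] = 1.0
        A_eq.append(row)
        b_eq.append(p1[i])
    for j in range(n):                    # ship to v as much as it needs
        row = np.zeros(n * n)
        row[j::n] = 1.0
        A_eq.append(row)
        b_eq.append(p2[j])
    # linprog's default variable bounds are already z >= 0.
    return linprog(cost, A_eq=A_eq, b_eq=b_eq).fun

# Toy instance: a path 1-2-3, with all mass at node 1 and all demand at node 3.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
print(emd_lp(dist, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # -> 2.0
```

One of the 2|*V*′| equality constraints is redundant (row sums and column sums both total 1), which the solver's presolve handles without trouble.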

For an intuitive understanding of the connection of EMD to the Ollivier–Ricci curvature for networks, we informally recall one way of defining the Ricci curvature measure for a smooth Riemannian manifold. The Ricci curvature at a point *x* in the manifold along a direction can be thought of as transporting a small ball centered at *x* along that direction and measuring the "distortion" of that ball. The role of the direction is captured by the edge {*u*, *v*}, the roles of the balls at the two nodes are played by the distributions P1 and P2, and the role of the distortion due to transportation is captured by the EMD measure. More precisely, given our input graph *G* = (*V*, *E*) and an edge {*u*, *v*} ∈ *E*, the paper [20] uses the EMD measure to define the "coarse Ricci curvature" Ric(*u*, *v*) along the edge {*u*, *v*} in the following manner (see Figure 2 for an illustration):


$$\mathbb{P}\_u(x) \stackrel{\text{def}}{=} \mathbb{P}\_1(x) = \begin{cases} \frac{1}{|\{u\} \cup \mathrm{Nbr}(u)|}, & \text{if } x \in \{u\} \cup \mathrm{Nbr}(u) \\ 0, & \text{otherwise} \end{cases}$$

$$\mathbb{P}\_v(x) \stackrel{\text{def}}{=} \mathbb{P}\_2(x) = \begin{cases} \frac{1}{|\{v\} \cup \mathrm{Nbr}(v)|}, & \text{if } x \in \{v\} \cup \mathrm{Nbr}(v) \\ 0, & \text{otherwise} \end{cases} \tag{1}$$


$$\mathrm{Ric}(u, v) = 1 - \frac{\mathrm{EMD}(V\_{u,v}, \mathbb{P}\_u, \mathbb{P}\_v)}{\mathrm{dist}\_G(u, v)} = 1 - \mathrm{EMD}(V\_{u,v}, \mathbb{P}\_u, \mathbb{P}\_v) \tag{2}$$

where the second equality holds since dist*G*(*u*, *v*) = 1 for an edge {*u*, *v*} of an unweighted graph.

The measure can easily be extended to graphs with non-negative edge weights: redefine dist*G*(*u*, *v*) to be the minimum total weight over all possible paths between *u* and *v*, and use the equation:

$$\mathrm{Ric}(u, v) = 1 - \frac{\mathrm{EMD}(V\_{u,v}, \mathbb{P}\_u, \mathbb{P}\_v)}{\mathrm{dist}\_G(u, v)}$$

Some authors also define the discrete Ricci curvature Ric(*u*) for a node *u* ∈ *V* by taking the average of the discrete Ricci curvature over all edges incident on *u*, i.e., by letting

$$\mathrm{Ric}(u) = \frac{\sum\_{\{u,v\} \in E} \mathrm{Ric}(u, v)}{\deg(u)}$$

**Figure 2.** A pictorial illustration of the calculation of Ric(*u*, *v*). (**a**) The given graph *G*; (**b**) The subset of nodes *V**u*,*v*; (**c**) The distributions P*u* and P*v*. For visual clarity, only two distances dist(*q*3, *q*3) = 0 and dist(*v*, *q*3) = 1 are shown.
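To make Equations (1) and (2) concrete, here is a minimal self-contained sketch (assuming `scipy` is available; the graphs, node labels, and helper names are all this illustration's choices) that builds the two distributions, solves the LP for EMD, and returns Ric(*u*, *v*) for an unweighted graph.

```python
from collections import deque
from scipy.optimize import linprog

def bfs_dist(adj, s):
    """Hop distances from s via breadth-first search."""
    d, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                q.append(y)
    return d

def ric(adj, u, v):
    """Ric(u, v) per Equations (1)-(2) for an unweighted connected graph."""
    pu = {x: 1.0 / (len(adj[u]) + 1) for x in [u] + list(adj[u])}
    pv = {x: 1.0 / (len(adj[v]) + 1) for x in [v] + list(adj[v])}
    nodes = sorted(set(pu) | set(pv))            # joint support of P_u and P_v
    dist = {x: bfs_dist(adj, x) for x in nodes}
    n = len(nodes)
    cost = [dist[a][b] for a in nodes for b in nodes]
    A_eq = [[1.0 if k // n == i else 0.0 for k in range(n * n)] for i in range(n)] \
         + [[1.0 if k % n == j else 0.0 for k in range(n * n)] for j in range(n)]
    b_eq = [pu.get(a, 0.0) for a in nodes] + [pv.get(b, 0.0) for b in nodes]
    emd = linprog(cost, A_eq=A_eq, b_eq=b_eq).fun
    return 1.0 - emd                             # dist_G(u, v) = 1 on an edge

# Triangle: P_u and P_v are both uniform on the same 3 nodes, so EMD = 0.
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(round(ric(tri, 0, 1), 6))   # -> 1.0
# 4-cycle 0-1-2-3-0: only mass 1/3 moves, across the single edge {3, 2}.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(round(ric(c4, 0, 1), 6))    # -> 0.666667
```

On the triangle the two distributions coincide, giving the maximum value Ric = 1; on the 4-cycle a single unit of mass 1/3 travels distance 1, giving Ric = 2/3.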

### *An Illustration of Computing the Value of* Ric(*u*, *v*) *for a Two-Dimensional Grid*

Consider an infinite two-dimensional grid on the plane and any edge {*u*, *v*} of the grid as shown in Figure 3. Note that any node of the grid has exactly 4 neighbors; thus P*u*(*x*) = 1/5 if *x* ∈ {*u*} ∪ Nbr(*u*) and 0 otherwise, and P*v*(*x*) = 1/5 if *x* ∈ {*v*} ∪ Nbr(*v*) and 0 otherwise. Moreover, the sets of nodes Nbr(*u*) \ {*v*} and Nbr(*v*) \ {*u*} are disjoint, and thus it is easy to see that EMD(*V**u*,*v*, P*u*, P*v*) = 1 (see Figure 3). Using (2) we therefore get Ric(*u*, *v*) = 0.

**Figure 3.** A pictorial illustration of calculation of RIC(*<sup>u</sup>*, *v*) for a two-dimensional grid. The blue edges, when shifted to the left by one unit, coincide with the red edges, giving EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) ≤ 1. It can also be argued that EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) ≥ 1 (e.g., see [20] (Example 5) with *N* = 2), thus giving EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) = 1.
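The grid value can also be checked numerically on a finite patch of the grid; assuming `scipy`, the sketch below (the patch size and the chosen centre edge are this illustration's choices) recovers EMD = 1 and hence Ric(*u*, *v*) = 0.

```python
from collections import deque
from scipy.optimize import linprog

# 7x7 patch of the infinite grid; for a centre edge, all shortest paths
# between support nodes stay inside the patch, so it behaves like the grid.
N = 7
adj = {}
for i in range(N):
    for j in range(N):
        adj[(i, j)] = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                       if 0 <= i + di < N and 0 <= j + dj < N]

def bfs(s):
    d, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                q.append(y)
    return d

u, v = (3, 3), (3, 4)                    # a centre edge of the patch
pu = {x: 0.2 for x in [u] + adj[u]}      # deg = 4, so mass 1/5 per node
pv = {x: 0.2 for x in [v] + adj[v]}
nodes = sorted(set(pu) | set(pv))
dist = {x: bfs(x) for x in nodes}
n = len(nodes)
cost = [dist[a][b] for a in nodes for b in nodes]
A_eq = [[1.0 if k // n == i else 0.0 for k in range(n * n)] for i in range(n)] \
     + [[1.0 if k % n == j else 0.0 for k in range(n * n)] for j in range(n)]
b_eq = [pu.get(a, 0.0) for a in nodes] + [pv.get(b, 0.0) for b in nodes]
emd = linprog(cost, A_eq=A_eq, b_eq=b_eq).fun
print(emd, 1.0 - emd)   # EMD = 1 and Ric(u, v) = 0, up to solver tolerance
```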

### **3. Exact and Approximate Computation of Ric(***<sup>u</sup>***,** *v***)**

Note that any node *x* ∈ *V**u*,*v* with either P*u*(*x*) = 0 or P*v*(*x*) = 0 can be ignored in the calculation of EMD(*V**u*,*v*, P*u*, P*v*). Thus, a straightforward calculation of Ric(*u*, *v*) requires the following two steps: (i) computing the distance dist*G*(*u*′, *v*′) for every pair of nodes *u*′, *v*′ ∈ *V**u*,*v*, and (ii) solving the LP of Figure 1 on these nodes and distances.


However, the calculation of EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) (and therefore RIC(*<sup>u</sup>*, *v*)) can be further simplified if we make some more observations.

Consider a pair of nodes *u*′ ∈ Nbr(*u*) and *v*′ ∈ Nbr(*v*) for an edge {*u*, *v*} ∈ *E*. Note that there are only four possible values of dist*G*(*u*′, *v*′): dist*G*(*u*′, *v*′) = 0 if *u*′ = *v*′, dist*G*(*u*′, *v*′) = 1 if {*u*′, *v*′} ∈ *E*, dist*G*(*u*′, *v*′) = 2 if there is a path of length 2 between *u*′ and *v*′, and dist*G*(*u*′, *v*′) = 3 in all other cases (since the path *u*′, *u*, *v*, *v*′ has length 3). Thus, to find all pair-wise distances between the nodes in Nbr(*u*) and Nbr(*v*) we only need to check for paths of length up to 3, which can be done in *O*(*n*<sup>*ω*</sup>) time using Seidel's algorithm [28] again.

For further discussion, consider the total variation distance (TVD) between the two distributions P*u* and P*v* on the set of nodes in *Vu*,*v*:

$$\|\mathbb{P}\_u - \mathbb{P}\_v\|\_{\mathrm{TVD}} \stackrel{\text{def}}{=} \frac{1}{2} \sum\_{v' \in V\_{u,v}} \big| \mathbb{P}\_u(v') - \mathbb{P}\_v(v') \big|$$

Note that || P*u* − P*v* ||TVD can be trivially computed in *<sup>O</sup>*(deg(*u*) + deg(*v*)) time.

**Proposition 1.** 1 − 3|| P*u* − P*v* ||TVD ≤ RIC(*<sup>u</sup>*, *v*) ≤ 1 − || P*u* − P*v* ||TVD*.*

**Proof.** Since every pair of non-identical nodes *u*′, *v*′ ∈ *V**u*,*v* satisfies 1 ≤ dist*G*(*u*′, *v*′) ≤ 3, we have || P*u* − P*v* ||TVD ≤ EMD(*V**u*,*v*, P*u*, P*v*) ≤ 3|| P*u* − P*v* ||TVD, which implies the claimed result via the definition of Ric(*u*, *v*).

The bound in Proposition 1 may not necessarily be a tight approximation for Ric(*u*, *v*); for example, for the grid in Figure 3 we get || P*u* − P*v* ||TVD = 3/5, giving −4/5 ≤ Ric(*u*, *v*) ≤ 2/5 as an approximation to the actual value of Ric(*u*, *v*) = 0.
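For the grid edge of Figure 3, the total variation distance and the resulting Proposition 1 bounds can be checked with a few lines; the node labels `a1`, ..., `b3` below are hypothetical stand-ins for the three non-shared neighbours on each side.

```python
# Grid edge: P_u and P_v put mass 1/5 on {u} ∪ Nbr(u) and {v} ∪ Nbr(v);
# u and v lie in both supports, the remaining 3 + 3 neighbours do not overlap.
pu = {"u": 0.2, "v": 0.2, "a1": 0.2, "a2": 0.2, "a3": 0.2}   # a_i: other nbrs of u
pv = {"u": 0.2, "v": 0.2, "b1": 0.2, "b2": 0.2, "b3": 0.2}   # b_i: other nbrs of v

support = set(pu) | set(pv)
tvd = 0.5 * sum(abs(pu.get(x, 0.0) - pv.get(x, 0.0)) for x in support)
print(tvd)                          # ≈ 3/5

lower, upper = 1 - 3 * tvd, 1 - tvd
print(lower, upper)                 # ≈ -4/5 and 2/5, bracketing the true Ric = 0
```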

For the development of further bounds, consider the edge {*u*, *v*} ∈ *E*. Assume without loss of generality that deg(*u*) ≤ deg(*v*) and that *G* has 4 or more nodes, thus deg(*v*) ≥ 2. Suppose that *u* and *v* have 0 ≤ ℓ ≤ deg(*u*) − 1 common neighbor nodes as shown pictorially below:

$$\mathrm{Nbr}(u) = \{\, p\_1, p\_2, \dots, p\_k, \; \underbrace{q\_1, q\_2, \dots, q\_\ell}\_{\ell \,\geq\, 0 \text{ common}}, \; v \,\}, \qquad k + \ell = \deg(u) - 1$$

$$\mathrm{Nbr}(v) = \{\, \underbrace{q\_1, q\_2, \dots, q\_\ell}\_{\ell \,\geq\, 0 \text{ common}}, \; r\_1, r\_2, \dots, r\_m, \; u \,\}, \qquad m + \ell = \deg(v) - 1$$

Note that the two probability vectors P*u* and P*v* for the edge {*<sup>u</sup>*, *v*} are as shown below:

$$\begin{array}{c|ccc|ccc|c|ccc|c} & p\_1 & \cdots & p\_k & q\_1 & \cdots & q\_\ell & u & r\_1 & \cdots & r\_m & v \\ \hline \mathbb{P}\_u & \frac{1}{\deg(u)+1} & \cdots & \frac{1}{\deg(u)+1} & \frac{1}{\deg(u)+1} & \cdots & \frac{1}{\deg(u)+1} & \frac{1}{\deg(u)+1} & 0 & \cdots & 0 & \frac{1}{\deg(u)+1} \\ \mathbb{P}\_v & 0 & \cdots & 0 & \frac{1}{\deg(v)+1} & \cdots & \frac{1}{\deg(v)+1} & \frac{1}{\deg(v)+1} & \frac{1}{\deg(v)+1} & \cdots & \frac{1}{\deg(v)+1} & \frac{1}{\deg(v)+1} \end{array}$$

By our assumption, 1/(deg(*u*) + 1) ≥ 1/(deg(*v*) + 1), and thus a straightforward calculation gives the following value for || P*u* − P*v* ||TVD:

$$\begin{split} \|\mathbb{P}\_u - \mathbb{P}\_v\|\_{\mathrm{TVD}} &= \frac{1}{2} \left( \frac{k}{\deg(u)+1} + \frac{m}{\deg(v)+1} + (\ell+2) \left( \frac{1}{\deg(u)+1} - \frac{1}{\deg(v)+1} \right) \right) \\ &= \frac{\frac{k+\ell}{2}+1}{\deg(u)+1} + \frac{\frac{m-\ell}{2}-1}{\deg(v)+1} = \frac{1}{2} + \frac{(\deg(v)+1) - 2(\ell+2)}{2(\deg(v)+1)} = 1 - \frac{\ell+2}{\deg(v)+1} \end{split} \tag{3}$$
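Equation (3) can be sanity-checked on a concrete edge; in the sketch below the node names and the choices deg(*u*) = 3, deg(*v*) = 4, ℓ = 1 are purely illustrative, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

# Illustrative edge {u, v}: deg(u) = 3 <= deg(v) = 4, with l = 1 common
# neighbour q1; p1 is private to u, while r1 and r2 are private to v.
nbr_u = ["v", "p1", "q1"]
nbr_v = ["u", "q1", "r1", "r2"]
pu = {x: Fraction(1, len(nbr_u) + 1) for x in ["u"] + nbr_u}
pv = {x: Fraction(1, len(nbr_v) + 1) for x in ["v"] + nbr_v}

# Direct total variation distance over the joint support.
support = set(pu) | set(pv)
tvd = sum(abs(pu.get(x, 0) - pv.get(x, 0)) for x in support) / 2

# Closed form of Equation (3): 1 - (l + 2) / (deg(v) + 1).
l, deg_v = 1, 4
print(tvd, 1 - Fraction(l + 2, deg_v + 1))   # both print 2/5
```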

**Proposition 2.** −2 + 3(ℓ + 2)/(deg(*v*) + 1) ≤ Ric(*u*, *v*) ≤ (ℓ + 2)/(deg(*v*) + 1)*, and in particular it always holds that* −2 < Ric(*u*, *v*) ≤ 1*.*

**Proof.** Plugging the bound (3) into Proposition 1 proves the first claim. To prove the second claim, note that 0 < (ℓ + 2)/(deg(*v*) + 1) ≤ 1.

For further bounds, suppose that there exists a *γ* ∈ {1, 2, 3} such that for any two distinct nodes *u*′ ∈ Nbr(*u*) and *v*′ ∈ Nbr(*v*) the distance dist(*u*′, *v*′) is exactly *γ*. In that case, it follows that

$$\mathrm{EMD}(V\_{u,v}, \mathbb{P}\_u, \mathbb{P}\_v) = \gamma \, \|\mathbb{P}\_u - \mathbb{P}\_v\|\_{\mathrm{TVD}} \;\Rightarrow\; \mathrm{Ric}(u, v) = 1 - \gamma \, \|\mathbb{P}\_u - \mathbb{P}\_v\|\_{\mathrm{TVD}} = 1 - \gamma + \frac{\gamma(\ell+2)}{\deg(v)+1}$$

Now, suppose that *G* has no cycle of 5 or fewer edges containing the edge {*u*, *v*} (a tree is a trivial example of such a graph). This implies *γ* = 3 and ℓ = 0, giving the following bound.

**Proposition 3.** *If G has no cycle of* 5 *or fewer edges containing the edge* {*u*, *v*}*, then* Ric(*u*, *v*) *is precisely* −2 + 6/(deg(*v*) + 1) ≤ 0 *and can be computed in O*(deg(*u*) + deg(*v*)) *time.*
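Proposition 3 can be checked on a small tree by computing Ric(*u*, *v*) from the LP directly. The tree below (chosen with deg(*u*) = deg(*v*) = 3 for simplicity; all labels are this illustration's) has no cycle through {*u*, *v*}, so the proposition predicts Ric(*u*, *v*) = −2 + 6/4 = −1/2, assuming `scipy`.

```python
from collections import deque
from scipy.optimize import linprog

# Tree with edge {u, v}: u is also adjacent to p1, p2; v also to r1, r2.
adj = {"u": ["v", "p1", "p2"], "v": ["u", "r1", "r2"],
       "p1": ["u"], "p2": ["u"], "r1": ["v"], "r2": ["v"]}

def bfs(s):
    d, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                q.append(y)
    return d

pu = {x: 0.25 for x in ["u"] + adj["u"]}   # deg(u) + 1 = 4 support nodes
pv = {x: 0.25 for x in ["v"] + adj["v"]}
nodes = sorted(set(pu) | set(pv))
dist = {x: bfs(x) for x in nodes}
n = len(nodes)
cost = [dist[a][b] for a in nodes for b in nodes]
A_eq = [[1.0 if k // n == i else 0.0 for k in range(n * n)] for i in range(n)] \
     + [[1.0 if k % n == j else 0.0 for k in range(n * n)] for j in range(n)]
b_eq = [pu.get(a, 0.0) for a in nodes] + [pv.get(b, 0.0) for b in nodes]
ric = 1.0 - linprog(cost, A_eq=A_eq, b_eq=b_eq).fun
print(round(ric, 6))   # -> -0.5, matching -2 + 6/(deg(v) + 1)
```

Here the surplus mass 1/4 at each of *p*1, *p*2 must travel distance 3 to *r*1, *r*2, giving EMD = 3/2 and hence Ric = −1/2.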

### **4. Review of Efficient Approximate Computation of Ric(***<sup>u</sup>***,** *v***) via Linear Sketching**

It is clear that a crucial bottleneck in computing RIC(*<sup>u</sup>*, *v*) for an arbitrary graph *G* = (*<sup>V</sup>*, *E*) is the computation of EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) since it seems to require solving a linear program with *<sup>O</sup>*(deg(*u*) deg(*v*)) variables and *<sup>O</sup>*(deg(*u*) deg(*v*)) constraints (note that in the worst case deg(*u*) deg(*v*) can be as large as <sup>Θ</sup>(*n*<sup>2</sup>) when *n* is the number of nodes of *G*). In this section we review a non-trivial approach for computing EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) provided we settle for a *slightly* non-optimal solution for EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*).

Linear sketching is a popular method to perform approximate computations on large data sets using dimensionality reduction [31]. The general (informal) intuition behind linear sketching is to take linear projections of the given data set and then use these projections to provide solutions to the original problem. Significant research has been done on the problem of estimating EMD using linear sketches for general metric spaces [32–36]. In this section, we discuss the results by McGregor and Stubbs [37] to approximately estimate EMD on a graph metric (i.e., metric induced by inter-node distances in a graph, as is the case for computing RIC(*<sup>u</sup>*, *v*)). Recall that our bottleneck is the computation of EMD(*Vu*,*v*, P*<sup>u</sup>*, <sup>P</sup>*v*) for the given graph *G*.

The first step is to transform the problem of computing EMD(*V**u*,*v*, P*u*, P*v*) by standard techniques to the following equivalent problem, which will be denoted by EMD<sub>d</sub>. Given two multi-sets A, B ⊆ X over a ground set X with |A| = |B| = *k*, and a metric d : X × X → R<sup>+</sup> on X, compute the minimum cost of a perfect matching between A and B, i.e., using *π*<sub>A,B</sub> to denote a 1–1 mapping from A to B, we need to compute

$$\operatorname{EMD}\_{\mathsf{d}}(\mathcal{A}, \mathcal{B}) = \min\_{\pi\_{\mathcal{A}, \mathcal{B}}} \left\{ \sum\_{a \in \mathcal{A}} \mathsf{d}(a, \pi\_{\mathcal{A}, \mathcal{B}}(a)) \right\}.$$

For the purpose of measuring approximation quality, we say that an algorithm is an (*ε*, *δ*)-algorithm for computing a quantity of value *Q* if the value *Q*′ returned by the algorithm satisfies Pr[ |*Q*′ − *Q*| < *εQ* ] ≥ 1 − *δ*.

The basic approach of McGregor and Stubbs in [37] is to define two vectors **x**, **y** ∈ R<sup>|*E*|</sup> corresponding to the sets A and B. We then estimate EMD<sub>d</sub>(A, B) by posing it as an ℓ1-regression problem using the vectors **x**, **y** and a set of other vectors defined by the structure of the underlying graph. The idea is to take some random projections of these vectors to a smaller-dimensional space and then perform ℓ1-regression on these projections to save space and time. The following result by Kane et al. [38] is crucial to the analysis of this approach (the notation Pr<sub>*M*∼*ν*</sub> is the standard notation for denoting that the entries of *M* are drawn from the distribution *ν*):

There exists a distribution (a "*q*-dimensional sketch") *ν* over linear maps from R*<sup>n</sup>* to R*<sup>q</sup>*, where *q* = *O*(*ε*<sup>−2</sup> log *n* log *δ*<sup>−1</sup>), and a "post-processing" function *f* : R*<sup>q</sup>* → R such that for any **x** ∈ R*<sup>n</sup>* with polynomially-bounded entries, it holds that

$$\Pr\_{M \sim \nu}\left[\, \big|\, \|\mathbf{x}\|\_1 - f(M\mathbf{x}) \,\big| \leq \varepsilon \, \|\mathbf{x}\|\_1 \,\right] \geq 1 - \delta$$
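The construction of Kane et al. [38] is involved; as a hedged stand-in that exhibits the same phenomenon, the classical 1-stable (Cauchy) sketch of Indyk estimates ‖**x**‖1 as the median of |*M***x**| when the entries of *M* are drawn i.i.d. from the standard Cauchy distribution. The sketch below illustrates this related construction, not the exact sketch from [38].

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 1000, 801                      # ambient dimension and sketch size

x = rng.uniform(-1.0, 1.0, n)         # a vector with polynomially bounded entries
M = rng.standard_cauchy((q, n))       # linear map drawn from a 1-stable distribution

# Each coordinate of Mx is distributed as ||x||_1 times a standard Cauchy
# variable, and the median of |Cauchy| is 1, so the sample median of |Mx|
# concentrates around ||x||_1 as q grows.
estimate = np.median(np.abs(M @ x))
true_l1 = np.abs(x).sum()
print(true_l1, estimate, abs(estimate - true_l1) / true_l1)
```

The mean of |*M***x**| would be useless here (Cauchy variables have no mean), which is why the post-processing function must be a robust statistic such as the median.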

To understand how the above result relates to the calculation of EMD<sub>d</sub>(A, B), first consider the case when the given instance of EMD<sub>d</sub>(A, B) is one-dimensional, i.e., let *G* = (*V*, *E*) be a path with *n* nodes *V* = {1, ... , *n*} and *n* − 1 edges *E* = {*e*1, ... , *e**n*−1} where *ei* = {*i*, *i* + 1}, let A, B ⊆ *V*, and let d(*i*, *j*) = dist*G*(*i*, *j*) for all *i*, *j* ∈ *V*. Then we can associate the computation of EMD<sub>d</sub>(A, B) with a norm-estimation problem in the following manner. Assume that we have vectors **x** = (*x*1, ... , *x**n*−1) ∈ R*<sup>n</sup>*<sup>−1</sup> and **y** = (*y*1, ... , *y**n*−1) ∈ R*<sup>n</sup>*<sup>−1</sup> such that for all *i* ∈ {1, ... , *n* − 1} the following assertions hold: *xi* = |{*a* ∈ A | *i* ≥ *a*}| and *yi* = |{*b* ∈ B | *i* ≥ *b*}|. Then, it can be shown that EMD<sub>d</sub>(A, B) = || **x** − **y** ||1, and thus we can use the result of Kane et al. [38] stated above directly.
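The path-graph identity EMD<sub>d</sub>(A, B) = ‖**x** − **y**‖1 can be verified directly on a toy instance (the multi-sets below are arbitrary choices for illustration), comparing a brute-force minimum-cost matching against the prefix-count vectors.

```python
from itertools import permutations

n = 6
A, B = [1, 4], [2, 6]      # multi-sets of k = 2 nodes on the path 1..n

# Brute-force EMD: cheapest perfect matching under the path metric |i - j|.
emd = min(sum(abs(a - b) for a, b in zip(A, p)) for p in permutations(B))

# Prefix-count vectors: x_i = |{a in A : a <= i}| for i = 1..n-1, same for y.
x = [sum(1 for a in A if a <= i) for i in range(1, n)]
y = [sum(1 for b in B if b <= i) for i in range(1, n)]
l1 = sum(abs(xi - yi) for xi, yi in zip(x, y))

print(emd, l1)   # -> 3 3
```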

As a second illustration of the above point, suppose that the graph *G* in the previous example is now a cycle of *n* nodes *V* = {1, ... , *n*} and *n* edges *E* = {*e*1, ... , *e**n*} where *ei* = {*i*, *i* + 1} for *i* ∈ {1, ... , *n* − 1} and *e**n* = {*n*, 1}. Suppose that we simply ignore the last edge *e**n* so that the graph becomes a path and we can apply the previous approach. However, this omission of *e**n* changes the distance between the nodes *i* ∈ A and *j* ∈ B from d(*i*, *j*) = min { |*i* − *j*|, |*i* − *n*| + 1 + |1 − *j*|, |*i* − 1| + 1 + |*n* − *j*| } to a new distance d′(*i*, *j*) = |*i* − *j*|. To resolve this issue, we make a sequence of guesses for the number of pairs of nodes that will be joined using the edge *e**n*. More precisely, for *λ* ∈ {−*k*, −*k* + 1, ... , *k* − 1, *k*} let C*λ* be the multi-set consisting of *λ* copies of "1" if *λ* > 0 and |*λ*| copies of "*n*" if *λ* < 0. Then, one can show that

$$\mathrm{EMD}\_{\mathsf{d}}(\mathcal{A}, \mathcal{B}) \leq |\lambda| + \mathrm{EMD}\_{\mathsf{d}'}(\mathcal{A} \uplus \mathcal{C}\_{\lambda}, \mathcal{B} \uplus \mathcal{C}\_{-\lambda})$$

with equality for some *λ* ∈ {−*k*, −*k* + 1, ... , *k* − 1, *k*}, where ⊎ denotes the union of multi-sets. Thus, we can use the result of Kane et al. [38] in the following manner. First define two vectors **x** = (*x*1, ... , *x**n*) ∈ R*<sup>n</sup>* and **y** = (*y*1, ... , *y**n*) ∈ R*<sup>n</sup>* where *xi* = |{*a* ∈ A | *i* ≥ *a*}| and *yi* = |{*b* ∈ B | *i* ≥ *b*}| for *i* ∈ {1, ... , *n* − 1}, and *x**n* = *y**n* = 0. Let **z** = **x** − **y** and **c** = (1, . . . , 1) ∈ R*<sup>n</sup>*. Then, it follows that

$$\mathrm{EMD}\_{\mathsf{d}}(\mathcal{A}, \mathcal{B}) = \min\_{\lambda \in \{-k, -k+1, \dots, k-1, k\}} \big\| \mathbf{z} + \lambda \mathbf{c} \big\|\_1$$
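The cycle identity above can likewise be verified by brute force on a small instance (again with illustrative multi-sets): the minimum over *λ* of ‖**z** + *λ***c**‖1 matches the EMD computed with the true cycle metric.

```python
from itertools import permutations

n, k = 6, 2
A, B = [1, 4], [2, 6]      # multi-sets of k nodes on the cycle 1..n

def cyc(i, j):
    """Ring (cycle) distance between nodes i and j."""
    return min(abs(i - j), n - abs(i - j))

# Brute-force EMD under the true cycle metric.
emd = min(sum(cyc(a, b) for a, b in zip(A, p)) for p in permutations(B))

# z and c as in the text: prefix counts on the broken path, z_n = 0, c = (1,...,1).
x = [sum(1 for a in A if a <= i) for i in range(1, n)] + [0]
y = [sum(1 for b in B if b <= i) for i in range(1, n)] + [0]
z = [xi - yi for xi, yi in zip(x, y)]
best = min(sum(abs(zi + lam) for zi in z) for lam in range(-k, k + 1))

print(emd, best)   # -> 3 3
```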

Define the function *f* : R → R as *f*(*λ*) = || **z** + *λ***c** ||1; clearly EMD<sub>d</sub>(A, B) = min*λ*∈{−*k*,−*k*+1,...,*k*−1,*k*} *f*(*λ*). For a specific *λ* ∈ {−*k*, −*k* + 1, ... , *k* − 1, *k*}, we can use the result of Kane et al. [38] to find an approximation f̃*λ* of *f*(*λ*) using an *O*(*ε*<sup>−2</sup> log *n* log(*kδ*<sup>−1</sup>))-dimensional sketch of **z** such that Pr[ | f̃*λ* − *f*(*λ*)| > *εf*(*λ*) ] < *δ*/(2*k* + 1). Iterating the process 2*k* + 1 times and using the union bound for probabilities, we get

$$\begin{aligned} \Pr\left[\forall \lambda \in \{-k, \dots, k\} : |\widetilde{f}\_{\lambda} - f(\lambda)| \leq \varepsilon f(\lambda)\right] &\geq 1 - \sum\_{\lambda=-k}^{k} \Pr\left[ |\widetilde{f}\_{\lambda} - f(\lambda)| > \varepsilon f(\lambda)\right] \\ &> 1 - (2k+1) \times \frac{\delta}{2k+1} = 1 - \delta \end{aligned}$$

It is possible to design a more careful approach that iterates only *O*(log *k*) times instead of 2*k* + 1 times. The ideas behind this approach as described above can be extended to trees with some non-trivial effort.

Finally, the approach can indeed be generalized to the case when *G* is an arbitrary graph (which applies to computing Ric(*u*, *v*)) in the following manner. The basic idea to calculate EMD<sub>d</sub>(A, B) for an arbitrary graph *G* is to reduce it, in an approximate sense, to that of computing EMD for a tree. Let *T* = (*V*, *E**T*) be an arbitrary spanning tree of *G*, and let *F* = *E* \ *E**T*. The tree *T* defines a natural tree metric d*T* where d*T*(*a*, *b*) is the length of the shortest path between *a* and *b* in *T* for all *a*, *b* ∈ *V*. One can then express EMD<sub>d</sub>(A, B) in terms of EMD<sub>d*T*</sub>(A′, B′) for some multi-sets A′ ⊇ A and B′ ⊇ B in the following manner. For *f* = (*u*, *v*) ∈ *F* and *λ**f* ∈ {−*k*, −*k* + 1, ... , *k* − 1, *k*}, let C<sup>*f*</sup><sub>*λ**f*</sub> be the multi-set consisting of *λ**f* copies of "*u*" if *λ**f* > 0 and |*λ**f*| copies of "*v*" if *λ**f* < 0. Then the following bound holds:

$$\mathrm{EMD}\_{\mathsf{d}}(\mathcal{A}, \mathcal{B}) \leq \sum\_{f \in F} |\lambda\_f| + \mathrm{EMD}\_{\mathsf{d}\_T} \left( \mathcal{A} \uplus \biguplus\_{f \in F} \mathcal{C}\_{\lambda\_f}^{f},\; \mathcal{B} \uplus \biguplus\_{f \in F} \mathcal{C}\_{-\lambda\_f}^{f} \right)$$

The above inequality leads to the following approach. Fix an arbitrary node *r* ∈ *V* as the root of the spanning tree *T*, and let P*T*(*<sup>u</sup>*, *v*) denote the set of edges in the unique path in *T* between nodes *u* and *v*. Define the two vectors **x**, **y** ∈ R|*E*| as follows (*xe* and *ye* denote the component of **x** and **y**, respectively, indexed by the edge *e* ∈ *E*):

$$x\_e = \begin{cases} |\{a \in \mathcal{A} \mid e \in \mathcal{P}\_T(a, r)\}|, & \text{if } e \in E\_T \\ 0, & \text{otherwise} \end{cases} \qquad y\_e = \begin{cases} |\{b \in \mathcal{B} \mid e \in \mathcal{P}\_T(b, r)\}|, & \text{if } e \in E\_T \\ 0, & \text{otherwise} \end{cases}$$

and let **z** = **x** − **y**. For each *f* = (*u*, *v*) ∈ *F*, define a vector **c**<sup>*f*</sup> ∈ R<sup>|*E*|</sup> where the component *c*<sup>*f*</sup><sub>*e*</sub> of **c**<sup>*f*</sup> indexed by the edge *e* ∈ *E* is given by:

$$c\_e^f = \begin{cases} 1, & \text{if } e \in \mathcal{P}\_T(u, r) \setminus \mathcal{P}\_T(v, r) \\ -1, & \text{if } e \in \mathcal{P}\_T(v, r) \setminus \mathcal{P}\_T(u, r) \\ 1, & \text{if } e = f \\ 0, & \text{otherwise} \end{cases}$$

This leads to the following optimization problem:

$$\operatorname{EMD}\_{\mathbf{d}}(\mathcal{A}, \mathcal{B}) = \min\_{\forall f \in F: \lambda\_f \in \{-k, -k+1, \dots, k-1, k\}} ||\mathbf{z} + \sum\_{f \in F} \lambda\_f \mathbf{c}^f||\_{1}$$

The above optimization problem can be solved using several approaches, e.g., using a recursive regression algorithm that exploits the convexity of the objective function, or using some recent results on robust regression via sub-space embeddings [39,40].
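Putting the pieces together, the sketch below runs the whole reduction on a small illustrative graph (a 6-cycle plus one chord, with the path as spanning tree; all instance choices are this illustration's, not from [37]) and checks that minimizing over the *λ**f* recovers the true EMD.

```python
from collections import deque
from itertools import permutations, product

# Illustrative graph: 6-cycle 0-1-2-3-4-5-0 plus the chord {0, 3}.
# Spanning tree T: the path 0-1-2-3-4-5 rooted at r = 0;
# non-tree edges F = [(0, 5), (0, 3)].
n = 6
tree_edges = [(i, i + 1) for i in range(5)]
F = [(0, 5), (0, 3)]
edges = tree_edges + F
adj = {i: [] for i in range(n)}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)

def bfs(s):
    d, q = {s: 0}, deque([s])
    while q:
        x = q.popleft()
        for y in adj[x]:
            if y not in d:
                d[y] = d[x] + 1
                q.append(y)
    return d

A, B = [2, 5], [0, 0]              # multi-sets of k = 2 nodes
k = len(A)

# Ground truth: brute-force matching under the true graph metric.
dist = {s: bfs(s) for s in set(A)}
emd = min(sum(dist[a][b] for a, b in zip(A, p)) for p in permutations(B))

# On the rooted path, P_T(a, r) consists of the first a tree edges.
def path_to_root(a):
    return set(tree_edges[:a])

x = [sum(1 for a in A if e in path_to_root(a)) if e in tree_edges else 0 for e in edges]
y = [sum(1 for b in B if e in path_to_root(b)) if e in tree_edges else 0 for e in edges]
z = [xi - yi for xi, yi in zip(x, y)]

def c_vec(f):
    """The vector c^f: +1/-1 on the symmetric difference of root paths, 1 at f."""
    u, v = f
    pu, pv = path_to_root(u), path_to_root(v)
    return [1 if e == f else 1 if e in pu - pv else -1 if e in pv - pu else 0
            for e in edges]

C = [c_vec(f) for f in F]
best = min(sum(abs(zi + sum(lam * ci for lam, ci in zip(lams, col)))
               for zi, *col in zip(z, *C))
           for lams in product(range(-k, k + 1), repeat=len(F)))
print(emd, best)   # -> 3 3
```

Here the exhaustive search over the *λ**f* grid stands in for the regression step; the sketching machinery of the previous section would replace the exact ℓ1 norms with their low-dimensional estimates.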
