Article

The Wasserstein Metric between a Discrete Probability Measure and a Continuous One

School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(15), 2320; https://doi.org/10.3390/math12152320
Submission received: 16 June 2024 / Revised: 19 July 2024 / Accepted: 23 July 2024 / Published: 24 July 2024
(This article belongs to the Special Issue Probability, Statistics and Random Processes)

Abstract

This paper examines the Wasserstein metric between the empirical probability measure of $n$ discrete random variables and a continuous uniform measure in a $d$-dimensional ball, providing an asymptotic estimate of its expectation as $n$ approaches infinity. Furthermore, we investigate this problem within a mixed process framework, where the $n$ discrete random variables are generated by a Poisson process.

1. Introduction

Article [1] investigates the Ollivier curvature of random geometric graphs, a key step of which is estimating the Wasserstein metric between the empirical probability measure of $n$ discrete random variables and a continuous uniform one in a $d$-dimensional ball. The authors applied results from [2], which are formulated on the unit cube $[0,1]^d$, whereas Ollivier curvature is built on balls. To address this discrepancy, we refine the proof so that it is based on balls, in order to enhance the robustness and accuracy of the process described in [1] and to make it suitable for our purposes. Furthermore, since [2] requires $d > 2$, we extend our upper and lower bound estimates to include the case $d = 2$, aligning our research objectives with the broader scope of the study.
Additionally, lattice methods used in statistical mechanical approaches [3] often involve similar notation and convergence from discrete physical quantities to continuous ones, suggesting potential connections with convergence from discrete probabilities to continuous ones. For instance, consider a collection of point charges $Q_i$, $i = 1, \dots, n$, whose locations are represented by independent and uniformly distributed random variables $X_i \in \Omega$, $i = 1, \dots, n$, where $\Omega$ is a bounded region of three-dimensional space $\mathbb{R}^3$ with volume $|\Omega| = 1$. Assuming an ideal scenario in implicit solvation models for biological molecules, each charge can be postulated to have the value $\frac{1}{n}$, which yields the discrete charge density $\mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}$. On the other hand, in an ideal scenario the continuum charge density is represented by a uniform measure $\mu$. Thus, the transition from discrete (i.e., point) charges to a continuum charge density can be viewed as a pathway from a discrete probability measure to a continuous one in the Wasserstein metric. Consequently, we may contemplate the convergence of the corresponding discrete electrostatic energies, among other quantities, in terms of Wasserstein metrics or within Wasserstein spaces.
The authors in [4] provided estimates of the convergence rate, in Wasserstein metrics, of empirical measures in complex computational settings, involving numerous asymptotic calculations. Our findings are consistent with their results in the corresponding scenarios. In comparison, our proof primarily relies on estimating the expectations of optimal matching problems to obtain upper bounds on the expectations of Wasserstein metrics. We chose this technique because, as mentioned in [5]: (i) the definition of Wasserstein metrics makes them convenient for problems involving optimal transport, such as those arising from partial differential equations; (ii) Wasserstein metrics possess a rich duality property, which is particularly useful when considering (2) (in contrast to bounded Lipschitz distances), and passing back and forth between the original and the dual definition is often technically convenient; (iii) being defined by an infimum, Wasserstein metrics are often relatively straightforward to bound from above by constructing couplings between $\mu_1$ and $\mu_2$; and (iv) Wasserstein metrics incorporate a lot of the geometry of the space. For instance, the mapping $x \mapsto \delta_x$ is an isometric embedding of $X$ into $\mathcal{P}_p(X)$ (the Wasserstein space of order $p$), but there are much deeper links. This partly explains why the Wasserstein space $\mathcal{P}_p(X)$ is often very well adapted to statements that combine weak convergence and geometry.
Motivated by these virtues of Wasserstein spaces and these inspirations, we aim to bridge the gap between discrete probabilities and their continuous counterparts using Wasserstein metrics as the measuring tool in our study. Recently, significant advances have been made concerning the rate of convergence in Wasserstein metrics. For instance, in [6], the authors investigated the precise rate of convergence of the quadratic Wasserstein metric between empirical measures and uniform distributions on $[0,1]^2$ by employing well-known techniques from partial differential equations. Additionally, in [7], researchers explored upper bounds for the mean Wasserstein metric between two probabilities on $(-\pi,\pi]^d$, $d \ge 1$, using the Fourier transform, and subsequently applied these findings to estimate the mean Wasserstein metric between two empirical measures under certain assumptions. Furthermore, in [8], the author examined upper bounds for the mean rate in the quadratic Wasserstein metric $W_2$ on a $d$-dimensional compact Riemannian manifold, $d \ge 2$. Notably, there are also ongoing studies of higher-order ($p$-th order) Wasserstein metrics; however, we refrain from listing them here.

2. Preliminary Estimation

Definition 1. 
Let $X_1, X_2, \dots, X_n, Y_1, Y_2, \dots, Y_n$ be independent and uniformly distributed random variables in the $d$-dimensional ball $B(0;1) = \{x \in \mathbb{R}^d : \|x\| \le 1\}$, $d \ge 2$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^d$. The random variable $M_n^d := \inf_{\sigma} \sum_{i=1}^n \|X_i - Y_{\sigma(i)}\|$ represents the optimal matching between $X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$, with $\sigma$ ranging over all permutations of $\{1, 2, \dots, n\}$.
By applying the duality principle [9,10], or referring to the proof of Lemma 1 in [2], we have
$$M_n^d = \sup_{f \in L_1} \sum_{i=1}^n \big(f(X_i) - f(Y_i)\big) = \sup_{f \in L_1} \Big|\sum_{i=1}^n \big(f(X_i) - f(Y_i)\big)\Big|,$$
where the set of Lipschitz functions is $L_1 = \{f : B(0;1) \to \mathbb{R} :\ |f(x) - f(y)| \le \|x - y\|\ \text{for all}\ x, y \in B(0;1),\ f(0) = 0\}$. It is worth noting that every Lipschitz function in $L_1$ can be extended to one in
$$L = \{f : \mathbb{R}^d \to \mathbb{R} :\ |f(x) - f(y)| \le \|x - y\|\ \text{for all}\ x, y \in \mathbb{R}^d,\ f(0) = 0,\ f|_{B(0;1)} \in L_1\}.$$
Therefore, we have $L_1 = L|_{B(0;1)}$. The following Lemma 1 gives upper and lower bounds for the expectation $E(M_n^d)$.
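As a concrete numerical illustration of Definition 1 (our own sketch, not part of the original argument), the following Python snippet samples the two point sets uniformly in $B(0;1)$ and computes $M_n^d$ exactly with the Hungarian algorithm [9]; the helper names uniform_ball and optimal_matching are ours. By Lemma 1 below, the normalized output $M_n^d / n^{1-1/d}$ should remain bounded as $n$ grows.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def uniform_ball(n, d):
    """Sample n points uniformly from the unit ball B(0;1) in R^d."""
    v = rng.standard_normal((n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform direction on the sphere
    r = rng.uniform(size=(n, 1)) ** (1.0 / d)       # radius density ~ r^{d-1}
    return r * v

def optimal_matching(X, Y):
    """M_n^d: minimum of sum_i ||X_i - Y_sigma(i)|| over permutations sigma."""
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    row, col = linear_sum_assignment(cost)          # Hungarian algorithm
    return cost[row, col].sum()

n, d = 1000, 3
M = optimal_matching(uniform_ball(n, d), uniform_ball(n, d))
print(M / n ** (1 - 1 / d))   # stays O(1) as n grows, consistent with Lemma 1
```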
Lemma 1 
(Optimal matching). For the above optimal matching problem, we have
$$1 - \frac{1}{d+1} \le \liminf_{n\to\infty} \frac{E(M_n^d)}{n^{1-\frac{1}{d}}} \le \limsup_{n\to\infty} \frac{E(M_n^d)}{n^{1-\frac{1}{d}}} \le 5 + 2d + 32\sqrt{2}$$
in dimension $d \ge 3$, and
$$\liminf_{n\to\infty} \frac{E(M_n^d)}{n^{\frac{1}{2}}} \ge \frac{2}{3}, \qquad \limsup_{n\to\infty} \frac{E(M_n^d)}{n^{\frac{1}{2}}\log_2 n} \le 2^4$$
in dimension $d = 2$.
Proof. 
We provide a detailed proof, referring to Equation (A1) in Appendix A and Equation (A3) in Appendix B. The method employed here is essentially based on the work of [2], with several improvements and modifications made to extend its applicability to random variables within balls.

3. Main Results and Proofs

The following results concern Wasserstein metrics between empirical and uniform measures in $d$-dimensional balls. In general, the Wasserstein metric between two probability measures $\mu_1, \mu_2$ is defined as follows:
Definition 2. 
Let $\mu_1$ and $\mu_2$ be Borel probability measures on a compact metric space $(X,\rho)$, and let $\Gamma(\mu_1,\mu_2)$ denote the set of all couplings of $\mu_1$ and $\mu_2$, i.e.,
$$\Gamma(\mu_1,\mu_2) = \big\{\mu :\ \mu \text{ is a probability measure on } X \times X,\ \mu(A \times X) = \mu_1(A),\ \mu(X \times A) = \mu_2(A) \text{ for every measurable subset } A \text{ of } X\big\}.$$
A Wasserstein metric is defined as
$$W(\mu_1,\mu_2) = \inf_{\mu \in \Gamma(\mu_1,\mu_2)} \int_{X^2} \rho(x,y)\, d\mu(x,y).$$
By applying the duality principle (the Kantorovich duality theorem; see Chapter 6, Remark 6.5 of [5]), we can express the Wasserstein metric as
$$W(\mu_1,\mu_2) = \sup_{f \in L_1(X)} \left(\int_X f(x)\, d\mu_1(x) - \int_X f(y)\, d\mu_2(y)\right),$$
where $L_1(X)$ denotes the set of Lipschitz functions, with respect to the metric of $X$, with Lipschitz constant 1. From the duality formula, we may further assume that every function $f \in L_1(X)$ satisfies $f(0) = 0$.
Notice: In all subsequent discussions, the metric under consideration is the Euclidean metric. Additionally, for a sequence $\{a_n\}$ we adopt the notation $a_n = O(n^{\alpha})$, where $\alpha$ is a constant, meaning that there exists a positive constant $C$ such that $a_n \le C n^{\alpha}$ for all sufficiently large $n$.
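For two finitely supported measures, the coupling infimum in Definition 2 is a small linear program, which the following sketch (our own illustration; the atoms and weights are hypothetical toy data) solves with SciPy: it minimizes $\sum_{i,j}\rho(x_i,y_j)\,\pi_{ij}$ over couplings $\pi \in \Gamma(\mu_1,\mu_2)$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discrete measures in the plane: mu_1 has two atoms, mu_2 has three.
x = np.array([[0.0, 0.0], [1.0, 0.0]])              # atoms of mu_1
y = np.array([[0.0, 1.0], [1.0, 1.0], [0.5, 0.0]])  # atoms of mu_2
a = np.array([0.5, 0.5])                            # weights of mu_1
b = np.array([0.3, 0.3, 0.4])                       # weights of mu_2

C = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)  # costs rho(x_i, y_j)
m, k = C.shape

# Marginal constraints of a coupling: sum_j pi_ij = a_i and sum_i pi_ij = b_j.
A_eq = np.zeros((m + k, m * k))
for i in range(m):
    A_eq[i, i * k:(i + 1) * k] = 1.0
for j in range(k):
    A_eq[m + j, j::k] = 1.0

res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
print("W(mu_1, mu_2) =", res.fun)   # value of the coupling infimum
```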
Theorem 1. 
Let $X_1, X_2, \dots, X_n$ be independent and uniformly distributed random variables in a $d$-dimensional ball $B(0;1)$. The empirical measure $m_n^d$ is given by
$$m_n^d(A) = \frac{1}{n}\sum_{i=1}^n 1_A(X_i),$$
which represents the proportion of sample points that lie in a measurable subset $A$ of $B(0;1)$, and $\mu^d$ denotes the uniform measure in $B(0;1)$. As the sample size $n$ tends to infinity, the expected Wasserstein metric between $m_n^d$ and $\mu^d$, denoted $E[W(m_n^d,\mu^d)]$, decays at the rate
$$E[W(m_n^d,\mu^d)] = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
Proof. 
We consider a Wasserstein metric in $B(0;1)$ with $\rho(x,y) = \|x - y\|$, so that
$$W(m_n^d,\mu^d) = \inf_{\mu \in \Gamma(m_n^d,\mu^d)} \int_{B(0;1)^2} \rho(x,y)\, d\mu(x,y) = \sup_{f \in L_1(B(0;1))} \left(\int_{B(0;1)} f(x)\, dm_n^d(x) - \int_{B(0;1)} f(y)\, d\mu^d(y)\right).$$
Let $Y_1, Y_2, \dots, Y_n$ be independent uniformly distributed random variables in $B(0;1)$; then
$$\int_{B(0;1)} f(y)\, d\mu^d(y) = E[f(Y_i)], \quad i = 1, \dots, n.$$
So
$$\begin{aligned} W(m_n^d,\mu^d) &= \sup_{f \in L_1(B(0;1))} \left(\int_{B(0;1)} f(x)\, dm_n^d(x) - \int_{B(0;1)} f(y)\, d\mu^d(y)\right) = \frac{1}{n}\sup_{f \in L_1(B(0;1))} \sum_{i=1}^n \big(f(X_i) - E[f(Y_i)]\big) \\ &= \frac{1}{n}\sup_{f \in L_1(B(0;1))} \sum_{i=1}^n E\big[f(X_i) - f(Y_i) \mid X_i\big] = \frac{1}{n}\sup_{f \in L_1(B(0;1))} E\Big[\sum_{i=1}^n \big(f(X_i) - f(Y_i)\big) \,\Big|\, \mathbf{X}\Big] \\ &\le \frac{1}{n} E\Big[\sup_{f \in L_1(B(0;1))} \sum_{i=1}^n \big(f(X_i) - f(Y_i)\big) \,\Big|\, \mathbf{X}\Big] = \frac{1}{n} E\big[M_n^d \mid \mathbf{X}\big], \end{aligned}$$
where $\mathbf{X} = (X_1, \dots, X_n)$, and hence, from Lemma 1,
$$E[W(m_n^d,\mu^d)] \le \frac{1}{n} E[M_n^d] = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
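As a rough empirical check of Theorem 1 (again our own illustration, not part of the proof), one can estimate the proof's upper-bound proxy $\frac{1}{n}M_n^d$ by matching the sample against a fresh uniform sample, reusing uniform_ball and optimal_matching from the sketch after Definition 1; the printed ratios to the theoretical rates should stay roughly stable as $n$ doubles.

```python
# Assumes uniform_ball and optimal_matching from the earlier sketch are in scope.
import numpy as np

def rate_check(d, ns=(250, 500, 1000), reps=10):
    for n in ns:
        vals = [optimal_matching(uniform_ball(n, d), uniform_ball(n, d)) / n
                for _ in range(reps)]               # upper-bound proxy for W(m_n^d, mu^d)
        theory = n ** (-0.5) * np.log2(n) if d == 2 else n ** (-1.0 / d)
        print(d, n, np.mean(vals) / theory)         # roughly constant in n

rate_check(d=2)
rate_check(d=3)
```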
Next, we consider an empirical measure and a uniform measure in a ball $B(0;\delta)$.
Corollary 1. 
More generally, let $X_1, X_2, \dots, X_n$ be independent random variables uniformly distributed in the $d$-dimensional ball $B(0;\delta)$ of radius $\delta > 0$. Consider an empirical measure and a uniform measure in $B(0;\delta)$, where $m_{n,\delta}^d$ denotes the empirical measure defined as
$$m_{n,\delta}^d(A) = \frac{1}{n}\sum_{i=1}^n 1_A(X_i)$$
for a measurable subset $A$ of $B(0;\delta)$, and $\mu_\delta^d$ denotes the uniform measure in $B(0;\delta)$. Then, it follows that
$$E[W(m_{n,\delta}^d,\mu_\delta^d)] = \begin{cases} \delta\, O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ \delta\, O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
Proof. 
Consider the map $\varphi : B(0;\delta) \to B(0;1)$ defined by $\varphi(x) = \frac{1}{\delta}x$, with inverse $\varphi^{-1}(t) = \delta t$. Then $m_{n,\delta}^d \circ \varphi^{-1}$ and $\mu_\delta^d \circ \varphi^{-1}$ are, respectively, the empirical measure and the uniform measure in $B(0;1)$, which establishes a one-to-one correspondence between empirical measures in $B(0;\delta)$ and those in $B(0;1)$, as well as between uniform measures in $B(0;\delta)$ and those in $B(0;1)$. In particular, we can write
$$\begin{aligned} W(m_{n,\delta}^d,\mu_\delta^d) &= \inf_{\mu \in \Gamma(m_{n,\delta}^d,\mu_\delta^d)} \int_{B(0;\delta)^2} \rho(x,y)\, d\mu(x,y) = \inf_{\mu \in \Gamma(m_{n,\delta}^d,\mu_\delta^d)} \int_{B(0;1)^2} \rho\big(\varphi^{-1}(t),\varphi^{-1}(\tau)\big)\, d\mu\big(\varphi^{-1}(t),\varphi^{-1}(\tau)\big) \\ &= \inf_{\nu \in \Gamma(m_{n,\delta}^d \circ \varphi^{-1},\, \mu_\delta^d \circ \varphi^{-1})} \int_{B(0;1)^2} \delta\, \rho(t,\tau)\, d\nu(t,\tau) = \delta\, W(m_n^d,\mu^d). \end{aligned}$$
Therefore, from Theorem 1 we obtain
$$E[W(m_{n,\delta}^d,\mu_\delta^d)] = \begin{cases} \delta\, O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ \delta\, O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
We next generalize Theorem 1 to the case where the number of random variables, denoted by $N$, follows a Poisson distribution with parameter $(1+\alpha_n)n$ and is independent of these random variables. This case corresponds to a specific spatial Poisson process $\mathcal{P}$ in [1] with intensity measure $(1+\alpha_n)n\,\frac{V_d}{|B(0;1)|}$, which describes a spatial configuration of points in the ball $B(0;1)$. Here, $V_d$ denotes the volume measure in $d$-dimensional Euclidean space. Moreover, in [1] it is also stated that $N = |\mathcal{P}|$, the number of random points in $B(0;1)$ (called the size), and the parameter $(1+\alpha_n)n$ of the Poisson distribution equals $(1+\alpha_n)n\,\frac{V_d(B(0;1))}{|B(0;1)|}$, derived from the corresponding Poisson point process. The notation $0 \le \alpha_n \to 0$ represents slight perturbations of $n$, introduced in order to observe how the expected Wasserstein metric gradually changes as $n$ approaches infinity.
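To make this mixed framework concrete, the following minimal sampler (our own sketch; poisson_ball_points is a hypothetical helper, and rng and uniform_ball come from the sketch after Definition 1) first draws the size $N \sim \mathrm{Poisson}((1+\alpha_n)n)$ and then places $N$ independent uniform points in $B(0;1)$.

```python
# Assumes numpy as np, rng, and uniform_ball from the earlier sketch are in scope.

def poisson_ball_points(n, d, alpha_n=0.0):
    """Spatial Poisson sample in B(0;1): draw N ~ Poisson((1+alpha_n) n),
    then place N i.i.d. uniform points, as in the construction above."""
    N = rng.poisson((1.0 + alpha_n) * n)
    return uniform_ball(N, d)

X = poisson_ball_points(1000, d=3, alpha_n=0.01)
print(X.shape[0])   # the random size N, concentrated near (1 + alpha_n) n = 1010
```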
Theorem 2. 
Let $m_N^d$ denote the empirical random measure with respect to independent and uniformly distributed random variables $X_1, X_2, \dots, X_N$ in $B(0;1)$, defined as
$$m_N^d(A) = \frac{1}{N}\sum_{i=1}^N 1_A(X_i)$$
for a measurable subset $A$ of $B(0;1)$, where $N$ follows a Poisson distribution with parameter $(1+\alpha_n)n$ and is independent of the random variables $X_1, X_2, \dots, X_N$. Let $\mu^d$ represent the uniform measure in the aforementioned ball. Then, it follows that
$$E[W(m_N^d,\mu^d)] = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
Proof. 
Since the number $N$ follows a Poisson distribution with mean $(1+\alpha_n)n$ and $X_1, X_2, \dots, X_N$ are uniformly distributed random variables in $B(0;1)$, independent of $N$, we have
$$E[W(m_N^d,\mu^d)] = \sum_{k=1}^{\infty} E\big[W(m_N^d,\mu^d) \mid N = k\big]\, P(N = k).$$
According to Theorem 1, it follows that
$$E\big[W(m_N^d,\mu^d) \mid N = k\big] = \begin{cases} O\big(k^{-\frac{1}{2}}\log_2 k\big), & d = 2,\\ O\big(k^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
On the other hand, from Lemma 1.2 in [11], one obtains
$$\begin{aligned} P\Big(N > (1+\alpha_n)n + c\sqrt{(1+\alpha_n)n\log n}\Big) &\le \exp\left(c\sqrt{(1+\alpha_n)n\log n} + \Big((1+\alpha_n)n + c\sqrt{(1+\alpha_n)n\log n}\Big)\log\frac{(1+\alpha_n)n}{(1+\alpha_n)n + c\sqrt{(1+\alpha_n)n\log n}}\right) \\ &\le \exp\left(-\frac{c^2(1+\alpha_n)n\log n}{2\big((1+\alpha_n)n + c\sqrt{(1+\alpha_n)n\log n}\big)}\right) = O\big(n^{-\frac{c^2}{3}}\big), \end{aligned}$$
where $c > 0$ is a constant, and
$$P\Big(N < (1+\alpha_n)n - c\sqrt{(1+\alpha_n)n\log n}\Big) \le \exp\left(-c\sqrt{(1+\alpha_n)n\log n} + \Big((1+\alpha_n)n - c\sqrt{(1+\alpha_n)n\log n}\Big)\log\frac{(1+\alpha_n)n}{(1+\alpha_n)n - c\sqrt{(1+\alpha_n)n\log n}}\right) = O\big(n^{-\frac{c^2}{3}}\big).$$
Let us denote
$$a_n^{\pm} = \Big[(1+\alpha_n)n \pm c\sqrt{(1+\alpha_n)n\log n}\,\Big],$$
where $[\,\cdot\,]$ denotes the integer part. Then, we obtain an expression for the expected value as follows:
$$E[W(m_N^d,\mu^d)] = \sum_{k=1}^{a_n^- - 1} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) + \sum_{k=a_n^-}^{a_n^+} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) + \sum_{k=a_n^+ + 1}^{\infty} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) =: I_1 + I_2 + I_3.$$
We now estimate these three terms. Since $W(m_N^d,\mu^d) \le 2$, the diameter of $B(0;1)$, we find that
$$I_1 = \sum_{k=1}^{a_n^- - 1} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) \le 2P(N < a_n^-) = O\big(n^{-\frac{c^2}{3}}\big)$$
and
$$I_3 = \sum_{k=a_n^+ + 1}^{\infty} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) \le 2P(N > a_n^+) = O\big(n^{-\frac{c^2}{3}}\big).$$
The term $I_2$ is bounded as follows:
$$I_2 = \sum_{k=a_n^-}^{a_n^+} E\big[W(m_N^d,\mu^d) \mid N = k\big] P(N = k) = \begin{cases} \displaystyle\sum_{k=a_n^-}^{a_n^+} O\big(k^{-\frac{1}{2}}\log_2 k\big)\, e^{-(1+\alpha_n)n}\, \frac{\big((1+\alpha_n)n\big)^k}{k!}, & d = 2,\\[2mm] \displaystyle\sum_{k=a_n^-}^{a_n^+} O\big(k^{-\frac{1}{d}}\big)\, e^{-(1+\alpha_n)n}\, \frac{\big((1+\alpha_n)n\big)^k}{k!}, & d \ge 3, \end{cases} \le \begin{cases} O\big((a_n^-)^{-\frac{1}{2}}\log_2(a_n^+)\big), & d = 2,\\ O\big((a_n^-)^{-\frac{1}{d}}\big), & d \ge 3, \end{cases} = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
Since $c$ was arbitrary, choosing $c$ sufficiently large (so that the tail terms $I_1$ and $I_3$ are of smaller order), we conclude that
$$E[W(m_N^d,\mu^d)] = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
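An end-to-end numerical check of Theorem 2 (our own illustration) combines the Poisson sampler above with the matching-based proxy from the earlier sketches: conditionally on the sample, $\frac{1}{N}M_N^d$ against a fresh uniform sample of the same size bounds $W(m_N^d,\mu^d)$ from above, and its average should shrink at roughly the stated rate.

```python
# Assumes uniform_ball, optimal_matching and poisson_ball_points are in scope.
import numpy as np

def poisson_wasserstein_proxy(n, d, alpha_n=0.0, reps=10):
    vals = []
    for _ in range(reps):
        X = poisson_ball_points(n, d, alpha_n)
        if len(X) == 0:
            continue                                  # Poisson size can be zero
        Y = uniform_ball(len(X), d)                   # fresh uniform sample
        vals.append(optimal_matching(X, Y) / len(X))  # upper bound on W(m_N^d, mu^d)
    return np.mean(vals)

for n in (250, 500, 1000):
    print(n, poisson_wasserstein_proxy(n, d=3), n ** (-1 / 3))  # comparable decay
```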
Now, we consider a $d$-dimensional ball $B(x;\delta)$. The number of random variables in $B(x;\delta)$, still denoted by $N$, follows a Poisson distribution with parameter $(1+\alpha_n)n\delta^d$, and $N$ is independent of these random variables. This corresponds to a spatial Poisson process $\mathcal{P}$ with intensity measure $(1+\alpha_n)n\,\frac{V_d}{|B(0;1)|}$ restricted to $B(x;\delta)$, as discussed in [1], and the parameter of the Poisson distribution equals $(1+\alpha_n)n\,\frac{V_d(B(x;\delta))}{|B(0;1)|}$, derived from the corresponding Poisson point process.
Corollary 2. 
Let $0 \le \alpha_n \to 0$ and $x \in \mathbb{R}^d$. We denote by $m_{x,\delta;N}^d$ the empirical measure with respect to independent and uniformly distributed random variables $X_1, X_2, \dots, X_N$ in $B(x;\delta)$, i.e.,
$$m_{x,\delta;N}^d(A) = \frac{1}{N}\sum_{i=1}^N 1_A(X_i)$$
for a measurable subset $A$ of $B(x;\delta)$, where $N$ follows a Poisson distribution with parameter $(1+\alpha_n)n\delta^d$ and is independent of the random variables $X_1, X_2, \dots, X_N$. Let $\mu_{x,\delta}^d$ be the uniform measure in $B(x;\delta)$. Then, we have
$$E[W(m_{x,\delta;N}^d,\mu_{x,\delta}^d)] = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$
Proof. 
Combining the proofs of Theorem 2 and Corollary 1, we first note that $N$ follows a Poisson distribution in $B(x;\delta)$ with mean $(1+\alpha_n)n\delta^d$. Applying the scaling relation of Corollary 1 with the Poisson parameter $n\delta^d$ in place of $n$ (the factor $\delta$ is then absorbed, since $\delta\,(n\delta^d)^{-1/d} = n^{-1/d}$), we obtain
$$E[W(m_{x,\delta;N}^d,\mu_{x,\delta}^d)] = E\big[\delta\, W(m_N^d,\mu^d)\big] = \delta\, E\big[W(m_N^d,\mu^d)\big] = \begin{cases} \delta\, O\big((n\delta^2)^{-\frac{1}{2}}\log_2(n\delta^d)\big), & d = 2,\\ \delta\, O\big((n\delta^d)^{-\frac{1}{d}}\big), & d \ge 3, \end{cases} = \begin{cases} O\big(n^{-\frac{1}{2}}\log_2 n\big), & d = 2,\\ O\big(n^{-\frac{1}{d}}\big), & d \ge 3. \end{cases}$$

4. Conclusions

The result in Corollary 2 can be applied directly to Appendix A.3 of [1]. We have refined the proof so that it is based on balls, thereby enhancing the robustness and accuracy of the process described in [1]. Furthermore, our study bridges the gap between discrete probabilities and their continuous counterparts by using Wasserstein metrics as the measuring tool. Moving forward, we aim to apply our methodology to lattice problems in statistical mechanical approaches that involve similar notation and convergence from discrete physical quantities to continuous ones, such as electrostatic problems.
We derived the upper bound for the convergence rate of the Wasserstein distance between a uniform distribution and its empirical distribution for $d \ge 2$ using the dual method. Our result is consistent with the order of the convergence rate in [4], but we provide explicit constants. Furthermore, we extended this analysis to estimate the convergence rate of random multinomial empirical distributions towards uniform distributions, obtaining similar results. However, our approach does not apply to the case $d = 1$, and we did not obtain a lower bound estimate for the convergence rate. In real-world scenarios, connections between the discrete and continuous worlds can be established through random graphs by extending mathematical concepts from manifolds to graphs. For instance, in [1], the authors generalize the definition of Ollivier graph curvature to enhance its versatility and prove that the Ollivier curvature of random geometric graphs in Riemannian manifolds converges to the Ricci curvature of the manifold. Additionally, Appendix C3 in [1] provides methods for computing Wasserstein metrics through simulations.

Author Contributions

Conceptualization, W.Y., X.Z. and X.W.; methodology, W.Y. and X.Z.; validation, W.Y., X.W. and X.Z.; formal analysis, W.Y., X.W. and X.Z.; writing—original draft preparation, W.Y.; writing—review and editing, W.Y. and X.Z.; project administration, W.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by a school–enterprise cooperation project: Application of hyperbolic network model in data analysis.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. The Lower Bound of $E(M_n^d)$

Since
$$M_n^d = \inf_{\sigma} \sum_{i=1}^n \|X_i - Y_{\sigma(i)}\| \ge \inf_{\sigma} \sum_{i=1}^n \min_{j \le n} \|X_i - Y_j\| = \sum_{i=1}^n \min_{j \le n} \|X_i - Y_j\|,$$
it follows that
$$E\big(M_n^d \mid \mathbf{X}\big) \ge \sum_{i=1}^n E\Big(\min_{j \le n} \|X_i - Y_j\| \,\Big|\, \mathbf{X}\Big) \ge n \min_{x \in B(0;1)} E\Big(\min_{j \le n} \|x - Y_j\|\Big).$$
Let $B(x,t) = \{y \in \mathbb{R}^d : \|x - y\| \le t\}$; then
$$\frac{|B(x,t) \cap B(0;1)|}{|B(0;1)|} \le \min\{t^d, 1\}$$
and
$$P\Big(\min_{j \le n} \|x - Y_j\| \ge t\Big) \ge (1 - t^d)^n, \quad t < 1.$$
Thus, we have
$$E\Big(\min_{j \le n} \|x - Y_j\|\Big) = \int_0^{\infty} P\Big(\min_{j \le n} \|x - Y_j\| \ge t\Big)\, dt \ge \int_0^1 (1 - t^d)^n\, dt \overset{t = n^{-1/d}u}{=} n^{-\frac{1}{d}} \int_0^{n^{1/d}} \Big(1 - \frac{u^d}{n}\Big)^n du.$$
Finally, by Fatou's lemma, one has
$$\liminf_{n \to \infty} \frac{E(M_n^d)}{n^{1 - \frac{1}{d}}} \ge \int_0^{\infty} e^{-u^d}\, du \ge 1 - \frac{1}{d+1}, \quad \text{(A1)}$$
where the last inequality holds because $e^{-u^d} \ge 1 - u^d$, so that $\int_0^{\infty} e^{-u^d}\, du \ge \int_0^1 (1 - u^d)\, du = 1 - \frac{1}{d+1}$.

Appendix B. The Upper Bound of $E(M_n^d)$

Let $r = n^{-\frac{1}{d}}$, so that $r \to 0$ as $n \to \infty$ and $|B(x,r)| = \frac{1}{n}\omega_d$, where $\omega_d = |B(0;1)|$. Define
$$u(i,j) = \begin{cases} 1, & \text{if } \|X_i - Y_j\| \le r,\\ 0, & \text{otherwise}. \end{cases}$$
Define
$$b(x) = \frac{|B(x,r) \cap B(0;1)|}{|B(0;1)|}, \quad \text{(A2)}$$
and then we have $b(x) \le \frac{1}{n} < 1$ for $n > 1$. First, decompose $\sum_{i \le n} (f(X_i) - f(Y_i))$ as follows:
$$\sum_{i \le n} \big(f(X_i) - f(Y_i)\big) = \left[\sum_{i \le n} f(X_i) \sum_{j \le n} u(i,j) - \sum_{j \le n} f(Y_j) \sum_{i \le n} u(i,j)\right] + \left[\sum_{i \le n} f(X_i)\Big(1 - \sum_{j \le n} u(i,j)\Big) - \sum_{j \le n} f(Y_j)\Big(1 - \sum_{i \le n} u(i,j)\Big)\right],$$
so one has the inequality
$$\Big|\sum_{i \le n} \big(f(X_i) - f(Y_i)\big)\Big| \le \Big|\sum_{i \le n} f(X_i) \sum_{j \le n} u(i,j) - \sum_{j \le n} f(Y_j) \sum_{i \le n} u(i,j)\Big| + \Big|\sum_{i \le n} f(X_i)\Big(1 - \sum_{j \le n} u(i,j)\Big) - \sum_{j \le n} f(Y_j)\Big(1 - \sum_{i \le n} u(i,j)\Big)\Big| =: I_1 + I_2.$$
We estimate the two parts $E\sup_{f \in L_1} I_1$ in Appendix B.1 and $E\sup_{f \in L_1} I_2$ in Appendix B.2, respectively. Combining them yields the following bound for the expectation of $M_n^d$:
$$E(M_n^d) = E\sup_{f \in L_1} \Big|\sum_{i \le n} \big(f(X_i) - f(Y_i)\big)\Big| \le E\sup_{f \in L_1} I_1 + E\sup_{f \in L_1} I_2 \le \begin{cases} n^{\frac{1}{2}} + 4n^{\frac{1}{2}} + 4n^{\frac{1}{2}} + 2^4 n^{\frac{1}{2}}\log_2 n, & d = 2,\\ n^{1-\frac{1}{d}} + 2d\, n^{1-\frac{1}{d}} + 2\big(2 + 16\sqrt{2}\big)\, n^{1-\frac{1}{d}}, & d \ge 3, \end{cases} = \begin{cases} 9n^{\frac{1}{2}} + 2^4 n^{\frac{1}{2}}\log_2 n, & d = 2,\\ \big(5 + 2d + 32\sqrt{2}\big)\, n^{1-\frac{1}{d}}, & d \ge 3. \end{cases} \quad \text{(A3)}$$

Appendix B.1. The Estimation of $E\sup_{f \in L_1} I_1$

Since $f$ is Lipschitz with constant 1 and $u(i,j) = 1$ implies $\|X_i - Y_j\| \le r$, we have
$$I_1 = \Big|\sum_{i \le n} f(X_i) \sum_{j \le n} u(i,j) - \sum_{j \le n} f(Y_j) \sum_{i \le n} u(i,j)\Big| = \Big|\sum_{i \le n} \sum_{j \le n} u(i,j)\big(f(X_i) - f(Y_j)\big)\Big| \le r \sum_{i \le n} \sum_{j \le n} u(i,j).$$
Hence,
$$E\sup_{f \in L_1} I_1 \le r \sum_{i \le n} \sum_{j \le n} E\,u(i,j) = r \sum_{i \le n} \sum_{j \le n} E\big[E\big(u(i,j) \mid X_i\big)\big] = r \sum_{i \le n} \sum_{j \le n} E\big(b(X_i)\big) \le r \sum_{i \le n} \sum_{j \le n} \frac{1}{n} = n^{1-\frac{1}{d}}.$$
It should be noted that a more optimized estimate for $I_1$ can be found in [2].

Appendix B.2. The Estimation of $E\sup_{f \in L_1} I_2$

Decompose $I_2$ as follows:
$$I_2 = \Big|\sum_{i \le n} f(X_i)\Big(1 - \sum_{j \le n} u(i,j)\Big) - \sum_{j \le n} f(Y_j)\Big(1 - \sum_{i \le n} u(i,j)\Big)\Big| \le \Big|\sum_{i \le n} f(X_i)\big(1 - nb(X_i)\big) - \sum_{j \le n} f(Y_j)\big(1 - nb(Y_j)\big)\Big| + \Big|\sum_{i \le n} f(X_i)\Big(nb(X_i) - \sum_{j \le n} u(i,j)\Big)\Big| + \Big|\sum_{j \le n} f(Y_j)\Big(nb(Y_j) - \sum_{i \le n} u(i,j)\Big)\Big| =: I_{21} + I_{22} + I_{23}.$$
We estimate these three parts separately in the following subsections to obtain
$$E\sup_{f \in L_1} I_2 \le E\sup_{f \in L_1} I_{21} + E\sup_{f \in L_1} I_{22} + E\sup_{f \in L_1} I_{23} \le \begin{cases} 4n^{\frac{1}{2}} + 4n^{\frac{1}{2}} + 2^4 n^{\frac{1}{2}}\log_2 n, & d = 2,\\ 2d\, n^{1-\frac{1}{d}} + 2\big(2 + 16\sqrt{2}\big)\, n^{1-\frac{1}{d}}, & d \ge 3. \end{cases}$$

Appendix B.2.1. The Estimation of $E\sup_{f \in L_1} I_{21}$

Since $\|f\|_{L^{\infty}(B(0;1))} \le 1$, using the value (A2) of $b(X_i)$ in $B(0;1)$ we have
$$E\sup_{f \in L_1} \Big|\sum_{i \le n} f(X_i)\big(1 - nb(X_i)\big)\Big| \le E\sum_{i \le n} \big|1 - nb(X_i)\big| = \sum_{i \le n} E\big|1 - nb(X_i)\big| \le n\, d\, r = d\, n^{1-\frac{1}{d}}.$$
Consequently, we obtain
$$E\sup_{f \in L_1} I_{21} = E\sup_{f \in L_1} \Big|\sum_{i \le n} f(X_i)\big(1 - nb(X_i)\big) - \sum_{j \le n} f(Y_j)\big(1 - nb(Y_j)\big)\Big| \le E\sup_{f \in L_1} \Big|\sum_{i \le n} f(X_i)\big(1 - nb(X_i)\big)\Big| + E\sup_{f \in L_1} \Big|\sum_{j \le n} f(Y_j)\big(1 - nb(Y_j)\big)\Big| \le 2d\, n^{1-\frac{1}{d}}.$$

Appendix B.2.2. The Estimations of $E\sup_{f \in L_1} I_{22}$ and $E\sup_{f \in L_1} I_{23}$

Estimating this part is challenging; one may employ a convolution decomposition to localize $f$ to small scales. The following estimate holds:
$$E\sup_{f \in L_1} I_{22} = E\sup_{f \in L_1} I_{23} = E\sup_{f \in L_1} \Big|\sum_{i \le n} f(X_i)\Big(nb(X_i) - \sum_{j \le n} u(i,j)\Big)\Big| \le \begin{cases} 2n^{\frac{1}{2}} + 2^3 n^{\frac{1}{2}}\log_2 n, & d = 2,\\ \big(2 + 16\sqrt{2}\big)\, n^{1-\frac{1}{d}}, & d \ge 3. \end{cases}$$
Initially, we assume that $f$ is the indicator function $1_A$ of a measurable subset $A$ of $\mathbb{R}^d$ and estimate $E\big|\sum_{i \le n} 1_A(X_i)\big(nb(X_i) - \sum_{j \le n} u(i,j)\big)\big|^2$. Thus, we have
$$E\Big|\sum_{i \le n}\sum_{j \le n} 1_A(X_i)\big(b(X_i) - u(i,j)\big)\Big|^2 = E\sum_{i,i' \le n}\,\sum_{j,j' \le n} 1_A(X_i)\big(b(X_i) - u(i,j)\big)\, 1_A(X_{i'})\big(b(X_{i'}) - u(i',j')\big).$$
By considering the different cases of $i, i'$ and $j, j'$, we obtain the inequality
$$E\Big|\sum_{i \le n}\sum_{j \le n} 1_A(X_i)\big(b(X_i) - u(i,j)\big)\Big|^2 \le n^2(n-1)\,\frac{|A|}{|B(0;1)|}\,\frac{1}{n^2} + n^2\,\frac{|A|}{|B(0;1)|}\,\frac{1}{n} \le 2n\,\frac{|A|}{|B(0;1)|}. \quad \text{(A4)}$$
Furthermore, let us set $h(x) = c_0 1_A(x)$. By using Formula (A4), we obtain
$$\int_{\mathbb{R}^d} E\Big|\sum_{i \le n}\sum_{j \le n} h(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|^2 dt = c_0^2 \int_{\mathbb{R}^d} E\Big|\sum_{i \le n}\sum_{j \le n} 1_A(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|^2 dt \le 2n c_0^2 |A|. \quad \text{(A5)}$$
Finally, we decompose $f$ into a sum of well-chosen convolutions and estimate the resulting components. Since a Lipschitz function $f$ in $B(0;1) \subset \mathbb{R}^d$ with $f(0) = 0$ can be extended to a Lipschitz function on all of $\mathbb{R}^d$ with Lipschitz constant 1, we consider the function $f \in L$ defined in (1). We then decompose it as $f = \sum_{l=1}^{q+1} f_l$, where $f_1 = f - f \ast h_1$, $f_2 = f \ast h_1 - f \ast h_2 \ast h_1$, $\dots$, $f_q = f \ast h_{q-1} \ast \cdots \ast h_1 - f \ast h_q \ast h_{q-1} \ast \cdots \ast h_1$, $f_{q+1} = f \ast h_q \ast \cdots \ast h_1$, with
$$h_l(x) = \begin{cases} |B(0;1)|^{-1}(2^l r)^{-d}, & x \in B(0, 2^l r),\\ 0, & \text{otherwise}, \end{cases}$$
and $q$ determined by $2^q r < \frac{1}{2} \le 2^{q+1} r$.
Therefore, we have
$$E\sup_{f \in L_1} I_{22} = E\sup_{f \in L} I_{22} = E\sup_{f \in L} \Big|\sum_{i \le n} \sum_{l=1}^{q+1} f_l(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le \sum_{l=1}^{q+1} E\sup_{f \in L} \Big|\sum_{i \le n} f_l(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big|.$$
For the first expectation above, we have
$$E\sup_{f \in L} \Big|\sum_{i \le n} f_1(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| = E\sup_{f \in L} \Big|\sum_{i \le n} (f - f \ast h_1)(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le 2r \sum_{i \le n} E\Big|\sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le 2nr = 2n^{1-\frac{1}{d}}, \quad \text{(A6)}$$
since $\|f - f \ast h_1\|_{L^{\infty}} \le 2r$ and $E\big|\sum_{j \le n} \big(b(X_i) - u(i,j)\big)\big| \le 1$.
For the expectation involving $f_l = (f - f \ast h_l) \ast h_1 \ast \cdots \ast h_{l-1}$ with $2 \le l \le q$, since $\|f - f \ast h_l\|_{L^{\infty}} \le 2^l r$, we have
$$E\sup_{f \in L} \Big|\sum_{i \le n} f_l(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| = E\sup_{f \in L} \Big|\sum_{i \le n} (f - f \ast h_l) \ast h_1 \ast \cdots \ast h_{l-1}(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le 2^l r\, E\int_{\mathbb{R}^d} \Big|\sum_{i \le n}\sum_{j \le n} h_{l-1}(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|\, dt, \quad \text{(A7)}$$
and on the other hand, since the integrand vanishes unless $t \in B(0; 1 + 2^{l-1}r)$, the Cauchy–Schwarz inequality gives
$$E\int_{\mathbb{R}^d} \Big|\sum_{i \le n}\sum_{j \le n} h_{l-1}(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|\, dt = \int_{\mathbb{R}^d} E\Big|\sum_{i \le n}\sum_{j \le n} h_{l-1}(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|\, dt \le |B(0; 1 + 2^{l-1}r)|^{\frac{1}{2}} \left(\int_{\mathbb{R}^d} E\Big|\sum_{i \le n}\sum_{j \le n} h_{l-1}(X_i - t)\big(b(X_i) - u(i,j)\big)\Big|^2 dt\right)^{\frac{1}{2}}. \quad \text{(A8)}$$
Hence, using (A5) and combining (A7) and (A8), we have
$$E\sup_{f \in L} \Big|\sum_{i \le n} f_l(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le 2^l r\, |B(0; 1 + 2^{l-1}r)|^{\frac{1}{2}} \left(2n \big(|B(0;1)|^{-1}(2^{l-1}r)^{-d}\big)^2 |\mathrm{supp}(h_{l-1})|\right)^{\frac{1}{2}} = 2^l r\, \sqrt{2n}\, (1 + 2^{l-1}r)^{\frac{d}{2}} (2^{l-1}r)^{-\frac{d}{2}} \le 2^{l(1-\frac{d}{2})}\, 2^{d+\frac{1}{2}}\, n^{1-\frac{1}{d}}. \quad \text{(A9)}$$
For the last expectation, involving $f_{q+1} = f \ast h_1 \ast \cdots \ast h_{q-1} \ast h_q$, the above argument still works. Since $\|f\|_{L^{\infty}(B(0;1))} \le 1 \le 2^{q+2} r$, we have
$$E\sup_{f \in L} \Big|\sum_{i \le n} f_{q+1}(X_i) \sum_{j \le n} \big(b(X_i) - u(i,j)\big)\Big| \le 2^{(q+1)(1-\frac{d}{2})}\, 2^{d+\frac{1}{2}}\, n^{1-\frac{1}{d}}. \quad \text{(A10)}$$
Summing up the estimates (A6), (A9) and (A10) yields
$$E\sup_{f \in L_1} I_{22} \le 2n^{1-\frac{1}{d}} + \sum_{l=2}^{q+1} 2^{l(1-\frac{d}{2})}\, 2^{d+\frac{1}{2}}\, n^{1-\frac{1}{d}}.$$
If $d \ge 3$, the geometric sum is bounded and one has $E\sup_{f \in L_1} I_{22} \le \big(2 + 16\sqrt{2}\big)\, n^{1-\frac{1}{d}}$. If $d = 2$, each summand equals $2^{\frac{5}{2}} n^{\frac{1}{2}}$ and $\frac{1}{2}\log_2 n - 1 \le q < \frac{1}{2}\log_2 n$, and hence $E\sup_{f \in L_1} I_{22} \le 2n^{\frac{1}{2}} + 2^3 n^{\frac{1}{2}}\log_2 n$.

References

  1. van der Hoorn, P.; Lippner, G.; Trugenberger, C.; Krioukov, D. Ollivier–Ricci curvature convergence in random geometric graphs. Phys. Rev. Res. 2021, 3, 013211.
  2. Talagrand, M. Matching random samples in many dimensions. Ann. Appl. Probab. 1992, 2, 846–856.
  3. Kralj-Iglič, V.; Iglič, A. A simple statistical mechanical approach to the free energy of the electric double layer including the excluded volume effect. J. Phys. II 1996, 6, 477–491.
  4. Fournier, N.; Guillin, A. On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 2015, 162, 707–738.
  5. Villani, C. Optimal Transport: Old and New; Grundlehren der Mathematischen Wissenschaften 338; Springer: Berlin/Heidelberg, Germany, 2009.
  6. Ambrosio, L.; Stra, F.; Trevisan, D. A PDE approach to a 2-dimensional matching problem. Probab. Theory Relat. Fields 2019, 173, 433–477.
  7. Bobkov, S.G.; Ledoux, M. A simple Fourier analytic proof of the AKT optimal matching theorem. Ann. Appl. Probab. 2021, 31, 2567–2584.
  8. Borda, B. Empirical measures and random walks on compact spaces in the quadratic Wasserstein metric. Ann. Inst. Henri Poincaré Probab. Stat. 2023, 59, 2017–2035.
  9. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
  10. Papadimitriou, C.H.; Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity; Prentice-Hall: Englewood Cliffs, NJ, USA, 1982.
  11. Penrose, M. Random Geometric Graphs; Oxford University Press: Oxford, UK, 2003.