Article

On the Convergence and Law of Large Numbers for the Non-Euclidean $L_p$-Means

by
George Livadiotis
Space Science and Engineering, Southwest Research Institute, San Antonio, TX 78238, USA
Entropy 2017, 19(5), 217; https://doi.org/10.3390/e19050217
Submission received: 18 January 2017 / Revised: 13 April 2017 / Accepted: 9 May 2017 / Published: 11 May 2017

Abstract: This paper describes and proves two important theorems that compose the Law of Large Numbers for the non-Euclidean $L_p$-means, known to hold for the Euclidean $L_2$-means: Let the $L_p$-mean estimator be the specific functional that estimates the $L_p$-mean of $N$ independent and identically distributed random variables; then, (i) the expectation value of the $L_p$-mean estimator equals the mean of the distributions of the random variables; and (ii) the limit of the $L_p$-mean estimator as $N \to \infty$ also equals the mean of the distributions.

1. Introduction: Definition of $L_p$-Means and Their Basic Properties

In [1,2,3], a generalized characterization of means was introduced, namely, the non-Euclidean means, based on metrics induced by $L_p$-norms, in which the median is included as a special case for $p = 1$ ($L_1$) and the ordinary Euclidean mean for $p = 2$ ($L_2$) (see also [4,5]). Let the set of $y$-values $\{y_k\}_{k=1}^{W}$ ($y_k \in D_y$) be associated with the probabilities $\{p_k\}_{k=1}^{W}$; then, the non-Euclidean means $\mu_p$, based on $L_p$-norms, are defined by

$$\sum_{k=1}^{W} p_k\, |y_k - \mu_p|^{p-1}\, \mathrm{sign}(y_k - \mu_p) = 0, \qquad (1)$$

where the median $\mu_1$ and the arithmetic mean $\mu_2$ follow as special cases when the Taxicab $L_1$- and the Euclidean $L_2$-norms are respectively considered. Both the median $\mu_1$ and the arithmetic mean $\mu_2$ can be implicitly written in the form of Equation (1) as $\sum_{k=1}^{W} p_k\, \mathrm{sign}(y_k - \mu_1) = 0$ and $\sum_{k=1}^{W} p_k |y_k - \mu_2|\, \mathrm{sign}(y_k - \mu_2) = 0$ (i.e., $\mu_2 = \sum_{k=1}^{W} p_k y_k$), respectively.
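Equation (1) is straightforward to solve numerically: its left-hand side is strictly decreasing in $\mu_p$ for $p > 1$ (its derivative with respect to $\mu_p$ is $-(p-1)\sum_{k=1}^{W} p_k |y_k - \mu_p|^{p-2}$; see Section 2), so the root is unique and bracketed by $[\min_k y_k, \max_k y_k]$. The following sketch is our own illustration (the function name lp_mean is not from the paper) of a bisection solver for Equation (1):

```python
import numpy as np

def lp_mean(y, prob, p, tol=1e-12):
    """Solve Eq. (1), sum_k prob_k |y_k - mu|^(p-1) sign(y_k - mu) = 0, by bisection.

    The left-hand side is strictly decreasing in mu for p > 1 (Section 2),
    so the root is unique and lies between min(y) and max(y).
    """
    y, prob = np.asarray(y, float), np.asarray(prob, float)

    def F(mu):
        d = y - mu
        return np.sum(prob * np.abs(d) ** (p - 1) * np.sign(d))

    lo, hi = y.min(), y.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) > 0:          # F is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Special cases quoted above: p = 2 recovers the arithmetic mean sum_k p_k y_k.
y = [0.0, 1.0, 2.0, 5.0]
w = [0.25, 0.25, 0.25, 0.25]
print(lp_mean(y, w, 2.0))       # 2.0, the arithmetic mean
print(lp_mean(y, w, 1.01))      # near the median region
print(lp_mean(y, w, 3.0))       # super-Euclidean mean, pulled toward the tail
```

For $p = 2$ this recovers the arithmetic mean $\sum_k p_k y_k$, while as $p \to 1$ it approaches the median, matching the special cases above.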
Note that the solution of Equation (1) is a special case of the so-called M-estimators [6], and it is also related to the Fréchet means [7]. The Euclidean norm $L_2$ is also known as the "Pythagorean" norm. In [3], we preferred to refer to the non-Pythagorean norms as non-Euclidean, carrying the same characterization over to statistics. One may adopt the more explicit characterization of "non-Euclidean-normed" statistics, to avoid any confusion with the non-Euclidean metrics of (Euclidean-normed) Riemannian geometry. As an example of an application in physics, the $L_p$ expectation value of an energy spectrum $\{\varepsilon_k\}_{k=1}^{W}$ defines the non-Euclidean adaptation of the internal energy $U_p$ [8].
Figure 1 illustrates an example of $L_p$-means. We use the Poisson distribution $p_k = e^{-\lambda}\lambda^k/k!$ and the dataset $y_k = k$, for $k = 1, \dots, W$; hence, the $L_p$-means are implicitly given by $\sum_{k=1}^{W} (\lambda^k/k!)\, |k - \mu_p|^{p-1}\, \mathrm{sign}(k - \mu_p) = 0$ (note that the constant factor $e^{-\lambda}$ can be ignored). The function $\mu_p = \mu_p(\lambda)$ is examined for various values of the $p$-norm, either (a) super-Euclidean, $p > 2$, or (b) sub-Euclidean, $p < 2$. The mean value for the Euclidean case, $p = 2$, is $\mu_2 = \lambda$, which is represented by the diagonal line in both panels. We observe that for $p > 2$ we always have $\mu_p > \lambda$, while for $p < 2$ there is a critical value $\lambda^*(p)$, for which $\mu_p > \lambda$ for $\lambda > \lambda^*$ and $\mu_p < \lambda$ for $\lambda < \lambda^*$. The critical value $\lambda^*(p)$ increases with $p$, and as $p \to 2$, $\mu_p \to \lambda$. For $\lambda = 1$, $\mu_p = 1$ for any $p \ge 2$, while for $\lambda = 0$, $\mu_p = 0$ for any value of $p$.
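The Poisson example can be reproduced with the same root-finding; the sketch below is our own illustration (the truncation $W$ and the use of scipy's brentq are our choices), builds the weights $\lambda^k/k!$ in log-space for numerical safety, and includes the $k = 0$ term so that the Euclidean case returns $\mu_2 = \lambda$ exactly:

```python
import numpy as np
from scipy.optimize import brentq

def poisson_lp_mean(lam, p, W=200):
    """L_p-mean of the Poisson example:
    sum_k (lam^k / k!) |k - mu|^(p-1) sign(k - mu) = 0, lam > 0.

    Weights are built in log-space to avoid overflow; the constant factor
    e^(-lam) cancels and is ignored, as noted in the text. W truncates the
    (negligible) Poisson tail; the k = 0 term is included so mu_2 = lam.
    """
    k = np.arange(W + 1)
    log_fact = np.cumsum(np.concatenate(([0.0], np.log(np.arange(1, W + 1)))))
    logw = k * np.log(lam) - log_fact
    w = np.exp(logw - logw.max())          # proportional to lam^k / k!

    def F(mu):
        d = k - mu
        return np.sum(w * np.abs(d) ** (p - 1) * np.sign(d))

    return brentq(F, 0.0, float(W))        # F is strictly decreasing in mu

lam = 0.5
print(poisson_lp_mean(lam, 2.0))   # ~lam: the Euclidean mean mu_2 = lambda
print(poisson_lp_mean(lam, 3.0))   # > lam, as in panel (a)
print(poisson_lp_mean(lam, 1.5))   # sub-Euclidean case, panel (b)
```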
The Law of Large Numbers is a theorem that guarantees the stability of long-term averages of random events, but it is valid only for Euclidean metrics based on $L_2$-norms. The purpose of this paper is to extend the theorem of the "Law of Large Numbers" to the non-Euclidean $L_p$-means. Namely, (i) the expectation value of the $L_p$-mean estimator (which corresponds to Equation (1)) equals the mean of the distribution of each of the random variables; and (ii) the limit of the $L_p$-mean estimator as $N \to \infty$ also equals the mean of the distributions. These are numbered as Theorems 2 and 3, respectively. The paper is organized as follows: In Section 2, we prove the theorem of uniqueness of the $L_p$-means (Theorem 1). This is used in the proofs of Theorems 2 and 3, given in Section 3 and Section 4, respectively. Finally, Section 5 briefly summarizes the conclusions. Several examples illustrate the validity of Theorems 1–3, namely, the Poisson distribution (discrete description) and a superposition of normal distributions (continuous description).

2. Uniqueness of $L_p$-Means

Here, we prove the theorem of uniqueness of the $L_p$-means for any $p > 1$. The theorem will be used in Theorems 2 and 3 of the next sections.
Theorem 1.
The curve $\mu_p(p)$ is single-valued; namely, for each $p > 1$, there is a unique value of the $L_p$-mean $\mu_p(p)$.
Proof of Theorem 1.
Using the implicit function theorem [9], we can easily show the uniqueness in a sufficiently small neighbourhood of $p = 2$. Indeed, there is at least one point, namely the Euclidean point $(p = 2, \mu_p = \mu_2)$, at which the function $\mu_p(p)$ exists and is single-valued. Then, the values of $\mu_p(p)$, $p > 1$, can be approximated to any accuracy, starting from the Euclidean point. The implicit function $F(p, \mu_p) = 0$, defined by Equation (1), is continuous, and $\partial F(p, \mu_p)/\partial \mu_p = -(p-1)\sum_{k=1}^{W} p_k |y_k - \mu_p|^{p-2} \ne 0$; hence, $\mu_p(p)$ is single-valued in some domain around $p = 2$. The first derivative $\mu_p'(p)$ is finite $\forall p > 1$ (e.g., for $p = 2$ we have $\mu_2' = \left[\partial\mu_p(p)/\partial p\right]_{p=2} = \sum_{k=1}^{W} p_k (y_k - \mu_2)\ln(|y_k - \mu_2|)$). Indeed, the inverse derivative is non-zero for any $p$, i.e.,

$$\frac{\partial p}{\partial \mu_p} = \frac{(p-1)\sum_{k=1}^{W} p_k |y_k - \mu_p|^{p-2}}{\sum_{k=1}^{W} p_k |y_k - \mu_p|^{p-1}\,\mathrm{sign}(y_k - \mu_p)\ln(|y_k - \mu_p|)}, \quad \forall p > 1. \qquad (2)$$
  ☐
The inverse function, $p(\mu_p)$, is continuous and differentiable according to Equation (2). If $\mu_p(p)$ were multi-valued, then $p(\mu_p)$ would have local minima or maxima; however, the derivative $dp/d\mu_p$ is non-zero. Therefore, we conclude that $\mu_p(p)$ cannot be multi-valued, and there is a unique curve $\mu_p(p)$ that passes through $(p = 2, \mu_p = \mu_2)$.
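As a numerical sanity check of the slope formula at the Euclidean point (a sketch reusing the illustrative lp_mean solver from Section 1), the closed form $\mu_2' = \sum_k p_k (y_k - \mu_2)\ln(|y_k - \mu_2|)$ can be compared against a central finite difference of $\mu_p(p)$ across $p = 2$:

```python
import numpy as np

# Reusing lp_mean from the sketch in Section 1; the dataset is arbitrary.
y = np.array([0.0, 1.0, 3.0, 7.0])
w = np.array([0.4, 0.3, 0.2, 0.1])

mu2 = np.sum(w * y)                                        # mu_2 = sum p_k y_k
closed = np.sum(w * (y - mu2) * np.log(np.abs(y - mu2)))   # mu_2' from the text
h = 1e-5
finite = (lp_mean(y, w, 2 + h) - lp_mean(y, w, 2 - h)) / (2 * h)
print(closed, finite)   # the two slopes should agree to several decimal places
```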
As an example, Figure 2 plots the $L_p$-means of the Poisson distribution shown in Figure 1, but now as a function of the $p$-norm and for various values of $0 < \lambda < 1$. For $\lambda < \ln 2$, the function $\mu_p(p)$ is monotonically increasing with $p$. On the contrary, for $\lambda > \ln 2$, the function $\mu_p(p)$ is not monotonic, having a minimum in the region of sub-Euclidean norms, $1 < p < 2$. The separatrix between these two behaviors of $\mu_p(p)$ is given by $\lambda = \ln 2$. We observe that the function $\mu_p(p)$ is differentiable, $\partial\mu_p/\partial p$ is always finite, and $\partial p/\partial\mu_p$ is always non-zero; thus $\mu_p(p)$ is unique for any value of $p$.
Finally, we note that the uniqueness of $\mu_p$ for a given $p$ does not ensure monotonicity, as different values of $p$ may lead to the same $L_p$-mean. One such example is the $L_p$-means of the Poisson distribution for $\lambda > \ln 2$, shown in Figure 2. As stated and illustrated in [3], when the examined probability distribution is symmetric, the whole set of $L_p$-means degenerates to one single value, while when it is asymmetric, a spectrum-like range of $L_p$-means is generated instead.
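The two behaviors of $\mu_p(p)$ in Figure 2 can be probed with a scan over $p$; this is a sketch reusing poisson_lp_mean from the earlier snippet, and the grid of $p$-values and the two test values of $\lambda$ on either side of $\ln 2$ are our choices:

```python
import numpy as np

# Reusing poisson_lp_mean from the earlier sketch; the scan should reproduce
# the monotonic vs. non-monotonic behavior of Figure 2.
ps = np.linspace(1.05, 4.0, 60)
for lam in (0.5, 0.9):                           # below and above ln 2 ~ 0.693
    mus = np.array([poisson_lp_mean(lam, p) for p in ps])
    if (np.diff(mus) >= -1e-9).all():
        print(f"lambda = {lam}: mu_p(p) monotonically increasing")
    else:
        pmin = ps[np.argmin(mus)]
        print(f"lambda = {lam}: minimum near p = {pmin:.2f} (expect 1 < p < 2)")
```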

3. The Concept of $L_p$-Expectation Values

Given the sampling $\{y_i\}_{i=1}^{N}$, the $L_p$-mean estimator $\hat{\mu}_{p,N} = \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$ is implicitly expressed by

$$\sum_{i=1}^{N} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right) = 0. \qquad (3)$$

Then, the $L_p$ expectation value of $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$, namely, $\langle\hat{\mu}_{p,N}\rangle_p \equiv \hat{E}_p[\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)]$, is implicitly given by

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \langle\hat{\mu}_{p,N}\rangle_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \langle\hat{\mu}_{p,N}\rangle_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0, \qquad (4)$$

where $P(\{y_j\}_{j=1}^{N})$ is the normalized joint probability density, so that

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 1. \qquad (5)$$
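Equation (3) is just Equation (1) with uniform weights $1/N$, so the same bisection applies directly to a sample. A minimal sketch (the names are ours; the exponential test distribution is an arbitrary asymmetric choice):

```python
import numpy as np

def lp_mean_estimator(sample, p, tol=1e-12):
    """Solve Eq. (3), sum_i |y_i - mu|^(p-1) sign(y_i - mu) = 0, for a sample.

    This is Eq. (1) with equal weights, so the same monotone bisection applies.
    """
    y = np.sort(np.asarray(sample, float))

    def F(mu):
        d = y - mu
        return np.sum(np.abs(d) ** (p - 1) * np.sign(d))

    lo, hi = y[0], y[-1]
    while hi - lo > tol * (1 + abs(hi)):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)    # a skewed test distribution
for p in (1.5, 2.0, 2.5):
    print(p, lp_mean_estimator(sample, p))          # p = 2 gives the sample mean, ~1.0
```

Since the exponential distribution is asymmetric, the estimator traces a spectrum of values over $p$, with $p = 2$ returning the sample arithmetic mean.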
Definition 1.
Let the sampling $\{y_i\}_{i=1}^{N}$, $y_i \in D_y$, $i = 1,\dots,N$, of the set of random variables $\{Y_i\}_{i=1}^{N}$. This set is called symmetrically distributed if the joint distribution density has the property $P(\{y_j\}_{j=1}^{N}) = P(y_1 \ldots y_k \ldots y_i \ldots y_N) = P(y_1 \ldots y_i \ldots y_k \ldots y_N)$, $\forall i, k(\ne i) = 1,\dots,N$. This property is formally called exchangeability [10] and will be used in Lemmas 1 and 2.
Next, we state and prove Lemmas 1 and 2, which are necessary for the subsequent Theorem 2 about the expectation value of the $L_p$-mean estimator.
Lemma 1.
The symmetrically distributed random variables $\{Y_i\}_{i=1}^{N}$ are characterized by the same $L_p$ expectation value, namely, $\langle Y_i\rangle_p = \hat{E}_p(Y_i) = \mu_p$, $i = 1,\dots,N$, which is implicitly given by

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P_y(y_i)\, dy_i = 0, \qquad (6)$$

where $P_{y_i}(u) \equiv P_y(u)$, $i = 1,\dots,N$, is the marginal distribution density, which is identical for all the random variables $\{Y_i\}_{i=1}^{N}$.
Proof of Lemma 1.
The $y_i$-marginal probability density, $P_{y_i}(y_i)$, is

$$P_{y_i}(y_i) = \int_{\{y_j \in D_y\}_{j=1, j \ne i}^{N}} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_{i-1}\, dy_{i+1} \cdots dy_N, \qquad (7)$$

so that

$$\int_{y_i \in D_y} P_{y_i}(y_i)\, dy_i = 1. \qquad (8)$$
Given the symmetrical joint distribution, we have

$$P_{y_i}(y_i) = \int_{\{y_j \in D_y\}_{j=1, j \ne i}^{N}} P(y_1 \ldots y_i \ldots y_k \ldots y_N)\, dy_1 \cdots dy_{i-1}\, dy_{i+1} \cdots dy_N = \left.\int_{\{y_j \in D_y\}_{j=1, j \ne k}^{N}} P(y_1 \ldots y_i \ldots y_k \ldots y_N)\, dy_1 \cdots dy_{k-1}\, dy_{k+1} \cdots dy_N\right|_{y_k = y_i} = P_{y_k}(y_i), \quad \forall i, k(\ne i) = 1,\dots,N, \qquad (9)$$

where the middle equality uses the exchange symmetry $P(y_1 \ldots y_i \ldots y_k \ldots y_N) = P(y_1 \ldots y_k \ldots y_i \ldots y_N)$ after relabeling the dummy integration variables.
Hence, the expression of the marginal distribution density $P_{y_i}(u)$ is identical for all $i = 1,\dots,N$, i.e., for all the random variables: $P_{y_i}(u) \equiv P_y(u)$.
Then, we readily derive that the random variables $\{Y_i\}_{i=1}^{N}$ are characterized by the same $L_p$ expectation value, namely, $\langle Y_i\rangle_p = \hat{E}_p(Y_i) = \mu_p$, $i = 1,\dots,N$, which is implicitly expressed by

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P_y(y_i)\, dy_i = 0. \qquad (10)$$
Indeed, if we had $\langle Y_i\rangle_p = \mu_{p_i}$, $i = 1,\dots,N$, then

$$\int_{y_i \in D_y} |y_i - \mu_{p_i}|^{p-1}\,\mathrm{sign}(y_i - \mu_{p_i})\, P_y(y_i)\, dy_i = 0, \qquad (11)$$

and, for $k \ne i$,

$$\int_{y_k \in D_y} |y_k - \mu_{p_k}|^{p-1}\,\mathrm{sign}(y_k - \mu_{p_k})\, P_y(y_k)\, dy_k = 0, \quad \text{i.e.,} \quad \int_{y_i \in D_y} |y_i - \mu_{p_k}|^{p-1}\,\mathrm{sign}(y_i - \mu_{p_k})\, P_y(y_i)\, dy_i = 0. \qquad (12)$$

However, given the uniqueness of the $L_p$-means, Equations (11) and (12) lead to $\mu_{p_i} = \mu_{p_k}$, $\forall i, k(\ne i) = 1,\dots,N$, i.e., $\mu_{p_i} = \mu_p$, $i = 1,\dots,N$. ☐
Lemma 2.
Let the auxiliary functionals $\{G_i\}_{i=1}^{N}$, with $G_i = G_i(\{y_j\}_{j=1}^{N}; p) \equiv y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$, $i = 1,\dots,N$. Then, their $L_p$ expectation values are zero, namely, $\langle G_i\rangle_p = \hat{E}_p(G_i) = 0$, $i = 1,\dots,N$.
Proof of Lemma 2.
The $L_p$ expectation value $\langle G_i\rangle_p$ is implicitly given by

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \langle G_i\rangle_p\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \langle G_i\rangle_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0. \qquad (13)$$
If $\langle G_i\rangle_p = 0$, then

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0, \qquad (14)$$

while if $\langle G_i\rangle_p \ne 0$, then the above functional has to be non-zero, because of the uniqueness of $L_p$ expectation values, namely,

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = C_i(p,N) \ne 0. \qquad (15)$$
Now, rewriting Equation (15) for an index $k (\ne i)$, we have

$$\begin{aligned} C_k(p,N) &= \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_k - \hat{\mu}_{p,N}(y_1 \ldots y_i \ldots y_k \ldots y_N; p)\right|^{p-1} \mathrm{sign}\!\left(y_k - \hat{\mu}_{p,N}(y_1 \ldots y_i \ldots y_k \ldots y_N; p)\right) P(y_1 \ldots y_i \ldots y_k \ldots y_N)\, dy_1 \cdots dy_i \cdots dy_k \cdots dy_N \\ &= \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(y_1 \ldots y_k \ldots y_i \ldots y_N; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(y_1 \ldots y_k \ldots y_i \ldots y_N; p)\right) P(y_1 \ldots y_k \ldots y_i \ldots y_N)\, dy_1 \cdots dy_k \cdots dy_i \cdots dy_N \\ &= \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(y_1 \ldots y_i \ldots y_k \ldots y_N; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(y_1 \ldots y_i \ldots y_k \ldots y_N; p)\right) P(y_1 \ldots y_i \ldots y_k \ldots y_N)\, dy_1 \cdots dy_i \cdots dy_k \cdots dy_N = C_i(p,N), \end{aligned} \qquad (16)$$

where the second step relabels the dummy integration variables ($y_i \leftrightarrow y_k$), and the third uses the symmetrical distribution of the random variables $\{y_j\}_{j=1}^{N}$, i.e., $P(y_1 \ldots y_k \ldots y_i \ldots y_N) = P(y_1 \ldots y_i \ldots y_k \ldots y_N)$, $\forall i, k(\ne i) = 1,\dots,N$ (the same symmetry also holds for the estimator $\hat{\mu}_{p,N}$, while the integration over each $y_i$ spans the same domain $D_y$). Hence, $C_i(p,N) = C_k(p,N) \equiv C(p,N)$. Then, summing both sides of Equation (15) over $i = 1,\dots,N$, we conclude that

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \sum_{i=1}^{N} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0 = \sum_{i=1}^{N} C_i(p,N) = N\, C(p,N), \qquad (17)$$

where the left-hand side vanishes because the inner sum is identically zero by Equation (3).
Hence, $C(p,N) = 0$. Thus, Equation (14) holds, and given the uniqueness of $L_p$ expectation values, we conclude that $\langle G_i\rangle_p = 0$, $i = 1,\dots,N$. ☐
Theorem 2.
Consider the sampling $\{y_i\}_{i=1}^{N}$, $y_i \in D_y$, $i = 1,\dots,N$, of the symmetrically distributed random variables $\{Y_i\}_{i=1}^{N}$. According to Lemma 1, the random variables are characterized by the same $L_p$ expectation value (assuming that this exists), namely, $\langle Y_i\rangle_p = \hat{E}_p(Y_i) = \mu_p$, $i = 1,\dots,N$, which is implicitly expressed by Equation (6). Then, the $L_p$ expectation value of the $L_p$-mean estimator $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$ is equal to $\mu_p$, i.e., $\langle\hat{\mu}_{p,N}\rangle_p = \hat{E}_p[\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)] = \mu_p$, or

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0.$$
Proof of Theorem 2.
(For useful inequalities, see [11].) The following integral inequalities evidently hold:

$$0 = \left|\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} \mathrm{sign}\!\left(y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N\right| \le \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|y_i - \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N, \qquad (18)$$

and, $\forall i = 1,\dots,N$,

$$0 = \left|\int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P_y(y_i)\, dy_i\right| \le \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i \qquad (19)$$

(the two vanishing integrals on the left follow from Lemma 2 (Equation (14)) and Lemma 1 (Equation (6)), respectively).
Furthermore, we consider the $L_p$ expectation value of the functional $g(\{y_j\}_{j=1}^{N}; p) \equiv \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p$, namely, $\langle g\rangle_p = \hat{E}_p(g(\{y_j\}_{j=1}^{N}; p))$, which is implicitly given by

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p - \langle g\rangle_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p - \langle g\rangle_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0. \qquad (20)$$

If $\langle g\rangle_p = 0$, then

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = 0, \qquad (21)$$

while if $\langle g\rangle_p \ne 0$, then the above functional has to be non-zero, because of the uniqueness of $L_p$ expectation values, namely,

$$\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N = D(p,N) \ne 0, \qquad (22)$$

or

$$|D(p,N)| = \left|\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} \mathrm{sign}\!\left(\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right) P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N\right| \le \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N. \qquad (23)$$
First case, $p \le 2$: Hence, $p - 1 \le 1$, and we use the power inequality $(|f| + |g|)^s \le |f|^s + |g|^s$, holding $\forall s \le 1$, as follows: the triangle inequality gives $|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p| \le |\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i| + |y_i - \mu_p|$. Then, applying the above power inequality for $s = p - 1$, $f = \hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i$, and $g = y_i - \mu_p$, we have $|\hat{\mu}_{p,N} - \mu_p|^{p-1} \le (|\hat{\mu}_{p,N} - y_i| + |y_i - \mu_p|)^{p-1} \le |\hat{\mu}_{p,N} - y_i|^{p-1} + |y_i - \mu_p|^{p-1}$. Thereafter, Equation (23) becomes

$$|D(p,N)| \le \int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N + \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i, \qquad (24)$$

where the second term has been reduced to the marginal density via Equation (7), or

$$\frac{1}{2}\,|D(p,N)| \le \mathrm{Max}\!\left\{\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N,\ \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i\right\}. \qquad (25)$$
Second case, $p \ge 2$: Hence, $p - 1 \ge 1$, and applying the Minkowski inequality to Equation (23), we have

$$|D(p,N)|^{\frac{1}{p-1}} \le \left[\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N\right]^{\frac{1}{p-1}} \le \left[\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N\right]^{\frac{1}{p-1}} + \left[\int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i\right]^{\frac{1}{p-1}}, \qquad (26)$$

or

$$\frac{1}{2^{p-1}}\,|D(p,N)| \le \mathrm{Max}\!\left\{\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N,\ \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i\right\}. \qquad (27)$$
Combining Equations (25) and (27), we arrive at an inequality that holds $\forall p > 1$:

$$0 \le \mathrm{Min}\!\left\{\left.\frac{1}{2}\,|D(p,N)|\right|_{p \le 2},\ \left.\frac{1}{2^{p-1}}\,|D(p,N)|\right|_{p \ge 2}\right\} \equiv \tilde{D}(p,N) \le \mathrm{Max}\!\left\{\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N,\ \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i\right\}. \qquad (28)$$
On the other hand, Equations (18) and (19) imply that

$$0 \le \mathrm{Max}\!\left\{\int_{\{y_j \in D_y\}_{j=1}^{N}} \left|\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - y_i\right|^{p-1} P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N,\ \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\, P_y(y_i)\, dy_i\right\}. \qquad (29)$$
We construct the auxiliary random variables $\{X_i\}_{i=1}^{N}$, defined by $X_i = f_x(Y_i) \equiv Y_i \cdot \tilde{D}(p,N)^{-\frac{1}{p-1}}$, having values $\{x_i = f_x(y_i)\}_{i=1}^{N}$ in the domain $x_i \in D_x \equiv \{f_x(y_{\mathrm{Min}}) \le x \le f_x(y_{\mathrm{Max}})\}$, $i = 1,\dots,N$ (where $y_{\mathrm{Min}} \equiv D_{y,\mathrm{Min}} \in D_y$ is the infimum of $D_y$, while $y_{\mathrm{Max}} \equiv D_{y,\mathrm{Max}} \in D_y$ is the supremum of $D_y$). The $L_p$-mean estimator of the set $\{x_i\}_{i=1}^{N}$ is given by the functional $\mu_{p,N}^{X} = \mu_{p,N}^{X}(\{x_j\}_{j=1}^{N}; p) = \hat{\mu}_{p,N}(\{y_j = x_j\,\tilde{D}(p,N)^{\frac{1}{p-1}}\}_{j=1}^{N}; p) \cdot \tilde{D}(p,N)^{-\frac{1}{p-1}}$, while the random variables $\{X_i\}_{i=1}^{N}$ have the common $L_p$ expectation value $\langle X_i\rangle_p \equiv \hat{E}_p(X_i) = \mu_p^X = \mu_p\,\tilde{D}(p,N)^{-\frac{1}{p-1}}$, $i = 1,\dots,N$. The respective joint probability density is given by $P^X(\{x_j\}_{j=1}^{N}) = P[\{y_j = x_j\,\tilde{D}(p,N)^{\frac{1}{p-1}}\}_{j=1}^{N}] \cdot \tilde{D}(p,N)^{\frac{N}{p-1}}$, so that $P^X(\{x_j\}_{j=1}^{N})\, dx_1 \cdots dx_N = P(\{y_j\}_{j=1}^{N})\, dy_1 \cdots dy_N$. Then, Equations (28) and (29) become

$$1 \le \mathrm{Max}\!\left\{\int_{\{x_j \in D_x\}_{j=1}^{N}} \left|\mu_{p,N}^X(\{x_j\}_{j=1}^{N}; p) - x_i\right|^{p-1} P^X(\{x_j\}_{j=1}^{N})\, dx_1 \cdots dx_N,\ \int_{x_i \in D_x} |x_i - \mu_p^X|^{p-1}\, P_x^X(x_i)\, dx_i\right\} \qquad (30)$$

and

$$0 \le \mathrm{Max}\!\left\{\int_{\{x_j \in D_x\}_{j=1}^{N}} \left|\mu_{p,N}^X(\{x_j\}_{j=1}^{N}; p) - x_i\right|^{p-1} P^X(\{x_j\}_{j=1}^{N})\, dx_1 \cdots dx_N,\ \int_{x_i \in D_x} |x_i - \mu_p^X|^{p-1}\, P_x^X(x_i)\, dx_i\right\}, \qquad (31)$$

respectively (where $P_{x_i}^X(u) = P_x^X(u)$, $i = 1,\dots,N$, is the identical marginal distribution density for all the random variables $\{X_i\}_{i=1}^{N}$).
Moreover, we define the nonnegative quantities $\ell_p^X$, determined by the integral operator $\hat{I}_p^X$, given by

$$\hat{I}_p^X \equiv \mathrm{Max}\!\left\{\int_{\{x_j \in D_x\}_{j=1}^{N}} \left|\mu_{p,N}^X(\{x_j\}_{j=1}^{N}; p) - x_i\right|^{p-1}\, \hat{1}\, dx_1 \cdots dx_N,\ \int_{x_i \in D_x} |x_i - \mu_p^X|^{p-1}\, \hat{1}\, dx_i\right\}, \qquad (32)$$

which acts on the probability densities $P^X(\{x_j\}_{j=1}^{N})$, so that

$$\ell_p^X = \hat{I}_p^X\left[P^X(\{x_j\}_{j=1}^{N})\right]. \qquad (33)$$
Now consider the subset $\mathcal{M}$ of all the possible values of $\ell_p^X$. Equation (30) yields $1 \le \ell_p^X$, so that the infimum is $\mathcal{I}(\mathcal{M}) = 1$ (note that, in this case, the infimum is an element of $\mathcal{M}$, obtained for $p = 1$). On the other hand, Equation (31) yields $0 \le \ell_p^X$, which means that the nonnegative quantities $\ell_p^X$ can be arbitrarily small, even zero, so that the infimum is now given by $\mathcal{I}(\mathcal{M}) = 0$.
However, the infimum is unique; the contradiction originates from the assumption $D(p,N) \ne 0$. Hence, $D(p,N) = 0$, and Equation (20) yields $\langle g\rangle_p = 0$, i.e., $\langle\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p) - \mu_p\rangle_p = 0$, or $\langle\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)\rangle_p = \mu_p = \langle y_i\rangle_p$, $i = 1,\dots,N$. ☐
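Theorem 2 can also be probed numerically. The sketch below is our own illustration, reusing the illustrative lp_mean_estimator from earlier in this section; the outer $L_p$ expectation value over the joint density (Equation (4)) is approximated by applying the same root-finder to a large set of replica estimates, and the exponential distribution is an assumed test case:

```python
import numpy as np

# Reusing lp_mean_estimator from the sketch earlier in this section.
rng = np.random.default_rng(1)
p, N, M = 1.5, 50, 5_000                 # norm, sample size, number of replicas

# Reference mu_p of the underlying distribution, from one very large sample.
mu_p = lp_mean_estimator(rng.exponential(size=1_000_000), p)

# Replica estimates mu_hat_{p,N}; their L_p expectation value (Eq. (4)) is
# approximated by the L_p-mean of the replicas themselves.
replicas = [lp_mean_estimator(rng.exponential(size=N), p) for _ in range(M)]
print(lp_mean_estimator(np.array(replicas), p), mu_p)   # should nearly coincide
```

Even for small $N$, the $L_p$-mean of the replica estimates should track $\mu_p$, since Theorem 2 asserts the equality for any $N$ (up to Monte Carlo error in both quantities).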

4. Limit of the $L_p$-Mean Estimator

The following theorem derives the limit of the $L_p$-mean estimator $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$.
Theorem 3.
Let the sampling $\{y_i\}_{i=1}^{N}$, $y_i \in D_y$, $i = 1,\dots,N$, of the independent and identically distributed random variables $\{Y_i\}_{i=1}^{N}$. The $L_p$-mean estimator $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$ converges to its $L_p$ expectation value, $\langle\hat{\mu}_{p,N}\rangle_p = \hat{E}_p[\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)] = \mu_p$, as $N \to \infty$; namely, $\lim_{N\to\infty}\hat{\mu}_{p,N} = \langle\hat{\mu}_{p,N}\rangle_p = \mu_p$.
Notes:
  • Obviously, the independent and identically distributed random variables are also symmetrically distributed. Thus, according to Theorem 2, we have $\langle\hat{\mu}_{p,N}\rangle_p = \hat{E}_p[\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)] = \langle Y_i\rangle_p = \hat{E}_p(Y_i) = \mu_p$.
  • The $L_p$ expectation value $\langle Y_i\rangle_p = \mu_p$ should be calculated from the marginal distribution density $P_y(y_i)$, $i = 1,\dots,N$. However, the expression of this distribution is usually unknown; thus, we estimate $\mu_p$ by means of $\lim_{N\to\infty}\hat{\mu}_{p,N}$.
Proof of Theorem 3.
We construct the set of auxiliary random variables $\{X_i\}_{i=1}^{N}$, defined by $X_i = f_x(Y_i) \equiv |Y_i - \mu_p|^{p-1}\,\mathrm{sign}(Y_i - \mu_p)$, having the relevant sampling values $\{x_i = f_x(y_i)\}_{i=1}^{N}$, with domain $x_i \in D_x \equiv \{-|f_x(y_{\mathrm{Min}})| \le x \le |f_x(y_{\mathrm{Max}})|\}$, $i = 1,\dots,N$. Apparently, $\{X_i\}_{i=1}^{N}$ are also independent and identically distributed random variables; let $P_{x_i}^X(u) = P_x^X(u)$, $i = 1,\dots,N$, be the identical marginal distribution density for all the random variables $\{X_i\}_{i=1}^{N}$.
Then, the Euclidean expectation value of each of the random variables is

$$\langle x_i\rangle_2 = \int_{x_i \in D_x} x_i\, P_x^X(x_i)\, dx_i = \int_{y_i \in D_y} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)\, P_y(y_i)\, dy_i = 0, \qquad (34)$$

for $i = 1,\dots,N$, where the last equality follows from Equation (6). Thereafter, from the "Law of Large Numbers", the average $\frac{1}{N}\sum_{i=1}^{N}(x_i - \langle x_i\rangle_2)$ converges to zero as $N \to \infty$ [12,13,14]. Thus,

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p) = 0. \qquad (35)$$
On the other hand, Equation (3), normalized by $N$ and taken at the limit $N \to \infty$ (assuming the convergence of the sum at this limit), is written as

$$\lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \left|y_i - \lim_{N\to\infty}\hat{\mu}_{p,N}\right|^{p-1} \mathrm{sign}\!\left(y_i - \lim_{N\to\infty}\hat{\mu}_{p,N}\right) = 0, \qquad (36)$$

while, given the uniqueness of $\mu_p$, comparing Equations (35) and (36) we conclude that $\lim_{N\to\infty}\hat{\mu}_{p,N} = \mu_p$. ☐
As an application, we examine the probability distribution $P(y)$ constructed by the superposition of two different normal distributions, $N_1(y; \mu = 1, \sigma = 1)$ and $N_2(y; \mu = 1 + \delta a, \sigma = 2)$ with $\delta a > 0$; namely, $P(y; \delta a; \lambda) = [N_1(y) + \lambda N_2(y; \delta a)]/(1 + \lambda)$. For $\delta a = 0$, the constructed probability distribution is symmetric, and as explained in Section 3 and [3], the whole set of $L_p$-means degenerates to one single value, in our case $\mu_p = 1$, $\forall p > 1$. First, we derive the $L_p$-mean of the distribution, $\langle y\rangle_p = \mu_p$, where $\int_{-\infty}^{+\infty} |y - \mu_p|^{p-1}\,\mathrm{sign}(y - \mu_p)\, P(y)\, dy = 0$. Then, we compute the $L_p$-mean estimator $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$, where the $y$-values, $\{y_j\}_{j=1}^{N}$, follow the probability distribution constructed above, $P(y; \delta a; \lambda)$. The $N$ $y$-values are generated as follows: we derive the cumulative distribution of $P$, that is,

$$F(y; \delta a; \lambda) = \frac{1}{2} + \left\{\mathrm{erf}\!\left(\frac{y - 1}{\sqrt{2}}\right) + \lambda \cdot \mathrm{erf}\!\left(\frac{y - 1 - \delta a}{2\sqrt{2}}\right)\right\} \Big/ \left[2(1 + \lambda)\right],$$

and then we set the parametrization $F(y_i; \delta a; \lambda) = i/N$, from which we solve for $y_i$. The estimator $\hat{\mu}_{p,N}(\{y_j\}_{j=1}^{N}; p)$ is implicitly given by $\sum_{i=1}^{N} |y_i - \hat{\mu}_{p,N}|^{p-1}\,\mathrm{sign}(y_i - \hat{\mu}_{p,N}) = 0$; then, we demonstrate Theorem 3 by showing the equality $\lim_{N\to\infty}\hat{\mu}_{p,N} = \mu_p$; in particular, we construct the normalized sum $S(N) \equiv \frac{1}{N}\sum_{i=1}^{N} |y_i - \mu_p|^{p-1}\,\mathrm{sign}(y_i - \mu_p)$, and show that $\lim_{N\to\infty} S(N) = 0$, satisfying Equation (35).
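The construction just described can be scripted directly. The following sketch is our own illustration: the values of $\delta a$, $\lambda$, and $p$ are arbitrary test choices, the endpoint $i = N$ is dropped (since $F(y_N) = 1$ has no finite solution), and $\mu_p$ is obtained from the integral equation on a fine grid:

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import brentq

da, lam, p = 1.0, 0.5, 2.5          # assumed test values for delta_a, lambda, p

def cdf(y):
    """Cumulative distribution F(y; da; lam) of the superposed normals."""
    return 0.5 + (erf((y - 1.0) / np.sqrt(2.0))
                  + lam * erf((y - 1.0 - da) / (2.0 * np.sqrt(2.0)))) / (2.0 * (1.0 + lam))

def sample(N):
    """Solve F(y_i) = i/N for y_i; i = N is dropped since F(y) = 1 has no finite root."""
    return np.array([brentq(lambda y, q=q: cdf(y) - q, -20.0, 40.0)
                     for q in np.arange(1, N) / N])

# mu_p of the distribution: root of the integral equation on a fine grid.
yg = np.linspace(-20.0, 40.0, 200_001)
dy = yg[1] - yg[0]
pdf = np.gradient(cdf(yg), dy)
G = lambda mu: np.sum(np.abs(yg - mu) ** (p - 1.0) * np.sign(yg - mu) * pdf) * dy
mu_p = brentq(G, -20.0, 40.0)

for N in (100, 300, 1000, 3000):
    y = sample(N)
    S = np.mean(np.abs(y - mu_p) ** (p - 1.0) * np.sign(y - mu_p))   # S(N), Eq. (35)
    print(N, S)                      # should decay roughly like 1/N
```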
Figure 3 illustrates the convergence $\lim_{N\to\infty}\hat{\mu}_{p,N} = \mu_p$. Panel (a) plots the sum $S(N)$ as a function of $N$ for the norms $p = 1.5$, 2, and 2.5; we find a convergence rate of $S(N) \sim 1/N$. Panel (b) plots the summation $S(N)$ for large $N$ $(= 10^3)$, which becomes zero only if the $p$-norm used by the estimator $\hat{\mu}_{p,N}$, that is, $p = p_1$, equals the $p$-norm used by the mean $\mu_p$, that is, $p = p_2$. Finally, panel (c) plots the value of the estimator $\hat{\mu}_{p,N}$ as a function of the norm $p$, for two data numbers, $N = 100$ and $N = 300$, showing the convergence to $\mu_p$, which is co-plotted as a function of $p$.
Figure 4 plots the deviation between the $L_p$-mean estimator $\hat{\mu}_{p,N}$ and the mean $\mu_p$, that is, $|\hat{\mu}_{p,N} - \mu_p|$, for a large number of data $(N = 10^3)$. The mean $\mu_p$ is taken for the norm $p = 3$, while the deviation is plotted as a function of the norm $p$ of the estimator. We observe that the deviation is minimized, tending to zero, as the norm approaches $p = 3$. However, this result holds only if the distribution is not symmetric. Once the parameter $\delta a$ decreases toward zero, the distribution $P(y; \delta a \to 0; \lambda)$ becomes symmetric, and the deviation $|\hat{\mu}_{p,N} - \mu_p|$ takes small values (while its minimization at a certain $p$ loses its meaning). We observe that for $\delta a \approx 0.01$ or smaller, the deviation is small enough, of the order of $10^{-4}$ to $10^{-3}$ (it is non-zero because of the computation errors caused by the finite $N$), so that $\hat{\mu}_{p,N} \cong \mu_p$.

5. Conclusions

The Euclidean $L_2$-means are derived by minimizing the sum of the total square deviations, i.e., the Euclidean variance. In a similar way, the non-Euclidean $L_p$-means are derived by minimizing the sum of the $L_p$ deviations, which is proportional to the $L_p$ variance [3]. The main advantage of the new statistical approach is that the $p$-norm is a free parameter; thus, both the $L_p$-normed expectation values and their variance are flexible enough to analyze new phenomena that cannot be described under the notions of classical statistics based on Euclidean norms. The least-squares method, based on the Euclidean norm, $p = 2$, and the least-absolute-deviations method, based on the "Taxicab" norm, $p = 1$, are special cases of the general fitting methods based on $L_p$-norms (e.g., [15]; for more applications of fitting methods based on $L_p$-norms, see [2,4,16,17]). Several other applications can be found in signal processing optimization and block entropy analysis (e.g., [2]), in image processing (e.g., [18]), in general data analysis (e.g., [5]), and in statistical mechanics (e.g., [3,8,19]). The Law of Large Numbers is a theorem that guarantees the stability of long-term averages of random events, but it is valid only for metrics induced by the Euclidean $L_2$-norm. The importance of this paper is in extending this theorem to the $L_p$-norms. Another interesting direction would be to establish a central limit theorem for the $L_p$-means.

Acknowledgments

The work was supported in part by the project NNX17AB74G of NASA’s HGI Program.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Livadiotis, G. Approach to general methods for fitting and their sensitivity. Physica A 2007, 375, 518–536.
  2. Livadiotis, G. Approach to the block entropy modeling and optimization. Physica A 2008, 387, 2471–2494.
  3. Livadiotis, G. Expectation value and variance based on $L_p$ norms. Entropy 2012, 14, 2375–2396.
  4. Livadiotis, G.; Moussas, X. The sunspot as an autonomous dynamical system: A model for the growth and decay phases of sunspots. Physica A 2007, 379, 436–458.
  5. Livadiotis, G. Chi-p distribution: Characterization of the goodness of the fitting using $L_p$ norms. J. Stat. Distrib. Appl. 2014, 1, 4.
  6. Huber, P. Robust Statistics; John Wiley & Sons: New York, NY, USA, 1981.
  7. Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. L'Institut Henri Poincaré 1948, 10, 215–310. (In French)
  8. Livadiotis, G. Non-Euclidean-Normed Statistical Mechanics. Physica A 2016, 445, 240–255.
  9. Scarpello, G.M.; Ritelli, D.E. A historical outline of the theorem of implicit functions. Divulg. Mat. 2002, 10, 171–180.
  10. Ahmad, R. On the Structure and Application of Restricted Exchangeability. In Exchangeability in Probability and Statistics; Koch, G., Spizzichino, F., Eds.; Elsevier: Amsterdam, The Netherlands, 1982; pp. 157–164.
  11. Williams, L.R.; Wells, J.H. $L_p$ inequalities. J. Math. Anal. Appl. 1978, 64, 518.
  12. Feller, W. Law of Large Numbers for Identically Distributed Variables; Wiley: New York, NY, USA, 1971.
  13. Hu, T.C.; Chang, H.C. Complete convergence and the law of large numbers for arrays of random elements. Nonlinear Anal. 1997, 30, 4257–4266.
  14. Hoffmann-Jørgensen, J.; Su, K.-L.; Taylor, R.L. The Law of Large Numbers and the Ito–Nisio Theorem for Vector Valued Random Fields. J. Theor. Probab. 1997, 10, 145–183.
  15. Burden, R.L.; Faires, J.D. Numerical Analysis; PWS Publishing Company: Boston, MA, USA, 1993; pp. 437–438.
  16. Sengupta, A. A rational function approximation of the singular eigenfunction of the monoenergetic neutron transport equation. J. Phys. A 1984, 17, 2743–2758.
  17. Livadiotis, G.; McComas, D.J. Fitting method based on correlation maximization: Applications in Astrophysics. J. Geophys. Res. 2013, 118, 2863–2875.
  18. Sharma, M.; Batra, A. Analysis of distance measures in content based image retrieval. Glob. J. Comput. Sci. Technol. G Interdiscip. 2014, 14, 11.
  19. Livadiotis, G. Kappa Distributions: Theory and Applications in Plasmas; Elsevier: Amsterdam, The Netherlands; London, UK; New York, NY, USA, 2017.
Figure 1. Example of $L_p$-means of a dataset following the Poisson distribution. The relation $\mu_p = f(\lambda)$ is plotted for (a) $p \ge 2$, i.e., $p = 2$ (red solid line), $p = 3$ (blue dash), $p = 5$ (green dash–dot), $p = 10$ (purple thick dash), $p = 30$ (light-blue thick dash–dot); and (b) $p \le 2$, i.e., $p = 2$ (red solid line), $p = 1.7$ (blue dash), $p = 1.5$ (green dash–dot), $p = 1.3$ (purple thick dash), $p = 1.1$ (light-blue thick dash–dot).
Figure 2. Uniqueness of the $L_p$-means of the Poisson distribution. The means are plotted as a function of the $p$-norm, and for various values of $0 < \lambda < 1$, that is, $\lambda < \ln 2$ (red solid), $\lambda = \ln 2$ (black dash), and $\lambda > \ln 2$ (blue solid).
Figure 3. Convergence of the estimator $\hat{\mu}_{p,N}$ to the mean $\mu_p$ (ensemble average). (a) The summation $S(N)$ plotted against $N$ for $p = 1.5$ (red solid), 2 (blue dot), and 2.5 (green dash); (b) the summation $S(N)$ plotted against $p = p_2$, for $p_1 = 1.5$ (red solid), 2 (blue dot), and 2.5 (green dash), where $S(N) = 0$ holds only for $p_2 = p_1$; (c) the estimator $\hat{\mu}_{p,N}$, plotted against the norm $p$, for $N = 100$ and $N = 300$, showing the convergence to the co-plotted mean $\mu_p$.
Figure 4. Deviation $|\hat{\mu}_{p,N} - \mu_p|$ plotted as a function of the $p$-norm of the estimator, and for various values of the parameter $\delta a$: 1 (red solid), 0.3 (blue thick dash), 0.1 (green thick dash–dot), 0.03 (purple dash), and 0.01 (light-blue dash–dot).
