Article

Means as Improper Integrals

John E. Gray 1 and Andrew Vogt 2
1 Code B-31, Sensor Technology & Analysis Branch, Electromagnetic and Sensor Systems Department, Naval Surface Warfare Center Dahlgren, 18444 Frontage Road Suite 328, Dahlgren, VA 22448-5161, USA
2 Department of Mathematics and Statistics, Georgetown University, Washington, DC 20057-1233, USA
* Author to whom correspondence should be addressed.
Mathematics 2019, 7(3), 284; https://doi.org/10.3390/math7030284
Submission received: 31 January 2019 / Revised: 23 February 2019 / Accepted: 27 February 2019 / Published: 20 March 2019

Abstract

The aim of this work is to study generalizations of the notion of the mean. Kolmogorov proposed a generalization based on an improper integral with a decay rate for the tail probabilities. This weak or Kolmogorov mean relates to the weak law of large numbers in the same way that the ordinary mean relates to the strong law. We propose a further generalization, also based on an improper integral, called the doubly-weak mean, applicable to heavy-tailed distributions such as the Cauchy distribution and the other symmetric stable distributions. We also consider generalizations arising from Abel–Feynman-type mollifiers that damp the behavior at infinity and alternative formulations of the mean in terms of the cumulative distribution and the characteristic function.

1. Introduction

The mean is sometimes taken as the foundational principle for all of probability theory (see, for example, [1]). Nate Silver [2] in the President’s Invited Address at the Joint Statistical Meeting 2013 in Montreal said that “the average is still the most useful statistical tool ever invented.” Steven Levitt [3] in the Arthur M. Sackler Lecture at the National Academy of Sciences in March 2015 said “When I work with companies ⋯, I like to show a comparison of means and that’s often more effective than very complicated things ⋯.” Kosko [4] suggested, on the other hand, jettisoning the mean in favor of the median because of its shortcomings. The chief shortcoming is that the mean sometimes does not exist, whereas the median always exists (even though it is multi-valued in some cases). Kosko cited the Cauchy distribution as a leading example where the mean fails to exist.
Here we study generalizations of the mean intended to extend its reach as much as possible. At the outset, we note that these generalizations are only applicable to the standard case and to distributions with heavy tails at both ends:
$\int_0^{\infty} x\,P(dx) = \infty, \qquad \int_{-\infty}^{0} x\,P(dx) = -\infty$
must hold. If neither of these equations is true, then E(X) exists. If only one of them holds, then E(X) is ±∞. Thus, our ideas will extend only to distributions heavy-tailed at both ends and indeed having equally heavy tails at both ends. This includes the symmetric stable distributions [5,6,7], a family arising in signal processing, among which are the normal and Cauchy distributions and which satisfy a generalized central limit theorem.
See also [8], where we provide an axiomatic treatment of the ordinary mean based on a condensation principle due to Bemporad and a continuity principle.
Three ways to extend the notion of the mean come to mind.
One way, always available, is to transform the variable so that the mean exists. We apply a strictly-increasing function f to the variable X and study E_f(X) = f⁻¹(E(f(X))). The function f can be a power function, a log function, a logistic function, or even the function arctan x. The function converts X into a bounded variable that necessarily has a finite mean. The difficulty with this approach is that the extensions do not recover the ordinary mean. If E_f(X) = E(X) whenever E(X) exists, then f is necessarily a linear transformation given by f(x) = Cx + D, C ≠ 0, and E_f(X) exists only when E(X) exists.
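As a quick illustration of this first approach, the following Python sketch (ours, not the authors'; the sample sizes and seed are arbitrary choices) estimates E_f(X) with f = arctan from simulated data. The transformed mean exists even for Cauchy samples, but it fails to reproduce E(X) for a variable whose ordinary mean exists.

import numpy as np

rng = np.random.default_rng(0)

# Cauchy: the ordinary mean does not exist, but E_f(X) = tan(E(arctan X)) does.
x = rng.standard_cauchy(1_000_000)
print(np.tan(np.mean(np.arctan(x))))     # near 0, the center of symmetry

# Exponential with mean 2: E(Y) exists, yet E_f(Y) != E(Y),
# illustrating that the transformed mean does not recover the ordinary mean.
y = rng.exponential(2.0, 1_000_000)
print(np.tan(np.mean(np.arctan(y))))     # noticeably different from 2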
A second way to extend the mean, an obvious and straight-forward approach, is to replace the usual definition by an improper integral. Instead of integrating x over ℝ with respect to the probability distribution P of the random variable X, we integrate over the interval [c − M, c + M] for some choice of a center c and determine what happens as M tends to ∞. We follow this approach here.
We itemize the different cases that can result (Theorem 1) in this way and give examples of each. We examine a weak mean proposed by Kolmogorov that corresponds to the weak law of large numbers, and we show that it is additive. We also consider two further generalizations: one with a weakening of the decay conditions that Kolmogorov imposed for his weak mean, which we call the doubly-weak mean; and another, the superweak mean, when the improper integral exists for at least one choice of the center c. These topics occupy Section 2 and Section 3.
A third way to extend the notion of the mean is to introduce what we call a “mollifier”, namely, a parameter-dependent multiplier ϕ_λ(x) such that x ↦ ϕ_λ(x)·x is integrable with respect to P and ϕ_λ(x) tends to one as λ tends to λ₀. Then, we consider what happens to E(ϕ_λ(X)X) as λ tends to λ₀. This approach is examined in Section 4. We note some dangers associated with mollifiers, consider some examples, and determine cases where this approach reduces to the previous approach. Richard Feynman (see [9]) sometimes used mollifier methods, and his approach motivated us. A well-known theorem of Abel is also related to these methods.
In Section 5, we compare and restate the results discussed earlier in terms of the cumulative distribution and the characteristic function of the variable X. In Section 6, we offer a brief conclusion.

2. Improper Integrals Extending the Mean

Consider a real-valued random variable X. We suppose that associated with X is a Borel probability measure P that takes each Borel subset A of the real numbers to
P ( A ) = the probability that  X  belongs to the set  A .
We shall also use the notation P(X ∈ A).
The mean of X, denoted by E ( X ) or μ X , is defined when x is integrable with respect to P to be:
$E(X) = \int_{\mathbb{R}} x\,P(dx).$
One direction of the remarkable strong law of large numbers (see Pollard [10], pp. 37–38, 78) states that if { X n } is a sequence of independent random variables with common distribution P and if there exists a constant m such that:
$\frac{X_1 + \cdots + X_n}{n} \ \text{converges almost surely to } m$
as n tends to ∞, then each Xₙ has mean m. Here “almost surely” means outside a set of measure zero in the countably-infinite product space induced by the measure P (see [10], pp. 99–102). Simply put, if the sample mean of independent copies of X settles down to a specific number, then that number is E(X). This can be regarded as the motivation for the transition from the sample mean to the mathematical mean E(X). The general notion of mean is derived from the finitary notion of the sample mean.
The other direction of the strong law of large numbers asserts that if E ( X ) exists, then the sample mean of n identical independent copies of X converges almost surely to E ( X ) as n tends to infinity. For a proof of both directions of the strong law, see [10], pp. 95–102, 105. For an alternate proof due to N. Etemadi, see [10], pp. 106–107.
When x is not integrable with respect to P, the notion E ( X ) above is inapplicable, and we must rely on other notions of the mean. The most obvious generalization is the following improper integral:
$L(c) \stackrel{\text{def}}{=} \lim_{M\to\infty} \int_{[c-M,\,c+M]} x\,P(dx)$
for a real number c.
By the Lebesgue dominated convergence theorem ([11], p. 172), this notion coincides with the ordinary mean when x is integrable with respect to P. In his great foundational work ([12], p. 40), Kolmogorov noted this option in the case when c = 0 and observed that it does not require integrability of |x|. Indeed, if X is a random variable obeying the Cauchy distribution with density f(x) = 1/(π(1 + x²)), then X satisfies L(c) ≡ 0 for any choice of c.
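For the Cauchy case, this is easy to check in closed form, since the truncated first moment is ∫_{c−M}^{c+M} x/(π(1+x²)) dx = ln((1+(c+M)²)/(1+(c−M)²))/(2π), which behaves like 2c/(πM) for large M. The short sketch below (ours, not from the paper) evaluates it for a fixed center c:

import math

def truncated_moment(c: float, M: float) -> float:
    # Closed-form integral of x/(pi*(1+x^2)) over [c-M, c+M].
    return math.log((1 + (c + M)**2) / (1 + (c - M)**2)) / (2 * math.pi)

for M in (1e2, 1e4, 1e6):
    print(M, truncated_moment(3.0, M))   # tends to 0, so L(3) = 0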
Two related notions of mean are:
$(\text{L-1}) \qquad \lim_{M\to\infty} \int_{[a-M,\,b+M]} x\,P(dx)$
and:
$(\text{L-2}) \qquad \lim_{\min\{M,K\}\to\infty} \int_{[a-M,\,b+K]} x\,P(dx),$
where a ≤ b.
It is easily seen that the expression in L-1 coincides with L((a + b)/2) since:
$[a-M,\ b+M] = \left[\frac{a+b}{2} - \left(M + \frac{b-a}{2}\right),\ \frac{a+b}{2} + \left(M + \frac{b-a}{2}\right)\right].$
As for L-2, we have the following result.
Proposition 1.
Let X be a random variable with probability measure P. Let a and b be real numbers with a ≤ b. Then:
$\lim_{\min\{K,M\}\to\infty} \int_{[a-M,\,b+K]} x\,P(dx)$
exists if and only if x is integrable with respect to P. Furthermore, when this limit exists, it is equal to E ( X ) .
Proof. 
If x is integrable on R , the limit exists and equals the mean of x by Lebesgue’s dominated convergence theorem. Conversely, if the limit exists, then:
$0 \le \int_{(b+K,\,b+K']} x\,P(dx) < \epsilon$
for K < K′, both sufficiently large, and any given ϵ. Likewise:
$-\epsilon < \int_{[a-M',\,a-M)} x\,P(dx) \le 0$
for M < M′, both sufficiently large. Fatou's lemma or Levi's theorem ([11], p. 172) thus implies both that:
$0 \le \int_{(b+K,\,\infty)} x\,P(dx) \le \epsilon \quad \text{and} \quad -\epsilon \le \int_{(-\infty,\,a-M)} x\,P(dx) \le 0.$
Thus, x is integrable on [0, ∞), as well as (−∞, 0], and so is integrable on ℝ = (−∞, ∞). □
Returning now to L(c), we stipulate that L(c) may take the values ±∞. This gives us flexibility in characterizing what can happen. A necessary, but not sufficient, condition for L(c) to be finite, we note in passing, is that for each positive real number p:
$\lim_{M\to\infty} M\,\big|P(M \le X - c \le M+p) - P(-M-p \le X - c \le -M)\big| = 0.$
Lemma 1.
Let X be a random variable with probability measure P, and let c 1 and c 2 be real numbers with c 1 < c 2 . Then, there are three possibilities:
(i) 
If L(c₁) exists in [−∞, ∞], then:
$L(c_1) \le \liminf_{M\to\infty} \int_{[c_2-M,\,c_2+M]} x\,P(dx);$
(ii) 
If L(c₂) exists in [−∞, ∞], then:
$L(c_2) \ge \limsup_{M\to\infty} \int_{[c_1-M,\,c_1+M]} x\,P(dx);$
(iii) 
If L(c₁) and L(c₂) both exist in [−∞, ∞], then:
L ( c 1 ) = L ( c 2 ) .
Proof. 
Suppose c 1 < c 2 . Then:
$\int_{[c_2-M,\,c_2+M]} x\,P(dx) = \int_{[c_1-M,\,c_1+M]} x\,P(dx) + \int_{(c_1+M,\,c_2+M]} x\,P(dx) + \int_{[c_1-M,\,c_2-M)} (-x)\,P(dx). \quad (1)$
The second and third terms on the right side of Equation (1) are both non-negative for M sufficiently large, and (i) and (ii) of Lemma 1 follow at once.
In the case of (iii), note that (i) and (ii) imply that if both L(c₁) and L(c₂) exist, then L(c₁) ≤ L(c₂). Suppose then that L(c₁) and L(c₂) both exist with c₁ < c₂ and L(c₁) < L(c₂). Then, there is a positive number K such that K < L(c₂) − L(c₁), and for M sufficiently large:
$K < \int_{(c_1+M,\,c_2+M]} x\,P(dx) + \int_{[c_1-M,\,c_2-M)} (-x)\,P(dx) \le (c_2+M)\,P\big((c_1+M,\,c_2+M]\big) + (M-c_1)\,P\big([c_1-M,\,c_2-M)\big) \le (M+d)\,P\big((c_1+M,\,c_2+M] \cup [c_1-M,\,c_2-M)\big),$
where d = max{|c₂|, |c₁|}. Thus:
$\frac{K}{M+d} < P\big((c_1+M,\,c_2+M] \cup [c_1-M,\,c_2-M)\big).$
Now, replace M by M_j = M + j(c₂ − c₁) for each non-negative integer j to get:
$\frac{K}{M_j+d} < P\big((c_1+M_j,\,c_2+M_j] \cup [c_1-M_j,\,c_2-M_j)\big).$
Summing over these inequalities and noting that c₂ + M_j = c₁ + M_{j+1} and c₁ − M_j = c₂ − M_{j+1}, we obtain:
$\infty = \sum_{j=0}^{\infty} \frac{K}{M+d+j(c_2-c_1)} \le P\big((c_1+M,\,\infty) \cup (-\infty,\,c_2-M)\big) \le 1,$
which is a contradiction. Thus, the possibility L ( c 1 ) < L ( c 2 ) is eliminated, and L ( c 1 ) = L ( c 2 ) . □
Theorem 1.
Let X be a random variable with probability measure P. Then exactly one of these possibilities holds:
(i) 
L(c) does not exist in [−∞, ∞] for any real number c;
(ii) 
L(c) exists in (−∞, ∞) for exactly one real number c;
(iii) 
L(c) exists in [−∞, ∞] for all real numbers c and is independent of c;
(iv) 
there is a number c₀ such that L(c) = ∞ for c > c₀, L(c) does not exist for c < c₀, and L(c₀) equals ∞ or does not exist; or
(v) 
there is a number c₀ such that L(c) = −∞ for c < c₀, L(c) does not exist for c > c₀, and L(c₀) equals −∞ or does not exist.
Proof. 
Suppose L(c) = ∞ for some number c. Let c₀ = inf{c : L(c) = ∞}. By Lemma 1 Part (i), L(c) = ∞ for all c > c₀. If c₀ = −∞, then the situation is described by Theorem 1 Part (iii). If c₀ > −∞, Part (iii) of Lemma 1 implies that the situation is given by Theorem 1 Part (iv).
Likewise, suppose L(c) = −∞ for some number c. Let c₁ = sup{c : L(c) = −∞}. If c₁ = ∞, then Theorem 1 Part (iii) applies. If c₁ < ∞, Part (iii) of Lemma 1 implies that Theorem 1 Part (v) holds.
Now, suppose L(c) ≠ ±∞ for all real numbers c. Then, either L(c) does not exist for any c and Theorem 1 Part (i) holds, or L(c) exists for exactly one c and Theorem 1 Part (ii) holds, or L(c) exists at two or more points and is a finite number.
In the latter case, we can find c₁ and c₂ with c₁ < c₂ so that by Lemma 1 Part (iii) L(c₁) = L(c₂) ∈ ℝ. In this case, the last two terms in Equation (1) each tend to zero as M tends to infinity. By changes of variable, we conclude that:
$\lim_{M\to\infty} \int_{(M,\,p+M]} x\,P(dx) = 0 \quad (2)$
and:
$\lim_{M\to\infty} \int_{[-M-p,\,-M)} x\,P(dx) = 0 \quad (3)$
hold where p = c₂ − c₁. It follows immediately that (2) and (3) also hold when 0 < p < c₂ − c₁. Likewise, if (2) and (3) hold for p, they also hold for 2p, 3p, …, and indeed for np where n is any fixed positive integer. Accordingly, (2) and (3) hold for all positive real numbers p.
Applying this result to the two terms on the far right side of Equation (1), we find that:
$L(c) \equiv L(c_1) = L(c_2)$
for all real numbers c, and thus, Theorem 1 Part (iii) holds. □
We give some examples to illustrate that each of the possibilities enumerated in Theorem 1 can occur. For simplicity, we use discrete random variables in most of our examples.
Example 1.
Consider a random variable X whose probability measure is of the form:
$P = \sum_{n=1}^{\infty} \left(\frac{1}{2^{2n}}\,\delta_{2^{2n}} + \frac{1}{2^{2n-1}}\,\delta_{-(2^{2n-1})}\right),$
where δ_z is the (Dirac) probability measure whose value is one on any Borel subset of ℝ that contains the real number z and whose value is zero on the remaining Borel subsets of ℝ. The sum of the nonzero values is one, so this obviously defines a probability measure. However, the integral of x over the interval [c − M, c + M] is the difference between the size of the first set and the size of the second set below:
$\text{The size of the set } \{n : 1 \le n,\ 2^{2n} \le (c+M)\} = \left\lfloor \frac{\log(c+M)}{2\log 2} \right\rfloor, \qquad \text{The size of the set } \{n : 1 \le n,\ 2^{2n-1} \le (M-c)\} = \left\lfloor \frac{\log(M-c)}{2\log 2} + \frac{1}{2} \right\rfloor,$
where ⌊·⌋ is the floor function. For fixed c and sufficiently large M, the difference of the above quantities can assume the values zero and −1, and the integral does not settle down to either one. This is an instance of Theorem 1, Part (i).
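The oscillation is easy to tally numerically. In the sketch below (ours; truncating the measure at 200 atoms is an arbitrary cutoff), each atom 2^{2n} contributes +1 to the truncated integral and each atom −(2^{2n−1}) contributes −1, so the integral is just a difference of two counts:

def truncated_integral(c: float, M: float) -> int:
    # Each positive atom contributes 2^{2n} * 2^{-2n} = +1 once inside [c-M, c+M];
    # each negative atom contributes -(2^{2n-1}) * 2^{-(2n-1)} = -1.
    pos = sum(1 for n in range(1, 200) if 2**(2*n) <= c + M)
    neg = sum(1 for n in range(1, 200) if 2**(2*n - 1) <= M - c)
    return pos - neg

for M in [2.0**k for k in range(4, 20)]:
    print(M, truncated_integral(0.0, M))   # flips between 0 and -1 forever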
Example 2.
A random variable X can also be defined with the probability measure of the form:
$P = \sum_{n=1}^{\infty} \frac{1}{2^{n+1}}\left(\delta_{2^n} + \delta_{-2^n}\right).$
Therefore, P is concentrated at the points ±2ⁿ and assigns probability 1/2ⁿ⁺¹ to these points. For this measure, L(0) equals zero by symmetry. However, L(c) does not exist for other choices of c. If c is positive, the integral of x over the closed interval [c − M, c + M] reduces by cancellation to its integral over the half-open interval (M − c, M + c]. This integral oscillates between zero and 1/2 for large M depending on whether a point 2ⁿ is in the interval (M − c, M + c] or not. A similar behavior occurs when c < 0. This example is an instance of Theorem 1 Part (ii).
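Again a short tally (ours) makes the dichotomy visible: each atom ±2ⁿ contributes ±1/2 to the truncated integral once it enters the window.

def trunc(c: float, M: float) -> float:
    # Atom at 2^n contributes 2^n * 2^{-(n+1)} = +1/2; atom at -2^n contributes -1/2.
    pos = sum(0.5 for n in range(1, 200) if 2**n <= c + M)
    neg = sum(0.5 for n in range(1, 200) if 2**n <= M - c)
    return pos - neg

print([trunc(0.0, 2.0**k) for k in range(2, 8)])             # all 0: L(0) = 0
print([trunc(1.0, 2.0**k + j) for k in range(2, 8) for j in (0, 2)])
# with c = 1 the value keeps flipping between 1/2 and 0, so L(1) does not exist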
Example 3.
Now, consider a random variable X having a probability density (with respect to Lebesgue measure on the real line) of the form:
$f(x) = \begin{cases} \dfrac{A}{1 + C x^{a}} & \text{if } x \ge 0, \\ \dfrac{B}{1 + D |x|^{b}} & \text{if } x < 0, \end{cases}$
where a and b are numbers in (1, 2) and A, B, C, and D are suitable positive constants that guarantee that the density integrates to one. This random variable satisfies Part (iii) of Theorem 1. It is easy to see that L(c) ≡ ∞ for all c or L(c) ≡ −∞ for all c according as b > a or a > b. When a = b, then L(c) ≡ +∞ if AD − BC > 0 and L(c) ≡ −∞ if AD − BC < 0. When a = b and AD = BC, then L(c) is a real number independent of c as in Part (iii) of Theorem 1.
If we let a = b = 2 and (A, C) = (B, D) = (1/π, 1), we obtain the Cauchy distribution, which also illustrates Part (iii) of Theorem 1, with L(c) ≡ 0.
Example 4.
The probability measure:
$P = \sum_{n=1}^{\infty} \left(\frac{2^{n}}{3^{n+1}}\,\delta_{3^n} + \frac{2^{n-1}}{3^{n+1}}\,\delta_{-3^n}\right)$
illustrates Part (iv) of Theorem 1. If c ≥ 0, the integral of x over [c − M, c + M] is given by:
$\sum_{\{n : 1 \le n,\ 3^n \le c+M\}} \frac{2^n}{3} - \sum_{\{n : 1 \le n,\ 3^n \le M-c\}} \frac{2^{n-1}}{3},$
and this expression has the value (2^{n₀} − 1)/3 or (2^{n₀} + 2^{n₀−1} − 1)/3, where n₀ = ⌊log(c+M)/log 3⌋, for large M. Since M and n₀ tend to infinity together, it follows that L(c) ≡ ∞ for c ≥ 0.
On the other hand, if c = −d where d > 0, the integral of x over [c − M, c + M] = [−(M+d), M−d] is given by:
$\sum_{\{n : 1 \le n,\ 3^n \le M-d\}} \frac{2^n}{3} - \sum_{\{n : 1 \le n,\ 3^n \le M+d\}} \frac{2^{n-1}}{3},$
and this reduces to (2^{n₀} − 1)/3 or to −1/3 for large M where:
$n_0 = \lfloor \log(M+d)/\log 3 \rfloor,$
depending on whether a positive integer lies in the interval (log(M−d)/log 3, log(M+d)/log 3] or not. Thus, L(c) does not exist for c < 0.
Example 5.
Another example for Part (iv) of Theorem 1 is the case where the probability measure is given by:
$P = K \sum_{n=1}^{\infty} \left(\frac{2^{n}}{3^{n} + (1/n)}\,\delta_{3^n + (1/n)} + \frac{2^{n-1}}{3^{n}}\,\delta_{-3^n}\right).$
Here K is a suitably-chosen positive normalizer, which is easily seen to be in the interval (1/3, 1). For c > 0, the integral of x over [c − M, c + M] is:
$K\left(\sum_{\{n : 1 \le n,\ 3^n + 1/n \le M+c\}} 2^{n} - \sum_{\{n : 1 \le n,\ 3^n \le M-c\}} 2^{n-1}\right),$
and this is greater than or equal to K(2^{n₀} − 1) for M sufficiently large, where n₀ = ⌊log(M−c)/log 3⌋. Since n₀ and M tend to infinity together, L(c) ≡ ∞ for all c > 0. When c = 0, the integral of x over [−M, M] reduces to K(2^{n₀} − 1) or to −K, where n₀ is the largest integer such that 3^{n₀} + (1/n₀) ≤ M, and the first or second reduction occurs according as M < 3^{n₀+1} or not. Thus, L(0) does not exist. By Theorem 1, L(c) does not exist for c < 0.
Other cases arising in Theorem 1, such as Part (v), are obtained by modifying the examples above, e.g., replacing X by −X or by X + a.

3. Weak Means

An implication of Theorem 1 is that if L(c) exists for more than one choice of c and is finite for one c, then L(c) exists for all c, is finite, and is independent of c. The case of the Cauchy distribution shows that this can happen without the ordinary mean existing. Accordingly, for a random variable X, we define the doubly-weak mean of X, denoted by E_WW(X), to be the common value of L(c) for all c when this common value exists and is in (−∞, ∞). In Theorem 4 below, we will consider alternative characterizations of the doubly-weak mean.
An intermediate notion exists due to Kolmogorov ([12], pp. 64–66) between the ordinary mean and the doubly-weak one that motivates our terminology. The weak mean of X, denoted by E_W(X), is defined as follows: E_W(X) is the quantity L(0) provided the latter exists in (−∞, ∞) and:
$\lim_{n\to\infty} n\,P(|X| > n) = 0. \quad (4)$
The following theorem is due to Kolmogorov. It indicates that the existence of the weak mean coincides precisely with the existence of a number for which the weak law of large numbers holds.
Theorem 2 (Kolmogorov, 1928).
Let X be a random variable. Suppose that X₁, …, Xₙ, … are independent identically distributed copies of X with Pₙ the n-fold product distribution. Then, there is a real number m such that for each ϵ > 0:
$\lim_{n\to\infty} P_n\left(\left|\frac{X_1 + \cdots + X_n}{n} - m\right| > \epsilon\right) = 0$
if and only if X has weak mean E W ( X ) = m .
Proof. 
See [12], p. 65, [13], and [14], Theorems XII and XIII. □
Proposition 2.
Let X be a random variable.
(i) 
If X has a mean, then X has a weak mean, and E ( X ) = E W ( X ) ; and
(ii) 
if X has a weak mean, then X has a doubly-weak mean, and E W ( X ) = E W W ( X ) .
Proof. 
In the case when X has a mean, then the identity function x ↦ x is integrable with respect to the probability measure P on the real line. In particular, the tail integrals:
$\int_{[n,\,\infty)} x\,P(dx) \quad \text{and} \quad \int_{(-\infty,\,-n]} x\,P(dx)$
tend to zero as n tends to infinity. Since the absolute values of these integrals are larger respectively than nP(X ≥ n) and nP(X ≤ −n), it follows that lim_{n→∞} nP(|X| > n) = 0. Likewise, by Lebesgue's dominated convergence theorem, L(0) = E(X). Now, suppose X has a weak mean. If c₁ < c₂ and ϵ > 0 are given, then for a sufficiently large M,
$0 \le \int_{(c_1+M,\,c_2+M]} x\,P(dx) \le (c_2+M)\,P\big((c_1+M,\,c_2+M]\big) \le (c_2+M)\,P(|X| > c_1+M) \le \frac{c_2+M}{c_1+M}\,\epsilon.$
For sufficiently large M, the right side is as close to ϵ as we like. Thus:
$\lim_{M\to\infty} \int_{(c_1+M,\,c_2+M]} x\,P(dx) = 0.$
Similarly,
$\lim_{M\to\infty} \int_{[c_1-M,\,c_2-M)} x\,P(dx) = 0.$
Accordingly, from Equation (1) in the proof of Lemma 1, it follows that whenever one of L(c₂) or L(c₁) exists and is finite, the other exists and is equal to it. Since L(0) = m, then L(c) exists for all c, L(c) ≡ m, and m is the doubly-weak mean of X. □
Kolmogorov in [12], p. 66, gave an example where the weak law holds, but the strong law does not. Kolmogorov’s example is a random variable X whose probability distribution P is given by:
$P(X \in A) = \int_A \frac{C}{(|x|+2)^2\,\ln(|x|+2)}\,dx,$
where C is a suitable normalizing constant and A is any Borel set in the reals. Cauchy random variables have L ( c ) existing for all c, independent of c, but violate the weak law by not decaying rapidly enough at infinity. These examples demonstrate that the notions of mean, weak mean, and doubly-weak mean are strictly distinct. Since the strong law and the weak law correspond precisely to the mean and the weak mean, it is natural to wonder if another such law corresponds to the doubly-weak mean.
Theorem 3 (Doubly-weak law of large numbers).
Let X be a random variable. For c in ℝ and M > 0, let cX^M denote the random variable defined by:
${}_cX^M = \begin{cases} X & \text{if } |X - c| \le M, \\ 0 & \text{otherwise}. \end{cases}$
Suppose that cX₁^M, …, cXₙ^M, … are independent identically distributed copies of cX^M with P̂ₙ their n-fold product distribution. If there is a real number m such that for each ϵ > 0, for c = c₁ and c = c₂ for distinct real numbers c₁ and c₂, and for M sufficiently large:
$\lim_{n\to\infty} \hat{P}_n\left(\left|\frac{{}_cX_1^M + \cdots + {}_cX_n^M}{n} - m\right| > \epsilon\right) = 0, \quad (5)$
then X has doubly-weak mean E W W ( X ) = m .
Conversely, if X has doubly-weak mean E W W ( X ) = m , then Equation (5) holds for each ϵ > 0 , for each c R , and for M sufficiently large.
Proof. 
Suppose that cX^M is as above and (5) holds. Note that the probability distribution P̂ of cX^M is given by $\hat{P}(A) = P\big(A \cap [c-M,\,c+M]\big) + \delta_0(A)\big(1 - P([c-M,\,c+M])\big)$ for each Borel set A. The variable cX^M is a bounded variable and hence has a mean given by:
$E({}_cX^M) = \int_{[c-M,\,c+M]} x\,\hat{P}(dx).$
Furthermore, this variable obeys the weak law of large numbers, i.e.,
$\lim_{n\to\infty} \hat{P}_n\left(\left|\frac{{}_cX_1^M + \cdots + {}_cX_n^M}{n} - E({}_cX^M)\right| > \epsilon\right) = 0 \quad (6)$
for every ϵ > 0, every c ∈ ℝ, and every M > 0. Combining (5) and (6), we conclude that:
$\lim_{n\to\infty} \hat{P}_n\left(\big|E({}_cX^M) - m\big| \le 2\epsilon\right) = 1$
for c ∈ {c₁, c₂} and M sufficiently large. Since the inequality |E(cX^M) − m| ≤ 2ϵ does not depend on n, we conclude that it must be true for M sufficiently large. However, ϵ is an arbitrary positive number. Therefore,
$L(c) = \lim_{M\to\infty} \int_{[c-M,\,c+M]} x\,P(dx) = \lim_{M\to\infty} \int_{[c-M,\,c+M]} x\,\hat{P}(dx) = \lim_{M\to\infty} E({}_cX^M) = m \quad (7)$
for c ∈ {c₁, c₂}, that is to say, L(c₁) and L(c₂) exist and are equal to the real number m. By Theorem 1, L(c) ≡ m for every real number c, and m is the doubly-weak mean of X.
To prove the converse, note that since L(c) ≡ E_WW(X) for all c, then for each c ∈ ℝ, (7) holds. Thus, for any ϵ > 0, any c, and for any M sufficiently large (dependent on both ϵ and c):
$\big|E({}_cX^M) - m\big| \le \frac{\epsilon}{2}, \qquad \hat{P}_n\left(\big|E({}_cX^M) - m\big| \le \frac{\epsilon}{2}\right) = 1 \quad \text{for all } n \ge 1.$
However, combining the above result with (6), with ϵ there replaced by ϵ/2, we obtain (5). Since ϵ is an arbitrary positive number, the factor of two is irrelevant. Thus, (5) holds for all ϵ, all c, and M sufficiently large. □
The doubly-weak mean can be characterized in a different manner as follows, based on the argument in Theorem 1.
Theorem 4.
The random variable X has a doubly-weak mean if and only if any of the following equivalent conditions holds:
(i) 
L(c) exists in (−∞, ∞) for all real numbers c and is independent of c; or
(ii) 
L(c) exists in (−∞, ∞) for two distinct real numbers c; or
(iii) 
L(c) exists in (−∞, ∞) for some real number c, and for every positive real number p, $\lim_{M\to\infty} M\,P(M < |X| \le M+p) = 0$; or
(iv) 
L ( c ) exists in ( , ) for some real number c, and there exists a real number p 1 such that lim n n P ( n < | X | n + p ) = 0 .
Of course, n above denotes an integer variable, while M is a real variable. Any one of these conditions can be taken as defining when a doubly-weak mean exists, in which case the doubly-weak mean is the (common) value of L ( c ) . The closest in spirit to Kolmogorov’s definition for the weak mean (cf. Equation (4)) is Condition (iv).
Proof. 
That (i) and (ii) are equivalent is a consequence of Theorem 1. If (ii) holds, it also follows from Theorem 1 that −∞ < L(c₁) = L(c₂) < ∞ for real numbers c₁ and c₂ with c₁ < c₂. The proof of Theorem 1 then shows that Equations (2) and (3) hold for any positive number p. Since:
$0 \le M\,P(M < X \le M+p) \le \int_{(M,\,p+M]} x\,P(dx), \qquad 0 \le M\,P(-M > X \ge -M-p) \le -\int_{[-M-p,\,-M)} x\,P(dx),$
it follows that (ii) implies (iii). Condition (iv) is a special case of (iii).
We must still show that (iv) implies (i) or (ii). The condition that:
$\lim_{n\to\infty} n\,P(n < |X| \le n+p) = 0$
for some p ≥ 1 implies the same for any smaller p > 0. For a number k larger than p in the interval (jp, (j+1)p] with j a positive integer, choose n₀ so large that for n ≥ n₀:
$0 \le n\,P(n < |X| \le n+p) < \frac{\epsilon}{j+1}.$
Then:
$n\,P(n < |X| \le n+k) \le n\,P\big(n < |X| \le n+(j+1)p\big) \le (j+1)\,\frac{\epsilon}{j+1} = \epsilon$
for n ≥ n₀. Thus, (iv) holds for any p > 0.
If L(c) exists for some number c and d is another number, larger than c without loss of generality, then in imitation of Equation (1):
$\int_{[d-M,\,d+M]} x\,P(dx) = \int_{[c-M,\,c+M]} x\,P(dx) + \int_{(c+M,\,d+M]} x\,P(dx) - \int_{[c-M,\,d-M)} x\,P(dx). \quad (8)$
However, for a positive number M > max(|c|, |d|):
$0 \le \int_{(c+M,\,d+M]} x\,P(dx) \le (d+M)\,P(c+M \le X \le d+M) \le n_1\,P(n_1 < |X| \le n_1+p) + p\,P(n_1 < X)$
and:
$0 \le \left|\int_{[c-M,\,d-M)} x\,P(dx)\right| \le (M-c)\,P(c-M \le X \le d-M) \le n_2\,P(n_2 < |X| \le n_2+p) + p\,P(X < -n_2),$
where n₁ and n₂ are integers such that n₁ < c + M ≤ n₁ + 1 and n₂ < M − d ≤ n₂ + 1, and p = d − c + 1.
For M sufficiently large, n₁ and n₂ are as large as we like, but with p fixed, the right sides of these two inequalities tend to zero. Accordingly, in the limit as M tends to ∞ in Equation (8), we obtain L(d) = L(c). This establishes (ii), and hence (i). □
The doubly-weak mean may appear to be the last possibility for generalizing the mean. However, Theorem 1 suggests yet another generalization. We say that X has a superweak mean if L(c) exists in [−∞, ∞] for some real number c. Example 2 is a case where a finite superweak mean exists, but not a doubly-weak or weak mean. We have thus covered every case in Theorem 1 since Example 1 shows that there are cases where even the superweak mean does not exist.
We are now in a position to treat symmetric stable distributions.
Proposition 3.
Let X be a real random variable having a symmetric stable distribution with location parameter a. Then X has a doubly-weak mean E W W ( X ) = a .
Proof. 
In the notation of [7], p. 14, when the characteristic exponent α is in (1, 2], the ordinary mean E(X) = a exists. See [7], p. 22. We assume the symmetry parameter β = 0. When α = 1, we are dealing with the Cauchy distribution, which obviously satisfies (iii) and (iv) of Theorem 4. Therefore, it suffices to consider cases when 0 < α < 1. Furthermore, the variable X can be taken to have location parameter a = 0 and scale parameter γ = 1 since we can replace X by (X − a)/γ^{1/α} without loss of generality. Then, note that the density function for such an X is symmetric about zero, and hence, L(0) = 0.
It thus suffices to show that M·P(M < X ≤ M + p) tends to zero as M tends to ∞ for a fixed p > 0. Using Equation 2.9 of [7], p. 16, we obtain:
$M\,P(M \le X \le M+p) = M \int_{M}^{M+p} \frac{1}{\pi x}\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}\,\Gamma(\alpha k + 1)\,x^{-\alpha k}\,\sin\frac{k\alpha\pi}{2}\,dx = \frac{Mp}{\pi M^{*}}\sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k!}\,\Gamma(\alpha k + 1)\,(M^{*})^{-\alpha k}\,\sin\frac{k\alpha\pi}{2} \le \frac{Mp}{\pi M^{*}}\sum_{k=1}^{\infty} (M^{*})^{-\alpha k} = \frac{Mp}{\pi M^{*}}\cdot\frac{(M^{*})^{-\alpha}}{1 - (M^{*})^{-\alpha}} = \frac{Mp}{\pi M^{*}\big((M^{*})^{\alpha} - 1\big)},$
which tends to zero as M tends to ∞. Here, M* is a number in (M, M + p) that depends on M and p and is guaranteed to exist by the integral form of the mean value theorem. □
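A Monte Carlo check (ours; the Chambers–Mallows–Stuck generator below is a standard recipe for the symmetric case β = 0 with unit scale, not something taken from the paper) supports this decay of M·P(M < X ≤ M + p):

import numpy as np

rng = np.random.default_rng(2)

def sym_stable(alpha: float, size: int) -> np.ndarray:
    # Chambers-Mallows-Stuck sampler for a symmetric alpha-stable variate.
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * U) / W) ** ((1.0 - alpha) / alpha))

x = sym_stable(0.5, 5_000_000)
p = 1.0
for M in (10.0, 100.0, 1000.0):
    prob = np.mean((x > M) & (x <= M + p))
    print(M, M * prob)   # decreases toward zero as M grows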
Another proposition that can be established is the following.
Proposition 4.
Let X and Y be random variables both of which have a weak mean. Then X + Y has a weak mean, and E W ( X + Y ) = E W ( X ) + E W ( Y ) .
Proof. 
This can be shown directly from the definition of the weak mean and Equation (4) at the beginning of Section 3. However, here, we will use the weak law. Let (X+Y)₁, …, (X+Y)ₙ, … be a sequence of independent copies of X + Y with associated product measure Pₙ for the first n of these variables. Let X₁, …, Xₙ, … be a sequence of independent copies of X and Y₁, …, Yₙ, … a sequence of independent copies of Y, and without loss of generality, suppose X₁ + Y₁, …, Xₙ + Yₙ, … are independent. Let Qₙ, Q′ₙ, and Q₂ₙ be the product measure spaces associated respectively with the first n terms of these sequences. Then:
$P_n\left(\left|\frac{(X+Y)_1 + \cdots + (X+Y)_n}{n} - E_W(X) - E_W(Y)\right| < \epsilon\right) = Q_{2n}\left(\left|\frac{X_1 + Y_1 + \cdots + X_n + Y_n}{n} - E_W(X) - E_W(Y)\right| < \epsilon\right)$
$\ge Q_{2n}\left(\left|\frac{X_1 + \cdots + X_n}{n} - E_W(X)\right| < \frac{\epsilon}{2},\ \left|\frac{Y_1 + \cdots + Y_n}{n} - E_W(Y)\right| < \frac{\epsilon}{2}\right)$
$= Q_{2n}\left(\left|\frac{X_1 + \cdots + X_n}{n} - E_W(X)\right| < \frac{\epsilon}{2}\right) + Q_{2n}\left(\left|\frac{Y_1 + \cdots + Y_n}{n} - E_W(Y)\right| < \frac{\epsilon}{2}\right) - Q_{2n}\left(\left|\frac{X_1 + \cdots + X_n}{n} - E_W(X)\right| < \frac{\epsilon}{2}\ \text{or}\ \left|\frac{Y_1 + \cdots + Y_n}{n} - E_W(Y)\right| < \frac{\epsilon}{2}\right)$
$\ge Q_{2n}\left(\left|\frac{X_1 + \cdots + X_n}{n} - E_W(X)\right| < \frac{\epsilon}{2}\right) + Q_{2n}\left(\left|\frac{Y_1 + \cdots + Y_n}{n} - E_W(Y)\right| < \frac{\epsilon}{2}\right) - 1$
$= Q_n\left(\left|\frac{X_1 + \cdots + X_n}{n} - E_W(X)\right| < \frac{\epsilon}{2}\right) + Q'_n\left(\left|\frac{Y_1 + \cdots + Y_n}{n} - E_W(Y)\right| < \frac{\epsilon}{2}\right) - 1.$
Since X and Y have weak means, the last two probabilities each converge to one, so the last line converges to 1 + 1 − 1 = 1 as n tends to ∞. Accordingly, the first expression does also, and our result follows by Theorem 2. □
A similar result is not available for doubly-weak means except for special cases (e.g., linear combinations of independent copies of a symmetric stable distribution).

4. Mollifiers

Richard Feynman was famous for his integration techniques, some of which are recorded in the book of Mathews and Walker [9], based on lectures Feynman gave at Cornell. Feynman’s ideas, as noted earlier, partly motivated our investigation.
Mollifiers are used to aid approximation of the delta function and to smooth functions, but another use is to regularize the behavior at ±∞. Mollifiers fall under the heading of summation methods [15]. The mollifier can then be used to reinterpret integrals, or renormalize them, in a manner that makes them finite. This method is used to “evaluate” the integrals of sin bx and sin x/x on [0, ∞) in [9], pp. 60, 91.
This idea can be used to generalize the notion of the mean. We introduce a function ϕ_λ(x) that depends on a parameter λ so that x ↦ ϕ_λ(x)·x is integrable with respect to P for λ ≠ λ₀ and ϕ_λ(x) → 1 for each x as λ → λ₀. Then, we define:
$L(\phi) \stackrel{\text{def}}{=} \lim_{\lambda\to\lambda_0} E\big(\phi_\lambda(X)\,X\big).$
In the case of the means L ( c ) discussed in earlier sections, the multiplier can be taken to be:
$\phi_\lambda(x) = \chi_{[c - 1/|\lambda|,\ c + 1/|\lambda|]}(x), \quad (9)$
where χ A is the characteristic function of the set A, | λ | = 1 / M , and λ 0 = 0 . Then, L ( ϕ ) is the same as L ( c ) .
However, there are dangers associated with mollifiers that the following example illustrates.
Example 6.
Define a function ϕ λ , D for λ and D in R by:
$\phi_{\lambda,D}(x) = \begin{cases} e^{-|\lambda| x} & \text{if } x \ge 0, \\ e^{|\lambda| x}\,(1 + \pi D |\lambda| x) & \text{if } x < 0. \end{cases}$
Evidently, ϕ_{λ,D} is a well-behaved function, integrable and dying off at ±∞. Furthermore, {ϕ_{λ,D}} converges pointwise to the constant function identically equal to one as λ tends to λ₀ = 0 with D fixed.
Suppose we use this family of functions as a mollifier to determine a mean for a variable obeying the Cauchy distribution. Let m ( λ , D ) be defined by:
$m(\lambda, D) = \int_{-\infty}^{\infty} \phi_{\lambda,D}(x)\,\frac{x}{\pi(1+x^2)}\,dx = \int_0^{\infty} D|\lambda|\,e^{-|\lambda| x}\,\frac{x^2}{1+x^2}\,dx = D\,m(\lambda, 1).$
Now:
$1 = -e^{-|\lambda|x}\Big|_0^{\infty} = \int_0^{\infty} |\lambda|\,e^{-|\lambda|x}\,dx \ge \int_0^{\infty} |\lambda|\,e^{-|\lambda|x}\,\frac{x^2}{1+x^2}\,dx = m(\lambda, 1) \ge \int_K^{\infty} |\lambda|\,e^{-|\lambda|x}\,\frac{x^2}{1+x^2}\,dx \ge \frac{K^2}{1+K^2}\,e^{-|\lambda| K}$
for any positive real number K. Thus:
$1 \ge \limsup_{\lambda\to 0} m(\lambda, 1) \ge \liminf_{\lambda\to 0} m(\lambda, 1) \ge \frac{K^2}{1+K^2}.$
Letting K tend to infinity, we find that lim λ 0 m ( λ , 1 ) = 1 .
Hence, the mollifier-induced mean of the standard Cauchy distribution is:
$L(\phi) = \lim_{\lambda\to 0} \int_{-\infty}^{\infty} \phi_{\lambda,D}(x)\,\frac{x}{\pi(1+x^2)}\,dx = \lim_{\lambda\to 0} m(\lambda, D) = \lim_{\lambda\to 0} D\,m(\lambda, 1) = D \lim_{\lambda\to 0} m(\lambda, 1) = D.$
However, D is arbitrary and depends on the choice of the mollifier!
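Numerically the artifact is plain to see. The sketch below (ours; the grid and cutoff are arbitrary discretization choices) evaluates m(λ, 1) = ∫₀^∞ |λ|e^{−|λ|x} x²/(1+x²) dx via the substitution u = |λ|x and confirms that it tends to one, so the mollified “mean” comes out as D no matter which D was chosen:

import numpy as np

def m_lambda_1(lam: float, n: int = 1_000_000) -> float:
    # After u = lam*x the integral is int_0^inf e^{-u} x^2/(1+x^2) du with x = u/lam.
    u = np.linspace(1e-9, 40.0, n)            # e^{-u} is negligible beyond 40
    x = u / lam
    du = u[1] - u[0]
    return float(np.sum(np.exp(-u) * x**2 / (1 + x**2)) * du)

for lam in (1.0, 0.1, 0.01, 0.001):
    print(lam, m_lambda_1(lam))               # -> 1, hence L(phi) = D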
The underlying problem is that the choice of the mollifier is asymmetric. There are two issues: symmetry and a center of symmetry. The doubly-weak mean discussed in the previous section addresses these two issues by making use of the family of improper integrals (9) and requiring that E W W ( X ) = L ( c ) be the same regardless of the center c.
This motivates the following definition. We call {ϕ_λ} a mollifier for X with center c if and only if for each real number λ in some neighborhood of zero, we are given a function x ↦ ϕ_λ(x) taking ℝ into ℝ with the following properties:
1. x ↦ ϕ_λ(x)·x is integrable with respect to P for λ ≠ 0,
2. lim_{λ→0} ϕ_λ(x) = 1 for each x in ℝ,
3. ϕ_λ(c − x) = ϕ_λ(c + x) for all x ≥ 0, and
4. ϕ_λ(c) = 1 for all λ.
Without loss of generality, we have taken the limiting value of λ to be zero in this definition. Furthermore, since ϕ_λ(c) tends to 1 as λ tends to zero, if {ϕ_λ} does not satisfy Condition (4), we can replace this family by {ϕ_λ/ϕ_λ(c)}.
Examples of mollifiers, in addition to (9), include:
$e^{-|\lambda x|}, \qquad e^{-|\lambda| x^2}, \qquad \frac{\sin \lambda x}{\lambda x}, \qquad \text{and} \qquad \cos \lambda x.$
The examples in (9) have center c, while those just mentioned have center zero for suitable X's. In general, a mollifier with center c can be created by taking one with center zero, call it ϕ_λ(x), and replacing it by ϕ_λ(x − c). However, we cannot be sure that the new mollifier will satisfy the property that x ↦ x·ϕ_λ(x − c) remains integrable with respect to P (although it will be with respect to P_Y where Y = X − c).
Let { ϕ λ } be a mollifier for the random variable X, and as before, let:
$L(\phi) = \lim_{\lambda\to 0} \int \phi_\lambda(x)\,x\,P(dx).$
It is natural to consider how L(ϕ) behaves in relation to the family {L(c) : c ∈ ℝ} described in Theorem 1. We content ourselves with the following theorem and a few examples.
Theorem 5.
Let {ϕ_λ} be a mollifier for the random variable X with center c. Suppose that for each λ ≠ 0, ϕ_λ(x) is an absolutely continuous function of x on each finite subinterval of ℝ. Suppose also that there is a positive constant K such that for each λ ≠ 0, x ↦ ϕ_λ(x) has variation bounded by K on ℝ. If L(c) exists for X and some number c, then L(ϕ) exists and is equal to L(c).
Proof. 
Without loss of generality, we assume that c = 0. Consider the following integral identities where 0 < M < M′:
$\int_{[M,\,M']} x\big(1 - \phi_\lambda(x)\big)\,P(dx) + \int_{[-M',\,-M]} x\big(1 - \phi_\lambda(x)\big)\,P(dx) = -\int_{[M,\,M']} x\left(\int_0^x \phi'_\lambda(t)\,dt\right)P(dx) + \int_{[-M',\,-M]} x\left(\int_x^0 \phi'_\lambda(t)\,dt\right)P(dx)$
$= -\int_0^{M'} \phi'_\lambda(t)\left(\int_{[\max(t,M),\,M']} x\,P(dx)\right)dt + \int_{-M'}^{0} \phi'_\lambda(t)\left(\int_{[-M',\,\min(-M,t)]} x\,P(dx)\right)dt = -\int_0^{M'} \phi'_\lambda(t)\left(\int_{[\max(t,M),\,M']} x\,P(dx) + \int_{[-M',\,-\max(M,t)]} x\,P(dx)\right)dt. \quad (10)$
Here, we have used the differentiability of ϕ_λ, the integrability of ϕ′_λ, and Fubini's theorem. In the last line of (10), we have made a change of variables from t to −t and used the identity ϕ′_λ(−t) = −ϕ′_λ(t).
The last line of (10) has an absolute value less than or equal to:
$\int_0^{M'} \big|\phi'_\lambda(t)\big|\,\left|\int_{[\max(t,M),\,M']} x\,P(dx) + \int_{[-M',\,-\max(M,t)]} x\,P(dx)\right|\,dt. \quad (11)$
Since L(0) is finite, the second expression in absolute values in this integral can be made smaller than ϵ/2K, where ϵ is a given positive number, for M sufficiently large and M ≤ t ≤ M′. However, ϕ_λ has variation bounded by K. Therefore, the entire integral in (11) is smaller than ϵ/2.
Let us also require that M be so large that:
$\left|L(0) - \int_{[-M,\,M]} x\,P(dx)\right| < \epsilon/6. \quad (12)$
Now, pick a positive number δ so that |λ| < δ implies that:
$\left|\int_{[-M,\,M]} x\big(1 - \phi_\lambda(x)\big)\,P(dx)\right| < \epsilon/6. \quad (13)$
This can be done since for fixed M, the integrand in (13) is bounded due to the bounded variation of ϕ λ , and hence, the Lebesgue dominated convergence theorem implies that the integral inside the absolute value tends to zero as λ tends to zero.
Finally, given a nonzero λ in (−δ, δ), choose M′ larger than M so that:
$\left|\int_{-\infty}^{\infty} x\,\phi_\lambda(x)\,P(dx) - \int_{[-M',\,M']} x\,\phi_\lambda(x)\,P(dx)\right| < \epsilon/6. \quad (14)$
Combining the estimate for the integrals in (10) resulting from (11) with (12)–(14), we obtain that for 0 < |λ| < δ:
$\left|L(0) - \int_{-\infty}^{\infty} x\,\phi_\lambda(x)\,P(dx)\right| < \epsilon/2 + 3\epsilon/6 = \epsilon,$
which establishes the theorem. □
Theorem 5 indicates that some mollifiers are guaranteed to yield the same answer for the mean as our earlier techniques. One question, though, is whether they will sometimes give an answer when the previous methods do not.
A theorem of Abel asserts that a power series in z that converges at a point z₀ on the unit circle is absolutely convergent at each point in the open unit disk. Moreover, the analytic function f(z) that it converges to has the property that lim_{z→z₀} f(z) = f(z₀) provided z tends to z₀ non-tangentially from the interior of the unit disk. In particular, $\lim_{r\to 1^-} \sum a_n r^n = \sum a_n$ provided the latter converges.
A well-known counterexample to the converse of Abel's theorem is the series $\sum_{n\ge 0} (-1)^n$. This series diverges since its partial sums oscillate between one and zero. However, $\sum_{n\ge 0} (-1)^n r^n$ converges to 1/(1+r) for |r| < 1 and, as r tends to 1⁻, tends to 1/2. The quantity rⁿ serves as a kind of mollifier for the original series and enables a kind of convergence.
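The same phenomenon takes a few lines to see in Python (our sketch; the truncation at 100,000 terms stands in for the infinite series):

# Partial sums of sum (-1)^n oscillate and never settle...
print([sum((-1) ** n for n in range(N)) for N in range(1, 9)])

# ...but the Abel-mollified sums converge to 1/2 as r -> 1-.
for r in (0.9, 0.99, 0.999):
    print(r, sum((-1) ** n * r ** n for n in range(100_000)))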
Example 7.
We can use the above to find a counterexample to the converse of Theorem 5, i.e., a case where L ( ϕ ) exists even though L ( c ) fails to exist for all c. Let X be a random variable taking the values:
$x_n = n^3 + (-1)^{|n|}\,n^2$
for each non-zero integer n with $P(X = x_n) = K n^{-2}$, where $\frac{1}{K} = 2\left(\sum_{n \ge 1} \frac{1}{n^2}\right) = \frac{\pi^2}{3}$.
Let θ: [0, ∞) → [0, ∞) be defined in the following way: for each positive integer n, set θ(x) = n if n³ − n² ≤ x ≤ n³ + n², and let θ take the interval [n³ + n², (n+1)³ − (n+1)²] onto the interval [n, n+1] in an increasing (and smooth) fashion. A consequence of this definition is that θ(|x_n|) = |n| for non-zero integers n.
Next, we define ϕ_λ: ℝ → [0, 1] by ϕ_λ(x) = e^{−|λ|(θ(|x|)−1)} for x in ℝ. Evidently, {ϕ_λ} satisfies Conditions 2, 3, and 4 of the definition of a mollifier for X with center zero. Furthermore, ϕ_λ can even be taken to be C^∞ (but not analytic!). To confirm that x ↦ x·ϕ_λ(x) is integrable with respect to the probability measure P associated with X, note that this measure is given by:
$P = \sum_{n=1}^{\infty} \frac{K}{n^2}\big(\delta_{x_n} + \delta_{x_{-n}}\big).$
Then:
$\int_{\mathbb{R}} |x\,\phi_\lambda(x)|\,P(dx) = K \sum_{n=1}^{\infty} e^{-|\lambda|(\theta(|x_n|)-1)}\,\frac{|x_n| + |x_{-n}|}{n^2} \le K \sum_{n=1}^{\infty} \frac{2(n^3 + n^2)}{n^2}\,r^{n-1} = \frac{2K(2-r)}{(1-r)^2} < \infty,$
where r = e^{−|λ|} with λ ≠ 0.
If we now try to calculate the improper integral L ( c ) for X for some number c, we obtain:
$\int_{[c-M,\,c+M]} x\,P(dx) = \sum_{\{n :\, c-M \le x_n \le c+M,\ n \ne 0\}} \frac{K x_n}{n^2} = K \sum_{\{n :\, c-M \le x_n \le c+M,\ n \ne 0\}} n \ + \ K \sum_{\{n :\, c-M \le x_n \le c+M,\ n \ne 0\}} (-1)^{|n|}.$
To see that L(c) does not exist for any c, consider the following. The first sum in the last expression, for M sufficiently large, will be the sum of consecutive nonzero integers from n₀ to n₁ with n₀ < 0 < n₁, and the second sum will be $\big((-1)^{n_0} + (-1)^{n_0+1} + \cdots + (-1)^{n_1}\big) - (-1)^0$, where the power (−1)⁰ is added and then subtracted. The second sum will thus be either 0, −1, or −2, depending on the parity of n₀ and n₁. The first sum will oscillate among zero, an increasingly large positive number, and an increasingly large negative number. Unless the first sum is eventually equal to zero for all sufficiently large M, there is no way the combined sum can have a finite limit. However, if the first sum is identically zero for large M, then n₀ = −n₁, and the second sum equals zero or −2 depending on the parity of n₁. If M yields one parity for n₁, a slight increase in M will reverse the parity. Therefore, the sum will oscillate between zero and −2.
Now, consider the existence of L ( ϕ ) where we use the mollifier ϕ λ just defined.
$\int_{\mathbb{R}} x\,\phi_\lambda(x)\,P(dx) = \sum_{\{n :\, n \ne 0\}} \frac{K x_n}{n^2}\,r^{|n|-1} = K\left(\sum_{\{n :\, n \ne 0\}} n\,r^{|n|-1} + \sum_{\{n :\, n \ne 0\}} (-1)^{|n|}\,r^{|n|-1}\right) = -\frac{2K}{1+r}.$
As λ tends to zero, r tends to one and the above tends to L(ϕ) = −K. Hence, L(ϕ) exists even though L(c) exists for no c.

5. Cumulative Distributions and Characteristic Functions

Two familiar alternative formulas for the ordinary mean are:
$\int_0^{\infty} \big(1 - F(t) - F(-t)\big)\,dt \quad (15)$
and:
$-i\,\frac{d}{dt} E\big(e^{itX}\big)\Big|_{t=0}. \quad (16)$
Here, $F(t) \stackrel{\text{def}}{=} P(X \le t)$ for t in ℝ and $F(t^-) = P(X < t) = \lim_{x\to t^-} F(x)$. We review these formulas below and relate them to our notions of the mean.
In the case of the cumulative distribution, we have the following theorem.
Theorem 6.
Let X be a random variable with probability distribution P and cumulative distribution F. Then:
(i) 
E(X) exists if and only if 1 − F(t) and F(−t) are in L¹([0, ∞)) with respect to the Lebesgue measure. In this case, $E(X) = \int_0^{\infty} \big(1 - F(t) - F(-t)\big)\,dt$.
(ii) 
If $\lim_{M\to\infty} M\big(P(c+M < X) - P(c-M > X)\big) = 0$ for some real number c, then L(c) exists if and only if $\lim_{M\to\infty} \int_0^{M} \big(1 - F(t) - F(-t)\big)\,dt$ exists. In this case, L(c) is given by the limit of the integral.
(iii) 
If L(c) and $\lim_{M\to\infty} \int_0^{M} \big(1 - F(t) - F(-t)\big)\,dt$ both exist for some real number c and are equal, then $\lim_{M\to\infty} M\big(P(c+M < X) - P(c-M > X)\big) = 0$.
Proof. 
In order for E(X) to exist, it is necessary and sufficient that x be integrable with respect to P on (−∞, ∞). In particular, this is equivalent to x being integrable with respect to P on both [0, ∞) and (−∞, 0], with:
$\int_{[0,\infty)} x\,P(dx) = \int_{[0,\infty)} \int_0^x dt\,P(dx) = \int_0^{\infty} P(X > t)\,dt = \int_0^{\infty} \big(1 - F(t)\big)\,dt$
and:
$-\int_{(-\infty,0]} x\,P(dx) = \int_{(-\infty,0]} \int_x^0 dt\,P(dx) = \int_0^{\infty} P(-t \ge X)\,dt = \int_0^{\infty} F(-t)\,dt,$
where Fubini's theorem ([11], p. 386) has been applied. The statement of Fubini's theorem implies that integrability of x occurs if and only if 1 − F(t) and F(−t) are both integrable.
Turning to (ii) and (iii) of Theorem 6, for large M, consider:
$\int_{[c-M,\,c+M]} x\,P(dx) = \int_{[0,\,c+M]} x\,P(dx) + \int_{[c-M,\,0)} x\,P(dx) = \int_{[0,\,c+M]} \int_0^x dt\,P(dx) - \int_{[c-M,\,0)} \int_x^0 dt\,P(dx) = \int_0^{c+M} P(c+M \ge X > t)\,dt - \int_{c-M}^{0} P(t \ge X \ge c-M)\,dt,$
where again Fubini's theorem has been used. This entire expression differs from:
$\int_0^{M_1} \big(1 - F(t) - F(-t)\big)\,dt = \int_0^{M_1} \big(P(X > t) - P(-t \ge X)\big)\,dt,$
where we have introduced a new variable M₁, by the quantity:
$-\int_0^{c+M} P(X > c+M)\,dt + \int_{c-M}^{0} P(c-M > X)\,dt + \int_{M_1}^{c+M} P(X > t)\,dt - \int_{M_1}^{M-c} P(-t \ge X)\,dt = -M\,P(X > c+M) + M\,P(c-M > X) - c\,P(X > c+M) - c\,P(c-M > X) + \int_{M_1}^{c+M} P(X > t)\,dt - \int_{M_1}^{M-c} P(-t \ge X)\,dt. \quad (17)$
If we suppose that $\lim_{K\to\infty} \int_0^{K} \big(1 - F(t) - F(-t)\big)\,dt$ exists, then the combination of the last two terms tends to zero as M and M₁ tend to ∞, since $\int_{M-c}^{M+c} P(X > t)\,dt$ tends to zero as M gets large. Likewise, the previous two terms with multiplier c tend to zero as M gets large. Therefore, L(c) exists and equals $\lim_{M_1\to\infty} \int_0^{M_1} \big(1 - F(t) - F(-t)\big)\,dt$ if and only if $\lim_{M\to\infty} M\big(P(X > c+M) - P(c-M > X)\big) = 0$.
Writing M₁ = c + M₂, we can express the difference described above as:
$\int_0^{c+M} P(c+M \ge X \ge t)\,dt - \int_{c-M}^{0} P(t \ge X \ge c-M)\,dt - \int_0^{c+M_2} P(X > t)\,dt + \int_{-c-M_2}^{0} P(t \ge X)\,dt$
$\quad = -M_2\,P(X > c+M_2) + M_2\,P(c-M_2 > X)$
$\qquad - c\,P(X > c+M_2) + c\,P(c-M_2 > X)$
$\qquad + \int_{-c-M_2}^{c-M_2} P(t \ge X \ge c-M_2)\,dt$
$\qquad + \int_0^{c+M} P(c+M \ge X \ge t)\,dt - \int_{c-M}^{0} P(t \ge X \ge c-M)\,dt$
$\qquad - \int_0^{c+M_2} P(c+M_2 \ge X \ge t)\,dt + \int_{c-M_2}^{0} P(t \ge X \ge c-M_2)\,dt.$
As M and M₁, and hence M₂, tend to infinity, in case L(c) exists, the last two lines tend to zero (compare with (17)). The two lines above these also tend to zero. Accordingly, $\int_0^{M_1} \big(1 - F(t) - F(-t)\big)\,dt$ tends to L(c) as M₁ tends to infinity if and only if $\lim_{M_2\to\infty} \big(M_2\,P(X > c+M_2) - M_2\,P(c-M_2 > X)\big) = 0$. □
Theorem 6 suggests the possibility that the difference between 1 − F(t) and F(−t) might be in L¹([0, ∞)) even though neither function individually is, or even more generally that the difference may not be integrable, but still might have a finite improper integral on [0, ∞). In these cases, provided:
$\lim_{M\to\infty} \big(M\,P(c+M < X) - M\,P(c-M > X)\big) = 0 \quad (18)$
for some c, then X has a finite superweak mean given by (15). If, in addition to a finite improper integral, (18) holds for all c, then X has a doubly-weak mean given by Theorem 6 Part (ii). If X has a weak mean, then the Kolmogorov condition $\lim_{n\to\infty} n\,P(|X| > n) = 0$ ensures that (18) holds for all c, and accordingly the weak mean is automatically given by (15).
Example 8.
Example 2 can be reused. Since the distribution in that example is symmetric about zero, L(0) exists and equals zero, and $M\big(P(X > M) - P(-M > X)\big) \equiv 0$ for all M ≥ 0. Hence:
$\lim_{M\to\infty} \int_0^{M} \big(1 - F(t) - F(-t)\big)\,dt = 0.$
Since L ( c ) does not exist for any non-zero c, by Theorem 6, Equation (18) cannot be true for any non-zero c.
Now, let us consider the expression (16) based on the derivative of the characteristic function. A natural question is to ask when this derivative exists, i.e., when the limit as t tends to zero exists for the expression:
$\int \frac{e^{itx} - 1}{it}\,P(dx).$
Note that $\frac{e^{itx}-1}{it}$ is an integrable function of x with respect to P for t non-zero and real and that $\left|\frac{e^{itx}-1}{it}\right| = \left|\int_0^x e^{itu}\,du\right| \le |x|$ for such t. Furthermore, we can decompose the integral into a real and imaginary part arriving at:
$\int \frac{e^{itx}-1}{it}\,P(dx) = \int \frac{\sin tx}{t}\,P(dx) + i\int \frac{1-\cos tx}{t}\,P(dx).$
The real part of this is precisely the expression one would obtain with the mollifier $\frac{\sin tx}{tx}$ applied to x, and the imaginary part presumably under some circumstances will tend to zero as t tends to zero. A more detailed look is permitted by the following expansion, where we consider the integral from −M to M instead of from −∞ to ∞ and apply Fubini to get:
$\int_{-M}^{M} \frac{\sin tx}{t}\,P(dx) + i\int_{-M}^{M} \frac{1-\cos tx}{t}\,P(dx) = \int_{-M}^{M}\left(\int_0^x \cos tu\,du\right)P(dx) + i\int_{-M}^{M}\left(\int_0^x \sin tu\,du\right)P(dx) = \int_0^{M}(\cos tu)\big(P(u \le X \le M) - P(-M \le X \le -u)\big)\,du + i\int_0^{M}(\sin tu)\big(P(u \le X \le M) + P(-M \le X \le -u)\big)\,du. \quad (19)$
This discussion is summarized in the following theorem due to Pitman [16].
Theorem 7.
Let X be a random variable with probability distribution P and cumulative distribution F. Then X has a weak mean if and only if the derivative of the characteristic function of X exists at t = 0, in which case E_W(X) is given by that derivative via (16), i.e.,
$E_W(X) = \lim_{t\to 0}\int \frac{e^{itx}-1}{it}\,P(dx) = \lim_{t\to 0}\int \frac{\sin tx}{tx}\,x\,P(dx) = \lim_{t\to 0}\lim_{M\to\infty}\int_0^{M}(\cos tu)\big(P(u \le X \le M) - P(-M \le X \le -u)\big)\,du.$
Proof. 
See [16]. □
An implication of Theorem 7 is that the imaginary parts of the expressions represented in (19) all go to zero in the appropriate limits. Note that the second expression for E W ( X ) in Theorem 7 is obtained by a mollifier.
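A closed-form sketch (ours) makes the contrast concrete for a Cauchy variable centered at a, whose characteristic function is e^{ita−|t|}. The real part of the difference quotient, Im(cf(t))/t = e^{−|t|} sin(at)/t, recovers the center a — the doubly-weak mean — as t → 0, even though the full derivative at t = 0, and hence the weak mean, does not exist because of the |t| term:

import cmath

a = 3.0
for t in (1.0, 0.1, 0.01, 0.001):
    cf = cmath.exp(complex(-abs(t), a * t))   # E[e^{itX}] = exp(ita - |t|)
    print(t, cf.imag / t)                     # sin-mollified mean -> a = 3.0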

6. Conclusions

The legitimacy and importance of the Cauchy distribution stem in part from the fact that it is the quotient of two independent standard normal random variables. It has application in physics under the name of the Lorentz distribution. Indeed, long-tailed and counter-intuitive distributions are increasingly important in recent times in financial mathematics, the study of natural and man-made disasters, and computer network analysis (see Gumbel [17], Resnick [18], or Taleb [19]).
The stable distributions, as discussed in [5,6], are the limits of the sums of independent random variables and necessarily arise in limit processes of renewal theory and random walks. They also arise in signal processing [7].
If there is only one heavy tail, then the mean is often infinite. However, the considerations here may still apply to the surrogate variable log X where X has values between zero and ∞. The tails that result are often asymmetric, but it sometimes happens that log X is heavy-tailed at both ends. For this to happen, it suffices if P(X ≤ t) = C₁/(log(1/t))^α with 0 < α ≤ 1 for 0 < t ≤ t₁ < 1 and P(X ≥ t) = C₂/(log t)^β with 1 ≥ β > 0 for t ≥ t₂ > 1, with suitable positive constants C₁ and C₂.
Extending the notion of the mean to the Cauchy distribution, other stable distributions, surrogate variables, and other heavy-tailed cases, and investigating these extensions are ways of moving us beyond the standard framework into realms where engineers and scientists may benefit from mathematical insight.

Author Contributions

Both authors contributed substantially to the conceptualization and writing of this article.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Whittle, P. Probability via Expectation; Springer: New York, NY, USA, 1992. [Google Scholar]
  2. Silver, N. President’s Invited Address, JSM 2013, Montreal, Canada. 2013. Available online: https://ww2.amstat.org/meetings/jsm/2013/webcasts/index.cfm (accessed on 15 March 2019).
  3. Levitt, S. Thinking Differently about Big Data. Arthur M. Sackler Lecture, National Academy of Sciences, March 26, 2015. Available online: https://www.youtube.com/watch?v=r5jATFtKtl8 (accessed on 15 March 2019).
  4. Kosko, B. The Sample Mean. The Edge: What Have You Changed Your Mind About? Why? 2008. Available online: http://www.edge.org/responses/what-have-you-changed-your-mind-about-why (accessed on 15 March 2019).
  5. Feller, W.E. An Introduction to Probability Theory and Its Applications; John Wiley & Sons, Inc.: New York, NY, USA, 1966; Volume 2. [Google Scholar]
  6. Gnedenko, B.V.; Kolmogorov, A.N. Limit Distributions for Sums of Independent Random Variables; Translated and Annotated by Chung, K.L.; Addison-Wesley: Cambridge, MA, USA, 1954. [Google Scholar]
  7. Nikias, C.L.; Shao, M. Signal Processing with Alpha-Stable Distributions and Applications; John Wiley & Sons: New York, NY, USA, 1995. [Google Scholar]
  8. Gray, J.E.; Vogt, A. Axiomatics for the Mean Using Bemporad’s Condition. Aequat. Math. 2015, 89, 1415–1431. [Google Scholar] [CrossRef]
  9. Mathews, J.; Walker, R.L. Mathematical Methods of Physics; Benjamin: New York, NY, USA, 1970. [Google Scholar]
  10. Pollard, D. A User’s Guide to Measure Theoretic Probability; Cambridge University Press: Cambridge, UK, 2002. [Google Scholar]
  11. Hewitt, E.; Stromberg, K. Real and Abstract Analysis; Springer: New York, NY, USA, 1965. [Google Scholar]
  12. Kolmogorov, A.N. Foundations of the Theory of Probability; Chelsea Publishing Company: New York, NY, USA, 1950. [Google Scholar]
  13. Kolmogorov, A.N. Über die Summen durch den Zufall bestimmter unabhängiger Grössen. Math. Ann. 1928, 99, 309–319. [Google Scholar] [CrossRef]
  14. Kolmogorov, A.N. Bemerkung zu meiner Arbeit “Über die Summen zufälliger Grössen”. Math. Ann. 1929, 102, 484–488. [Google Scholar] [CrossRef]
  15. Volkov, I.I. Summation Methods. Encyclopedia of Mathematics. Available online: http://www.encyclopediaofmath.org/index.php?title=Summation_methods&oldid=11209 (accessed on 15 March 2019).
  16. Pitman, E.J.G. On the derivation of a characteristic function at the origin. Ann. Math. Stat. 1956, 27, 1156–1160. [Google Scholar] [CrossRef]
  17. Gumbel, E.J. Statistics of Extremes; Dover: Mineola, NY, USA, 2004. [Google Scholar]
  18. Resnick, S.I. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling; Springer Science+Business Media LLC: New York, NY, USA, 2007. [Google Scholar]
  19. Taleb, N.N. The Black Swan: The Impact of the Highly Improbable; Random House: New York, NY, USA, 2010. [Google Scholar]
