Article

Cantelli’s Bounds for Generalized Tail Inequalities

by Nicola Apollonio 1,2
1 Istituto per le Applicazioni del Calcolo, C.N.R., Via dei Taurini 19, 00185 Roma, Italy
2 Istituto Nazionale di Alta Matematica Francesco Severi, Piazzale Aldo Moro 5, 00185 Roma, Italy
Axioms 2025, 14(1), 43; https://doi.org/10.3390/axioms14010043
Submission received: 6 December 2024 / Revised: 3 January 2025 / Accepted: 4 January 2025 / Published: 6 January 2025
(This article belongs to the Special Issue Numerical Analysis and Applied Mathematics)

Abstract

Let $X$ be a centered random vector in a finite-dimensional real inner product space $\mathcal{E}$. For a subset $C$ of the ambient vector space $V$ of $\mathcal{E}$ and $x, y \in V$, write $x \preceq_C y$ if $y - x \in C$. If $C$ is a closed convex cone in $\mathcal{E}$, then $\preceq_C$ is a preorder on $V$, whereas if $C$ is a proper cone in $\mathcal{E}$, then $\preceq_C$ is actually a partial order on $V$. In this paper, we give sharp Cantelli-type inequalities for generalized tail probabilities such as $\Pr(X \succeq_C b)$ for $b \in V$. These inequalities are obtained by “scalarizing” $X \succeq_C b$ via cone duality and then by minimizing the classical univariate Cantelli bound over the scalarized inequalities. Three diverse applications to random matrices, tails of linear images of random vectors, and network homophily are also given.

1. Introduction

Let $Y$ be a random variable with finite mean $\mu$ and variance $\sigma^2$. Thus, the random variable $X = Y - \mu$ is centered and has the same variance as $Y$. For a positive real number $b$, the celebrated Cantelli inequality—also known as the one-sided Chebyshev inequality—reads as follows:
$$\Pr(X \ge b) \le \frac{\sigma^2}{b^2 + \sigma^2}. \qquad (1)$$
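As a quick numerical sanity check (not part of the original argument; the exponential distribution and the thresholds below are arbitrary illustrative choices), the following Python sketch compares the empirical tail of a centered random variable with the bound (1).

```python
import numpy as np

rng = np.random.default_rng(0)
# Centered exponential variable: Y ~ Exp(1), X = Y - 1, so sigma^2 = 1.
sigma2 = 1.0
x = rng.exponential(scale=1.0, size=1_000_000) - 1.0

for b in (0.5, 1.0, 2.0):
    empirical = np.mean(x >= b)              # Pr(X >= b)
    cantelli = sigma2 / (b**2 + sigma2)      # bound (1)
    print(f"b={b}: empirical={empirical:.4f} <= bound={cantelli:.4f}")
```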
Both Cantelli's inequality and the classical Chebyshev inequality can be (and have been) extended in several ways [1,2] to a random vector $X = (X_1, \dots, X_n)'$ in $\mathbb{R}^n$—here and throughout the rest of the paper, $u'$ denotes the transpose of the column vector $u \in \mathbb{R}^n$. As shown in [1,2], there is a standard recipe that yields such extensions: let $X$ be a random vector supported by a subset $S$ of $\mathbb{R}^n$ and let $\Sigma$ be the covariance matrix of $X$. Let $T$ be a Borel subset of $S$ and $f: S \to \mathbb{R}$ be such that $f(x) \ge 0$ for all $x \in S$ and $f(x) \ge 1$ for all $x \in T$. Then, with $\mathbf{1}_T(\cdot)$ denoting the indicator of the set $T$ over $S$, one has $f \ge \mathbf{1}_T$ and
$$\mathbb{E}\,f(X) \ge \mathbb{E}\,f(X)\mathbf{1}_T(X) \ge \mathbb{E}\,\mathbf{1}_T(X) = \Pr(X \in T).$$
This technique is essentially a “Markov inequality” argument. By taking $f$ in the family $\{f_u \mid f_u: \mathbb{R}^n \to \mathbb{R}_+,\ u \in \mathbb{R}^n\}$, where $f_u(x) = \left(\frac{u'x + u'\Sigma u}{1 + u'\Sigma u}\right)^2$, and minimizing over $u$ under the constraint $f_u \ge \mathbf{1}_T$, Marshall and Olkin obtained the following strong and general result.
Theorem 1 
(Marshall and Olkin [1]). Let $T$ be a closed convex set in $\mathbb{R}^n$ not containing the origin. If $X$ is a centered random vector in $\mathbb{R}^n$ with a positive-definite covariance matrix $\Sigma$, then
$$\Pr(X \in T) \le \inf_{\substack{u \in \mathbb{R}^n \\ u'x \ge 1\ \forall x \in T}} \frac{u'\Sigma u}{1 + u'\Sigma u}. \qquad (2)$$
Furthermore, the inequality is sharp, in the sense that there exists a centered random vector $X_0$ whose support contains $T$ and whose covariance matrix is $\Sigma$ such that the inequality is attained as an equality.
Note that Cantelli's inequality (1) follows from inequality (2) after dividing the univariate random variable $X$ by the positive threshold $b$ and observing that the variance of $X/b$ is $\sigma^2/b^2$. The minimization problem on the right-hand side of (2) is solved by minimizing the quadratic form $u'\Sigma u$ over the same set. This is a convex minimization problem that can be solved using the techniques described in [3]. As proved in [1], the infimum in (2) is attained. The function $f_{\hat{u}}$ corresponding to the vector $\hat{u}$ that attains the infimum in inequality (2) can be seen as a kind of envelope of a given shape (in this case quadratic) for the probability being bounded. The same inequality can be interpreted in the following way: first, we linearly approximate $T$ inside the probability. This approximation yields a family of linear inequalities, each of which is the tail of a scalar random variable. We then use Cantelli's inequality (1) to bound each of these tails and, finally, choose the tightest one. Let us describe this process for a non-empty arbitrary Borel subset $T$ of $\mathbb{R}^n$. Let $b(T) = \{u \in \mathbb{R}^n \mid u'x \ge 1,\ \forall x \in T\}$—note that $b(T)$ is always a closed convex set regardless of the argument $T$ (see Section 2 for more details) and hence a Borel set. Since, by definition, $T \subseteq b(b(T))$ and $x \in b(b(T)) \iff u'x \ge 1,\ \forall u \in b(T)$, it follows that
$$\Pr(X \in T) \le \Pr\big(X \in b(b(T))\big) = \Pr\big(u'X \ge 1,\ \forall u \in b(T)\big),$$
where $X$ is a centered random vector with a positive-definite covariance matrix $\Sigma$. Here, for all $u \in b(T)$, the random variable $u'X$ is a centered random variable with variance $u'\Sigma u$. Thus, by Cantelli's inequality, for all $u \in b(T)$, it holds that
$$\Pr(X \in T) \le \frac{u'\Sigma u}{1 + u'\Sigma u}.$$
Thus, if $T$ is a non-empty convex set, after taking the infimum over $b(T)$, we recover (2) from another perspective. It is easy to see that the same result holds in any finite-dimensional Euclidean space $\mathcal{E}$. If $T$ is of the form $b + C$, where $C$ is a convex cone in $\mathcal{E}$, then we provide a specialized Cantelli bound that is sharp. Furthermore, if $C$ has a non-empty interior, then $C$ induces a preorder $\preceq_C$ on the ambient space of $\mathcal{E}$ such that the event $(X \in T)$ can be written as $(X \succeq_C b)$ and can be interpreted as a generalized tail inequality (we recover classical tail inequalities when $C$ is the non-negative orthant). Thus, Cantelli's inequality naturally extends to generalized tail inequalities in finite-dimensional Euclidean spaces. While such an extension is not really genuine since, in the finite-dimensional case, it can be derived by standard arguments from the special case of the standard real $n$-dimensional Euclidean space (see Lemma 7), it nonetheless allows a tight sharpening of the general case (Corollary 1) and also allows seemingly more complicated events to be cast into the framework of tail inequalities, as they occur, for example, in random matrix theory. Consider the case where $X$ is a random matrix sampled from a symmetric real ensemble (see Section 4.1 for precise definitions). A fundamental problem in this context is to understand, at least asymptotically, when the order of the matrix goes to $\infty$, the probability that the smallest eigenvalue of $X$ is positive. This problem is completely and precisely solved for Gaussian matrices (see [4,5,6]): the probability of sampling a positive-definite matrix from a symmetric Gaussian ensemble goes to zero exponentially fast. In this paper, using Corollary 1 in Section 3, we prove that the same result holds more generally for Wigner matrices of the form $\frac{M + M'}{2}$, where $M$ is a random matrix whose entries are i.i.d. centered random variables, although the rate we can provide is much weaker. The result follows by looking at the random matrix $X$ as a random vector in the Euclidean space $\mathcal{E} = (\mathcal{S}^n, \langle \cdot, \cdot \rangle_F)$, where $\mathcal{S}^n$ is the real vector space of real symmetric matrices and $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product. If $\mathcal{S}^n_+$ denotes the cone of positive semi-definite matrices lying in $\mathcal{E}$ and $\succeq$ is the partial order induced by $\mathcal{S}^n_+$, then, for a real number $\epsilon$, with $I$ being the identity matrix of order $n$, the generalized tail $(X \succeq_{\mathcal{S}^n_+} \epsilon I)$ is the event that occurs when the least eigenvalue of $X$ is at least $\epsilon$. Another similar motivation for studying generalized tails comes from the need to compute the probability of feasibility of systems of linear inequalities whose unknowns are real random variables and whose coefficients are deterministic (see Section 4.2). This problem can be cast in the context of concentration and tail inequalities for linear, possibly random, images of random vectors [7,8]: Ref. [8] studies the related problem of determining general concentration inequalities, while Ref. [7] investigates the more general case of tails of random linear images of random vectors, which, in our interpretation, corresponds to the case where both the coefficients and the unknowns of the system are random variables.
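The scalarization recipe just described is easy to try numerically. The following Python sketch (an illustration with assumed data, not the paper's code) minimizes $u'\Sigma u$ subject to $u'x \ge 1$ on $T$ for a small polytope $T$ given by its vertices, where the infinite family of constraints reduces to the vertices by convexity, and then reports the resulting Marshall–Olkin bound (2).

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative polytope T = conv{(2,1), (1,3), (3,3)} (it does not contain the origin)
vertices = np.array([[2.0, 1.0], [1.0, 3.0], [3.0, 3.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

def objective(u):
    # minimizing u' Sigma u also minimizes u' Sigma u / (1 + u' Sigma u)
    return u @ Sigma @ u

# u'x >= 1 for every x in T reduces to the vertices, since T is their convex hull
constraints = [{"type": "ineq", "fun": (lambda u, v=v: u @ v - 1.0)} for v in vertices]

res = minimize(objective, x0=np.array([0.5, 0.5]), constraints=constraints)
q = res.fun
print("Marshall-Olkin bound on Pr(X in T):", q / (1.0 + q))
```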
In this paper, we give a Cantelli-type bound on the probability of feasibility of a system of linear inequalities with random unknowns and show that the “ordinary tails” of the linear image $f(X)$ of a random vector $X$ under a surjective map $f$ are just generalized tails for the random vector $X$ itself, taken with respect to a polyhedral cone. Equivalently, by pursuing the interpretation of tails of linear images as solution sets of a linear system of inequalities in the random vector $X$, we show that if the system has a right-invertible coefficient matrix, then its solution set is a generalized tail for $X$ taken with respect to a polyhedral cone: the rows of the coefficient matrix are the normal vectors of the proper maximal faces of the cone.
It is clear from the discussion above that tail inequalities for random vectors depend crucially on their covariance functions $\Sigma$. When $\Sigma$ has some additional structure, for special choices of the threshold $b$, Cantelli's inequality for generalized tails reads as a very simple expression in terms of certain norms of $b$. For instance (see Section 3 for details), for $b \ge 0$, if $\Sigma^{-1}$ is a non-negative matrix, then
$$\Pr(X \ge b) \le (1 + \|b\|_M^2)^{-1},$$
where $\|b\|_M = \sqrt{b'\Sigma^{-1}b}$ is the Mahalanobis norm of $b$. Note that inequality (1) specializes to the inequality above after dividing its numerator and denominator by $\sigma^2$. The assumption $\Sigma^{-1} \ge 0$ is only seemingly artificial: it is often satisfied by finitely supported random vectors that occur in network science. In Section 4.3, we will give one such important case. In light of Theorem 7, which rests on observations from [9], having easily computable bounds on tail probabilities, like the one above, is generally the best we can hope for, even if we have no control over the quality of the approximation.
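For concreteness, the bound above costs only one linear solve to evaluate. A minimal sketch, with an arbitrary covariance whose inverse happens to be entrywise non-negative:

```python
import numpy as np

Sigma = np.array([[2.0, -0.5], [-0.5, 1.0]])    # its inverse has non-negative entries
b = np.array([1.0, 2.0])

mahalanobis_sq = b @ np.linalg.solve(Sigma, b)  # ||b||_M^2 = b' Sigma^{-1} b
print("Pr(X >= b) <=", 1.0 / (1.0 + mahalanobis_sq))
```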
The rest of the paper is organized as follows: In Section 2, we develop the machinery to state and prove the main results provided in Section 3, while in Section 4, we apply our results to the probability of sampling a positive-definite Wigner matrix, to the probability of feasibility of a system of linear inequalities, and to testing homophily in networks.

2. Preparatory Results

Finite-dimensional Euclidean spaces, namely, finite-dimensional real vector spaces $V$ equipped with an inner product $\langle \cdot, \cdot \rangle$, are denoted by calligraphic letters such as $\mathcal{E}$. The symbol $\mathbb{R}^n$ stands for the Euclidean space $(\mathbb{R}^n, \langle \cdot, \cdot \rangle)$, where $\langle \cdot, \cdot \rangle$ is the standard dot product in $\mathbb{R}^n$. In the following, when we talk about Euclidean spaces, we mean finite-dimensional Euclidean spaces over the reals.
When dealing with random vectors, the Borel sets we are interested in have the form $b + C$, where $b \in V$ and $C \subseteq V$ is a non-empty Borel set, often a cone. A cone in $\mathcal{E}$ is a subset $C$ of $V$ that is closed under taking positive scalar multiples, i.e., $\lambda C \subseteq C$ for every $\lambda > 0$, while a convex cone $C$ is a cone that is closed under taking sums, i.e., $C + C \subseteq C$. The empty set, the set consisting only of the zero vector of $V$, and $V$ itself are the trivial cones. In the following, when we speak of cones, we mean non-trivial cones. Crucial to the definition of generalized inequality is the following notion of duality for subsets of Euclidean spaces. First, identify the algebraic dual $V^*$ of $V$ with $V$ by the inner product in $\mathcal{E}$ via the isomorphism $V \ni v \mapsto \langle v, \cdot \rangle \in V^*$. Let $C$ be a non-empty subset of $V$. The dual $C^*$ of $C$ in $\mathcal{E}$ is the set
$$C^* = \{u \in V \mid \langle u, v \rangle \ge 0,\ \forall v \in C\}.$$
Write $C^{**}$ for the double dual of $C$, namely, $(C^*)^*$. The following facts, the first three of which are simple consequences of the definition, are known about the dual of a non-empty set $C$ (see [10,11]).
Lemma 1. 
Let $C \subseteq V$. Then,
(a) 
$C^*$ is always a closed convex cone;
(b) 
$C_1 \subseteq C_2 \Rightarrow C_2^* \subseteq C_1^*$, i.e., duality is inclusion-reversing;
(c) 
$C \subseteq C^{**}$;
(d) 
$C = C^{**}$ if and only if $C$ is a closed convex cone.
Proof. 
See [10,11].    □
The last property in the lemma implies that $C^* = C^{***}$ because $C^*$ is a closed convex cone by (a). A cone $C$ is proper whenever $C$ is a closed convex cone that is also pointed, i.e., $C \cap -C = \{0\}$, where $0$ is the zero vector of $V$, and has a non-empty interior. A cone $C$ is self-dual in $\mathcal{E}$ if $C = C^*$.
For a random vector $X = (X_1, \dots, X_n)' \in \mathbb{R}^n$ and $b = (b_1, \dots, b_n)' \in \mathbb{R}^n$, the event $(X \ge b)$ is said to be a tail of $X$. Such an event reads as $(X_1 \ge b_1, \dots, X_n \ge b_n)$ and is the same event as $(X - b \in \mathbb{R}^n_+)$. The non-negative orthant $\mathbb{R}^n_+$ is a self-dual cone in $\mathbb{R}^n$. Clearly, $X - b$ is non-negative if and only if $u'(X - b)$ is non-negative for all vectors $u \in \mathbb{R}^n_+$, i.e., for all $u$ in the dual cone of $\mathbb{R}^n_+$. This fact can be generalized as follows. Let the ambient space of $\mathcal{E}$ be $V$ and let $C$ be a non-empty subset of $V$. For $x, y \in V$, write $y \succeq_C x$ if $y - x \in C$. If $C$ is a convex cone, then $\succeq_C$ is a pre-order on $V$, while if $C$ is a proper cone, then $\succeq_C$ is a partial order on $V$. In any case, even when $C$ is arbitrary, by duality, one has
$$y \succeq_C x \ \Longrightarrow\ y \succeq_{C^{**}} x \ \Longleftrightarrow\ \langle u, y - x \rangle \ge 0,\ \forall u \in C^*,$$
which reduces to
$$y \succeq_C x \ \Longleftrightarrow\ \langle u, y - x \rangle \ge 0,\ \forall u \in C^* \qquad (3)$$
when $C$ is a closed convex cone. Generalized inequalities, and hence generalized tails, are well behaved with respect to linear transformations of random vectors because cones are preserved by such maps. In fact, if the latter are invertible, closedness is preserved as well.
We also need a lesser-known duality device, which we borrow from the theory of blocking pairs of polyhedra [12]. Let $V$ be the ambient space of $\mathcal{E}$. For $T \subseteq V$, the blocker of $T$ in $\mathcal{E}$ is the set
$$b(T) = \{u \in V \mid \langle u, x \rangle \ge 1,\ \forall x \in T\}.$$
Analogous to the dual of $T$, the blocker of $T$ has the following properties, the first three of which are straightforward.
Lemma 2. 
In the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$, let $T \subseteq V$. It holds that
(i) 
if $T \ne \emptyset$, then $b(T)$ is a non-empty closed convex set;
(ii) 
if $T \subseteq T'$, then $b(T') \subseteq b(T)$;
(iii) 
$T \subseteq b(b(T))$;
(iv) 
if $T = b + C$, where $b \in V$ and $C$ is a non-empty subset of $V$, then
$$b(b + C) \supseteq b(b + C^{**}) \supseteq C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}; \qquad (4)$$
moreover, if $C$ is a nontrivial cone, then
$$C^* \cap \{u \in V \mid \langle u, b \rangle > 0\} \supseteq b(b + C) = C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}. \qquad (5)$$
Proof. 
If $T \ne \emptyset$, then $x/\langle x, x \rangle \in b(T)$ for all $x \in T$. Hence, if $T$ is non-empty, then so is $b(T)$. Moreover, since $b(T)$ is the intersection of closed half-spaces, $b(T)$ is a closed convex set. This establishes (i). Statements (ii) and (iii) are straightforward. Let us prove (iv). To prove (4), first observe that if $u \in C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}$, then $\langle u, b + y \rangle \ge 1$ for all $y \in C^{**}$. Hence, $u \in b(b + C^{**})$ and $u \in b(b + C)$ because $b(b + C^{**}) \subseteq b(b + C)$ by (ii). Let us prove (5). Since $C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\} \subseteq C^* \cap \{u \in V \mid \langle u, b \rangle > 0\}$, to prove (5) it suffices to prove that $b(b + C) = C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}$, which follows from $b(b + C) \subseteq C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}$ after (4). Let us prove the latter inclusion. Note that if $u \in b(b + C)$ and $C$ is a cone, then necessarily $\langle u, y \rangle \ge 0$ for all $y \in C$ for, if not, there exists $y_0 \in C$ such that $\langle u, y_0 \rangle < 0$ and $b + \lambda y_0 \in T$ for all $\lambda > 0$. Hence, $\langle u, b + \lambda y_0 \rangle < 1$ for sufficiently large $\lambda$, which contradicts $u \in b(b + C)$. We conclude that $\langle u, y \rangle \ge 0$ for all $y \in C$, and thus, $b(b + C) \subseteq C^*$. By the same reasoning, it holds that $u \in b(b + C) \Rightarrow \langle u, b \rangle \ge 1$. To see this, assume by contradiction that $\langle u_0, b \rangle < 1$ for some $u_0 \in b(b + C)$. Since $u_0 \in b(b + C)$, there exists $y_0 \in C$, $y_0 \ne 0$, such that $\langle u_0, b + \lambda y_0 \rangle \ge 1$ for all $\lambda > 0$. However, the latter inequality cannot be satisfied for a small enough $\lambda$. We conclude that the desired inclusion is true.    □
Remark 1. 
Usually, the blocker of a polyhedron $T$ of the form $P + \mathbb{R}^n_+$ is defined as the set $B(T) = \{u \in \mathbb{R}^n_+ \mid u'x \ge 1,\ \forall x \in T\}$; hence, $B(T) = b(T) \cap \mathbb{R}^n_+$ in $\mathbb{R}^n$. However, the two definitions coincide in this case because, reasoning as in the proof of (5), $b(T) \subseteq (\mathbb{R}^n_+)^* = \mathbb{R}^n_+$. Therefore, $b(T) = B(T)$.
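The identity in (5) can be probed numerically in a small self-dual case. The sketch below (illustrative data, not from the paper) takes $C = \mathbb{R}^2_+$, so $C^* = C$, and compares a randomized membership test for $b(b + C)$ with the closed-form characterization $u \in C^*$, $\langle u, b \rangle \ge 1$; the randomized test only samples finitely many points of $b + C$, so it is a necessary check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.array([1.0, 2.0])

def in_blocker_empirical(u, samples=10_000):
    # Check <u, b + y> >= 1 on sampled points y of the cone C = R^2_+.
    y = rng.exponential(size=(samples, 2))
    return bool(np.all((b + y) @ u >= 1.0 - 1e-12))

def in_blocker_formula(u):
    # Lemma 2(iv), Equation (5): b(b+C) = C* intersected with {u : <u,b> >= 1}, C* = R^2_+.
    return bool(np.all(u >= 0) and u @ b >= 1.0)

for u in (np.array([1.0, 0.0]), np.array([0.2, 0.4]), np.array([0.5, -0.1])):
    print(u, in_blocker_empirical(u), in_blocker_formula(u))
```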
In the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$, for a set $C$ and a vector $b$ in $V$, the set
$$a(b, C) = C^* \cap \{u \in V \mid \langle u, b \rangle > 0\}$$
plays an essential role in the following intermediate results.
Lemma 3. 
Let $C$ be a set in the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$. One has $a(b, C) \ne \emptyset$ if and only if $0 \notin b + C^{**}$. Moreover, if $C$ is a cone, then one has $a(b, C) \ne \emptyset$ if and only if $b(b + C) \ne \emptyset$.
Proof. 
To prove the first assertion, note that
$$a(b, C) = \emptyset \ \Longleftrightarrow\ \langle u, b \rangle \le 0,\ \forall u \in C^* \ \Longleftrightarrow\ -b \in C^{**} \ \Longleftrightarrow\ 0 \in b + C^{**}.$$
For the second assertion, observe that $b(b + C) \subseteq a(b, C)$ by (iv) in Lemma 2. Conversely, if $u \in a(b, C)$, then $\frac{u}{\langle u, b \rangle} \in b(b + C)$ by definition.    □
Note that the condition $0 \notin b + C^{**}$ in Lemma 3 can be written as $b \in V \setminus (-C^{**})$.
Lemma 4. 
In the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$, let $C$ be a non-trivial cone and $b \in V \setminus (-C^{**})$. If $f, g: V \to \mathbb{R}$ are defined by
$$f(u) = \frac{q(u)}{\langle u, b \rangle^2 + q(u)} \quad \text{and} \quad g(u) = \frac{q(u)}{1 + q(u)},$$
where $q: V \to \mathbb{R}$ is a positive-definite quadratic form, then
$$\inf_{u \in a(b,C)} f(u) = \inf_{u \in b(b+C)} g(u).$$
Proof. 
The assumption on $b$ guarantees that $a(b, C)$ and $b(b + C)$ are both non-empty by Lemma 3. Since $q$ is a quadratic form, it is homogeneous of degree 2. Hence, $f$ is homogeneous of degree 0, namely, $f(\lambda u) = f(u)$ for every $\lambda > 0$. Observe that $f(u) \le 1$ for all $u \in V$ and that $f(u) = 1$ over the hyperplane $\{u \in V \mid \langle u, b \rangle = 0\}$. Thus, by homogeneity, for every $u \in C^*$, $f$ is constant on the ray $\lambda u$, $\lambda > 0$. In particular,
$$f(u) = f\!\left(\frac{u}{\langle u, b \rangle}\right) = g\!\left(\frac{u}{\langle u, b \rangle}\right), \quad \forall u \in C^* \cap \{u \in V \mid \langle u, b \rangle > 0\}. \qquad (6)$$
Since, by (5), it holds that $b(b + C) = C^* \cap \{u \in V \mid \langle u, b \rangle \ge 1\}$, it follows that
$$f(u) = \frac{q(u)}{\langle u, b \rangle^2 + q(u)} \le \frac{q(u)}{1 + q(u)} = g(u), \quad \forall u \in b(b + C). \qquad (7)$$
Let $H = \{u \in V \mid \langle u, b \rangle = 1\}$ and observe that
$$\inf_{a(b,C)} f(u) = \inf_{a(b,C) \cap H} g(u) \quad \text{(by (6))}$$
$$\ge \inf_{b(b+C)} g(u) \quad \text{(because } a(b,C) \cap H \subseteq b(b+C) \text{ by (5))}$$
$$\ge \inf_{b(b+C)} f(u) \quad \text{(by (7))}$$
$$\ge \inf_{a(b,C)} f(u) \quad \text{(because } a(b,C) \supseteq b(b+C) \text{ by (5)).}$$
Therefore, equality must hold throughout, yielding the desired equality.    □
Recall that given a self-adjoint positive-definite endomorphism $A$ of $V$ in $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$, the bilinear form $\langle u, A(v) \rangle$ induces a norm $p$ on $V$ by $p(v) = \sqrt{\langle v, A(v) \rangle}$. The dual norm of $p$ (see [13], Section 5.4) is the norm $p^*$ such that
$$p^*(v) = \sup_{u \ne 0} \frac{|\langle u, v \rangle|}{\sqrt{\langle u, A(u) \rangle}}.$$
Lemma 5. 
Let $A$ be a self-adjoint positive-definite endomorphism of $V$ in $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$. If $p$ is the norm on $V$ defined by $p(v) = \sqrt{\langle v, A(v) \rangle}$, then $p^*(v) = \sqrt{\langle v, A^{-1}(v) \rangle}$.
Proof. 
Although this is actually a “folklore” fact, we were unable to locate a reference. Therefore, we provide a proof here.
Since $A$ is a self-adjoint positive-definite endomorphism, $A = B^2$ for some self-adjoint positive-definite endomorphism $B$ ($B$ is a square root of $A$). Hence, by the Cauchy–Schwarz inequality and because $B$ is self-adjoint,
$$|\langle u, v \rangle| = |\langle u, BB^{-1}(v) \rangle| = |\langle B(u), B^{-1}(v) \rangle| \le \sqrt{\langle B(u), B(u) \rangle}\,\sqrt{\langle B^{-1}(v), B^{-1}(v) \rangle} = \sqrt{\langle u, A(u) \rangle}\,\sqrt{\langle v, A^{-1}(v) \rangle}.$$
Therefore, $p^*(v) \le \sqrt{\langle v, A^{-1}(v) \rangle}$, with equality for $v = 0$. On the other hand, if $v \ne 0$, then the vector $\bar{v}$ defined by
$$\bar{v} = \frac{A^{-1}(v)}{\sqrt{\langle v, A^{-1}(v) \rangle}}$$
is such that
$$p^*(v) \ge \frac{|\langle \bar{v}, v \rangle|}{\sqrt{\langle \bar{v}, A(\bar{v}) \rangle}} = \sqrt{\langle v, A^{-1}(v) \rangle}.$$
Hence, $p^*$ has the stated expression.    □
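Lemma 5 is easy to probe numerically. The sketch below (an illustration with a randomly generated positive-definite $A$) approximates the supremum defining $p^*(v)$ by sampling many directions and compares it with the closed form $\sqrt{\langle v, A^{-1}(v) \rangle}$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
A = M @ M.T + 3 * np.eye(3)          # a symmetric positive-definite matrix (endomorphism)
v = rng.normal(size=3)

# Approximate sup_{u != 0} |<u, v>| / sqrt(<u, A u>) from below by random directions
U = rng.normal(size=(200_000, 3))
ratios = np.abs(U @ v) / np.sqrt(np.einsum('ij,jk,ik->i', U, A, U))
print("sampled approximation of p*(v):", ratios.max())
print("closed form sqrt(<v, A^{-1} v>):", np.sqrt(v @ np.linalg.solve(A, v)))
```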
Lemma 6. 
In the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$, let $C$ be a closed convex cone, $b \in V \setminus (-C)$, and $p$ the norm $p(v) = \sqrt{\langle v, A(v) \rangle}$ on $V$, where $A$ is a self-adjoint positive-definite endomorphism of $V$. If $b \in A(C^*)$, then, with $q = p^2$, in the notation of Lemma 4,
$$\inf_{u \in a(b,C)} f(u) = \frac{1}{(p^*(b))^2 + 1},$$
where $p^*$ is the dual norm of $p$, namely,
$$p^*(b) = \sup_{u \ne 0} \frac{|\langle u, b \rangle|}{p(u)} = \sqrt{\langle b, A^{-1}(b) \rangle}.$$
Proof. 
Since $C$ is a closed convex cone, $C = C^{**}$ by (d). Hence, the assumption on $b$ guarantees that $a(b, C)$ is non-empty by Lemma 3. Therefore, we can divide the numerator and the denominator of $f(u)$ by $q(u)$. This yields
$$\inf_{u \in a(b,C)} f(u) = \left( \sup_{u \in a(b,C)} \left( \frac{|\langle u, b \rangle|}{p(u)} \right)^2 + 1 \right)^{-1}.$$
Here, if $b \in A(C^*)$, then $A^{-1}(b) \in C^*$. Hence, $\hat{b} = \frac{A^{-1}(b)}{\sqrt{\langle b, A^{-1}(b) \rangle}}$ belongs to $a(b, C)$ because $\hat{b} \in C^*$, with $\hat{b}$ being a positive scalar multiple of $A^{-1}(b)$, and $\langle \hat{b}, b \rangle = \sqrt{\langle b, A^{-1}(b) \rangle} > 0$. By Lemma 5, after plugging $\hat{b}$ into $\frac{\langle u, b \rangle}{p(u)}$, we conclude that the supremum $p^*(b)$ of $\frac{|\langle u, b \rangle|}{p(u)}$ is attained over $a(b, C)$, and this concludes the proof.    □

3. Results

Let $(\Omega, \mathcal{F}, P)$ be a probability space, where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and $P$ is a probability measure on $\mathcal{F}$. Also, let $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$ be a Euclidean space and $\mathcal{B}$ the smallest $\sigma$-algebra containing all the open balls of $V$ taken with respect to the norm induced by $\langle \cdot, \cdot \rangle$—this $\sigma$-algebra does not depend on the particular inner product chosen on $V$. A random vector $X$ in $\mathcal{E}$ is an $(\mathcal{F}, \mathcal{B})$-measurable map $X: \Omega \to V$. The $\sigma$-algebra $\mathcal{B}$ is the algebra of Borel sets of $\mathcal{E}$. To prove our results, we first need the following lemma.
Lemma 7. 
Let $X$ be a centered random vector in $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$ with a positive-definite covariance function $\Sigma$. Let $T$ be a non-empty Borel subset of $V$ and $b \in V$, $b \ne 0$. Then, for all $u \in b(T)$, it holds that
$$\Pr(X \in T) \le \frac{\langle u, \Sigma(u) \rangle}{1 + \langle u, \Sigma(u) \rangle}.$$
Therefore, if $b(T)$ is non-empty, then
$$\Pr(X \in T) \le \inf_{u \in b(T)} \frac{\langle u, \Sigma(u) \rangle}{1 + \langle u, \Sigma(u) \rangle}.$$
If $T$ is a closed convex set such that $0 \notin T$, then the above bound is sharp.
Proof. 
The second inequality follows from the first provided that $b(T)$ is non-empty. The proof of the first inequality is formally identical to the proof of the same inequality in $\mathbb{R}^n$ given in Section 1 and is a direct consequence of (iii) in Lemma 2 and Cantelli's inequality (1), after noticing that, for a random vector $X$ in $\mathcal{E}$, the variance of the random variable $\langle u, X \rangle$ is $\langle u, \Sigma(u) \rangle$. It remains to be shown that if $T$ is closed and convex in $\mathcal{E}$ and $0 \notin T$, then the bound is sharp. We deduce this result from Theorem 1 after reducing the general case to $\mathbb{R}^n$ by the following argument. Every Euclidean space is a topological vector space with respect to the standard topology induced by the inner product (recall that, in our terminology, Euclidean spaces are finite-dimensional and, hence, Hilbert spaces). Euclidean spaces of the same dimension are pairwise homeomorphic, and all are homeomorphic to $\mathbb{R}^n$ under the coordinate map isomorphism $f$ (with respect to a fixed orthonormal basis). Thus, if $X$ is a centered random vector in $\mathcal{E}$, then $f(X)$ is a centered random vector in $\mathbb{R}^n$, and conversely. Moreover, if the covariance function $\Sigma$ of $X$ is positive-definite, then the covariance matrix $\tilde{\Sigma}$ of $f(X)$ is positive-definite, and conversely. Furthermore, $T$ is a closed convex set in $\mathcal{E}$ if and only if $f(T)$ is a closed convex set in $\mathbb{R}^n$ (with the standard Euclidean topology): linear images of convex sets are convex, and homeomorphic linear images of closed convex sets are closed and convex. Finally, since $f$ maps the zero vector $0_V$ of $\mathcal{E}$ to the zero vector $0$ of $\mathbb{R}^n$, it follows that $0_V \notin T$ if and only if $0 \notin f(T)$. Here,
$$\Pr(X \in T) = \Pr\big(f(X) \in f(T)\big).$$
Therefore, if $\tilde{X}_0$ is a random vector in $\mathbb{R}^n$ with a positive-definite covariance matrix $\tilde{\Sigma}$ that attains the bound in Theorem 1 as an equality, which exists because the hypotheses on $f(T)$ are satisfied, then $f^{-1}(\tilde{X}_0)$ is a random vector in $\mathcal{E}$ that attains the bound in this lemma.    □
Theorem 2. 
Let $X$ be a centered random vector in $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$ with a positive-definite covariance $\Sigma$. Let $C$ be a non-empty Borel subset of $V$ and $b \in V$, $b \ne 0$. Then, for all $u \in C^*$ such that $\langle u, b \rangle > 0$, it holds that
$$\Pr(X \succeq_C b) \le \frac{\langle u, \Sigma(u) \rangle}{\langle u, b \rangle^2 + \langle u, \Sigma(u) \rangle}.$$
If $C$ is a closed convex cone in $\mathcal{E}$ and $b \in V \setminus (-C)$, then $a(b, C)$ is non-empty and
$$\Pr(X \succeq_C b) \le \inf_{u \in a(b,C)} \frac{\langle u, \Sigma(u) \rangle}{\langle u, b \rangle^2 + \langle u, \Sigma(u) \rangle} = \inf_{u \in b(b+C)} \frac{\langle u, \Sigma(u) \rangle}{1 + \langle u, \Sigma(u) \rangle}.$$
Moreover, the above bound is sharp.
Proof. 
Since $\Pr(X \succeq_C b) \le \Pr(X \succeq_{C^{**}} b)$, the first inequality follows from (3) (with $X$ instead of $y$ and $b$ instead of $x$) by applying Cantelli's inequality (1) to the random variable $\langle u, X \rangle$. If $C$ is a closed convex cone and $b \in V \setminus (-C)$, then $a(b, C)$ is non-empty because the hypotheses of Lemma 3 are satisfied (recall that since $C$ is a closed convex cone, one has $C = C^{**}$ by (d) in Lemma 1). Moreover, still by Lemma 3, $b(b + C)$ is non-empty as well. Hence, the second inequality follows from the first by Lemma 4 with $q(u) = \langle u, \Sigma(u) \rangle$. It remains to be proven that the bound is sharp. Let $T = b + C$. Since $C$ is closed and convex, so is $T$. Moreover, the hypotheses on $b$ and $C$ imply $0 \notin T$. Therefore, Lemma 7 applies, and we conclude that the bound is sharp.    □
Although the previous theorem deals with the special case $T = b + C$ of Lemma 7, which, in turn, up to technicalities, follows from Theorem 1, Theorem 2 provides a useful sharpening of the extended Cantelli inequality given in Theorem 1, which is further exploited in the next corollary.
Corollary 1. 
Let $X$ be a centered random vector in $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$ with a positive-definite covariance $\Sigma$. Let $C$ be a closed convex cone in $\mathcal{E}$. If $b \in \Sigma(C^*) \setminus (-C)$, where $\Sigma(C^*)$ denotes the linear image of $C^*$ under $\Sigma$, then
$$\Pr(X \succeq_C b) \le \frac{1}{1 + \|b\|_{\Sigma^{-1}}^2}, \qquad (8)$$
where $\|b\|_{\Sigma^{-1}} = \sqrt{\langle b, \Sigma^{-1}(b) \rangle}$. Moreover, the bound is sharp.
Proof. 
Directly from Lemmas 4 and 6, with $p^*(b) = \sqrt{\langle b, \Sigma^{-1}(b) \rangle}$.    □
The previous corollary can be further (and straightforwardly) specialized in $\mathbb{R}^n$ if $X$ has a monotone covariance matrix. Recall that an invertible real matrix $A$ is monotone if $A^{-1}$ has non-negative entries (see [14], Chapter 6). Random vectors whose covariance matrix has this property are common in network science. See Section 4.3 for such an example.
Corollary 2. 
Let $X$ be a centered random vector in $\mathbb{R}^n$ with a positive-definite monotone covariance matrix $\Sigma$. Let $C$ be a closed convex cone in $\mathbb{R}^n$ contained in the orthant $\mathbb{R}^n_+$. If $b \in \mathbb{R}^n_+$ and $b \ne 0$, then the hypotheses of Corollary 1 are satisfied, and hence, inequality (8) holds.
Proof. 
We only need to show that the hypotheses of the present corollary guarantee that the hypotheses of Corollary 1 are met, namely,
$$C \subseteq \mathbb{R}^n_+,\quad 0 \ne b \ge 0,\quad \Sigma^{-1} \ge 0 \ \Longrightarrow\ b \in \Sigma(C^*) \setminus (-C).$$
Clearly, $\Sigma^{-1} \ge 0$ and $b \ge 0$ imply $\Sigma^{-1}b \ge 0$, i.e., $\Sigma^{-1}b \in \mathbb{R}^n_+$. Since $\mathbb{R}^n_+$ is a self-dual closed convex cone and $C \subseteq \mathbb{R}^n_+$, one has $C^* \supseteq (\mathbb{R}^n_+)^* = \mathbb{R}^n_+$ and, thus, $\Sigma^{-1}b \in C^*$. Since $\Sigma^{-1}b \in C^*$ implies $b \in \Sigma(C^*)$, and since $b \ge 0$ with $b \ne 0$ implies $b \notin -\mathbb{R}^n_+ \supseteq -C$, we conclude that $b \in \Sigma(C^*) \setminus (-C)$ as required.    □
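The following minimal simulation sketch illustrates Corollary 2 for $C = \mathbb{R}^2_+$; the Gaussian distribution used to generate $X$ and the particular covariance are illustrative choices (the corollary itself is distribution-free given the covariance).

```python
import numpy as np

rng = np.random.default_rng(3)
# A positive-definite covariance whose inverse is entrywise non-negative (monotone Sigma)
Sigma = np.array([[1.0, -0.4], [-0.4, 1.0]])
b = np.array([0.8, 0.6])

# Bound (8) with C = R^2_+ (so X >=_C b means X >= b componentwise)
bound = 1.0 / (1.0 + b @ np.linalg.solve(Sigma, b))

# Monte Carlo estimate under one particular centered distribution with covariance Sigma
L = np.linalg.cholesky(Sigma)
X = rng.normal(size=(1_000_000, 2)) @ L.T
empirical = np.mean(np.all(X >= b, axis=1))
print(f"empirical Pr(X >= b) = {empirical:.5f} <= bound = {bound:.5f}")
```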

4. Some Applications

In this section, we discuss some applications of Theorem 2 and its corollary as well as of Theorem 1 in the perspective of the duality of cones and polyhedra. Indeed, we have already noticed that the minimization problem associated with the bounds given in the previous section reduces to minimizing the positive-definite quadratic form q ( u ) = u , Σ ( u ) over a ( b , C ) or over b ( T ) . Even if T = b + C and C is a closed convex cone, this minimization problem is a non-trivial convex programming problem [15]. However, knowing the structure of b ( T ) allows us to easily compute upper bounds on the probability of certain interesting tail events.

4.1. Sampling Positive-Definite Matrices from a Random Ensemble

A random vector $X$ in the Euclidean space $\mathcal{E} = (V, \langle \cdot, \cdot \rangle)$ with a covariance function $\Sigma$ has a weakly spherical distribution if the distribution of $X$ is such that $\Sigma = \sigma^2\,\mathrm{id}_V$, where $\mathrm{id}_V$ is the identity endomorphism of $V$ and $\sigma^2$ is a positive real number [16]. A square random matrix is just a random vector in $(\mathbb{R}^{n \times n}, \langle \cdot, \cdot \rangle_F)$, where $\langle X, Y \rangle_F = \mathrm{Tr}(X'Y)$ is the Frobenius inner product. A symmetric random matrix is, thus, a random vector in $(\mathcal{S}^n, \langle \cdot, \cdot \rangle_F)$, where $\mathcal{S}^n$ is the real vector space of real symmetric matrices. Several models (i.e., probability measures) of symmetric random matrices are known and well understood [17]. Probably the best-known example of such models is the symmetric Wigner ensemble, which includes the Gaussian Orthogonal Ensemble. A random matrix is a symmetric Wigner matrix (or sampled from a symmetric Wigner ensemble) if its diagonal entries are independently and identically distributed (i.i.d.) according to a given distribution and the off-diagonal elements in the upper triangular part are i.i.d. according to a given distribution that may differ from the former. A symmetric real ensemble is, thus, completely specified by the distribution of the entries of the matrix and, thus, does not depend on the order of the matrix. If the diagonal elements of a symmetric Wigner matrix $W$ follow a standard normal distribution, while the off-diagonal elements follow a normal distribution with zero mean and variance $1/2$, then $W$ is a matrix from the Gaussian Orthogonal Ensemble. Here is one of the simplest mechanisms that, given a random matrix ensemble, yields a symmetric random matrix ensemble: if $M$ is a random matrix whose entries are i.i.d. random variables distributed according to the law $P$, then $S = \frac{M + M'}{2}$ is a symmetric Wigner matrix—taking $M'M$ for $S$ instead yields the symmetric Wishart ensemble. If the first two moments of $P$ are $0$ and $\sigma^2$, respectively, for some positive real number $\sigma$, then we denote the resulting ensemble by $\mathcal{W}(\sigma)$. By construction, any matrix $S$ sampled from $\mathcal{W}(\sigma)$ has a weakly spherical distribution. To see this, note that the variance of each diagonal entry of $S$ is $\sigma^2$, while the variance of each off-diagonal entry of $S$ is $\sigma^2/2$, since $S = (M + M')/2$ by construction. Next, if $\{E_{i,j} \mid i \le j\}$ is the standard basis of $\mathcal{S}^n$, then, by the definition of the covariance function of random vectors in Euclidean spaces (see the previous section), one has
$$\mathbb{E}\big[\langle E_{i,j}, S \rangle_F \langle E_{h,k}, S \rangle_F\big] = \mathbb{E}\big[\mathrm{Tr}(E_{i,j}S)\,\mathrm{Tr}(E_{h,k}S)\big] = \begin{cases} \mathbb{E}\,S_{i,i}^2 & \text{if } i = j = h = k \\ 4\,\mathbb{E}\,S_{i,j}^2 & \text{if } \{i,j\} = \{h,k\},\ i \ne j \\ 0 & \text{otherwise}. \end{cases}$$
Therefore, since $\mathbb{E}\,S_{i,i}^2 = \sigma^2$ and $\mathbb{E}\,S_{i,j}^2 = \sigma^2/2$ for $i \ne j$, it follows that
$$\mathbb{E}\big[\langle E_{i,j}, S \rangle_F \langle E_{h,k}, S \rangle_F\big] = \sigma^2 \langle E_{i,j}, E_{h,k} \rangle_F = \langle E_{i,j}, \sigma^2\,\mathrm{id}_{\mathcal{S}^n}(E_{h,k}) \rangle_F.$$
One of the most fundamental problems in random matrix theory is to understand the spectrum of a random matrix from a given ensemble, at least asymptotically, namely, when the order of the matrix goes to $\infty$. Even more fundamental is to compute the probability that a random matrix $X$ is positive-definite (or negative-definite), i.e., that its smallest eigenvalue $\lambda_n(X)$ is positive. This problem is completely and precisely solved for Gaussian matrices (see [4,5,6]): the probability of the event $(\lambda_n(X) > 0)$ when $X$ is sampled from a symmetric Gaussian ensemble goes to zero as $e^{-cn^2}$, where $c = (\ln 3)/4$. Using Corollary 1, we prove here that the same is true in general, although the rate we can provide is much weaker than $e^{-cn^2}$.
Lemma 8. 
Let $\sigma$ be a positive real number and $\lambda_n(X)$ be the least eigenvalue of the random matrix $X$ sampled from $\mathcal{W}(\sigma)$. For any positive real number $\alpha$ and any positive number $\xi$, if $n \ge \sigma^2/\xi^{2+\alpha}$, then $\Pr(\lambda_n(X) \ge \xi) \le \xi^\alpha$.
Proof. 
Since the statement trivially holds when $\xi \ge 1$, we may assume that $\xi < 1$. The event $(\lambda_n(X) \ge \xi)$ is the same as the event $(X \succeq_{\mathcal{S}^n_+} \xi I_n)$, where $I_n$ is the identity matrix of order $n$ and $\mathcal{S}^n_+$ denotes the cone of positive semi-definite matrices lying in $\mathcal{E} = (\mathcal{S}^n, \langle \cdot, \cdot \rangle_F)$. As observed above, $X$ has a weakly spherical distribution with a covariance function $\Sigma$ of the form $\sigma^2\,\mathrm{id}_{\mathcal{S}^n}$. It is well known that $\mathcal{S}^n_+$ is a self-dual closed convex cone [10]. Hence, $(\mathcal{S}^n_+)^* = \mathcal{S}^n_+$ and
$$\Sigma\big((\mathcal{S}^n_+)^*\big) = \Sigma(\mathcal{S}^n_+) = \sigma^2\,\mathrm{id}_{\mathcal{S}^n}(\mathcal{S}^n_+) = \mathcal{S}^n_+.$$
Since $\xi I_n \in \mathcal{S}^n_+ \setminus (-\mathcal{S}^n_+)$, it follows that the hypotheses of Corollary 1 are satisfied by $X$ with $b = \xi I_n$ and $C = \mathcal{S}^n_+$. Therefore, after observing that $\|\xi I_n\|_{\Sigma^{-1}}^2 = n\xi^2/\sigma^2$, it follows by (8) that
$$\Pr(\lambda_n(X) \ge \xi) = \Pr(X \succeq_{\mathcal{S}^n_+} \xi I_n) \le \frac{1}{1 + \|\xi I_n\|_{\Sigma^{-1}}^2} = \frac{\sigma^2}{\sigma^2 + n\xi^2} \le \xi^\alpha,$$
where the last inequality is implied by $n \ge \sigma^2/\xi^{2+\alpha}$.    □
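A small simulation sketch of Lemma 8 (illustrative parameters; the uniform entry distribution is one admissible choice for $\mathcal{W}(\sigma)$): it samples matrices of the form $(M + M')/2$ and compares the empirical probability that $\lambda_n(X) \ge \xi$ with the quantity $\sigma^2/(\sigma^2 + n\xi^2)$ appearing in the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
n, xi, trials = 30, 0.1, 2000
sigma2 = 1.0 / 12.0                      # variance of Uniform(-1/2, 1/2) entries

count = 0
for _ in range(trials):
    M = rng.uniform(-0.5, 0.5, size=(n, n))
    X = (M + M.T) / 2.0                  # a Wigner matrix from W(sigma)
    if np.linalg.eigvalsh(X)[0] >= xi:   # smallest eigenvalue
        count += 1

bound = sigma2 / (sigma2 + n * xi**2)
print(f"empirical Pr(lambda_n >= {xi}) = {count/trials:.4f} <= bound = {bound:.4f}")
```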
Theorem 3. 
Let $\sigma$ be a positive real number and $\lambda_n(X)$ be the least eigenvalue of the random matrix $X$ sampled from $\mathcal{W}(\sigma)$. It holds that $\lim_{n \to \infty} \Pr(\lambda_n(X) > 0) = 0$.
Proof. 
For any non-negative real number $y$, let $p_n(y) = \Pr(\lambda_n(X) \ge y)$. If $\Pr(\lambda_n(X) > 0)$ does not converge to zero, then there are a real number $\xi$ and a sub-sequence $\{p_{n_\nu}(\xi)\}_{\nu \in \mathbb{N}}$ such that $0 < \xi < 1$ and $\lim_\nu p_{n_\nu}(\xi) = \beta$ for some $\beta$ such that $0 < \beta < 1$. Hence, for any $\epsilon \in \mathbb{R}$ such that $0 < \epsilon < \beta$, there is a $\nu_0 \in \mathbb{N}$ such that
$$p_{n_\nu}(\xi) \ge \beta - \epsilon \quad \forall \nu \ge \nu_0.$$
On the other hand, by Lemma 8, for every positive real number $\alpha$, it holds that
$$p_n(\xi) \le \xi^\alpha \quad \forall n \ge \frac{\sigma^2}{\xi^{2+\alpha}}.$$
Let
$$\alpha_0 = \frac{\ln(\beta - \epsilon)}{\ln(\xi)},$$
where $\ln$ is the natural logarithm. Since $\xi$ and $\beta - \epsilon$ are both positive and strictly smaller than 1, it holds that $\alpha_0 > 0$. Hence, if $\alpha > \alpha_0$, then $\xi^\alpha < \beta - \epsilon$. If $\nu_1$ is such that $n_{\nu_1} > \sigma^2/\xi^{2+\alpha}$, then for $\nu \ge \max\{\nu_0, \nu_1\}$, it holds that
$$\beta - \epsilon \le p_{n_\nu}(\xi) \le \xi^\alpha.$$
But this contradicts $\xi^\alpha < \beta - \epsilon$, which holds by the choice of $\alpha$. This contradiction proves the theorem.    □

4.2. Tails of Linear Images of Random Vectors and Asymptotic Feasibility of Systems of Linear Inequalities

Concentration and tail inequalities for linear, possibly random, images of random vectors are a challenging topic in probability and statistics [7,8]. Let $X$ be a random vector in $\mathbb{R}^n$ and $f$ be the linear map $X \mapsto AX$, where $A \in \mathbb{R}^{m \times n}$ is a given matrix. According to [7], a typical tail inequality is an inequality of the form
$$\Pr\big(f(X) \succeq_{\mathbb{R}^m_+} sb\big) \le g(s)$$
for some function $g: \mathbb{R}_+ \to [0, 1]$. See [8] for the related problem of determining general concentration inequalities. Notably, Ref. [7] investigates the more general case of the map $\mathcal{A}X$, where $\mathcal{A}$ is a random matrix and $X$ is a random vector, under certain hypotheses on the distribution of $\mathcal{A}$, assuming that $\mathcal{A}$ and $X$ are independent. If $f$ is a deterministic linear map and $X$ has a positive-definite covariance function $\Sigma$, then, as we will show in the next theorem, there always exists a rather general and easy-to-compute upper bound $g(s)$ on the probability measure of the tails of $f(X)$. In Theorem 5, on the other hand, we will exhibit a transparent relation between the ordinary tails of the linear images of a random vector $X$ and the generalized tails of $X$ itself.
Theorem 4. 
Let $A \in \mathbb{R}^{m \times n}$ be a real matrix, $b = (b_1, \dots, b_m)'$ be a real vector, and $b^+ \in \mathbb{R}^m$ be defined component-wise by $b_i^+ = \max(0, b_i)$, $i = 1, \dots, m$. If $X$ is a centered random vector with a covariance matrix $\sigma^2 I_n$ and $b^+ \ne 0$, then
$$\Pr(AX \ge sb) \le \frac{\|A\|_{\mathrm{op}}^2}{\left(\frac{s}{\sigma}\|b^+\|\right)^2 + \|A\|_{\mathrm{op}}^2}, \qquad (9)$$
where $s \ge 0$ is a real number and $\|\cdot\|$ and $\|\cdot\|_{\mathrm{op}}$ are the Euclidean norm and the operator norm—induced by the Euclidean norms—respectively.
Proof. 
Let $T(s) = \{x \in \mathbb{R}^n \mid Ax \ge sb\}$, and note that $T(s)$ is a possibly empty polyhedron. If $s = 0$, the result is trivial because the right-hand side of (9) is 1 in this case. If $T(s)$ is empty, the result is also trivial because the left-hand side of (9) is zero. So, we assume that $s > 0$ and $T(s)$ is not empty. Obviously, $T(s) = \{x \in \mathbb{R}^n \mid A(s)x \ge b\}$, where $A(s) = s^{-1}A$ for $s > 0$. Since $b^+ \ne 0$, it follows that $T(s)$ does not contain the origin of $\mathbb{R}^n$. Hence,
$$\Pr(AX \ge sb) = \Pr(X \in T(s)) \le \inf_{u \in b(T(s))} \frac{\sigma^2\|u\|^2}{1 + \sigma^2\|u\|^2}$$
by Theorem 1 with $\Sigma = \sigma^2 I_n$. Let us compute the blocker $b(T(s))$ of $T(s)$. Let $c: \mathbb{R}^n \to \mathbb{R}$ be defined by
$$u \mapsto c(u) = \min\{u'x \mid x \in T(s)\};$$
that is, $c(u)$ is the minimum of a linear program over the non-empty polyhedron $T(s)$, and the minimum might be $-\infty$. By linear programming duality, it holds that $c(u) = \max\{v'b \mid A(s)'v = u,\ v \in \mathbb{R}^m_+\}$. Since, by definition,
$$u \in b(T(s)) \ \Longleftrightarrow\ u'x \ge 1\ \forall x \in T(s) \ \Longleftrightarrow\ c(u) \ge 1,$$
it follows that
$$b(T(s)) = \{A(s)'v \mid v'b \ge 1,\ v \in \mathbb{R}^m_+\}.$$
Therefore, if $\bar{v}$ is any vector in $\mathbb{R}^m_+$ such that $\bar{v}'b \ge 1$, then
$$\inf_{u \in b(T(s))} \frac{\sigma^2\|u\|^2}{1 + \sigma^2\|u\|^2} = \inf_{\substack{v \in \mathbb{R}^m_+ \\ v'b \ge 1}} \frac{\sigma^2\|A(s)'v\|^2}{1 + \sigma^2\|A(s)'v\|^2} \le \frac{\sigma^2\|A(s)'\bar{v}\|^2}{1 + \sigma^2\|A(s)'\bar{v}\|^2}.$$
Since the map $\xi \mapsto \frac{\xi}{1 + \xi}$ defines a strictly increasing function over $\mathbb{R}_+$ and since
$$\sigma^2\|A(s)'\bar{v}\|^2 = \left(\frac{\sigma}{s}\right)^2\|A'\bar{v}\|^2 \le \left(\frac{\sigma}{s}\right)^2\|A\|_{\mathrm{op}}^2\|\bar{v}\|^2,$$
it follows that
$$\inf_{u \in b(T(s))} \frac{\sigma^2\|u\|^2}{1 + \sigma^2\|u\|^2} \le \frac{\|A\|_{\mathrm{op}}^2\|\bar{v}\|^2}{\left(\frac{s}{\sigma}\right)^2 + \|A\|_{\mathrm{op}}^2\|\bar{v}\|^2}.$$
To complete the proof, choose $\frac{b^+}{\|b^+\|^2}$ for $\bar{v}$ and check that $\frac{b^+}{\|b^+\|^2} \ge 0$ and that $b'\frac{b^+}{\|b^+\|^2} = 1$.    □
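As an illustration of Theorem 4 (the matrix $A$, the vector $b$, and the Gaussian choice for $X$ below are arbitrary assumptions made for this sketch), the bound (9) can be computed from $\|A\|_{\mathrm{op}}$ and $\|b^+\|$ alone and compared with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
# Illustrative system A x >= s b with one negative entry in b (so b != b+)
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.5, 0.5, -3.0])
sigma, s = 1.0, 2.0

b_plus = np.maximum(b, 0.0)
A_op = np.linalg.norm(A, ord=2)                      # operator norm = largest singular value
bound = A_op**2 / ((s / sigma * np.linalg.norm(b_plus))**2 + A_op**2)

# Monte Carlo estimate for one centered X with covariance sigma^2 I (Gaussian here)
X = sigma * rng.normal(size=(1_000_000, 2))
empirical = np.mean(np.all(X @ A.T >= s * b, axis=1))
print(f"empirical Pr(AX >= s b) = {empirical:.5f} <= bound (9) = {bound:.5f}")
```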
Some remarks are in order.
Remark 2. 
If $X$ has a positive-definite covariance matrix $\Sigma$, then the theorem is easily extended: just replace $\|A\|_{\mathrm{op}}$ with $\|A\Sigma^{1/2}\|_{\mathrm{op}}$ and set $\sigma = 1$.
Remark 3. 
If $T(s)$ is not full-dimensional and $X$ has an absolutely continuous distribution with respect to the Lebesgue measure, then the result is useless because $T(s)$ is a null set; however, the result becomes interesting if either $T(s)$ has positive Lebesgue measure or $X$ has a singular distribution with respect to the Lebesgue measure.
Remark 4. 
If $b \ge 0$, so that $b = b^+$, then $T(s') \subseteq T(s)$ for $s \le s'$. Thus, if $(a_n)_{n \in \mathbb{N}}$ is any strictly positive, strictly increasing sequence, then $(T(a_n))_{n \in \mathbb{N}}$ is an (inclusion-wise) decreasing sequence of polyhedra; the theorem says no more than we expect in this case. However, this is no longer true if $b \ne b^+$. Indeed, let $A$ and $b$ be as follows:
$$A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ -1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ -2 \\ -1 \end{pmatrix}.$$
With this choice of $A$ and $b$, $T(s)$ is the triangle in $\mathbb{R}^2$ whose vertices are
$$\alpha(s) = \begin{pmatrix} s \\ 0 \end{pmatrix}, \qquad \beta(s) = \begin{pmatrix} s \\ 3s \end{pmatrix}, \qquad \gamma(s) = \begin{pmatrix} -s/2 \\ 3s/2 \end{pmatrix},$$
and we see that $\{\alpha(s), \gamma(s)\} \subseteq T(s) \setminus T(s')$ for all $s' > s$, while the area of $T(s)$ is proportional to $s^2$ and is therefore clearly increasing in $s$.
Remark 5. 
Theorem 4 can be used to study the asymptotic probability of feasibility of the system $T = \{x \in \mathbb{R}^n \mid Ax \ge b\}$ when $A$ and $b$ grow in some sense. More precisely, identify $T$ with the pair $(A, b)$ and consider a sequence $\{(A_\nu, b_\nu)\}_{\nu \in \mathbb{N}}$ of linear systems $T_\nu$ with $A_\nu \in \mathbb{R}^{m(\nu) \times n(\nu)}$ and $b_\nu \in \mathbb{R}^{m(\nu)}$, where the number of inequalities and the number of unknowns are both integer (possibly constant) functions of the index $\nu$; also, let $X_\nu$ be a sequence of random vectors such that $X_\nu$ is in $\mathbb{R}^{n(\nu)}$ and $X_\nu$ has a covariance matrix $\Sigma_\nu = \sigma^2 I_{n(\nu)}$. Finally, let $p_\nu = \Pr(A_\nu X_\nu \ge b_\nu)$ be the probability that the system $T_\nu$ is feasible under the probability measure induced by $X_\nu$. A straightforward consequence of Theorem 4 is that if $\|A_\nu\|_{\mathrm{op}} = o(\|b_\nu^+\|)$, then $p_\nu = o(1)$. The condition $\|A_\nu\|_{\mathrm{op}} = o(\|b_\nu^+\|)$ can be realized in several ways, either by keeping $m(\nu)$ and $n(\nu)$ constant or by letting them grow. Let us give a graph-theoretical example in the latter case. Let $G$ be a simple undirected graph with $n$ vertices $\{v_1, \dots, v_n\}$. For two vertices $v_i, v_j \in V(G)$, $i \ne j$, write $v_i \sim v_j$ whenever $v_i$ and $v_j$ are adjacent in $G$. If $v_i \sim v_j$, then we say that $v_i$ and $v_j$ are neighbors. Let $d_i$ denote the degree of vertex $v_i$, namely, the number of neighbors of $v_i$ in $G$. Also, denote by $\Delta(G)$ the maximum of the degrees of $G$. The Laplacian matrix $L(G)$ of $G$ is the order-$n$ matrix defined by
$$(L(G))_{i,j} = \begin{cases} d_i & \text{if } i = j \\ -1 & \text{if } i \ne j \text{ and } v_i \sim v_j \\ 0 & \text{otherwise}. \end{cases}$$
Recall that for functions $h, k: \mathbb{N} \to \mathbb{N}$, $k(n) = \Omega(h(n))$ means that $h(n) = O(k(n))$. Let $(G_\nu)_\nu$ be a sequence of graphs on $n(\nu)$ vertices, and let $(L_\nu)_\nu$ be the corresponding sequence of Laplacians. Also, let $\Delta_\nu := \Delta(G_\nu)$. It is a classical result of Fiedler [18] that the spectral radius of $L_\nu$ is at most $2\Delta_\nu$; so, taking $A_\nu = L_\nu$, one has $\|A_\nu\|_{\mathrm{op}} \le 2\Delta_\nu = O(\Delta_\nu)$. Hence, if $\Delta_\nu = o(\sqrt{n(\nu)})$ and $b_\nu$ is a dense $\{-1, 0, 1\}$ vector, say with $\Omega(n(\nu))$ non-zero entries, such that $\|b_\nu^+\| = \Omega(\|b_\nu\|)$, then the condition $\|A_\nu\|_{\mathrm{op}} = o(\|b_\nu^+\|)$ is satisfied when $n(\nu)$ grows with $\nu$.
The following result relates more transparently the ordinary tails of the linear image of a random vector $X$ in $\mathbb{R}^n$ to the generalized tails of the same vector. Recall that a matrix $A \in \mathbb{R}^{m \times n}$ is right-invertible if there exists a matrix $A^\dagger \in \mathbb{R}^{n \times m}$ such that $AA^\dagger = I_m$, with $I_m$ being the identity matrix of order $m$. If $A$ has full row rank, then the Moore–Penrose pseudo-inverse can be taken for $A^\dagger$, where $A^\dagger = A'(AA')^{-1}$. A linear map $f: \mathbb{R}^n \to \mathbb{R}^m$ is right-invertible if the matrix representing $f$ with respect to the canonical bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ is such. Note that a right-invertible map is surjective.
Theorem 5. 
Let $A \in \mathbb{R}^{m \times n}$ be a real matrix and $b \in \mathbb{R}^m$ be a real vector. Let $f: \mathbb{R}^n \to \mathbb{R}^m$ be the linear map $u \mapsto Au$ and let $C_A = \{u \in \mathbb{R}^n \mid Au \ge 0\}$. If $X$ is a random vector in $\mathbb{R}^n$ and $f$ is right-invertible, then
$$\Pr(f(X) \ge b) = \Pr\big(X \succeq_{C_A} f^\dagger(b)\big),$$
where $f^\dagger$ is the linear map induced by $A^\dagger$.
Proof. 
First of all, note that $C_A$ is a polyhedral cone. So, $C_A$ is a closed convex cone. However, unless $f$ is injective, $C_A$ is not a proper cone. Indeed, if $f$ is not injective, then its kernel is a non-trivial subspace of $\mathbb{R}^n$ contained in $C_A$. Nevertheless, $C_A$ induces a pre-order, and so, $X \succeq_{C_A} f^\dagger(b)$ is a well-defined generalized tail. It remains to be proven that the stated equality holds. This fact follows easily since $f(X - f^\dagger(b)) = f(X) - b$. So, $f(X) - b \in \mathbb{R}^m_+$ if and only if $X - f^\dagger(b) \in C_A$.    □
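The event identity in Theorem 5 can be verified sample-by-sample. In the sketch below (an illustrative full-row-rank $A$ chosen for the example), $f^\dagger$ is realized by the Moore–Penrose right inverse $A'(AA')^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])   # full row rank, hence right-invertible
b = np.array([0.3, -0.2])
A_dagger = A.T @ np.linalg.inv(A @ A.T)             # Moore-Penrose right inverse A'(AA')^{-1}

X = rng.normal(size=(100_000, 3))
lhs = np.all(X @ A.T >= b, axis=1)                  # event (f(X) >= b)
rhs = np.all((X - A_dagger @ b) @ A.T >= 0, axis=1) # event (X - f^dagger(b) in C_A)
print("events coincide on all samples:", np.array_equal(lhs, rhs))
print("estimated probability:", lhs.mean())
```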

4.3. Testing Homophily in Randomly Colored Networks

Tail inequalities, albeit in a different guise, have been successfully used in [9] to evaluate what is called network homophily. Suppose we have a simple undirected graph on the vertex set $V$, a surjective map $f: V \to \{1, \dots, s\}$, which can be thought of as a coloring of the vertices, and a coloring profile $c = (c_1, \dots, c_s)$, where $c_i$ is the number of vertices in $f^{-1}(i)$, $i = 1, \dots, s$, i.e., $c_i$ is the number of vertices of color $i$. With these data, we want to evaluate whether the graph is homophilic, i.e., whether it is true that the $s$-tuple $m = (m_1, \dots, m_s)$, where $m_i$ is the number of edges of the subgraph induced by the vertices of color $i$, is, in some sense, larger than what we would expect if the colors did not correlate with the adjacency of vertices—recall that an edge $e$ of a graph $G$ is induced by a subset $U$ of the set of vertices $V(G)$ if both the end vertices of $e$ are in $U$. The approach taken in [9] is to define the simple probability space $(\Phi_c, P_c)$—called the random coloring model—where $\Phi_c$ is the set of all surjective maps $g: V \to \{1, \dots, s\}$ such that $|g^{-1}(i)| = c_i$, $i = 1, \dots, s$, and $P_c$ is the uniform measure $c_1! \cdots c_s!/n!$ on $\Phi_c$. In the random coloring model, $m$ is just the observed value of the random vector $M = (M_1, \dots, M_s)$, where $M_i$ is the number of edges induced by the vertices of color $i$ in a random coloring of $V$ of $G$. Moreover, both the mean vector $\mu_c$ and the covariance matrix $\Sigma_c$ of $M$ can be given in closed form under the random coloring model. Thus, if $\phi: \mathbb{R}^s \to \mathbb{R}$ is a linear function—e.g., the arithmetic mean—one can evaluate the homophily of the input graph by considering the one-sided tail probability $\Pr(\phi(M - \mu) \ge \phi(m - \mu))$: any good upper bound on this tail probability can quantify the homophily of the given network by measuring, at least approximately, in terms of $\phi$, how significant the deviation of $m$ from $\mu$ is. In particular, if $\phi(m - \mu) > 0$—otherwise, the network is declared not homophilic at all—one can apply the classical one-dimensional Cantelli inequality (1) to estimate the approximate significance of the deviation of the observed value from the expected one. Note that the linear map $\phi$, say the arithmetic mean, acts as a global synthesis of network homophily but cannot identify hypodense or poorly significant color classes. We show here, as a consequence of Corollary 2, that generalized tails do not suffer from these limitations. To see this, for a set $R \subseteq \{1, \dots, s\}$, let $\bar{R} = \{1, \dots, s\} \setminus R$ and $W_{\bar{R}} = \sum_{i \in \bar{R}} \mathbb{R}e_i$, where $e_i$ is the $i$-th fundamental vector of $\mathbb{R}^s$, $i = 1, \dots, s$. Thus, $W_{\bar{R}}$ is the subspace spanned by the coordinates whose index is in $\bar{R}$. The projection $\pi_R: \mathbb{R}^s \to \mathbb{R}^s/W_{\bar{R}}$ of $\mathbb{R}^s$ onto $\mathbb{R}^s/W_{\bar{R}}$ takes the vector $u$ to the vector $u_R$ obtained by suppressing the coordinates whose index is not in $R$—if $s = 4$ and $R = \{1, 3\}$, then $u_R = (u_1, u_3)$. Also, recall that a graph $G$ is said to be over-dispersed if the ratio of the variance to the mean of its degree distribution is greater than 1. Most protein–protein interaction networks are over-dispersed.
Theorem 6. 
Let $(G, f)$ be an over-dispersed colored graph where the coloring $f: V \to \{1, \dots, s\}$ has the profile $c = (c_1, \dots, c_s)$, $c_i \ge 2$, $i = 1, \dots, s$. Let $m$, $M$, $\mu_c$, and $\Sigma_c$ be defined as above, and let $x = m - \mu$, $X = M - \mu$, and
$$S(m) = \left\{ i \in \{1, \dots, s\} \ \middle|\ \mu_i < m_i < \binom{c_i}{2} \right\}.$$
Also, for any $R \subseteq S(m)$, let $\mu_R$ and $\Sigma_R$ be the mean vector and the covariance matrix of the random vector $M_R$, the image of $M$ under the projection $\pi_R$. If $S(m) \ne \emptyset$, then for any non-empty $R \subseteq S(m)$, it holds that
$$\Pr(M_R \ge m_R) \le \frac{1}{1 + \|m_R - \mu_R\|_{\Sigma_R^{-1}}^2},$$
where $M_R$ and $m_R$ are the projections $\pi_R(M)$ and $\pi_R(m)$ of $M$ and $m$, respectively.
Proof. 
Let $\sigma_i^2$ denote the variance of $M_i$, the $i$-th coordinate of $M$, $i = 1, \dots, s$. Fix $R \subseteq S(m)$ and observe that $(M_R \ge m_R) \iff (X_R \ge x_R)$ in $\mathbb{R}^r$, where $r$ is the cardinality of $R$ and $X_R = \pi_R(X)$, $x_R = \pi_R(x)$. Furthermore, it can be shown that $\mu_R = \mathbb{E}\,\pi_R(M) = \pi_R(\mathbb{E}\,M)$ and that $\Sigma_R$ is the submatrix of $\Sigma_c$ obtained by suppressing the rows and the columns whose index is not in $R$. Since $x_R \ge 0$ and $X_R$ is a centered random vector, it follows that the hypotheses of Corollary 2 are met if $\Sigma_R$ is a monotone matrix. If $\Sigma_R$ is monotone, then, by the corollary, $\Pr(X_R \ge x_R) \le (1 + \|x_R\|_M^2)^{-1}$, where $\|x_R\|_M = \|x_R\|_{\Sigma_R^{-1}}$ is the so-called Mahalanobis norm of $x_R$. It remains to be shown that if $G$ is an over-dispersed graph, then, for every $R \subseteq S(m)$, $\Sigma_R$ is a monotone matrix, which means showing that $\Sigma_R$ is invertible (hence, positive-definite) and $\Sigma_R^{-1} \ge 0$. According to the results in [9], the covariance matrix $\Sigma_c$ is of the form $\Sigma_c = D + \lambda(G)uu'$, where $u = (u_1, \dots, u_s)'$ is such that $u_i > 0$ for $i = 1, \dots, s$, $D$ is a diagonal matrix whose $i$-th diagonal entry is $\sigma_i^2 - \lambda(G)u_i^2$, and $\lambda(G) \le 0$ (because $G$ is over-dispersed). Clearly, $\Sigma_R = D_R + \lambda(G)u_Ru_R'$, where $D_R$ is the sub-matrix of $D$ obtained by suppressing the rows and the columns whose index is not in $R$. Since the off-diagonal elements of $\Sigma_R$ are non-positive, $\Sigma_R$ is a Z-matrix, and it is known (see [14], Chapter 6) that if a Z-matrix is positive-definite, then it is monotone. On the other hand, since $\Sigma_R$ is a covariance matrix, it follows that $\Sigma_R$ is positive semidefinite. Thus, to show that $\Sigma_R$ is monotone, it is sufficient to show that $\Sigma_R$ is non-singular. By the Sherman–Morrison formula, if $D_R$ is non-singular, then $\Sigma_R$ is also non-singular. Therefore, to prove that $\Sigma_R$ is monotone, we simply check that $\sigma_i^2 - \lambda(G)u_i^2 \ne 0$ for $i \in S(m)$. Note that if $\lambda(G) \ne 0$, then $\sigma_i^2 - \lambda(G)u_i^2 > 0$ because $u_i^2 > 0$ and $\lambda(G) < 0$. So, the only possibility is that $\lambda(G) = 0$ and $\sigma_i^2 = 0$. However, by Theorem 1 in [19], $\sigma_i^2 = 0$ if and only if $m_i \in \{0, \binom{c_i}{2}\}$, which cannot happen because of the definition of $S(m)$, using the fact that $\mu_i \ge 0$, being that $\mu_i$ is the expected value of the non-negative random variable $M_i$. So, we conclude that $\Sigma_R^{-1} \ge 0$ for all $R \subseteq S(m)$, and this completes the proof.    □
We conclude that, under the random coloring model, the Mahalanobis norm of $m - \mu$ is a direct measure of the statistical significance of the deviation of $m$ from $\mu$, and thus, it can be taken as a measure of the joint homophily of the vertices whose color is in $S(m)$.
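To make the procedure concrete, the following sketch simulates the random coloring model on a toy graph, estimates $\mu$ and $\Sigma_c$ empirically instead of using the closed forms from [9], and evaluates the Mahalanobis-type bound for the full color set. The graph, the profile, and the use of empirical moments are illustrative assumptions, and the hypotheses of Theorem 6 (over-dispersion, monotone covariance) should be checked before reading the output as a valid bound.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy graph on 8 vertices (edge list) and a 2-color profile c = (4, 4)
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5), (5, 6), (6, 7), (4, 7), (3, 4)]
c = (4, 4)

def induced_edge_counts(coloring):
    # M_i = number of edges with both endpoints of color i
    return np.array([sum(coloring[a] == i and coloring[b] == i for a, b in edges)
                     for i in range(len(c))])

base = np.repeat(np.arange(len(c)), c)          # a fixed coloring with profile c
samples = np.array([induced_edge_counts(rng.permutation(base)) for _ in range(20_000)])
mu, Sigma = samples.mean(axis=0), np.cov(samples.T)

m = induced_edge_counts(base)                   # "observed" coloring: vertices 0-3 vs 4-7
x = m - mu
bound = 1.0 / (1.0 + x @ np.linalg.solve(Sigma, x))
print("observed m:", m, "estimated mu:", mu.round(2))
print("Cantelli-type quantity for Pr(M >= m):", round(bound, 4))
```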
Following an observation in [9], the random coloring model allows us to give a precise formalization of how difficult it can be to provide bounds on tail probabilities. First, recall that a subset $I$ of vertices of a graph $H$ is an independent set of $H$ if $I$ induces no edges. Also, recall that the Independent Set Problem, INDEPENDENT SET, is the problem of deciding whether, given a graph $H$ and a non-negative integer $k$, $V(H)$ contains an independent set of at least $k$ vertices. If $H$ has an independent set of at least $k$ vertices, we say that $(H, k)$ is a YES instance of INDEPENDENT SET. INDEPENDENT SET is an archetypal hard problem, which means that it is in the NP-complete class (see [20], Section 3.1.3). It is widely believed that it cannot be decided in polynomial time whether instances of problems in such a class are YES instances. Next, let $X = M_i$ be the $i$-th coordinate of $M$ and $\kappa = c_i$ be the $i$-th coordinate of the coloring profile $c$. Without loss of generality, $i = 1$ (the argument we will use is independent of $i$ by symmetry). With this notation, $\Pr(X \ge 1)$ is the marginal tail probability $\Pr(M_1 \ge 1)$. By definition, $X$ is the number of edges induced by a set of $\kappa$ vertices sampled without repetition from the vertex set of the input graph $G$.
Theorem 7. 
Let $X$ and $\kappa$ be as above. If $G$ is the input graph in the random coloring model, then $\Pr(X \ge 1) < 1$ if and only if $(G, \kappa)$ is a YES instance of INDEPENDENT SET.
Proof. 
If $(G, \kappa)$ is a YES instance of INDEPENDENT SET, then $G$ has an independent set of at least $\kappa$ vertices. Hence, $\Pr(X = 0) > 0$ because such a set is sampled with positive probability. Since $\Pr(X = 0) > 0$ implies $\Pr(X \ge 1) < 1$, the "if" part follows. Conversely, if $\Pr(X \ge 1) < 1$, then $\Pr(X = 0) > 0$. Therefore, the event $(X = 0)$ occurs with positive probability (i.e., 0 is in the support of $X$). Since $(X = 0)$ can occur if and only if $(G, \kappa)$ is a YES instance of INDEPENDENT SET, the "only if" part follows as well.    □
In other words, in general, even deciding the sign of $1 - \Pr(X \ge 1)$ for a finitely supported random variable $X$ is a computationally hard problem.

5. Concluding Remarks

Tail inequalities are a challenging problem in probability and statistics. According to Theorem 7, we can expect no more than easily computable bounds for dealing with tail probabilities, even if we have no control over the quality of the approximation. One such bound is given by Cantelli’s inequality (1). This bound was significantly extended by Marshall and Olkin (see Theorem 1) from random variables taking values on the real line to random vectors taking values in standard real n-dimensional Euclidean spaces. By looking at the inequality from the perspective of convex cone duality, as shown in Section 3, this paper naturally extends the inequality to any finite-dimensional Euclidean space. The new perspective allows us to cast (and bound) the probability of certain interesting events arising in random matrix theory, the theory of tail and concentration inequalities for linear images of random vectors, and network homophily as the probability of generalized tails, namely, inequalities induced by the pre-order (or the partial order) defined by convex cones. Besides Cantelli inequalities, there are several other tail inequalities. It would certainly be worthwhile to check whether some of these one-dimensional inequalities have a corresponding generalized version.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We are grateful to the reviewers for their careful reading and valuable comments, which have certainly contributed to the improvement of this article.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Marshall, A.W.; Olkin, I. Multivariate Chebyshev Inequalities. Ann. Math. Stat. 1960, 31, 1001–1014. [Google Scholar] [CrossRef]
  2. Vandenberghe, L.; Boyd, S.; Comanor, K. Generalized Chebyshev Bounds via Semidefinite Programming. SIAM Rev. 2007, 49, 52–64. [Google Scholar] [CrossRef]
  3. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  4. Arous, G.; Guionnet, A. Large deviations for Wigner’s law and Voiculescu’s non-commutative entropy. Probab. Theory Relat. Fields 1997, 108, 517–542. [Google Scholar] [CrossRef]
  5. Bhargava, M.; Cremona, J.E.; Fisher, T.; Jones, N.G.; Keating, J.P. What is the probability that a random integral quadratic form in n variables has an integral zero? Int. Math. Res. Not. 2016, 12, 3828–3848. [Google Scholar] [CrossRef]
  6. Dean, D.S.; Majumdar, S.N. Large deviations of extreme eigenvalues of random matrices. Phys. Rev. Lett. 2006, 97, 160201. [Google Scholar] [CrossRef] [PubMed]
  7. Das, B.; Fasen-Hartmann, V.; Klüppelberg, C. Tail probabilities of random linear functions of regularly varying random vectors. Extremes 2022, 25, 721–758. [Google Scholar] [CrossRef]
  8. Rudelson, M.; Vershynin, R. Small Ball Probabilities for Linear Images of High-Dimensional Distributions. Int. Math. Res. Not. 2015, 19, 9594–9617. [Google Scholar] [CrossRef]
  9. Apollonio, N.; Franciosa, P.G.; Santoni, D. Network homophily via tail inequalities. Phys. Rev. E 2023, 108, 054130. [Google Scholar] [CrossRef] [PubMed]
  10. Shaked-Monderer, N.; Berman, A. Copositive and Completely Positive Matrices; World Scientific Publishing: Hackensack, NJ, USA, 2021. [Google Scholar]
  11. Schneider, R. Convex Bodies: The Brunn–Minkowski Theory; Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  12. Schrijver, A. Theory of Linear and Integer Programming; Wiley: Chichester, UK, 1986. [Google Scholar]
  13. Horn, R.; Johnson, C. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  14. Berman, A.; Plemmons, R.J. Nonnegative Matrices in the Mathematical Sciences; SIAM Press: Philadelphia, PA, USA, 1994. [Google Scholar]
  15. Nemirovski, A. Advances in convex optimization: Conic programming. In Proceedings of the International Congress of Mathematicians, Madrid, Spain, 22–30 August 2006; Sanz-Sol, M., Soria, J., Varona, J.L., Verdera, J., Eds.; European Mathematical Society Publishing House: Berlin, Germany, 2007; Volume 1. [Google Scholar]
  16. Eaton, M.L. Multivariate Statistics: A Vector Space Approach; Wiley: New York, NY, USA, 1983. [Google Scholar]
  17. Edelman, A.; Rao, N.R. Random matrix theory. Acta Numer. 2005, 14, 233–297. [Google Scholar] [CrossRef]
  18. Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J. 1973, 23, 298–305. [Google Scholar] [CrossRef]
  19. Apollonio, N. Second-order moments of the size of randomly induced subgraphs of given order. Discret. Appl. Math. 2024, 355, 46–56. [Google Scholar] [CrossRef]
  20. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; Mathematical Sciences Series; Freeman and Company: San Francisco, CA, USA, 1979. [Google Scholar]