1. Introduction
The concept of infinite quantity is very old, and it has often been regarded negatively. In addition, limitless power and perfection were associated with the divine.
The distinction between the finite and the infinite started in ancient Greece. Zeno denied infinity through his paradoxes. Aristotle considered infinity as something potential, never actual, an approach accepted for a long time (Gauss said that infinity was only a façon de parler). Infinity, as a number or object, was rejected as non-existent. Thomas Aquinas in Summa Theologica and Galileo Galilei in Discorsi e dimostrazioni matematiche also denied infinity. The darkness of the night sky, which is the subject of the Kepler-Olbers paradox, argues against a possibly infinite sky. In contrast, Augustine of Hippo and Spinoza (Ethica) used the idea of the absolutely infinite.
Fontenelle, in Éléments de la géométrie de l’infini (published in 1727), refers to infinity as an actual object that can be operated on algebraically. Bolzano, in Paradoxien des Unendlichen (published in 1851), studied some paradoxes of infinity and defended the concept of actual infinity. Mathematical infinity was an academic leitmotiv; see, for instance, the discussion by Domenech y Estapa [1] of the geometrical absurdities engendered by the interpretation of mathematical infinity. He does not accept the definition of the straight line as the limit of a circle whose diameter increases (this is how Nicholas of Cusa illustrated the meaning of infinity), as this definition may identify the extremes, i.e., the positive and the negative infinities.
The rigorous mathematical definition and study of infinite quantities starts with Georg Cantor. Galileo’s argument against infinity (if it is accepted, then the natural numbers and the even numbers have the same size) was used by Cantor to define the infinite cardinality of a set. In fact, Gregory of Rimini (about 1350) had already advanced that a part of something infinite could be equivalent to the whole.
Cantor found, compared, and operated with infinities of different sizes and, despite the opposition of Kronecker and Poincaré, his theory prevailed. He thus founded the so-called transfinite arithmetic, which is accepted everywhere today. The academician Rodriguez-Salinas [2] even suggested that Homo Sapiens, after Cantor, should be called Homo Trans-Sapiens.
This article deals with the following related concepts:
The infinite cardinality (size) of sets.
Operating random variables and transfinite arithmetic.
Gödel’s theorem and parametric statistics.
The cardinality of some bivariate distributions.
The axiom of choice and the continuum hypothesis.
The probabilistic analogy of the Banach-Tarski theorem.
Statistics is a very wide subject, and many statisticians are not familiar with infinite cardinals. The aim of this article is to present these concepts in an elementary way, and to establish a parallel between transfinite arithmetic and some operations with random variables, as well as with the dimensionality of statistical models and other concepts in the field of statistics and probability.
For a rigorous and coherent introduction to set theory, see [3]. For a history of the concept of infinity, see Chapter 2 in [4]. See [5,6] for a description of infinities of different sizes. The biography and contributions of Cantor are well explained in [7].
2. Essentials on Cardinals and Transfinite Arithmetic
Any non-empty set $A$ has a cardinal number, denoted $\#(A)$. If the elements of $A$ are in one-to-one correspondence with the elements of a proper subset of $A$, then the cardinal is infinite.
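Accordingly, we can assign the cardinal number $\#(\emptyset) = 0$ to the empty set, as well as $\#(A) = k$ if $k$ is a natural number and $A$ is a set with $k$ elements. The most interesting infinite cardinals are $\#(\mathbb{N})$ and $\#(\mathbb{R})$, where $\mathbb{N}$ is the countably infinite set (called denumerable) of the natural numbers, and $\mathbb{R}$ is the uncountable set of the real numbers. The notations for these infinite cardinals are:
$$\aleph_0 = \#(\mathbb{N}), \qquad \mathfrak{c} = \#(\mathbb{R}).$$
The infinite cardinal $\mathfrak{c}$ is the power of the continuum. It is well known in set theory that
$$\aleph_0 < \mathfrak{c} = 2^{\aleph_0} = \#(\mathcal{P}(\mathbb{N})),$$
where $\mathcal{P}(\mathbb{N})$ is the family of all subsets of $\mathbb{N}$.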
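The inequality $\aleph_0 < 2^{\aleph_0}$ is a particular case of Cantor’s theorem: any non-empty set $A$ satisfies $\#(A) < \#(\mathcal{P}(A))$, where $\mathcal{P}(A)$ is the family of all subsets of $A$ [3].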
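Cantor's theorem can be checked exhaustively on a small finite set. The following is only an illustrative sketch, not part of the original argument: the function names are hypothetical, and the check relies on the standard diagonal set $D = \{a : a \notin f(a)\}$, which can never be an image of a map $f$ from a set to its family of subsets.

```python
from itertools import combinations, product

def powerset(elements):
    """Return all subsets of `elements` as a list of frozensets."""
    items = list(elements)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def diagonal_set(f, elements):
    """Cantor's diagonal set D = {a : a not in f(a)} for a map f: A -> P(A)."""
    return frozenset(a for a in elements if a not in f[a])

def every_map_misses_its_diagonal(elements):
    """Check, over all maps f: A -> P(A), that D is never one of the images."""
    items = list(elements)
    subsets = powerset(items)
    for images in product(subsets, repeat=len(items)):
        f = dict(zip(items, images))
        if diagonal_set(f, items) in images:
            return False
    return True

A = {0, 1, 2}
print(len(A), len(powerset(A)))            # sizes of A and of its power set
print(every_map_misses_its_diagonal(A))    # no map from A onto P(A) exists
```

For a set with three elements the loop visits all $8^3 = 512$ maps, so no map is onto the family of subsets. The same diagonal argument is what yields the strict inequality for infinite sets.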
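If ↔ stands for the one-to-one correspondence between two sets $A$ and $B$, then the equality, the inequality (inclusion), and two elementary operations with cardinals are:
$$\#(A) = \#(B) \ \text{ if } A \leftrightarrow B, \qquad \#(A) \le \#(B) \ \text{ if } A \leftrightarrow B' \text{ for some } B' \subseteq B,$$
$$\#(A) + \#(B) = \#(A \cup B) \ \text{ if } A \cap B = \emptyset, \qquad \#(A) \cdot \#(B) = \#(A \times B).$$
For instance, $\#(\mathbb{Q}) = \#(\mathbb{N}) = \aleph_0$, where $\mathbb{Q}$ is the set of rational numbers, so $\mathbb{Q}$ is countably infinite.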
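One-to-one correspondences of this kind can be written down explicitly. The sketch below is an illustration under stated assumptions: it uses the pairing of the natural numbers with the even numbers and the classical Cantor pairing function between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$. The function names are hypothetical and are not taken from the article.

```python
from math import isqrt

def even_of(n):
    """Pairing n <-> 2n between the natural numbers and the even numbers."""
    return 2 * n

def cantor_pair(i, j):
    """Cantor pairing function: a bijection from N x N onto N."""
    return (i + j) * (i + j + 1) // 2 + j

def cantor_unpair(k):
    """Inverse of cantor_pair."""
    w = (isqrt(8 * k + 1) - 1) // 2    # index of the anti-diagonal containing k
    j = k - w * (w + 1) // 2           # position of k along that anti-diagonal
    return w - j, j

# Every pair on the first 10 anti-diagonals receives a distinct code, and the
# codes fill an initial segment of N with no gaps.
codes = sorted(cantor_pair(i, j) for i in range(10) for j in range(10) if i + j < 10)
print(codes == list(range(len(codes))))
```

Since each positive rational can be written as a pair of natural numbers, a pairing of this type is the usual route to showing that the rationals are countable.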
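In transfinite or cardinal arithmetic, the following rules hold, where $n \ge 1$ is any finite natural number:
$$\aleph_0 + n = \aleph_0, \quad \aleph_0 + \aleph_0 = \aleph_0, \quad n \cdot \aleph_0 = \aleph_0, \quad \aleph_0 \cdot \aleph_0 = \aleph_0,$$
$$\mathfrak{c} + \aleph_0 = \mathfrak{c}, \quad \mathfrak{c} + \mathfrak{c} = \mathfrak{c}, \quad \mathfrak{c} \cdot \mathfrak{c} = \mathfrak{c}, \quad \mathfrak{c}^n = \mathfrak{c}.$$
Thus, $\#(\mathbb{R}) = \#(\mathbb{R}^2) = \#(\mathbb{R}^3) = \mathfrak{c}$. Hence, the real line, the Euclidean plane, and the three-dimensional space have the same uncountable cardinality $\mathfrak{c}$, the power of the continuum.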
If we consider a regular polygon $P_n$ with $n$ vertices, inscribed in a circle, the limit, as $n$ increases, is $P_\infty$, the circle with countably infinite points. Although $\aleph_0 < \mathfrak{c}$, it is impossible to distinguish geometrically between $P_\infty$ (cardinality $\aleph_0$) and the complete continuous circle (cardinality $\mathfrak{c}$); see Appendix A. The central limit theorem also illustrates the meaning of infinity. Both examples (geometric and probabilistic) are depicted in Figure 1. Another paradigm of the infinite cardinality $\mathfrak{c}$, described by Nicholas of Cusa (about 1445), is the straight line, thought of as the limit of a circle whose diameter increases constantly.
3. Operating Poisson and Gaussian Distributions
Let Poisson($n$) denote the Poisson random variable with parameter $n$ (both the expectation and the variance are $n$), where $n$ is fixed and $n \in \mathbb{N}$. If Poisson($n_1$) and Poisson($n_2$) are independent, then the sum of both variables is also Poisson: Poisson($n_1$) + Poisson($n_2$) = Poisson($n_1 + n_2$).
When $n$ ranges over $\mathbb{N}$, this family of discrete random variables is denoted by [Poisson($n$)].
Similarly, if Normal($\mu, \sigma^2$) is the normal random variable with mean $\mu$ and variance $\sigma^2$, then [Normal($\mu, \sigma^2$)] is the family of all normal random variables. The sum of independent normal variables is also normal. Accordingly, the bivariate normal family can be denoted by [Normal, Normal], and all continuous univariate and bivariate variables are denoted by [All r.v.] and [All r.v., All r.v.], respectively.
It is clear that #[Poisson($n$)] = $\aleph_0$ and #[Normal($\mu, \sigma^2$)] = $\mathfrak{c}$. The family [All r.v.] is described by all non-negative curves (continuous or not) with area 1, and it is well known that the cardinality of this family is #[All r.v.] = $2^{\mathfrak{c}}$. In addition, if two families [Poisson($n$)] and [Poisson($m$)] are independent, then, since each variable is a mapping from the sample population to $\mathbb{N}$, we can interpret that they have no variables in common; thus (ignoring the constant 0) the intersection of both sets is empty. By [Normal], we denote all the possible sums where $q$ is rational and $X$ is a normal random variable.
With these notations, since #[Poisson($n$)] = $\aleph_0$, #[Normal($\mu, \sigma^2$)] = $\mathfrak{c}$, etc., we have a relation between some operations with random variables and cardinal numbers; see Table 1.
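The closure of the Poisson family under sums of independent variables, which underlies the operations in this section, can be checked numerically. The following sketch is only an illustration: it convolves two Poisson probability functions and compares the result with the Poisson probability function whose parameter is the sum. The parameter values 2 and 3 and the function names are arbitrary assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** k / factorial(k)

def pmf_of_sum(k, lam1, lam2):
    """P(X1 + X2 = k) for independent Poisson X1, X2, by discrete convolution."""
    return sum(poisson_pmf(i, lam1) * poisson_pmf(k - i, lam2)
               for i in range(k + 1))

n1, n2 = 2, 3
# Largest discrepancy between the convolution and Poisson(n1 + n2) over k < 30.
max_gap = max(abs(pmf_of_sum(k, n1, n2) - poisson_pmf(k, n1 + n2))
              for k in range(30))
print(max_gap)
```

The discrepancy is at the level of floating-point rounding, which is consistent with the sum of two independent Poisson variables being a member of the same countable family.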
4. Gödel’s Theorem
Bolzano had proven that the number of mathematical propositions is infinite. Gödel proved that there are also infinitely many propositions about the integers which cannot be reduced to a finite number of axioms. This is a special case of the famous theorem enunciated by Kurt Gödel in 1931. This theorem says that in a consistent logical system based on axioms and containing arithmetic, there are propositions which can be neither proven nor refuted within the system. These propositions are undecidable: they can be accepted or rejected. The acceptance should be considered as a new axiom. An example of a proposition independent of the remaining axioms is the fifth postulate of Euclid on parallel lines, which cannot be proven as a consequence of the other four axioms. Another example is the hypothesis of the continuum (see below), posed by Cantor: there does not exist a set with cardinality between the countably infinite (denumerable) and the uncountable; i.e., it is not possible to find a set $A$ such that $\aleph_0 < \#(A) < \mathfrak{c}$. However, this hypothesis can be accepted or rejected without contradicting arithmetic.
Let us understand Gödel’s theorem with a broad view, and interpret this theorem from a statistical perspective, namely, in terms of observable events, parametric models, inference, etc. A statistical model is a family of probability distributions parametrized by a parameter $\theta$. The Poisson distribution with parameter $\lambda$ and the normal distribution with parameters $(\mu, \sigma^2)$ are two examples, with a parameter space of dimensions one and two, respectively. If $\{F_\theta\}$ is a statistical model, with $\theta$ belonging to a region of $\mathbb{R}^n$ with positive hypervolume, then clearly, $\#\{F_\theta\} = \mathfrak{c}$.
Question: Does there exist a universal parametrization covering all distributions? The answer is no. Let us consider all probability distributions with the same support. This set has cardinal $2^{\mathfrak{c}}$. If we suppose all of the models described by $\theta \in \mathbb{R}^n$, since $\mathfrak{c} < 2^{\mathfrak{c}}$, there are many distributions outside this family. Thus, a parametrization cannot cover all distributions. See an analytic proof in Appendix A.
Just as a logical system is incomplete, since some propositions are undecidable, a statistical model, however wide, is also incomplete, since some probability distributions are not contained in the model. This justifies the so-called non-parametric statistics, an approach that makes inferences on functional expressions, considering the whole set of distributions, but avoiding the use of parameters.
Another analogy is as follows. Given a sample of size $n$, to perform an inference on the mean $\mu$ in the normal model with the variance $\sigma^2$ being known, the statistic $\bar{x}$ (the sample mean) is sufficient and complete. However, $\bar{x}$ is not sufficient and complete to perform an inference on $(\mu, \sigma^2)$ where both parameters are unknown. We need two statistics, namely $\bar{x}$ and $s^2$ (the sample variance).
Similarly, let us consider the Poisson distribution whose support is $\mathbb{N}$. Theoretically, we can observe any subset of $\mathbb{N}$. However, $\#(\mathcal{P}(\mathbb{N})) = 2^{\aleph_0} = \mathfrak{c}$, and this cardinality is too large. Thus, some events can be observed, e.g., “a value $k$ is an even number”, but many other events cannot be enunciated using our limited language; hence, these events cannot be observed.
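The sufficiency analogy for the normal model can be made concrete with a small numerical sketch. It is offered only as an illustration: the three samples and the function names are hypothetical choices. Two samples sharing the same mean and variance give the same likelihood for every parameter value, whereas two samples sharing only the mean give likelihoods whose difference is constant in the mean parameter but changes with the variance parameter.

```python
from math import log, pi, sqrt

def normal_loglik(sample, mu, sigma2):
    """Log-likelihood of an i.i.d. Normal(mu, sigma2) sample."""
    n = len(sample)
    squares = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * log(2 * pi * sigma2) - squares / (2 * sigma2)

a = 1 / sqrt(3)
sample_1 = [-1.0, 0.0, 1.0]    # mean 0, variance 2/3
sample_2 = [a, a, -2 * a]      # different values, same mean and variance
sample_3 = [-2.0, 0.0, 2.0]    # same mean as sample_1, larger variance

def gap(mu, sigma2):
    """Log-likelihood difference between sample_1 and sample_3."""
    return normal_loglik(sample_1, mu, sigma2) - normal_loglik(sample_3, mu, sigma2)

# Same mean and variance: the two likelihoods coincide at any (mu, sigma2).
print(normal_loglik(sample_1, 0.3, 1.7) - normal_loglik(sample_2, 0.3, 1.7))
# Same mean only: the gap does not depend on mu, but it depends on sigma2.
print(gap(0.0, 1.0), gap(5.0, 1.0), gap(0.0, 2.0))
```

With the variance known, a gap that is constant in the mean parameter carries no information about the mean, so the sample mean alone suffices. Once the variance is unknown, the gap does matter and the second statistic is needed.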
5. Turing’s Halting Problem
This problem was posed by Alan Turing in 1936. There is no general algorithm capable of determining whether or not an arbitrary computer program will finish running. This is another example of an undecidable proposition in Gödel’s sense. From a statistical perspective, Chaitin’s approach [8] is quite interesting. A summary is given next.
Let us consider the Cantor space of all infinite binary sequences. A computer program is a subset of this space. Let $P$ be the set of halting programs. If $p$ is a halting program of $|p|$ bits, the probability that a randomly chosen binary sequence of length $|p|$ coincides with $p$ is $2^{-|p|}$, provided that we generate this sequence of $0$s and $1$s independently with the same probability $1/2$. In general, when choosing a program at random, the probability of obtaining a halting program is
$$\Omega = \sum_{p \in P} 2^{-|p|}.$$
That is, $\Omega = \sum_{n \ge 1} h_n 2^{-n}$, where $h_n$ is the number of halting programs of $n$ bits, taking into account that if a program with a shorter sequence of bits halts, then the programs of $n$ bits cannot begin with that sequence. This constraint ensures that $\Omega \le 1$.
If $\Omega = 0$, no program halts. If $0 < \Omega < 1$, then some programs halt. If $\Omega = 1$, all programs halt. However, $\Omega$ is not fixed, but random. We need an algorithm of $n$ bits to determine the first $n$ bits of $\Omega$. This probability behaves as if it were randomly generated, just as explained above. There is no law that reduces the computation of $\Omega$: none, some, or all programs finish running. Consequently, the halting problem is undecidable.
A short, informal proof of the undecidability of the halting problem, based on cardinal comparison, is given in Appendix A.
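The halting probability described in this section is a sum of terms $2^{-|p|}$ over a set of programs in which no program is the beginning of another. The sketch below only illustrates that bookkeeping on a hypothetical toy set of bit strings. It does not compute the actual halting probability, which cannot be computed, and all names in it are assumptions made for illustration.

```python
from fractions import Fraction

def is_prefix_free(programs):
    """True if no bit string in `programs` is a proper prefix of another."""
    return not any(p != q and q.startswith(p)
                   for p in programs for q in programs)

def halting_weight(programs):
    """Exact sum of 2^(-|p|) over a finite set of bit strings."""
    return sum(Fraction(1, 2 ** len(p)) for p in programs)

# Hypothetical toy set of "halting programs"; not derived from any real machine.
toy_halting = {"0", "10", "1101"}

print(is_prefix_free(toy_halting))
print(halting_weight(toy_halting))
```

For a prefix-free set the sum never exceeds 1 (the Kraft inequality), which is what allows it to be read as a probability. The value 1 is reached by a complete set such as the two one-bit strings.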
6. Cardinality of Bivariate Distributions
A bivariate cumulative distribution function $H(x,y)$ of two random variables $(X,Y)$ with univariate marginal distributions $F(x)$, $G(y)$ is usually described by a parametric model. Suppose that the ranges of the variables $X$, $Y$ are the intervals $[a,b]$, $[c,d]$. If the degree of dependence between the random variables is quantified by means of a unique parameter, the cardinality of $H$ is 1. However, in general, $H$ admits the canonical decomposition
$$dH(x,y) = dF(x)\,dG(y)\Big[1 + \sum_{i \ge 1} \rho_i\, a_i(x)\, b_i(y)\Big],$$
where $\rho_i$ are the canonical correlations, all positive, and $a_i(x)$, $b_i(y)$ are the canonical functions. Then, the cardinality of $H$ is $\#\{\rho_i\}$, the power of the set of canonical correlations. This number, also called the rank of $H$, is the dimensionality from a geometrical point of view, related to the so-called chi-squared distance between two observations $x$, $x'$ of the variable $X$. See [9] for a continuous correspondence analysis interpretation. Some (finite and infinite) cardinalities of $H$ are reported in Table 2.
In all cases, we can consider stochastic independence, $H(x,y) = F(x)G(y)$, as a particular case, the cardinality being 0 because of the absence of positive canonical correlations. The cardinality of the second model is 1 because it has only one canonical correlation.
Any distribution with infinite cardinality, e.g., $\aleph_0$, can be approximated by another one with finite cardinality [10].
It is worth noting the power-of-the-continuum cardinality of the fifth model (see Table 2), defined and studied in [11,12]. In the uniform marginal case, the set of canonical correlations is given by a continuous function rather than by a sequence. This continuous function takes every value in an interval with lower endpoint 0. Thus, for suitable parameter values, the power of the set of canonical correlations is $\mathfrak{c}$. Another continuous correlation model, also considered in the uniform case, has a cardinality which is also the power of the continuum.
These two distributions admit an integral expansion, instead of a series expansion, and the set of canonical correlations is not countable, but continuous. This transition from countable to continuous uncountable cardinality agrees with the hypothesis explained in the next section.
7. The Continuum Hypothesis
This hypothesis, stated by Cantor in 1878 [13], says that there is no set with cardinality between the countably infinite $\aleph_0$ and the uncountable $\mathfrak{c}$. It can be expressed as
$$\aleph_1 = 2^{\aleph_0} = \mathfrak{c},$$
where $\aleph_1$ is the next immediate infinity after $\aleph_0$. According to Cantor, $\aleph_1$ exists and is $\mathfrak{c}$. Nonetheless, this is considered undecidable after the results by Gödel and Cohen in 1940 and 1963, respectively. Nowadays, we should write $\aleph_1 \le \mathfrak{c}$.
Cantor’s futile attempts to prove this hypothesis are well known, and it was included as an unsolved question in the list of 23 problems posed by Hilbert in 1900. Gödel showed that accepting this hypothesis leads to no contradiction, whereas Cohen showed that rejecting it leads to no contradiction either. Thus, the continuum hypothesis is independent of the axioms of the Zermelo-Fraenkel theory of sets [3].
It is quite surprising that some authors [14,15,16,17], many years later, ignored this essential difficulty and mentioned this problem as interesting but not solved yet. The first widespread reference to Cohen and the independence of the continuum hypothesis appeared in a textbook in 1965 [18]. In the Spanish literature on this topic, [19,20] are the first books paying attention to the independence of this hypothesis. Many years later, this hypothesis still attracts interest [6,21].
Indeed, in probability and statistics, this hypothesis is implicitly accepted. In general, only the probabilities of subsets of $\mathbb{N}$ are considered under discrete models (such as the Poisson distribution). In addition, only Borel sets are taken into account under continuous models (such as the Gaussian distribution). Recall that the Borel sets are those generated from the intervals of the real line (isolated points included) by countable unions, countable intersections, and complements. Probabilities of other sets are not defined, as these sets are unobservable. For example, accepting the axiom of choice (see below), we can construct non-measurable sets; however, after an experiment it is impossible to decide on the presence or absence of any of these sets. That is, given a value $x$ of the random variable, it is impossible to decide whether or not $x$ belongs to a non-measurable set of $\mathbb{R}$. See Appendix A.
8. Axiom of Choice
This axiom (stated by Zermelo in 1904) postulates that, given a family of non-empty sets, we can choose an element of each set and construct a new set with these elements. Two examples are as follows. If we consider all of the circles centered at the origin, we do not need this axiom, as we can choose a point of each circle by an explicit rule, e.g., the point on the right where the circle cuts the horizontal axis. However, if we consider the family of all closed curves in the plane, we need the axiom of choice to choose a point of each curve.
Some properties in algebra, geometry, topology, and analysis depend on this axiom. One functional analysis application is to prove the null norm of the eigenfunctions of a kernel with respect to another kernel, with both being related to the last bivariate distribution given in Table 2 [22].
With some imagination, we can establish a comparison between this axiom (choosing an element of each set) and the Bayes theorem (the probability that an observation belongs to each set or cause).
However, accepting the axiom of choice, we can prove the existence of a mathematical object without being able to actually construct that object. This may be troublesome. For instance, any vector space has a basis. However, there are vector spaces for which the basis is unknown. This also happens with the vector space of all random variables with a given support. In addition, accepting this axiom, some subsets of $\mathbb{R}$ lack length (Lebesgue measure) or probability (see Appendix A). Another anomaly is discussed next.
9. Banach-Tarski’s Theorem
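Let us suppose that $B$ is a solid ball. This theorem asserts that we can divide $B$ into $k$ non-overlapping parts, i.e.,
$$B = B_1 \cup \cdots \cup B_k, \qquad B_i \cap B_j = \emptyset \ \text{ for } i \ne j,$$
and, after isometric transformations $T_1, \ldots, T_k$, we can assemble these parts to yield two balls:
$$T_1(B_1) \cup \cdots \cup T_m(B_m) = B', \qquad T_{m+1}(B_{m+1}) \cup \cdots \cup T_k(B_k) = B''.$$
Accordingly, this may be expressed as the paradoxical equidecomposition $B \sim B' \cup B''$, where each of the two balls has the same volume as the original: $\mathrm{vol}(B') = \mathrm{vol}(B'') = \mathrm{vol}(B)$. Thus, the initial ball can be duplicated. Notice that the parts $B_i$ are non-measurable subsets of $\mathbb{R}^3$. In fact, it is necessary to accept the axiom of choice to prove this surprising result.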
There is a probabilistic analogy. Suppose that $X$ is a normal random variable with a mean of 0 and a variance of $\sigma^2$. Then, we can decompose $X$ as a sum, i.e., express $X = X_1 + X_2$ as the sum of two independent normal variables with mean $0$. If $[X]$ stands for the class of variables $aX$, where $a$ is a real parameter, and similarly $[X_1]$, $[X_2]$, then $[X] = [X_1] + [X_2]$, with $[X] \sim [X_1] \sim [X_2]$, where “$\sim$” means “same distribution”. The sets $[X_1]$ and $[X_2]$ are independent; hence, they do not contain common variables, except for the constant $0$. In spite of this lack of coincidence, these sets contain exactly the same family of normal distributions. Thus, in some sense, $[X]$ can be duplicated.
Of course, the sum of non-coincident sets of random variables is not the union of disjoint sets, but the analogy is clear; see Table 3.
Furthermore, taking into account the central limit theorem, let us admit that a normal $X$ can be interpreted as the sum of a series of independent random variables $Z_1, Z_2, \ldots$, whose distributions are unknown. Namely,
$$X = \lim_{n \to \infty} (Z_1 + Z_2 + \cdots + Z_n),$$
where the convergence is in law (the standardization is omitted). Then,
$$X = \lim_{n \to \infty} (Z_1 + Z_3 + \cdots + Z_{2n-1}) + \lim_{n \to \infty} (Z_2 + Z_4 + \cdots + Z_{2n}) = X_1 + X_2,$$
and $X$ is the sum of two independent normal random variables. Note that “unknown distribution” would correspond to “non-measurable subset” in the above Banach-Tarski decomposition.
Finally, removing the constant $0$, we can consider projective spaces, so that we have another analogy in terms of projective geometry.
10. Discussion, Conclusions, and Future Work
The old pamphlet [1] is an example of how a mathematician who was also an architect (for a long time, the careers of Mathematics and Architecture overlapped in some universities) perceived contradictions between descriptive geometry and pure mathematics. However, most subjects of Mathematics and Statistics can be related. For instance, if $E$ is the vector space generated by random variables, the dual space $E^*$ can be interpreted as the population (set of individuals), since if $\omega$ is an individual and $Y$ is a random variable, we can associate the real number $Y(\omega)$ to the pair $(\omega, Y)$ [23]. Differential geometry can be used to define geodesic distances between the parameters of a statistical model [24,25]. The study of bivariate exchangeable distributions can be performed using functional analysis [22]. There are more examples linking different fields.
We have shown that some concepts and properties of Probability and Statistics can be useful for understanding and interpreting the main properties of the infinite cardinals.
Several proposals for future research are:
- (1) Analytic geometry. The equation $x^{2n} + y^{2n} = 1$ defines a closed curve tending to a square as $n \to \infty$. Study the implicit equation of other regular polygons (see Figure 1) in the same way.
- (2) Inference. Given a sample of size $n$, the statistics $(\bar{x}, s^2)$ are sufficient to perform inference on the normal model. Then, we may explore the sufficiency of $(\bar{x}, s^2)$ under a perspective similar to Gödel’s theorem. That is, to study what kind of inference is “undecidable” on a specific model. Note that, under certain conditions, we can make a non-parametric inference.
- (3) Bivariate distributions. We have passed from distributions with countable cardinality to uncountable continuous cardinality. Does this give enough evidence for accepting the hypothesis of the continuum?
- (4) Banach-Tarski theorem. Removing the constant $0$, interpret $[X] = [X_1] + [X_2]$ as a decomposition of projective spaces, which can be generalized to a higher dimension. In addition, for $k > 2$, study the comparison between dividing a ball into $k$ balls and decomposing a normal random variable into the sum of $k$ independent random variables.
- (5) Statistical models. The cardinalities of the Poisson model (with the parameter being a natural number) and the normal model are $\aleph_0$ and $\mathfrak{c}$, respectively. Are there parametric models with cardinalities larger than $\mathfrak{c}$?