Review

The Heroic Age of Probability: Kolmogorov, Doob, Lévy, Khinchin and Feller

by Andrew J. Heunis 1,2,†
1 Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
† Professor Emeritus.
Mathematics 2025, 13(5), 867; https://doi.org/10.3390/math13050867
Submission received: 31 December 2024 / Revised: 20 February 2025 / Accepted: 20 February 2025 / Published: 5 March 2025

Abstract
We survey some of the main developments in probability theory during the so-called “heroic age”; that is, the period from the nineteen twenties to the early nineteen fifties. It was during the heroic age that probability finally attained the status of a mathematical discipline in the full sense of the term, with a complete axiomatic basis and incontestable standards of intellectual rigor to complement and support the extraordinarily rich intuitive content that has always been part of probability theory from its very inception. The axiomatic basis and mathematical rigor are themselves rooted in the abstract theory of measure and integration, which now comprises the very bedrock of modern probability, and among the central characters in the measure-theoretic “re-invention” of probability during the heroic age one finds, in particular, Kolmogorov, Doob, Lévy, Khinchin and Feller, each of whom fundamentally shaped the structure of modern probability. In this survey, we attempt a brief sketch of some of the main contributions of each of these pioneers.

1. Introduction

The present work is a brief survey of some of the main developments in probability theory during the first half of the twentieth century, with particular focus on the nineteen twenties to the early nineteen fifties, sometimes dubbed the “heroic age” of probability. Needless to say, this survey will be somewhat superficial, for this is unavoidable in view of the colossal scale of developments in probability over this period, limitations on space, and of course the far from complete knowledge of the author, who is not a specialist on the history of mathematics.
We focus, in particular, on the contributions of five of the true giants of probability, namely Kolmogorov, Doob, Lévy, Khinchin and Feller. Obviously, omission of any of these would have led to a fatally incomplete survey, but one can certainly question the omission of so many others who also left such an imprint on probability theory during the heroic age (Bernstein, Cramér, Cantelli, Fréchet, Gnedenko, Marcinkiewicz, …). Our only answer is that we believe it best not to spread the survey too thinly. Furthermore, we say nothing about stochastic integration and stochastic differential equations, even though these came to light during the mid-nineteen forties, within the scope of our heroic age. The reason for this is that these were still rather early years in the development of stochastic integration, and the huge edifice that is now stochastic calculus was, for the most part, still in the future. Even restricting as we have to just the five giants mentioned above presents significant challenges, for each was enormously productive; thus, Lévy has a total of 278 books and papers to his credit, while Feller has 104, and Doob has 124 (at least), the great majority of the works being on probability theory. Needless to say, in this vast literature, there is a great wealth of concepts and results, and, in a short survey such as this, one must make a rather subjective choice of what to discuss and what to omit. The numerous existing surveys cited in the References have been of great help in deciding on this.
Although the main focus of this survey is on the historical, it is a fact that nearly all of the ideas and results mentioned herein are in use by contemporary probabilists, and most of these form the content of current graduate courses on probability. In an attempt to give the survey a topical feel and sense of immediacy, and to relate the history to modern textbook expositions on probability, we have, wherever possible, included clear references to modern graduate-level textbooks for specific results and ideas under discussion.
Finally, although our goal is to give an account of the main developments in probability during the heroic age, it is impossible to truly appreciate these without at least a cursory knowledge of the state of probability before this period. Accordingly, in the next section, we attempt an extremely brief synopsis of what we see as the main probabilistic developments which preceded the heroic age, and which we have, perhaps inappropriately, labeled the “pre-history” of the heroic age. Specifically, we trace the origin of probability as a rational discipline to the analysis of games of chance by Cardano in Renaissance Italy, and then briefly follow subsequent developments by Pascal, Fermat, Huygens, Bernoulli, de Moivre and Laplace, concluding with a summary of the contributions of the nineteenth century Russian school of probabilists, in particular Chebyshev, Markov and Lyapunov.

2. The Pre-History

Probability theory originates in gambling and games of chance, pastimes which seemingly date back as far as human civilization itself. The mathematical analysis of such games begins with the sixteenth century Italian Renaissance mathematician G. Cardano (1501–1576), sometimes known as “the gambling scholar” (see Ore [1]), who considered gambling schemes of considerable complexity. In the work The Book on Games of Chance (Latin original Liber de Ludo Aleae), Cardano systematically calculates probabilities for many challenging gambling problems. The notion of equally likely outcomes is central in this work, as is the idea that the probability of an event—understood to be a designated set of outcomes—is the ratio of the number of outcomes in the event to the total number of possible outcomes. This point of view would remain prevalent in most works on probability until the start of the twentieth century.
Despite the great ingenuity of Cardano in calculating probabilities for various gambling scenarios, probability theory emerged as a distinct scientific discipline only during the seventeenth century, mainly through the ideas and contributions of the French mathematicians P. Fermat (1607–1665) and B. Pascal (1623–1662), and the Dutch mathematical physicist C. Huygens (1629–1695). A sustained correspondence between Pascal and Fermat began around 1654, concerning the so-called problem of division of stakes, which involves the “fair” division of a stake between two contestants playing a sequence of fair games, when the sequence is interrupted and ended by external intervention before either contestant is able to complete the number of games required to win the full stake. Suffice it to say that this is a particularly challenging problem, and the solution obtained in the Fermat–Pascal correspondence introduced, among other things, a precursor of the notion of the expected value of “chance variables” (random variables in current terminology), together with combinatorial methods which would soon become standard tools in probability (see Ore [2]). At much the same time, Huygens solved the problem of division of stakes independently of Fermat and Pascal, and in 1657 published the book On Reasoning in Games of Chance (Latin original De Ratiociniis in Ludo Aleae), which qualifies as the first systematic text on probability. Within the text, one finds, albeit in non-explicit precursor form, the notions of independent events and conditional probability, as well as considerable further development of the idea of expected value introduced by Fermat and Pascal. This work would remain the standard treatment of probability theory for more than five decades until the early eighteenth century, exerting considerable influence on its development.
The eighteenth century saw the advent of a new class of results in probability, namely limit theorems; in particular, laws of large numbers and central limit theorems. The first of these (the law of large numbers) appears in the work The Art of Conjecture (Latin original Ars Conjectandi) by the Swiss mathematician J. Bernoulli (1655–1705), published posthumously in 1713. This work was profoundly influenced by the book On Reasoning in Games of Chance by Huygens, previously mentioned. The rules of manipulation of probabilities are clearly explained and illustrated with examples. Thus, the fact that one does not generally have $P(A \cup B) = P(A) + P(B)$ when $A \cap B$ is non-empty is given vivid, if rather morbid, illustration with an example involving two individuals who have been condemned to death, with the proviso that one (or both) might be spared depending on the outcome of a game of chance played just prior to their planned execution (see p. 58 of Maistrov [3]). Also introduced is the notion of Bernoulli trials with success probability $0 < p < 1$, together with the standard expression for the probability of $m$ successes in a sequence of $n$ independent trials, in terms of the binomial coefficients. Of greatest interest for the present discussion is the weak law of large numbers for a sequence of Bernoulli trials with success probability $p$, the modern formulation of this result being
$$P\left(\left|\frac{S_n}{n} - p\right| < \epsilon\right) \to 1 \quad \text{for every } \epsilon > 0, \text{ as } n \to \infty, \tag{1}$$
in which $S_n$ denotes the number of successes in $n$ independent Bernoulli trials. This result appears in the fourth (and final) part of The Art of Conjecture, and is a true landmark in the development of probability theory. These days, the law of large numbers (1) is a simple consequence of the Chebyshev inequality, but this latter result was not available to Bernoulli, coming to light as it did only in the mid-nineteenth century. Accordingly, with only combinatorial tools to hand, Bernoulli resorts to a counting approach involving an urn which contains $r$ red balls and $b$ black balls, each trial being a draw with replacement from the urn, and success being defined as the draw of a red ball, giving a success probability $p = r/(b+r)$. Bernoulli then obtains (1) by a highly intricate and lengthy combinatorial analysis (which even includes a precursor of a large deviations bound), the details of which are discussed by Bolthausen [4], and Bolthausen and Wüthrich [5]. The law of large numbers essentially rationalizes the “frequentist” interpretation of probability, as well as giving the means to estimate unknown probabilities by repeated sampling, thus providing a link to rigorous methods of statistical estimation. In particular, the law of large numbers decisively changed the direction of probability theory itself; whereas, hitherto, the main focus had been on the use of combinatorial methods to calculate probabilities in specific situations, typically games of gambling, after the law of large numbers the emphasis would increasingly shift to the analysis and characterization of limiting properties resulting from indefinite repetition of a basic random scenario, thus paving the way to the limit theorems which are such a central part of modern probability theory. With every good reason, Bernoulli referred to the law of large numbers as his “golden theorem”, asserting “I have pondered over it for twenty years” (see pp. 299–300 of Snell [6] for further commentary).
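Bernoulli needed roughly twenty years and an intricate combinatorial argument to establish (1); with a computer, the statement is easy to probe numerically. The following sketch, in plain Python using only the standard library (the function name and parameter choices are illustrative, not taken from this survey), simulates repeated Bernoulli trials and estimates the probability appearing in (1):

```python
import random

def empirical_lln_probability(n, p, eps, num_experiments=2000, seed=0):
    """Estimate P(|S_n/n - p| < eps) by simulating num_experiments
    independent runs of n Bernoulli(p) trials each."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_experiments):
        s_n = sum(1 for _ in range(n) if rng.random() < p)  # successes in n trials
        if abs(s_n / n - p) < eps:
            hits += 1
    return hits / num_experiments

# The probability in (1) should climb toward 1 as n grows.
for n in (10, 100, 1000):
    print(n, empirical_lln_probability(n, p=0.3, eps=0.05))
```

For $p = 0.3$ and $\epsilon = 0.05$, the estimated probability rises steadily with $n$, exactly as (1) asserts.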
Following the publication of Bernoulli’s The Art of Conjecture in 1713, another major limit theorem, as impressive and revolutionary in its own way as the law of large numbers of Bernoulli, namely the first instance of a central limit theorem, appeared with the 1738 publication of the second edition of the book The Doctrine of Chances by the French mathematician A. de Moivre (1667–1754). As with Bernoulli’s law of large numbers, the setting is that of repeated independent Bernoulli trials with success probability $p \in (0, 1)$. In modern notation, the central limit theorem of de Moivre states that
$$P\left(a < \frac{S_n - np}{\sqrt{np(1-p)}} < b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left(-\frac{x^2}{2}\right)\, dx, \quad \text{as } n \to \infty, \tag{2}$$
for every $a, b \in \mathbb{R}$ with $a < b$, in which $S_n$ again denotes the number of successes in $n$ trials (as at (1)). In this result, the ubiquitous normal distribution appears on the scene for the very first time (see Archibald [7] and Bellhouse [8]). Just as was the case for the Bernoulli law of large numbers, the road to this central limit theorem, with only the rudimentary methods available during the early eighteenth century, was long and arduous, requiring approximately nine years of effort (see Section 2.3 of Hald [9]). The main tool of proof is Stirling’s formula for approximating the factorial function, a version of which de Moivre obtained independently. One should note that, although the central limit theorem for Bernoulli trials is stated in The Doctrine of Chances for general $p \in (0, 1)$, the proof is given only for the symmetric case $p = 1/2$ (possibly because de Moivre viewed the proof for the general case as just an easy modification of the proof when $p = 1/2$).
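The quality of de Moivre’s approximation can be checked directly, since for Bernoulli trials the left-hand side of (2) is a finite binomial sum. A small sketch (plain Python; the helper names are ours, not from the survey) compares the exact probability with the Gaussian integral:

```python
import math

def binomial_window_prob(n, p, a, b):
    """Exact P(a < (S_n - n*p)/sqrt(n*p*(1-p)) < b) for S_n ~ Binomial(n, p)."""
    sd = math.sqrt(n * p * (1 - p))
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if a < (k - n * p) / sd < b)

def normal_window_prob(a, b):
    """The Gaussian limit in (2): the standard normal integral over (a, b)."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return Phi(b) - Phi(a)

# The discrepancy tends to zero as n grows (at rate 1/sqrt(n), by the much
# later Berry-Esseen bound), though not monotonically, because of the
# lattice structure of the binomial distribution.
for n in (10, 100, 1000):
    print(n, abs(binomial_window_prob(n, 0.5, -1.0, 1.0) - normal_window_prob(-1.0, 1.0)))
```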
The early nineteenth century saw another major leap in probability theory with the appearance, in 1812, of the masterwork Théorie Analytique des Probabilités by the French mathematician and physicist P.S. Laplace (1749–1827), within which one finds a truly extraordinary wealth of new ideas, methods, techniques and results. Indeed, it is no exaggeration to say that this magnificent work marks the transition of probability theory from a “niche” discipline, pursued by the small group of exceptionally gifted devotees noted previously, into the mainstream of modern science and mathematics. A completely rigorous development of the ideas in Théorie Analytique requires concepts, methods and tools from real analysis, which were simply not available to Laplace during the early nineteenth century. Consequently, in their place, Laplace deploys a superlative intuition, together with highly imaginative and creative—but decidedly non-rigorous—devices, all of which make the reading of this work a definite and somewhat frustrating challenge. Thus, in commentary on Théorie Analytique, de Morgan, in evident frustration, alludes to “…a peculiar species of infinitesimal…”, while Todhunter ([10], p. 561) calls out “… processes … obscure and repulsive… ”, but, nevertheless, in the end having to concede “… yet they contain all that is essential in the theory”. Among the most important results developed by Laplace is a true landmark in the theory of probability which would decisively influence future developments, namely a far-reaching extension of the central limit theorem of de Moivre which, as previously indicated, pertains to a succession of independent Bernoulli trials. 
In contrast, Laplace obtains a result of much greater generality, which can be paraphrased—in modern terms—as follows: suppose that $X_i$, $i = 1, 2, \ldots$, is a sequence of independent random variables with a common distribution function and having an expected value $\mu$ and variance $\sigma^2 < \infty$, and put $S_n := \sum_{i=1}^n X_i$, $n = 1, 2, \ldots$ Then,
$$P\left(a < \frac{S_n - n\mu}{\sqrt{n\sigma^2}} < b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left(-\frac{x^2}{2}\right)\, dx, \quad \text{as } n \to \infty, \tag{3}$$
for every $a, b \in \mathbb{R}$ with $a < b$. This bears a strong resemblance to the de Moivre central limit theorem (2), but now there is no specific origin ascribed to the random variables $X_i$, and, indeed, the specific distribution of the $X_i$ is completely irrelevant, provided that the mean is $\mu$ and the variance is $\sigma^2$ for every $i = 1, 2, \ldots$ As a tool for establishing the central limit theorem (3), Laplace develops a precursor of the method of characteristic functions (see Hald [9], pp. 303–306), and, by way of illustrating the power of the approach, uses characteristic functions to establish the de Moivre central limit theorem (2) for the case of general $p \in (0, 1)$; for this reason, the result (2) is often dubbed the de Moivre–Laplace central limit theorem. Laplace’s proof of the central limit theorem (3) for general random variables $X_i$ by the method of characteristic functions is completely sound when the $X_i$ are discrete random variables with bounded support. However, it is in dealing with continuously distributed random variables without recourse to any of the concepts and methods of real analysis that Laplace (necessarily) resorts to highly plausible and suggestive, but definitely not rigorous, arguments and methods; a detailed analysis of Laplace’s proof of (3), in modern notation and terminology, is given by Hald ([9], pp. 307–317).
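The mechanism of the characteristic-function method can be made concrete with a minimal example. For Rademacher variables ($X_i = \pm 1$ with probability $1/2$ each, so mean 0 and variance 1) the characteristic function of a single summand is $\cos t$, and independence turns the characteristic function of the standardized sum into an $n$-fold product. The sketch below (an illustration of the method in general, not of Laplace’s own computation) shows this product converging pointwise to the Gaussian characteristic function $e^{-t^2/2}$:

```python
import math

def cf_standardized_sum(t, n):
    """Characteristic function E[exp(i*t*S_n/sqrt(n))] for S_n a sum of n
    independent Rademacher variables: independence gives cos(t/sqrt(n))**n,
    which is real-valued here because each summand is symmetric."""
    return math.cos(t / math.sqrt(n)) ** n

# Pointwise convergence to the Gaussian characteristic function exp(-t^2/2),
# the analytic engine behind central limit theorems such as (3).
for n in (1, 10, 100, 1000):
    print(n, cf_standardized_sum(1.0, n), math.exp(-0.5))
```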
The second half of the nineteenth century saw the emergence of the Russian school of probability, centered in particular around the mathematicians P.L. Chebyshev (1821–1894), A.A. Markov (1856–1922) and A.M. Lyapunov (1857–1918). By this time the much tighter standards of mathematical rigor introduced and insisted upon by Gauss during the first half of the nineteenth century had increasingly become the norm, and the concepts and tools of real analysis created by Cauchy, Weierstrass and others provided the means to attain those standards. As we shall see, it is largely thanks to the Russian school that these more demanding and stringent ideas of mathematical rigor started to become an integral part of probability theory. The Théorie Analytique of Laplace had stimulated much interest in probability among Russian mathematicians such as Ostrogradski, Bunyakovski, Brashman and Zernov during the first half of the nineteenth century, and it was their influence which in turn guided the first efforts of Chebyshev in probability. Among the very earliest contributions of Chebyshev were clear formulations of the concepts of sample space and random variable, the latter of course being interpreted as a real-valued function on the sample space (see Mackey [11] for further discussion on this). These ideas are, of course, quite self-evident in the setting of Bernoulli trials for the limit theorems of Bernoulli and de Moivre mentioned previously, but are seemingly absent from the much more general formulations of Laplace in the Théorie Analytique (in which random variables are regarded as “errors of observation”). The clear and explicit formulation by Chebyshev of these simple but indispensable notions contributed much to a more transparent development of probability theory. 
In 1867, Chebyshev published the celebrated Chebyshev inequality (however, see Remark 1 which follows), the great effectiveness of which is matched only by its extraordinary simplicity, and used this to immediately obtain a huge generalization of the Bernoulli weak law of large numbers (1) from sums of independent Bernoulli trials to sums of independent random variables with finite second moments—see the discussion of the so-called “First Problem” on p. 4 of Gnedenko and Kolmogorov [12]. Furthermore, during the period 1887–1890, Chebyshev developed the so-called method of moments, which in its very earliest form appears to date as far back as Euler, and applied this method to establish a central limit theorem of the following general form: Take independent random variables $X_i$, $i = 1, 2, \ldots$, and without loss of generality suppose that $E[X_i] = 0$. If $E[(X_i)^2] < \infty$, $i = 1, 2, \ldots$, then with $S_n := \sum_{i=1}^n X_i$ and $B_n^2 := E[(S_n)^2] = \sum_{i=1}^n E[(X_i)^2]$, one has
$$P\left(a < \frac{S_n}{B_n} < b\right) \to \frac{1}{\sqrt{2\pi}} \int_a^b \exp\left(-\frac{x^2}{2}\right)\, dx, \quad \text{as } n \to \infty, \tag{4}$$
for every $a, b \in \mathbb{R}$ with $a < b$. Chebyshev established (4) under clearly stated—although incomplete—conditions on the random variables $X_i$, which go beyond the minimal zero-mean and finite second moment conditions above, and at a level of rigor which, although not perfect, was nevertheless a very considerable improvement on what had gone before (the full history is discussed by Maistrov on pp. 202–208 of [3] as well as by Seneta [13]). It might not be out of place to observe at this juncture that the modern practice of clearly stating the conditions or assumptions needed for a desired result or theorem to hold true became current in mathematics only during the second half of the nineteenth century (apparently under the influence of Gauss); thus, for example, the several conditions on the random variables $X_i$ actually needed for the Laplace central limit theorem (3) to hold are seemingly nowhere listed or referenced in the Théorie Analytique. We see that Chebyshev, in stipulating clear (albeit incomplete) conditions on the random variables $X_i$ for the central limit theorem (4) to hold, was bringing this greater standard of precision and accuracy into probability theory. The work of Chebyshev on the central limit theorem was rapidly followed in 1898 by that of his student Markov, who continues to use the method of moments introduced by Chebyshev, but supplies the additional conditions on the random variables $X_i$, missing from the treatment of Chebyshev, which are required for (4) to hold, and gives a proof of (4) which, by the standards of the day, is completely rigorous. In particular, Markov stipulates that the random variables $X_i$ must have finite moments of all orders for every $i = 1, 2, \ldots$ (this is essential for the method of moments as used by Chebyshev to work).
This central limit theorem of Markov was in turn quickly displaced by that of Lyapunov, who extends the method of characteristic functions first introduced by Laplace in Théorie Analytique to obtain a tool of proof significantly more powerful than the method of moments used by Chebyshev and Markov, and which, in particular, does not require the existence of finite moments of all orders for the random variables $X_i$. Indeed, by the method of characteristic functions, Lyapunov (c. 1900) dispenses completely with this rather stringent moment condition, to obtain a proof of (4) which is both completely rigorous as well as significantly simpler than the earlier proof of Markov (see pp. 209–213 of Maistrov [3] for a complete discussion, including precise formulations of the technical conditions imposed on $X_i$ by Markov and Lyapunov, which we do not state here). Not to be outdone, Markov (c. 1913) introduces the technique of truncation of random variables—which remains an extremely powerful method of proof to this very day—and makes use of this technique, together with the method of moments, to very slightly improve on the central limit theorem of Lyapunov, again by a completely rigorous argument. Finally, a crowning achievement of the nineteenth century Russian school must surely be the rate of convergence established by Lyapunov [14] for the central limit theorem (see the first unnumbered display on p. 201 of Gnedenko and Kolmogorov [12]). More than four decades were to elapse before the rate of convergence of Lyapunov would be significantly improved; see Esseen [15] (a similar improvement reported in Berry [16] is apparently invalidated by an error—see the footnote to p. 201 of [12]).
Remark 1.
The French statistician Bienaymé discovered the Chebyshev inequality several years earlier, in 1853, in the course of generalizing the Laplace method of least squares, so it is often referred to as the Bienaymé–Chebyshev inequality. There are indications that Bienaymé regarded this inequality as a somewhat minor result, whereas Chebyshev fully appreciated its significance as a general and powerful tool of proof (see Seneta [17]).
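In modern notation, the inequality states that $P(|X - E[X]| \ge a) \le \mathrm{Var}(X)/a^2$ for every $a > 0$. Because both sides are exactly computable for a binomial random variable, the bound is easy to verify numerically; the sketch below (plain Python, with function and variable names of our own choosing) does so:

```python
import math

def chebyshev_check(n, p, a):
    """Compare the exact tail P(|X - E[X]| >= a) for X ~ Binomial(n, p)
    with the Bienayme-Chebyshev bound Var(X)/a**2."""
    mean, var = n * p, n * p * (1 - p)
    exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(n + 1) if abs(k - mean) >= a)
    return exact, var / a**2

exact, bound = chebyshev_check(n=100, p=0.5, a=10)
print(exact, bound)  # the exact tail probability never exceeds the bound
```

The bound is typically far from tight (here the exact tail is well under the bound of 0.25), which is precisely why it is so widely applicable: it demands nothing of the distribution beyond a finite second moment.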

3. Kolmogorov and Measure Theoretic Foundations

In Section 2, we briefly summarized some of the main developments in probability up to the end of the nineteenth century in order to emphasize that probability theory had by then evolved into a well-established discipline with important results to its credit and generally solid standards of mathematical rigor, and to counter the not altogether uncommon view that “serious” probability did not really begin until the measure-theoretic makeover of the twentieth century. Nevertheless, it must be admitted that significant conceptual, methodological and technical deficiencies remained in the late nineteenth century edifice. A highly informative description of the not entirely respected place of probability theory within mathematics prior to this makeover is to be found in Section 1 of Doob [18]. Among these deficiencies, one can certainly mention the following:
(1) There seemed no clear way to compute, or even conceptualize, the probability of an event involving an infinite sequence of random trials. Thus, for example, although the weak law of large numbers and central limit theorems noted in Section 2 are genuine limit theorems, in every case the probability in question is determined by a sum $S_n$ which itself depends on only a finite sequence of $n$ trials.
(2) Events having zero probability, and consequently events having unit probability, although essential for any complete theory of probability, fell outside the scope of the available methods and techniques.
(3) It was usually necessary to categorize random variables into two classes, namely those with a “discrete” distribution and those with a “continuous” distribution, and to treat these cases separately. Furthermore, it was customary to adopt the rather strong assumption that continuously distributed random variables have a strictly positive density function (so as to avoid having to deal with events having zero probability). Availability of the Riemann–Stieltjes integral partially alleviated some of these problems, but the limited convergence properties of this integral basically excluded its use in the study of stochastic processes, which would become a major part of probability theory during the twentieth century.
(4) Notions of independence and conditioning, although formulated by the late nineteenth century, were often quite limited and vague, and excluded in particular conditioning on events of measure zero (for the reasons noted in (2)).
(5) Ideas of throwing coins and dice, so prevalent in the probability theory of the time, did not have a precise formulation and appeared to have no natural place in any mathematically rigorous framework. In fact, probability lacked any axiomatic foundation from which one could proceed when establishing such basic results as the law of large numbers and central limit theorem.
In light of the above, there was clearly a pressing need for a refurbishment of probability theory involving not merely technical issues but also truly foundational questions. The key to this was to be found in the theory of measure and integration, created in the opening years of the twentieth century, largely by the French mathematicians E. Borel and H. Lebesgue. A prime motivation for Borel and Lebesgue was to get a clearer understanding of subtle geometric questions regarding the length and area of subsets of the real line and the plane respectively, questions which would be foremost in the classic doctoral thesis of Lebesgue [19]. As such, these questions had nothing whatever to do with probability theory. With the further development of the theory of measure and integration, and in particular with the availability of two essential results in measure theory, namely the Radon–Nikodym theorem and the Carathéodory–Hahn extension theorem (c. 1913), the latter being for the construction of measures on abstract sets, the intrinsically geometric questions which had motivated Borel and Lebesgue receded into the background, and the theory became concerned with establishing an extremely powerful integral calculus, in a highly abstract and general setting built on the familiar notion of a measure space (this “abstractification” of measure theory was initiated mainly by the French mathematician M. Fréchet [20]). It was the inspired realization of the Russian mathematician A.N. Kolmogorov (1903–1987) that the apparatus of this abstract theory of measure and integration provided exactly the concepts and tools needed to refurbish probability theory on a sound axiomatic basis. This program was accomplished in the magisterial monograph [21], a work of extraordinary brevity (no more than seventy-five pages in English translation) and even today still surprisingly readable, which initiated the modern period of probability that continues to this day.
Within [21] one finds, in particular,
(a) The notion of a probability space $(\Omega, \mathcal{F}, P)$ as an abstract measure space having unit mass, that is, $P(\Omega) = 1$; the notion of a random variable as an $\mathcal{F}$-measurable function $X : \Omega \to \mathbb{R}$; and the definition of the expectation $EX$ as the Lebesgue integral of the measurable function $X$ with respect to the probability measure $P$. One should bear in mind that this formulation of the expectation $EX$ directly in terms of the random variable $X$ itself, instead of in terms of the cumulative distribution function of $X$, is key to application of the powerful convergence properties of the Lebesgue integral (viz. the convergence theorems of Beppo Levi, Lebesgue and Vitali), as well as the Fubini–Tonelli theorem for the interchange of expectation with other operations, such as integration over an interval with respect to a “time” parameter. These tools are indispensable for the modern theory of stochastic processes, and could never even be contemplated with just the nineteenth century definition of expectation in terms of Riemann–Stieltjes integration with respect to a distribution function.
(b) A clear, complete and precise formulation of the notion of the independence of collections of random variables, including uncountably infinite collections; again, this would be essential for a rigorous theory of stochastic processes.
(c) The basic version of the celebrated Kolmogorov consistency theorem, which gives the means for assigning probability to events involving infinite sequences of random trials and is essential in the construction of the natural “carrying” probability spaces for both discrete and continuous parameter stochastic processes.
(d) A precise, comprehensive and general definition of conditional expectation, based on the Radon–Nikodym theorem from the abstract theory of measure and integration. This definition is an immense advance beyond the rather limited and problematic definition of conditional expectation current during the nineteenth century, with its unnaturally strong hypotheses and awkward separate treatment of discrete and continuously distributed random variables. Modern probability would not exist without this notion of conditional expectation, and with every good reason Williams ([22], p. 84) refers to it as “the central definition of modern probability”.
(e) The definitive formulation of the strong law of large numbers for a sum of independent and identically distributed random variables.
(f) The zero-one law for tail events of a sequence of independent random variables.
As will become clear in the discussion which follows, the appearance of Kolmogorov’s monograph [21] completely revolutionized probability theory. According to the British mathematician N.H. Bingham (see the contribution on pp. 51–52 in the collection [23]) “It is difficult to overstate the impact of the Grundbegriffe on the development of the subject; essentially the history of probability theory splits in 1933 between pre-Kolmogorov and post-Kolmogorov”. One should understand that the work [21] makes no attempt at a comprehensive account of probability theory at the time of publication, being almost relentlessly focused on just the basic foundations and nothing else, and, in particular, omits many crucial ideas and results of Kolmogorov which preceded the appearance of [21] (i.e., were pre-Kolmogorov) although written in the full “measure-theoretic spirit” of [21]. The following is a modest attempt to summarize the most important of these ideas and results, from both the pre- and post-Kolmogorov periods. Most are to be found in modern graduate-level textbooks on probability, as we indicate in parentheses:
  • A variance criterion for a.s. convergence of a series of independent random variables, established jointly with Khinchin [24] (see Theorem 5.1.1, p. 113 of Chow and Teicher [25]).
  • The three-series theorem for a series of independent random variables [26] (see Theorem 5.1.2, p. 117 of Chow and Teicher [25]).
  • The Kolmogorov maximal inequality for sums of independent square-integrable random variables [26] (see Theorem 5.2.5, p. 130 of Chow and Teicher [25]); this inequality is of course a far-reaching extension of the basic Chebyshev inequality.
  • A strong law of large numbers for sums of independent square integrable random variables [27] (see Theorem 4.3.2 on p. 13 of Shiryayev [28]).
  • A law of the iterated logarithm for sums of independent a.s. bounded random variables with increasing a.s. bound [29] (see Theorem 10.2.1 on p. 355 of Chow and Teicher [25]). This result is a significant advance beyond the law of the iterated logarithm first established by Khinchin [30] in the special case of independent Bernoulli trials, and required the introduction of methods and tools of proof completely different from those in [30], such as exponential upper and lower bounds; these methods and tools would become basic for much future work on laws of the iterated logarithm (see, e.g., Strassen [31]). The proof given by Kolmogorov of the law of the iterated logarithm seems (unavoidably) to be rather demanding; indeed, according to Stout ([32], p. 272), the proof involves “Herculean effort”.
  • The Kolmogorov–Smirnov test and rate of convergence in the Glivenko–Cantelli theorem [33] (see Section 7.4.3, pp. 327–330 of Dacunha-Castelle and Duflo [34]).
  • Sufficient conditions for a stochastic process constructed by means of the Kolmogorov consistency theorem to be almost surely sample-path continuous (the so-called Kolmogorov–Chentsov condition—see Theorem 2.2.8 on pp. 53–54 of Karatzas and Shreve [35]).
  • A significant simplification in the case of finite-variance distributions of the Lévy–Khinchin representation formula which characterizes infinitely divisible distributions (see Theorem 12.1.4, p. 432 of Chow and Teicher [25]).
  • The analytical study of Markov processes over continuous time and in continuous state space in the work [36], which initiated the study of Markov diffusion processes and introduced the Kolmogorov forward and backward equations. This work is completely analytical, without any reference to sample paths, being concerned with the construction of Markov transition probabilities in terms of which one can then construct the Markov diffusion process by means of the Kolmogorov consistency theorem. The work [36] would later motivate Itô to introduce stochastic integrals and stochastic differential equations to study Markov diffusion processes from a sample-path perspective.
  • Ergodic properties and mean recurrence times of Markov chains with countable state space (see Theorem 8.18, p. 152 and Theorem 8.22, p. 154 of Kallenberg [37]).
  • Backward equation for pure jump-type Markov processes in a measurable space (see Theorem 12.22, p. 242 of Kallenberg [37]).
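The variance criterion mentioned at the head of this list lends itself to a quick numerical illustration. The following sketch is ours, not drawn from the works cited above: it simulates the partial sums of the series Σ ε_k/k with independent random signs ε_k = ±1, which converges almost surely since Σ Var(ε_k/k) = Σ 1/k² < ∞ (the function and variable names are invented for this illustration).

```python
import random

def random_sign_series(n_terms, rng):
    """Partial sums of sum_k eps_k / k with independent signs eps_k = +/-1.

    Since sum_k Var(eps_k / k) = sum_k 1/k^2 < infinity, the
    Kolmogorov-Khinchin variance criterion gives a.s. convergence."""
    s, partial = 0.0, []
    for k in range(1, n_terms + 1):
        s += rng.choice((-1.0, 1.0)) / k
        partial.append(s)
    return partial

rng = random.Random(0)
sums = random_sign_series(100_000, rng)
# For a convergent series, the oscillation of the partial sums far out
# in the tail must be small; compare fluctuations over the last stretch.
tail_spread = max(sums[-1000:]) - min(sums[-1000:])
print(f"limit estimate ~ {sums[-1]:.4f}, tail spread ~ {tail_spread:.5f}")
```

By contrast, replacing 1/k with 1/√k makes the variances non-summable, and the three-series theorem then shows the series diverges almost surely.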
Remark 2.
Doob ([38], pp. 820–821) notes one interesting case of a technical measure-theoretic convention adopted by Kolmogorov which has since been abandoned, namely how to define a σ-algebra on the range set of a map defined on a probability space, in order to secure measurability of the map. This convention can lead to conclusions which are at variance with those obtained under the current convention, in which the σ-algebra on the range set is specified in advance, and it is the joint effect of this σ-algebra together with the σ-algebra on the probability space which determines whether or not the map is measurable.
Remark 3.
In light of the preceding discussion, one might be left with the impression that Kolmogorov specialized in probability theory. This is far from being the case, for Kolmogorov was a true polymath with a “Midas touch”, who displayed early mathematical genius. Thus, at the age of nineteen, he constructed an example of a summable (i.e., Lebesgue-integrable) function whose Fourier series diverges almost everywhere, and shortly afterwards came up with an even more dramatic example, in which divergence almost everywhere is replaced with divergence everywhere (examples such as these had long been sought). In time, Kolmogorov would make definitive contributions in many areas of mathematics besides probability, including the theory of turbulence in fluid motion; Fourier series; intuitionist logic and the foundations of mathematics; metric entropy and complexity of functions; KAM theory and chaotic motion in Hamiltonian mechanics; the role of entropy in ergodic theory; algorithmic complexity; and cohomology. An excellent discussion of these contributions is to be found in Kendall [23].

4. Doob and Stochastic Processes

The measure-theoretic basis for probability introduced by Kolmogorov [21] resonated perhaps most strongly with the American mathematician J.L. Doob (1910–2004), who, possibly more than any other probabilist of the period, created the modern theory of stochastic processes, in works which were unabashedly measure-theoretic in character (see Remark 4 which follows) and written to an impeccable standard of mathematical rigor. Much of this effort was compiled in the book Stochastic Processes [39] which appeared in 1953, exactly twenty years after Kolmogorov’s 1933 monograph [21]. In some respects, these works were quite different, the tract of Kolmogorov being an austere and compact seventy-five pages (in English) which laid out the basic principles of measure-theoretic probability, but made no attempt at a comprehensive treatment of the area (as it was in 1933), while the book of Doob is a monumental six hundred and fifty-four pages, replete with an immense wealth of ground-breaking concepts and results, many of them completely new. What the two works have in common, of course, is that they completely reshaped the field of probability.
At a conference held in Amsterdam in 1954, Kolmogorov remarked to Doob “The whole of the theory of stochastic processes will now be based on your work” (see the comment of Kendall on p. 34 of the collection [23]). To appreciate the truth of this remark, one need only focus on the definitive work [40], which addresses some intrinsic and extremely basic challenges posed by continuous parameter stochastic processes. To see what these might be, take a continuous parameter process {X_t, t ∈ T} on a probability space (Ω, F, P), in which T is a non-trivial interval in the real line, and therefore uncountable. From now on, just to fix ideas, we shall take T = [0, ∞), the set of non-negative reals, but any interval in the real line with strictly positive length will suffice. This austere setup immediately presents some major and very basic difficulties. For example, the subset of Ω defined by
{ω ∈ Ω | X_t(ω) ≥ 0, ∀ t ∈ T} = ⋂_{t∈T} {ω ∈ Ω | X_t(ω) ≥ 0},    (5)
is not necessarily a member of F, and therefore not an event, even though {X_t ≥ 0} ∈ F for every t ∈ T, since the intersection on the right is taken over the uncountable set T. This is a very significant problem, since it is natural to want to assign a probability to the set defined at (5). In much the same way, the map Z : Ω → (−∞, ∞] defined by
Z(ω) = sup_{t∈T} X_t(ω),  ω ∈ Ω,    (6)
is generally not F-measurable, because the supremum on the right-hand side is over the uncountable set T. Again, it is very natural to want the function Z to be F-measurable. These problems, and a good many similar ones, arise because the indexing set T is uncountably infinite, while the tools and methods of measure theory deal only with countable set operations such as unions and intersections, and countable sets of functions. In order to deal with challenges of this kind, Doob introduced several concepts, methods and viewpoints which are now part of the very fabric of probability, among which the most basic is the idea of modifying random variables on events of probability zero in order to obtain “good versions” of a given stochastic process, in which “good” connotes some desired property, for example that the set at (5) be a member of F and that the map Z at (6) be F-measurable. To formulate this notion more precisely, we recall the following elementary definition: if {X̃_t, t ∈ T} is another continuous parameter process on the same probability space (Ω, F, P), then {X̃_t, t ∈ T} is said to be a version (or modification) of the process {X_t, t ∈ T} when X_t = X̃_t, P-a.s. for every t ∈ T (the definition is obviously symmetric). The crucial thing to note is that the processes {X_t, t ∈ T} and {X̃_t, t ∈ T} have identical finite-dimensional distributions, and therefore, from a probabilistic standpoint, are the same thing; that is, any modification {X̃_t, t ∈ T} of a given process {X_t, t ∈ T} is, to all intents and purposes, as good as the process {X_t, t ∈ T} itself, and can therefore be used as a substitute for the process {X_t, t ∈ T}. With this in mind, one can then address the question of securing possible conditions on the given process {X_t, t ∈ T} which are enough to ensure the existence of a modification {X̃_t, t ∈ T} having, for example, the properties that {ω ∈ Ω | X̃_t(ω) ≥ 0, ∀ t ∈ T} ∈ F and (c.f. (6)) the map Z̃ : Ω → (−∞, ∞] defined by
Z̃(ω) = sup_{t∈T} X̃_t(ω),  ω ∈ Ω,    (7)
be F-measurable. This simple but profound observation of Doob, namely given a continuous parameter process {X_t, t ∈ T}, one should try to construct a version {X̃_t, t ∈ T} of that process having some “nice” desired properties, is now central in the theory of continuous parameter stochastic processes.
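How drastically a version can differ from the original process along sample paths is vividly shown by a classical toy example (standard in textbooks, and not taken from [40]): let Ω = [0, 1] with the Borel σ-algebra and Lebesgue measure P, let T = [0, 1], and set

X_t(ω) = 0  and  X̃_t(ω) = 1_{{t = ω}},  ω ∈ Ω, t ∈ T.

For each fixed t one has P(X_t ≠ X̃_t) = P({t}) = 0, so {X̃_t, t ∈ T} is a version of {X_t, t ∈ T}; nevertheless sup_{t∈T} X_t(ω) = 0 while sup_{t∈T} X̃_t(ω) = 1 for every ω ∈ Ω. Thus two processes with identical finite-dimensional distributions can have suprema which differ everywhere, which is precisely why one must work with a carefully chosen version.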
As an application of this approach to the problems discussed above, Doob introduces the concept of a separable stochastic process (see Section II.2, pp. 50–71 of Doob [39], or Section XII.38.2, pp. 170–179 of Loève [41], or Section 7.38, pp. 526–533 of Billingsley [42]). Given a dense and countable set D ⊆ T, a process {X̃_t, t ∈ T} on (Ω, F, P) is D-separable when there exists a set N ∈ F with P(N) = 0, such that the following holds for each ω ∈ Ω ∖ N: corresponding to every t ∈ T, there is a sequence {t_n, n = 1, 2, …} ⊆ D such that t_n → t and X̃_{t_n}(ω) → X̃_t(ω) as n → ∞. In short, for P-almost all ω ∈ Ω the map t ↦ X̃_t(ω) on T = [0, ∞) is completely determined by its values when restricted to the dense countable set D ⊆ T. A basic property of separability is as follows: if the process {X̃_t, t ∈ T} is D-separable for some dense countable set D ⊆ T, then one has {ω ∈ Ω | X̃_t(ω) ≥ 0, ∀ t ∈ T} ∈ F, and Z̃ defined at (7) is F-measurable; that is, the two problems arising from the uncountability of T, which we noted previously, vanish when separability holds. As hinted above, there are several other problems of a measure-theoretic character likewise associated with the uncountability of T, which we have not discussed, and it turns out that all of these vanish as well when the process is separable. With this in mind, it is obviously important to determine clear conditions on a given continuous parameter stochastic process {X_t, t ∈ T} which are enough to secure existence of a version {X̃_t, t ∈ T} which is D-separable for some dense and countable set D ⊆ T. In fact, Doob (see Theorem II.2.4 on p. 57 of [39]) establishes the startling result that no conditions are required; that is, every continuous parameter stochastic process has a version which is D-separable for some dense countable D ⊆ T!
Needless to say, this extraordinary result clears away the entire gamut of potentially fatal measure-theoretic complications which could have haunted the theory of continuous parameter stochastic processes, and brings this theory well within the scope of measure-theoretic tools. This alone amply justifies the comment of Kolmogorov to Doob noted earlier.
Measure theory is worked extremely hard in establishing this very basic result on continuous parameter stochastic processes, and it appears that there might have been some initial resistance to the rather demanding measure theory arguments involved. Indeed, Getoor [43] comments as follows: “This required serious use of measure theory and I believe it is fair to say that many probabilists in the late 1930s and 1940s (and perhaps later) were uncomfortable with the type of measure theory required for the analysis of the trajectories of a continuous parameter process”, adding that, for some time, Doob regarded himself as a “lonely voice in the wilderness”, and “the material on separability was especially difficult for me and it was only much later that I understood what he (Doob) had achieved”. Of course, with hindsight it is clear that the use of such “heavy duty” measure theory as was called upon by Doob is now the norm in most serious work on stochastic processes. Finally, one might note the following (much too modest!) evaluation by Doob himself of this very first attempt in [40] to confront the subtleties of continuous parameter stochastic processes: “A clumsy approach was proposed by Doob [40] but a more usable one was not devised until after 1950” (see p. 594 of Doob [44]). This is, of course, a reference to the sustained effort culminating in the powerful general theory of processes, created largely by the French school of Dellacherie and Meyer, in which the notion of separability figures less prominently, being largely displaced by concepts such as predictability, optionality and progressive measurability (it is for this reason that separability is seldom mentioned in the more modern graduate level textbooks on stochastic processes). 
What this highly unpretentious statement neglects to mention of course is that this “more usable” approach would simply never have come to light had it not been preceded by the “clumsy” approach of Doob, as the creators of the general theory themselves wholeheartedly acknowledge (see Dellacherie and Meyer [45]).
Of course, besides elucidation of the foundations of continuous parameter stochastic processes, there are numerous other contributions of Doob which have likewise shaped probability theory. Foremost among these must certainly be the theory of martingales, a good part of which Doob created virtually single-handedly, establishing foundational results which (like those of Kolmogorov noted above) are part of the very fabric of probability and are to be found in any modern graduate-level textbook on probability. According to Kallenberg ([37], p. 573), the notion of a martingale was introduced by the Russian mathematician S.N. Bernstein in the work [46], which is devoted to extending classical limit theorems to processes with dependent random variables. Furthermore, Bernstein [47] generalized the Kolmogorov maximal inequality for sums of independent random variables with zero mean to martingales (see p. 817 of Doob [38]). Generalized further to submartingales, this maximal inequality of Bernstein appears as Theorem VII.3.2 of Doob [39] (it is only lack of space which prevents discussion in Section 2 of the highly definitive contributions of Bernstein). The study of this same process was continued by Ville [48], who introduced the current term martingale. A comprehensive analysis of the martingale concept was then undertaken by Doob in [49,50], and of course in the book [39] (in [49], random variables are called “chance variables” and martingales are dubbed “processes with property E ”; in all later works Doob uses current terminology). Among the extraordinary wealth of results which this study yielded, one must single out the following, which are now basic in practically all areas of probability:
  • The optional sampling theorem for discrete parameter submartingales (see Theorem VII.2.1 on p. 300 and Theorem VII.2.2 on p. 302 in [39]) and for continuous parameter submartingales (see Theorem VII.11.8 on pp. 376–377 in [39]); note that the term semi-martingale is used throughout [39] in place of the current submartingale.
  • The maximal and minimal inequalities for submartingales (see Theorem VII.3.2 on p. 314 of [39]); as noted previously, the earliest version of these inequalities is due to Bernstein [47].
  • The upcrossing inequality (see Theorem VII.3.3 on p. 316 of [39]) and the maximal L p -inequality for discrete parameter submartingales (see Theorem VII.3.4 on p. 317 of [39]).
  • The convergence theorems for discrete parameter martingales (see Theorem VII.4.1 on p. 319 of [39]) and for discrete parameter submartingales (see Theorem VII.4.1s on p. 324 of [39]), both in the “forward” direction with respect to an increasing filtration. The corresponding convergence theorems in the “reverse direction”, with respect to a decreasing filtration, are Theorem VII.4.2 on p. 328 of [39] for discrete parameter martingales and Theorem VII.4.2s on p. 329 of [39] for discrete parameter submartingales. Finally, extension of all of these convergence results to the continuous parameter case is indicated on p. 354 of [39].
  • The pathwise regularization theorem for continuous parameter submartingales (see Theorem VII.11.5 on p. 361 of [39] as well as Theorem 7.27 on p. 134 of Kallenberg [37]).
In the course of establishing these and other results, Doob introduced numerous ideas and tools, some of them quite basic, which are now essential throughout probability; to mention just three of these, one has the uniform integrability property of conditional expectations (see Theorem II(44.1) on p. 142 of Williams [51]), the rather subtle relation between conditional probabilities and conditional independence of σ-algebras (see Proposition 6.6 on p. 110 of Kallenberg [37]), and a very useful bound on the spread of a random variable in terms of its characteristic function (see Lemma 8.3.1, p. 273 of Chow and Teicher [25]). Doubtless, there are numerous other such examples. Among the most significant of these ideas and tools must certainly be stopping times and regular conditional probability distributions. On the significance of stopping times, which appeared first in connection with the optional sampling theorem, Dellacherie and Meyer (creators of the general theory of processes on which essentially all of modern stochastic processes is based) remark as follows (see p. 115-IV of [45]): “Just as the seemingly trivial definition of the derivative contains in germ all of the calculus, and its discovery may have involved as much genius as the whole development that followed it, the seemingly trivial notion of stopping time is the cornerstone of the general theory of processes”. This leaves nothing to be added on the significance of stopping times! As for regular conditional probability distributions, these are the natural generalization to measure-theoretic probability of the elementary conditional probability formula P(A | B) = P(A ∩ B)/P(B) for events A and B with P(B) > 0.
The idea is as follows: given a probability space (Ω, F, P) and conditioning σ-algebra G ⊆ F, one would like a map Q : Ω × F → [0, 1] having the following properties: (i) Q(ω, ·) is a probability measure on (Ω, F) for every ω ∈ Ω; and (ii) for every A ∈ F the map ω ↦ Q(ω, A) ∈ [0, 1] is G-measurable and Q(ω, A) = E[1_A | G](ω), for P-a.a. ω ∈ Ω. In essence, this is a matter of constructing a “nice” modification Q(·, A) of the random variable E[1_A | G](·) (which is defined P-a.e.) for every A ∈ F, a situation not unlike that discussed previously in connection with continuous parameter stochastic processes. Such a map Q is said to be a regular conditional probability on F given the conditioning σ-algebra G. If this map Q exists, then, for every integrable random variable X on (Ω, F, P), one immediately has the following extremely useful identity for conditional expectations:
E[X | G](ω) = ∫_Ω X(ω̂) Q(ω, dω̂),  P-a.a. ω ∈ Ω.
The regular conditional probability Q on F given G ⊆ F, together with the associated theory of disintegration of measures, is now very widely used throughout probability and stochastic processes, with applications ranging from a useful generalization of Bayes’ theorem (see Chap. 2.7.8, pp. 271–272 of Shiryayev [52]) to establishing the basic Yamada–Watanabe theorem relating weak and strong solutions of stochastic differential equations (see, e.g., Sections 5.3.C and 5.3.D in Karatzas and Shreve [35]). With all of this in mind, it is essential to establish the existence and uniqueness of regular conditional probabilities, a problem which is addressed in Doob [53] in a setting slightly more restrictive than, but nevertheless close to, that outlined above (see Theorem 3.1 on p. 98 of [53]; see also Remark 5 for further discussion of this result).
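On a finite probability space, a regular conditional probability can be written down explicitly, and the conditional-expectation identity above reduces to elementary conditioning on a partition. The following sketch is purely illustrative (the space, weights and partition are invented here): it builds Q when G is generated by a partition of Ω, and verifies the tower property E[E[X | G]] = E[X] in exact rational arithmetic.

```python
from fractions import Fraction as F

# Finite probability space: Omega = {0,...,5} with rational weights.
omega = range(6)
p = {0: F(1, 6), 1: F(1, 6), 2: F(1, 6), 3: F(1, 6), 4: F(1, 12), 5: F(3, 12)}
assert sum(p.values()) == 1

# Conditioning sigma-algebra G generated by a partition of Omega.
blocks = [{0, 1}, {2, 3}, {4, 5}]
def block_of(w):
    return next(b for b in blocks if w in b)

def Q(w, event):
    """Regular conditional probability: Q(w, A) = P(A | block containing w)."""
    b = block_of(w)
    return sum(p[v] for v in event & b) / sum(p[v] for v in b)

def cond_exp(X, w):
    """E[X | G](w), computed by integrating X against the kernel Q(w, .)."""
    return sum(X(v) * Q(w, {v}) for v in omega)

X = lambda w: w * w  # an arbitrary integrable random variable
# Tower property E[E[X|G]] = E[X], exact with rational arithmetic.
total = sum(cond_exp(X, w) * p[w] for w in omega)
expect = sum(X(w) * p[w] for w in omega)
print(total == expect)  # True
```

The subtleties addressed by Doob [53] arise precisely because, beyond such discrete settings, the null sets on which each Q(·, A) may misbehave can accumulate uncountably.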
In concluding this section on the colossal influence of Doob on modern probability, it is perhaps fitting to draw attention to the very elementary Doob decomposition theorem for discrete parameter submartingales (see Theorem II(54.1) on p. 153 of Williams [51]) which introduces the notion of a predictable (or previsible) process in the discrete parameter setting, and which is easily established by elementary manipulation of conditional expectations. Despite its simplicity, this result is the origin of the profound Doob–Meyer decomposition, due mainly to Meyer (see [54,55]), which is essentially an analog of the Doob decomposition, but for continuous parameter submartingales. In the continuous parameter setting, the notion of predictability becomes highly non-trivial, and the Doob–Meyer decomposition is indispensable for a comprehensive theory of stochastic integration, as well as a central element in the general theory of processes.
Remark 4.
In the Preface to [39], Doob forthrightly asserts “Probability is simply a branch of measure theory, with its own special emphasis and field of application”. Of course, being a branch of measure theory, largely through the efforts of Doob himself, probability finally became an eminently respectable part of mathematics. Nevertheless, this view on the place of probability theory did not pass completely unchallenged. Chung, in the preface to the first edition of the magisterial text [56], calls it a “specious utterance”, which is “not so much false as it is fatuous” (see Sucheston [57]), in sharp objection to the idea that probability, with its colorful history, highly developed intuition and its own impressive repertoire of sophisticated tools and concepts, should be relegated to the rather humble status of a mere “province” of measure theory (nevertheless, in the same preface to [56], Chung thanks Doob for having read the entire manuscript and having contributed a “great deal” to the final text, so there was likely an “agreement to disagree” on this particular matter). The middle ground in the argument is possibly best expressed by Williams (see the preface to [22]), who asserts that if probability is indeed to be seen as part of measure theory, then it is a part which vastly enriches measure theory, contributing as it does a strong intuitive basis, flexible use of multiple σ-algebras (in place of the single fixed σ-algebra characteristic of measure theory), extremely rich concepts such as independence and conditioning, the whole vast theory of stochastic processes, stochastic calculus and much else besides.
Remark 5.
Theorem 3.1 of Doob [53] is, in fact, not quite correct as stated. While the conclusion of the theorem is undoubtedly correct, the basic hypotheses are not actually sufficient to secure the stated conclusion (see the discussion on p. 624 of Doob [39]), and a counterexample commonly attributed to Dieudonné disproves the theorem as stated in [53] (see II.43 of Williams [51] for a discussion of the counterexample). In order to correct the result, one must suppose, in addition to ( Ω , F , P ) being a probability space, that the measure space ( Ω , F ) be a standard Borel space (for a discussion of which see Section V.2 of Parthasarathy [58]). This corrected result, which does ensure the existence and uniqueness of a regular conditional probability on F given a conditioning σ-algebra, is sometimes known as the Doob–Kuratowski theorem (see Theorem II(89.1) on pp. 218–219 of [51]). All of this clearly demonstrates just how subtle these ideas can be!

5. Lévy

Like Kolmogorov and Doob, the French mathematician P. Lévy (1886–1971) is a true giant of twentieth century probability. Demonstrating early intellectual brilliance, and receiving a rigorous and comprehensive mathematical education in the best French tradition, Lévy obtained his Docteur ès Sciences in 1912 at the age of 26 for a thesis on functional analysis motivated by the ideas of Volterra, following which he made several significant contributions in that area, as well as in the calculus of variations. In 1919, at the age of 33, and by then a well-established mathematician, Lévy was tasked with giving three lectures on probability at the École Polytechnique with particular emphasis on “the role of the Gaussian law in the theory of errors” (see Loève [59]). It was in this way that Lévy first became acquainted with probability, and he immediately recognized its lack of a solid conceptual basis. At this point, Lévy was apparently unaware of the results of the Russian school (Chebyshev, Markov and Lyapunov) noted earlier, and would in time rediscover their main results for himself (see Loève [59], pp. 1–2). Lévy can be credited with the first “modern” textbook on probability [60] written in a measure-theoretic spirit, which anticipated but narrowly missed attaining the goals that were realized by Kolmogorov’s definitive 1933 monograph [21]. Bingham [61] comments as follows: “In his autobiography [62], Lévy writes poignantly of realising too late, on first seeing Kolmogorov [21], the opportunity of writing such a book that he himself had missed”. Despite this disappointment, Lévy would go on to shape probability every bit as dramatically as Kolmogorov and Doob. Indeed, the British mathematician S.J. Taylor, in the biographical sketch [63] on Lévy, comments as follows: “If there is one person who has influenced the establishment and growth of probability theory more than any other, that person must be Paul Lévy”.
In view of the enormous range and scope of Lévy’s contributions to probability, we shall follow Taylor [63] in dividing these into the following general periods of activity:
1. Characteristic functions and limit laws (1919–1935)
2. Independent increments processes and martingales (1930–1940)
3. Brownian motion (1938–1955).
In the following, we summarize only the highlights from these periods of activity (more extensive commentary can be found in the surveys of Loève [59] and Taylor [63]).
  • 1. Characteristic functions and limit laws: As already noted, Laplace introduced the method of characteristic functions for discrete distributions in the Théorie Analytique of 1812, and Lyapunov extended this method to continuous distributions having a density function, in order to prove the central limit theorem. Using the power of the Lebesgue integral, Lévy defined the characteristic function of a completely general distribution function as its Fourier–Stieltjes transform and developed its properties to the extent that “since then, only improvements of detail have been obtained” (see Loève [59], p. 2). The result is an exceptionally powerful tool which has been in constant use by probabilists ever since (although, somewhat ironically, seldom used by Lévy himself; see p. 2 of Loève [59] and p. 303 of Taylor [63]). In particular, Lévy established
  • The one-to-one correspondence between distribution functions and characteristic functions, as well as an explicit inversion formula for characteristic functions (see Theorem 8.3.1, Corollary 8.3.1 and Corollary 8.3.2, pp. 269–270 of Chow and Teicher [25]).
  • The relation between multiplication of characteristic functions and convolution of the corresponding distribution functions (see Theorem 8.3.2 and Corollary 8.3.3, p. 271 of Chow and Teicher [25]).
  • The basic continuity theorem which relates complete convergence of a sequence of distribution functions to uniform convergence on closed bounded intervals of the corresponding characteristic functions (see Theorem 8.3.3, p. 271 of Chow and Teicher [25], which is a slight extension due to Bochner of Lévy’s continuity theorem).
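The correspondence between multiplication of characteristic functions and convolution of distributions is easy to verify numerically. The following sketch is ours (the two discrete laws are chosen arbitrarily for illustration): it computes the characteristic function of the sum of two independent discrete random variables in two ways and checks that the results agree.

```python
import cmath
from itertools import product

def char_fn(pmf, t):
    """Characteristic function of a discrete distribution {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf_a, pmf_b):
    """Distribution of X + Y for independent discrete X and Y."""
    out = {}
    for (x, px), (y, py) in product(pmf_a.items(), pmf_b.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Two arbitrary discrete laws (values chosen for illustration only).
die = {k: 1 / 6 for k in range(1, 7)}   # fair die
coin = {0: 0.5, 2: 0.5}                 # shifted coin

conv = convolve(die, coin)
for t in (0.0, 0.3, 1.7):
    lhs = char_fn(conv, t)
    rhs = char_fn(die, t) * char_fn(coin, t)
    assert abs(lhs - rhs) < 1e-12  # multiplication <-> convolution
print("phi_{X+Y}(t) = phi_X(t) * phi_Y(t) verified at sample points")
```

It is exactly this algebraic simplification, convolution becoming multiplication, that makes characteristic functions so effective for sums of independent random variables.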
As for limit laws and results, among the contributions of Lévy one finds
  • The Lévy metric on the set of all distribution functions on the real line (see (6) on p. 260 of Chow and Teicher [25] for the definition). This metrizes weak convergence of distribution functions, and the resulting metric space is separable and complete. Furthermore, the Lévy metric prefigures the Prokhorov metric on the set of all probability measures on a given complete separable metric space; the latter metric is now a standard tool in studying weak convergence of stochastic processes and is essential for the important Strassen–Dudley theorem which relates the weak proximity of probability measures to proximity in the sense of convergence in probability of corresponding marginal random variables (see Theorem 11.6.2, p. 407 of Dudley [64]).
  • Necessary and sufficient conditions for the weak convergence of sums of independent random variables to the normal distribution [65] (see Theorem 5.15, p. 93 of Kallenberg [37] for a modern rendition). The question of such necessary and sufficient conditions was of particular interest to Lévy, who ever since his 1919 lectures on probability mentioned previously had been much engaged in understanding the precise origins of the normal distribution. A particularly interesting aspect of this result is that there is no mention of mean and variance anywhere, thus demonstrating a surprising “non-relevance” of these seemingly basic notions to the central limit problem when approached at a truly fundamental level. It was in the course of this work that Lévy rediscovered the central limit theorem of Lyapunov discussed earlier. These necessary and sufficient conditions for weak convergence to the normal law were obtained independently by Feller [66], as will be discussed in Section 6.
  • A generalization of the classical Borel–Cantelli lemma to conditional form (see Corollary 7.20 on p. 131 of Kallenberg [37]).
  • The Lévy–Khinchin representation formula, giving a complete characterization of the infinitely divisible distributions on the real line [67]; see Theorem 12.1.3 on p. 431 of Chow and Teicher [25] for a modern rendition (this representation was obtained independently by Khinchin [68], see Section 6 which follows).
  • The formulation of a particularly important subset of the set of infinitely divisible distributions, namely the set of stable distributions. In a work written jointly with Khinchin [69], Lévy completely and definitively characterized the set of all stable distributions (see Theorem 12.3.2, p. 449 of Chow and Teicher [25] for a modern treatment).
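For the reader’s convenience, the representation just mentioned can be stated as follows (in one common modern normalization; the original parametrizations of Lévy and Khinchin use a different truncation function): a distribution on the real line is infinitely divisible if and only if its characteristic function has the form

φ(t) = exp( ibt − σ²t²/2 + ∫_{ℝ∖{0}} (e^{itx} − 1 − itx 1_{|x|≤1}) ν(dx) ),  t ∈ ℝ,

where b ∈ ℝ, σ² ≥ 0, and ν is a measure on ℝ∖{0} (the Lévy measure) satisfying ∫ min(1, x²) ν(dx) < ∞; the triple (b, σ², ν) is then unique. The stable distributions correspond to Lévy measures of the form ν(dx) = c₊ x^{−1−α} dx on (0, ∞) and c₋ |x|^{−1−α} dx on (−∞, 0) with 0 < α < 2, together with the Gaussian case.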
  • 2. Independent increments processes and martingales: Lévy initiated and single-handedly created most of the theory of independent increments processes. Among the definitive results on these established by Lévy, we mention only the following:
  • An independent increments process which is null at the origin and has continuous sample paths is necessarily Gaussian (see Theorem 13.4, p. 252 of Kallenberg [37]).
  • An independent increments process which is continuous in probability has a version with càdlàg sample paths and without any fixed jump times (see Theorem 15.1, p. 286 of Kallenberg [37]).
  • In view of the preceding discussion, it follows that, if an independent increments process fails to be Gaussian, then the process cannot have continuous sample paths and must therefore have a “jump” component. The precise characterization of the jump component as an integral with respect to a Poisson process is given by the Itô–Lévy decomposition, for which see Theorem 15.4, p. 287 of Kallenberg [37]. This decomposition was first established by Lévy on the basis of somewhat intuitive arguments, rather than a completely water-tight proof (see Remark 6 in this regard). A completely rigorous proof was given by Itô [70].
  • As already noted in Section 4, the martingale concept dates as far back as Bernstein [46]. According to Taylor ([63], p. 305), the idea of a martingale was also introduced by Lévy, who noticed the following: proofs giving limit results for sequences of independent random variables $\{X_n,\ n = 1, 2, \dots\}$ which are assumed centered at expectations, that is, $E[X_n] = 0$, $n = 1, 2, \dots$, often carried over fairly easily to sequences of random variables which are assumed centered at conditional expectations, in the sense that $E\left[X_{n+1} \mid X_1, \dots, X_n\right] = 0$, $n = 1, 2, \dots$. In this way, Lévy discovered martingale difference sequences. Building on this penetrating observation, Lévy obtained
  • A central limit theorem for martingale difference sequences. A modern (and slightly generalized) rendition of this result is Theorem 9.3.1 of Chow and Teicher ([25], p. 318); notice that the proof of this theorem in [25] is just a straightforward modification of the proof of the classical Lindeberg central limit theorem for sums of independent random variables—exactly in line with Lévy’s observation.
Having secured the concept of a martingale, Lévy proceeded to establish
  • A maximal inequality for martingale differences which generalizes the Kolmogorov inequality for sequences of independent zero-mean random variables (as already noted, this maximal inequality was established independently by Bernstein [47]).
  • The Lévy zero-one law (see (5.8) on p. 261 of Durrett [71] and the interesting discussion of the zero-one law given there).
  • 3. Brownian motion: Specializing from general independent-increments processes to Brownian motion, we again find a wealth of basic ideas and results due to Lévy. These include
  • The modulus of continuity for Brownian motion sample paths (see Theorem 2.9.5, p. 114 of Karatzas and Shreve [35]).
  • The quadratic variation process for Brownian motion (see Theorem 13.9, p. 255 of Kallenberg [37]).
  • Construction of Brownian motion (see Section 1.2, pp. 5–8 of McKean [72], or Section 2.3, pp. 56–59 of Karatzas and Shreve [35]). A completely rigorous construction of Brownian motion was first accomplished in 1923 by Wiener [73], using a Fourier series expansion with respect to the trigonometric basis in the Hilbert space $L^2[0,1]$ (see Itô and McKean [74], pp. 21–22 for a brief summary of this construction). Lévy noticed that this construction of Brownian motion simplifies dramatically if one uses instead the Haar functions as a basis in $L^2[0,1]$.
  • Arcsine laws for Brownian motion (see Theorem 13.16, p. 258 of Kallenberg [37]).
  • Law of the first passage time of Brownian motion (see Proposition 2.6.19, p. 88 of Karatzas and Shreve [35]).
  • Laws of the running maximum of Brownian motion (see Proposition 2.8.1, Proposition 2.8.2 and Proposition 2.8.8, pp. 95–97 of Karatzas and Shreve [35]).
  • Martingale characterization of Brownian motion (see Theorem 3.3.16, p. 157 of Karatzas and Shreve [35]).
  • Transience and escape of Brownian motion for dimensions greater than or equal to two (see Theorem 18.6, p. 354 of Kallenberg [37]).
  • Brownian local time, the basic intuitive foundations of which are to be found in Lévy [75], although rigorous proofs of existence had to await Trotter [76] and Tanaka [77] (a complete history of the Brownian local time process, including a summary of the intuitive approach of Lévy, is too lengthy to be included here, and may be found in Chung [78]).
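Lévy’s simplification of Wiener’s construction, noted above, is easy to illustrate numerically. In the standard normalization (often called the Lévy–Ciesielski expansion), one takes the Schauder functions $S_n$, the integrals of the Haar orthonormal basis of $L^2[0,1]$, together with i.i.d. standard normal coefficients $\xi_n$; the partial sums $\sum_n \xi_n S_n(t)$ converge uniformly a.s. to a Brownian path, and Parseval’s identity gives $\sum_n S_n(t)S_n(s) = \min(t,s)$, the Brownian covariance. The following minimal Python sketch (the function names are ours, not from any source) illustrates both points:

```python
import numpy as np

def schauder_basis(J, N=1024):
    """Schauder functions on [0,1]: integrals of the Haar basis, sampled on a grid of N steps."""
    t = np.arange(N + 1) / N
    basis = [t.copy()]  # S_0(t) = t, the integral of the constant Haar function h_0 = 1
    for j in range(J):
        for k in range(2 ** j):
            a, m, b = k / 2 ** j, (k + 0.5) / 2 ** j, (k + 1) / 2 ** j
            h = np.zeros(N + 1)
            h[(t >= a) & (t < m)] = 2 ** (j / 2)      # L^2-normalized Haar function h_{j,k}
            h[(t >= m) & (t < b)] = -(2 ** (j / 2))
            # S_{j,k}(t) = int_0^t h_{j,k}(s) ds, computed as a left Riemann sum
            S = np.concatenate([[0.0], np.cumsum(h[:-1]) / N])
            basis.append(S)
    return t, basis

def brownian_path(J=10, N=1024, seed=0):
    """Partial sum of the Haar-basis series: an approximate Brownian path on [0,1]."""
    rng = np.random.default_rng(seed)
    t, basis = schauder_basis(J, N)
    xi = rng.standard_normal(len(basis))              # i.i.d. N(0,1) coefficients
    return t, sum(x * S for x, S in zip(xi, basis))

# Parseval check: the truncated sum of S_n(t) S_n(s) recovers min(t, s) = Cov(B_t, B_s).
t, basis = schauder_basis(J=10, N=1024)
cov = sum(S[512] * S[256] for S in basis)             # t = 0.5, s = 0.25 on the grid
# cov equals min(0.5, 0.25) = 0.25 up to floating-point error
```

A convenient sanity check, used above, is that at dyadic points the truncated Parseval sum is exact once the resolution exceeds the dyadic level of the points.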
Remark 6.
Lévy’s style of writing mathematics was very different from the usual “theorem–proof” formalization, being intuitive and discursive in the extreme, but often without proofs in the accepted sense of the term. Taylor ([63], p. 308) remarks “His intuition was almost infallible, and this is all the more surprising because many of the truths he discovered go counter to the normal intuition of an analyst”, while Doob [79] comments as follows:
“Lévy’s intuition was the despair as well as the inspiration of his readers; the despair was due to the fact that the writing of formal proofs in the modern manner was not his ambition. But more formally inclined mathematicians, who had trouble following his reasoning, frequently found to their discomfiture that when they had devised formal proofs of his results, these were not far from Lévy’s. The difficulty of reading his work may explain why he was not fully appreciated until late in his life. It should be added that his feelings toward other writers often reversed their opinion about him; he once remarked that reading the work of other mathematicians caused him actual physical pain”.
The noted American mathematician and expositor P.R. Halmos (a student of Doob), clearly expressing the more formally inclined view, comments in evident frustration ([80], p. 65) “I never learned the trick of understanding Paul Lévy… He combined deep insight with almost total incomprehensibility; he seemed to have no notion of what it meant either to explain or to prove anything”. On the other hand, the Japanese mathematician K. Itô found studying Lévy to be immensely rewarding, although still a significant challenge, commenting (see [81], p. xiii) “In P. Lévy’s book [82] I saw a beautiful structure of sample paths of stochastic processes deserving the name of mathematical theory. From this book I learned stochastic processes, Wiener’s Brownian motion, Poisson processes, and processes with independent increments”, nevertheless adding “But I had a hard time following Lévy’s argument because of his unique intrinsic description”. All this is very consistent with the following comment of Doob ([18], p. xvi): “The first sophisticated book (on probability) was Lévy’s remarkable 1937 book [82] which was not written as a textbook and which yielded its treasures only to readers willing to make extreme efforts”. Itô, nevertheless, goes on to add that the insights eventually learned from Lévy, and clarified using the notion of versions introduced by Doob [40] (as discussed previously), led to the Itô–Lévy decomposition in [70]. Indeed, it is amply clear from [81] that the works of Lévy, and especially [82], served as an intellectual lifeline for Itô, completely isolated as he was during the Second World War.

6. Khinchin and Feller

The present section is devoted to two other iconic figures in probability, namely the Russian mathematician A. Ya. Khinchin (1894–1959) and the Croatian-born American mathematician W. Feller (1906–1970).
Khinchin was a contemporary of Kolmogorov. Bingham ([61], p. 149) remarks that Khinchin “was second only to Kolmogorov in the Russian school of probability”, and Gnedenko ([83], p. 1) states “Khinchin’s scientific results have the distinction of opening new and unknown fields of investigation or of incorporating a group of previously scattered problems into a logical scheme or of creating new directions for the development of various scientific trends”. Indeed, the contributions of Khinchin, like those of Kolmogorov, are at the very center of modern probability, and include the first instance of the law of the iterated logarithm (for Bernoulli trials); results on infinitely divisible laws and stable laws; and the theory of stationary processes, noted by Gnedenko ([83], p. 3) as Khinchin’s “greatest contribution to probability theory”. We shall now discuss some of these contributions in greater detail:
  • The law of the iterated logarithm. This magisterial result, described in Chung ([56], p. 242) as “a crowning achievement in classical probability theory”, is a profound and many-sided complement to the law of large numbers and the central limit theorem. Khinchin pioneered laws of the iterated logarithm, addressing the particular case of independent Bernoulli trials in the work [30]. The law of the iterated logarithm can be regarded as an almost-sure rate of convergence for a law of large numbers, and the result of Khinchin constitutes the “ultimate refinement” in the rate of convergence for the law of large numbers in the particular case of Bernoulli trials, improving on rates previously established by Hausdorff and by Hardy and Littlewood, and being essentially a “best possible” rate of convergence (the matter is nicely discussed in Lamperti ([84], pp. 41–49)). The law of the iterated logarithm of Khinchin served as a precursor to a more general result of this kind established by Kolmogorov [29], as already noted in Section 3. Subsequently, Khinchin [85] also established the law of the iterated logarithm for Brownian motion (see Theorem 2.9.33, p. 112 of Karatzas and Shreve [35]), which itself prefigures the more recent functional law of the iterated logarithm of Strassen [31]. It is an indication of the richness and subtlety of laws of the iterated logarithm that this genre of results, inaugurated by Khinchin, continues to be of undiminished interest, as is clear from the comprehensive survey article of Bingham [86].
  • A variance criterion for a.s. convergence of a series of independent random variables, established jointly with Kolmogorov [24], as already mentioned in Section 3.
  • A characterization of the domain of attraction of the Gaussian law [87] (see Theorem 35.1, p. 172 of Gnedenko and Kolmogorov [12] for a modern rendition).
  • A significant generalization of the weak law of large numbers in the following setting: one is given independent identically distributed non-negative random variables $X_k$, $k = 1, 2, \dots$, with distribution function $F$ on a fixed probability space (observe that integrability conditions on the $X_k$ are not stipulated). The goal is to establish necessary and sufficient conditions for the existence of strictly positive constants $d_n$, $n = 1, 2, \dots$, such that one has the convergence $S_n / d_n \to 1$ in probability as $n \to \infty$ (here $S_n = \sum_{k=1}^{n} X_k$). From Khinchin [88], the necessary and sufficient condition is that the truncated mean function
    $$H(x) := \int_0^x [1 - F(y)]\, dy, \qquad x \in [0, \infty),$$
    be slowly varying at infinity, in which case $d_n$ is the unique $x \in (0, \infty)$ such that $n H(x) = x$, $n = 1, 2, \dots$ (see Section 12, especially eqns. (29)–(30), of Seneta [89], for further discussion of this result).
  • A characterization of the set of all stable laws, established jointly with Lévy [69], as already noted in Section 5.
  • The result that every infinitely divisible law has a partial domain of attraction [90] (see the unnumbered Theorem on pp. 184–186 of Gnedenko and Kolmogorov [12] for a modern rendition).
  • The Lévy–Khinchin formula characterizing all infinitely divisible distributions on the real line established in [68]. As already noted in Section 5, this characterization was obtained independently by Lévy [67]. The two authors apply different methods, that of Lévy being essentially probabilistic while Khinchin uses a shorter and more direct analytic approach.
  • The result asserting that, if the row sums of a null array of random variables converge weakly to a distribution, then that distribution is necessarily infinitely divisible [90] (see Theorem 2.9.3, p. 47 of Sato [91] for a modern statement).
  • Partial extension of the law of the iterated logarithm to Lévy processes [92] (see Proposition 47.11 and Proposition 47.12, p. 358 of Sato [91]).
  • The Khinchin inequality giving upper and lower bounds on weighted sums of independent symmetric Bernoulli random variables (see Theorem 10.3.1, p. 366 of Chow and Teicher [25]). This result has several applications in both probability and analysis, and is an essential ingredient in establishing the classical Marcinkiewicz–Zygmund inequality for sums of independent random variables.
  • The Bochner–Khinchin theorem, which gives necessary and sufficient conditions for a continuous complex-valued function to be the characteristic function of a distribution function (see [93]).
  • The Birkhoff–Khinchin ergodic theorem, which reformulates and generalizes the individual ergodic theorem of G.D. Birkhoff [94]. This latter result essentially concerns the long-term evolution of a discrete time dynamical system. Khinchin [95] introduces the notion of a measure-preserving transformation on a probability space, and establishes a significantly more general version of the Birkhoff theorem in terms of iterations of the measure-preserving transformation (see Theorem 5.3.3, p. 39 of Shiryayev [28]).
  • The Wiener–Khinchin theorem, which expresses the autocorrelation function of a wide-sense stationary second-order stochastic process as the Fourier–Stieltjes transform of a power spectral distribution function [96].
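For concreteness, the Lévy–Khinchin formula referred to above may be stated as follows (in one common modern normalization; conventions for the truncation term vary between texts): a distribution on the real line is infinitely divisible if and only if its characteristic function has the form

```latex
\varphi(t) \;=\; \exp\!\left( i\beta t \;-\; \frac{\sigma^2 t^2}{2}
  \;+\; \int_{\mathbb{R}\setminus\{0\}}
  \left( e^{itx} - 1 - itx\,\mathbf{1}_{\{|x|\le 1\}} \right) \nu(dx) \right),
```

where $\beta \in \mathbb{R}$, $\sigma^2 \ge 0$, and the Lévy measure $\nu$ satisfies $\int \min(1, x^2)\,\nu(dx) < \infty$; the triple $(\beta, \sigma^2, \nu)$ is uniquely determined by the distribution.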
Turning to Feller, it is appropriate to start with the following remark attributed in Bingham [61] to Doob: “Feller made original and profound contributions to probability theory over a period (from 1935 to his death) during which it was transformed from a poor relation into a central branch of mathematics”. Indeed, Feller contributed penetrating analyses going to the very roots of such classical and long-standing results as the weak law of large numbers, the central limit theorem and the law of the iterated logarithm. Furthermore, starting from Kolmogorov’s work [36], noted earlier, on the backward and forward equations for Markov diffusions, Feller essentially created much of the analytic side of the modern theory of Markov diffusions as it stands today, including basic contributions to the theory of semigroups and a definitive study of one-dimensional diffusions. In all of this, the emphasis is largely on the analytical aspects, particularly the semigroups and differential/integral equations, with less attention paid to sample-path questions. In addition, Feller contributed to the foundations of renewal theory. We shall now briefly discuss some of these contributions:
  • Necessary and sufficient conditions for the weak convergence to the normal distribution of sums of independent but not necessarily identically distributed random variables [66]; as noted in Section 5, these conditions were established independently by Lévy [65]. In the same work [66], Feller also establishes the necessity of the classical sufficient condition introduced by Lindeberg [97] for the central limit theorem (see Theorem 5.12, p. 91 and Theorem 5.15, p. 93 of Kallenberg [37] for a modern treatment).
  • Existence and uniqueness theorems established in the work [98] for the forward and backward equations for Markov diffusions introduced by Kolmogorov [36], as noted earlier in Section 3.
  • Necessary and sufficient conditions for the weak law of large numbers for identically distributed random variables [99] (see Theorem 5.2.4, p. 128 of Chow and Teicher [25] and Theorem 27.2, p. 135 of Gnedenko and Kolmogorov [12]).
  • The result that every weak limit of a triangular array of uniformly asymptotically negligible random variables is necessarily infinitely divisible [100] (see Theorem 15.27, p. 302 of Kallenberg [37]).
  • An “ultimate” extension of the Kolmogorov law of the iterated logarithm for a.s. bounded random variables to arbitrary random variables under very mild restrictions, as well as “upper class” and “lower class” characterizations of limiting properties of sums of independent random variables under various restrictions; these are addressed in the works [101,102,103,104] (an exposition of some of these developments at the graduate textbook level can be found in Section 10.10.2 of Chow and Teicher [25]).
  • A thorough makeover of much of renewal theory in the works [105,106,107]. In [106], one finds, basically for the first time, a rigorous treatment of several results in the literature on renewal theory obtained by arguments which were plausible rather than completely rigorous, as well as correction of faulty results and simplified development of other results. The short work [105], which is concerned exclusively with analysis and not probability as such, develops a very subtle argument to obtain a technical result which is of great importance to renewal theory, sometimes referred to as “the cornerstone of renewal theory”. This technical result is then deployed in [107], which is a thorough and comprehensive examination of recurrence problems in probability, including for example the probability of last return to the origin of a simple symmetric random walk on the set of all integers [107] (see Proposition 9.9, p. 165 of Kallenberg [37]).
  • A comprehensive study of the semigroup concept in [108,109,110]. None of these works is specifically concerned with questions of probability theory as such, but the results are certainly pertinent to probability, being essential, for instance, to the study [111] of one-dimensional diffusions. In particular, [108] addresses “paired” parabolic partial differential equations, one equation being the formal adjoint of the other, the main goal being to relate the notion of formal “adjointness” of partial differential equations to the usual functional-analytic pairing of the corresponding semigroups operating on suitably paired Banach spaces. The motivation for [108] lies, of course, in the Kolmogorov backward and forward equations, although the setting in [108] is completely abstract. The work [109] is concerned with weakening the restriction of strong measurability built into the classical definition of a semigroup, while [110] is devoted to an extension of the Hille–Yosida theorem to the less restrictive notion of semigroup introduced in [109].
  • A comprehensive study of one-dimensional diffusions in [111], including a full characterization of the generators, and the formulation of the boundary properties in terms of the domain of the generator for such processes (an exposition of some of the central results in [111] is to be found in Chapter 23 of Kallenberg [37]).
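To make the central-limit contribution in [66] concrete: for independent, mean-zero random variables $X_{n,1}, \dots, X_{n,k_n}$ with $s_n^2 = \sum_{k} \operatorname{Var}(X_{n,k})$, the Lindeberg condition reads

```latex
\lim_{n\to\infty} \frac{1}{s_n^2} \sum_{k=1}^{k_n}
  E\!\left[ X_{n,k}^{2}\, \mathbf{1}_{\{|X_{n,k}| > \varepsilon s_n\}} \right] = 0
  \qquad \text{for every } \varepsilon > 0 .
```

Lindeberg [97] showed that this condition implies $\sum_k X_{n,k}/s_n \Rightarrow N(0,1)$; Feller’s converse is that, under the uniform negligibility condition $\max_k \operatorname{Var}(X_{n,k})/s_n^2 \to 0$, the Lindeberg condition is also necessary for this convergence.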

7. Concluding Remarks

We have briefly sketched some of the main historical lines of development in probability during the “heroic age”, paying special attention to the pioneers of the age, namely Kolmogorov, Doob, Lévy, Khinchin and Feller. One may continue this historical study in a variety of possible directions. In our view, a particularly appropriate and significant direction would be to follow the main lines of development of the theory of stochastic integration, stochastic differential equations, martingale problems and Markov processes, since these build squarely on the theory of martingales established by Doob. This would include elucidation of a significant part of the history of the general theory of processes due to Dellacherie and Meyer, given that the general theory of processes is now an essential tool for modern stochastic calculus.

Funding

This research was funded by NSERC RGPIN-03978.

Data Availability Statement

The original contributions presented in this study are included in the article.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Ore, O. Cardano—The Gambling Scholar; Princeton University Press: Princeton, NJ, USA, 1953.
  2. Ore, O. Pascal and the invention of probability theory. Am. Math. Mon. (MAA) 1960, 67, 409–419.
  3. Maistrov, L.E. Probability Theory—A Historical Sketch; Kotz, S., Ed. and Translator; Academic Press: New York, NY, USA, 1974.
  4. Bolthausen, E. Bernoullis Gesetz der grossen Zahlen. Elem. Math. 2010, 65, 134–143.
  5. Bolthausen, E.; Wüthrich, M.V. Bernoulli’s law of large numbers. Astin Bull. 2013, 43, 73–79.
  6. Snell, J.L. Introduction to Probability; Random House-Birkhäuser Mathematics Series; Springer: New York, NY, USA, 1988.
  7. Archibald, R.C. A rare pamphlet of de Moivre and some of his discoveries. Isis 1926, 8, 671–834.
  8. Bellhouse, D.R. Abraham de Moivre—Setting the Stage for Classical Probability and Its Applications; CRC Press: Boca Raton, FL, USA, 2011.
  9. Hald, A. A History of Mathematical Statistics from 1750 to 1930; Wiley-Interscience: New York, NY, USA, 1998.
  10. Todhunter, I. A History of the Mathematical Theory of Probability from the Time of Pascal to that of Laplace; Macmillan & Company: Cambridge, UK; London, UK, 1865.
  11. Mackey, G.W. Harmonic analysis as the exploitation of symmetry—a historical survey. Bull. Am. Math. Soc. 1980, 3, 543–698.
  12. Gnedenko, B.V.; Kolmogorov, A.N. Limit Distributions for Sums of Independent Random Variables; Addison Wesley: Reading, MA, USA, 1968.
  13. Seneta, E. The central limit problem and linear least squares in pre-revolutionary Russia—The background. Math. Sci. 1984, 9, 37–77.
  14. Lyapunov, A. Nouvelle forme du théorème sur la limite des probabilité. Mém. Acad. Sci. St. Petersbourg 1901, 1–24.
  15. Esseen, C. Fourier analysis of distribution functions. Acta Math. 1945, 77, 1–125.
  16. Berry, A.C. The accuracy of the Gaussian approximation to the sum of independent variates. Trans. Am. Math. Soc. 1941, 49, 122–136.
  17. Seneta, E. Early influences on probability and statistics in the Russian empire. Arch. Hist. Exact Sci. 1998, 53, 201–213.
  18. Doob, J.L. William Feller and twentieth century probability. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 16–21 June 1971; University of California Press: Berkeley, CA, USA, 1972; pp. xv–xx.
  19. Lebesgue, H. Intégrale, longueur, aire (Thèse, Univ. Paris). Ann. Mat. Pura Appl. 1902, 7, 231–359.
  20. Fréchet, M. Sur l’intégrale d’une fonctionnelle étendue à un ensemble abstrait. Bull. Soc. Math. France 1915, 43, 248–265.
  21. Kolmogorov, A.N. Grundbegriffe der Wahrscheinlichkeitsrechnung; Springer: Berlin/Heidelberg, Germany, 1933 (English translation: Foundations of the Theory of Probability; Chelsea: New York, NY, USA, 1950).
  22. Williams, D. Probability with Martingales; Cambridge University Press: Cambridge, UK, 1991.
  23. Kendall, D.G. (Ed.) Obituary: Andrei Nikolaevich Kolmogorov, 1903–1987. Bull. Lond. Math. Soc. 1990, 22, 31–100.
  24. Khinchin, A.Y.; Kolmogorov, A.N. Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden. Mat. Sb. 1925, 32, 668–676.
  25. Chow, Y.S.; Teicher, H. Probability Theory—Independence, Interchangeability, Martingales, 2nd ed.; Springer: New York, NY, USA, 1988.
  26. Kolmogorov, A.N. Über die Summen durch den Zufall bestimmter unabhängiger Grössen. Math. Ann. 1928, 309–319.
  27. Kolmogorov, A.N. Sur la loi forte des grands nombres. C.R. Acad. Sci. Paris 1930, 191, 910–912.
  28. Shiryayev, A.N. Probability—2, 3rd ed.; Springer GTM Series no. 95; Springer: Berlin/Heidelberg, Germany, 2019.
  29. Kolmogorov, A.N. Über das Gesetz des iterierten Logarithmus. Math. Ann. 1929, 101, 126–135.
  30. Khinchin, A.Y. Über einen Satz der Wahrscheinlichkeitsrechnung. Fund. Math. 1924, 6, 9–20.
  31. Strassen, V. An invariance principle for the law of the iterated logarithm. Z. Wahrsch. Verw. Geb. 1964, 3, 211–226.
  32. Stout, W.F. Almost Sure Convergence; Academic Press: New York, NY, USA, 1974.
  33. Kolmogorov, A.N. Sulla determinazione empirica di una legge di distribuzione. Giorn. Ist. Ital. Attuari 1933, 83–91.
  34. Dacunha-Castelle, D.; Duflo, M. Probability and Statistics, Volume II; Springer: New York, NY, USA, 1986.
  35. Karatzas, I.; Shreve, S.E. Brownian Motion and Stochastic Calculus; Springer: New York, NY, USA, 1991.
  36. Kolmogorov, A.N. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 1931, 104, 415–458.
  37. Kallenberg, O. Foundations of Modern Probability, 2nd ed.; Springer: New York, NY, USA, 2002.
  38. Doob, J.L. Kolmogorov’s early work on convergence theory and foundations. Ann. Probab. 1989, 17, 815–821.
  39. Doob, J.L. Stochastic Processes; Wiley-Interscience: New York, NY, USA, 1953.
  40. Doob, J.L. Stochastic processes depending on a continuous parameter. Trans. Am. Math. Soc. 1937, 42, 107–140.
  41. Loève, M. Probability Theory II, 4th ed.; Springer: New York, NY, USA, 1978.
  42. Billingsley, P. Probability and Measure, 3rd ed.; Wiley: New York, NY, USA, 1995.
  43. Getoor, R. J.L. Doob: Foundations of stochastic processes and probabilistic potential theory. Ann. Prob. 2009, 37, 1647–1663.
  44. Doob, J.L. The development of rigor in mathematical probability (1900–1950). Am. Math. Mon. 1996, 103, 586–595.
  45. Dellacherie, C.; Meyer, P.A. Probabilities and Potential; North-Holland Mathematics Studies, no. 29; North-Holland Publishing Co.: Amsterdam, The Netherlands, 1978.
  46. Bernstein, S.N. Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 1927, 97, 1–59.
  47. Bernstein, S.N. On some modifications of Chebyshev’s inequality. Dokl. Akad. Nauk SSSR 1937, 17, 275–278.
  48. Ville, J. Étude Critique de la Notion du Collectif; Gauthier-Villars: Paris, France, 1939.
  49. Doob, J.L. Regularity properties of certain families of chance variables. Trans. Am. Math. Soc. 1940, 44, 455–486.
  50. Doob, J.L. Continuous parameter martingales. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 31 July–12 August 1950; University of California Press: Berkeley, CA, USA, 1951; pp. 269–277.
  51. Rogers, L.C.G.; Williams, D. Diffusions, Markov Processes, and Martingales—Volume One: Foundations, 2nd ed.; Wiley: Chichester, UK, 1994.
  52. Shiryayev, A.N. Probability—1, 3rd ed.; Springer GTM Series no. 95; Springer: Berlin/Heidelberg, Germany, 2016.
  53. Doob, J.L. Stochastic processes with an integral-valued parameter. Trans. Am. Math. Soc. 1938, 44, 87–150.
  54. Meyer, P.A. A decomposition theorem for supermartingales. Ill. J. Math. 1962, 193–205.
  55. Meyer, P.A. Decomposition of supermartingales: The uniqueness theorem. Ill. J. Math. 1963, 1–17.
  56. Chung, K.L. A Course in Probability Theory, 3rd ed.; Academic Press: New York, NY, USA, 2001.
  57. Sucheston, L. Book reviews of A Course in Probability Theory by K.L. Chung, and Probability by L. Breiman. Bull. Am. Math. Soc. 1969, 75, 706–709.
  58. Parthasarathy, K.R. Probability Measures on Metric Spaces; Academic Press: New York, NY, USA, 1967.
  59. Loève, M. Paul Lévy, 1886–1971. Ann. Prob. 1973, 1, 1–18.
  60. Lévy, P. Calcul des Probabilités; Gauthier-Villars: Paris, France, 1925.
  61. Bingham, N.H. Studies in the history of probability and statistics XLVI. Measure into probability: From Lebesgue to Kolmogorov. Biometrika 2000, 87, 145–156.
  62. Lévy, P. Quelques Aspects de la Pensée d’un Mathématicien; Gauthier-Villars: Paris, France, 1970.
  63. Taylor, S.J. Paul Lévy. Bull. Lond. Math. Soc. 1975, 7, 300–320.
  64. Dudley, R.M. Real Analysis and Probability; Cambridge University Press: Cambridge, UK, 2002.
  65. Lévy, P. Propriétés asymptotiques des sommes de variables aléatoires indépendantes ou enchaînées. J. Math. Pures Appl. 1935, 14, 347–402.
  66. Feller, W. Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung I. Math. Z. 1935, 40, 521–559.
  67. Lévy, P. Sur les intégrales dont les éléments sont des variables aléatoires indépendantes. Ann. Sc. Norm. Sup. Pisa 1934, 3, 337–366.
  68. Khinchin, A.Y. A new derivation of a formula of M. Paul Lévy. Bull. Moscou Univ. 1937, 1, 1–5.
  69. Khinchin, A.Y.; Lévy, P. Sur les lois stables. C. R. Acad. Sci. Paris 1936, 374–376.
  70. Itô, K. On stochastic processes (I) (infinitely divisible laws of probability). Jpn. J. Math. 1942, XVIII, 261–301.
  71. Durrett, R. Probability: Theory and Examples, 3rd ed.; Thomson Brooks/Cole: Belmont, CA, USA, 2005.
  72. McKean, H.P. Stochastic Integrals; Academic Press: New York, NY, USA, 1969.
  73. Wiener, N. Differential space. J. Math. Phys. 1923, 131–174.
  74. Itô, K.; McKean, H.P. Diffusion Processes and their Sample Paths; Springer: New York, NY, USA, 1974.
  75. Lévy, P. Processus Stochastiques et Mouvement Brownien; Gauthier-Villars: Paris, France, 1948.
  76. Trotter, H.F. A property of Brownian motion paths. Ill. J. Math. 1958, 2, 425–433.
  77. Tanaka, H. Note on continuous additive functionals of the 1-dimensional Brownian path. Z. Wahrsch. Verw. Geb. 1963, 1, 251–257.
  78. Chung, K.L. Reminiscences of some of Paul Lévy’s ideas in Brownian motion and in Markov chains. In Seminar on Stochastic Processes 1988; Cinlar, E., Chung, K.L., Getoor, R.K., Eds.; Birkhäuser: Boston, MA, USA, 1986.
  79. Doob, J.L. Obituary-Paul Levy. J. Appl. Prob. 1972, 9, 870–872. [Google Scholar] [CrossRef]
  80. Halmos, P.R. I Want to Be a Mathematician—An Automathography; Springer: New York, NY, USA, 1985. [Google Scholar]
  81. Stroock, D.W.; Varadhan, S.R.S. (Eds.) Kiyoshi Itô—Selected Papers; Springer: New York, NY, USA, 1986. [Google Scholar]
  82. Lévy, P. Théorie de l’Addition des Variables Aléatoires; Gauthier-Villars: Paris, France, 1937. [Google Scholar]
  83. Gnedenko, B.V. Alexander Yakovlevich Khinchin-Obituary. Theor. Prob. Appl. 1960, 5, 1–4. [Google Scholar] [CrossRef]
  84. Lamperti, J. Probability; Benjamin Cummings: Reading, MA, USA, 1966. [Google Scholar]
  85. Khinchin, A.Y. Asymptotische Gesetze der Wahrscheinlichkeitsrechnung; Ergebnisse der Mathematik; Springer: Berlin/Heidelberg, Germany, 1933. [Google Scholar]
  86. Bingham, N.H. Variants on the law of the iterated logarithm. Bull. Lond. Math. Soc. 1986, 18, 433–467. [Google Scholar] [CrossRef]
  87. Khinchin, A.Y. Sul dominio di attrazione delle legge di Gauss. Giorn. Ist. Ital. Attuari 1935, 6, 371–393. [Google Scholar]
  88. Khinchin, A.Y. Su una legge dei grandi numeri generalizzata. Giorn. Ist. Ital. Attuari 1936, 7, 365–377. [Google Scholar]
  89. Seneta, E. A tricentary history of the law of large numbers. Bernoulli 2013, 19, 1088–1121. [Google Scholar] [CrossRef]
  90. Khinchin, A.Y. Zur Theorie der unbeschrankt teilbaren Verteilungsgesetze. Mat. Sb. 1937, 44, 79–120. [Google Scholar]
  91. Sato, K.-I. Lévy Processes and Infinitely Divisible Distributions; Cambridge Univerity Press: Cambridge, UK, 1999. [Google Scholar]
  92. Khinchin, A.Y. Sur la croissance locale des processus stochastiques homogènes à accroissements indépendants. Izv. Acad. Nauk SSSR 1939, 3, 487–505. [Google Scholar]
  93. Feller, W. An Introduction to Probability Theory and Its Applications; Wiley: New York, NY, USA, 1966; Volume 2. [Google Scholar]
  94. Birkhoff, G.D. Proof of the ergodic theorem. Proc. Nat. Acad. Sci. USA 1932, 17, 656–660. [Google Scholar] [CrossRef]
95. Khinchin, A.Y. Zu Birkhoffs Lösung des Ergodenproblems. Math. Ann. 1933, 107, 485–488. [Google Scholar] [CrossRef]
96. Khinchin, A.Y. Korrelationstheorie der stationären stochastischen Prozesse. Math. Ann. 1934, 109, 604–615. [Google Scholar] [CrossRef]
97. Lindeberg, J.W. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Z. 1922, 15, 211–225. [Google Scholar] [CrossRef]
98. Feller, W. Zur Theorie der stochastischen Prozesse. (Existenz- und Eindeutigkeitssätze). Math. Ann. 1936, 113, 113–160. [Google Scholar] [CrossRef]
99. Feller, W. Über das Gesetz der grossen Zahlen. Acta Litt. Sci. Szeged 1937, 8, 191–201. [Google Scholar]
100. Feller, W. Neuer Beweis für die Kolmogoroff-P. Lévysche Charakterisierung der unbeschränkt teilbaren Verteilungsfunktionen. Bull. Int. Acad. Yugosl. Cl. Sci. Math. Nat. 1939, 32, 1–8. [Google Scholar]
  101. Feller, W. The general form of the so-called law of the iterated logarithm. Trans. Am. Math. Soc. 1943, 54, 373–402. [Google Scholar] [CrossRef]
  102. Feller, W. The law of the iterated logarithm for identically distributed random variables. Ann. Math. 1946, 47, 631–638. [Google Scholar] [CrossRef]
  103. Feller, W. An extension of the law of the iterated logarithm to variables without variance. J. Math. Mech. 1968, 18, 343–355. [Google Scholar] [CrossRef]
  104. Feller, W. General analogues to the law of the iterated logarithm. Z. Wahrsch. Geb. 1969, 14, 21–26. [Google Scholar] [CrossRef]
105. Erdős, P.; Feller, W.; Pollard, H. A property of power series with positive coefficients. Bull. Am. Math. Soc. 1949, 55, 201–204. [Google Scholar] [CrossRef]
  106. Feller, W. On the integral equation of renewal theory. Ann. Math. Stat. 1941, 12, 243–267. [Google Scholar] [CrossRef]
  107. Feller, W. Fluctuation theory of recurrent events. Trans. Am. Math. Soc. 1949, 67, 98–119. [Google Scholar] [CrossRef]
  108. Feller, W. The parabolic differential equations and the associated semi-groups of transformations. Ann. Math. 1952, 55, 468–519. [Google Scholar] [CrossRef]
  109. Feller, W. Semi-groups of transformations in general weak topologies. Ann. Math. 1953, 57, 287–308. [Google Scholar] [CrossRef]
  110. Feller, W. On the generation of unbounded semi-groups of bounded linear operators. Ann. Math. 1953, 58, 166–174. [Google Scholar] [CrossRef]
  111. Feller, W. Diffusion processes in one dimension. Trans. Am. Math. Soc. 1954, 77, 1–31. [Google Scholar] [CrossRef]